Welcome to skippa’s documentation!

Introduction

SciKIt-learn Pipeline in PAndas

Want to create a machine learning model using pandas & scikit-learn? This should make your life easier.

Skippa helps you easily create a pre-processing and modeling pipeline based on scikit-learn transformers, while preserving the pandas DataFrame format throughout all pre-processing steps. This makes it much easier to define a series of consecutive transformation steps that refer to columns in your intermediate DataFrame.
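Here is a small plain-pandas illustration of why referring to columns in the intermediate DataFrame is useful. A dtype-based column selection is resolved against the DataFrame as it looks at that point, so columns created by earlier steps are picked up by name. The helper `select_columns` and the data are hypothetical. The example only mimics the idea behind selectors such as skippa's `columns(dtype_include=...)`, not their implementation.

```python
import pandas as pd


def select_columns(df: pd.DataFrame, dtype_include: str) -> list:
    """Hypothetical helper: resolve a dtype-based selection against df as it is now."""
    return list(df.select_dtypes(include=dtype_include).columns)


df = pd.DataFrame({
    "income": [3000.0, 4500.0, 2000.0],
    "household_size": [1, 3, 2],
    "city": ["Utrecht", "Amsterdam", "Utrecht"],
})

# Step 1: derive a new numeric column from existing ones
df["income_per_member"] = df["income"] / df["household_size"]

# Step 2: select the numeric columns of the *intermediate* DataFrame;
# the column created in step 1 is included automatically, by name
numeric_cols = select_columns(df, "number")
print(numeric_cols)  # ['income', 'household_size', 'income_per_member']
```

Plain scikit-learn transformers return NumPy arrays by default. If the data had been converted to an array between these steps, step 2 could only refer to columns by position.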

Installation

$ pip install skippa

Basic use

Skippa helps you easily define data cleaning & pre-processing operations on a pandas DataFrame and combine them with a scikit-learn model/algorithm into a single executable pipeline. It works roughly like this:

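from skippa import Skippa, columns
from sklearn.linear_model import LogisticRegression

# X (feature DataFrame), y (target) and new_data (DataFrame with the
# same columns as X) are assumed to be defined already
pipeline = (
   Skippa()
   # fill missing values: most frequent value for text, median for numbers
   .impute(columns(dtype_include='object'), strategy='most_frequent')
   .impute(columns(dtype_include='number'), strategy='median')
   # standardize numeric columns, then one-hot encode two categorical columns
   .scale(columns(dtype_include='number'), type='standard')
   .onehot(columns(['category1', 'category2']))
   # final estimator
   .model(LogisticRegression())
)
pipeline.fit(X, y)
predictions = pipeline.predict_proba(new_data)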
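The pre-processing steps in the example above can be approximated in plain pandas. The sketch below is only a rough approximation and is not skippa's implementation. It assumes that `strategy='most_frequent'` and `strategy='median'` behave like scikit-learn's `SimpleImputer`, and that `type='standard'` means `StandardScaler`-style scaling. The function names `fit_preprocessing` / `transform_preprocessing` and the data are hypothetical. The sketch shows the fit/transform split: statistics are learned once from the training data and reused unchanged on new data.

```python
import pandas as pd


def fit_preprocessing(X: pd.DataFrame, onehot_cols: list) -> dict:
    """Hypothetical: learn, from training data only, the statistics each step needs."""
    num_cols = list(X.select_dtypes(include="number").columns)
    other_cols = [c for c in X.columns if c not in num_cols]
    stats = {
        "num_cols": num_cols,
        "onehot_cols": onehot_cols,
        # 'most_frequent': the (smallest) mode of each non-numeric column
        "modes": {c: X[c].mode().iloc[0] for c in other_cols},
        # 'median': per-column median of the numeric columns
        "medians": X[num_cols].median(),
    }
    # Scaling statistics on the imputed numeric data; ddof=0 mirrors StandardScaler
    imputed = X[num_cols].fillna(stats["medians"])
    stats["means"] = imputed.mean()
    stats["stds"] = imputed.std(ddof=0)
    # Record the one-hot column layout seen during training
    stats["columns"] = list(transform_preprocessing(X, stats, align=False).columns)
    return stats


def transform_preprocessing(X: pd.DataFrame, stats: dict, align: bool = True) -> pd.DataFrame:
    """Hypothetical: apply the learned statistics; the result is still a DataFrame."""
    out = X.fillna(stats["modes"])                           # impute non-numeric columns
    num = stats["num_cols"]
    out[num] = out[num].fillna(stats["medians"])             # impute numeric columns
    out[num] = (out[num] - stats["means"]) / stats["stds"]   # standardize
    out = pd.get_dummies(out, columns=stats["onehot_cols"], dtype=int)
    if align:
        # Match the training layout: missing dummies become 0, unseen categories are dropped
        out = out.reindex(columns=stats["columns"], fill_value=0)
    return out


X = pd.DataFrame({
    "age": [25.0, None, 40.0, 35.0],
    "income": [3000.0, 4000.0, None, 5000.0],
    "category1": ["a", "b", "a", None],
    "category2": ["x", "x", "y", "x"],
})
stats = fit_preprocessing(X, onehot_cols=["category1", "category2"])

new_data = pd.DataFrame({
    "age": [float("nan")], "income": [4000.0],
    "category1": ["b"], "category2": [None],
})
out = transform_preprocessing(new_data, stats)
print(out)
```

A single new row only contains one category per column. The `reindex` call is what keeps its layout identical to the training data, which is what a downstream model expects.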

Modules

Top-level package for skippa.

The pipeline module defines the main Skippa methods. The transformers subpackage contains the various transformers used in the pipeline.
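The following toy sketch in plain pandas illustrates the general idea of chainable pipeline methods built on DataFrame-in/DataFrame-out transformers. It is not skippa's code. The class names `StandardScaleStep` and `MiniPipeline` are hypothetical. The only convention borrowed from scikit-learn is that `fit` returns the fitted object, so that `fit(...).transform(...)` can be chained.

```python
import pandas as pd


class StandardScaleStep:
    """Hypothetical DataFrame-in/DataFrame-out step: standard-scale some columns."""

    def __init__(self, cols: list):
        self.cols = cols

    def fit(self, df: pd.DataFrame) -> "StandardScaleStep":
        # Learn per-column mean and population std (ddof=0)
        self.mean_ = df[self.cols].mean()
        self.std_ = df[self.cols].std(ddof=0)
        return self  # sklearn convention: fit returns the fitted object

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()  # leave the caller's DataFrame untouched
        out[self.cols] = (df[self.cols] - self.mean_) / self.std_
        return out  # same index and column names as the input


class MiniPipeline:
    """Hypothetical toy builder: each method appends a step and returns self."""

    def __init__(self):
        self.steps = []

    def scale(self, cols: list) -> "MiniPipeline":
        self.steps.append(StandardScaleStep(cols))
        return self  # returning self is what enables method chaining

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Fit each step on the output of the previous one
        for step in self.steps:
            df = step.fit(df).transform(df)
        return df

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Reuse the statistics learned during fit_transform
        for step in self.steps:
            df = step.transform(df)
        return df


train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]})
pipe = MiniPipeline().scale(["a"])
train_out = pipe.fit_transform(train)
new_out = pipe.transform(pd.DataFrame({"a": [2.0], "b": ["w"]}))
print(train_out)
print(new_out)
```

Because every step returns a DataFrame, a later step can select columns by name or dtype from the output of the previous one.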
