skippa.transformers package

Submodules

skippa.transformers.base module

This module contains the base and utility classes and functions needed for defining and using transformers.

class skippa.transformers.base.ColumnSelector(selector)[source]

Bases: object

This is not a transformer, but a utility class for defining a column set.

class skippa.transformers.base.SkippaMixin[source]

Bases: object

Utility class providing additional methods for custom Skippa transformers.

skippa.transformers.base.columns(*args, include=None, exclude=None, **kwargs)[source]

Helper function for creating a ColumnSelector

Flexible arguments:

- include or exclude lists: speak for themselves
- dtype_include, dtype_exclude, pattern: dispatched to sklearn’s make_column_selector
- otherwise: a list to include, or an existing ColumnSelector

Parameters

include (Optional[ColumnExpression], optional) – Columns to include. Defaults to None.
exclude (Optional[ColumnExpression], optional) – Columns to exclude. Defaults to None.

Returns

A callable that returns column names when called on a df

Return type

ColumnSelector
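
The dtype and pattern arguments are described above as being dispatched to sklearn’s make_column_selector. The sketch below shows what that sklearn function does on its own; it does not call skippa, and the example dataframe and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical example data
df = pd.DataFrame({
    "age": [31, 45],
    "income": [52000.0, 61000.0],
    "city": ["Utrecht", "Delft"],
})

# Select columns by dtype
numeric_selector = make_column_selector(dtype_include=np.number)
# Select columns whose name matches a regular expression
name_selector = make_column_selector(pattern="^c")

# A selector is a callable that returns column names when called on a df
numeric_cols = numeric_selector(df)
name_cols = name_selector(df)
```

Here numeric_cols holds the two numeric column names and name_cols holds the single column whose name starts with "c".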

skippa.transformers.custom module

This defines custom transformers implementing anything other than existing sklearn transformers.

class skippa.transformers.custom.SkippaApplier(cols, *args, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for applying arbitrary function (wraps around pandas apply)

fit(X, y=None, **fit_params)[source]: Nothing to do here

transform(X, y=None, **transform_params)[source]: Use pandas.DataFrame.apply method
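
As a rough illustration of applying an arbitrary function to a column subset with pandas apply, consider the sketch below. It uses plain pandas rather than SkippaApplier, and it assumes that only the selected columns are changed; the data and the doubling function are hypothetical.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": ["x", "y", "z"]})
cols = ["a", "b"]

out = df.copy()
# Apply the function column by column to the selected subset only
out[cols] = out[cols].apply(lambda s: s * 2)
```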

class skippa.transformers.custom.SkippaAssigner(**kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for assigning new columns to a df.

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaCaster(cols, dtype)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for casting columns to another data type

fit(X, y=None, **kwargs)[source]: Nothing to do here.

transform(X, y=None, **kwargs)[source]: Apply the actual casting using pandas.astype

class skippa.transformers.custom.SkippaConcat(left, right)[source]

Bases: BaseEstimator, SkippaMixin

Concatenate two pipelines.

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaDateEncoder(cols, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Derive date features using pandas datetime’s .dt property.

fit(X, y=None)[source]

transform(X, y=None, **kwargs)[source]
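
For readers unfamiliar with the .dt accessor, here is a small pandas-only sketch of deriving date features. Which features SkippaDateEncoder derives, and how it names the new columns, is not stated here, so the feature set and the d_year / d_month / d_weekday names are assumptions.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"d": pd.to_datetime(["2021-01-15", "2022-12-31"])})

# Derive features from the datetime column through the .dt accessor
out = df.assign(
    d_year=df["d"].dt.year,
    d_month=df["d"].dt.month,
    d_weekday=df["d"].dt.weekday,  # Monday=0 ... Sunday=6
)
```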

class skippa.transformers.custom.SkippaDateFormatter(cols, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Convert date strings into pandas datetime

fit(X, y=None, **kwargs)[source]: Nothing to do here

transform(X, y=None, **kwargs)[source]: Apply the transformation

class skippa.transformers.custom.SkippaOutlierRemover(cols, factor=1.5)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Detect and remove outliers, based on simple IQR

fit(X, y=None)[source]

transform(X, y=None)[source]
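
The usual IQR rule flags values that lie more than factor times the interquartile range outside the first or third quartile. The sketch below computes those bounds with pandas for a single series. It only illustrates the rule; what SkippaOutlierRemover does with the flagged values is not specified in this reference.

```python
import pandas as pd

# Hypothetical example data with one obvious outlier
s = pd.Series([1, 2, 3, 4, 100])
factor = 1.5  # same default as in the class signature

# Bounds would be learned during fit
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - factor * iqr, q3 + factor * iqr

# Values outside [lower, upper] count as outliers
is_outlier = (s < lower) | (s > upper)
```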

class skippa.transformers.custom.SkippaRenamer(mapping)[source]

Bases: BaseEstimator, TransformerMixin

Transformer for renaming columns

fit(X, y=None, **kwargs)[source]

Look at the df to determine the mapping.

If the mapping is given as a ColumnSelector plus a function, evaluate the column names and apply the renaming function to them.

transform(X, y=None, **kwargs)[source]: Apply the actual renaming using pandas.rename

class skippa.transformers.custom.SkippaReplacer(**kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaSelector(cols)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for selecting a subset of columns in a df.

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

skippa.transformers.sklearn module

This implements transformers based on existing sklearn transformers

class skippa.transformers.sklearn.SkippaColumnTransformer(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True)[source]

Bases: ColumnTransformer, SkippaMixin

Custom ColumnTransformer. Probably not needed anymore.

fit(X, y=None, **kwargs)[source]

Fit all transformers using X.

Parameters

X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.
y (array-like of shape (n_samples,...), default=None) – Targets for supervised learning.

Returns

self – This estimator.

Return type

ColumnTransformer

fit_transform(X, y=None)[source]

Fit all transformers, transform the data and concatenate results.

Parameters

X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.
y (array-like of shape (n_samples,), default=None) – Targets for supervised learning.

Returns

X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.

Return type

{array-like, sparse matrix} of shape (n_samples, sum_n_components)

steps: List[Any]

transform(X, y=None)[source]

Transform X separately by each transformer, concatenate results.

Parameters: X ({array-like, dataframe} of shape (n_samples, n_features)) – The data to be transformed by subset.
Returns: X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.
Return type: {array-like, sparse matrix} of shape (n_samples, sum_n_components)

class skippa.transformers.sklearn.SkippaLabelEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, LabelEncoder

Wrapper around sklearn’s LabelEncoder

fit(X, y=None, **kwargs)[source]

Fit label encoder.

Parameters: y (array-like of shape (n_samples,)) – Target values.
Returns: self – Fitted label encoder.
Return type: returns an instance of self.

transform(X, y=None, **kwargs)[source]

Transform labels to normalized encoding.

Parameters: y (array-like of shape (n_samples,)) – Target values.
Returns: y – Labels as normalized encodings.
Return type: array-like of shape (n_samples,)

class skippa.transformers.sklearn.SkippaMinMaxScaler(cols, **kwargs)[source]

Bases: SkippaMixin, MinMaxScaler

Wrapper around sklearn’s MinMaxScaler

fit(X, y=None, **kwargs)[source]

Compute the minimum and maximum to be used for later scaling.

Parameters

X (array-like of shape (n_samples, n_features)) – The data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
y (None) – Ignored.

Returns

self – Fitted scaler.

Return type

object

transform(X, y=None, **kwargs)[source]

Scale features of X according to feature_range.

Parameters: X (array-like of shape (n_samples, n_features)) – Input data that will be transformed.
Returns: Xt – Transformed data.
Return type: ndarray of shape (n_samples, n_features)
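
The wrappers in this module take a cols argument in addition to the sklearn parameters. One way to picture the idea is to fit the sklearn transformer on a column subset and write the result back into a copy of the df, as sketched below with sklearn’s own MinMaxScaler. This is an assumption about the general pattern, not the SkippaMinMaxScaler implementation, and the data is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical example data
df = pd.DataFrame({"age": [20.0, 30.0, 40.0], "name": ["a", "b", "c"]})
cols = ["age"]

# Fit on the selected columns only
scaler = MinMaxScaler().fit(df[cols])

# Write the scaled values back; other columns stay untouched
out = df.copy()
out[cols] = scaler.transform(df[cols])
```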

class skippa.transformers.sklearn.SkippaOneHotEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, OneHotEncoder

Wrapper around sklearn’s OneHotEncoder

fit(X, y=None, **kwargs)[source]

Fit OneHotEncoder to X.

Parameters

X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.
y (None) – Ignored. This parameter exists only for compatibility with Pipeline.

Returns

Fitted encoder.

Return type

self

transform(X, y=None, **kwargs)[source]

Transform X using one-hot encoding.

If there are infrequent categories for a feature, the infrequent categories will be grouped into a single category.

Parameters: X (array-like of shape (n_samples, n_features)) – The data to encode.
Returns: X_out – Transformed input. If sparse=True, a sparse matrix will be returned.
Return type: {ndarray, sparse matrix} of shape (n_samples, n_encoded_features)
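
A short sketch of the underlying sklearn encoder on a single column may help. It uses sklearn’s OneHotEncoder directly with default settings and converts the sparse result to a dense array for display; the data is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical example data
df = pd.DataFrame({"color": ["red", "blue", "red"], "n": [1, 2, 3]})

encoder = OneHotEncoder()
# The default output is a sparse matrix; densify it for inspection
encoded = encoder.fit_transform(df[["color"]]).toarray()

# Categories are sorted, one output column per category
categories = list(encoder.categories_[0])
```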

class skippa.transformers.sklearn.SkippaOrdinalEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, OrdinalEncoder

Wrapper around sklearn’s OrdinalEncoder

fit(X, y=None, **kwargs)[source]

Fit the OrdinalEncoder to X.

Parameters

X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.
y (None) – Ignored. This parameter exists only for compatibility with Pipeline.

Returns

self – Fitted encoder.

Return type

object

transform(X, y=None, **kwargs)[source]

Transform X to ordinal codes.

Parameters: X (array-like of shape (n_samples, n_features)) – The data to encode.
Returns: X_out – Transformed input.
Return type: ndarray of shape (n_samples, n_features)

class skippa.transformers.sklearn.SkippaPCA(cols, **kwargs)[source]

Bases: SkippaMixin, PCA

Wrapper around sklearn’s PCA

fit(X, y=None, **kwargs)[source]

Fit the model with X.

Parameters

X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Ignored.

Returns

self – Returns the instance itself.

Return type

object

fit_transform(X, y=None, **kwargs)[source]: Overridden here because the PCA parent class defines its own .fit_transform method.

transform(X, y=None, **kwargs)[source]

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters: X (array-like of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
Returns: X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.
Return type: array-like of shape (n_samples, n_components)

class skippa.transformers.sklearn.SkippaSimpleImputer(cols, **kwargs)[source]

Bases: SkippaMixin, SimpleImputer

Wrapper around sklearn’s SimpleImputer

fit(X, y=None, **kwargs)[source]

Fit the imputer on X.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present here for API consistency by convention.

Returns

self – Fitted estimator.

Return type

object

transform(X, y=None, **kwargs)[source]

Impute all missing values in X.

Parameters: X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data to complete.
Returns: X_imputed – X with imputed values.
Return type: {ndarray, sparse matrix} of shape (n_samples, n_features_out)
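
To make the fit/transform split concrete, the sketch below runs sklearn’s SimpleImputer with the mean strategy on a small array. It shows the wrapped sklearn class only; the data is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical example data with one missing value per column
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# fit learns the per-column mean; transform fills the gaps with it
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
```

Each missing value is replaced by the mean of the observed values in its column.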

class skippa.transformers.sklearn.SkippaStandardScaler(cols, **kwargs)[source]

Bases: SkippaMixin, StandardScaler

Wrapper around sklearn’s StandardScaler

fit(X, y=None, **kwargs)[source]

Compute the mean and std to be used for later scaling.

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to compute the mean and standard deviation used for later scaling along the features axis.
y (None) – Ignored.
sample_weight (array-like of shape (n_samples,), default=None) –
Individual weights for each sample.

New in version 0.24: parameter sample_weight support to StandardScaler.

Returns

self – Fitted scaler.

Return type

object

transform(X, y=None, **kwargs)[source]

Perform standardization by centering and scaling.

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to scale along the features axis.
copy (bool, default=None) – Copy the input X or not.

Returns

X_tr – Transformed array.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_features)

skippa.transformers.sklearn.make_skippa_column_transformer(*transformers, remainder='drop', **kwargs)[source]

Custom wrapper around sklearn’s make_column_transformer

Return type: SkippaColumnTransformer
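
For context, this is how sklearn’s own make_column_transformer is used. The sketch does not call the skippa wrapper, whose extra behaviour is not described here; the data and the choice of transformers are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical example data
df = pd.DataFrame({
    "age": [20, 30, 40],
    "color": ["red", "blue", "red"],
    "note": ["x", "y", "z"],
})

# Each tuple pairs a transformer with the columns it applies to;
# remainder="drop" discards columns that are not listed ("note")
ct = make_column_transformer(
    (MinMaxScaler(), ["age"]),
    (OneHotEncoder(), ["color"]),
    remainder="drop",
)

result = ct.fit_transform(df)
# The result may be sparse depending on density; densify if needed
if hasattr(result, "toarray"):
    result = result.toarray()
result = np.asarray(result)
```

The output has one scaled column for age and one column per color category.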
