skippa.transformers package

Submodules

skippa.transformers.base module

This module contains base and utility classes and functions needed for defining and using transformers.

class skippa.transformers.base.ColumnSelector(selector)[source]

Bases: object

This is not a transformer, but a utility class for defining a column set.

class skippa.transformers.base.SkippaMixin[source]

Bases: object

Utility class providing additional methods for custom Skippa transformers.

skippa.transformers.base.columns(*args, include=None, exclude=None, **kwargs)[source]

Helper function for creating a ColumnSelector

Flexible arguments:
  • include or exclude lists: explicit lists of columns to include or exclude
  • dtype_include, dtype_exclude, pattern: dispatched to sklearn’s make_column_selector
  • otherwise: a list of columns to include, or an existing ColumnSelector

Parameters
  • include (Optional[ColumnExpression], optional) – Columns to include. Defaults to None.

  • exclude (Optional[ColumnExpression], optional) – Columns to exclude. Defaults to None.

Returns

A callable that returns column names when called on a DataFrame

Return type

ColumnSelector
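
To illustrate the dtype and pattern route, the sketch below calls sklearn's make_column_selector directly instead of going through skippa's columns helper. The DataFrame and its column names are made up for the example.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data: two numeric columns and one string column.
df = pd.DataFrame(
    {
        "age": [31, 45],
        "income": [2500.0, 3100.0],
        "city": ["Utrecht", "Delft"],
    }
)

# Select by dtype: all numeric columns.
numeric_selector = make_column_selector(dtype_include="number")
numeric_cols = numeric_selector(df)

# Select by regex pattern on the column name.
pattern_selector = make_column_selector(pattern="^c")
pattern_cols = pattern_selector(df)
```

Each selector is a callable that takes a DataFrame and returns a list of column names, which is the same contract described for the return value above.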

skippa.transformers.custom module

This defines custom transformers implementing anything other than existing sklearn transformers.

class skippa.transformers.custom.SkippaApplier(cols, *args, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for applying arbitrary function (wraps around pandas apply)

fit(X, y=None, **fit_params)[source]

Nothing to do here

transform(X, y=None, **transform_params)[source]

Use pandas.DataFrame.apply method
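
Since the transformer wraps pandas apply, its effect can be approximated with plain pandas. This sketch assumes the function is applied only to the selected columns and that the other columns pass through unchanged. The data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1.0, 4.0, 9.0],
        "y": [4.0, 16.0, 25.0],
        "label": ["a", "b", "c"],
    }
)

cols = ["x", "y"]  # stand-in for the column selection

# Apply the function column by column to the selected subset only.
out = df.copy()
out[cols] = out[cols].apply(np.sqrt)
```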

class skippa.transformers.custom.SkippaAssigner(**kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for assigning new columns or overwriting existing ones in a df (compare pandas.DataFrame.assign).

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaCaster(cols, dtype)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for casting columns to another data type

fit(X, y=None, **kwargs)[source]

Nothing to do here.

transform(X, y=None, **kwargs)[source]

Apply the actual casting using pandas.astype
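
The casting step can be sketched with pandas alone. The example assumes that a single dtype is applied to every selected column, as the (cols, dtype) signature suggests. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "2", "3"], "score": [1.5, 2.5, 3.5]})

cols = ["id"]      # stand-in for the column selection
dtype = "int64"    # target data type

# astype accepts a {column: dtype} mapping, so only the selected columns change.
out = df.astype({col: dtype for col in cols})
```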

class skippa.transformers.custom.SkippaConcat(left, right)[source]

Bases: BaseEstimator, SkippaMixin

Concatenate two pipelines.

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaDateEncoder(cols, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Derive date features using the .dt accessor of pandas datetime columns.
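
A minimal sketch of deriving features through the .dt accessor, using pandas only. Which attributes the transformer extracts and how it names the new columns are assumptions here.

```python
import pandas as pd

df = pd.DataFrame({"order_date": pd.to_datetime(["2023-01-15", "2023-12-31"])})

# Each .dt attribute yields one derived feature column (names are illustrative).
features = pd.DataFrame(
    {
        "order_date_year": df["order_date"].dt.year,
        "order_date_month": df["order_date"].dt.month,
        "order_date_dayofweek": df["order_date"].dt.dayofweek,  # Monday=0 ... Sunday=6
    }
)
```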

fit(X, y=None)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaDateFormatter(cols, **kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Convert date strings into pandas datetime

fit(X, y=None, **kwargs)[source]

Nothing to do here

transform(X, y=None, **kwargs)[source]

Apply the transformation
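
The conversion can be sketched with pandas.to_datetime. Whether the transformer forwards its keyword arguments (such as format) to to_datetime is an assumption, and the data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"signup": ["2021-03-01", "2021-11-30"]})

out = df.copy()
# Parse the strings using an explicit format to avoid ambiguous guesses.
out["signup"] = pd.to_datetime(out["signup"], format="%Y-%m-%d")
```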

class skippa.transformers.custom.SkippaOutlierRemover(cols, factor=1.5)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Detect and remove outliers, based on simple IQR
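
A sketch of the IQR rule in plain pandas. The factor of 1.5 matches the default in the signature. Computing the bounds as Q1 - factor * IQR and Q3 + factor * IQR and dropping the offending rows is an assumption about how removal is implemented (clipping or masking would be alternatives).

```python
import pandas as pd

factor = 1.5
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]})

# "fit": derive the acceptable range from the quartiles.
q1 = df["x"].quantile(0.25)
q3 = df["x"].quantile(0.75)
iqr = q3 - q1
lower = q1 - factor * iqr
upper = q3 + factor * iqr

# "transform": keep only rows whose value lies within the range.
cleaned = df[df["x"].between(lower, upper)]
```

With this data the quartiles are 2.25 and 4.75, so the accepted range is [-1.5, 8.5] and the value 100.0 is dropped.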

fit(X, y=None)[source]

transform(X, y=None)[source]

class skippa.transformers.custom.SkippaRenamer(mapping)[source]

Bases: BaseEstimator, TransformerMixin

Transformer for renaming columns

fit(X, y=None, **kwargs)[source]

Look at the df to determine the mapping.

In case of a ColumnSelector plus a function: evaluate the column names and apply the renaming function to each of them.

transform(X, y=None, **kwargs)[source]

Apply the actual renaming using pandas.rename
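
The two phases described above can be sketched with pandas alone: the mapping is resolved against the actual columns first, then applied with rename. The selection logic and the renaming function are hypothetical stand-ins.

```python
import pandas as pd

df = pd.DataFrame({"q1 score": [7], "q2 score": [9], "id": [1]})


def rename_func(name: str) -> str:
    """Hypothetical renaming function: replace spaces with underscores."""
    return name.replace(" ", "_")


# "fit": evaluate which columns are selected and build an explicit mapping.
selected = [c for c in df.columns if c.endswith("score")]  # stand-in for a selector
mapping = {name: rename_func(name) for name in selected}

# "transform": apply the stored mapping.
renamed = df.rename(columns=mapping)
```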

class skippa.transformers.custom.SkippaReplacer(**kwargs)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

class skippa.transformers.custom.SkippaSelector(cols)[source]

Bases: BaseEstimator, TransformerMixin, SkippaMixin

Transformer for selecting a subset of columns in a df.

fit(X, y=None, **kwargs)[source]

transform(X, y=None, **kwargs)[source]

skippa.transformers.sklearn module

This implements transformers based on existing sklearn transformers

class skippa.transformers.sklearn.SkippaColumnTransformer(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True)[source]

Bases: ColumnTransformer, SkippaMixin

Custom ColumnTransformer. Probably not needed anymore.

fit(X, y=None, **kwargs)[source]

Fit all transformers using X.

Parameters
  • X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.

  • y (array-like of shape (n_samples,...), default=None) – Targets for supervised learning.

Returns

self – This estimator.

Return type

ColumnTransformer

fit_transform(X, y=None)[source]

Fit all transformers, transform the data and concatenate results.

Parameters
  • X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.

  • y (array-like of shape (n_samples,), default=None) – Targets for supervised learning.

Returns

X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.

Return type

{array-like, sparse matrix} of shape (n_samples, sum_n_components)

steps: List[Any]

transform(X, y=None)[source]

Transform X separately by each transformer, concatenate results.

Parameters

X ({array-like, dataframe} of shape (n_samples, n_features)) – The data to be transformed by subset.

Returns

X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.

Return type

{array-like, sparse matrix} of shape (n_samples, sum_n_components)

class skippa.transformers.sklearn.SkippaLabelEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, LabelEncoder

Wrapper around sklearn’s LabelEncoder

fit(X, y=None, **kwargs)[source]

Fit label encoder.

Parameters

y (array-like of shape (n_samples,)) – Target values.

Returns

self – Fitted label encoder.

Return type

returns an instance of self.

transform(X, y=None, **kwargs)[source]

Transform labels to normalized encoding.

Parameters

y (array-like of shape (n_samples,)) – Target values.

Returns

y – Labels as normalized encodings.

Return type

array-like of shape (n_samples,)

class skippa.transformers.sklearn.SkippaMinMaxScaler(cols, **kwargs)[source]

Bases: SkippaMixin, MinMaxScaler

Wrapper around sklearn’s MinMaxScaler

fit(X, y=None, **kwargs)[source]

Compute the minimum and maximum to be used for later scaling.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The data used to compute the per-feature minimum and maximum used for later scaling along the features axis.

  • y (None) – Ignored.

Returns

self – Fitted scaler.

Return type

object

transform(X, y=None, **kwargs)[source]

Scale features of X according to feature_range.

Parameters

X (array-like of shape (n_samples, n_features)) – Input data that will be transformed.

Returns

Xt – Transformed data.

Return type

ndarray of shape (n_samples, n_features)
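
The general idea of a column-restricted wrapper can be sketched with sklearn's MinMaxScaler and pandas. Fitting on the selected columns and writing the scaled values back into a DataFrame is an assumption about what the cols argument implies, not a copy of skippa's implementation. The data is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"height": [150.0, 175.0, 200.0], "name": ["a", "b", "c"]})

cols = ["height"]  # stand-in for the column selection

# Fit on the selected columns only.
scaler = MinMaxScaler().fit(df[cols])

# Write the scaled values back so the result is still a DataFrame.
out = df.copy()
out[cols] = scaler.transform(df[cols])
```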

class skippa.transformers.sklearn.SkippaOneHotEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, OneHotEncoder

Wrapper around sklearn’s OneHotEncoder

fit(X, y=None, **kwargs)[source]

Fit OneHotEncoder to X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.

  • y (None) – Ignored. This parameter exists only for compatibility with Pipeline.

Returns

Fitted encoder.

Return type

self

transform(X, y=None, **kwargs)[source]

Transform X using one-hot encoding.

If there are infrequent categories for a feature, the infrequent categories will be grouped into a single category.

Parameters

X (array-like of shape (n_samples, n_features)) – The data to encode.

Returns

X_out – Transformed input. If sparse=True, a sparse matrix will be returned.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_encoded_features)
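
One-hot encoding changes the number of columns, so a DataFrame-oriented wrapper needs names for the new columns. The sketch below uses sklearn's OneHotEncoder and builds names from categories_. The "column_category" naming scheme is an assumption for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "blue", "red"]})

encoder = OneHotEncoder()
# The default output is a sparse matrix; convert it to a dense array here.
encoded = encoder.fit_transform(df[["color"]]).toarray()

# categories_ holds the sorted categories found for each input column.
names = [f"color_{category}" for category in encoder.categories_[0]]
onehot = pd.DataFrame(encoded, columns=names, index=df.index)
```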

class skippa.transformers.sklearn.SkippaOrdinalEncoder(cols, **kwargs)[source]

Bases: SkippaMixin, OrdinalEncoder

Wrapper around sklearn’s OrdinalEncoder

fit(X, y=None, **kwargs)[source]

Fit the OrdinalEncoder to X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.

  • y (None) – Ignored. This parameter exists only for compatibility with Pipeline.

Returns

self – Fitted encoder.

Return type

object

transform(X, y=None, **kwargs)[source]

Transform X to ordinal codes.

Parameters

X (array-like of shape (n_samples, n_features)) – The data to encode.

Returns

X_out – Transformed input.

Return type

ndarray of shape (n_samples, n_features)

class skippa.transformers.sklearn.SkippaPCA(cols, **kwargs)[source]

Bases: SkippaMixin, PCA

Wrapper around sklearn’s PCA

fit(X, y=None, **kwargs)[source]

Fit the model with X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Ignored.

Returns

self – Returns the instance itself.

Return type

object

fit_transform(X, y=None, **kwargs)[source]

sklearn’s PCA defines its own fit_transform method instead of relying on the generic TransformerMixin version, so this wrapper overrides it as well.

transform(X, y=None, **kwargs)[source]

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters

X (array-like of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

Returns

X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.

Return type

array-like of shape (n_samples, n_components)
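
As a quick check using sklearn's PCA directly (not the skippa wrapper), fit_transform and fit followed by transform give the same projection up to floating-point error. The random input is purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# One-step and two-step routes to the same projection.
one_step = PCA(n_components=2).fit_transform(X)
two_step = PCA(n_components=2).fit(X).transform(X)

same = np.allclose(one_step, two_step)
```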

class skippa.transformers.sklearn.SkippaSimpleImputer(cols, **kwargs)[source]

Bases: SkippaMixin, SimpleImputer

Wrapper around sklearn’s SimpleImputer

fit(X, y=None, **kwargs)[source]

Fit the imputer on X.

Parameters
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns

self – Fitted estimator.

Return type

object

transform(X, y=None, **kwargs)[source]

Impute all missing values in X.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data to complete.

Returns

X_imputed – X with imputed values.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_features_out)

class skippa.transformers.sklearn.SkippaStandardScaler(cols, **kwargs)[source]

Bases: SkippaMixin, StandardScaler

Wrapper around sklearn’s StandardScaler

fit(X, y=None, **kwargs)[source]

Compute the mean and std to be used for later scaling.

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to compute the mean and standard deviation used for later scaling along the features axis.

  • y (None) – Ignored.

  • sample_weight (array-like of shape (n_samples,), default=None) –

    Individual weights for each sample.

    New in version 0.24: parameter sample_weight support to StandardScaler.

Returns

self – Fitted scaler.

Return type

object

transform(X, y=None, **kwargs)[source]

Perform standardization by centering and scaling.

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to scale along the features axis.

  • copy (bool, default=None) – Copy the input X or not.

Returns

X_tr – Transformed array.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_features)
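
For reference, the standardization performed by sklearn's StandardScaler is z = (x - mean) / std, with std being the population standard deviation (ddof=0). The sketch below checks this against a manual NumPy computation on made-up data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

scaled = StandardScaler().fit_transform(X)

# Manual equivalent: centre on the mean, divide by the population std.
manual = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)
```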

skippa.transformers.sklearn.make_skippa_column_transformer(*transformers, remainder='drop', **kwargs)[source]

Custom wrapper around sklearn’s make_column_transformer

Return type

SkippaColumnTransformer
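
For comparison, this is how sklearn's own make_column_transformer behaves, including the remainder argument that the skippa wrapper also accepts. The sketch does not call the skippa function, and the data is hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [1, 2, 3]})

# Scale column "a"; remainder="passthrough" keeps "b" instead of dropping it.
transformer = make_column_transformer(
    (MinMaxScaler(), ["a"]),
    remainder="passthrough",
)
result = transformer.fit_transform(df)
```

With the default remainder='drop', column "b" would be absent from the output.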

Module contents