skippa.transformers package
Submodules
skippa.transformers.base module
This module contains base and utility classes and functions needed for defining and using transformers.
- class skippa.transformers.base.ColumnSelector(selector)[source]
Bases: object
This is not a transformer, but a utility class for defining a column set.
- class skippa.transformers.base.SkippaMixin[source]
Bases: object
Utility class providing additional methods for custom Skippa transformers.
- skippa.transformers.base.columns(*args, include=None, exclude=None, **kwargs)[source]
Helper function for creating a ColumnSelector
Flexible arguments:
- include or exclude lists: speak for themselves
- dtype_include, dtype_exclude, pattern: dispatched to sklearn’s make_column_selector
- otherwise: a list to include, or an existing ColumnSelector
- Parameters
include (Optional[ColumnExpression], optional) – Columns to include. Defaults to None.
exclude (Optional[ColumnExpression], optional) – Columns to exclude. Defaults to None.
- Returns
A callable that returns column names when called on a DataFrame
- Return type
ColumnSelector
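The dtype_include, dtype_exclude and pattern arguments are described as being dispatched to sklearn’s make_column_selector. The sketch below uses that sklearn helper directly, not the skippa columns function, to show what such a selector returns when called on a DataFrame. The frame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame(
    {
        "age": [31, 45],
        "income": [52000.0, 61000.0],
        "city": ["Oslo", "Bergen"],
    }
)

# Select by dtype: every numeric column.
numeric_selector = make_column_selector(dtype_include="number")
numeric_cols = numeric_selector(df)

# Select by a regular expression matched against the column names.
pattern_selector = make_column_selector(pattern="^inc")
pattern_cols = pattern_selector(df)

print(numeric_cols)  # ['age', 'income']
print(pattern_cols)  # ['income']
```

In both cases the selector is a callable that takes the DataFrame and returns a list of column names, which matches the return value described above.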
skippa.transformers.custom module
This module defines custom transformers implementing anything other than existing sklearn transformers.
- class skippa.transformers.custom.SkippaApplier(cols, *args, **kwargs)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Transformer for applying an arbitrary function (wraps pandas apply)
- class skippa.transformers.custom.SkippaAssigner(**kwargs)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Transformer for selecting a subset of columns in a df.
- class skippa.transformers.custom.SkippaCaster(cols, dtype)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Transformer for casting columns to another data type
- class skippa.transformers.custom.SkippaConcat(left, right)[source]
Bases: BaseEstimator, SkippaMixin
Concatenate two pipelines.
- class skippa.transformers.custom.SkippaDateEncoder(cols, **kwargs)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Derive date features using the pandas datetime .dt accessor.
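As an illustration of the underlying pandas mechanism only, the sketch below derives a few features from a datetime column through the .dt accessor. Which features SkippaDateEncoder actually produces, and how it names them, is not stated here; the column name and feature names are assumptions.

```python
import pandas as pd

# Hypothetical frame with one datetime column.
df = pd.DataFrame({"signup": pd.to_datetime(["2023-01-15", "2024-07-04"])})

# Each .dt attribute yields one derived feature per row.
features = pd.DataFrame(
    {
        "signup_year": df["signup"].dt.year,
        "signup_month": df["signup"].dt.month,
        "signup_weekday": df["signup"].dt.weekday,  # Monday=0 ... Sunday=6
    }
)

print(features["signup_year"].tolist())     # [2023, 2024]
print(features["signup_month"].tolist())    # [1, 7]
print(features["signup_weekday"].tolist())  # [6, 3]
```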
- class skippa.transformers.custom.SkippaDateFormatter(cols, **kwargs)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Convert date strings into pandas datetime
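A minimal sketch of the conversion in plain pandas, assuming the keyword arguments are forwarded to something like pd.to_datetime (this is not confirmed by the text above). The date format is a hypothetical example.

```python
import pandas as pd

# Date strings in day-month-year order (hypothetical input).
raw = pd.Series(["15-01-2023", "04-07-2024"])

# An explicit format avoids ambiguity between day-first and month-first dates.
parsed = pd.to_datetime(raw, format="%d-%m-%Y")

print(parsed.dt.month.tolist())  # [1, 7]
```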
- class skippa.transformers.custom.SkippaOutlierRemover(cols, factor=1.5)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
Detect and remove outliers, based on simple IQR
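The usual IQR rule treats a value as an outlier when it lies more than factor times the interquartile range below the first quartile or above the third quartile. The sketch below shows that rule in plain pandas. Whether SkippaOutlierRemover drops, clips or masks the detected values is not stated above, so filtering them out here is an assumption, and iqr_bounds is a hypothetical helper name.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, factor: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) bounds of the IQR rule for one column."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


s = pd.Series([10, 12, 11, 13, 12, 11, 95])
low, high = iqr_bounds(s, factor=1.5)

# Keep only the values inside the bounds (assumed behaviour).
kept = s[(s >= low) & (s <= high)]

print(low, high)      # 8.75 14.75
print(kept.tolist())  # [10, 12, 11, 13, 12, 11]
```

A larger factor widens the bounds and flags fewer values.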
- class skippa.transformers.custom.SkippaRenamer(mapping)[source]
Bases: BaseEstimator, TransformerMixin
Transformer for renaming columns
- class skippa.transformers.custom.SkippaReplacer(**kwargs)[source]
Bases: BaseEstimator, TransformerMixin, SkippaMixin
skippa.transformers.sklearn module
This implements transformers based on existing sklearn transformers
- class skippa.transformers.sklearn.SkippaColumnTransformer(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True)[source]
Bases: ColumnTransformer, SkippaMixin
Custom ColumnTransformer. Probably not needed anymore.
- fit(X, y=None, **kwargs)[source]
Fit all transformers using X.
- Parameters
X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.
y (array-like of shape (n_samples,...), default=None) – Targets for supervised learning.
- Returns
self – This estimator.
- Return type
ColumnTransformer
- fit_transform(X, y=None)[source]
Fit all transformers, transform the data and concatenate results.
- Parameters
X ({array-like, dataframe} of shape (n_samples, n_features)) – Input data, of which specified subsets are used to fit the transformers.
y (array-like of shape (n_samples,), default=None) – Targets for supervised learning.
- Returns
X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.
- Return type
{array-like, sparse matrix} of shape (n_samples, sum_n_components)
- steps: List[Any]
- transform(X, y=None)[source]
Transform X separately by each transformer, concatenate results.
- Parameters
X ({array-like, dataframe} of shape (n_samples, n_features)) – The data to be transformed by subset.
- Returns
X_t – Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.
- Return type
{array-like, sparse matrix} of shape (n_samples, sum_n_components)
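The horizontal stacking described for fit_transform and transform can be seen with the parent sklearn ColumnTransformer. The sketch below uses the sklearn class directly, not SkippaColumnTransformer, and the frame and step names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame(
    {
        "age": [20.0, 30.0, 40.0],
        "city": ["Oslo", "Bergen", "Oslo"],
    }
)

ct = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age"]),  # contributes 1 output column
        ("cat", OneHotEncoder(), ["city"]),  # contributes 2 output columns
    ],
    remainder="drop",
)

X_t = ct.fit_transform(df)

# The output width is the sum of the widths produced by each transformer.
print(X_t.shape)  # (3, 3)
```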
- class skippa.transformers.sklearn.SkippaLabelEncoder(cols, **kwargs)[source]
Bases: SkippaMixin, LabelEncoder
Wrapper round sklearn’s LabelEncoder
- class skippa.transformers.sklearn.SkippaMinMaxScaler(cols, **kwargs)[source]
Bases: SkippaMixin, MinMaxScaler
Wrapper round sklearn’s MinMaxScaler
- fit(X, y=None, **kwargs)[source]
Compute the minimum and maximum to be used for later scaling.
- Parameters
X (array-like of shape (n_samples, n_features)) – The data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
y (None) – Ignored.
- Returns
self – Fitted scaler.
- Return type
object
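For reference, this is what fitting computes in the parent sklearn MinMaxScaler. The sketch calls sklearn directly; the wrapper's cols argument is not used and the data is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])

scaler = MinMaxScaler().fit(X)

# fit stores the per-feature minimum and maximum.
print(scaler.data_min_)  # [ 1. 10.]
print(scaler.data_max_)  # [ 3. 40.]

# transform maps each feature to [0, 1]: (x - min) / (max - min).
scaled = scaler.transform(X)
```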
- class skippa.transformers.sklearn.SkippaOneHotEncoder(cols, **kwargs)[source]
Bases: SkippaMixin, OneHotEncoder
Wrapper round sklearn’s OneHotEncoder
- fit(X, y=None, **kwargs)[source]
Fit OneHotEncoder to X.
- Parameters
X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.
y (None) – Ignored. This parameter exists only for compatibility with Pipeline.
- Returns
Fitted encoder.
- Return type
self
- transform(X, y=None, **kwargs)[source]
Transform X using one-hot encoding.
If there are infrequent categories for a feature, the infrequent categories will be grouped into a single category.
- Parameters
X (array-like of shape (n_samples, n_features)) – The data to encode.
- Returns
X_out – Transformed input. If sparse=True, a sparse matrix will be returned.
- Return type
{ndarray, sparse matrix} of shape (n_samples, n_encoded_features)
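A short sketch of one-hot encoding with the parent sklearn OneHotEncoder, called directly and without the wrapper's cols argument. The sparsity parameter is left at its default because its name differs between sklearn versions; the result is converted to a dense array for display. The category values are hypothetical.

```python
from sklearn.preprocessing import OneHotEncoder

X = [["red"], ["green"], ["red"]]

encoder = OneHotEncoder().fit(X)

# Categories are learned per feature and sorted.
categories = encoder.categories_[0].tolist()

# The default output is sparse; densify it to inspect the values.
encoded = encoder.transform(X).toarray()

print(categories)        # ['green', 'red']
print(encoded.tolist())  # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```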
- class skippa.transformers.sklearn.SkippaOrdinalEncoder(cols, **kwargs)[source]
Bases: SkippaMixin, OrdinalEncoder
Wrapper round sklearn’s OrdinalEncoder
- fit(X, y=None, **kwargs)[source]
Fit the OrdinalEncoder to X.
- Parameters
X (array-like of shape (n_samples, n_features)) – The data to determine the categories of each feature.
y (None) – Ignored. This parameter exists only for compatibility with Pipeline.
- Returns
self – Fitted encoder.
- Return type
object
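As a sketch of the parent sklearn OrdinalEncoder only: passing categories explicitly fixes the order of the integer codes, which otherwise follows the sorted category values. Whether the skippa wrapper forwards this keyword unchanged is an assumption.

```python
from sklearn.preprocessing import OrdinalEncoder

X = [["low"], ["high"], ["medium"]]

# One list of categories per feature, in the desired order.
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
codes = encoder.fit_transform(X)

print(codes.tolist())  # [[0.0], [2.0], [1.0]]
```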
- class skippa.transformers.sklearn.SkippaPCA(cols, **kwargs)[source]
Bases: SkippaMixin, PCA
Wrapper round sklearn’s PCA
- fit(X, y=None, **kwargs)[source]
Fit the model with X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Ignored.
- Returns
self – Returns the instance itself.
- Return type
object
- fit_transform(X, y=None, **kwargs)[source]
Overridden because the PCA parent class defines its own .fit_transform method.
- transform(X, y=None, **kwargs)[source]
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted from a training set.
- Parameters
X (array-like of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
- Returns
X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.
- Return type
array-like of shape (n_samples, n_components)
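The shapes involved in fit and transform can be checked with the parent sklearn PCA. This sketch uses sklearn directly on randomly generated data, so only the shapes are meaningful, not the values.

```python
import numpy as np
from sklearn.decomposition import PCA

# 50 samples with 3 features, generated with a fixed seed.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=2).fit(X)

# transform projects X onto the first n_components principal components.
X_new = pca.transform(X)

print(X_new.shape)            # (50, 2)
print(pca.components_.shape)  # (2, 3)
```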
- class skippa.transformers.sklearn.SkippaSimpleImputer(cols, **kwargs)[source]
Bases: SkippaMixin, SimpleImputer
Wrapper round sklearn’s SimpleImputer
- fit(X, y=None, **kwargs)[source]
Fit the imputer on X.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Not used, present here for API consistency by convention.
- Returns
self – Fitted estimator.
- Return type
object
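A minimal sketch of mean imputation with the parent sklearn SimpleImputer, called directly on hypothetical data. The strategy keyword belongs to sklearn; that the skippa wrapper passes it through via **kwargs is an assumption.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

imputer = SimpleImputer(strategy="mean").fit(X)

# fit stores one fill value per feature, here the column means.
print(imputer.statistics_)  # [2. 5.]

# transform replaces each missing entry with its column's fill value.
filled = imputer.transform(X)
```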
- class skippa.transformers.sklearn.SkippaStandardScaler(cols, **kwargs)[source]
Bases: SkippaMixin, StandardScaler
Wrapper round sklearn’s StandardScaler
- fit(X, y=None, **kwargs)[source]
Compute the mean and std to be used for later scaling.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to compute the mean and standard deviation used for later scaling along the features axis.
y (None) – Ignored.
sample_weight (array-like of shape (n_samples,), default=None) – Individual weights for each sample. New in version 0.24: parameter sample_weight support to StandardScaler.
- Returns
self – Fitted scaler.
- Return type
object
- transform(X, y=None, **kwargs)[source]
Perform standardization by centering and scaling.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to scale along the features axis.
copy (bool, default=None) – Copy the input X or not.
- Returns
X_tr – Transformed array.
- Return type
{ndarray, sparse matrix} of shape (n_samples, n_features)
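For reference, standardization in the parent sklearn StandardScaler subtracts the per-feature mean and divides by the per-feature (population) standard deviation. The sketch below calls sklearn directly on hypothetical data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler().fit(X)

# fit stores the mean and the scale (standard deviation) per feature.
print(scaler.mean_)  # [2.]

# transform computes (x - mean) / scale for every entry.
X_tr = scaler.transform(X)

# The transformed feature has zero mean and unit standard deviation.
print(np.isclose(X_tr.mean(), 0.0), np.isclose(X_tr.std(), 1.0))  # True True
```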