Scalers
Scaling
MinMaxScaler
Bases: TransformerStep
Transform features by scaling each feature to a given range.
This transformer scales and translates each feature individually such that it is in the given range on the training set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
range | tuple | Desired range of transformed data. | required |
clip | bool | Set to True to clip transformed values of held-out data to the provided range, by default True | True |
fillna | Optional[float] | Value used to fill NaN entries, by default None | None |
name | Optional[str] | Name of the step, by default None | None |
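The scaling described above can be sketched with plain pandas. This is an illustration of the general min-max technique, not the ceruleo implementation; the function name `min_max_scale` and its argument names are hypothetical.

```python
import pandas as pd


def min_max_scale(train, held_out, feature_range=(0.0, 1.0), clip=True):
    """Scale held_out column-wise using bounds learned on train (hypothetical helper)."""
    lo, hi = feature_range
    data_min = train.min()  # per-column minimum on the training set
    data_max = train.max()  # per-column maximum on the training set
    # Constant columns have zero span; use 1.0 to avoid dividing by zero.
    span = (data_max - data_min).replace(0, 1.0)
    scaled = (held_out - data_min) / span * (hi - lo) + lo
    if clip:
        # Held-out values outside the training bounds are clipped to the range.
        scaled = scaled.clip(lower=lo, upper=hi)
    return scaled


train = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [100.0, 150.0, 200.0]})
test = pd.DataFrame({"a": [2.5, 20.0], "b": [100.0, 50.0]})
scaled = min_max_scale(train, test, feature_range=(-1.0, 1.0))
```

With `clip=True`, the held-out value 20.0 in column `a` lies above the training maximum and is mapped to the upper end of the range instead of overshooting it.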
Source code in ceruleo/transformation/features/scalers.py
fit(df, y=None)
Compute the dataset's bounds
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
partial_fit(df, y=None)
Compute the dataset's bounds
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
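One way to compute bounds incrementally, chunk by chunk, is to keep a running per-column minimum and maximum. The sketch below assumes that reading of `partial_fit`; the class `RunningBounds` is hypothetical and is not part of ceruleo.

```python
import numpy as np
import pandas as pd


class RunningBounds:
    """Hypothetical helper that tracks per-column min/max across chunks."""

    def __init__(self):
        self.data_min = None
        self.data_max = None

    def partial_fit(self, df):
        chunk_min, chunk_max = df.min(), df.max()
        if self.data_min is None:
            # First chunk: its bounds are the bounds seen so far.
            self.data_min, self.data_max = chunk_min, chunk_max
        else:
            # Later chunks: element-wise min/max against the stored bounds.
            self.data_min = np.minimum(self.data_min, chunk_min)
            self.data_max = np.maximum(self.data_max, chunk_max)
        return self


bounds = RunningBounds()
chunks = (pd.DataFrame({"a": [3.0, 7.0]}), pd.DataFrame({"a": [-1.0, 5.0]}))
for chunk in chunks:
    bounds.partial_fit(chunk)
```

The sketch assumes every chunk has the same columns; it never needs the full dataset in memory.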
transform(X)
Scale the input dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input, with the data scaled to the provided range |
PerCategoricalMinMaxScaler
Bases: TransformerStep
Applies a min-max scaler separately to each partition of the data defined by a categorical feature.
Usually, different execution configurations lead to different scales in the features. Therefore, sometimes it is useful to scale the data based on a categorical feature, to reflect the difference in the execution parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
categorical_feature | str | The name of the categorical feature whose values are used to split each time series | required |
scaler | Optional[Union[MinMaxScaler, RobustMinMaxScaler]] | The scaler to use when scaling the data, by default MinMaxScaler | MinMaxScaler |
scaler_params | dict | Parameters used when constructing the scaler, by default {} | {} |
name | Optional[str] | Name of the step, by default None | None |
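The idea of scaling each category independently can be sketched with a pandas `groupby`. This is a minimal illustration under the assumption of plain min-max scaling per group, not the ceruleo implementation; `scale_per_category` and the column names are hypothetical, and for brevity the bounds are fitted and applied on the same frame.

```python
import pandas as pd


def scale_per_category(df, categorical_feature, feature_range=(0.0, 1.0)):
    """Min-max scale each numeric column within each category (hypothetical helper)."""
    lo, hi = feature_range
    features = [c for c in df.columns if c != categorical_feature]
    grouped = df.groupby(categorical_feature)[features]
    # transform() broadcasts each group's statistic back to the original rows.
    group_min = grouped.transform("min")
    group_max = grouped.transform("max")
    span = (group_max - group_min).replace(0, 1.0)  # guard constant groups
    scaled = (df[features] - group_min) / span * (hi - lo) + lo
    scaled[categorical_feature] = df[categorical_feature]
    return scaled


df = pd.DataFrame(
    {
        "mode": ["idle", "idle", "load", "load"],
        "temp": [20.0, 30.0, 80.0, 100.0],
    }
)
out = scale_per_category(df, "mode")
```

Although the `load` temperatures are far above the `idle` ones, both groups end up spanning the full range, which is the point of scaling per execution configuration.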
partial_fit(X, y=None)
Fit the scaler
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
transform(X)
Scale the input dataset using the appropriate scaler for each category
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input with the data scaled with respect to the categorical feature |
RobustMinMaxScaler
Bases: TransformerStep
Scale features using statistics that are robust to outliers.
This scaler scales the data according to a quantile range. The range lies between the two limits provided; by default these are the 1st quartile (0.25 quantile) and the 3rd quartile (0.75 quantile), which gives the interquartile range (IQR).
The quantiles are approximated using a t-digest.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
range | tuple | Desired range of transformed data. | required |
clip | bool | Set to True to clip transformed values of held-out data to the provided range, by default True | True |
lower_quantile | float | Lower limit of the quantile range to compute the scale, by default 0.25 | 0.25 |
upper_quantile | float | Upper limit of the quantile range to compute the scale, by default 0.75 | 0.75 |
tdigest_size | | Size of the t-digest structure, by default 100 | required |
name | Optional[str] | Name of the step, by default None | None |
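A quantile-based variant of min-max scaling can be sketched as follows. This sketch uses exact quantiles from pandas rather than a t-digest approximation, and the mapping of the lower and upper quantiles to the ends of the range is an assumption; `robust_min_max_scale` is a hypothetical name, not the ceruleo implementation.

```python
import pandas as pd


def robust_min_max_scale(
    train,
    held_out,
    feature_range=(0.0, 1.0),
    lower_quantile=0.25,
    upper_quantile=0.75,
    clip=True,
):
    """Map [q_low, q_high] of the training data onto feature_range (hypothetical helper)."""
    lo, hi = feature_range
    q_low = train.quantile(lower_quantile)   # exact quantile, not a t-digest estimate
    q_high = train.quantile(upper_quantile)
    span = (q_high - q_low).replace(0, 1.0)  # guard degenerate quantile ranges
    scaled = (held_out - q_low) / span * (hi - lo) + lo
    if clip:
        scaled = scaled.clip(lower=lo, upper=hi)
    return scaled


train = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"a": [2.0, 1000.0, 1.5]})
scaled = robust_min_max_scale(train, test)
```

Because the scale depends only on the quantiles, a single extreme value in the training data would not stretch the range, and the outlier 1000.0 in the held-out data is simply clipped.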
fit(df, y=None)
Compute the quantiles of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
partial_fit(df, y=None)
Compute the quantiles of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
transform(X)
Scale the input dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input with the data scaled with respect to the quantiles of the fitted dataset |
RobustStandardScaler
Bases: TransformerStep
Scale features using statistics that are robust to outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
quantile_range | tuple | Desired quantile range of transformed data, by default (0.25, 0.75) | (0.25, 0.75) |
fit(X, y=None)
Compute the mean of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
partial_fit(X, y=None)
Compute incrementally the mean of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
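A common way to compute a mean incrementally is to accumulate per-column sums and row counts across chunks. The sketch below shows that general technique only; `RunningMean` is a hypothetical class, the input is assumed to contain no NaN values, and nothing here is taken from the ceruleo source.

```python
import pandas as pd


class RunningMean:
    """Hypothetical helper that accumulates a per-column mean over chunks."""

    def __init__(self):
        self.count = 0
        self.total = None

    def partial_fit(self, X):
        chunk_total = X.sum()  # per-column sum of this chunk
        self.total = chunk_total if self.total is None else self.total + chunk_total
        self.count += len(X)   # assumes no NaN values, so every row counts
        return self

    @property
    def mean(self):
        return self.total / self.count


running = RunningMean()
for chunk in (pd.DataFrame({"a": [1.0, 2.0, 3.0]}), pd.DataFrame({"a": [4.0, 5.0]})):
    running.partial_fit(chunk)
```

The result matches the mean of the concatenated chunks, regardless of how the rows are split.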
transform(X)
Center the input life
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input with the data centered with respect to the mean of the fitted dataset |
ScaleInvRUL
Bases: TransformerStep
Scale binary columns according to the inverse of the RUL. Usually this will be used before a CumSum transformation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rul_column | str | Column with the RUL | required |
partial_fit(X)
Fit the scaler
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
transform(X)
Scale the input dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input with the data scaled with respect to the RUL |
StandardScaler
Bases: TransformerStep
Standardize features by removing the mean and scaling to unit variance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | Optional[str] | Name of the step, by default None | None |
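Standardization as described above can be sketched in a few lines of pandas. This is an illustration of the general technique, not the ceruleo implementation; the function name `standardize` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import pandas as pd


def standardize(train, held_out):
    """Center and scale held_out using statistics learned on train (hypothetical helper)."""
    mean = train.mean()
    # ddof=0 (population std) is an assumption; constant columns fall back to 1.0.
    std = train.std(ddof=0).replace(0, 1.0)
    return (held_out - mean) / std


train = pd.DataFrame({"a": [1.0, 3.0]})
test = pd.DataFrame({"a": [1.0, 2.0, 5.0]})
z = standardize(train, test)
```

Here the training column has mean 2 and standard deviation 1, so the held-out values are simply shifted by 2.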
fit(df, y=None)
Compute mean and std of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
partial_fit(df, y=None)
Compute mean and std of the dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input dataset | required |
transform(X)
Scale the input dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input dataset | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame with the same index as the input with the data scaled to have zero mean and unit variance |