Notebook: Iterators
In [9]:
from ceruleo.dataset.catalog.PHMDataset2018 import PHMDataset2018, FailureType
Load dataset
In [10]:
dataset = PHMDataset2018(
    tools=['01M01', '04M01']
)
Create a transformer for a dataset
In [11]:
from ceruleo.iterators.iterators import RelativeToEnd
from ceruleo.transformation.features.cast import ToDateTime
from ceruleo.transformation.features.resamplers import IndexMeanResampler
from ceruleo.transformation.features.selection import (
    ByNameFeatureSelector,
)
from ceruleo.transformation.features.slicing import SliceRows
from ceruleo.transformation.features.transformation import Clip
from ceruleo.transformation.functional.pipeline.pipeline import make_pipeline
from ceruleo.transformation.functional.transformers import Transformer
In [12]:
FEATURES = [
    'IONGAUGEPRESSURE', 'ETCHBEAMVOLTAGE', 'ETCHBEAMCURRENT',
    'ETCHSUPPRESSORVOLTAGE', 'ETCHSUPPRESSORCURRENT', 'FLOWCOOLFLOWRATE',
    'FLOWCOOLPRESSURE', 'ETCHGASCHANNEL1READBACK', 'ETCHPBNGASREADBACK',
]
transformer = Transformer(
    pipelineX=make_pipeline(
        ToDateTime(index=True),
        ByNameFeatureSelector(features=FEATURES),
        Clip(lower=-6, upper=6),
        IndexMeanResampler(rule='120s'),
        SliceRows(initial=RelativeToEnd(1500))
    ),
    pipelineY=make_pipeline(
        ToDateTime(index=True),
        ByNameFeatureSelector(features=['RUL']),
        IndexMeanResampler(rule='120s'),
        SliceRows(initial=RelativeToEnd(1500))
    )
)
transformed_dataset = transformer.fit_map(dataset)
Iterator
In [13]:
from ceruleo.iterators.iterators import WindowedDatasetIterator, IterationType
Forecast iterator
The forecast iterator uses as its target the values produced by the Y pipeline that start where the X window ends. The number of target values is set by horizon.
In [14]:
iterator = WindowedDatasetIterator(
    transformed_dataset,
    window_size=150,
    step=15,
    horizon=5,
    iteration_type=IterationType.FORECAST  # The default value
)
In [15]:
X, y, sw = next(iterator)
(X.shape, y.shape)
Out[15]:
((150, 9), (5, 1))
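The windowing behind the forecast target can be sketched in plain Python. This is a minimal illustration and not CERULEO's implementation: the helper `forecast_windows` is hypothetical, and it assumes the target begins at the first sample after the input window.

```python
def forecast_windows(x, y, window_size, step, horizon):
    """Yield (X_window, y_target) pairs from two aligned sequences.

    Hypothetical helper for illustration only; assumes the target starts
    at the first sample after the input window.
    """
    # Last start index that still leaves room for the window and the horizon.
    last_start = len(x) - window_size - horizon
    for start in range(0, last_start + 1, step):
        end = start + window_size
        yield x[start:end], y[end:end + horizon]


# Toy run-to-failure series: 20 time steps and a RUL-like countdown target.
x = list(range(20))
y = list(range(19, -1, -1))

pairs = list(forecast_windows(x, y, window_size=4, step=2, horizon=2))
first_X, first_y = pairs[0]
print(len(pairs), first_X, first_y)
```

With window_size=150 and horizon=5 the same idea gives an input of 150 rows and a target of 5 values, matching the shapes shown in Out[15].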
It is possible to obtain all the data, in the order given by the shuffler, as a NumPy matrix. By default, each sample is flattened.
In [16]:
X, y, sw = iterator.get_data()
(X.shape, y.shape, sw.shape)
Out[16]:
((1678, 1350), (1678, 5), (1678,))
If flatten is False, the window structure is preserved: X has 1678 samples, each a window of 150 time steps with 9 features.
In [17]:
X, y, sw = iterator.get_data(flatten=False)
(X.shape, y.shape, sw.shape)
Out[17]:
((1678, 150, 9), (1678, 5), (1678,))
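The flattened and unflattened shapes are related by 150 × 9 = 1350: flattening turns each window into a single feature vector. A minimal plain-Python sketch of that reshaping follows. The row-major ordering used here (all features of the first time step, then the second, and so on) is an assumption, not something the outputs confirm.

```python
WINDOW_SIZE = 150
N_FEATURES = 9

# One toy window: a list of time steps, each holding N_FEATURES values.
window = [
    [t * N_FEATURES + f for f in range(N_FEATURES)]
    for t in range(WINDOW_SIZE)
]

# Flatten time steps and features into one vector (row-major order, assumed).
flat = [value for row in window for value in row]

print(len(window), len(window[0]), len(flat))
```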
Seq to Seq Iterator
The seq-to-seq iterator returns as its target a window of the same size as the input, aligned with it.
In [18]:
iterator = WindowedDatasetIterator(
    transformed_dataset,
    window_size=150,
    step=15,
    iteration_type=IterationType.SEQ_TO_SEQ
)
In [19]:
X, y, sw = next(iterator)
(X.shape, y.shape)
Out[19]:
((150, 9), (150, 1))
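For comparison with the forecast case, the seq-to-seq alignment can be sketched the same way. The helper `seq_to_seq_windows` is hypothetical and only illustrates the idea of slicing X and Y over the same index range. It is not taken from the library.

```python
def seq_to_seq_windows(x, y, window_size, step):
    """Yield (X_window, y_window) pairs that cover the same index range.

    Hypothetical helper for illustration only.
    """
    for start in range(0, len(x) - window_size + 1, step):
        end = start + window_size
        yield x[start:end], y[start:end]


# Toy run-to-failure series: 20 time steps and a RUL-like countdown target.
x = list(range(20))
y = list(range(19, -1, -1))

pairs = list(seq_to_seq_windows(x, y, window_size=4, step=2))
first_X, first_y = pairs[0]
print(len(pairs), first_X, first_y)
```

Because no horizon is reserved after the window, the target has as many rows as the input, which is consistent with the (150, 9) and (150, 1) shapes in Out[19].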
Batcher
In [20]:
from ceruleo.iterators.batcher import Batcher
In [21]:
batcher = Batcher.new(
    transformed_dataset,
    batch_size=64,
    window=150,
    step=15,
    horizon=5
)
X, y, sw = next(batcher)
(X.shape, y.shape, sw.shape)
Out[21]:
((64, 150, 9), (64, 5, 1), (64, 1))
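Conceptually, batching groups consecutive windows into chunks of batch_size. The sketch below illustrates this with plain Python and a hypothetical `batches` helper. It assumes the batcher sees the same 1678 windows reported in Out[16], and it keeps the final partial batch. Whether Batcher keeps or drops that remainder is not shown in this notebook.

```python
def batches(samples, batch_size):
    """Yield consecutive chunks of at most batch_size samples.

    Hypothetical helper for illustration only; the last chunk may be smaller.
    """
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Stand-ins for the 1678 windows produced with window=150, step=15, horizon=5.
samples = list(range(1678))

groups = list(batches(samples, batch_size=64))
print(len(groups), len(groups[0]), len(groups[-1]))
```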