Data Producer¶
class neural_pipeline.data_producer.data_producer.DataProducer(datasets: [AbstractDataset], batch_size: int = 1, num_workers: int = 0)[source]¶

    Data Producer. Accumulates one or more datasets and passes their data in batches for processing. Uses the PyTorch built-in DataLoader to increase data delivery performance.

    Parameters:
        - datasets – list of datasets. Every dataset must be iterable (i.e. contain the methods __getitem__ and __len__)
        - batch_size – size of the output batch
        - num_workers – number of worker processes that load data from the datasets and pass it to the output
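    A minimal usage sketch, assuming a dataset that follows the iterable contract above. RandomDataset and its dict sample layout are hypothetical; only the DataProducer signature is taken from this page:

        import numpy as np

        from neural_pipeline.data_producer.data_producer import DataProducer


        class RandomDataset:
            """Hypothetical dataset: any object with __getitem__ and __len__ works."""

            def __len__(self):
                return 128

            def __getitem__(self, idx):
                # the dict layout here is illustrative, not prescribed by DataProducer
                return {'data': np.random.rand(3, 8, 8).astype(np.float32), 'target': idx % 2}


        producer = DataProducer(datasets=[RandomDataset()], batch_size=16, num_workers=2)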
get_data(dataset_idx: int, data_idx: int) → object[source]¶

    Get a single data item by dataset index and data index.

    Parameters:
        - dataset_idx – index of the dataset
        - data_idx – index of the data item within this dataset

    Returns: dataset output
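    For illustration (the indices are arbitrary and assume the producer built above):

        item = producer.get_data(dataset_idx=0, data_idx=5)  # sixth item of the first dataset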
get_indices() → [str][source]¶

    Get the current indices.

    Returns: list of current indices, or None if set_indices() hasn't been called
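    A short sketch of the None-before-set behaviour described above; the index strings follow the '<dataset_idx>_<data_idx>' format documented under set_indices():

        print(producer.get_indices())         # None: set_indices() hasn't been called yet
        producer.set_indices(['0_0', '0_1'])
        print(producer.get_indices())         # ['0_0', '0_1']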
get_loader(indices: [str] = None) → torch.utils.data.dataloader.DataLoader[source]¶

    Get a PyTorch DataLoader object that aggregates the DataProducer. If indices is specified, the DataLoader will output data only for these indices; in this case the indices themselves will not be passed along with the data.

    Parameters: indices – list of indices. Each item of the list is a string in the format '{}_{}'.format(dataset_idx, data_idx)

    Returns: DataLoader object
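    A minimal sketch of both call forms, assuming the producer built above:

        # full loader over all accumulated datasets
        for batch in producer.get_loader():
            pass  # each batch aggregates batch_size items

        # loader restricted to specific items; index format is '{dataset_idx}_{data_idx}'
        subset_loader = producer.get_loader(indices=['0_0', '0_3', '0_7'])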
global_shuffle(is_need: bool) → neural_pipeline.data_producer.data_producer.DataProducer[source]¶

    Enable or disable global shuffling. If global shuffling is enabled, batches are compiled from random indices across all datasets; in this case per-dataset order shuffling is ignored.

    Parameters: is_need – whether global shuffling is needed

    Returns: self object
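    Since the method returns self, it can be chained with the other configuration methods (a sketch):

        producer = producer.global_shuffle(True)  # batches now mix items from all datasets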
pass_indices(need_pass: bool) → neural_pipeline.data_producer.data_producer.DataProducer[source]¶

    Pass the indices of the data items with every batch. Disabled by default.

    Parameters: need_pass – whether indices need to be passed

    Returns: self object
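    A sketch of toggling the flag; how the indices appear inside a batch is not specified on this page, so the first comment below is only an assumption:

        producer.pass_indices(True)   # batches are assumed to carry their items' indices
        producer.pass_indices(False)  # back to the default behaviour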
pin_memory(is_need: bool) → neural_pipeline.data_producer.data_producer.DataProducer[source]¶

    Enable or disable memory pinning on loading. Pinning memory increases data loading performance (especially when data is loaded to a GPU), but is incompatible with swap.

    Parameters: is_need – whether memory pinning is needed

    Returns: self object
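    A sketch of the typical GPU-training case:

        # worth enabling when batches are copied to the GPU; avoid on systems that rely on swap
        producer.pin_memory(True)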
set_indices(indices: [str]) → neural_pipeline.data_producer.data_producer.DataProducer[source]¶

    Set indices for the DataProducer. After that, the DataProducer produces data only by these indices.

    Parameters: indices – list of indices in the format '<dataset_idx>_<data_idx>', e.g. ['0_0', '0_1', '1_0']

    Returns: self object
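    A sketch combining set_indices() with get_loader(), using the example index list from the signature above (which assumes the producer holds at least two datasets):

        producer.set_indices(['0_0', '0_1', '1_0'])  # items 0 and 1 of dataset 0, item 0 of dataset 1
        loader = producer.get_loader()               # now yields only these three items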