Dataset Classes

class datasets.DatasetBase(*args, **kwargs)

Base class for storing datasets that can be readily used for training.

inputs

A list that contains the network’s inputs, ready to be passed to the model. All preprocessing must be done beforehand or during class initialization.

output

A list that contains the expected outputs for training or validation. They must also be in the format expected by the loss function that will be used.
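The contract above (inputs and output as parallel lists, with all preprocessing finished by the end of initialization) can be sketched as follows. ToyDataset, the pixel-scaling step and the __len__/__getitem__ methods are hypothetical illustrations, not the documented internals of DatasetBase; plain Python lists stand in for tensors so the sketch runs without PyTorch.

```python
from typing import List, Sequence, Tuple


class ToyDataset:
    """Hypothetical stand-in mirroring the DatasetBase contract.

    `inputs` and `output` are parallel lists; every element is already
    preprocessed when __init__ returns, so a training loop can use them as-is.
    """

    def __init__(self, raw_images: Sequence[Sequence[int]], labels: Sequence[int]):
        if len(raw_images) != len(labels):
            raise ValueError("raw_images and labels must have the same length")
        # Preprocessing happens during initialization:
        # scale 8-bit pixel values to the range [0, 1].
        self.inputs: List[List[float]] = [[p / 255 for p in img] for img in raw_images]
        # Targets are stored in the form the loss expects (class indices here).
        self.output: List[int] = list(labels)

    def __len__(self) -> int:
        return len(self.inputs)

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        return self.inputs[idx], self.output[idx]


ds = ToyDataset([[0, 255], [51, 102]], [3, 7])
print(len(ds), ds[1])  # 2 ([0.2, 0.4], 7)
```

Doing the preprocessing once in __init__ keeps each training step down to an index lookup, which is the point of having the lists "ready to be passed to the model".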

to(device: device)

Moves the input and output tensors to the given device.

Parameters:

device – The th.device to move the tensors to.
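A minimal sketch of what to() could do, assuming inputs and output are lists of PyTorch tensors: each list is rebuilt from the results of Tensor.to(device), because that call returns a tensor on the target device rather than modifying the original in place. DeviceAwareDataset is hypothetical, and the sketch only relies on each element having a .to() method, so it also runs without PyTorch installed.

```python
from typing import Any, List


class DeviceAwareDataset:
    """Hypothetical dataset holding parallel `inputs` / `output` lists."""

    def __init__(self, inputs: List[Any], output: List[Any]):
        self.inputs = inputs
        self.output = output

    def to(self, device: Any) -> None:
        # Tensor.to() is not in-place: it returns a tensor on `device`
        # (or the same tensor if it is already there), so the lists are
        # rebuilt from the returned objects.
        self.inputs = [x.to(device) for x in self.inputs]
        self.output = [y.to(device) for y in self.output]
```

With PyTorch imported as th, a call such as ds.to(th.device("cuda")) would then move every stored input and target to the GPU in one step.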

report() → str

See report().

class datasets.HASYv2Dataset(**kwargs)

Dataset class for preparing HASYv2 data for network training.

Note that initializing the class does not load any data yet; one of the specialized loading methods, such as cross_val() or for_colab(), must be called for that.

cross_val(fold: int, train: bool, dataset: HASYv2Dataset = None)

Loads the data from one cross-validation fold into the dataset instance.

Parameters:
  • fold – The number of the fold (1 to 10).

  • train – Whether to load the training data (True) or the validation (test) data (False) from the corresponding fold.

  • dataset – Another HASYv2Dataset object containing the entire dataset (generated through the for_colab() method). It is needed when the dataset’s individual images are not available locally (for example, on Google Drive); the method then reads from this base dataset instead of looking for the image files locally.
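The fold-selection logic described by these parameters can be sketched as two small helpers. Both are hypothetical (fold_csv_path and load_fold_from_base are not part of the documented API), and the file layout <base>/classification-task/fold-<n>/{train,test}.csv with a `path` column is an assumption based on the published HASYv2 archive, not something this page states. The second helper shows the dataset-argument case: samples are looked up in an already-loaded collection instead of being read from disk.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, Tuple


def fold_csv_path(base: Path, fold: int, train: bool) -> Path:
    """Return the CSV listing one fold's training or test split.

    Assumed layout: <base>/classification-task/fold-<n>/{train,test}.csv
    """
    if not 1 <= fold <= 10:
        raise ValueError(f"fold must be between 1 and 10, got {fold}")
    split = "train.csv" if train else "test.csv"
    return base / "classification-task" / f"fold-{fold}" / split


def load_fold_from_base(
    csv_text: str, base: Dict[str, Tuple[object, int]]
) -> Tuple[List[object], List[int]]:
    """Select one fold's samples from an already-loaded full dataset.

    `base` is a hypothetical mapping from the CSV `path` column to an
    (image, symbol_id) pair, standing in for the for_colab() dataset,
    so no image file is opened.
    """
    inputs: List[object] = []
    output: List[int] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        image, label = base[row["path"]]
        inputs.append(image)
        output.append(label)
    return inputs, output


p = fold_csv_path(Path("HASYv2"), 3, train=False)
print(p.as_posix())  # HASYv2/classification-task/fold-3/test.csv

csv_text = "path,symbol_id,latex,user_id\nimg/a.png,31,A,5\n"
base = {"img/a.png": ("pixels-a", 31), "img/b.png": ("pixels-b", 32)}
print(load_fold_from_base(csv_text, base))  # (['pixels-a'], [31])
```

Keying the lookup on the path column means the fold CSVs remain the single source of truth for the split, whether the images come from disk or from the pickled full dataset.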

for_colab()

Loads the entire dataset into a single class instance.

This makes it possible to pass the complete dataset around without moving all of the 160,000+ image files. It is particularly useful for training models in Google Colab, since uploading the raw dataset to Google Drive has failed several times.

This method automatically saves the full HASYv2Dataset instance as a pickle file named “colab_dataset.pkl” in the dataset’s base directory (self.path["base"]).
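The save step amounts to pickling the populated instance. The helpers below are a hypothetical sketch (save_colab_dataset and load_colab_dataset are not part of the documented API) using the standard-library pickle module. Any picklable object works, so a plain dict can stand in for a HASYv2Dataset when trying it out.

```python
import pickle
from pathlib import Path
from typing import Any


def save_colab_dataset(dataset: Any, base_dir: Path) -> Path:
    """Pickle a fully loaded dataset object to <base_dir>/colab_dataset.pkl."""
    target = Path(base_dir) / "colab_dataset.pkl"
    with open(target, "wb") as f:
        pickle.dump(dataset, f, protocol=pickle.HIGHEST_PROTOCOL)
    return target


def load_colab_dataset(pkl_path: Path) -> Any:
    """Restore the pickled dataset, e.g. after copying the file to Google Drive."""
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(pkl_path, "rb") as f:
        return pickle.load(f)
```

Because pickle stores instances by reference to their class, the environment that loads the file (for example a Colab notebook) must be able to import the same HASYv2Dataset class. The restored object is what the dataset argument of cross_val() expects.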