Dataset Classes

class datasets.HASYv2Dataset(**kwargs)[source]

Dataset class for preparing HASYv2 data for network training.

Note that initializing the class won’t load any data into it yet. One of the specialized methods needs to be called for that.

cross_val(fold: int, train: bool, dataset: Optional[datasets.HASYv2Dataset] = None)[source]

Loads the data of a single cross-validation fold into the dataset instance.

Parameters
  • fold – The number of the fold (1 to 10).

  • train – Whether to load the training data (True) or the validation (test) data (False) from the corresponding fold.

  • dataset – Another HASYv2Dataset object containing the entirety of the data (generated through the for_colab() method). This is needed when the dataset’s individual images are not available locally (for example, when working from Google Drive). The method then uses this base dataset instead of trying to find the files locally.
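The fold-wise loading described above can be sketched as follows. This is only an illustration: StubHASYv2Dataset and build_fold_pairs are hypothetical names that do not exist in the library, and the stub merely mimics the documented cross_val() signature so that the example runs on its own. With the real class, datasets.HASYv2Dataset would be passed in instead, assuming it can be constructed without arguments.

```python
from typing import Optional


class StubHASYv2Dataset:
    """Hypothetical stand-in that only mimics the documented cross_val() signature."""

    def __init__(self, **kwargs):
        # Like the documented class, construction alone loads no data.
        self.loaded = None

    def cross_val(self, fold: int, train: bool,
                  dataset: Optional["StubHASYv2Dataset"] = None) -> None:
        if not 1 <= fold <= 10:
            raise ValueError("fold must be between 1 and 10")
        # The real method would load the fold's images here; the stub only
        # records which fold and split were requested.
        self.loaded = (fold, "train" if train else "validation")


def build_fold_pairs(dataset_cls, base=None):
    """Create one (training, validation) dataset pair per fold."""
    pairs = []
    for fold in range(1, 11):  # folds are numbered 1 to 10
        train_set = dataset_cls()
        train_set.cross_val(fold, train=True, dataset=base)
        val_set = dataset_cls()
        val_set.cross_val(fold, train=False, dataset=base)
        pairs.append((train_set, val_set))
    return pairs


pairs = build_fold_pairs(StubHASYv2Dataset)
```

When the image files are not available locally, the base argument would be the complete dataset produced by for_colab(), which is then handed to every cross_val() call.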

for_colab()[source]

Loads the entirety of the dataset into one single class instance.

This is useful for passing the complete dataset around without having to move all of the 160,000+ image files. It is particularly useful for training models in Google Colab, since uploading the raw dataset to Google Drive has failed several times.

This function automatically saves the full HASYv2Dataset instance in the dataset’s base directory (self.path["base"]) as the pickle file “colab_dataset.pkl”.
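The pickle round trip behind this can be sketched with the standard library alone. The example is a minimal illustration, not the library's own code: save_dataset and load_dataset are hypothetical helpers, a plain dictionary with made-up entries stands in for the HASYv2Dataset instance, and a temporary directory stands in for self.path["base"].

```python
import pickle
import tempfile
from pathlib import Path

# Assumed stand-in for the complete dataset; the real object is a
# HASYv2Dataset instance, and these entries are made up for illustration.
full_dataset = {"images": ["img_0.png", "img_1.png"], "labels": [31, 7]}


def save_dataset(dataset, base_dir):
    """Write the dataset to <base_dir>/colab_dataset.pkl and return the path."""
    target = Path(base_dir) / "colab_dataset.pkl"
    with open(target, "wb") as handle:
        pickle.dump(dataset, handle)
    return target


def load_dataset(base_dir):
    """Read the pickled dataset back from <base_dir>/colab_dataset.pkl."""
    with open(Path(base_dir) / "colab_dataset.pkl", "rb") as handle:
        return pickle.load(handle)


# A temporary directory plays the role of the dataset's base directory.
with tempfile.TemporaryDirectory() as base_dir:
    saved_path = save_dataset(full_dataset, base_dir)
    restored = load_dataset(base_dir)
```

Unpickling a real HASYv2Dataset instance additionally requires the datasets module to be importable in the session that loads the file, since pickle stores a reference to the class rather than its code. The restored object can then be passed as the dataset argument of cross_val().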