Dataset versions

Download a specific version of an ML dataset together with its labels and train/validation/test splits.

Module methods

sdk.dataset_version.get(dataset_version_id) fetches a dataset version by its ID and returns a DatasetVersionResource.

LabelResource

Each entry in a DatasetVersionResource's labels is a LabelResource, which lazily exposes:

  • mask - The label mask as a NumPy boolean array.
  • image_id, class_id, class_name - The image and class the label belongs to.
  • image_height, image_width - The dimensions of the source image.
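
As a rough illustration of working with the mask attribute, the sketch below computes the pixel area and bounding box of a boolean mask with NumPy. The array is a hand-built stand-in rather than output from the SDK, the code assumes the mask is a 2-D array indexed as (row, column), and the helper name mask_area_and_bbox is hypothetical.

```python
import numpy as np


def mask_area_and_bbox(mask):
    """Return (area, bbox) for a 2-D boolean mask.

    bbox is (row_min, col_min, row_max, col_max) with inclusive bounds,
    or None when the mask contains no True pixels.
    """
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())  # number of True pixels
    if area == 0:
        return 0, None
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing a True pixel
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing a True pixel
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))
    return area, bbox


# Stand-in for label.mask (assumption: a 2-D boolean array).
demo_mask = np.zeros((4, 5), dtype=bool)
demo_mask[1:3, 2:4] = True

area, bbox = mask_area_and_bbox(demo_mask)
print(area, bbox)  # 4 (1, 2, 2, 3)
```

With a real label, the same helper would be called as mask_area_and_bbox(label.mask), provided the mask has the assumed layout.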

Example

# `sdk` is assumed to be an already-initialized SDK client.
dataset_version = sdk.dataset_version.get(dataset_version_id=2929)

print("Train images:", dataset_version.train_image_ids)
print("Class index map:", dataset_version.class_id_to_index_map)

# Inspect an individual label mask
first_label = dataset_version.labels[0]
print(first_label.class_name, first_label.mask.shape)

# Download the dataset version to a local folder
dataset_version.download(folder_path="/local/downloads/datasets")
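
One plausible use of class_id_to_index_map is translating raw class IDs into contiguous indices for training targets. The sketch below assumes the map behaves like a plain dict from class_id to index, which this page does not confirm. The IDs and the helper name to_class_indices are made up for illustration.

```python
def to_class_indices(class_ids, class_id_to_index_map):
    """Translate raw class IDs into indices, failing loudly on unknown IDs."""
    missing = sorted(set(class_ids) - set(class_id_to_index_map))
    if missing:
        raise KeyError(f"class IDs missing from the map: {missing}")
    return [class_id_to_index_map[class_id] for class_id in class_ids]


# Hypothetical values standing in for dataset_version.class_id_to_index_map
# and for the class_id of each label.
demo_map = {101: 0, 205: 1, 317: 2}
demo_label_class_ids = [205, 101, 205, 317]

indices = to_class_indices(demo_label_class_ids, demo_map)
print(indices)  # [1, 0, 1, 2]
```

Checking for unknown IDs up front turns a mismatch between labels and the map into one clear error before any targets are built.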