
Overview

Introduction

The Datasets page allows users to manage and interact with collections of hyperspectral datasets. For each dataset, users can view its status, labeled classes, and metadata such as sensor details and creation and modification information.

Clicking a dataset opens the Dataset Details Panel, which includes two tabs:

  • Overview – Displays dataset metadata, settings, and analysis tools
  • Data – Allows users to add, organize, and validate data files for training and testing

Dataset Details Panel: Overview Tab

The Overview tab is where a dataset is configured during creation. It also summarizes the dataset's metadata and analysis results.

Status, Version, and Metadata

Displays the current dataset version, status, ID, last modification date, and author.

  • Use the outlined Version button to switch versions or start a new one

Settings Panel

The Settings section defines the core configuration of a dataset, including its spectral range, processing level, class setup, and optional reference spectra. These parameters can be modified while the dataset is in Draft status, but they are locked once the dataset is finalized.

  • Wavelengths: Displays the spectral range and total number of bands captured by the dataset. Determined by the sensor; cannot be edited once data is added.
  • Data Level Processed: Indicates the preprocessing level applied to the data, either Reflectance or Radiance.
    • Reflectance: Normalized data that is less sensitive to illumination and sensor variability. Recommended for most use cases.
    • Radiance: Sensor-level output that can capture fine signal details but requires careful preprocessing.
    • Warning: Avoid using uncalibrated radiance data. If radiance is used, apply normalization or scaling to correct for illumination variation during training.
  • Classes: Lists the material classes defined within the dataset. Select only relevant target classes, and whenever possible, include an “Other” or “Background” class to capture non‑target materials likely to appear during production. Doing so improves generalization and reduces false classifications.
  • Label Attributes: Allows users to associate additional attributes (e.g., material type or quality indicators) with class labels. Used for regression models.
  • Filter Mask Types: Enables selection of predefined mask types to include or exclude certain pixels during training (e.g., cloud masks, vegetation masks).
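
The radiance warning above calls for normalization or scaling. One common option, shown here as a minimal sketch and not as the platform's own method, is to scale each pixel's spectrum to unit length so that a uniform change in brightness cancels out. The function name and sample values are illustrative assumptions.

```python
import math

def l2_normalize(spectrum):
    """Scale one pixel's spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0:
        return list(spectrum)  # leave all-zero (e.g. masked) pixels unchanged
    return [v / norm for v in spectrum]

# The same material under two illumination levels (hypothetical values):
# `bright` is exactly 4x `dim`, so both normalize to the same shape.
bright = [0.8, 1.6, 2.4]
dim = [0.2, 0.4, 0.6]
bright_n = l2_normalize(bright)
dim_n = l2_normalize(dim)
```

This only removes a multiplicative brightness factor. Wavelength-dependent illumination changes still require proper calibration.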

Reference Spectra Panel

Reference spectra can be linked directly to classes to support Unmixing or Target Detection models. Each class may have one or more reference spectra assigned.

For each class:

  • Select the corresponding reference spectrum from available sources
  • Ensure spectra are normalized and resampled to match the dataset’s wavelength configuration
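
Resampling a reference spectrum onto the dataset's wavelengths can be sketched with plain linear interpolation. This assumes the source wavelengths are sorted in ascending order. The function name, the clamping behavior at the edges, and the sample values are illustrative assumptions, not the platform's implementation.

```python
from bisect import bisect_right

def resample(src_wl, src_vals, target_wl):
    """Linearly interpolate a spectrum onto target center wavelengths.

    Targets outside the source range are clamped to the nearest edge value.
    """
    out = []
    for wl in target_wl:
        if wl <= src_wl[0]:
            out.append(src_vals[0])
        elif wl >= src_wl[-1]:
            out.append(src_vals[-1])
        else:
            i = bisect_right(src_wl, wl)  # src_wl[i - 1] <= wl < src_wl[i]
            x0, x1 = src_wl[i - 1], src_wl[i]
            y0, y1 = src_vals[i - 1], src_vals[i]
            out.append(y0 + (y1 - y0) * (wl - x0) / (x1 - x0))
    return out

# Hypothetical library spectrum sampled every 100 nm, resampled to dataset bands.
library_wl = [400.0, 500.0, 600.0]
library_vals = [0.1, 0.3, 0.5]
dataset_wl = [450.0, 500.0, 550.0]
resampled = resample(library_wl, library_vals, dataset_wl)
```

Linear interpolation is reasonable when the reference spectrum is sampled at least as finely as the dataset. For much finer library spectra, averaging over each band's width is usually a better match to the sensor.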

Analysis Panel (post‑finalization)

Once a dataset is finalized, the Analysis sub‑panel provides visual summaries and metrics to help assess dataset composition, balance, and spectral diversity.

Average Spectral Signature Plot

Displays the mean spectral curve for each class, allowing users to evaluate how distinct or overlapping material signatures are.

  • Distinct curves → strong separability and good potential for classification
  • Overlapping curves → similar materials, mixed pixels, or labeling inconsistencies
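
The curves in this plot are per-class averages taken band by band. The sketch below shows that calculation on a handful of labeled pixels. The data layout (a list of spectra with a parallel list of labels) and all names are assumptions for illustration.

```python
from collections import defaultdict

def mean_spectra(pixels, labels):
    """Return {class label: mean spectrum}, averaging band by band."""
    sums = {}
    counts = defaultdict(int)
    for spectrum, label in zip(pixels, labels):
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
        for band, value in enumerate(spectrum):
            sums[label][band] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in band_sums]
        for label, band_sums in sums.items()
    }

# Three hypothetical two-band pixels from two classes.
pixels = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
labels = ["a", "a", "b"]
signatures = mean_spectra(pixels, labels)
```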

Pixel Count Distribution

Shows the number of pixels per class across all data buckets.

  • Balanced distributions reduce bias toward dominant classes
  • Large imbalances can reduce accuracy for underrepresented classes
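
To put a number on imbalance, one option is to compare the largest and smallest class counts and derive inverse-frequency class weights. This is a sketch under assumptions: the weighting formula is a common convention, not something the platform is documented to apply, and the class names are made up.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class pixel counts, the max/min ratio, and balancing weights."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    total = sum(counts.values())
    # Inverse-frequency weights: a perfectly balanced dataset gives 1.0 everywhere.
    weights = {c: total / (len(counts) * n) for c, n in counts.items()}
    return counts, ratio, weights

# Hypothetical labels: 900 pixels of one class, 100 of another.
labels = ["plastic"] * 900 + ["metal"] * 100
counts, ratio, weights = class_balance(labels)
```

A high ratio suggests collecting more pixels for the minority class or weighting it more heavily during training.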

PCA Plot (Principal Component Projection)

Projects the dataset into a lower‑dimensional space (typically 2D) using the top principal components. Separate scatter plots are shown for Training, Validation, and Test buckets.

How to interpret:

  • Each point represents a pixel, colored by class label
  • Distinct clusters → strong spectral separation and high‑quality data
  • Overlap → similar spectra or label confusion
  • Consistent distribution across buckets → balanced sampling; major shifts suggest sampling bias or acquisition differences
  • Outliers → potential noise or mislabeled pixels
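
For readers who want to see what the projection does, here is a dependency-free sketch of PCA using power iteration on the covariance matrix. It is meant to illustrate the idea, not to mirror how the platform computes its plot. In practice a library routine such as scikit-learn's `PCA` would be the usual choice. All names and sample values are assumptions.

```python
import math

def pca_project(rows, k=2, iters=500):
    """Return (components, scores) for the top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _component in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                raise ValueError("fewer than k directions with non-zero variance")
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    scores = [[sum(c[j] * r[j] for j in range(d)) for c in components]
              for r in centered]
    return components, scores

# Four hypothetical three-band pixels; most variance lies along bands 1 and 2.
pixels = [[1.0, 2.0, 0.0], [2.0, 4.1, 0.1], [3.0, 5.9, 0.0], [4.0, 8.0, 0.1]]
components, scores = pca_project(pixels)
```

Each row of `scores` holds the two coordinates that would be plotted for one pixel.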

Dataset Details Panel: Data Tab

The Data tab is used to add hyperspectral image data and organize it into Training, Validation, and Test buckets.

Adding Data

  • Drag and drop files into the Training & Validation or Test boxes
  • Data can be split automatically or manually using the provided slider
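
How the platform performs its automatic split is not described here. As a rough illustration of what a slider position such as 80/20 means, the sketch below shuffles files with a fixed seed and cuts the list at the chosen fraction. The function, its parameters, and the file names are hypothetical.

```python
import random

def split_files(files, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then split into (training, validation) lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten hypothetical files split 80/20.
files = [f"scene_{i}.hdr" for i in range(10)]
training, validation = split_files(files, train_fraction=0.8)
```

Splitting by file, not by pixel, keeps pixels from the same scene out of both buckets, which helps avoid overly optimistic validation scores.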

Compatible Data

Only compatible files are displayed. Files are compatible when:

  • Their center wavelengths match the dataset's (or can be resampled to match)
  • Their processing level matches the dataset's
  • At least one set of finalized labels exists

Incompatible files are hidden by default. Click “Show incompatible files” to view them and hover for detailed incompatibility reasons.
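
The three compatibility rules above can be expressed as a small check that returns the reasons a file would be hidden. This is a sketch only: the record fields, the wavelength tolerance, and especially the rule for when resampling is possible (the file must span the dataset's wavelength range) are assumptions, not the platform's documented logic.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    # Hypothetical record; field names are illustrative, not the platform's schema.
    center_wavelengths_nm: list
    processing_level: str  # "Reflectance" or "Radiance"
    finalized_label_sets: int

def incompatibility_reasons(file, dataset_wavelengths_nm, dataset_level, tol_nm=0.5):
    """Return a list of reasons; an empty list means the file is compatible."""
    reasons = []
    wl = file.center_wavelengths_nm
    same_grid = len(wl) == len(dataset_wavelengths_nm) and all(
        abs(a - b) <= tol_nm for a, b in zip(wl, dataset_wavelengths_nm)
    )
    # Assumption: resampling is possible when the file spans the dataset's range.
    resamplable = bool(wl) and (
        min(wl) <= min(dataset_wavelengths_nm)
        and max(wl) >= max(dataset_wavelengths_nm)
    )
    if not (same_grid or resamplable):
        reasons.append("center wavelengths do not match and cannot be resampled")
    if file.processing_level != dataset_level:
        reasons.append("processing level differs from the dataset")
    if file.finalized_label_sets < 1:
        reasons.append("no finalized label set")
    return reasons

dataset_wl = [400.0, 500.0, 600.0]
ok_file = FileInfo([400.0, 500.0, 600.0], "Reflectance", 1)
bad_file = FileInfo([450.0, 550.0], "Radiance", 0)
ok_reasons = incompatibility_reasons(ok_file, dataset_wl, "Reflectance")
bad_reasons = incompatibility_reasons(bad_file, dataset_wl, "Reflectance")
```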