Output & Metrics

LipiDetective writes all results to a timestamped subdirectory created inside files.output (resolved relative to experiments/ by default). The folder is named LipiDetective_Output_<YYYY_MM_DD_HH_MM_SS>, and a copy of the run's config.yaml is saved at its root for reproducibility.
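The timestamp in the folder name follows the pattern shown above; a minimal sketch of reproducing it (the strftime format string is inferred from that pattern, not taken from the source code):

```python
from datetime import datetime

def output_dir_name(now: datetime) -> str:
    # Pattern from the docs: LipiDetective_Output_<YYYY_MM_DD_HH_MM_SS>
    return f"LipiDetective_Output_{now.strftime('%Y_%m_%d_%H_%M_%S')}"

print(output_dir_name(datetime(2024, 5, 1, 13, 7, 42)))
# → LipiDetective_Output_2024_05_01_13_07_42
```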

Training Output

Training creates logger subdirectories inside the output folder:

experiments/<output>/LipiDetective_Output_<timestamp>/
├── config.yaml
├── custom_logger/
│   ├── train_metrics.csv                         # Per-epoch loss & accuracy
│   ├── train_predictions.csv                     # Per-batch predictions vs labels
│   ├── validation_metrics.csv                    # Per-epoch validation metrics
│   ├── validation_predictions.csv                # Validation predictions vs labels
│   ├── plot_loss_accuracy_training.png           # Training loss & accuracy curves
│   ├── plot_loss_training.png                    # Training loss curve
│   ├── plot_loss_accuracy_validation.png         # Validation loss & accuracy curves
│   ├── plot_loss_validation.png                  # Validation loss curve
│   ├── confusion_matrix_train.csv                # Training confusion matrix
│   ├── confusion_matrix_val.csv                  # Validation confusion matrix
│   ├── confusion_matrix_heatmap_train.png        # Training confusion matrix heatmap
│   ├── confusion_matrix_heatmap_validation.png   # Validation confusion matrix heatmap
│   ├── train_lipid_metrics.csv                   # Per-lipid precision/recall/F1
│   └── val_lipid_metrics.csv                     # Per-lipid validation precision/recall/F1
└── csv_logger/                                   # PyTorch Lightning CSVLogger output

With k-fold cross-validation (training.k > 1), each fold gets its own subdirectory and plot filenames include the fold identifier:

custom_logger/
├── fold_1/
│   ├── train_metrics.csv
│   ├── plot_loss_accuracy_training_fold_1.png
│   ├── confusion_matrix_heatmap_train.png
│   └── ...
├── fold_2/
│   └── ...
└── ...
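With one metrics file per fold, a common follow-up is to average the final validation accuracy across folds. A minimal sketch, assuming the CSV has epoch/loss/accuracy columns (check your actual headers) and shown here against a synthetic stand-in for the real output folder:

```python
import csv
import tempfile
from pathlib import Path
from statistics import mean

def mean_final_val_accuracy(custom_logger: Path) -> float:
    """Average the last-epoch validation accuracy across fold_* subdirectories.

    Assumes validation_metrics.csv has an 'accuracy' column; the real
    column names may differ, so verify against your own files.
    """
    finals = []
    for fold_dir in sorted(custom_logger.glob("fold_*")):
        with open(fold_dir / "validation_metrics.csv", newline="") as f:
            rows = list(csv.DictReader(f))
        finals.append(float(rows[-1]["accuracy"]))  # last epoch of this fold
    return mean(finals)

# Tiny synthetic example standing in for a real custom_logger/ folder:
root = Path(tempfile.mkdtemp())
for i, acc in enumerate(["0.80", "0.90"], start=1):
    d = root / f"fold_{i}"
    d.mkdir()
    (d / "validation_metrics.csv").write_text(
        f"epoch,loss,accuracy\n0,1.2,0.5\n1,0.8,{acc}\n")
print(round(mean_final_val_accuracy(root), 4))  # → 0.85
```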

Metrics logged per epoch:

  • Loss — Cross-entropy loss (training and validation)

  • Accuracy — Custom lipid-aware accuracy that evaluates predicted token sequences against ground truth
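For a quick look at the per-epoch CSVs without a plotting stack, something like the following works. The epoch/loss/accuracy column names are an assumption (adjust to the actual headers), and the file here is a synthetic stand-in:

```python
import csv
import tempfile
from pathlib import Path

def best_epoch(metrics_csv: Path) -> dict:
    """Return the row with the highest accuracy (column names assumed)."""
    with open(metrics_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda r: float(r["accuracy"]))

# Synthetic stand-in for custom_logger/train_metrics.csv:
p = Path(tempfile.mkdtemp()) / "train_metrics.csv"
p.write_text("epoch,loss,accuracy\n0,1.4,0.52\n1,0.9,0.71\n2,0.7,0.78\n")
print(best_epoch(p))  # → {'epoch': '2', 'loss': '0.7', 'accuracy': '0.78'}
```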

Testing Output

Testing writes to custom_logger/ (no fold subdirectory):

custom_logger/
├── test_metrics.csv                      # Per-step loss & accuracy
├── test_predictions.csv                  # Predictions vs labels
├── confusion_matrix_test.csv             # Confusion matrix
├── confusion_matrix_heatmap_testing.png  # Confusion matrix heatmap
└── test_lipid_metrics.csv                # Per-lipid precision/recall/F1

Confusion matrices break down performance by lipid class and help identify which classes the model confuses with one another. test_lipid_metrics.csv reports precision, recall, and F1 for each lipid species.
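The per-lipid precision and recall can also be derived directly from the confusion matrix CSV. A sketch under the assumption that rows are true classes and columns are predicted classes, with class names in the first row and column (verify the orientation against your own confusion_matrix_test.csv); the matrix below is synthetic:

```python
import csv
import tempfile
from pathlib import Path

def precision_recall(cm_csv: Path) -> dict:
    """Per-class (precision, recall) from a confusion matrix CSV.

    Assumes rows = true classes, columns = predicted classes, with
    class names in the first row/column.
    """
    with open(cm_csv, newline="") as f:
        reader = csv.reader(f)
        classes = next(reader)[1:]
        matrix = {row[0]: [int(v) for v in row[1:]] for row in reader}
    out = {}
    for i, c in enumerate(classes):
        tp = matrix[c][i]
        col_sum = sum(matrix[r][i] for r in classes)  # everything predicted as c
        row_sum = sum(matrix[c])                      # everything truly c
        out[c] = (tp / col_sum if col_sum else 0.0,
                  tp / row_sum if row_sum else 0.0)
    return out

# Synthetic 2-class matrix: 8 true PC spectra (7 correct), 10 true PE (9 correct)
p = Path(tempfile.mkdtemp()) / "confusion_matrix_test.csv"
p.write_text(",PC,PE\nPC,7,1\nPE,1,9\n")
print(precision_recall(p))  # → {'PC': (0.875, 0.875), 'PE': (0.9, 0.9)}
```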

Prediction Output

Prediction writes directly to the output folder root (not inside a logger subdirectory):

experiments/<output>/LipiDetective_Output_<timestamp>/
├── config.yaml
└── predictions.csv

The predictions.csv file contains one row per identified spectrum with the following columns:

  • file — Source mzML file name

  • polarity — Ion polarity of the spectrum

  • spectrum_index — Index of the spectrum within the file

  • precursor — Precursor m/z value

  • prediction — Predicted lipid nomenclature

  • confidence — Model confidence score (0–1)

Spectra whose confidence falls below predict.confidence_threshold are omitted by default (set predict.keep_empty: True to include them).
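Downstream filtering by confidence is straightforward with the documented columns. A minimal sketch (whether the tool itself uses a strict or inclusive comparison is not specified, so treat the `>=` here as a choice; the file contents are synthetic):

```python
import csv
import tempfile
from pathlib import Path

def confident_predictions(pred_csv: Path, threshold: float) -> list:
    """Keep rows at or above a confidence threshold, mirroring what
    predict.confidence_threshold does at prediction time."""
    with open(pred_csv, newline="") as f:
        return [r for r in csv.DictReader(f)
                if float(r["confidence"]) >= threshold]

# Synthetic predictions.csv with the documented columns:
p = Path(tempfile.mkdtemp()) / "predictions.csv"
p.write_text(
    "file,polarity,spectrum_index,precursor,prediction,confidence\n"
    "run1.mzML,negative,12,760.58,PC 34:1,0.97\n"
    "run1.mzML,negative,40,716.52,PE 34:1,0.41\n"
)
rows = confident_predictions(p, 0.8)
print([r["prediction"] for r in rows])  # → ['PC 34:1']
```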

When predict.output is set to "top3", an additional top3_predictions.csv is written with columns:

  • file — Source mzML file name

  • spectrum_index — Index of the spectrum within the file

  • prediction_1, confidence_1 — Top prediction and its confidence

  • prediction_2, confidence_2 — Second prediction

  • prediction_3, confidence_3 — Third prediction
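The wide top-3 layout is convenient to read but awkward to analyze; reshaping it into one row per candidate is often the first step. A sketch using the documented columns (the example file is synthetic):

```python
import csv
import tempfile
from pathlib import Path

def top3_long(top3_csv: Path) -> list:
    """Flatten the wide top-3 layout into (file, spectrum_index, rank,
    prediction, confidence) tuples, one tuple per candidate."""
    out = []
    with open(top3_csv, newline="") as f:
        for row in csv.DictReader(f):
            for rank in (1, 2, 3):
                out.append((row["file"], row["spectrum_index"], rank,
                            row[f"prediction_{rank}"],
                            float(row[f"confidence_{rank}"])))
    return out

# Synthetic top3_predictions.csv with the documented columns:
p = Path(tempfile.mkdtemp()) / "top3_predictions.csv"
p.write_text(
    "file,spectrum_index,prediction_1,confidence_1,"
    "prediction_2,confidence_2,prediction_3,confidence_3\n"
    "run1.mzML,12,PC 34:1,0.90,PC 34:2,0.06,PE 37:4,0.02\n"
)
rows = top3_long(p)
print(rows[0])  # → ('run1.mzML', '12', 1, 'PC 34:1', 0.9)
```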

Tuning Output

Hyperparameter tuning (Ray Tune) writes trial results to the output folder. Each trial generates training and validation plots with a trial identifier:

experiments/<output>/LipiDetective_Output_<timestamp>/
├── plot_loss_accuracy_training_trial_0.png
├── plot_loss_accuracy_validation_trial_0.png
├── tune_result.txt
└── ...

WandB Integration

When the wandb config section is enabled, all metrics are additionally logged to Weights & Biases for interactive visualization and experiment comparison. Runs are organized by the wandb.group field.
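A minimal sketch of such a config section: wandb.group is the field named above, while the project and entity keys are assumptions based on typical WandB setups, so check the config reference for the exact keys your version supports.

```yaml
wandb:
  project: lipidetective     # hypothetical project name
  entity: my-team            # hypothetical WandB entity
  group: transformer_kfold   # runs sharing this group are compared together
```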

Custom Evaluation

The Evaluator class provides lipid-aware evaluation logic:

class Evaluator(library: LipidLibrary)

    evaluate_regression_accuracy(predictions: Any, lipid_info: dict[str, Any]) -> tuple[int, int]
    evaluate_custom_transformer_accuracy(predictions: Any, labels: Any, lipid_name_dict: dict[str, int], is_last_epoch: bool) -> tuple[float, int, int, Tensor]
    generate_prediction_info(label_hg: str, label_sc1: str, label_sc2: str, pred_hg: str, pred_sc1: str, pred_sc2: str, pred_hg_value: float, pred_sc1_value: float, pred_sc2_value: float, batch: int, epoch: int, idx: int) -> dict[str, Any]
    find_nearest_headgroup(value: float) -> tuple[str, float]
    find_nearest_side_chain(value: float) -> tuple[str, float]
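Judging by its signature, find_nearest_headgroup maps a continuous regression output to the closest known headgroup and returns the matched name together with its reference value. The following standalone sketch illustrates that idea with a made-up lookup table; the real method uses the values stored in its LipidLibrary, which will differ:

```python
def find_nearest(value: float, table: dict) -> tuple:
    """Return the (name, reference_value) pair whose value is closest.

    Illustrative only: Evaluator.find_nearest_headgroup and
    find_nearest_side_chain look up real values in a LipidLibrary.
    """
    name = min(table, key=lambda k: abs(table[k] - value))
    return name, table[name]

# Made-up reference values purely for demonstration:
headgroups = {"PC": 1.0, "PE": 2.0, "PS": 3.0}
print(find_nearest(1.8, headgroups))  # → ('PE', 2.0)
```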