Output & Metrics
LipiDetective writes all results to a timestamped subdirectory created inside
files.output (resolved relative to experiments/ by default). The folder
is named LipiDetective_Output_<YYYY_MM_DD_HH_MM_SS>, and a config.yaml
copy is saved at its root.
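The timestamped folder name can be reproduced in a few lines. A minimal sketch, where `make_output_dir` is a hypothetical helper (not part of the LipiDetective API) that mirrors the naming scheme described above:

```python
from datetime import datetime
from pathlib import Path

def make_output_dir(base: Path) -> Path:
    # Build LipiDetective_Output_<YYYY_MM_DD_HH_MM_SS> under `base`,
    # mirroring the naming scheme described above.
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    out = base / f"LipiDetective_Output_{stamp}"
    out.mkdir(parents=True, exist_ok=True)
    return out
```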
Training Output
Training creates logger subdirectories inside the output folder:
experiments/<output>/LipiDetective_Output_<timestamp>/
├── config.yaml
├── custom_logger/
│ ├── train_metrics.csv # Per-epoch loss & accuracy
│ ├── train_predictions.csv # Per-batch predictions vs labels
│ ├── validation_metrics.csv # Per-epoch validation metrics
│ ├── validation_predictions.csv # Validation predictions vs labels
│ ├── plot_loss_accuracy_training.png # Training loss & accuracy curves
│ ├── plot_loss_training.png # Training loss curve
│ ├── plot_loss_accuracy_validation.png # Validation loss & accuracy curves
│ ├── plot_loss_validation.png # Validation loss curve
│ ├── confusion_matrix_train.csv # Training confusion matrix
│ ├── confusion_matrix_val.csv # Validation confusion matrix
│ ├── confusion_matrix_heatmap_train.png # Training confusion matrix heatmap
│ ├── confusion_matrix_heatmap_validation.png # Validation confusion matrix heatmap
│ ├── train_lipid_metrics.csv # Per-lipid precision/recall/F1
│ └── val_lipid_metrics.csv # Validation per-lipid precision/recall/F1
└── csv_logger/ # PyTorch Lightning CSVLogger output
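The per-epoch CSVs are easy to post-process with the standard library. A minimal sketch that picks the epoch with the lowest loss from train_metrics.csv — the "epoch" and "loss" column names are assumptions, so check your file's header and adjust:

```python
import csv
from pathlib import Path

def best_epoch(metrics_csv: Path, loss_col: str = "loss") -> int:
    # Return the epoch with the lowest recorded loss. The "epoch" and
    # "loss" column names are assumptions -- check the header of your
    # own train_metrics.csv and adjust if needed.
    with metrics_csv.open() as fh:
        rows = list(csv.DictReader(fh))
    best = min(rows, key=lambda r: float(r[loss_col]))
    return int(best["epoch"])
```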
With k-fold cross-validation (training.k > 1), each fold gets its own
subdirectory and plot filenames include the fold identifier:
custom_logger/
├── fold_1/
│ ├── train_metrics.csv
│ ├── plot_loss_accuracy_training_fold_1.png
│ ├── confusion_matrix_heatmap_train.png
│ └── ...
├── fold_2/
│ └── ...
└── ...
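The fold subdirectories can be collected in numeric order with a small helper; `fold_dirs` is hypothetical, written against the layout shown above:

```python
from pathlib import Path

def fold_dirs(custom_logger: Path) -> list[Path]:
    # Collect fold_1, fold_2, ... subdirectories in numeric order,
    # matching the k-fold layout shown above.
    folds = [p for p in custom_logger.iterdir()
             if p.is_dir() and p.name.startswith("fold_")]
    return sorted(folds, key=lambda p: int(p.name.split("_")[1]))
```

Numeric sorting matters here: a plain lexicographic sort would place fold_10 before fold_2.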
Metrics logged per epoch:
Loss — Cross-entropy loss (training and validation)
Accuracy — Custom lipid-aware accuracy that evaluates predicted token sequences against ground truth
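One plausible reading of sequence-level accuracy, sketched minimally: a prediction counts as correct only when its entire token sequence matches the ground truth. The actual lipid-aware metric may additionally normalize lipid nomenclature before comparing, which is not reproduced here:

```python
def sequence_accuracy(predictions, labels) -> float:
    # Minimal sketch: a prediction is correct only if its full token
    # sequence matches the ground-truth sequence. LipiDetective's
    # lipid-aware metric may normalize nomenclature before comparing;
    # that logic is not reproduced here.
    correct = sum(1 for p, t in zip(predictions, labels) if list(p) == list(t))
    return correct / len(labels) if labels else 0.0
```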
Testing Output
Testing writes to custom_logger/ (no fold subdirectory):
custom_logger/
├── test_metrics.csv # Per-step loss & accuracy
├── test_predictions.csv # Predictions vs labels
├── confusion_matrix_test.csv # Confusion matrix
├── confusion_matrix_heatmap_testing.png # Confusion matrix heatmap
└── test_lipid_metrics.csv # Per-lipid precision/recall/F1
Confusion matrices show per-lipid-class performance, helping identify which
lipid classes the model confuses with one another. The test_lipid_metrics.csv
file reports precision, recall, and F1 per lipid species.
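Per-lipid precision, recall, and F1 follow directly from a confusion matrix. A minimal, dependency-free sketch of that derivation (not the project's own code):

```python
def per_class_prf(conf: list[list[int]]) -> list[tuple[float, float, float]]:
    # conf[i][j] = count of samples with true class i predicted as j.
    # Returns (precision, recall, F1) per class, as reported in the
    # *_lipid_metrics.csv files.
    n = len(conf)
    result = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, wrong
        fn = sum(conf[c]) - tp                        # true c, missed
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        result.append((prec, rec, f1))
    return result
```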
Prediction Output
Prediction writes directly to the output folder root (not inside a logger subdirectory):
experiments/<output>/LipiDetective_Output_<timestamp>/
├── config.yaml
└── predictions.csv
The predictions.csv file contains one row per identified spectrum with the
following columns:
file — Source mzML file name
polarity — Ion polarity of the spectrum
spectrum_index — Index of the spectrum within the file
precursor — Precursor m/z value
prediction — Predicted lipid nomenclature
confidence — Model confidence score (0–1)
Spectra whose confidence falls below predict.confidence_threshold are
omitted by default (set predict.keep_empty: True to include them).
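An existing predictions.csv can be re-filtered at a stricter threshold after the fact; `confident_predictions` is a hypothetical helper using the column names documented above:

```python
import csv
from pathlib import Path

def confident_predictions(pred_csv: Path, threshold: float = 0.9) -> list[dict]:
    # Keep only rows of predictions.csv whose confidence meets the
    # threshold, using the column names documented above.
    with pred_csv.open() as fh:
        return [row for row in csv.DictReader(fh)
                if float(row["confidence"]) >= threshold]
```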
When predict.output is set to "top3", an additional
top3_predictions.csv is written with columns:
file — Source mzML file name
spectrum_index — Index of the spectrum within the file
prediction_1, confidence_1 — Top prediction and its confidence
prediction_2, confidence_2 — Second prediction
prediction_3, confidence_3 — Third prediction
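A small gap between the top two confidences flags ambiguous spectra. A hedged sketch against the top3_predictions.csv layout above (`top1_margin` is a hypothetical helper, not part of LipiDetective):

```python
import csv
from pathlib import Path

def top1_margin(top3_csv: Path) -> dict[str, float]:
    # Gap between the best and second-best confidence per spectrum;
    # small margins flag ambiguous identifications. Column names
    # follow the top3_predictions.csv layout above.
    with top3_csv.open() as fh:
        return {row["spectrum_index"]:
                    float(row["confidence_1"]) - float(row["confidence_2"])
                for row in csv.DictReader(fh)}
```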
Tuning Output
Hyperparameter tuning (Ray Tune) writes trial results to the output folder. Each trial generates training and validation plots with a trial identifier:
experiments/<output>/LipiDetective_Output_<timestamp>/
├── plot_loss_accuracy_training_trial_0.png
├── plot_loss_accuracy_validation_trial_0.png
├── tune_result.txt
└── ...
WandB Integration
When the wandb config section is enabled, all metrics are additionally
logged to Weights & Biases for interactive visualization
and experiment comparison. Runs are organized by the wandb.group field.
Custom Evaluation
The Evaluator class provides lipid-aware evaluation logic:
class Evaluator(library: LipidLibrary)

    evaluate_custom_transformer_accuracy(predictions: Any, labels: Any, lipid_name_dict: dict[str, int], is_last_epoch: bool) -> tuple[float, int, int, Tensor]