Models
LipiDetective supports four model architectures. Select the model via the
model field in the config:
model: 'transformer' # or 'convolutional', 'feedforward', 'random_forest'
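As a minimal sketch of how the `model` field could be checked before training starts, the snippet below maps each documented value to the class name listed on this page. The mapping table and the `resolve_model_name` helper are hypothetical and only illustrate the idea; LipiDetective's own dispatch logic may differ.

```python
from typing import Any

# Documented config values mapped to the class names on this page.
# ASSUMPTION: this table is illustrative, not LipiDetective's actual dispatch code.
MODEL_CLASS_NAMES = {
    "transformer": "TransformerNetwork",
    "convolutional": "ConvolutionalNetwork",
    "feedforward": "FeedForwardNetwork",
    "random_forest": "RandomForest",
}


def resolve_model_name(config: dict[str, Any]) -> str:
    """Return the class name for config['model'] or raise on an unknown value."""
    choice = config.get("model")
    if choice not in MODEL_CLASS_NAMES:
        valid = ", ".join(sorted(MODEL_CLASS_NAMES))
        raise ValueError(f"Unknown model {choice!r}; expected one of: {valid}")
    return MODEL_CLASS_NAMES[choice]


print(resolve_model_name({"model": "transformer"}))  # TransformerNetwork
```

Failing early on a misspelled value tends to be cheaper than discovering it after data loading has already run.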
Transformer (Recommended)
The primary model. Uses an encoder-decoder transformer architecture to generate lipid nomenclature as a token sequence from an input spectrum. The encoder processes the spectrum embedding, and the decoder autoregressively predicts lipid tokens (headgroup, fatty acid chains, etc.).
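The autoregressive decoding described above can be sketched in plain Python as a greedy loop. Everything here is an assumption made for illustration: the `<sos>`/`<eos>` special tokens, the example lipid tokens, and the `scripted_decoder` stand-in that replaces a trained network. Whether the real length limit counts the start token is also not stated in this page.

```python
from typing import Callable

START, END = "<sos>", "<eos>"  # hypothetical special tokens


def greedy_decode(
    next_token: Callable[[list[float], list[str]], str],
    spectrum: list[float],
    max_len: int = 11,
) -> list[str]:
    """Grow the token sequence one step at a time until END or max_len."""
    tokens = [START]
    while len(tokens) < max_len:
        # Each step sees the spectrum plus every token generated so far.
        token = next_token(spectrum, tokens)
        tokens.append(token)
        if token == END:
            break
    return tokens


# Stand-in for a trained decoder: replays a fixed, made-up token script.
SCRIPT = ["PC", "16:0", "_", "18:1", END]


def scripted_decoder(spectrum: list[float], tokens: list[str]) -> str:
    return SCRIPT[len(tokens) - 1]


decoded = greedy_decode(scripted_decoder, spectrum=[0.0, 1.0])
print(decoded)
```

In a real model the `next_token` step would pick the highest-scoring token from the decoder output instead of reading from a script.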
Configure via the transformer section:
transformer:
  d_model: 32 # Embedding dimension (must be divisible by num_heads)
  num_heads: 4 # Attention heads
  dropout: 0.1
  ffn_hidden: 256 # Feed-forward hidden dimension
  num_layers: 2 # Encoder/decoder layers
  output_seq_length: 11
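The divisibility constraint on `d_model` exists because multi-head attention splits the embedding evenly across heads. A small sketch of that check is shown below; the `check_transformer_config` helper is hypothetical, and the dropout range check is an extra sanity test added here rather than something this page specifies.

```python
from typing import Any


def check_transformer_config(cfg: dict[str, Any]) -> int:
    """Return the per-head dimension, raising if the settings are inconsistent."""
    d_model, num_heads = cfg["d_model"], cfg["num_heads"]
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model ({d_model}) must be divisible by num_heads ({num_heads})"
        )
    # ASSUMPTION: dropout is a probability, so it should lie in [0, 1).
    if not 0.0 <= cfg["dropout"] < 1.0:
        raise ValueError(f"dropout must be in [0, 1), got {cfg['dropout']}")
    return d_model // num_heads


transformer_cfg = {
    "d_model": 32,
    "num_heads": 4,
    "dropout": 0.1,
    "ffn_hidden": 256,
    "num_layers": 2,
    "output_seq_length": 11,
}

head_dim = check_transformer_config(transformer_cfg)
print(head_dim)  # 8, since 32 // 4 == 8
```

With the default values each attention head therefore works on an 8-dimensional slice of the embedding.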
- class TransformerNetwork(config: dict[str, Any], output_attentions: bool = False)
- forward(src: Tensor, tgt: Tensor) → Tensor | tuple[Tensor, list[Tensor]]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling forward directly, since the former takes care of running the registered hooks while the latter silently ignores them.
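The difference between calling the module instance and calling `forward` directly can be seen with a forward hook. The sketch below uses a hypothetical `TinyNet` rather than `TransformerNetwork`, since constructing the latter requires a full LipiDetective config; the hook behaviour itself is standard PyTorch.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in module used only to demonstrate hooks."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


calls = []
net = TinyNet()
# The hook records the output shape every time it is triggered.
handle = net.register_forward_hook(
    lambda module, args, output: calls.append(tuple(output.shape))
)

x = torch.zeros(3, 4)
net(x)          # goes through __call__, so the hook runs
net.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)  # [(3, 2)]
```

Only one entry is recorded even though the network ran twice, which is why `model(src, tgt)` is preferred over `model.forward(src, tgt)`.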
Convolutional Neural Network
A CNN with three convolutional layers for regression tasks on spectral data. Useful as a baseline or for simpler prediction tasks.
- class ConvolutionalNetwork(config: dict[str, Any])
- forward(x: Tensor) → Tensor
Forward pass of the convolutional network with three convolutional layers and pooling layers, followed by three fully connected linear layers.
- Parameters:
x (torch.Tensor) – input feature tensor with shape (batch_size, 2, n_peaks + 1). Dimension 1 has size 2 because the tensor holds the m/z and intensity values of each peak. Dimension 2 has size n_peaks + 1 because the measurement mode (-1 for negative, +1 for positive) and the precursor mass are appended.
- Returns:
output of the convolutional network with shape (batch_size, 3), corresponding to the three predicted masses of the lipid components (headgroup and two side chains).
- Return type:
torch.Tensor
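To make the expected input shape concrete, the sketch below assembles a batch as nested lists and checks its dimensions. The `build_cnn_sample` helper is hypothetical, the peak values are synthetic placeholders, and the placement of the two extra values (precursor mass in the m/z row, measurement mode in the intensity row) is an assumption; this page only states that both are added in one extra column.

```python
def build_cnn_sample(mz, intensity, mode, precursor_mass):
    """Build one sample as a 2 x (n_peaks + 1) nested list."""
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    if mode not in (-1, 1):
        raise ValueError("mode must be -1 (negative) or +1 (positive)")
    # ASSUMPTION: precursor mass extends the m/z row and the measurement
    # mode extends the intensity row; the real layout may differ.
    return [list(mz) + [float(precursor_mass)], list(intensity) + [float(mode)]]


def shape(batch):
    """Return (batch_size, rows, columns) of a nested-list batch."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


# Synthetic placeholder peaks with n_peaks = 3.
batch = [
    build_cnn_sample([100.0, 200.0, 300.0], [1.0, 0.5, 0.25], +1, 400.0),
    build_cnn_sample([110.0, 210.0, 310.0], [0.8, 0.4, 0.2], -1, 410.0),
]

print(shape(batch))  # (2, 2, 4), i.e. (batch_size, 2, n_peaks + 1)
```

Passing such a nested list to `torch.tensor` would give a tensor of the same shape.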
Feed-Forward Network
A simple fully connected network. Serves as a minimal baseline architecture.
- class FeedForwardNetwork(config: dict[str, Any])
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling forward directly, since the former takes care of running the registered hooks while the latter silently ignores them.
Random Forest
A scikit-learn RandomForestClassifier wrapper. Operates outside the
PyTorch Lightning pipeline and handles its own data loading from HDF5 files.
Useful for comparison against deep learning approaches.
- class RandomForest(config: dict[str, Any])
- use_single_classifier(train_features: list[Any], train_labels: list[Any], test_features: list[Any]) → tuple[Any, RandomForestClassifier]
- use_triple_classifier(train_features: list[Any], train_labels: list[Any], test_features: list[Any]) → tuple[list[list[Any]], RandomForestClassifier, RandomForestClassifier, RandomForestClassifier]
- use_triple_regressor(train_features: list[Any], train_labels: list[Any], test_features: list[Any]) → tuple[list[list[Any]], RandomForestRegressor, RandomForestRegressor, RandomForestRegressor]
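The return types above suggest that the triple variants train three separate estimators. One plausible reading, sketched below with scikit-learn, is one classifier per lipid component (headgroup and two side chains). This is an assumption, not the documented behaviour: the `fit_triple_classifier` function, the label layout, and the toy data are all hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier


def fit_triple_classifier(train_features, train_labels, test_features):
    """Train one classifier per label position and predict the test set."""
    classifiers = []
    predictions = []
    for component in range(3):
        # ASSUMPTION: each label is a 3-tuple (headgroup, chain 1, chain 2).
        targets = [labels[component] for labels in train_labels]
        clf = RandomForestClassifier(n_estimators=10, random_state=0)
        clf.fit(train_features, targets)
        classifiers.append(clf)
        predictions.append(list(clf.predict(test_features)))
    return predictions, classifiers[0], classifiers[1], classifiers[2]


# Toy data, invented purely to exercise the function.
train_features = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.9], [0.9, 0.0]]
train_labels = [
    ("PC", "16:0", "18:1"),
    ("PE", "18:0", "20:4"),
    ("PC", "16:0", "18:1"),
    ("PE", "18:0", "20:4"),
]
test_features = [[0.0, 1.0], [1.0, 0.0]]

predictions, head_clf, chain1_clf, chain2_clf = fit_triple_classifier(
    train_features, train_labels, test_features
)
print(len(predictions), len(predictions[0]))  # 3 2
```

Swapping `RandomForestClassifier` for `RandomForestRegressor` and using numeric mass labels would give the analogous regression setup.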