
MaldiDeepKit Documentation

A catalog of sklearn-compatible deep learning classifiers for MALDI-TOF mass spectrometry. Four PyTorch architectures - MLP (with optional sigmoid-gated attention), 1-D CNN, 1-D ResNet, and 1-D Vision Transformer - wrapped in a unified estimator API, with defaults calibrated for ~6000-bin MALDI-TOF input.


Key Features

Four Architectures

MLP with optional attention, 1-D CNN, 1-D ResNet, and 1-D Vision Transformer - all calibrated for ~6000-bin MALDI-TOF spectra out of the box.

api/index.html
Sklearn-Compatible

Every classifier implements fit / predict / predict_proba / score / get_params / set_params and plugs into Pipeline, cross_val_score, and GridSearchCV.

quickstart.html#fitting-a-classifier
MaldiSet Integration

Pass a maldiamrkit.MaldiSet directly to fit / predict; MaldiDeepKit duck-types on the DataFrame-like .X attribute, so MaldiSuite’s data model flows end-to-end.

quickstart.html#integration-with-maldiamrkit
Attention Inspection

MaldiMLPClassifier exposes per-sample sigmoid-gated attention via attention_weights_ and get_attention_weights(X).

quickstart.html#inspecting-attention-weights
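
To give a feel for what per-sample sigmoid-gated attention can look like, the NumPy sketch below computes one gate per hidden unit and per sample. It is an illustration only: the function name `sigmoid_gate`, the gate weights `W` and `b`, and the hidden size of 16 are hypothetical and are not taken from MaldiDeepKit's implementation.

```python
import numpy as np

def sigmoid_gate(h, W, b):
    """Return (gated_h, gates) for hidden activations h of shape (n, d)."""
    logits = h @ W + b                      # (n, d) gate pre-activations
    gates = 1.0 / (1.0 + np.exp(-logits))   # sigmoid keeps each gate in (0, 1)
    return h * gates, gates                 # element-wise gating of the hidden units

rng = np.random.default_rng(0)
h = rng.standard_normal((10, 16))           # 10 samples, hypothetical hidden_dim = 16
W = rng.standard_normal((16, 16)) * 0.1     # hypothetical gate weights
b = np.zeros(16)
gated, gates = sigmoid_gate(h, W, b)
```

Under this assumption, `gates` has one row per sample and one column per hidden unit, which is the kind of array a per-sample attention inspection would return.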
Training Recipes Built-in

Linear warmup + cosine annealing, AdamW dispatch when weight decay is set, gradient clipping, automatic mixed precision (AMP), stochastic weight averaging (SWA), focal loss, and threshold tuning. Deep models (ResNet / Transformer) ship with the recipes they need to converge out of the box.

quickstart.html#training-recipe-and-lr-schedule
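
The shape of a linear-warmup-plus-cosine-annealing schedule can be sketched in a few lines of plain Python. The function `warmup_cosine_lr` and its parameter names are hypothetical; MaldiDeepKit's own schedule may differ in details such as per-step versus per-epoch updates or the floor learning rate.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` (0-indexed): linear warmup, then cosine decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# 100 steps, the first 10 of which are warmup, peaking at 1e-3.
schedule = [warmup_cosine_lr(s, 100, 10, 1e-3) for s in range(100)]
```

The warmup keeps early updates small while optimizer statistics settle, and the cosine phase decays the rate smoothly toward `min_lr`.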
Leak-Safe Spectral Warping

Pass a Warping (or any sklearn transformer) via warping=; it’s fitted on the training fold only and applied before per-feature standardization during training and inference.

quickstart.html#spectral-warping-pre-scaling
Auto-Scaling

Classifier.from_spectrum(bin_width, input_dim) rescales conv kernels and patches when the spectrum layout deviates from the reference 6000-bin / 3 Da default.

spectrum_scaling.html
Calibration & Threshold Tuning

Post-hoc temperature scaling and balanced-accuracy / F1 / Youden threshold tuning on the validation split, all togglable via classifier kwargs.

quickstart.html#probability-calibration
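
As an illustration of the Youden option, the sketch below picks the decision threshold that maximizes J = TPR - FPR on a validation split. It is written against plain NumPy arrays; `youden_threshold` is a hypothetical helper, not a MaldiDeepKit function, and the validation data is made up for the example.

```python
import numpy as np

def youden_threshold(y_true, proba_pos):
    """Pick the threshold maximizing J = TPR - FPR over the observed scores."""
    y_true = np.asarray(y_true).astype(bool)
    proba_pos = np.asarray(proba_pos, dtype=float)
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(proba_pos):               # candidate cut-points, ascending
        pred = proba_pos >= t
        tpr = (pred & y_true).sum() / max(1, y_true.sum())
        fpr = (pred & ~y_true).sum() / max(1, (~y_true).sum())
        if tpr - fpr > best_j:                   # ties keep the lowest threshold
            best_t, best_j = float(t), tpr - fpr
    return best_t, best_j

# Toy validation split: one negative (0.6) scores above one positive (0.4).
y_val = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
p_val = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9, 0.95])
threshold, j_stat = youden_threshold(y_val, p_val)
```

Here the selected threshold is 0.7 with J = 0.8, trading one missed positive for zero false positives.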
Uncertainty Quantification

Three drop-in estimators on a shared predict_with_uncertainty interface: Monte Carlo Dropout, Laplace approximation, and split conformal prediction (LAC).

api/uncertainty.html
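
Split conformal prediction with the LAC score can be sketched independently of any particular classifier. The code below assumes class probabilities are already available for a held-out calibration split; the helper names `lac_quantile` and `lac_prediction_sets` are hypothetical and do not reflect MaldiDeepKit's `predict_with_uncertainty` signature.

```python
import numpy as np

def lac_quantile(cal_proba, cal_labels, alpha=0.1):
    """Conformal quantile of LAC scores s_i = 1 - p_i[y_i] on a calibration split."""
    n = len(cal_labels)
    scores = 1.0 - cal_proba[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf            # too few calibration points: every class is kept
    return np.sort(scores)[k - 1]

def lac_prediction_sets(test_proba, qhat):
    """Boolean (n_test, n_classes) mask; True marks classes kept in each set."""
    return (1.0 - test_proba) <= qhat

# Toy binary calibration split: probability assigned to the true class.
p_true = np.array([0.90, 0.80, 0.95, 0.70, 0.40, 0.85, 0.75, 0.45, 0.65])
cal_labels = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0])
cal_proba = np.empty((9, 2))
cal_proba[np.arange(9), cal_labels] = p_true
cal_proba[np.arange(9), 1 - cal_labels] = 1.0 - p_true

qhat = lac_quantile(cal_proba, cal_labels, alpha=0.2)
test_proba = np.array([[0.9, 0.1], [0.5, 0.5], [0.3, 0.7]])
sets = lac_prediction_sets(test_proba, qhat)
```

In this toy run the confident samples get singleton sets, while the ambiguous 0.5 / 0.5 sample keeps both classes.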
Reproducible Training

Shared BaseSpectralClassifier seeds Python, NumPy, and PyTorch from random_state so identical configs produce identical weights and predictions.

api/base.html
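
Seeding all three random number generators from a single integer typically looks like the sketch below. The helper name `seed_everything` is hypothetical, and how BaseSpectralClassifier actually applies `random_state` is not shown here; the PyTorch calls are skipped when PyTorch is not installed.

```python
import random
import numpy as np

def seed_everything(seed):
    """Seed the Python, NumPy and (if installed) PyTorch RNGs from one integer."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return                           # PyTorch not installed; nothing more to seed
    torch.manual_seed(seed)              # default PyTorch generator
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed) # all visible CUDA devices

# Re-seeding with the same value reproduces the same draws.
seed_everything(0)
first = (random.random(), np.random.rand())
seed_everything(0)
second = (random.random(), np.random.rand())
```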
Strict Persistence

save() writes a state-dict .pt + hyperparameter .json pair (and a sibling .warper.pkl if a warper was fitted); load() fails fast on class or input_dim mismatches.

quickstart.html#persistence
CPU-Friendly

CPU fallback is fully supported and is what the project’s CI runs against; CUDA significantly speeds up training across all four architectures.

installation.html
MaldiSuite Ecosystem

Sibling of MaldiAMRKit (preprocessing) and MaldiBatchKit (batch correction) - three packages sharing the same data model.

papers.html

Quick Example

import numpy as np
from maldideepkit import MaldiMLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6000)).astype("float32")
y = rng.integers(0, 2, size=200)

clf = MaldiMLPClassifier(random_state=0)
clf.fit(X, y)

proba = clf.predict_proba(X)
weights = clf.get_attention_weights(X[:10])   # (10, hidden_dim)

Train/test split without leakage: the optional spectral warper is fit on the training fold only, and the same fitted parameters are applied to held-out samples at predict time:

from sklearn.model_selection import train_test_split
from maldiamrkit.alignment import Warping
from maldideepkit import MaldiCNNClassifier

# Reuses X and y from the first example.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0,
)

clf = MaldiCNNClassifier(
    warping=Warping(method="shift", n_jobs=-1),
    standardize=True,
    random_state=0,
)
clf.fit(X_train, y_train)            # warper fit on train only
acc = clf.score(X_test, y_test)      # warper reused on test