01 - Quick start
Fit MaldiMLPClassifier on real MALDI-TOF spectra and inspect its sklearn-compatible API.
All notebooks in this folder use the MALDI-Kleb-AI dataset (Rocchi et al., 2026; Zenodo DOI 10.5281/zenodo.17405072) - Klebsiella isolates from three Italian clinical centres, with Amikacin / Meropenem AMR labels. notebooks/_demo.py caches the 370 MB tarball under ~/.cache/maldideepkit/ (or $MALDIDEEPKIT_CACHE_DIR) on first use.
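The cache lookup described above can be sketched as follows. This is a minimal illustration, not the code in notebooks/_demo.py: the helper name `resolve_cache_dir` is hypothetical, and it assumes that the environment variable, when set and non-empty, simply replaces the default directory.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return the directory where the demo tarball would be cached.

    Hypothetical helper: assumes $MALDIDEEPKIT_CACHE_DIR, when set and
    non-empty, overrides the default ~/.cache/maldideepkit/.
    """
    env = os.environ if env is None else env
    override = env.get("MALDIDEEPKIT_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "maldideepkit"


print(resolve_cache_dir())
```

Passing a plain dict as `env` makes the lookup easy to check without touching the real environment, for example `resolve_cache_dir({"MALDIDEEPKIT_CACHE_DIR": "/data/maldi"})`.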
1. Load the demo dataset
[1]:
import sys, pathlib
sys.path.insert(0, str(pathlib.Path.cwd().parent)) # make notebooks/_demo.py importable
from notebooks._demo import binary_labels, load_maldi_kleb_ai
demo = load_maldi_kleb_ai(antibiotic='Amikacin', verbose=True)
X, y = binary_labels(demo) # drop intermediates; y == 1 for resistant
print(f'X: {X.shape} | prevalence(R): {y.mean():.2%} | batches: {sorted(demo.batch.unique())}')
X: (741, 6000) | prevalence(R): 49.80% | batches: ['Catania', 'Milan', 'Rome']
2. Fit a classifier
MaldiMLPClassifier accepts pandas DataFrames and Series directly. With random_state fixed, runs are reproducible.
[2]:
from sklearn.model_selection import train_test_split
from maldideepkit import MaldiMLPClassifier
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = MaldiMLPClassifier(epochs=30, batch_size=32, random_state=0).fit(X_tr, y_tr)
print(f'test accuracy: {clf.score(X_te, y_te):.3f}')
test accuracy: 0.747
[3]:
import numpy as np
from sklearn.metrics import roc_auc_score
proba = clf.predict_proba(X_te)
print(f'AUROC: {roc_auc_score(y_te, proba[:, 1]):.3f}')
proba[:5]
AUROC: 0.826
[3]:
array([[0.35774884, 0.64225113],
[0.5389345 , 0.46106547],
[0.29883578, 0.7011642 ],
[0.42581147, 0.5741885 ],
[0.32527173, 0.6747283 ]], dtype=float32)
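Each row of `proba` holds two class probabilities that sum to 1, and column 1 is the one passed to `roc_auc_score` above. The sketch below turns the five printed rows into hard labels with the usual "larger column wins" rule, using plain Python. It assumes the column order is [susceptible, resistant], matching `y == 1` for resistant; whether `clf.predict` applies exactly this rule is also an assumption.

```python
# The five probability rows printed above, rounded to 4 decimals.
rows = [
    [0.3577, 0.6423],
    [0.5389, 0.4611],
    [0.2988, 0.7012],
    [0.4258, 0.5742],
    [0.3253, 0.6747],
]

# Assumption: column 0 is susceptible (y == 0), column 1 is resistant (y == 1).
labels = [int(p_resistant > p_susceptible) for p_susceptible, p_resistant in rows]

# Sanity check: every row should sum to 1 (up to rounding).
row_sums = [round(sum(r), 3) for r in rows]

print(labels)    # [1, 0, 1, 1, 1]
print(row_sums)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Under these assumptions, only the second isolate would be called susceptible, and with a probability of about 0.54 it is close to the 0.5 decision boundary.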
3. Save and reload
Every classifier persists its weights, fitted state, and the constructor kwargs needed to rebuild the model - save() writes a .pt + .json pair, load() restores both.
[4]:
import tempfile
from pathlib import Path
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / 'quickstart_mlp'
    clf.save(path)
    restored = MaldiMLPClassifier.load(path)
np.allclose(restored.predict_proba(X_te[:5]), proba[:5])
[4]:
True
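The "weights file plus JSON sidecar" pattern described in this section can be illustrated with a toy class. This is a sketch only: `TinyModel` is hypothetical and not part of maldideepkit, it stores its weights with `pickle` in a `.bin` file rather than in a `.pt` file, and it says nothing about how the library lays out its own files.

```python
import json
import pickle
import tempfile
from pathlib import Path


class TinyModel:
    """Hypothetical stand-in for a classifier; not part of maldideepkit."""

    def __init__(self, hidden=8, random_state=0):
        self.hidden = hidden
        self.random_state = random_state
        self.weights = None  # set by fit()

    def fit(self):
        # Placeholder "training": deterministic fake weights.
        self.weights = [0.1 * i for i in range(self.hidden)]
        return self

    def save(self, path):
        path = Path(path)
        # Weights go in a binary file (a real model would write a .pt file here).
        path.with_suffix(".bin").write_bytes(pickle.dumps(self.weights))
        # Constructor kwargs go in a human-readable JSON sidecar.
        kwargs = {"hidden": self.hidden, "random_state": self.random_state}
        path.with_suffix(".json").write_text(json.dumps(kwargs))

    @classmethod
    def load(cls, path):
        path = Path(path)
        kwargs = json.loads(path.with_suffix(".json").read_text())
        model = cls(**kwargs)  # rebuild the object from the saved kwargs
        model.weights = pickle.loads(path.with_suffix(".bin").read_bytes())
        return model


with tempfile.TemporaryDirectory() as tmpdir:
    stem = Path(tmpdir) / "tiny"
    original = TinyModel(hidden=4).fit()
    original.save(stem)
    restored_tiny = TinyModel.load(stem)

print(restored_tiny.hidden, restored_tiny.weights == original.weights)  # 4 True
```

Keeping the constructor kwargs in a separate text file means the model can be rebuilt with the right shape before the weights are loaded into it, and the configuration stays readable without loading the weights.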