04 - Full pipeline
End-to-end workflow on the MALDI-Kleb-AI Amikacin task: optional batch-effect correction with MaldiBatchKit, on-the-fly augmentation with SpectrumAugment, and stratified cross-validation with a deep classifier.
Uses the same Zenodo dataset as the other notebooks (see notebook 01 for caching).
[1]:
import sys, pathlib
sys.path.insert(0, str(pathlib.Path.cwd().parent))
from notebooks._demo import binary_labels, load_maldi_kleb_ai
demo = load_maldi_kleb_ai(antibiotic='Amikacin', verbose=True)
X, y = binary_labels(demo)
batch = demo.batch.loc[X.index]
print(f'X: {X.shape} | prevalence(R): {y.mean():.2%} | batches: {sorted(batch.unique())}')
X: (741, 6000) | prevalence(R): 49.80% | batches: ['Catania', 'Milan', 'Rome']
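The `batch` labels loaded above are what an optional batch-effect correction step would consume. As a rough illustration of the idea only, the sketch below removes per-batch offsets by mean-centering each batch on synthetic data. It is a deliberately simple stand-in written in plain NumPy, not MaldiBatchKit's API: `center_per_batch` and the toy arrays are hypothetical names introduced here.

```python
import numpy as np

def center_per_batch(X, batch):
    """Hypothetical stand-in: shift every batch so its mean spectrum
    matches the global mean spectrum (simple per-batch mean-centering)."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    out = X.copy()
    global_mean = X.mean(axis=0)
    for b in np.unique(batch):
        mask = batch == b
        # Remove this batch's offset relative to the global mean.
        out[mask] += global_mean - X[mask].mean(axis=0)
    return out

# Synthetic data: 6 spectra, 4 bins, with an artificial offset in one batch.
rng = np.random.default_rng(0)
X_toy = rng.random((6, 4))
batch_toy = np.array(["Catania", "Catania", "Milan", "Milan", "Rome", "Rome"])
X_toy[batch_toy == "Milan"] += 0.5

X_corr = center_per_batch(X_toy, batch_toy)
```

In a cross-validated setting, any correction with learned parameters would normally be fitted on the training fold only, so that no test-fold information leaks into the correction.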
Cross-validation with on-the-fly augmentation
SpectrumAugment is a composable per-batch transform applied only during training. All m/z-axis parameters are in bins, so they stay meaningful across bin widths.
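To make the four parameters used in the next cell more concrete, here is a minimal NumPy sketch of what augmentations of this kind could look like on a single binned spectrum. The semantics are assumptions made for illustration (integer bin shift with zero padding, Gaussian blur with sigma in bins, one multiplicative scale factor per spectrum, additive Gaussian noise); `augment_spectrum` is a hypothetical function and may differ from what SpectrumAugment actually does.

```python
import numpy as np

def augment_spectrum(x, rng, noise_std=0.01, intensity_jitter=0.05,
                     mz_shift_max_bins=2, blur_sigma=0.5):
    """Hypothetical sketch of spectrum augmentation; not the library code."""
    x = np.asarray(x, dtype=float)

    # 1) Shift along the m/z axis by a random integer number of bins,
    #    padding the vacated bins with zeros.
    shift = int(rng.integers(-mz_shift_max_bins, mz_shift_max_bins + 1))
    out = np.zeros_like(x)
    if shift > 0:
        out[shift:] = x[:-shift]
    elif shift < 0:
        out[:shift] = x[-shift:]
    else:
        out[:] = x

    # 2) Gaussian blur with a kernel whose width is given in bins.
    if blur_sigma > 0:
        radius = max(1, int(np.ceil(3 * blur_sigma)))
        offsets = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (offsets / blur_sigma) ** 2)
        kernel /= kernel.sum()
        out = np.convolve(out, kernel, mode="same")

    # 3) Multiplicative intensity jitter: one scale factor per spectrum.
    out = out * (1.0 + rng.uniform(-intensity_jitter, intensity_jitter))

    # 4) Additive Gaussian noise on every bin.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return out

# Toy spectrum with a single peak at bin 25.
rng = np.random.default_rng(0)
x_toy = np.zeros(50)
x_toy[25] = 1.0
x_aug = augment_spectrum(x_toy, rng)

# With every parameter set to zero the spectrum passes through unchanged.
x_same = augment_spectrum(x_toy, rng, noise_std=0.0, intensity_jitter=0.0,
                          mz_shift_max_bins=0, blur_sigma=0.0)
```

Because the shift and blur are expressed in bins rather than Daltons, the same settings move a peak by the same number of input features regardless of the bin width used to build `X`.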
[2]:
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from maldideepkit import MaldiCNNClassifier
from maldideepkit.augment import SpectrumAugment
# Augmentation is handed to the classifier, which applies it during training only.
augment = SpectrumAugment(
    noise_std=0.01,
    intensity_jitter=0.05,
    mz_shift_max_bins=2,
    blur_sigma=0.5,
)
scores = []
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    # Fresh classifier per fold so no state carries over between folds.
    clf = MaldiCNNClassifier.from_spectrum(
        bin_width=3, input_dim=X.shape[1],
        epochs=20, augment=augment, random_state=fold,
    )
    clf.fit(X.iloc[train_idx], y.iloc[train_idx])
    proba = clf.predict_proba(X.iloc[test_idx])
    # Column 1 holds the predicted probability of the positive (R) class.
    auroc = roc_auc_score(y.iloc[test_idx], proba[:, 1])
    scores.append(auroc)
    print(f'fold {fold}: AUROC = {auroc:.3f}')
print(f'mean AUROC = {np.mean(scores):.3f} +/- {np.std(scores):.3f}')
fold 0: AUROC = 0.623
fold 1: AUROC = 0.729
fold 2: AUROC = 0.603
mean AUROC = 0.652 +/- 0.055
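The summary line can be reproduced from the three fold scores printed above. The short check below also shows how much the spread depends on the estimator: `np.std` defaults to the population form (`ddof=0`), while the sample form (`ddof=1`) is noticeably larger with only three folds. The scores are copied from the output above; nothing else is assumed.

```python
import numpy as np

# Fold AUROCs as printed in the cell output above.
scores = np.array([0.623, 0.729, 0.603])

mean = scores.mean()
std_pop = scores.std()           # ddof=0, the np.std default used above
std_sample = scores.std(ddof=1)  # sample standard deviation

print(f"mean AUROC       = {mean:.3f}")        # 0.652
print(f"std (ddof=0)     = {std_pop:.3f}")     # 0.055
print(f"std (ddof=1)     = {std_sample:.3f}")  # 0.068
```

With three folds either figure is a coarse estimate of variability, so the fold-to-fold range (0.603 to 0.729) is worth reading alongside the mean.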
Position in the MaldiSuite ecosystem
MaldiAMRKit - I/O, binning, MaldiSet (used internally by _demo.load_maldi_kleb_ai).
MaldiBatchKit - batch-effect correctors that drop straight into a sklearn pipeline (Harmony, ComBat, SpeciesAwareComBat, QualityWeightedComBat).
MaldiDeepKit (this package) - sklearn-compatible deep classifiers and helpers (SpectralEnsemble, SpectrumAugment).