04 - Full pipeline

End-to-end workflow on the MALDI-Kleb-AI Amikacin task, combining optional batch-effect correction with MaldiBatchKit, on-the-fly augmentation with SpectrumAugment, and stratified cross-validation with a deep classifier.

This notebook uses the same Zenodo dataset as the other notebooks (see notebook 01 for caching).

[1]:
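import sys, pathlib

# Put the parent of the working directory on the import path so that
# `notebooks._demo` resolves when the notebook is run from its own folder.
sys.path.insert(0, str(pathlib.Path.cwd().parent))
from notebooks._demo import binary_labels, load_maldi_kleb_ai

demo = load_maldi_kleb_ai(antibiotic='Amikacin', verbose=True)
X, y = binary_labels(demo)
# Batch label for each spectrum, aligned to the rows of X.
batch = demo.batch.loc[X.index]
print(f'X: {X.shape} | prevalence(R): {y.mean():.2%} | batches: {sorted(batch.unique())}')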
X: (741, 6000) | prevalence(R): 49.80% | batches: ['Catania', 'Milan', 'Rome']
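The overall prevalence printed above can hide differences between batches. As a quick check before any cross-validation, the sketch below tabulates sample count and resistant fraction per batch. It runs on a small synthetic stand-in (the names `y_demo` and `batch_demo` and their values are hypothetical, not taken from the dataset); with the real data, `y` and `batch` from the cell above would take their place.

```python
import pandas as pd

# Synthetic stand-ins for `y` and `batch` (hypothetical values, not the real data).
index = pd.RangeIndex(12, name="sample")
y_demo = pd.Series([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1],
                   index=index, name="resistant")
batch_demo = pd.Series(["Catania"] * 4 + ["Milan"] * 4 + ["Rome"] * 4,
                       index=index, name="batch")

# Group the labels by batch: `size` counts spectra, `mean` of 0/1 labels
# is the resistant fraction within that batch.
summary = y_demo.groupby(batch_demo).agg(n="size", prevalence="mean")
print(summary)
```

If the per-batch prevalences differ strongly, a classifier can score well by recognising the batch instead of the resistance signal, which is one reason to consider batch-effect correction.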

Cross-validation with on-the-fly augmentation

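SpectrumAugment is a composable transform applied on the fly to each training mini-batch (not to be confused with the acquisition batches above) and only during training. All m/z-axis parameters are expressed in bins, so the same settings remain meaningful whatever the bin width.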
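To make the kind of perturbation concrete, here is a minimal NumPy sketch of what augmentations like these could look like on a single binned spectrum. It is not the SpectrumAugment implementation: the function `augment_spectrum` is hypothetical, and the meaning given to each parameter (additive Gaussian noise, a global multiplicative intensity jitter, a whole-bin shift along the m/z axis, a Gaussian blur with sigma in bins) is an assumption inferred from the parameter names.

```python
import numpy as np

def augment_spectrum(x, rng, noise_std=0.01, intensity_jitter=0.05,
                     mz_shift_max_bins=2, blur_sigma=0.5):
    """Illustrative augmentation of one binned spectrum (hypothetical helper)."""
    x = np.asarray(x, dtype=float)

    # 1) Shift along the m/z axis by a random whole number of bins, zero-padded.
    shift = int(rng.integers(-mz_shift_max_bins, mz_shift_max_bins + 1))
    shifted = np.zeros_like(x)
    if shift > 0:
        shifted[shift:] = x[:-shift]
    elif shift < 0:
        shifted[:shift] = x[-shift:]
    else:
        shifted[:] = x

    # 2) Gaussian blur; the kernel width is expressed in bins.
    if blur_sigma > 0:
        radius = max(1, int(np.ceil(3 * blur_sigma)))
        offsets = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (offsets / blur_sigma) ** 2)
        kernel /= kernel.sum()
        shifted = np.convolve(shifted, kernel, mode="same")

    # 3) Global intensity jitter, then additive noise, clipped at zero.
    scale = 1.0 + rng.uniform(-intensity_jitter, intensity_jitter)
    noisy = shifted * scale + rng.normal(0.0, noise_std, size=x.shape)
    return np.clip(noisy, 0.0, None)

rng = np.random.default_rng(0)
spectrum = np.zeros(50)
spectrum[10] = 1.0  # a single synthetic peak at bin 10
augmented = augment_spectrum(spectrum, rng)
print("peak bin after augmentation:", int(augmented.argmax()))
```

Because the shift and the blur are defined in bins, the peak can move by at most `mz_shift_max_bins` bins regardless of how wide a bin is.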

[2]:
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

from maldideepkit import MaldiCNNClassifier
from maldideepkit.augment import SpectrumAugment

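# Training-time augmentation; m/z-axis parameters are expressed in bins.
augment = SpectrumAugment(
    noise_std=0.01,
    intensity_jitter=0.05,
    mz_shift_max_bins=2,
    blur_sigma=0.5,
)

scores = []
# Stratified folds keep the resistant/susceptible ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    # A fresh classifier per fold, so no fitted state carries over between folds.
    clf = MaldiCNNClassifier.from_spectrum(
        bin_width=3, input_dim=X.shape[1],
        epochs=20, augment=augment, random_state=fold,
    )
    clf.fit(X.iloc[train_idx], y.iloc[train_idx])
    # Column 1 of predict_proba is the positive-class probability.
    proba = clf.predict_proba(X.iloc[test_idx])
    auroc = roc_auc_score(y.iloc[test_idx], proba[:, 1])
    scores.append(auroc)
    print(f'fold {fold}: AUROC = {auroc:.3f}')

# np.std defaults to the population standard deviation (ddof=0).
print(f'mean AUROC = {np.mean(scores):.3f} +/- {np.std(scores):.3f}')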
fold 0: AUROC = 0.623
fold 1: AUROC = 0.729
fold 2: AUROC = 0.603
mean AUROC = 0.652 +/- 0.055
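With only three folds, the reported spread depends noticeably on how it is computed. The short sketch below re-derives the summary from the three fold AUROCs printed above and compares NumPy's default population standard deviation (`ddof=0`, as used in the cell) with the sample standard deviation (`ddof=1`) and the resulting standard error of the mean. Only the three scores come from the notebook; the variable names are illustrative.

```python
import numpy as np

# Fold AUROCs copied from the output above.
scores = np.array([0.623, 0.729, 0.603])

mean = scores.mean()
pop_std = scores.std()            # ddof=0, what np.std(scores) reports
sample_std = scores.std(ddof=1)   # unbiased variance estimate, larger for small n
sem = sample_std / np.sqrt(len(scores))

print(f"mean = {mean:.3f}")                   # mean = 0.652
print(f"std (ddof=0) = {pop_std:.3f}")        # std (ddof=0) = 0.055
print(f"std (ddof=1) = {sample_std:.3f}")     # std (ddof=1) = 0.068
print(f"standard error = {sem:.3f}")          # standard error = 0.039
```

Either convention is defensible as long as it is stated; the per-fold range (0.603 to 0.729) is the more direct indication that three folds give a noisy estimate.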

Position in the MaldiSuite ecosystem

  • MaldiAMRKit - I/O, binning, MaldiSet (used internally by _demo.load_maldi_kleb_ai).

  • MaldiBatchKit - batch-effect correctors that drop straight into a sklearn pipeline (Harmony, ComBat, SpeciesAwareComBat, QualityWeightedComBat).

  • MaldiDeepKit (this package) - sklearn-compatible deep classifiers and helpers (SpectralEnsemble, SpectrumAugment).