Architectural scaling#

MaldiDeepKit’s default hyperparameters are calibrated for the MaldiAMRKit default layout: bin_width = 3 over the 2000-20000 Da range, giving input_dim = 6000. Two things can change that layout:

  • the user picks a different bin width (wider or finer bins), and

  • the user trims the spectrum to a narrower m/z range, shrinking input_dim independently of bin_width.

These two axes affect different architectural concerns, so MaldiDeepKit scales each knob with whichever quantity it actually depends on. This page documents that semantic split and the from_spectrum() factory that applies it.

Why the defaults look the way they do#

Default conv-kernel width (CNN first layer, ResNet stem) and patch size (Transformer) balance two forces:

  • Local feature extraction. A small kernel or patch lets early layers capture short-range correlations between adjacent bins without collapsing distinct local patterns into a single value.

  • Global feature integration. Keeping the kernel / patch small leaves enough depth (for conv stacks) or token-sequence length (for attention and SSM) for deeper components to integrate long-range structure.

The specific reference values (kernel_size=7, patch_size=4) are inherited from well-tested image-domain defaults (ResNet-18 stem, ViT patch).

Empirically, these values also align cleanly with the peak width distribution on DRIAMS spectra at bin_width=3. We measured the full width at half maximum (FWHM) of all detected peaks on 50 random spectra per species-drug pair (scipy.signal.find_peaks with prominence threshold, scipy.signal.peak_widths at rel_height=0.5):

Peak FWHM on DRIAMS (Da, bin_width=3)#

Pair

Spectra

P10

P25

Median

P75

P90

S. aureus / oxacillin

4874

6.6

9.0

11.8

15.1

21.3

E. coli / ceftriaxone

8080

6.3

8.7

11.8

15.4

21.5

K. pneumoniae / ceftriaxone

5512

6.3

8.6

11.8

15.2

21.1

The distribution is strikingly consistent across the three species: the median peak is ~12 Da wide and the 90th-percentile peak is ~21 Da. This is why the image-domain defaults happen to be a natural fit:

  • patch_size=4 (12 Da at bin_width=3) ≈ one median peak per token, so each Transformer token captures roughly one peak’s worth of shape.

  • kernel_size=7 (21 Da) ≈ 90th-percentile peak FWHM, so the first CNN / ResNet conv fully contains ~90% of peaks along with one or two bins of shoulder on either side.

This is empirical observation, not a priori design. The defaults were picked for the local-vs-global balance first; the alignment with DRIAMS peak widths emerged from the measurement above. Different instruments, different preprocessing, or different bin widths will shift the distribution - use from_spectrum() to scale.

The semantic split#

When the spectrum layout changes, each architectural knob is driven by the quantity it physically depends on:

Knob

Driven by

Why

CNN kernel_size

bin_width

Kernels aggregate per-bin information density. A finer bin grid carries less information per bin, so a wider kernel in bins is needed to gather the same local structure; a coarser grid calls for a smaller kernel.

ResNet stem_kernel_size

bin_width

Same argument as for the CNN: the first layer’s job is to aggregate adjacent bin values, which is a per-bin-density concern.

Transformer patch_size

(none: scale-agnostic)

The Transformer uses a learned positional embedding sized to whatever token count the patch embedding produces, so any input_dim works with the default patch_size=4 without tuning. from_spectrum just forwards input_dim.

The MLP has no spectral-layout-dependent knob either: its first layer is a single Linear(input_dim, hidden_dim) that handles any input size identically.

Auto-scaling: from_spectrum#

Each classifier exposes a from_spectrum classmethod that applies the semantic split automatically:

Classifier.from_spectrum(bin_width: int, input_dim: int, **overrides)

Both parameters describe the spectrum layout. Each classifier uses the relevant one internally:

  • MaldiCNNClassifier and MaldiResNetClassifier scale their conv kernel using bin_width.

  • MaldiTransformerClassifier and MaldiMLPClassifier are architecturally scale-agnostic and just record input_dim for shape validation; from_spectrum forwards overrides untouched.

Any keyword in **overrides wins over the auto-scaled default.

from maldideepkit import (
    MaldiCNNClassifier,
    MaldiTransformerClassifier,
)

# Reference layout: kernel_size=7, patch_size=4
cnn = MaldiCNNClassifier.from_spectrum(bin_width=3, input_dim=6000)
tr = MaldiTransformerClassifier.from_spectrum(bin_width=3, input_dim=6000)

# Trim the spectrum (bin_width unchanged, input_dim halved).
# CNN kernel stays at 7 (bin density unchanged).
# Transformer unchanged (scale-agnostic).
cnn_trim = MaldiCNNClassifier.from_spectrum(bin_width=3, input_dim=3000)
tr_trim = MaldiTransformerClassifier.from_spectrum(bin_width=3, input_dim=3000)

# Coarser binning over the full range.
# CNN kernel drops to 5 (fewer bins carry more info per bin).
cnn_coarse = MaldiCNNClassifier.from_spectrum(bin_width=6, input_dim=3000)

# Fine bins, trimmed.
# CNN kernel up to 21 (fine bins -> larger kernel to aggregate).
cnn_fine = MaldiCNNClassifier.from_spectrum(bin_width=1, input_dim=4000)

# Override the auto-choice explicitly.
cnn_custom = MaldiCNNClassifier.from_spectrum(
    bin_width=3, input_dim=6000, kernel_size=(11, 7, 5, 3),
)

Scaling helpers#

The underlying helper is public and lives under maldideepkit._bin_scaling:

  • maldideepkit._bin_scaling.scale_odd_kernel() returns an odd kernel size closest to reference_kernel * reference_bin_width / bin_width, clamped >= 3. Reference: kernel 7 at bin_width=3. Used by CNN and ResNet kernels.

The Transformer’s patch_size is deliberately not scaled by layout: a learned positional embedding sizes itself to whatever token count the patch embedding produces, so the same patch_size=4 works across bin widths and trimmed m/z ranges.

Per-block configurability#

For reviewer-driven ablations, MaldiCNNClassifier accepts a scalar or a per-block tuple for both kernel_size and pool_size:

# Scalar: broadcast to every block
MaldiCNNClassifier(channels=(32, 64, 128, 128), kernel_size=7)

# Tuple: per-block progression
MaldiCNNClassifier(
    channels=(32, 64, 128, 128),
    kernel_size=(11, 7, 5, 3),
    pool_size=(2, 2, 2, 4),
)

MaldiResNetClassifier exposes stem_kernel_size, stem_stride, block_kernel_size, and use_stem_pool as explicit keyword arguments. Defaults are peak-friendly for MALDI-TOF (stem_stride=1, block_kernel_size=7, use_stem_pool=False) and deviate from the literal ResNet-18 backbone. To reproduce the literal configuration, set stem_stride=2, block_kernel_size=3, use_stem_pool=True.

Input-dim sensitivity of the flat dense head#

Only MaldiCNNClassifier has a flat dense head whose input width scales linearly with input_dim. Given four pool_size=2 blocks and channels[-1]=128, the flat head width is 128 * input_dim / 16. For input_dim=6000 this is already 48,000 units; at input_dim=18000 it grows to 144,000. If this matters, prefer MaldiResNetClassifier or MaldiTransformerClassifier - both use a pooled head whose width is independent of input_dim.