Architectural scaling

MaldiDeepKit’s default hyperparameters are calibrated for the MaldiAMRKit default layout: bin_width = 3 over the 2000-20000 Da range, giving input_dim = 6000. Two things can change that layout:

the user picks a different bin width (coarser or finer bins), and
the user trims the spectrum to a narrower m/z range, shrinking input_dim independently of bin_width.
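
As a quick check of the layout arithmetic above, the sketch below computes input_dim from an m/z range and a bin width. The helper name `input_dim_for` and its half-open-range, floor-division convention are assumptions made for illustration; they are not part of MaldiDeepKit or MaldiAMRKit.

```python
def input_dim_for(mz_min: float, mz_max: float, bin_width: float) -> int:
    """Number of bins covering [mz_min, mz_max) at the given bin width.

    Assumption: bins are half-open and a trailing partial bin is dropped.
    """
    if bin_width <= 0 or mz_max <= mz_min:
        raise ValueError("need bin_width > 0 and mz_max > mz_min")
    return int((mz_max - mz_min) // bin_width)


default_dim = input_dim_for(2000, 20000, 3)  # default layout
trimmed_dim = input_dim_for(2000, 11000, 3)  # narrower m/z range, same bin width
coarse_dim = input_dim_for(2000, 20000, 6)   # full range, coarser bins
print(default_dim, trimmed_dim, coarse_dim)
```

Trimming and coarser binning can produce the same input_dim, so the two axes have to be tracked separately.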

These two axes affect different architectural concerns, so MaldiDeepKit scales each knob with whichever quantity it actually depends on. This page documents that semantic split and the from_spectrum() factory that applies it.

Why the defaults look the way they do

The default conv-kernel width (CNN first layer, ResNet stem) and patch size (Transformer) balance two forces:

Local feature extraction. A small kernel or patch lets early layers capture short-range correlations between adjacent bins without collapsing distinct local patterns into a single value.
Global feature integration. Keeping the kernel / patch small leaves enough depth (for conv stacks) or token-sequence length (for attention and SSM) for deeper components to integrate long-range structure.

The specific reference values (kernel_size=7, patch_size=4) are inherited from well-tested image-domain defaults (ResNet-18 stem, ViT patch).

Empirically, these values also align cleanly with the peak width distribution on DRIAMS spectra at bin_width=3. We measured the full width at half maximum (FWHM) of all detected peaks on 50 random spectra per species-drug pair (scipy.signal.find_peaks with a prominence threshold, then scipy.signal.peak_widths at rel_height=0.5):

Peak FWHM on DRIAMS (Da, `bin_width=3`)

| Pair | Spectra | P10 | P25 | Median | P75 | P90 |
| --- | --- | --- | --- | --- | --- | --- |
| S. aureus / oxacillin | 4874 | 6.6 | 9.0 | 11.8 | 15.1 | 21.3 |
| E. coli / ceftriaxone | 8080 | 6.3 | 8.7 | 11.8 | 15.4 | 21.5 |
| K. pneumoniae / ceftriaxone | 5512 | 6.3 | 8.6 | 11.8 | 15.2 | 21.1 |
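
The measurement procedure can be sketched as follows. This is a minimal illustration, not the script used to produce the table: the helper name `peak_fwhm_da`, the prominence value, and the synthetic Gaussian spectrum are all assumptions, and real spectra would be loaded from DRIAMS instead.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths


def peak_fwhm_da(intensities, bin_width, prominence):
    """Return the FWHM (in Da) of every detected peak in one binned spectrum."""
    peaks, _ = find_peaks(intensities, prominence=prominence)
    # rel_height=0.5 measures each width at half of the peak's prominence.
    widths_bins = peak_widths(intensities, peaks, rel_height=0.5)[0]
    return widths_bins * bin_width  # convert bins to Da


# Synthetic stand-in for a real spectrum: three Gaussian peaks on the default grid.
bin_width = 3
mz = np.arange(2000, 20000, bin_width)  # 6000 bins
sigma_da = 5.0                          # Gaussian FWHM is about 2.355 * sigma
spectrum = np.zeros(mz.shape)
for center in (5000.0, 11000.0, 17000.0):
    spectrum += np.exp(-0.5 * ((mz - center) / sigma_da) ** 2)

fwhm_da = peak_fwhm_da(spectrum, bin_width, prominence=0.1)
percentiles = np.percentile(fwhm_da, [10, 25, 50, 75, 90])
print(fwhm_da, percentiles)
```

Pooling `fwhm_da` over many spectra before taking percentiles gives one row of a table like the one above.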

The distribution is strikingly consistent across the three species: the median peak is ~12 Da wide and the 90th-percentile peak is ~21 Da. This is why the image-domain defaults happen to be a natural fit:

patch_size=4 (12 Da at bin_width=3) ≈ one median peak per token, so each Transformer token captures roughly one peak’s worth of shape.
kernel_size=7 (21 Da at bin_width=3) ≈ 90th-percentile peak FWHM, so the first CNN / ResNet conv spans the full FWHM of ~90% of peaks, and a median-width peak (~4 bins) fits with one or two bins of shoulder on either side.
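
The bin-to-Da arithmetic behind these two items is easy to reproduce. The sketch below uses the S. aureus / oxacillin row of the table; the helper name `span_da` is hypothetical.

```python
def span_da(size_in_bins: int, bin_width: float) -> float:
    """Width in Da covered by a kernel or patch of the given size in bins."""
    return size_in_bins * bin_width


bin_width = 3                   # Da per bin (default layout)
median_fwhm, p90_fwhm = 11.8, 21.3  # Da, from the S. aureus / oxacillin row

patch_span = span_da(4, bin_width)   # Transformer patch_size=4
kernel_span = span_da(7, bin_width)  # CNN / ResNet kernel_size=7

# Bins left on each side of a median-width peak centered in the kernel.
shoulder_bins = (kernel_span - median_fwhm) / bin_width / 2
print(patch_span, kernel_span, round(shoulder_bins, 2))
```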

This is an empirical observation, not a priori design. The defaults were picked for the local-vs-global balance first; the alignment with DRIAMS peak widths emerged from the measurement above. Different instruments, different preprocessing, or different bin widths will shift the distribution - use from_spectrum() to scale.

The semantic split

When the spectrum layout changes, each architectural knob is driven by the quantity it physically depends on:

| Knob | Driven by | Why |
| --- | --- | --- |
| CNN `kernel_size` | `bin_width` | Kernels aggregate per-bin information density. A finer bin grid carries less information per bin, so a wider kernel in bins is needed to gather the same local structure; a coarser grid calls for a smaller kernel. |
| ResNet `stem_kernel_size` | `bin_width` | Same argument as for the CNN: the first layer’s job is to aggregate adjacent bin values, which is a per-bin-density concern. |
| Transformer `patch_size` | (none: scale-agnostic) | The Transformer uses a learned positional embedding sized to whatever token count the patch embedding produces, so any `input_dim` works with the default `patch_size=4` without tuning. `from_spectrum` just forwards `input_dim`. |

The MLP has no spectral-layout-dependent knob either: its first layer is a single Linear(input_dim, hidden_dim) that handles any input size identically.

Auto-scaling: `from_spectrum`

Each classifier exposes a from_spectrum classmethod that applies the semantic split automatically:

Classifier.from_spectrum(bin_width: int, input_dim: int, **overrides)

Both parameters describe the spectrum layout. Each classifier uses the relevant one internally:

MaldiCNNClassifier and MaldiResNetClassifier scale their conv kernel using bin_width.
MaldiTransformerClassifier and MaldiMLPClassifier are architecturally scale-agnostic and just record input_dim for shape validation; from_spectrum forwards overrides untouched.

Any keyword in **overrides wins over the auto-scaled default.

from maldideepkit import (
    MaldiCNNClassifier,
    MaldiTransformerClassifier,
)

# Reference layout: kernel_size=7, patch_size=4
cnn = MaldiCNNClassifier.from_spectrum(bin_width=3, input_dim=6000)
tr = MaldiTransformerClassifier.from_spectrum(bin_width=3, input_dim=6000)

# Trim the spectrum (bin_width unchanged, input_dim halved).
# CNN kernel stays at 7 (bin density unchanged).
# Transformer unchanged (scale-agnostic).
cnn_trim = MaldiCNNClassifier.from_spectrum(bin_width=3, input_dim=3000)
tr_trim = MaldiTransformerClassifier.from_spectrum(bin_width=3, input_dim=3000)

# Coarser binning over the full range.
# CNN kernel drops to 5 (coarser bins carry more information per bin).
cnn_coarse = MaldiCNNClassifier.from_spectrum(bin_width=6, input_dim=3000)

# Fine bins, trimmed.
# CNN kernel up to 21 (fine bins -> larger kernel to aggregate).
cnn_fine = MaldiCNNClassifier.from_spectrum(bin_width=1, input_dim=4000)

# Override the auto-choice explicitly.
cnn_custom = MaldiCNNClassifier.from_spectrum(
    bin_width=3, input_dim=6000, kernel_size=(11, 7, 5, 3),
)
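
The precedence rule (explicit keywords beat auto-scaled defaults) can be sketched with a toy class. `ToyCNNConfig` is a hypothetical stand-in, not the MaldiDeepKit implementation, and the inline scaling expression is a placeholder assumption.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class ToyCNNConfig:
    """Hypothetical stand-in for a classifier; not the MaldiDeepKit class."""

    input_dim: int
    kernel_size: Union[int, Tuple[int, ...]] = 7

    @classmethod
    def from_spectrum(cls, bin_width: int, input_dim: int, **overrides):
        # Placeholder auto-scaling rule (assumed): scale the reference kernel
        # (7 at bin_width=3) inversely with bin_width, force odd, floor at 3.
        auto_kernel = max(3, int(7 * 3 / bin_width + 0.5) | 1)
        params = {"kernel_size": auto_kernel}
        params.update(overrides)  # explicit keywords win over auto-scaled values
        return cls(input_dim=input_dim, **params)


auto = ToyCNNConfig.from_spectrum(bin_width=3, input_dim=6000)
custom = ToyCNNConfig.from_spectrum(
    bin_width=3, input_dim=6000, kernel_size=(11, 7, 5, 3)
)
print(auto.kernel_size, custom.kernel_size)
```

Building the auto-scaled values first and then applying `overrides` on top keeps the precedence rule in one place.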

Scaling helpers

The underlying helper is public and lives under maldideepkit._bin_scaling:

maldideepkit._bin_scaling.scale_odd_kernel() returns the odd kernel size closest to reference_kernel * reference_bin_width / bin_width, clamped to a minimum of 3. The reference point is kernel 7 at bin_width=3. It is used for the CNN and ResNet kernels.
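
A minimal sketch of such a helper is shown below. It is not the library's code: the function name `scale_odd_kernel_sketch`, its signature, and the rounding rule are assumptions. The rounding (round half up, then bump an even result to the next odd size) was chosen because it reproduces the kernel sizes quoted in the example on this page (21, 7, and 5 for bin widths 1, 3, and 6); the real helper may break ties differently.

```python
import math


def scale_odd_kernel_sketch(
    bin_width: float,
    reference_kernel: int = 7,
    reference_bin_width: float = 3.0,
    min_kernel: int = 3,
) -> int:
    """Scale a kernel inversely with bin width; return an odd size >= min_kernel."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    target = reference_kernel * reference_bin_width / bin_width
    k = math.floor(target + 0.5)  # round half up (assumed rounding rule)
    if k % 2 == 0:
        k += 1                    # bump even sizes to the next odd (assumed)
    return max(min_kernel, k)


scaled = {bw: scale_odd_kernel_sketch(bw) for bw in (1, 3, 6, 12)}
print(scaled)
```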

The Transformer’s patch_size is deliberately not scaled by layout: a learned positional embedding sizes itself to whatever token count the patch embedding produces, so the same patch_size=4 works across bin widths and trimmed m/z ranges.

Per-block configurability

For reviewer-driven ablations, MaldiCNNClassifier accepts a scalar or a per-block tuple for both kernel_size and pool_size:

# Scalar: broadcast to every block
MaldiCNNClassifier(channels=(32, 64, 128, 128), kernel_size=7)

# Tuple: per-block progression
MaldiCNNClassifier(
    channels=(32, 64, 128, 128),
    kernel_size=(11, 7, 5, 3),
    pool_size=(2, 2, 2, 4),
)
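
The scalar-or-tuple convention can be illustrated with a small normalization helper. `per_block` is a hypothetical name, and the length check is an assumption about how a mismatch would reasonably be handled; the library's own validation may differ.

```python
from typing import Sequence, Tuple, Union

IntOrSeq = Union[int, Sequence[int]]


def per_block(value: IntOrSeq, n_blocks: int, name: str) -> Tuple[int, ...]:
    """Expand a scalar to one value per block, or validate a per-block sequence."""
    if isinstance(value, int):
        return (value,) * n_blocks
    values = tuple(value)
    if len(values) != n_blocks:
        raise ValueError(f"{name} has {len(values)} entries, expected {n_blocks}")
    return values


channels = (32, 64, 128, 128)
kernels_scalar = per_block(7, len(channels), "kernel_size")
kernels_tuple = per_block((11, 7, 5, 3), len(channels), "kernel_size")
pools = per_block((2, 2, 2, 4), len(channels), "pool_size")
print(kernels_scalar, kernels_tuple, pools)
```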

MaldiResNetClassifier exposes stem_kernel_size, stem_stride, block_kernel_size, and use_stem_pool as explicit keyword arguments. Defaults are peak-friendly for MALDI-TOF (stem_stride=1, block_kernel_size=7, use_stem_pool=False) and deviate from the literal ResNet-18 backbone. To reproduce the literal configuration, set stem_stride=2, block_kernel_size=3, use_stem_pool=True.

Input-dim sensitivity of the flat dense head

Only MaldiCNNClassifier has a flat dense head whose input width scales linearly with input_dim. Given four pool_size=2 blocks and channels[-1]=128, the flat head width is 128 * input_dim / 16. For input_dim=6000 this is already 48,000 units; at input_dim=18000 it grows to 144,000. If the resulting parameter count is a concern, prefer MaldiResNetClassifier or MaldiTransformerClassifier - both use a pooled head whose width is independent of input_dim.
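
The head-width arithmetic can be checked with a short sketch. The helper name `flat_head_width` is hypothetical, and it assumes that convolutions preserve sequence length and that each pooling stage floor-divides the length by its pool size.

```python
from typing import Sequence


def flat_head_width(
    input_dim: int, last_channels: int, pool_sizes: Sequence[int]
) -> int:
    """Width of the flattened feature map fed to the dense head.

    Assumptions: convolutions preserve length ("same" padding) and each
    pooling stage floor-divides the length by its pool size.
    """
    length = input_dim
    for pool in pool_sizes:
        length //= pool
    return last_channels * length


default_width = flat_head_width(6000, 128, (2, 2, 2, 2))
large_width = flat_head_width(18000, 128, (2, 2, 2, 2))
print(default_width, large_width)
```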