Utils Module

Reproducibility helpers, loss functions, training primitives, and diagnostic utilities shared across the classifier catalog. You typically don’t need to call most of these directly – BaseSpectralClassifier uses them internally – but they are exposed for users building custom training loops or investigating training dynamics.

Reproducibility

maldideepkit.utils.seed_everything(seed, deterministic=False)

Seed Python, NumPy, and PyTorch (CPU + CUDA) RNGs in one call.

Parameters:
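  • seed (int) – Non-negative integer used for every RNG. Also sets PYTHONHASHSEED in the current process environment; this only affects subprocesses started afterwards, because a running interpreter reads PYTHONHASHSEED once at startup.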

  • deterministic (bool) – When True, additionally enable PyTorch’s deterministic-algorithm mode: sets torch.use_deterministic_algorithms(True, warn_only=True), torch.backends.cudnn.deterministic = True, torch.backends.cudnn.benchmark = False, and CUBLAS_WORKSPACE_CONFIG=:4096:8. The environment variable must be set before the first CUDA context is created. Once enabled, determinism is sticky: subsequent plain seed_everything(seed) calls do not turn it off.

Return type:

None
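At its core the call seeds each RNG in turn. Below is a minimal, torch-optional sketch in plain Python; seed_all is a hypothetical name, not part of maldideepkit, and it skips the deterministic-mode switches described above.

```python
import os
import random


def seed_all(seed: int) -> None:
    """Hypothetical sketch of a seed-everything helper (not maldideepkit's code)."""
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    # Visible to child processes only; the running interpreter fixed its
    # string-hash seed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:  # NumPy is optional in this sketch
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:  # PyTorch too; torch.manual_seed seeds the CPU and all CUDA devices
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass


# Reseeding reproduces the same stream of draws.
seed_all(42)
first = [random.random() for _ in range(3)]
seed_all(42)
second = [random.random() for _ in range(3)]
print(first == second)
```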

maldideepkit.utils.resolve_device(device)

Resolve a user-facing device specifier to a torch.device.

Parameters:

device (str | device | None) – "auto" (or None) picks cuda when available and falls back to cpu.

Returns:

The resolved device.

Return type:

device

Raises:

ValueError – If device is an unknown string.
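The "auto" rule can be sketched without PyTorch. resolve_device_name and its cuda_available flag are hypothetical; with PyTorch installed, the flag would come from torch.cuda.is_available() and the returned string could be passed to torch.device(...). The set of accepted strings below is an assumption, and torch.device instances are not handled.

```python
def resolve_device_name(device, cuda_available):
    """Hypothetical sketch of the "auto" rule; returns a device string."""
    if device is None or device == "auto":
        return "cuda" if cuda_available else "cpu"
    # Assumption for this sketch: "cpu", "cuda" and "cuda:<index>" are accepted.
    if device in ("cpu", "cuda"):
        return device
    if device.startswith("cuda:") and device[len("cuda:"):].isdigit():
        return device
    raise ValueError(f"Unknown device specifier: {device!r}")


print(resolve_device_name("auto", cuda_available=False))  # cpu
print(resolve_device_name(None, cuda_available=True))     # cuda
```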

Training Primitives

class maldideepkit.utils.EarlyStopping(patience, min_delta=1e-06, min_delta_rel=0.0)

Bases: object

Track the best validation loss and signal when to stop training.

Parameters:
  • patience (int) – Number of consecutive epochs without improvement before should_stop flips to True.

  • min_delta (float) – Absolute floor on the improvement counted as progress.

  • min_delta_rel (float) – Relative floor: an epoch counts as improvement only if val_loss < best_loss - max(min_delta, min_delta_rel * |best_loss|). Useful for losses that asymptote near small values where the absolute min_delta is essentially never the binding constraint.

Variables:
  • best_loss (float) – Best validation loss observed so far (inf before the first update).

  • best_state (dict or None) – CPU copy of the model state_dict at the best epoch.

  • should_stop (bool) – True once patience epochs have elapsed without an improvement.

__init__(patience, min_delta=1e-06, min_delta_rel=0.0)
Parameters:
  • patience (int)

  • min_delta (float)

  • min_delta_rel (float)

Return type:

None

step(val_loss, model)

Record val_loss and snapshot model if it improved.

Parameters:
  • val_loss (float) – Validation loss for the current epoch.

  • model (Module) – Model whose parameters will be cached on improvement.

Returns:

True if the loss improved this call.

Return type:

bool
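The improvement test and patience counter can be reproduced in a few lines. EarlyStopSketch is a hypothetical class that follows the rule documented for min_delta / min_delta_rel; it omits the state_dict snapshot that the real step() takes on improvement, and it assumes the first observed loss always counts as an improvement.

```python
import math


class EarlyStopSketch:
    """Hypothetical re-implementation of the documented improvement rule."""

    def __init__(self, patience, min_delta=1e-6, min_delta_rel=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.min_delta_rel = min_delta_rel
        self.best_loss = math.inf
        self.bad_epochs = 0
        self.should_stop = False

    def step(self, val_loss):
        if math.isinf(self.best_loss):
            improved = True  # assumption: the first epoch always counts
        else:
            margin = max(self.min_delta, self.min_delta_rel * abs(self.best_loss))
            improved = val_loss < self.best_loss - margin
        if improved:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            self.should_stop = self.bad_epochs >= self.patience
        return improved


strict = EarlyStopSketch(patience=2, min_delta_rel=0.01)
lenient = EarlyStopSketch(patience=2)  # default min_delta_rel=0.0
for loss in [1.0, 0.5, 0.498, 0.497]:
    strict.step(loss)
    lenient.step(loss)
print(strict.should_stop, strict.best_loss)    # True 0.5
print(lenient.should_stop, lenient.best_loss)  # False 0.497
```

With min_delta_rel=0.01 the decreases from 0.5 to 0.498 and 0.497 fall inside the 1 % margin, so patience runs out; with only the default absolute floor of 1e-6, the same losses still count as progress.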

maldideepkit.utils.train_loop(model, train_loader, val_tensors, criterion, optimizer, scheduler, device, epochs, early_stopping, verbose=False, on_epoch_end=None, warmup_epochs=0, grad_clip_norm=None, use_amp=False, swa_start_epoch=None, use_sam=False, metrics_recorder=None, augment=None, mixup_alpha=0.0, cutmix_alpha=0.0, n_classes=None, mix_generator=None, ema_decay=None)

Run a classic train + validate loop with early stopping.

Parameters:
  • model (Module) – Already placed on device.

  • train_loader (DataLoader) – Iterates over (x, y) batches of training data.

  • val_tensors (tuple[Tensor, Tensor]) – (X_val, y_val) tensors already on device.

  • criterion (Module) – Loss function, e.g. nn.CrossEntropyLoss.

  • optimizer (Optimizer) – Optimizer bound to model parameters.

  • scheduler (LRScheduler | ReduceLROnPlateau | None) – Optional LR scheduler. ReduceLROnPlateau is stepped on validation loss; any other scheduler is stepped once per epoch with no argument.

  • device (device) – Device on which training is carried out.

  • epochs (int) – Maximum number of epochs.

  • early_stopping (EarlyStopping) – Tracks the best validation loss and stops training when stale.

  • verbose (bool) – If True, prints one line per epoch.

  • on_epoch_end (Callable[[int, float], None] | None) – Called as on_epoch_end(epoch, val_loss) after each epoch.

  • warmup_epochs (int) – If positive, linearly ramp each optimizer param group’s learning rate from 0 to its configured target over the first warmup_epochs epochs. scheduler is not stepped during warmup.

  • grad_clip_norm (float | None) – If set, clip the global L2 norm of the gradients to this value via torch.nn.utils.clip_grad_norm_().

  • use_amp (bool) – If True and device.type == "cuda", run forward + loss under torch.autocast() and use torch.amp.GradScaler for backward. On CPU this is a no-op.

  • swa_start_epoch (int | None) – If set, maintain a torch.optim.swa_utils.AveragedModel starting at this epoch (0-indexed). At the end of training, the SWA average replaces the best-validation checkpoint.

  • use_sam (bool) – If True, assume optimizer is a SAMOptimizer and run the two-step SAM update (roughly doubles compute). Grad clipping is applied only on the second gradient.

  • metrics_recorder (Callable[[dict[str, float]], None] | None) – If provided, called once per epoch with a dict containing {"epoch", "train_loss", "val_loss", "lr", "mean_grad_norm", "n_grad_updates"}.

  • augment (Callable[[Tensor], Tensor] | None) – If provided, called on each training batch’s feature tensor after it is moved to device but before the forward pass.

  • mixup_alpha (float) – When > 0, apply MixUp on each training batch with a mix coefficient drawn from Beta(alpha, alpha). Requires n_classes. Labels become soft probability distributions.

  • cutmix_alpha (float) – When > 0, apply CutMix on each training batch. When both mixup_alpha and cutmix_alpha are positive a fair coin picks between the two per batch.

  • n_classes (int | None) – Required when mixup_alpha > 0 or cutmix_alpha > 0.

  • mix_generator (Generator | None) – Optional seeded RNG for MixUp / CutMix draws.

  • ema_decay (float | None) – When set, maintain an exponential moving average (EMA) of the model parameters: ema = decay * ema + (1 - decay) * model. Typical values are 0.99 to 0.9999. At the end of training, the EMA weights overwrite the base model.
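Two of the schedules above reduce to one-line formulas. This sketch assumes the linear warmup reaches the target learning rate on the last warmup epoch; maldideepkit's exact per-epoch values may differ. warmup_lr and ema_update are hypothetical helpers operating on plain Python numbers rather than tensors.

```python
def warmup_lr(epoch, target_lr, warmup_epochs):
    """LR for a 0-indexed epoch under linear warmup (assumed to hit target_lr on the last warmup epoch)."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return target_lr  # warmup finished; the regular scheduler takes over
    return target_lr * (epoch + 1) / warmup_epochs


def ema_update(ema, params, decay):
    """One EMA step, element-wise: ema = decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


print([warmup_lr(e, 1e-3, warmup_epochs=4) for e in range(6)])

# EMA starts from the initial weights [0, 0]; the model then holds [1, -2].
ema = [0.0, 0.0]
for _ in range(200):
    ema = ema_update(ema, [1.0, -2.0], decay=0.9)
print(ema)  # very close to [1.0, -2.0]
```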

Returns:

The input model with the best-validation weights loaded, or with the EMA / SWA average when those are enabled (precedence: EMA > SWA > best validation).

Return type:

Module
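When mixup_alpha > 0, each batch is blended with a shuffled copy of itself and the labels become soft distributions. The sketch below is standard MixUp on plain lists with one Beta(alpha, alpha) coefficient per batch; mixup_batch is a hypothetical helper, and a seeded random.Random stands in for the optional mix_generator. CutMix is not shown.

```python
import random


def mixup_batch(xs, ys, n_classes, alpha, rng):
    """MixUp sketch: blend every sample (and its one-hot label) with a shuffled partner."""
    lam = rng.betavariate(alpha, alpha)  # one mix coefficient per batch
    partner = list(range(len(xs)))
    rng.shuffle(partner)
    one_hot = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in ys]

    def blend(a, b):
        return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]

    mixed_x = [blend(xs[i], xs[j]) for i, j in enumerate(partner)]
    mixed_y = [blend(one_hot[i], one_hot[j]) for i, j in enumerate(partner)]
    return mixed_x, mixed_y, lam


rng = random.Random(0)  # seeded, in the spirit of mix_generator
xs = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
ys = [0, 1, 1, 0]
mixed_x, mixed_y, lam = mixup_batch(xs, ys, n_classes=2, alpha=0.4, rng=rng)
print(round(lam, 3), mixed_y)
```

Each row of mixed_y still sums to 1, which is the soft-target form accepted by FocalLoss.forward.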

Loss Functions

class maldideepkit.utils.FocalLoss(weight=None, gamma=2.0, label_smoothing=0.0, reduction='mean')

Bases: Module

Multi-class focal loss with optional class weighting and label smoothing.

Implements

\[L = - (1 - p_t)^\gamma \log p_t\]

where \(p_t\) is the predicted probability of the true class. At \(\gamma = 0\) and label_smoothing=0 this reduces to CrossEntropyLoss.
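The hard-target formula is easy to check numerically. In this plain-Python sketch (focal_loss_one is a hypothetical helper, not the module's forward), setting gamma to 0 recovers cross-entropy, and the ratio of focal loss to cross-entropy is exactly (1 - p_t)^gamma, which is far smaller for a confidently correct sample than for a misclassified one.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def focal_loss_one(logits, target, gamma=2.0):
    """-(1 - p_t)^gamma * log(p_t) for one sample with an integer target."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


easy = [4.0, 0.0, 0.0]  # confidently correct for target 0
hard = [0.0, 1.0, 0.0]  # misclassified for target 0
for name, row in [("easy", easy), ("hard", hard)]:
    ce = focal_loss_one(row, 0, gamma=0.0)  # gamma = 0 gives plain cross-entropy
    fl = focal_loss_one(row, 0, gamma=2.0)
    print(name, round(ce, 4), round(fl, 4), round(fl / ce, 4))
```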

Parameters:
  • weight (Tensor | None) – Per-class weight tensor of shape (n_classes,). Applied to every sample by gathering at its target index (matches the CrossEntropyLoss convention for weight).

  • gamma (float) – Focusing parameter. 0 degrades to cross-entropy; 2 is the value used in Lin et al. 2017.

  • label_smoothing (float) – Target smoothing in [0, 1). At 0.0 the target is a one-hot vector; otherwise the target distribution becomes (1 - eps) * one_hot + eps / n_classes.

  • reduction (str) – How to reduce the per-sample loss tensor.

__init__(weight=None, gamma=2.0, label_smoothing=0.0, reduction='mean')

Initialize the loss with the hyperparameters described above.

Parameters:
  • weight (Tensor | None)

  • gamma (float)

  • label_smoothing (float)

  • reduction (str)

Return type:

None

forward(logits, target)

Compute focal loss for (N, C) logits.

Accepts either integer targets of shape (N,) or a soft probability distribution of shape (N, C) (as produced by MixUp / CutMix). When soft targets are passed the loss becomes

\[L = - \sum_c t_c \, (1 - p_c)^\gamma \log p_c\]

label_smoothing is ignored on the soft-target path.

Class weighting follows the CrossEntropyLoss convention with reduction="mean": the per-sample weight is weight[y_i] (or Σ_c t_c · weight_c for soft targets), and the mean reduction divides by Σ_i sample_weight_i rather than N.
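The soft-target formula and the weighted mean reduction can be sketched the same way. This assumes each sample's unweighted loss is scaled by its sample weight s_i = Σ_c t_c · weight_c before the sum is divided by Σ_i s_i; for one-hot targets this matches the weight[y_i] rule above. All helpers are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def soft_focal(logits, t, gamma):
    """-sum_c t_c * (1 - p_c)^gamma * log(p_c) for one sample."""
    log_p = log_softmax(logits)
    return -sum(tc * (1.0 - math.exp(lp)) ** gamma * lp for tc, lp in zip(t, log_p))


def weighted_mean_focal(batch_logits, batch_targets, weight, gamma=2.0):
    """sum_i s_i * L_i / sum_i s_i, with sample weight s_i = sum_c t_c * weight_c."""
    num = den = 0.0
    for logits, t in zip(batch_logits, batch_targets):
        s = sum(tc * wc for tc, wc in zip(t, weight))
        num += s * soft_focal(logits, t, gamma)
        den += s
    return num / den


logits = [[2.0, 0.0], [2.0, 0.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]  # one-hot rows behave like integer targets
weight = [1.0, 3.0]
loss = weighted_mean_focal(logits, targets, weight, gamma=2.0)
print(round(loss, 4))
```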

Parameters:
  • logits (Tensor)

  • target (Tensor)

Return type:

Tensor

extra_repr()

Return a string with the focal-loss hyperparameters for repr.

Return type:

str

Optimizers

class maldideepkit.utils.SAMOptimizer(params, base_optimizer, rho=0.05, **base_kwargs)

Bases: Optimizer

Wrap a base optimizer in the two-step Sharpness-Aware Minimization (SAM) update.

Parameters:
  • params (Any) – Parameters or param-group dicts (as for any torch optimizer).

  • base_optimizer (type[Optimizer]) – The base optimizer class (e.g. torch.optim.AdamW). Instantiated internally against the same param groups.

  • rho (float) – Size of the ascent step in parameter space. Paper default is 0.05. Typical range: [0.01, 0.2].

  • **base_kwargs (Any) – Forwarded to the base optimizer (e.g. lr, weight_decay).

__init__(params, base_optimizer, rho=0.05, **base_kwargs)
Parameters:
  • params (Any)

  • base_optimizer (type[Optimizer])

  • rho (float)

  • **base_kwargs (Any)

Return type:

None

first_step(zero_grad=False)

Ascend to w + e using the current gradients.

Parameters:

zero_grad (bool)

Return type:

None

second_step(zero_grad=False)

Undo the ascent and apply the base optimizer step from w.

Parameters:

zero_grad (bool)

Return type:

None

step(closure=None)

Unsupported. Use first_step / second_step instead.

Parameters:

closure (Any)

Return type:

Any
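SAM's two steps can be illustrated on a toy quadratic loss. The sketch below uses plain gradient descent as the base optimizer and the standard perturbation e = rho · g / ||g||; it is not maldideepkit's implementation, and grad / quad_loss / sam_step are hypothetical helpers. In a PyTorch loop the same structure maps onto the documented methods: a backward pass, first_step, a second forward/backward pass at the perturbed weights, then second_step.

```python
import math


def grad(w, a):
    """Gradient of the toy loss 0.5 * sum(a_i * w_i**2)."""
    return [ai * wi for ai, wi in zip(a, w)]


def quad_loss(w, a):
    return 0.5 * sum(ai * wi * wi for ai, wi in zip(a, w))


def sam_step(w, a, lr, rho):
    """One SAM update with plain gradient descent as the base optimizer."""
    g = grad(w, a)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # first step: ascend to w + e, with e = rho * g / ||g||
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # gradient evaluated at the perturbed point
    g_adv = grad(w_adv, a)
    # second step: go back to w and apply the base update with g_adv
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


a = [1.0, 10.0]  # one flat and one sharp direction
w = [1.0, -2.0]
start = quad_loss(w, a)
for _ in range(200):
    w = sam_step(w, a, lr=0.05, rho=0.05)
print(start, quad_loss(w, a))
```

With a fixed rho the iterates settle into a small neighbourhood of the minimum rather than converging exactly, because the ascent step always has length rho.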

Post-hoc Calibration

Post-hoc calibrators fitted on a held-out validation set. BaseSpectralClassifier wires them in via tune_threshold / calibrate_temperature; they can also be called directly.

maldideepkit.utils.tune_threshold(y_true, y_proba, metric='balanced_accuracy')

Pick the binary decision threshold that maximises metric.

Sweeps the unique observed probabilities (capped at 1000 quantiles), so the search adapts to wherever the probabilities actually lie and still finds a usable threshold in severely imbalanced settings. Falls back to a 99-point linspace(0.01, 0.99) only when no probability lies strictly inside (0, 1).

Parameters:
  • y_true (ndarray) – Binary ground-truth labels in {0, 1}.

  • y_proba (ndarray) – Predicted positive-class probabilities. If a 2-D array is given, column index 1 is used.

  • metric (str) – Which metric to maximise; defaults to "balanced_accuracy". "youden" maximises Youden's J statistic, TPR - FPR.

Returns:

Threshold in (0, 1). Use as y_pred = (y_proba >= t).

Return type:

float
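A stripped-down version of the sweep, assuming binary labels and balanced_accuracy as the metric: try every distinct probability strictly inside (0, 1) as a threshold and keep the best. The 1000-quantile cap is omitted, and both functions are hypothetical helpers.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive and true-negative rates for binary labels."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def tune_threshold_sketch(y_true, y_proba):
    """Return (threshold, score) maximising balanced accuracy for y_pred = (p >= t)."""
    candidates = sorted({p for p in y_proba if 0.0 < p < 1.0})
    if not candidates:
        candidates = [i / 100 for i in range(1, 100)]  # 99-point fallback grid
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in y_proba]
        score = balanced_accuracy(y_true, y_pred)
        if score > best_score:  # strict: ties keep the smaller threshold
            best_t, best_score = t, score
    return best_t, best_score


y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_proba = [0.01, 0.02, 0.03, 0.05, 0.04, 0.02, 0.08, 0.30, 0.20, 0.60]
print(tune_threshold_sketch(y_true, y_proba))                      # (0.2, 0.9375)
print(balanced_accuracy(y_true, [int(p >= 0.5) for p in y_proba]))  # 0.75
```

On this imbalanced toy set the default 0.5 cut-off misses one of the two positives, while the tuned threshold of 0.2 catches both at the cost of one false positive. For binary labels Youden's J equals 2 · balanced accuracy - 1, so the two metrics select the same threshold.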

maldideepkit.utils.fit_temperature(logits, y_true, max_iter=200, lr=0.1)

Fit a scalar temperature by LBFGS minimisation of NLL.

Operates on raw logits, not probabilities. Returns the temperature T that minimises the NLL of softmax(logits / T) on the held-out data; the rescaled softmax is then typically better calibrated than the unscaled one.

Parameters:
  • logits (Tensor | ndarray) – Held-out logits.

  • y_true (Tensor | ndarray) – Ground-truth class indices.

  • max_iter (int) – LBFGS max iterations.

  • lr (float) – LBFGS step size.

Returns:

Fitted temperature; strictly positive.

Return type:

float
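Temperature scaling has a single parameter, so a one-dimensional search is enough for a sketch. This version swaps LBFGS for a golden-section search over log T (the NLL is convex in 1/T, hence unimodal in log T); nll and fit_temperature_sketch are hypothetical helpers on plain lists.

```python
import math


def nll(logits, labels, T):
    """Mean negative log-likelihood of softmax(logits / T)."""
    total = 0.0
    for row, y in zip(logits, labels):
        scaled = [z / T for z in row]
        m = max(scaled)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def fit_temperature_sketch(logits, labels, lo=-5.0, hi=5.0, tol=1e-7):
    """Golden-section search for the NLL-minimising u = log T on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if nll(logits, labels, math.exp(c)) < nll(logits, labels, math.exp(d)):
            b, d = d, c  # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return math.exp((a + b) / 2.0)


logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
labels = [0, 0, 1, 1]
T = fit_temperature_sketch(logits, labels)
print(T)  # 5 / ln 3, about 4.55, for this toy data
```

Three of the four toy samples favour class 0 with about 99 % confidence, yet one of them is labelled 1, so the fitted T is above 1 and softens the probabilities.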

Diagnostics

maldideepkit.utils.find_lr(classifier, X, y, *, start_lr=1e-08, end_lr=1.0, num_iter=200, smoothing=0.98, divergence_factor=4.0, plot=False)

Sweep learning rate geometrically and return the LR / loss curve.

Parameters:
  • classifier (BaseSpectralClassifier) – An unfitted classifier configured with the desired architecture / batch_size / loss. Its learning_rate is ignored (we drive it manually across the sweep) and its weights are reset at the start of every call.

  • X (Any) – Training features. Only enough batches to cover num_iter steps are consumed.

  • y (Any) – Training labels matching X.

  • start_lr (float) – Lower bound of the geometric LR sweep.

  • end_lr (float) – Upper bound of the geometric LR sweep.

  • num_iter (int) – Number of steps in the sweep.

  • smoothing (float) – Exponential-moving-average factor applied to the per-step loss (0 = no smoothing, 0.99 = heavy smoothing).

  • divergence_factor (float) – Stop early once the smoothed loss exceeds divergence_factor * min_smoothed_loss.

  • plot (bool) – If True, render a matplotlib plot. matplotlib is only imported when this is true.

Returns:

{"lrs": np.ndarray, "losses": np.ndarray, "suggested_lr": float}. suggested_lr is the LR at the steepest-descent point of the smoothed loss curve.

Return type:

dict[str, Any]
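The sweep mechanics can be sketched independently of any model: geometrically spaced learning rates, a bias-corrected exponential moving average of the loss, a divergence cut-off, and the LR at the steepest drop of the smoothed curve. The bias correction and the forward-difference definition of "steepest" are assumptions, as are the helper names geometric_lrs and suggest_lr.

```python
import math


def geometric_lrs(start_lr, end_lr, num_iter):
    """num_iter learning rates evenly spaced in log space, start_lr to end_lr inclusive."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]


def suggest_lr(lrs, losses, smoothing=0.98, divergence_factor=4.0):
    """Smooth the per-step losses, cut at divergence, return the LR of the steepest drop."""
    avg, best, smoothed = 0.0, math.inf, []
    for i, loss in enumerate(losses):
        avg = smoothing * avg + (1.0 - smoothing) * loss
        s = avg / (1.0 - smoothing ** (i + 1))  # bias-corrected EMA (assumption)
        if s > divergence_factor * best:
            break  # diverged: ignore this step and everything after it
        best = min(best, s)
        smoothed.append(s)
    if len(smoothed) < 2:
        raise ValueError("sweep diverged before two usable steps")
    # Steepest descent = most negative forward difference of the smoothed curve.
    diffs = [smoothed[i + 1] - smoothed[i] for i in range(len(smoothed) - 1)]
    steepest = min(range(len(diffs)), key=diffs.__getitem__)
    return lrs[steepest]


lrs = geometric_lrs(1e-8, 1.0, num_iter=9)  # 1e-8, 1e-7, ..., 1.0
losses = [1.0, 0.99, 0.9, 0.6, 0.5, 0.49, 0.48, 3.0, 5.0]
print(suggest_lr(lrs, losses, smoothing=0.0))  # roughly 1e-6 (unsmoothed for readability)
```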

Ensembling

Mean-of-predict_proba ensemble that fits each member independently and averages their probability outputs at inference time.

class maldideepkit.utils.SpectralEnsemble(classifiers)

Bases: object

Ensemble N fitted or unfitted spectral classifiers.

Parameters:

classifiers (list[Any]) – Unfitted classifier instances. fit() calls each member’s own fit in order.

Variables:

classes (np.ndarray) – Union of class labels reported by the members. Members must agree on the label set after fitting.

__init__(classifiers)
Parameters:

classifiers (list[Any])

Return type:

None

fit(X, y)

Fit every member on the same (X, y).

Parameters:
  • X (Any)

  • y (Any)

Return type:

SpectralEnsemble

predict_proba(X)

Return the mean of member predict_proba outputs.

Parameters:

X (Any)

Return type:

ndarray

predict(X)

Argmax of the averaged probabilities.

Per-member post-hoc calibration / thresholds are intentionally not averaged.

Parameters:

X (Any)

Return type:

ndarray
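Averaging probabilities before the argmax can be sketched with duck-typed stand-in members. FixedProbaModel and ensemble_predict are hypothetical; the returned labels are column indices, which the real predict would presumably map back through the members' shared class labels.

```python
class FixedProbaModel:
    """Hypothetical stand-in for a fitted member with a fixed predict_proba output."""

    def __init__(self, proba):
        self.proba = proba

    def predict_proba(self, X):
        return [list(row) for row in self.proba[: len(X)]]


def ensemble_predict(members, X):
    """Mean of member probabilities, then the per-row argmax (column index)."""
    outputs = [m.predict_proba(X) for m in members]
    n_rows, n_cols = len(outputs[0]), len(outputs[0][0])
    mean = [
        [sum(out[r][c] for out in outputs) / len(outputs) for c in range(n_cols)]
        for r in range(n_rows)
    ]
    labels = [max(range(n_cols), key=row.__getitem__) for row in mean]
    return mean, labels


m1 = FixedProbaModel([[0.9, 0.1], [0.4, 0.6]])
m2 = FixedProbaModel([[0.6, 0.4], [0.8, 0.2]])
mean, labels = ensemble_predict([m1, m2], X=[None, None])
print(mean, labels)
```

In the second row one member mildly prefers class 1 (0.6) while the other strongly prefers class 0 (0.8), so the averaged probabilities favour class 0; a hard majority vote between the two would have tied.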

score(X, y)

Mean accuracy against y.

Parameters:
  • X (Any)

  • y (Any)

Return type:

float

save(path)

Save each member under <path>_<i>.

Example: SpectralEnsemble.save("my_ens") writes my_ens_0.pt / my_ens_0.json / … plus an index file my_ens.ensemble.json recording the per-member classes.

Parameters:

path (str | Path)

Return type:

None

classmethod load(path)

Inverse of save().

Parameters:

path (str | Path)

Return type:

SpectralEnsemble

Example

from maldideepkit.utils import seed_everything, resolve_device

seed_everything(42)
device = resolve_device("auto")   # picks CUDA if available, else CPU

LR finder:

import numpy as np
from maldideepkit import MaldiCNNClassifier
from maldideepkit.utils import find_lr

rng = np.random.default_rng(0)  # one seeded generator for both arrays
X = rng.standard_normal((256, 1000)).astype("float32")
y = rng.integers(0, 2, size=256)
out = find_lr(MaldiCNNClassifier(input_dim=1000, random_state=0), X, y, num_iter=200)
print(out["suggested_lr"])