Utils Module
Reproducibility helpers, loss functions, training primitives, and
diagnostic utilities shared across the classifier catalog. You
typically don’t need to call most of these directly, because
BaseSpectralClassifier uses them internally,
but they are exposed for users building custom training loops or
investigating training dynamics.
Reproducibility
- maldideepkit.utils.seed_everything(seed, deterministic=False)
Seed Python, NumPy, and PyTorch (CPU + CUDA) RNGs in one call.
- Parameters:
  - seed (int) – Non-negative integer used for every RNG. Also fixes PYTHONHASHSEED in the current process environment.
  - deterministic (bool) – When True, additionally enable PyTorch’s deterministic algorithm mode. Sets torch.use_deterministic_algorithms(True, warn_only=True), torch.backends.cudnn.deterministic = True, torch.backends.cudnn.benchmark = False, and CUBLAS_WORKSPACE_CONFIG=:4096:8. The environment variable must be set before the first CUDA context is created. Once enabled, determinism is sticky: subsequent plain seed_everything(seed) calls do not turn it off.
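The seeding described above can be approximated as follows. This is a minimal sketch, not the library's implementation: seed_everything_sketch is a hypothetical name, the deterministic flag is left out, and NumPy and PyTorch are seeded only when they can be imported.

```python
import os
import random


def seed_everything_sketch(seed: int) -> None:
    """Hypothetical stand-in for seed_everything(seed); not the library code."""
    if seed < 0:
        raise ValueError("seed must be non-negative")
    # Recorded in the environment for child processes; the hash seed of the
    # interpreter that is already running does not change.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency in this sketch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Re-seeding with the same value replays the same random stream.
seed_everything_sketch(42)
first = [random.random() for _ in range(3)]
seed_everything_sketch(42)
second = [random.random() for _ in range(3)]
```

After the second call, first and second hold identical draws, which is the property a training script relies on when it seeds once at start-up.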
- maldideepkit.utils.resolve_device(device)
Resolve a user-facing device specifier to a torch.device.
- Parameters:
  - device (str | device | None) – "auto" (or None) picks cuda when available and falls back to cpu.
- Returns: The resolved device.
- Return type: torch.device
- Raises:
  - ValueError – If device is an unknown string.
Training Primitives
- class maldideepkit.utils.EarlyStopping(patience, min_delta=1e-06, min_delta_rel=0.0)
Bases: object
Track the best validation loss and signal when to stop training.
- Parameters:
  - patience (int) – Number of consecutive epochs without improvement before should_stop flips to True.
  - min_delta (float) – Absolute floor on the improvement counted as progress.
  - min_delta_rel (float) – Relative floor: an epoch counts as improvement only if val_loss < best_loss - max(min_delta, min_delta_rel * |best_loss|). Useful for losses that asymptote near small values where the absolute min_delta is essentially never the binding constraint.
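The improvement rule for EarlyStopping can be sketched in plain Python. The class name EarlyStoppingSketch, the step method, and the best_loss and counter attributes are hypothetical; only the threshold formula and the should_stop flag come from the description above.

```python
class EarlyStoppingSketch:
    """Illustrative re-implementation of the documented improvement rule."""

    def __init__(self, patience, min_delta=1e-6, min_delta_rel=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.min_delta_rel = min_delta_rel
        self.best_loss = None   # no validation loss seen yet
        self.counter = 0        # consecutive epochs without improvement
        self.should_stop = False

    def step(self, val_loss):
        """Record one epoch; return True if it counted as an improvement."""
        if self.best_loss is None:
            improved = True     # the first epoch always sets the baseline
        else:
            margin = max(self.min_delta, self.min_delta_rel * abs(self.best_loss))
            improved = val_loss < self.best_loss - margin
        if improved:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return improved


stopper = EarlyStoppingSketch(patience=2, min_delta_rel=0.01)
history = [stopper.step(v) for v in [1.0, 0.995, 0.98, 0.979, 0.978]]
```

With a 1% relative floor, the drop from 1.0 to 0.995 does not count as progress, while the drop to 0.98 does; the last two epochs both miss the floor, so the stopper trips after two stale epochs.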
- maldideepkit.utils.train_loop(model, train_loader, val_tensors, criterion, optimizer, scheduler, device, epochs, early_stopping, verbose=False, on_epoch_end=None, warmup_epochs=0, grad_clip_norm=None, use_amp=False, swa_start_epoch=None, use_sam=False, metrics_recorder=None, augment=None, mixup_alpha=0.0, cutmix_alpha=0.0, n_classes=None, mix_generator=None, ema_decay=None)
Run a classic train + validate loop with early stopping.
- Parameters:
  - model (Module) – Already placed on device.
  - train_loader (DataLoader) – Iterates over (x, y) batches of training data.
  - val_tensors (tuple[Tensor, Tensor]) – (X_val, y_val) tensors already on device.
  - criterion (Module) – Loss function, e.g. nn.CrossEntropyLoss.
  - optimizer (Optimizer) – Optimizer bound to model parameters.
  - scheduler (LRScheduler | ReduceLROnPlateau | None) – Optional LR scheduler. ReduceLROnPlateau is stepped on validation loss; any other scheduler is stepped once per epoch with no argument.
  - device (device) – Device on which training is carried out.
  - epochs (int) – Maximum number of epochs.
  - early_stopping (EarlyStopping) – Tracks the best validation loss and stops training when stale.
  - verbose (bool) – If True, prints one line per epoch.
  - on_epoch_end (Callable[[int, float], None] | None) – Called as on_epoch_end(epoch, val_loss) after each epoch.
  - warmup_epochs (int) – If positive, linearly ramp each optimizer param group’s learning rate from 0 to its configured target over the first warmup_epochs epochs. scheduler is not stepped during warmup.
  - grad_clip_norm (float | None) – If set, clip gradient global L2 norm to this value via torch.nn.utils.clip_grad_norm_().
  - use_amp (bool) – If True and device.type == "cuda", run forward + loss under torch.autocast() and use torch.amp.GradScaler for backward. On CPU this is a no-op.
  - swa_start_epoch (int | None) – If set, maintain a torch.optim.swa_utils.AveragedModel starting at this epoch (0-indexed). At end of training, replaces the best-val checkpoint with the SWA average.
  - use_sam (bool) – If True, assume optimizer is a SAMOptimizer and run the two-step SAM update (roughly doubles compute). Grad clipping is applied only on the second gradient.
  - metrics_recorder (Callable[[dict[str, float]], None] | None) – If provided, called once per epoch with a dict containing {"epoch", "train_loss", "val_loss", "lr", "mean_grad_norm", "n_grad_updates"}.
  - augment (Callable[[Tensor], Tensor] | None) – If provided, called on each training batch’s feature tensor after it is moved to device but before the forward pass.
  - mixup_alpha (float) – When > 0, apply MixUp on each training batch with a mix coefficient drawn from Beta(alpha, alpha). Requires n_classes. Labels become soft probability distributions.
  - cutmix_alpha (float) – When > 0, apply CutMix on each training batch. When both mixup_alpha and cutmix_alpha are positive a fair coin picks between the two per batch.
  - n_classes (int | None) – Required when mixup_alpha > 0 or cutmix_alpha > 0.
  - mix_generator (Generator | None) – Optional seeded RNG for MixUp / CutMix draws.
  - ema_decay (float | None) – When set, maintain an exponential moving average of the model parameters: ema = decay * ema + (1 - decay) * model. Typical values 0.99-0.9999. At end of training the EMA weights overwrite the base model.
- Returns: The input model with the best-validation weights loaded (or the EMA / SWA average when those are enabled; precedence is EMA > SWA > best_val).
- Return type: Module
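Two of the options above reduce to short formulas, sketched here on plain Python numbers rather than tensors. The helper names ema_update and mixup_targets are hypothetical, and random.Random merely stands in for the seeded generator that mix_generator would supply. The label mixing shown is the standard MixUp rule and is assumed, not confirmed, to match what train_loop does internally.

```python
import random


def ema_update(ema, model, decay):
    """One EMA step per parameter: ema = decay * ema + (1 - decay) * model."""
    return {name: decay * ema[name] + (1.0 - decay) * model[name] for name in ema}


def mixup_targets(y_a, y_b, n_classes, lam):
    """Soft label for a sample mixed from classes y_a and y_b with weight lam."""
    target = [0.0] * n_classes
    target[y_a] += lam
    target[y_b] += 1.0 - lam
    return target


rng = random.Random(0)              # stand-in for a seeded mix_generator
alpha = 0.4                         # plays the role of mixup_alpha
lam = rng.betavariate(alpha, alpha)  # mix coefficient from Beta(alpha, alpha)
soft = mixup_targets(0, 2, n_classes=3, lam=lam)

# A parameter at 0.0 pulls an EMA value of 1.0 down by 1% when decay is 0.99.
ema = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.99)
```

The soft target always sums to 1, which is why n_classes has to be known whenever MixUp or CutMix is switched on.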
Loss Functions
- class maldideepkit.utils.FocalLoss(weight=None, gamma=2.0, label_smoothing=0.0, reduction='mean')
Bases: Module
Multi-class focal loss with optional class weighting and label smoothing.
Implements
\[L = - (1 - p_t)^\gamma \log p_t\]
where \(p_t\) is the predicted probability of the true class. At \(\gamma = 0\) and label_smoothing=0 this reduces to CrossEntropyLoss.
- Parameters:
  - weight (Tensor | None) – Per-class weight tensor of shape (n_classes,). Applied to every sample by gathering at its target index (matches the CrossEntropyLoss convention for weight).
  - gamma (float) – Focusing parameter. 0 reduces to cross-entropy; 2 is the value used in Lin et al. 2017.
  - label_smoothing (float) – Target smoothing in [0, 1). At 0.0 the target is a one-hot vector; otherwise the target distribution becomes (1 - eps) * one_hot + eps / n_classes.
  - reduction (str) – How to reduce the per-sample loss tensor.
- __init__(weight=None, gamma=2.0, label_smoothing=0.0, reduction='mean')
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward(logits, target)
Compute focal loss for (N, C) logits.
Accepts either integer targets of shape (N,) or a soft probability distribution of shape (N, C) (as produced by MixUp / CutMix). When soft targets are passed the loss becomes
\[L = - \sum_c t_c \, (1 - p_c)^\gamma \log p_c\]
label_smoothing is ignored on the soft-target path.
Class weighting follows the CrossEntropyLoss convention with reduction="mean": the per-sample weight is weight[y_i] (or Σ_c t_c · weight_c for soft targets), and the mean reduction divides by Σ_i sample_weight_i rather than N.
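The hard-target formula and the weighted mean reduction can be checked with a small pure-Python sketch. focal_loss_sketch is a hypothetical helper working on lists rather than tensors; it omits label smoothing and the soft-target path.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss_sketch(logits, targets, gamma=2.0, weight=None):
    """Mean focal loss for integer targets, dividing by the summed sample weights."""
    losses, weights = [], []
    for row, y in zip(logits, targets):
        p_t = softmax(row)[y]                      # probability of the true class
        w = 1.0 if weight is None else weight[y]   # per-sample weight = weight[y_i]
        losses.append(-w * (1.0 - p_t) ** gamma * math.log(p_t))
        weights.append(w)
    return sum(losses) / sum(weights)


logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
targets = [0, 2]

# Plain cross-entropy for comparison.
ce = sum(-math.log(softmax(r)[y]) for r, y in zip(logits, targets)) / len(targets)
fl0 = focal_loss_sketch(logits, targets, gamma=0.0)
fl2 = focal_loss_sketch(logits, targets, gamma=2.0)
```

With gamma=0 the modulating factor is 1 and the result matches cross-entropy; with gamma=2 every term is scaled by a factor below 1, so the loss is strictly smaller.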
Optimizers
- class maldideepkit.utils.SAMOptimizer(params, base_optimizer, rho=0.05, **base_kwargs)
Bases: Optimizer
Wrap a base optimizer in the SAM two-step update.
- Parameters:
  - params (Any) – Parameters or param-group dicts (as for any torch optimizer).
  - base_optimizer (type[Optimizer]) – The base optimizer class (e.g. torch.optim.AdamW). Instantiated internally against the same param groups.
  - rho (float) – Size of the ascent step in parameter space. Paper default is 0.05. Typical range: [0.01, 0.2].
  - **base_kwargs (Any) – Forwarded to the base optimizer (e.g. lr, weight_decay).
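For readers unfamiliar with the two-step update, the sketch below runs it on a toy quadratic loss with plain gradient descent as the base step. It illustrates the general SAM scheme (ascend by rho along the normalised gradient, then descend using the gradient taken at the perturbed point) and is not SAMOptimizer's code; sam_step, loss, and grad are hypothetical helpers.

```python
import math


def loss(w):
    """Toy loss f(w) = sum(w_i ** 2)."""
    return sum(wi * wi for wi in w)


def grad(w):
    """Analytic gradient of the toy loss."""
    return [2.0 * wi for wi in w]


def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update with plain SGD as the base step."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12   # avoid division by zero
    # Step 1: move to the nearby point w + rho * g / ||g||.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: apply the gradient measured there to the ORIGINAL weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w0 = [1.0, -2.0]
w1 = sam_step(w0)

w = w0
for _ in range(100):
    w = sam_step(w)
```

Each step needs two gradient evaluations, which is where the roughly doubled compute comes from.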
Post-hoc Calibration
Post-hoc, validation-set calibrators wired into
BaseSpectralClassifier via
tune_threshold / calibrate_temperature. Also callable
directly.
- maldideepkit.utils.tune_threshold(y_true, y_proba, metric='balanced_accuracy')
Pick the binary decision threshold that maximises metric.
Sweeps the unique observed probabilities (capped at 1000 quantiles) so severely-imbalanced settings still resolve. Falls back to a 99-point linspace(0.01, 0.99) only when no probability lies strictly inside (0, 1).
- Returns: Threshold in (0, 1). Use as y_pred = (y_proba >= t).
- Return type: float
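The sweep can be sketched in plain Python for the balanced-accuracy case. tune_threshold_sketch is a hypothetical helper: it assumes both classes are present, skips the 1000-quantile cap, and breaks ties in favour of the lowest threshold, which may differ from the library's behaviour.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def tune_threshold_sketch(y_true, y_proba):
    """Return the candidate threshold with the highest balanced accuracy."""
    # Candidates: unique observed probabilities strictly inside (0, 1).
    candidates = sorted({p for p in y_proba if 0.0 < p < 1.0})
    if not candidates:
        # Fallback grid 0.01, 0.02, ..., 0.99 (99 points).
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in y_proba]
        score = balanced_accuracy(y_true, y_pred)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.2, 0.7, 0.9]
t = tune_threshold_sketch(y_true, y_proba)
y_pred = [1 if p >= t else 0 for p in y_proba]
```

Using observed probabilities as candidates means the sweep always tests the cut points that actually change predictions, even when scores cluster near 0 or 1.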
Diagnostics
- maldideepkit.utils.find_lr(classifier, X, y, *, start_lr=1e-08, end_lr=1.0, num_iter=200, smoothing=0.98, divergence_factor=4.0, plot=False)
Sweep learning rate geometrically and return the LR / loss curve.
- Parameters:
  - classifier (BaseSpectralClassifier) – An unfitted classifier configured with the desired architecture / batch_size / loss. Its learning_rate is ignored (the sweep drives it manually) and its weights are reset at the start of every call.
  - X, y (Any) – Training data. Only enough batches to cover num_iter steps are consumed.
  - start_lr, end_lr (float) – Bounds of the geometric LR sweep.
  - num_iter (int) – Number of steps in the sweep.
  - smoothing (float) – Exponential-moving-average factor applied to the per-step loss (0 = no smoothing, 0.99 = heavy smoothing).
  - divergence_factor (float) – Stop early once the smoothed loss exceeds divergence_factor * min_smoothed_loss.
  - plot (bool) – If True, render a matplotlib plot. matplotlib is only imported when this is true.
- Returns: {"lrs": np.ndarray, "losses": np.ndarray, "suggested_lr": float}. suggested_lr is the LR at the steepest-descent point of the smoothed loss curve.
- Return type: dict
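The pieces of the sweep can be sketched independently of any model. All four helpers below are hypothetical, and the exact smoothing and steepest-descent conventions used by find_lr are assumptions: a plain EMA without bias correction, and the largest drop between consecutive smoothed losses. Losses are assumed positive.

```python
def geometric_lrs(start_lr, end_lr, num_iter):
    """Learning rates spaced by a constant ratio from start_lr to end_lr."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = (end_lr / start_lr) ** (1.0 / (num_iter - 1))
    return [start_lr * ratio ** i for i in range(num_iter)]


def smooth(losses, smoothing):
    """Exponential moving average; smoothing=0 returns the raw losses."""
    out, avg = [], None
    for value in losses:
        avg = value if avg is None else smoothing * avg + (1.0 - smoothing) * value
        out.append(avg)
    return out


def divergence_index(smoothed, divergence_factor):
    """First step whose smoothed loss exceeds factor * running minimum, else None."""
    best = float("inf")
    for i, value in enumerate(smoothed):
        best = min(best, value)
        if value > divergence_factor * best:
            return i
    return None


def steepest_descent_index(smoothed):
    """Index where the smoothed loss falls fastest between consecutive steps."""
    slopes = [smoothed[i + 1] - smoothed[i] for i in range(len(smoothed) - 1)]
    return slopes.index(min(slopes))


lrs = geometric_lrs(1e-8, 1.0, 9)          # one decade per step
raw = smooth([1, 2, 3], 0.0)
stop = divergence_index([1.0, 0.5, 0.4, 2.5], 4.0)
steepest = steepest_descent_index([5, 4, 1, 0.5, 3])
```

A geometric grid spends equal effort on every order of magnitude, which is why the default range from 1e-08 to 1.0 is workable with only 200 steps.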
Ensembling
Mean-of-predict_proba ensemble that fits each member independently
and averages their probability outputs at inference time.
- class maldideepkit.utils.SpectralEnsemble(classifiers)
Bases: object
Ensemble N fitted or unfitted spectral classifiers.
- Parameters:
  - classifiers (list[Any]) – Unfitted classifier instances. fit() calls each member’s own fit in order.
- Variables:
  - classes (np.ndarray) – Union of class labels reported by the members. Members must agree on the label set after fitting.
- predict(X)
Argmax of the averaged probabilities.
Per-member post-hoc calibration / thresholds are intentionally not averaged.
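The averaging rule is simple enough to show on nested lists. ensemble_predict is a hypothetical helper that takes the per-member probability matrices directly instead of calling predict_proba on fitted classifiers.

```python
def ensemble_predict(member_probas, classes):
    """Average per-member probabilities, then take the argmax per sample."""
    n_members = len(member_probas)
    n_samples = len(member_probas[0])
    n_classes = len(classes)
    mean = [
        [sum(m[i][c] for m in member_probas) / n_members for c in range(n_classes)]
        for i in range(n_samples)
    ]
    # Map each row's argmax index back to its class label.
    preds = [classes[max(range(n_classes), key=row.__getitem__)] for row in mean]
    return mean, preds


member_a = [[0.9, 0.1], [0.4, 0.6]]   # probabilities from one member
member_b = [[0.5, 0.5], [0.2, 0.8]]   # probabilities from another member
mean, preds = ensemble_predict([member_a, member_b], classes=[0, 1])
```

Because the argmax is taken on the averaged probabilities, a confident member can outvote an undecided one, which a majority vote over hard labels would not allow.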
Example
from maldideepkit.utils import seed_everything, resolve_device

seed_everything(42)
device = resolve_device("auto")  # picks CUDA if available, else CPU

LR finder:
import numpy as np
from maldideepkit import MaldiCNNClassifier
from maldideepkit.utils import find_lr

# Synthetic data: 256 spectra with 1000 bins each and binary labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 1000)).astype("float32")
y = rng.integers(0, 2, size=256)

# Sweep the learning rate on an unfitted classifier and read off the suggestion.
out = find_lr(MaldiCNNClassifier(input_dim=1000, random_state=0), X, y, num_iter=200)
print(out["suggested_lr"])