PhD Research Blog

The Geometry of Learning

How data geometry shapes neural network behaviour, from dataset distances to internal representations.

Metric Geometry × Neural Networks  ·  2024 – Present  ·  2 papers so far

The story begins with a mathematical object from topology — the magnitude of a metric space, introduced by Tom Leinster. Magnitude captures, in a single number, the "effective number of points" in a space — a notion sensitive to both scale and geometry. What happens when you bring this idea into machine learning, and in particular, into understanding how neural networks learn?
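To make the definition concrete: for a finite metric space, form the similarity matrix Z_ij = exp(-d(x_i, x_j)), solve Z w = 1 for a weighting w, and the magnitude is the sum of the weights. A minimal sketch in Python (the Euclidean metric and the helper name magnitude are my choices here, not from the papers):

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude(X):
    """Magnitude of the finite metric space given by the rows of X.

    Form the similarity matrix Z_ij = exp(-d(x_i, x_j)), solve
    Z w = 1 for the weighting w, and sum the weights.
    """
    Z = np.exp(-cdist(X, X))                 # pairwise Euclidean distances
    w = np.linalg.solve(Z, np.ones(len(X)))  # magnitude weighting
    return w.sum()

# "Effective number of points": a lone point counts as 1, and two
# points count as ~2 once they are far apart at the given scale.
print(magnitude(np.zeros((1, 2))))                     # -> 1.0
print(magnitude(np.array([[0.0, 0.0], [10.0, 0.0]])))  # -> ~2.0
```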

Chapter 1  ·  ICML 2026

Magnitude Distance: A Geometric Measure of Dataset Similarity

Sahel Torkamani, Henry Gouk, Rik Sarkar  ·  ICML 2026

"If two datasets live in the same space, how far apart are they — not just in raw values, but in their shape, spread, and structure? Classical distances like MMD or FID give you a number, but they often collapse in high dimensions or ignore the geometry that matters. We wanted something grounded in the actual topology of the data."

Core Ideas

Magnitude of metric spaces  ·  High-dimensional geometry  ·  Generative modeling
Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance defined on finite datasets using the notion of the magnitude of a metric space. The proposed distance incorporates a tunable scaling parameter, t, that controls sensitivity to global structure (small t) and to finer details (large t). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and the conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and show that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.
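The paper defines the magnitude distance itself; this post does not reproduce the exact formula. What the scaling parameter acts on, though, is easy to sketch: the magnitude of a dataset at scale t, obtained by multiplying all distances by t before forming the similarity matrix. A minimal illustration in the spirit of the function above (the datasets, scales, and function name are mine):

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude_at_scale(X, t):
    """Magnitude of tX: multiply all distances by t, so small t
    emphasizes global structure and large t resolves fine detail."""
    Z = np.exp(-t * cdist(X, X))
    return np.linalg.solve(Z, np.ones(len(X))).sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(300, 32))               # a toy dataset
B = A + rng.normal(scale=0.1, size=A.shape)  # a perturbed copy

# Sweeping t traces each dataset's scale-dependent magnitude; the
# paper's distance compares datasets through such quantities.
for t in (0.1, 1.0, 10.0):
    print(t, magnitude_at_scale(A, t), magnitude_at_scale(B, t))
```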
The first paper established that magnitude gives us a meaningful geometric distance between datasets. But a natural follow-up emerged: magnitude assigns a weight to every point — a per-point quantity reflecting how well that point is "represented" by its surroundings. Could this same idea, applied inside a neural network's representation space, reveal something about how models generalize?
Chapter 2  ·  Under Review

Generalization in Neural Networks Through the Lens of Magnitude Potential

Sahel Torkamani, Henry Gouk, Rik Sarkar

"Magnitude distance told us datasets have geometry. The next question was: does a neural network's internal representation space have geometry we can read? We found that the magnitude potential — a quantity tracking how well a point is represented by a given set — when computed at the logit layer, becomes a lens into the model's behaviour. It correlates with memorization, detects grokking, and tracks decision boundary shifts, all from a single geometric quantity."

Core Ideas

Magnitude potential  ·  Memorization scores  ·  Grokking detection  ·  Neural collapse  ·  Training dynamics  ·  Logit-layer analysis
Explaining generalization and training dynamics in neural networks remains a challenge, and various approaches have been developed to study different aspects of these phenomena. In this paper, we introduce the magnitude potential to this study: a quantity, based on the theory of metric magnitude, that reflects how well an arbitrary point is represented by a given set. We find that this basic quantity can be used to examine several aspects of generalization in neural networks. The ratio between the magnitude potential with respect to a class and with respect to the entire dataset, computed at the logit layer, is informative about how well a point is represented. In experiments, these ratios for individual training points correlate with Feldman's memorization scores. Aggregated across points, magnitude potential ratios detect structural changes in decision boundaries and provide a geometric indicator of grokking in modular arithmetic. Although the magnitude potential ratio and neural collapse are both closely associated with intra-class and inter-class geometric structure, the magnitude potential ratio remains informative even when neural collapse is explicitly suppressed.
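In magnitude theory, a weighting w of a set S induces a potential h(y) = sum_i w_i * exp(-d(y, s_i)), which equals 1 exactly at the points of S itself, so deviations from 1 read as how well an outside point is represented. A minimal sketch of this potential and of the class-vs-all ratio the abstract describes, assuming a scale parameter t and Euclidean distances in logit space (the function names and exact normalization are mine; the paper's definitions may differ in detail):

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude_potential(y, S, t=1.0):
    """Potential of query point y w.r.t. the set S (rows of S):
    h(y) = sum_i w_i * exp(-t * d(y, s_i)), where w solves Z w = 1
    for Z_ij = exp(-t * d(s_i, s_j)). Equals 1 when y is in S."""
    Z = np.exp(-t * cdist(S, S))
    w = np.linalg.solve(Z, np.ones(len(S)))
    k = np.exp(-t * cdist(y.reshape(1, -1), S))[0]
    return k @ w

def potential_ratio(y, class_points, all_points, t=1.0):
    """Class-vs-all ratio from the abstract: potential w.r.t. the
    point's own class over potential w.r.t. the entire dataset,
    both computed in the same (e.g. logit-layer) feature space."""
    return magnitude_potential(y, class_points, t) / magnitude_potential(y, all_points, t)

# Sanity check: a point of S has potential exactly 1 w.r.t. S.
S = np.random.default_rng(0).normal(size=(50, 8))
print(magnitude_potential(S[0], S))  # -> ~1.0
```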
Coming Next
Chapter 3 · The Road Ahead
Can Magnitude Scale?

Computing the magnitude potential requires inverting a kernel matrix over the entire dataset. The next chapter asks whether the geometry can be recovered without this prohibitive cost.
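Concretely, the bottleneck is the n × n kernel solve from the sketches above: O(n^2) memory to form the matrix and O(n^3) time to solve against it. A toy loop that makes the scaling visible (the sizes and dimension here are arbitrary):

```python
import time
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
for n in (500, 1000, 2000, 4000):
    X = rng.normal(size=(n, 32))
    Z = np.exp(-cdist(X, X))          # O(n^2) memory
    start = time.perf_counter()
    np.linalg.solve(Z, np.ones(n))    # O(n^3) time: ~8x per doubling of n
    print(f"n={n}: {time.perf_counter() - start:.2f}s")
```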

Story in progress — updated as the research develops.