- An elementary and unified proof of Grothendieck's inequality We present an elementary, self-contained proof of Grothendieck's inequality that unifies the real and complex cases and yields both the Krivine and Haagerup bounds, the current best-known explicit bounds for the real and complex Grothendieck constants respectively. This article is intended to be pedagogical, combining and streamlining known ideas of Lindenstrauss–Pełczyński, Krivine, and Haagerup into a proof that needs only univariate calculus, basic complex variables, and a modicum of linear algebra as prerequisites. 3 authors · Nov 28, 2017
- Automated Search for Conjectures on Mathematical Constants using Analysis of Integer Sequences Formulas involving fundamental mathematical constants had a great impact on various fields of science and mathematics, for example aiding in proofs of irrationality of constants. However, the discovery of such formulas has historically remained scarce, often perceived as an act of mathematical genius by great mathematicians such as Ramanujan, Euler, and Gauss. Recent efforts to automate the discovery of formulas for mathematical constants, such as the Ramanujan Machine project, relied on exhaustive search. Despite several successful discoveries, exhaustive search remains limited by the space of options that can be covered and by the need for vast amounts of computational resources. Here we propose a fundamentally different method to search for conjectures on mathematical constants: through analysis of integer sequences. We introduce the Enumerated Signed-continued-fraction Massey Approve (ESMA) algorithm, which builds on the Berlekamp-Massey algorithm to identify patterns in integer sequences that represent mathematical constants. The ESMA algorithm found various known formulas for e, e^2, tan(1), and ratios of values of Bessel functions. The algorithm further discovered a large number of new conjectures for these constants, some providing simpler representations and some providing faster numerical convergence than the corresponding simple continued fractions. Along with the algorithm, we present mathematical tools for manipulating continued fractions. These connections enable us to characterize what space of constants can be found by ESMA and quantify its algorithmic advantage in certain scenarios. Altogether, this work continues in the development of augmenting mathematical intuition by computer algorithms, to help reveal mathematical structures and accelerate mathematical research. 6 authors · Dec 13, 2022
- Volumes of Nullhomotopies in Nilpotent Spaces The Shadowing Principle of Manin has proved a valuable tool for addressing questions of quantitative topology raised by Gromov in the late 1900s. The principle informally provides a way for bounded algebraic maps between differential graded algebras to be translated into nearby genuine maps between their geometric realizations. We extend this principle to finite towers of principal K(G,n) fibrations, and in particular apply this construction to nilpotent spaces. As a specific application of the extended principle, we provide upper bounds on the asymptotic behavior of volumes of nullhomotopies of Lipschitz maps into nilpotent spaces. We further refine these bounds in the case when c = 1 to nearly meet those of the simply connected setting. We similarly refine these bounds in the event the target space is coformal, and demonstrate that the bounds in this setting are nearly sharp. 1 author · Sep 30, 2025
- New counterexamples to the birational Torelli theorem for Calabi-Yau manifolds We produce counterexamples to the birational Torelli theorem for Calabi-Yau manifolds in arbitrarily high dimension: this is done by exhibiting a series of non-birational pairs of Calabi-Yau $(n^2-1)$-folds which, for $n \geq 2$ even, admit an isometry between their middle cohomologies. These varieties also satisfy an $\mathbb{L}$-equivalence relation in the Grothendieck ring of varieties, i.e. the difference of their classes annihilates a power of the class of the affine line. We state this last property for a broader class of Calabi-Yau pairs, namely all those which are realized as pushforwards of a general (1,1)-section on a homogeneous roof in the sense of Kanemitsu, along its two extremal contractions. 1 author · Nov 7, 2022
- Approximate Axiomatization for Differentially-Defined Functions This article establishes a complete approximate axiomatization for the real-closed field $\mathbb{R}$ expanded with all differentially-defined functions, including special functions such as $\sin(x)$, $\cos(x)$, $e^x$, $\dots$. Every true sentence is provable up to some numerical approximation, and the truth of such approximations converges under mild conditions. Such an axiomatization is a fragment of the axiomatization for differential dynamic logic, and is therefore a finite extension of the axiomatization of real-closed fields. Furthermore, the numerical approximations approximate formulas containing special function symbols by $\mathrm{FOL}_{\mathbb{R}}$ formulas, improving upon earlier decidability results only concerning closed sentences. 2 authors · Jun 9, 2025
- On Loewner energy and curve composition The composition $\gamma \circ \eta$ of Jordan curves $\gamma$ and $\eta$ in universal Teichmüller space is defined through the composition $h_\gamma \circ h_\eta$ of their conformal weldings. We show that whenever $\gamma$ and $\eta$ are curves of finite Loewner energy $I^L$, the energy of the composition satisfies $I^L(\gamma \circ \eta) \lesssim_K I^L(\gamma) + I^L(\eta)$, with an explicit constant in terms of the quasiconformal constant $K$ of $\gamma$ and $\eta$. We also study the asymptotic growth rate of the Loewner energy under $n$ self-compositions $\gamma^n := \gamma \circ \cdots \circ \gamma$, showing $\limsup_{n \to \infty} \frac{1}{n} \log I^L(\gamma^n) \lesssim_K 1$, again with explicit constant. Our approach is to define a new conformally-covariant rooted welding functional $W_h(y)$, and show $W_h(y) \asymp_K I^L(\gamma)$ when $h$ is a welding of $\gamma$ and $y$ is any root (a point in the domain of $h$). In the course of our arguments we also give several new expressions for the Loewner energy, including generalized formulas in terms of the Riemann maps $f$ and $g$ for $\gamma$ which hold irrespective of the placement of $\gamma$ on the Riemann sphere, the normalization of $f$ and $g$, and what disks $D, D^c \subset \mathbb{C}$ serve as domains. An additional corollary is that $I^L(\gamma)$ is bounded above by a constant only depending on the Weil–Petersson distance from $\gamma$ to the circle. 2 authors · May 6, 2025
- Abundance of progression in large set for non commutative semigroup Abundance of certain types of configurations in certain large sets was first proved by Furstenberg and Glasner in 1998. After that, many authors investigated abundance of different types of configurations in different types of large sets. Hindman, Hosseini, Strauss and Tootkaboni recently introduced another notion of large sets called CR sets. Then Debnath and De proved abundance of arithmetic progressions in CR sets for commutative semigroups. In the present article we investigate abundance of progressions in CR sets for non-commutative semigroups. 1 author · Dec 13, 2023
- Preservation of Loewy Diagrams Under Exact Functors We derive sufficient conditions for exact functors on locally finite abelian categories to preserve Loewy diagrams of objects. We apply our results to determine sufficient conditions for induction functors associated to simple current extensions of vertex algebras to preserve Loewy diagrams. 1 author · May 1, 2023
- Lie Group Decompositions for Equivariant Neural Networks Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest $G$, the exponential map may not be surjective. Further limitations are encountered when $G$ is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups $G = GL^{+}(n, \mathbb{R})$ and $G = SL(n, \mathbb{R})$, as well as their representation as affine transformations $\mathbb{R}^{n} \rtimes G$. Invariant integration as well as a global parametrization is realized by decomposing the 'larger' groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals. 2 authors · Oct 17, 2023
- Product representation of perfect cubes Let $F_{k,d}(n)$ be the maximal size of a set $A \subseteq [n]$ such that the equation \[a_1a_2\dots a_k=x^d, \; a_1<a_2<\ldots<a_k\] has no solution with $a_1,a_2,\ldots,a_k \in A$ and integer $x$. Erdős, Sárközy and T. Sós studied $F_{k,2}$, and gave bounds when $k=2,3,4,6$ and also in the general case. We study the problem for $d=3$, and provide bounds for $k=2,3,4,6$ and $9$, and in the general case as well. In particular, we refute an 18-year-old conjecture of Verstraëte. We also introduce another function $f_{k,d}$ closely related to $F_{k,d}$: while the original problem requires $a_1, \ldots, a_k$ to all be distinct, we can relax this and only require that the multiset of the $a_i$'s cannot be partitioned into $d$-tuples where each $d$-tuple consists of $d$ copies of the same number. 5 authors · May 20, 2024
- Algorithm-assisted discovery of an intrinsic order among mathematical constants In recent decades, a growing number of discoveries in fields of mathematics have been assisted by computer algorithms, primarily for exploring large parameter spaces that humans would take too long to investigate. As computers and algorithms become more powerful, an intriguing possibility arises - the interplay between human intuition and computer algorithms can lead to discoveries of novel mathematical concepts that would otherwise remain elusive. To realize this perspective, we have developed a massively parallel computer algorithm that discovers an unprecedented number of continued fraction formulas for fundamental mathematical constants. The sheer number of formulas discovered by the algorithm unveils a novel mathematical structure that we call the conservative matrix field. Such matrix fields (1) unify thousands of existing formulas, (2) generate infinitely many new formulas, and most importantly, (3) lead to unexpected relations between different mathematical constants, including multiple integer values of the Riemann zeta function. Conservative matrix fields also enable new mathematical proofs of irrationality. In particular, we can use them to generalize the celebrated proof by Apéry for the irrationality of $\zeta(3)$. Utilizing thousands of personal computers worldwide, our computer-supported research strategy demonstrates the power of experimental mathematics, highlighting the prospects of large-scale computational approaches to tackle longstanding open problems and discover unexpected connections across diverse fields of science. 9 authors · Aug 22, 2023
- Green functions of Energized complexes If h is a ring-valued function on a simplicial complex G we can define two matrices L and g, where the matrix entries are the h energy of homoclinic intersections. We know that the sum over all h values on G is equal to the sum of the Green matrix entries g(x,y). We also have already seen that the determinants of L or g are both the product of the h(x). In the case where h(x) is the parity of dimension, the sum of the energy values was the standard Euler characteristic and the determinant was a unit. If h(x) was the unit in the ring then L,g are integral quadratic forms which are isospectral and inverse matrices of each other. We prove here that the quadratic energy expression summing over all pairs h(x)^* h(y) of intersecting sets is a signed sum of squares of Green function entries. The quadratic energy expression is Wu characteristic in the case when h is dimension parity. For general h, the quadratic energy expression resembles an Ising Heisenberg type interaction. The conjugate of g is the inverse of L if h takes unit values in a normed ring or in the group of unitary operators in an operator algebra. 1 author · Oct 18, 2020
- Inversion of adjunction for quotient singularities III: semi-invariant case We prove the precise inversion of adjunction formula for finite linear group quotients of complete intersection varieties defined by semi-invariant equations. As an application, we prove the semi-continuity of minimal log discrepancies for them. These results extend the results in our first paper, where we prove the same results for complete intersection varieties defined by "invariant equations". 2 authors · Dec 10, 2023
- Unsupervised Discovery of Formulas for Mathematical Constants Ongoing efforts that span over decades show a rise of AI methods for accelerating scientific discovery, yet accelerating discovery in mathematics remains a persistent challenge for AI. Specifically, AI methods were not effective in creation of formulas for mathematical constants because each such formula must be correct for infinite digits of precision, with "near-true" formulas providing no insight toward the correct ones. Consequently, formula discovery lacks a clear distance metric needed to guide automated discovery in this realm. In this work, we propose a systematic methodology for categorization, characterization, and pattern identification of such formulas. The key to our methodology is introducing metrics based on the convergence dynamics of the formulas, rather than on the numerical value of the formula. These metrics enable the first automated clustering of mathematical formulas. We demonstrate this methodology on Polynomial Continued Fraction formulas, which are ubiquitous in their intrinsic connections to mathematical constants, and generalize many mathematical functions and structures. We test our methodology on a set of 1,768,900 such formulas, identifying many known formulas for mathematical constants, and discover previously unknown formulas for pi, ln(2), Gauss', and Lemniscate's constants. The uncovered patterns enable a direct generalization of individual formulas to infinite families, unveiling rich mathematical structures. This success paves the way towards a generative model that creates formulas fulfilling specified mathematical properties, accelerating the rate of discovery of useful formulas. 6 authors · Dec 21, 2024
- On the Topological Complexity of Maps We define and develop a homotopy invariant notion for the topological complexity of a map $f: X \to Y$, denoted TC(f), that interacts with TC(X) and TC(Y) in the same way cat(f) interacts with cat(X) and cat(Y). Furthermore, TC(f) and cat(f) satisfy the same inequalities as TC(X) and cat(X). We compare it to other invariants defined in the papers [15,16,17,18,20]. We apply TC(f) to studying group homomorphisms $f: H \to G$. 1 author · Nov 20, 2020
- Specialization maps for Scholze's category of diamonds We introduce the specialization map in Scholze's theory of diamonds. We consider v-sheaves that behave like formal schemes and call them kimberlites. We attach to them: a reduced special fiber, an analytic locus, a specialization map, a Zariski site, and an etale site. When the kimberlite comes from a formal scheme, our sites recover the classical ones. We prove that unramified p-adic Beilinson–Drinfeld Grassmannians are kimberlites with finiteness and normality properties. 1 author · Dec 10, 2020
- Space-time tradeoffs of lenses and optics via higher category theory Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software-oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category 2-Optic(C) whose 2-cells explicitly track optics' internal configuration. We show that the 1-category Optic(C) arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic. 1 author · Sep 19, 2022
- Higher Categories and Slices of Globular Operads In an unpublished preprint, Batanin conjectures that it is possible to take 'slices' of a globular operad, thereby isolating the algebraic structure in each dimension. It was further hypothesised that the slices of a globular operad for some theory of higher category contain essential information about those higher categories, namely whether or not they are equivalent to the fully weak variety. In this paper, we use the theory of presentations for globular operads developed in the author's earlier work to provide a concrete definition of slices, and calculate the slices for several key theories of n-category. 1 author · May 24, 2023
- A Very Elementary Introduction to Sheaves This paper is a very non-rigorous, loose, and extremely basic introduction to sheaves. This is meant to be a guide to gaining intuition about sheaves, what they look like, and how they work, so that after reading this paper, someone can jump into the extremely abstract definitions and examples seen in textbooks with at least some idea of what is going on. Most of this material is inspired by and built from the work of Dr. Michael Robinson, and that of Dr. Robert Ghrist and Dr. Jakob Hansen, as well as Dr. Justin Curry's PhD thesis, who are some of the only applied sheaf theorists out there and they do an amazing job of explaining sheaves in a concrete way through their research. The rest of this paper is populated by mathematical definitions found in textbooks that I have stretched from two lines into multiple pages, as well as some analogies for thinking of sheaves I have thought of myself. This paper only assumes knowledge of basic linear algebra, basic group theory, and the very fundamentals of topology. If there is anything in the setup that you do not understand it is probably a quick Wikipedia search away. I hope this paper provides insight, intuition, and helpful examples of why sheaves are such powerful tools in both math and science. 1 author · Feb 2, 2022
- Counting Imaginary Quadratic Fields with an Ideal Class Group of 5-rank at least 2 We prove that there are $\gg \frac{X^{1/3}}{(\log X)^2}$ imaginary quadratic fields $k$ with discriminant $|d_k| \leq X$ and an ideal class group of 5-rank at least 2. This improves a result of Byeon, who proved the lower bound $\gg X^{1/4}$ in the same setting. We use a method of Howe, Leprévost, and Poonen to construct a genus 2 curve $C$ over $\mathbb{Q}$ such that $C$ has a rational Weierstrass point and the Jacobian of $C$ has a rational torsion subgroup of 5-rank 2. We deduce the main result from the existence of the curve $C$ and a quantitative result of Kulkarni and the second author. 3 authors · Feb 2, 2025
- A Constructive, Type-Theoretic Approach to Regression via Global Optimisation We examine the connections between deterministic, complete, and general global optimisation of continuous functions and a general concept of regression from the perspective of constructive type theory via the concept of 'searchability'. We see how the property of convergence of global optimisation is a straightforward consequence of searchability. The abstract setting allows us to generalise searchability and continuity to higher-order functions, so that we can formulate novel convergence criteria for regression, derived from the convergence of global optimisation. All the theory and the motivating examples are fully formalised in the proof assistant Agda. 1 author · Jun 23, 2020
- Learning the greatest common divisor: explaining transformer predictions The predictions of small transformers, trained to calculate the greatest common divisor (GCD) of two positive integers, can be fully characterized by looking at model inputs and outputs. As training proceeds, the model learns a list $\mathcal{D}$ of integers, products of divisors of the base used to represent integers and small primes, and predicts the largest element of $\mathcal{D}$ that divides both inputs. Training distributions impact performance. Models trained from uniform operands only learn a handful of GCD (up to 38 GCD $\leq 100$). Log-uniform operands boost performance to 73 GCD $\leq 100$, and a log-uniform distribution of outcomes (i.e. GCD) to 91. However, training from uniform (balanced) GCD breaks explainability. 1 author · Aug 29, 2023
- On the minimal power of q in a Kazhdan-Lusztig polynomial For w in the symmetric group, we provide an exact formula for the smallest positive power q^{h(w)} appearing in the Kazhdan-Lusztig polynomial P_{e,w}(q). We also provide a tight upper bound on h(w) in simply-laced types, resolving a conjecture of Billey-Postnikov from 2002. 2 authors · Mar 23, 2023
- Models of Abelian varieties over valued fields, using model theory Given an elliptic curve $E$ over a perfect defectless henselian valued field $(F, \mathrm{val})$ with perfect residue field $k_F$ and valuation ring $O_F$, there exists an integral separated smooth group scheme $\mathcal{E}$ over $O_F$ with $\mathcal{E} \times_{\operatorname{Spec} O_F} \operatorname{Spec} F \cong E$. If $\operatorname{char}(k_F) \neq 2,3$ then one can be found over $O_{F^{alg}}$ such that the definable group $\mathcal{E}(O)$ is the maximal generically stable subgroup of $E$. We also give some partial results on general Abelian varieties over $F$. The construction of $\mathcal{E}$ is by means of generating a birational group law over $O_F$ by the aid of a generically stable generic type of a definable subgroup of $E$. 1 author · Mar 28, 2023
- Positive Geometries and Canonical Forms Recent years have seen a surprising connection between the physics of scattering amplitudes and a class of mathematical objects--the positive Grassmannian, positive loop Grassmannians, tree and loop Amplituhedra--which have been loosely referred to as "positive geometries". The connection between the geometry and physics is provided by a unique differential form canonically determined by the property of having logarithmic singularities (only) on all the boundaries of the space, with residues on each boundary given by the canonical form on that boundary. In this paper we initiate an exploration of "positive geometries" and "canonical forms" as objects of study in their own right in a more general mathematical setting. We give a precise definition of positive geometries and canonical forms, introduce general methods for finding forms for more complicated positive geometries from simpler ones, and present numerous examples of positive geometries in projective spaces, Grassmannians, and toric, cluster and flag varieties. We also illustrate a number of strategies for computing canonical forms which yield interesting representations for the forms associated with wide classes of positive geometries, ranging from the simplest Amplituhedra to new expressions for the volume of arbitrary convex polytopes. 3 authors · Mar 13, 2017
- One-connection rule for structural equation models Linear structural equation models are multivariate statistical models encoded by mixed graphs. In particular, the set of covariance matrices for distributions belonging to a linear structural equation model for a fixed mixed graph G=(V, D,B) is parameterized by a rational function with parameters for each vertex and edge in G. This rational parametrization naturally allows for the study of these models from an algebraic and combinatorial point of view. Indeed, this point of view has led to a collection of results in the literature, mainly focusing on questions related to identifiability and determining relationships between covariances (i.e., finding polynomials in the Gaussian vanishing ideal). So far, a large proportion of these results has focused on the case when D, the directed part of the mixed graph G, is acyclic. This is due to the fact that in the acyclic case, the parametrization becomes polynomial and there is a description of the entries of the covariance matrices in terms of a finite sum. We move beyond the acyclic case and give a closed form expression for the entries of the covariance matrices in terms of the one-connections in a graph obtained from D through some small operations. This closed form expression then allows us to show that if G is simple, then the parametrization map is generically finite-to-one. Finally, having a closed form expression for the covariance matrices allows for the development of an algorithm for systematically exploring possible polynomials in the Gaussian vanishing ideal. 4 authors · Oct 1, 2022
- Learners' Languages In "Backprop as functor", the authors show that the fundamental elements of deep learning -- gradient descent and backpropagation -- can be conceptualized as a strong monoidal functor $Para(Euc) \to Learn$ from the category of parameterized Euclidean spaces to that of learners, a category developed explicitly to capture parameter update and backpropagation. It was soon realized that there is an isomorphism $Learn \cong Para(Slens)$, where Slens is the symmetric monoidal category of simple lenses as used in functional programming. In this note, we observe that Slens is a full subcategory of Poly, the category of polynomial functors in one variable, via the functor $A \mapsto Ay^A$. Using the fact that $(Poly, \otimes)$ is monoidal closed, we show that a map $A \to B$ in Para(Slens) has a natural interpretation in terms of dynamical systems (more precisely, generalized Moore machines) whose interface is the internal-hom type $[Ay^A, By^B]$. Finally, we review the fact that the category p-Coalg of dynamical systems on any $p \in Poly$ forms a topos, and consider the logical propositions that can be stated in its internal language. We give gradient descent as an example, and we conclude by discussing some directions for future work. 1 author · Mar 1, 2021
- Generalized Convolution and Efficient Language Recognition Convolution is a broadly useful operation with applications including signal processing, machine learning, probability, optics, polynomial multiplication, and efficient parsing. Usually, however, this operation is understood and implemented in more specialized forms, hiding commonalities and limiting usefulness. This paper formulates convolution in the common algebraic framework of semirings and semimodules and populates that framework with various representation types. One of those types is the grand abstract template and itself generalizes to the free semimodule monad. Other representations serve varied uses and performance trade-offs, with implementations calculated from simple and regular specifications. Of particular interest is Brzozowski's method for regular expression matching. Uncovering the method's essence frees it from syntactic manipulations, while generalizing from boolean to weighted membership (such as multisets and probability distributions) and from sets to n-ary relations. The classic trie data structure then provides an elegant and efficient alternative to syntax. Pleasantly, polynomial arithmetic requires no additional implementation effort, works correctly with a variety of representations, and handles multivariate polynomials and power series with ease. Image convolution also falls out as a special case. 1 author · Mar 26, 2019
- Faces of highest weight modules and the universal Weyl polyhedron Let V be a highest weight module over a Kac-Moody algebra g, and let conv V denote the convex hull of its weights. We determine the combinatorial isomorphism type of conv V, i.e. we completely classify the faces and their inclusions. In the special case where g is semisimple, this brings closure to a question studied by Cellini-Marietti [IMRN 2015] for the adjoint representation, and by Khare [J. Algebra 2016; Trans. Amer. Math. Soc. 2017] for most modules. The determination of faces of finite-dimensional modules up to the Weyl group action and some of their inclusions also appears in previous work of Satake [Ann. of Math. 1960], Borel-Tits [IHES Publ. Math. 1965], Vinberg [Izv. Akad. Nauk 1990], and Casselman [Austral. Math. Soc. 1997]. For any subset of the simple roots, we introduce a remarkable convex cone which we call the universal Weyl polyhedron, which controls the convex hulls of all modules parabolically induced from the corresponding Levi factor. Namely, the combinatorial isomorphism type of the cone stores the classification of faces for all such highest weight modules, as well as how faces degenerate as the highest weight gets increasingly singular. To our knowledge, this cone is new in finite and infinite type. We further answer a question of Michel Brion, by showing that the localization of conv V along a face is always the convex hull of the weights of a parabolically induced module. Finally, as we determine the inclusion relations between faces representation-theoretically from the set of weights, without recourse to convexity, we answer a similar question for highest weight modules over symmetrizable quantum groups. 2 authors · Oct 31, 2016
- Reverse derivative categories The reverse derivative is a fundamental operation in machine learning and automatic differentiation. This paper gives a direct axiomatization of a category with a reverse derivative operation, in a similar style to that given by Cartesian differential categories for a forward derivative. Intriguingly, a category with a reverse derivative also has a forward derivative, but the converse is not true. In fact, we show explicitly what a forward derivative is missing: a reverse derivative is equivalent to a forward derivative with a dagger structure on its subcategory of linear maps. Furthermore, we show that these linear maps form an additively enriched category with dagger biproducts. 7 authors · Oct 15, 2019
- A strictly monotone measure on tame sets that corresponds to a numerosity Adapting standard methods from geometric measure theory, we provide an example of a polynomial-valued measure $\mu$ on tame sets in $\mathbb{R}^d$ which satisfies many desirable properties. Among these is strict monotonicity: the measure of a proper subset is strictly less than the measure of the whole set. Using techniques from non-standard analysis, we display that the domain of $\mu$ can be extended to all subsets of $\mathbb{R}^d$ (up to equivalence modulo infinitesimals). The resulting extension is a numerosity function that encodes the $i$-dimensional Hausdorff measure for all $i \in \mathbb{N}$, as well as the $i$-th intrinsic volume functions. 1 author · Aug 23, 2020
- Hardy inequalities for fractional integrals on general domains We prove a sharp Hardy inequality for fractional integrals for functions that are supported on a general domain. The constant is the same as the one for the half-space and hence our result settles a recent conjecture of Bogdan and Dyda. 2 authors · Jul 17, 2009
- Ulrich bundles on double coverings of projective space Given a polarised variety X, we can ask whether it admits Ulrich bundles and, if so, what their minimal possible rank is. In this thesis, after recalling general properties of Ulrich sheaves, we show that any finite covering of P^n that embeds as a divisor in a weighted projective space with weights (1^{n+1},m) admits Ulrich sheaves, by using matrix factorisations. Among these varieties, we focus on double coverings of P^n with n \ge 3. Through the Hartshorne--Serre correspondence, which we review along the way, we prove that the general such X admits a rank 2 Ulrich sheaf if and only if n=3 and m=2,3,4, and characterise the zero loci of their sections. Moreover, we construct generically smooth components of the expected dimension of their moduli spaces, analyse the action of the natural involution on them and the restriction of those bundles to low degree hypersurfaces. For m=2,3, we verify the existence of slope-stable Ulrich bundles of all the possible ranks. 1 authors · Jul 12, 2025
- Alcove Walks and GKM Theory for Affine Flags We develop the GKM theory for the torus-equivariant cohomology of the affine flag variety using the combinatorics of alcove walks. Dual to the usual GKM setup, which depicts the orbits of the small torus action on a graph, alcove walks take place in tessellations of Euclidean space. Walks in affine rank two occur on triangulations of the plane, providing a more direct connection to splines used for approximating surfaces. Alcove walks in GKM theory also need not be minimal length, and can instead be randomly generated, giving rise to more flexible implementation. This work reinterprets and recovers classical results in GKM theory on the affine flag variety, generalizing them to both non-minimal and folded alcove walks, all motivated by applications to splines. 2 authors · Mar 21, 2023
- Generative Language Modeling for Automated Theorem Proving We explore the application of transformer-based language models to automated theorem proving. This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans -- the generation of original mathematical terms -- might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance. GPT-f found new short proofs that were accepted into the main Metamath library, which is, to our knowledge, the first time a deep-learning based system has contributed proofs that were adopted by a formal mathematics community. 2 authors · Sep 7, 2020
- Lambert W-function and Gauss class number one conjecture We study fixed points of a function arising in a representation theory of the Drinfeld modules by the bounded linear operators on a Hilbert space. We prove that such points correspond to number fields of the class number one. As an application, one gets a solution to the Gauss conjecture for the real quadratic fields of class number one. 1 authors · Dec 1, 2025
- Growth of spinors in the generalized Seiberg-Witten equations on \mathbb{R}^4 and \mathbb{R}^3 The classical Seiberg-Witten equations in dimensions three and four admit a natural generalization within a unified framework known as the generalized Seiberg-Witten (GSW) equations, which encompasses many important equations in gauge theory. This article proves that the averaged L^2-norm of any spinor with non-constant pointwise norm in the GSW equations on \mathbb{R}^4 and \mathbb{R}^3, measured over large-radius spheres, grows faster than a power of the radius, under a suitable curvature decay assumption. Separately, it is shown that if the Yang-Mills-Higgs energy of any solution of these equations is finite, then the pointwise norm of the spinor in it must converge to a non-negative constant at infinity. These two behaviors cannot occur simultaneously unless the spinor has constant pointwise norm. This work may be seen as a partial generalization of results obtained by Taubes [Tau17a], and Nagy and Oliveira [NO19] for the Kapustin-Witten equations. 1 authors · Dec 29, 2023
- The Relativity of Causal Knowledge Recent advances in artificial intelligence reveal the limits of purely predictive systems and call for a shift toward causal and collaborative reasoning. Drawing inspiration from the revolution of Grothendieck in mathematics, we introduce the relativity of causal knowledge, which posits structural causal models (SCMs) are inherently imperfect, subjective representations embedded within networks of relationships. By leveraging category theory, we arrange SCMs into a functor category and show that their observational and interventional probability measures naturally form convex structures. This result allows us to encode non-intervened SCMs with convex spaces of probability measures. Next, using sheaf theory, we construct the network sheaf and cosheaf of causal knowledge. These structures enable the transfer of causal knowledge across the network while incorporating interventional consistency and the perspective of the subjects, ultimately leading to the formal, mathematical definition of relative causal knowledge. 2 authors · Mar 13, 2025
- Category Theory for Quantum Natural Language Processing This thesis introduces quantum natural language processing (QNLP) models based on a simple yet powerful analogy between computational linguistics and quantum mechanics: grammar as entanglement. The grammatical structure of text and sentences connects the meaning of words in the same way that entanglement structure connects the states of quantum systems. Category theory allows us to make this language-to-qubit analogy formal: it is a monoidal functor from grammar to vector spaces. We turn this abstract analogy into a concrete algorithm that translates the grammatical structure onto the architecture of parameterised quantum circuits. We then use a hybrid classical-quantum algorithm to train the model so that evaluating the circuits computes the meaning of sentences in data-driven tasks. The implementation of QNLP models motivated the development of DisCoPy (Distributional Compositional Python), the toolkit for applied category theory of which the first chapter gives a comprehensive overview. String diagrams are the core data structure of DisCoPy; they allow us to reason about computation at a high level of abstraction. We show how they can encode both grammatical structures and quantum circuits, but also logical formulae, neural networks or arbitrary Python code. Monoidal functors allow us to translate these abstract diagrams into concrete computation, interfacing with optimised task-specific libraries. The second chapter uses DisCoPy to implement QNLP models as parameterised functors from grammar to quantum circuits. It gives a first proof-of-concept for the more general concept of functorial learning: generalising machine learning from functions to functors by learning from diagram-like data. In order to learn optimal functor parameters via gradient descent, we introduce the notion of diagrammatic differentiation: a graphical calculus for computing the gradients of parameterised diagrams. 1 authors · Dec 13, 2022
- Approximating the Convex Hull via Metric Space Magnitude Magnitude of a finite metric space and the related notion of magnitude functions on metric spaces is an active area of research in algebraic topology. Magnitude originally arose in the context of biology, where it represents the number of effective species in an environment; when applied to a one-parameter family of metric spaces tX with scale parameter t, the magnitude captures much of the underlying geometry of the space. Prior work has mostly focussed on properties of magnitude in a global sense; in this paper we restrict the sets to finite subsets of Euclidean space and investigate its individual components. We give an explicit formula for the corrected inclusion-exclusion principle, and define a quantity associated with each point, called the moment which gives an intrinsic ordering to the points. We exploit this in order to form an algorithm which approximates the convex hull. 3 authors · Aug 7, 2019
- A link between covering and coefficient theorems for holomorphic functions Recently the author presented a new approach to solving the coefficient problems for various classes of holomorphic functions f(z) = \sum_{0}^{\infty} c_n z^n, not necessarily univalent. This approach is based on lifting the given polynomial coefficient functionals J(f) = J(c_{m_1}, \dots, c_{m_s}), 2 < c_{m_1} < \dots < c_{m_s} < \infty, onto the Bers fiber space over universal Teichmüller space and applying the analytic and geometric features of Teichmüller spaces, especially the Bers isomorphism theorem for Teichmüller spaces of punctured Riemann surfaces. In this paper, we extend this approach to more general classes of functions. In particular, this provides a strengthening of de Branges' theorem solving the Bieberbach conjecture. 1 authors · Apr 1, 2025
- An open-closed Deligne-Mumford field theory associated to a Lagrangian submanifold Let L \subset X be a compact embedded Lagrangian in a compact symplectic manifold. We present the moduli spaces of holomorphic maps of arbitrary genus with boundary on L as a global Kuranishi chart, generalising the work of Abouzaid-McLean-Smith and Hirschi-Swaminathan. We use this to define an open-closed Deligne-Mumford theory whose open genus zero part is the Fukaya A_\infty algebra associated to L, and whose closed part gives the Gromov--Witten theory of X. Combined with results of Costello, this has applications in obtaining Gromov--Witten invariants from the Fukaya category. 2 authors · Jan 8, 2025
- Classifying Clustering Schemes Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality. Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the data set, for example by adding points or by applying functions to it. We show that within this framework, one can prove a theorem analogous to one of J. Kleinberg, in which, for example, one obtains an existence and uniqueness theorem instead of a non-existence result. We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes, it is possible to construct richer families of clustering schemes that exhibit sensitivity to density. 2 authors · Nov 23, 2010
- Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective A burgeoning line of research leverages deep neural networks to approximate the solutions to high dimensional PDEs, opening lines of theoretical inquiry focused on explaining how it is that these models appear to evade the curse of dimensionality. However, most prior theoretical analyses have been limited to linear PDEs. In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs. We focus on a class of PDEs known as nonlinear elliptic variational PDEs, whose solutions minimize an Euler-Lagrange energy functional E(u) = \int_\Omega L(x, u(x), \nabla u(x)) - f(x) u(x) dx. We show that if composing a function with Barron norm b with partial derivatives of L produces a function of Barron norm at most B_L b^p, the solution to the PDE can be \epsilon-approximated in the L^2 sense by a function with Barron norm O\left(\left(dB_L\right)^{\max\{p \log(1/\epsilon), p^{\log(1/\epsilon)}\}}\right). By a classical result due to Barron [1993], this correspondingly bounds the size of a 2-layer neural network needed to approximate the solution. Treating p, \epsilon, B_L as constants, this quantity is polynomial in dimension, thus showing neural networks can evade the curse of dimensionality. Our proof technique involves neurally simulating (preconditioned) gradient in an appropriate Hilbert space, which converges exponentially fast to the solution of the PDE, and such that we can bound the increase of the Barron norm at each iterate. Our results subsume and substantially generalize analogous prior results for linear elliptic PDEs over a unit hypercube. 4 authors · Oct 21, 2022
- On Signs of eigenvalues of Modular forms satisfying Ramanujan Conjecture Let F \in S_{k_1}(\Gamma^{(2)}(N_1)) and G \in S_{k_2}(\Gamma^{(2)}(N_2)) be two Siegel cusp forms over the congruence subgroups \Gamma^{(2)}(N_1) and \Gamma^{(2)}(N_2) respectively. Assume that they are Hecke eigenforms in different eigenspaces and satisfy the Generalized Ramanujan Conjecture. Let \lambda_F(p) denote the eigenvalue of F with respect to the Hecke operator T(p). In this article, we compute a lower bound for the density of the set of primes, { p : \lambda_F(p) \lambda_G(p) < 0 }. 1 authors · Dec 12, 2024
- Graph Convolutional Neural Networks as Parametric CoKleisli morphisms We define the bicategory of Graph Convolutional Neural Networks GCNN_n for an arbitrary graph with n nodes. We show it can be factored through the already existing categorical constructions for deep learning called Para and Lens with the base category set to the CoKleisli category of the product comonad. We prove that there exists an injective-on-objects, faithful 2-functor GCNN_n \to Para(CoKl(R^{n \times n} \times -)). We show that this construction allows us to treat the adjacency matrix of a GCNN as a global parameter instead of a local, layer-wise one. This gives us a high-level categorical characterisation of a particular kind of inductive bias GCNNs possess. Lastly, we hypothesize about possible generalisations of GCNNs to general message-passing graph neural networks, connections to equivariant learning, and the (lack of) functoriality of activation functions. 2 authors · Dec 1, 2022
- Probability, valuations, hyperspace: Three monads on Top and the support as a morphism We consider three monads on Top, the category of topological spaces, which formalize topological aspects of probability and possibility in categorical terms. The first one is the Hoare hyperspace monad H, which assigns to every space its space of closed subsets equipped with the lower Vietoris topology. The second is the monad V of continuous valuations, also known as the extended probabilistic powerdomain. We construct both monads in a unified way in terms of double dualization. This reveals a close analogy between them, and allows us to prove that the operation of taking the support of a continuous valuation is a morphism of monads from V to H. In particular, this implies that every H-algebra (topological complete semilattice) is also a V-algebra. Third, we show that V can be restricted to a submonad of tau-smooth probability measures on Top. By composing these two morphisms of monads, we obtain that taking the support of a tau-smooth probability measure is also a morphism of monads. 3 authors · Oct 8, 2019
- Morse theory and Seiberg-Witten moduli spaces of 3-dimensional cobordisms, I Motivated by a variant of Atiyah-Floer conjecture proposed in L2 and its potential generalizations, we study in this article and its sequel as a first step properties of moduli spaces of Seiberg-Witten equations on a 3-dimensional cobordism with cylindrical ends (CCE) \(Y\), perturbed by closed 2-forms of the form \(r*d\ff+w\), where \(r\geq 1\), where \(\ff\) is a harmonic Morse function with certain linear growth at the ends of \(Y\), and \(w\) is a certain closed 2-form. 1 authors · Dec 29, 2024
- Extrinsic systole of Seifert surfaces and distortion of knots In 1983, Gromov introduced the notion of distortion of a knot, and asked if there are knots with arbitrarily large distortion. In 2011, Pardon proved that the distortion of T_{p,q} is at least min{p,q} up to a constant factor. We prove that the distortion of T_{p, p+1}# K is at least p up to a constant, independent of K. We also prove that any embedding of a minimal genus Seifert surface for T_{p,p+1}# K in R^3 has small extrinsic systole, in the sense that it contains a non-contractible loop with small R^3-diameter relative to the length of the knot. These results are related to combinatorial properties of the monodromy map associated to torus knots. 1 authors · Oct 1, 2025
- Complexity of counting points on curves and the factor P_1(T) of the zeta function of surfaces This article concerns the computational complexity of a fundamental problem in number theory: counting points on curves and surfaces over finite fields. There is no subexponential-time algorithm known and it is unclear if it can be NP-hard. Given a curve, we present the first efficient Arthur-Merlin protocol to certify its point-count, its Jacobian group structure, and its Hasse-Weil zeta function. We extend this result to a smooth projective surface to certify the factor P_{1}(T), corresponding to the first Betti number, of the zeta function, by using the counting oracle. We give the first algorithm to compute P_{1}(T) that is poly(\log q)-time if the degree D of the input surface is fixed, and in quantum poly(D \log q)-time in general. Our technique in the curve case is to sample hash functions using the Weil and Riemann-Roch bounds, to certify the group order of its Jacobian. For higher dimension varieties, we first reduce to the case of a surface, which is fibred as a Lefschetz pencil of hyperplane sections over P^{1}. The formalism of vanishing cycles, and the inherent big monodromy, enable us to prove an effective version of Deligne's `théorème du pgcd' using the hard-Lefschetz theorem and an equidistribution result due to Katz. These reduce our investigations to that of computing the zeta function of a curve, defined over a finite field extension F_{Q}/F_{q} of poly-bounded degree. This explicitization of the theory yields the first nontrivial upper bounds on the computational complexity. 3 authors · Nov 4, 2025
- Categories of Differentiable Polynomial Circuits for Machine Learning Reverse derivative categories (RDCs) have recently been shown to be a suitable semantic framework for studying machine learning algorithms. Whereas emphasis has been put on training methodologies, less attention has been devoted to particular model classes: the concrete categories whose morphisms represent machine learning models. In this paper we study presentations by generators and equations of classes of RDCs. In particular, we propose polynomial circuits as a suitable machine learning model. We give an axiomatisation for these circuits and prove a functional completeness result. Finally, we discuss the use of polynomial circuits over specific semirings to perform machine learning with discrete values. 2 authors · Mar 12, 2022
- Stable rationality of hypersurfaces in schön affine varieties In recent years, there has been a development in approaching rationality problems through the motivic methods (cf. [Kontsevich--Tschinkel'19], [Nicaise--Shinder'19], [Nicaise--Ottem'21]). This method requires the explicit construction of degeneration families of curves with favorable properties. While the specific construction is generally difficult, [Nicaise--Ottem'22] combines combinatorial methods to construct degeneration families of hypersurfaces in toric varieties and shows the non-stable rationality of a very general hypersurface in projective spaces. In this paper, we extend the result of [Nicaise--Ottem'22] not only to hypersurfaces in algebraic tori but also to those in schön affine varieties. As an application, we show the irrationality of certain hypersurfaces in the complex Grassmannian variety Gr(2, n) using the motivic method, which coincides with the result obtained by the same author in the previous research. 1 authors · Feb 12, 2025
- How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded F_{1}-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality, and the F_{1}-norm (or the related Barron norm) for large width adaptivity. We show that the global minimizer of the regularized loss of DNNs can fit, for example, the composition of two functions f^{*} = h \circ g from a small number of observations, assuming g is smooth/regular and reduces the dimensionality (e.g. g could be the modulo map of the symmetries of f^{*}), so that h can be learned in spite of its low regularity. The measure of regularity we consider is the Sobolev norm with different levels of differentiability, which is well adapted to the F_{1} norm. We compute scaling laws empirically and observe phase transitions depending on whether g or h is harder to learn, as predicted by our theory. 3 authors · Jul 8, 2024
- NaturalProver: Grounded Mathematical Proof Generation with Language Models Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence. Yet it has remained underexplored with modern generative models. We study large-scale language models on two new generation tasks: suggesting the next step in a mathematical proof, and full proof generation. We develop NaturalProver, a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding. On theorems from the NaturalProofs benchmark, NaturalProver improves the quality of next-step suggestions and generated proofs over fine-tuned GPT-3, according to human evaluations from university-level mathematics students. NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time, which is to our knowledge the first demonstration of these capabilities using neural language models. 5 authors · May 25, 2022
- Fast Matrix Multiplication via Ternary Meta Flip Graphs Matrix multiplication optimization remains a fundamental challenge in computational mathematics. This work introduces a novel approach that discovers matrix multiplication schemes in the ternary field (Z_T), where coefficients are restricted to {-1, 0, 1} to minimize naive additive complexity. The core of the method is a GPU-accelerated meta flip graph algorithm that maintains ternary safety through specialized arithmetic operations and sign symmetry breaking. Key results include new best ranks for the formats 4 \times 5 \times 12, 5 \times 6 \times 10, and 6 \times 7 \times 9, the independent discovery of 32 schemes in Z_T that match known optimal ranks (including 8 previously known only with rational coefficients), and 30 rank improvements in the binary field. The analysis of 164 known schemes shows that 92 can be implemented in Z_T, while 72 could not be found in the ternary field with current methods, defining the current boundaries of this approach. All software, results, and discovered schemes are provided as open-source. 1 authors · Nov 25, 2025
- Einstein metrics on aligned homogeneous spaces with two factors Given two homogeneous spaces of the form G_1/K and G_2/K, where G_1 and G_2 are compact simple Lie groups, we study the existence problem for G_1xG_2-invariant Einstein metrics on the homogeneous space M=G_1xG_2/K. For the large subclass C of spaces having three pairwise inequivalent isotropy irreducible summands (12 infinite families and 70 sporadic examples), we obtain that existence is equivalent to the existence of a real root for certain quartic polynomial depending on the dimensions and two Killing constants, which allows a full classification and the possibility to weigh the existence and non-existence pieces of C. 2 authors · Aug 1, 2024
- Commutative Width and Depth Scaling in Deep Neural Networks This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework, and discuss its implications on neural network design and scaling. We study commutativity for the neural covariance kernel which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, results in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I)). 1 authors · Oct 2, 2023
- Backprop as Functor: A compositional perspective on supervised learning A supervised learning algorithm searches over a set of functions A \to B parametrised by a space P to find the best approximation to some ideal function f\colon A \to B. It does this by taking examples (a, f(a)) \in A \times B, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent---with respect to a fixed step size and an error function satisfying a certain property---defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks. 3 authors · Nov 28, 2017
- Specializations of partial differential equations for Feynman integrals Starting from the Mellin-Barnes integral representation of a Feynman integral depending on a set of kinematic variables z_i, we derive a system of partial differential equations w.r.t. new variables x_j, which parameterize the differentiable constraints z_i=y_i(x_j). In our algorithm, the powers of propagators can be considered as arbitrary parameters. Our algorithm can also be used for the reduction of multiple hypergeometric sums to sums of lower dimension, finding special values and reduction equations of hypergeometric functions in a singular locus of continuous variables, or finding systems of partial differential equations for master integrals with arbitrary powers of propagators. As an illustration, we produce a differential equation of fourth order in one variable for the one-loop two-point Feynman diagram with two different masses and arbitrary propagator powers. 3 authors · Jul 18, 2022
- Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation Formal proofs are challenging to write even for experienced experts. Recent progress in Neural Theorem Proving (NTP) shows promise in expediting this process. However, the formal corpora available on the Internet are limited compared to the general text, posing a significant data scarcity challenge for NTP. To address this issue, this work proposes Alchemy, a general framework for data synthesis that constructs formal theorems through symbolic mutation. Specifically, for each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. Subsequently, we mutate the candidate theorem by replacing the corresponding term in the statement with its equivalent form or antecedent. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore, we perform continual pretraining and supervised finetuning on this augmented corpus for large language models. Experimental results demonstrate the effectiveness of our approach, achieving a 5% absolute performance improvement on the LeanDojo benchmark. Additionally, our synthetic data achieve a 2.5% absolute performance gain on the out-of-distribution miniF2F benchmark. To provide further insights, we conduct a comprehensive analysis of synthetic data composition and the training paradigm, offering valuable guidance for developing a strong theorem prover. 5 authors · Oct 21, 2024
- Adiabatic Solutions of the Haydys-Witten Equations and Symplectic Khovanov Homology An influential conjecture by Witten states that there is an instanton Floer homology of four-manifolds with corners that in certain situations is isomorphic to Khovanov homology of a given knot K. The Floer chain complex is generated by Nahm pole solutions of the Kapustin-Witten equations on R^3 \times R^+_y with an additional monopole-like singular behaviour along the knot K inside the three-dimensional boundary at y=0. The Floer differential is given by counting solutions of the Haydys-Witten equations that interpolate between Kapustin-Witten solutions along an additional flow direction R_s. This article investigates solutions of a decoupled version of the Kapustin-Witten and Haydys-Witten equations on R_s \times R^3 \times R^+_y, which in contrast to the full equations exhibit a Hermitian Yang-Mills structure and can be viewed as a lift of the extended Bogomolny equations (EBE) from three to five dimensions. Inspired by Gaiotto-Witten's approach of adiabatically braiding EBE-solutions to obtain generators of the Floer homology, we propose that there is an equivalence between adiabatic solutions of the decoupled Haydys-Witten equations and non-vertical paths in the moduli space of EBE-solutions fibered over the space of monopole positions. Moreover, we argue that the Grothendieck-Springer resolution of the Lie algebra of the gauge group provides a finite-dimensional model of this moduli space of monopole solutions. These considerations suggest an intriguing similarity between Haydys-Witten instanton Floer homology and symplectic Khovanov homology and provide a novel approach towards a proof of Witten's gauge-theoretic interpretations of Khovanov homology. 1 authors · Jan 2, 2025
- Generative Logic: A New Computer Architecture for Deterministic Reasoning and Knowledge Generation We present Generative Logic (GL), a deterministic architecture that begins from user-supplied axiomatic definitions -- written in a minimalist Mathematical Programming Language (MPL) -- and systematically explores their deductive neighborhood. Definitions are compiled into a distributed grid of simple Logic Blocks (LBs) that exchange messages; any time several expressions unify under an inference rule, a new fact is emitted with full provenance to its sources, yielding replayable, auditable proof graphs. A prototype software implementation instantiates the workflow on first-order Peano arithmetic. Starting only from the Peano axioms, GL enumerates candidate implications, applies normalization and type filters, and automatically reconstructs machine-checkable proofs of foundational arithmetic laws including associativity and commutativity of addition, associativity and commutativity of multiplication, and distributivity. Generated proofs export to navigable HTML so that every inference step can be inspected independently. We outline a hardware-software co-design path toward massively parallel realizations and describe prospective integration with probabilistic models (e.g., Large Language Models (LLMs)) for autoformalization and conjecture seeding. The Python and MPL code to reproduce the Peano experiments, along with the full HTML proof graphs, are available in the project's GitHub repository at https://github.com/Generative-Logic/GL/tree/35a111ea9ba53afe051703d6050be0c3923e9724 and are permanently archived at https://doi.org/10.5281/zenodo.16408441. We invite community feedback and collaboration. 1 authors · Jul 25, 2025
1 The Gauss-Markov Adjunction: Categorical Semantics of Residuals in Supervised Learning Enhancing the intelligibility and interpretability of machine learning is a crucial task in responding to the demand for Explicability as an AI principle, and in promoting the better social implementation of AI. The aim of our research is to contribute to this improvement by reformulating machine learning models through the lens of category theory, thereby developing a semantic framework for structuring and understanding AI systems. Our categorical modeling in this paper clarifies and formalizes the structural interplay between residuals and parameters in supervised learning. The present paper focuses on the multiple linear regression model, which represents the most basic form of supervised learning. By defining two concrete categories corresponding to parameters and data, along with an adjoint pair of functors between them, we introduce our categorical formulation of supervised learning. We show that the essential structure of this framework is captured by what we call the Gauss-Markov Adjunction. Within this setting, the dual flow of information can be explicitly described as a correspondence between variations in parameters and residuals. The ordinary least squares estimator for the parameters and the minimum residual are related via the preservation of limits by the right adjoint functor. Furthermore, we position this formulation as an instance of extended denotational semantics for supervised learning, and propose applying a semantic perspective developed in theoretical computer science as a formal foundation for Explicability in AI. 1 authors · Jul 3, 2025 1
- On Hofstadter's G-sequence We characterize the entries of Hofstadter's G-sequence in terms of the lower and upper Wythoff sequences. This can be used to give a short and comprehensive proof of the equality of Hofstadter's G-sequence and the sequence of averages of the swapped Wythoff sequences. In a second part we give some results that hold when one replaces the golden mean by other quadratic algebraic numbers. In a third part we prove a close relationship between Hofstadter's G-sequence and a sequence studied by Avdivpahic and Zejnulahi. 1 authors · Jul 4, 2023
- Cusps and Commensurability Classes of Hyperbolic 4-Manifolds There are six orientable, compact, flat 3-manifolds that can occur as cusp cross-sections of hyperbolic 4-manifolds. This paper provides criteria for exactly when a given commensurability class of arithmetic hyperbolic 4-manifolds contains a representative with a given cusp type. In particular, for three of the six cusp types, we provide infinitely many examples of commensurability classes that contain no manifolds with cusps of the given type; no such examples were previously known for any cusp type. 1 authors · Sep 24, 2021
7 Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes Math reasoning is a highly active area of Large Language Model (LLM) research because it is a hallmark of artificial intelligence. However, few works have explored how math reasoning is encoded within LLM parameters and if it is a skill that can be isolated within a model. Doing so could allow targeted intervention to improve math performance without altering non-math behavior and foster understanding of how models encode math reasoning. We introduce Math Neurosurgery (MathNeuro), a method for isolating math-specific parameters in LLMs using only forward passes. MathNeuro builds on existing work by using weights and activations to calculate parameter importance, but isolates math-specific parameters by removing those important for general language tasks. Pruning parameters MathNeuro identifies deletes an LLM's math reasoning ability without destroying its general language ability. Scaling these parameters by a small constant improves a pretrained or instruction-tuned LLM's performance by 4-17% on GSM8K while leaving non-math behavior unaltered. MathNeuro is also data efficient: most of its effectiveness holds when identifying math-specific parameters using a single sample. MathNeuro highlights the potential for future work to intervene on math-specific parameters. 4 authors · Oct 22, 2024 2
2 Sheaf Theory through Examples (Abridged Version) This book provides an inviting tour through sheaf theory, from the perspective of applied category theory and pitched at a less specialized audience than is typical with introductions to sheaves. The book makes it as easy as possible for the reader new to sheaves, by motivating and developing the theory via a broad range of concrete examples and explicit constructions, including applications to n-colorings of graphs, satellite data, chess problems, Bayes nets, musical performance, complexes, and more. Included is an extended first chapter introducing and motivating all the necessary category-theoretical background, again with a strong emphasis on concrete examples. A new and unabridged version (including a fifth chapter on more advanced topics and a conclusion) will be available with MIT Press. 1 authors · Dec 15, 2020
- Immersions of complexes of groups Given a complex of groups, we construct a new class of complex of groups that records its local data and offer a functorial perspective on the statement that complexes of groups are locally developable. We also construct a new notion of an immersion of complexes of groups and establish that a locally isometric immersion of a complex of groups into a non-positively curved complex of groups is pi_1-injective. Furthermore, the domain complex of groups is developable and the induced map on geometric realizations of developments is an isometric embedding. 1 authors · Oct 1, 2025
- Yet another argument in favour of NP=CoNP This article shows yet another proof of NP=CoNP. In a previous article, we proved that NP=PSPACE and from it we can conclude that NP=CoNP immediately. The former proof shows how to obtain polynomial-size and polynomial-time checkable Dag-like proofs for all purely implicational Minimal logic tautologies. From the fact that Minimal implicational logic is PSPACE-complete we get the proof that NP=PSPACE. This first proof of NP=CoNP uses Hudelmaier's linear upper bound on the height of Sequent Calculus minimal implicational logic proofs. In an addendum to the proof of NP=PSPACE, we observe that we do not need to use Hudelmaier's upper bound since any proof of non-hamiltonicity for any graph is linear upper-bounded. By the CoNP-completeness of non-hamiltonicity, we obtain NP=CoNP as a corollary of the first proof. In this article we show the third proof of CoNP=NP, also providing polynomial size and polynomial verifiable certificates that are Dags. They are generated from normal Natural Deduction proofs, linear height upper-bounded too, by removing redundancy, i.e., repeated parts. The existence of repeated parts is a consequence of the redundancy theorem for a family of super-polynomial proofs in the purely implicational Minimal logic. It is mandatory to read at least two previous articles to get the details of the proof presented here: the article that proves the redundancy theorem and the article that shows how to remove the repeated parts of a normal Natural Deduction proof to have a polynomial Dag certificate for minimal implicational logic tautologies. 1 authors · Dec 28, 2020
- The Syntax and Semantics of einsum In 2011, einsum was introduced to NumPy as a practical and convenient notation for tensor expressions in machine learning, quantum circuit simulation, and other fields. It has since been implemented in additional Python frameworks such as PyTorch and TensorFlow, as well as in other programming languages such as Julia. Despite its practical success, the einsum notation still lacks a solid theoretical basis, and is not unified across the different frameworks, limiting opportunities for formal reasoning and systematic optimization. In this work, we discuss the terminology of tensor expressions and provide a formal definition of the einsum language. Based on this definition, we formalize and prove important equivalence rules for tensor expressions and highlight their relevance in practical applications. 4 authors · Sep 24, 2025
- On the Hasse principle for divisibility in elliptic curves Let p be a prime number and n a positive integer. Let E be an elliptic curve defined over a number field k. It is known that the local-global divisibility by p holds in E/k, but for powers p^n counterexamples may appear. The validity or the failing of the Hasse principle depends on the elliptic curve E and the field k and, consequently, on the group Gal(k(E[p^n])/k). For which kind of these groups does the principle hold? For which of them can we find a counterexample? The answer to these questions was known for n=1,2, but for $n\geq 3$ they were still open. We show some conditions on the generators of Gal(k(E[p^n])/k) implying an affirmative answer to the local-global divisibility by p^n in E over k, for every $n\geq 2$. We also prove that these conditions are necessary by producing counterexamples in the case when they do not hold. These last results generalize to every power p^n a result obtained by Ranieri for n=2. 2 authors · Nov 3, 2025
- A Heegaard-Floer TQFT for link cobordisms We introduce a Heegaard-Floer homology functor from the category of oriented links in closed 3-manifolds and oriented surface cobordisms in 4-manifolds connecting them to the category of F[v]-modules and F[v]-homomorphisms between them, where F is the field with two elements. In comparison with previously defined TQFTs for decorated links and link cobordisms, the construction of this paper has the advantage of being independent from the decoration. Some of the basic properties of this functor are also explored. 1 authors · Jun 20, 2024
- Consistency of the Predicative Calculus of Cumulative Inductive Constructions (pCuIC) In order to avoid well-known paradoxes associated with self-referential definitions, higher-order dependent type theories stratify the theory using a countably infinite hierarchy of universes (also known as sorts), $Type_0 : Type_1 : \cdots$. Such type systems are called cumulative if for any type A we have that A : Type_{i} implies A : Type_{i+1}. The predicative calculus of inductive constructions (pCIC), which forms the basis of the Coq proof assistant, is one such system. In this paper we present and establish the soundness of the predicative calculus of cumulative inductive constructions (pCuIC) which extends the cumulativity relation to inductive types. 2 authors · Oct 11, 2017
- Generating functions for some series of characters of classical Lie groups There exist a number of well known multiplicative generating functions for series of Schur functions. Amongst these are some related to the dual Cauchy identity whose expansion coefficients are rather simple, and in some cases periodic in parameters specifying the Schur functions. More recently similar identities have been found involving expansions in terms of characters of the symplectic group. Here these results are extended and generalised to all classical Lie groups. This is done through the derivation of explicit recurrence relations for the expansion coefficients based on the action of the Weyl groups of both the symplectic and orthogonal groups. Copious results are tabulated in the form of explicit values of the expansion coefficients as functions of highest weight parameters. An alternative approach is then based on dual pairs of symplectic and/or orthogonal groups. A byproduct of this approach is that expansions in terms of spin orthogonal group characters can always be recovered from non-spin cases. 1 authors · Mar 1, 2023
- Bosonisation Cohomology: Spin Structure Summation in Every Dimension Gauging fermion parity and summing over spin structures are subtly distinct operations. We introduce 'bosonisation cohomology' groups H_B^{d+2}(X) to capture this difference, for theories in spacetime dimension d equipped with maps to some X. Non-trivial classes in H_B^{d+2}(X) contain theories for which (-1)^F is anomaly-free, but spin structure summation is anomalous. We formulate a sequence of cobordism groups whose failure to be exact is measured by H_B^{d+2}(X), and from here we compute it for X=pt. The result is non-trivial only in dimensions $d \in 4Z+2$, being due to the presence of gravitational anomalies. The first few are H_B^4=Z_2, probed by a theory of 8 Majorana-Weyl fermions in d=2, then H_B^8=Z_8, $H_B^{12}=Z_{16} \times Z_2$. We rigorously derive a general formula extending this to every spacetime dimension. Along the way, we compile many general facts about (fermionic and bosonic) anomaly polynomials, and about spin and pin^- (co)bordism generators, that we hope might serve as a useful reference for physicists working with these objects. We briefly discuss some physics applications, including how the H_B^{12} class is trivialised in supergravity. Despite the name, and notation, we make no claim that $H_B^\bullet(X)$ actually defines a cohomology theory (in the Eilenberg-Steenrod sense). 2 authors · Nov 17, 2025
- Homotopy Limits and Homotopy Colimits of Chain Complexes We prove that the homotopy limits and homotopy colimits of chain complexes can be computed by the cobar and bar constructions. We also show that the totalizations of double complexes compute the homotopy limits and homotopy colimits of simplicial and cosimplicial chain complexes. 1 authors · Sep 29, 2023
2 All Weight Systems for Calabi-Yau Fourfolds from Reflexive Polyhedra For any given dimension d, all reflexive d-polytopes can be found (in principle) as subpolytopes of a number of maximal polyhedra that are defined in terms of (d+1)-tuples of integers (weights), or combinations of k-tuples of weights with k<d+1. We present the results of a complete classification of sextuples of weights pertaining to the construction of all reflexive polytopes in five dimensions. We find 322 383 760 930 such weight systems. 185 269 499 015 of them give rise directly to reflexive polytopes and thereby to mirror pairs of Calabi-Yau fourfolds. These lead to 532 600 483 distinct sets of Hodge numbers. 2 authors · Aug 7, 2018
- Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products Developing equivariant neural networks for the E(3) group plays an important role in modeling 3D data across real-world applications. Enforcing this equivariance primarily involves the tensor products of irreducible representations (irreps). However, the computational complexity of such operations increases significantly as higher-order tensors are used. In this work, we propose a systematic approach to substantially accelerate the computation of the tensor products of irreps. We mathematically connect the commonly used Clebsch-Gordan coefficients to the Gaunt coefficients, which are integrals of products of three spherical harmonics. Through Gaunt coefficients, the tensor product of irreps becomes equivalent to the multiplication between spherical functions represented by spherical harmonics. This perspective further allows us to change the basis for the equivariant operations from spherical harmonics to a 2D Fourier basis. Consequently, the multiplication between spherical functions represented by a 2D Fourier basis can be efficiently computed via the convolution theorem and Fast Fourier Transforms. This transformation reduces the complexity of full tensor products of irreps from O(L^6) to O(L^3), where L is the max degree of irreps. Leveraging this approach, we introduce the Gaunt Tensor Product, which serves as a new method to construct efficient equivariant operations across different model architectures. Our experiments on the Open Catalyst Project and 3BPA datasets demonstrate both the increased efficiency and improved performance of our approach. 3 authors · Jan 18, 2024
- Higher Order Automatic Differentiation of Higher Order Functions We present semantic correctness proofs of automatic differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Throughout, we show how the analysis extends to AD methods for computing higher order derivatives using a Taylor approximation. 3 authors · Jan 17, 2021
2 What Algorithms can Transformers Learn? A Study in Length Generalization Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers. 8 authors · Oct 24, 2023
- MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, Isabelle (partially) and HOL Light (partially) and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3 and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving. 3 authors · Aug 31, 2021
- A proof of van der Waerden's Conjecture on random Galois groups of polynomials Of the $(2H+1)^n$ monic integer polynomials $f(x)=x^n+a_1 x^{n-1}+\cdots+a_n$ with $\max\{|a_1|,\ldots,|a_n|\}\leq H$, how many have associated Galois group that is not the full symmetric group S_n? There are clearly $\gg H^{n-1}$ such polynomials, as may be obtained by setting a_n=0. In 1936, van der Waerden conjectured that O(H^{n-1}) should in fact also be the correct upper bound for the count of such polynomials. The conjecture has been known previously for degrees $n\leq 4$, due to work of van der Waerden and Chow and Dietmann. In this expository article, we outline a proof of van der Waerden's Conjecture for all degrees n. 1 authors · Oct 3, 2024
- On the generation of periodic discrete structures with identical two-point correlation Strategies for the generation of periodic discrete structures with identical two-point correlation are developed. Starting from a pair of root structures, which are not related by translation, phase inversion or axis reflections, child structures of arbitrary resolution (i.e., pixel or voxel numbers) and number of phases (i.e., material phases/species) can be generated by means of trivial embedding based phase extension, application of kernels and/or phase coalescence, such that the generated structures inherit the two-point-correlation equivalence. Proofs of the inheritance property are provided by means of the Discrete Fourier Transform theory. A Python 3 implementation of the results is offered by the authors through the Github repository https://github.com/DataAnalyticsEngineering/EQ2PC in order to make the provided results reproducible and useful for all interested readers. Examples for the generation of structures are demonstrated, together with applications in the homogenization theory of periodic media. 2 authors · Feb 4, 2020
- Mathematical Capabilities of ChatGPT We investigate the mathematical capabilities of ChatGPT by testing it on publicly available datasets, as well as hand-crafted ones, and measuring its performance against other models trained on a mathematical corpus, such as Minerva. We also test whether ChatGPT can be a useful assistant to professional mathematicians by emulating various use cases that come up in the daily professional activities of mathematicians (question answering, theorem searching). In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, only cover elementary mathematics. We address this issue by introducing a new dataset: GHOSTS. It is the first natural-language dataset made and curated by working researchers in mathematics that (1) aims to cover graduate-level mathematics and (2) provides a holistic overview of the mathematical capabilities of language models. We benchmark ChatGPT on GHOSTS and evaluate performance against fine-grained criteria. We make this new dataset publicly available to assist a community-driven comparison of ChatGPT with (future) large language models in terms of advanced mathematical comprehension. We conclude that contrary to many positive reports in the media (a potential case of selection bias), ChatGPT's mathematical abilities are significantly below those of an average mathematics graduate student. Our results show that ChatGPT often understands the question but fails to provide correct solutions. Hence, if your goal is to use it to pass a university exam, you would be better off copying from your average peer! 8 authors · Jan 31, 2023
1 Galois Theory These are the notes for an undergraduate course at the University of Edinburgh, 2021-2023. Assuming basic knowledge of ring theory, group theory and linear algebra, the notes lay out the theory of field extensions and their Galois groups, up to and including the fundamental theorem of Galois theory. Also included are a section on ruler and compass constructions, a proof that solvable polynomials have solvable Galois groups, and the classification of finite fields. 1 authors · Aug 14, 2024
- Homomorphisms between multidimensional constant-shape substitutions We study a class of Z^{d}-substitutive subshifts, including a large family of constant-length substitutions, and homomorphisms between them, i.e., factors modulo isomorphisms of Z^{d}. We prove that any measurable factor map and even any homomorphism associated to a matrix commuting with the expansion matrix, induces a continuous one. We also get strong restrictions on the normalizer group, proving that any endomorphism is invertible, the normalizer group is virtually generated by the shift action and the quotient of the normalizer group by the automorphisms is restricted by the digit tile of the substitution. 1 authors · Jun 19, 2021
- Fullness of the Kuznetsov-Polishchuk exceptional collection for the spinor tenfold Kuznetsov and Polishchuk provided a general algorithm to construct exceptional collections of maximal length for homogeneous varieties of type A,B,C,D. We consider the case of the spinor tenfold and we prove that the corresponding collection is full, i.e. it generates the whole derived category of coherent sheaves. As a step of the proof, we construct some resolutions of homogeneous vector bundles which might be of independent interest. 2 authors · Jun 19, 2023