WHAT IS A MATHEMATICAL PROOF IN THE AGE OF MODERN THEOREM PROVERS? Inês Hipolito University of Lisbon
hipolito.ines @ gmail.com
Abstract A mathematical proof is notable for its clear language which satisfies the logical rules of inference and which convinces us by its intrinsic explanation relatively easy to reproduce. Nevertheless, since the publication of the computer aided proof of the four-colour theorem in 1976, the status of proof has been widely discussed, and that discussion was recently reopened following the verification of the Feit–Thompson theorem by a modern theorem prover. Modern theorem provers enable us to verify mathematical theorems, construct formal axiomatic derivations of remarkable complexity and, potentially, increase confidence in mathematical statements. Proof assistants, therefore, are the result of the efforts by logicians, computer scientists, and mathematician to obtain complete mathematical confidence through computers. In this paper, I will discuss how classical mathematical theorems strongly contrast with the “trivial” use of modern theorem provers. I will specifically address the Feit–Thompson theorem, proof of which was recently verified by the interactive theorem prover Coq in order to assess the proof’s status in the era of modern theorem provers, and, more clearly, whether a machine proof may still be considered a calculus of reasoning.
Keywords: Proof; Formal Proof; Modern Theorem Provers; Feit-Thompson Theorem; Artificial Agency.
What is a Mathematical Proof?
The subject being assessed is not located in space or time, and can be initially understood through inductive basic principles. This is the reason why philosophers must pay special attention to ontological and epistemological issues concerning not only the mathematical subject, but also its abstract entities. It is, then, the responsibility of philosophy to assess what the nature of a mathematical proof is.
A proof is “a sequence of formulae of which each member is either an axiom or is derived from a set of preceding members by application of a rule of inference, and which terminates with the proposition proved. The final member of such a sequence is a theorem” (The Oxford Dictionary of Philosophy, 2008). In other words, “a proof is a chain of reasoning, starting from axioms, usually also with assumptions on which the conclusion then depends, that leads to a conclusion and which satisfies the logical rules of inference” (Oxford Concise Dictionary of Mathematics, 2014).
Generally speaking, all schools would agree that a mathematical proof could be reduced to a series of very small steps, each of which can be verified simply and irrefutably. Therefore, mathematicians would agree on the validity of the basic proof steps and on the methods of combining these into larger proofs. This assumption leads us to the problem of how to translate the informal proof to a formal one, that is, from the natural language that mathematicians write in books to a formal language understood by a computer. Two of the main issues to consider regarding this matter is how a proof convinces and how a proof explains it correctness. Indeed, the mathematician Kahle emphasizes that a mathematical proof “should give us the ability to convince somebody (with a reasonable background knowledge in the area) of the truth of a theorem whenever and wherever. It seems to be just a matter of the fact, that such an ability comes along with an explanation” (Kahle, forthcoming).
What is a Computer-assisted proof?
A computer-assisted proof is a proof in which every logical inference has been checked all the way back to the fundamental axioms of mathematics, and since the result is not subject to intuition, it seems to be less vulnerable to errors (Hales, 2008). A computer-assisted proof is a proof written in a precise artificial language that admits only a fixed repertoire of stylized steps (Harrison, 2008). Therefore, we may say that a computer-assisted proof is a mechanical process that can be verified by the correctness of the formal language as a virtue of proof assistants.
Proof assistants or theorem provers, on the other hand, are computer systems that allow the user to do mathematics on a computer. The user can set up a mathematical theory, define properties and do logical reasoning with them. In other words, they enable mathematical theories to be formalised and so prove theorems (Geuvers, 2009).
Nowadays, there is a large number of computer provers, that can check or construct computer-assisted proofs, such as the HOL Light for classical and higher order logic, based on a formulation of type theory with equality as the only primitive notion; or Coq, which is an interactive theorem prover, enabling the expression of mathematical assertions, mechanically checking proofs of these assertions, helping to find computer-assisted proofs, and extracting a certified program from the constructive proof of its formal specification. Other theorem provers presented in Wiedijk are Mizar, PVS, Otter/Ivy, Isabelle/Isar, Affa/Agda, ACL2, PhoX, IMPS, Metamath, Theorema, Lego, Nuprl, Ωmega, B method and Minlog.
According to Wiedijk (2006), these theorem provers:
(1) are designed for the formalization of mathematics, or, if not designed specifically for that, have been seriously used for this purpose in the past;
(2) are special at something. These are the systems that in at least one aspect are better than all the other systems in the collection. They are the leaders in their field.
In recent years, several theorems have been formally verified. Although there are many to choose from, some significant ones are the following:
a) The Prime Number Theorem. This formalization was written by Jeremy Avigad of Carnegie Mellon University in 2004, with the help of Kevin Donnelly, David Gray and Paul Raff when they were students there. The system that they used was Isabelle. The size of the formalization was: 1,021,313 bytes = 0.97 megabytes, 29,753 lines, 43 files. (Cf. Avigad, J., 2004).
b) The Four Color Theorem. This formalization was carried out by Georges Gonthier of Microsoft Research in Cambridge, in 2004, in collaboration with Benjamin Wener of the École Polytechnique in Paris. The system that he used was Coq and the size of the formalization was 2,621,072 bytes = 2.50 megabytes; 60,103 lines and 132 files. This proof was found back in the nineteen seventies and it did not just involve clever mathematics: an essential part of the proof was the execution of a computer program that for a long time searched through an extremely large number of possibilities.
c) The Jordan Curve Theorem. This formalization was carried out by Tom Hales at the University of Pittsburgh, in 2005. The system that he used was HOL Light. The size of the formalization was: 2,257,363 bytes = 2.15 megabytes, 75,242 lines and 15 files.
2.2. The Formalization of the Odd Order Theorem Recent decades have brought about substantial advances in the use of interactive theorem provers, concerning the verification of hardware and software component correctness, with respect to given specifications. A noteworthy achievement is the most recent machine-checked proof the Odd Order Theorem. This theorem asserts that every finite group of odd order is solvable (Burnside, 1911) and Feit and Thompson proved it in 1963 with a proof that filled an entire issue of the Pacific Journal of Mathematics. It was the longest proof to have emerged in mathematical literature until then.
The formalization of the Odd Order Theorem was made possible through a prover different from Apel and Haken’s proof of the Four-Color Theorem (verified with the Coq System) and the Hale’s Flyspeck project that verified the Kepler conjecture. “The Odd Order Theorem does not rely on mechanical computation and the arguments were meant to be read and understood in their entirety” (Gonthier et al, 2013). What is so particular about this formalization is that the combination of theories involved makes it possible to efficiently switch, between the several perspectives mathematics may have regarding that mathematical object. Another important matter is that this shows a formalization of common patterns of mathematical reasoning. Indeed, “the formalization relies on a variety of argument patterns that require new kind of support” (Gonthier et al, 2013).
The status of computation in Coq’s formalism plays a central role in the Odd Order formal proof. In fact, “every term or type has a computational interpretation, and the ability to unfold definitions and normalize expressions is built in to the underlying logic. Type inference and type checking can take advantage of this computational behavior, as can proof checking, which is just an instance of type checking” (Gonthier et al, 2013, p. 13).
On the other hand, Coq’s logic is constructive; therefore, Coq’s major classical principles such as the excluded middle, choice functions and extensionality are not part of the basis of the Odd Order formalization. In fact, the success of such large-scale formalization demands a careful choice of representation which, through taking advantage of Coq’s type mechanism and computational behaviour, allows organization and encoding in successive layers and interfaces. According to Gonthier et al.:
The lower-level libraries implement constructions of basic objects, constrained by the specifics of the constructive framework. Presented with these interfaces, the users of the higher-level libraries can then ignore these constructions, and they should hopefully be able to describe the proofs to be checked by Coq with the same level of comfort as when writing a detailed page in LATEX. The goal of this section is to describe some of the design choices that were made in that respect (Gonthier, et al, 2013, p.13).
With regard to mathematical theories and especially concerning representation and characters, the computer-assisted proof is organized into two levels: (1) the abstract, in which the hierarchy of structures provides interfaces, notations and shared theories for vectors, F-algebras1 and their morphisms; (2) On the concrete level, these structures are instantiated by particular models, centred on matrices. On this level, an extended Gaussian elimination procedure similar to LU decomposition2 played a key-role. Following Gonthier et al.:
The choices we have made are validated by the successful formalization of representation theory, which depends on both finite group theory and linear algebra. Thanks to the underlying formalization of linear algebra in terms of concrete matrices, it is fairly easy to define the representations of a given group G in terms of square matrices with coefficients in a given field F, as well as other important notions, like the enveloping algebra of a representation, or a group module. Part of the theory of group modules requires the extra assumption that F has a decidable first-order theory. The library includes formal proofs of the fundamental results of representation theory, including Schur’s lemma, Maschke’s theorem, the Jacobson density theorem, the Jordan-Hölder theorem, Clifford’s theorem, the Wedderburn structure theorem for semisimple rings, etc.”.
Indeed the success of the Odd Order formal proof relies both on the inductive types provided by Coq and on various reflection techniques, especially the transference of the methodology of “generic programming” to computer-assisted proofs, using the type inference mechanisms of the Coq system. The proof scripts include more than 150 000 lines and roughly 4000 definitions and 13 000 theorems. The roughly 250 pages of mathematics in the two main sources are translated to about 40 000 lines of computer-assisted proof. During the formalization, they had “to correct or rephrase a few arguments (…) but the most time-consuming part of the project involved getting the base and intermediate libraries right. This required systematic consolidation phases performed after the production of new material. The corpus of mathematical theories preliminary to the actual proof of the Odd Order Theorem represents the main reusable part of this work, and contributes to almost 80 percent of the total length” (Gonthier, 2013, p. 16).
The work involved in this proof is not only a computer-assisted proof of the Odd Order Theorem, but also, and more notably, a substantial library of mathematical components, and a tried and tested methodology that will support future formalization efforts.
Conclusions and Discussion Despite Kant’s well-known account of mathematics, today mathematicians agree that Mathematics consists of analytic truths, as such free of all imperfection, and since it is supposed to be an exact science, it is understandable that in the era of modern provers, the idea of a computer-assisted proof is reluctantly considered. Computer-assisted proof is regarded as far too monotonous and, indeed, may be more error-prone than the usual informal kind since formal manipulations become more complicated and underlying intuition begins to get lost.
From the Platonic mathematical view, mathematical entities exist, are abstract, and are independent of all our rational activities. For example, a Platonist might assert that the number π exists outside of space and time and has the characteristics it does regardless of any mental or physical activities of human beings.
Therefore, from a Platonist point of view, proof assistants and machine-checked proof seem, then, to be a remarkable effort to obtain complete rigour without the mental or physical activities of human beings.
Logicism, on the other hand, claims that a proof is a part of logic and the aim of this project was to produce the mathematical corpus without introducing concepts indefinable in logical terms or theorems which cannot be proved from the primitive sentences of a logical calculus using its rules of proof. Thus, computer-assisted proof could play a central role in the logicist agenda.
The formalist perspective states that a proof is not a body of propositions representing an abstract sector of reality but is much more akin to a game, bringing with it no more commitment to an ontology of objects or properties than ludo or chess (Weir, 2011), and this view has recently proposed that all of our formal mathematical knowledge should be systematically encoded in computer-readable formats, so as to facilitate automated proof checking of mathematical proofs and the use of interactive theorem proving in the development of mathematical theories and computer software3. The validity of any mathematical proposition rests upon the ability to demonstrate its truth through rigorous proof within an appropriate formal system. In formalism, the truths are not a matter of numbers, sets or triangles; rather they are syntactic forms whose shapes and locations have no meaning unless they are given semantics. Accordingly, constructive mathematics is positively characterized by the requirement that proof be algorithmic. This means that when a mathematical object is asserted to exist, an explicit example can be given: a constructive existence proof demonstrates the existence of a mathematical object. As far as the computer-assisted proof is concerned, much more emphasis has been placed on algorithmic procedures to obtain numerical results, and constructive mathematics has come into its own. Indeed, a constructive proof is exactly that: an algorithmic procedure to obtain a conclusion from a set of hypotheses. From the mathematical intuitionism perspective, mathematics is considered to be purely the result of the constructive mental activity of humans and, therefore, one may say that in the intuitionism of Brouwer there is no place for computer-assisted proofs.
In recent decades, several mathematicians have argued that there is much more to mathematics than formal systems. Mathematicians agree that when a proof is valid due to its form only and not due to its content, “it is likely to add very little to an understanding of its subject and ironically may not even be very convincing” (Hanna, 1991).
In the twentieth century, research in the Philosophy of Mathematics was concentrated on the nature of mathematical objects. In the second half, research moved away from foundational concerns and towards scientific knowledge and scientific understanding, and proof and computation did not receive much attention. This situation has changed in recent years and it seems that along with the growth of the significance of computers in mathematics, there has been increased philosophical reflection concerning computation.
In recent decades, we have seen mathematical proofs being machine-checked, such as the four-color theorem, the Kepler conjecture and the Odd Order theorem. The problem addressed in this article deals with the mathematical conception of a priori knowledge. On the one hand, it is tempting to say that the computer proves quasi-empirical knowledge; on the other hand, the notion of proof seems to lose its a priori character. Some researchers4 consider that the empirical factors on which we rely when we accept computer proofs do not appear as premises in the argument and, therefore, computer proofs are a priori after all.
Funding This work was supported by the research project “The Notion of the mathematical proof”, PTDC/MHC-FIL/5363/2012, at the Centre for Artificial Intelligence of the New University of Lisbon, funded by the Portuguese Foundation for Science and Technology, FCT.
Acknowledgements I would like to thank to professor Reinhard Kahle for his important thoughts and guidance.
References Avigad, J. (2004). Notes on a formalization of the Prime Number Theorem. Technical report, Carnegie Mellon
Blackburn, S. (2008). The Oxford Dictionary of Philosophy, Oxford: Oxford University Press.
Brouwer, L. E. J., 1905, Leven, Kunst en Mystiek. Partial English translation in Brouwer, 1975, pp. 1–9. Full English translation (by W. van Stigt) in Notre Dame Journal of Formal Logic, 37(3):389–429.
Burge, T., 1998. ‘Computer Proofs, A Priori Knowledge, and Other Minds’, Noûs, 32: 1–37.
Clapham, C., Nicholson, J. (2014). Oxford Concise Oxford Dictionary of Mathematics, Oxford: Oxford University Press.
Cocchiarella, N. (1986). Frege, Russell and Logicism: a Logical Reconstruction. In Frege Synthesized, Haaparanta an Hintikka (eds.). Holland: D. Reidel Publishing Company.
Demopoulos, W., Clark, P. (2005). The Logicism of Frege, Dedekind, and Russell. The Oxford Handbook of Philosophy of Mathematics and Logic, Oxford: Oxford University Press.
Detlefsen, M. (2005). Formalism. The Oxford handbook of Philosophy of Mathematics and Logic, Oxford: Oxford University Press.
Feit, W., Thompson, J. G. (1929). Solvability of groups of odd order. Pacific Journal of Mathematics, 13(3):775–1029.
Geuvers, H. (2009). Proof Assistants: history, ideas and future. Sahana, Vol 34, Part I, pp. 3-25.
Gödel, K. (1931), Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. and On formally undecidable propositions of Principia Mathematica and related systems I in Solomon Feferman, ed., 1986. Kurt Gödel Collected works, Vol. I. Oxford University Press: 144-195.
Gödel, K. (1964). Russell’s Mathematical Logic. Philosophy of Mathematics (Benacerraf and Putnam, eds.), Cambridge: Cambridge University Press.
Gonthier, G., Asperti, A, Avigad, J., Bertot, Y., Cohen, C. et al. A Machine-Checked Proof of the Odd Order Theorem. Sandrine Blazy and Christine Paulin and David Pichardie. ITP 2013, 4th Conference on Interactive Theorem Proving, July 2013, Rennes, France. Springer, 7998, pp.163-179, LNCS. <10.1007/978-3-642-39634-2 14>.
Hales, T. (2008). ‘Formal Proof’. Notices of the AMS, 55, 11.
Hanna, G., (1991). Mathematical Proof. In Tall, D. (ed.) Advanced Mathematical Thinking, Dordrecht: Kluwer Academic Publishers.
Harrison, J. (2008). ‘Formal Proof – Theory and Practice’. Notices of the AMS, 55, 11.
Heyting, A. (1964). The Intuitionistic Foundations of Mathematics. Philosophy of Mathematics, Cambridge: Cambridge University Press.
Hilbert, D., 1925. ‘On the Infinite’, in Benacerraf & Putnam 1983, 183–201.
Horsten, L., "Philosophy of Mathematics", The Stanford Encyclopedia of Philosophy (Fall 2014 Edition), Edward N. Zalta (ed.), URL = .
Kahle, R. (forthcoming). What is a mathematical proof? Axiomathes: 1-13.
Posy, C. (2005). Intuitionism and Philosophy. The Oxford Handbook of Philosophy of Mathematics and Logic, Oxford: Oxford University Press.
Rayo, A. (2005). Logicism Reconsidered. The Oxford Handbook of Philosophy of Mathematics and Logic. Oxford: Oxford University Press.
Russell, B. (1908). Mathematical Logic as Based on the Theory of Types, American Journal of Mathematics, 30, 222–262; repr. in Bertrand Russell, Logic and Knowledge, London: Allen and Unwin, 1956, 59–102; also appearing in Collected Papers, Vol. 5.
Russell, B. (1919) Introduction to Mathematical Philosophy, London: Routledge.
Russell, B., Whitehead, A. (1913), Principia Mathematica, Cambridge: Cambridge University Press.
Snapper, E. (1979). The Three Crises in mathematics: Logicism, Intuitionism and Formalism. In Mathematics Magazine Vol. 52, No. 4 (Sep., 1979), pp. 207-216.
Tall, D. (1991). Advanced Mathematical Thinking, Dordrecht: Kluwer Academic Publishers.
Troelstra, A. & van Dalen, D., 1988. Constructivism in Mathematics: An Introduction (Volumes I and II), Amsterdam: North-Holland.
Waterfield, Robin (1994). Plato: Republic. Translated, with notes and an introduction. Oxford: Oxford World's Classics.
Weir, A., (2011). "Formalism in the Philosophy of Mathematics", The Stanford Encyclopedia of Philosophy, Edward N. Zalta (ed.), URL = .
Wiedijk, F. (2005). The Seventeen Provers of the World, Berlin, Heidelberg: Springer-Verlag.
1F-algebras generalize algebraic structure. Rewriting the algebraic laws in terms of morphisms eliminates all references to quantified elements from the axioms, and these algebraic laws may then be glued together in terms of a single functor F, the signature.
2 In numerical analysis, LU decomposition (where 'LU' stands for 'Lower Upper', and also called LU factorization) factors a matrix as the product of a lower triangular matrix and an upper triangular matrix.
3 Mathematical intuitionists and constructivists in the “computability” tradition also advocate this idea.