# Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

arXiv: 2404.06405

## Abstract

Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method that outperforms an IMO gold medalist.
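
The headline counts in the abstract fit together through simple set arithmetic. Let $S_{\mathrm{AG}}$ be the set of IMO-AG-30 problems solved by AlphaGeometry, $S_{\mathrm{Wu}}$ the set solved by Wu's method alone, and $S_{\mathrm{sym}}$ the set solved by the fully symbolic combination of Wu's method with deductive databases and angle, ratio, and distance chasing. Using only the numbers reported above:

$$
\begin{aligned}
|S_{\mathrm{Wu}} \setminus S_{\mathrm{AG}}| &= 2, \qquad |S_{\mathrm{Wu}} \cap S_{\mathrm{AG}}| = 15 - 2 = 13,\\
|S_{\mathrm{AG}} \cup S_{\mathrm{Wu}}| &= |S_{\mathrm{AG}}| + |S_{\mathrm{Wu}} \setminus S_{\mathrm{AG}}| = 25 + 2 = 27,\\
|S_{\mathrm{AG}}| - |S_{\mathrm{sym}}| &= 25 - 21 = 4.
\end{aligned}
$$

The combined system therefore leaves 30 − 27 = 3 problems unsolved. Most of Wu's 15 solutions overlap with AlphaGeometry's, so Wu's method contributes to the combination through the 2 problems that AlphaGeometry misses.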

## Overview

- This paper revisits Wu's method, a classical algebraic method for proving geometry theorems, on the IMO-AG-30 benchmark introduced with AlphaGeometry. Wu's method alone solves 15 of the 30 problems. Combined with deductive databases and angle, ratio, and distance chasing, it solves 21, rivaling a silver medalist at the International Mathematical Olympiad (IMO).
- The paper also shows that combining Wu's method with AlphaGeometry, a neuro-symbolic system, solves 27 of the 30 problems. This outperforms an IMO gold medalist on this benchmark.

## Plain English Explanation

The paper revisits Wu's method, an established algebraic technique for proving geometry theorems, and finds that it is much stronger on Olympiad problems than previously reported. The Wu's-method baseline reported alongside AlphaGeometry solved 10 of 30 problems. In this note, Wu's method alone solves 15. Combined with other classic symbolic methods, it solves 21 of 30, rivaling a silver medalist at the International Mathematical Olympiad (IMO), one of the most challenging math competitions in the world.

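The researchers also combined Wu's method with AlphaGeometry, a neuro-symbolic system trained on 100 million synthetic samples that solves 25 of the 30 problems. Wu's method solves 2 of the 5 problems AlphaGeometry misses, so together they solve 27. That is more than an IMO gold medalist solves on this benchmark. Gold medalists are among the strongest young mathematicians in the world, so this is a notable result, although the benchmark contains only 30 geometry problems.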

Wu's method works very differently from a human solver. It does not reason step by step about angles and lines. Instead, it translates the hypotheses and the conclusion of a theorem into polynomial equations in the coordinates of the points, then uses exact symbolic algebra to check whether the conclusion follows from the hypotheses. Its proofs are therefore computations rather than human-readable geometric arguments, but it can settle problems that rule-based search methods miss.
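
Concretely, Wu's method reduces a geometry proof to polynomial algebra. Suppose the hypotheses have been rearranged into a triangular set $f_1, \dots, f_r$, where each $f_i$ introduces a new leading variable $x_i$ and has leading coefficient (its "initial") $I_i$. Successive pseudo-division of the conclusion polynomial $g$ by $f_r, \dots, f_1$ then gives Wu's remainder formula, shown here in its standard textbook form:

$$
I_1^{s_1} I_2^{s_2} \cdots I_r^{s_r}\, g \;=\; \sum_{i=1}^{r} q_i f_i \;+\; R,
$$

Here the $s_i$ are nonnegative integers and the $q_i$ are polynomials. If $R = 0$, then $g$ vanishes at every point where all hypotheses $f_i$ vanish and every initial $I_i$ is nonzero. The conditions $I_i \neq 0$ are the nondegeneracy conditions under which the theorem is proved.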

AlphaGeometry itself already pairs a neural language model with a symbolic deduction engine. Adding Wu's method helps because the two systems succeed on different problems, so combining them covers more of the benchmark than either does alone.

The implications are significant. First, a fully symbolic system running on a CPU-only laptop turns out to be a much stronger baseline for Olympiad geometry than previously reported. Second, the combined system surpasses gold-medalist performance on IMO-AG-30. This could lead to advances in automated theorem proving, computer-assisted math education, and intelligent tutoring systems.

## Technical Explanation

The paper revisits Wu's method, an algebraic approach to automated geometry theorem proving, and evaluates it on IMO-AG-30, the set of 30 International Mathematical Olympiad (IMO) geometry problems introduced with AlphaGeometry. [1]

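Wu's method is algebraic rather than synthetic. Each geometric hypothesis, such as a point lying on a line, becomes a polynomial equation in the point coordinates. The hypotheses are then rearranged into a triangular set in which each successive polynomial introduces one new dependent variable. Next, the conclusion polynomial is pseudo-divided by this set, from the last variable back to the first. If the final remainder is zero, the conclusion follows from the hypotheses, provided certain nondegeneracy conditions hold. Synthetic methods such as deductive databases and angle, ratio, and distance chasing work differently: they derive new geometric facts from known ones by applying inference rules. This yields more readable proofs, but it can miss proofs that the algebraic approach finds.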
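
Here is a minimal pure-Python sketch of Wu's method applied to a textbook theorem: the diagonals of a parallelogram bisect each other. It is not the implementation used in the paper. The coordinates, the variable order, the dict-based polynomial representation, and all function names (`prem`, `wu_remainder`, `sq_dist`) are illustrative assumptions. The triangular set is also built with a single hand-chosen pseudo-division rather than a general characteristic-set algorithm. A remainder of zero means the conclusion follows, assuming the nondegeneracy conditions hold: here, `x1` and the initial of `f3` must be nonzero.

```python
# Sketch: Wu's method on "the diagonals of a parallelogram bisect each other".
# Polynomials are dicts {exponent tuple over VARS: integer coefficient}.
from typing import Dict, Tuple

Poly = Dict[Tuple[int, ...], int]

# u* are free parameters; x* are dependent coordinates, ordered x1 < x2 < x3 < x4.
VARS = ("u1", "u2", "u3", "x1", "x2", "x3", "x4")
N = len(VARS)


def monomial(name: str, k: int = 1) -> Poly:
    e = [0] * N
    e[VARS.index(name)] = k
    return {tuple(e): 1}


def add(p: Poly, q: Poly) -> Poly:
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + c
        if r[e] == 0:
            del r[e]  # drop zero terms, so {} is the zero polynomial
    return r


def sub(p: Poly, q: Poly) -> Poly:
    return add(p, {e: -c for e, c in q.items()})


def mul(p: Poly, q: Poly) -> Poly:
    r: Poly = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            r[e] = r.get(e, 0) + c1 * c2
            if r[e] == 0:
                del r[e]
    return r


def degree(p: Poly, name: str) -> int:
    i = VARS.index(name)
    return max((e[i] for e in p), default=0)


def coeff(p: Poly, name: str, k: int) -> Poly:
    """Coefficient of name**k in p, as a polynomial in the remaining variables."""
    i = VARS.index(name)
    return {e[:i] + (0,) + e[i + 1:]: c for e, c in p.items() if e[i] == k}


def prem(g: Poly, f: Poly, name: str) -> Poly:
    """Pseudo-remainder of g by f with respect to the variable `name`."""
    d = degree(f, name)
    assert d >= 1, "f must contain the variable"
    init = coeff(f, name, d)  # the initial (leading coefficient) of f
    while g and degree(g, name) >= d:
        m = degree(g, name)
        # init*g - lc(g)*name^(m-d)*f cancels the name^m term of g.
        g = sub(mul(init, g), mul(mul(coeff(g, name, m), monomial(name, m - d)), f))
    return g


u1, u2, u3 = (monomial(v) for v in ("u1", "u2", "u3"))
x1, x2, x3, x4 = (monomial(v) for v in ("x1", "x2", "x3", "x4"))

# A=(0,0), B=(u1,0), D=(u2,u3), C=(x1,x2); E=(x3,x4) is the intersection of AC and BD.
f1 = sub(x1, add(u1, u2))                                # C = B + D - A, x-coordinate
f2 = sub(x2, u3)                                         # C = B + D - A, y-coordinate
f4 = sub(mul(x1, x4), mul(x2, x3))                       # E lies on AC
on_bd = sub(mul(u3, sub(x3, u1)), mul(sub(u2, u1), x4))  # E lies on BD
f3 = prem(on_bd, f4, "x4")                               # eliminate x4; leading variable x3
TRIANGULAR = [(f4, "x4"), (f3, "x3"), (f2, "x2"), (f1, "x1")]


def wu_remainder(g: Poly) -> Poly:
    """Successively pseudo-divide g by the triangular set, last variable first."""
    for f, name in TRIANGULAR:
        g = prem(g, f, name)
    return g


def sq_dist(p: Tuple[Poly, Poly], q: Tuple[Poly, Poly]) -> Poly:
    dx, dy = sub(p[0], q[0]), sub(p[1], q[1])
    return add(mul(dx, dx), mul(dy, dy))


A, B, C, E = ({}, {}), (u1, {}), (x1, x2), (x3, x4)
print(wu_remainder(sub(sq_dist(A, E), sq_dist(E, C))) == {})  # AE = EC: True
print(wu_remainder(sub(sq_dist(A, E), sq_dist(B, E))) == {})  # AE = BE: False
```

The second check fails as expected. AE = BE holds only when the parallelogram is a rectangle, and the nonzero remainder, 2·u1²·u2·u3, vanishes exactly when u2 = 0, that is, when AD is perpendicular to AB.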

The researchers show that combining Wu's method with AlphaGeometry solves 27 of the 30 problems, two more than AlphaGeometry's 25. This sets a new state of the art on IMO-AG-30 and outperforms an IMO gold medalist on this benchmark. [2] The improvement comes from complementarity: Wu's method solves 2 of the 5 problems that AlphaGeometry fails on.
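
The abstract's arithmetic (25 problems solved by AlphaGeometry plus 2 more from Wu's method, for 27) is consistent with combining the systems as a portfolio, in which a problem counts as solved if any component proves it. The sketch below illustrates that idea under this assumption. The solver callables, problem IDs, and function names are hypothetical placeholders, not the paper's code or data, and each solver is assumed to enforce its own time limit.

```python
# Sketch of a solver portfolio: a problem counts as solved if any component
# solver proves it. Solvers are hypothetical callables returning True on success
# (each is assumed to enforce its own per-problem time limit).
from typing import Callable, Dict, Iterable, Optional

Solver = Callable[[str], bool]


def run_portfolio(problems: Iterable[str], solvers: Dict[str, Solver]) -> Dict[str, Optional[str]]:
    """Map each problem to the first solver (in dict order) that proves it, or None."""
    results: Dict[str, Optional[str]] = {}
    for problem in problems:
        results[problem] = next((name for name, solve in solvers.items() if solve(problem)), None)
    return results


def solved_count(results: Dict[str, Optional[str]]) -> int:
    return sum(winner is not None for winner in results.values())


# Toy usage with placeholder problem IDs; these are not the paper's results.
toy_ag = {"p1", "p2", "p3"}
toy_wu = {"p2", "p4"}
results = run_portfolio(
    ["p1", "p2", "p3", "p4", "p5"],
    {"alphageometry": lambda p: p in toy_ag, "wu": lambda p: p in toy_wu},
)
print(results)
print(solved_count(results))  # 4
```

The solved count does not depend on solver order. Putting the cheaper solver first only means the more expensive one runs on fewer problems.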

The experiments also show that Wu's method alone solves 15 problems, including some that none of the other evaluated methods solve, compared with 10 for the Wu's-method baseline reported with AlphaGeometry. Combining Wu's method with deductive databases and angle, ratio, and distance chasing solves 21 problems on a CPU-only laptop with a 5-minute limit per problem, 4 fewer than AlphaGeometry. This is the first fully symbolic baseline strong enough to rival an IMO silver medalist.

## Critical Analysis

The paper presents a compelling approach to improving the geometric problem-solving abilities of AI systems, but it is important to consider some potential limitations and areas for further research.

One limitation is the benchmark itself. IMO-AG-30 contains only 30 problems, so a difference of a few problems is a large share of the total. Olympiad problems may also not represent the full range of geometric problems encountered in real-world applications. Evaluating Wu's method and AlphaGeometry on a larger, more diverse set of geometry problems would clarify how broadly these results apply.

Additionally, the note gives little detail on its specific implementation of Wu's method. AlphaGeometry's architecture and training procedure are described in the original AlphaGeometry work rather than here. More technical detail on the implementations and algorithms used would help researchers build on this work. [3]

Finally, while the results are impressive, it is unclear how the performance of the AI systems compares to human experts in other areas of mathematics or problem-solving more broadly. Extending this research to other mathematical domains could provide valuable insights into the strengths and limitations of the proposed approach.

## Conclusion

This paper shows that a classical algebraic method remains a strong tool for Olympiad geometry. Wu's method combined with deductive databases and angle, ratio, and distance chasing solves 21 of the 30 IMO-AG-30 problems without any learning, rivaling an IMO silver medalist. Wu's method combined with AlphaGeometry solves 27, outperforming an IMO gold medalist on this benchmark.

The results suggest that, at least for Olympiad geometry as measured by IMO-AG-30, automated systems can now match and even exceed top human competitors. This could lead to advances in automated theorem proving, computer-assisted math education, and intelligent tutoring systems. [4]

Overall, this work shows that classical symbolic methods are stronger baselines than previously reported. It also shows that combining them with learned systems such as AlphaGeometry is an effective way to push automated reasoning on complex, abstract problems further.

- [1] https://aimodels.fyi/papers/arxiv/advancing-geometric-problem-solving-comprehensive-benchmark-multimodal
- [2] https://aimodels.fyi/papers/arxiv/mathsensei-tool-augmented-large-language-model-mathematical
- [3] https://aimodels.fyi/papers/arxiv/saas-solving-ability-amplification-strategy-enhanced-mathematical
- [4] https://aimodels.fyi/papers/arxiv/large-language-models-mathematical-reasoning-progresses-challenges

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

## Related Papers

### GOLD: Geometry Problem Solver with Natural Language Description

Jiaxin Zhang, Yashar Moshfeghi

Addressing the challenge of automated geometry math problem-solving in artificial intelligence (AI) involves understanding multi-modal information and mathematics. Current methods struggle with accurately interpreting geometry diagrams, which hinders effective problem-solving. To tackle this issue, we present the Geometry problem sOlver with natural Language Description (GOLD) model. GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram. Subsequently, it converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems. Experiments show that the GOLD model outperforms the Geoformer model, the previous best method on the UniGeo dataset, by achieving accuracy improvements of 12.7% and 42.1% on the calculation and proving subsets. Additionally, it surpasses the former best model on the PGPS9K and Geometry3K datasets, PGPSNet, by obtaining accuracy enhancements of 1.8% and 3.2%, respectively.

5/2/2024

### GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving

Jiaxin Zhang, Zhongzhi Li, Mingliang Zhang, Fei Yin, Chenglin Liu, Yashar Moshfeghi

Recent advancements in large language models (LLMs) and multi-modal models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensive collection that includes a main subset of 2,000 problems, a 750-problem subset focusing on backward reasoning, an augmented subset of 2,000 problems, and a hard subset of 300 problems. This benchmark facilitates a deeper investigation into the performance of LLMs and MMs in solving geometry math problems. Our evaluation of ten LLMs and MMs across these varied subsets reveals that the WizardMath model excels, achieving a 55.67% accuracy rate on the main subset but only a 6.00% accuracy on the hard subset. This highlights the critical need for testing models against datasets on which they have not been pre-trained. Additionally, our findings indicate that GPT-series models perform more effectively on problems they have rephrased, suggesting a promising method for enhancing model capabilities.

5/20/2024

### FGeo-HyperGNet: Geometric Problem Solving Integrating Formal Symbolic System and Hypergraph Neural Network

Xiaokai Zhang, Na Zhu, Cheng Qin, Yang Li, Zhenbing Zeng, Tuo Leng

Geometric problem solving has always been a long-standing challenge in the fields of automated reasoning and artificial intelligence. We built a neural-symbolic system to automatically perform human-like geometric deductive reasoning. The symbolic part is a formal system built on FormalGeo, which can automatically perform geometric relational reasoning and algebraic calculations and organize the solving process into a solution hypertree with conditions as hypernodes and theorems as hyperedges. The neural part, called HyperGNet, is a hypergraph neural network based on the attention mechanism, including an encoder to effectively encode the structural and semantic information of the hypertree, and a solver to provide problem-solving guidance. The neural part predicts theorems according to the hypertree, and the symbolic part applies theorems and updates the hypertree, thus forming a predict-apply cycle to ultimately achieve readable and traceable automatic solving of geometric problems. Experiments demonstrate the correctness and effectiveness of this neural-symbolic architecture. We achieved a step-wise accuracy of 87.65% and an overall accuracy of 85.53% on the formalgeo7k dataset.

4/23/2024

### Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process

Tong Xiao, Jiayu Liu, Zhenya Huang, Jinze Wu, Jing Sha, Shijin Wang, Enhong Chen

Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and only focus on enhancing the capability of encoders, which neglects the essential characteristics of human geometry reasoning. In this paper, inspired by dual-process theory, we propose a Dual-Reasoning Geometry Solver (DualGeoSolver) to simulate the dual-reasoning process of humans for GPS. Specifically, we construct two systems in DualGeoSolver, namely Knowledge System and Inference System. Knowledge System controls an implicit reasoning process, which is responsible for providing diagram information and geometry knowledge according to a step-wise reasoning goal generated by Inference System. Inference System conducts an explicit reasoning process, which specifies the goal in each reasoning step and applies the knowledge to generate program tokens for resolving it. The two systems carry out the above process iteratively, which behaves more in line with human cognition. We conduct extensive experiments on two benchmark datasets, GeoQA and GeoQA+. The results demonstrate the superiority of DualGeoSolver in both solving accuracy and robustness from explicitly modeling human reasoning process and knowledge application.

5/13/2024