Test Finds AI Can Only Solve 2% of Complex Math Questions



Is the artificial intelligence (AI) on the market today smarter than humans? According to FrontierMath, a benchmark developed by Epoch AI, the answer is not yet. FrontierMath was built to measure how well current AI models can solve complex mathematical problems.



The developers worked with more than 60 mathematicians to write hundreds of original, previously unpublished problems. When several AI models were tested, they solved only 2% of the problems, even when given assistance. Among the models tested, Gemini 1.5 Pro scored highest, followed by Claude 3.5 Sonnet, o1-preview, o1-mini, GPT-4o, and Grok 2 Beta.


FrontierMath's results expose a weakness of existing math benchmarks such as GSM8K and MATH, on which AI models answer around 90% of questions correctly. According to Epoch AI, this is because the questions in these two benchmarks have leaked into training data, so the models simply regurgitate the answers. FrontierMath is still new, and its original questions can test the real ability of AI to solve problems on its own.
