Test Finds AI Can Only Solve 2% of Complex Math Questions

Is the artificial intelligence (AI) on the market today smarter than humans? According to the FrontierMath benchmark, the answer is: not yet. FrontierMath was developed by Epoch AI to measure how well current AI models can solve complex mathematical problems.

The developers worked with more than 60 mathematicians to write hundreds of original, unpublished problems. When several AI models were tested, only 2% of the questions were answered correctly, even when the models were given assistance. Among the models tested, Gemini 1.5 Pro scored the highest, followed by Claude 3.5 Sonnet, o1-Preview, o1-Mini, GPT-4o, and Grok 2 Beta.

FrontierMath's results expose a weakness in existing math benchmarks such as GSM8K and MATH, on which AI models answer around 90% of the questions correctly. According to Epoch AI, this is because questions from these two benchmarks have leaked into training data, so the models simply regurgitate answers they have already seen. FrontierMath is still new, and its original questions can test an AI's real ability to solve problems on its own.
