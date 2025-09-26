
OpenAI has introduced a new metric, GDPval, to measure how the work of its latest AI models, including GPT-5, compares with that of human professionals on tasks tied to America's economy. The benchmark is an early step toward fulfilling the company's mission to develop Artificial General Intelligence (AGI) capable of economically valuable work.
OpenAI's initial GDPval-v0 test, which assesses performance across 44 occupations in nine major industries (like healthcare, finance, and manufacturing), suggests that advanced models are "already approaching the quality of work produced by industry experts."
GPT-5-high: A souped-up version of GPT-5 was ranked as better than or on par with industry experts 40.6% of the time in the tasks tested.
Anthropic's Claude Opus 4.1: This competing model performed slightly better, winning or tying against human reports in 49% of tasks. OpenAI attributes this high score partly to Claude's ability to produce pleasing graphics, which may have swayed human professional evaluators.
The test asks experienced professionals to compare AI-generated reports (e.g., a competitor landscape of the kind an investment banker would prepare) with those created by human professionals, then select the best one.
OpenAI acknowledges that GDPval-v0 is currently limited, as it only tests the creation of research reports, which is just a small component of any professional's actual job. However, the progress shown is significant:
OpenAI's previous model, GPT-4o, released approximately 15 months ago, scored just 13.7% (wins and ties).
The nearly threefold increase in performance with GPT-5 encourages OpenAI's evaluations lead, Tejal Patwardhan, who expects the rapid improvement to continue.
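The "nearly threefold" figure can be checked with simple arithmetic on the two reported scores. The sketch below is only an illustration: the function name `win_or_tie_rate` and the sample verdicts are hypothetical, not OpenAI's grading code or GDPval data. It assumes the headline score is simply the share of comparisons in which the model's report was judged better than or on par with the human one.

```python
def win_or_tie_rate(verdicts):
    """Percentage of comparisons in which the model's output won or tied."""
    favorable = sum(1 for v in verdicts if v in ("win", "tie"))
    return 100.0 * favorable / len(verdicts)


# Hypothetical verdicts for illustration only, not GDPval data.
sample = ["win", "loss", "tie", "loss", "loss"]
sample_rate = win_or_tie_rate(sample)  # 2 favorable out of 5 comparisons

# Scores reported in the article (wins and ties, in percent).
gpt_4o = 13.7
gpt_5_high = 40.6

# Ratio of the newer score to the older one.
improvement = gpt_5_high / gpt_4o
print(f"GPT-5-high vs GPT-4o: {improvement:.2f}x")
```

The ratio comes out just under 3, which matches the article's description of a nearly threefold increase.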
OpenAI's chief economist, Dr. Aaron Chatterji, suggests these results mean that people in these occupations can use the increasingly capable AI models to "offload some of their work and do potentially higher value things."
Benchmarks like GDPval are becoming crucial as existing academic AI tests, such as AIME 2025 (math) and GPQA Diamond (science), near saturation. GDPval aims to provide a more real-world assessment of AI's proficiency, a critical step as the industry attempts to measure AI's value across various sectors.