OpenAI tests AI against human professionals in 44 occupations

GDPval, a new benchmark from OpenAI, tests leading models such as GPT-5, Claude Opus 4.1, Gemini 2.5, and Grok 4 against human professionals in 44 occupations to determine whether AI can match the quality of professional work.

The specifics: GDPval evaluated 1,320 tasks created by experts averaging 14 years of experience across nine economic sectors, including finance and healthcare. Opus 4.1 scored highest (47.6% win rate) and excelled at visual presentation tasks, while GPT-5 led in technical accuracy. OpenAI also found that performance tripled from GPT-4o to GPT-5 over a 15-month span, a sign of rapid improvement on workplace tasks.

Despite headlines about imminent workforce replacement, GDPval shows that even the best models are only approaching parity with professionals on certain tasks. But if this benchmark follows the pattern of others in the AI field, it won't be long before more advanced models make a major leap after just a few more months of progress.