A new OpenAI benchmark, GDPval, tests AI models on things people actually do in their jobs — and finds that Claude is about as good as a human for government work
AI models are getting really good at things…