Benchmark Model - Search News

Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks

Real environments can't inject edge cases on demand. Alibaba's Qwen-AgentWorld simulates them — and outperformed ...

What Legal AI Benchmarks Reveal That Model Names Don’t

By Daniel Lewis, CEO, LegalOn. Foundation models are improving quickly. One useful measure is software engineering: the ...

1mon

Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

Microsoft's new vulnerability-scanning system, codenamed MDASH, scored 88.45% on the CyberGym benchmark, surpassing single-model systems from Anthropic and OpenAI by using more than 100 specialized AI ...

25d

MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost

M3 demonstrates that the next phase of agent development will not just be driven by larger datasets, but by efficient ...

12d

ACE ROBOTICS’ Kairos World Model Leads Multiple Global Embodied-Intelligence Benchmarks

SHANGHAI, CHINA - Media OutReach Newswire - 15 June 2026 - ACE ROBOTICS today announced that its open-source Kairos ...

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

Decrypt

Rio de Janeiro Built an AI Model That Beat DeepSeek—But Was Based on Someone Else's Work

Rio de Janeiro released a frontier-class AI model that claimed to beat Alibaba's best. Then Nex showed up with receipts.
