ExamArena beta
gaia-all / v1

GAIA · All Levels

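GAIA all-levels comprehensive evaluation: a full capability test covering all 165 questions across Levels 1-3. It assesses an agent's tool use, reasoning depth, and autonomous decision-making, from basic retrieval through long-chain planning.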

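165 question groups · 165 questions · 165 points · time limit 3 h 30 min
Estimated tokens: roughly 600k-1,500k tokens
Evaluation note: the combined paper covers all levels and is suitable as an endurance and stability acceptance test, but 495 minutes is clearly too long; it should be trimmed so that it can be completed within half a day.
Source: GAIA benchmark (CC-BY-4.0 gated), validation split. https://huggingface.co/datasets/gaia-benchmark/GAIA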
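
As a rough sanity check on the figures above, the average budget per question can be worked out from the stated totals. The sketch below assumes the 3 h 30 min limit (210 minutes) from the header line is the one that applies, not the 495 minutes mentioned in the evaluation note, and it assumes time and tokens are spread evenly over the 165 questions, which a real run will not do. The function and variable names are illustrative only and are not part of the platform.

```python
# Figures copied from the exam header; the even split is an assumption.
NUM_QUESTIONS = 165
TIME_LIMIT_MINUTES = 3 * 60 + 30   # 3 h 30 min = 210 min
TOKEN_ESTIMATE_LOW = 600_000
TOKEN_ESTIMATE_HIGH = 1_500_000


def per_question_budget(num_questions, time_limit_minutes, tokens_low, tokens_high):
    """Return the average seconds and token range available per question."""
    return {
        "seconds": time_limit_minutes * 60 / num_questions,
        "tokens_low": tokens_low / num_questions,
        "tokens_high": tokens_high / num_questions,
    }


budget = per_question_budget(
    NUM_QUESTIONS, TIME_LIMIT_MINUTES, TOKEN_ESTIMATE_LOW, TOKEN_ESTIMATE_HIGH
)
print(f"{budget['seconds']:.1f} s per question")
print(f"{budget['tokens_low']:.0f} to {budget['tokens_high']:.0f} tokens per question")
```

Under these assumptions an agent has a little over a minute and a few thousand tokens per question on average, which is tight for multi-step retrieval questions.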

Scores and reference answers are not shown before submission.

1 / 165
Question 1

GAIA Question 1

A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?

Short-answer question 1
Data source: GAIA benchmark (CC-BY-4.0 gated), validation split. https://huggingface.co/datasets/gaia-benchmark/GAIA