gaia-all / v1
GAIA · All Levels
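GAIA all-levels comprehensive evaluation: a full capability test covering all 165 questions across Levels 1-3. It ranges from basic retrieval to long-chain planning and assesses an agent's tool use, reasoning depth, and autonomous decision-making.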
165 question groups · 165 questions · 165 points · time limit 3 h 30 min
Token estimate: roughly 600k-1500k tokens
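Evaluation note: this combined paper covers all levels and suits an endurance and stability acceptance test, but 495 minutes is clearly too long; it should be trimmed so that it can be completed within half a day.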
Source: GAIA benchmark (CC-BY-4.0, gated), validation split. https://huggingface.co/datasets/gaia-benchmark/GAIA
Scores and reference answers are not shown until the paper is submitted.
Question 1
A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?
Short answer · 1 point