gaia-level2 / v1
GAIA · Level 2
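GAIA Level 2 intermediate evaluation: a test of multi-step reasoning and tool composition. Questions typically require 5-10 steps of coordinated reasoning and assess an agent's ability to decompose problems and integrate information from multiple sources in complex information environments.
86 question groups · 86 questions · 86 points · time limit 2 hours
Estimated tokens: about 250k-700k tokens
Evaluation rationale: Level 2 requires multi-step reasoning and tool coordination. A limit of about 2 hours discriminates better between agents, and avoids an overly long 258-minute session that would run over time.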
Source: GAIA benchmark (CC-BY-4.0 gated), validation split. https://huggingface.co/datasets/gaia-benchmark/GAIA
Scores and reference answers are not shown before submission.
1 / 86
Question 1
GAIA Question 1
A paper about AI regulation that was originally submitted to arXiv.org in June 2022 shows a figure with three axes, where each axis has a label word at both ends. Which of these words is used to describe a type of society in a Physics and Society article submitted to arXiv.org on August 11, 2016?
Short answer · 1 point