gaia-level3 / v1
GAIA · Level 3
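GAIA Level 3 advanced evaluation: tests long-chain planning and advanced tool integration. Questions demand strong autonomy and error recovery, typically require more than 10 reasoning steps, and are suited to assessing the overall problem-solving ability of top-tier agents.
26 question groups · 26 questions · 26 points · Time limit: 1 hour 30 minutes
Estimated tokens: approx. 150k-450k tokens
Evaluation note: Level 3 has the highest per-question complexity. Although there are fewer questions, leave ample headroom for long-chain planning and for processing external material.
Source: GAIA benchmark (CC-BY-4.0 gated), validation split. https://huggingface.co/datasets/gaia-benchmark/GAIA
Scores and reference answers are not shown before submission.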
1 / 26
Question 1
GAIA Question 1
In July 2, 1959 United States standards for grades of processed fruits, vegetables, and certain other products listed as dehydrated, consider the items in the "dried and dehydrated section" specifically marked as dehydrated along with any items in the Frozen/Chilled section that contain the whole name of the item, but not if they're marked Chilled. As of August 2023, what is the percentage (to the nearest percent) of those standards that have been superseded by a new version since the date given in the 1959 standards?
Short answer · 1 point