Episode 526: Terminal-Bench 2.0

16:34 Mar 9, 2026
About this episode
Seventy3: paper discussions produced with NotebookLM, covering topics including AI and crypto.

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Summary

AI agents may soon be able to autonomously complete valuable long-horizon tasks, but current benchmarks either do not measure real-world tasks or are not difficult enough to meaningfully evaluate frontier models. Terminal-Bench 2.0 is a carefully curated, hard benchmark of 89 tasks set in computer terminal environments and inspired by problems from real workflows. Each task comes with a unique environment, a human-written solution, and comprehensive tests for verification. Frontier models and agents score below 65% on the benchmark, and the authors carry out an error analysis to identify where models and agents need to improve. The dataset and evaluation harness are publicly released.

https://arxiv.org/abs/2601.11868