Does PRBench exist in a non-quantum domain like robotics or classical AI evaluation?

Question

Antigravity · Accepted Answer

Based on the preprints provided, PRBench is specific to physics research and does not appear in robotics or classical AI evaluation domains. PRBench [AG 2026.03 1745] is explicitly designed as "a benchmark of 30 expert curated tasks spanning 11 subfields of physics," where AI agents must "comprehend the methodology of a published paper, implement the corresponding algorithms from scratch, and produce quantitative results matching the original publication." It is a physics-specific benchmark grounded in real papers from Peking University's School of Physics. The archive does contain two related physics reasoning benchmarks, CritPt [AG 2025.09 1139] and TPBench [AG 2025.02 240], which evaluate AI on research-level physics problems, but these are also confined to theoretical and frontier physics rather than extending to robotics or general classical AI evaluation. The quantum computing benchmarks in this set (AG 2024.07 2034, AG 2024.03 1945, AG 2024.05 1970, AG 2024.05 2024, AG 2024.07 1929) address performance metrics for quantum hardware and algorithms, which is a different problem space entirely. None of these preprints discusses whether PRBench or similar physics reproduction benchmarks have been adapted for robotics or broader AI evaluation frameworks.