Physics Research Benchmark - Phase 1 Evaluation Results
Explore evaluation results from PRBench, showing how AI agents perform on physics paper reproduction tasks.
Evaluation results with code-only context
Evaluation results with full paper context