PRBench

Physics Research Benchmark - Phase 1 Evaluation Results

Explore the evaluation results from PRBench, which measure how AI agents perform on physics paper reproduction tasks.

Code Only

Evaluation results using code-only context

Full Codex

Evaluation results with full paper context
