Leaderboard
Which configuration wins?
Configurations are ranked by accuracy against ground truth, then by cost. Every row is a reproducible run pinned to its exact pattern and model. Run experiments in the console to populate this board.
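The ranking rule above can be sketched as a two-key sort. This is a minimal illustration, not the board's actual implementation: the `Run` record, its field names, and the sample values are hypothetical, and it assumes that higher accuracy ranks first and that ties are broken in favor of the lower cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One leaderboard row (hypothetical shape, for illustration only)."""
    pattern: str     # the pattern the run is pinned to
    model: str       # the model the run is pinned to
    accuracy: float  # fraction of answers matching ground truth, 0.0 to 1.0
    cost: float      # total cost of the run, in any consistent unit


def rank(runs):
    """Sort by accuracy (highest first), then by cost (lowest first)."""
    # Negating accuracy makes the ascending sort put the best accuracy on top;
    # cost stays ascending. Assumption: cheaper wins when accuracy is tied.
    return sorted(runs, key=lambda r: (-r.accuracy, r.cost))


# Made-up sample rows, not real results.
runs = [
    Run("pattern-a", "model-x", accuracy=0.90, cost=2.0),
    Run("pattern-b", "model-y", accuracy=0.90, cost=1.0),
    Run("pattern-c", "model-z", accuracy=0.95, cost=5.0),
]
ranked = rank(runs)
print([r.pattern for r in ranked])  # ['pattern-c', 'pattern-b', 'pattern-a']
```

Under these assumptions, cost only matters between configurations with identical accuracy, so a cheaper run never outranks a more accurate one.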
Can the agents catch their own mistakes?
Open console →
No runs yet. Run this job to put the first configuration on the board.