Leaderboard

Which configuration wins?

Configurations ranked by accuracy against ground truth, then by cost. Every row is a reproducible run pinned to its exact pattern and model. Run experiments in the console to populate this board.

Can the agents catch their own mistakes?

Open console →
No runs yet. Run this job to put the first configuration on the board.