Can the agents catch their own mistakes?
Does adding a validation step let a swarm catch errors a single agent misses — and what does it cost?
Setup · 4 seeded errors
Performance review · vs ground truth
Run the job to score its output against ground truth. To compare approaches, run both the CoT baseline and a validation pattern.