An open testbed for agentic control & trust

How it works

Trust is a property of the system, not the agent. SPARK Lab is an open testbed for building any system and measuring that trust, level by level. A first instrument, built on Recursiv.

Admissibility: Is it safe to trust?

Goal · Interpretability · Identity

You set the goal — your prompt, and what “correct” means — and choose the models and swarms in play.

Reliability: Will it do the right thing?

measured here

Verification

The lab’s headline metric. Every system runs several times; each run is scored against your eval, and we measure how often the system actually succeeds.
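As a rough illustration of the Verification metric, the sketch below runs one system several times and reports the fraction of runs that pass the eval. It is not the lab's implementation: `success_rate`, `run_system`, `is_correct`, and the default of five runs are hypothetical names and values chosen for this example.

```python
from typing import Callable


def success_rate(
    run_system: Callable[[str], str],
    is_correct: Callable[[str], bool],
    prompt: str,
    n_runs: int = 5,
) -> float:
    """Run a system n_runs times and return the fraction of passing runs.

    run_system and is_correct are hypothetical stand-ins for a configured
    system (models and swarms) and for your own definition of "correct".
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    # Score each run independently against the eval.
    successes = sum(1 for _ in range(n_runs) if is_correct(run_system(prompt)))
    return successes / n_runs
```

Repeating the run matters because a single success says little about a system whose output varies from one attempt to the next.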

Justification: Is it rational to delegate to it?

measured here

Recourse · Cost

The ranking: reliability per dollar. The leaderboard tells you which system is worth trusting with the task.
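One way to picture the ranking is sketched below, assuming "reliability per dollar" means the success rate divided by the average cost of a run. The page does not give the exact formula, so that ratio is an assumption, as are the names `Result`, `reliability_per_dollar`, and `leaderboard`.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str           # label for a configuration (models and swarms)
    reliability: float  # fraction of successful runs, between 0 and 1
    cost_usd: float     # average cost of one run, in dollars


def reliability_per_dollar(result: Result) -> float:
    """Assumed ranking score: reliability divided by cost per run."""
    if result.cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return result.reliability / result.cost_usd


def leaderboard(results: list[Result]) -> list[Result]:
    """Return configurations sorted from best to worst score."""
    return sorted(results, key=reliability_per_dollar, reverse=True)
```

Under this assumption, a cheaper configuration can outrank a more reliable one if the extra reliability costs disproportionately more.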

In practice: write a prompt, pick the models and swarms to compare, run it. The lab measures the Reliability and Justification of every configuration, against your own definition of correct.

Run a test →