Test your prompts. Ship the winner.
Run two prompt variants in parallel and ship the winner, backed by statistical evidence.
Features
Stop guessing which prompt is better
ClawSplit replaces gut-feel iteration with statistical evidence. Run experiments, measure outcomes, ship the winner.
How it works
From hypothesis to winner in 4 steps
No statistics degree required. ClawSplit handles the math.
Create an experiment
Pick which config to test — SOUL.md, a skill, or model routing rules. Upload your variants.
Define your test suite
Add the tasks your agent should handle. ClawSplit runs each variant against the same inputs for a fair comparison.
Let it run
ClawSplit executes both variants in parallel, tracking completion rate, cost, latency, and quality scores.
Ship the winner
Review the results dashboard, see statistical significance, and promote the winning variant to production with one click.
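The four steps above boil down to a simple loop: run both variants over the same task inputs, then compare outcomes. A minimal sketch of that loop, assuming a stubbed agent call (the task list and `run_agent` stub are illustrative, not the ClawSplit API):

```python
import random

TASKS = ["refund request", "password reset", "billing question", "cancel plan"]

def run_agent(variant: str, task: str) -> bool:
    """Stub for an agent call; returns True if the task completed.
    A real run would invoke your agent with the variant's SOUL.md."""
    rng = random.Random(sum(map(ord, variant + task)))  # deterministic stub
    return rng.random() < (0.9 if variant == "B" else 0.7)

# Both variants see the SAME inputs, so differences come from the
# variants themselves, not from the tasks they happened to draw.
completion = {
    v: sum(run_agent(v, t) for t in TASKS) / len(TASKS)
    for v in ("A", "B")
}
```

The key design point is the paired run: holding the task set fixed across variants is what makes the completion-rate comparison fair.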
Compare
What changes with ClawSplit
Trusted by teams
What builders are saying
“We were iterating on our support agent SOUL.md by feel for weeks. One experiment showed Variant B was 30% cheaper with identical completion rates.”
“The cost optimizer router cut our token spend by routing simple queries to cheaper models. Took five minutes to set up.”
“Prompt engineering finally feels like engineering. Hypothesize, test, measure, ship. The experiment loop is exactly what was missing.”
FAQ
Common questions
- Do I need to change my existing agent setup? No. ClawSplit works with your existing SOUL.md and skills. Upload two variants, point it at your agent, and start an experiment. Zero config changes to your running agent.
- How does ClawSplit decide which variant wins? ClawSplit uses standard hypothesis testing (a two-proportion z-test for completion rates, Welch's t-test for cost and latency). You set your confidence threshold and ClawSplit tells you when a winner is clear.
- What metrics does ClawSplit track? Task completion rate, token cost, latency, and custom quality scores. Pro users also get model-level breakdowns and cost optimizer analytics.
- Can I A/B test model routing rules? Yes. The cost optimizer lets you define routing rules (e.g., send simple tasks to Haiku) and A/B test those rules against your current setup.
- Is there a free plan? Yes. The Starter plan is free forever — 2 concurrent experiments, basic metrics, and 7-day history. No credit card required.
- Does ClawSplit work with local models? Yes. ClawSplit works with any model your OpenClaw agent can talk to — including local models via Ollama, LM Studio, and other self-hosted inference servers. Point your variants at different local models or compare a local model against an API-hosted one. All metrics (completion rate, latency, cost) work the same way.
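To make the two-proportion z-test mentioned in the FAQ concrete, here is a self-contained sketch of how completion rates can be compared (the sample counts are made up for illustration; this is the standard textbook test, not ClawSplit's internal code):

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two_sided_p)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical experiment: Variant A completed 78/100 tasks, Variant B 90/100.
z, p = two_proportion_ztest(78, 100, 90, 100)
```

With a conventional 0.05 threshold, a p-value below it would let you call Variant B the winner; ClawSplit surfaces this decision against whatever confidence threshold you configure.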
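A routing rule like "send simple tasks to Haiku" can be as small as a length heuristic. A hypothetical sketch (the model names, the `route` function, and the 20-word threshold are all illustrative assumptions, not ClawSplit defaults):

```python
CHEAP_MODEL = "claude-haiku"    # illustrative model names
STRONG_MODEL = "claude-sonnet"

def route(query: str, word_limit: int = 20) -> str:
    """Pick a model using a crude complexity proxy: query length.
    Short queries go to the cheap model, everything else to the strong one."""
    return CHEAP_MODEL if len(query.split()) <= word_limit else STRONG_MODEL
```

The point of A/B testing a rule like this is that the threshold is a guess until measured: running the rule against your current setup tells you whether the cheaper routing actually holds completion rate steady.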
Run your first experiment in 30 seconds
No signup needed. See real results with live LLM calls.
Try it free →