About ClawSplit
ClawSplit is a scientific A/B testing platform for OpenClaw agent prompts, SOUL.md configurations, and skill setups. We replace gut-feel prompt iteration with statistical evidence — so you ship the variant that actually performs better.
The problem
Prompt engineering today is trial and error. You tweak a SOUL.md, run a few tasks manually, and hope the new version is better. There is no way to know whether a change actually improved task completion, reduced cost, or lowered latency — until something breaks in production.
Our approach
ClawSplit runs your prompt variants in parallel against real workloads. It measures task completion rate, token cost, latency, and any custom metrics you define. When one variant shows a statistically significant improvement, ClawSplit declares a winner and can optionally promote it to production.
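To make the "statistically significant" step concrete, here is a minimal sketch of one standard way to compare two variants' task completion rates: a two-proportion z-test. This is an illustration of the general technique, not ClawSplit's actual statistics engine; the function name and sample numbers are hypothetical.

```python
import math

def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Two-proportion z-test: does variant B's completion rate differ from A's?
    (Hypothetical helper for illustration -- not ClawSplit's real API.)"""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Made-up numbers: variant A completed 780/1000 tasks, variant B 840/1000
z = two_proportion_z(780, 1000, 840, 1000)
print(round(z, 2))  # |z| > 1.96 means significant at the 5% level
```

With these sample counts the z-score comes out well above 1.96, so a platform using this test would declare variant B the winner at the 5% significance level.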
What you can test
SOUL.md personality and tone, system prompt phrasing, skill configurations, guardrail thresholds, model parameters, and any other agent configuration that affects output quality or cost. If you can change it in a config file, you can A/B test it.
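As an illustration of "if you can change it in a config file, you can A/B test it", here is a minimal sketch of how an experiment over two SOUL.md variants might be described. All names, fields, and file paths below are hypothetical assumptions, not ClawSplit's real configuration schema.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    name: str
    config_path: str  # hypothetical: path to the SOUL.md or skill config under test

@dataclass
class Experiment:
    name: str
    variants: list
    # Default metrics are assumed for illustration
    metrics: list = field(default_factory=lambda: ["completion_rate", "token_cost", "latency_ms"])
    traffic_split: float = 0.5  # fraction of tasks routed to the challenger

exp = Experiment(
    name="soul-tone-test",
    variants=[
        Variant("control", "souls/formal.md"),      # made-up path
        Variant("challenger", "souls/casual.md"),   # made-up path
    ],
)
print(exp.metrics)  # ['completion_rate', 'token_cost', 'latency_ms']
```

The key design point is that the experiment references config files rather than embedding prompt text, so anything expressible as a file swap can be put under test.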
The team
We are part of the OpenClaw ecosystem — a community of builders creating open-source tools for AI agent development. ClawSplit is built by engineers who have run experimentation platforms at scale and believe that data should drive every prompt change.