This is a preview — the A/B testing platform is under active development

Experiment

Completed

Support Tone Test

Testing whether a friendly tone outperforms a professional tone for customer support task completion.

Created: 2026-03-18 · Duration: 4 days · Samples: 2,000

Variant A

Professional

You are a professional customer support agent. Use formal language, be concise and precise.

87.2% success rate

Variant B
Winner

Friendly

You are a friendly, helpful support agent. Use warm language, empathize with the customer, and guide them step by step.

94.1% success rate

Detailed Results

Variant                   Samples  Success Rate  Avg Tokens  Avg Latency  Cost / Success
Variant A (Professional)  1,000    87.2%         342         1.8s         $0.0041
Variant B (Friendly)*     1,000    94.1%         387         2.1s         $0.0038
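
The table mixes an absolute difference (success rate) with metrics that are easier to compare as relative changes (tokens, latency, cost per success). The following is a minimal sketch of that arithmetic using only the figures in the table; the `VariantStats` class and `relative_change` helper are hypothetical names introduced for illustration and are not part of the platform.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical container for one row of the results table."""
    name: str
    samples: int
    success_rate: float      # fraction, e.g. 0.872 for 87.2%
    avg_tokens: float
    avg_latency_s: float
    cost_per_success: float  # USD


def relative_change(baseline: float, candidate: float) -> float:
    """Change of candidate relative to baseline, as a fraction of baseline."""
    return (candidate - baseline) / baseline


# Values copied from the Detailed Results table.
a = VariantStats("Professional", 1000, 0.872, 342, 1.8, 0.0041)
b = VariantStats("Friendly", 1000, 0.941, 387, 2.1, 0.0038)

summary = {
    # Absolute difference in percentage points.
    "success_rate_pp": (b.success_rate - a.success_rate) * 100,
    # Relative differences in percent, with Variant A as the baseline.
    "avg_tokens_pct": relative_change(a.avg_tokens, b.avg_tokens) * 100,
    "avg_latency_pct": relative_change(a.avg_latency_s, b.avg_latency_s) * 100,
    "cost_per_success_pct": relative_change(a.cost_per_success, b.cost_per_success) * 100,
}

for metric, value in summary.items():
    print(f"{metric}: {value:+.1f}")
```

Read this way, Variant B uses more tokens and has higher latency per sample, yet its cost per success is lower because more of its samples succeed.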

Statistically Significant

p < 0.001 · 95% CI: [4.2%, 9.6%] · z = 5.23

Variant B outperforms Variant A by 6.9 percentage points (94.1% vs. 87.2% success rate).
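
For readers who want to see where figures like these come from, here is a minimal sketch of a two-proportion z-test with a 95% Wald confidence interval, using only the Python standard library. The page does not state which test or interval method the platform uses, so this is an assumption rather than a description of its implementation; the function name `two_proportion_z_test` is hypothetical, and the success counts (872 and 941) are derived from the reported rates and sample sizes.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b, z_crit=1.96):
    """Compare two success rates (B minus A).

    Returns (difference, z statistic, two-sided p-value, confidence interval).
    Assumes independent samples that are large enough for the normal
    approximation to hold.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    diff = p_b - p_a

    # Test statistic: pooled standard error under the null hypothesis
    # that both variants share the same success rate.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Confidence interval: unpooled (Wald) standard error;
    # z_crit = 1.96 corresponds to a 95% interval.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, z, p_value, ci


# 87.2% of 1,000 and 94.1% of 1,000, as reported above.
diff, z, p_value, ci = two_proportion_z_test(872, 1000, 941, 1000)
print(f"difference: {diff * 100:+.1f} pp")
print(f"z = {z:.2f}, p = {p_value:.2e}")
print(f"95% CI: [{ci[0] * 100:.1f}%, {ci[1] * 100:.1f}%]")
```

With these inputs the sketch gives a z statistic of about 5.3 and an interval of roughly [4.4%, 9.4%], close to but not identical to the reported z = 5.23 and [4.2%, 9.6%]. Small differences of this kind are expected when the exact method (pooled vs. unpooled error, continuity correction, interval type) differs.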

Results gallery

What real experiments look like

Every experiment produces a clear result: a winner, a cost difference, and the statistical confidence to back it up.

Task completion

SOUL.md Tone Test

Winner: Concise tone

+12% completion · -23% token cost · p < 0.01 · n=1,000

Cost per task

Model Routing

Winner: Haiku for simple tasks

-38% cost · Same completion rate · p < 0.001 · n=5,000

Safety + completion

Guardrail Strictness

Winner: Medium guardrails

+8% completion · 0 safety incidents · p < 0.05 · n=2,000

Output quality

Few-shot Examples

Winner: 3 examples (vs 1)

+15% quality score · +$0.002/task · p < 0.01 · n=800

Want to run experiments like this?

Join the waitlist to get early access to ClawSplit and start A/B testing your agent prompts.

Free forever. No credit card required.