
5 prompt optimization techniques that actually work

There is no shortage of prompt engineering advice on the internet. Most of it is generic ("be specific," "give examples") and none of it comes with data. We looked at the results of over 4,000 A/B tests run through ClawSplit to find out which optimization techniques actually move the needle for production agents. Five stood out.

## 1. Replace vague instructions with observable behaviors

"Be helpful and concise" is the most common instruction in SOUL.md files. It is also one of the least effective. The problem is that "helpful" and "concise" mean different things in different contexts, and the model has to guess what you mean every time.

Compare that to: "For factual questions, answer in one to two sentences. For troubleshooting questions, list the three most likely causes, then ask which one the user wants to investigate." This version tells the model exactly what "helpful and concise" looks like in practice.

In our data, replacing vague instructions with specific, observable behaviors improved consistency scores by 18-25% across the board. It is the single highest-ROI change you can make to most SOUL.md files.
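As a sketch, a rewrite along these lines might look like this in a SOUL.md (the rule text here is illustrative, not taken from a real deployment):

```markdown
<!-- Before: vague -->
Be helpful and concise.

<!-- After: observable behaviors -->
- For factual questions, answer in one to two sentences.
- For troubleshooting questions, list the three most likely causes,
  then ask which one the user wants to investigate.
- Do not restate the user's question before answering it.
```

Each rewritten rule describes something you could check in a transcript, which is also what makes it A/B testable.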

## 2. Add explicit output format constraints

When your agent's output is consumed by other software (APIs, downstream agents, UI components), format matters as much as content. But most prompts leave format implicit. The model figures it out from context, and it works 80% of the time. The other 20% causes parsing errors, broken UIs, or confused downstream agents.

Adding explicit format instructions is simple: "Respond with a JSON object containing exactly three fields: answer (string), confidence (number between 0 and 1), and sources (array of strings)." This kind of constraint reduced parsing errors by 35% on average in our dataset.

The key is being precise about the format. Do not just say "respond in JSON." Specify the exact fields, types, and constraints. The more explicit you are, the more reliably the model follows the format.
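The same precision pays off on the consuming side. Here is a minimal sketch of validating the agent's output against the three-field schema from the example above (the function name and error messages are illustrative):

```python
import json

def validate_response(raw: str) -> dict:
    """Parse and check the agent's JSON output before passing it downstream.

    Expected schema from the prompt: answer (string),
    confidence (number between 0 and 1), sources (array of strings).
    """
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if set(data) != {"answer", "confidence", "sources"}:
        raise ValueError(f"unexpected fields: {sorted(data)}")
    if not isinstance(data["answer"], str):
        raise ValueError("answer must be a string")
    conf = data["confidence"]
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("confidence must be a number between 0 and 1")
    if not (isinstance(data["sources"], list)
            and all(isinstance(s, str) for s in data["sources"])):
        raise ValueError("sources must be an array of strings")
    return data

ok = validate_response('{"answer": "Paris", "confidence": 0.97, "sources": ["atlas"]}')
```

A validator like this turns the 20% of silent format drift into loud, loggable failures, which is exactly the signal you need to decide whether the format instruction is working.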

## 3. Use conditional behavior rules

Static rules apply the same behavior to every situation. Conditional rules adapt. "If the user message is under 20 words, respond in one paragraph. If it is over 100 words, mirror the user's level of detail." Agents with conditional rules score 22% higher on user satisfaction because their responses feel proportionate to the input.

The trick is keeping conditions simple and mutually exclusive. If you have overlapping conditions, the model has to choose between them, and it will not always choose the way you expect. Three to five conditional rules covering the most common scenarios is the sweet spot. More than that and you are adding complexity without measurable benefit.
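A sketch of a small, mutually exclusive rule set in a SOUL.md (the thresholds and wording are illustrative):

```markdown
## Response length
- If the user message is under 20 words, respond in one paragraph.
- If the user message is 20 to 100 words, respond in up to three paragraphs.
- If the user message is over 100 words, mirror the user's level of detail.
```

Note that the three conditions partition the input space: any message matches exactly one rule, so the model never has to arbitrate between them.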

## 4. Front-load the most important instructions

Language models pay more attention to instructions that appear early in the prompt. This is not a theory; it is a measurable phenomenon. In our A/B tests, moving a critical behavior rule from the bottom of a 300-line SOUL.md to the top improved compliance with that rule by 12-15%.

Structure your SOUL.md so the most important constraints come first. Identity and core behavior rules belong in the first 50 lines. Edge cases and nice-to-haves go further down. If you are not sure what the most important rules are, look at your failure data. The rules your agent violates most often are the ones that need the most prominent placement.
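One way to order a SOUL.md along these lines (the section names are illustrative, not a required template):

```markdown
# SOUL.md — suggested ordering
1. Identity: who the agent is and who it serves (first ~10 lines)
2. Core behavior rules: the constraints it must never violate
3. Output format requirements
4. Conditional behavior rules for the most common scenarios
5. Edge cases and nice-to-haves (bottom of the file)
```

The ordering doubles as a review checklist: if a rule your agent keeps violating sits in section 5, that is a cheap A/B test waiting to be run.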

## 5. Add few-shot examples for your hardest tasks

Few-shot examples are input-output pairs that show the model what you want. They are more effective than instructions alone for tasks where the expected behavior is hard to describe in words.

The key insight from our data: few-shot examples help most for your hardest tasks, not your easiest ones. If your agent already handles simple questions well, adding examples for those will not move the needle. But adding two or three examples for the edge cases that trip your agent up regularly can improve first-attempt success on those cases by 25-35%.

Keep examples realistic. Use actual user messages from your logs, not idealized inputs. And keep the set small. Three well-chosen examples outperform ten mediocre ones because they do not waste context window space.
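A minimal sketch of assembling a few-shot block from logged hard cases (the helper function and the example pairs are hypothetical, standing in for real transcripts from your logs):

```python
# Hypothetical (user_message, ideal_response) pairs for known hard cases.
HARD_CASES = [
    ("my build passes locally but fails in CI",
     "The three most likely causes are: 1) environment differences, "
     "2) files not committed, 3) stale caches. Which should we check first?"),
    ("error: EADDRINUSE on port 3000",
     "Another process is already bound to port 3000. Find it with "
     "`lsof -i :3000`, stop it, then restart your server."),
]

def build_few_shot_block(examples: list[tuple[str, str]]) -> str:
    """Format input-output pairs for inclusion in the system prompt."""
    parts = [f"User: {user}\nAssistant: {ideal}" for user, ideal in examples]
    return "\n\n".join(parts)

few_shot = build_few_shot_block(HARD_CASES)
```

Generating the block from a small curated list keeps the examples easy to rotate as your failure data changes, without hand-editing the prompt each time.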

## Putting it all together

These five techniques are not mutually exclusive. The best-performing agents in our dataset use all of them. Start with the one that addresses your biggest pain point, measure the impact with an A/B test, and then move to the next one. Prompt optimization is iterative. Each improvement builds on the last, and the compounding effect over a few months is significant.
