All insights
Evals before prompts
Why we write the eval suite before the system prompt. The framing forces you to define what "correct" means before you start convincing the model of anything.
It is tempting to start with the prompt. The prompt is fun. The eval is not.
We've found the opposite is the move that ships: write the eval first. Twenty rows of inputs and the answer you want. Now any prompt change is measurable. Every regression has a number.
The other side effect: the eval forces you to define the contract before you build the spaceship. If the eval is hard to write, the feature is half-baked.
Got a problem worth shipping?
If this resonated and you've got a project queued up, we should talk.
Book a call