The iteration workflow described here is in active development. Axiom is working with design partners to shape what’s built. Contact Axiom to get early access and join a focused group of teams shaping these tools.
Identifying opportunities for improvement
Iteration begins with insight. The telemetry you gather while observing your capability in production is a goldmine for finding areas to improve. By analyzing traces in the Axiom Console, you can:
- Find real-world user inputs that caused your capability to fail or produce low-quality output.
- Identify high-cost or high-latency interactions that could be optimized.
- Discover common themes in user feedback that point to systemic weaknesses.
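For example, a quick APL query over your trace dataset can surface the failed or slow interactions worth reviewing. The sketch below uses the @axiomhq/js client; the dataset name, the capability attribute, and the duration and status fields are assumptions about a typical OpenTelemetry trace schema, so adjust them to match your own data.

```typescript
import { Axiom } from '@axiomhq/js';

// Sketch only: dataset and field names depend on how your capability's
// traces are ingested (e.g. via OpenTelemetry) and are assumptions here.
const axiom = new Axiom({ token: process.env.AXIOM_TOKEN! });

async function findProblemTraces() {
  const apl = `
    ['traces']
    | where ['attributes.capability'] == 'support-agent'
    | where ['status.code'] == 'ERROR' or duration > 5s
    | project _time, trace_id, name, duration, ['status.code']
    | limit 100
  `;

  const result = await axiom.query(apl);

  // Each match is a candidate input to add to your ground truth collection.
  for (const match of result.matches ?? []) {
    console.log(match.data);
  }
}

findProblemTraces().catch(console.error);
```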
Testing changes against ground truth
Coming soon
Once you’ve created a new version of your Prompt object, you need to verify that it’s actually an improvement. The best way to do this is to run an “offline evaluation”: testing your new version against the same ground truth collection you used in the Measure stage.
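As a rough illustration of the idea, the sketch below compares two prompt versions over the same ground truth collection. The loadGroundTruth, runPrompt, and scoreOutput helpers are placeholders for whatever your own evaluation harness provides; they are not the final Rudder SDK API.

```typescript
// Hypothetical offline evaluation: run two prompt versions against the
// same ground truth examples and compare their mean scores.
type Example = { input: string; expected: string };

async function offlineEval(
  versionA: string,
  versionB: string,
  loadGroundTruth: () => Promise<Example[]>,
  runPrompt: (version: string, input: string) => Promise<string>,
  scoreOutput: (output: string, expected: string) => Promise<number>,
) {
  const examples = await loadGroundTruth();
  let totalA = 0;
  let totalB = 0;

  for (const { input, expected } of examples) {
    // Run both versions on the same input so results are directly comparable.
    const [outA, outB] = await Promise.all([
      runPrompt(versionA, input),
      runPrompt(versionB, input),
    ]);
    totalA += await scoreOutput(outA, expected);
    totalB += await scoreOutput(outB, expected);
  }

  const n = examples.length;
  console.log(`${versionA} mean score: ${(totalA / n).toFixed(3)}`);
  console.log(`${versionB} mean score: ${(totalB / n).toFixed(3)}`);
}
```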
The Axiom Console will provide views to compare these evaluation runs side-by-side:
- A/B Comparison Views: See the outputs of two different prompt versions for the same input, making it easy to spot regressions or improvements.
- Leaderboards: Track evaluation scores across all versions of a capability to see a clear history of its quality over time.
Deploying with confidence
Coming soon
After a new version of your capability has proven its superiority in offline tests, you can deploy it with confidence. The Rudder workflow will support a champion/challenger pattern, where you can deploy a new “challenger” version to run in shadow mode against a portion of production traffic. This allows for a final validation on real-world data without impacting the user experience.
Once you’re satisfied with the challenger’s performance, you can promote it to become the new “champion” using the SDK’s deploy function.
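To make the champion/challenger pattern concrete, here is a minimal sketch of shadow-mode routing under stated assumptions: the helper names and the shape of the eventual deploy function are illustrative only, not the released SDK.

```typescript
// Hypothetical champion/challenger routing. The champion always serves
// the user; a slice of traffic is mirrored to the challenger, whose
// output is recorded for comparison but never returned to the user.
type Capability = (input: string) => Promise<string>;

function withShadowChallenger(
  champion: Capability,
  challenger: Capability,
  shadowRate = 0.1, // fraction of production traffic mirrored to the challenger
  recordShadowResult: (input: string, output: string) => void = () => {},
): Capability {
  return async (input) => {
    const response = await champion(input);

    if (Math.random() < shadowRate) {
      challenger(input)
        .then((shadowOutput) => recordShadowResult(input, shadowOutput))
        .catch(() => {
          // Shadow failures must never affect the user-facing response.
        });
    }

    return response;
  };
}
```

Once the recorded shadow results confirm the challenger’s quality, promoting it amounts to swapping which version the champion slot points to, which is the role the SDK’s deploy function is intended to play.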