Field Notes
Diagnostic posts from real production AI engagements: what the dashboard didn't catch, what the load test missed, what the CloudWatch metric actually said. Evidence-led, customer-sanitised, written so a peer running the same stack can apply the lesson on Monday.
Field Notes: Turning prompt caching on for a production Bedrock workload
Strands' BedrockModel ships with prompt caching off. Two kwargs turn it on, one per-model gotcha can catch you out, and a 10-turn driver measures hit ratios of 99.9% on Nova Pro and 99.8% on Sonnet 4.6 against an 8,156-token production system prefix. The per-call usage block proves it in seconds, with no waiting on CloudWatch.
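As a rough illustration of how a hit ratio can be read off a per-call usage block, here is a minimal sketch. It assumes the usage block follows the Bedrock Converse field names (`inputTokens`, `cacheReadInputTokens`, `cacheWriteInputTokens`) and that `inputTokens` counts only uncached input. The `cache_hit_ratio` helper and the token counts other than the 8,156-token prefix are hypothetical, and the post's own definition of hit ratio may differ.

```python
from typing import Mapping


def cache_hit_ratio(usage: Mapping[str, int]) -> float:
    """Fraction of one call's input tokens that were served from the prompt cache.

    Assumes Converse-style usage keys; missing keys are treated as zero.
    """
    read = usage.get("cacheReadInputTokens", 0)
    written = usage.get("cacheWriteInputTokens", 0)
    fresh = usage.get("inputTokens", 0)
    total = read + written + fresh
    return read / total if total else 0.0


# Illustrative usage blocks only, not measurements from the post.
# First turn: the prefix is written to the cache, so nothing is read from it.
first_turn = {"inputTokens": 12, "cacheWriteInputTokens": 8156, "cacheReadInputTokens": 0}
# Later turn: the same prefix is read back from the cache.
later_turn = {"inputTokens": 12, "cacheWriteInputTokens": 0, "cacheReadInputTokens": 8156}

for name, usage in (("first", first_turn), ("later", later_turn)):
    print(f"{name} turn hit ratio: {cache_hit_ratio(usage):.1%}")
```

A first turn that only writes the cache scores zero under this definition, so a ratio measured across a multi-turn run depends on whether that warm-up call is included.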
Field Notes: Three things I learned diagnosing a production Bedrock workload
Three findings from a real customer engagement on AWS Bedrock: what a load test was actually doing, why p95 latency was 45 seconds, and the prompt-caching default that costs every team money. Plus the three CloudWatch metrics that catch all three.