One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it's a gap most enterprises don't ...