Blog
9 hours ago
Separating Detection Authority From Enforcement Authority in LLM Security
I tested 1,448 real attacks against llm-trust-guard and found regex detection around F1 0.487. ML models are no better, a 2025 paper showed all 12 bypassed at >90% attack success rate. The real defense isn't better detection, it's separating what detects from what enforces.
Source: HackerNoon →