Blog
8 hours ago
Mean Pooling Was Hiding Prompt Injections in Our RAG Pipeline
RAG detectors fail because mean pooling averages out malicious signals in long documents. While a short attack gets diluted, the encoder’s raw hidden states capture the threat. By switching to sliding a small window over states and taking the max score you preserve the signal without extra models. It’s a 30-line fix that runs in <1ms, using data your pipeline is already generating but throwing away.
Source: HackerNoon →