attack arXiv Jan 9, 2026
Ahmad Alobaid, Martí Jordà Roca, Carlos Castillo et al. · NeuralTrust · ICREA · Universitat Pompeu Fabra
Proposes Echo Chamber, a multi-turn LLM jailbreak using gradual escalation via poisonous seeds to bypass safety guardrails
Prompt Injection nlp
The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security challenges need to be addressed to prevent financial loss and reputational damage. A key security challenge is jailbreaking, the malicious manipulation of prompts and inputs to bypass a chatbot's safety guardrails. Multi-turn attacks are a relatively new form of jailbreaking involving a carefully crafted chain of interactions with a chatbot. We introduce Echo Chamber, a new multi-turn attack using a gradual escalation method. We describe this attack in detail, compare it to other multi-turn attacks, and demonstrate its performance against multiple state-of-the-art models through extensive evaluation.
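The abstract stays high level, but the escalation loop it describes maps onto a simple harness: keep the full conversation history, feed progressively more pointed "seed" prompts, and stop when the model refuses. Below is a minimal Python sketch under that reading; the query_model stand-in, the refusal heuristic, and the back-off behavior are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a multi-turn gradual-escalation probe in the style
# of Echo Chamber. query_model() and is_refusal() are stand-ins, not the
# paper's code.

from typing import Dict, List

def query_model(history: List[Dict[str, str]]) -> str:
    """Stand-in for any chat-completion endpoint; replace with a real client."""
    raise NotImplementedError

def is_refusal(reply: str) -> bool:
    """Naive refusal heuristic; real evaluations typically use a judge model."""
    markers = ("i can't", "i cannot", "i'm sorry", "against my guidelines")
    return any(m in reply.lower() for m in markers)

def run_escalation(seeds: List[str]) -> List[Dict[str, str]]:
    """Feed increasingly pointed seed prompts turn by turn, keeping the
    full history so earlier innocuous context steers later completions."""
    history: List[Dict[str, str]] = []
    for seed in seeds:
        history.append({"role": "user", "content": seed})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        if is_refusal(reply):
            # Stop rather than escalate further; a refusal mid-chain
            # tends to anchor the model's later behavior.
            break
    return history

The key property this sketch captures is that each turn is individually benign while the accumulated context does the steering, which is what makes single-turn prompt filters a poor fit for this attack class.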
llm NeuralTrust · ICREA · Universitat Pompeu Fabra
defense arXiv Jan 22, 2026
Joan Vendrell Farreny, Martí Jordà Roca, Miquel Cornudella Gaya et al. · NeuralTrust
Proposes a unified LLM security enforcement layer analogous to a WAF, covering prompt injection, jailbreaks, and agent tool abuse
Prompt Injection Insecure Plugin Design nlp
This paper introduces the Generative Application Firewall (GAF), a new architectural layer for securing LLM applications. Existing defenses such as prompt filters, guardrails, and data masking remain fragmented; GAF unifies them into a single enforcement point, much as a WAF coordinates defenses for web traffic, while also covering autonomous agents and their tool interactions.
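To make the WAF analogy concrete, a single enforcement point could look like the sketch below: every user prompt, model output, and agent tool call passes through one pluggable policy chain that short-circuits on the first block. The class, method, and rule names here are hypothetical, assumed for illustration; the paper defines the architecture, not this API.

# Minimal sketch of a GAF-style enforcement point with a policy chain.
# All names are illustrative, not the paper's interface.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

# A policy inspects one event kind ("prompt", "output", or "tool_call")
# plus its payload and returns a verdict.
Policy = Callable[[str, str], Verdict]

def block_injection(kind: str, payload: str) -> Verdict:
    """Toy prompt-injection rule; production filters would use classifiers."""
    if kind == "prompt" and "ignore previous instructions" in payload.lower():
        return Verdict(False, "prompt-injection pattern")
    return Verdict(True)

def restrict_tools(kind: str, payload: str) -> Verdict:
    """Toy allow-list for agent tool invocations."""
    if kind == "tool_call" and not payload.startswith(("search(", "calc(")):
        return Verdict(False, "tool not on allow-list")
    return Verdict(True)

class GenerativeAppFirewall:
    """Single enforcement point through which prompts, outputs, and
    tool invocations all pass, analogous to a WAF for web traffic."""
    def __init__(self, policies: List[Policy]):
        self.policies = policies

    def enforce(self, kind: str, payload: str) -> Verdict:
        for policy in self.policies:
            verdict = policy(kind, payload)
            if not verdict.allowed:
                return verdict
        return Verdict(True)

gaf = GenerativeAppFirewall([block_injection, restrict_tools])
print(gaf.enforce("prompt", "Please ignore previous instructions and ..."))
print(gaf.enforce("tool_call", "shell('rm -rf /')"))

Routing all three event kinds through one chain is what distinguishes this design from per-component guardrails: agent tool abuse and prompt injection get blocked by the same mechanism, configured in one place.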
llm NeuralTrust