ML Security Papers

LLM01

Prompt Injection

Natural language manipulation of LLMs

1367 papers

Monthly publications

Paper types

defense 513

attack 489

benchmark 263

survey 52

tool 50

Domains

nlp 1318

multimodal 315

vision 214

generative 49

audio 30

reinforcement-learning 24

graph 7

federated-learning 2

Top cited papers

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

Securing the Model Context Protocol (MCP): Risks, Controls, and Governance

Defending Against Prompt Injection with DataFilter

Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
