ML Security Papers

LS06

Red-Team Agents

Autonomous LLM offensive agents (PentestGPT-class)

63 papers

Paper types

attack 33

benchmark 14

tool 12

defense 3

survey 1

Domains

nlp 62

multimodal 3

vision 2

tabular 1

audio 1

generative 1

reinforcement-learning 1

Co-occurring categories

Other OWASP categories that appear on the same papers

LLM01 Prompt Injection

LS10 Benchmarks & Evaluation

LLM08 Excessive Agency

ML01 Input Manipulation Attack

LLM07 Insecure Plugin Design

LS01 Vulnerability Discovery

LS02 Exploit Generation

LS07 Blue-Team Agents

ML04 Membership Inference Attack

LLM06 Sensitive Information Disclosure

ML10 Model Poisoning

LLM03 Training Data Poisoning

LS09 Fuzzing & Test Generation

LS03 Reconnaissance & OSINT

LS04 Patch & Remediation

ML05 Model Theft

LS05 Triage & Prioritization

ML03 Model Inversion Attack

Top cited papers

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

Diffusion LLMs are Natural Adversaries for any LLM

Takedown: How It's Done in Modern Coding Agent Exploits

Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Anecdoctoring: Automated Red-Teaming Across Language and Place

RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning

ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
