Latest papers

3 papers
defense · arXiv · Oct 2, 2025

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

Zhenyu Pan, Yiting Zhang, Zhuo Liu et al. · Northwestern University · University of Illinois at Chicago +2 more

Adversarial co-evolution MARL framework that trains LLM agents to resist jailbreaks and prompt injection without external guard modules

Prompt Injection · Excessive Agency · nlp · reinforcement-learning
1 citation · PDF
tool · arXiv · Sep 11, 2025

MarkDiffusion: An Open-Source Toolkit for Generative Watermarking of Latent Diffusion Models

Leyi Pan, Sheng Guan, Zheyu Fu et al. · Tsinghua University · Beijing University of Posts and Telecommunications +3 more

Open-source Python toolkit for watermarking diffusion model outputs with 24 evaluation tools and 8 automated pipelines

Output Integrity Attack · generative · vision
PDF · Code
defense · arXiv · Aug 5, 2025

Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety

Zhenyu Pan, Yiting Zhang, Yutong Zhang et al. · Northwestern University · University of Illinois at Chicago

Defends LLM multi-agent systems against jailbreaks by co-evolving attackers and defenders via MARL, internalizing safety without external guard modules

Prompt Injection · Excessive Agency · multimodal · reinforcement-learning · nlp
PDF