Yaodong Yang

defense arXiv Apr 30, 2026

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

Bowen Sun, Chaozhuo Li, Yaodong Yang et al. · Johns Hopkins University · Microsoft Research Asia +2 more

Dual-encoder defense that clusters fragmented malicious prompts across anonymous LLM requests using asymmetric contrastive learning

Prompt Injection nlp

attack arXiv Apr 7, 2026

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Zonghao Ying, Haowen Dai, Lianyu Hu et al. · Beihang University · University of Nottingham Ningbo China +3 more

Black-box jailbreak attack coercing T2I models to render harmful text in benign images via layered prompt decomposition

Prompt Injection multimodal vision nlp

defense arXiv Apr 27, 2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu et al. · Beihang University · 360 AI Security Lab +1 more

OS-inspired defense framework that intercepts LLM agent tool calls and enforces privilege separation to block prompt injection attacks

Prompt Injection Excessive Agency nlp

attack arXiv Sep 18, 2025

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

Simin Li, Zheng Yuwei, Zihao Mao et al. · Beihang University · Peking University +3 more

Identifies maximally vulnerable agent subsets in large-scale MARL and learns worst-case adversarial policies via hierarchical mean-field control decomposition

Input Manipulation Attack reinforcement-learning

Papers in Database (4)