Latest papers

3 papers
attack arXiv Apr 2, 2026 · 4d ago

Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters

Ahmed B Mustafa, Zihan Ye, Yang Lu et al. · University of Nottingham · Xi’an Jiaotong-Liverpool University +1 more

Low-effort prompt-based jailbreaks bypass text-to-image safety filters via linguistic reframing, achieving a 74% attack success rate

Prompt Injection · multimodal · generative
PDF
defense arXiv Feb 23, 2026 · 6w ago

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands · University of Nottingham · University of Edinburgh +2 more

A formal lambda calculus with dynamic information-flow control that proves noninterference guarantees for LLM agents against prompt injection

Prompt Injection · Excessive Agency · nlp
PDF
attack arXiv Dec 21, 2025 · Dec 2025

Adversarial Robustness in Zero-Shot Learning: An Empirical Study on Class and Concept-Level Vulnerabilities

Zhiyuan Peng, Zihan Ye, Shreyank N Gowda et al. · iFLYTEK · University of Chinese Academy of Sciences +3 more

Proposes novel adversarial attacks on zero-shot learning models that exploit class-calibration bias and semantic concept vulnerabilities, driving GZSL accuracy to zero.

Input Manipulation Attack · vision
PDF