Latest papers

5 papers
attack arXiv Apr 2, 2026

Low-Effort Jailbreak Attacks Against Text-to-Image Safety Filters

Ahmed B Mustafa, Zihan Ye, Yang Lu et al. · University of Nottingham · Xi’an Jiaotong-Liverpool University +1 more

Low-effort prompt-based jailbreaks bypass text-to-image safety filters using linguistic reframing, achieving a 74% attack success rate

Prompt Injection multimodal generative
PDF
defense arXiv Feb 19, 2026

Provable Adversarial Robustness in In-Context Learning

Di Zhang · Xi’an Jiaotong-Liverpool University

Proves worst-case robustness bounds for in-context learning, showing that the tolerable adversarial distribution shift scales as sqrt(m) with model capacity

Input Manipulation Attack nlp
PDF
defense IEEE Transactions on Image Processing Jan 23, 2026

StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Qinkai Yu, Chong Zhang, Gaojie Jin et al. · University of Exeter · King Abdullah University of Science and Technology +6 more

Embeds backdoor-based watermarks in medical segmentation models to verify ownership under black-box API conditions

Model Theft vision
PDF Code
defense arXiv Nov 10, 2025

HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection

Fangqi Dai, Xingjian Jiang, Zizhuang Deng · Shandong University · Xi’an Jiaotong-Liverpool University +2 more

Detects LLM-revised human text via a reward-based alignment method that tunes scoring models toward human writing preferences

Output Integrity Attack nlp
PDF Code
defense arXiv Oct 10, 2025

VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

MingSheng Li, Guangze Zhao, Sichen Liu · Harbin Institute of Technology · Xi’an Jiaotong-Liverpool University

Defends LVLMs against multimodal jailbreaks using MCTS-guided safety prompt trajectories embedded in the reasoning chain

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF