Latest papers

3 papers
attack arXiv Apr 18, 2026 · 4w ago

When Choices Become Risks: Safety Failures of Large Language Models under Multiple-Choice Constraints

Yuheng Chen, Zhiyu Wu, Bowen Cheng et al. · Kagoshima University · Fudan University +1 more

Bypasses LLM safety alignment by reformulating harmful prompts as forced-choice questions where all options violate policies

Prompt Injection nlp
attack arXiv Mar 23, 2026 · 8w ago

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong et al. · China University of Petroleum-Beijing · University of Electronic Science and Technology of China +3 more

Universal adversarial patch attack on infrared pedestrian detectors using parameterized Bézier curves and cold patches

Input Manipulation Attack vision
defense arXiv Mar 9, 2026 · 10w ago

Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images

Qishun Yang, Shu Yang, Lijie Hu et al. · King Abdullah University of Science and Technology · China University of Petroleum-Beijing +1 more

Defends VLMs against visual jailbreaks via label-free fine-tuning on neutral threat-image tasks to shape safety-oriented personas

Prompt Injection vision multimodal nlp