ML Security Papers

Latest papers

2 papers

attack · arXiv · Sep 27, 2025

Jeongyeon Hwang, Sangdon Park, Jungseul Ok · Pohang University of Science and Technology

Query-free attack evades LLM text watermarks with >99% success using token-surprisal-guided bias inversion

Output Integrity Attack · nlp

attack · arXiv · Aug 5, 2025

Hiskias Dingeto, Taeyoun Kwon, Dasol Choi et al. · AIM Intelligence · Seoul National University +3 more

Two-stage gradient-based attack embeds harmful payloads in benign audio to jailbreak audio-language models via RL-PGD optimization

Input Manipulation Attack · Prompt Injection · audio · multimodal · nlp