ML Security Papers

Latest papers

2 papers

defense arXiv Apr 18, 2026

Bo Yan, Weikai Lin, Yada Zhu et al. · University of Central Florida · University of Rochester +1 more

World-model-based early-warning system that uses safety-state prediction to detect multi-turn jailbreak attacks at least one turn before the LLM complies

Prompt Injection nlp

defense arXiv Jan 19, 2026

Chan Naseeb, Adeel Ashraf Cheema, Hassan Sami et al. · IBM · FAST NUCES +1 more

Novel dual-head Swin Transformer architecture that detects and localizes AI-generated face swaps and text-inpainting attacks in identity documents

Output Integrity Attack vision