Latest papers

2 papers
defense arXiv Apr 18, 2026

SafeDream: Safety World Model for Proactive Early Jailbreak Detection

Bo Yan, Weikai Lin, Yada Zhu et al. · University of Central Florida · University of Rochester +1 more

World-model-based early-warning system that uses safety-state prediction to detect multi-turn jailbreak attacks one or more turns before the LLM complies

Prompt Injection nlp
defense arXiv Jan 19, 2026

TwoHead-SwinFPN: A Unified DL Architecture for Synthetic Manipulation, Detection and Localization in Identity Documents

Chan Naseeb, Adeel Ashraf Cheema, Hassan Sami et al. · IBM · FAST NUCES +1 more

Dual-head Swin Transformer architecture that detects and localizes AI-generated face swaps and text-inpainting attacks in identity documents

Output Integrity Attack vision