Latest papers

5 papers
defense arXiv Apr 19, 2026

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni et al. · Lomonosov Moscow State University · Central University

Runtime security architecture defending LLM agents against prompt injection by mediating tool-use actions with stateful risk reasoning

Prompt Injection Insecure Plugin Design Excessive Agency nlp
benchmark arXiv Apr 13, 2026

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack vision generative
attack arXiv Mar 30, 2026

From Pixels to Reality: Physical-Digital Patch Attacks on Real-World Camera

Victoria Leonenkova, Ekaterina Shumitskaya, Dmitriy Vatolin et al. · Lomonosov Moscow State University

Physical adversarial patch attack displayed on smartphone screens to evade real-world face recognition cameras in black-box settings

Input Manipulation Attack vision
defense arXiv Feb 23, 2026

BiRQA: Bidirectional Robust Quality Assessment for Images

Aleksandr Gushchin, Dmitriy S. Vatolin, Anastasia Antsiferova · ISP RAS Research Center for Trusted Artificial Intelligence · MSU Institute for Artificial Intelligence +2 more

Defends image quality assessment models against white-box adversarial attacks via Anchored Adversarial Training with ranking loss and clean anchor samples

Input Manipulation Attack vision
attack arXiv Feb 6, 2026

Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance

Haipeng Li, Rongxuan Peng, Anwei Luo et al. · Shenzhen University · Nanyang Technological University +2 more

Adversarial perturbations that evade AI-generated content detectors by manipulating shared CLIP embeddings toward authentic anchors

Input Manipulation Attack Output Integrity Attack vision multimodal