Latest papers

4 papers
defense · arXiv · Apr 30, 2026

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

Bowen Sun, Chaozhuo Li, Yaodong Yang et al. · Johns Hopkins University · Microsoft Research Asia +2 more

Dual-encoder defense that clusters fragmented malicious prompts across anonymous LLM requests using asymmetric contrastive learning

Prompt Injection · nlp
defense · arXiv · Jan 3, 2026

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

Jiayi Xu, Zhang Zhang, Yuanrui Zhang et al. · Peking University · Microsoft Research Asia

Training-free, statistically certified watermarking for AI-generated images via patch-level luminance patterns, applicable to both diffusion and autoregressive (AR) models

Output Integrity Attack · vision, generative
benchmark · arXiv · Oct 11, 2025

Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

Chuangchuang Tan, Xiang Ming, Jinglu Wang et al. · Beijing Jiaotong University · Microsoft Research Asia +1 more

Benchmark and evaluation framework for detecting and reasoning about semantic anomalies in AI-generated images, targeting deepfake detection and AIGC authenticity assessment

Output Integrity Attack · vision, multimodal
defense · arXiv · Aug 2, 2025

ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models

Chuangchuang Tan, Jinglu Wang, Xiang Ming et al. · Beijing Jiaotong University · Microsoft Research Asia

Explainable AI-generated image detection using multimodal large language models (MLLMs) with forensic prompts, plus a new dataset of forgery-evidence descriptions

Output Integrity Attack · vision, multimodal, nlp