Latest papers

7 papers
defense arXiv Mar 3, 2026

StegaFFD: Privacy-Preserving Face Forgery Detection via Fine-Grained Steganographic Domain Lifting

Guoqing Ma, Xun Lin, Hui Ma et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more

Steganographic framework hides faces in cover images and detects deepfakes directly in the hidden domain to prevent facial privacy leakage

Output Integrity Attack vision
PDF
defense arXiv Feb 14, 2026

AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks

Weiming Song, Xuan Xie, Ruiping Yin · Beijing University of Technology · Macau University of Science and Technology

Defends LLMs against jailbreaks by extracting safety signals from attention heads and steering logits without fine-tuning

Prompt Injection nlp
PDF
defense arXiv Jan 31, 2026

Towards Building Non-Fine-Tunable Foundation Models

Ziyao Wang, Nizhang Li, Pingzhi Li et al. · College Park · Macau University of Science and Technology +1 more

Defends open-source LLMs against unauthorized fine-tuning by hiding a sparse subnetwork mask, degrading adaptation without the key

Transfer Learning Attack Model Theft nlp
PDF
attack arXiv Nov 17, 2025

VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language

Zonghao Ying, Moyang Chen, Nizhang Li et al. · Beihang University · Wenzhou-Kean University +4 more

Jailbreaks text-to-video models using benign prompts with auditory triggers and cinematic cues that exploit cross-modal priors

Prompt Injection multimodal generative vision nlp
1 citation PDF Code
defense arXiv Sep 23, 2025

DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces

Tianshuo Zhang, Li Gao, Siran Peng et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +2 more

Continual-learning deepfake detector using orthogonal LoRA experts to adapt to new forgery types without catastrophic forgetting

Output Integrity Attack vision
PDF
defense arXiv Sep 19, 2025

DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Xiaowei Zhu, Yubing Ren, Fang Fang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Zero-shot AI text detector using DNA-inspired mutation-repair scoring to distinguish LLM-generated from human-written text with state-of-the-art accuracy

Output Integrity Attack nlp
PDF Code
defense arXiv Aug 3, 2025

DiffusionFF: A Diffusion-based Framework for Joint Face Forgery Detection and Fine-Grained Artifact Localization

Siran Peng, Haoyuan Zhang, Li Gao et al. · Institute of Automation · University of Chinese Academy of Sciences +4 more

Diffusion-based encoder-decoder detects face forgeries and localizes artifacts jointly for improved explainability

Output Integrity Attack vision generative
PDF