ML Security Papers

Latest papers

7 papers

defense arXiv Mar 3, 2026

StegaFFD: Privacy-Preserving Face Forgery Detection via Fine-Grained Steganographic Domain Lifting

Guoqing Ma, Xun Lin, Hui Ma et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more

Steganographic framework hides faces in cover images and detects deepfakes directly in the hidden domain to prevent facial privacy leakage

Output Integrity Attack vision


Most existing Face Forgery Detection (FFD) models assume access to raw face images. In practice, under a client-server framework, private facial data may be intercepted during transmission or leaked by untrusted servers. Previous privacy protection approaches, such as anonymization, encryption, or distortion, partly mitigate leakage but often introduce severe semantic distortion, making images appear obviously protected. This alerts attackers, provoking more aggressive strategies and turning the process into a cat-and-mouse game. Moreover, these methods heavily manipulate image contents, introducing degradation or artifacts that may confuse FFD models, which rely on extremely subtle forgery traces. Inspired by advances in image steganography, which enable high-fidelity hiding and recovery, we propose a Steganography-based Face Forgery Detection framework (StegaFFD) to protect privacy without raising suspicion. StegaFFD hides facial images within natural cover images and directly conducts forgery detection in the steganographic domain. However, the hidden forgery-specific features are extremely subtle and subject to interference from cover semantics, posing significant challenges. To address this, we propose Low-Frequency-Aware Decomposition (LFAD) and Spatial-Frequency Differential Attention (SFDA), which suppress interference from low-frequency cover semantics and enhance the perception of hidden facial features. Furthermore, we introduce Steganographic Domain Alignment (SDA) to align the representations of hidden faces with those of their raw counterparts, enhancing the model's ability to perceive subtle facial cues in the steganographic domain. Extensive experiments on seven FFD datasets demonstrate that StegaFFD achieves strong imperceptibility, avoids raising attackers' suspicion, and better preserves FFD accuracy than existing facial privacy protection methods.

cnn transformer Chinese Academy of Sciences · University of Chinese Academy of Sciences · Beihang University +2 more

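
The abstract says LFAD suppresses low-frequency cover semantics so that subtle hidden-face cues stand out. As a rough illustration of that general idea only (not the paper's LFAD module, whose design is not given here), the sketch below splits a grayscale image, stored as a list of rows, into a low-frequency part produced by a simple box blur and a high-frequency residual. The function names `box_blur` and `split_frequencies` and the choice of a box filter are assumptions.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as a list of rows


def box_blur(img: Image, radius: int = 1) -> Image:
    """Low-pass filter: each pixel becomes the mean of its (2r+1)x(2r+1)
    neighbourhood, using only neighbours that fall inside the image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def split_frequencies(img: Image, radius: int = 1) -> Tuple[Image, Image]:
    """Return (low, high) where high = img - low, so low + high reconstructs img."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high
```

A flat region ends up entirely in the low band, while an isolated bright pixel keeps most of its energy in the high band; the residual construction guarantees that adding the two bands gives back the input.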

defense arXiv Feb 14, 2026

AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks

Weiming Song, Xuan Xie, Ruiping Yin · Beijing University of Technology · Macau University of Science and Technology

Defends LLMs against jailbreaks by extracting safety signals from attention heads and steering logits without fine-tuning

Prompt Injection nlp

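
The listing describes AISA only at the level of "safety signals from attention heads" used to "steer logits without fine-tuning". The sketch below illustrates that general pattern under loud assumptions: a hypothetical linear probe reads one head's activation vector and outputs a harmfulness score, and when that score passes a threshold, a bonus is added to the logits of hypothetical refusal tokens. The probe weights, `threshold`, `strength`, and token names are placeholders, not AISA's actual procedure.

```python
import math
from typing import Dict, List


def probe_score(head_activation: List[float], weights: List[float], bias: float) -> float:
    """Hypothetical linear probe: sigmoid(w . a + b), read as P(prompt is harmful)."""
    z = sum(w * a for w, a in zip(weights, head_activation)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def steer_logits(logits: Dict[str, float], score: float, refusal_tokens: List[str],
                 strength: float = 5.0, threshold: float = 0.5) -> Dict[str, float]:
    """Boost refusal-token logits in proportion to how far score exceeds threshold.

    Assumes 0 <= threshold < 1. Below the threshold the logits are returned unchanged.
    """
    if score <= threshold:
        return dict(logits)
    bonus = strength * (score - threshold) / (1.0 - threshold)
    return {tok: (val + bonus if tok in refusal_tokens else val)
            for tok, val in logits.items()}
```

Because the adjustment only touches logits at decoding time, the model weights stay unchanged, which is the "without fine-tuning" property the summary highlights.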

defense arXiv Jan 31, 2026

Towards Building Non-Fine-Tunable Foundation Models

Ziyao Wang, Nizhang Li, Pingzhi Li et al. · College Park · Macau University of Science and Technology +1 more

Defends open-source LLMs against unauthorized fine-tuning by hiding a sparse subnetwork mask, degrading adaptation without the key

Transfer Learning Attack Model Theft nlp

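
The summary says the model hides a sparse subnetwork mask and that adaptation degrades without the key. The sketch below shows only the bookkeeping half of such a scheme, under stated assumptions: a secret integer key seeds a PRNG that picks a sparse set of parameter indices, and a hypothetical keyed update step modifies only those indices. How the paper makes fine-tuning without the mask harmful is not reproduced. `mask_from_key`, `apply_update`, and the use of Python's `random` module (which is not a cryptographic generator) are illustrative assumptions.

```python
import random
from typing import List, Set


def mask_from_key(key: int, num_params: int, sparsity: float) -> Set[int]:
    """Derive a sparse set of parameter indices from a secret integer key.

    The same key always reproduces the same mask; without it, the subset is
    one of C(num_params, k) possibilities.
    """
    k = max(1, int(num_params * sparsity))
    rng = random.Random(key)  # deterministic per key; NOT cryptographically secure
    return set(rng.sample(range(num_params), k))


def apply_update(params: List[float], grads: List[float], mask: Set[int],
                 lr: float) -> List[float]:
    """Hypothetical keyed SGD step: only indices inside the mask are updated."""
    return [p - lr * g if i in mask else p
            for i, (p, g) in enumerate(zip(params, grads))]
```

A real deployment would derive the mask with a cryptographic primitive rather than a seeded `random.Random`; the seeded generator is used here only to keep the sketch short.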

attack arXiv Nov 17, 2025

VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language

Zonghao Ying, Moyang Chen, Nizhang Li et al. · Beihang University · Wenzhou-Kean University +4 more

Jailbreaks text-to-video models using benign prompts with auditory triggers and cinematic cues that exploit cross-modal priors

Prompt Injection multimodal generative vision nlp

1 citation · code available

defense arXiv Sep 23, 2025

DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces

Tianshuo Zhang, Li Gao, Siran Peng et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +2 more

Continual-learning deepfake detector using orthogonal LoRA experts to adapt to new forgery types without catastrophic forgetting

Output Integrity Attack vision

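
DevFD is summarized as learning shared and orthogonal LoRA subspaces so that adapting to new forgery types does not overwrite earlier ones. One common way to encourage orthogonality between a new LoRA down-projection `A_new` and earlier ones `A_old` is to penalize the squared Frobenius norm of `A_new A_old^T`, which is zero exactly when every row of `A_new` is orthogonal to every row of `A_old`. The sketch below computes that penalty with plain lists; it is an assumed, generic formulation rather than necessarily the loss used in the paper, and it ignores the shared subspace.

```python
from typing import List

Matrix = List[List[float]]  # row-major matrix as a list of rows


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b.T: entry (i, j) is the dot product of row i of a and row j of b."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def orthogonality_penalty(a_new: Matrix, a_old: Matrix) -> float:
    """Squared Frobenius norm of A_new A_old^T; zero when the two LoRA
    down-projections span mutually orthogonal row subspaces."""
    return sum(v * v for row in matmul_transpose(a_new, a_old) for v in row)
```

In training, such a term would typically be added to the task loss with a weighting coefficient, pushing each new expert into directions the previous experts do not use.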

defense arXiv Sep 19, 2025

DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Xiaowei Zhu, Yubing Ren, Fang Fang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Zero-shot AI text detector using DNA-inspired mutation-repair scoring to distinguish LLM-generated from human-written text with state-of-the-art accuracy

Output Integrity Attack nlp

Code available
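
The listing gives only the name of DNA-DetectLLM's scoring scheme. As a loose analogue, and explicitly not the paper's algorithm, the sketch below treats each token that differs from a model's top prediction as a "mutation", "repairs" it to that prediction, and reports the fraction of positions that needed repair. The intuition assumed here is that LLM-generated text tends to follow the model's preferred choices and so needs less repair. `repair`, `repair_cost`, and the toy bigram predictor are hypothetical stand-ins for a real language model.

```python
from typing import Callable, List, Optional

# A predictor maps the preceding tokens to its most likely next token (or None).
Predictor = Callable[[List[str]], Optional[str]]


def repair(tokens: List[str], predict: Predictor) -> List[str]:
    """Replace each token with the predictor's top choice given the original prefix."""
    repaired = []
    for i, tok in enumerate(tokens):
        best = predict(tokens[:i])
        repaired.append(best if best is not None else tok)
    return repaired


def repair_cost(tokens: List[str], predict: Predictor) -> float:
    """Fraction of positions the repair step changed (0.0 = already 'ideal')."""
    if not tokens:
        return 0.0
    repaired = repair(tokens, predict)
    changed = sum(1 for a, b in zip(tokens, repaired) if a != b)
    return changed / len(tokens)


# Hypothetical toy "language model": predicts the next token from the previous one.
TOY_BIGRAM = {"the": "cat", "cat": "sat", "sat": "down"}


def toy_predict(prefix: List[str]) -> Optional[str]:
    return TOY_BIGRAM.get(prefix[-1]) if prefix else None
```

With the toy predictor, "the cat sat down" needs no repair (cost 0.0), while "the dog sat down" has one of four tokens repaired (cost 0.25); a detector built on this idea would flag low-cost texts as more likely machine-generated.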

defense arXiv Aug 3, 2025

DiffusionFF: A Diffusion-based Framework for Joint Face Forgery Detection and Fine-Grained Artifact Localization

Siran Peng, Haoyuan Zhang, Li Gao et al. · Institute of Automation · University of Chinese Academy of Sciences +4 more

Diffusion-based encoder-decoder detects face forgeries and localizes artifacts jointly for improved explainability

Output Integrity Attack vision generative

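
DiffusionFF is described as jointly detecting face forgeries and localizing artifacts. Assuming a decoder that outputs a per-pixel artifact probability map (the paper's actual outputs and post-processing are not specified here), the sketch below derives both results from that single map: an image-level score from the mean of the `top_k` highest pixel probabilities, and a binary localization mask from a per-pixel threshold. The function name, `pixel_threshold`, and `top_k` are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]  # per-pixel artifact probabilities in [0, 1]


def detect_and_localize(artifact_map: Map, pixel_threshold: float = 0.5,
                        top_k: int = 4) -> Tuple[float, List[List[int]]]:
    """Return (image-level forgery score, binary mask) from a per-pixel artifact map.

    Score = mean of the top_k largest pixel probabilities, so a small
    manipulated region can still produce a high image-level score.
    """
    flat = sorted((p for row in artifact_map for p in row), reverse=True)
    if not flat or top_k < 1:
        return 0.0, [[0 for _ in row] for row in artifact_map]
    k = min(top_k, len(flat))
    score = sum(flat[:k]) / k
    mask = [[1 if p >= pixel_threshold else 0 for p in row] for row in artifact_map]
    return score, mask
```

Averaging only the strongest pixels keeps a small manipulated region from being diluted by the many untouched pixels, which a plain mean over the whole map would do.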
