Latest papers

97 papers
attack arXiv Mar 25, 2026 · 12d ago

How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li, Zi Liang et al. · China University of Geosciences · Hong Kong University of Science and Technology +4 more

Query-based extraction attack on quantized edge LLMs using clustered instruction queries to steal model behavior efficiently

Model Theft nlp
PDF
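The clustered-query idea can be sketched in a few lines. This is a toy illustration, not the paper's method: a hypothetical `select_queries` helper runs a small k-means over pre-computed instruction embeddings and keeps one representative per cluster, so the victim model is queried far less often than with the full instruction pool.

```python
import numpy as np

def select_queries(embs: np.ndarray, k: int, iters: int = 20) -> list:
    """Pick k representative instructions via a lightweight k-means.

    Querying the victim model only with cluster representatives shrinks
    the query budget while keeping coverage of the instruction space.
    """
    # Deterministic, evenly spaced initialization keeps the sketch reproducible.
    centroids = embs[np.linspace(0, len(embs) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(embs[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = embs[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Return one representative instruction index per centroid.
    dists = np.linalg.norm(embs[:, None, :] - centroids[None, :, :], axis=-1)
    return sorted(set(dists.argmin(axis=0)))
```

Each returned index would be sent to the target edge model; the responses then supervise a local surrogate.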
defense arXiv Mar 25, 2026 · 12d ago

Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

Jipeng Liu, Haichao Shi, Siyu Xing et al. · Chinese Academy of Sciences · Beihang University

Addresses optimization collapse in VLM-based deepfake detectors through gradient signal enhancement and contrastive regional injection for cross-domain generalization

Output Integrity Attack vision multimodal
PDF
attack arXiv Mar 22, 2026 · 15d ago

Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

Zihui Chen, Yuling Wang, Pengfei Jiao et al. · Hangzhou Dianzi University · Beihang University +1 more

LLM-driven universal adversarial attack framework targeting text-attributed graph models across GNN and PLM architectures

Input Manipulation Attack nlp graph
PDF
defense arXiv Mar 19, 2026 · 18d ago

Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness

Lu Yu, Haiyang Zhang, Changsheng Xu · Tianjin University of Technology · Chinese Academy of Sciences +1 more

Defends CLIP against adversarial examples using complementary text-guided attention to maintain zero-shot generalization while improving robustness

Input Manipulation Attack vision nlp multimodal
PDF Code
defense arXiv Mar 19, 2026 · 18d ago

CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Yue Zhao, Yujia Gong, Ruigang Liang et al. · Chinese Academy of Sciences · Beijing University of Posts and Telecommunications +1 more

Transfers safety functionality between LLMs by transplanting minimal neuron subsets, enabling alignment enhancement and jailbreak defense without retraining

Prompt Injection nlp
PDF
defense arXiv Mar 16, 2026 · 21d ago

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

Zhuoshang Wang, Yubing Ren, Yanan Cao et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Black-box framework for third-party watermark detection in LLM outputs using proxy models and statistical tests

Output Integrity Attack nlp
PDF
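For intuition, the statistical-test component of such detectors is often a simple one-proportion z-test over a pseudorandom "green" vocabulary partition (the standard Kirchenbauer-style scheme; the paper's proxy-model framework is more involved):

```python
import math

def watermark_z_score(green_count: int, total: int, gamma: float = 0.5) -> float:
    """One-proportion z-test for a green-list watermark.

    Under the null hypothesis (unwatermarked text), each token falls in
    the "green" vocabulary partition independently with probability gamma,
    so green_count ~ Binomial(total, gamma).
    """
    expected = gamma * total
    std = math.sqrt(gamma * (1 - gamma) * total)
    return (green_count - expected) / std

# 130 of 200 tokens in the green list is ~4.2 sigma above chance:
print(round(watermark_z_score(130, 200), 2))  # -> 4.24
```

A large z-score rejects the null and flags the text as watermarked; the black-box difficulty addressed by the paper is recovering the green partition without the provider's key.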
defense arXiv Mar 16, 2026 · 21d ago

Counterexample Guided Branching via Directional Relaxation Analysis in Complete Neural Network Verification

Jingyang Li, Fu Song, Guoqiang Li · Shanghai Jiao Tong University · Chinese Academy of Sciences

Reformulates neural network verification as a CEGAR loop, using spurious counterexamples to guide branching and tighten robustness proofs

Input Manipulation Attack vision
PDF
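The bound-propagation step such verifiers branch over can be illustrated with plain interval arithmetic (generic IBP, not the paper's directional relaxation): propagate an input box through an affine + ReLU layer; when the resulting output box cannot certify robustness, a concrete point from the box is tested, and a spurious counterexample tells the verifier where to split.

```python
import numpy as np

def ibp_layer(W, b, lo, hi):
    """Interval bound propagation through y = relu(W @ x + b).

    Splitting W into positive and negative parts gives sound elementwise
    bounds on the pre-activation, which ReLU then clips at zero.
    """
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    pre_lo = Wp @ lo + Wn @ hi + b
    pre_hi = Wp @ hi + Wn @ lo + b
    return np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)

W = np.array([[1.0, -1.0]])
b = np.array([0.0])
lo, hi = ibp_layer(W, b, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(lo, hi)  # -> [0.] [1.]
```

Branching then splits the input box (e.g., x0 <= 0.5 vs x0 > 0.5) and re-propagates each half; counterexample-guided schemes choose the split dimension from where the spurious point violates the bounds.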
defense arXiv Mar 13, 2026 · 24d ago

RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection

He Zhu, Yanshu Li, Wen Liu et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences

Black-box adversarial text detector using replaced token detection to identify word-substitution attacks with only two model queries

Input Manipulation Attack nlp
PDF
defense arXiv Mar 13, 2026 · 24d ago

What Makes VLMs Robust? Towards Reconciling Robustness and Accuracy in Vision-Language Models

Sen Nie, Jie Zhang, Zhongqi Wang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences

Freezes pre-trained VLM weights and adapts only shallow layers to achieve adversarial robustness without sacrificing clean accuracy

Input Manipulation Attack vision nlp multimodal
PDF Code
defense arXiv Mar 10, 2026 · 27d ago

ShapeMark: Robust and Diversity-Preserving Watermarking for Diffusion Models

Yuqi Qian, Yun Cao, Haocheng Fu et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Embeds robust provenance watermarks in diffusion model noise using structural encoding to survive lossy post-processing

Output Integrity Attack vision generative
PDF
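As a toy version of noise-domain watermarking (simple sign encoding, not ShapeMark's structural encoding): force the signs of a secret subset of initial-noise coordinates to carry the payload, then read the signs back after degradation.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_bits(latent, bits, idx):
    # Set the sign of the selected coordinates to the payload bit
    # (1 -> positive, 0 -> negative); magnitudes stay half-normal,
    # so the marked noise remains plausible as a diffusion seed.
    out = latent.copy()
    out[idx] = np.abs(out[idx]) * np.where(bits == 1, 1.0, -1.0)
    return out

def extract_bits(latent, idx):
    return (latent[idx] > 0).astype(int)

latent = rng.standard_normal(4096)
bits = rng.integers(0, 2, size=128)
key = rng.choice(4096, size=128, replace=False)  # secret coordinate set

marked = embed_bits(latent, bits, key)
degraded = marked + 0.2 * rng.standard_normal(4096)  # stand-in for lossy post-processing
acc = (extract_bits(degraded, key) == bits).mean()
print(acc)  # most bits survive the degradation
```

A bit only flips when additive noise pushes its coordinate across zero, which is why sign-based payloads tolerate moderate post-processing.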
defense arXiv Mar 6, 2026 · 4w ago

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

Feiran Li, Qianqian Xu, Shilong Bao et al. · Institute of Information Engineering · University of Chinese Academy of Sciences +4 more

Black-box backdoor detector for text-to-image diffusion models using semantic instruction-response deviation across varied prompts

Model Poisoning vision generative multimodal
PDF Code
defense arXiv Mar 4, 2026 · 4w ago

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information

Yifan Zhu, Yibo Miao, Yinpeng Dong et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Proposes MI-UE, a theoretically grounded availability-poisoning defense that blocks unauthorized model training by reducing mutual information in poisoned image features

Data Poisoning Attack vision
PDF
defense arXiv Mar 3, 2026 · 4w ago

StegaFFD: Privacy-Preserving Face Forgery Detection via Fine-Grained Steganographic Domain Lifting

Guoqing Ma, Xun Lin, Hui Ma et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +3 more

Steganographic framework hides faces in cover images and detects deepfakes directly in the hidden domain to prevent facial privacy leakage

Output Integrity Attack vision
PDF
benchmark arXiv Mar 2, 2026 · 5w ago

CTForensics: A Comprehensive Dataset and Method for AI-Generated CT Image Detection

Yiheng Li, Zichang Tan, Guoqing Xu et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Benchmarks AI-generated CT image detection with a 10-model dataset and a novel wavelet-spatial-frequency CNN detector

Output Integrity Attack vision
PDF Code
defense arXiv Mar 2, 2026 · 5w ago

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Chao Chen, Yanhui Chen, Shanshan Lin et al. · Harbin Institute of Technology · Fuzhou University +1 more

Adversarial training framework that adds explanation-guided constraints to improve robustness and saliency-map stability against adversarial attacks

Input Manipulation Attack vision
PDF
defense arXiv Feb 12, 2026 · 7w ago

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Dong Yan, Jian Liang, Ran He et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Defends against LLM attribute inference attacks using fine-grained anonymization and adversarial suffix optimization to induce model rejection

Sensitive Information Disclosure nlp
1 citation PDF Code
attack arXiv Feb 2, 2026 · 9w ago

HPE: Hallucinated Positive Entanglement for Backdoor Attacks in Federated Self-Supervised Learning

Jiayao Wang, Yang Song, Zhendong Zhao et al. · Yangzhou University · Chinese Academy of Sciences +3 more

Proposes HPE backdoor attack for federated self-supervised learning using synthetic positive entanglement and selective parameter poisoning to persist through aggregation

Model Poisoning vision federated-learning
PDF
attack arXiv Jan 30, 2026 · 9w ago

The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization

Manyi Li, Yufan Liu, Lai Jiang et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +2 more

Attacks machine unlearning defenses in diffusion models by optimizing initial latent variables to reactivate erased NSFW knowledge

Input Manipulation Attack vision generative
PDF Code
defense arXiv Jan 27, 2026 · 9w ago

Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP

Sen Nie, Jie Zhang, Zhuo Wang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Test-time defense purifies adversarial inputs to CLIP using spectral-guided contrastive rectification, outperforming SOTA by 18.1% against AutoAttack

Input Manipulation Attack vision multimodal
1 citation PDF Code
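A minimal, generic version of spectral purification is a Fourier low-pass filter applied before classification (the paper's contrastive rectification is considerably richer than this sketch):

```python
import numpy as np

def lowpass_purify(img: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """Keep only the central low-frequency band of the 2-D spectrum.

    High-frequency adversarial perturbations are attenuated before the
    image reaches the classifier.
    """
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    mask = np.zeros((h, w))
    rh, rw = int(keep * h), int(keep * w)
    mask[h // 2 - rh:h // 2 + rh, w // 2 - rw:w // 2 + rw] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

# A smooth image plus a Nyquist-frequency "perturbation":
x = np.arange(32)
clean = np.cos(2 * np.pi * x / 32)[:, None] * np.cos(2 * np.pi * x / 32)[None, :]
i, j = np.meshgrid(x, x, indexing="ij")
noisy = clean + 0.1 * (-1.0) ** (i + j)

err_before = np.abs(noisy - clean).mean()                  # exactly 0.1
err_after = np.abs(lowpass_purify(noisy) - clean).mean()   # ~0: checkerboard lies outside the pass band
```

The trade-off motivating learned rectification is visible even here: a fixed cutoff also discards legitimate high-frequency image content, which naive low-pass defenses pay for in clean accuracy.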
defense TPAMI Jan 27, 2026 · 9w ago

Privacy-Preserving Model Transcription with Differentially Private Synthetic Distillation

Bochao Liu, Shiming Ge, Pengju Wang et al. · Chinese Academy of Sciences · Beijing Institute of Astronautical Systems Engineering +1 more

Defends against model inversion by converting trained models to DP-guaranteed equivalents via data-free synthetic distillation without accessing private training data

Model Inversion Attack vision
PDF
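One standard building block behind such DP guarantees is the Gaussian mechanism; here is a sketch of noising a teacher's per-query outputs before a student learns from them (the paper's transcription pipeline is assumed to be more elaborate):

```python
import numpy as np

def gaussian_mechanism(values: np.ndarray, sensitivity: float,
                       epsilon: float, delta: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Classic (epsilon, delta)-DP Gaussian mechanism (epsilon < 1 regime).

    Noise scale: sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon.
    """
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return values + rng.normal(0.0, sigma, size=values.shape)
```

In a distillation setting, the student would be trained on these noised teacher outputs over synthetic inputs, so the released student model never touches the private training data directly.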