Latest papers

15 papers
defense arXiv Mar 25, 2026 · 12d ago

High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking

Peipeng Yu, Jinfeng Xie, Chengfu Ou et al. · Jinan University · University of Macau +2 more

Embeds semantic watermarks in face images for copyright protection, pixel-level deepfake localization, and content recovery after manipulation

Output Integrity Attack vision generative
PDF
attack arXiv Mar 10, 2026 · 27d ago

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang · Jinan University

Jailbreaks thinking-mode LLMs by interleaving multi-task streams, character reversal, and format constraints in a single prompt

Prompt Injection nlp
PDF Code
attack arXiv Mar 5, 2026 · 4w ago

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu et al. · City University of Macau · Jinan University +1 more

Poisons distilled synthetic datasets to embed hidden hijacking tasks in models fine-tuned via transfer learning

Data Poisoning Attack Transfer Learning Attack vision
PDF
defense arXiv Feb 26, 2026 · 5w ago

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Tian Zhang, Yiwei Xu, Juan Wang et al. · Wuhan University · University at Buffalo +1 more

Defends LLM agents against indirect prompt injection via causal takeover detection and context purification at tool-return boundaries

Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Feb 1, 2026 · 9w ago

GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability

Xueyi Li, Zhuoneng Zhou, Zitao Liu et al. · Guangdong Institute of Smart Education · Jinan University

Adversarial attack framework targeting LLM graders via token-level gradient perturbations and prompt-level natural-language manipulation

Input Manipulation Attack Prompt Injection nlp
PDF Code
defense arXiv Dec 31, 2025 · Dec 2025

Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition

Yuchao Hou, Zixuan Zhang, Jie Wang et al. · Shanxi Normal University · Guizhou University +7 more

Defends federated SAR image classifiers against backdoor attacks using frequency-domain trigger detection and noise-aware adversarial training

Model Poisoning vision federated-learning
PDF
attack arXiv Dec 26, 2025 · Dec 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al. · Hong Kong University of Science and Technology · Nanjing University of Aeronautics and Astronautics +2 more

Proposes BadVSFM, a two-stage backdoor attack on prompt-driven video segmentation models where classic backdoors fail (<5% ASR)

Model Poisoning vision
PDF
attack arXiv Dec 17, 2025 · Dec 2025

CLIP-FTI: Fine-Grained Face Template Inversion via CLIP-Driven Attribute Conditioning

Longchen Dai, Zixuan Shen, Zhiheng Zhou et al. · Jinan University

Inverts leaked face recognition templates into photorealistic face images using CLIP semantic conditioning and StyleGAN latent projection

Model Inversion Attack vision
PDF
attack arXiv Dec 5, 2025 · Dec 2025

Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models

Fan Yang · Jinan University

Jailbreaks LLMs by rewriting harmful prompts into safe-isomorphic ones, generating responses, then reverse-mapping to harmful outputs

Prompt Injection nlp
PDF
defense arXiv Nov 15, 2025 · Nov 2025

Fine-Grained DINO Tuning with Dual Supervision for Face Forgery Detection

Tianxiang Zhang, Peipeng Yu, Zhihua Xia et al. · Jinan University

Proposes DFF-Adapter, a LoRA-based multi-task DINOv2 fine-tuning method for fine-grained deepfake face detection

Output Integrity Attack vision
PDF
attack arXiv Nov 6, 2025 · Nov 2025

P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models

Mingliang Hou, Yinuo Wang, Teng Guo et al. · Jilin University · TAL Education Group +1 more

Grey-box membership inference attack on educational cognitive diagnosis models exploiting exposed knowledge state visualizations

Membership Inference Attack tabular
1 citation PDF
defense arXiv Oct 19, 2025 · Oct 2025

Rotation, Scale, and Translation Resilient Black-box Fingerprinting for Intellectual Property Protection of EaaS Models

Hongjie Zhang, Zhiqi Zhao, Hanzhou Wu et al. · Sichuan Normal University · Shanghai University +3 more

Fingerprints EaaS embedding models via point-cloud topology analysis to verify ownership, resilient to rotation, scale, and translation attacks

Model Theft vision nlp
PDF
defense arXiv Oct 16, 2025 · Oct 2025

An Information Asymmetry Game for Trigger-based DNN Model Watermarking

Chaoyue Huang, Gejian Zhao, Hanzhou Wu et al. · Shanghai University · Guizhou Normal University +2 more

Game-theoretic framework for robust DNN model watermarking that derives the attacker's optimal pruning budget and an exponential lower bound on watermark success rate (WSR)

Model Theft vision
PDF
defense arXiv Sep 25, 2025 · Sep 2025

FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies

Shuqiao Liang, Jian Liu, Renzhang Chen et al. · Jinan University

Proposes FerretNet, a lightweight 1.1M-parameter detector for AI-generated images using local pixel dependency reconstruction to expose generation artifacts

Output Integrity Attack vision generative
4 citations 1 influential PDF Code
defense arXiv Aug 9, 2025 · Aug 2025

The Cost of Thinking: Increased Jailbreak Risk in Large Language Models

Fan Yang · Jinan University

Finds that thinking-mode LLMs are more vulnerable to jailbreaks and defends them via safe-thinking intervention using special tokens

Prompt Injection nlp
PDF Code