Latest papers

39 papers
attack arXiv Mar 29, 2026 · 10d ago

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang et al. · Hong Kong University of Science and Technology · The Chinese University of Hong Kong +2 more

Semantic backdoor attack on VLMs that injects ads when users ask recommendation questions about specific content categories

Model Poisoning multimodalvisionnlp
PDF
benchmark arXiv Mar 21, 2026 · 18d ago

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu et al. · Zhejiang University · Chongqing University +1 more

Measurement study showing FL poisoning attacks are less effective in practice than research suggests due to heterogeneity and stability constraints

Data Poisoning Attack visionnlptabularfederated-learning
PDF Code
defense arXiv Mar 19, 2026 · 20d ago

MOSAIC: Multi-Objective Slice-Aware Iterative Curation for Alignment

Yipu Dou, Wang Yang · Southeast University

Iterative data mixture optimization framework that balances LLM safety alignment, over-refusal reduction, and instruction following under fixed training budgets

Prompt Injection nlp
PDF Code
defense arXiv Mar 13, 2026 · 26d ago

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He et al. · Southeast University · Nanyang Technological University +2 more

Test-time backdoor defense for LVLMs that detects poisoned inputs via cross-modal attention anomalies and purifies them by pruning trigger tokens

Model Poisoning multimodalnlpvision
PDF
defense arXiv Mar 11, 2026 · 28d ago

Attribution as Retrieval: Model-Agnostic AI-Generated Image Attribution

Hongsong Wang, Renxi Cheng, Chaolei Han et al. · Southeast University · Purple Mountain Laboratories

Model-agnostic deepfake attribution framework using low-bit fingerprints and retrieval for zero- and few-shot source attribution

Output Integrity Attack vision
PDF Code
defense arXiv Mar 9, 2026 · 4w ago

Where, What, Why: Toward Explainable 3D-GS Watermarking

Mingshu Cai, Jiajun Li, Osamu Yoshie et al. · Waseda University · Southeast University +1 more

Watermarks 3D Gaussian Splatting assets with explainable carrier selection, improving visual quality by +0.83 dB and bit-accuracy by +1.24% over prior methods

Output Integrity Attack visiongenerative
PDF
defense arXiv Mar 5, 2026 · 4w ago

Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

Lianyu Wang, Meng Wang, Huazhu Fu et al. · Nanjing University of Aeronautics and Astronautics · Southeast University +1 more

Defends VLM intellectual property via dynamic authorization module restricting deployment to user-specified domains at inference time

Model Theft visionnlpmultimodal
PDF
defense arXiv Mar 1, 2026 · 5w ago

S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights

Gaojie Jin, Xinping Yi, Wei Huang et al. · University of Exeter · Southeast University +1 more

Improves adversarial training robustness by optimizing second-order weight statistics via a tightened PAC-Bayesian bound

Input Manipulation Attack vision
PDF Code
benchmark arXiv Feb 23, 2026 · 6w ago

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba, Qinbin Li, Songze Li · Southeast University · Huazhong University of Science and Technology

Benchmark evaluating LLM code interpreter agents against prompt injection, memory poisoning, and backdoor attacks in live sandboxed execution environments

Prompt Injection Excessive Agency nlp
PDF
attack arXiv Feb 9, 2026 · 8w ago

RECUR: Resource Exhaustion Attack via Recursive-Entropy Guided Counterfactual Utilization and Reflection

Ziwei Wang, Yuanhe Zhang, Jing Chen et al. · Wuhan University · Beijing University of Posts and Telecommunications +3 more

Crafts counterfactual prompts using Recursive Entropy to force LRMs into infinite thinking loops, reducing throughput by 90%

Model Denial of Service nlp
PDF
attack arXiv Feb 3, 2026 · 9w ago

Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks

Yi Yu, Qixin Zhang, Shuhan Ye et al. · Nanyang Technological University · Chinese University of Hong Kong +2 more

Gradient-based timing-only adversarial attack on event-driven SNNs retimes spikes to cause misclassification while preserving spike counts

Input Manipulation Attack vision
2 citations PDF Code
defense arXiv Jan 30, 2026 · 9w ago

Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing

Yilong Huang, Songze Li · Southeast University

Proactive defense adds imperceptible adversarial perturbations via W+ space attribute editing to foil diffusion-based deepfake face swapping

Output Integrity Attack visiongenerative
PDF
attack arXiv Jan 29, 2026 · 9w ago

Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise

Puwei Lian, Yujun Cai, Songze Li et al. · Southeast University · The University of Queensland +1 more

Exploits residual semantics in diffusion model noise schedules to perform black-box membership inference without auxiliary data

Membership Inference Attack visiongenerative
PDF
defense arXiv Jan 29, 2026 · 9w ago

RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing

Wenhui Zhang, Huiyu Xu, Zhibo Wang et al. · Zhejiang University · Southeast University

Defends LLM routing classifiers against adversarial trigger-prepending attacks that escalate cost, hijack quality, or bypass safety guardrails

Input Manipulation Attack Prompt Injection nlp
PDF
attack arXiv Jan 22, 2026 · 10w ago

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

Fengheng Chu, Jiahao Chen, Yuhong Wang et al. · Southeast University · Zhejiang University +1 more

White-box jailbreak exploits safety-critical attention heads via activation repatching to bypass LLM safety guardrails

Prompt Injection nlp
PDF
attack arXiv Jan 20, 2026 · 11w ago

PINA: Prompt Injection Attack against Navigation Agents

Jiani Liu, Yixin He, Lanlan Fan et al. · Zhejiang University · Southeast University

Proposes PINA, a black-box prompt injection attack against LLM navigation agents achieving 87.5% average attack success rate

Prompt Injection nlp
PDF
attack arXiv Jan 19, 2026 · 11w ago

CODE: A Contradiction-Based Deliberation Extension Framework for Overthinking Attacks on Retrieval-Augmented Generation

Xiaolei Zhang, Xiaojun Jia, Liquan Chen et al. · Southeast University · Nanyang Technological University

Poisons RAG knowledge bases with contradiction-laden documents to cause 5–25x reasoning token overconsumption in LLMs without affecting accuracy

Prompt Injection Model Denial of Service nlp
PDF
tool arXiv Jan 16, 2026 · 11w ago

AJAR: Adaptive Jailbreak Architecture for Red-teaming

Yipu Dou, Wang Yang · Southeast University

Modular agentic red-teaming framework using MCP to orchestrate multi-turn jailbreak algorithms against tool-using LLM agents

Prompt Injection Excessive Agency nlp
PDF Code
attack arXiv Jan 13, 2026 · 12w ago

MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

Yongtong Gu, Songze Li, Xia Hu · Southeast University · Shanghai Artificial Intelligence Laboratory

Evades black-box AI-generated text detectors via multi-stage style-transfer alignment, achieving 92% attack success rate

Output Integrity Attack nlp
PDF
attack arXiv Jan 9, 2026 · 12w ago

Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models

Songze Li, Ruishi He, Xiaojun Jia et al. · Southeast University · Nanyang Technological University +1 more

Proposes Mastermind, a hierarchical multi-agent jailbreak framework that autonomously learns and adapts attack strategies across multi-turn LLM conversations

Prompt Injection nlp
1 citations PDF
Loading more papers…