Latest papers

84 papers
defense arXiv Apr 5, 2026 · 3d ago

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin et al. · Shanghai Jiao Tong University · University of Illinois Urbana-Champaign +1 more

Multi-agent cooperative defense system that adapts across rounds to counter evolving LLM jailbreak attacks through deception and forensic analysis

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Apr 4, 2026 · 4d ago

LOGER: Local-Global Ensemble for Robust Deepfake Detection in the Wild

Fei Wu, Dagong Lu, Mufeng Yao et al. · Shanghai Jiao Tong University · INTSIG Information

Deepfake detector combining global semantic analysis and local patch-level forensics for robust detection across manipulation methods

Output Integrity Attack vision multimodal
PDF
defense arXiv Apr 4, 2026 · 4d ago

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

Fei Wu, Dagong Lu, Mufeng Yao et al. · Shanghai Jiao Tong University · INTSIG Information

Heterogeneous ensemble detector combining multi-scale features and diverse backbones to identify AI-generated images under real-world distortions

Output Integrity Attack vision generative
PDF
defense arXiv Mar 30, 2026 · 9d ago

Generalizable Detection of AI Generated Images with Large Models and Fuzzy Decision Tree

Fei Wu, Guanghao Ding, Zijian Niu et al. · Shanghai Jiao Tong University

Combines lightweight artifact detectors with multimodal LLMs via fuzzy decision trees for generalizable AI-generated image detection

Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 25, 2026 · 14d ago

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure vision nlp multimodal
PDF
defense arXiv Mar 25, 2026 · 14d ago

AMIF: Authorizable Medical Image Fusion Model with Built-in Authentication

Jie Song, Jun Jia, Wei Sun et al. · Macao Polytechnic University · Shanghai Jiao Tong University +2 more

Medical image fusion model embedding visible copyright watermarks in outputs, removable only with authentication keys

Model Theft Output Integrity Attack vision multimodal
PDF
attack arXiv Mar 23, 2026 · 16d ago

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong et al. · China University of Petroleum-Beijing · University of Electronic Science and Technology of China +3 more

Universal adversarial patch attack on infrared pedestrian detectors using parameterized Bézier curves and cold patches

Input Manipulation Attack vision
PDF
attack arXiv Mar 20, 2026 · 19d ago

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Fazhong Liu, Zhuoyan Chen, Tu Lan et al. · Shanghai Jiao Tong University

Supply chain attack embedding malicious operational narratives in autonomous coding agent bootstrap guidance, achieving up to 64% success rate

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp
PDF
defense arXiv Mar 16, 2026 · 23d ago

Architecture-Agnostic Feature Synergy for Universal Defense Against Heterogeneous Generative Threats

Bingxue Zhang, Yang Gao, Feida Zhu et al. · University of Shanghai for Science and Technology · Singapore Management University +1 more

Universal adversarial defense against heterogeneous generative models using feature-space alignment to protect images from unauthorized editing

Input Manipulation Attack vision generative
PDF
defense arXiv Mar 16, 2026 · 23d ago

Counterexample Guided Branching via Directional Relaxation Analysis in Complete Neural Network Verification

Jingyang Li, Fu Song, Guoqiang Li · Shanghai Jiao Tong University · Chinese Academy of Sciences

Reformulates neural network verification as a CEGAR loop, using spurious counterexamples to guide branching and tighten robustness proofs

Input Manipulation Attack vision
PDF
attack arXiv Mar 14, 2026 · 25d ago

Inevitable Encounters: Backdoor Attacks Involving Lossy Compression

Qian Li, Yunuo Chen, Yuntian Chen · Shanghai Jiao Tong University · Eastern Institute of Technology

Backdoor attacks adapted for lossy compression using ROI coding to preserve trigger information in JPEG bitstreams

Model Poisoning Data Poisoning Attack vision
PDF
defense arXiv Mar 12, 2026 · 27d ago

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai et al. · Shanghai Jiao Tong University · Ant Group +2 more

Embeds per-client backdoor watermarks in federated LMs to trace model leaks to individual culprits via black-box queries

Model Theft Model Poisoning nlp federated-learning multimodal
PDF
defense arXiv Mar 10, 2026 · 29d ago

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang et al. · Shanghai Jiao Tong University

TEE-based LLM serving system that protects model weights and user data from compromised OS kernels on mobile devices

Model Theft Sensitive Information Disclosure nlp
PDF
attack arXiv Mar 9, 2026 · 4w ago

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

Junxian Li, Tu Lan, Haozhen Tan et al. · Shanghai Jiao Tong University

Backdoor attack on VLM GUI agents that induces excessive latency via RL-injected trigger-aware long reasoning chains

Model Poisoning multimodal vision nlp
PDF Code
survey arXiv Mar 2, 2026 · 5w ago

From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions

Zhihang Deng, Jiaping Gui, Weinan Zhang · Shanghai Innovation Institute · Shanghai Jiao Tong University

Surveys prompt injection, toolchain abuse, and agent network threats across LLM agentic systems and web-scale deployments

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Feb 28, 2026 · 5w ago

ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

Haodong Zhao, Jinming Hu, Zhaomin Wu et al. · Shanghai Jiao Tong University · National University of Singapore +1 more

Defends federated LLM instruction tuning against interspersed backdoor poisoning using frequency-domain gradient signals and global clustering

Model Poisoning Data Poisoning Attack nlp federated-learning
PDF Code
benchmark arXiv Feb 26, 2026 · 5w ago

Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

Xiaosen Wang, Zhijin Ge, Bohan Liu et al. · Huazhong University of Science and Technology · Xidian University +3 more

Surveys 100+ transfer-based adversarial attacks and proposes a unified benchmark framework to address unfair comparisons in the field

Input Manipulation Attack vision
PDF Code
attack arXiv Feb 17, 2026 · 7w ago

Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective

Haodong Zhao, Jinming Hu, Gongshen Liu · Shanghai Jiao Tong University

Reveals that distributed backdoor attacks using low-concentration poisoned data spread across benign FL clients defeat all existing defenses

Model Poisoning Data Poisoning Attack Training Data Poisoning nlp federated-learning
PDF
defense arXiv Feb 2, 2026 · 9w ago

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Xiaoyu Wen, Zhida He, Han Qi et al. · Shanghai AI Laboratory · Shanghai Jiao Tong University +1 more

Multi-agent RL co-evolves an LLM attacker and defender, generating novel jailbreaks to train robust safety alignment against unseen prompts

Prompt Injection nlp reinforcement-learning
PDF Code
defense arXiv Jan 27, 2026 · 10w ago

RvB: Automating AI System Hardening via Iterative Red-Blue Games

Lige Huang, Zicheng Liu, Jie Zhang et al. · Shanghai Artificial Intelligence Laboratory · Institute of Information Engineering +1 more

Automates LLM jailbreak guardrail hardening via iterative red-blue adversarial game without model parameter updates

Prompt Injection nlp
PDF