Latest papers

5 papers
survey arXiv Mar 23, 2026

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu, Feiyang Li et al. · State Key Laboratory of Mathematical Engineering and Advanced Computing · Information Engineering University +2 more

First end-to-end survey mapping RAG security threats, defenses, and benchmarks across the entire pipeline

Prompt Injection · Training Data Poisoning · Sensitive Information Disclosure · nlp
PDF
attack arXiv Oct 12, 2025

SASER: Stego attacks on open-source LLMs

Ming Tan, Wei Li, Hu Tao et al. · Information Engineering University · Xidian University

Embeds malicious executable payloads in open-source LLM weights using steganography, achieving a 100% attack success rate (ASR) with 98.1% better stealth than prior DNN steganography attacks

Model Poisoning · nlp
PDF
attack arXiv Oct 1, 2025

Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Yu Yan, Siqi Lu, Yang Gao et al. · Information Engineering University · Chinese Academy of Sciences +1 more

Flips single bits in deployed LLM weights via remote Rowhammer attacks, collapsing accuracy to 0% and triggering harmful output generation

Model Poisoning · nlp
PDF
attack arXiv Aug 31, 2025

Sequential Difference Maximization: Generating Adversarial Examples via Multi-Stage Optimization

Xinlei Liu, Tao Hu, Peng Yi et al. · Information Engineering University · Key Laboratory of Cyberspace Security

Novel multi-stage gradient attack that outperforms state-of-the-art methods by optimizing a reconstructed adversarial objective through a sequence of loss functions

Input Manipulation Attack · vision
PDF Code
defense arXiv Jan 8, 2025

Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training

Hongxin Zhi, Hongtao Yu, Shaomei Li et al. · Information Engineering University

Adversarial training framework using distributionally robust optimization to eliminate class-wise robustness disparity with theoretical guarantees

Input Manipulation Attack · vision
PDF