Latest papers

4 papers
benchmark · arXiv · Feb 10, 2026

Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

Zhisheng Qi, Utkarsh Sahu, Li Ma et al. · University of Oregon · Michigan State University +6 more

First systematic benchmark comparing knowledge-extraction attacks and defenses on RAG systems under unified evaluation protocols

Sensitive Information Disclosure nlp
attack · arXiv · Jan 30, 2026

"Someone Hid It": Query-Agnostic Black-Box Attacks on LLM-Based Retrieval

Jiate Li, Defu Cao, Li Li et al. · University of Southern California · Adobe Research +1 more

Uses surrogate LLMs to craft black-box, query-agnostic adversarial token injections that manipulate document rankings in RAG and LLM-based retrieval systems

Input Manipulation Attack Prompt Injection nlp
1 citation
defense · ACM MM · Oct 3, 2025

Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations

Naresh Kumar Devulapally, Shruti Agarwal, Tejas Gokhale et al. · The State University of New York · Adobe Research +1 more

Defends user images from unauthorized diffusion model personalization via imperceptible latent-space trajectory-shifted poisoning perturbations

Data Poisoning Attack Output Integrity Attack vision generative
defense · arXiv · Sep 11, 2025

Steering MoE LLMs via Expert (De)Activation

Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy et al. · University of California · Adobe Research +2 more

Manipulates MoE expert routing at inference time to steer LLM safety; used adversarially in combination with jailbreaks, it reduces safety by 100%

Prompt Injection nlp