Latest papers

15 papers
survey arXiv Mar 25, 2026 · 12d ago

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories (see the sketch below)

Input Manipulation Attack Data Poisoning Attack Model Inversion Attack Membership Inference Attack Model Theft Output Integrity Attack Model Poisoning Prompt Injection Sensitive Information Disclosure vision nlp multimodal
PDF
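A minimal illustration of how such a flow-based taxonomy can be used programmatically; the assignment of specific attacks to the four categories below is an illustrative reading of the tags above, not taken verbatim from the survey.

```python
# Illustrative mapping of the survey's four threat-flow categories to the
# attack tags listed above; the exact assignment is an assumption, not
# the paper's own table.
THREAT_TAXONOMY = {
    "data-to-data":   ["Input Manipulation Attack", "Prompt Injection"],
    "data-to-model":  ["Data Poisoning Attack", "Model Poisoning"],
    "model-to-data":  ["Model Inversion Attack", "Membership Inference Attack",
                       "Sensitive Information Disclosure"],
    "model-to-model": ["Model Theft"],
}

def categorize(attack: str) -> str | None:
    """Return the flow category for a given attack tag, if known."""
    for flow, attacks in THREAT_TAXONOMY.items():
        if attack in attacks:
            return flow
    return None

print(categorize("Prompt Injection"))  # -> data-to-data
```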
attack arXiv Feb 25, 2026 · 5w ago

Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!

Zihang Zou, Boqing Gong, Liqiang Wang · University of Central Florida · Boston University

Gradient-based attack exploits diffusion model cross-attention to replicate copyrighted images while evading both visible and invisible watermarks (toy sketch below)

Output Integrity Attack vision generative
PDF Code
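A toy PyTorch sketch of the attack's core idea, assuming a stand-in cross-attention module rather than a real diffusion U-Net: optimize the attacker's image tokens by gradient descent so their cross-attention map matches that of the copyrighted target.

```python
# Toy sketch of a gradient-based attack on cross-attention, in the spirit
# of the paper's idea; the module and loss are stand-ins, not the authors'
# actual pipeline or a real diffusion model.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 32
W_q = torch.randn(d, d) / d**0.5   # frozen "model" projections
W_k = torch.randn(d, d) / d**0.5

def cross_attn_map(text_tokens, image_tokens):
    # attention of text queries over image keys, as in diffusion U-Nets
    q = text_tokens @ W_q
    k = image_tokens @ W_k
    return F.softmax(q @ k.T / d**0.5, dim=-1)

text = torch.randn(8, d)                      # fixed prompt embedding
target_img = torch.randn(16, d)               # copyrighted target's tokens
target_map = cross_attn_map(text, target_img).detach()

img = torch.randn(16, d, requires_grad=True)  # attacker-controlled input
opt = torch.optim.Adam([img], lr=0.05)
for step in range(300):
    loss = F.mse_loss(cross_attn_map(text, img), target_map)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final attention-matching loss: {loss.item():.4f}")
```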
defense arXiv Feb 23, 2026 · 6w ago

RobPI: Robust Private Inference against Malicious Client

Jiaqi Xue, Mengxin Zheng, Qian Lou · University of Central Florida

Defends FHE-based private inference against malicious clients who craft adversarial inputs to manipulate model outputs, by injecting noise into logits and intermediate features (sketch below)

Input Manipulation Attack vision
PDF
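A minimal plaintext sketch of the noise-injection idea; in the real system the server computes under FHE, and the two-stage model split and noise scales here are illustrative assumptions, not the paper's calibrated values.

```python
# Minimal sketch: the server perturbs intermediate features and logits
# before returning results, so a malicious client cannot precisely steer
# or probe the model's outputs.
import torch

def noisy_private_inference(model, x, logit_sigma=0.5, feat_sigma=0.1):
    h = model[0](x)                                 # "feature" stage
    h = h + feat_sigma * torch.randn_like(h)        # feature noise
    logits = model[1](h)                            # "head" stage
    return logits + logit_sigma * torch.randn_like(logits)  # logit noise

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Linear(32, 10))
x = torch.randn(1, 16)
print(noisy_private_inference(model, x).argmax(dim=-1))
```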
defense arXiv Feb 11, 2026 · 7w ago

Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away

Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al. · University of Maryland, College Park · IIT Bombay +1 more

Inference-time defense for multimodal reasoning VLMs that monitors reasoning traces and applies safety steering within the first 1-3 steps, cutting jailbreak attack success rates by 30-60% (sketch below)

Input Manipulation Attack Prompt Injection multimodal nlp
PDF
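A conceptual sketch of early-step steering, assuming a hypothetical linear probe and steering vector; in the paper these would be learned from safe and unsafe reasoning traces rather than drawn at random.

```python
# Monitor a probe score on each reasoning step's hidden state and, if it
# signals an unsafe trace, add a "safety" steering vector during the first
# few steps only. Probe, vector, and threshold are illustrative stand-ins.
import torch

d, k_early, threshold = 64, 3, 0.0
probe = torch.randn(d)                 # unsafe-direction probe (assumed given)
steer = -0.5 * probe / probe.norm()    # push away from the unsafe direction

def steered_step(h, step):
    if step < k_early and h @ probe > threshold:   # unsafe trace detected
        h = h + steer
    return h

h = torch.randn(d)
for t in range(10):
    h = steered_step(h, t)             # steering can fire only on steps 0..2
```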
defense arXiv Feb 9, 2026 · 8w ago

CryptoGen: Secure Transformer Generation with Encrypted KV-Cache Reuse

Hedong Zhang, Neusha Javidnia, Shweta Pardeshi et al. · University of Central Florida · University of California

Cryptographic HE+MPC system enabling privacy-preserving autoregressive LLM inference that protects both user prompts and model weights from semi-honest adversaries

Model Theft nlp
PDF
attack arXiv Jan 30, 2026 · 9w ago

Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot Detection

Kunal Mukherjee, Zulfikar Alom, Tran Gia Bao Ngo et al. · Virginia Tech · University of Toledo +2 more

Optimal transport-guided adversarial graph attacks evade GNN-based bot detectors via realistic edge edits and node injection (toy sketch below)

Input Manipulation Attack graph
2 citations PDF
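A toy sketch of the guidance idea, using the 1-D Wasserstein distance between degree distributions as a simplified stand-in for the paper's optimal-transport objective: among candidate edge injections, greedily pick the ones that keep the graph statistically closest to the clean one.

```python
# Greedy OT-guided edge edits: inject edges that minimally shift the
# degree distribution (1-D Wasserstein) away from the clean graph -- a
# simplified proxy for the paper's realism constraint.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n = 30
A = (rng.random((n, n)) < 0.1).astype(int)
A = np.triu(A, 1)
A = A + A.T                                    # undirected clean graph
clean_deg = A.sum(axis=1)

def realism_cost(adj):
    return wasserstein_distance(clean_deg, adj.sum(axis=1))

budget, best = 5, A.copy()
for _ in range(budget):
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if best[i, j] == 0]
    def try_edge(e):
        B = best.copy()
        B[e[0], e[1]] = B[e[1], e[0]] = 1
        return realism_cost(B)
    i, j = min(candidates, key=try_edge)       # most "realistic" injection
    best[i, j] = best[j, i] = 1
print(f"degree-distribution shift after edits: {realism_cost(best):.3f}")
```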
defense arXiv Jan 30, 2026 · 9w ago

RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance

Miao Lin, Feng Yu, Rui Ning et al. · Old Dominion University · University of Texas at El Paso +3 more

Certified black-box poisoned-sample detector for backdoor attacks that remains robust under real-world class imbalance

Model Poisoning vision
PDF
attack arXiv Jan 20, 2026 · 10w ago

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

Bingxin Xu, Yuzhang Shang, Binghui Wang et al. · University of Southern California · University of Central Florida +1 more

Backdoor attack on VLA robotic models that exploits action chunking to inject stealthy malicious trajectories, reaching a 93% attack success rate (sketch below)

Model Poisoning Data Poisoning Attack vision multimodal reinforcement-learning
1 citation PDF
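A schematic of the poisoning step, assuming hypothetical trigger placement and drift values: the early actions of each chunk stay benign while the tail drifts, so the trajectory deviates slowly enough to evade inspection.

```python
# Poison (observation, action-chunk) pairs: stamp a trigger feature on the
# observation and drift only the second half of the chunk's actions.
# Trigger index and drift magnitude are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
chunk_len, act_dim = 8, 4

def poison(obs, action_chunk, drift=np.array([0.0, 0.0, -0.05, 0.0])):
    obs, action_chunk = obs.copy(), action_chunk.copy()
    obs[-1] = 1.0                                  # stamp a trigger feature
    for t in range(chunk_len // 2, chunk_len):     # only late actions drift
        action_chunk[t] += drift * (t - chunk_len // 2 + 1)
    return obs, action_chunk

obs = rng.normal(size=16)
chunk = rng.normal(scale=0.1, size=(chunk_len, act_dim))
p_obs, p_chunk = poison(obs, chunk)
print(p_chunk[-1] - chunk[-1])    # accumulated stealthy deviation
```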
benchmark arXiv Nov 25, 2025 · Nov 2025

Memories Retrieved from Many Paths: A Multi-Prefix Framework for Robust Detection of Training Data Leakage in Large Language Models

Trung Cuong Dang, David Mohaisen · University of Central Florida

Defines LLM training-data memorization by the number of distinct adversarial prefixes that elicit a given sequence, enabling robust leakage auditing of aligned models (sketch below)

Model Inversion Attack Sensitive Information Disclosure nlp
2 citations PDF
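A minimal version of the auditing metric: score a target sequence by the fraction of distinct prefixes that make the model reproduce it. `generate_fn` stands in for any LLM completion call; the toy one below just echoes canned continuations.

```python
# Multi-prefix memorization score: a sequence counts as memorized in
# proportion to how many distinct "retrieval paths" (prefixes) elicit it.
from typing import Callable

def memorization_score(prefixes: list[str], target: str,
                       generate_fn: Callable[[str], str]) -> float:
    hits = sum(target in generate_fn(p) for p in prefixes)
    return hits / len(prefixes)    # fraction of paths that retrieve it

# Toy stand-in for an LLM: canned completions for two of three prefixes.
canned = {"Call me": " Ishmael.", "My name is": " Ishmael."}
toy_llm = lambda p: p + canned.get(p, " someone.")

score = memorization_score(["Call me", "My name is", "Hello"],
                           "Ishmael", toy_llm)
print(f"retrieved via {score:.0%} of prefixes")
```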
defense arXiv Oct 31, 2025 · Oct 2025

Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler

Zixuan Hu, Li Shen, Zhenyi Wang et al. · Nanyang Technological University · Sun Yat-sen University +2 more

Defends LLMs against harmful fine-tuning by learning per-sample data safety attributes via Bayesian inference, without requiring attack simulation (sketch below)

Data Poisoning Attack Transfer Learning Attack Training Data Poisoning nlp
5 citations 1 influential PDF Code
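A sketch of the scheduling idea under simplifying assumptions: maintain a Beta posterior over each sample's probability of being safe, updated from a per-sample safety signal, and weight fine-tuning batches by the posterior mean. The signal and prior here are illustrative, not the paper's model.

```python
# Bayesian data scheduler sketch: Beta posterior per sample, batches drawn
# with probability proportional to the posterior mean safety.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
safety_signal = rng.random(n) < 0.9          # 1 = looks safe (assumed given)

alpha = 1.0 + safety_signal                  # Beta(1,1) prior, one update
beta = 1.0 + (1 - safety_signal)
p_safe = alpha / (alpha + beta)              # posterior mean per sample

weights = p_safe / p_safe.sum()
batch = rng.choice(n, size=32, replace=False, p=weights)
print(f"mean posterior safety of scheduled batch: {p_safe[batch].mean():.2f}")
```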
defense arXiv Oct 27, 2025 · Oct 2025

PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs

Jiaqi Xue, Yifei Zhao, Mansour Al Ghanim et al. · University of Central Florida · Florida State University +1 more

Embeds robust text watermarks into open-source LLM weights to detect AI-generated content even after fine-tuning or model merging

Output Integrity Attack nlp
PDF
defense arXiv Oct 24, 2025 · Oct 2025

DictPFL: Efficient and Private Federated Learning on Encrypted Gradients

Jiaqi Xue, Mayank Kumar, Yuzhang Shang et al. · University of Central Florida · Florida State University +2 more

Defends federated learning against gradient inversion attacks via efficient homomorphic encryption, at roughly 2× the overhead of plaintext FL (sketch below)

Model Inversion Attack federated-learning
1 citation PDF Code
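A sketch of the selective-encryption idea, using the `phe` Paillier library as a stand-in scheme: only a small slice of each client's gradient is encrypted and summed under encryption, while the rest travels in plaintext. The 2-of-8 split is an illustrative assumption, not DictPFL's actual dictionary construction.

```python
# Selective HE aggregation: the server adds Paillier ciphertexts for the
# sensitive slice and plain values for the rest; decryption recovers the
# same average as fully plaintext aggregation.
import numpy as np
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)
rng = np.random.default_rng(0)
grads = [rng.normal(size=8) for _ in range(3)]        # three clients
k = 2                                                 # encrypted slice size

enc_parts = [[pub.encrypt(float(v)) for v in g[:k]] for g in grads]
plain_parts = [g[k:] for g in grads]

enc_sum = [sum(col) for col in zip(*enc_parts)]       # add ciphertexts
agg = np.concatenate([[priv.decrypt(c) for c in enc_sum],
                      np.sum(plain_parts, axis=0)]) / len(grads)
print(np.allclose(agg, np.mean(grads, axis=0)))       # True
```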
defense ICCD Oct 22, 2025 · Oct 2025

CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage

Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali et al. · University of Central Florida

Defends against leakage of proprietary RTL hardware IP memorized by LLMs via activation-level steering on transformer components (sketch below)

Model Inversion Attack Sensitive Information Disclosure nlp
1 citation PDF Code
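A conceptual sketch of activation-level steering via a PyTorch forward hook, assuming a random "memorization direction"; in the paper this direction would be identified from activations on memorized RTL rather than drawn at random.

```python
# Forward hook that projects a layer's activations away from a flagged
# direction, suppressing the component associated with memorized content.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(32, 32)
direction = torch.randn(32)
direction = direction / direction.norm()

def steer_hook(module, inputs, output):
    # remove the component of the activation along the flagged direction
    return output - (output @ direction).unsqueeze(-1) * direction

layer.register_forward_hook(steer_hook)
h = layer(torch.randn(4, 32))
print((h @ direction).abs().max())   # ~0: steered component removed
```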
attack arXiv Oct 20, 2025 · Oct 2025

Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

Elias Hossain, Swayamjit Saha, Somshubhra Roy et al. · University of Central Florida · Mississippi State University +1 more

Attacks LLM inference by corrupting KV-cache key vectors at runtime, bypassing prompt filters and degrading outputs across GPT-2 and LLaMA-2 (toy sketch below)

Input Manipulation Attack nlp
2 citations PDF
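A toy demonstration of the threat model, with illustrative shapes and noise scale: once cached key vectors are perturbed at runtime, attention outputs drift even though the prompt itself was never touched.

```python
# Perturb cached keys in a single-head attention step and measure how far
# the output moves relative to the clean computation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, T = 32, 10
K = torch.randn(T, d)                 # cached keys for prior tokens
V = torch.randn(T, d)                 # cached values
q = torch.randn(d)                    # current query

def attend(q, K, V):
    return F.softmax(q @ K.T / d**0.5, dim=-1) @ V

clean = attend(q, K, V)
K_corrupt = K + 0.5 * torch.randn_like(K)   # runtime cache corruption
drift = (attend(q, K_corrupt, V) - clean).norm() / clean.norm()
print(f"relative output drift: {drift:.2f}")
```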
defense arXiv Sep 8, 2025 · Sep 2025

AttestLLM: Efficient Attestation Framework for Billion-scale On-device LLMs

Ruisi Zhang, Yifei Zhao, Neusha Javidnia et al. · University of California · University of Central Florida

Embeds device-specific watermarks into LLM layer activations inside a TEE to attest model legitimacy and resist model replacement or forgery attacks on-device

Model Theft nlp
PDF