Latest papers

5 papers
defense arXiv Mar 3, 2026 · 4w ago

Understanding and Mitigating Dataset Corruption in LLM Steering

Cullen Anderson, Narmeen Oozeer, Foad Namjoo et al. · University of Massachusetts Amherst · Martian AI +2 more

Analyzes adversarial data poisoning of LLM contrastive steering datasets and defends with robust mean estimation

Data Poisoning Attack · Training Data Poisoning · nlp
PDF
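The paper's summary names "robust mean estimation" as the defense. A minimal sketch of that idea, assuming a coordinate-wise trimmed mean as the estimator (my choice for illustration; the paper may use a different robust estimator) applied to contrastive activation differences before they are averaged into a steering vector:

```python
import numpy as np

def trimmed_mean(acts: np.ndarray, trim_frac: float = 0.1) -> np.ndarray:
    """Coordinate-wise trimmed mean: drop the lowest and highest
    trim_frac of values in each dimension, then average the rest.
    acts: (n_samples, d) activation differences from a contrastive dataset."""
    n = acts.shape[0]
    k = int(n * trim_frac)
    srt = np.sort(acts, axis=0)            # sort each coordinate independently
    kept = srt[k : n - k] if k > 0 else srt
    return kept.mean(axis=0)

# Clean steering directions near 1.0, plus a few poisoned outliers.
rng = np.random.default_rng(0)
clean = rng.normal(1.0, 0.05, size=(90, 8))
poison = np.full((10, 8), -50.0)           # adversarial samples drag the mean
data = np.vstack([clean, poison])

naive = data.mean(axis=0)                  # pulled far below zero by the poison
robust = trimmed_mean(data, trim_frac=0.1) # stays near the clean direction
```

With 10% poison and a matching trim fraction, the poisoned rows fall entirely in the trimmed tail, so the robust estimate stays close to the clean steering direction while the naive mean is corrupted.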
attack arXiv Jan 17, 2026 · 11w ago

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang et al. · Griffith University · University of Technology Sydney +1 more

Adversarial attack exploits visual token compression in VLMs by perturbing token importance rankings, causing failures only under compressed inference

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
PDF
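The attack surface here is the token-importance ranking that compression relies on. A toy sketch, assuming top-k pruning by an importance score (a stand-in for the actual compression modules in the paper): a tiny perturbation reorders the ranking so a token is dropped only when the compressed budget is in effect, leaving full-budget inference unchanged.

```python
import numpy as np

def compress_tokens(importance: np.ndarray, keep: int) -> list:
    """Keep the indices of the top-`keep` visual tokens by importance score
    (a stand-in for attention-based token pruning in a VLM)."""
    return sorted(np.argsort(importance)[-keep:].tolist())

# Clean importance scores for 5 visual tokens.
clean = np.array([0.90, 0.51, 0.50, 0.30, 0.10])
# A small adversarial perturbation demotes token 1 just below token 2.
perturbed = clean.copy()
perturbed[1] -= 0.02

full_clean = compress_tokens(clean, keep=5)
full_pert  = compress_tokens(perturbed, keep=5)  # identical: nothing is dropped
comp_clean = compress_tokens(clean, keep=2)      # keeps tokens 0 and 1
comp_pert  = compress_tokens(perturbed, keep=2)  # token 1 silently replaced by 2
```

This is why the failures surface "only under compressed inference": the perturbation is too small to change model behavior at full token budget, but flips which tokens survive pruning.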
attack arXiv Nov 26, 2025 · Nov 2025

Dataset Poisoning Attacks on Behavioral Cloning Policies

Akansha Kalra, Soumil Datta, Ethan Gilmore et al. · University of Utah

Clean-label backdoor attacks on behavioral cloning policies using visual triggers and entropy-based test-time triggering

Model Poisoning · Data Poisoning Attack · vision · reinforcement-learning
PDF Code
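A sketch of the "entropy-based test-time triggering" idea, under the assumption (mine, not the paper's exact criterion) that the attacker presents the visual trigger only in states where the cloned policy's action distribution is high-entropy, i.e. where a behavior switch is least conspicuous:

```python
import numpy as np

def action_entropy(probs: np.ndarray) -> float:
    """Shannon entropy (nats) of a policy's discrete action distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def should_trigger(probs: np.ndarray, threshold: float = 1.0) -> bool:
    """Hypothetical criterion: fire the backdoor trigger only when the
    policy is uncertain (entropy above a tunable threshold)."""
    return action_entropy(probs) > threshold

confident = np.array([0.97, 0.01, 0.01, 0.01])  # entropy ~0.17 nats
uncertain = np.array([0.25, 0.25, 0.25, 0.25])  # entropy ln(4) ~1.39 nats
```

The threshold of 1.0 nat is arbitrary here; any deployment would tune it against the policy's typical entropy profile.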
attack arXiv Oct 1, 2025 · Oct 2025

On the Adversarial Robustness of Learning-based Conformal Novelty Detection

Daofu Zhang, Mehrdad Pournaderi, Hanne M. Clifford et al. · University of Utah · Syracuse University +1 more

Attacks ML-based conformal novelty detectors via black-box perturbations that inflate false discovery rates while preserving detection power

Input Manipulation Attack · vision · tabular
1 citation PDF
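For context on the target of this attack: conformal novelty detection converts novelty scores into conformal p-values against a calibration set of inliers, then controls the false discovery rate, typically with Benjamini-Hochberg. A minimal sketch of that clean pipeline (the attacked system, not the attack itself; scores and thresholds are illustrative):

```python
import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-value of each test point: the rank of its novelty score
    among calibration scores drawn from nominal (inlier) data."""
    calib = np.asarray(calib_scores)
    return np.array([(1 + np.sum(calib >= s)) / (len(calib) + 1)
                     for s in test_scores])

def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest index with p_(k) <= alpha * k / m. Returns a boolean mask."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

calib = np.arange(199) / 198.0                   # inlier novelty scores in [0, 1]
tests = np.array([5.0, 5.0, 5.0] + [0.5] * 7)    # 3 clear novelties, 7 inliers
pvals = conformal_pvalues(calib, tests)
detected = benjamini_hochberg(pvals, alpha=0.1)
```

The attack described in the summary perturbs inputs so that inlier p-values concentrate below the BH threshold, inflating the false discovery rate while true novelties are still flagged.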
attack arXiv Sep 21, 2025 · Sep 2025

Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving

Xuan Chen, Shiwei Feng, Zikang Xiong et al. · Purdue University · University of Utah +1 more

Backdoor attack on offline RL driving agents using temporal-logic-specified vehicle trajectories as stealthy, real-world-deployable triggers

Model Poisoning · reinforcement-learning · vision
2 citations PDF
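The trigger in this attack is not a pixel pattern but a temporal-logic property of surrounding vehicle trajectories. A toy sketch of what such a trigger check could look like; the predicate, window sizes, and threshold below are hypothetical stand-ins for the paper's actual temporal-logic specifications:

```python
def trigger_active(lateral_offsets, window=10, hold=3, thresh=0.5):
    """Hypothetical temporal-logic trigger over a neighboring vehicle's
    lateral offsets (meters from the ego lane center): within the first
    `window` steps, the offset eventually stays below `thresh` for `hold`
    consecutive steps -- roughly "eventually (always_[0,hold) close)"."""
    for t in range(window):
        seg = lateral_offsets[t:t + hold]
        if len(seg) == hold and all(abs(x) < thresh for x in seg):
            return True
    return False

# Neighbor cuts into the ego lane and holds it: trigger fires.
cut_in = [2.0, 1.5, 0.8, 0.3, 0.2, 0.1, 0.4, 1.0]
# Neighbor keeps its distance throughout: no trigger.
benign = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 1.8]
```

Because the trigger is a naturally occurring driving maneuver rather than a visual artifact, it is stealthy and deployable by real vehicles at test time, which is the point the summary highlights.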