Latest papers

8 papers
benchmark arXiv Apr 1, 2026 · 5d ago

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Weidi Luo, Xiaofei Wen, Tenghao Huang et al. · University of Georgia · University of California +3 more

Benchmark and guardrail for detecting jailbreak attacks that bypass LLM safety alignment in food safety domain

Prompt Injection nlp
PDF Code
attack arXiv Feb 28, 2026 · 5w ago

Roots Beneath the Cut: Uncovering the Risk of Concept Revival in Pruning-Based Unlearning for Diffusion Models

Ci Zhang, Zhaojun Ding, Chence Yang et al. · University of Georgia · Carnegie Mellon University +3 more

Attacks pruning-based unlearning in diffusion models by reviving erased concepts via side-channel signals from zeroed weight locations

Output Integrity Attack generativevision
PDF
attack arXiv Jan 30, 2026 · 9w ago

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

Tanusree Debi, Wentian Zhu · University of Georgia

Red-teams Google's AP2 payment protocol via prompt injection attacks that hijack agent purchasing decisions and extract sensitive user payment data

Prompt Injection Sensitive Information Disclosure nlp
PDF
defense arXiv Jan 13, 2026 · 11w ago

Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment

Qitao Tan, Xiaoying Song, Ningxi Cheng et al. · University of Georgia · University of North Texas +2 more

Recovers LLM safety alignment eroded by fine-tuning via post-training quantization, without retraining, in 40 minutes on one GPU

Transfer Learning Attack Prompt Injection nlp
PDF Code
attack arXiv Dec 18, 2025 · Dec 2025

MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval

Saksham Sahai Srivastava, Haoyu He · University of Georgia

Poisons LLM agent episodic memory via benign documents, causing persistent unsafe imitation of grafted experience records at retrieval time

Data Poisoning Attack Prompt Injection nlp
4 citations PDF Code
attack arXiv Oct 11, 2025 · Oct 2025

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Wentian Zhu, Zhen Xiang, Wei Niu et al. · University of Georgia

Exploits LLM special tokens to construct jailbreak primitives that bypass both safety alignment and content moderation simultaneously

Prompt Injection nlp
PDF
benchmark arXiv Oct 8, 2025 · Oct 2025

Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent

Weidi Luo, Qiming Zhang, Tianyu Lu et al. · University of Georgia · University of Wisconsin–Madison +6 more

Benchmarks LLM-powered agents' ability to execute end-to-end enterprise intrusions aligned with MITRE ATT&CK TTPs

Excessive Agency Prompt Injection nlpmultimodal
4 citations PDF Code
defense arXiv Aug 19, 2025 · Aug 2025

Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text

Zixin Rao, Youssef Mohamed, Shang Liu et al. · University of Georgia · Egypt-Japan University of Science and Technology +1 more

Multi-task framework jointly detects LLM-generated text and attributes authorship to specific LLMs across languages

Output Integrity Attack nlp
PDF Code