Latest papers

42 papers
attack arXiv Mar 23, 2026 · 14d ago

Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates

Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty et al. · University of Illinois Urbana-Champaign · Indian Institute of Technology Delhi +1 more

Gradient-free adversarial attacks on neural operator digital twins causing catastrophic field-prediction failures through sparse, physically plausible perturbations

Input Manipulation Attack vision
PDF
survey arXiv Mar 11, 2026 · 26d ago

The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

Juhee Kim, Xiaoyuan Liu, Zhun Wang et al. · University of California · Seoul National University +1 more

Surveys attacks and defenses across agentic LLM systems, covering prompt injection, insecure tool use, and excessive agency risks

Prompt Injection Insecure Plugin Design Excessive Agency nlp multimodal
PDF
benchmark arXiv Mar 11, 2026 · 26d ago

Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models

Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran · Google DeepMind · University of Illinois Urbana-Champaign

Scaling-law framework comparing four LLM jailbreak paradigms by FLOPs budget, finding prompt-based attacks dominate compute efficiency

Input Manipulation Attack Prompt Injection nlp
PDF
benchmark arXiv Feb 13, 2026 · 7w ago

Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents

Xu Li, Simon Yu, Minzhou Pan et al. · Northeastern University · Virtue AI +2 more

Benchmarks multi-turn jailbreaks in tool-using LLM agents and proposes ToolShield, a self-exploration defense reducing ASR by 30%

Prompt Injection Insecure Plugin Design nlp
PDF Code
benchmark arXiv Feb 7, 2026 · 8w ago

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Sai Puppala, Ismail Hossain, Md Jahangir Alam et al. · Southern Illinois University · University of Texas +2 more

Benchmarks LLM agent architectures across 14 attack classes, exposing authorization confusion and tool hijacking as dominant structural risks

Excessive Agency Insecure Plugin Design Prompt Injection nlp
PDF
defense arXiv Feb 4, 2026 · 8w ago

E-Globe: Scalable ε-Global Verification of Neural Networks via Tight Upper Bounds and Pattern-Aware Branching

Wenting Li, Saif R. Kazi, Russell Bent et al. · University of Texas at Austin · Los Alamos National Laboratory +1 more

Branch-and-bound neural network verifier using NLP-CC upper bounds to certify or disprove adversarial robustness more efficiently than MIP methods

Input Manipulation Attack vision
PDF
attack arXiv Jan 30, 2026 · 9w ago

Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models

Ye Yu, Haibo Jin, Yaoning Yu et al. · University of Illinois Urbana-Champaign · Boise State University

Audio narrative jailbreak using TTS achieves 98.26% success rate against safety-aligned audio-language models like Gemini 2.0 Flash

Prompt Injection audio multimodal nlp
1 citation PDF
attack arXiv Jan 22, 2026 · 10w ago

Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems

Mengyu Yao, Ziqi Zhang, Ning Luo et al. · Peking University · University of Illinois Urbana-Champaign

Attacks RAG systems to steal private knowledge bases via knowledge-graph-guided adaptive queries, achieving 84.4% corpus coverage in 1,000 queries

Sensitive Information Disclosure nlp
PDF
attack arXiv Jan 21, 2026 · 10w ago

Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation

Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim et al. · University of Michigan · LG AI Research +1 more

Crafted agent chain-of-thought reasoning inflates LLM/VLM judge false positives by up to 90% across 800 web-task trajectories

Prompt Injection nlp multimodal
1 citation PDF
benchmark arXiv Jan 19, 2026 · 11w ago

Verifying Local Robustness of Pruned Safety-Critical Networks

Minh Le, Phuong Cao · Georgia Institute of Technology · University of Illinois Urbana-Champaign

Empirically shows pruning ratio non-linearly affects formal L∞ adversarial robustness certificates in safety-critical vision models

Input Manipulation Attack vision
PDF
attack arXiv Jan 16, 2026 · 11w ago

Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Kaiyu Zhou, Yongsen Zheng, Yicheng He et al. · Nanyang Technological University · University of Illinois Urbana-Champaign +2 more

Stealthy multi-turn economic DoS attack manipulates MCP tool servers to inflate LLM agent costs 658x while keeping task outputs correct

Model Denial of Service Insecure Plugin Design nlp
2 citations 1 influential PDF
defense arXiv Jan 7, 2026 · 12w ago

ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification

Xiao Lin, Philip Li, Zhichen Zeng et al. · University of Illinois Urbana-Champaign · Visa

Defends LLMs against jailbreaks by amplifying internal layer/module/token feature discrepancies to detect attacks without training examples

Prompt Injection nlp
2 citations PDF
attack arXiv Jan 6, 2026 · Jan 2026

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Devang Kulshreshtha, Hang Su, Chinmay Hegde et al. · Amazon · New York University +1 more

Attacker-LLM-free multi-turn jailbreak via lexical anchor injection achieves 97-100% ASR on GPT/Claude/Llama in ~6.4 queries

Prompt Injection nlp
PDF
attack arXiv Jan 5, 2026 · Jan 2026

Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization

Jiwei Guan, Haibo Jin, Haohan Wang · Macquarie University · University of Illinois Urbana-Champaign

Black-box gradient-free attack crafts adversarial images to jailbreak vision-language models with 83% ASR

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
survey arXiv Dec 20, 2025 · Dec 2025

SoK: Understanding (New) Security Issues Across AI4Code Use Cases

Qilong Wu, Taoran Li, Tianyang Zhou et al. · University of Illinois Urbana-Champaign

SoK survey spanning adversarial robustness of vulnerability detectors, insecure LLM code generation, and security gaps in AI4Code benchmarks

Input Manipulation Attack Prompt Injection nlp
1 citation PDF
defense arXiv Dec 11, 2025 · Dec 2025

Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification

Duo Zhou, Jorge Chavez, Hesun Chen et al. · University of Illinois Urbana-Champaign

Accelerates certified adversarial robustness verification via GPU-based domain clipping, reducing BaB subproblems by up to 96%

Input Manipulation Attack vision
2 citations PDF Code
defense arXiv Dec 7, 2025 · Dec 2025

GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering

Jehyeok Yeon, Federico Cinus, Yifan Wu et al. · University of Illinois Urbana-Champaign · University of Southern California +1 more

Proposes graph-regularized sparse autoencoders to capture distributed LLM safety representations for adaptive jailbreak defense with 82% refusal rate

Prompt Injection nlp
1 citation PDF
defense arXiv Dec 7, 2025 · Dec 2025

Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation

Ali Ebrahimpour-Boroojeny · University of Illinois Urbana-Champaign

Proposes adversarial-example-based unlearning (AMUN) and a novel MIA to expose class-unlearning vulnerabilities, with TRW as a targeted defense

Membership Inference Attack vision
PDF
defense arXiv Dec 5, 2025 · Dec 2025

LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction

Marius F.R. Juston, Ramavarapu S. Sreenivas, Dustin Nottage et al. · University of Illinois Urbana-Champaign · U.S. Army Corps of Engineers

Constructs certifiably robust deep residual networks via LDL^T decomposition of LMI constraints, guaranteeing Lipschitz bounds against adversarial perturbations

Input Manipulation Attack vision tabular
PDF
defense arXiv Dec 5, 2025 · Dec 2025

Matching Ranks Over Probability Yields Truly Deep Safety Alignment

Jason Vega, Gagandeep Singh · University of Illinois Urbana-Champaign

Proposes RAP attack bypassing LLM deep-safety-alignment defenses via rank-guided token selection, then fixes it with attention-regularization defense PRESTO

Prompt Injection nlp
PDF Code