Latest papers

5 papers
defense arXiv Feb 26, 2026

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Tian Zhang, Yiwei Xu, Juan Wang et al. · Wuhan University · University at Buffalo +1 more

Defends LLM agents against indirect prompt injection via causal takeover detection and context purification at tool-return boundaries

Prompt Injection · Insecure Plugin Design · nlp
attack arXiv Oct 19, 2025

Black-box Optimization of LLM Outputs by Asking for Directions

Jie Zhang, Meng Ding, Yang Liu et al. · ETH Zürich · University at Buffalo +1 more

Exploits LLMs' comparative confidence expressions as a black-box optimization signal for adversarial image attacks, jailbreaks, and prompt injections

Input Manipulation Attack · Prompt Injection · vision · nlp · multimodal
2 citations
defense ICCVW Oct 16, 2025

PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis

Soumyya Kanti Datta, Tanvi Ranga, Chengzhe Sun et al. · University at Buffalo

Multimodal deepfake detector fusing phoneme sequences, lip geometry, and facial identity embeddings to catch subtle audio-visual inconsistencies

Output Integrity Attack · vision · audio · multimodal · nlp
2 citations
attack EMNLP Sep 25, 2025

Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Wenkai Guo, Xuefeng Liu, Haolin Wang et al. · Beihang University · Zhongguancun Laboratory +3 more

Demonstrates training data extraction from federated LLM global models and proposes an FL-specific attack that tracks parameter updates across rounds

Model Inversion Attack · Sensitive Information Disclosure · nlp · federated-learning
attack arXiv Aug 20, 2025

TAIGen: Training-Free Adversarial Image Generation via Diffusion Models

Susim Roy, Anubhooti Jain, Mayank Vatsa et al. · University at Buffalo · IIT Jodhpur

Training-free diffusion-based black-box adversarial attack on image classifiers with 10x speedup via selective RGB channel perturbation

Input Manipulation Attack · vision · generative