Latest papers

10 papers
attack arXiv Feb 22, 2026 · 6w ago

Learning to Detect Language Model Training Data via Active Reconstruction

Junjie Oscar Yin, John X. Morris, Vitaly Shmatikov et al. · University of Washington · Cornell University +2 more

Uses reinforcement learning to fine-tune LLMs and detect training data membership via active reconstruction, outperforming passive MIAs by 10.7%

Membership Inference Attack · Sensitive Information Disclosure · nlp
PDF
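For context on the passive baselines this paper reports outperforming, a minimal loss-threshold membership-inference check can be sketched as follows. This is purely illustrative: the threshold and per-token log-probabilities are made up, and the paper's active-reconstruction method is more involved than any static score.

```python
import math

def sequence_loss(token_log_probs):
    """Average negative log-likelihood of a token sequence under a model."""
    return -sum(token_log_probs) / len(token_log_probs)

def passive_mia(token_log_probs, threshold=2.0):
    """Classic passive loss-threshold attack: a suspiciously low loss
    suggests the sequence was seen during training (membership)."""
    return sequence_loss(token_log_probs) < threshold

# Hypothetical per-token log-probabilities from a target model:
member_like = [math.log(0.9)] * 10      # confidently predicted -> low loss
nonmember_like = [math.log(0.05)] * 10  # poorly predicted -> high loss
```

Active approaches like the one above instead fine-tune an attacker model (here, with RL) to reconstruct candidate training sequences, rather than thresholding a fixed score.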
benchmark arXiv Feb 21, 2026 · 6w ago

Prior Aware Memorization: An Efficient Metric for Distinguishing Memorization from Generalization in Large Language Models

Trishita Tiwari, Ari Trachtenberg, G. Edward Suh · Cornell University · Boston University +1 more

Proposes Prior-Aware Memorization metric showing 55–90% of LLM 'memorized' sequences are actually statistically common, not genuine leakage

Model Inversion Attack · Sensitive Information Disclosure · nlp
PDF
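The core idea, discounting sequences that any model would find likely, can be illustrated with a toy score comparing the target model's log-probability against a generic "prior" model's. The function names, margin, and exact form here are assumptions for illustration, not the paper's metric.

```python
def memorization_score(target_logprob, prior_logprob):
    """How much more likely the target model finds a sequence than a
    generic prior model does. Statistically common text scores low
    even when the target model reproduces it verbatim."""
    return target_logprob - prior_logprob

def is_genuine_leak(target_logprob, prior_logprob, margin=5.0):
    """Flag only sequences whose likelihood the prior cannot explain."""
    return memorization_score(target_logprob, prior_logprob) > margin

# A common idiom: both models rate it likely -> not counted as leakage.
# A unique secret: only the trained-on model rates it likely -> leakage.
```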
attack arXiv Jan 29, 2026 · 9w ago

ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models

Xiaogeng Liu, Xinyan Wang, Yechao Zhang et al. · Johns Hopkins University · NVIDIA +4 more

RL-trained attacker generates short natural prompts that force LRMs into pathologically long reasoning, achieving 286x amplification and >98% detection bypass

Model Denial of Service · nlp · reinforcement-learning
PDF
defense arXiv Jan 24, 2026 · 10w ago

Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes

Gautam Siddharth Kashyap, Harsh Joshi, Niharika Jain et al. · Macquarie University · Bharati Vidyapeeth’s College Of Engineering +4 more

Proposes ConLLM, a contrastive learning + LLM framework for detecting audio, video, and audio-visual deepfakes

Output Integrity Attack · multimodal · audio · vision · nlp
PDF Code
tool arXiv Dec 21, 2025 · Dec 2025

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

Zhang Wei, Peilu Hu, Zhenyuan Wei et al. · Independent Researcher · Ltd. +12 more

Automated red-teaming tool for LLMs using meta-prompt-guided adversarial generation, finding 3.9× more vulnerabilities than manual testing

Prompt Injection · nlp
1 citation · PDF
survey IACR ePrint Dec 1, 2025 · Dec 2025

Systems Security Foundations for Agentic Computing

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda et al. · Google · University of California +5 more

Surveys agentic AI security through a systems-security lens, covering prompt injection, tool-use risks, and 11 real-world attack case studies

Prompt Injection · Insecure Plugin Design · Excessive Agency · nlp
3 citations · PDF
attack arXiv Nov 27, 2025 · Nov 2025

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

Wenkai Huang, Yijia Guo, Gaolei Li et al. · Shanghai Jiao Tong University · Shanghai Key Laboratory of Integrated Administration Technologies for Information Security +4 more

Attacks copyright watermarks on 3D Gaussian Splatting assets by isolating and removing watermark-bearing Gaussian primitives via view-dependent rendering analysis

Output Integrity Attack · vision
1 citation · PDF
defense TIFS Nov 2, 2025 · Nov 2025

Parameter Interpolation Adversarial Training for Robust Image Classification

Xin Liu, Yichen Yang, Kun He et al. · Huazhong University of Science and Technology · Cornell University

Improves adversarial training stability via parameter interpolation between epochs, reducing overfitting and robustness oscillation in CNNs and ViTs

Input Manipulation Attack · vision
9 citations · 1 influential · PDF
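A minimal sketch of the parameter-interpolation idea: smooth the weights between consecutive epochs so the adversarially trained model does not oscillate. The interpolation coefficient and the dict-of-floats parameter representation are illustrative simplifications, not the paper's exact scheme.

```python
def interpolate_params(prev_epoch, curr_epoch, alpha=0.5):
    """Elementwise convex combination of last epoch's parameters and
    the current ones; alpha=1.0 recovers plain adversarial training,
    smaller alpha damps epoch-to-epoch robustness oscillation."""
    return {name: (1 - alpha) * prev_epoch[name] + alpha * curr_epoch[name]
            for name in curr_epoch}

prev = {"w": 1.0, "b": 0.0}  # weights after epoch t-1
curr = {"w": 3.0, "b": 2.0}  # weights after epoch t's adversarial updates
smoothed = interpolate_params(prev, curr)  # {"w": 2.0, "b": 1.0}
```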
defense arXiv Oct 20, 2025 · Oct 2025

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Rishi Jha, Harold Triedman, Justin Wagle et al. · Cornell University · Microsoft

Breaks alignment-based defenses for LLM multi-agent control-flow hijacking and proposes ControlValve using control-flow graphs and least privilege

Prompt Injection · Excessive Agency · nlp
3 citations · PDF
benchmark arXiv Sep 6, 2025 · Sep 2025

Benchmarking Robust Aggregation in Decentralized Gradient Marketplaces

Zeyu Song, Sainyam Galhotra, Shagufta Mehnaz · Pennsylvania State University · Cornell University

Benchmarks robust aggregation defenses against Byzantine and Sybil attacks in decentralized federated gradient marketplaces with new economic metrics

Data Poisoning Attack · federated-learning
PDF
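One standard robust-aggregation rule such a benchmark would exercise is the coordinate-wise median, which tolerates a minority of Byzantine submissions. This is a generic textbook rule sketched for illustration, not necessarily one of the paper's evaluated baselines.

```python
import statistics

def coordinate_median(gradients):
    """Aggregate client gradients by taking the median in each
    coordinate, so a minority of arbitrarily corrupted submissions
    cannot drag the result far, unlike a plain average."""
    dim = len(gradients[0])
    return [statistics.median(g[i] for g in gradients) for i in range(dim)]

honest = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
byzantine = [[100.0, -100.0]]                # one poisoned submission
agg = coordinate_median(honest + byzantine)  # stays near [1.0, 2.0]
```

A plain mean over the same four submissions would land at [25.75, -23.5], i.e. fully hijacked by the single attacker.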