Latest papers

8 papers
defense arXiv Feb 11, 2026

Optimizing Agent Planning for Security and Autonomy

Aashish Kolluri, Rishi Sharma, Manuel Costa et al. · Microsoft · EPFL +1 more

Defends AI agents against indirect prompt injection via security-aware planning that preserves autonomous operation without falling back on human oversight

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Feb 10, 2026

Tracking Finite-Time Lyapunov Exponents to Robustify Neural ODEs

Christian Kuehn, Tobias Wöhrer · TU Munich · TU Wien

Defends neural ODEs against adversarial inputs by suppressing Finite-Time Lyapunov Exponents near decision boundaries during training

Input Manipulation Attack
PDF
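The Finite-Time Lyapunov Exponent the entry above refers to is a standard quantity: the exponential growth rate of perturbations under the flow map over a finite horizon. As a generic illustration (not the paper's training method), a minimal sketch for a toy linear ODE, using a finite-difference Jacobian of the flow:

```python
import numpy as np

def flow(x0, A, T, steps=1000):
    # Integrate dx/dt = A @ x with forward Euler from x0 over time T.
    x = np.array(x0, dtype=float)
    dt = T / steps
    for _ in range(steps):
        x = x + dt * (A @ x)
    return x

def ftle(x0, A, T, eps=1e-6):
    # FTLE = (1/T) * log(sigma_max(J)), where J is the Jacobian of the
    # flow map at x0, here estimated by finite differences.
    x0 = np.array(x0, dtype=float)
    n = len(x0)
    base = flow(x0, A, T)
    J = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        J[:, i] = (flow(x0 + d, A, T) - base) / eps
    return np.log(np.linalg.svd(J, compute_uv=False)[0]) / T

# Toy system: expansion at rate 0.5 along x, contraction along y,
# so the FTLE should come out near 0.5.
A = np.array([[0.5, 0.0], [0.0, -1.0]])
lam = ftle([1.0, 1.0], A, T=2.0)
```

A large FTLE near a decision boundary means nearby inputs diverge quickly, which is exactly the sensitivity an adversary exploits; the paper's defense suppresses this quantity during training.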
benchmark arXiv Jan 28, 2026

GNN Explanations that do not Explain and How to find Them

Steve Azzolin, Stefano Teso, Bruno Lepri et al. · University of Trento · Fondazione Bruno Kessler +1 more

Reveals that deceptive GNN explanations can be maliciously planted to hide the use of sensitive attributes, and proposes a faithfulness metric to detect them

Output Integrity Attack graph
PDF
survey arXiv Dec 10, 2025

Chasing Shadows: Pitfalls in LLM Security Research

Jonathan Evertz, Niklas Risse, Nicolai Neuer et al. · CISPA Helmholtz Center for Information Security · Max Planck Institute for Security and Privacy +4 more

Surveys nine methodological pitfalls in LLM security research, each present in the 72 surveyed papers, with case studies showing how every pitfall misleads results

Data Poisoning Attack Prompt Injection nlp
2 citations PDF
attack arXiv Nov 25, 2025

Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries

Alexander Beiser, Flavio Martinelli, Wulfram Gerstner et al. · TU Wien · EPFL

Proposes specialized data augmentation strategies that enable black-box extraction of neural network weights at 100× parameter-to-data scale

Model Theft vision
PDF Code
defense ICML Nov 9, 2025

Probably Approximately Global Robustness Certification

Peter Blohm, Patrick Indri, Thomas Gärtner et al. · TU Wien

Certifies probabilistic global adversarial robustness of neural networks via ε-net sampling with dimension-independent sample size bounds

Input Manipulation Attack vision
PDF
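The "probably approximately" certification above rests on a standard statistical idea: sample inputs, check local robustness on each, and bound the global robust fraction with a concentration inequality whose sample size does not depend on input dimension. A minimal sketch of that idea using a Hoeffding bound (the robustness check and sampler here are placeholder assumptions, not the paper's construction):

```python
import math
import random

def pac_sample_size(eps, delta):
    # Hoeffding bound: this many i.i.d. samples estimate the robust
    # fraction to within eps with probability >= 1 - delta,
    # independent of the input dimension.
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def certify(is_locally_robust, sampler, eps=0.05, delta=0.01):
    # Estimate the fraction of sampled inputs on which the model
    # is locally robust; returns (estimate, sample count).
    n = pac_sample_size(eps, delta)
    robust = sum(is_locally_robust(sampler()) for _ in range(n))
    return robust / n, n

# Toy example: a hypothetical "model" that is locally robust
# everywhere except on |x| < 0.1, i.e. robust on ~90% of inputs.
random.seed(0)
frac, n = certify(lambda x: abs(x) >= 0.1,
                  lambda: random.uniform(-1.0, 1.0))
```

The estimate `frac` lands near 0.9 for this toy model, and `n` stays the same whether the input lives in 1 or 1,000,000 dimensions; only the per-sample robustness check gets harder.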
attack arXiv Sep 4, 2025

Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair

Piotr Przymus, Andreas Happe, Jürgen Cito · Nicolaus Copernicus University · TU Wien

Shows that adversarial bug reports can manipulate LLM-based program repair systems into generating insecure code, bypassing 90% of existing defenses

Prompt Injection nlp
PDF
benchmark arXiv Aug 29, 2025

I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks

Daryna Oliynyk, Rudolf Mayer, Kathrin Grosse et al. · University of Vienna · SBA Research +2 more

Proposes the first comprehensive threat model and evaluation framework for comparing model stealing attacks on image classifiers

Model Theft vision
PDF