Latest papers

4 papers
attack arXiv Mar 14, 2026 · Mar 2026

Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications

Philip Bechtle, Lucie Flek, Philipp Alexander Jung et al. · University of Bonn · RWTH Aachen University

Adversarial attack crafting physically plausible perturbations that evade standard validation checks in particle physics ML models

Input Manipulation Attack · tabular
PDF
attack arXiv Jan 6, 2026 · Jan 2026

Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks

Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard · University of Bonn · Bosch Center for Artificial Intelligence

Gradient-free token-level adversarial suffix attack achieving near-100% jailbreak rate and strong transferability to GPT and Gemini

Input Manipulation Attack · Prompt Injection · nlp
PDF
benchmark arXiv Nov 10, 2025 · Nov 2025

More Agents Helps but Adversarial Robustness Gap Persists

Khashayar Alavi, Zhastay Yeltay, Lucie Flek et al. · University of Bonn · Lamarr Institute for Machine Learning and Artificial Intelligence

Evaluates the robustness of multi-agent LLM systems against adversarial text noise, finding that collaboration improves accuracy but fails to close the robustness gap

Prompt Injection · nlp
PDF
defense arXiv Aug 8, 2025 · Aug 2025

In-Training Defenses against Emergent Misalignment in Language Models

David Kaczér, Magnus Jørgenvåg, Clemens Vetter et al. · University of Bonn · Lamarr Institute for Machine Learning and Artificial Intelligence +1 more

Evaluates four in-training regularization defenses that prevent emergent misalignment when fine-tuning LLMs with malicious data via APIs

Transfer Learning Attack · Prompt Injection · nlp
PDF