Latest papers

3 papers
attack arXiv Jan 29, 2026

Hardware-Triggered Backdoors

Jonas Möller, Erik Imgrund, Thorsten Eisenhofer et al. · Berlin Institute for the Foundations of Learning and Data · TU Berlin +1 more

Exploits GPU floating-point numerical variations to inject hardware-specific backdoors that flip model predictions only on targeted accelerators

Model Poisoning vision
PDF
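The exploited principle can be illustrated with a minimal sketch (not the paper's method): floating-point addition is not associative, so the same reduction computed in a different order, as different accelerators do, can round differently and flip a prediction that sits near a decision boundary. All values here are hypothetical.

```python
# Floating-point addition is order-dependent: two accumulation orders,
# as two different accelerators might use, give different results.
a = (0.1 + 0.2) + 0.3   # one accumulation order ("device A")
b = 0.1 + (0.2 + 0.3)   # another order ("device B")

# The rounding gap is tiny but real, and a backdoor can key on it:
# a logit engineered to land exactly at a threshold flips only on
# the hardware whose rounding pushes it over.
threshold = 0.6
pred_a = a > threshold   # True:  a == 0.6000000000000001
pred_b = b > threshold   # False: b == 0.6

print(pred_a, pred_b)    # same inputs, different prediction per "device"
```

The gap between `a` and `b` is one unit in the last place, which is exactly the kind of hardware-specific numerical slack the paper's backdoors target.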
defense arXiv Dec 6, 2025

Formalisation of Security for Federated Learning with DP and Attacker Advantage in IIIf for Satellite Swarms -- Extended Version

Florian Kammüller · Middlesex University London · TU Berlin

Formally verifies differential privacy defenses against gradient leakage attacks in federated learning using the IIIf (Isabelle Infrastructure) framework in the Isabelle theorem prover

Model Inversion Attack federated-learning
PDF
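The defense being verified is the standard differential-privacy mechanism for federated updates: clip each client gradient to a norm bound, then add calibrated Gaussian noise before sharing, so a leaked update reveals little about any individual example. A minimal sketch, with illustrative parameter values, not the paper's formalisation:

```python
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip a client gradient to clip_norm, then add Gaussian noise
    scaled by noise_mult * clip_norm (Gaussian-mechanism style)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=grad.shape)
    return clipped + noise

g = np.array([3.0, 4.0])      # raw client gradient, norm 5
print(dp_sanitize(g))          # clipped to norm 1, then noised
```

Bounding the per-client norm caps each client's influence on the aggregate; the noise scale relative to that bound is what the (epsilon, delta) privacy accounting in such proofs is derived from.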
benchmark arXiv Sep 22, 2025

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić et al. · ELLIS Institute Tübingen · Tübingen AI Center +5 more

Frontier LLMs spontaneously produce responses that look harmful but are actually harmless, fooling all tested jailbreak monitors; the deception is detectable only via activation probes

Prompt Injection nlp
1 citation PDF
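An activation probe of the kind the paper reports is typically a linear classifier trained on a model's hidden states. A minimal sketch on synthetic data, where a real probe would read a frozen LLM's residual-stream activations and the "dishonesty direction" here is an invented stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
direction = rng.standard_normal(d)              # hypothetical "dishonesty" direction
honest = rng.standard_normal((200, d))           # stand-in activations, honest outputs
deceptive = rng.standard_normal((200, d)) + 2.0 * direction  # shifted along it

X = np.vstack([honest, deceptive])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Plain logistic-regression probe trained by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(f"probe accuracy: {acc:.2f}")
```

The point of the linear-probe design is that deception the output-level monitors miss can still be linearly separable in the model's internal representations.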