Latest papers

7 papers
defense arXiv Feb 18, 2026 · Feb 2026

Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming

Philip Sosnin, Jodie Knapp, Fraser Kennedy et al. · Imperial College London · The Alan Turing Institute

First sound-and-complete certification of data-poisoning robustness via a single mixed-integer quadratic program encoding training dynamics

Data Poisoning Attack
PDF
defense Sci. Reports Dec 20, 2025 · Dec 2025

Detection of AI Generated Images Using Combined Uncertainty Measures and Particle Swarm Optimised Rejection Mechanism

Rahul Yumlembam, Biju Issac, Nauman Aslam et al. · Northumbria University · The Alan Turing Institute

Fuses Fisher information, MC dropout entropy, and GP variance via PSO to robustly detect AI-generated images across unseen generators and adversarial attacks

Output Integrity Attack vision generative
1 citation PDF
defense arXiv Nov 12, 2025 · Nov 2025

Abstract Gradient Training: A Unified Certification Framework for Data Poisoning, Unlearning, and Differential Privacy

Philip Sosnin, Matthew Wicker, Josh Collyer et al. · Imperial College London · The Alan Turing Institute

Formal certification framework bounding parameter reachability to certify ML robustness against adversarial data poisoning, unlearning, and DP

Data Poisoning Attack vision
2 citations · 1 influential PDF
defense arXiv Nov 10, 2025 · Nov 2025

Performance Decay in Deepfake Detection: The Limitations of Training on Outdated Data

Jack Richings, Margaux Leblanc, Ian Groves et al. · The Alan Turing Institute

Deepfake detector hits 99.8% AUROC but loses 30%+ recall within six months as generation techniques advance

Output Integrity Attack vision
PDF
attack TrustCom Oct 14, 2025 · Oct 2025

Fairness-Constrained Optimization Attack in Federated Learning

Harsh Kasyap, Minghong Fang, Zhuqing Liu et al. · The Alan Turing Institute · Indian Institute of Technology (BHU) +4 more

Proposes a Byzantine fairness attack in FL that injects up to 90% bias via optimization while evading accuracy-based defenses

Data Poisoning Attack federated-learning tabular
PDF
benchmark arXiv Oct 7, 2025 · Oct 2025

Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling

Mary Llewellyn, Annie Gray, Josh Collyer et al. · The Alan Turing Institute · Loughborough University

Proposes Bayesian hierarchical evaluation framework with embedding clustering to reliably quantify LLM prompt injection vulnerability

Prompt Injection nlp
PDF
attack arXiv Aug 30, 2025 · Aug 2025

When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment

Hanqi Yan, Hainiu Xu, Siya Qi et al. · King’s College London · The Alan Turing Institute +1 more

Reveals how chain-of-thought reasoning patterns mechanistically bypass LLM refusal via attention heads and cause safety forgetting via neuron entanglement during fine-tuning

Transfer Learning Attack Prompt Injection nlp
PDF