Latest papers

4 papers
tool · arXiv · Jan 23, 2026

I Guess That's Why They Call it the Blues: Causal Analysis for Audio Classifiers

David A. Kelly, Hana Chockler · King’s College London

Causal analysis tool finds minimal frequency subsets that manipulate audio classifiers with inaudible, single-frequency perturbations

Input Manipulation Attack · audio
attack · arXiv · Dec 3, 2025

Out-of-the-box: Black-box Causal Attacks on Object Detectors

Melane Navaratnarajah, David A. Kelly, Hana Chockler · King’s College London

Black-box adversarial attack on object detectors using causal pixels to remove, modify, or inject spurious detections

Input Manipulation Attack · vision
1 citation
attack · arXiv · Aug 30, 2025

When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment

Hanqi Yan, Hainiu Xu, Siya Qi et al. · King’s College London · The Alan Turing Institute +1 more

Reveals how chain-of-thought reasoning patterns mechanistically bypass LLM refusal via attention heads and cause safety forgetting via neuron entanglement during fine-tuning

Transfer Learning Attack · Prompt Injection · nlp
defense · arXiv · Aug 13, 2025

Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation

Ziyang Ma, Qingyue Yuan, Linhai Zhang et al. · Southeast University · Nanjing Medical University +1 more

Defends SLM safety alignment during CoT distillation via weight-change dampening and low-entropy token masking

Prompt Injection · nlp