ML Security Papers

Latest papers

4 papers

tool arXiv Jan 23, 2026

I Guess That's Why They Call it the Blues: Causal Analysis for Audio Classifiers

David A. Kelly, Hana Chockler · King’s College London

Causal analysis tool finds minimal frequency subsets that manipulate audio classifiers with inaudible, single-frequency perturbations

Input Manipulation Attack audio


attack arXiv Dec 3, 2025

Out-of-the-box: Black-box Causal Attacks on Object Detectors

Melane Navaratnarajah, David A. Kelly, Hana Chockler · King’s College London

Black-box adversarial attack on object detectors using causal pixels to remove, modify, or inject spurious detections

Input Manipulation Attack vision

1 citation

attack arXiv Aug 30, 2025

When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment

Hanqi Yan, Hainiu Xu, Siya Qi et al. · King’s College London · The Alan Turing Institute +1 more

Reveals how chain-of-thought reasoning patterns mechanistically bypass LLM refusal via attention heads and cause safety forgetting via neuron entanglement during fine-tuning

Transfer Learning Attack Prompt Injection nlp


defense arXiv Aug 13, 2025

Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation

Ziyang Ma, Qingyue Yuan, Linhai Zhang et al. · Southeast University · Nanjing Medical University +1 more

Defends SLM safety alignment during CoT distillation via weight-change dampening and low-entropy token masking

Prompt Injection nlp


