ML Security Papers

Latest papers

3 papers

defense arXiv Apr 30, 2026

Jona te Lintelo, Lichao Wu, Marina Krček et al. · Radboud University · University of Bristol +2 more

Reconfigures MoE LLM safety behavior by steering expert routing at inference time without retraining, defending against jailbreaks

Prompt Injection nlp

defense arXiv Mar 11, 2026

Sengim Karayalcin, Marina Krcek, Pin-Yu Chen et al. · Leiden University · Radboud University +2 more

Identifies causal 'trigger directions' in ViT activations to analyze, remove, and detect backdoors via weight-space interventions

Model Poisoning vision

benchmark arXiv Oct 31, 2025

Ali Satvaty, Suzan Verberne, Fatih Turkmen · University of Groningen · Leiden University

Benchmarks entity-level membership inference of PII and sensitive data in LLMs, revealing limits of existing MIA methods

Membership Inference Attack nlp

1 citation