Latest papers

9 papers
benchmark · arXiv · Mar 12, 2026

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

Junjie Chu, Yiting Qu, Ye Leng et al. · CISPA Helmholtz Center for Information Security · Delft University of Technology

Benchmarks LLM safety alignment failures when harmful content is embedded in benign tasks like translation, revealing a content-level ethical blind spot

Prompt Injection · nlp
PDF
attack · arXiv · Mar 10, 2026

Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

Gorka Abad, Ermes Franch, Stefanos Koffas et al. · University of Bergen · Delft University of Technology +2 more

Proves backdoor-trained models stay exploitable via alternative triggers even after defenses neutralize the original training trigger

Model Poisoning · vision
PDF
attack · arXiv · Jan 30, 2026

AST-PAC: AST-guided Membership Inference for Code

Roham Koohestani, Ali Al-Kaswan, Jonathan Katzy et al. · Delft University of Technology

AST-guided membership inference attack for code LLMs using syntax-aware perturbations to audit training data provenance

Membership Inference Attack · nlp
PDF
defense · arXiv · Jan 30, 2026

Protecting Private Code in IDE Autocomplete using Differential Privacy

Evgeny Grigorenko, David Stanojević, David Ilić et al. · JetBrains Research · Delft University of Technology

Defends LLM code completion against membership inference and memorization using DP fine-tuning, cutting MIA AUC from 0.901 to 0.606

Membership Inference Attack · Sensitive Information Disclosure · nlp
PDF
benchmark · arXiv · Jan 23, 2026

How does Graph Structure Modulate Membership-Inference Risk for Graph Neural Networks?

Megha Khosla · Delft University of Technology

Analyzes how graph structure and edge access at inference time modulate membership inference risk in GNNs, beyond generalization gap

Membership Inference Attack · graph
PDF
attack · arXiv · Jan 6, 2026

Quality Degradation Attack in Synthetic Data

Qinyi Liu, Dong Liu, Farhad Vadiee et al. · University of Bergen · Delft University of Technology

Attacks synthetic data generators via label flipping and feature interventions, substantially degrading downstream predictive quality

Data Poisoning Attack · tabular · generative
PDF
survey · arXiv · Nov 17, 2025

SoK: The Last Line of Defense: On Backdoor Defense Evaluation

Gorka Abad, Marina Krček, Stefanos Koffas et al. · University of Bergen · Radboud University +3 more

Surveys 183 backdoor defense papers, revealing critical evaluation inconsistencies and proposing standardized assessment recommendations

Model Poisoning · vision
1 citation · PDF
attack · arXiv · Nov 8, 2025

CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding

Behrad Tajalli, Stefanos Koffas, Stjepan Picek · Radboud University · Delft University of Technology +1 more

Backdoor attack on tabular ML models via categorical-to-float encoding, enabling gradient-based universal triggers with 100% attack success rate

Model Poisoning · tabular
PDF
attack · arXiv · Jan 10, 2025

Towards Backdoor Stealthiness in Model Parameter Space

Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas et al. · Radboud University Nijmegen · Delft University of Technology +1 more

Proposes Grond, a backdoor attack stealthy in parameter space that evades 17 diverse defenses via adaptive neuron-level injection

Model Poisoning · vision
PDF · Code