Latest papers

12 papers
attack arXiv Mar 23, 2026

Adversarial Camouflage

Paweł Borsukiewicz, Daniele Lunghi, Melissa Tessa et al. · University of Luxembourg

Physical adversarial camouflage patterns painted on faces to evade facial recognition systems in real-world surveillance scenarios

Input Manipulation Attack vision
PDF
tool arXiv Mar 5, 2026

NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution

Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh et al. · University of Luxembourg

Deploys explainable AI-text detector combining curvature features, XGBoost, and SHAP attributions in a real-time web app

Output Integrity Attack nlp
PDF
attack arXiv Jan 26, 2026

Malicious Repurposing of Open Science Artefacts by Using Large Language Models

Zahra Hashemi, Zhiqiang Zhong, Jun Pang et al. · University of Luxembourg · University of Aberdeen

Persuasion-based jailbreak pipeline exploits LLMs to repurpose open NLP artefacts into harmful research proposals

Prompt Injection nlp
PDF
survey arXiv Jan 22, 2026

SoK: Challenges in Tabular Membership Inference Attacks

Cristina Pêra, Tânia Carvalho, Maxime Cordy et al. · University of Porto · TekPrivacy +1 more

Surveys and empirically benchmarks membership inference attacks on tabular data across centralized and federated learning, revealing poor general attack performance but high single-out exposure

Membership Inference Attack tabular federated-learning
PDF
benchmark arXiv Jan 11, 2026

How Secure is Secure Code Generation? Adversarial Prompts Put LLM Defenses to the Test

Melissa Tessa, Iyiola E. Olatunji, Aicha War et al. · University of Luxembourg

Adversarial audit exposes that LLM secure code generation defenses collapse to 3–17% true secure-functional rates under realistic prompt perturbations

Prompt Injection nlp generative
PDF
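The audit above hinges on subjecting defenses to realistic prompt perturbations. As a minimal illustration of that idea (not the paper's actual perturbation suite), the sketch below generates noisy variants of a code-generation prompt via random adjacent-character swaps; the prompt text and the `rate` parameter are invented for the example:

```python
import random

random.seed(42)

def perturb_prompt(prompt: str, rate: float = 0.05) -> str:
    # Character-level noise: randomly swap adjacent letters, a crude
    # stand-in for the realistic rephrasings an adversarial audit applies.
    chars = list(prompt)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and random.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

base = "Write a C function that copies user input into a fixed-size buffer safely."
variants = [perturb_prompt(base) for _ in range(5)]
for v in variants:
    print(v)
```

Each variant would then be sent to the guarded model to test whether its secure-generation defense still triggers, which is the kind of stress test under which the paper reports defenses collapsing.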
attack arXiv Dec 28, 2025

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

Moustapha Awwalou Diouf, Maimouna Tamah Diao, Iyiola Emmanuel Olatunji et al. · University of Luxembourg · University Cheikh Anta Diop +1 more

RSA pretexting strategy jailbreaks five major LLMs to generate working CVE exploits for ERP software with 100% success rate

Prompt Injection nlp
PDF Code
benchmark TrustCom Dec 17, 2025

Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference

Chenxiang Zhang, Tongxi Qu, Zhong Li et al. · University of Luxembourg · Nanjing University

Evaluates how post-training quantization affects membership inference vulnerability, finding 1.58-bit models leak an order of magnitude less than full-precision

Membership Inference Attack vision
PDF
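The interplay the paper measures can be illustrated with a toy loss-threshold membership inference attack (a standard MIA baseline, not the paper's protocol). Everything below is invented for the sketch: the synthetic data, the 1-D logistic model, and the ternary rounding standing in for 1.58-bit post-training quantization:

```python
import math
import random
import statistics

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data: "members" were used to fit the model, "non-members" were not.
members = [(random.gauss(1.5, 0.5), 1) for _ in range(100)] + \
          [(random.gauss(-1.5, 0.5), 0) for _ in range(100)]
nonmembers = [(random.gauss(1.0, 1.0), 1) for _ in range(100)] + \
             [(random.gauss(-1.0, 1.0), 0) for _ in range(100)]

# Fit a 1-D logistic model on the members with plain gradient descent.
w, b = 0.0, 0.0
for _ in range(300):
    gw = gb = 0.0
    for x, y in members:
        p = sigmoid(w * x + b)
        gw += (p - y) * x
        gb += (p - y)
    w -= 0.5 * gw / len(members)
    b -= 0.5 * gb / len(members)

def loss(wt, bt, x, y):
    p = min(max(sigmoid(wt * x + bt), 1e-9), 1 - 1e-9)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mia_accuracy(wt, bt):
    # Loss-threshold attack: examples with below-median loss are guessed "member".
    losses = [loss(wt, bt, x, y) for x, y in members + nonmembers]
    thresh = statistics.median(losses)
    hits = sum(l < thresh for l in losses[:len(members)])    # members: low loss
    hits += sum(l >= thresh for l in losses[len(members):])  # non-members: high loss
    return hits / len(losses)

def ternary(v):
    # Crude stand-in for 1.58-bit (ternary) quantization: round into {-1, 0, +1}.
    return max(-1.0, min(1.0, round(v)))

print(f"MIA accuracy, full precision: {mia_accuracy(w, b):.2f}")
print(f"MIA accuracy, ternary model:  {mia_accuracy(ternary(w), ternary(b)):.2f}")
```

Comparing the two accuracies for a real model and attack is the kind of measurement the paper scales up across quantization schemes.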
benchmark arXiv Nov 14, 2025

M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text

Salima Lamsiyah, Saad Ezzini, Abdelkader El Mahdaouy et al. · University of Luxembourg · King Fahd University of Petroleum and Minerals +2 more

Introduces a 30K-sample shared-task benchmark for detecting LLM-generated text across news and academic domains

Output Integrity Attack nlp
1 citation PDF
defense Consumer Communications and Ne... Nov 14, 2025

NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks

Lama Sleem, Jerome Francois, Lujun Li et al. · University of Luxembourg · Institut National Polytechnique de Toulouse +1 more

Detects LLM jailbreaks via negation-aware BLEURT scoring and Isolation Forest anomaly detection without threshold tuning

Prompt Injection nlp
PDF Code
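The recipe above (inconsistency scores fed to an Isolation Forest, no threshold tuning) can be sketched in miniature. Everything below is a stand-in: synthetic scores replace NegBLEURT outputs, and the tiny 1-D isolation forest is a toy re-implementation of the idea, not the paper's code:

```python
import math
import random

random.seed(1)

def isolation_depth(x, sample, depth=0, limit=8):
    # Depth at which x is isolated by uniform random splits (one isolation tree).
    if depth >= limit or len(sample) <= 1:
        return depth
    lo, hi = min(sample), max(sample)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    side = [v for v in sample if (v < split) == (x < split)]
    return isolation_depth(x, side, depth + 1, limit)

def anomaly_score(x, data, trees=100):
    # Isolation Forest intuition: anomalies are isolated in few splits,
    # so a short average depth yields a score closer to 1.
    avg = sum(isolation_depth(x, random.sample(data, min(64, len(data))))
              for _ in range(trees)) / trees
    return 2.0 ** (-avg / math.log2(len(data)))  # rough normalisation

# Stand-in for negation-aware inconsistency scores: benign responses are
# self-consistent (low scores); a jailbroken response is an outlier.
benign = [random.gauss(0.1, 0.05) for _ in range(200)]
jailbroken = 0.9
scores = benign + [jailbroken]

print(f"benign score:     {anomaly_score(benign[0], scores):.2f}")
print(f"jailbroken score: {anomaly_score(jailbroken, scores):.2f}")
```

Because the forest ranks points by how easily they are isolated, no decision threshold has to be tuned per deployment, which is the property the paper highlights.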
defense arXiv Sep 15, 2025

Poison to Detect: Detection of Targeted Overfitting in Federated Learning

Soumia Zohra El Mestari, Maciej Krzysztof Zuziak, Gabriele Lenzini · University of Luxembourg · National Research Council of Italy

Detects FL orchestrator manipulation causing targeted overfitting that enables membership inference and data reconstruction attacks

Model Inversion Attack Membership Inference Attack federated-learning
PDF
defense arXiv Aug 6, 2025

Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)

Iyiola E. Olatunji, Franziska Boenisch, Jing Xu et al. · University of Luxembourg · CISPA Helmholtz Center for Information Security

Attacks graph-aware LLMs via poisoning, evasion, and template injection; proposes GALGUARD combining feature correction and GNN defenses

Input Manipulation Attack Data Poisoning Attack Prompt Injection graph nlp
PDF
defense arXiv Jan 2, 2025

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection

Dat Nguyen, Marcella Astrid, Anis Kacem et al. · University of Luxembourg · University of Manouba

Detects deepfake videos via spatio-temporal artifact modeling with multi-task learning and pseudo-fake video synthesis

Output Integrity Attack vision
2 citations PDF Code