Latest papers

4 papers
attack · arXiv · Mar 4, 2026

Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models

Ziyuan Chen, Yujin Jeong, Tobias Braun et al. · TU Darmstadt · Hessian Center for Artificial Intelligence

Proposes MELT, a LoRA-based backdoor attack on Stable Diffusion 3 that requires tuning fewer than 0.2% of encoder parameters

Model Poisoning · Transfer Learning Attack · vision · generative · multimodal
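The "fewer than 0.2% of encoder parameters" figure is the kind of ratio LoRA makes easy to reach, since a rank-r adapter trains r·(d_in + d_out) parameters in place of a full d_in·d_out weight update. A minimal sketch of that arithmetic, using hypothetical layer dimensions rather than the paper's actual configuration:

```python
# Back-of-the-envelope check of a "<0.2% of parameters" LoRA claim.
# LoRA factors a weight update into B (d_out x r) and A (r x d_in),
# so the trainable fraction per dense layer is r*(d_in + d_out) / (d_in * d_out).

def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of a dense layer's parameters that a rank-`rank` adapter trains."""
    full_params = d_in * d_out
    lora_params = rank * (d_in + d_out)
    return lora_params / full_params

# Hypothetical square 4096-dim projection with rank-4 adapters
# (illustrative numbers only, not MELT's reported setting):
print(lora_trainable_fraction(4096, 4096, 4))  # 0.001953125, i.e. under 0.2%
```

At rank 4 on a 4096×4096 layer the trainable share is about 0.195%, which shows how such sub-0.2% budgets arise without any reference to the attack's specifics.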
benchmark · arXiv · Feb 2, 2026

AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

Daniil Orel, Dilshod Azizov, Indraneil Paul et al. · Mohamed bin Zayed University of Artificial Intelligence · TU Darmstadt +1 more

Large-scale benchmark revealing that AI-generated code detectors fail severely under distribution shift and adversarial conditions

Output Integrity Attack · nlp
benchmark · arXiv · Jan 21, 2026

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Anmol Goel, Cornelius Emde, Sangdoo Yun et al. · Parameter Lab · TU Darmstadt +3 more

Benign fine-tuning silently breaks contextual privacy in LLMs, causing inappropriate data disclosure that standard safety benchmarks fail to detect

Transfer Learning Attack · Sensitive Information Disclosure · nlp
attack · arXiv · Jan 19, 2026

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta et al. · INSAIT · Sofia University +3 more

Jailbreaks MLLMs via adversarial prompting into auto-generating misleading charts, reducing both human and MLLM QA accuracy by roughly 20 points

Prompt Injection · multimodal · vision · nlp