Latest papers

1 paper
attack · arXiv · Mar 13, 2026

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Sihao Ding · Mercedes-Benz

An attack that merges individually benign LoRA adapters which collectively disable LLM safety alignment, without requiring adversarial prompts.

AI Supply Chain Attacks · Model Poisoning · Prompt Injection · NLP
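To make the composite idea concrete, the sketch below shows how LoRA merging works in general: each adapter contributes a low-rank delta B·A that is summed into a frozen base weight, so updates that are individually small can combine into one larger joint shift. This is an illustrative toy (assumed 2×2 shapes and values, pure Python), not the paper's actual attack construction.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge(W, adapters):
    """Merge LoRA adapters into a base weight: W' = W + sum_i(B_i @ A_i).

    Each (B, A) delta may look benign on its own, while the summed
    update jointly changes the merged model's behavior.
    """
    merged = [row[:] for row in W]
    for B, A in adapters:
        delta = matmul(B, A)
        for i in range(len(merged)):
            for j in range(len(merged[0])):
                merged[i][j] += delta[i][j]
    return merged

W = [[1.0, 0.0], [0.0, 1.0]]           # frozen 2x2 base weight
adapters = [
    ([[1.0], [0.0]], [[0.0, 0.5]]),    # rank-1 adapter 1
    ([[0.0], [1.0]], [[0.5, 0.0]]),    # rank-1 adapter 2
]
print(merge(W, adapters))  # [[1.0, 0.5], [0.5, 1.0]]
```

Note that neither adapter alone alters the diagonal structure much, but the merged weight carries both off-diagonal shifts at once, which is the composition effect the paper's title refers to.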