Latest papers

2 papers
Benchmark · arXiv · Jan 21, 2026 (10w ago)

Auditing Language Model Unlearning via Information Decomposition

Anmol Goel, Alan Ritter, Iryna Gurevych · Technical University of Darmstadt · National Research Center for Applied Cybersecurity ATHENE +1 more

Audits LLM unlearning via Partial Information Decomposition, revealing that residual training data remains vulnerable to adversarial reconstruction attacks

Model Inversion Attack · Sensitive Information Disclosure · nlp
Attack · arXiv · Jan 19, 2026 (11w ago)

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta et al. · INSAIT · Sofia University +3 more

Jailbreaks MLLMs via adversarial prompting to auto-generate misleading charts, reducing both human and MLLM question-answering accuracy by ~20 points

Prompt Injection · multimodal · vision · nlp