Filip Sondej

Papers in Database (1)

defense · arXiv · Sep 15, 2025

Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning

Filip Sondej, Yushi Yang · Jagiellonian University · University of Oxford

Proposes CIR, an unlearning method that robustly removes bio- and cyber-hazardous knowledge from LLMs and resists recovery of that knowledge through adversarial fine-tuning and jailbreak attacks.

Prompt Injection nlp