Sooel Son

defense arXiv Sep 26, 2025 · Sep 2025

Jaehan Kim, Minkyoo Song, Seungwon Shin et al. · KAIST

Defends MoE LLMs against harmful fine-tuning by penalizing routing drift away from safety-critical experts

Transfer Learning Attack Prompt Injection nlp

3 citations · 1 influential

attack arXiv Feb 19, 2026 · 6w ago

Leo Marchyok, Zachary Coalson, Sungho Keum et al. · Oregon State University · Korea Advanced Institute of Science & Technology

Discovers universal activation directions in LLM residual streams that reliably amplify PII leakage beyond existing prompt-based extraction attacks

Model Inversion Attack Sensitive Information Disclosure nlp

defense arXiv Oct 22, 2025 · Oct 2025

Woo Jae Kim, Kyu Beom Han, Yoonki Cho et al. · Korea Advanced Institute of Science and Technology

Defends NeRF IP by embedding adversarial perturbations in rendered outputs to disrupt unauthorized downstream classifiers and 3D localization models

Input Manipulation Attack Output Integrity Attack vision

attack arXiv Feb 6, 2026 · 8w ago

Minkyoo Song, Jaehan Kim, Myungchul Kang et al. · KAIST · National Security Research Institute

Attacks Graph RAG systems to reconstruct proprietary knowledge graphs via multi-turn prompting, reaching 82.9 F1 against safety-aligned LLMs

Sensitive Information Disclosure nlp graph

Papers in Database (4)