Sooel Son

h-index: 2 · 12 citations · 9 papers (total)

Papers in Database (4)

defense arXiv Sep 26, 2025 · Sep 2025

Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment

Jaehan Kim, Minkyoo Song, Seungwon Shin et al. · KAIST

Defends MoE LLMs against harmful fine-tuning by penalizing routing drift away from safety-critical experts

Transfer Learning Attack Prompt Injection nlp
3 citations · 1 influential · PDF · Code
attack arXiv Feb 19, 2026 · 6w ago

Discovering Universal Activation Directions for PII Leakage in Language Models

Leo Marchyok, Zachary Coalson, Sungho Keum et al. · Oregon State University · Korea Advanced Institute of Science & Technology

Discovers universal activation directions in LLM residual streams that reliably amplify PII leakage beyond existing prompt-based extraction attacks

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Oct 22, 2025 · Oct 2025

AegisRF: Adversarial Perturbations Guided with Sensitivity for Protecting Intellectual Property of Neural Radiance Fields

Woo Jae Kim, Kyu Beom Han, Yoonki Cho et al. · Korea Advanced Institute of Science and Technology

Defends NeRF IP by embedding adversarial perturbations in rendered outputs to disrupt unauthorized downstream classifiers and 3D localization models

Input Manipulation Attack Output Integrity Attack vision
PDF Code
attack arXiv Feb 6, 2026 · 8w ago

Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses

Minkyoo Song, Jaehan Kim, Myungchul Kang et al. · KAIST · National Security Research Institute

Attacks Graph RAG systems to reconstruct proprietary knowledge graphs via multi-turn prompting, reaching 82.9 F1 against safety-aligned LLMs

Sensitive Information Disclosure nlp graph
PDF