Seungwon Shin

h-index: 5 · 64 citations · 29 papers (total)

Papers in Database (3)

defense · arXiv · Sep 26, 2025

Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment

Jaehan Kim, Minkyoo Song, Seungwon Shin et al. · KAIST

Defends MoE LLMs against harmful fine-tuning by penalizing routing drift away from safety-critical experts

Transfer Learning Attack · Prompt Injection · nlp
3 citations · 1 influential · PDF · Code
attack · arXiv · Jan 8, 2026

$PC^2$: Politically Controversial Content Generation via Jailbreaking Attacks on GPT-based Text-to-Image Models

Wonwoo Choi, Minjae Seo, Minkyoo Song et al.

Black-box jailbreak that bypasses the political-content safety filters of GPT-based text-to-image models via semantic obfuscation and cross-language fragmentation

Prompt Injection · nlp · vision · multimodal · generative
PDF
attack · arXiv · Feb 6, 2026

Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses

Minkyoo Song, Jaehan Kim, Myungchul Kang et al. · KAIST · National Security Research Institute

Attacks Graph RAG systems to reconstruct proprietary knowledge graphs via multi-turn prompting, reaching an F1 score of 82.9 against safety-aligned LLMs

Sensitive Information Disclosure · nlp · graph
PDF