Jaehyung Kim

h-index: 2 · 13 citations · 10 papers (total)

Papers in Database (2)

attack · arXiv · Nov 3, 2025

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges

Hamin Koo, Minseon Kim, Jaehyung Kim · Yonsei University · Microsoft Research

Meta-optimized bi-level framework co-evolves jailbreak prompts and LLM judge templates to achieve SOTA attack success rates on Claude models

Prompt Injection nlp
1 citation
attack · arXiv · Jan 16, 2026

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Minseo Kwak, Jaehyung Kim · Yonsei University

Novel LLM membership inference attack using top-1 prediction probability gaps and sliding window correlation to detect pretraining data

Membership Inference Attack nlp