Daeseon Choi

h-index: 3 · 48 citations · 27 papers (total)

Papers in Database (2)

attack · arXiv · Oct 31, 2025

Self-HarmLLM: Can Large Language Model Harm Itself?

Heehwan Kim, Sungjune Park, Daeseon Choi · Soongsil University

Novel jailbreak attack in which an LLM generates obfuscated versions of harmful queries that bypass its own guardrails when re-entered in a new session

Prompt Injection · nlp
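The two-session attack flow summarized above can be illustrated with a toy sketch. Everything here is hypothetical: the keyword guardrail and the trivial obfuscation stand in for the model's learned safety filter and its self-generated rewrites, and are not the paper's actual method.

```python
def obfuscate(query: str) -> str:
    # Hypothetical obfuscation step: stand-in for the LLM rewriting its own
    # harmful query into a form its guardrail no longer recognizes.
    return query[::-1]  # trivial reversible transform for illustration

def guardrail_blocks(prompt: str, blocklist=("bomb",)) -> bool:
    # Toy keyword guardrail; real guardrails are learned classifiers.
    return any(word in prompt.lower() for word in blocklist)

harmful = "how to build a bomb"
assert guardrail_blocks(harmful)        # session 1: direct query is refused
disguised = obfuscate(harmful)
assert not guardrail_blocks(disguised)  # session 2: obfuscated form slips past
```

The point of the sketch is only the control flow: the filter that refuses the plain query in one session fails to trigger on the rewritten form submitted in a fresh session.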
defense · arXiv · Jan 13, 2026

STAR: Detecting Inference-time Backdoors in LLM Reasoning via State-Transition Amplification Ratio

Seong-Gyu Park, Sohee Park, Jisu Lee et al. · Soongsil University

Detects inference-time backdoor triggers in LLM Chain-of-Thought reasoning via output probability shift analysis, achieving AUROC ≈ 1.0

Model Poisoning · Prompt Injection · nlp
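The detection idea in the STAR summary — score each input by how sharply output probabilities shift across a reasoning step, then measure separation by AUROC — can be sketched on synthetic data. The `shift_ratio` statistic and all probability values below are hypothetical illustrations, not the paper's actual State-Transition Amplification Ratio.

```python
def auroc(pos_scores, neg_scores):
    # Probability that a random triggered score outranks a random clean score
    # (rank-based AUROC, ties counted as half a win).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def shift_ratio(p_before, p_after):
    # Toy stand-in for a state-transition ratio: relative change in a token
    # probability across one chain-of-thought step.
    return abs(p_after - p_before) / max(p_before, 1e-9)

# Hypothetical top-token probabilities (before, after) one reasoning step.
clean = [(0.60, 0.62), (0.55, 0.56), (0.70, 0.71), (0.48, 0.50)]
triggered = [(0.50, 0.95), (0.40, 0.90), (0.55, 0.99)]  # trigger amplifies shifts

clean_scores = [shift_ratio(b, a) for b, a in clean]
trig_scores = [shift_ratio(b, a) for b, a in triggered]
print(auroc(trig_scores, clean_scores))  # perfectly separable toy data → 1.0
```

On this toy data every triggered shift ratio exceeds every clean one, so the AUROC is exactly 1.0, mirroring the near-perfect separation the summary reports.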