Yo-Sub Han

defense arXiv Nov 11, 2025 · Nov 2025

Shinwoo Park, Hyejin Park, Hyeseon Ahn et al. · Yonsei University · Rensselaer Polytechnic Institute

Watermarks LLM text outputs via modular token-rank partitioning, supporting binary and multi-bit provenance tracing without fluency loss

Output Integrity Attack nlp

4 citations

defense arXiv Jan 7, 2026 · Jan 2026

Su-Hyeon Kim, Hyundong Jin, Yejin Lee et al. · Yonsei University

Defends LLMs against jailbreaks by injecting entropy-triggered safe-reminding phrases into the thinking steps of reasoning models at inference time

Prompt Injection nlp

attack arXiv Oct 13, 2025 · Oct 2025

Hyeseon An, Shinwoo Park, Suyeon Woo et al. · Yonsei University · Seoul National University

Spoofs LLM watermarks via knowledge distillation, enabling disinformation falsely attributed to trusted models like ChatGPT

Output Integrity Attack nlp

defense arXiv Oct 10, 2025 · Oct 2025

Shinwoo Park, Hyejin Park, Hyeseon Ahn et al. · Yonsei University · Rensselaer Polytechnic Institute

Linguistics-aware LLM text watermarking using POS n-gram entropy to balance quality and detectability without model logit access

Output Integrity Attack nlp

Papers in Database (4)