Yo-Sub Han

h-index: 5 · 55 citations · 36 papers (total)

Papers in Database (4)

defense arXiv Nov 11, 2025 · Nov 2025

WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking

Shinwoo Park, Hyejin Park, Hyeseon Ahn et al. · Yonsei University · Rensselaer Polytechnic Institute

Watermarks LLM text outputs via modular token-rank partitioning, supporting binary and multi-bit provenance tracing without fluency loss

Output Integrity Attack nlp
4 citations PDF Code

defense arXiv Jan 7, 2026 · Jan 2026

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs

Su-Hyeon Kim, Hyundong Jin, Yejin Lee et al. · Yonsei University

Defends LLMs against jailbreaks by injecting entropy-triggered safety-reminder phrases into a reasoning model's thinking steps at inference time

Prompt Injection nlp
PDF

attack arXiv Oct 13, 2025 · Oct 2025

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

Hyeseon Ahn, Shinwoo Park, Suyeon Woo et al. · Yonsei University · Seoul National University

Spoofs LLM watermarks via knowledge distillation, enabling disinformation falsely attributed to trusted models like ChatGPT

Output Integrity Attack nlp
PDF Code

defense arXiv Oct 10, 2025 · Oct 2025

A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Shinwoo Park, Hyejin Park, Hyeseon Ahn et al. · Yonsei University · Rensselaer Polytechnic Institute

Linguistics-aware LLM text watermarking using POS n-gram entropy to balance quality and detectability without model logit access

Output Integrity Attack nlp
PDF Code