Fu-Chieh Chang

Papers in Database (1)

benchmark · arXiv · Aug 23, 2025

Unveiling the Latent Directions of Reflection in Large Language Models

Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu · MediaTek Research · National Taiwan University

Activation steering reveals latent reflection directions in LLMs. Suppressing these directions adversarially enables jailbreaks, while enhancing them serves as a defense.

Prompt Injection nlp