Huan Song

h-index: 1 2 citations 3 papers (total)

Papers in Database (1)

defense arXiv Nov 26, 2025 · Nov 2025

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury et al. · Duke University · AWS Generative AI Innovation Center

Demonstrates RLVR fine-tuning maintains LLM safety guardrails while improving reasoning, breaking the assumed safety-capability tradeoff

Prompt Injection nlp
1 citations PDF