Andreas Haupt

h-index: 4 · 34 citations · 12 papers (total)

Papers in Database (1)

defense arXiv Oct 13, 2025 · Oct 2025

Don't Walk the Line: Boundary Guidance for Filtered Generation

Sarah Ball, Andreas Haupt · Ludwig-Maximilians-Universität München · Munich Center for Machine Learning +1 more

RL fine-tuning steers LLM outputs away from safety classifier margins to reduce jailbreak bypass and over-refusal simultaneously

Prompt Injection nlp
1 citation · PDF · Code