Sarah Ball

h-index: 3 · 49 citations · 8 papers (total)

Papers in Database (2)

defense · arXiv · Oct 13, 2025

Don't Walk the Line: Boundary Guidance for Filtered Generation

Sarah Ball, Andreas Haupt · Ludwig-Maximilians-Universität München · Munich Center for Machine Learning +1 more

RL fine-tuning steers LLM outputs away from safety classifier margins to reduce jailbreak bypass and over-refusal simultaneously

Prompt Injection · nlp
1 citation

benchmark · arXiv · Oct 24, 2025

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models

Sarah Ball, Niki Hasrati, Alexander Robey et al. · Ludwig-Maximilians-Universität München · Carnegie Mellon University +1 more

Analyzes why gradient-optimized adversarial suffixes transfer across LLMs using refusal-direction geometry in activation space

Input Manipulation Attack · Prompt Injection · nlp