Claudio Pinhanez

h-index: 4 · 22 citations · 10 papers (total)

Papers in Database (1)

benchmark · arXiv · Nov 11, 2025 · Nov 2025

A methodological analysis of prompt perturbations and their effect on attack success rates

Tiago Machado, Maysa Malfiza Garcia de Macedo, Rogerio Abreu de Paula et al. · IBM Research

Statistically analyzes how prompt perturbations shift jailbreak ASR across SFT, DPO, and RLHF-aligned LLMs, exposing benchmark evaluation gaps

Prompt Injection nlp