Latest papers

1 paper
attack · arXiv · Oct 14, 2025

RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs

Tuan T. Nguyen, John Le, Thai T. Vu et al. · VNPT AI · University of Wollongong

Embedding-space adversarial suffix attack steers LLM activations away from refusal directions to achieve jailbreaks with fewer queries

Input Manipulation Attack · Prompt Injection · nlp