John Le

h-index: 1 · 1 citation · 3 papers (total)

Papers in Database (1)

attack arXiv Oct 14, 2025 · Oct 2025

RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs

Tuan T. Nguyen, John Le, Thai T. Vu et al. · VNPT AI · University of Wollongong

Embedding-space adversarial suffix attack steers LLM activations away from refusal directions to achieve jailbreaks with fewer queries

Input Manipulation Attack · Prompt Injection · nlp