John Le

attack · arXiv · Oct 14, 2025

Tuan T. Nguyen, John Le, Thai T. Vu et al. · VNPT AI · University of Wollongong

Embedding-space adversarial suffix attack steers LLM activations away from refusal directions to achieve jailbreaks with fewer queries

Input Manipulation Attack · Prompt Injection · nlp

Papers in Database (1)