Artur Horal

h-index: 1 · 1 citation · 1 paper (total)

Papers in Database (1)

tool arXiv Oct 8, 2025 · Oct 2025

RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning

Artur Horal, Daniel Pina, Henrique Paz et al. · NOVA University of Lisbon

Adaptive multi-turn red teaming framework that jailbreaks safety-aligned LLMs via hierarchical attack planning and diverse conversational strategies

Prompt Injection · NLP
1 citation