Amin Saied

attack · arXiv · Oct 30, 2025

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning

Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar et al. · University of California · Microsoft

Red-teaming framework that attacks LLM agents via diverse seed generation and iterative adversarial prompts; its distilled 8B model surpasses DeepSeek-R1 (671B) on attack success rate

Prompt Injection · Excessive Agency · nlp

1 citation

Papers in Database (1)

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning