Suvadeep Hajra

Papers in Database (1)

attack arXiv Mar 15, 2026

Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling

Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty · Indian Institute of Technology Delhi

Efficient red-teaming method that uncovers LLM jailbreaks through diverse response sampling rather than adversarial prompt optimization

Prompt Injection nlp