benchmark 2026

Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

Prasoon Goyal, Sattvik Sahai, Michael Johnston, Hangjie Shi, Yao Lu, Shaohua Liu, Anna Rumshisky, Rahul Gupta, Anna Gottardi, Desheng Zhang, Lavina Vaz, Leslie Ball, Lucy Hu, Luke Dai, Samyuth Sagi, Maureen Murray, Sankaranarayanan Ananthakrishnan

Amazon


Published on arXiv

2604.17803

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Fine-tuning an open-source model on the crowdsourced adversarial data produced an 18.47% improvement on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE for secure code generation.

Adversarial Arena

Novel technique introduced

Post-training Large Language Models requires diverse, high-quality data, which is rare and costly to obtain, especially in low-resource domains and for multi-turn conversations. Common solutions are crowdsourcing or synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena for building high-quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. Fine-tuning an open-source model on this dataset produced an 18.47% improvement in secure code generation on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE.
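The attacker/defender framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Conversation` type, the `play_match` and `run_arena` helpers, the fixed turn budget, and the all-pairs matchup schedule are assumptions made for illustration, and the toy bots stand in for the teams' real systems.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a bot maps the conversation so far to its next message.
Turn = Tuple[str, str]  # (role, text)
Bot = Callable[[List[Turn]], str]


@dataclass
class Conversation:
    attacker: str
    defender: str
    turns: List[Turn] = field(default_factory=list)


def play_match(attacker_name: str, attacker: Bot,
               defender_name: str, defender: Bot,
               max_turns: int = 3) -> Conversation:
    """Alternate attacker prompts and defender responses for max_turns rounds."""
    convo = Conversation(attacker_name, defender_name)
    for _ in range(max_turns):
        prompt = attacker(convo.turns)   # attacker sees the full history
        convo.turns.append(("attacker", prompt))
        reply = defender(convo.turns)    # defender answers the latest prompt
        convo.turns.append(("defender", reply))
    return convo


def run_arena(attackers: Dict[str, Bot], defenders: Dict[str, Bot],
              max_turns: int = 3) -> List[Conversation]:
    """Pair every attacker with every defender (an assumed schedule)."""
    return [
        play_match(a_name, a_bot, d_name, d_bot, max_turns)
        for (a_name, a_bot), (d_name, d_bot)
        in product(attackers.items(), defenders.items())
    ]


# Toy stand-in bots; real entries would wrap LLM-based systems.
def toy_attacker(history: List[Turn]) -> str:
    return f"request #{len(history) // 2 + 1}"


def toy_defender(history: List[Turn]) -> str:
    return f"response to {history[-1][1]}"


dataset = run_arena(
    {"att_A": toy_attacker, "att_B": toy_attacker},
    {"def_X": toy_defender},
    max_turns=2,
)
```

Each `Conversation` in `dataset` is one multi-turn record tagged with the attacker and defender that produced it, which is the kind of unit a post-training dataset could be assembled from.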

Key Contributions

Adversarial Arena framework for generating high-quality multi-turn conversational datasets through competitive red-teaming
Cybersecurity safety alignment dataset of 19,683 multi-turn conversations from 10 academic teams
18.47% improvement on CyberSecEval-Instruct and 29.42% improvement on CyberSecEval-MITRE after fine-tuning an open-source model on the dataset

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time

Datasets

CyberSecEval-Instruct, CyberSecEval-MITRE

Applications

secure code generation, llm safety alignment, cybersecurity

