
AICD Bench: A Challenging Benchmark for AI-Generated Code Detection

Daniil Orel 1, Dilshod Azizov 1, Indraneil Paul 2, Yuxia Wang 3, Iryna Gurevych 1,2, Preslav Nakov 1

0 citations · 63 references · arXiv (Cornell University)


Published on arXiv

2602.02079

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Extensive evaluation across 77 generator models and 9 programming languages shows that detection performance remains far below practical usability, particularly under distribution shift and for hybrid or adversarial code.

AICD Bench

Novel technique introduced


Large language models (LLMs) are increasingly capable of generating functional source code, raising concerns about authorship, accountability, and security. While detecting AI-generated code is critical, existing datasets and benchmarks are narrow, typically limited to binary human-machine classification under in-distribution settings. To bridge this gap, we introduce AICD Bench, the most comprehensive benchmark for AI-generated code detection. It spans 2M examples, 77 models across 11 families, and 9 programming languages, including recent reasoning models. Beyond scale, AICD Bench introduces three realistic detection tasks: (i) robust binary classification under distribution shifts in language and domain, (ii) model family attribution, grouping generators by architectural lineage, and (iii) fine-grained human-machine classification across human, machine, hybrid, and adversarial code. Extensive evaluation of neural and classical detectors shows that performance remains far below practical usability, particularly under distribution shift and for hybrid or adversarial code. We release AICD Bench as a unified, challenging evaluation suite to drive the next generation of robust approaches for AI-generated code detection. The data and the code are available at https://huggingface.co/AICD-bench.


Key Contributions

  • AICD Bench: a 2M-sample benchmark spanning 77 LLMs across 11 model families and 9 programming languages for AI-generated code detection
  • Three novel evaluation tasks: robust binary classification under distribution shift, model family attribution, and fine-grained classification across human/machine/hybrid/adversarial code
  • Empirical evaluation showing current classical and neural detectors generalize poorly under OOD settings, especially on hybrid and adversarial code
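The fine-grained task is a four-way classification over human, machine, hybrid, and adversarial code, where the paper reports that detectors degrade sharply on the latter two classes. A standard way to score such a task is macro-averaged F1, which weights each class equally regardless of its frequency. The sketch below is illustrative only (the label names and toy predictions are assumptions, not taken from the benchmark's released evaluation code); it shows how a detector that collapses hybrid and adversarial code into "machine" is penalized by the macro average even while its binary human-vs-machine accuracy looks perfect.

```python
LABELS = ["human", "machine", "hybrid", "adversarial"]

def macro_f1(y_true, y_pred, labels=LABELS):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)

# Toy illustration: a detector that labels all non-human code as "machine".
y_true = ["human", "machine", "hybrid", "adversarial", "human", "machine"]
y_pred = ["human", "machine", "machine", "machine", "human", "machine"]

print(round(macro_f1(y_true, y_pred), 3))  # 0.417: hybrid/adversarial score 0
```

Because the hybrid and adversarial classes each contribute an F1 of zero here, the macro score drops to 0.417 even though every example was correctly split into human vs. non-human, mirroring the kind of gap the benchmark is designed to expose.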

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection — specifically source code — which falls under output integrity and content provenance. The benchmark evaluates detector robustness across distribution shifts, model family attribution, and adversarial/hybrid code, advancing the field of AI-generated content detection rather than merely applying existing methods to a narrow domain.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, black_box
Datasets
AICD Bench
Applications
ai-generated code detection, academic integrity, software security, authorship attribution