
AI-Generated Music Detection in Broadcast Monitoring

David López-Ayala¹, Asier Cabello², Pablo Zinemanas², Emilio Molina², Martín Rocamora¹

Published on arXiv (Cornell University) · arXiv:2602.06823 · 14 references

Threat Category: Output Integrity Attack (OWASP ML Top 10, ML09)

Key Finding

Models achieving high F1 in streaming scenarios drop below 60% F1 when music is background-masked by speech or restricted to short durations in broadcast conditions.

Novel Technique Introduced: AI-OpenBMAT


Abstract

AI music generators have advanced to the point where their outputs are often indistinguishable from human compositions. While detection methods have emerged, they are typically designed and validated in music streaming contexts with clean, full-length tracks. Broadcast audio, however, poses a different challenge: music appears as short excerpts, often masked by dominant speech, conditions under which existing detectors fail. In this work, we introduce AI-OpenBMAT, the first dataset tailored to broadcast-style AI-music detection. It contains 3,294 one-minute audio excerpts (54.9 hours) that follow the duration patterns and loudness relations of real television audio, combining human-made production music with stylistically matched continuations generated with Suno v3.5. We benchmark a CNN baseline and state-of-the-art SpectTTTra models to assess robustness to SNR and music duration, and evaluate them in a full broadcast scenario. Across all settings, models that excel in streaming scenarios suffer substantial degradation, with F1-scores dropping below 60% when music is in the background or has a short duration. These results highlight speech masking and short music length as critical open challenges for AI music detection, and position AI-OpenBMAT as a benchmark for developing detectors capable of meeting industrial broadcast requirements.
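The speech-masking condition studied in the abstract can be reproduced with a simple SNR-controlled mixing utility. The sketch below is illustrative only: the paper does not publish its mixing code, and the function name and the power-based SNR convention (rather than a broadcast loudness measure such as LUFS) are assumptions. A negative SNR puts the music behind the speech, as in real television audio.

```python
import numpy as np

def mix_at_snr(music: np.ndarray, speech: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix music into speech so that the music-to-speech power ratio equals snr_db.

    Negative snr_db values place the music in the background, mimicking
    broadcast conditions where speech dominates the mix.
    """
    n = min(len(music), len(speech))
    music, speech = music[:n], speech[:n]
    p_music = np.mean(music ** 2)
    p_speech = np.mean(speech ** 2)
    # Gain applied to the music signal to hit the target SNR relative to speech.
    gain = np.sqrt(p_speech * 10.0 ** (snr_db / 10.0) / (p_music + 1e-12))
    return speech + gain * music
```

Sweeping `snr_db` over a grid (e.g. +12 dB down to -12 dB) and re-scoring a detector at each point yields the kind of SNR robustness curve the paper benchmarks.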


Key Contributions

  • AI-OpenBMAT: the first dataset (3,294 one-minute excerpts, 54.9 hours) tailored to AI-generated music detection under broadcast conditions, pairing human-composed tracks with Suno v3.5 stylistic continuations at realistic broadcast SNR and duration distributions
  • Systematic benchmarking of CNN baseline and SpectTTTra models across SNR sweeps, duration sensitivity tests, and full broadcast scenario evaluation
  • Demonstration that state-of-the-art streaming-centric detectors degrade substantially in broadcast settings, with F1-scores falling below 60% under speech masking or short music duration
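The duration sensitivity test mentioned above can be sketched as truncating each excerpt to progressively shorter lengths and re-scoring the detector per duration bucket. This is a minimal illustration under stated assumptions: the bucket durations, sample rate, and the `detector` callable (returning a 0/1 label per clip) are hypothetical, not the paper's protocol.

```python
import numpy as np

def f1_score(y_true, y_pred) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive (AI) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def f1_by_duration(clips, labels, detector, durations_s=(1, 3, 5, 10), sr=16000):
    """Truncate each clip to the given duration (seconds) and score the detector."""
    return {
        d: f1_score(labels, [detector(clip[: d * sr]) for clip in clips])
        for d in durations_s
    }
```

A streaming-trained detector that holds a high F1 at 10 s but collapses in the 1-3 s buckets would exhibit exactly the short-duration degradation the paper reports.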

🛡️ Threat Analysis

Output Integrity Attack

The paper's core contribution is evaluating detectors of AI-generated audio content (music), a direct instance of output integrity and content provenance verification. AI-generated content detection (deepfakes, synthetic audio, AI text) is explicitly within ML09 scope.


Details

Domains
audio
Model Types
cnn, transformer
Threat Tags
inference_time
Datasets
AI-OpenBMAT, OpenBMAT, SONICS, BAF
Applications
broadcast monitoring, ai-generated music detection, television audio analysis