AI-Generated Music Detection in Broadcast Monitoring
David López-Ayala¹, Asier Cabello², Pablo Zinemanas², Emilio Molina², Martín Rocamora¹
Published on arXiv
2602.06823
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Models achieving high F1 in streaming scenarios drop below 60% F1 in broadcast conditions, where music is masked by dominant speech or appears only as short excerpts.
AI-OpenBMAT
Novel technique introduced
AI music generators have advanced to the point where their outputs are often indistinguishable from human compositions. While detection methods have emerged, they are typically designed and validated in music streaming contexts with clean, full-length tracks. Broadcast audio, however, poses a different challenge: music appears as short excerpts, often masked by dominant speech, conditions under which existing detectors fail. In this work, we introduce AI-OpenBMAT, the first dataset tailored to broadcast-style AI-music detection. It contains 3,294 one-minute audio excerpts (54.9 hours) that follow the duration patterns and loudness relations of real television audio, combining human-made production music with stylistically matched continuations generated with Suno v3.5. We benchmark a CNN baseline and state-of-the-art SpectTTTra models to assess SNR and duration robustness, and evaluate on a full broadcast scenario. Across all settings, models that excel in streaming scenarios suffer substantial degradation, with F1-scores dropping below 60% when music is in the background or has a short duration. These results highlight speech masking and short music length as critical open challenges for AI music detection, and position AI-OpenBMAT as a benchmark for developing detectors capable of meeting industrial broadcast requirements.
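The broadcast-style masking described above can be simulated by mixing music under speech at a controlled power ratio. A minimal NumPy sketch, using random stand-in signals and defining SNR as music power relative to speech power; the function name and signals are illustrative, not the authors' actual dataset pipeline (which follows loudness relations measured from real television audio):

```python
import numpy as np

def mix_at_snr(music: np.ndarray, speech: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `music` so its power sits snr_db decibels relative to the
    speech power (negative = background music), then sum the signals."""
    p_music = np.mean(music ** 2)
    p_speech = np.mean(speech ** 2)
    # Gain that brings the music to the target power ratio vs. speech.
    gain = np.sqrt(p_speech * 10 ** (snr_db / 10) / p_music)
    return gain * music + speech

rng = np.random.default_rng(0)
music = rng.standard_normal(16000)   # 1 s of stand-in "music" at 16 kHz
speech = rng.standard_normal(16000)  # 1 s of stand-in "speech"

# Background music 12 dB below speech, a typical broadcast-style condition.
mix = mix_at_snr(music, speech, snr_db=-12.0)
```

Sweeping `snr_db` from positive (foreground music) to strongly negative (music buried under speech) reproduces the kind of robustness test the benchmark applies.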
Key Contributions
- AI-OpenBMAT: the first dataset (3,294 one-minute excerpts, 54.9 hours) tailored to AI-generated music detection under broadcast conditions, pairing human-composed tracks with Suno v3.5 stylistic continuations at realistic broadcast SNR and duration distributions
- Systematic benchmarking of CNN baseline and SpectTTTra models across SNR sweeps, duration sensitivity tests, and full broadcast scenario evaluation
- Demonstration that state-of-the-art streaming-centric detectors degrade substantially in broadcast settings, with F1-scores falling below 60% under speech masking or short music duration
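For reference, the F1-score cited in these results is the harmonic mean of precision and recall over binary AI/human labels. A minimal sketch with hypothetical labels (not data from the paper):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (positive class = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # correctly flagged AI clips
    fp = np.sum((y_true == 0) & (y_pred == 1))  # human clips flagged as AI
    fn = np.sum((y_true == 1) & (y_pred == 0))  # AI clips missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical detector output: one miss and one false alarm on five clips.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # prints 0.667
```

An F1 below 60%, as reported for speech-masked and short-duration music, means the detector's balance of missed AI music and false alarms is far from the near-perfect scores these models reach on clean streaming audio.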
🛡️ Threat Analysis
The paper's core contribution is evaluating detectors of AI-generated audio content (music), a direct instance of output integrity and content provenance verification. Detection of AI-generated content (deepfakes, synthetic audio, AI text) is explicitly within ML09 scope.