defense 2025

Segment Transformer: AI-Generated Music Detection via Music Structural Analysis

Yumin Kim , Seonghyeon Go

0 citations


Published on arXiv: 2509.08283

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Beat-aware, segment-level structural analysis improves both the performance and the robustness of full-audio AIGM detection compared with the CNN-limited and SpecTTTra baselines on FakeMusicCaps and SONICS.

Segment Transformer

Novel technique introduced


Audio and music generation systems have advanced remarkably in the music information retrieval (MIR) research field. These advances raise copyright concerns, as the ownership and authorship of AI-generated music (AIGM) remain unclear. It can also be difficult to determine clearly whether a piece was generated by AI or composed by humans. To address these challenges, we aim to improve the accuracy of AIGM detection by analyzing the structural patterns of music segments. Specifically, to extract musical features from short audio clips, we integrated various pre-trained models, including self-supervised learning (SSL) models and an audio effect encoder, each within our proposed transformer-based framework. Furthermore, for long audio, we developed a segment transformer that divides music into segments and learns inter-segment relationships. Using the FakeMusicCaps and SONICS datasets, we achieved high accuracy in both the short-audio and full-audio detection experiments. These findings suggest that integrating segment-level musical features into long-range temporal analysis can effectively enhance both the performance and the robustness of AIGM detection systems.
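The segment transformer first has to divide a long track into segments, and the key finding describes this segmentation as beat-aware. A minimal sketch of beat-aligned segmentation follows, assuming beat times (in seconds) are already available from some beat tracker and that every segment spans a fixed number of beats. The helper names `beat_aligned_segments` and `to_sample_ranges` and the 4-beat default are hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]


def beat_aligned_segments(beat_times: List[float], beats_per_segment: int = 4) -> List[Segment]:
    """Group consecutive beats into (start, end) segments, in seconds.

    Hypothetical helper: each segment spans `beats_per_segment` beat intervals
    and both of its boundaries fall on detected beats. Trailing beats that
    cannot complete a full segment are dropped.
    """
    if beats_per_segment < 1:
        raise ValueError("beats_per_segment must be >= 1")
    segments: List[Segment] = []
    # Stride through beat indices; each segment ends on the beat that
    # starts the next one, so segments tile the track without gaps.
    for i in range(0, len(beat_times) - beats_per_segment, beats_per_segment):
        segments.append((beat_times[i], beat_times[i + beats_per_segment]))
    return segments


def to_sample_ranges(segments: List[Segment], sample_rate: int) -> List[Tuple[int, int]]:
    """Convert second-based segments into sample-index ranges for slicing a waveform."""
    return [(round(s * sample_rate), round(e * sample_rate)) for s, e in segments]
```

For beats every 0.5 s (120 BPM) and the 4-beat default, `beat_aligned_segments([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])` yields `[(0.0, 2.0), (2.0, 4.0)]`. In a pipeline of this shape, each segment would then be passed through a short-clip feature extractor to produce one embedding per segment; aligning boundaries to beats, rather than to fixed-length windows, keeps each segment musically coherent.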


Key Contributions

  • AudioCAT: a cross-attention-based Transformer decoder that pairs SSL audio encoders with the FXencoder audio effect encoder for short-segment AIGM detection
  • Segment Transformer: a beat-aware segmentation model that captures inter-segment structural relationships across full-length music compositions
  • Two-stage AIGM detection framework combining segment-level and global structural features, evaluated on FakeMusicCaps and SONICS datasets
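AudioCAT is described as cross-attention between SSL audio-encoder features and FXencoder features, and the Segment Transformer learns relationships across segments; both rest on scaled dot-product attention. The pure-Python sketch below illustrates that shared operation, assuming single-head attention with no learned projections, no multi-head split, and no positional encodings. Which feature stream supplies the queries and which supplies the keys and values is an assumption, and the mean-pooling plus logistic head in `detection_score` is a hypothetical stand-in, not the authors' architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, one output row per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def mean_pool(rows: Matrix) -> List[float]:
    # Average each feature dimension across rows.
    return [sum(col) / len(rows) for col in zip(*rows)]


def detection_score(segment_embeddings: Matrix, w: List[float], b: float) -> float:
    """Hypothetical head: self-attention over segments, mean pool, logistic output."""
    contextual = attention(segment_embeddings, segment_embeddings, segment_embeddings)
    pooled = mean_pool(contextual)
    logit = sum(wi * pi for wi, pi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    ssl_frames = [[0.1, 0.3], [0.2, -0.1], [0.0, 0.4]]  # toy stand-in for SSL features
    fx_frames = [[0.5, 0.5], [-0.2, 0.1]]               # toy stand-in for FXencoder features
    fused = attention(ssl_frames, fx_frames, fx_frames)  # cross-attention
    print(len(fused), len(fused[0]))  # 3 2
```

Calling `attention` with distinct query and key/value sources gives cross-attention (the AudioCAT-style fusion), while passing the same segment embeddings three times gives self-attention across segments (the Segment Transformer-style step). A trained model would add learned query, key, and value projections and multiple heads on top of this core.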

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated content detection by proposing two novel detection architectures (AudioCAT and Segment Transformer) for distinguishing AI-generated from human-composed music, which falls under output integrity and content authenticity verification.


Details

Domains
audio
Model Types
transformer
Threat Tags
inference_time
Datasets
FakeMusicCaps, SONICS
Applications
AI-generated music detection, music copyright protection, content authenticity verification