A list of datasets directly related to Music Information Retrieval Datasets


2019-12-27 13:08:08




This is yet another attempt at maintaining a list of datasets directly related to MIR. Other lists that I have found are this wiki, the ISMIR page, this web page, and this web page. If you are interested in speech processing, you can find a table of speech datasets on this page. If you are interested in multitracks, the Open Multitrack Testbed is a good starting point. UPF also has an excellent page with datasets for world music, including Indian art music, Turkish makam music, and Beijing opera. A curated list of MIDI sources can be found here. Two additional general resources are piano-midi.de for MIDI files and freesound.org for audio files.

If you know of other datasets that should be included in this list, and eventually in the book, please send me a note or post a comment.


| Dataset | Metadata | Contents | Audio |
|---|---|---|---|
| 200DrumMachines |  | 7371 one-shots | yes |
| ACM_MIRUM | tempo | 1410 excerpts (60s) | yes |
| AcousticBrainz-Genre | 15-31 genres with 265-745 subgenres | audio features for over 2000000 songs | no |
| ADC2004 | predominant pitch | 20 excerpts | yes |
| Acoustic Event Dataset | 28 event classes | 5223 audio snippets | yes |
| Amg1608 | valence & arousal | 1608 excerpts (30s) | no |
| AMT-pilot | structure by multiple annotators | 8 songs | yes |
| APL | piano practice | 620 segments | yes |
| artist20 | 20 artists | 1413 songs | no |
| AudioSet | 632 event classes | 2084320 clips (10s) | no |
| bach10 | multitrack & aligned MIDI | 10 chorales | yes |
| ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes |
| beatboxset1 | percussion annotation | 14 clips | yes |
| BPS-FH | functional annotation | 32 sonatas | no |
| C224a | 14 genres | 224 artists | no |
| C3ka | 18 genres | 3000 artists | no |
| C49ka-C111ka | genres | 48800/110588 artists | no |
| CAL10k | tags | 10870 songs | no |
| CAL500 | tags | 502 songs | yes |
| CarnaticRhythm | sama & beats | 176 pieces | on request |
| CASD | chords by 4 annotators | 50 songs | no |
| CCMixter | vocal & background track | 50 mixes | yes |
| Chopin22 | aligned MIDI | 44 recordings | yes |
| Clotho | 5 descriptive captions | 4981 snippets | yes |
| CMMSD | note/rest/transition & onsets & vibrato | 36 excerpts | no |
| Coidach | 55 genres | 26420 songs | no |
| corpusCOFLA | editorial & predominant melody | 1800 flamenco recordings | no |
| covers80 | cover songs | 80 song pairs | yes |
| Cross-Composer | 11 composers & piece & key & era & instrumentation | 1100 chromagrams and chord labels | no |
| Cross-Era | composer & piece & key & era & instrumentation | 2000 chromagrams and chord labels | no |
| Da-TACOS | cover songs | 25000 songs | no |
| DALI | aligned notes and lyrics | 5358 songs | no |
| DAMP | karaoke performances & aligned lyrics & pronunciation assessment | 34000 monophonic recordings | yes |
| DEAM | valence & arousal | 1802 excerpts | yes |
| DEAPDataset | valence & arousal & dominance & physiological data | 120 music video excerpts | no |
| DREANSS | onset times & perc. instruments | 18 excerpts | yes |
| DrumPt | 4 playing techniques | approx. 2000 annotations | yes (see ENST) |
| EMO-Soundscapes | arousal & valence | 1213 soundscape recordings | yes |
| emoMusic | arousal & valence | 744 excerpts (45s) | yes |
| Emotify | induced emotion | 400 excerpts | yes |
| EMusic | arousal & valence | 100 excerpts (experimental music) | yes |
| ENST-Drums | onset times & perc. instruments & playing technique | 318 segments | yes |
| Extendedballroom | 9 genres & tempo | 4000 excerpts (30s) | downloadable |
| ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes |
| ffuhrmann | 11 predom. instr. | 6951 excerpts/220 songs | yes/no |
| FlaBase | editorial & biographical & musicological information on flamenco | 1102 artists & 74 palos & 2860 albums & 13311 tracks | no |
| FMA-full | 161 genres | 106574 songs | yes |
| FMA-large | 161 genres | 106574 excerpts (30s) | yes |
| FMA-medium | 16 genres | 25000 excerpts (30s) | yes |
| FMA-small | 8 genres | 8000 excerpts (30s) | yes |
| Fugue | structure & cadences | 36 fugues (Bach & Shostakovich) | no |
| GiantStepsKey | key | 604 files | no |
| GiantStepsTempo | tempo (alternate) | 664 files | no |
| GMD | genre & valence & arousal | 1400 songs | downloadable |
| GNMID14 | timestamp & country | 110M music ID matches | no |
| Good-sounds.org | 12 instruments, pitch, sound quality | 8750 notes | yes |
| GPT | 7 guitar playing techniques | 6580 clips | yes |
| GSD | start/stop of guitar solos | 60 songs | no |
| GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes |
| GuitarSet | MIDI & pitch & beat & chords | 360 guitar excerpts (30s) | yes |
| Hainsworth | tempo | 245 excerpts (60s) | yes |
| HarmonixSet | beats, downbeats, structure | 912 pop songs | no |
| HHDS | multitrack & style & tempo | 18 songs | yes |
| HJDB | downbeat | 236 excerpts | yes |
| holzapfel:onset | onset times | 78 excerpts | yes |
| homburg | 9 genres | 1889 excerpts (10s) | yes |
| IADS | valence & arousal & dominance | 111 sound snippets | yes |
| Multitrack | multitrack & style | 12 songs | yes |
| IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes |
| IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes |
| IDMT-SMT-Bass-SINGLE-TRACK | style-annotated bass lines | 17 bass lines (?) | yes |
| IDMT-SMT-Drums | onset times & perc. instruments | 518 files | yes |
| IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes |
| iKala | singing voice & background | 252 excerpts (30s) | yes |
| INRIA:DSD100 | multitrack | 100 songs | yes |
| INRIA:EuroVision | structure | 124 songs | no |
| INRIA:Quaero | structure | 159 songs | no |
| IRMAS | 11 instruments | 2874 excerpts | yes |
| ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes |
| ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes |
| Jazz Audio-Aligned Harmony Dataset | structure & key & chords & beats | 113 songs | no |
| Jamendo-VAD | voice activity | 61+16+16 songs | yes |
| JGDB | multitrack & MIDI | randomly generated excerpts | yes |
| Jordan:Classical | structure | 15 pieces | yes |
| Jordan:Jazz | structure | 15 pieces | yes |
| JLSDD | symbolic scores | 77 duos (Josquin & La Rue) | no |
| LabROSA:APT | MIDI | 29 piano excerpts | yes |
| LabROSA:MIDI | audio & MIDI | 4 songs | yes |
| last.fm | listening habits | 992 users | no |
| LFM-1b | listening habits | 120000 users | no |
| LIND | lyrics-based artist and genre graphs | 42802 artists/214 genres | no |
| LMD | MIDI & tempo & key | 176581 MIDI files | no |
| MAESTRO | audio-aligned MIDI & velocity & sustain | 172 hours of piano | yes |
| magnatagatune | similarity | 25863 excerpts (30s) | yes |
| MAPS | piano notes/chords/pieces & tempo/key | 238 pieces | yes |
| MARD | album reviews | 66566 songs | no |
| MARG-AMT | MIDI pitch & onset/offset times | 30 melodies | yes |
| MAST | vocal performance assessment | 1018 performances | no |
| McGill Billboard | chords | 740 songs | no |
| MDBDrums | onset times & perc. instrument & playing technique | 23 excerpts | yes |
| Medley-solos-DB | 8 instruments | 21572 clips (3s) | yes |
| MedleyDB | multitrack & genre & melody f0 & instrument activation | 122 songs | yes |
| MIR-1K | vocal and background | 1000 excerpts | yes |
| mirex05Train | predominant pitch | 13 excerpts | yes |
| mirex06Train | tempo & beats | 20 excerpts (30s) | yes |
| Mid Level Perceptual Music Features | 7 perceptual features | 5000 audio files | yes |
| MMTD | listening behavior | 1086808 tweets | no |
| Modal | onset times | 71 snippets | yes |
| MOODetector:Bi-Modal | lyrics & valence & arousal | 133 excerpts | yes |
| MOODetector:Multi-Modal | lyrics & MIDI & mood | 903 excerpts (30s) | yes |
| moodswings | arousal & valence | 240 excerpts (30s) | no |
| MozartStringQuartets | structure, cadences | 32 movements | no |
| MSMD | piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR | 497 pieces | no |
| MSD | metadata & proprietary features | 1000000 songs | no |
| MTC | phrases & key & meter | 18000 melodies | partially |
| MTG-Jamendo | tags (genre, instruments, mood) | 55000 tracks | yes |
| MTG-QBH | title & artist | 118 queries/481 songs | yes/no |
| musiclef2012 | tags | 1355 songs | no |
| MusicMicro | music listening patterns | 136866 users | no |
| MusicNet | pitch and onsets | 330 recordings | implicitly |
| NES-MDB | multi-track MIDI and aligned audio | 5000 songs | on request |
| Nine Inch Nails Multitracks | multitrack | 66 songs | yes |
| NMED-H | EEG | 24 trials x 16 excerpts (4.5min) | no |
| NMED-RP | EEG | 20 trials x 10 excerpts (4.5min) | no |
| NMED-T (Naturalistic Music EEG Dataset) | EEG | 30 trials x 16 excerpts (30s) | no |
| NSynth | instrument and pitch | 305979 single notes | yes |
| NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes |
| ODB | onset times | 19 excerpts | yes |
| Onset_Leveau | onset times | 21 excerpts | yes |
| OpenBMAT | 6 classes for music presence | 1647 excerpts (60s) | yes |
| OpenMIC-2018 | 20 instruments | 20000 excerpts (10s) | yes |
| Orchset | predominant pitch | 64 excerpts | yes |
| Phenicx-Anechoic | audio & aligned MIDI | 4 pieces | yes |
| Phonation | pitch & vowel & phonation mode | 900 monophonic snippets | yes |
| PlaylistDataset | playlists | 75262 songs/2840553 transitions | no |
| QBT-Extended | taps | 3365 queries/51 songs | MIDI |
| QMUL:Beatles | structure & key & chords & beats | 181 songs | no |
| QMUL:King | structure & key & chords | 14 songs | no |
| QMUL:MichaelJackson | structure | 38 songs | no |
| QMUL:MixEvaluation | multitrack & mixes | 18 songs/180 mixes | yes |
| QMUL:Queen | structure/key & chords | 51/31 songs | no |
| QMUL:RSS | structure | 60 songs | no |
| QMUL:Zweieck | structure & key & chords & beats | 18 songs | no |
| QUASI | multitrack | 11 songs | yes |
| Robbie Williams Annotations (Zanoni-Giorgi) | chords & keys & beats | 65 songs | no |
| RockCorpus | chords & melody & bars | 200 songs | no |
| RWC | lyrics & 10 genres & 50 instruments & chords & structure & aligned MIDI | 115 songs/50 classical/100 songs | yes |
| SALAMI | structure | 1447 songs | no |
| Sargon | structure | 4 songs | yes |
| Semantic Artist Similarity | artist biographies & similarity | 268+2336 artists | no |
| Schenker | MusicXML & Schenker analysis | 41 pieces | no |
| EEG-Recorded Responses to Short Chord Progressions | EEG | 108/648 trials x 12 stimuli (5s) | yes |
| SDD | start of samples | 80 songs & 80 samples | no |
| SEILS | scores in different symbolic formats | 30 madrigals | no |
| Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes |
| Seyerlehner:Annotated | 19 genres | 190 songs | yes |
| Seyerlehner:Pop | tempo | 1105 songs | yes |
| Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes |
| SHS100K | cover songs | ca. 10,000 songs with 100,000 tracks | no |
| SISEC | multitrack & mix | 5 excerpts | yes |
| Slakh | synthesized audio and mixes | 2100 mixes | yes |
| SMC:MIREX | tempo & beat positions | 217 excerpts | yes |
| SMD | audio & aligned MIDI | 50 recordings | yes |
| SoundTracks | valence & energy & tension & mood | 360+110 excerpts | yes |
| SPAM | structure | 50 songs | no |
| Shazam Research Dataset: Offsets | in-song query times | 188M queries over 20 songs | no |
| Su-AMT | onset times & pitch | 10 excerpts | yes |
| TextureStringQuartets | texture | 11 movements | no |
| Traditional Flute Dataset | audio & aligned MIDI | 30 excerpts | yes |
| ThisIsMyJam | favorite songs & artists | 131k users | no |
| TONAS | pitch | 72 single-voiced excerpts | yes |
| TPD | popularity rating | 23385 songs | no |
| Tunebot | title & artist | 10000 queries/? songs | yes/no |
| UIOWA:MIS | single instrument notes | many | yes |
| UMA-Piano | piano chords | 275040 recordings | yes |
| UnmixDB | DJ mix parameters | 37 playlists | yes |
| URBAN-SED | 9 event classes | 10000 recordings | yes |
| UrbanSound8k | 10 event classes | 8732 slices | yes |
| URMP | score-aligned video and audio | 44 recordings | yes |
| uspop2002 | tags & genre & chords | 8752 songs | no |
| VocalSet | 17 vocal techniques | 3560 recordings | yes |
| YousicianUkulele | evaluated notes and chords | 500000 exercises by 1000 users | no |
Posted by AHU-WangXiao