A list of datasets directly related to Music Information Retrieval
2019-12-27 13:08:08
Source: https://www.audiocontentanalysis.org/data-sets/
This is yet another attempt at maintaining a list of datasets directly related to MIR. Other lists that I have found are this wiki, the ISMIR page, this web page, and this web page. If you are interested in speech processing, you can find a table of speech datasets on this page. If you are interested in multitracks, the Open Multitrack Testbed should be a good starting point. UPF also has an excellent page with datasets for world music, including Indian art music, Turkish Makam music, and Beijing Opera. A curated list of MIDI sources can be found here. Two additional general resources are piano-midi.de for MIDI files and freesound.org for audio files.
If you know of other datasets that should be included in this list, and eventually in the book, please send me a note or post a comment.
DATASET | META DATA | CONTENTS | WITH AUDIO |
---|---|---|---|
200DrumMachines | | 7371 one-shots | yes |
ACM_MIRUM | tempo | 1410 excerpts (60s) | yes |
AcousticBrainz-Genre | 15-31 genres with 265-745 subgenres | audio features for over 2000000 songs | no |
ADC2004 | predominant pitch | 20 excerpts | yes |
Acoustic Event Dataset | 28 event classes | 5223 audio snippets | yes |
Amg1608 | valence & arousal | 1608 excerpts (30s) | no |
AMT-pilot | structure by multiple annotators | 8 songs | yes |
APL | piano practice | 620 segments | yes |
artist20 | 20 artists | 1413 songs | no |
AudioSet | 632 event classes | 2084320 clips (10s) | no |
bach10 | multitrack & aligned MIDI | 10 chorales | yes |
ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes |
beatboxset1 | percussion annotation | 14 clips | yes |
BPS-FH | functional annotation | 32 sonatas | no |
C224a | 14 genres | 224 artists | no |
C3ka | 18 genres | 3000 artists | no |
C49ka-C111ka | genres | 48800/110588 artists | no |
CAL10k | tags | 10870 songs | no |
CAL500 | tags | 502 songs | yes |
CarnaticRhythm | sama & beats | 176 pieces | on request |
CASD | chords by 4 annotators | 50 songs | no |
CCMixter | vocal & background track | 50 mixes | yes |
Chopin22 | aligned MIDI | 44 recordings | yes |
Clotho | 5 descriptive captions | 4981 snippets | yes |
CMMSD | note/rest/transition & onsets & vibrato | 36 excerpts | no |
Coidach | 55 genres | 26420 songs | no |
corpusCOFLA | editorial & predominant melody | 1800 flamenco recordings | no |
covers80 | cover songs | 80 song pairs | yes |
Cross-Composer | 11 composers & piece & key & era & instrumentation | 1100 chromagrams and chord labels | no |
Cross-Era | composer & piece & key & era & instrumentation | 2000 chromagrams and chord labels | no |
Da-TACOS | cover songs | 25000 songs | no |
DALI | aligned notes and lyrics | 5358 songs | no |
DAMP | karaoke performances & aligned lyrics & pronunciation assessment | 34000 monophonic recordings | yes |
DEAM | valence & arousal | 1802 excerpts | yes |
DEAPDataset | valence & arousal & dominance & physiological data | 120 music video excerpts | no |
DREANSS | onset times & perc. instruments | 18 excerpts | yes |
DrumPt | 4 playing techniques | approx. 2000 annotations | yes (see ENST) |
EMO-Soundscapes | arousal & valence | 1213 soundscape recordings | yes |
emoMusic | arousal & valence | 744 excerpts (45s) | yes |
Emotify | induced emotion | 400 excerpts | yes |
EMusic | arousal & valence | 100 excerpts (experimental music) | yes |
ENST-Drums | onset times & perc. instruments & playing technique | 318 segments | yes |
Extendedballroom | 9 genres & tempo | 4000 excerpts (30s) | downloadable |
ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes |
ffuhrmann | 11 predom. instr. | 6951 excerpts/220 songs | yes/no |
FlaBase | editorial & biographical & musicological information on flamenco, 1102 artists & 74 palos & 2860 albums | 13311 tracks | no |
FMA-full | 161 genres | 106574 songs | yes |
FMA-large | 161 genres | 106574 excerpts (30s) | yes |
FMA-medium | 16 genres | 25000 excerpts (30s) | yes |
FMA-small | 8 genres | 8000 excerpts (30s) | yes |
Fugue | structure & cadences | 36 fugues (Bach & Shostakovich) | no |
GiantStepsKey | key | 604 files | no |
GiantStepsTempo | tempo (alternate) | 664 files | no |
GMD | genre & valence & arousal | 1400 songs | downloadable |
GNMID14 | timestamp & country | 110M music ID matches | no |
Good-sounds.org | 12 instruments, pitch, sound quality | 8750 notes | yes |
GPT | 7 guitar playing techniques | 6580 clips | yes |
GSD | start/stop of guitar solos | 60 songs | no |
GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes |
GuitarSet | midi & pitch & beat & chords | 360 guitar excerpts (30s) | |
Hainsworth | tempo | 245 excerpts (60s) | yes |
HarmonixSet | beats, downbeats, structure | 912 pop songs | no |
HHDS | multitrack & style & tempo | 18 songs | yes |
HJDB | downbeat | 236 excerpts | yes |
holzapfel:onset | onset times | 78 excerpts | yes |
homburg | 9 genres | 1889 excerpts (10s) | yes |
IADS | valence & arousal & dominance | 111 sound snippets | yes |
Multitrack | multitrack & style | 12 songs | yes |
IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes |
IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes |
IDMT-SMT-Bass-SINGLE-TRACK | style annotated bass lines | 17 bass lines (?) | yes |
IDMT-SMT-Drums | onset times & perc. instruments | 518 files | yes |
IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes |
iKala | singing voice & background | 252 excerpts (30s) | yes |
INRIA:DSD100 | multitrack | 100 songs | yes |
INRIA:EuroVision | structure | 124 songs | no |
INRIA:Quaero | structure | 159 songs | no |
IRMAS | 11 instruments | 2874 excerpts | yes |
ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes |
ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes |
Jazz Audio-Aligned Harmony Dataset | structure & key & chords & beats | 113 songs | no |
Jamendo-VAD | voice activity | 61+16+16 songs | yes |
JGDB | multitrack & MIDI | random generated excerpts | yes |
Jordan:Classical | structure | 15 pieces | yes |
Jordan:Jazz | structure | 15 pieces | yes |
JLSDD | symbolic scores | 77 duos (Josquin & La Rue) | no |
LabROSA:APT | MIDI | 29 piano excerpts | yes |
LabROSA:MIDI | audio & MIDI | 4 songs | yes |
last.fm | listening habits | 992 users | no |
LFM-1b | listening habits | 120000 users | no |
LIND | lyrics-based artist and genre graphs | 42802 artists/214 genres | no |
LMD | MIDI & tempo & key | 176581 MIDI files | no |
MAESTRO | audio aligned MIDI & velocity & sustain | 172 hours of piano | yes |
magnatagatune | similarity | 25863 excerpts (30s) | yes |
MAPS | piano notes/chords/pieces & tempo/key | 238 pieces | yes |
MARD | album reviews | 66566 songs | no |
MARG-AMT | MIDI pitch & onset/offset times | 30 melodies | yes |
MAST | vocal performance assessment | 1018 performances | no |
McGill Billboard | chords | 740 songs | no |
MDBDrums | onset times & perc. instrument & playing technique | 23 excerpts | yes |
Medley-solos-DB | 8 instruments | 21572 clips (3s) | yes |
MedleyDB | multitrack & genre & melody f0 & instrument activation | 122 songs | yes |
MIR-1K | vocal and background | 1000 excerpts | yes |
mirex05Train | predominant pitch | 13 excerpts | yes |
mirex06Train | tempo & beats | 20 excerpts (30s) | yes |
Mid Level Perceptual Music Features | 7 perceptual features | 5000 audio files | yes |
MMTD | listening behavior | 1086808 tweets | no |
Modal | onset times | 71 snippets | yes |
MOODetector:Bi-Modal | lyrics & valence & arousal | 133 excerpts | yes |
MOODetector:Multi-Modal | lyrics & MIDI & mood | 903 excerpts (30s) | yes |
moodswings | arousal & valence | 240 excerpts (30s) | no |
MozartStringQuartets | structure, cadences | 32 movements | no |
MSMD | piano notes/chords/pieces, synthetic audio, aligned MIDI, aligned sheet music images, OMR | 497 pieces | no |
MSD | meta data & proprietary features | 1000000 songs | no |
MTC | phrases & key & meter | 18000 melodies | partially |
MTG-Jamendo | tags (genre, instruments, mood) | 55000 tracks | yes |
MTG-QBH | title & artist | 118 queries/481 songs | yes/no |
musiclef2012 | tags | 1355 songs | no |
MusicMicro | music listening patterns | 136866 users | no |
MusicNet | pitch and onsets | 330 recordings | implicitly |
NES-MDB | multi-track MIDI and aligned audio | 5000 songs | on request |
Nine Inch Nails Multitracks | multitrack | 66 songs | yes |
NMED-H | EEG | 24 trials x 16 excerpts (4.5min) | no |
NMED-RP | EEG | 20 trials x 10 excerpts (4.5min) | no |
NMED-T | EEG | 30 trials x 16 excerpts (30sec) | no |
NSynth | instrument and pitch | 305979 single notes | yes |
NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes |
ODB | onset times | 19 excerpts | yes |
Onset_Leveau | onset times | 21 excerpts | yes |
OpenBMAT | 6 classes for music presence | 1647 excerpts (60s) | yes |
OpenMIC-2018 | 20 instruments | 20000 excerpts (10s) | yes |
Orchset | predominant pitch | 64 excerpts | yes |
Phenicx-Anechoic | audio & aligned MIDI | 4 pieces | yes |
Phonation | pitch & vowel & phonation mode | 900 monophonic snippets | yes |
PlaylistDataset | playlists | 75262 songs/2840553 transitions | no |
QBT-Extended | taps | 3365 queries/51 songs | MIDI |
QMUL:Beatles | structure & key & chords & beats | 181 songs | no |
QMUL:King | structure & key & chords | 14 songs | no |
QMUL:MichaelJackson | structure | 38 songs | no |
QMUL:MixEvaluation | multitrack & mixes | 18 songs/180 mixes | yes |
QMUL:Queen | structure/key & chords | 51/31 songs | no |
QMUL:RSS | structure | 60 songs | no |
QMUL:Zweieck | structure & key & chords & beats | 18 songs | no |
QUASI | multitrack | 11 songs | yes |
Robbie Williams Annotations (Zanoni-Giorgi) | chords & keys & beats | 65 songs | no |
RockCorpus | chords & melody & bars | 200 songs | no |
RWC | lyrics & 10 genres & 50 instruments & chords & structure & aligned MIDI | 115 songs/50 classical/100 songs | yes |
SALAMI | structure | 1447 songs | no |
Sargon | structure | 4 songs | yes |
Semantic Artist Similarity | artist biographies & similarity | 268+2336 artists | no |
Schenker | MusicXML & Schenker analysis | 41 pieces | no |
EEG-Recorded Responses to Short Chord Progressions | EEG | 108/648 trials x 12 stimuli (5s) | yes |
SDD | start of samples | 80 songs & 80 samples | no |
SEILS | scores in different symbolic formats | 30 madrigals | no |
Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes |
Seyerlehner:Annotated | 19 genres | 190 songs | yes |
Seyerlehner:Pop | tempo | 1105 songs | yes |
Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes |
SHS100K | cover songs | ca. 10,000 songs with 100,000 tracks | no |
SISEC | multitrack & mix | 5 excerpts | yes |
Slakh | synthesized audio and mixes | 2100 mixes | yes |
SMC:MIREX | tempo & beat positions | 217 excerpts | yes |
SMD | audio & aligned MIDI | 50 recordings | yes |
SoundTracks | valence & energy & tension & mood | 360+110 excerpts | yes |
SPAM | structure | 50 songs | no |
Shazam Research Dataset: Offsets | in-song query times | 188M queries over 20 songs | no |
Su-AMT | onset times & pitch | 10 excerpts | yes |
TextureStringQuartets | texture | 11 movements | no |
Traditional Flute Dataset | audio & aligned MIDI | 30 excerpts | yes |
ThisIsMyJam | favorite songs & artists | 131k users | no |
TONAS | pitch | 72 single-voiced excerpts | yes |
TPD | popularity rating | 23385 songs | no |
Tunebot | title & artist | 10000 queries/? songs | yes/no |
UIOWA:MIS | single instrument notes | many | yes |
UMA-Piano | piano chords | 275040 recordings | yes |
UnmixDB | DJ mix parameters | 37 playlists | yes |
URBAN-SED | 9 event classes | 10000 recordings | yes |
UrbanSound8k | 10 event classes | 8732 slices | yes |
URMP | score-aligned video and audio | 44 recordings | yes |
uspop2002 | tags & genre & chords | 8752 songs | no |
VocalSet | 17 vocal techniques | 3560 recordings | yes |
YousicianUkulele | evaluated notes and chords | 500000 exercises by 1000 users | no |
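
Since the table above is plain pipe-delimited text, it can be searched with a few lines of code. The following is only a minimal sketch, not part of the original list: the function names `parse_table` and `find`, the column keys, and the three sample rows copied from the table are assumptions made for illustration.

```python
from typing import Dict, List

# Three sample rows copied from the table above (illustrative only);
# a real script would read the complete table from a file instead.
TABLE = """\
DATASET | META DATA | CONTENTS | WITH AUDIO |
---|---|---|---|
ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes |
SALAMI | structure | 1447 songs | no |
GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes |
"""

# Hypothetical key names for the four table columns.
COLUMNS = ["dataset", "metadata", "contents", "with_audio"]


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse pipe-delimited rows into dicts, skipping header and separator."""
    rows = []
    for line in text.splitlines()[2:]:
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.split("|")]
        # The trailing "|" yields an extra empty cell; keep four columns.
        cells = cells[:len(COLUMNS)]
        # Pad short rows so that every dict has all four keys.
        cells += [""] * (len(COLUMNS) - len(cells))
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def find(rows: List[Dict[str, str]], keyword: str,
         audio_only: bool = False) -> List[str]:
    """Return names of datasets whose metadata column mentions the keyword."""
    keyword = keyword.lower()
    return [
        row["dataset"]
        for row in rows
        if keyword in row["metadata"].lower()
        and (not audio_only or row["with_audio"] == "yes")
    ]


if __name__ == "__main__":
    rows = parse_table(TABLE)
    print(find(rows, "tempo", audio_only=True))
```

Note that `audio_only=True` matches only an exact `yes`, so entries marked `on request`, `downloadable`, `partially`, or `yes/no` would be excluded and need to be checked by hand.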