Public datasets for music information retrieval and recommendation tasks

#nowplaying dataset

#nowplaying is a dataset that leverages social media to create a diverse and constantly updated description of users' music listening behavior. To create the dataset, we rely on Twitter, which users frequently use to post the music they are currently listening to. From such tweets, we extract track and artist information as well as further metadata.
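
The extraction step can be illustrated with a deliberately naive sketch. It assumes tweets of the form "#nowplaying <title> by <artist>" with an optional trailing link. This pattern, the regular expression and the function name `extract_track_artist` are illustrative assumptions, not the extraction pipeline described in the paper, and real tweets come in many more shapes.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: "#nowplaying <title> by <artist>", optionally followed by a URL.
# The hashtag is matched case-insensitively; other tweet layouts are not handled.
NOWPLAYING_RE = re.compile(
    r"#nowplaying\s+(?P<track>.+?)\s+by\s+(?P<artist>.+?)(?:\s+https?://\S+)?\s*$",
    re.IGNORECASE,
)


def extract_track_artist(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (track, artist) if the tweet follows the assumed pattern, else None."""
    match = NOWPLAYING_RE.search(tweet.strip())
    if match is None:
        return None
    return match.group("track").strip(), match.group("artist").strip()


if __name__ == "__main__":
    print(extract_track_artist("#nowplaying Bohemian Rhapsody by Queen"))
    print(extract_track_artist("#NowPlaying Song by Artist https://t.co/abc"))
    print(extract_track_artist("Listening to something nice"))
```

Because the title is matched lazily, a title that itself contains " by " (for example "Stand by Me") would be split at the wrong place; a more robust extractor would likely check candidate splits against a music metadata catalogue.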

Currently, the dataset contains 74,128,974 listening events, 1,537,602 tracks and 2,381,400 users.

Download a CSV representation of the dataset here (last updated: 2018-08-10 06:11:42 (CEST))
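
The statistics above (listening events, tracks, users) can be recomputed from the CSV download with a single streaming pass. The sketch below assumes a header with columns named `user_id` and `track_id` and a comma delimiter; these names are guesses, so check them against the header of the actual file and pass the real column names and delimiter as arguments.

```python
import csv
import io
from typing import Dict, TextIO


def summarize_listening_events(
    handle: TextIO,
    user_col: str = "user_id",
    track_col: str = "track_id",
    delimiter: str = ",",
) -> Dict[str, int]:
    """Count listening events, distinct tracks and distinct users in a CSV.

    The default column names and delimiter are assumptions, not the documented format.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    events = 0
    users = set()
    tracks = set()
    for row in reader:
        events += 1  # every row is treated as one listening event
        users.add(row[user_col])
        tracks.add(row[track_col])
    return {"events": events, "tracks": len(tracks), "users": len(users)}


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout (not rows from the real dataset).
    sample = io.StringIO(
        "user_id,track_id,created_at\n"
        "u1,t1,2014-01-01 10:00:00\n"
        "u1,t2,2014-01-01 10:04:00\n"
        "u2,t1,2014-01-02 18:30:00\n"
    )
    print(summarize_listening_events(sample))
```

Reading row by row keeps only the sets of distinct IDs in memory rather than the whole file, which matters at the scale of tens of millions of events.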

If you use our dataset, want further details, or want to refer to it, please cite the following paper:

Zangerle, Eva; Pichl, Martin; Gassler, Wolfgang; Specht, Günther

#nowplaying Music Dataset: Extracting Listening Behavior from Twitter

Proceedings of the 1st ACM International Workshop on Internet-Scale Multimedia Management, pp. 21–26, ACM, Orlando, Florida, USA, 2014.

playlists dataset

The playlist dataset is based on the subset of users in the #nowplaying dataset who publish their #nowplaying tweets via Spotify. Rather than their tweets, however, this dataset captures the playlists these users have created. The generation of the dataset and the dataset itself are described in the following paper.

Download a CSV representation of the playlist dataset here (last updated: 2015-12-31 00:00:00 (CET)).
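
Since each row of the playlist CSV describes one track in one user's playlist, a natural first step is to group rows into playlists. This sketch assumes columns named `user_id`, `artistname`, `trackname` and `playlistname`, possibly with spaces after the commas; these names are assumptions to verify against the actual header. Keying on the playlist name keeps it available as a signal, which is the aspect the cited paper examines.

```python
import csv
import io
from collections import defaultdict
from typing import DefaultDict, List, TextIO, Tuple

PlaylistKey = Tuple[str, str]  # (user id, playlist name)


def load_playlists(handle: TextIO) -> DefaultDict[PlaylistKey, List[Tuple[str, str]]]:
    """Group (artist, track) rows into playlists keyed by (user, playlist name).

    Column names are assumed; skipinitialspace tolerates spaces after delimiters.
    """
    playlists: DefaultDict[PlaylistKey, List[Tuple[str, str]]] = defaultdict(list)
    reader = csv.DictReader(handle, skipinitialspace=True)
    for row in reader:
        key = (row["user_id"], row["playlistname"])
        playlists[key].append((row["artistname"], row["trackname"]))
    return playlists


if __name__ == "__main__":
    # Made-up sample rows in the assumed layout.
    sample = io.StringIO(
        "user_id,artistname,trackname,playlistname\n"
        "u1,Artist A,Song 1,Workout\n"
        "u1,Artist B,Song 2,Workout\n"
        "u1,Artist C,Song 3,Chill\n"
        "u2,Artist A,Song 1,Workout\n"
    )
    for (user, name), tracks in load_playlists(sample).items():
        print(user, name, len(tracks))
```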

If you use our dataset, want further details, or want to refer to it, please cite the following paper:

Pichl, Martin; Zangerle, Eva; Specht, Günther

Towards a Context-Aware Music Recommendation Approach: What is Hidden in the Playlist Name?

15th IEEE International Conference on Data Mining Workshops (ICDM 2015), pp. 1360–1365, IEEE, Atlantic City, 2015.

#nowplaying-RS dataset

The #nowplaying-RS dataset provides context and content features of listening events. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features hint at the cultural origin of the users, while others, such as hashtags, give clues to the emotional state underlying a user's listening event.
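
Hashtags are one of the context signals mentioned above. Starting from raw tweet text, they could be collected as in the sketch below. The released dataset may already ship hashtag information in its own format, and the stop list of generic listening tags (`GENERIC_TAGS`) is a hypothetical choice for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List

HASHTAG_RE = re.compile(r"#(\w+)")

# Hypothetical stop list: tags that only mark a tweet as a listening event
# and carry no context (mood, activity, and so on).
GENERIC_TAGS = {"nowplaying", "np", "listeningto"}


def context_hashtags(tweet: str) -> List[str]:
    """Return lower-cased hashtags in a tweet, minus the generic listening tags."""
    tags = (tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return [tag for tag in tags if tag not in GENERIC_TAGS]


def hashtag_counts(tweets: Iterable[str]) -> Counter:
    """Count context hashtags over many listening-event tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(context_hashtags(tweet))
    return counts


if __name__ == "__main__":
    print(context_hashtags("#nowplaying Song by Artist #happy #Summer"))
    print(hashtag_counts(["#np a #happy", "#nowplaying b #happy #sad"]))
```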

Feel free to download the dataset as well as the training and test splits. Reference implementations of the conducted experiments are available in Asmita's GitHub repository.