Centre for Digital Music


IEEE AASP Challenge:
Detection and Classification of Acoustic Scenes and Events

- Dr Dan Stowell
- Dr Emmanouil Benetos

Challenge Description

NEW: We have uploaded all of the data sets to the Internet Archive, INCLUDING the previously-unreleased private testing data. Available: DCASE Challenge Datasets

For more detailed information on the challenges, please refer to the proposal document.

Below you will find all tasks of the challenge, each with a short description, its specifications and some sample files. Please bear in mind that, as the tasks are currently in development, the samples only give a first impression of what the tasks will sound like and may not be similar to the final ones.

Scene Classification

The scene classification (SC) challenge will address the problem of identifying and classifying acoustic scenes and soundscapes.

The dataset for the scene classification task will consist of 30-second recordings of various acoustic scenes. It will have 2 parts, each made up of 6 audio recordings for each scene (class). The first part will be sent out to the participants as a development set; the second will be kept secret and used for the train/test scene classification task. The list of scenes is: busy street, quiet street, park, open-air market, bus, subway-train, restaurant, shop/supermarket, office, subway station.
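As a quick check of the sizes implied by this description, the figures can be worked through in a few lines of Python. This is only a sketch that takes the numbers stated above at face value; the constant names are illustrative and not part of the challenge materials.

```python
# Scene classes as listed in the task description.
SCENES = [
    "busy street", "quiet street", "park", "open-air market", "bus",
    "subway-train", "restaurant", "shop/supermarket", "office",
    "subway station",
]

RECORDINGS_PER_SCENE = 6   # per part (development or private), as stated
RECORDING_LENGTH_SEC = 30  # length of each recording, as stated

# Number of recordings and total audio duration in one part of the dataset.
recordings_per_part = len(SCENES) * RECORDINGS_PER_SCENE
minutes_per_part = recordings_per_part * RECORDING_LENGTH_SEC / 60

print(recordings_per_part, "recordings,", minutes_per_part, "minutes per part")
```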

The recordings for the task are made with a set of Soundman binaural microphones, designed to imitate a pair of in-ear headphones that the user can wear. The proposed specifications for the recordings are: PCM, 44100 Hz, 16-bit (CD quality).
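Participants may want to confirm that a downloaded file matches the proposed specifications before processing it. The sketch below uses Python's standard `wave` module and assumes the recordings are distributed as WAV files; the function name `matches_proposed_spec` is hypothetical and not part of any challenge tooling.

```python
import wave

# Proposed recording specification quoted in the task description.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit


def matches_proposed_spec(path):
    """Return True if the WAV file at `path` is 44100 Hz, 16-bit PCM."""
    try:
        # wave.open raises wave.Error for files it cannot read as PCM WAV.
        with wave.open(path, "rb") as wav:
            return (wav.getframerate() == EXPECTED_RATE_HZ
                    and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES)
    except wave.Error:
        return False
```

The check deliberately ignores the channel count, since the text specifies only the sample rate and bit depth.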

Public Dataset

Private Dataset

Event Detection

The event detection challenge will address the problem of identifying individual sound events that are prominent in an acoustic scene. Two distinct experiments will take place: one for simple acoustic scenes without overlapping sounds, and the other using complex scenes in a polyphonic scenario. Three datasets will be used for the task.
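The distinction between the two experiments comes down to whether annotated events overlap in time. A minimal sketch of that check follows, assuming each event is represented as an (onset, offset, label) tuple in seconds; this representation and the function name are illustrative assumptions, not the challenge's annotation format.

```python
def has_overlapping_events(events):
    """Return True if any two events overlap in time.

    `events` is a list of (onset_sec, offset_sec, label) tuples
    (a hypothetical representation chosen for this example).
    """
    # Visit events in onset order, tracking the latest offset seen so far.
    latest_offset = float("-inf")
    for onset, offset, _label in sorted(events):
        if onset < latest_offset:
            return True  # this event starts before an earlier one has ended
        latest_offset = max(latest_offset, offset)
    return False


simple_scene = [(0.0, 1.0, "door knock"), (1.5, 2.0, "cough")]
polyphonic_scene = [(0.0, 2.0, "speech"), (1.0, 1.2, "mouse click")]
print(has_overlapping_events(simple_scene))      # no overlap
print(has_overlapping_events(polyphonic_scene))  # overlap
```

Events that merely touch (one starts exactly when another ends) are treated as non-overlapping here.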

Subtask 1 - OL

The first dataset for event detection will consist of 3 subsets (for development, training, and testing). The training set will contain instantiations of individual events for every class. The development and testing datasets, denoted as Office Live (OL), will consist of 1-minute recordings of everyday audio events in a number of office environments. The audio events in these recordings will be annotated and will include: door knock, door slam, speech, laughter, keyboard clicks, objects hitting table, keys clinking, phone ringing, turning page, cough, printer, short alert-beeping, clearing throat, mouse click, drawer, and switches.

Training dataset (isolated events)

Development dataset (event sequences)

Test dataset (event sequences)

Subtask 2 - OS

The second dataset will contain artificially sequenced sounds provided by the Analysis-Synthesis team of IRCAM, termed Office Synthetic (OS). The training set will be identical to the one for the first dataset. The development and testing sets will consist of artificial scenes built by sequencing recordings of individual events (different recordings from the ones used for the training dataset) and background recordings provided by C4DM.
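To illustrate what sequencing isolated events onto a background recording can look like, here is a minimal sketch that adds event samples into a copy of a background signal at given onsets. It is an assumption-laden toy, not the procedure used by IRCAM: signals are plain lists of floats, onsets are sample indices, and `sequence_events` is a hypothetical name.

```python
def sequence_events(background, events):
    """Mix event recordings into a copy of a background recording.

    `background` is a list of float samples. `events` is a list of
    (onset_sample, samples) pairs. Event samples that would fall
    outside the background are dropped.
    """
    scene = list(background)  # leave the original background untouched
    for onset, samples in events:
        for i, value in enumerate(samples):
            position = onset + i
            if 0 <= position < len(scene):
                scene[position] += value  # overlapping events simply add
    return scene


background = [0.0] * 6
events = [(1, [1.0, 1.0]), (2, [0.5, 0.5, 0.5])]
scene = sequence_events(background, events)
print(scene)
```

Because contributions are summed, two events placed at overlapping positions produce a polyphonic mixture, while non-overlapping onsets give a simple sequence.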

Training dataset (isolated events)

Development dataset (event sequences)

Test dataset (event sequences)
