Digital Music Research Network

Audio Gets Smart - The What and Why of Semantic Audio Analysis

13 October 2003, New York, NY
Co-sponsorship of workshop at AES 115th Convention

Chair: Mark Sandler, Queen Mary, University of London, UK
Panelists: Michael Casey, City University, London, UK
Dan Ellis, Columbia University, New York, USA
Juergen Herre, Fraunhofer Institute (FhG-IIS), Erlangen, Germany

In this workshop, three leading international experts offered personal views of the technologies and opportunities that Semantic Audio Analysis (SAA) brings to audio engineering. The new AES Technical Committee on Semantic Audio Analysis has been established to represent this emerging area, and one of its initial aims is to promote SAA within the audio engineering community. In a strict sense, Semantic Audio Analysis means the extraction of features from audio (live or recorded) that either have some human relevance (rhythm, notes, phrases) or have some physical correlate (instrument, moving vehicle, singing bird). This constitutes a form of 'technical metadata' which can accompany a recording or broadcast. It is different from, but complementary to, human-entered metadata. Thus metadata is an important element of Semantic Audio Analysis, and our experts for this workshop covered both the extraction of features and their semantic representation. The workshop highlighted examples where SAA can supplement all our interactions with music and audio to provide new work and recreational experiences.

Mark Sandler writes:

On Monday 13 October 2003, the Technical Committee on Semantic Audio Analysis of the Audio Engineering Society (AES) held its first formal event, a workshop at the 115th Convention of the AES at the Jacob Javits Center in New York. The DMRN generously supported the workshop with travel expenses for the Chair, Mark Sandler, and one of the speakers, Michael Casey.

This workshop, entitled "Audio Gets Smart - The What and Why of Semantic Audio Analysis", attracted about 40 people, who heard short talks in which Dan Ellis, Michael Casey and Juergen Herre each gave their personal perspective on what is meant by Semantic Audio and where it may be useful. After the presentations, there were about 30 minutes of lively questions and answers.

The TC's next activities will be at the 116th Convention of the AES in Berlin in May 2004. Subject to formal approval, we expect to hold both a workshop focusing on applications of MPEG-7 and a special papers session on some of the signal processing techniques relevant to semantic audio.

