Digital Music Research Network

EPSRC Network GR/R64810/01

Funded by
Engineering and Physical Sciences Research Council

Report on the International Symposium on Music Information Retrieval (ISMIR’2003)
26-30 October 2003

Mark Plumbley
Queen Mary, University of London

About 150 people attended the International Symposium on Music Information Retrieval (ISMIR’2003) in Baltimore, Maryland, USA.

Sunday 26 October was tutorial day: I attended the tutorial by Arbee Chen (National Tsing Hua University) on "Music Retrieval and Analysis". He talked about different music representations and how different search mechanisms could be used to perform exact or approximate matches to musical pieces. He discussed how index structures, such as tree-based, list-based, and N-gram indexes, could be used to perform lookup given a substring, and talked about similarity measures such as edit distance. He talked about evaluation of retrieval systems, and outlined their Ultima Project for evaluation. He talked about some approaches to music structure analysis, including local boundary detection, discovery of repeating patterns, and automatic phrase extraction. He finished off with an outline of music recommendation systems, and suggested some future research directions for the field.
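
As a rough illustration of two of the ideas mentioned in the tutorial, the sketch below computes a standard Levenshtein edit distance between two note sequences and builds a simple N-gram index for substring lookup. It is not taken from the tutorial itself: the function names, the use of MIDI pitch numbers as the representation, and the choice of N = 3 are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of MIDI pitches)."""
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                           # distance to empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1        # substitution cost
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # match or substitution
        prev = curr
    return prev[-1]


def ngrams(seq, n):
    """All contiguous length-n subsequences of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_ngram_index(pieces, n=3):
    """Map each N-gram to the set of piece names that contain it.

    `pieces` is a hypothetical dict of name -> pitch list.
    """
    index = {}
    for name, seq in pieces.items():
        for gram in ngrams(seq, n):
            index.setdefault(gram, set()).add(name)
    return index
```

An exact substring query can then be answered by looking up its N-grams in the index, while the edit distance gives one way to rank approximate matches among the candidates.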

On the Monday, Tony Seeger (Ethnomusicology, UCLA) gave a wide-ranging introductory talk about different sorts of “musics” that we could be dealing with, emphasizing that we shouldn’t restrict our attention to just one tradition, such as western music. He also talked about ethical and legal issues, particularly when these are not the same: for example, a traditional native song may not be covered by “copyright” in the same way as published western music, but nevertheless it may still not be ethical to use it without appropriate permission, especially since such songs may have a particular cultural significance to their originators.

In the evaluation session, Stephen Downie talked about the problem of evaluation – scientific comparison of competing techniques – and how a TREC-like evaluation framework for music information retrieval is being constructed. A large repository of musical audio is to be held at NCSA, with algorithms run on the NCSA supercomputers, to avoid copyright problems with releasing the music files themselves. Roger Dannenberg presented the MUSART testbed for query-by-humming (QBH) evaluation. He introduced the idea of a “collection” as a specified list of items that take part in a particular experiment, so that a given experiment can be repeated exactly, even if new items have been added to the database.

The afternoon featured a visit to the Library of Congress in nearby Washington, DC. We saw some of the work they are undertaking on collecting historical American music in sheet and audio formats, and new initiatives to digitally record and store Congress committee meetings. An evening reception followed in the older Library of Congress buildings, perhaps the most intricately decorated of the government buildings in the DC area.

On Tuesday, in the query-by-voice (QBV) session, Colin Meek discussed the various sources of errors that may be introduced in a voiced query, such as poor memory, poor performance, and recording and transcription errors, questioning whether a single “error” model is sufficient when accurate results are required from a QBV system. Both Steffen Pauws (Philips) and Koen Tonghe & Micheline Lesaffre (IPEM) presented work on measuring the accuracy with which people sing or hum tunes, depending on familiarity and expertise. Alexandra Uitdenbogerd, in the Music Perception and Cognition session, measured the ability of musicians and non-musicians to produce Parsons-like “Same/Up/Down” queries, concluding that these contour-based queries were not useful for non-musicians. Oliver Lartillot discussed the difficulty of musical pattern discovery, using strategies based on inter-pitch interval and inter-onset interval ratios.
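
To make the “Same/Up/Down” idea concrete, here is a minimal sketch of how a melody could be reduced to a Parsons-like contour string. It only illustrates the representation, not the experimental procedure from the talk; the function name, the MIDI pitch input, and the letters S/U/D (following the wording above, where Parsons code conventionally writes R for a repeated note) are assumptions.

```python
def contour_code(pitches):
    """Reduce a pitch sequence to a Parsons-like contour string.

    The first note is written as '*'; every later note is 'U' (up),
    'D' (down) or 'S' (same) relative to the note before it.
    """
    if not pitches:
        return ""
    code = ["*"]
    for prev, curr in zip(pitches, pitches[1:]):
        if curr > prev:
            code.append("U")
        elif curr < prev:
            code.append("D")
        else:
            code.append("S")
    return "".join(code)


# Opening of "Twinkle, Twinkle, Little Star" as MIDI pitches: C C G G A A G
print(contour_code([60, 60, 67, 67, 69, 69, 67]))  # prints *SUSUSD
```

Because only the direction of each step is kept, a query in this form is independent of key and of exact interval sizes, which is what makes it attractive as an input format in principle.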

After lunch, Steve Griffin from the US National Science Foundation gave some background to the funding issues in music information retrieval. Some of this work, such as the collaborative OMRAS project, has been funded through the NSF Digital Libraries Initiative. Future funding in this area may be put on a more regular footing, rather than relying on occasional “Initiatives”.

The only set of parallel sessions was (rather unfortunately for me) Music Similarity in parallel with Music Analysis 1. In the first part of the former, Adam Berenzweig (LabROSA, Columbia) discussed evaluation of similarity measures. To avoid “copyright hell” while sharing datasets, he and his colleagues propose sharing features (such as MFCCs) rather than raw audio. In the other session, Jana Eggink (Sheffield) presented work on musical instrument recognition, using an interesting “bounded marginalization” technique to cope with polyphonic audio.

After the special session on Digital Sheet Music, the poster session included many interesting papers (in too short a time to see them all!), such as Masataka Goto’s posters on chorus recognition and the RWC music database, and Jeffrey Pickens’ poster on “backing-off” techniques for probabilistic harmonic transition models.

The final day was opened by Avery Wang, talking about the Shazam “query-by-mobile-phone” system, which can recognize noisy music samples against a database of currently 1.8 million tracks. His talk finished with a live demonstration, in which the system in London texted the result back to his mobile phone. The following two speakers discussed systems that use Dynamic Programming methods to align audio and scores. Rob Turetsky (Columbia) discussed the problems of using MIDI transcriptions, widely available on the web, but typically made by amateurs. G. Peters (for the IRCAM authors) talked about the use of comb filters for note matching, and the use of separate attack, spectral, and silence models.
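
For readers unfamiliar with dynamic programming alignment, the sketch below shows the general idea using plain dynamic time warping (DTW) on two one-dimensional feature sequences. This is a generic textbook formulation, not the method of either talk: the function name, the absolute-difference local cost, and the use of single numbers per frame (rather than spectral features) are assumptions chosen to keep the example short.

```python
def dtw_align(x, y, dist=lambda a, b: abs(a - b)):
    """Align two non-empty sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching x[i] to y[j], from the start of both sequences to the end.
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Trace back from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

In an audio-to-score setting, `x` might hold features computed from the recording and `y` features synthesized from the score, so that the returned path maps each audio frame to a score position.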

Martin McKinney compared features such as MFCCs and psychoacoustical properties (such as “roughness”) for audio and music classification, concluding that parameters based on an auditory model give better performance. In contrast, Simon Dixon considered classification of ballroom dance music from its rhythmic patterns, noting that most dances have a restricted tempo range so the patterns can be quite well-defined. Wei-Ho Tsui clustered songs based on vocal characteristics, using an EM-based method to estimate parameters of a voice model from passages with both voice and accompaniment.

In the afternoon, Chris Raphael presented an HMM-based probabilistic system for harmonic analysis, arguing that the assumptions inherent in the HMM are suitable here, due to the tendency of keys and chords to persist. Dan Ellis argued for transcription of audio to chords rather than notes, since this is probably easier than note transcription, and may be more relevant to listeners. He discussed a method inspired by speech word recognition, using the EM algorithm to label audio using just a chord sequence rather than specific timing and duration of each chord.
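
The point about persistence can be illustrated with a small Viterbi decoder in which the HMM strongly prefers to stay in the same chord from one frame to the next. This is only a sketch of the general idea, not the model from either talk: the function name, the single `stay_prob` parameter, the uniform initial distribution, and the per-frame log-likelihood dictionaries are all assumptions.

```python
import math


def viterbi_chords(frame_loglik, states, stay_prob=0.9):
    """Most likely state sequence under a 'sticky' HMM.

    frame_loglik: list of dicts mapping state -> log-likelihood of that frame.
    states:       list of at least two state labels (e.g. chord names).
    stay_prob:    assumed probability (strictly between 0 and 1) of staying
                  in the same state; the rest is shared equally among others.
    """
    k = len(states)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))

    def trans(p, s):
        return log_stay if p == s else log_move

    # Uniform initial distribution over states.
    score = {s: -math.log(k) + frame_loglik[0][s] for s in states}
    back = []
    for frame in frame_loglik[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + frame[s]
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With a high `stay_prob`, a single frame with weak evidence for a different chord is not enough to change the decoded label, whereas a sustained change in the evidence is.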

In the final presentation session of the conference, Kjell Lemstrom presented work on geometric algorithms for query matching, based on finding time and pitch transposition vectors; Elias Pampalk talked about exploration of music collections using their “Islands of Music” system based on the Self-Organizing Map neural network; and George Tzanetakis discussed how to make peer-to-peer systems for music content retrieval scalable, using load balancing and query replication.

The Thursday morning was devoted to panel sessions: I attended the MIR in devices session, chaired by Ty Roberts from Gracenote, Inc. The panel, mainly concerned with metadata and copyright issues, included Trev Huxley (Muse), Ton Kalker (Philips), and Juergen Herre (Fraunhofer IIS). Following the end of that panel, the Evaluation panel continued, finishing with talks by Josh Reiss on audio formats, and Perry Roland on symbolic notation.

The conference papers and tutorials are likely to appear on the web: see the MUSIC-IR email list for further information. [Update 20 Nov 2003: For online papers and presentations see ISMIR 2003 - Presentations]

Mark Plumbley
4 Nov 2003