Report on the International Symposium
on Music Information Retrieval (ISMIR’2003)
26-30 October 2003
Mark Plumbley
Queen Mary, University of London
About 150 people attended the International Symposium
on Music Information Retrieval (ISMIR’2003) in Baltimore,
Maryland, USA.
Sunday 26 October was tutorial day: I attended the
tutorial by Arbee Chen (National Tsing Hua University)
on "Music Retrieval and Analysis". He talked
about different music representations and how different
search mechanisms could be used to perform exact or
approximate matches to musical pieces. He discussed
how indexing systems, such as tree-based, list-based,
and N-gram indexes, could be used to perform lookup given a
substring, and talked about similarity measures such
as edit distance. He talked about evaluation of retrieval
systems, and outlined their Ultima Project for evaluation.
He talked about some approaches to music structure
analysis, including local boundary detection, discovery
of repeating patterns, and automatic phrase extraction.
He finished off with an outline of music recommendation
systems, and suggested some future research directions
for the field.
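As a rough illustration of the approximate-matching idea (my own sketch, not code from the tutorial), the classic edit-distance computation between two pitch sequences looks like this:

def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions,
    deletions and substitutions needed to turn sequence a into b."""
    m, n = len(a), len(b)
    # dist[i][j] holds the distance between a[:i] and b[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]

# Two melodies as MIDI note numbers; the query has one wrong note.
piece = [60, 62, 64, 65, 67]
query = [60, 62, 63, 65, 67]
print(edit_distance(piece, query))  # -> 1

A retrieval system can then rank candidate pieces by their edit distance to the query.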
On the Monday, Tony Seeger (Ethnomusicology, UCLA)
gave a wide-ranging introductory talk about different
sorts of “musics” that we could be dealing
with, emphasizing that we shouldn’t restrict
our attention to just one tradition, such as western
music. He also talked about ethical and legal issues,
particularly when these are not the same: for example,
a traditional native song may not be covered by “copyright” in
the same way as published western music, but nevertheless
it may still not be ethical to use it without appropriate
permission, especially since such songs may have a particular
cultural significance to their originators.
In the evaluation session, Stephen Downie talked about
the problem of evaluation – scientific comparison
of competing techniques – and how a TREC-like
evaluation framework for music information retrieval
is being constructed. A large repository of musical
audio is to be held at NCSA, with algorithms run on
the NCSA supercomputers, to avoid copyright problems
with releasing the music files themselves (see music-ir.org/evaluation).
Roger Dannenberg presented the MUSART testbed for query-by-humming
(QBH) evaluation. He introduced the idea of a “collection” as
a specified list of items that take part in a particular
experiment, so that a given experiment can be repeated
exactly, even if new items have been added to the database.
In the afternoon there was a visit to the Library of Congress
in nearby Washington, DC. We saw some of the work they
are undertaking on collecting historical American music
in sheet and audio formats, and new initiatives to
digitally record and store Congress committee meetings.
An evening reception followed in the older Library
of Congress buildings, perhaps the most intricately
decorated of the government buildings in the DC area.
On Tuesday, in the query-by-voice (QBV) session, Colin
Meek discussed the various sources of errors that may
be introduced in a voiced query, such as poor memory,
poor performance, and recording and transcription errors,
questioning whether a single “error” model
is sufficient when accurate results are required from
a QBV system. Both Steffen Pauws (Philips) and Koen
Tanghe & Micheline Lesaffre (IPEM) presented work
on measuring the accuracy with which people sing or
hum tunes, depending on familiarity and expertise.
Alexandra Uitdenbogerd, in the Music Perception and
Cognition session, measured the ability of musicians
and non-musicians to produce Parsons-like “Same/Up/Down” queries,
concluding that these contour-based queries were not
useful for non-musicians. Olivier Lartillot discussed
the difficulty of musical pattern discovery, using
strategies based on inter-pitch interval and inter-onset
interval ratios.
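For illustration (a minimal sketch of the general idea, not the paper’s code), a Same/Up/Down contour string can be derived from a pitch sequence as follows:

def parsons_contour(pitches):
    """Map successive pitch pairs to 'S' (same), 'U' (up) or 'D' (down)."""
    contour = []
    for prev, curr in zip(pitches, pitches[1:]):
        if curr == prev:
            contour.append('S')
        elif curr > prev:
            contour.append('U')
        else:
            contour.append('D')
    return ''.join(contour)

# Opening of "Twinkle Twinkle Little Star" as MIDI note numbers.
print(parsons_contour([60, 60, 67, 67, 69, 69, 67]))  # -> 'SUSUSD'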
After lunch, Steve Griffin from the US National Science
Foundation gave some background to the funding issues
in music information retrieval. Some of this work,
such as the collaborative OMRAS project, has been funded
through the NSF Digital Libraries Initiative. Future
funding in this area may be put on a more regular footing,
rather than relying on occasional “Initiatives”.
The only set of parallel sessions was (rather unfortunately
for me) Music Similarity in parallel with Music Analysis
1.
In the first part of the former, Adam Berenzweig (LabROSA,
Columbia) discussed evaluation of similarity measures.
To avoid “copyright hell” while sharing
datasets, they propose the sharing of features (such
as MFCCs) rather than raw audio. In the other session,
Jana Eggink (Sheffield) presented work on musical instrument
recognition, using an interesting “bounded marginalization” technique
to cope with polyphonic audio.
After the special session on Digital Sheet Music,
the poster session included many interesting papers
(in too short a time to see them all!), such as Masataka
Goto’s posters on chorus recognition and the RWC music
database, and Jeffrey Pickens’ poster on “backing-off”
techniques for probabilistic models of harmonic transitions.
The final day was opened by Avery Wang, talking about
the Shazam “query-by-mobile-phone” system.
Able to recognize noisy music samples from a database
of currently 1.8 million tracks, his talk finished with
a live demonstration, with the system in London texting
the result back to his mobile phone. The following
two speakers discussed systems using dynamic programming
methods to align audio and scores. Rob Turetsky
(Columbia) discussed the problems of using MIDI transcriptions,
widely available on the web, but typically made by
amateurs. G. Peeters (for the IRCAM authors) talked
about the use of comb filters for note matching, and
the use of separate attack, spectral, and silence models.
Martin McKinney compared features such as MFCCs and
psychoacoustical properties (such as “roughness”)
for audio and music classification, concluding that
parameters based on an auditory model give better
performance. In contrast, Simon Dixon considered classifying
ballroom dance music from its rhythmic patterns,
noting that most dances have a restricted tempo range
so the patterns can be quite well-defined. Wei-Ho Tsai
clustered songs based on vocal characteristics, using
an EM algorithm method to estimate parameters of a
voice model from passages with both voice and accompaniment.
In the afternoon, Chris Raphael presented an HMM-based
probabilistic system for harmonic analysis, arguing
that the assumptions inherent in the HMM are suitable
here, due to the tendency of keys and chords to persist.
Dan Ellis argued for transcription of audio to chords
rather than notes, since this is probably easier than
note transcription, and may be more relevant to listeners.
He discussed a method inspired by speech word recognition,
using the EM algorithm to label audio using just a
chord sequence rather than specific timing and duration
of each chord.
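To make the persistence argument concrete, here is a toy Viterbi decode over three chord states with sticky transitions; the probabilities and likelihoods are invented for illustration, and this is not Raphael’s or Ellis’s actual model:

import math

chords = ['C', 'F', 'G']
stay, move = 0.9, 0.05   # self-transitions strongly favoured

def log_trans(i, j):
    return math.log(stay if i == j else move)

def viterbi(obs_loglik):
    """obs_loglik[t][s] is the log-likelihood of frame t under chord s;
    returns the most probable chord label for each frame."""
    n = len(chords)
    best = list(obs_loglik[0])   # flat prior over the initial chord
    back = []
    for frame in obs_loglik[1:]:
        ptrs, scores = [], []
        for j in range(n):
            i = max(range(n), key=lambda k: best[k] + log_trans(k, j))
            scores.append(best[i] + log_trans(i, j) + frame[j])
            ptrs.append(i)
        back.append(ptrs)
        best = scores
    state = max(range(n), key=lambda s: best[s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return [chords[s] for s in reversed(path)]

# Invented log-likelihoods: the middle frame weakly favours F, but the
# sticky prior absorbs it and the decode stays on C throughout.
ll = [[-1, -3, -3], [-1, -3, -3], [-3, -2.8, -3], [-1, -3, -3], [-1, -3, -3]]
print(viterbi(ll))  # -> ['C', 'C', 'C', 'C', 'C']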
In the final presentation session of the conference,
Kjell Lemström presented work on geometric algorithms
for query matching, based on finding time and pitch
transposition vectors; Elias Pampalk talked about exploration
of music collections using their “Islands of
Music” system based on the Self-Organizing Map
neural network; and George Tzanetakis discussed how
to make peer-to-peer systems for music content retrieval
scalable, using load balancing and query replication.
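The transposition-vector idea can be sketched in a few lines (my illustration of the general technique, not Lemström’s published algorithm): treat notes as (onset, pitch) points and vote for the single time/pitch shift that maps the most query points onto piece points.

from collections import Counter

def best_shift(query, piece):
    """Count (dt, dp) vectors between all point pairs; the most common
    vector is the time/pitch transposition aligning query to piece."""
    votes = Counter((pt - qt, pp - qp)
                    for qt, qp in query
                    for pt, pp in piece)
    return votes.most_common(1)[0]  # ((dt, dp), number of matched notes)

piece = [(0, 60), (1, 62), (2, 64), (3, 65), (4, 67)]
query = [(0, 57), (1, 59), (2, 61)]  # same tune, 3 semitones lower
print(best_shift(query, piece))  # -> ((0, 3), 3): up 3 semitones, 3 notes hit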
The Thursday morning was devoted to panel sessions:
I attended the MIR in devices session, chaired by Ty
Roberts from Gracenote, Inc. The panel, mainly concerned
with metadata and copyright issues, included Trev Huxley
(Muze), Ton Kalker (Philips), and Juergen Herre (Fraunhofer
IIS). Following the end of that panel, the Evaluation
panel continued, finishing with talks by Josh Reiss
on audio formats, and Perry Roland on symbolic notation.
The conference papers and tutorials are likely to
appear on the web: see the MUSIC-IR email list for
further information. [Update 20 Nov 2003: For online
papers and presentations, see ISMIR 2003 - Presentations.]
Mark Plumbley
4 Nov 2003