Report on SARC Research Workshop on Music Informatics and Cognition
29 May, 2004
Sonic Arts Research Centre (SARC), Queen's University of Belfast
Report by: Dave Meredith (Centre for Computational Creativity, City University, London)
The SARC Research Workshop on Music Informatics and Cognition was one of the events in the programme of the 2004 Sonorities Festival of Contemporary Music. This festival marked the opening of the Sonic Arts Research Centre (SARC) at Queen's University of Belfast. The workshop was organised by Christina Anagnostopoulou and consisted of four lectures given by Alan Marsden, Henkjan Honing, Mark Steedman and François Pachet. All four speakers are well-established researchers in the exciting multi-disciplinary fields of music informatics and computational music cognition.
Alan Marsden began by tracing the changes that have taken place since the 1970s in the prevailing attitude towards the dichotomy between music data and the processing of this data. Whereas the main focus in the 1970s was on the development of music data formats, the 1980s saw the emergence of artificial intelligence approaches emphasizing the importance of modelling musical behaviour. In the 1990s, the development of neural networks, genetic algorithms and agent-based systems resulted in the breakdown of the distinction between data and processing. He observed that there are still three essentially different unstructured data formats used for exchanging musical information (digital audio, MIDI and notation encodings) and he stressed that we should "return to the issue of information and processing".
Dr. Marsden then used the example task of "shortening" (i.e., summarising) a piece of music to demonstrate the need for structural representations and for tools that extract structure from music. Finally, he demonstrated a program called Novagen, which uses Schenkerian principles of melodic structure to automatically generate tonal melodies in response to a dancer's movements captured using EyesWeb, a system developed in Genoa for the visual capture and analysis of dance gestures.
Henkjan Honing described some exciting recent work on categorical rhythm perception that he has carried out with Peter Desain and the other members of the Music, Mind, Machine Group at the Universities of Amsterdam and Nijmegen. Dr. Honing described experiments in which expert listeners were asked to transcribe a range of temporal patterns spanning the space of all possible four-event patterns. These experiments revealed that expert listeners categorically perceive temporal patterns varying along a continuous time-scale as rhythms in which the inter-onset intervals are related by small-integer ratios.
The rhythmic categories found in these experiments were shown to be connected, quasi-convex regions in a ternary plot of the complete rhythm space. Moreover, they showed that the categories were not centred on mechanical renditions of the notated rhythms. The experiments also showed the powerful effect that both metre and tempo have on rhythm perception. For example, most listeners perceive a rhythm in which the inter-onset intervals are (0.263s, 0.421s, 0.316s) as a 1-2-1 rhythm when it is accompanied by a beat that induces a duple metre, but as a 1-3-2 rhythm when it is accompanied by a triple-metre beat.
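To make the categorisation idea concrete, the sketch below assigns an inter-onset-interval pattern to the nearest small-integer-ratio rhythm. This is purely illustrative and is not the Desain-Honing model: the candidate categories and the city-block distance are assumptions made for this example. Notably, such a metre-blind quantiser assigns the pattern above to 1-2-1 regardless of context; the experimental point is precisely that a triple-metre beat can shift the percept to 1-3-2, which no context-free scheme of this kind can capture.

```python
# Illustrative sketch (not the Desain & Honing model): classify a pattern
# of inter-onset intervals by the nearest small-integer-ratio rhythm.
# The candidate categories and the distance measure are assumptions.

def normalise(ivals):
    """Scale the intervals so they sum to 1, removing overall tempo."""
    total = sum(ivals)
    return [v / total for v in ivals]

def nearest_category(ivals, categories):
    """Return the integer-ratio category closest to the pattern,
    using city-block distance between normalised interval vectors."""
    p = normalise(ivals)

    def dist(cat):
        q = normalise(cat)
        return sum(abs(a - b) for a, b in zip(p, q))

    return min(categories, key=dist)

# Some candidate rhythmic categories (integer inter-onset ratios).
CATEGORIES = [(1, 2, 1), (1, 3, 2), (1, 1, 1), (1, 2, 2), (2, 3, 1)]

print(nearest_category([0.263, 0.421, 0.316], CATEGORIES))  # → (1, 2, 1)
```

A fuller model would let the metrical context re-weight the candidate categories before the nearest one is chosen.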
Dr. Honing then showed how an analysis of jazz performances revealed that performances of the same notated rhythm by different performers may belong to different rhythmic categories.
Mark Steedman described the advances that he has made since 1984 in the development of a grammar for characterising the set of 12-bar blues chord sequences. Professor Steedman explained that more complicated chord sequences may be derived from simpler ones by propagating perfect cadences backwards. He showed that his earlier (1984) grammar could be interpreted as a finite-state transducer, but that this could not be used to construct a harmonic analysis of a piece. In order to do this, the idea of syntactic substitution had to be abandoned in favour of a grammar founded in a model theory for harmony, which he based on Longuet-Higgins's three-dimensional tonal space. Professor Steedman proposed that "musically coherent chord sequences" correspond to orderly progressions between two points by small steps within this space. Moreover, he claimed that representing the chord sequences in Longuet-Higgins's tonal space makes it clear that the dominant seventh tends to resolve to the tonic because the tonic triad "fits neatly" into a "hole" in the dominant seventh chord when viewed in this space.
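The backward propagation of cadences can be sketched as a simple rewrite step: a chord may be replaced by the dominant seventh of the chord that follows it. The fragment below is an assumed simplification made for illustration, not Steedman's actual grammar; the chord vocabulary and the single substitution rule are this sketch's own choices.

```python
# Illustrative sketch (an assumed simplification, not Steedman's grammar):
# elaborate a chord sequence by propagating dominants backwards, i.e.
# replacing a chord with the dominant seventh of its successor.

NOTES = ['C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B']

def dominant_of(root):
    """Return the root a perfect fifth (7 semitones) above the given root."""
    return NOTES[(NOTES.index(root) + 7) % 12]

def propagate_back(seq, i):
    """Rewrite chord i as the dominant seventh of chord i + 1."""
    root = seq[i + 1].rstrip('7')   # strip a trailing '7' to get the root
    new = list(seq)
    new[i] = dominant_of(root) + '7'
    return new

# Start from a simple blues skeleton and elaborate the chord before the
# final cadence: the C in bar 3 becomes D7, the dominant of G7.
blues = ['C', 'F', 'C', 'G7', 'C']
print(propagate_back(blues, 2))  # → ['C', 'F', 'D7', 'G7', 'C']
```

Applying the rule repeatedly, right to left, yields longer chains of dominants resolving stepwise into the final cadence, which is the intuition behind deriving complicated sequences from simpler ones.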
In the second half of his talk, Professor Steedman explained how a combinatory categorial grammar would probably work better for characterising the system of permissible chord structures in 12-bar blues, because it allows left-branching analyses of structures that we usually think of as being predominantly right-branching.
The final talk was given by François Pachet, who presented an overview of the technologies developed at the Sony Computer Science Laboratory in Paris over the course of the three-year Cuidado project, which ended in December 2003. The main application developed in the Cuidado project is the Music Browser, a database management system capable of handling large music catalogues and offering many novel content-based access methods in an integrated environment. It provides a sophisticated interface that allows the user to search or browse for music titles using not only textual editorial information (e.g., composer, date, publisher) but also acoustic descriptors extracted directly from the audio data. The Music Browser also gives results organised by acoustic and cultural similarity. For example, the user may specify that he wishes to find "high energy" music or pieces with a timbre similar to that of some specified work.
Perhaps the most impressive technology presented was the Extractor Discovery System (EDS) incorporated into the Music Browser. This is the first generic scheme for extracting arbitrary high-level music descriptors from audio data. EDS is capable of automatically generating a complete audio extractor when given only a test database and corresponding perceptive tests. It searches for specific and efficient audio features using genetic programming and clusters these features using a machine-learning algorithm. EDS has been used to generate an extractor that identifies, with over 80% accuracy, whether or not a voice is present in a work.
Finally, Dr. Pachet described the MusicCrawler application, which automatically computes cultural associations between various text strings which may denote performers, composers, genres, song titles, etc. The MusicCrawler crawls the web, accumulating text web pages and detecting occurrences of search items. It then computes co-occurrence matrices from which it derives inter-item distances, which can be used as measures of "cultural similarity".
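The co-occurrence-to-distance step can be sketched in a few lines. The following is an assumed minimal version, not Sony CSL's implementation: it counts pages on which two items co-occur, normalises by each item's individual page count (a cosine-like measure), and turns the result into a distance. The example pages and item strings are invented for illustration.

```python
# Illustrative sketch (assumed, not the MusicCrawler implementation):
# derive "cultural distances" between items from their co-occurrence
# in a collection of crawled text pages.
from math import sqrt

def cultural_distance(pages, a, b):
    """1 minus normalised co-occurrence of items a and b across pages."""
    both = sum(1 for page in pages if a in page and b in page)
    na = sum(1 for page in pages if a in page)
    nb = sum(1 for page in pages if b in page)
    if na == 0 or nb == 0:
        return 1.0                      # never seen together: maximal distance
    return 1.0 - both / sqrt(na * nb)   # cosine-like normalisation

# A toy "crawl" of four pages (invented for illustration).
pages = [
    "Miles Davis and John Coltrane recorded Kind of Blue",
    "John Coltrane's A Love Supreme is a jazz landmark",
    "Miles Davis pioneered cool jazz and fusion",
    "A review of a Metallica concert",
]

print(cultural_distance(pages, "Miles Davis", "John Coltrane"))  # → 0.5
print(cultural_distance(pages, "Miles Davis", "Metallica"))      # → 1.0
```

In a real system the counts would come from a full co-occurrence matrix over many search items, but the principle is the same: items that frequently appear on the same pages end up culturally close.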
Many of those attending the workshop had travelled far, and they were rewarded with the opportunity to hear extended presentations of cutting-edge research by leading practitioners in the field. It was noticeable that most of those attending were either doctoral students or professionals in the field. The coffee breaks and lunch therefore afforded a valuable opportunity for interesting discussions with other researchers from various parts of the world.
Dave Meredith