Machine Listening Workshop 2010
Queen Mary University of London
Monday 20th Dec 2010
Organisers: Mark Plumbley and Matthew Davies
Machine listening is a diverse topic covering many aspects of the computer analysis of sound, including computational auditory scene analysis, musical audio analysis, bioacoustics, cochlear implants and the analysis of medical sounds. The multidisciplinary nature of this research means that those of us working in the field often publish in different journals and attend different conferences, and therefore don't have regular opportunities to interact.
In hosting this workshop, our aim is to bring together researchers from across the spectrum of machine listening and to develop a coherent research community able to exploit our common interest in the analysis of audio. Our long-term goal is for machine listening to become as established a research community as machine vision.
Provisional Timetable:
10:00 Registration (+ Coffee)
10:30 Mark Plumbley - Opening Remarks
10:45 Guy Brown - Machine listening systems for noise-robust and reverberation-robust automatic speech recognition
11:30 David Reby - Tools for understanding mammal vocal communication: recording, re-synthesis and playback
12:15 Lunch / networking + posters
13:40 Mark Lutman - Less is more: sparse approach to speech recognition in noise for cochlear implants
14:25 Simon Godsill - A survey of recent probabilistic approaches in musical audio modelling and transcription
15:10 Coffee
15:30 Miguel Coimbra - Listen to your heart
16:15 Discussion / Future Directions
17:00 Closing Remarks / Wrap-up*
* - There will be an opportunity to continue discussions after the workshop in a nearby pub/restaurant.
Videos of each invited talk are now available.
Guy Brown (University of Sheffield)
Machine listening systems for noise-robust and reverberation-robust automatic speech recognition
In the first part of this talk I will review work in the field of
computational auditory scene analysis (CASA), which aims to build
machine hearing systems that replicate the ability of human listeners
to perceptually organise sound. Such systems often represent the
grouping of acoustic components in the form of a binary time-frequency
mask, and act as front-end processors for automatic speech recognisers
that use "missing data" principles to achieve noise robustness. In the
second part of the talk, I will describe a machine hearing system that
uses the time-frequency masking principle to achieve robust automatic
speech recognition in reverberant conditions. However, psychophysical
studies suggest that time-frequency masking may not be a good model for
the process by which human listeners achieve perceptual compensation
for the effects of reverberation. Related computational modelling
studies suggest that low-level auditory mechanisms, possibly mediated
by the efferent system, might be implicated in reverberation
robustness. Work on modelling these processes, and incorporating them
into machine hearing systems, will be presented.
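As a rough illustration of the binary masking idea described above (a sketch assuming oracle access to separate speech and noise signals, not the speaker's system), the following Python fragment forms a binary time-frequency mask by thresholding the local SNR of each spectrogram cell; a missing-data recogniser would then treat the zeroed cells as unreliable evidence.

import numpy as np
from scipy.signal import stft

def binary_mask(speech, noise, fs=16000, snr_threshold_db=0.0):
    # Oracle-style mask: uses separate speech and noise signals purely to
    # illustrate the principle; a real CASA front end must estimate the
    # local SNR from the mixture instead.
    _, _, S = stft(speech, fs=fs, nperseg=512)   # speech spectrogram
    _, _, N = stft(noise, fs=fs, nperseg=512)    # noise spectrogram
    local_snr_db = 20.0 * (np.log10(np.abs(S) + 1e-12)
                           - np.log10(np.abs(N) + 1e-12))
    # 1 = reliable (speech-dominated) cell, 0 = "missing" cell
    return (local_snr_db > snr_threshold_db).astype(np.uint8)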
David Reby (University of Sussex)
Tools for understanding mammal vocal communication: recording,
re-synthesis and playback
Current studies of mammal vocal communication typically involve three levels of investigation:
1/ observation, where vocal behaviour is recorded and analysed in order to relate variation in acoustic components to attributes of the animal producing the calls, or to variation in social or environmental contexts;
2/ signal re-synthesis, where specific features of signals are modified independently in order to mirror their natural co-variation with relevant traits or contexts, as identified at the observation level;
3/ playback experiments, where the effects of the above modifications on receivers are tested by monitoring animals' responses to the broadcast of re-synthesised stimuli.
In this talk I will illustrate this approach with specific studies of social and sexual communication in mammal species, highlighting in particular the key contribution that applying the source/filter theory of voice production has made to this field. I will also discuss the positive impact of recent advances in digital signal processing and suggest what technological breakthroughs would further benefit bioacoustics.
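By way of illustration only (the specific tools used in these studies are not described here), a source/filter manipulation along these lines can be sketched with linear prediction: estimate the vocal-tract filter, inverse-filter to recover the source, warp the filter's pole angles to shift the formants, and resynthesise. Parameter values are arbitrary.

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_coefficients(frame, order=16):
    # Autocorrelation-method linear prediction: returns [1, -a1, ..., -ap].
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def shift_formants(frame, factor=1.1, order=16):
    # Illustrative formant shift: separate source and filter, scale the
    # pole angles of the vocal-tract filter, then refilter the source.
    a = lpc_coefficients(frame, order)
    source = lfilter(a, [1.0], frame)                  # inverse filtering
    poles = np.roots(a)
    poles = np.abs(poles) * np.exp(1j * np.angle(poles) * factor)
    a_shifted = np.real(np.poly(poles))                # shifted vocal tract
    return lfilter([1.0], a_shifted, source)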
Mark Lutman (University of Southampton)
Less is more: sparse approach to speech recognition in noise for cochlear implants
Processing strategies in cochlear implants have evolved from early
devices attempting crude feature extraction and re-synthesis of
estimated formant structures to continuous interleaved sampling that
attempts to transmit as much of the incoming signal as possible. Common
signal processing strategies used now, such as ACE used in Cochlear
devices, select frequency bands with the most energy within short time
frames and reject other bands. Investigational devices are being used to examine whether there are benefits to be gained from signal compression algorithms, such as MP3 coding, which utilise redundancy reduction principles via an auditory model to remove components that are normally masked.
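To make the band-selection idea concrete, here is a minimal n-of-m sketch in Python (the channel count, frame handling and value of n are illustrative, not the parameters of any actual device): in each frame, only the n highest-energy filterbank channels are retained.

import numpy as np

def select_maxima(band_energies, n=8):
    # band_energies: array of shape (num_channels, num_frames).
    # Keep the n highest-energy channels in each frame; zero the rest.
    selected = np.zeros_like(band_energies)
    for t in range(band_energies.shape[1]):
        top = np.argsort(band_energies[:, t])[-n:]   # n largest channels
        selected[top, t] = band_energies[top, t]
    return selected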
We take a step back from existing approaches and ask what we want
to transmit to the listener. We argue that speech and most other
environmental signals are highly redundant, containing far less
information than would be suggested by the acoustic bandwidth. We also
argue that hearing impairment can be viewed as an information
bottleneck, which poses a major constriction in cases of severe and
profound hearing loss. It follows that the aim of signal processing
algorithms for cochlear implants should be to extract the salient
information from the incoming signal and throw away redundant parts
("less is more"). This approach maximises the potential to transmit
relevant information through the bottleneck and optimise perception of
the essence of the signal.
We have used mathematical principles of information theory to develop a
signal processing algorithm for cochlear implants named SPARSE. The term sparse is understood here to mean a representation with the minimal underlying components necessary to convey the incoming signal across the electrode array.
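The abstract does not specify how SPARSE computes its representation; purely as a generic illustration of sparse approximation, the matching-pursuit sketch below represents a frame with as few dictionary atoms as reach a target fidelity, rather than transmitting every channel.

import numpy as np

def matching_pursuit(frame, dictionary, max_atoms=8, tol=1e-2):
    # Greedy sparse approximation of `frame` over unit-norm dictionary columns;
    # returns a coefficient vector with at most `max_atoms` non-zero entries.
    residual = frame.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(max_atoms):
        projections = dictionary.T @ residual
        k = int(np.argmax(np.abs(projections)))        # best-matching atom
        coeffs[k] += projections[k]
        residual = residual - projections[k] * dictionary[:, k]
        if np.linalg.norm(residual) < tol * np.linalg.norm(frame):
            break
    return coeffs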
Results indicate that SPARSE has advantages over ACE for speech
recognition in noise, especially for poorer performing cochlear implant
users. Objective measurements show clear improvements in
speech-to-noise ratio. For sentences in babble noise presented at an
input signal-to-noise ratio of 5 dB, thirteen CI users showed a
statistically significant improvement in keyword recognition score (p
< 0.05) amounting to 5% on average, despite their lack of
familiarity with SPARSE.
Radical new approaches to signal processing, based on the
statistical properties of the incoming signal and principles of
information theory, may offer improved performance to users of cochlear
implants and other devices such as auditory brainstem implants, or even
hearing aids. This knowledge may also help to explain how the auditory
system deploys its resources to convey acoustic information to the
brain in the most efficient manner in everyday listening.
Simon Godsill (University of Cambridge)
A survey of recent probabilistic approaches in musical audio modelling and transcription
Over recent years advances in our understanding of the statistical
properties of musical audio, coupled with developments in computational
inference methods, have enabled the use of rigorous probabilistic
approaches in such applications as music transcription and enhancement.
In this talk I will give an introduction to the underlying principles
of the Bayesian computational approach to musical audio modelling,
including the use of high-level structured prior models for musical
note parameters and computational inference using state-of-the-art adaptations of Markov chain Monte Carlo (MCMC) and particle filtering.
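As a toy example of the kind of inference involved (an assumed model, not the speaker's), the sketch below runs random-walk Metropolis-Hastings over a single note's fundamental frequency, with a flat prior over a pitch range and a crude harmonic-template likelihood against an observed magnitude spectrum.

import numpy as np

def log_likelihood(f0, freqs, spectrum, n_harmonics=8, width_hz=20.0):
    # Crude likelihood: Gaussian-shaped harmonic template compared with the
    # observed magnitude spectrum (both normalised to unit norm).
    template = np.zeros_like(spectrum)
    for h in range(1, n_harmonics + 1):
        template += np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
    template /= np.linalg.norm(template) + 1e-12
    observed = spectrum / (np.linalg.norm(spectrum) + 1e-12)
    return -0.5 * np.sum((observed - template) ** 2) / 0.01  # noise var 0.01

def sample_f0(freqs, spectrum, n_iter=2000, f0_range=(80.0, 1000.0), seed=0):
    # Random-walk Metropolis over f0 with a flat prior on f0_range.
    rng = np.random.default_rng(seed)
    f0 = rng.uniform(*f0_range)
    chain = []
    for _ in range(n_iter):
        proposal = f0 + rng.normal(scale=5.0)
        if f0_range[0] < proposal < f0_range[1]:
            log_alpha = (log_likelihood(proposal, freqs, spectrum)
                         - log_likelihood(f0, freqs, spectrum))
            if np.log(rng.uniform()) < log_alpha:
                f0 = proposal
        chain.append(f0)
    return np.array(chain)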
Miguel Coimbra (Universidade do Porto)
Listen to your heart
Auscultation is a hard skill to master. Heart sounds are of low frequency and the intervals between events are of the order of milliseconds, so it takes significant practice for a human ear to distinguish the subtle changes between a normal and a pathological heart sound. The motivation of the DigiScope project - DIGItally enhanced stethoSCOPE for clinical usage - is to research novel machine listening algorithms that can extract physiologically meaningful information from heart sounds, paving the way for digitally enhanced stethoscopes suitable both for training physicians to improve their basic skills in diagnosing and treating heart conditions and for use as a stronger tool for worldwide screening of specific heart pathologies.
Tao Xu and Wenwu Wang (University of Surrey)
Adaptive dictionary learning based compressive sensing for underdetermined speech separation
Atiyeh Alinaghi, Wenwu Wang and Philip Jackson (University of Surrey)
Blind separation of reverberant speech mixtures
Jon Barker (University of Sheffield), Emmanuel Vincent (INRIA, Rennes),
Heidi Christensen, Ning Ma and Phil Green (University of Sheffield)
Introducing the PASCAL 'CHiME' Speech Separation and Recognition Challenge
Dan Stowell (QMUL)
Automatic birdsong segmentation and clustering within the Vamp framework
Federica Pace and Paul White (University of Southampton)
Classification of Humpback Whale Songs
Michael Newton and Leslie Smith (University of Stirling)
Spiking onset neurons for sound identification
Qiang Huang and Stephen Cox (University of East Anglia)
Speaker Spotting in a Tennis Game Using High-Level Information
Tom Walters (Google)
Realtime features from the stabilized auditory image
Tim Brookes and Chris Hummersone (University of Surrey)
Machine Listening for Sound Quality Evaluation
Andrew Nesbit (QMUL)
Passive acoustic monitoring of marine mammals using sparse representations
Mathieu Barthet, Katy Noland, Manuela Lahne, William Marsh, Rachel Ashworth (QMUL)
Analysis and sonification of zebrafish calcium signals based on transient detection
Elodie Briefer and Alan McElligott (QMUL)
Social influence on goat kid calls during development
All places on the workshop have now been allocated.
Note: We do not have funding to cover the travel expenses of attending delegates.
Matthew Davies
Centre for Digital Music
Queen Mary University of London
Mile End Road, London E1 4NS, UK
Tel: +44 (0)20 7882 5528
Fax: +44 (0)20 7882 7997
Venue
The event will take place at the Arts Lecture Theatre, Queen Mary University of London, Mile End Road, London E1 4NS.
The Arts Lecture Theatre is building 29 on this campus map
The venue is easily accessible by public transport. It is within a five-minute walk of both Mile End Underground station (Central, District, and Hammersmith & City lines) and Stepney Green Underground station (District and Hammersmith & City lines).
For travel information, see:
Hotels
Suggested hotels for staying before or after the workshop:
The day after the Machine Listening Workshop, we are also hosting the annual DMRN event.
Please see here for further details:
DMRN+5 Workshop