Machine Listening Workshop 2010

Queen Mary University of London

Monday 20th Dec 2010

Organisers: Mark Plumbley and Matthew Davies


Machine listening is a diverse topic covering many aspects of the computer analysis of sound, including computational auditory scene analysis, musical audio analysis, bioacoustics, cochlear implants and the analysis of medical sounds. The multidisciplinary nature of this research means that those of us working in the field often publish in different journals and go to different conferences, and therefore don't have regular opportunities to interact.

In hosting this workshop our aim is to bring together researchers across the spectrum of machine listening towards the development of a coherent research community able to exploit our common interest in the analysis of audio. Our long-term goal is for the machine listening community to become as well established as the machine vision community.



Provisional Timetable:

10:00 Registration (+ Coffee)
10:30 Mark Plumbley - Opening Remarks
10:45 Guy Brown - Machine listening systems for noise-robust and reverberation-robust automatic speech recognition
11:30 David Reby - Tools for understanding mammal vocal communication: recording, resynthesis and playback
12:15 Lunch / networking + posters
13:40 Mark Lutman - Less is more: sparse approach to speech recognition in noise for cochlear implants
14:25 Simon Godsill - A survey of recent probabilistic approaches in musical audio modelling and transcription
15:10 Coffee
15:30 Miguel Coimbra - Listen to your heart
16:15 Discussion / Future Directions
17:00 Closing Remarks / Wrap-up*

* There will be an opportunity to continue discussions after the workshop in a nearby pub/restaurant.

Invited Speakers

Videos of each invited talk are now available.

Guy Brown (University of Sheffield)
Machine listening systems for noise-robust and reverberation-robust automatic speech recognition
In the first part of this talk I will review work in the field of computational auditory scene analysis (CASA), which aims to build machine hearing systems that replicate the ability of human listeners to perceptually organise sound. Such systems often represent the grouping of acoustic components in the form of a binary time-frequency mask, and act as front-end processors for automatic speech recognisers that use "missing data" principles to achieve noise robustness. In the second part of the talk, I will describe a machine hearing system that uses the time-frequency masking principle to achieve robust automatic speech recognition in reverberant conditions. However, psychophysical studies suggest that time-frequency masking may not be a good model for the process by which human listeners achieve perceptual compensation for the effects of reverberation. Related computational modelling studies suggest that low-level auditory mechanisms, possibly mediated by the efferent system, might be implicated in reverberation robustness. Work on modelling these processes, and incorporating them into machine hearing systems, will be presented.
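The binary time-frequency masking idea described above can be sketched in a few lines. This is a minimal illustration only, not Brown's system: the mask here is an oracle-style estimate that assumes the noise spectrum is known separately, and all parameter values (window size, threshold, signal content) are assumptions chosen for the example.

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    """Magnitude STFT with a Hann window (minimal, illustrative)."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def binary_mask(noisy_mag, noise_mag, thresh_db=0.0):
    """Keep a time-frequency cell (mask = 1) only where the local SNR
    estimate exceeds the threshold; discard it (mask = 0) otherwise."""
    eps = 1e-12
    local_snr = 20.0 * np.log10((noisy_mag + eps) / (noise_mag + eps))
    return (local_snr > thresh_db).astype(float)

# Synthetic example: a 1 kHz tone buried in white noise at 8 kHz sampling.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
noise = 0.1 * rng.standard_normal(fs)
mask = binary_mask(stft_mag(tone + noise), stft_mag(noise))
# Cells dominated by the tone (around FFT bin 32, i.e. 1 kHz) are kept;
# noise-dominated cells are zeroed and treated as "missing data".
```

In a missing-data recogniser, the zeroed cells would be marked as unreliable rather than simply discarded, so the acoustic model can marginalise over them.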

David Reby (University of Sussex)
Tools for understanding mammal vocal communication: recording, re-synthesis and playback
Current studies of mammal vocal communication typically involve three levels of investigation: (1) observation, where vocal behaviour is recorded and analysed in order to relate variation in acoustic components to attributes of the calling animal, or to variation in social or environmental contexts; (2) signal re-synthesis, where specific features of signals are modified independently in order to mirror their natural co-variation with relevant traits or contexts, as identified at the observation level; and (3) playback experiments, where the effects of these modifications on receivers are tested by monitoring animals' responses to the broadcast of re-synthesised stimuli.
In this talk I will illustrate this approach with specific studies of social and sexual communication in mammal species, highlighting in particular the key contribution of the application of the Source/Filter theory of voice production to this field. I will also discuss the positive impact of recent advances in digital signal processing and suggest what technological breakthroughs would further benefit bioacoustics.

Mark Lutman (University of Southampton)
Less is more: sparse approach to speech recognition in noise for cochlear implants
Processing strategies in cochlear implants have evolved from early devices attempting crude feature extraction and re-synthesis of estimated formant structures to continuous interleaved sampling that attempts to transmit as much of the incoming signal as possible. Common signal processing strategies used now, such as ACE used in Cochlear devices, select frequency bands with the most energy within short time frames and reject other bands. Investigational devices are examining whether there are benefits to be had from signal compression algorithms, such as MP3 coding, which utilise redundancy reduction principles via an auditory model to remove components that are normally masked.
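The n-of-m band selection that the abstract attributes to strategies such as ACE can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual device algorithm: the frame and band counts are hypothetical, and real strategies operate on filterbank envelopes with device-specific rate and compression stages.

```python
import numpy as np

def n_of_m_selection(band_energies, n=8):
    """ACE-style n-of-m selection: for each time frame, keep the n bands
    with the most energy and reject (zero) the remaining bands."""
    selected = np.zeros_like(band_energies)
    for i, frame in enumerate(band_energies):
        top = np.argsort(frame)[-n:]   # indices of the n strongest bands
        selected[i, top] = frame[top]
    return selected

# Hypothetical input: 4 analysis frames x 22 bands (22 channels is a
# common electrode count in Cochlear devices; values here are random).
rng = np.random.default_rng(1)
energies = rng.random((4, 22))
stimulation = n_of_m_selection(energies, n=8)
# Each frame now stimulates only its 8 highest-energy channels.
```

The SPARSE approach described below differs in spirit: instead of ranking bands purely by energy, it seeks the minimal set of underlying components needed to represent the signal.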
We take a step back from existing approaches and ask what we want to transmit to the listener. We argue that speech and most other environmental signals are highly redundant, containing far less information than would be suggested by the acoustic bandwidth. We also argue that hearing impairment can be viewed as an information bottleneck, which poses a major constriction in cases of severe and profound hearing loss. It follows that the aim of signal processing algorithms for cochlear implants should be to extract the salient information from the incoming signal and throw away redundant parts ("less is more"). This approach maximises the potential to transmit relevant information through the bottleneck and optimise perception of the essence of the signal. We have used mathematical principles of information theory to develop a signal processing algorithm for cochlear implants named SPARSE. The term sparse is understood here to be a signal with the minimal underlying components necessary to represent the incoming signal across the electrode array.
Results indicate that SPARSE has advantages over ACE for speech recognition in noise, especially for poorer-performing cochlear implant users. Objective measurements show clear improvements in speech-to-noise ratio. For sentences in babble noise presented at an input signal-to-noise ratio of 5 dB, thirteen cochlear implant users showed a statistically significant improvement in keyword recognition score (p < 0.05), amounting to 5% on average, despite their lack of familiarity with SPARSE.
Radical new approaches to signal processing, based on the statistical properties of the incoming signal and principles of information theory, may offer improved performance to users of cochlear implants and other devices such as auditory brainstem implants, or even hearing aids. This knowledge may also help to explain how the auditory system deploys its resources to convey acoustic information to the brain in the most efficient manner in everyday listening.

Simon Godsill (University of Cambridge)
A survey of recent probabilistic approaches in musical audio modelling and transcription
Over recent years advances in our understanding of the statistical properties of musical audio, coupled with developments in computational inference methods, have enabled the use of rigorous probabilistic approaches in such applications as music transcription and enhancement. In this talk I will give an introduction to the underlying principles of the Bayesian computational approach to musical audio modelling, including the use of high-level structured prior models for musical note parameters and computational inference using state of the art adaptations of Markov chain Monte Carlo (MCMC) and particle filtering.

Miguel Coimbra (Universidade do Porto)
Listen to your heart
Auscultation is a hard skill to master. Heart sounds are of low frequency and the intervals between events are on the order of milliseconds, requiring significant practice for a human ear to distinguish the subtle changes between a normal and a pathological heart sound. The motivation of the DigiScope project - DIGItally enhanced stethoSCOPE for clinical usage - is to research novel machine listening algorithms that can extract physiologically meaningful information from heart sounds, paving the way for digitally enhanced stethoscopes that are suitable for training physicians to improve their basic skills in diagnosing and treating heart conditions, or as a stronger tool for worldwide screening of specific heart pathologies.


Tao Xu and Wenwu Wang (University of Surrey)
Adaptive dictionary learning based compressive sensing for underdetermined speech separation

Atiyeh Alinaghi, Wenwu Wang and Philip Jackson (University of Surrey)
Blind separation of reverberant speech mixtures

Jon Barker (University of Sheffield), Emmanuel Vincent (INRIA, Rennes),
Heidi Christensen, Ning Ma and Phil Green (University of Sheffield)
Introducing the PASCAL 'CHiME' Speech Separation and Recognition Challenge

Dan Stowell (QMUL)
Automatic birdsong segmentation and clustering within the Vamp framework

Federica Pace and Paul White (University of Southampton)
Classification of Humpback Whale Songs

Michael Newton and Leslie Smith (University of Stirling)
Spiking onset neurons for sound identification

Qiang Huang and Stephen Cox (University of East Anglia)
Speaker Spotting in a Tennis Game Using High-Level Information

Tom Walters (Google)
Realtime features from the stabilized auditory image

Tim Brookes and Chris Hummersone (University of Surrey)
Machine Listening for Sound Quality Evaluation

Andrew Nesbit (QMUL)
Passive acoustic monitoring of marine mammals using sparse representations

Mathieu Barthet, Katy Noland, Manuela Lahne, William Marsh, Rachel Ashworth (QMUL)
Analysis and sonification of zebrafish calcium signals based on transient detection

Elodie Briefer and Alan McElligott (QMUL)
Social influence on goat kid calls during development


All places on the workshop have now been allocated.

Note: We do not have funding to cover the travel expenses of attending delegates.

Contact information:

Matthew Davies

Centre for Digital Music
Queen Mary University of London
Mile End Road, London E1 4NS, UK
Tel: +44 (0)20 7882 5528
Fax: +44 (0)20 7882 7997


The event will take place at the Arts Lecture Theatre, Queen Mary University of London, Mile End Road, London E1 4NS.

The Arts Lecture Theatre is building 29 on the campus map.


The venue is easily accessible by public transport. It is within a five minute walk of both Mile End Underground station (Central, District, and Hammersmith & City lines) and Stepney Green Underground station (District, and Hammersmith & City lines).






The day after the Machine Listening Workshop, we are also hosting the annual DMRN event.
For further details, please see the DMRN+5 Workshop page.