Centre for Digital Music

 

Centre for Digital Music past seminars

For C4DM seminars before 2009 please see our archives at http://c4dm.eecs.qmul.ac.uk/seminars-historical.html

Nicolas Gold
Speaker(s) : Nicolas Gold - University College London  
When : Wed 3rd October 2012 14:00
Where :



Different uses of space to analyze harmonic structures
Speaker(s) : Louis Bigo - IRCAM, Paris, LACL, Université Paris 12  
When : Wed 4th July 2012 14:00
Where : Eng 2.09

This presentation proposes different ways to use the notion of space for the visualization and analysis of symbolic harmonic structures. Our approach is motivated by the search for symbolic spaces in which the representation of a musical sequence reveals its singularity. This motivation is central to spatial computing, which we explore with the dedicated programming language MGS. After a brief presentation of this language, we will propose two kinds of pitch space to represent harmonic structures. The first comprises irregular pitch spaces, in the form of simplicial complexes resulting from a computation. The second comprises regular pitch spaces (well known as Tonnetze). We use them to analyze geometrical properties associated with musical progressions.
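The regular pitch space mentioned above, the Tonnetz, can be illustrated in a few lines. This is a minimal sketch, not the MGS-based method of the talk: it assumes the usual neo-Riemannian Tonnetz on 12 pitch classes (0 = C), whose edges join pitch classes a minor third, major third or perfect fifth apart, and the function names are hypothetical. It checks whether three pitch classes span a triangle and whether two triangles share an edge, which is how closely related chords in a progression appear as neighbours.

```python
def interval_class(a, b):
    """Smallest distance between two pitch classes on the 12-step circle."""
    d = (a - b) % 12
    return min(d, 12 - d)

def is_tonnetz_triangle(chord):
    """True if three distinct pitch classes form a triangle (2-simplex) of the Tonnetz.

    A triangle needs one edge of each kind (interval classes 3, 4 and 5),
    so the triangles are exactly the major and minor triads.
    """
    pcs = {p % 12 for p in chord}
    if len(pcs) != 3:
        return False
    a, b, c = sorted(pcs)
    return sorted([interval_class(a, b), interval_class(b, c), interval_class(a, c)]) == [3, 4, 5]

def triangles_adjacent(t1, t2):
    """Two Tonnetz triangles are neighbours when they share exactly two pitch classes."""
    if not (is_tonnetz_triangle(t1) and is_tonnetz_triangle(t2)):
        return False
    return len({p % 12 for p in t1} & {p % 12 for p in t2}) == 2

# C major (0, 4, 7) and A minor (9, 0, 4) share the edge C-E
print(triangles_adjacent([0, 4, 7], [9, 0, 4]))
```

Adjacent triangles correspond to the neo-Riemannian P, L and R moves, so a progression can be read as a path across the surface.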

Louis Bigo is a PhD student at IRCAM, Paris and at LACL (Université Paris 12). He began his PhD research in 2010 after graduating from the ATIAM Master's programme (Acoustics, Signal Processing, Computer Science applied to Music) at IRCAM. Before IRCAM, he worked as an engineer at the software company Axway after graduating from the engineering school Polytech'Lille.



Solo and accompaniment separation in polyphonic music
Speaker(s) : Estefanía Cano - Fraunhofer Institute for Digital Media Technology  
When : Thu 28th June 2012 11:00
Where : Eng 2.09

The ability to separate solo parts from their musical accompaniment has several applications of interest, among them music education, performance analysis, and entertainment. This talk will describe an efficient implementation of a solo and accompaniment separation algorithm, and a recent study conducted to better understand the behavior of musical signals, namely the evolution of spectral parameters over time and the intricate relations between the different signal components. Initial results from this study will be presented, as well as some preliminary experiments on including this information in the separation scheme.
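One common family of solo/accompaniment separation methods is pitch-informed: given an estimate of the solo's fundamental frequency per frame, the time-frequency bins near its harmonics are assigned to the solo and the rest to the accompaniment. The sketch below is a simplified illustration under that assumption, not necessarily the algorithm presented in the talk; it takes an already computed complex spectrogram and f0 track, and the harmonic count and tolerance are arbitrary choices.

```python
import numpy as np

def harmonic_mask(freqs, f0_track, n_harmonics=10, tol_hz=20.0):
    """Binary mask (n_freqs x n_frames) marking bins near harmonics of the solo f0.

    f0_track holds one f0 in Hz per frame; a value <= 0 means the solo is silent.
    """
    mask = np.zeros((len(freqs), len(f0_track)), dtype=bool)
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue
        for h in range(1, n_harmonics + 1):
            mask[:, t] |= np.abs(freqs - h * f0) <= tol_hz
    return mask

def separate(spec, freqs, f0_track, **kwargs):
    """Split a complex spectrogram into solo and accompaniment by binary masking."""
    mask = harmonic_mask(freqs, f0_track, **kwargs)
    solo = np.where(mask, spec, 0)
    accompaniment = spec - solo  # everything not claimed by the solo
    return solo, accompaniment
```

A binary mask keeps the example short; soft masks, or refining the mask with the temporal evolution of spectral parameters discussed above, are natural refinements.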

Estefanía Cano is a Ph.D. student at Fraunhofer IDMT in Ilmenau, Germany, working in the field of sound separation. Her work deals particularly with the separation of solo and accompaniment parts in polyphonic music. Estefanía holds degrees in Electronic Engineering (B.S.) and Music (B.M.) from Universidad Pontificia Bolivariana and Universidad de Antioquia in Colombia, and in Music Engineering (M.S.) from the University of Miami. During her time in Miami, her research focused on the creation of accompaniment tracks for concert preparation, specifically dealing with classical saxophone recordings, with university music students as the main users. Her current research follows the same line, expanding the separation scheme to different musical instruments and genres while also aiming at separation results suitable for education and music practice scenarios. At Fraunhofer IDMT, Estefanía has been directly involved with the Song2See project, which develops music education tools to help students learn a musical instrument.



Technology as Musical Agitator
Speaker(s) : Isaac Schankler - University of Southern California  
When : Thu 14th June 2012 14:00
Where : BR 4.02

In the popular imagination, technology makes music more fixed, precise, canned, mechanistic. But it can also be an agitator of sorts, a disturbance that introduces entropy or surprise into an otherwise deterministic situation. In this talk we'll look at some of the disruptive uses that technology can be put to in music composition and other musically creative tasks, with applications of spectral feedback loops, speech-to-music transcription, expressive accelerometers, microtonal vocal performance assistance, and human-machine improvisation.

Isaac Schankler is a composer, improviser, and researcher living in Los Angeles, California. His chamber opera Light and Power, commissioned by Juventas New Music Ensemble with assistance from Boston Opera Collaborative and Meet the Composer, recently won Second Place in the US National Opera Association's 2010-11 Opera Production Competition (Professional Division). Other recent honors include grants from the American Composers Forum, the USC Sadye J. Moss Composition Prize, an Associate Artist residency at the Atlantic Center for the Arts, and the Damien Top Prize in the ASCAP/Lotte Lehmann Foundation Art Song Competition. Isaac is the Artistic Director of the concert series People Inside Electronics, and Artist in Residence of the Music Computation and Cognition Laboratory (MuCoaCo) at the USC Viterbi School of Engineering, where he researches human-machine improvisation and musical structure. He holds degrees in composition from the USC Thornton School of Music (DMA) and the University of Michigan (MM, BM).



The Evolution of Music Game Design at Harmonix
Speaker(s) : Eran Egozy - Harmonix  
When : Wed 13th June 2012 14:00
Where : BR 4.02

Harmonix Music Systems is best known for its award-winning game titles “Guitar Hero” (first released in 2005) and “Rock Band” (first released in 2007), which fueled an explosive growth in music gaming. However, few realize that Harmonix was founded in 1995, a full 10 years before the music game revolution.

What happened during those 10 years?

Eran Egozy, co-founder of Harmonix, will describe Harmonix’s mission, and how it tried to fulfill this mission through different products and business models. The company’s eventual success came not through an overnight flash of brilliance, but rather through prolonged trial-and-error, product iteration, and lessons learned from failed attempts. Eran will also describe Harmonix’s post-Rock Band adventures, including an eye towards future directions in music gaming.

Eran Egozy is the co-founder and chief technical officer of Harmonix Music Systems, one of the pre-eminent game development studios in the world, having developed more than a dozen critically acclaimed music-based video games. Harmonix was founded in 1995 on the principle that non-musicians should be able to experience the joy of making music. Beginning in 2005, Harmonix developed Guitar Hero and Guitar Hero 2, fueling the explosive growth of the music games category to over $1 billion in sales. In 2007, Harmonix launched the innovative and award-winning title Rock Band. The blockbuster franchise grew to include Rock Band 2, Rock Band 3, and The Beatles: Rock Band. Harmonix is also the creator of the ground-breaking hit titles Dance Central and Dance Central 2, the first fully immersive, no-controller dance games for the Kinect. Eran and his business partner Alex Rigopulos were named in Time Magazine’s Time 100 in 2008, Fortune Magazine’s Top 40 Under 40 in 2009, and USA Network’s Character Approved awards in 2010.

Eran brings extensive technical and musical expertise to Harmonix’s management team. He manages the company's engineering staff, directs intellectual property development, contributes to game design and helps drive corporate strategy. Prior to co-founding Harmonix, Eran conducted research on combining music and technology at the MIT Media Lab. He is also an accomplished clarinetist and performs in Boston’s eclectic chamber music group Radius Ensemble. Eran earned his B.S. and M.S. degrees in Electrical Engineering from the Massachusetts Institute of Technology.

This is a Joint Distinguished Lecturer Seminar with C4DM



Everything you ever wanted to know about Lana Del Rey('s fans), but were too afraid to ask
Speaker(s) : Ben Fields - Musicmetric  
When : Wed 6th June 2012 15:00
Where : BR 3.02

The ways people find and collect music have changed radically over the last decade. There is considerably less emphasis on buying and selling physical media as these formats are replaced by purchased digital downloads and streaming services, alongside a well-established system of unlicensed peer-to-peer distribution via protocols like BitTorrent. Add to this the advent of social networks, both general and music-focused, and the result is a fundamental change in how fans interact with both music and the artists who produce it. These new forms of interaction all share a common feature: they are digitally recorded. During this talk we'll look at these digital interaction breadcrumbs from various social (e.g. Facebook, Soundcloud) and peer-to-peer (BitTorrent) networks and other sources. We'll look at how Musicmetric handles some of the problems associated with data collection. Then we'll walk through what data is available (and machine-readable!) via Musicmetric's JSON API and look at some example applications of these data sources.
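Consuming a JSON API of this kind usually comes down to parsing time-series records and aggregating them per source. The sketch below uses only the Python standard library; the payload and its field names (`series`, `source`, `date`, `value`) are invented for illustration and are not Musicmetric's actual schema.

```python
import json
from collections import defaultdict

# Hypothetical payload: field names are illustrative, NOT Musicmetric's real schema.
SAMPLE = """
{"artist": "Example Artist",
 "series": [
   {"source": "facebook",   "date": "2012-06-01", "value": 1200},
   {"source": "bittorrent", "date": "2012-06-01", "value": 340},
   {"source": "facebook",   "date": "2012-06-02", "value": 1350}
 ]}
"""

def totals_by_source(payload):
    """Sum the activity counts for each network in a hypothetical time-series payload."""
    data = json.loads(payload)
    totals = defaultdict(int)
    for point in data.get("series", []):
        totals[point["source"]] += point["value"]
    return dict(totals)

print(totals_by_source(SAMPLE))
```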

Ben Fields leads Musicmetric's data science team in an attempt to wrangle some sanity into the Internet's vast supply of horribly formed music data. He has a PhD from the Intelligent Sound and Music Systems group in the Computing Department at Goldsmiths, University of London. His work there focused on merging social and acoustic similarity spaces to drive playlist creation and related user-facing systems. He is an expert on metadata, structured data, the semantic web and recommendation systems. In his spare time, he is a co-chair of the annual International Workshop On Music Recommendation And Discovery, has given an Ignite London talk about beer styles, occasionally DJs, is an accredited beer judge and homebrews beer. He thinks bios in the third person are weird but figures that's how they're meant to be written.



A structured sparse approach to audio denoising
Speaker(s) : Kai Siedenburg - Austrian Research Institute for Artificial Intelligence  
When : Thu 31st May 2012 14:00
Where : BR 4.02

Exploiting the structure of audio signals, such as their persistence along time and frequency, leads to significant improvements over classical audio denoising algorithms. This talk gives an overview of current research in structured sparsity applied to audio denoising. Of particular interest will be shrinkage operators that take into account a signal's persistence properties. It will be shown that they may serve as an efficient alternative to state-of-the-art algorithms, in terms of both denoising performance measured by signal-to-noise ratio and perceptual quality. Furthermore, a model for adaptively choosing the shrinkage threshold will be presented, as well as the operators' relations to convex minimization problems.
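The difference between classical and persistence-aware shrinkage can be shown on a matrix of time-frequency coefficients. This is a minimal sketch assuming a "windowed group lasso" style operator, one known persistent shrinkage rule, which may differ from the exact operators in the talk: each coefficient's gain depends on the energy of its neighbours along the time axis, so a weak but persistent partial survives a threshold that removes an isolated coefficient of the same size. The neighbourhood width and threshold are arbitrary.

```python
import numpy as np

def soft_threshold(c, lam):
    """Classical (unstructured) soft thresholding of complex or real coefficients."""
    mag = np.abs(c)
    return c * np.maximum(0.0, 1.0 - lam / np.maximum(mag, 1e-12))

def windowed_group_lasso(c, lam, half_width=2):
    """Shrink each coefficient by the energy of its time neighbourhood (axis 1 = frames)."""
    energy = np.abs(c) ** 2
    padded = np.pad(energy, ((0, 0), (half_width, half_width)))
    csum = np.concatenate([np.zeros((c.shape[0], 1)), np.cumsum(padded, axis=1)], axis=1)
    width = 2 * half_width + 1
    neigh = csum[:, width:] - csum[:, :-width]  # sum over frames t-half_width..t+half_width
    gain = np.maximum(0.0, 1.0 - lam / np.maximum(np.sqrt(neigh), 1e-12))
    return c * gain

# A persistent partial (row 1) and an isolated coefficient of equal size (row 3)
coeffs = np.zeros((4, 20))
coeffs[1, :] = 1.0
coeffs[3, 10] = 1.0
denoised = windowed_group_lasso(coeffs, lam=1.5)
```

With `lam=1.5`, plain soft thresholding zeroes every coefficient, while the windowed operator keeps the whole persistent row and removes only the isolated one.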

Kai Siedenburg studied Mathematics and Musicology at Humboldt University Berlin, receiving his M.Sc. degree in 2012. In 2008/09 he visited the University of California, Berkeley and its Center for New Music and Audio Technologies (CNMAT) on a Fulbright scholarship. He further worked as a student research assistant at the Audio Communication Group at the Berlin Institute of Technology in 2009-2011. His master's thesis on sparse modeling of audio signals was carried out at the Numerical Harmonic Analysis Group (NUHAG) Vienna. Throughout his studies Kai held a scholarship from the German National Academic Merit Foundation. Currently, he is working on audio signal processing at the Austrian Research Institute for Artificial Intelligence (OFAI). Kai's academic interests focus mainly on the (cognitive / computational / mathematical) representation and synthesis of musical sounds. He is particularly interested in mathematical models of audio signals, as well as musical timbre, from musicological and perceptual viewpoints. Artistically, he is active as a jazz pianist and electronic musician, playing groove jazz and engaging in free improvisation and the design of electronic instruments.



Three experiments in music genre recognition
Speaker(s) : Bob Sturm - Aalborg University Copenhagen  
When : Wed 23rd May 2012 14:00
Where : BR 4.02

During the past decade, many researchers have tackled the problem of making computers automatically recognize the genre of recorded music. This is an important problem because, among other things, it can help manage the deluge of unlabeled, mislabeled, and otherwise poorly labeled audio data in large databases. The first published work in this area, in 2001, achieved a mean accuracy of 61% over ten different genres. Another work from 2006 reached 83% mean accuracy on the same dataset, and work from 2009 and 2010 claims to observe 91% mean accuracy on it. With genre so difficult to define, and seemingly based on factors broader than acoustics, these are remarkable results. In this talk, I argue from the results of three simple experiments that the improvements we have seen are unfortunate consequences of excellent discrimination based on confounding factors that have little to do with music genre.

Bob L. Sturm received the B.A. degree in physics from University of Colorado, Boulder in 1998, the M.A. degree in Music, Science, and Technology, at Stanford University, in 1999, the M.S. degree in multimedia engineering in the Media Arts and Technology program at University of California, Santa Barbara (UCSB), in 2004, and finally the M.S. and Ph.D. degrees in Electrical and Computer Engineering at UCSB, in 2007 and 2009, respectively. Dr. Sturm specializes in signal processing, sparse approximation, and their applications to audio and music. During 2009, Dr. Sturm was a Chateaubriand Post-doctoral Fellow at the Institut Jean Le Rond d'Alembert, Equipe Lutheries, Acoustique, Musique (LAM), at Université Pierre et Marie Curie, UPMC Paris 6. In January 2010, Dr. Sturm became Assistant Professor at the Department of Architecture, Design and Media Technology at Aalborg University Copenhagen. In 2011, he was awarded a two-year Independent Postdoc Grant from the Danish Agency for Science, Technology and Innovation, beginning January 2012. His current research interests are: digital signal processing for audio and music signals, algorithms for sparse approximation and compressive sampling, and music and audio information retrieval.



Communications-Inspired Compressive Sensing
Speaker(s) : Miguel Rodrigues - University College London  
When : Wed 16th May 2012 14:00
Where : BR 4.02

Compressive sensing (CS) has recently emerged as an important area of research in image sensing and processing. Conventional sensing systems employ a two-step procedure: i) data acquisition; and ii) data compression for subsequent storage or communication. CS systems, in contrast, acquire the data directly in a compressed format. CS signal acquisition or measurement involves projecting the underlying signal or image onto a set of vectors and CS recovery involves solving an inverse problem.

There are two hallmarks of the original CS theory. First, the projection vectors are typically drawn uniformly at random. Second, the inverse recovery problem is regularized on the assumption that the underlying signal or image admits a sparse representation in some orthonormal basis or frame. However, it was recognized even in some of the early CS studies that optimized projection vectors could achieve better recovery performance than random ones; more recently, it has also been recognized that recovery can be improved further by leveraging a signal model that goes beyond the conventional (often overly primitive) sparse one.
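The conventional pipeline described above (random projections followed by sparsity-regularized recovery) can be sketched with NumPy. This illustration assumes Gaussian measurement vectors and uses orthogonal matching pursuit (OMP), one standard greedy recovery algorithm, as a stand-in for the inverse problem; it shows the baseline the talk improves on, not the optimized projection designs themselves. The sizes `n`, `m`, `k` are arbitrary.

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x from y = A @ x (orthogonal matching pursuit)."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(A.T @ residual)))  # column most correlated with residual
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 128, 64, 4                           # signal length, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random projection vectors (rows)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = A @ x_true                                 # acquisition directly in compressed form
x_hat = omp(A, y, k)                           # sparse recovery
```

Replacing the random `A` with projections matched to a signal model is the design question the talk addresses.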

This talk outlines how to build on recent advances in information theory and communications to design CS projections (or measurement kernels) matched to a general statistical signal model. The crux of the design approach is the realization that the projection design problem for CS systems parallels the precoder design problem for multiple-input multiple-output (MIMO) communications systems: in the communications problem a source is matched to a channel, whereas in the CS problem a channel, or equivalently the noise covariance, is matched to the source. This new design approach is shown to lead both to theoretical results, which reveal key operations effected by the projection designs, and to state-of-the-art experimental results in practical CS imaging problems.

This represents joint work with William Carson (U. Porto, Portugal), Minhua Chen (Duke U., USA), Lawrence Carin (Duke U., USA) and Robert Calderbank (Duke U., USA).

Miguel Rodrigues is a Senior Lecturer with the Department of Electronic and Electrical Engineering, University College London, UK. He was previously with the Department of Computer Science, University of Porto, Portugal, where he rose through the ranks from Assistant to Associate Professor and also led the Information Theory and Communications Research Group at Instituto de Telecomunicações - Porto.

He received the Licenciatura degree in Electrical Engineering from the University of Porto, Portugal in 1998 and the Ph.D. degree in Electronic and Electrical Engineering from University College London, UK in 2002. He has carried out postdoctoral research work both at Cambridge University, UK, as well as Princeton University, USA, in the period 2003 to 2007. He has also held visiting research appointments at Princeton University, USA and Duke University, USA in the period 2007 to 2012. He is also a Visiting Fellow at Cambridge University.

His research work, which lies in the general areas of information theory, communications and signal processing, has led to nearly 100 papers in journals and conferences to date.

Dr. Rodrigues was honored with the IEEE Information Theory and Communications Societies Joint Paper Award 2011 for his work on "Wireless Information-Theoretic Security" (jointly with M. Bloch, J. Barros and S. McLaughlin). Dr. Rodrigues was also the recipient of the Prize Engenheiro António de Almeida, the Prize Engenheiro Cristiano Spratley, and the Merit Scholarship from the University of Porto.



POSTPONED: Open Data on a Budget
Speaker(s) : Martyn Davies - Six Two Productions  
When : Wed 9th May 2012 15:00
Where : BR 4.02

[THIS EVENT HAS BEEN POSTPONED AND WILL BE REARRANGED FOR A LATER DATE]

 

We've seen more and more of the music industry embrace hack days over the past year; labels like EMI and Universal Music Group are making valiant efforts to be bigger players and aid innovation by exposing open data about their artists and releases. This is fine for the big players, who can offset the longer-term return on investment against what it takes to get up and running, but what about the little guy? What about the indie label? What about artists with no label; surely they should get to play too? I'd like to give a 'talk in progress' presenting music industry hacks and service hacks that allow small or time-strapped labels to expose more data about their artists, releases and the music itself on a low-to-no budget.

Martyn Davies is a creative technologist, product manager and hacker for hire based in London, where he currently works as the CEO of Six Two Productions, a creative technology company he founded in 2011 to build new web and mobile applications for the music industry out of open APIs and other bits of data. He is also the central coordinator for Music Hack Day and the organiser of the London & Cannes Music Hack Days. Before this he spent a year at Universal Music Group as Innovation Manager, and nine years at the BBC working on BBC Radio 1, BBC Music and BBC Introducing as a content producer, radio producer, developer and product manager on all kinds of wonderful things. He works on other sites/hacks/ideas in his spare time. His favourite place for a meeting is the pub.



Capturing, Visualizing and Recreating Spatial Sound
Speaker(s) : Ramani Duraiswami - University of Maryland; VisiSonics Corporation  
When : Thu 26th April 2012 14:00
Where : Maths Lecture Theatre

The sound field at a point contains information on the spatial origin of the sound, and humans use this information to make sense of their environment. When we hear a sound, it has been filtered by interaction with the environment and with our bodies. This process endows the sound with cues that the neural system then decodes to perceive the world auditorily in three dimensions. To capture and reproduce this directional information, we need a spatial representation of the sound and a means to capture and manipulate the sound in this representation. We have explored two classical representations of directional sound from mathematical physics: spherical wave functions and plane-wave expansions. We have developed spherical microphone arrays that allow the captured sound to be represented directly in these bases.

Plane-wave beamforming allows the sound-field at a point to be visualized as an image, much as a video camera images the light-field at a given point. The registration of the audio images with visual images allows a new way to perform audio-visual scene analysis. Several examples are presented at http://goo.gl/igflH.
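The "audio image" idea can be illustrated with narrowband delay-and-sum beamforming: steer the array to each candidate direction and record the output power. This sketch deliberately simplifies the setting of the talk: it assumes a hypothetical 8-microphone circular array in the horizontal plane, a single frequency bin and a single simulated plane wave, rather than a spherical array with spherical-harmonic processing.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(mic_xy, azimuth, freq):
    """Narrowband response of a planar array to a plane wave from `azimuth` (radians)."""
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # Mics further along `direction` hear the wave earlier, hence the minus sign
    delays = -(mic_xy @ direction) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)

def beam_power_map(snapshot, mic_xy, freq, azimuths):
    """Delay-and-sum output power for each look direction at a single frequency bin."""
    powers = np.empty(len(azimuths))
    for i, az in enumerate(azimuths):
        weights = steering_vector(mic_xy, az, freq) / len(mic_xy)
        powers[i] = np.abs(np.vdot(weights, snapshot)) ** 2
    return powers

# Hypothetical 8-microphone circular array of radius 5 cm
angles = 2 * np.pi * np.arange(8) / 8
mic_xy = 0.05 * np.column_stack([np.cos(angles), np.sin(angles)])
freq = 2000.0
snapshot = steering_vector(mic_xy, np.deg2rad(60.0), freq)  # simulated source at 60 degrees
azimuths = np.deg2rad(np.arange(360.0))
power = beam_power_map(snapshot, mic_xy, freq, azimuths)
```

Plotting `power` against azimuth (or, with a 3-D array, over azimuth and elevation) gives a one-frequency acoustic image whose peak marks the source direction.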

The captured sound can be used to recreate spatial sound scenes over headphones, allowing listeners to perceive the original scene. For the reproduction, our approach incorporates individualized HRTFs (measured via a novel reciprocal technique), room modeling, and tracking.
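The core of headphone reproduction with HRTFs is convolving a source with a left/right pair of head-related impulse responses (HRIRs) chosen for its direction relative to the listener's head. This is a minimal sketch under stated assumptions: the HRIR set is a hypothetical dict mapping measured azimuths (degrees) to equal-length left/right arrays, selection is nearest-neighbour without interpolation, and room modeling is omitted; head tracking enters only as a yaw angle.

```python
import numpy as np

def nearest_hrir(hrir_set, azimuth_deg):
    """Pick the measured HRIR pair whose azimuth is closest (on the circle) to the request."""
    def circular_distance(az):
        d = abs(az - azimuth_deg) % 360
        return min(d, 360 - d)
    return hrir_set[min(hrir_set, key=circular_distance)]

def render_binaural(mono, source_az_deg, head_yaw_deg, hrir_set):
    """Static binaural rendering of one source for a given head orientation."""
    # Tracking: the direction that matters is the source azimuth relative to the head
    left, right = nearest_hrir(hrir_set, (source_az_deg - head_yaw_deg) % 360)
    return np.stack([np.convolve(mono, left), np.convolve(mono, right)])

# Toy two-direction "HRIR set" (hypothetical values, far shorter than real HRIRs)
hrir_set = {
    0: (np.array([1.0, 0.0]), np.array([1.0, 0.0])),
    90: (np.array([0.0, 0.5]), np.array([1.0, 0.0])),
}
mono = np.array([1.0, 2.0, 3.0])
```

When the listener turns towards the source, the rendering switches to the frontal pair, which keeps the virtual source fixed in the room rather than moving with the head.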

(Joint work with Adam O'Donovan, Dmitry Zotkin and Nail A. Gumerov)

Ramani Duraiswami is a member of the faculty of the department of computer science at the University of Maryland, College Park. He has broad research interests in a number of areas including scientific computing, spatial audio, machine learning and computer vision. He has a Ph.D. from Johns Hopkins and a B.Tech. from IIT Bombay. See www.umiacs.umd.edu/~ramani for more on his research, and http://www.visisonics.com for more on VisiSonics.



Dictionary learning algorithms and their applications in source separation, speaker tracking, and image denoising
Speaker(s) : Wenwu Wang - University of Surrey  
When : Wed 25th April 2012 14:00
Where : Eng 209

Two related problems have been studied either separately or jointly in sparse representations: sparse coding, that is, to find the sparse linear decompositions of a signal for a given dictionary (i.e. the collection of all codewords), and dictionary design. An over-complete dictionary, one in which the number of codewords is greater than the dimension of the signal, can be obtained by either an analytical (using a predefined transform e.g. DCT) or a learning-based approach. In learning-based approaches, the dictionaries are adapted from a set of training data. Although this may involve higher computational complexity, learned dictionaries have the potential to offer improved performance as compared with predefined dictionaries, since the atoms are derived to capture the salient information directly from the signals. Dictionary learning algorithms are often established on an optimization process involving the iteration between two stages: sparse approximation and dictionary update. This talk will discuss dictionary learning algorithms with particular application interest in source separation, speaker tracking, and image denoising. Focus will be placed on a new method for dictionary update, called simultaneous codeword optimisation (SimCO), which essentially generalises the well-known optimisation framework employed in K-SVD and MOD. Some preliminary results on applying dictionary learning techniques to multimodal tracking of moving speakers will also be demonstrated.
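The two-stage iteration described above can be sketched using MOD (method of optimal directions), one of the baselines the abstract says SimCO generalises; SimCO itself is not reproduced here. This is an illustrative NumPy sketch: sparse approximation uses a simple OMP, the dictionary update is the least-squares solution `D = Y X^+` followed by atom normalisation, and all sizes and iteration counts are arbitrary assumptions.

```python
import numpy as np

def omp(D, y, k):
    """Greedy k-sparse code of y over dictionary D (orthogonal matching pursuit)."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def mod_update(Y, X):
    """MOD dictionary update: least-squares D = Y X^+, then unit-norm atoms."""
    D = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    # Rescale the codes so that the product D @ X is unchanged by the normalisation
    return D / norms, X * norms[:, None]

def learn_dictionary(Y, n_atoms, k, n_iter=10, seed=0):
    """Alternate sparse approximation and dictionary update (columns of Y are signals)."""
    rng = np.random.default_rng(seed)
    # Initialise with randomly chosen, normalised training signals
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        X = np.column_stack([omp(D, y, k) for y in Y.T])  # sparse approximation
        D, X = mod_update(Y, X)                           # dictionary update
    return D, X
```

Because `Y X^+` minimises `||Y - D X||` over all `D` for fixed codes, the update step never increases the approximation error; the sparse coding step is where greedy methods give up that guarantee.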

Wenwu Wang has been with the Centre for Vision, Speech and Signal Processing at the University of Surrey, Guildford, UK, since May 2007, where he is currently a Lecturer. He previously held the prestigious RCUK Fellowship in machine audio perception. His research and teaching interests include blind signal processing, machine learning and perception, and machine audition (listening). During spring 2008, he was a visiting scholar at The Ohio State University, Columbus, USA, working at (and sponsored by) the Perception and Neurodynamics Lab and the Center for Cognitive Science.

Previously, from September 2006 to April 2007, he was with Creative Technology Ltd, as a Software R&D Engineer, at the Sensaura Division, Egham, UK, working on 3D positional audio technology for embedded systems and mobile devices. From May 2005 to August 2006, he was a DSP Engineer with Tao Group Ltd (now Antix Labs Ltd), Reading, UK, working on algorithm design for audio, music and video systems based on a real-time multimedia platform. Before that, he was a Postdoctoral Research Associate, from January 2004 to April 2005, with the School of Engineering, Cardiff University, Cardiff, UK, and from May 2002 to December 2003, with King's College London, London, UK, working on blind signal processing for audio, speech and biomedical signals.

He received the B.Sc. degree in 1997, the M.E. degree in 2000, and the Ph.D. degree in 2002, all from Harbin Engineering University, Harbin, China, where he also received the Outstanding Graduate Award, the Excellent Paper Award, the Excellent Thesis Award and numerous Scholarships for academic excellence. He was awarded the PGCAP certificate in July 2010 from the University of Surrey.

He is a Fellow of the Higher Education Academy, a Member of the ISCA, a Senior Member of the IEEE, and belongs to the IEEE Signal Processing, Circuits and Systems, and Computational Intelligence Societies. He has served or currently serves as a reviewer, program committee member, or editor for a number of international journals and conferences. He was acknowledged as an Appreciated Reviewer for IEEE Trans. on Signal Processing in Jan 2008. He has been listed in Marquis Who's Who in the World (in both the 25th Silver Anniversary Edition in 2008 and the 26th Edition in 2009).

His research is funded by the Engineering and Physical Sciences Research Council (EPSRC), Ministry of Defence (MOD), Defence Science and Technology Laboratory (DSTL), Home Office (HO), Royal Academy of Engineering (RAENG), and the University Research Support Fund (URSF).



Online source separation: a generalised approach
Speaker(s) : Laurent Simon - INRIA Rennes Bretagne Atlantique, France  
When : Thu 19th April 2012 10:30
Where : Electronic Engineering, Room 209

During this talk, we will discuss the problem of online audio source separation. Most online audio source separation algorithms use either a sliding block approach or a stochastic gradient approach, which is faster but less accurate. We propose a general online audio source separation framework that can combine both online approaches. This online approach is based on Ozerov's offline general flexible source separation framework, a framework that lets the user specify spatial, spectral or temporal constraints for each source of the mixture, depending on the knowledge available about sound sources.
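The trade-off between the two online strategies can be seen on the simplest possible parameter, a per-frequency power estimate. This toy sketch is not the proposed framework (which updates the source models of Ozerov's framework); it only contrasts re-estimating from scratch on a sliding block of recent frames with a single stochastic-gradient step per frame on the squared error, whose cost per frame does not grow with the block size. The block size and step size are arbitrary assumptions.

```python
import numpy as np

def sliding_block_estimate(frames, block_size):
    """Re-estimate from scratch on the most recent block_size frames at every step."""
    estimates = []
    for t in range(len(frames)):
        block = frames[max(0, t - block_size + 1): t + 1]
        estimates.append(np.mean(block, axis=0))
    return np.array(estimates)

def stochastic_estimate(frames, step_size):
    """One cheap update per frame: a gradient step on 0.5 * (x - theta)**2."""
    theta = np.zeros_like(frames[0], dtype=float)
    estimates = []
    for x in frames:
        theta = theta + step_size * (x - theta)
        estimates.append(theta.copy())
    return np.array(estimates)

# Stationary toy input: 200 frames of a flat power spectrum with 4 bins
frames = np.full((200, 4), 2.0)
block = sliding_block_estimate(frames, block_size=10)
stochastic = stochastic_estimate(frames, step_size=0.1)
```

The sliding block is exact after one block length but costs O(block_size) per frame; the stochastic update is O(1) per frame but converges only gradually, mirroring the "faster but less accurate" trade-off in the abstract.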

Laurent Simon graduated with a Master's degree in Acoustics, Signal Processing and Computer Science applied to Music at IRCAM, Paris, France. He did his Ph.D. in psychoacoustics at the Institute of Sound Recording, University of Surrey, where he developed stereophonic microphone array design techniques based on perceptual cues. In January 2011, he started a postdoctoral position in audio signal processing at INRIA Rennes, France. Here, he focuses on online audio source separation.



Coupling speech source separation and recognition using a 'fragment decoding' approach.
Speaker(s) : Jon Barker - University of Sheffield  
When : Wed 4th April 2012 15:00
Where : Electronic Engineering, Room 209

Distant-microphone speech recognition presents many challenges. One of the chief difficulties is contending with complex multisource noise backgrounds. Typical backgrounds are composed of multiple competing sound sources, whose number is generally unknown and whose activity levels may change unpredictably over time. The talk will present approaches to this problem that are being developed at Sheffield by the EPSRC project CHiME.

The talk will first present the PASCAL 'CHiME' Speech Separation and Recognition challenge. This task employs noise backgrounds from a dataset of over 40 hours of binaural audio recorded in various rooms of a busy family home. Real room impulse responses have been used to add utterances from a separate speech recognition corpus into these multisource environments to provide a controlled and yet semi-realistic recognition task.

Having presented the challenge, the talk will motivate our particular approach to its solution: speech fragment decoding (SFD). This technique, inspired by ideas from Bregman's Auditory Scene Analysis account of perception, exploits monaural and binaural cues for sound source separation to locate sound source 'fragments', which are then 'stitched together' using temporal sequence knowledge represented by traditional statistical models of speech. CHiME challenge results will be presented, along with a discussion of the technique's limitations and directions for future research.

Jon Barker obtained a degree in Electrical and Information Sciences from the University of Cambridge, followed by a Ph.D. in Computer Science from the University of Sheffield in 1998. After graduating he spent a year at the Institut de la Communication Parlée, Grenoble, studying audio-visual speech perception, before returning to the Speech and Hearing Group at Sheffield, where he is now a Senior Lecturer. Over the years he has spent time as a visiting researcher at ICSI, IDIAP and Columbia University. His research interests include modelling speech perception in real environments, noise-robust speech recognition and computational hearing for robotics. Much of his recent research has concerned the attempt to design robust speech technology using ideas inspired by our limited understanding of how humans process speech in noisy acoustic environments.



Music Technology for Music Production, Broadcast & DJing
Speaker(s) : Alexander Lerch - zplane.development  
When : Fri 16th March 2012 11:00
Where : Eng 207

zplane.development is a technology provider to the music industry. The licensable portfolio includes music processing technology such as effects and time-stretching/pitch-shifting as well as music analysis technology including beat tracking, key detection, and classification approaches. The talk will cover several zplane technologies and products; it will conclude with current developments in the music software market.
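Beat tracking, one of the analysis technologies listed above, typically starts from a tempo estimate. The sketch below shows a textbook approach, not zplane's implementation: autocorrelate an onset-strength envelope and pick the strongest lag within a plausible BPM range. It assumes the envelope has already been computed at a known frame rate; the synthetic input and BPM range are illustrative choices.

```python
import numpy as np

def estimate_tempo(onset_env, frame_rate, bpm_range=(60.0, 200.0)):
    """Tempo in BPM from the strongest autocorrelation peak within bpm_range."""
    env = onset_env - np.mean(onset_env)
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # non-negative lags only
    min_lag = int(round(frame_rate * 60.0 / bpm_range[1]))
    max_lag = int(round(frame_rate * 60.0 / bpm_range[0]))
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag

# Synthetic onset envelope: one onset every 0.5 s at 100 frames per second
frame_rate = 100
onsets = np.zeros(1000)
onsets[::50] = 1.0
print(estimate_tempo(onsets, frame_rate))
```

Restricting the lag range matters: without it, multiples of the beat period (here 60 BPM) compete with the true tempo, the classic octave-error problem of tempo estimation.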

Alexander Lerch studied Telecommunications at the Technical University Berlin and Tonmeister (Sound Engineering) at the University of the Arts Berlin. He received his PhD on algorithmic music performance analysis from the Technical University Berlin. In 2000, he co-founded the company zplane.development, a research-driven technology provider for the music industry. At zplane, he works on the design and implementation of algorithms for music processing and music information retrieval. In addition to his work at zplane, he lectures at the audio communications department of the Technical University Berlin and has been actively contributing to the MIR research field.



Loudness: Considerations from Psychoacoustics to Audio
Speaker(s) : Konstantinos Pastiadis - Aristotle Univ. of Thessaloniki  
When : Wed 29th February 2012 14:00
Where : Electronic Engineering, Room 209

This talk discusses issues of loudness equalization, which is important for various types of psychophysical research and applications. At the same time, it demands particular attention from the entertainment industry, the music industry, mass media and broadcasting. Our approach departs from basic findings on loudness perception and their application in psychoacoustics, and arrives at a discussion of the technical issues in the world of entertainment. It addresses major dichotomies between psychoacoustic research and audio applications. The discussion draws on principles and aspects of computational modeling and on considerations of the cognitive processes involved in loudness perception.
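The gap between audio practice and psychoacoustics shows up in even the simplest equalization scheme. The sketch below normalises signals to a common unweighted RMS level, a crude engineering proxy that ignores frequency weighting and level dependence, and, for contrast, applies the psychoacoustic rule of thumb that loudness in sones doubles for every 10 phon (roughly valid above 40 phon). The target level and test signal are illustrative assumptions, not methods from the talk.

```python
import numpy as np

def rms_dbfs(x):
    """RMS level in dB relative to a full-scale value of 1.0 (no frequency weighting)."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(max(rms, 1e-12))

def normalize_rms(x, target_dbfs=-20.0):
    """Scale x so that its unweighted RMS level equals target_dbfs."""
    gain_db = target_dbfs - rms_dbfs(x)
    return x * 10.0 ** (gain_db / 20.0)

def phon_to_sone(phon):
    """Stevens' rule: loudness doubles for every 10 phon (reference 40 phon = 1 sone)."""
    return 2.0 ** ((phon - 40.0) / 10.0)

# A 1 kHz sine at half full scale, one second at 48 kHz
t = np.arange(48000) / 48000.0
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
levelled = normalize_rms(tone, -20.0)
```

Two signals levelled this way can still differ in perceived loudness if their spectra differ, which is precisely why perceptual models are needed for loudness equalization.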

Dr. Konstantinos Pastiadis received his degree in Electrical & Computer Engineering from the Department of Electrical & Computer Engineering, Aristotle University of Thessaloniki, where he also received his PhD in voice/speech signal processing. He is currently a Lecturer in Musical Acoustics, Psychoacoustics and Signal Processing at the Department of Musical Studies, Aristotle University of Thessaloniki. He also holds a degree in music from the Yamaha Music Foundation and is a piano and keyboards performer. His research and teaching interests in psychoacoustics and signal processing include psychophysical methods and their applications in music perception; computational models of auditory physiology and of the perception and production of musical signals; cochlear/hearing implants; tests and systems for the acquisition, processing and analysis of objective and behavioural data; and the acoustics of musical instruments and the singing voice.



OpenEMI - labels, developers, and the future of music apps
Speaker(s) : Kara Mukerjee - EMI Music  
When : Wed 22nd February 2012 15:00
Where : Electronic Engineering, Room 209

Between a booming app market and the rapid uptake of music subscription services, record labels are starting to embrace and even develop new consumption technologies and revenue models. By taking a collaborative approach and nurturing the creativity of developer communities, labels can provide their legal, financial, and marketing expertise to help developers navigate the notoriously tricky terrain from idea conception to revenue generation and distribution.

This talk will give an overview of how new consumption behaviours and a rapidly expanding app market are affecting record companies' marketing strategies and revenue models. We will focus on EMI's "OpenEMI" project, which aims to foster relationships between artists, software developers, record companies and other industry bodies in order to explore new revenue channels and encourage the creation of innovative digital platforms and services.

Kara Mukerjee is Digital Projects Manager in EMI Music's London-based Digital Strategy team, focusing on the company's OpenEMI project, which was launched in late 2011 to build closer and more innovative links between EMI and the technology development sector. From 2007 to 2011 Kara oversaw the online activities of EMI Music Australia's frontline marketing team, including heading up the world's first major-label music blog, "The In Sound From Way Out" (www.theinsoundfromwayout.com).

Prior to joining EMI, Kara held a variety of roles at Digital Radio Australia and in the digital division of Fairfax, Australia's leading media company. She holds a BSc in Media & Communications from the University of Sydney and has dabbled in the study of mechatronic engineering, software programming, electron microscopy, and A/V production. In her personal time she designs artist websites and has been known to fight crowd members for a TV On The Radio setlist.



Art, Code and Platforms
Speaker(s) : Nick Rothwell - Cassiel  
When : Wed 15th February 2012 15:00
Where : Electronic Engineering, Room 209

A look at ten years of digital art projects, reflecting on software platforms, languages and creative process.

Nick Rothwell is a composer, performer, software architect, programmer and sound artist. He has built media performance systems for projects with Ballett Frankfurt and Vienna Volksoper (choreographer: Michael Klien) and Braunarts, and interactive installations for Sonic Arts Network, TECHNE (Istanbul) and the Kinetica kinetic art fair (London). He has worked at STEIM (Amsterdam), CAMAC (Paris) and ZKM (Karlsruhe) and has composed soundtracks for choreographers Aydin Teker (Istanbul) and Richard Siegal (Laban Centre), and performed with Laurie Booth (Dance Umbrella, New Territories), and at the Different Skies Festival (Arcosanti, Arizona), the ICA, and the Science Museum's Dana Centre.

As part of the Monomatic project he worked on the design and programming of a laser-controlled virtual church bell tower as the headline art commission for Sound and Music's Expo Festival in 2009, and a magnetically-triggered modular music box shown at Kinetica, Netaudio London (at the Roundhouse) and the BEAM festival.

As a collaborator with body>data>space he has developed performance systems and sound scores for projects at CIANT (Prague) and in London. He has recently worked on choreographic visualisation tools for Wayne McGregor|Random Dance at Sadler's Wells, interactive sound and sensing systems for Eddie Ladd's ongoing tour of Ras goffa Bobby Sands (The Bobby Sands Memorial Race), and music composition for Shobana Jeyasingh Dance Company. He is currently working with Simeon Nelson and Rob Godman, designing and programming an algorithmic physics animation for large-scale outdoor projection in Poland (Skyway Festival), Estonia and Lumiere Durham.



Seeing with your ears? Image to sound sensory substitution
Speaker(s) : Michael Proulx - QMUL   Dave Brown - QMUL  
When : Wed 8th February 2012 15:00
Where : Electronic Engineering, Room 209

A sensory substitution device for visually impaired people aims to provide the missing visual input by converting images into a form, such as sound, that another sensory modality can perceive. Here we introduce our psychological research on one sensory substitution device, The vOICe. First, Michael will discuss the development of sensory substitution devices and The vOICe, and describe some of the research that has been conducted with it. Then Dave will present some of his research with the device, including current work on perceptual learning, and conclude with some possible future directions for our research.

Michael Proulx is a Lecturer in Cognitive Psychology and has been with the School of Biological and Chemical Sciences at QMUL since 2008. His first degree is from Arizona State University, and he earned his MA and PhD at the Johns Hopkins University.

Dave Brown is a PhD student in Psychology working in collaboration with Proulx. He first started working with sensory substitution at the University of Sussex, where he earned his BSc and MRes.



Audio processing for radio broadcast and DJs
Speaker(s) : Mark Hills - freelance software developer  
When : Wed 11th January 2012 15:30
Where : Electronic Engineering, Room 209

This talk gives an overview of Linux-based audio software developed by the author and used by several major radio broadcasters for digital broadcast. We go into considerable detail on a mastering processor, used for live processing of a radio station's final output. We discuss more generally the methods used in this and similar systems, which are potentially applicable in 'auto mix' and other systems correcting or dealing with variation in audio. The talk is followed by a short introduction to the 'xwax' software, which is an open-source program used by DJs for live performance.

Mark Hills is a freelance software developer. His interests are in image and audio processing, and for the last 10 years he has worked principally in radio broadcast and film post-production. He has a PhD in Computer Vision from the University of Nottingham.



Digital Music Research Network (DMRN+6) - One Day workshop
Speaker(s) : Professor Elaine Chew - Queen Mary, University of London  
When : Tue 20th December 2011 10:30
Where : Arts One Lecture Theatre

Queen Mary's Digital Music Research Network is holding a one-day workshop on Tuesday 20th December. The day will include several talks on the latest research, including one by our keynote speaker, Professor Elaine Chew, who will speak about 'Building Bridges - Creating Sustainable Interdisciplinary Collaborations between Musicians and Engineers'. Posters will also be on display throughout the day.

The workshop will also be an ideal opportunity for networking with others working in this field.

To register your interest in attending the workshop, visit www.elec.qmul.ac.uk/dmrn/events/dmrnp6/. The final day for registration is Friday 9th December 2011.



Tutorial: 'Basic aspects and concepts of musical rhythm'; Seminar: 'The effect of self-motion on judgments of musical tempo'
Speaker(s) : Justin London - Carleton College, MN, USA  
When : Fri 9th December 2011 13:30
Where : BR 3.01

This event comprises the tutorial 'Basic aspects and concepts of musical rhythm' and the seminar 'The effect of self-motion on judgments of musical tempo'. The tutorial is expected to last between 90 and 120 minutes and will be followed by the seminar.

Tutorial topics

  • Distinction between rhythm and meter
  • Beats and beat entrainment
  • Tempo
  • Psychological aspects of rhythm perception and motor behavior
  • Musical and psychoacoustic rhythm terminology
  • Rhythmic coherence

Seminar abstract
Converging evidence from neuroscience (Chen, Penhune, & Zatorre 2009; Grahn and McAuley 2009) and behavioral studies (Repp 2005) points to an intimate link between rhythm perception and production. This presentation will report on recent experiments in which listeners tapped at two different rates (relative to the tactus) to melodies and percussive patterns presented at a wide range of tempos. Tapping rate was found to affect tempo judgment for some listeners; there is also an interaction between musical training, movement rate, and perceived tempo. These results are compared with those of Boltz (1998, 2011) regarding the effect of melodic structure, and Grahn & McAuley (2009) regarding neurological differences which may be related to differences in rhythmic sensitivity. More broadly, we posit that listening to a melody engages our mechanisms for tracking auditory objects in a way that a purely percussive pattern does not, and that our sense of the speed of a musical passage may depend in part on our sense of how fast we have to move in order to move with it.

Justin London is Professor of Music at Carleton College in Northfield, MN, USA, where he teaches courses in Music Theory, The Philosophy of Music, Music Perception and Cognition, and American Popular Music. Trained as a classical guitarist, he holds a Ph.D. in Music History and Theory from the University of Pennsylvania, where he studied with Leonard Meyer. He has written articles and reviews on a wide range of subjects, from humor in Haydn to the perception of complex meters. His book Hearing in Time (Oxford University Press, 2004) is a cross-cultural exploration of the perception and cognition of musical meter. In 2005-2006 he was a visiting scholar at the Centre for Music and Science of Cambridge University under the auspices of a UK Fulbright Foundation grant. He has given many talks and symposia, including at the Mannes Institute for Advanced Studies in Music Theory (New York, 2005), the International Orpheus Academy for Music & Theory (Ghent, Belgium, 2007), and the Interdisciplinary College (IK) in cognitive science (Günne, Germany, 2009 & 2010). He served as President of the Society for Music Theory from 2007 to 2009.



Information and Neural Dynamics in the Perception of Musical Structure
Speaker(s) : Marcus Pearce - QMUL  
When : Wed 7th December 2011 15:00
Where : Electronic Engineering, Room 207

Leonard Meyer (1956) distinguished designative meanings, whereby musical structures refer externally to non-musical events, from embodied meanings, where musical events refer, through psychological processes of implication or expectation, to other musical events. Meyer argued that embodied meanings are capable of producing affective states in the listener. Our goal is to understand the psychological processes involved in generating these embodied meanings, using dynamic probabilistic models of expectation in the cognitive and neural information processing of musical structure. We have developed dynamic probabilistic models of melodic prediction that use variable-order contexts and both long- and short-term musical structure, and that combine information from multiple musical features to predict note attributes such as pitch, onset time and duration. We have also developed a novel information-dynamic model based on the concept of predictive information rate, which measures how much information current observations provide about the future that is not already known from past observations. We use our information-dynamic models to make predictions about listeners' responses to music, which can then be tested empirically. We have shown, for example, that information-dynamic measures of surprise predict listeners' pitch expectations well: notes whose pitches are improbable given the preceding context are perceived as unexpected, and vice versa. These results generalise across a range of melodic contexts, including single intervals, English folk songs, chorale melodies and English hymns, and the models predict listeners' expectations better than existing rule-based models. Using EEG to investigate dynamic aspects of the neural mechanisms involved in musical expectation, we have shown that unexpected notes are associated with characteristic patterns of beta-band activation and phase-locking at centro-parietal scalp locations.
We have also used the information-dynamic models to predict other aspects of musical perception, such as phrase segmentation. We hypothesise that grouping boundaries in music correspond to points where the context fails to inform the listener about the identity of the next musical event. This might happen either because an unexpected (low-probability) event arrives or because the listener is simply uncertain about what will happen next (high entropy). We have produced evidence supporting this hypothesis both for phrase boundaries and for high-level form. This work suggests a relationship between dynamic changes in perceptual expectations and the cognitive representation of musical structure.
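The two quantities this abstract relies on, the surprise of a note (its information content, -log2 p) and the listener's uncertainty before it arrives (the Shannon entropy of the predictive distribution), can be illustrated with a toy example. The following is a minimal Python sketch, not the speaker's models: the pitch names and probabilities are hypothetical values chosen only for illustration.

```python
import math

def information_content(dist, event):
    """Surprise of an observed event in bits: -log2 p(event)."""
    return -math.log2(dist[event])

def entropy(dist):
    """Shannon entropy in bits: the expected information content of the distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical predictive distributions over the next pitch after some melodic context.
confident = {"C4": 0.5, "D4": 0.25, "E4": 0.125, "G4": 0.125}
uncertain = {"C4": 0.25, "D4": 0.25, "E4": 0.25, "G4": 0.25}

print(information_content(confident, "C4"))  # 1.0 bit: an expected note
print(information_content(confident, "G4"))  # 3.0 bits: a less expected note
print(entropy(confident))                    # 1.75 bits
print(entropy(uncertain))                    # 2.0 bits: maximal for four outcomes
```

In the terms of the boundary hypothesis above, a high information content corresponds to an unexpected event and a high entropy to a listener who is uncertain about what comes next.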

Educated in experimental psychology and artificial intelligence at Oxford and Edinburgh, Marcus Pearce received his PhD from City University, London in 2005, before continuing his research on music cognition at Goldsmiths, University of London. Following a year as a post-doctoral fellow working on neuroaesthetics in the Wellcome Laboratory of Neurobiology at University College London, he returned to Goldsmiths as a co-investigator on an EPSRC-funded project investigating information and neural dynamics in the perception of musical structure (http://www.idyom.org). He is currently a lecturer in sound and music processing in the Centre for Digital Music at Queen Mary, University of London. He has published widely on computational, psychological and neuroscientific aspects of music cognition, in particular on perceptual expectations and auditory grouping in music perception and production.



The future of digital sheet music
Speaker(s) : Nicolas Froment - MuseScore   Thomas Bonte - MuseScore  
When : Fri 2nd December 2011 13:00
Where : Electronic Engineering, Room 209

MuseScore is open source music notation software written in Qt/C++ and licensed under GPLv2. Three years ago, MuseScore was a rather obscure software package that only ran on Linux. Today, MuseScore runs on all platforms in 43 languages and rivals its commercial competitors, Finale and Sibelius. With more than 120,000 downloads per month, MuseScore has become curriculum software in music education worldwide.

Nicolas Froment and Thomas Bonte, two of the MuseScore core developers, will reveal how MuseScore is reinventing itself as a service in the cloud and as an app for mobile devices. The mission is to create a complete digital experience around sheet music, from desktop to web to mobile. At the numerous Music Hack Days the two have attended, they have worked with MIR researchers to extend this experience with a score follower based on chroma features, optical music recognition to import PDF scores, and many more nifty hacks.



On symbolic representations and transformations of sound: the theory of sound-types
Speaker(s) : Carmine Emanuele Cella - University of Bologna  
When : Wed 23rd November 2011 15:00
Where : Electronic Engineering, Room 207

The theory of sound-types is a new representation method for musical signals that, while being generic enough to be used for different signals, fulfils by design the following requirements:

  • signal-dependent semantics: the basis of the representation is inferred from the signal using learning techniques;
  • scalability: the degree of abstraction in the representation can be varied continuously, from the signal level to the symbolic level; the degree of abstraction becomes a parameter of the representation;
  • weak invertibility: the representation method is able to generate the represented signal; this does not imply, however, that the generated signal must be waveform-identical to the original one, but only that perceptually relevant parts of it can be reconstructed (hence the term weak);
  • generativity: it is possible to generate sounds other than the original one, according to parameters in the domain of the representation that can be estimated from a given signal or deliberately created.

After presenting the basic ideas, this talk will show a full analysis-synthesis framework and some applications to real signals.

Carmine Emanuele Cella studied at the Conservatory of Music "G. Rossini" in Italy, earning diplomas in piano, computer music and composition; he also studied mathematics and earned a PhD in mathematical logic at the University of Bologna, working on symbolic representations of music. As a composer he has won many prizes, including the prestigious Petrassi prize for composition, awarded by the President of the Italian Republic, Giorgio Napolitano. From 2007 to 2009 he was a researcher in the Analysis/Synthesis team at IRCAM in Paris, and he is currently composer in residence at the same institute.



Organising music for movies: from academic research to professional practice
Speaker(s) : Charlie Inskip - London Metropolitan University  
When : Wed 2nd November 2011 15:00
Where : Electronic Engineering, Room 207

Music is widely used to accompany moving images, in films, advertising, television programmes and computer games. The process of choosing and using a piece of pre-existing commercial music for this purpose is known as synchronisation. The addition of music to a piece of film enhances the final work with cultural meaning, and generates additional income for the rights holders. This talk discusses the information needs of professionals involved in the selection of music, including Users from the advertising and film communities and Owners from the recording and publishing industries. Four discourses, or interpretive repertoires, are identified, which carry conflicting meanings of music and are employed throughout the community, although relative emphases vary according to the viewpoint of the stakeholder. A comparison is drawn between the emphasis of the repertoires and the precision of bespoke music search engines. This is used to make recommendations on how to improve the disintermediated communications process, by emphasising the repertoires employed by the Users rather than those of the Owners.

In this talk a cataloguing scheme informed by this research and designed to reflect the Users' way of thinking about music is presented and discussed. This scheme has been manually applied to a collection of 3.5k commercially available recordings as part of the ongoing development by an independent record company of a web-based application to aid in the search for music to accompany moving images.

Charlie Inskip worked in PR and artist management in the music industry for twenty years. He recently took a Masters in Library and Information Science. This was swiftly followed up with an AHRC-funded PhD which investigated the communications processes and information needs of creative professionals in the music and media industries when searching for and using music to accompany moving images. This allowed him to spend time combining his inside knowledge of the music and media industry with more recently acquired understanding of state-of-the-art information management theories and practices. Since he was awarded his doctorate Charlie has been working with a record company on the development of a web-based application to aid in the search for music for moving images. He is also a Senior Lecturer in Music and Media Management at London Metropolitan University.



Musical Trajectories: Humour, Structure, and Interpretation
Speaker(s) : Elaine Chew - QMUL  
When : Wed 19th October 2011 16:00
Where : Electronic Engineering, Room 209

A firm understanding of music structure and mastery of musical prosody are key components of a performing musician's toolkit. This talk is centred on methods and metaphors in visualisations of music structure and expressive performance as aids to illuminating what it is that musicians do. The presentation begins with some work on the representation of tonality and algorithms for tonal analysis using Chew's spiral array model, and their application through an interactive software system, MuSA.RT, to the visualisation of music structure, in particular violations of tonal expectations, as employed by P.D.Q. Bach as a laughter-inducing device. The second part of the talk focuses on performance, and begins with an introduction to the ESP driving metaphor for expressive performance, describing how the roads map to musical interpretations. Some limitations of this intuitive interface led to ensuing work on more detailed analyses of expressive nuances, to be illustrated through selected student projects. The talk concludes with an analysis that connects interpretation to music structure, based on a study of tempo variations in performances of Beethoven's Moonlight Sonata by Barenboim, Pollini, and Schnabel.

MuSA.RT was developed in collaboration with Alexandre François, using his software architecture for interactive software systems, and ESP with Jie Liu and Alexandre François. The Beethoven analysis was inspired by a lecture by Jeanne Bamberger. Thanks are owed to Dan Tidhar, whose invitation to a music visualisation workshop at King's College London helped shape the content of this presentation.

Elaine Chew is Professor of Digital Media at the Centre for Digital Music at Queen Mary, University of London. An operations researcher and pianist by training, her research activities aim to explain and de-mystify the phenomenon of music and its performance through the use of formal scientific methods. As a performer, she designs and curates concerts featuring interactive scientific music visualisations, and collaborates with composers to present eclectic post-tonal music. Prof. Chew received her PhD and SM degrees in operations research from the Massachusetts Institute of Technology, a BAS in music (distinction) and mathematical and computational sciences (honors) from Stanford University, and FTCL and LTCL diplomas in piano performance from Trinity College, London. She began her academic career at the University of Southern California, where she was awarded various NSF grants and the Presidential Early Career Award in Science and Engineering for research and education activities at the intersection of music and engineering. She was the 2007-2008 Edward, Frances, and Shirley B. Daniels Fellow at the Radcliffe Institute for Advanced Study at Harvard University. In 2009, Albany Records released a CD, Doubles, featuring her performance of English-American composer Peter Child's bitonal pieces based on Chinese and Malay songs from her childhood.



Acoustic instrument augmentation: motivation, techniques and results
Speaker(s) : Andrew McPherson - QMUL  
When : Wed 12th October 2011 16:30
Where : Electronic Engineering, Room 207

Musical instrument augmentation refers to the process of adding new capabilities to existing instruments through electronic or other means. This talk will discuss the methods and implications of augmenting acoustic instruments, with a particular focus on the magnetic resonator piano (MRP), a hybrid acoustic-electronic grand piano based on electromagnetic string actuation and continuous key motion sensing. By extending rather than replacing traditional instruments, augmented instruments draw on the advanced training of expert performers, promoting ready integration in the concert hall: the MRP has been used in performances across the United States in collaboration with several professional and conservatory-student pianists and composers. The talk will conclude with a discussion of related future research directions, including modelling of expressive physical gesture in performance, low-latency processing of multiple asynchronous sensor data streams, and embedded audio systems for the creation of self-contained augmented instruments.

Andrew McPherson joined Queen Mary University of London as Lecturer in Digital Media in September 2011. He holds a PhD in music composition from the University of Pennsylvania and an M.Eng. in electrical engineering from the Massachusetts Institute of Technology. Prior to joining Queen Mary, he was a postdoc in the Music Entertainment Technology Laboratory at Drexel University, supported by a Computing Innovation Fellowship from the Computing Research Association and NSF. Current research topics include electronic augmentation of the acoustic piano, new musical applications of multi-touch sensing, quantitative studies of expressive performance technique, and embedded audio processing systems. He remains active as a composer of orchestral, chamber and electronic music, with performances across the United States and Canada, including at the Tanglewood and Aspen music festivals.



Interfaces and dependencies: reflections on development of AudioMulch and PortAudio
Speaker(s) : Ross Bencina - AudioMulch  
When : Wed 5th October 2011 16:00
Where : Electronic Engineering, Room 207

AudioMulch is modular audio processing software for musical performance and improvisation. PortAudio is a cross-platform open source library for real-time audio input/output. This talk will reflect on experience developing these and other audio and music projects. In developing these systems, challenges often arise outside the field of digital audio signal processing: at the boundaries where a program interfaces to software frameworks, to the operating system, to plugins, to real-time hardware, and to computer networks. The talk will discuss specific practical issues encountered in these areas.

Ross Bencina composes and performs improvised music using computer processed sound. He has developed software for this purpose, AudioMulch, which he has distributed on the internet since 1998. Since the mid-90s Ross has performed both solo and in collaboration with acoustic instrumental musicians and electronic performers. Ross is interested in the possibilities for musical expression offered by computer processed sound. To this end he is associated with various open source software projects aimed at supporting and extending musical uses of computers. Such projects include: PortAudio, oscpack, and reacTIVision -- the computer vision component of the reacTable. Ross studied music at La Trobe University and he is currently based in Melbourne, Australia.

AudioMulch is available for Mac and Windows from www.audiomulch.com.

You can read Ross’ blog at www.rossbencina.com.



C4DM's 10th anniversary: Past, Present & Future
Speaker(s) :
When : Wed 14th September 2011 10:00
Where : School of Electronic Engineering & Computer Science, Queen Mary

We are celebrating 10 years since Digital Music research began at Queen Mary, University of London. There will be a series of keynote talks throughout the day, a reception and an opportunity to look at our latest facilities on site, including a new state-of-the-art listening room, recording studio and performance space.

[Registration for this event is now closed.]

Videos from the event are available online.



Compressive MUSIC: A Missing Link between Compressive Sensing and Array Signal Processing for Joint Sparse Recovery
Speaker(s) : Jong Chul Ye - KAIST, Korea  
When : Fri 29th July 2011 16:00
Where : Electronic Engineering, Room 209

The multiple measurement vector (MMV) problem addresses the identification of unknown input vectors that share a common sparse support. Although MMV problems have traditionally been addressed within the context of sensor array signal processing, the recent trend is to apply compressive sensing (CS) because of its ability to estimate the sparse support even with an insufficient number of snapshots, in which case classical array signal processing fails. However, CS guarantees accurate recovery only in a probabilistic sense, and it often performs worse in the regime where traditional array signal processing approaches succeed. The apparent dichotomy between probabilistic CS and deterministic sensor array signal processing has not been fully understood. The main contribution of this work is a unified approach that reveals a missing link between CS and array signal processing. The new algorithm, which we call compressive MUSIC, identifies part of the support using CS, after which the remaining support is estimated using a novel generalized MUSIC criterion. Using a large-system MMV model, we show that compressive MUSIC requires fewer sensor elements for accurate support recovery than existing CS methods, and that it can approach the optimal l0-bound with a finite number of snapshots.

This talk is based on our recent publication: J.M. Kim, O.K. Lee, and J. C. Ye, "Compressive MUSIC: A Missing Link between Compressive Sensing and Array Signal Processing",  to appear in IEEE Trans. on Information Theory, 2011. (http://bisp.kaist.ac.kr/papers/CompMUSIC-KimLeeYe_(double).pdf) 
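The generalized MUSIC criterion described in this talk builds on the classical MUSIC idea for the MMV problem Y = AX, where the unknown matrix X has k non-zero rows (the common sparse support). The following is a minimal NumPy sketch of that classical step only, not of compressive MUSIC itself, which first identifies part of the support with CS. It assumes noiseless data and at least k linearly independent snapshots, which is exactly the regime where classical MUSIC succeeds: a column a_j of A lies in the signal subspace spanned by Y precisely when j is in the support. The function name, matrix sizes and random seed are illustrative assumptions.

```python
import numpy as np

def music_support(Y, A, k):
    """Estimate a k-sparse joint support from snapshots Y = A @ X with classical MUSIC.

    Assumes noiseless data and rank(Y) == k (at least k independent snapshots).
    """
    # Orthonormal basis of the signal subspace: the leading k left singular vectors.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    Us = U[:, :k]
    # Residual of each normalized column of A after projection onto the signal subspace.
    An = A / np.linalg.norm(A, axis=0)
    residual = np.linalg.norm(An - Us @ (Us.conj().T @ An), axis=0)
    # Columns lying (numerically) inside the signal subspace form the estimated support.
    return np.sort(np.argsort(residual)[:k])

rng = np.random.default_rng(0)
m, n, k, snapshots = 20, 50, 3, 5
A = rng.standard_normal((m, n))                     # m sensors, n candidate atoms
support = np.sort(rng.choice(n, size=k, replace=False))
X = np.zeros((n, snapshots))
X[support] = rng.standard_normal((k, snapshots))    # k non-zero rows shared by all snapshots
Y = A @ X

print(support, music_support(Y, A, k))  # the two supports should match
```

When fewer than k independent snapshots are available, the signal subspace can no longer be estimated in full and this criterion breaks down. That is the regime in which the abstract says CS helps, and which compressive MUSIC targets by recovering part of the support with CS first.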

 

Jong Chul Ye received the B.Sc. and M.Sc. degrees with honors from Seoul National University, Korea, and the Ph.D. degree from the School of Electrical and Computer Engineering, Purdue University, West Lafayette. Before he joined KAIST as an assistant professor in 2004, he worked as a research scientist at GE Global Research Center, NY, Philips Research, NY, and the University of Illinois at Urbana-Champaign. His current research interests include compressed sensing theory and statistical signal processing for imaging modalities such as MRI and NIRS. His awards include the Guerbet Paper Award from the Korean Society for Magnetic Resonance in Medicine (2010) and a best paper award from the Korean Human Brain Mapping Society (2009), and he was the winner of the 2009 ISMRM Recon Challenge at the ISMRM Workshop.



Interactive Music Apps on Mobile Devices
Speaker(s) : Martin Macmillan - Bounce Mobile  
When : Fri 10th June 2011 15:00
Where : Electronic Engineering, Room 207

Consumers are showing an increasing interest in interacting with music rather than just listening to it. Today's smartphone technology has lowered the barriers to creating content, and a connected environment provides an ideal mechanism for sharing it. The seminar will look at some of the business drivers behind the apps market, the challenges of licensing music from record labels and music publishers, and where the market is going, as well as the technology and innovation required to support market growth.

Martin Macmillan created Bounce Mobile in early 2010 to focus exclusively on providing new ways to interact with digital music using mobile devices. Since 2000 he has been involved in creating and building software startups, and his previous businesses have won a number of awards, including Deloitte Fast50 recognition. Martin has an MA from St Andrews University and is fanatical about music and technology and how they can be combined to create scalable consumer products.



Music Understanding and the Future of Music Performance
Speaker(s) : Roger B. Dannenberg - Carnegie Mellon University  
When : Thu 2nd June 2011 15:00
Where : Bancroft Road Teaching Room 3.02

This is an EECS Distinguished Lecturer Seminar.



Pitch Shifting - Hooked Since 1975
Speaker(s) : Alex U. Case - Sound Recording Technology, University of Massachusetts Lowell, USA  
When : Thu 12th May 2011 14:30
Where : Electronic Engineering, Room 209

David Bowie made a particularly musical use of the recording studio when he collaborated with Carlos Alomar and John Lennon in the creation of “Fame.” In a one-day session at Electric Lady Studios, New York, in 1975, that one word, that single syllable, inspired a pop music hook built on pitch shifting. Despite the seemingly limitless capability of the hardware and software tools available in the studio today, we still have much to learn from this recorded work of art more than three and a half decades later. Reverse engineering the audio engineering that led to the iconic descending line, “Fame, fame, fame, ...”, points to extraordinary uses of humble technologies, a testament to the value of creative drive and extraordinary musicianship - relevant motivators in today’s production environment.

 

Alex U. Case is an Associate Professor of Sound Recording Technology at the University of Massachusetts Lowell.  With degrees in Mechanical Engineering, Music, and Acoustics, Professor Case has dedicated his professional life to the study of aesthetics, perception, signal processing, electro-acoustics and room acoustics for loudspeaker-mediated art and information.

His research and professional activities focus heavily on the technical foundations, creative motivations, and aesthetic merit of recording and signal processing techniques used in multitrack production. Case is a widely published author, with over 100 articles appearing in multiple journals and industry trade publications. He has written the authoritative guide to audio signal processing in multitrack production, entitled Sound FX – Unlocking the Creative Potential of Recording Studio Effects, published by Focal Press. The new book, Mix Smart - Pro Audio Tips for Your Multitrack Mix, will be released in the summer of 2011.

Case is a committed educator.  In addition to his undergraduate and graduate teaching, Case has given many invited lectures and master classes across the U.S. in such prestigious programs as Berklee College of Music, Emerson College, Boston University, Johns Hopkins University, Penn State, New England Institute of Art, New York University, Rensselaer Polytechnic Institute, University of Hartford, and internationally at institutions including the Banff Centre for the Arts in Canada, Fermatta Academy of Music in Mexico City, McGill University in Montreal, SAE Institutes in Milan and Paris, and the Shanghai Conservatory of Music in China.

The Audio Engineering Society is central to Case’s career. He serves on the Awards Committee, Education Committee, and Membership Committee, and has served on the Convention Planning Committees in New York and San Francisco. He has been a featured speaker and panelist at multiple regional meetings, and has taught several standing-room-only tutorials at AES International Conventions in Europe and the U.S. Case is a Fellow of the Acoustical Society of America, serves as Chair of its Technical Committee on Architectural Acoustics, and has been an invited contributor of more than a dozen papers.



Reality is Not a Recording/A Recording is Not Reality
Speaker(s) : Jim Anderson - Clive Davis Institute of Recorded Music, NYU, USA  
When : Thu 12th May 2011 14:00
Where : Electronic Engineering, Room 209

The former New York Times film critic, Vincent Canby, wrote “all of us have different thresholds at which we suspend disbelief, and then gladly follow fictions to conclusions that we find logical.” Any recording is a ‘fiction,’ a falsity, even in its most pure form. It is the responsibility, if not the duty, of the recording engineer, and producer, to create a universe so compelling and transparent that the listener isn’t aware of any manipulation. Using basic recording techniques, and standard manipulation of audio, a recording is made, giving the listener an experience that is not merely logical but better than reality. How does this occur? What techniques can be applied? How does an engineer create a convincing loudspeaker illusion that a listener will perceive as a plausible reality? Recordings will be played.

 

Jim Anderson is an internationally recognized recording engineer and producer of acoustic music for the recording, radio, television, and film industries. He is the recipient of numerous awards and nominations in the recording industry: his recordings have received nine Grammy awards and 25 Grammy nominations; his radio recordings have received two George Foster Peabody Awards and television programs have received two Emmy nominations.
 
A graduate of the Duquesne University School of Music in Pittsburgh, PA, Jim studied audio engineering at the Eastman School of Music and Sender Freies Berlin. During the 1970s, he was employed by National Public Radio, where he engineered and produced many award-winning classical, jazz, documentary, and news programs. Since 1980 Jim has worked as an independent audio engineer and producer, living in New York City. He has been a frequent lecturer and speaker for the Audio Engineering Society and a master-class guest faculty member at leading international institutions, including the Berklee College of Music, The New England Institute of Art, McGill University, The Banff Centre, Universität der Künste in Berlin, University of Luleå in Sweden, The New School University, University of Georgia, Tokyo National University of Fine Arts and Music, and Penn State University. He is a professor of recorded music in the Clive Davis Department of Recorded Music in the Tisch School of the Arts at New York University and was the department’s Chair from 2004 to 2008.
 
He has served as Vice President for Eastern Sections of the Audio Engineering Society (AES), chaired the New York Section of the AES, and was Chair of the 119th and 123rd AES Conventions. In 2006 he was made a Fellow of the AES, and he has received two AES Board of Governors Awards. He was President of the Audio Engineering Society in 2008-2009.



Co-operative Music Applications
Speaker(s) : Neil Cosgrove - LNX Studio  
When : Wed 20th April 2011 14:30
Where : Electronic Engineering, Room 209

Making music on your computer at home can be a lonely experience, and designing co-operative music software that works on the internet can be both challenging and complex. The seminar will demonstrate a system that can mirror the studio experience across several computers. When designing such projects, many decisions have to be made about both its features and the functionality of the underlying model. The key areas that will be discussed are:
- what goes into the co-operative experience.
- designing a network protocol for such a system.
- synchronising clocks at different locations.
- how networking and its inherent latency affect a consistent model.
The software being demonstrated works on Mac OS X 10.4 or above. Participants are encouraged to bring their own laptops if they wish to join in on a live collaboration.
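The abstract lists clock synchronisation between locations as a key design question but does not say how LNX_Studio handles it. As a hedged illustration only, the sketch below uses the standard NTP-style four-timestamp exchange, which assumes the network delay is the same in both directions; the function names are hypothetical and are not taken from LNX_Studio.

```python
from typing import Iterable, Tuple

def offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """NTP-style estimate from one request/reply exchange (times in seconds).

    t0: local send time      t1: remote receive time
    t2: remote reply time    t3: local receive time
    Assumes the outbound and return network delays are equal.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0  # remote clock minus local clock
    delay = (t3 - t0) - (t2 - t1)           # round trip, excluding remote processing
    return offset, delay

def best_offset(exchanges: Iterable[Tuple[float, float, float, float]]) -> float:
    """Return the offset from the exchange with the smallest round-trip delay,
    a common heuristic because short round trips suffer least from jitter."""
    return min((offset_and_delay(*e) for e in exchanges), key=lambda od: od[1])[0]
```

For example, if the remote clock runs 5 s ahead and each direction takes 0.1 s, the exchange (10.0, 15.1, 15.12, 10.22) yields an offset of about 5.0 s and a round-trip delay of about 0.2 s. Asymmetric routes bias the offset by half the asymmetry, which is one reason latency also matters for keeping a shared model consistent.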

Neil Cosgrove has 14 years of experience designing audio software in the SuperCollider language. Building on the knowledge gained from small and novel concepts, he now focuses on standalone applications. Being both a programmer and a bedroom producer, he has always enjoyed both using and creating fun musical gadgets. As an independent developer he is an advocate of open source software, and he is currently working on LNX_Studio, an application designed for networked music collaborations.

Website: http://lnxstudio.sourceforge.net/



What are the properties of a musical sound that identify the family and register of the instrument?
Speaker(s) : Roy Patterson - University of Cambridge  
When : Fri 8th April 2011 15:00
Where : Electronic Engineering, Room 209

This talk is about the sounds made by instruments of the orchestra, including singers. It is intended to explain the basics of musical note perception; that is, why instruments come in families; what determines ‘register’ within families; and why we hear distinctive differences between members of a given instrument family - even when they are playing the same note. On the surface, the answers to these questions may seem obvious; one could say that brass instruments all make the same kind of sound because they are all made of brass, and the different members of the family sound different because they are different sizes. But there is a deeper explanation involving three acoustic properties of musical sounds, as they occur in air. The talk describes these properties (with audio demos) and explains why they are particularly useful in (a) summarizing the physics of note production by instruments, on the one hand, and (b) explaining the dimensions of musical note perception, on the other hand.


Roy D. Patterson received a B.A. degree from the University of Toronto in 1967 and a Ph.D. on residue pitch perception from the University of California in 1971. From 1975 to 1995, he was a research scientist with the UK Medical Research Council, focusing on the measurement of auditory filter shape. He also designed and helped implement auditory warning systems for aircraft, hospitals and fire stations.

Since 1996, he has been the head of the Centre for the Neural Basis of Hearing in the Department of Physiology, Development and Neuroscience at the University of Cambridge, UK. The focus of his current research is an ‘Auditory Image Model’ of auditory perception and how it can be used to 1) normalize communication sounds for glottal pulse rate and vocal tract length, and 2) produce a size-invariant representation of the message in communication sounds at the syllable level.

Dr Patterson is a Fellow of the Acoustical Society of America and has published over 100 articles in JASA and other international journals. See http://www.pdn.cam.ac.uk/groups/cnbh/



CompMusic: Computational Models for the Discovery of the World's Music
Speaker(s) : Xavier Serra - Universitat Pompeu Fabra, Barcelona  
When : Thu 7th April 2011 14:00
Where : Electronic Engineering, Room 209

This is a joint Distinguished Lecturer Seminar and Centre for Digital Music Seminar.



Music research and engineering at the Echo Nest
Speaker(s) : Brian Whitman - The Echo Nest  
When : Wed 6th April 2011 14:00
Where : Electronic Engineering, Room 209

The Echo Nest is a music intelligence company powering smarter music applications for leading media companies and thousands of independent developers through our API. We'll discuss how we analyze the entire world of music automatically - 1.5 million artists, 30 million songs - and show some examples of the imminent future of the music experience made possible by developers tapping into the data.

Brian Whitman teaches computers how to make, listen to, and read about music. He received his doctorate from the Machine Listening group at MIT’s Media Lab in 2005 and his master's in Computer Science from Columbia University’s Natural Language Processing group in 2000. His research links automatically extracted community knowledge of music to its acoustic properties to “learn the meaning of music.” His composition and sound art projects consider the effects of machine interpretation of large amounts of media, such as “Eigenradio”, the first actual “computer music” (as in music for computers). As the co-founder and CTO of the Echo Nest Corporation, Brian architects an open platform with billions of data points about the world of music: from the listeners to the musicians to the sounds within the songs.



Animal Acoustic Communication in Noisy Social Environments
Speaker(s) : Vivek Nityananda - Queen Mary University of London  
When : Wed 30th March 2011 15:30
Where : Electronic Engineering, Room 105

Many animals communicate in large groups with multiple individuals. Hearing individual voices or calls in such groups is a difficult problem, which each of these animals (frogs in a chorus, penguins in a breeding colony, humans at a party) manages to solve in order to communicate. A large body of work has investigated the acoustic and spatial cues that humans use to distinguish individual sources of sounds and solve this so-called 'cocktail party problem'. However, we know very little about how other animals manage to achieve the same in their acoustic environments. Using a series of acoustic playback experiments, we investigated how Cope's gray tree frog (Hyla chrysoscelis) segregates sources of sound in its environment. In this talk, I present results from these experiments and discuss how both spatial and frequency cues help treefrogs hear in their natural acoustic environments.

 

Dr. Vivek Nityananda completed his Ph.D. at the Centre for Ecological Sciences at the Indian Institute of Science in Bangalore, India. His doctoral work focused on the sensory ecology of acoustic communication in bushcrickets. He has since worked on frog hearing at the University of Minnesota and is currently working at the School of Biological and Chemical Sciences at Queen Mary on a Human Frontiers in Science postdoctoral research grant. The focus of his current research is visual search and attention in bumblebees.



Writing High-Level Game Audio Tools
Speaker(s) : Nicolas Fournel - Sony Computer Entertainment Europe  
When : Wed 9th March 2011 15:30
Where : Electronic Engineering, Room 105

Larger game worlds and more immersive levels require the creation of an increasingly high number of audio assets, many of them having to exhibit a dynamic behaviour. Working on such complex projects with usually small sound teams quickly becomes a challenge and conventional audio tools and scripting systems are often inadequate.

High-level audio tools should both help the sound designers to express their creativity and allow them to boost their productivity. During this presentation, we will examine what constitutes such a tool. In particular, we will see how the next generation of tools will need to interface with the other game subsystems and will be able to leverage the power of audio analysis and procedural generation.


Nicolas Fournel has 20 years of experience developing commercial digital audio software. He started his career programming sample editors in assembler on the Amiga and later founded Synoptic, a company that specialized in audio software synthesis on PC in the 90s. He then joined the game industry, where he spent the last 11 years designing multi-platform audio engines and tools for companies such as Factor 5, Konami and EA in their central technology department.

Now a principal programmer at Sony Computer Entertainment Europe, he focuses on building innovative audio systems that empower sound designers while keeping technology invisible. His main interests include audio feature extraction, procedural audio and high-level creative tools.



The (Social) Web and Music, Stars and the Like
Speaker(s) : Markus Schedl - Johannes Kepler University Linz  
When : Wed 2nd March 2011 14:30
Where : Electronic Engineering, Room 105

Music context-based information extraction is a hot research topic, not least because of the enormous rise in Social Web usage during the last couple of years. In this seminar talk, I will present Web-based methods to extract data about music entities (mostly at the level of music artists) and show how such data can be used to build music applications and services. The talk will cover three areas of Web-based MIR:

- music-related information extraction (How to automatically build a music information system?)
- artist similarity measurement using Web pages and using microblogs (tweets)
- popularity estimation (Who's hot?)

For the first topic, I will report on text-based information extraction methods to determine prototypical artists with respect to a certain category (e.g., genre), to perform automated tagging, to retrieve album cover artwork, and to detect band members and instrumentation. As for the similarity measurement task, I will report on large-scale evaluation experiments aimed at determining well-performing parameter settings for modeling the Web-based music similarity space (e.g., TF and IDF formulations, normalization strategies, similarity functions). Finally, I will present and compare different techniques for predicting the popularity of a music artist from different data sources (Web, Twitter, last.fm, Peer-to-Peer Networks).
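The abstract mentions TF and IDF formulations, normalization strategies and similarity functions without fixing a particular combination. The sketch below shows one common combination (raw term frequency, logarithmic IDF, cosine similarity), assuming each artist is already represented as a list of terms extracted from Web pages; it is not claimed to be the best-performing setting from the talk, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each artist's terms by raw term frequency times log(N / df),
    where N is the number of artists and df the number of artists whose
    pages contain the term."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    return {
        artist: {t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
        for artist, terms in docs.items()
    }

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With toy term lists such as {"A": ["metal", "guitar", "loud"], "B": ["metal", "guitar", "riff"], "C": ["piano", "classical", "sonata"]}, artists A and B get a positive similarity while A and C share no terms and score 0. A term that appears in every artist's pages gets an IDF of log(1) = 0 and so does not contribute to similarity at all.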

Markus Schedl graduated in Computer Science from the Vienna University of Technology. He earned his Ph.D. in Computational Perception from the Johannes Kepler University Linz, where he is employed as assistant professor at the Department of Computational Perception. He further holds a Master's degree in International Business Administration from the Vienna University of Economics and Business Administration. Schedl (co-)authored more than 40 refereed conference papers and several journal articles. Furthermore, he reviewed submissions to various conferences and articles for the journals IEEE Transactions on Multimedia and Springer Multimedia Systems, as well as for the IEEE Communications Magazine. He is co-founder of the International Workshop on Advances in Music Information Research. His main research interests include Web Mining, Music and Multimedia Information Retrieval, Information Visualization, and Recommendation/Personalization.



Computational Recognition of Singing Voices in Polyphonic Music based on Statistical Approach
Speaker(s) : Hiromasa Fujihara - National Institute of Advanced Industrial Science and Technology  
When : Tue 22nd February 2011 15:00
Where : Electronic Engineering, Room 105

When humans listen to singing voices, they distinguish these voices from sound mixtures that include not only singing voices but also the sounds of other instruments and environmental noise. However, compared to this innate human ability to recognize the real-world auditory scene, the ability of current computers to do so is still inadequate. In this talk, we will describe our efforts to tackle the problem of computational recognition of singing voices in polyphonic music. In particular, considering vocal timbre, melody, and lyrics as the essential elements of singing voices, we will address the following four tasks that correspond to these elements: 1) singer identification and its application to vocal-timbre-similarity-based MIR, 2) F0 estimation of the vocal part in polyphonic music, 3) automatic synchronization between music and lyrics, and 4) concurrent estimation of the F0 and phonemes of singing voices.

Hiromasa Fujihara is a Research Scientist at the National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan. He joined AIST in 2007 after completing the master's course at Kyoto University, Kyoto, Japan. He received his PhD degree in Informatics from Kyoto University in 2010. His research interests include singing information processing and music information retrieval.



Lyrics-to-Audio Alignment: Methods of Integrating Textual Chord Labels and an Application
Speaker(s) : Matthias Mauch - National Institute of Advanced Industrial Science and Technology, Japan  
When : Wed 2nd February 2011 15:00
Where : Electronic Engineering, Room 105

Aligning lyrics to audio has a wide range of applications, such as the automatic generation of karaoke scores, song browsing by lyrics, and the generation of audio thumbnails. Existing methods are restricted to using only lyrics, matching them to phoneme features extracted from the audio (usually mel-frequency cepstral coefficients). Our novel idea is to integrate the textual chord information provided in the paired chords-lyrics format known from song books and Internet sites into the inference procedure.

We propose two novel methods that implement this idea: firstly, assuming that all chords of a song are known, we extend a hidden Markov model (HMM) framework by including chord changes in the Markov chain and an additional audio feature (chroma) in the emission vector; secondly, for the more realistic case in which some chord information is missing, we present a method that recovers the missing chord information by exploiting repetition in the song.

We conducted experiments with five varying parameters and show that both methods, with accuracies of 87.5% and 76.0% respectively, perform significantly better than the baseline.

We will demonstrate Song Prompter, a software system that acts as a performance assistant by showing horizontally scrolling lyrics and chords in a graphical user interface, together with an audio accompaniment consisting of bass and MIDI drums. The application shows that the automatic alignment is accurate enough to be used in a musical performance.
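The first method extends an HMM so that the lyrics (and chord changes) are traversed in order. As a simplified, hypothetical sketch of the underlying idea, the code below performs Viterbi forced alignment through a strictly left-to-right chain of units, assuming per-frame log-likelihoods are already available (for example from MFCC and chroma models) and treating staying and advancing as equally likely. It does not model chord changes as separate Markov states, nor does it recover missing chords as the second method does.

```python
from typing import List

NEG_INF = float("-inf")

def align_left_to_right(loglik: List[List[float]]) -> List[int]:
    """Viterbi forced alignment through a strictly left-to-right chain.

    loglik[t][s] is the log-likelihood of audio frame t under unit s
    (e.g. one lyric word or chord segment). The path must start in unit 0,
    end in the last unit, and at each frame either stay or advance by one.
    Staying and advancing are treated as equally likely (no transition cost).
    Returns the unit index assigned to each frame.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per unit")
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else NEG_INF
            prev, best = (s - 1, advance) if advance > stay else (s, stay)
            score[t][s] = best + loglik[t][s]
            back[t][s] = prev
    # Trace back from the final unit at the final frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With three units and six frames whose scores clearly favour units 0, 0, 1, 1, 2, 2, the function returns [0, 0, 1, 1, 2, 2]. A real system would add transition probabilities and duration modelling; this sketch only shows the monotonic dynamic-programming core.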


Matthias Mauch received the Diplom degree in mathematics from the University of Rostock, Germany, in collaboration with the Max-Planck Institute for Demographic Research. He received his Ph.D. degree from the Centre for Digital Music at Queen Mary University of London, U.K. Currently he is a postdoctoral research scientist in the Media Interaction Group of the National Institute of Advanced Industrial Science and Technology (AIST), Japan. His research focuses on the automatic extraction of high-level musical features from audio, with an emphasis on harmonic progressions and repetitions. He is a songwriter in the band Zweieck.

Website: http://matthiasmauch.net



Evaluation of signal derived measures for predicting the perceived quality of blindly separated audio source signals
Speaker(s) : Thorsten Kastner - Department of Information Technologies, University of Erlangen  
When : Wed 26th January 2011 15:00
Where : Electronic Engineering, Room 105

Source separation algorithms are often employed in applications where the aim is the acoustic reproduction of the separated source signals. The perceived quality of the produced audio signals is therefore an important factor in rating these systems.

Several signal-derived features are compared to assess their relevance in reflecting the perceived audio quality of separated audio source signals. A comparison is presented between three classes of features with respect to their ability to grade separated source signals according to their perceived quality: first, 'classic' SNR measures and derivatives thereof, e.g. those used in the SASSEC 2007 and SiSEC 2008 source separation campaigns; second, newly developed features based on the 'classic' measures but augmented to consider perceptual aspects; and third, features from the ITU-R PEAQ measurement scheme. Using these results, a multivariate linear regression model for the perceptual rating of separated audio source signals was set up. The method for selecting the most appropriate features and a reasonable feature subset are presented.

In order to cover a large variety of source signals and different algorithms, the reference ratings are obtained from extensive listening tests rating separated source signals from source separation algorithms submitted to the Stereo Source Separation Campaigns 2007 (SASSEC) and 2008 (SiSEC). Results are presented for predicting the perceived quality of SiSEC items based on a model that was solely trained on SASSEC material.
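The abstract describes predicting listening-test ratings with a multivariate linear regression over signal-derived features. The following is a minimal ordinary-least-squares sketch solved through the normal equations; the feature values are hypothetical placeholders (think of one SNR-style measure and one PEAQ-derived measure per separated signal), the function names are illustrative, and the talk's feature-selection procedure is not reproduced.

```python
from typing import List, Sequence

def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal
    equations (A^T A) w = A^T y, where A is X with a leading column of ones.
    Returns [intercept, coef_1, ..., coef_p]."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        if p == 0:
            raise ValueError("features are linearly dependent")
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted rating for one feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Training on SASSEC-style items and predicting SiSEC-style items, as in the abstract, would amount to calling `fit_linear` on one set and `predict` on the other. For larger or poorly conditioned feature sets, a QR- or SVD-based solver from a numerical library is preferable to the normal equations.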


Thorsten Kastner received his Dipl.-Ing. degree from the University of Erlangen, Germany, in 2000 and joined the Fraunhofer IIS in the same year. After working on face recognition algorithms for biometric access systems in the Electronic Imaging department at Fraunhofer IIS, he joined the Fraunhofer IIS Audio department in 2001. He was involved in developing algorithms for Music Information Retrieval, mainly automatic genre estimation, sounds-like query engines and MPEG-7 robust audio fingerprinting for music identification, for which the Fraunhofer prize was won in 2004. He is currently working as a researcher at the Department of Information Technologies, University of Erlangen, in collaboration with the Fraunhofer Institute IIS, and is pursuing the Dr.-Ing. degree. His current research activities include Semantic Audio Processing, Blind Source Separation and the perceptual evaluation of source separation algorithms.



Using the Music Ontology in MusicMash2
Speaker(s) : Edward Thomas - University of Aberdeen  
When : Wed 15th December 2010 15:00
Where : Electronic Engineering, Room 105

Abstract:

Search facilities are vital both within folksonomy-based (or social tagging) systems such as YouTube and Flickr and across different folksonomy-based systems. Although these systems allow great malleability and adaptability, they also suffer from problems, such as ambiguity in the meaning of tags, the flat organisation of tags and a degree of instability in the consensus about which tags best describe certain Web resources. It has been argued that folksonomy structure can be enhanced by ontologies; however, a key question remains open: how to exploit the benefits of ontologies without burdening untrained users with their rigidity. We propose an approach which exploits open data sources and ontologies, and demonstrate this in the MusicMash2 system, which uses the Music Ontology.

Edward Thomas is a Research Fellow in the Department of Computer Science at the University of Aberdeen. He works on the MOST project (Marrying Ontologies and Software Technologies), and previously worked within the AKT (Advanced Knowledge Technologies) project. His research interests cover linked data and reasoning on the Semantic Web, and he is the creator and lead developer of the TrOWL infrastructure for tractable reasoning of OWL ontologies, as well as the ONTOSEARCH2 and Taggr systems. He has also been heavily involved in the MusicMash2 project, combining tagged resources from YouTube and Flickr and music knowledge encoded in the Music Ontology.



Quality-informed beat tracking of musical audio
Speaker(s) : Norberto Degara - University of Vigo, Spain  
When : Wed 1st December 2010 15:00
Where : Electronic Engineering, Room 105

Abstract:

In this talk, I will present a probabilistic framework for beat tracking of musical audio. The method estimates the time between consecutive beat events and exploits both beat and non-beat information by explicitly modeling non-beat states. In addition to the beat times, a measure of the expected accuracy of the estimated beats is provided. The quality of the observations used for beat tracking is measured and the reliability of the beats is automatically calculated using a k-nearest neighbor regression algorithm.

The performance of the beat tracking system is statistically evaluated and compared with existing algorithms. I will conclude the talk by discussing how reliability information can be used to increase performance and compare automatic beat tracking to human tapping.
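The reliability of the estimated beats is computed with k-nearest-neighbour regression. In outline, and assuming hypothetical observation-quality features paired with known beat-tracking accuracies as training data, a prediction is simply the mean accuracy of the k most similar training excerpts. The feature choice and the value of k used in the talk are not stated in the abstract; the ones below are illustrative.

```python
import math
from typing import Sequence

def knn_regress(train_X: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int = 3) -> float:
    """k-nearest-neighbour regression: average the targets of the k training
    points closest to `query` in Euclidean distance (requires Python 3.8+
    for math.dist)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    neighbours = sorted(zip((math.dist(x, query) for x in train_X), train_y))[:k]
    return sum(y for _, y in neighbours) / k
```

For instance, with one-dimensional quality scores (0.1, 0.2, 0.8, 0.9) paired with accuracies (0.3, 0.35, 0.85, 0.9), a new excerpt scoring 0.85 with k = 2 is predicted to reach an accuracy of 0.875. Such a predicted reliability could then be used, as the abstract suggests, to decide when to trust or discard the estimated beats.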


Norberto Degara is a Ph.D. candidate at the Signal Theory Department of the University of Vigo, Spain. He received an M.S. in Electrical Engineering from The University of Texas at Austin in 2007. His research interests cover music information retrieval, music signal processing and machine learning techniques applied to music.



A Non-negative Framework for Joint Modeling of Spectral Structure and Temporal Dynamics in Sound Mixtures
Speaker(s) : Gautham J. Mysore - Advanced Technology Labs, Adobe Systems  
When : Wed 1st December 2010 11:30
Where : Electronic Engineering, Room 105

Abstract:

A common theme in most good strategies for modeling audio is the ability to make use of structure. In particular, audio has a strong spectral and temporal structure. When dealing with sound mixtures, the structure of the individual sources becomes particularly important if we wish to deal with the sources separately. In recent years, there has been a great deal of work in modeling audio using non-negative matrix factorization (NMF) and its probabilistic counterparts. However, these methods fail to account for the non-stationarity of audio and do not provide a model of the temporal dynamics of sound sources. On the other hand, Hidden Markov Models (HMMs) have been used for decades to model temporal dynamics. They can be powerful for audio analysis, as shown by their application to speech recognition. However, they have certain limitations when it comes to high-quality reconstruction of audio.

We propose a new model, the non-negative hidden Markov model (N-HMM), that combines the best of both worlds. In the proposed model, we jointly learn several small dictionaries that characterize the spectral structure of a given sound source as well as a Markov chain that characterizes the temporal dynamics of the sound source. We demonstrate the application of this model to content-aware audio processing.

We then propose a model of sound mixtures, the non-negative factorial hidden Markov model (N-FHMM), that combines models of individual sources. We demonstrate the application of this model to single-channel supervised source separation.
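Both the N-HMM and the N-FHMM build on non-negative matrix factorization. For readers unfamiliar with that starting point, here is a minimal sketch of plain NMF with the Lee-Seung multiplicative updates for the squared Euclidean cost, assuming V is a small non-negative magnitude spectrogram stored as a list of rows (frequency by time). It is background only: it has neither the multiple dictionaries nor the Markov chain of the proposed models, and the helper names are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A: Matrix) -> Matrix:
    return [list(r) for r in zip(*A)]

def nmf(V: Matrix, rank: int, iters: int = 200, eps: float = 1e-9,
        seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (freq x time) as V ~ W H, with W (freq x rank)
    holding spectral templates and H (rank x time) their activations."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def frob_error(V: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Euclidean (Frobenius) reconstruction error."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))
```

Because every update multiplies non-negative numbers, W and H stay non-negative, and the reconstruction error does not increase from one iteration to the next. What plain NMF lacks, as the abstract notes, is any notion of which template is likely to follow which over time.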


Gautham J. Mysore is a research scientist in the Advanced Technology Labs at Adobe Systems Inc., San Francisco. He is also currently a visiting researcher at the Gatsby Computational Neuroscience Unit at University College London. He received an M.A. and a Ph.D. from the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University, as well as an M.S. in Electrical Engineering from Stanford University. His research interests include machine learning and signal processing for various audio applications.



Fingers in the Dyke: can there be a Viable Market for Digital Music?
Speaker(s) : John Darlington - Social Computing Group, Imperial College London  
When : Wed 10th November 2010 15:00
Where : Electronic Engineering, Room 105

Illegal downloading of digital music, or piracy, is one feature of the Internet that troubles many people. When one considers the inherent nature of the product and the communities involved, it is perhaps not surprising that so much copying and sharing goes on, and this can have many social and cultural benefits. However, even if one accepts that sharing is a natural communal activity, one is still left with the question of how musicians will be paid for their labours and whether a viable market for music can survive in the Internet age. This talk will examine the technical and economic nature of digital music to understand why piracy is so prevalent. It will then look at a restructuring of the industry as direct peer-to-peer interactions between artists and fans and examine some novel mechanisms for music distribution, pricing and revenue sharing that could provide the basis for an open market in digital music. The difficulty of sustaining a revenue stream for recorded music has led to a renewed or increased interest in live performances. This talk will examine ways in which live Internet performances and concerts could be organised and mounted to provide alternative modes of interaction and revenue streams. These “live-to-the-net” performances will then be comparable to other content delivered live or on-demand in the converged world of Internet, radio and television, and the talk will conclude with an examination of the issues involved in producing, accessing and pricing content in this increasingly distributed and disintermediated digital world. This work is being carried out as part of an EPSRC Digital Economy “Research in the Wild” project led by Professor Darlington and Dr. Thierry Rayna (London Metropolitan University).


John Darlington is a Professor in the Department of Computing at Imperial College and Head of the Social Computing Group and Director of the London e-Science Centre. Professor Darlington has a long and distinguished track record both in the development of novel software technologies and in the creation of facilities to improve the accessibility and ease of use of computational resources. This work has included pioneering developments in functional programming languages, program transformation, functional skeletons, co-ordination forms (later adopted by Google as map/reduce) and component-based application development frameworks and the founding and operation of the Imperial College Fujitsu Parallel Research Centre, the Imperial College Parallel Computing Centre, the London e-Science Centre and the Imperial College Internet Centre.

Professor Darlington has had a long-term interest in the power of the Internet to promote radical economic and social change. As early as 1997 he led an EPSRC ROPA project, "Electronic Trading: Simulation of New Patterns of Economic Interaction and Transport", that foreshadowed recent developments in Internet shopping and trading intermediaries. In the e-Science programme, Professor Darlington and the London e-Science Centre led the influential UK e-Science Core Programme project "A Market for Computational Services". This project, a collaboration with Sun Microsystems, developed an architecture to support the use-on-demand, pay-per-use model for all types of computing resource, anticipating recent major developments in Cloud computing and App stores. The Social Computing Group and the London e-Science Centre are currently engaged in several projects developing innovative Internet and Cloud-based services across academic, social and commercial arenas.



Extending the Musical Experience - From the Physical to the Digital, and Back
Speaker(s) : Gil Weinberg - Georgia Institute of Technology  
When : Wed 30th June 2010 15:00
Where : ITL Top Floor Meeting Room

QMUL Distinguished Lecturer Seminar Series

Over the last 15 years I have explored a number of research directions in which digital technology bears the promise of innovating the core of the musical experience. I experimented with novel gestural expression, collaborative networks, and constructionist learning – research areas that promise musical experiences that cannot be facilitated by traditional means. My exploration of new gestural expression builds on the notion that, through novel sensing and mapping techniques, new expressive musical gestures can be discovered that are not supported by current acoustic instruments. Such gestures, unconstrained by the physical limitations of acoustic sound production, can provide new possibilities for expressive and creative musical experiences for novices as well as trained musicians. Remote and local digital networks can revolutionize collaborative musical experiences by allowing players to take an active role in determining and influencing not only their own musical output but also that of their co-performers. By using the network to interdependently share and control musical materials in a group, musicians can combine their musical ideas into a constantly evolving collaborative musical activity that is novel and inspiring. I also developed constructionist musical learning systems, which promise to enhance music education by providing hands-on access to programmable music making. Through interaction with physical computational objects, learners can construct personally meaningful musical artifacts that enhance and deepen their learning.

While facilitating novel musical experiences that cannot be achieved by traditional means, the digital nature of these projects often led to flat, inanimate, speaker-generated sound that lacked the physical richness, visual expression and embodiment of acoustic music. In my current work, therefore, I attempt to combine the benefits of digital computation and acoustic richness by exploring the concept of “Robotic Musicianship.” I define this concept as a combination of musical, perceptual, and social skills with the capacity to produce rich acoustic responses in a physical and visual manner. The robotic musicianship project aims to combine human creativity, emotion, and aesthetic judgment with algorithmic computational capabilities, allowing human and robotic players to cooperate and inspire each other to push music forward into unexplored domains.

Gil Weinberg is the Director of Music Technology at Georgia Institute of Technology. Dr. Weinberg received his M.S. and Ph.D. degrees in Media Arts and Sciences from MIT, after co-founding and holding positions in the music and media software industry in his home country of Israel. In his academic work Weinberg attempts to expand musical expression, creativity, and learning through meaningful applications of technology. His research interests include new instruments for musical expression, musical networks, machine and robotic musicianship, sonification, and music education. Weinberg’s music has been featured in many festivals and concerts. He has published more than 40 peer-reviewed papers. Based on his most recent project – a set of musical applications for cell phones that allow children and novices to create music in an expressive and intuitive manner – he has established a startup company, ZOOZ Mobile.



A Domain Specific Music Search Engine for eHealth
Speaker(s) : Ye Wang - School of Computing, National University of Singapore  
When : Tue 29th June 2010 15:00
Where : Electronic Engineering, Room 105

With a rapidly ageing population in Singapore and the rest of the world, the number of patients with Parkinson’s disease (PwPD) is expected to increase drastically in the next 20 years. A primary symptom of Parkinson’s disease is the progressive loss of physical movement and thus a degraded quality of life. Music therapy research has shown that familiar, tempo-matched auditory stimuli aid walking and gait training for Parkinson's patients. However, searching for such music by cultural relevance and tempo is inefficient with current music search engines. To solve this problem, we introduce a novel music retrieval system that incorporates tempo, cultural and beat-strength features to help music therapists provide appropriate music for their patients. Unlike the methods currently available to music therapists (e.g., searching a personal CD/MP3 library or an online database), we propose a domain-specific search engine that uses freely available music found on portals such as YouTube. Preliminary results from our user study demonstrate the effectiveness and usefulness of our search engine for this application.

Ye Wang worked at the Nokia Research Center in Tampere, Finland for 9 years as a research engineer and senior research engineer. Since 2002, he has been a faculty member in the Computer Science Department at the National University of Singapore (NUS), where he has established and directed the Sound and Music Computing (SMC) Lab in the NUS School of Computing. His main research projects include: 1) multimodal mobile music retrieval (M3R) and its applications to healthcare, and 2) the NUS Mobile Music Group (NuMOG) for edutainment. In the recent past, he has also worked on scalable and error-robust audio streaming and on perception-aware low-power media processing for portable devices. For more information concerning Dr. Wang’s and the SMC lab’s research activity, interested parties may visit his webpage: http://www.comp.nus.edu.sg/~wangye/.



An Industrial Strength Audio Search Algorithm
Speaker(s) : Avery Wang - Shazam Entertainment  
When : Mon 28th June 2010 15:00
Where : Electronic Engineering, Room 105

This is a reprise of the talk I gave at ISMIR 2003 on some of the key insights behind the Shazam service. The audio search algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a mobile phone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of several million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.
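The constellation-hashing idea can be illustrated with a small sketch, assuming the spectrogram peaks have already been picked (peak picking is omitted here). It follows the general scheme described at ISMIR 2003: pair each anchor peak with a few later peaks, hash each pair as (f1, f2, Δt), and look for a consistent time offset between query and track. The function names, the fan-out of 3 and the 64-frame target zone are illustrative assumptions, not Shazam's production parameters.

```python
from collections import Counter, defaultdict


def landmark_hashes(peaks, fan_out=3, max_dt=64):
    """Pair each anchor peak with up to `fan_out` later peaks in its target zone.

    `peaks` is an iterable of (time_frame, freq_bin) tuples. Returns a list of
    ((f1, f2, dt), t1) pairs: the hash and the anchor's time.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # ignore peaks in the anchor's own frame
            if dt > max_dt or paired >= fan_out:
                break  # outside the target zone, or enough pairs for this anchor
            hashes.append(((f1, f2, dt), t1))
            paired += 1
    return hashes


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in landmark_hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Vote for (track, time offset); a true match piles up votes at one offset."""
    votes = Counter()
    for h, t_query in landmark_hashes(query_peaks):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count
```

Usage: build the index once from per-track peak lists, then call `identify(index, query_peaks)`; the returned offset indicates where in the track the excerpt starts. Voting on a consistent offset, rather than counting raw hash hits, is what lets such a scheme tolerate peaks lost to noise or added by foreground sounds.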

Avery Wang has degrees in Mathematics and Electrical Engineering from Stanford University, specializing in digital signal processing algorithms. He wrote his dissertation on the auditory source separation problem at CCRMA under Julius Smith. He also spent two years at the Ruhr-Universität Bochum with Christof von der Malsburg at the Institut für Neuroinformatik on a Fulbright scholarship. He co-founded Shazam Entertainment in 2000 and is the principal creator of the audio search technology.



Environmental sonifications 'Hour Angle' and 'Flood Tide'
Speaker(s) : John Eacott - School of Media Arts and Design, University of Westminster  
When : Thu 13th May 2010 14:00
Where : Electronic Engineering, Room 105

Hour Angle and Flood Tide are musical works generated from gradually changing environmental data. Hour Angle uses calculations of the positions of the Earth and Sun, while Flood Tide uses live readings of tidal flow. Common to both works is a set of algorithmic processes that translate the data into musical values. A software process that I now call LiveNotation is used to display the values as musical notation that appears on computer screens and is read and performed by musicians. In this presentation I will discuss the ideas behind the works and the processes and considerations used to generate the music, illustrated with extracts from previous performances.

John Eacott is a trumpeter and composer whose career started in the 1980s with anarchic jazzers Loose Tubes and post-industrial metal bashers Test Dept. In the 1990s he focused on composing many works for Theatre including the worldwide touring production of Gormenghast for the David Glass Ensemble and arrangements for the 2002 Royal Shakespeare Company production of Timon of Athens. Film scores include the Miramax feature Three Steps to Heaven (1995), Escape to Life with Vanessa Redgrave (2000) and jazz arrangements for Alfie starring Jude Law (2003). His many television soundtracks include the BBC documentary series In the Footsteps of Alexander the Great BBC2 (1997). His orchestral compositions have been performed and recorded by the Scottish Chamber Orchestra and Docklands Sinfonietta. Previous algorithmic / generative works include The Street, an interactive sound environment (2000), Morpheus, a CD Rom of generative electronica (2001), and Intelligent Street, a sound space in which users alter their sound environment by sending text messages (2003). Since the completion of his PhD in 2007, John has focused on making accessible live performances using algorithmic composition methods to sonify environmental data. His tide sonification Flood Tide has been performed 9 times including performances at the Royal Shakespeare Company, Stratford Upon Avon, Greenwich Royal Observatory and Thames Festival London 2009. He is Principal Lecturer in music at the University of Westminster, London.



Finding Music and Multimedia on the Web: A Yahoo Perspective
Speaker(s) : Malcolm Slaney - Yahoo! Research Laboratory  
When : Tue 11th May 2010 14:00
Where : Electronic Engineering, Room 105

Without a doubt the Internet has changed the way people consume music. But it also brings a wealth of data and new opportunities for music-information retrieval services. Our goal is to connect users with their entertainment and information needs.

The data is both plentiful and noisy. We have billions of ratings by users about their musical interests. On one hand, the large amount of data means we can build robust models. On the other hand, the data does come from people, with all their idiosyncratic behavior and opinions. This wealth of personal data (we have to assume it is all correct) sometimes means what we think it means, and at other times represents personal behaviors unrelated to anybody else's opinion. Separating the signal from the noise is the new frontier for web sciences.

I'll illustrate my talk with several kinds of technologies we find interesting, drawing from successes we have had from all types of multimedia. These approaches impact recommendations, tagging, and search. Our approaches draw heavily from the world of machine learning, often taking novel directions because of the size of our datasets. The frontiers of web science are wonderful.

Malcolm Slaney is a principal scientist at Yahoo! Research Laboratory. He received his PhD from Purdue University for his work on computed imaging. He is a coauthor, with A. C. Kak, of the IEEE book “Principles of Computerized Tomographic Imaging.” This book was recently republished by SIAM in their “Classics in Applied Mathematics” Series. He is coeditor, with Steven Greenberg, of the book “Computational Models of Auditory Function.” Before Yahoo!, he worked at Bell Laboratory, Schlumberger Palo Alto Research, Apple Computer, Interval Research and IBM’s Almaden Research Center. He is also a (consulting) Professor at Stanford’s CCRMA, where he organizes and teaches the Hearing Seminar. His research interests include auditory modeling and perception, multimedia analysis and synthesis, music similarity and audio search, and machine learning. For the last several years he has led the auditory group at the Telluride Neuromorphic Workshop. He is a Fellow of the IEEE.



Sparse image representation in nonlocal transform domain
Speaker(s) : Karen Egiazarian - Tampere University of Technology, Finland   Alessandro Foi - Tampere University of Technology, Finland  
When : Fri 26th March 2010 14:00
Where : Electronic Engineering, Room 105

Nonlocal methods have emerged during the past five years as one of the most promising developments in signal and image processing. These methods are based on the principle that natural signals, particularly images, are characterized by mutual self-similarity between patches of data found at different locations. In this talk we present the so-called grouping and collaborative filtering approach: mutually similar patches in an image or video are collected and jointly transformed using a higher-dimensional transform, and sparsity is then enforced by shrinkage of the resulting higher-dimensional spectrum. This approach has proved very successful, especially as the core element of denoising, deblurring, and other inverse filtering algorithms, including super-resolution and compressive-sensing reconstruction. We discuss various aspects related to the adaptivity of the transforms used in collaborative filtering, with particular emphasis on geometrical adaptation and on the learning of basis elements from noisy data.
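A toy one-dimensional illustration of grouping and collaborative filtering may help. It is written in the spirit of the approach described above (block matching, a joint separable transform, shrinkage, and averaging of the overlapping estimates), but the choice of a 2-D DCT as the "higher-dimensional" transform, hard thresholding as the shrinkage rule, and all parameter values are assumptions for illustration. It is not the speakers' implementation, which works on image or video patches with more refined transforms and weighting.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def collaborative_filter(group, threshold):
    """Jointly transform a stack of similar patches, hard-threshold, invert.

    `group` is a list of equal-length patches (rows). The separable 2-D DCT acts
    within each patch and across the group at the same time.
    """
    cg, cp = dct_matrix(len(group)), dct_matrix(len(group[0]))
    spectrum = matmul(matmul(cg, group), transpose(cp))
    spectrum = [[c if abs(c) >= threshold else 0.0 for c in row]
                for row in spectrum]
    return matmul(matmul(transpose(cg), spectrum), cp)


def denoise_1d(signal, patch=8, group_size=4, threshold=1.0, step=2):
    """Block matching + collaborative filtering + averaging on a 1-D signal."""
    n = len(signal)
    if n < patch:
        return list(signal)
    starts = range(n - patch + 1)
    refs = list(range(0, n - patch + 1, step))
    if refs[-1] != n - patch:
        refs.append(n - patch)  # make sure the tail is covered
    acc, weight = [0.0] * n, [0] * n
    for r in refs:
        ref = signal[r:r + patch]
        # Grouping: the `group_size` patches closest to the reference (squared L2).
        members = sorted(starts, key=lambda s: sum(
            (signal[s + i] - ref[i]) ** 2 for i in range(patch)))[:group_size]
        group = [list(signal[s:s + patch]) for s in members]
        # Aggregation: overlapping estimates are simply averaged here.
        for s, row in zip(members, collaborative_filter(group, threshold)):
            for i, v in enumerate(row):
                acc[s + i] += v
                weight[s + i] += 1
    return [a / w if w else x for a, w, x in zip(acc, weight, signal)]
```

Setting `threshold=0.0` keeps every coefficient and returns the input unchanged, a quick check that the transform pair is orthonormal. Raising it discards small coefficients, where noise tends to live, while structure shared by the grouped patches concentrates in a few large ones.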

Alessandro Foi received the M.Sc. degree in Mathematics from the Università degli Studi di Milano, Italy, in 2001, the Ph.D. degree in Mathematics from the Politecnico di Milano in 2005, and the D.Sc.Tech. degree in Signal Processing from Tampere University of Technology, Finland, in 2007. His research interests include mathematical and statistical methods for signal processing, functional analysis, and harmonic analysis. Currently, he is a senior researcher at the Department of Signal Processing, Tampere University of Technology. His work focuses on spatially adaptive algorithms for denoising and deblurring of digital images and on noise modeling for digital imaging sensors.

Karen Egiazarian received the Ph.D. degree from Moscow M. V. Lomonosov State University, Russia, in 1986, and Doctor of Technology degree from Tampere University of Technology, Finland, in 1994. He is a leading scientist in signal, image, and video processing, with about 500 refereed journal and conference articles, books and patents. His main interests are in the field of image restoration, multirate signal processing, efficient algorithms, image compression, and digital logic. He is an Associate Editor of SPIE Journal of Electronic Imaging, an Associate Editor of Research Letters in Signal Processing, and a Member of the DSP Technical Committee of the IEEE Circuits and Systems Society. He is a senior member of the IEEE.



Harnessing the embodied knowledge of musicians to allow the real-time performance of correlated music and computer graphics
Speaker(s) : Ilias Bergstrom - University College London  
When : Mon 8th March 2010 16:15
Where : Electronic Engineering, Room 105

A presentation of a novel, entirely custom-developed system for facilitating the live performance of Visual Music / abstract animation. The hypothesis is that by using musical instruments as the primary user interface for the performance, we may usefully re-map the embodied/enactive knowledge that musicians have of their instruments. Musicians may then perform live visual music, taking advantage of the expressivity their instruments afford them. For this work, a new control-data mapping strategy, ‘Mutable Mapping’, had to be developed; it entails manually manipulating the mapping during a performance, gradually altering and re-routing digital control data.

Visual arts, music and technology have always competed for my attention, but never has one managed to distract me from the other for too long. From drawing and airbrushing, I was soon compelled to create imagery using the computer. Gravitating towards generative, procedural imagery, my computer science studies taught me to use program code as an artistic medium. In parallel, from playing drums and keyboards, I later also included computers in the music making process, intrigued by the sound creation capabilities they afford. My work has increasingly gravitated towards integrating visual arts, music and technology, while always maintaining live embodied performance at the center stage.

Ilias Bergstrom is a PhD candidate at University College London, currently putting the finishing touches to his thesis. He holds a master’s degree from the UCL MSc in Vision, Imaging and Virtual Environments, and a bachelor’s degree in Computer Science from Växjö University, Sweden.



Presentation of Télécom-ParisTech / Automatic separation and transcription of the main melody from polyphonic music signals
Speaker(s) : Gaël Richard - Télécom ParisTech, Paris, France  
When : Fri 5th March 2010 14:00
Where : Electronic Engineering, Room 105

In this talk I will give a brief presentation of Télécom ParisTech (formerly known as ENST) and its Audio research group, and will address the group's main research topics. During the second part of the talk, I will discuss the problem of "monaural main instrument / accompaniment separation" along with the transcription of the melody played by the main instrument, within a unified framework. In particular, I will describe the signal model used for leading-instrument source separation, which extends previous work in the domain with explicit "MIR" knowledge. The proposed signal spectrum model explicitly uses pitches (or fundamental frequencies) both to extract the main instrument from the others and to transcribe the pitch sequence played by that instrument. Results in source separation and melody transcription will be given.

Gaël Richard received the State Engineering degree from TELECOM ParisTech (formerly ENST), Paris, France, in 1990, the PhD degree in speech synthesis from LIMSI-CNRS, University of Paris-XI, in 1994, and the Habilitation à Diriger des Recherches degree from the University of Paris XI in September 2001. After his PhD, he spent two years at the CAIP Center, Rutgers University, Piscataway, NJ, in the speech processing group of Prof. J. Flanagan, where he explored innovative approaches for speech production. Between 1997 and 2001, he successively worked for Matra Nortel Communications, Bois d'Arcy, France, and for Philips Consumer Communications, Montrouge, France. In particular, he was the project manager of several large-scale European projects in the field of audio and multimodal signal processing. In September 2001, he joined the Department of Signal and Image Processing of TELECOM ParisTech, where he is now full Professor in audio signal processing and Head of the Audio, Acoustics and Waves research group. He is co-author of over 80 papers, inventor in a number of patents, and one of the experts for the European Commission in the field of speech and audio signal processing. Prof. Richard is a senior member of IEEE and Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing.



Shifting Contexts for Computer Music, from Mainframes to DIY Culture
Speaker(s) : Atau Tanaka - University of Newcastle  
When : Tue 16th February 2010 14:00
Where : Electronic Engineering, Room 105

Computer music has undergone fundamental shifts over the past 20 years: it has gone real time, it has become interactive, it has become miniaturized, and it has become completely democratized. I’ll map out my personal trajectory over this time to look at broader evolutions in the field with sensors, networks, and mobility. These are not just technological changes, but changes that bring about shifts in musical approaches. Form factors change, analogue is reconciled with digital, and new directions in Open Source and DIY culture continue to challenge our assumptions about what it means to be an artist, composer, performer, or participant in these evolving musical/technological landscapes.

Atau Tanaka bridges the fields of media art, experimental music, and research. He worked at IRCAM, was Artistic Ambassador for Apple France, has been a researcher at Sony Computer Science Laboratory Paris, and was an Artistic Co-Director of STEIM in Amsterdam. Atau creates sensor-based musical instruments for performance, and is known for his work with biosignal interfaces. He seeks to harness collective musical creativity in mobile environments, seeking out the continued place of the artist in democratized digital forms. His work has been presented at Ars Electronica, SFMOMA, Eyebeam, V2, ICC, and ZKM, and he has been a mentor at NESTA. He is Chair of Digital Media at Newcastle University and is Director of Culture Lab.



Some mathematical tricks for signal processing
Speaker(s) : Monika Dörfler - University of Vienna  
When : Mon 1st February 2010 16:15
Where : Electronic Engineering, Room 105

In this talk, we will try to explain some mathematical insights of potential benefit for the applied signal analyst. In particular, we will talk about time-frequency or Gabor frames and their connection to the commonly used short-time Fourier transform. The mathematical approach bears the potential for generalizations, in particular adaptive transforms. Further, we will briefly explain the idea of structured sparsity in time-frequency representations.
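The link between Gabor frames and the short-time Fourier transform can be made concrete with a small sketch. Each STFT coefficient is read as an inner product with a time-frequency shifted window (a Gabor atom) g_{m,k}[n] = w[n − m·a]·e^{2πikn/M}. In the so-called painless case, where the window is no longer than the number of frequency channels M, the frame operator reduces to multiplication by D[n] = M·Σ_m |w[n − m·a]|², so the signal is recovered by synthesising with the same atoms and dividing by D. The circular boundary handling, the sine window, the hop a = 4 and the M = 8 channels are assumptions chosen for illustration; with this window and hop the frame is tight (D is constant).

```python
import cmath
import math


def sine_window(length):
    """w[j] = sin(pi (j + 1/2) / length); w[j]^2 + w[j + length/2]^2 = 1 for j < length/2."""
    return [math.sin(math.pi * (j + 0.5) / length) for j in range(length)]


def _atom_support(m, hop, window, length):
    """Yield (n, w value) over the window shifted by m * hop (circularly)."""
    for j, wj in enumerate(window):
        yield (m * hop + j) % length, wj


def gabor_coefficients(x, window, hop, channels):
    """c[m][k] = <x, g_{m,k}> with g_{m,k}[n] = w[n - m*hop] exp(2 pi i k n / channels)."""
    length = len(x)
    return [[sum(x[n] * wj * cmath.exp(-2j * math.pi * k * n / channels)
                 for n, wj in _atom_support(m, hop, window, length))
             for k in range(channels)]
            for m in range(length // hop)]


def frame_diagonal(window, hop, channels, length):
    """Painless case: the frame operator is multiplication by this sequence D[n]."""
    d = [0.0] * length
    for m in range(length // hop):
        for n, wj in _atom_support(m, hop, window, length):
            d[n] += channels * wj * wj
    return d


def reconstruct(coeffs, window, hop, channels, length):
    """Synthesise with the analysis atoms, then undo the (diagonal) frame operator."""
    if length % hop or length % channels or len(window) > channels:
        raise ValueError("need hop | length, channels | length, len(window) <= channels")
    y = [0j] * length
    for m, row in enumerate(coeffs):
        for k, c in enumerate(row):
            for n, wj in _atom_support(m, hop, window, length):
                y[n] += c * wj * cmath.exp(2j * math.pi * k * n / channels)
    d = frame_diagonal(window, hop, channels, length)
    return [(v / dn).real for v, dn in zip(y, d)]
```

With a 32-sample signal this yields 8 × 8 = 64 coefficients, a redundancy of 2. Adaptive transforms of the kind mentioned in the abstract would, roughly speaking, let the window or the lattice vary over time; that generalisation is not attempted here.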

Monika Dörfler is a PostDoc at the mathematical department of the University of Vienna. She is heading the interdisciplinary project Audio-Miner (of which Queen Mary is a partner), which started in January 2010. Her research focus is on local aspects of time-frequency analysis. For details, see http://homepage.univie.ac.at/monika.doerfler/.



Recent progress in music/acoustic signal processing at the University of Tokyo
Speaker(s) : Shigeki Sagayama - University of Tokyo  
When : Fri 11th September 2009 14:00
Where : Electronic Engineering, Room 105

Our lab (30 people) is working on music and acoustic signal processing as well as music information processing and speech recognition/synthesis/dialog. In my talk, I will demonstrate some results from our recent progress such as multi-F0 analysis, chord detection, rhythm recognition, music genre classification, automatic music composition, automatic accompaniment, harmonic-percussive sound separation, tempo/pitch conversion, music note manipulation, etc.

Shigeki Sagayama received BE, ME, and PhD degrees from the University of Tokyo, all in Mathematical Engineering and Information Physics. He joined NTT Labs in 1974. From 1990 to 1993, he was responsible for the speech processing department at ATR Interpreting Telephony Research Laboratories, returning to NTT Labs in 1993. In 1998, he became a professor at the Japan Advanced Institute of Science and Technology (JAIST). Since 2000, he has been a professor at the Graduate School of Information Science and Technology, the University of Tokyo.



Sparse Approximation and Atomic Decomposition: Considering Atom Interactions in Evaluating and Building Signal Representations
Speaker(s) : Bob Sturm - Institut Jean Le Rond d'Alembert (IJLRDA)   
When : Wed 1st July 2009 15:00
Where : Electronic Engineering room 105

I will present work from my recent dissertation, which makes contributions to the sparse approximation and efficient representation of complex signals, e.g., acoustic signals, using greedy iterative descent pursuits and overcomplete dictionaries. As others have noted before, peculiar problems arise when a signal model is mismatched to the signal content and a pursuit makes bad selections from the dictionary. These result in models that contain several atoms having no physical significance to the signal, which instead exist to correct the representation through destructive interference. This diminishes the efficiency of the generated signal model, and hinders the useful application of sparse approximation to signal analysis (e.g., source identification), visualization (e.g., source selection), and modification (e.g., source extraction). While past works have addressed these problems by reformulating a pursuit to avoid them, in this dissertation we use these corrective terms to learn about the signal, the pursuit algorithm, the dictionary, and the created model. Our thesis is essentially that a better signal model results when a pursuit builds it considering the interaction between the atoms. We formally study these effects and propose novel measures to quantify the interaction between atoms in a model and to illuminate the role of each atom in representing a signal. We propose and study different ways of incorporating these new measures into the atom selection criteria of greedy iterative descent pursuits, and show analytically and empirically that these interference-adaptive pursuits can produce models with increased efficiency and meaningfulness.

Dr. Sturm received an undergraduate degree in physics from the University of Colorado, Boulder (B.A. 1998), a graduate degree in computer music from Stanford University (M.A. 1999), and further graduate degrees from the University of California, Santa Barbara (M.S. 2004, M.S. 2007, Ph.D. 2009). He continues his research in sparse approximation and signal representation as a Chateaubriand Fellow post-doctoral researcher at UPMC - Paris 06 with Professor Laurent Daudet.



Less is more: sparse representations for audio
Speaker(s) : Laurent Daudet - Musical Acoustics Group, D'Alembert Institute for Mechanical Engineering, University Pierre-and-Marie-Curie - Paris 6  
When : Tue 2nd June 2009 15:00
Where : Electronic Engineering, room 105

This talk will focus on signal modeling using sparse decompositions in overcomplete dictionaries, with particular emphasis on audio signals. In such models, a signal is approximated by combining a small number of elementary waveforms ("atoms"), taken from a very large collection ("dictionary"). This provides extra flexibility (e.g. it apparently avoids time-frequency resolution constraints) but comes with increased complexity compared with standard Fourier-based analysis. Greedy techniques have, however, been developed that provide near-optimal decompositions at reasonable computational cost, making them applicable to large-scale multimedia databases. After a general overview, I will discuss recent applications that take advantage of sparsity, combining scalable audio coding with Music Information Retrieval applications.
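The greedy idea can be sketched as plain matching pursuit: at each step, pick the atom most correlated with the current residual, record its coefficient, and subtract its contribution. The union of unit spikes and DCT-II atoms used here is an assumed toy overcomplete dictionary (audio work typically uses Gabor or MDCT atoms at several scales), and the function names are hypothetical.

```python
import math


def dirac_dct_dictionary(n):
    """Union of n unit spikes and n orthonormal DCT-II atoms (2n atoms, overcomplete)."""
    spikes = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    dct = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
           for k in range(n)]
    return spikes + dct


def matching_pursuit(x, atoms, max_iter=100, tol=1e-6):
    """Greedy decomposition of x over unit-norm atoms.

    Returns ([(atom_index, coefficient), ...], residual).
    """
    residual = list(x)
    decomposition = []
    for _ in range(max_iter):
        if math.sqrt(sum(v * v for v in residual)) < tol:
            break
        # Correlate the residual with every atom and keep the strongest match.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        c = corr[best]
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
        decomposition.append((best, c))
    return decomposition, residual
```

Because every atom has unit norm, each step removes exactly c² from the residual energy, so ‖x‖² = Σc² + ‖r‖² holds at every iteration, which is a convenient sanity check. Note that a signal built from two atoms is generally not recovered in exactly two steps when the atoms are not orthogonal; the extra small terms are the kind of corrective atoms discussed in the previous talk.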


Laurent Daudet is Associate Professor at the Pierre-and-Marie-Curie University (UPMC aka Paris 6), France. He's also Visiting Senior Lecturer at Queen Mary University of London. After a physics education at the Ecole Normale Superieure, Paris, France, he received a Ph.D. degree in applied mathematics from the Universite de Provence, Marseille, France, in 2000. In 2001 and 2002, he was an EU Marie Curie Post-doctoral Fellow with Prof Mark Sandler at the Centre for Digital Music at Queen Mary University of London. Since 2002, he has been working at UPMC where he joined the Musical Acoustics Laboratory (LAM), now part of the D'Alembert Institute for mechanical engineering. He is author or coauthor of over 70 publications on various aspects of digital audio signal processing. His research focuses mainly on applications of sparse signal processing for the analysis and synthesis of audio signals.



Recovering some statistical information of Room Impulse Responses using Matching Pursuit
Speaker(s) : Guillaume Defrance - Equipe Lutheries, Acoustique, Musique - LAM  
When : Wed 20th May 2009 15:00
Where : Electronic Engineering room 105

Matching Pursuit, a well-known technique used in audio decomposition and sparse representation, is applied to Room Impulse Responses (RIRs) in order to investigate some statistical foundations of Room Acoustics. The detection of arrivals and the estimation of mixing time are therefore possible. This study is a first step towards a validation of the ergodic theory of reverberation. The use of Matching Pursuit is implicit since correlation between the impulse response and the direct sound is assumed.

This presentation shows why compensating the energy decay of the RIR is necessary to obtain stationary signals, and also how to estimate the best temporal boundaries of the direct sound of the RIR. The choice of a stopping criterion, based on the similarity between acoustical indices of the original RIR and those of the reconstructed signal, is discussed.

The cumulative distribution functions of arrivals in experimental and synthesized RIRs (the latter generated with a stochastic model, which is presented) are compared. The mixing time is estimated as the point at which the arrival density becomes constant. The dependence of the mixing time on the source/receiver distance is investigated with measured and synthesized RIRs. It is shown how incorporating diffusion into the model improves the match between the mixing times of experimental and synthesized RIRs.


Guillaume Defrance is a PhD student (supervised by Jean-Dominique Polack) at the Institut Jean Le Rond d'Alembert in the LAM team (previously the Laboratory of Musical Acoustics), at University Paris 6, France. Guillaume obtained a Master’s degree in Physical Acoustics at the University Pierre et Marie Curie (Paris). During his studies he investigated various topics in acoustics, such as the physics of instruments (classical guitar), the reproduction of a sound field in 3D (Wave Field Synthesis), and the cross-cultural perception of urgent sounds.
His PhD concerns the stochastic modeling of room acoustics. Guillaume has refurbished a software package widely used in the room acoustics community (the OpenMIDAS package), studied the onset detection of room impulse responses (RIRs), and investigated RIR statistics in order to estimate arrivals and the mixing time, among other things.