Centre for Digital Music


Object-based Coding of Musical Audio

Contact
- Dr Mark Plumbley

Summary

Current coding systems for musical audio tend to use transform coding or filterbanks, with bit allocation determined by psychophysical masking thresholds. A more recent alternative is parametric coding, which decomposes the signal into a small set of model components, for example sinusoids plus noise, or sinusoids plus transients plus noise.
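As a rough illustration of what a parametric description looks like, the Python sketch below synthesises one frame from a sines-plus-noise parameter set. The function name, the parameter layout and the use of white (unshaped) noise are assumptions made for illustration only; practical parametric coders also shape the noise spectrally and interpolate parameters between frames.

```python
import math
import random

def synth_sines_plus_noise(partials, noise_std, n_samples, fs, seed=0):
    """Synthesise one frame from a sines-plus-noise parametric description.

    partials:  list of (frequency_hz, amplitude, phase_rad) tuples
    noise_std: standard deviation of a white-noise residual (assumption:
               real codecs shape the residual noise per frequency band)
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        t = n / fs
        # Deterministic part: sum of the sinusoidal components at time t.
        s = sum(a * math.cos(2 * math.pi * f * t + p) for f, a, p in partials)
        # Stochastic part: the noise residual.
        out.append(s + rng.gauss(0.0, noise_std))
    return out

# Hypothetical parameters: two sinusoids plus a low-level noise floor.
frame = synth_sines_plus_noise([(440.0, 0.5, 0.0), (880.0, 0.25, 0.0)],
                               noise_std=0.01, n_samples=1024, fs=44100)
```

Here seven numbers (three per sinusoid plus one noise level) describe 1024 samples, which is where the bit-rate saving of parametric coding comes from.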

In this project, we developed a methodology for audio coding using higher-level "sound objects", consisting of individual notes or chords played by particular instruments, using models based on Bayesian probability theory. While these models can be complex and time-consuming to compute, we developed new methods that learn them more efficiently than existing approaches. Listening tests showed that these methods gave better results at low bit rates than alternative methods. We also explored closely related technologies, such as audio source separation and sparse representations, which we believe will be useful for future object-based coding systems.
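To make the idea of a note-level "sound object" more concrete, the following Python sketch represents a note by a fundamental frequency, a list of partial amplitudes, an onset and a duration, and decodes a set of such objects by summing their harmonic partials. This is a hypothetical, deliberately simplified data structure: the class and function names are invented, there is no amplitude envelope, and it does not implement the project's Bayesian harmonic models or inference methods. It only shows the kind of compact parametric description that such models estimate.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class NoteObject:
    """Hypothetical sound object: one harmonic note (illustrative only)."""
    f0: float                # fundamental frequency in Hz
    amplitudes: List[float]  # amplitude of partial h = 1, 2, 3, ...
    onset: float             # start time in seconds
    duration: float          # length in seconds (rectangular, no envelope)

    def render(self, fs: float, n_samples: int) -> List[float]:
        out = [0.0] * n_samples
        start = int(round(self.onset * fs))
        stop = min(n_samples, start + int(round(self.duration * fs)))
        for n in range(start, stop):
            t = (n - start) / fs
            # Sum partials at integer multiples of f0, skipping any above Nyquist.
            out[n] = sum(a * math.sin(2 * math.pi * (h + 1) * self.f0 * t)
                         for h, a in enumerate(self.amplitudes)
                         if (h + 1) * self.f0 < fs / 2)
        return out

def decode(objects: List[NoteObject], fs: float, n_samples: int) -> List[float]:
    """Mix all note objects into one signal by summing their renderings."""
    mix = [0.0] * n_samples
    for obj in objects:
        for n, v in enumerate(obj.render(fs, n_samples)):
            mix[n] += v
    return mix

# Hypothetical usage: a two-note chord, each note with three partials.
chord = [NoteObject(261.63, [0.5, 0.25, 0.1], 0.0, 0.5),
         NoteObject(329.63, [0.5, 0.25, 0.1], 0.0, 0.5)]
signal = decode(chord, fs=44100, n_samples=22050)
```

Each note here is described by six numbers (f0, three amplitudes, onset, duration), whereas half a second of audio at 44.1 kHz takes 22050 samples.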

While full object-based coding of complex polyphonic audio scenes (such as a symphony orchestra) is still a long way off, this project is a first step towards this important long-term goal.

Sound Examples

Software

1. MUSHRAM

MUSHRAM is a MATLAB graphical interface for the MUSHRA ("MUlti Stimulus test with Hidden Reference and Anchor") listening test, as specified in Recommendation ITU-R BS.1534-1 for the subjective assessment of intermediate quality level of coding systems.
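MUSHRA results are commonly summarised per test condition as the mean score across listeners, together with a 95% confidence interval on the 0-100 quality scale. The Python sketch below shows this aggregation step. It is not part of MUSHRAM; the function name and the condition labels are hypothetical, and for brevity it uses the normal quantile 1.96, whereas a Student's t quantile is more appropriate for small listening panels.

```python
import math
from statistics import mean, stdev

def summarise_mushra(scores):
    """scores: dict mapping condition name -> list of listener scores (0-100).

    Returns condition -> (mean, half-width of an approximate 95% CI).
    Uses the normal quantile 1.96 (assumption; prefer Student's t for few listeners).
    """
    summary = {}
    for cond, s in scores.items():
        m = mean(s)
        half = 1.96 * stdev(s) / math.sqrt(len(s)) if len(s) > 1 else float("nan")
        summary[cond] = (m, half)
    return summary

# Hypothetical scores from four listeners for two conditions.
results = summarise_mushra({
    "hidden_reference": [100, 98, 100, 95],
    "anchor_3_5kHz": [20, 25, 18, 30],
})
for cond, (m, h) in results.items():
    print(f"{cond}: {m:.1f} +/- {h:.1f}")
```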

2. Object-Based Encoder/Decoder (MATLAB)

This archive contains a set of MATLAB files for encoding and decoding sound files to object-based coded ("moc") files, together with examples. See also Vincent & Plumbley (2007).

For more details, see the file "readme.txt".

3. Object-Based Decoder (C++)

This archive contains C++ source for the following programs to decode object-based encoded (moc) files:

  • moc2wav: converts object-based encoded (moc) files to WAV files (C++ source and Windows executable)
  • mocplayer: a streaming player for compressed moc files (C++ source; tested on Linux Studio to Go! and Ubuntu Gutsy Gibbon)

A set of example compressed object files is also provided.

Publications

Davies, M. E., M. G. Jafari, S. A. Abdallah, E. Vincent & M. D. Plumbley (2007) Blind speech separation using space-time independent component analysis. In S. Makino, T.-W. Lee, H. Sawada (eds.): Blind Speech Separation. Springer.

Davies, M. E., C. J. James, S. A. Abdallah & M. D. Plumbley (eds.) (2007) Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation, ICA 2007, London, UK. Springer-Verlag.

Downie, J. S., K. West, A. Ehmann & E. Vincent (2005) The 2005 Music Information Retrieval Evaluation eXchange (MIREX 2005): preliminary overview. In Proc. ISMIR, pp 320-323.

Jafari, M. G., E. Vincent, S. A. Abdallah, M. D. Plumbley & M. E. Davies (2006) Blind source separation of convolutive audio using an adaptive stereo basis. In: A K Nandi and X Zhu (eds.), Proceedings of the ICA Research Network International Workshop, Liverpool, UK, pp 105-108.

Jafari, M. G., E. Vincent, S. A. Abdallah, M. D. Plumbley & M. E. Davies (2008) An adaptive stereo basis method for convolutive blind audio source separation. To appear in Neurocomputing. [Preprint]

Leveau, P., E. Vincent, G. Richard, and L. Daudet (2006) Mid-level sparse representations for timbre identification: design of an instrument-specific harmonic dictionary. In Proc. 1st Workshop on Learning the Semantics of Audio Signals (LSAS), pp 1-11.

Leveau, P., E. Vincent, G. Richard & L. Daudet (2008) Instrument-specific harmonic atoms for mid-level music representation. IEEE Trans. on Audio, Speech and Language Processing, 16(1), 116-128.

Myatt, T., B. Eaglestone, E. Miranda, M. D. Plumbley & F. Rumsey (2005) Digital Music Research UK Roadmap.

Plumbley, M. D. (2005) Geometry and homotopy for L1 sparse representations. In: Proc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS'05), Rennes, France. [Preprint]

Plumbley, M. D. (2006) Recovery of Sparse Representations by Polytope Faces Pursuit. In Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA 2006), Charleston, SC, USA, pp 206-213.

Plumbley, M. D. (2007a) On polar polytopes and the recovery of sparse representations. IEEE Transactions on Information Theory 53(9), 3188-3195.

Plumbley, M. D. (2007b) Dictionary Learning for L1-Exact Sparse Coding. In M. E. Davies, C. J. James, S. A. Abdallah and M. D. Plumbley (eds.), Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation, ICA 2007, London, UK, pp 406-413. [Preprint]

Plumbley, M. D., S. A. Abdallah, T. Blumensath & M. E. Davies (2006a) Sparse Representations of Polyphonic Music. Signal Processing 86(3), 417-431.

Plumbley, M. D., S. A. Abdallah, T. Blumensath, M. G. Jafari, A. Nesbit, E. Vincent & B. Wang (2006b) Musical audio analysis using sparse representations. In COMPSTAT 2006 Proceedings in Computational Statistics, Rome, Italy, pp 104-117. [Preprint]

Sutton, C., E. Vincent, M. D. Plumbley & J. P. Bello (2006) Transcription of vocal melodies using voice characteristics and algorithm fusion. In Proc. Music Information Retrieval Evaluation eXchange (MIREX 2006).

Vincent, E. (2005) MUSHRAM: A Matlab interface for MUSHRA listening tests. (Software).

Vincent, E. (2006) Musical source separation using time-frequency source priors. IEEE Trans. on Audio, Speech and Language Processing, 14(1), 91-98.

Vincent, E. & R. Gribonval (2005) Construction d’estimateurs oracles pour la séparation de sources [Construction of oracle estimators for source separation]. In Proc. 20th GRETSI Symposium on Signal and Image Processing, pp 1245-1248. [Preprint]

Vincent, E., R. Gribonval & M. D. Plumbley (2007) Oracle estimators for the benchmarking of source separation algorithms. Signal Processing 87(8), 1933-1950.

Vincent, E., M. G. Jafari & M. D. Plumbley (2006) Preliminary guidelines for subjective evaluation of audio source separation algorithms. In: A K Nandi and X Zhu (eds.), Proceedings of the ICA Research Network International Workshop, Liverpool, UK, pp 93-96. [Preprint]

Vincent, E. & M. D. Plumbley (2005a) Predominant-F0 estimation using Bayesian harmonic waveform models. In Proceedings of the 1st Annual Music Information Retrieval Evaluation eXchange (MIREX 2005).

Vincent, E. & M. D. Plumbley (2005b) A prototype system for object coding of musical audio. In Proceedings of the 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 05), pp 239-242. [Preprint] [Sound Files]

Vincent, E. & M. D. Plumbley (2006a) Single-Channel Mixture Decomposition Using Bayesian Harmonic Models. In Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA 2006), Charleston, SC, USA, pp 722-730.

Vincent, E. & M. D. Plumbley (2006b) Fast factorization-based inference for Bayesian harmonic models. In Proceedings of the 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2006), Maynooth, Ireland, pp 117-122. [Preprint]

Vincent, E. & M. D. Plumbley (2007) Low bitrate object coding of musical audio using Bayesian harmonic models. IEEE Transactions on Audio, Speech and Language Processing, 15, 1273-1282.

Vincent, E. & M. D. Plumbley (2008) Efficient Bayesian inference for harmonic models via adaptive posterior factorization. To appear in Neurocomputing. [Preprint]

Wang, B. & M. D. Plumbley (2005) Musical audio stream separation by non-negative matrix factorization. In Proceedings of the DMRN Summer Conference, Glasgow, Scotland, UK.

Wang, B. & M. D. Plumbley (2006) Investigating single-channel audio source separation methods based on non-negative matrix factorization. In: A K Nandi and X Zhu (eds.), Proceedings of the ICA Research Network International Workshop, Liverpool, UK, pp 17-20. [Preprint]

Welburn, S. J., M. D. Plumbley & E. Vincent (2007) Object-coding for resolution-free musical audio. In Proceedings of the 31st International AES Conference: "New Directions in High Resolution Audio", London, UK, June 25-27, 2007.

Participants

Dr Mark Plumbley
Prof Mark Sandler
Prof Mike Davies
Dr Emmanuel Vincent
Dr Matthew Davies
Beiming Wang
Steve Welburn

Duration

01 October 2004 to 30 November 2007

Sponsor

EPSRC Grant GR/S75802/01: £202,353