by Benjamin Villalonga Correa

Audio Compression

Problems in audio compression are just a subset of the general problem of signal compression, and general techniques can be applied to solve them. However, compression can benefit greatly from knowledge of the very particular way in which the human brain perceives and interprets sound: compression techniques can then be optimized to keep only the information that is relevant to human perception. In this presentation, I focus on speech compression, and more particularly on an implementation using a Linear Prediction Model (LPM). The LPM provides a very efficient way of reconstructing a signal from a very small set of compressed data (up to 95% of the data can be discarded). The resulting synthesized speech keeps the original phonemes and the quality of the speaker's voice, so the speaker can still be recognized easily. This technique has been used in telephony applications.
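To illustrate how linear prediction reaches compression ratios like this, here is a minimal NumPy sketch of classic LPC analysis and synthesis. It is not taken from the notebook listed under Examples. It assumes 8 kHz audio cut into 240-sample (30 ms) frames, a 10th-order predictor, and a pitch period that is supplied rather than estimated. Voicing detection, pitch tracking, and coefficient quantization are all omitted. Each frame is reduced to its predictor coefficients a_1..a_p (found with the Levinson-Durbin recursion), a gain, and a pitch period. It is then resynthesized by driving the all-pole filter 1 / (1 - sum_k a_k z^-k) with an impulse train. All function names are hypothetical.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Finds a[1..order] for the predictor s_hat[n] = sum_k a[k] * s[n - k]
    and returns them with the remaining prediction-error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation r[0..order], summed over the frame (not normalized)
    r = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        refl = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = refl
        a[1:i] = prev[1:i] - refl * prev[i - 1:0:-1]
        err *= 1.0 - refl * refl
    return a[1:], err


def all_pole_filter(coeffs, excitation):
    """LPC synthesis filter: s[n] = e[n] + sum_k a[k] * s[n - k], zero initial state."""
    p = len(coeffs)
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = out[max(0, n - p):n][::-1]  # s[n-1], s[n-2], ... (most recent first)
        out[n] = excitation[n] + np.dot(coeffs[: len(past)], past)
    return out


def encode_frame(frame, order=10, pitch_period=80):
    """Keep only the predictor coefficients, a gain, and the (given) pitch period."""
    coeffs, err = lpc_coefficients(frame, order)
    # Assumed gain rule: the impulse train's energy (about gain^2 * N / P)
    # matches the frame's prediction-error energy.
    gain = np.sqrt(err * pitch_period / len(frame))
    return coeffs, gain, pitch_period


def decode_frame(coeffs, gain, pitch_period, length):
    """Resynthesize a voiced frame by exciting the all-pole filter with pulses."""
    excitation = np.zeros(length)
    excitation[::pitch_period] = gain
    return all_pole_filter(coeffs, excitation)


if __name__ == "__main__":
    # Assumed telephone-band parameters: 8 kHz, 30 ms frames, order 10, 100 Hz pitch
    frame_len, order, pitch = 240, 10, 80
    pulses = np.zeros(frame_len)
    pulses[::pitch] = 1.0
    # Synthetic "voiced" frame: a single resonance excited by a pulse train
    frame = all_pole_filter(np.array([1.3, -0.8]), pulses)

    coeffs, gain, period = encode_frame(frame, order, pitch)
    decoded = decode_frame(coeffs, gain, period, frame_len)
    stored = order + 2  # coefficients + gain + pitch period
    print(f"kept {stored}/{frame_len} numbers = {stored / frame_len:.0%}")
    # prints "kept 12/240 numbers = 5%"
```

Counting stored numbers rather than bits, 12 values per 240-sample frame is 5% of the original samples, which is consistent with the 95% figure quoted above. A real coder would also quantize these values and switch to a noise excitation for unvoiced sounds.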

Presentation Summary

In this presentation, I talk about:

  • the audio compression problem.
  • human perception of the voice.
  • the speech compression problem.
  • the Linear Prediction Model (LPM).
  • LPM implementation for speech compression.

Examples

  • A Jupyter notebook to compress and decompress audio using an LPM. Feel free to contact me if you have trouble using or understanding this notebook.
  • A Jupyter notebook to Fourier-analyze the audio files handled by the compression-decompression notebook referred to above.
  • A .wav file (a recording of me saying the English alphabet) used as demonstration input for the previous two notebooks.
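As a companion to the Fourier-analysis notebook, here is a minimal sketch of loading a .wav file and computing its magnitude spectrum with NumPy. It is not that notebook's code. It assumes 16-bit PCM input read with Python's standard-library wave module (multi-channel audio is averaged to mono) and applies a Hann window before the FFT. The helper names are hypothetical.

```python
import wave

import numpy as np


def load_wav_mono(path):
    """Read a 16-bit PCM .wav file and average its channels to mono in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        fs = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(float) / 32768.0
    return fs, samples.reshape(-1, channels).mean(axis=1)


def magnitude_spectrum(signal, fs):
    """Hann-windowed magnitude spectrum in dB, with its frequency axis in Hz."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Small offset avoids log10(0) for silent bins
    return freqs, 20.0 * np.log10(np.abs(spectrum) + 1e-12)


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the strongest spectral peak."""
    freqs, mag_db = magnitude_spectrum(signal, fs)
    return freqs[np.argmax(mag_db)]
```

Calling magnitude_spectrum on equal-length excerpts of the original recording and of the decompressed output gives two spectra on the same frequency axis, which makes it easy to see which spectral features the compression keeps.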

References

  • Three practical explanations of how the LPM works for speech compression are this one, this one and this one.
  • A comprehensive book on Linear Prediction Models is: Vaidyanathan, P. P. (2007). The Theory of Linear Prediction. Synthesis Lectures on Signal Processing, 2(1), 1-184.
