We get a lot of questions about DSP every day. In this article I am going to illustrate how Digital Audio works, and how your computer works with audio data. This is a high-level overview that will introduce you to the fundamental aspects of digital audio.
A Familiar Scene
Let us assume you are about to record yourself speaking into a microphone. Your microphone is plugged into your computer via an Audio Interface. You have armed your track in Logic and pressed record. You speak into your microphone and you can see your audio waveform displayed on the screen as you record.
Now you want to listen back to your recording (because you conform to best practice of course…). So you put on your headphones which are connected to the headphone port of your audio interface. You press play and you can hear your recording.
For the user, this is a simple process. However, a lot is happening under the hood that you need to understand in order to work with Digital Audio and DSP.
From Sound Waves to Data…
Sound is nothing more than a pressure wave that propagates through a medium such as air. When a sound wave hits your microphone’s diaphragm, the microphone produces a voltage on the cable running to the audio interface. That voltage is an analog signal. This is to say that it is continuous (infinitely precise) and can be expressed over time as in the following figure. Infinitely precise means that there are infinitely many possible values between any two amplitude values and between any two points in time.
That voltage is then processed by an Analog to Digital Converter (ADC) to create a digital version of that signal. Digital signals are different as they are quantized in both time and amplitude. This means that a value (a sample) is taken at fixed intervals, and the number of samples taken per second is the sampling frequency. Each value is rounded to the nearest value that can be expressed with the available number of bits (the bit depth). This is shown in the illustration below. As a result, our digital signal is less true to the original signal. However, if we take more samples and increase our bit depth, then we can have a more accurate representation of the analog signal.
You may be wondering: “How high does my Sampling Frequency need to be?”
Our minimum required sampling frequency is dictated by the Nyquist-Shannon Sampling Theorem. The main idea of the theorem is as follows:
At least two samples per period of a wave are required in order to express the positive and negative portions of a signal.
In reality more samples per period may be required. But as a rule of thumb, this means that our Nyquist Frequency (the maximum resolvable frequency) is half of the sample rate. So at a 48000 Hz sample rate, the Nyquist Frequency is 24000 Hz, which covers the range of human hearing (roughly 20 Hz to 20 kHz). Sometimes a lower sample rate is enough (for example, human speech). Sometimes we need higher sample rates (for example, impulse response measurements and time warped samples). The plot below shows the values that will be replicated when a 500 Hz wave is sampled at 12 kHz. If you would like to experiment with this, please refer to the Python Script on my GitHub.
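The 500 Hz at 12 kHz case can also be reproduced in a few lines. This is a minimal sketch using only the Python standard library, not the script from the GitHub repository; the function name `sample_sine` and the variable names are assumptions made for illustration.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Return num_samples values of a sine wave taken at fixed intervals."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

sample_rate_hz = 12_000
freq_hz = 500

# The Nyquist frequency is half of the sample rate.
nyquist_hz = sample_rate_hz / 2

# Number of samples that land inside one period of the wave.
samples_per_period = sample_rate_hz // freq_hz

one_period = sample_sine(freq_hz, sample_rate_hz, samples_per_period)

print(f"Nyquist frequency: {nyquist_hz:.0f} Hz")
print(f"Samples per period: {samples_per_period}")
print([round(s, 3) for s in one_period])
```

Raising `freq_hz` towards `nyquist_hz` leaves fewer and fewer samples per period, down to the two-sample minimum described above.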
Quantization is a method by which we map input values (which are infinite precision in our case) to a smaller data set (which is finite in range and precision). Quantizers replace the sampled voltage values with a value represented in bits (combinations of 1’s and 0’s). Common word lengths in audio are 8 bit, 16 bit and 24 bit. The following picture shows the representation for 2 bits. Notice the significant rounding errors.
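To see the rounding in numbers, here is a minimal sketch of a uniform quantizer. It assumes the input is already scaled to the range -1.0 to +1.0 and that the levels are laid out like signed integer codes (so 2 bits gives -1.0, -0.5, 0.0 and 0.5). The function name `quantize` is hypothetical, and a real ADC may place its levels differently.

```python
import math

def quantize(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bit_depth levels."""
    levels = 2 ** bit_depth
    step = 2.0 / levels  # full scale spans -1.0 to +1.0

    # Round to the nearest integer code (halves round up).
    code = math.floor(value / step + 0.5)

    # Clamp to the signed range, e.g. -2..1 for 2 bits.
    lowest = -(levels // 2)
    highest = levels // 2 - 1
    code = max(lowest, min(highest, code))

    return code * step

inputs = [-0.8, 0.1, 0.3, 0.9]
for x in inputs:
    q = quantize(x, 2)
    print(f"input {x:+.2f} -> quantized {q:+.2f} (error {q - x:+.2f})")
```

With only 2 bits the error is a large fraction of the signal. Calling `quantize(x, 16)` instead shrinks the step size to 2 / 65536 of full scale.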
These rounding errors are called quantization error. Too much error can be audible, and it is quantified by the signal-to-quantization-noise ratio (SQNR). For a full-scale sine wave through an ideal N-bit quantizer, this can be calculated as:
SQNR = 20·log10(2^N) + 1.76 dB ≈ 6.02·N + 1.76 dB
This means that a pristine analog recording, through a 16-bit ADC, will have a maximum signal-to-quantization-noise ratio of approximately 98.09 dB. If you would like to experiment with this, please refer to the Python Script on my GitHub.
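The 16-bit figure can be checked numerically. This is a minimal sketch that assumes the standard model of a full-scale sine wave through an ideal N-bit quantizer, where the ratio in dB is 20·log10(2^N) plus 10·log10(1.5). The function name `sqnr_db` is hypothetical and is not taken from the GitHub script.

```python
import math

def sqnr_db(bit_depth):
    """Theoretical SQNR in dB for a full-scale sine and an ideal quantizer."""
    # 20*log10(2**N) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20.0 * math.log10(2 ** bit_depth) + 10.0 * math.log10(1.5)

for bits in (8, 16, 24):
    print(f"{bits:2d} bit: {sqnr_db(bits):.2f} dB")
```

Each extra bit adds about 6 dB, which is why moving from 16 bit to 24 bit lowers the quantization noise floor by roughly 48 dB.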
How does your computer use this?
After the ADC does this sampling and quantization, the audio data is now a bit stream. It is a constant flow of bits to your CPU. The ADC is constantly feeding data to the CPU at a sample rate of 44.1 kHz, 48 kHz or some other rate. The audio software is doing whatever operations need to be done. This could be writing audio to disk or processing through an effect. Likewise, while you are listening to playback, a Digital to Analog Converter (DAC) performs the reverse process: it constantly requests audio data and converts it back to a voltage for your headphones and speakers.
However, a CPU or other processor does not typically receive this data one sample at a time. It is more common for a processor to take a chunk of audio data called a buffer. The buffer has a fixed number of samples (commonly 128, 256, 512 or 1024). This buffer can either be processed all at once or on a sample by sample basis.
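Buffer-based processing can be sketched as follows. This is an illustration only: the helper names `iter_buffers`, `apply_gain` and `process_stream` are hypothetical, and a plain Python list stands in for the stream of samples that an audio driver would deliver in real time.

```python
def iter_buffers(samples, buffer_size):
    """Yield consecutive chunks of at most buffer_size samples."""
    for start in range(0, len(samples), buffer_size):
        yield samples[start:start + buffer_size]

def apply_gain(buffer, gain):
    """A trivial effect: scale every sample in the buffer, one by one."""
    return [gain * sample for sample in buffer]

def process_stream(samples, buffer_size, gain):
    """Process the whole stream one buffer at a time."""
    output = []
    for buffer in iter_buffers(samples, buffer_size):
        output.extend(apply_gain(buffer, gain))
    return output

# 1000 dummy samples processed in buffers of 256.
samples = [float(n) for n in range(1000)]
buffer_lengths = [len(b) for b in iter_buffers(samples, 256)]
processed = process_stream(samples, 256, 0.5)

print(buffer_lengths)
print(len(processed), processed[-1])
```

The last buffer here is shorter because the list runs out. In a live audio callback, the driver normally hands over full buffers for as long as the stream is running.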
Why do you need to know this?
There are many topics I did not cover today, including aliasing, dithering and jitter. However, this concept of sampling and quantization is fundamental to Digital Signal Processing and the creation of audio effects. This limited precision will dictate how you handle data and calculations in your algorithms. It will be a limiting constraint for filter design, reverb design, distortion and other audio effects.
I hope this article helped you understand how audio works on a computer. In short, a computer takes an input voltage, samples it, quantizes it, and processes it in sets of samples called buffers. This core idea has many implications in more advanced applications of DSP. I hope it gives you a foundation for the later articles that I will be publishing.
Be good to each other and take it easy.
Will Fehlhaber is an Acoustics Engineer and Audio Programmer from the UK and Bay Area.