Saturday, April 13, 2013

Basic Mathematics of Sound: Sample Rate, Sample Size, and Binary

When I first sat down to write this post, my intent was to teach some of the people who follow me on YouTube what sample sizes and rates are all about. You may have seen reference to sample rates before: CD’s at 16/44.1. High quality studio sessions at 24/96. I figured that I could type up a few paragraphs, record a short accompanying video, and be done in under an hour.

But then I started to think about what I’d have to explain if I explained sample rates: for starters, how frequency is measured, what is considered the normal range for human hearing, and how binary works. And then I started to realize that I should probably touch on the Nyquist Theorem, which directly affects minimum sample rates required to make a recording sound good. If I got into Nyquist, it seemed that overlooking a quick explanation of aliasing and quantization noise would be criminal. And if I was going to mention anti-aliasing techniques, it would be a shame to skip over a basic explanation of dithering.

So this is going to be a story that touches as lightly as possible on some of the mathematics of sound and recording, but I promise that I will try to explain this in the simplest, most common-sense layman’s terms possible. I don’t want your eyes to glaze over and have you navigate to the latest episode of Breaking Bad, where the science seems more applicable to everyday life. Therefore, if you’re a professional audio engineer and you’re reading through this, and one of my explanations makes you start sweating and stuttering and your heart begins to palpitate, remember that I’m trying to make these explanations more accessible for a wide audience of people who don’t have advanced degrees in audio engineering. I’m going to explain things in ways that make simple sense to me. If you see an outright mistake, sure, go ahead and email me. But realize that sometimes I’m just trying to keep things simple. I’m sort of assuming the spherical cow.

Before you go further in reading the rest of this post, here’s a link to an associated tutorial video that I put together to accompany this post:

Although watching the video is the best way to learn about this topic, because of my illustrations on the whiteboard, I've also put a copy of the audio portion of that tutorial video on SoundCloud, for people who would like to download it to listen to in vehicles, while travelling, etc. Here's the audio-only version:

Sample Rates

Alright, let’s get started. You’ve probably heard lots of things about sampling. First of all, you need to understand that I’m talking about sample rates and frequency, which relate to the way that a computer converts an analogue signal (a real-world sound) to a digital representation. The word “sampling” is also used in the music industry in reference to recording a short section of audio, perhaps from another record or song, and pasting copies or altered copies of that into a new song. I’m not referring to that kind of sampling.

When “digitizing” an audio source, the way that a computer works is that it takes a measurement of the audio many times per second, and then just plays these samples back in order very quickly. Each individual slice is called a sample of the audio. The number of times per second that the audio is sampled is called the “sample rate.”

Basically, anything that is expressed in “occurrences during a period of time” is a frequency. Any time people refer to frequency, they refer to something that happens over and over again at a regular interval, whether it is a cyclical thing (rotation, oscillations, or waves) or a periodic thing (counts of an event). The number of occurrences per second is the frequency, and the unit it is expressed in is called the hertz (Hz), named after the German physicist Heinrich Rudolf Hertz. The “period” of something, i.e. the time between occurrences, is the reciprocal of the frequency.

So when something is recorded at 800 Hz, that means that a sample measurement of the sound is recorded eight hundred times a second. That seems like a lot, eh? It’s not. In today’s world of audio engineering, a typical sample rate is much faster than that. All CD’s have been standardized as having sample frequencies of 44,100 Hz, or 44.1 kHz. That’s why the default sample frequency for a lot of music is at 44.1 kHz, because it’s been conformed for CD distribution.
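If you'd like to see the "period is the reciprocal of the frequency" idea with real numbers, here's a tiny Python sketch. The function name period_seconds is just something I made up for illustration; the rates are the ones mentioned in this post.

```python
def period_seconds(frequency_hz: float) -> float:
    """Time between occurrences: the reciprocal of the frequency."""
    return 1.0 / frequency_hz


for rate in (800, 44_100, 48_000, 96_000):
    # Convert to microseconds so the numbers are easier to read.
    microseconds = period_seconds(rate) * 1e6
    print(f"{rate:>6} Hz -> one sample every {microseconds:.1f} microseconds")
```

At 800 Hz there is a gap of 1.25 milliseconds between samples. At CD rate, that gap shrinks to roughly 23 millionths of a second.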

Having a higher sample frequency gives you a better true representation of what was happening in the underlying waveform. Let’s try to use a really simple example. Let’s say that you’re in a concert hall listening to a singer. The singer’s volume, as he/she sings, is jumping up and down a lot, from very quiet to very loud and back. If you take a “sample” once per minute, you don’t have a very good idea of how loud the singer is over the time that he/she is singing. You have no idea whether the sound is louder or softer in the other fifty-nine seconds between your samples, or maybe both, jumping up and down. But if you increase your sample rate so you can take a sample once per second, you’ve got a better idea of how much the singer is changing their volume over time.
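Here's a rough Python sketch of that concert hall idea. The singer is completely made up: I'm assuming a loudness that swings smoothly between 50 and 90 dB every ten seconds, and the function names are hypothetical. The point is only to compare what you learn from sampling once per minute versus once per second.

```python
import math


def loudness_db(t_seconds: float) -> float:
    """Hypothetical singer: loudness swings between 50 and 90 dB every 10 seconds."""
    return 70.0 + 20.0 * math.sin(2.0 * math.pi * t_seconds / 10.0)


def take_samples(duration_s: int, interval_s: int) -> list[float]:
    """Measure the loudness once every `interval_s` seconds."""
    return [loudness_db(t) for t in range(0, duration_s, interval_s)]


coarse = take_samples(300, 60)  # once per minute for five minutes
fine = take_samples(300, 1)     # once per second for five minutes

print("once per minute:", [round(v, 1) for v in coarse])
print(f"once per second: {len(fine)} samples, "
      f"from {min(fine):.1f} dB to {max(fine):.1f} dB")
```

With this particular made-up singer, the once-per-minute samples all happen to land on the same loudness, so you'd think nothing was changing at all. The once-per-second samples show the swings.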

That was a coarse example. Increasing your sample frequency means that your digital interpretation of the audio is more accurate. But to get a really accurate representation in today’s world, computers sample audio at a stunning 44,100 times per second. And that’s just for CD’s. If you can sample faster, your digital sound is going to be even better (more similar to the original). DVD’s are recorded at a slightly higher sample rate than CD’s, at 48 kHz. And in today’s recording studios, sampling audio twice as fast is quite common, at rates of 96 kHz. Of course, taking twice as many measurements (96 thousand per second instead of 48 thousand per second) means that you’re going to require twice as much storage space on your computer, and more accurate equipment, which is why many studios don’t go with rates that are higher than 96 kHz.

So now that you understand what sample frequency is, what does the bit depth mean? The simple answer is “the resolution or accuracy of each individual sample.” But in order to understand that better, I’m going to talk a bit about binary numbers. I promise, this next section about binary is the only section where I have to get fairly mathematical.

Binary Notation

How does binary work? Binary is a numbering system. It’s the simplest positional numbering system, base two. There are only two digits in this numbering system, 0’s and 1’s. We’re used to base 10, which has ten different digits. Base two should be a lot easier with only two digits to think about. And base two is also easy to deal with when you’re thinking about computers and electrical engineering. Computers can’t “think” because they aren’t sentient brains. But numbers can be represented by “simulating” the 1’s and 0’s of binary with two different power states, power-on and power-off.

In binary, a single digit is called a “bit.” Bit is basically the base-two equivalent of “digit” in the base-ten system that we’re used to.

In binary, a numerical value is called a “word.” Word is basically the base-two equivalent of “number” in base-ten.

In base ten, we don’t really use the phrase “number length” to talk about how many digits are in a number. But in base-two, we use the phrase “word-length”. Computers have to deal with electrical connections that are much more simple than the human brain, so we have to keep things simple and consistent. When computers communicate, instead of a stream of single bits, they can sometimes deal with full words, ie. a group of bits communicated simultaneously. Think of it like a highway with multiple lanes, and individual cars as being bits. Because there are multiple lanes, several bits can pass a certain point at the same time. Computers are analogous because a full “word” of bits can often be shared as a single entity. The word-length refers to how many bits that is.
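To make bits, words, and word-length a little more concrete, here's a minimal Python sketch. The helper to_binary_word is a name I invented for this post; it just writes a number out as a fixed-length binary word, using Python's built-in format function.

```python
def to_binary_word(value: int, word_length: int) -> str:
    """Write `value` as a binary word that is exactly `word_length` bits long."""
    if not 0 <= value < 2 ** word_length:
        raise ValueError(f"{value} does not fit in a {word_length}-bit word")
    # The "0{n}b" format pads with leading zeros up to n binary digits.
    return format(value, f"0{word_length}b")


for number in (0, 5, 13, 255):
    print(f"{number:>3} in base ten = {to_binary_word(number, 8)} as an 8-bit word")
```

Notice that every word comes out the same length, no matter how small the number is. That's the "keep things simple and consistent" part.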

In the early days, computers were simple and could only understand short binary words. By the 1980’s, the Commodore 64 and the Apple computers were talking with 8-bit word lengths. Soon after, PC’s with MS-DOS came out that talked in 16-bit words. In the past few years, PC’s have grown up from 32-bit operating systems to 64-bit.

In the audio world, a sixteen bit word length allows for a lot of different numbers. The number of different sample values possible in binary is two raised to the power of the word length. If you have four-bit words, you have sixteen different choices (2^4). If you have eight-bit words, you have 256 different choices (2^8). If you have sixteen-bit words, you have 65,536 choices (2^16). If you have 24-bit words, you have TONS of choices – 16,777,216 to be exact (2^24).
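If you want to check those counts yourself, the rule is that each extra bit doubles the number of choices, so an n-bit word can hold 2 to the power of n different values. Here's a quick Python sketch (the function name levels is my own):

```python
def levels(bits: int) -> int:
    """Number of distinct values an n-bit word can represent: 2 to the power n."""
    return 2 ** bits


for bits in (4, 8, 16, 24):
    print(f"{bits:>2}-bit words: {levels(bits):,} possible values")
```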

Ok, enough math. What does this mean? Well, having more choices means higher resolution. What if you could measure the volume of a sound that could vary from complete silence (zero decibels) to the volume of a loud jet engine (128 dB)? And what if your scale for measuring is digital? With an analogue measurement, such as recording on magnetic tape, you can measure the exact volume. But if you have to have a digital representation, you only have certain numeric choices. If you’re limited to 4-bit sample size/resolution, then remember that 4 bits only gives you sixteen possibilities. So you have to go with some pretty rough measurements. Anything from 0 to 8 dB might have to be represented in your sample as “0”, from 8 to 16 dB as “1”, from 16 to 24 dB as “2” and so on. But there’s a lot of variation between say 8 and 16 dB. That’s not very accurate if you later see that your sample was written down as “1” and you have no idea whether the real sound was at 8dB or 16dB, or anything in between.

But what if you can increase your sample width, the number of choices? If you can measure the sound with 16-bit sample size, you have 65,536 different possible levels to choose from. That gives you a lot more choices in the scale from silence up to 128dB. You might be looking at a scale like this:
       0 = 0.000 dB
       1 = 0.002 dB
       2 = 0.004 dB
       3 = 0.006 dB

And all the way up to:

       65,534 = 127.996 dB
       65,535 = 127.998 dB
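Here's a small Python sketch of that measuring scale. Keep in mind that it follows my simplified picture (a scale running evenly from 0 to 128 dB), not the way a real audio converter encodes its samples, and the function names are hypothetical. It just shows how the same loudness reading gets written down at 4 bits versus 16 bits.

```python
def step_size_db(bits: int, full_scale_db: float = 128.0) -> float:
    """Width of one step when the 0..full_scale range is split into 2**bits steps."""
    return full_scale_db / (2 ** bits)


def quantize_db(level_db: float, bits: int, full_scale_db: float = 128.0) -> int:
    """Return the step number that a loudness reading falls into."""
    steps = 2 ** bits
    code = int(level_db / step_size_db(bits, full_scale_db))
    # Clamp so a reading at the very top still fits in the highest code.
    return min(code, steps - 1)


for bits in (4, 16):
    print(f"{bits}-bit: each step covers {step_size_db(bits):.6f} dB")
    for reading in (8.0, 12.0, 15.9):
        print(f"    {reading:>5} dB is stored as {quantize_db(reading, bits)}")
```

At 4 bits, all three readings are stored as the same number, so the difference between them is lost. At 16 bits, each one gets its own distinct value.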

Obviously, by having more bits, you can capture/communicate more information at a higher resolution, which gives you a better representation of what the volume was in the original sound. Going from 16-bit sample size to 24-bit sample size obviously means that you can measure things with an even better resolution. By the way, note that I'm talking in generalizations here so far. If you're an experienced audio engineer, you'll know that digital audio in a DAW is treated a bit differently in that the higher sample size actually means a lower noise floor, but we'll get into that in tutorials 8 and 9. For now, let's keep things simple.

If you want a rough example of a real world analogy, think about the resolution of the camera in your cell phone. If you’ve got a 3 megapixel camera in one phone and a 13 megapixel camera in a second phone, the 13mp is obviously going to give you a better picture, right? That’s because it’s a higher resolution. You’ll get a more accurate representation of what you’re trying to record (photograph) because there are more bits used to store the information.

CD standard resolution is 16-bit. That should be the minimum sample size that you want to work with in a music production or recording environment. Anything less sounds noticeably imperfect even to untrained ears. But we have the technology to do better. If you see a sound card that is referred to as 24/96, it means that the sample size is 24-bits, and the frequency with which those samples are taken is 96,000 times per second. If you have the choice, try to work with 24-bit equipment, and make sure your computer software has your “project settings” at 24-bit instead of a lower number. The only drawback is that 24-bit recording takes up more space on your storage device.

Before I move on, let me just say something about a different type of binary. Different type? Well, in all of the above, I’m assuming that you’re using what’s called a “fixed point” notation. But there is also something called a “floating point” notation, so you’ll see things like “32-bit floating.” In such a system, eight of the bits may not be used specifically to increase resolution, but might instead be used to increase dynamic range significantly. I won’t bother trying to explain the significand/mantissa or the rest of the theory. You’ll find all kinds of discussion and debate about this on the internet, but I think the simple answer is that 32-bit floating isn’t necessarily much better than 24-bit fixed, and 32-bit takes up 33% more space. Check out this link for more:

For now, I’d suggest that you shouldn’t select 32-bit at the start of a project because your newly recorded files will be 33% larger without any improvement whatsoever in fidelity. It makes more sense to switch a session's resolution to 32-bit float later, when bouncing mixes or performing complex signal and effects processing.

Sample Rates as applied to Sound

So I started out to explain the difference between sample frequency (times per second that samples are taken) and sample depth (number of bits of data per sample). And it turned into a three thousand word essay. Can I give you anything more practical to wrap things up? I’ll try:

First, be aware that if you are saving audio files, a single STEREO audio file at 16-bit resolution and sample rate of 44.1kHz will take up approximately ten megabytes of disk space for each minute of audio. Memorize that. Once you know that, you can calculate potential storage requirements for all variations of sample size, rate, number of tracks, and project length.
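Where does the ten megabytes figure come from? It's just bytes per sample, times samples per second, times the number of channels, times sixty seconds. Here's a minimal Python sketch, assuming uncompressed audio and counting a megabyte as one million bytes (the function name is mine):

```python
def megabytes_per_minute(bits: int, sample_rate_hz: int, channels: int) -> float:
    """Uncompressed audio size for one minute, in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_per_sample = bits / 8
    bytes_per_second = bytes_per_sample * sample_rate_hz * channels
    return bytes_per_second * 60 / 1_000_000


cd_stereo = megabytes_per_minute(bits=16, sample_rate_hz=44_100, channels=2)
print(f"16/44.1 stereo: {cd_stereo:.1f} MB per minute")
```

The exact answer is a little over ten and a half megabytes, so "ten megs per minute" is a rounded-down figure that's easy to remember.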


Let’s say you’re recording a vocal (single mono track), an acoustic guitar (single mono track), and a piano (feeding a stereo signal to your DAW). All told, you have a total of four tracks. Mono signals count as a single track, and stereo signals count as two. Four mono tracks are equal to two stereo tracks. So based on what you’ve memorized of 10 megs per minute of stereo audio at CD quality (16/44.1), then you’ll need double the storage space for your project, because you have the equivalent of two stereo tracks. So budget for 20 MB per minute of audio.

Let’s say that you’re making a recording that will be exactly eight minutes long. Multiply your 20megs by 8, and you’ll need 160megs of storage.

But wait, let’s say that a studio engineer comes in and says that he wants you to change from 16-bit to 24-bit sample sizes. Your requirement just grew by 50%, so now you need 240megs of storage instead of 160.

Then, let’s say that he also adds that the project will be for DVD with no CD equivalent, so you need to change from 44.1 kHz sampling to 48 kHz. Roughly, add 10% to your numbers, so your 240megs becomes 264megs.

Then finally, the engineer changes his mind yet again and decides to jump it up from 48 kHz to 96, just because he’s going to be working with a lot of digital effects and he wants the highest project quality possible. So double it again, and your storage requirements go from 264 to 528megs.
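If you'd rather let the computer do that arithmetic, here's a rough Python sketch that walks through the same four scenarios for an eight-minute project with four mono tracks. It assumes uncompressed audio and a megabyte of one million bytes, and the function name is just for illustration.

```python
def project_megabytes(minutes: float, mono_tracks: int,
                      bits: int, sample_rate_hz: int) -> float:
    """Uncompressed storage for a project, in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_total = (bits / 8) * sample_rate_hz * mono_tracks * 60 * minutes
    return bytes_total / 1_000_000


# The engineer's changing requests, in order: (sample size, sample rate).
scenarios = [(16, 44_100), (24, 44_100), (24, 48_000), (24, 96_000)]

for bits, rate in scenarios:
    size = project_megabytes(minutes=8, mono_tracks=4, bits=bits, sample_rate_hz=rate)
    print(f"{bits}/{rate / 1000:g} kHz: about {size:.0f} MB")
```

The exact numbers come out a bit higher than my rule-of-thumb figures, because ten megs per minute is rounded down and the jump from 44.1 to 48 kHz is rounded to 10%. For budgeting purposes, the rule of thumb gets you close enough.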

That kind of stuff is handy to know when you’re calculating space requirements for a project. However, to be honest, if I’m budgeting for storage space for a project, I’ll double what my calculations show me, just to be safe. So I’d want to have a full gigabyte of storage available for the example above. Things always get out of control and take up more room than you anticipate.

Oh yes, and what do I recommend/use for sample rates? I often just use 16/44.1 for projects. Face it, CD standard has been great quality for a couple decades. How can you go wrong? Unless the project is very important, using 16/44.1 saves disk space, and saves a bit of time because I don’t have to down-sample my final track at the end for compatibility with CD players. For most of my work, CD quality is just fine. However, I'll sometimes use 24/44.1 for projects. That's an odd setting, which you'll rarely see, but I'll explain why I use that in tutorials 8 and 9. You'll also see most studios use 24/96 for their projects. The advantage of 24/96 is that when you save it as an archive, if you need to go back to it ten years from now, computers will probably have advanced so much that it’ll probably even be possible for cell phones to be used to edit projects of that complexity.

Alright, that’s enough for today. I’ll save the Nyquist Theorem, Quantization Noise, Anti-Aliasing, and Dithering for future tutorials. Thanks for reading. I hope you now understand a lot more about the basic mathematics of audio.

If you’ve read all the way through this, you obviously want to learn more about audio recording and music production work. I don’t have a ton of written tutorials like this online, but I do have quite a few detailed YouTube videos that you might enjoy. I've got an organized list of those videos in the index of my "videos" page on my main website. If you're interested in any of those topics, you should bookmark this page right now:

Thanks for your interest in this series, and thanks for sharing this post or links to any of the videos.

Follow Jonathan Clark on other sites:
        Main Site:
        Music Blog: