Submitted by Kuosch on Thu, 2020-05-07 - 12:50

I'm writing this in an attempt to shed light on one of the commonly misunderstood topics in audio: digital sound.

Of course, digital audio is much too large a topic to cover in a single blog post, so I'll do a general overview of its various aspects, some of which I'll come back to in future posts, and try to explain some of the most commonly misunderstood concepts.

Digital audio - what is it?

Simply put, digital audio is any audio that has been digitized, which means it has been sampled (that is, measured at fixed time intervals) and quantized (that is, each sample has been given a discrete numerical value). Digital audio can be stored and transported in many ways, such as on CDs or streamed from a server. But more about that later. The most important thing about digital audio is that digital signals are robust.
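
To make those two steps concrete, here's a minimal Python sketch of my own (the 1 kHz test tone and the CD-style parameters are arbitrary choices) showing what sampling and quantizing a signal looks like:

    import math

    SAMPLE_RATE = 44100   # samples per second (the CD rate)
    BIT_DEPTH = 16        # bits per sample (the CD bit depth)
    FREQ = 1000.0         # test tone frequency in Hz (arbitrary choice)

    max_code = 2 ** (BIT_DEPTH - 1) - 1  # 32767 for signed 16-bit samples

    def sample_and_quantize(n):
        """Return the n-th sample of the tone as a signed 16-bit integer."""
        t = n / SAMPLE_RATE                       # sampling: a fixed time grid
        value = math.sin(2 * math.pi * FREQ * t)  # "analog" value in [-1, 1]
        return round(value * max_code)            # quantization: a discrete code

    print([sample_and_quantize(n) for n in range(8)])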

Let's consider an example in analog audio, such as a guitarist on stage. The pickups of their instrument are analog, their amplifier is analog, and their speaker cabinet is of course analog. This setup is very susceptible to the environment. If the electrical circuits aren't properly grounded and the cables well shielded, the stage lights can cause an annoying hum to be picked up and amplified by the system. And if someone has left a phone nearby and it connects to the nearest cell tower, the resulting noise is more than annoying, although unfortunately familiar.

Now, the beauty of digital signals is that they are composed of ones and zeroes. Nothing else. Most commonly the ones and zeroes correspond to high and low voltages, but not always. They could just as well be holes in a paper tape, or markings of different lengths on an optical surface. The beauty of digital is that the information is coded, abstracted away from the signal carrying it. And because of this abstraction, digital signals are robust. With the exception of some very harsh environmental conditions, the signal can almost always be fully recovered and decoded back to analog, without measurable or audible degradation. This also means that unless a cable between two digital devices is broken, replacing it will not change anything. If the replacement is aesthetically pleasing, go ahead, but don't think it will change anything sonically, because it cannot.

This leads to one of the biggest misunderstandings: if somebody hears a difference between two digital devices playing the same source material, the difference must be analog in nature. Since the digital transport signal is fully recoverable, the audio does not change in digital circuitry unless explicit digital signal processing is involved. If this were not true, the device you're reading this on would not work at all.

Signal noise

Signal noise from various sources is often claimed to be detrimental to digital audio. But as I mentioned above, digital signals are robust, and the reason is that logic levels are defined with wide safety margins between them.

As an example, the very common CMOS logic defines the logic levels as follows:

  • Low level: 0 to 1/3 of operating voltage
  • High level: 2/3 of operating voltage and above

So the voltage range from zero to the operating voltage is divided into three equal bands: the low logic level, the high logic level, and between these the forbidden band. The circuitry will never produce a signal at this middle voltage, and rejects it on input too. And because the logic levels are so wide, in all but the most extreme cases any noise that has coupled onto the signal is ignored. This is how we can transport digital audio across the world, whereas analog audio is susceptible to degradation with each interconnection.
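
As an illustration, here's a small Python sketch of that input decision, assuming a hypothetical 3.3 V supply; note how a fairly large dose of noise leaves the decoded bit untouched:

    VDD = 3.3  # operating voltage, assumed here for illustration

    def decode(voltage):
        if voltage <= VDD / 3:
            return 0        # low logic level
        if voltage >= 2 * VDD / 3:
            return 1        # high logic level
        return None         # forbidden band: not a valid input

    # A zero with 0.8 V of coupled noise still decodes as a clean zero,
    # and a one pulled down by the same noise still decodes as a one:
    print(decode(0.0 + 0.8))  # -> 0
    print(decode(VDD - 0.8))  # -> 1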

Jitter

Jitter is the deviation in the timing of a digital signal. Transfers are supposed to happen at steady intervals, but the clocks used to create the timing aren't perfect, and timing errors occur. This is a known problem for AES3 and S/PDIF connections, which do not carry a separate clock signal; instead, the timing is embedded in the bitstream itself. I found this paper from 1998, which studied the matter quite extensively, and it says jitter is mainly an issue in equipment that doesn't separate the conversion clock from the transfer clock: “For nearly all program material no audible degradation was heard for any amount of jitter added below the level at which the DIR lost lock.” The authors continue by stating that jitter is an issue that should be dealt with, but this study is over 20 years old, and hopefully technology has improved since then.
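
To get a feel for the magnitudes involved, here's a back-of-envelope calculation of my own (not from the cited paper) of the worst-case error a given timing error can cause when sampling a sine wave:

    import math

    f = 20_000.0  # worst case: the highest audible frequency, in Hz
    A = 1.0       # full-scale amplitude
    dt = 1e-9     # one nanosecond of timing error (illustrative figure)

    # The steepest slope of a sine is 2*pi*f*A, so a timing error dt moves
    # the sampled value by at most 2*pi*f*A*dt.
    error = 2 * math.pi * f * A * dt
    print(f"{error:.2e} of full scale ({20 * math.log10(error):.1f} dBFS)")
    # -> 1.26e-04 of full scale (-78.0 dBFS)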

But more often I hear people speak of jitter in the context of USB audio, where it makes even less sense. USB, as it is most commonly used for audio in the asynchronous isochronous mode, transmits a packet of data at fixed intervals, each packet composed of several samples. The receiver therefore has to store these samples in a buffer before sending them to the digital-to-analog conversion circuits. This makes the system effectively immune to jitter in the USB transport. For that matter, all digital equipment should run the conversion circuitry from a high-quality clock source and buffer samples coming through any transport medium. In any case, we should not blame the whole of digital audio for poor implementations.
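
Here's a toy Python sketch, with made-up names and numbers, of why that buffering makes the conversion clock independent of the transport timing:

    from collections import deque

    buffer = deque()

    def on_usb_packet(samples):
        """Called whenever a packet arrives; the timing may vary (jitter)."""
        buffer.extend(samples)

    def dac_tick():
        """Called by the DAC's own stable clock, once per sample period."""
        return buffer.popleft() if buffer else 0  # output silence on underrun

    # Packets of several samples arrive at (possibly irregular) intervals:
    on_usb_packet([10, 20, 30, 40])
    on_usb_packet([50, 60, 70, 80])

    # The conversion clock drains the buffer at a perfectly steady rate:
    print([dac_tick() for _ in range(8)])  # -> [10, 20, 30, ..., 80]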

Compression

I'll be writing about this one a lot in the future, but the biggest confusion I'd like to address here is that people often mix up dynamic range compression and data compression. Dynamic range compression reduces the level difference between the loud and quiet parts of audio; it is used in studios and broadcasting either to give the sound more punch, or simply to keep everything audible in a loud environment such as a car. Data compression, as the name says, is about making digital audio fit in a smaller storage space. This can be done either without losing any information (lossless compression) or by removing redundant and inaudible information from the audio (lossy compression). Lossless compression always reproduces the original audio bit for bit, and is thus ideal for archival purposes. Lossy compression achieves much better compression ratios and is therefore a better choice for transferring music, but the decoded signal differs from the original, and usually does not survive further editing or re-encoding well.
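
To keep the two apart, here's a miniature Python illustration of both; the compressor settings are arbitrary, and zlib stands in for a real lossless audio codec such as FLAC:

    import zlib

    # 1) Dynamic range compression: pull loud peaks down towards the quiet
    #    parts. A simple 2:1 compressor above a threshold of 0.5, with
    #    sample values normalized to [-1, 1]:
    def compress_dynamics(sample, threshold=0.5, ratio=2.0):
        magnitude = abs(sample)
        if magnitude <= threshold:
            return sample
        reduced = threshold + (magnitude - threshold) / ratio
        return reduced if sample >= 0 else -reduced

    print(compress_dynamics(0.9))  # -> 0.7 (loud peak pulled down)
    print(compress_dynamics(0.3))  # -> 0.3 (quiet part untouched)

    # 2) Data compression: make the same bits occupy less space. A lossless
    #    codec restores the original audio bit for bit:
    pcm = bytes(range(256)) * 100          # stand-in for raw PCM bytes
    packed = zlib.compress(pcm)
    print(len(pcm), "->", len(packed))     # fewer bytes on disk
    assert zlib.decompress(packed) == pcm  # bit-for-bit identical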

High resolution

High resolution audio refers to audio with more data than traditional CD quality, meaning a bit depth of more than 16 bits, a sampling frequency higher than 44.1 kHz (CD) or 48 kHz (DAT and video), or both. In my opinion, high resolution audio is useful in studios for editing and manipulating sound, but not really worth it just for listening. I know many people will disagree with that statement, but let's think about it for a second. Most music is compressed during the mixing and mastering stage, and I don't know of any record that uses the full dynamic range available with 16 bits. Likewise, audio at a higher sampling rate would be significantly different only at frequencies above 20 kHz, unless those ultrasonics have been filtered out in the studio. But it is an anatomical fact that human hearing becomes less sensitive to high frequencies with age, so much so that people over 30 commonly don't hear frequencies above 15 kHz all that well, if at all. Fortunately there is very little musically significant information in the highest octave.
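
For the bit depth side of that argument, the arithmetic is simple: every bit roughly doubles the ratio between the loudest and quietest representable signals, about 6 dB per bit. A quick sanity check in Python:

    import math

    def dynamic_range_db(bits):
        """Ratio between full scale and one quantization step, in dB."""
        return 20 * math.log10(2 ** bits)

    for bits in (16, 24):
        print(f"{bits} bits: ~{dynamic_range_db(bits):.0f} dB")
    # -> 16 bits: ~96 dB
    #    24 bits: ~144 dB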