All audio arrives at our ears "late". Sound travels through air, and the speed of sound is slow. This acoustic latency matters to a recording engineer, but other areas of audio have to think about it on a bigger scale.
The live sound world is very good at this. The speakers halfway down a festival site are called “delays” because the signals feeding them are delayed to time-align them with the speakers at the stage. When the sound wave from the stage passes the delays, the delayed signal is played out at the same time.
In most studio situations this delay is short enough to go unnoticed because we are so used to it, but only if we hear the sound once. We are reasonably tolerant of acoustic delays, or “latencies”, if they correspond with how we know the world to work. If someone shouts to us from 25 metres away, the fact that their mouth and their voice are out of sync by roughly 75ms doesn’t bother us because the visual and the auditory information correspond. If that person was standing next to us and their mouth moved but the sound emerged 75ms later, I think we’d find that pretty disturbing.
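As a rough illustration of the time alignment described here, the delay to apply to a delay speaker is the time sound takes to travel from the stage to that speaker. This is a minimal sketch: the 343 m/s figure assumes air at about 20 °C, and the 60 m tower distance and the function name are hypothetical values chosen for the example, not taken from any real system.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def acoustic_delay_ms(distance_m: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the time in milliseconds for sound to travel distance_m in air."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed_m_per_s * 1000.0


# Hypothetical delay tower placed 60 m from the speakers at the stage.
tower_delay_ms = acoustic_delay_ms(60.0)
print(f"Delay to apply to the tower feed: {tower_delay_ms:.1f} ms")
```

In practice the calculated figure is only a starting point, since temperature changes the speed of sound and engineers fine-tune by ear or by measurement.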
Visual Sync
These examples link the visual and the auditory, and in audio-only situations those cues don’t apply. But if we think about situations where system latency bothers us, it’s always because there is a discrepancy between the latent audio and something else, usually the same audio reaching the listener by another route.
Sensitivity To Hearing Sounds Twice
We are more tolerant of the arrival time of audio than perhaps we think. On a stage, a bass player and a drummer might easily be 4 metres apart, so the acoustic latency between the snare drum and the bass player’s ears would be about 12ms. If they are listening to each other acoustically this is unlikely to be an issue. But if they are also listening to each other through monitor wedges, the snare drum has two paths to the bass player’s ears: acoustically from drum to ears (12ms), and electrically through the cables (so fast as to be negligible) and then from a speaker at the bass player’s feet, which is still a small distance from the ears (6ms). With two arrival times to compare, our brains become much less tolerant, because our stereo hearing makes us extremely sensitive to differences in arrival time. Arrival times are spatial cues. Relatively long differences tell us about our environment (first reflections from walls tell us about the space we are in), and very short differences tell us what direction a sound source is in relative to us. We are really good at hearing this stuff, and are therefore perfectly designed to really dislike latency caused by audio systems when we can hear the original sound at the same time, i.e. tracking in a recording studio!
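The two-path comparison above can be sketched in a few lines. The `SignalPath` class and its field names are hypothetical, the speed of sound is assumed to be 343 m/s, and the 2 m wedge-to-ear distance is an assumption chosen to match the 6ms figure; cable time is treated as zero, as in the text.

```python
from dataclasses import dataclass

SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at about 20 degrees C


@dataclass
class SignalPath:
    """One route by which a sound reaches the listener (hypothetical model)."""
    name: str
    acoustic_distance_m: float          # distance travelled through air
    electronic_latency_ms: float = 0.0  # cables, desk, converters, etc.

    def arrival_ms(self) -> float:
        acoustic_ms = self.acoustic_distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0
        return acoustic_ms + self.electronic_latency_ms


# Snare heard directly across the stage, and again via the monitor wedge.
direct = SignalPath("snare, direct", acoustic_distance_m=4.0)
wedge = SignalPath("snare, via wedge", acoustic_distance_m=2.0)

gap_ms = abs(direct.arrival_ms() - wedge.arrival_ms())
for path in (direct, wedge):
    print(f"{path.name}: {path.arrival_ms():.1f} ms")
print(f"difference between arrivals: {gap_ms:.1f} ms")
```

It is the difference between the two arrival times, rather than either figure alone, that the listener is sensitive to. Setting `electronic_latency_ms` on the wedge path shows how a digital console or converter in the monitor chain would change that gap.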
DAW Latency
We’ve got lots of good information about latency up on the site already. One very popular article from a few years ago was this piece on tracking latencies, which gives a thorough explanation of the relationship between buffer settings in a DAW and the equivalent acoustic latency.
From the Pro Tools Fundamentals series, Managing Latency In Native Systems gives an overview of the issue and some of the common solutions. Latency in native DAWs is well understood and, as it has always been a persistent issue in DAW workflows, people who are familiar with it are understandably wary of any potential source of additional latency.
Network Latency
Our friends at DAD have posted a thorough explanation of latency in a Dante system, following the signal from a microphone to headphones through a typical Dante system and examining each point where latency occurs.
One of the regular points of frustration for people new to AoIP is the lack of a single figure for latency. There isn’t one answer because it depends on the network, but the example shows that the additional latency introduced by the network is small enough to be of little concern under normal conditions. For example:
“Traditional” Latency:
Distance from the vocalist's mouth to the microphone – 25cm of distance equals approximately 0.8ms of analog latency.
AD conversion – Filters typically introduce around 20 samples of digital delay, which translates into 0.4ms of latency.
ASIO / DAW – Getting in and out of your DAW also introduces digital latency, typically 64 samples, which equals 1.3ms in either direction, or 2.6ms in total.
Software Plugins – In many cases added plugins will also add to the total latency, but since these are not necessarily added during tracking (latency is less of an issue when mixing, of course), we will not add a specific amount of latency in this example.
DA conversion – Filters typically introduce between 10 and 40 samples of digital delay. If we use 30 samples as an example, this equals 0.6ms of digital latency.
Monitoring – In this case, a vocalist would wear headphones and there would be no analog latency worth mentioning, but if for instance a guitarist or bassist were tracking in the control room, there would most likely be at least 1-2ms of analog latency from the monitors to their ears.
OK. If we add up those points of latency, you would experience a total of 4.4ms of latency. This would be at a sample rate of 48 kHz. If you record at 96 kHz, the latency introduced digitally (AD/DA conversion and ASIO/DAW) would be halved. The distance from mouth to mic would, of course, be the same, so in total the latency at 96 kHz would be 2.6ms.
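The sum above can be reproduced as a small latency budget. This is a sketch under the example's own assumptions: the sample counts (20, 64 each way, 30) and the 0.8ms mouth-to-mic figure come from the walkthrough, the sample counts are assumed to stay the same at 96 kHz, and the names used here are hypothetical.

```python
ACOUSTIC_MS = 0.8  # mouth to microphone, figure taken from the example above

# Digital stages expressed in samples, as in the example above.
DIGITAL_STAGES_SAMPLES = {
    "AD conversion": 20,
    "DAW input buffer": 64,
    "DAW output buffer": 64,
    "DA conversion": 30,
}


def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Convert a delay in samples to milliseconds at the given sample rate."""
    return samples / sample_rate_hz * 1000.0


def total_latency_ms(sample_rate_hz: int) -> float:
    """Acoustic latency plus every digital stage at the given sample rate."""
    digital_ms = sum(samples_to_ms(n, sample_rate_hz)
                     for n in DIGITAL_STAGES_SAMPLES.values())
    return ACOUSTIC_MS + digital_ms


for rate_hz in (48_000, 96_000):
    print(f"{rate_hz} Hz: {total_latency_ms(rate_hz):.2f} ms")
```

Because the code converts exact sample counts rather than summing values that were already rounded per stage, its totals come out slightly higher than the 4.4ms and 2.6ms quoted above. The conclusion is the same: only the digital part of the chain shrinks as the sample rate goes up.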
AoIP Latency
Now, let’s find out how much additional latency you could expect if you decide to establish an IP-based audio solution such as Audinate’s Dante network protocol.
There is no fixed number of milliseconds of latency introduced on a Dante network. It depends on a number of things, including your computer, the number of switches in the total network, etc. But if you have fewer than 4 switches in your network, you could likely go as low as 0.25 ms of latency on either side of the ASIO/DAW instance.
In that case, you should add 0.5ms of latency that is caused by the network:
Total latency @48 kHz: 4.9ms
Total latency @96 kHz: 3.1ms
The important thing to keep in mind, though, is that if we look at the percentage of latency introduced by adding IP audio, the difference is very subtle:
Percentage of latency caused by the network:
48 kHz: approx. 10%
96 kHz: approx. 16%
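These percentages follow directly from the totals quoted above: the 0.5ms added by the network divided by the new total. A minimal check, using a hypothetical helper name and the rounded figures from the example:

```python
def network_share_percent(baseline_ms: float, network_ms: float) -> float:
    """Share of the total latency that is contributed by the network."""
    total_ms = baseline_ms + network_ms
    return network_ms / total_ms * 100.0


NETWORK_MS = 0.5  # 0.25 ms on either side of the ASIO/DAW instance

# Baseline totals without the network, as quoted in the example.
for label, baseline_ms in (("48 kHz", 4.4), ("96 kHz", 2.6)):
    share = network_share_percent(baseline_ms, NETWORK_MS)
    print(f"{label}: {share:.1f}% of {baseline_ms + NETWORK_MS:.1f} ms")
```

The share looks larger at 96 kHz only because the rest of the chain has become faster; the network's contribution in milliseconds is unchanged.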
If you would like more detail then I strongly recommend you follow the link to Latency from Microphone to Headphone in AoIP Solutions on the NTP website to see a video presentation by DAD’s Jan Lykke talking through an example system.