So which components are responsible for the rest of the latency?
The analog circuitry.
EDIT: correction, it is not really all "analog" stuff, but something out of the control of our software.
@Jan can probably explain.
Let's take the 64 frames sync case, which has 4.473ms total latency.
64 / 48kHz = 1.333ms;
Running with 3 periods per buffer means 1.333 x 3 = 4ms
So the analog circuitry latency on the Duo X is 0.47ms.
I believe this is lower on the Duo, around 0.2ms.
Anyway, 0.5ms latency is almost negligible.
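The subtraction above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this post: the function name `buffer_latency_ms` is made up here, and the 4.473 ms total is the measured figure quoted above, not something the code measures.

```python
def buffer_latency_ms(frames, sample_rate_hz, periods):
    """Latency contributed by the audio buffers alone, in milliseconds."""
    return frames / sample_rate_hz * periods * 1000.0


# Figures taken from the post above: 64 frames, 48 kHz, 3 periods per buffer.
measured_total_ms = 4.473
buffers_ms = buffer_latency_ms(64, 48000, 3)

# Whatever is left over is outside the control of the software.
residual_ms = measured_total_ms - buffers_ms

print(f"buffers:  {buffers_ms:.3f} ms")
print(f"residual: {residual_ms:.3f} ms")
```

Plugging in a different measured total (for example from another device) gives that device's hardware share in the same way.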
The math checks out, there is no way around how this works.
What I see happens regularly though, is that applications show their block latency, which misleads users.
At 64 frames with a 48kHz sample rate, the block latency is 1.333ms, which many applications will just show as-is. But the actual total/physical latency is at least double that, 3x for certain audio cards, plus some more from the hardware (USB audio cards are usually the worst case in terms of added latency).
Is this "block latency" outside of jack2? Does one of your latest benchmarks here include this 2.6ms latency? Otherwise the figures are wrong somehow.
jack itself doesn't add latency. It is an interface to the low-level driver, which connects to the CODEC chip. In jack we can set the number of samples per block, which in this example is 128.
I assume the reason for doing block processing instead of per-sample processing is clear. The rule of thumb is: the more samples per block, the fewer pre/post-amble operations, context switches, interrupts, etc.
There are also many reasons to work with double buffering, a.k.a. ping-pong buffering; this is called sync mode in the jack world. Async mode means we're adding another extra buffer (also called frame), which I hadn't seen before this example and which was explained by @falkTX above. As far as I can tell, the math above is correct.
I am not familiar with the jack_iodelay tool, but it seems to be doing some averaging over many frames. In the end it shouldn't really matter, because the system should be deterministic and there shouldn't be any difference in latency between frames. Also, somehow there seems to be a slight difference between async with 2 buffers and sync with 3 buffers, which I don't completely grasp. This difference of ~0.02ms is negligible, however.
What comes on top of this whole buffering thing is the CODEC latency, normally around 1ms and in this case even less, plus the latency of the analog components, on the order of tenths of nanoseconds.
Yes, it includes the block latency 3 times. 2 for the number of audio periods (minimum is always 2 as far as I know), and then 1 extra for the async mode.
So with this we get 2.7ms * 3 = 8.1ms (128/48kHz is actually closer to 2.7ms than it is to 2.6ms).
With the codec 0.4ms latency, the final result is 8.5ms.
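For anyone who wants to redo this sum without rounding, here is a minimal sketch of the model described in this thread: 2 audio periods, plus 1 extra block in async mode, plus a fixed codec latency. The function name and parameters are made up for illustration, and the 0.4 ms codec figure is simply the one quoted above.

```python
def total_latency_ms(frames, sample_rate_hz, periods=2, async_mode=False,
                     codec_ms=0.0):
    """Estimate roundtrip latency from the buffering model in this thread."""
    block_ms = frames / sample_rate_hz * 1000.0
    # Async mode adds one extra block on top of the audio periods.
    blocks = periods + (1 if async_mode else 0)
    return blocks * block_ms + codec_ms


block_ms = 128 / 48000 * 1000.0  # what many applications display
total_ms = total_latency_ms(128, 48000, periods=2, async_mode=True,
                            codec_ms=0.4)

print(f"block latency: {block_ms:.3f} ms")
print(f"total latency: {total_ms:.3f} ms")
```

Without rounding the block latency up to 2.7ms, the total comes out at about 8.4ms rather than 8.5ms; the difference is just rounding.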
Thanks! So the bottleneck is the driver? Would it be possible to use a customized one for ultra-low latency?
It is for sure possible. Bela did it, as was already mentioned previously in this thread. I wouldn't expect it to be an easy task, nor would I expect this solution to scale easily to further MOD devices. It is all a matter of whether the gain is worth the effort. For my ears sub-5ms is enough, though pushing harder for lower latency would allow further chaining of devices (which is not my use case with the MOD pedals anyway).
To me, giving up the control chain possibility would be a good tradeoff for <3ms latency.
Quoting Bela FAQ
So you are running Pd and you claim less than one millisecond latency? How is that possible, given Pd's minimum buffer size is 64 samples?
For two reasons. First, we are not running the Pd program itself, and second one, we are not using the Linux ALSA drivers. Pd patches are compiled into C code using the Heavy Audio Tools. The C code produced is highly optimised and is automatically wrapped into our C++ API. This bypasses the whole Linux kernel (and ALSA) and allows it to run with buffer sizes as small as 2 audio samples, giving a roundtrip latency below 1ms for audio (because the ADC/DAC have some built-in latency) and below 100us for analog (whose converters are faster).
As also mentioned above, the Bela isn't very comparable to the MOD devices beyond a very broad sense of being a programmable audio unit. There are notable differences in the technical design choices, the end-user experience, the out-of-the-box practical applications, and the target audiences.
In an academic sense, it's nice to know what a lower bar might be for digital audio units, but everything in hardware and software is about trade-offs, and the MOD architecture is focused on building a product that fulfills their vision, which is still heavily based on the use of open source software across the whole stack. This makes a wealth of knowledge and tooling available from the vast Linux ecosystem as compared to other more exotic solutions. There will be more users who have useful knowledge about inner workings and can bring their experience and perspectives into the mix. The toolsets allow for easier building of things like the web interface and continued regular development. This in turn lowers the bar for use, as relatively non-technical users are able to grasp the interface and assemble complicated pedalboards with minimal introduction.
It's OK to conclude that if you need guaranteed sub-4ms latency then these devices won't be a good choice for you. Many users are already happy, productive, and performing with their Duos on >8ms latency (some users even double the default rate in settings). Users have reported using their Duos in acoustic, orchestral, ambient, rock, etc. scenarios with success. Nobody is wrong in the end; it's just that people have different physical acoustical abilities and tolerances, different use cases with varying latency tolerances, and different expectations.
I'm just trying to understand what the real limitations are and if they can be overcome. If I think of the next Dwarf, I think of a machine designed for live situations, used in most cases by guitarists or bass players. In this scenario, the "competitors" certainly work below the notorious 8ms (I know for sure that the Fractal roundtrip is 2ms; I personally tried the Hotone Ampero and Mooer GE300 and they are definitely below 6-8ms, you can feel it under your fingers). I think (or rather I hope) that choosing a customized kernel over a standard one, where the customized part is only the one related to the audio driver, would not affect the other features that are not strictly related to signal processing (UX, plugin development, etc.). Considering also that latency increases further if wireless devices are used for I/O (with the Hotone Ampero + 2 wireless devices I think the latency is around 8ms, which is just acceptable; in an A/B test the difference is evident), I would say that an effort could be made to improve this aspect (if there is room for improvement not dictated by the hardware used, and at this point I understand that this is not the case with the Dwarf and Duo X). It would be an improvement to brag about! I hope the last working test, which gave 4.473ms total latency, would work well on the Dwarf too; that could still be acceptable.
But might very well end up being the case.
The compromise is that lower latency = more CPU load = fewer plugins that can be loaded overall.
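One rough way to picture this tradeoff: every block has to be fully processed before the next one arrives, so halving the buffer size halves the deadline and doubles how often any fixed per-block cost is paid. The sketch below only illustrates that arithmetic. The 0.1 ms fixed overhead per block is a made-up number for the example, not a measurement from any MOD device, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48000


def block_deadline_ms(frames, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time available to process one block before the next one is due."""
    return frames / sample_rate_hz * 1000.0


def callbacks_per_second(frames, sample_rate_hz=SAMPLE_RATE_HZ):
    """How many times per second the audio callback has to run."""
    return sample_rate_hz / frames


def overhead_share(frames, fixed_overhead_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Fraction of each deadline eaten by a fixed per-block cost."""
    return fixed_overhead_ms / block_deadline_ms(frames, sample_rate_hz)


HYPOTHETICAL_OVERHEAD_MS = 0.1  # assumption for illustration only

for frames in (256, 128, 64, 32):
    print(f"{frames:>4} frames: "
          f"deadline {block_deadline_ms(frames):.3f} ms, "
          f"{callbacks_per_second(frames):.0f} callbacks/s, "
          f"fixed overhead {overhead_share(frames, HYPOTHETICAL_OVERHEAD_MS):.1%}")
```

The share of the budget lost to fixed costs grows as the buffer shrinks, which is CPU time that is no longer available for plugins.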
@unbracketed please correct me if I am wrong, but in Bela's case only very specific sounds can be loaded, right? Can it handle more than 1 Pd file? And use externals? (The problem with Pd always seems to be the externals, haha.)
From a guitarist's point of view, loading no more than 6-7 plugins could be enough. With this limitation, if it were possible to reach 2-3ms latency, it would be great. I know the Duo X can currently load a lot of heavy plugins simultaneously, but I suspect that in most real use cases this leaves CPU power unused. If that low latency could be obtained by sacrificing CPU power and selected via UI settings, it would be a huge feature! I mean something like "Check this flag for low latency (no more than x% CPU usage is allowed or no more than X plugins are allowed)".
We understand that, thanks for the continued feedback.
As I said before, this was something that was never too viable for the Duo, so we did not research it much.
For the Duo X and Dwarf, it makes sense to give some options.
We will do some tests and discuss this in the team.
I highly appreciate the use of well-known technology like ALSA and jack. It makes hacking around in the system a lot of fun and makes the device a LOT more valuable. Since computing power constantly grows, I'm sure that we could halve the latency in a few years.
Is there anything else I should do besides setting the jack buffer size to 64 to test it myself on the Duo X?
You can change /usr/bin/mod-jackd, as that is the script that starts up jack.
The "-S" option param is already there, set if reading a file, but since it did not work so well on the Duo, it also enables the extra ALSA period buffer when doing so.
I think if you look at the file you will understand quickly
I've just upgraded my Duo to the latest 1.9.1 with mainline kernel - is this likely to affect latency positively or negatively? Thanks.
Negatively, by around 0.4ms I think.
There is something the mainline kernel is doing regarding I2S that adds a tiny bit of delay somewhere, but I could not find out where or how exactly.
I think we can live with that