Modelling: best practices & conventions

Hi guys,

Since some of us have been experimenting with model training, I thought this thread could serve as a “cutting board” to assemble best practices: confirm or debunk impressions and assumptions.
If this ferments and matures, it can be distilled into Wiki material.

Please add your remarks and ideas for conventions

amp settings
convention: all at noon unless specified
Do you guys set everything at noon?
I tweak a little and try to tune down the gain a bit, even if it concerns “high gain models”.

including cab IRs or not
convention: amp only, cab included optional extra

Adding a model + a cab IR in your chain can get processor-heavy. Combining amp and cab in one model could be a solution, but I didn’t get satisfying results so far. For me the standard is “amp only” for now, but I was wondering how it works for the rest.

So when it comes to the input and target sound files, some specs need to be followed:

File size:
exact same size, down to the number of samples
What length should be optimal?
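Checking the sample counts is easy to automate; here is a minimal sketch using only Python’s stdlib `wave` module (the filenames in the comment are just placeholders for your own files):

```python
import wave

def sample_count(path):
    """Return the number of audio frames (samples per channel) in a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnframes()

# Before training, both files must match exactly, e.g.:
# assert sample_count("input.wav") == sample_count("target.wav")
```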

input file
convention: the input file supplied by MOD Audio
I know there is a demo file and I always use it but would something else be more suitable for lower tuning or acoustic instruments?

sample rate
convention: 48000 Hz
(and not the typical standard 44100 you find in many tools)

normalisation and peaks
Convention: normalised to -3 dB? (or -6 dB?)
I’ve read somewhere we have to aim for the same peak level of -6 dB, is that correct?
Should both files have this? What is the best way to achieve correct normalisation?
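The peak-normalisation maths itself is simple, whichever target is agreed on. A sketch in plain Python, assuming float samples in [-1, 1] (the -6 dB default just reflects the value discussed in this thread, not an official spec):

```python
def normalise_to_peak(samples, target_db=-6.0):
    """Scale float samples so the absolute peak sits at target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return samples  # silence: nothing to scale
    target_linear = 10 ** (target_db / 20)  # -6 dB is roughly 0.501
    gain = target_linear / peak
    return [s * gain for s in samples]

# Apply the same target to both input.wav and target.wav
# so their levels stay matched.
```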

Latency when re-amping
When re-amping plugins on your computer, latency may occur. I haven’t found the best way to mitigate this yet (besides sliding to match some wave shapes… which isn’t easy when doing high gain stuff).
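One way to estimate that offset without eyeballing wave shapes is a brute-force cross-correlation over a short search range; a minimal pure-Python sketch (for real files you would want numpy for speed, and the function name here is just illustrative):

```python
def estimate_offset(reference, delayed, max_lag=200):
    """Return the lag (in samples) that best aligns `delayed` with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        # dot product of the overlap: highest when the signals line up
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Trim `best_lag` samples from the start of the delayed recording
# so both files line up sample-accurately.
```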

training model type
convention: at least 1 “standard”, a second should be “light” or “lightest”?
When we train and deliver JSON files, should we publish a JSON for each training model?
(from lightest to heavy)

Who has heard significant differences between trainings? For me, the light/lightest training sounds just as good as the standard.

esr results
What intervals can we label and how significant is this value?

Mixed success: make sure you listen and compare very carefully before publishing.

output gain
How do we keep the output gain uniform across the models we provide to the community?
In a different thread I saw the addition of a line of code to the JSON. Is that an absolute value, or do you need to compare with others and decide how much it should be? What method would be the best way to determine it?


thanks for taking the initiative. Most of the stuff I dealt with the last few months you touched on.
The guide will also be updated before the release.

It really depends and should be adjusted to taste. The AIDA-X eq is very flexible, so you can shape your tone AFTER the model to some extent.

this is the most crucial part of the training. And as you mentioned, it could even be genre/amp specific.
So with a community effort we could combine eq sweeps with riffs from various genres/guitar. They then can be split up and rearranged to fit the amp and playing style.
As an example: I probably don’t need a baritone deathcore riff when training an AC30.

I do it for both files, because the closer the wav files are in volume, the easier it is for the network.
The -6 dB is because we can adjust the pregain up to +6 dB. But it’s not that crucial because we can adjust the out gain via the JSON file. This needs to be implemented into the training - amongst other things.

I’ve also found that on high gain it’s sometimes hard to match them exactly - sometimes even the phase is inverted. To make it easier I add a standard Reaper click source to the beginning and glue it.

for me the standard and light are nearly identical. You may notice a better high end, but that’s also dependent on the amp. Heavy vs. light is more noticeable, but not by much, and you need to consider the use case. I played gigs and rehearsals always with the light, and in a mix those subtle nuances don’t really matter. This is more of an at-home-by-yourself thing.

this is also highly dependent on the dataset. I don’t hear or feel a difference from an ESR of 0.06 to 0.009 or better - at least with my equipment. But values above 0.15 give you a slightly different sound than the amp/pedal/device you try to model. But again, sometimes that matters, sometimes not.
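To ground those numbers: ESR (error-to-signal ratio) is commonly defined as the squared error between target and prediction, normalised by the target’s energy (the actual trainer may additionally apply a pre-emphasis filter before computing it). A sketch of the bare formula:

```python
def esr(target, prediction):
    """Error-to-signal ratio: sum of squared errors over the target energy."""
    err = sum((t - p) ** 2 for t, p in zip(target, prediction))
    energy = sum(t ** 2 for t in target)
    return err / energy

# A perfect match gives 0.0; values around 0.15 and above were
# reported in this thread as audibly different from the real device.
```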

I would go for consistency and go for the -6 dB. Test the model file out, see what the output (in dB) is, and adjust that via the out_gain in the JSON file. This will be adapted by the training in a next version.
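Patching that value into the model file can be scripted with the stdlib `json` module; a sketch (the `out_gain` key name follows this thread, but check a working model file for the exact field your plugin version expects):

```python
import json

def set_out_gain(model_path, gain_db):
    """Add or overwrite an out_gain entry in a model JSON file."""
    with open(model_path) as f:
        model = json.load(f)
    model["out_gain"] = gain_db
    with open(model_path, "w") as f:
        json.dump(model, f, indent=2)
```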


Great info @spunktsch, very valuable post for me

Yeah best thing would be updating the guide, as it is the best source for this kind of info.

I think many will underestimate the importance of manually aligning the files. Some might not even be aware this latency is a thing. The example on how to align distorted waves is a valuable one.

I’ll start off editing my OP tomorrow and add your info.


Some thoughts on the latency problem:
One can master the latency problem by actually measuring and compensating for it in JACK, or Ardour in ALSA mode (if you’re on Linux, that is. Don’t know how to do that on Windows…). I had no latency issues doing that modeling some physical tube preamps (ESR < 0.01).

More of a general approach (for physical gear) would be to use a separate file player (like the mod itself) and actually record the input.wav along with the target.wav. Since both are recorded together, the same latency applies and they should be perfectly aligned.

To model a plugin, I would bounce the input.wav with the plugin, instead of actually recording the target.wav, no latency issues there.

Different question regarding “Best practices”:
What would be the best setup to capture a (real, physical) tube Amp Head without the cab? I know of the HK RedBox that can get the pre-cab signal, but that uses its own cabsim. Can I just use any DI?



Look at this contraption I made on my Dwarf :smiley:
It is now a standalone re-amp device

File player → to 2 outputs → to 2 inputs.
One of the inputs is just a patch cable, but the other has the pedal that is being re-amped on the outside.


This way, the signal does the same steps.

For VST plugins in Reaper, I render once with FX on and once with FX off. Some plugins showed significant latency though (and my PC is no low-end machine at all).

This is how @itskais did it:


I’ve noticed quite a noticeable difference when profiling an overdrive between standard and hard: it sounded really harsh at standard and very similar to the original in hard mode. I’ve tried to train twice at hard mode and there have been some slight differences between those two trainings. I still do not know what epochs are and what value should be set to get the best results. Should skip connection be on or off?

I did it like this and it worked perfectly, no latency issues and very easy to execute.

this was very helpful and even a noob like me was able to do it right, thanks!

Amp settings:
I’ve been setting them to where I like them

Cab IRs:
I have gotten mixed results but generally, I find it sounds better with the IR separate. Side note: it would be nice if there was a standard way for us to know if someone else’s amp model includes the cab or not

I hadn’t considered this, but I wonder if that could lead to better models if I also rendered the input file with no plugins

Other note:
I find my AIDA models have poor dynamics. For example, the amp I am modelling is usually set to edge-of-breakup so it cleans up when I play lightly and breaks up when I dig in. However, on the AIDA model, it gets quieter and louder based on my playing but the clipping doesn’t change much. Is anyone else noticing this and have they found a solution to this?


Could it depend on the input file lacking dynamics? What are you using?

I’ve been playing around more since posting this and have had mixed results. Sometimes it comes out gainier than expected with few dynamics, sometimes it comes out very clean, and other times it comes out just right, in between. I still haven’t figured out a consistent trick yet.

Are you using always the same input.wav? Can you share it?

I use the input.wav that Mod provides

what is your setup? If you use a reamp DI box (mandatory, I would say) you need to check the levels and the expected tone WHILE playing the dataset through the amp. If the output recorded from the amp doesn’t exhibit this behaviour, you won’t find it in the model. If you still find issues, it may be the NAM dataset, which is poor in dynamics. In that case, just so you know, me and @spunktsch are using a different dataset, but at the moment we’re still not ready for sharing.

I just use Helix Native in Reaper.

Hi all!!
Can anyone tell me why I never reach the 500 epochs? It stops way before that and says patience validation or something like that…
Also, how can I reset the code if I want to train multiple models without exiting and redoing everything? If I just replace the target.wav with another capture and click the cells again, it says neural network found, continuing training, and the result is like a mix of the previous capture, or just the previous capture…

I don’t think that is a negative thing; doesn’t that mean the algorithm recognises it isn’t learning anything new and doesn’t need the remaining epochs?

can’t really answer the second question since I always select a new folder on my drive for every new model so I start from that step again


Hi @Boukman,

the first thing is totally normal. As @LievenDV said, it stops when the model is already at its best version.
(there should be an easier-to-understand prompt on our side of the training)

The second thing is like you said: it resumes the training from the previous results.
You just have to delete the model’s folder in the Results folder.
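The early stop described above is the usual “patience” mechanism: training halts once the validation loss hasn’t improved for some number of epochs in a row. A minimal sketch of the idea (the function and its arguments are illustrative, not the trainer’s actual code):

```python
def stop_epoch(losses, patience=20):
    """Return the epoch at which training would stop, given per-epoch
    validation losses and a patience threshold."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # stopped early: no improvement for `patience` epochs
    return len(losses) - 1  # ran to the end
```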


Thanks a lot!! I’ll try it!!

So, I made a few attempts at making models, but in the end my ESR ended up somewhere between 1.000xxxx and 0.9xxxx, and from what I recall, that’s bad. Also, the best one (0.9xxxx) ended up completely silent.

I think my issue was getting the input and target audio lined up perfectly. After recording, I use Audacity to line up the 2 files. I line up the 2 clicks at the beginning, but by the end of the file, there appears to be some drift.

So, to combat this, I decided to record both Dwarf outputs to a single stereo track: output 1 going through the amp, then the audio interface; output 2 going straight to the audio interface. Finally, in Audacity, I split the stereo track into 2 mono tracks and save one as target.wav and one as input.wav. This solved the problem and my resulting ESR ended up at 0.0012241610093042254, which is a lot better, and sounds pretty decent. Just thought I’d post this here for anyone else facing similar issues.
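That stereo-split step can also be scripted so both mono files are guaranteed sample-aligned; a sketch using only the stdlib `wave` module (it assumes a 16-bit stereo WAV, and the default output filenames are just placeholders):

```python
import wave

def split_stereo(stereo_path, left_out="input.wav", right_out="target.wav"):
    """Split a 16-bit stereo WAV into two sample-aligned mono files."""
    with wave.open(stereo_path, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2
        framerate = w.getframerate()
        frames = w.readframes(w.getnframes())
    # Interleaved 16-bit frames: L0 R0 L1 R1 ... (4 bytes per stereo frame)
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), 4):
        left += frames[i:i + 2]
        right += frames[i + 2:i + 4]
    for path, data in ((left_out, left), (right_out, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(framerate)
            out.writeframes(bytes(data))
```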

Here’s the pedalboard I used to make the recording.