Validation patience limit reached at epoch

Hi guys, I am bumping into this when training.

I am not completely sure what this message means, but it looks to me like training fails prematurely and is unable to achieve any useful result.

I also do not know what ESR is, but my guess is that it's some kind of error metric indicating the quality of the model; for the working JSONs I've seen, the ESR value is much closer to zero, while mine is almost 1.

Also, the resulting "predicted" sound does not resemble the "target" at all.
Any ideas what I could be doing wrong?

---
device = MOD-DWARF
file_name = drive
unit_type = LSTM
size = 8
skip_connection = 1
 35%|██████████████▎                          | 105/300 [04:08<07:17,  2.24s/it]
**validation patience limit reached at epoch 106**
 35%|██████████████▎                          | 105/300 [04:11<07:47,  2.40s/it]
done training
testing the final model
testing the best model
finished training: drive_LSTM-8
Training done!
ESR after training:  0.9969175457954407

P.S. I am running the training via Colab connected to a "local runtime", where the local runtime is a Jupyter Docker container on my Windows laptop in WSL2, based on the aidadsp/pytorch image.

It is an exotic setup, so I should mention it in case I've shot myself in the foot by going this way.

FROM aidadsp/pytorch

# Install git and prepare /content as root
USER root
RUN apt update && \
    apt -y install git
RUN mkdir /content && \
    chown aidadsp /content

USER aidadsp
RUN pip install jupyter_http_over_ws && \
    jupyter serverextension enable --py jupyter_http_over_ws && \
    pip install librosa plotly

# Instead of the Google Drive input
RUN mkdir -p /content/drive/input

WORKDIR /content
ENTRYPOINT ["jupyter", "notebook", "--ip='*'", "--port=8888", "--allow-root", "--NotebookApp.allow_origin='https://colab.research.google.com'", "--NotebookApp.port_retries=0"]

About the first message: it means that the model is not getting any better, so instead of running until the end it stops early.
Basically it saves you time and ends with the same result.
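Roughly, the logic looks something like this (just an illustrative sketch, not the actual trainer code; the patience value and function names are made up):

```python
# Minimal sketch of patience-based early stopping (illustrative only,
# not the actual AIDA-X trainer code; names and limits are made up).

def train_with_patience(train_one_epoch, validate, max_epochs=300, patience=20):
    """Stop when the validation loss has not improved for `patience` epochs."""
    best_val_loss = float("inf")
    epochs_without_gain = 0
    best_epoch = None

    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_gain = 0
            best_epoch = epoch          # a real trainer would save a checkpoint here
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                print(f"validation patience limit reached at epoch {epoch + 1}")
                break                   # stop early; the best checkpoint is kept

    return best_epoch, best_val_loss
```

The best checkpoint saved along the way is presumably what the "testing the best model" step in your log evaluates at the end.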

Did you align the input.wav and the target.wav?
If you'd like to post them I can have a look later and maybe see what's wrong.

Thank you for the answer, here it is:

I've noticed that the target is one sample shorter, but that seems to be handled by the script:

Input rate: 48000 length: 14523000 [samples]
Target rate: 48000 length: 14522999 [samples]
Warning! Length for audio files
  /content/drive/input.wav
  /content/drive/target.wav
does not match, setting both to 14522999 [samples]
Preprocessing the training data...

I've looked at them side by side in Audacity and do not see any misalignment; both should be mono 48 kHz.
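For a more precise check than eyeballing it in Audacity, something like this cross-correlation sketch can estimate the offset (assuming mono files at the same sample rate; numpy, scipy and soundfile as stand-ins for whatever tools you have):

```python
# Rough latency check between input.wav and target.wav via cross-correlation.
# File names are examples; assumes mono files sharing one sample rate.
import numpy as np
import soundfile as sf
from scipy.signal import correlate

inp, sr = sf.read("input.wav")
tgt, sr_tgt = sf.read("target.wav")
assert sr == sr_tgt, "sample rates must match"

# Trim both to the same length (the training script does the same).
n = min(len(inp), len(tgt))
inp, tgt = inp[:n], tgt[:n]

# Correlate only the first ~10 s (around the sync pops) to keep it fast.
win = min(sr * 10, n)
corr = correlate(tgt[:win], inp[:win], mode="full", method="fft")
lag = int(np.argmax(np.abs(corr))) - (win - 1)

print(f"target is offset by {lag} samples ({1000.0 * lag / sr:.2f} ms) relative to input")
```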

And, to test from the opposite direction: is there a known-good input.wav and target.wav pair to try?

@ignis32 you can test with these: deerinkstudios-bogner-red - Google Drive

Bogner Red Channel - ESR after training: 0.0259

Hmm. Your files processed fine for me.
Current model: deerinkstudios-bogner-red_LSTM-16
ESR: 0.028407106176018715

Interestingly, your input.wav differs from the one I downloaded via the link in the AIDA-X Model Trainer.ipynb provided by the MOD modelling tutorial. Your input.wav is two minutes shorter, missing some real guitar licks in addition to the first 3 minutes of creepy noises. Why? :slight_smile:

However, cutting my own input and target to the same size does not make any difference, unfortunately.

This is the one NAM uses. Can you re-record your target with it and normalize both to -6 dB? (that input.wav should be at -6)

If the numbers are better, compare how it sounds to the original.
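If you prefer to script it, peak-normalizing to -6 dB could look roughly like this (a sketch; file names are just examples, and any audio editor can do the same):

```python
# Peak-normalize a WAV to -6 dBFS (sketch; file names are examples).
import numpy as np
import soundfile as sf

def normalize_to_db(path_in, path_out, peak_db=-6.0):
    data, sr = sf.read(path_in)
    peak = np.max(np.abs(data))
    if peak > 0:
        data = data * (10.0 ** (peak_db / 20.0)) / peak
    sf.write(path_out, data, sr)

normalize_to_db("input.wav", "input_-6db.wav")
normalize_to_db("target.wav", "target_-6db.wav")
```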

I tried these files with the Colab notebook using the standard parameters, but I'm getting different results: bigger ESR, and sometimes I get the warning "validation patience limit reached…".

I'm not sure if I'm missing something, but each training run gives quite different values without changing anything… Is this normal?

Interesting, this -6 dB headroom: all of the clean models I made are at very low volume (both files normalized to 0 dB). Is that related? Or is there a way to edit the JSON file to push up the volume?

This is normal. If you don't change folders, the training uses the model file generated before as a starting point, so it adds more epochs to those already in the model.

The way around that is to delete the .json files generated in the training folder, or to change the folder where you have input.wav and target.wav.
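For example, something like this before each run avoids resuming from an old model by accident (a sketch; double-check the folder path and keep any JSON files you still need):

```python
# Remove leftover model .json files so training starts from scratch.
# The folder path is an example; double-check before deleting anything.
from pathlib import Path

train_dir = Path("/content/drive/input")
for old_model in train_dir.glob("*.json"):
    print(f"removing {old_model}")
    old_model.unlink()
```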

That’s quite a catch. I use new folders for every new experiment, but it is something that I could easily miss.

I think this should go into an FAQ of some sort, in addition to the MOD Aida X Modeling Guide.


I agree. This could also be in the script as an option.

Taking a look at the input files you provided @ignis32, I can see 2 issues:

First: at the beginning of the audio files, you can see that for the first pop in the input file there is no corresponding pop in the target file, and for the second pop in the input file there are somehow 2 pops in the target file. I don't really understand how that happened, but similar things seem to be happening throughout the rest of the files as well.

Second: taking a closer look at the second pop and how the target reacts to it, you can see that there is a latency of at least 5 ms between the pop in the input signal and the target signal (assuming that the 1st pop in the target corresponds to the 2nd pop in the input).

These issues definitely explain the model's inability to converge to a good solution. Issue 1 means that there is no clear operation that gets you from input to target. Issue 2 means that the model would have to generate the target sound with a latency of at least 5 ms (which is not only difficult for the model, but also not appreciated when playing in realtime haha).

These issues may have happened during the reamping process, or if you did some quick processing after the reamping. My advice is just to pay more attention to that; data is pretty crucial when it comes to AI in general and to our case specifically.

For comparison, here's what other valid reamping files can look like in terms of response and response time.

Hope this helps, let us know if you manage to fix it!


After removing that 5 ms latency I ended up with ESR: 0.7310895919799805 and got 20 epochs further, to 125/300.

"Predicted" now has some overdrive feel, still far from the target, but that looks like a noticeable improvement to me.

I guess my next move would be doing something with normalization, or maybe doing some other reamp.

To conclude, a 5 ms lag is significant.
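For anyone following along, trimming a measured offset like that can be scripted too (a sketch; 240 samples is 5 ms at 48 kHz, replace it with whatever offset you measure):

```python
# Remove a measured latency by trimming the start of the lagging file.
# `lag` is the offset in samples found with the cross-correlation check above.
import soundfile as sf

lag = 240                      # e.g. 5 ms at 48 kHz; replace with your measured value
inp, sr = sf.read("input.wav")
tgt, _ = sf.read("target.wav")

tgt = tgt[lag:]                # drop the extra leading samples from the target
n = min(len(inp), len(tgt))    # keep both files the same length
sf.write("input_aligned.wav", inp[:n], sr)
sf.write("target_aligned.wav", tgt[:n], sr)
```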

Same problem here: training usually stops before reaching the epoch limit. I'm trying to compare different results, for example Light training at 600 epochs (I tried many times, but it always stops before) vs Standard at 300, because CPU consumption on the Dwarf could be a problem at live gigs if I want to use some "heavy" plugins. I will give you some results soon.

This isn't necessarily a bad thing. If the ESR is good enough, the model wouldn't improve with more epochs.
It's just a time saver.
An ESR around 0.01 - 0.09 mostly sounds pretty good.
And I used the light model for live use in the earlier stages of the plugin. It depends on the room you play, but for small venues it's definitely enough.
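For anyone wondering what ESR actually is: it's the error-to-signal ratio, the energy of the difference between prediction and target divided by the energy of the target, so 0 is a perfect match and values near 1 mean the prediction is basically useless. A minimal sketch (the trainer may additionally apply a pre-emphasis filter before computing it):

```python
# Error-to-signal ratio: 0 = perfect match, ~1 = prediction no better than silence.
import numpy as np

def esr(target, predicted):
    target = np.asarray(target, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    return np.sum((target - predicted) ** 2) / np.sum(target ** 2)
```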
