I give up!....validation patience limit reached!

I’m doing something stupid, I just know it.
I get the same failure every time. I haven’t gotten a “good” model through yet.
What triggers “validation patience limit reached at epoch XX” ?
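For what it’s worth, that message is an early-stopping guard: if the validation loss stops improving for a set number of epochs (the “patience”), the trainer aborts the run instead of burning the remaining epochs. I don’t have the AIDA trainer’s exact code in front of me, so this is only a hypothetical sketch of the usual logic (function name and defaults are made up):

```python
# Hypothetical sketch of standard early stopping: stop once the
# validation loss fails to improve for `patience` consecutive epochs.
def train_with_patience(val_losses, patience=20):
    """Return the (0-based) epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # "validation patience limit reached"
    return len(val_losses) - 1  # ran to the end
```

So “patience limit reached at epoch 72” just means the validation loss hadn’t improved for that many epochs in a row, not that the code crashed.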

<- RUN CELL (►)

This will check for GPU availability, prepare the code for you, and mount your drive.


— Checking GPU availability…
GPU unavailable, using CPU instead.
RECOMMENDED: You can enable GPU through “Runtime” → “Change runtime type” → “Hardware accelerator:” GPU → Save
Getting the code…
Checking for code updates…
Mounting google drive…
Mounted at /content/drive
Ready! you can now move to step 1: DATA

1. The Data (upload + preprocessing) :bookmark_tabs:

Step 1.1: Download the capture signal

Download the pre-crafted “capture signal” called input.wav from the provided link.

Step 1.2: Reamp your gear

Use the downloaded capture signal to reamp the gear that you want to model. Record the output and save it as “target.wav”. For a detailed demonstration of how to reamp your gear using the capture signal, refer to this video tutorial starting at 1:10 and ending at 3:44.



<- RUN CELL (►)

Step 1.3: Upload

  • In Drive, put the two audio files you want to train with in a single folder.
    • input.wav : contains the reference (dry/DI) sound.
    • target.wav : contains the target (amped/with effects) sound.
  • Use the file browser in the left panel to find a folder with your audio, right-click “Copy Path”, paste below, and run the cell.
    • ex. /content/drive/My Drive/training-data-folder
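Before running the cell, it can help to sanity-check the pair yourself: the trainer expects both files to have the same sample rate and length, which is what the step-1 output reports. A minimal stdlib-only check, assuming the file names above:

```python
# Verify that input.wav and target.wav agree on sample rate and length
# before training -- a mismatched pair is a common cause of failed runs.
# Uses only Python's built-in wave module; file names are the ones the
# notebook expects.
import wave

def check_pair(input_path, target_path):
    with wave.open(input_path, "rb") as f_in, wave.open(target_path, "rb") as f_tg:
        in_rate, in_len = f_in.getframerate(), f_in.getnframes()
        tg_rate, tg_len = f_tg.getframerate(), f_tg.getnframes()
    assert in_rate == tg_rate, f"sample rates differ: {in_rate} vs {tg_rate}"
    assert in_len == tg_len, f"lengths differ: {in_len} vs {tg_len} samples"
    return in_rate, in_len
```

If either assertion fires, fix the files in your DAW before uploading rather than letting the trainer choke on them.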





— Input file name: /content/drive/MyDrive/input.wav
Target file name: /content/drive/MyDrive/target.wav
Input rate: 48000 length: 14523000 [samples]
Target rate: 48000 length: 14523000 [samples]
Preprocessing the training data…
Data prepared! you can now move to step 2: TRAINING

2. Model Training :man_lifting_weights:



<- RUN CELL (►)

Training usually takes around 10 minutes, but this can change depending on the duration of the training data that you provided and the model_type you choose.
Note that training doesn’t always lead to the same results. You may want to run it a couple of times and compare the results.

Choose the Model type you want to train:
Generally, the heavier the model the more accurate it is, but also the more CPU it consumes. Here’s a list of approximate CPU consumption of each model type on a MOD Dwarf:

  • Lightest: 25% CPU
  • Light: 30% CPU
  • Standard: 37% CPU
  • Heavy: 46% CPU


Some training hyperparameters (Recommended: ignore and continue with default values):




— device = MOD-DWARF
file_name = MyDrive
unit_type = LSTM
size = 16
skip_connection = 0
36% 71/200 [21:31<37:59, 17.67s/it] validation patience limit reached at epoch 72
36% 71/200 [21:52<39:45, 18.49s/it]
done training
testing the final model
testing the best model
finished training: MyDrive_LSTM-16
Training done! ESR after training: 0.801609218120575

Correct me if I’m wrong (please):

It can quit before the maximum number of iterations, but yours seems to stop quite quickly.

That seems like a high ESR value; it would mean accuracy is off.
My guess is that the high ESR and the short run are due to there being too much difference between the input and the target (that the algorithm attempts to match signals that are misaligned in time, for example?).

I’m no specialist, this is speculation
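To put a number on that speculation: ESR (error-to-signal ratio) is just the energy of the error divided by the energy of the target, and a shift of only a few samples between two otherwise identical signals already inflates it badly. A pure-Python sketch of the metric, not the trainer’s actual code:

```python
# ESR = sum of squared errors / sum of squared target samples.
# Demonstrates how a small time misalignment alone produces a big ESR,
# even when the two signals are otherwise identical.
import math

def esr(target, prediction):
    err = sum((t - p) ** 2 for t, p in zip(target, prediction))
    ref = sum(t ** 2 for t in target)
    return err / ref

# A 1 kHz sine at 48 kHz, and the same sine arriving 5 samples early.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
shifted = sine[5:] + [0.0] * 5

print(esr(sine, sine))     # 0.0 -- perfect match
print(esr(sine, shifted))  # large, despite being the same tone
```

So an ESR of 0.8 doesn’t necessarily mean the gear is “too distorted” to model; a timing offset alone can get you most of the way there.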

I’ve been trying to model a BM Deacy - total ego trip, great sound but only one person on the planet can play that tone all night, and it is not me!
So yes there is distortion and compression but it is no more than the Rockman JSON that some kind person posted.

I had a similar problem once; I (totally at random) solved it by creating a longer input file, taking the longer one and adding some of my own chuggin’s DI signal behind it. Accuracy wasn’t super great, but it was an improvement and the process didn’t stop so early.

Perhaps you can try experimenting with the input file?

After you create target.wav, zoom in at maximum resolution on “target” and “input” and align them by watching the two ticks at the start; after that, cut them at the same place at the start and end, and export them.
Maybe your “target” has a little delay due to latency.
My first captures had the same issue.
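If you’d rather not eyeball the ticks, the latency can also be estimated automatically by cross-correlating the two files and trimming the lag off the target. A rough sketch (assumes numpy, which Colab has; the function name and the max_lag default are my own choices, not part of the notebook):

```python
# Estimate how many samples target.wav lags behind input.wav by
# brute-force cross-correlation over lags 0..max_lag, then trim
# that many samples from the start of the target.
import numpy as np

def estimate_latency(input_sig, target_sig, max_lag=4800):
    """Return the lag (in samples) at which target best matches input."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag):
        seg = target_sig[lag:lag + len(input_sig)]
        c = float(np.dot(input_sig[:len(seg)], seg))
        if c > best_corr:
            best_corr, best_lag = c, lag
    return best_lag

# Usage: aligned_target = target_sig[estimate_latency(input_sig, target_sig):]
```

max_lag of 4800 covers 100 ms at 48 kHz, which should be far more than any interface’s round-trip latency.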

I was aligning both “input” and “target” on the positive leading edge of the first alignment click to compensate for any latency. I even got down to which part of the square-wave click I was aligning to. Still, all failed. Do we know what the ideal alignment point is?
I tried the rising slopes, the plateaus, and right in between. All failed.

@mark_couling I had the same issue. I could fix it by aligning a zero crossing on both signals. Meaning: I looked for a spot where the input signal switched from negative to positive values. Then I took the target signal and shifted it forward (made it come earlier) until its zero crossing was at the same sample as the one in the input signal.

Just for context:
My input signal was a clean bass signal and I captured a bass overdrive pedal. It’s not a 100% match, but with a bit of post-EQ it works. ESR was 0.008 or so. I’m retraining it right now and will probably share it then.
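For anyone who wants to script that zero-crossing trick instead of doing it by hand in an editor, here’s a minimal sketch (plain Python lists; the function names are my own, not from the trainer):

```python
# Align target to input on their first negative-to-positive zero
# crossing: find the crossing in each signal, then drop (or pad)
# samples at the start of the target so both crossings coincide.
def first_rising_zero_crossing(sig):
    for i in range(1, len(sig)):
        if sig[i - 1] < 0 <= sig[i]:
            return i
    return None

def align_by_zero_crossing(input_sig, target_sig):
    zi = first_rising_zero_crossing(input_sig)
    zt = first_rising_zero_crossing(target_sig)
    shift = zt - zi
    if shift >= 0:
        return target_sig[shift:]           # target lags: trim its start
    return [0.0] * -shift + target_sig      # target leads: pad its start
```

This only looks at the very first crossing, so it works best when both files start with the alignment click rather than silence plus noise.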

good tip! Will include that in my “best practices” article I’m writing


I’m having this same validation patience limit reached early in the training. I’ve tried all sorts:

  • Aligning to reduce delay

  • different normalising levels on target file

  • using input and output files from NAM that have produced 0.0005 ESR via NAM training

and nothing gets better than 0.98 via AIDA Colab training (set for 400 epochs). What I end up doing as a workaround is running the AIDA input file through the NAM capture plugin, then running that audio as the target through the AIDA trainer.

Any advice greatly appreciated on what could be causing the high ESR via AIDA colab?

The validation patience limit thing happens to me a lot too. It can be quite frustrating, but I find that an amp model can still be usable when it gets beyond 300 epochs (I always set it to 400, and it’s nice when it reaches that). Some captures go the full distance while others end prematurely, so I personally view it as a luck-based affair. I also find that the trainer struggles more with clean tones than it does with high-gain ones, strangely enough. It usually takes multiple attempts to get good results.

I was originally using two mono tracks to record input and target (thinking that I could adjust the gain individually), both trimmed to length and time-aligned as needed.
I found I had much better luck using a stereo pair and adjusting the output of the source devices…
All I could think was that two mono tracks are not EXACTLY the same as one stereo track that I split in the recorder and then uploaded to the cloud.
Got some good ones in the end.
