While training: KeyError: 'test_lossESR_final'

LievenDV · October 4, 2024, 11:05pm

Picking up modeling amps from digital blends again.

I'm getting KeyError: 'test_lossESR_final' when I run the training step. This isn't the first model I've created, but it's the first time I've gotten stuck at this step. It has been a while since my last model, though.
The file sizes of the input and target check out, and so does the sample rate.

I can't derive the cause from the code output:

/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:28: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, min, max):
/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:33: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):

args.model = SimpleRNN
args.device = MOD-DWARF
args.file_name = PF-SpacePirate
args.input_size = 1
args.hidden_size = 16
args.unit_type = LSTM
args.loss_fcns = {'ESR': 0.75, 'DC': 0.25}
args.skip_con = 0
args.pre_filt = A-Weighting
existing model file found, loading network.. continuing training..
/usr/local/lib/python3.10/dist-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:60: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn(
  0% 1/540 [00:03<34:53,  3.88s/it]
Traceback (most recent call last):
  File "/content/Automated-GuitarAmpModelling/dist_model_recnet.py", line 238, in <module>
    val_output, val_loss = network.process_data(dataset.subsets['val'].data['input'][0],
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 777, in process_data
    output[l * chunk:(l + 1) * chunk] = self(input_data[l * chunk:(l + 1) * chunk])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 698, in forward
    x, self.hidden = self.rec(x, self.hidden)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 917, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-6e2df7b97e5b> in <cell line: 46>()
     44 model_dir = f"/content/Automated-GuitarAmpModelling/Results/{file_name}_{config_file}-{skip_con}"
     45 step = max(step, 2)
---> 46 print("Training done!\nESR after training: ", extract_best_esr_model(model_dir)[1])

/content/Automated-GuitarAmpModelling/colab_functions.py in extract_best_esr_model(dirpath)
    161   with open(stats_file) as json_file:
    162     stats_data = json.load(json_file)
--> 163     test_lossESR_final = stats_data['test_lossESR_final']
    164     test_lossESR_best = stats_data['test_lossESR_best']
    165     esr = min(test_lossESR_final, test_lossESR_best)

KeyError: 'test_lossESR_final'
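The KeyError is most likely a downstream symptom rather than the root cause. Earlier in the same output, dist_model_recnet.py aborts during the validation pass with RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED, so the training-stats JSON apparently never receives a test_lossESR_final entry. Below is a minimal sketch of a more defensive reader. extract_best_esr_checked is a hypothetical name and is not part of colab_functions.py. It takes the stats file path directly and returns only the ESR value, whereas the notebook indexes [1] into the original function's result.

```python
import json
import os


def extract_best_esr_checked(stats_file):
    """Return min(final, best) test ESR from a training-stats JSON file.

    Hypothetical defensive variant of extract_best_esr_model: it fails with an
    explanatory message instead of a bare KeyError when the test-loss keys
    were never written.
    """
    if not os.path.isfile(stats_file):
        raise FileNotFoundError(f"{stats_file} not found; did training start at all?")
    with open(stats_file) as json_file:
        stats_data = json.load(json_file)
    required = ("test_lossESR_final", "test_lossESR_best")
    missing = [key for key in required if key not in stats_data]
    if missing:
        # Assumption: the training script records these keys only after its
        # final test pass, so missing keys point to an earlier crash.
        raise RuntimeError(
            f"{stats_file} is missing {missing}; the training script most likely "
            "crashed before the final test pass. Look for the first error in the "
            "training output (e.g. a cuDNN RuntimeError)."
        )
    return min(stats_data["test_lossESR_final"], stats_data["test_lossESR_best"])
```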

GrimJim · October 5, 2024, 1:06pm

The Google Colab page that the MODAudio/AIDA DSP crew set up for model training has been neglected for a long time. Maybe they'll get around to updating and fixing it someday, or maybe it will be left alone. It all depends on the company's current business situation, which I don't know. Perhaps the team can tell us more.

LievenDV · October 5, 2024, 2:16pm

Ah, …

This process is the core of user-generated content when it comes to profiling tech.

I hope the program still has a future.

spunktsch · October 5, 2024, 3:00pm

Hey @LievenDV,

I wrote about it in this thread. We still intend to fix it, but it will take time.

Romain_P · October 5, 2024, 6:37pm

Hi there,

I’m experiencing the same problem as LievenDV.
Here is the error code:

/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:28: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, min, max):
/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:33: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):

args.model = SimpleRNN
args.device = MOD-DWARF
args.file_name = Normal
args.input_size = 1
args.hidden_size = 16
args.unit_type = LSTM
args.loss_fcns = {'ESR': 0.75, 'DC': 0.25}
args.skip_con = 1
args.pre_filt = A-Weighting
no saved model found, creating new network
/usr/local/lib/python3.10/dist-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:60: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn(
  0% 1/200 [00:02<09:38,  2.91s/it]
Traceback (most recent call last):
  File "/content/Automated-GuitarAmpModelling/dist_model_recnet.py", line 238, in <module>
    val_output, val_loss = network.process_data(dataset.subsets['val'].data['input'][0],
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 777, in process_data
    output[l * chunk:(l + 1) * chunk] = self(input_data[l * chunk:(l + 1) * chunk])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 691, in forward
    x, self.hidden = self.rec(x, self.hidden)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 917, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-c34516351da8> in <cell line: 46>()
     44 model_dir = f"/content/Automated-GuitarAmpModelling/Results/{file_name}_{config_file}-{skip_con}"
     45 step = max(step, 2)
---> 46 print("Training done!\nESR after training: ", extract_best_esr_model(model_dir)[1])

/content/Automated-GuitarAmpModelling/colab_functions.py in extract_best_esr_model(dirpath)
    161   with open(stats_file) as json_file:
    162     stats_data = json.load(json_file)
--> 163     test_lossESR_final = stats_data['test_lossESR_final']
    164     test_lossESR_best = stats_data['test_lossESR_best']
    165     esr = min(test_lossESR_final, test_lossESR_best)

KeyError: 'test_lossESR_final'

I hope the project is not dead, because AIDA is really a nice plugin.

Bye

pilal · October 27, 2024, 5:08pm

Some temporary fixes have been made on the aidadsp_devel branch.
When running STEP 0, you'll be asked to restart the session. DON'T!
Edit the code of Step 5.
Find the line: shutil.copyfile(os.path.join(model_dir, 'model_keras.json'), os.path.join('/content', os.path.split(model_dir)[-1]+'.aidax'))

Replace .aidax with .json. Or, if you're not comfortable messing with the code, manually download the model file from the content folder.

Don't run Step 4, as it may crash the whole session.

Optionally, at Step 2, I usually change "norm=True" to "norm=False".
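The Step 5 edit amounts to copying model_keras.json out of the model folder under a .json name instead of .aidax. Here is a minimal sketch, wrapped in a hypothetical helper export_model_file that does not exist in the notebook. It assumes the same /content destination as the quoted line.

```python
import os
import shutil


def export_model_file(model_dir, dest_dir="/content", ext=".json"):
    """Copy <model_dir>/model_keras.json to <dest_dir>/<model folder name><ext>.

    Same operation as the Step 5 line, with the ".aidax" extension
    replaced by ".json" by default.
    """
    # normpath drops a trailing slash, so basename still returns the folder name.
    model_name = os.path.basename(os.path.normpath(model_dir))
    dest = os.path.join(dest_dir, model_name + ext)
    shutil.copyfile(os.path.join(model_dir, "model_keras.json"), dest)
    return dest
```

Keeping the extension as a parameter means the same helper can produce the .aidax name once that export path works again.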

spunktsch · October 28, 2024, 1:25pm

Thanks for the heads-up, I missed that line.
Step 5 should be fixed now.

LievenDV · October 28, 2024, 5:27pm

OK, tried again.

I had to "restart session" because of some updates, but the third time it worked like a charm.
I still need to test the model itself, but it's a high-gain amp model that landed at an ESR of 0.018, so that looks promising.

Thanks all involved!

LievenDV · March 29, 2025, 4:09pm

I'm having this error again today.

@spunktsch, is this the same type of issue, or is it me? ^^

Code/error dump:

/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:27: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py:32: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743264274.702080    2302 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743264274.708400    2302 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

args.model = SimpleRNN
args.device = MOD-DWARF
args.file_name = Trivium lead
args.input_size = 1
args.hidden_size = 16
args.unit_type = LSTM
args.loss_fcns = {'ESR': 0.75, 'DC': 0.25}
args.skip_con = 0
args.pre_filt = A-Weighting
no saved model found, creating new network
/usr/local/lib/python3.11/dist-packages/torch/__init__.py:1236: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /pytorch/torch/csrc/tensor/python_tensor.cpp:434.)
  _C._set_default_tensor_type(t)
/usr/local/lib/python3.11/dist-packages/torch/optim/lr_scheduler.py:62: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn(
  0% 1/400 [00:03<22:55,  3.45s/it]
Traceback (most recent call last):
  File "/content/Automated-GuitarAmpModelling/dist_model_recnet.py", line 238, in <module>
    val_output, val_loss = network.process_data(dataset.subsets['val'].data['input'][0],
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 777, in process_data
    output[l * chunk:(l + 1) * chunk] = self(input_data[l * chunk:(l + 1) * chunk])
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/Automated-GuitarAmpModelling/CoreAudioML/networks.py", line 698, in forward
    x, self.hidden = self.rec(x, self.hidden)
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/rnn.py", line 1124, in forward
    result = _VF.lstm(
             ^^^^^^^^^
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-b5bcbd2cac6e> in <cell line: 0>()
     44 model_dir = f"/content/Automated-GuitarAmpModelling/Results/{file_name}_{config_file}-{skip_con}"
     45 step = max(step, 2)
---> 46 print("Training done!\nESR after training: ", extract_best_esr_model(model_dir)[1])

/content/Automated-GuitarAmpModelling/colab_functions.py in extract_best_esr_model(dirpath)
    161   with open(stats_file) as json_file:
    162     stats_data = json.load(json_file)
--> 163     test_lossESR_final = stats_data['test_lossESR_final']
    164     test_lossESR_best = stats_data['test_lossESR_best']
    165     esr = min(test_lossESR_final, test_lossESR_best)

KeyError: 'test_lossESR_final'

madmaxwell · April 3, 2025, 10:29am

@LievenDV @pilal

When running STEP 0, you'll be asked to restart the session. DON'T!

Why? Quite the opposite: you have to restart the session; otherwise the torch version in use will still be the most recent one, which is not supported.

Brief explanation: Google updates Colab instances without notifying users. As of today, Colab runs Ubuntu 22.04.4 LTS (Jammy), released by Ubuntu on 2024-09-12 18:47, and we basically don't know when Colab decided to perform the upgrade. An Ubuntu release dictates the Python version and everything else built on top of it, and all of this current ML stuff is Python-based.

This sucks, since things will break every now and then.

I am working on two issues:

The torch/CUDA version comparison handled only the > case, not >=. Fixed now.
The packages tensorflow==2.11.0 and tensorboard==2.11.0 won't install on Ubuntu Jammy. This dependency is required only for the final model conversion; I am working to remove it.
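The notebook's actual torch/CUDA check isn't shown in this thread, so the following is only a hypothetical illustration of the > vs >= bug. With a strict >, an installed version that exactly equals the required one is rejected. parse_version and meets_minimum are illustrative names. The parser only handles a plain major.minor[.patch] prefix, such as the one in torch's "2.4.1+cu121".

```python
import re


def parse_version(version):
    """Turn a version string such as "2.4.1+cu121" into a comparable tuple (2, 4, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component defaults to 0, so "12.1" becomes (12, 1, 0).
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(installed, required):
    # The buggy form, parse_version(installed) > parse_version(required),
    # wrongly rejects an installed version equal to the required one.
    return parse_version(installed) >= parse_version(required)
```

Comparing integer tuples rather than raw strings also avoids the classic "10.0" < "9.0" string-ordering trap.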

Please use this link for the Colab script, i.e. the one on GitHub. In the past we had a Google Drive version; that one is not up to date. We should have replaced all the links in the docs, but beware:

https://colab.research.google.com/github/AidaDSP/Automated-GuitarAmpModelling/blob/aidadsp_devel/AIDA_X_Model_Trainer.ipynb

pilal · April 3, 2025, 11:02am

At the time of that post, a first "restart session" pop-up appeared long before the script had finished. I should have been more precise.

LievenDV · April 3, 2025, 11:20am

@madmaxwell

I tried the link you provided.
In Step 0 (dependency check), it asks to install some things, and that happens.
It asks to restart, and I did as you instructed:
"Your environment is correctly set up. No need to reinstall dependencies."

However, in the setup step (Step 1), I get this error:

Checking GPU availability... GPU available! 
mkdir: cannot create directory ‘/content/temp’: File exists
Installing dependencies...
Getting the code...
Checking for code updates...
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-712270af17f8> in <cell line: 0>()
     43 
     44 
---> 45   from colab_functions import wav2tensor, extract_best_esr_model, prep_audio, create_csv_nam_v1_1_1
     46   import plotly.graph_objects as go
     47   from CoreAudioML.networks import load_model

/usr/local/lib/python3.11/dist-packages/torch/utils/tensorboard/__init__.py in <module>
----> 1 import tensorboard
      2 from distutils.version import LooseVersion
      3 
      4 if not hasattr(tensorboard, "__version__") or LooseVersion(
      5     tensorboard.__version__

ModuleNotFoundError: No module named 'tensorboard'

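The traceback shows the failure chain: importing colab_functions ends up importing torch.utils.tensorboard, which in turn imports the tensorboard package that failed to install. Until the dependency is removed upstream, one hypothetical workaround pattern is to make that import optional. This is a sketch, not the maintainers' fix. Any code that later uses SummaryWriter would also need to handle it being None.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def optional_summary_writer():
    """Return torch's SummaryWriter class, or None if it can't be imported.

    ModuleNotFoundError (as in the log above) is a subclass of ImportError,
    so a missing tensorboard (or torch) package lands in the except branch.
    """
    try:
        from torch.utils.tensorboard import SummaryWriter
    except ImportError:
        return None
    return SummaryWriter


SummaryWriter = optional_summary_writer()
if SummaryWriter is None:
    print(f"TensorBoard logging disabled (tensorboard package found: {has_module('tensorboard')})")
```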

madmaxwell · April 3, 2025, 1:12pm

Yeah

The packages tensorflow==2.11.0 and tensorboard==2.11.0 won't install on Ubuntu Jammy. This dependency is required only for the final model conversion; I am working to remove it.

That's from my message above! I'm working on it.

LievenDV · April 3, 2025, 7:41pm

Thanks Max!

Pity we always have to bother you like this

madmaxwell · April 4, 2025, 7:17am

Pity we always have to bother you like this

Quite the opposite: I still consider this community pretty much unique. As of today, there aren't many 100% audio-dedicated embedded devices with a store open to third-party apps. For more than a year, Mod was the only device of this kind on Earth supporting neural models, and the integration we did back then is still unmatched in terms of performance. I hope to see more of this in future Mod-derivative products, and as Aida DSP we're ready for new challenges!

PS: alongside getting motivational, I have successfully refactored modelToKeras.py to run without TensorFlow dependencies. I am landing the changes in the next branch and will adjust AIDA_X_Model_Trainer.ipynb to work against the new next implementation. The work itself, and above all the testing, is taking some time.

LievenDV · April 4, 2025, 7:34am

Well, I hope we can keep it going for a long while, because I have more modelling lined up.

I kind of rely on the notebook to continue my experiments.

Thanks! I guess we'll hear about it when there's a new version in production.

gochotactico · April 5, 2025, 9:01am

Hello, I'm new to the forum and I'm having the same problem as the one described here. I'm trying to capture my amp, but I just can't get it to work. Has the problem been solved yet? I hope for a positive answer.

pilal · April 6, 2025, 11:32pm

To sum up, use the notebook from the aidadsp_devel branch.

https://colab.research.google.com/github/AidaDSP/Automated-GuitarAmpModelling/blob/aidadsp_devel/AIDA_X_Model_Trainer.ipynb

Edit the code for cell 0.
Look for the line:

!pip3 install --disable-pip-version-check --no-cache-dir tensorflow==2.11.0 tensorboard==2.11.0

Change both version numbers to 2.12.0.

You should be good to go.
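This workaround works because TensorFlow 2.11.x has no wheels for Python 3.11, which Colab now runs (per the pip error further down the thread). TensorFlow 2.12.x is the first release with Python 3.11 support. Below is a small sketch of choosing the pin from the running interpreter. tensorflow_pin and pip_install_command are hypothetical helpers, and pinning alone may not resolve every cross-dependency conflict.

```python
import sys


def tensorflow_pin(python_version=None):
    """Pick a tensorflow/tensorboard version that has wheels for this Python.

    Assumption based on TensorFlow's published support matrix: 2.11.x covers
    Python 3.7-3.10, and 2.12.x covers 3.8-3.11 (first with 3.11).
    """
    major, minor = python_version or tuple(sys.version_info[:2])
    if (3, 7) <= (major, minor) <= (3, 10):
        return "2.11.0"
    if (major, minor) == (3, 11):
        return "2.12.0"
    raise ValueError(f"Neither 2.11.0 nor 2.12.0 has wheels for Python {major}.{minor}")


def pip_install_command(python_version=None):
    """Build the Step 0 pip line with matching tensorflow/tensorboard pins."""
    version = tensorflow_pin(python_version)
    return ("pip3 install --disable-pip-version-check --no-cache-dir "
            f"tensorflow=={version} tensorboard=={version}")
```

In a Colab cell, the resulting string could be run with IPython's shell escape, e.g. `!{pip_install_command()}`.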

madmaxwell · April 7, 2025, 7:48am

I am landing the changes in the next branch. I had to switch to other stuff, but I hope to conclude today. In the next branch, I have removed the need to install or tweak the TensorFlow installation. I was still getting errors when simply upgrading the tensorflow and tensorboard packages, because some cross-dependencies were not satisfied. Can we move the discussion to GitHub? Otherwise it gets confusing for users here.

github.com/AidaDSP/Automated-GuitarAmpModelling

Jupyter Notebook broken on Google Colab

opened 03:42PM - 01 Apr 25 UTC

ChrsBr

bug

The notebook fails to install dependencies in the first step.

ERROR: Could not find a version that satisfies the requirement tensorflow==2.11.0 (from versions: 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0, 2.14.1, 2.15.0rc0, 2.15.0rc1, 2.15.0, 2.15.0.post1, 2.15.1, 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, 2.17.0rc1, 2.17.0, 2.17.1, 2.18.0rc0, 2.18.0rc1, 2.18.0rc2, 2.18.0, 2.18.1, 2.19.0rc0, 2.19.0)
ERROR: No matching distribution found for tensorflow==2.11.0

I guess this is because Google Colab updated to Python 3.11 in January, and tensorflow 2.11 does not seem to be compatible with it. When I modify the script to first install Python 3.10, installing the dependencies works fine.

gianfranco · April 7, 2025, 9:26am

Hi @madmaxwell @pilal

Here is a suggestion from a colleague who worked with the trainer a bit:

I do have some ideas for making the maintenance a bit more manageable.

I would probably switch the project to uv, the new Python project and dependency management tool.
Then, I would also try to get rid of the dependency on TensorFlow, since it is only used for exporting the models but restricts the Python version, making the other dependencies harder to maintain.