While training: KeyError: 'test_lossESR_final'

madmaxwell · April 7, 2025, 9:53am

Thanks @gianfranco, I already removed the TensorFlow dependency last Friday. I am now finishing testing after rebasing the Jupyter script on the current next branch.

madmaxwell · April 9, 2025, 11:06am

Generating model file: SimpleRNN_MOD-AUDIO-UG_in1_LSTM-8-1_skip0.aidax

Model file saved to: /content/Automated-GuitarAmpModelling/SimpleRNN_MOD-AUDIO-UG_in1_LSTM-8-1_skip0.aidax

All right @LievenDV @pilal and others here, I’ve finished testing the new training script and codebase. There are a lot of things to share, but before diving into the new stuff I would like to double-check with you that the basic stuff is working.

https://colab.research.google.com/github/AidaDSP/Automated-GuitarAmpModelling/blob/next/AIDA_X_Model_Trainer.ipynb

Can you help me with this? I will then merge it into aidadsp_devel.

madmaxwell · April 9, 2025, 11:12am

Step 4 (Model Evaluation) is still a work in progress at the moment, so just do the training in Step 3 and move on to Step 5 for export.

LievenDV · April 9, 2025, 11:19am

Will check this afternoon (Wednesday 9/4/25)

LievenDV · April 9, 2025, 1:44pm

Step 0: check deps.
The script did some installs and asked to restart, so I did.

Step 1: setup, run cell: I get an error:

Checking GPU availability... GPU available! 
Getting the code...

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py", line 37, in <module>
    ColabKernelApp.launch_instance()
  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py", line 712, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py", line 205, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/usr/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
    await self.process_one()
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
    await dispatch(*args)
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
    await result
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
    reply_content = await reply_content
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
    res = shell.run_cell(
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
    return super().run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 2975, in run_cell
    result = self._run_cell(
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell
    return runner(coro)
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner
    coro.send(None)
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3257, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-727379d7cf67>", line 19, in <cell line: 0>
    device = torch.device("cuda")
<ipython-input-1-727379d7cf67>:19: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device = torch.device("cuda")
Checking for code updates...
Installing dependencies...
Mounting google drive...
Mounted at /content/drive
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-727379d7cf67> in <cell line: 0>()
     50   os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"
     51 
---> 52   from colab_functions import wav2tensor, extract_best_esr_model, create_csv_aidax
     53   from prep_wav import WavParse
     54   import plotly.graph_objects as go

ImportError: cannot import name 'create_csv_aidax' from 'colab_functions' (/content/Automated-GuitarAmpModelling/colab_functions.py)
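
The first warning in that log says a module compiled against NumPy 1.x cannot run under NumPy 2.0.2 and itself suggests pinning 'numpy<2'. A minimal pre-flight check along those lines is sketched below; it is not part of the trainer, the helper names are hypothetical, and whether pinning NumPy is the right fix for this notebook is an assumption.

```python
import importlib.metadata


def major_version(version: str) -> int:
    """Return the major component of a version string such as '2.0.2'."""
    return int(version.split(".")[0])


def needs_numpy_downgrade(numpy_version: str) -> bool:
    """True for NumPy 2.x or newer, where extensions built for 1.x may fail."""
    return major_version(numpy_version) >= 2


if __name__ == "__main__":
    try:
        installed = importlib.metadata.version("numpy")
    except importlib.metadata.PackageNotFoundError:
        installed = None
    if installed is None:
        print("NumPy is not installed")
    elif needs_numpy_downgrade(installed):
        # Suggestion taken from the warning text quoted in the log
        print(f"NumPy {installed} found; consider: pip install 'numpy<2'")
    else:
        print(f"NumPy {installed} is a 1.x release")
```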

madmaxwell · April 9, 2025, 3:04pm

OK, I’m looking into it.

madmaxwell · April 9, 2025, 5:37pm

OK, I’ve updated the PyTorch/CUDA deps, so my Colab instance now looks like this:

PyTorch: 2.3.1+cu121
Cuda: 12.1
Python 3.11.11
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
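
To compare another Colab runtime against this listing, a small script can print the same facts. This is a sketch, not part of the trainer: it assumes the standard `/etc/os-release` file for the OS fields and reports PyTorch/CUDA only if `torch` can be imported.

```python
import platform


def read_os_release(path: str = "/etc/os-release") -> dict:
    """Parse KEY=VALUE lines of an os-release style file into a dict."""
    info = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                info[key] = value.strip('"')
    except FileNotFoundError:
        pass  # not a Linux system with an os-release file
    return info


def environment_report() -> dict:
    """Collect Python, PyTorch, CUDA and OS versions in one dict."""
    report = {"Python": platform.python_version()}
    try:
        import torch  # optional: only reported if installed

        report["PyTorch"] = torch.__version__
        report["Cuda"] = torch.version.cuda  # None on CPU-only builds
        report["GPU available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        report["PyTorch"] = None
    report["OS"] = read_os_release().get("PRETTY_NAME", platform.platform())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```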

In the meantime, my Colab instance is lagging heavily and I cannot execute the cells. I will let you know when I can continue testing.

pilal · April 9, 2025, 6:10pm

I just trained a model and everything went fine.

LievenDV · April 9, 2025, 7:29pm

The workbook CAN make models

First try

I was having issues in step 1; I tried refreshing my browser etc. but they persisted.
I tried a whole new window with a different Google user and Google Drive and successfully trained a model.
It landed on an ESR of 0.33, though
(high-gain model of a Trivium Amp Knob Rhythm).
The character is pretty similar, but it lacks oomph in the low end.
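
The ESR values in this thread (0.33 here, 0.47 for the cab-included run) are error-to-signal ratios: the energy of the difference between target and prediction divided by the energy of the target. Lower is better, 0 is a perfect match, and 0.33 roughly means the residual carries a third of the target's energy. A minimal plain-Python sketch of the basic formula follows; the training code may also apply a pre-emphasis filter before computing it, which this sketch omits.

```python
def esr(target, prediction, eps=1e-10):
    """Error-to-signal ratio: sum((y - y_hat)^2) / sum(y^2).

    0.0 means a perfect match; values near 1.0 mean the error is about
    as energetic as the target signal itself.
    """
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    error_energy = sum((y - p) ** 2 for y, p in zip(target, prediction))
    target_energy = sum(y ** 2 for y in target)
    return error_energy / (target_energy + eps)


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(esr(target, target))  # 0.0
    print(esr(target, [0.8 * y for y in target]))  # roughly 0.04
```

Note that a prediction that is merely 20% too quiet already costs about 0.04, so values like 0.33 point to a substantial mismatch rather than a small level difference.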

Second try

With a built-in IR this time, instead of amp only.
However, also on this second account, I can’t get the workbook to find folders and files, since it won’t refresh my Google Drive folders.

Is caching creating issues? Is there another way besides the “refresh” option next to a folder?
The folders in the workbook won’t refresh and it doesn’t accept new folders on my Drive; it gives an error saying the folder/files do not exist.

Trying to use an existing folder from a previous training doesn’t work either, because then it says the files already exist and it won’t upload them again.
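
Files uploaded to Google Drive from another device may take a while to appear under the mounted `/content/drive` in Colab. One option is remounting with `drive.mount('/content/drive', force_remount=True)` from `google.colab`; another is to wait until the expected files are visible before starting. The polling approach is sketched below; the function name, paths and timeouts are hypothetical, and the notebook does not necessarily do this.

```python
import os
import time


def wait_for_paths(paths, timeout=60.0, interval=2.0):
    """Poll until every path exists or `timeout` seconds have passed.

    Returns the list of paths that are still missing (empty on success).
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical Drive locations; adjust to your own folder layout
    wanted = [
        "/content/drive/MyDrive/my_amp/input.wav",
        "/content/drive/MyDrive/my_amp/target.wav",
    ]
    still_missing = wait_for_paths(wanted, timeout=5.0, interval=1.0)
    if still_missing:
        print("Not visible yet:", still_missing)
    else:
        print("All files found")
```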

Third try

Trying again an hour later with fresh browser sessions, I could continue.
The “cab included” version rendered a model with an ESR of 0.47, so that was even worse :p.
But it succeeded in making “a model”.

Fourth try

I thought, “I’m in a working session now, let’s try another one”… but it couldn’t; I got the old errors in step 1 again.

madmaxwell · April 10, 2025, 7:09am

Okay, first of all we need to separate training issues specific to a particular “Audio Circuit” from issues with the current next branch.

I.e., did this model, which is now failing to train on next, train successfully with the old codebase? If possible, use the exact input.wav and target.wav from a previous experiment. The key is to reduce the moving parts to a minimum, so that we can isolate the issue.
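
To be sure a re-run really uses the exact input.wav and target.wav from a previous experiment, comparing checksums is a cheap check. A minimal sketch using `hashlib` from the Python standard library; the script name and paths are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_files(path_a, path_b):
    """True if both files have byte-identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        print("identical" if same_files(sys.argv[1], sys.argv[2]) else "different")
    else:
        print("usage: python compare_wavs.py OLD.wav NEW.wav")
```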

> The folders in the workbook won’t refresh and he doesn’t accept new folders on my drive, gives error saying folder/files do not exist.

Let me check again how the files are handled in the AIDA_X_Model_Trainer.ipynb script.

madmaxwell · April 10, 2025, 7:27am

> he says the files already exist

This should now be fixed by the latest commit below:

commit b0995ecfd37f1a100fc9b79edbdc2e323485afda (HEAD -> next, origin/next, github/next)
Date:   Thu Apr 10 09:25:48 2025 +0200

    Avoid using shutil.copy which is unable to handle simple file overwrite scenarios
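
The commit message does not show what replaced `shutil.copy`. For context, `shutil.copy` raises `shutil.SameFileError` when source and destination are the same file, and an interrupted copy can leave a half-written destination. One common overwrite-safe pattern, sketched here as an assumption rather than what the commit actually does, is to copy into a temporary file in the destination folder and then move it into place with `os.replace`, which overwrites an existing file.

```python
import os
import shutil
import tempfile


def copy_overwrite(src, dst):
    """Copy src to dst, replacing dst if it already exists.

    Data is written to a temporary file next to dst and then moved into
    place with os.replace, so a failed copy never leaves dst half-written.
    Copying a file onto itself is treated as a no-op.
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return dst
    dst_dir = os.path.dirname(os.path.abspath(dst))
    os.makedirs(dst_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        os.replace(tmp_path, dst)
    except BaseException:
        # Remove the temporary file if anything went wrong, then re-raise
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dst
```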

LievenDV · April 10, 2025, 7:29am

Alright:
About the overwrite fix: I’ll test it as soon as I get on a different machine.

Up till now, before that test (I was typing this up during your reply):

Without the issues caused elsewhere in previous tries, Step 0 seems to work now. That means the initial issue is solved.
The training part seems to work as well. The quality of a new model was off, but it works.
Trying with previous files didn’t work, but that’s mostly because the flow only works well under certain circumstances. I can’t do two trainings in a row:

I need a completely new session.
Some time needs to pass between sessions.
Files need to be uploaded to Google Drive before trying.
I tried with an already existing folder and files of a model that I knew worked a year ago. I got an error that it already existed, so I will try again with that later today.
I got the message that my credits were depleted; this could be due to the security policy on this work laptop.

madmaxwell · April 10, 2025, 7:38am

All right, with the commits

6017205 (HEAD -> next, origin/next, github/next) Get rid of shutil.copyfile also in Export
66a2373 Revert "Revert "Avoid loading existing model if found""
b0995ec Avoid using shutil.copy which is unable to handle simple file overwrite scenarios
15041d1 Just use DATA_DIR to copy the final exported model

the script should handle multiple experiments executed in a row. I have changed the default behaviour so that if an existing model is found, training no longer continues from it but starts from scratch every time. I’ve also fixed the annoying file-copy handling, which was failing whenever the files already existed. Can you retry? Thanks for your patience!
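
The new default described here (ignore an existing model and train from scratch) can be sketched as a helper that clears the previous run's folder unless resuming is explicitly requested. The `resume` flag, the function name and the folder layout are hypothetical; the trainer's actual code may differ.

```python
import os
import shutil


def prepare_model_dir(model_dir, resume=False):
    """Return True if training should resume from an existing model.

    With resume=False (the default), any previous contents of model_dir
    are deleted so the next run starts from scratch.
    """
    has_previous = os.path.isdir(model_dir) and bool(os.listdir(model_dir))
    if has_previous and resume:
        return True
    if os.path.isdir(model_dir):
        shutil.rmtree(model_dir)
    os.makedirs(model_dir)
    return False
```

Deleting is the simplest way to guarantee a clean start, but on a Drive-mounted folder it also permanently discards the previous run, so copying anything worth keeping first is advisable.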

madmaxwell · April 10, 2025, 8:34am

Okay, with the latest commits, “4. Model Evaluation” is now also working again. I also cleaned up some additional bits. Let’s also ping @spunktsch on this for additional feedback.

We can discuss the changes in a new thread and possibly troubleshoot training issues there. I just need the “ok” from this thread so that I can merge next into aidadsp_devel and move on to new stuff.

madmaxwell · April 10, 2025, 3:07pm

According to tests by several people, training is now working and this issue is closed. I will merge next into the aidadsp_devel branch.

A new feature I can’t wait to share: you can now run the whole thing (the Colab, aka the Jupyter script) locally:

git clone https://github.com/aidadsp/Automated-GuitarAmpModelling.git
cd Automated-GuitarAmpModelling && git checkout next && git submodule update --init --recursive
docker compose up -d

then just open this in your browser:

http://localhost:8080

Note that this requires Docker to be installed and configured on your workstation, and a GPU. There are TONS of tutorials online on how to do so, since Docker is the backbone of AI training environments across several scenarios: local, cloud and CI. So it’s worth investing your precious time in this skill!!!
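
A quick way to confirm the prerequisites mentioned here is to check that the `docker` and `nvidia-smi` commands are on the PATH, and, after `docker compose up -d`, that something answers on http://localhost:8080. A minimal sketch, assuming an NVIDIA GPU (other vendors are not detected this way); the helper names are hypothetical. Finding `nvidia-smi` on the host does not by itself prove that containers can use the GPU, which has its own setup steps.

```python
import shutil
import urllib.error
import urllib.request


def check_prerequisites():
    """Report whether the docker and nvidia-smi executables are on PATH."""
    return {
        "docker": shutil.which("docker") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


def service_is_up(url="http://localhost:8080", timeout=3.0):
    """True if an HTTP server answers at url (any status code counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'NOT found'}")
    print("service at :8080 reachable:", service_is_up())
```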

LievenDV · April 10, 2025, 7:20pm

I had never done this before and managed to do it in a short time on Windows 10
(with a bit of help from ChatGPT, which guided me through some terminology and the BIOS settings to enable virtualisation).

gochotactico · April 12, 2025, 10:26pm

The truth is that I have not been able to get the workbook to work. If you could give me the specific steps to at least get it working, that would help me a lot.

madmaxwell · April 14, 2025, 7:45am

Hello, you need to open this link and follow it step by step.

https://colab.research.google.com/github/AidaDSP/Automated-GuitarAmpModelling/blob/aidadsp_devel/AIDA_X_Model_Trainer.ipynb

In the meantime I have merged next into aidadsp_devel. People are now confirming that the script works again, so I’m sure you will be able to make it work! If not, please provide some additional details: “it doesn’t work” is not enough for us to understand the problem.

gochotactico · April 14, 2025, 10:04pm

Well, I tried to run the workbook and I’m still getting the same error as in the last few days. When I try to run the setup or step 1, I get this error:

Checking GPU availability... GPU available!   
Getting the code...  
Checking for code updates...  
Installing dependencies...  
Mounting google drive...  
Mounted at /content/drive  
---------------------------------------------------------------------------  
RuntimeError                              Traceback (most recent call last)  
<ipython-input-2-727379d7cf67> in <cell line: 0>()  
     50   os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"  
     51   
---> 52   from colab_functions import wav2tensor, extract_best_esr_model, create_csv_aidax  
     53   from prep_wav import WavParse  
     54   import plotly.graph_objects as go  

5 frames  
/usr/local/lib/python3.11/dist-packages/torch/_library/fake_impl.py in register(self, func, source)  

RuntimeError: operator torchvision::nms does not exist

I have no idea what it could be and I haven’t seen anyone else here experiencing this.

LievenDV · April 14, 2025, 10:52pm

I have this too…

I’ve been running training locally in a Docker image without this issue.

When I test it again online the classic way (Colab) after a couple of days, I get this as well.

Deps step:

WARNING: Your environment has PyTorch 2.6.0+cu124 and CUDA 12.4. This environment is not supported.
Proceeding to install required dependencies...

Setup step:

Checking GPU availability... GPU available! 
Getting the code...
Checking for code updates...
Installing dependencies...
Mounting google drive...
Mounted at /content/drive
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-727379d7cf67> in <cell line: 0>()
     50   os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"
     51 
---> 52   from colab_functions import wav2tensor, extract_best_esr_model, create_csv_aidax
     53   from prep_wav import WavParse
     54   import plotly.graph_objects as go

5 frames
/usr/local/lib/python3.11/dist-packages/torch/_library/fake_impl.py in register(self, func, source)

RuntimeError: operator torchvision::nms does not exist
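
"operator torchvision::nms does not exist" is commonly reported when the installed torchvision was built for a different torch release than the one being imported. That would be consistent with the deps-step warning above: the runtime ships PyTorch 2.6.0+cu124, while madmaxwell's updated environment uses 2.3.1+cu121, so if torch gets replaced but the preinstalled torchvision does not, the pair no longer matches. This is a hypothesis, not a confirmed diagnosis. The sketch below checks the pairing using the convention that, for torch 2.0 through 2.6, the matching torchvision is 0.(torch minor + 15); the helper names are hypothetical, and other releases should be checked against the official compatibility table.

```python
def parse_major_minor(version):
    """'2.6.0+cu124' -> (2, 6); the local build tag after '+' is ignored."""
    public = version.split("+", 1)[0]
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def expected_torchvision_minor(torch_version):
    """Assumed mapping for torch 2.x: torchvision 0.(minor + 15)."""
    major, minor = parse_major_minor(torch_version)
    if major != 2:
        raise ValueError(f"mapping only covers torch 2.x, got {torch_version}")
    return minor + 15


def versions_match(torch_version, torchvision_version):
    """True if torchvision looks like the release built for this torch."""
    tv_major, tv_minor = parse_major_minor(torchvision_version)
    return tv_major == 0 and tv_minor == expected_torchvision_minor(torch_version)


if __name__ == "__main__":
    import importlib.metadata as md

    try:
        th, tv = md.version("torch"), md.version("torchvision")
    except md.PackageNotFoundError as exc:
        print("Not installed:", exc)
    else:
        status = "OK" if versions_match(th, tv) else "MISMATCH"
        print(f"torch {th} / torchvision {tv}: {status}")
```

For the torch 2.3.1+cu121 environment listed earlier in the thread, this mapping points to a torchvision 0.18.x build from the same CUDA 12.1 wheel index.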