Error when exporting trained model

Lately, I’ve been getting an error that won’t let me export or download models after training finishes. I never had this problem in all my time using the model trainer until very recently.

Below is the output that appears when I try to export a trained model.

> Generating model file: Orange OR120 Crunch Test model.json
> /usr/local/lib/python3.10/dist-packages/keras/src/layers/core/input_layer.py:26: UserWarning: Argument `input_shape` is deprecated. Use `shape` instead.
>   warnings.warn(
> WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
> I0000 00:00:1722906726.918138   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.964890   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.965190   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.965881   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.966147   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.966332   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.060654   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.060932   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.061205   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> Traceback (most recent call last):
>   File "/content/Automated-GuitarAmpModelling/modelToKeras.py", line 85, in <module>
>     lstm_layer = keras.layers.LSTM(hidden_size, activation=None, weights=lstm_weights, return_sequences=True, recurrent_activation=None, use_bias=bias_fl, unit_forget_bias=False)
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
>     super().__init__(
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
>     super().__init__(**kwargs)
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py", line 266, in __init__
>     raise ValueError(
> ValueError: Unrecognized keyword arguments passed to LSTM: {'weights': [array([[-4.64390498e-03,  8.43106359e-02, -8.31696242e-02,
>         -6.38481677e-02, -3.17510545e-01,  1.49999306e-01,
>         -2.86676466e-01,  2.15041280e-01, -3.72158647e-01,
>          1.41348734e-01,  1.88434303e-01,  8.33668560e-02,
>          2.53838062e-01, -1.26453742e-01,  6.82209013e-03,
>         -3.05161811e-02,  4.76790555e-02,  9.35292244e-02,
>         -8.73495489e-02,  3.67253348e-02, -1.75170109e-01,
>         -4.32963446e-02, -8.66313800e-02,  1.16616800e-01,
>         -5.26590347e-01,  7.48868659e-02, -1.07993223e-01,
>         -8.40747431e-02,  7.63554946e-02, -1.71959952e-01,
>         -1.91733479e-01,  1.20556697e-01,  1.92059621e-01,
>         -2.83274114e-01,  2.73865253e-01,  1.21221140e-01,
>         -6.12211525e-01,  1.23889208e+00, -1.26676500e+00,
>         -6.77534789e-02, -1.23839431e-01,  1.26742080e-01,
>          4.66871142e-01, -4.09075618e-01, -3.19231719e-01,
>          3.89545448e-02, -7.01428890e-01, -5.43421891e-04,
>          1.68996230e-02, -1.66286156e-01, -2.34960631e-01,
>          3.34443823e-02,  1.28992677e-01, -3.36274296e-01,
>         -5.37445173e-02, -4.53085452e-02,  6.39919341e-02,
>         -1.97191145e-02, -3.80552933e-03, -4.51324806e-02,
>         -1.55252978e-01,  1.15365756e+00,  2.24652156e-01,
>         -9.44509953e-02]]), array([[-0.17964645,  0.25568137,  0.21861663, ..., -0.10716321,
>         -0.00238903,  0.38228613],
>        [-0.37276298,  0.01081463,  0.1564104 , ...,  0.06062626,
>         -0.08346827,  0.23109984],
>        [ 0.06336492,  0.11828368, -0.03863488, ...,  0.03618264,
>         -0.48156601, -0.19605348],
>        ...,
>        [ 0.24165745,  0.20604421,  0.58149827, ...,  0.45216867,
>          0.43325123,  0.24461728],
>        [-0.42249793, -0.27418131, -0.15306103, ...,  1.09719026,
>          0.08983386, -0.14855994],
>        [ 0.31684455,  0.17004417, -0.14520794, ..., -0.59670144,
>          0.15913162,  0.03115464]]), array([ 0.03045244,  0.61540875,  0.97076464, -2.04676569, -1.26952249,
>         0.32732394,  0.89703417, -0.36222738, -1.80146372,  0.16542026,
>         0.79674131,  0.50879617,  0.05789363,  0.26923135,  0.02137228,
>         1.48363173,  0.67931215,  0.50625056,  0.18523889,  2.07478809,
>         1.35269082, -0.1282013 ,  0.40214343,  0.45933437,  1.66571057,
>         0.85095823, -0.24459287,  0.46056713,  0.54284786,  1.09635508,
>        -0.16104783,  0.08547631,  0.1655647 , -0.41905095, -0.11920086,
>         0.32213762,  0.13933085, -0.13247236,  0.27529777, -0.41632511,
>        -0.05018181,  0.15617287, -0.20218337, -0.15579043,  0.28067771,
>         0.30057193, -0.03643238,  0.39063578,  0.18349517,  0.28971492,
>         0.07503192,  0.02594861, -0.18928754,  0.59984547,  1.21236187,
>         0.49216588,  0.15279882,  0.27200954,  1.59668618,  1.12000352,
>         0.33180845,  0.26186374,  0.11560587,  1.45965922])]}
> ---------------------------------------------------------------------------
> FileNotFoundError                         Traceback (most recent call last)
> <ipython-input-12-f9ac5a56b131> in <cell line: 60>()
>      58 
>      59 
> ---> 60 shutil.copyfile(os.path.join(model_dir, 'model_keras.json'), os.path.join('/content', model_filename))
>      61 files.download(os.path.join('/content', model_filename))
>      62 
> 
> /usr/lib/python3.10/shutil.py in copyfile(src, dst, follow_symlinks)
>     252         os.symlink(os.readlink(src), dst)
>     253     else:
> --> 254         with open(src, 'rb') as fsrc:
>     255             try:
>     256                 with open(dst, 'wb') as fdst:
> 
> FileNotFoundError: [Errno 2] No such file or directory: '/content/Automated-GuitarAmpModelling/Results/Orange OR120 Crunch_LSTM-16-0/model_keras.json'

I’m obviously no expert when it comes to Google Colab, so if anyone can help me identify the problem here, I would be very thankful.
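Note that the final `FileNotFoundError` is only a symptom: `modelToKeras.py` failed first (the `ValueError` above), so `model_keras.json` was never written, and the next line of the cell then tried to copy a file that does not exist. Below is a minimal sketch of an export step that stops at the real error instead. It assumes the converter is invoked as `modelToKeras.py -lm <model path>`, as in the `get_ipython().system(...)` line of the later traceback in this thread. `export_model` and its parameters are hypothetical names, not part of the trainer notebook.

```python
import os
import shutil
import subprocess
import sys


def export_model(model_path, model_dir, out_dir="/content",
                 script="modelToKeras.py", cwd=None):
    """Convert a trained model and copy the result, stopping at the first failure.

    Hypothetical helper: the `-lm` flag and the model_keras.json / .aidax names
    mirror the notebook cell in the tracebacks; adjust them to your checkout.
    """
    # Run the converter in a subprocess so its exit status can be checked.
    result = subprocess.run(
        [sys.executable, script, "-lm", model_path],
        capture_output=True, text=True, cwd=cwd,
    )
    if result.returncode != 0:
        # Show the converter's own error (e.g. the Keras ValueError) instead of
        # letting the copy below fail with a misleading FileNotFoundError.
        raise RuntimeError(
            f"{script} exited with code {result.returncode}:\n{result.stderr}"
        )

    src = os.path.join(model_dir, "model_keras.json")
    if not os.path.isfile(src):
        raise FileNotFoundError(f"{script} exited cleanly but did not write {src}")

    # Name the exported file after the results folder, as the notebook does.
    name = os.path.basename(os.path.normpath(model_dir)) + ".aidax"
    dst = os.path.join(out_dir, name)
    shutil.copyfile(src, dst)
    return dst
```

In a Colab cell, this would take the place of the converter call and the `shutil.copyfile(...)` line. For example, `dst = export_model(model_path, model_dir, cwd="/content/Automated-GuitarAmpModelling")` followed by `files.download(dst)`.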


Anyone???

Because of this error, I cannot export or download any models that I train. This has only started happening recently. Why is this community so quiet?

It’s likely that something changed in the Keras Python library. This has to be addressed in the GitHub repo that the Colab script uses.
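For context, the traceback shows what changed. The Keras 3 `Layer` constructor rejects the `weights=` keyword that `modelToKeras.py` passes to `keras.layers.LSTM` on line 85. The first and last arrays it prints hold 64 values each: a (1, 64) kernel and a (64,) bias. That fits a 16-unit LSTM with one input, whose four gates are stacked as 4 × 16. The middle array, the recurrent kernel, is truncated in the log, but it should be (16, 64). Below is a sketch of one way that line could be adapted, not the repository's actual fix: create the layer without `weights`, build it by calling it on an input, then load the arrays with `set_weights()`. It assumes Keras 3 and reuses the variable names from the traceback (`hidden_size`, `lstm_weights`, `bias_fl`). `input_size` and both helper names are hypothetical.

```python
def expected_lstm_shapes(input_size, hidden_size, use_bias=True):
    """Shapes keras.layers.LSTM expects in set_weights(): kernel, recurrent kernel, bias.

    The four gates are stacked along the last axis, hence 4 * hidden_size.
    """
    shapes = [(input_size, 4 * hidden_size), (hidden_size, 4 * hidden_size)]
    if use_bias:
        shapes.append((4 * hidden_size,))
    return shapes


def build_lstm(hidden_size, input_size, lstm_weights, bias_fl):
    """Sketch of a Keras 3-compatible replacement for LSTM(..., weights=...)."""
    # Validate shapes first, so a mismatch is reported clearly.
    got = [tuple(w.shape) for w in lstm_weights]
    want = expected_lstm_shapes(input_size, hidden_size, bias_fl)
    if got != want:
        raise ValueError(f"LSTM weight shapes {got} do not match expected {want}")

    import keras  # imported here so the shape helpers work without Keras installed

    inputs = keras.Input(shape=(None, input_size))
    lstm_layer = keras.layers.LSTM(
        hidden_size,
        activation=None,
        return_sequences=True,
        recurrent_activation=None,
        use_bias=bias_fl,
        unit_forget_bias=False,
    )
    outputs = lstm_layer(inputs)          # calling the layer creates its variables
    lstm_layer.set_weights(lstm_weights)  # then load kernel, recurrent kernel, bias
    return keras.Model(inputs, outputs), lstm_layer
```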


Forgive my lack of knowledge, but I’m wondering whose job it is to do that and whether there is any way to let them know about the issue.

I reported this problem as an issue on the AIDA-X GitHub repo; I hope it’s solved as quickly as possible. This should be a top priority for AIDA and for the Dwarf as a multi-FX device.

First, because it looks like a quick fix for someone who understands the Python libraries the training uses: as @spunktsch pointed out, the problem seems to be caused by deprecated arguments and name changes inside the Keras library.

Second, amp modeling software needs its way of modeling new amps to keep working, and it’s been 2 weeks since it broke, with no fix and no word about it.

I understand how long it can take to fix something like that and that it may not be as simple a problem as it seems, and I don’t feel that I’m owed that fix in any way. What bothers me is the complete lack of communication when such a big, important feature is broken. What is happening behind the scenes?


MOD is famous for its lack of communication; we regularly wonder whether it’s still alive. On January 31, an announcement about a substantial investment that could change MOD Audio’s future was made to those who had invested in MOD, but under a confidentiality clause.


@GrimJim, you are right.

In the end, it’s our responsibility to keep things working. I’m also not very happy about a few things, and the fact that we are still just 2 people at AIDA DSP slows things down a lot.

It’s not that we don’t have ideas for making the capture process more straightforward, but actually doing that takes more time.

MOD doesn’t seem to exist anymore, at least judging by the recent lack of communication, and not only in the forum.
So there will be no updates to the links and files that were published through MOD.

But so as not to just make excuses, I’ll talk to @madmaxwell about what we can do to make the script work on Colab again.


This is extraordinarily alarming coming from you, given how closely you worked with them on AIDA integration.


> 2 weeks since it broke

If it’s about training, please open issues here: Issues · AidaDSP/Automated-GuitarAmpModelling · GitHub

AIDA-X is the repo for the plugin source code and the inference engine. Let’s reference the issue in the training repo and discuss it there!

Regarding the Keras deps: when you work with Colab, the environment is updated roughly every month, and stuff breaks. There should be a commented section at the beginning that installs the expected versions; let’s verify this in the next few days. There is also a Docker Compose app / Docker container, which is how @spunktsch and I run training. The Docker image will definitely work, since we control the environment.

Sorry about your experience; let’s handle it on GitHub. Thanks for the feedback.
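A section that "installs the expected versions" pins exact package versions, so Colab's image updates cannot silently swap them out. Below is a minimal sketch in plain Python. The pins are only example values, taken from the tensorflow/tensorboard 2.12.0 workaround posted later in this thread, not the notebook's official dependency list. `PINS` and `pinned_install_command` are hypothetical names.

```python
import sys

# Example pins only (hypothetical list); use whatever the notebook's
# dependency cell actually specifies.
PINS = {"tensorflow": "2.12.0", "tensorboard": "2.12.0"}


def pinned_install_command(pins):
    """Build a pip command that installs exactly the given versions
    into the interpreter that is running this code."""
    return [sys.executable, "-m", "pip", "install"] + [
        f"{name}=={version}" for name, version in sorted(pins.items())
    ]
```

In a Colab cell, `subprocess.check_call(pinned_install_command(PINS))` would then run the install, roughly equivalent to a `!pip install tensorboard==2.12.0 tensorflow==2.12.0` line.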


In my experience, I have always used the following link for model training: Google Colab. That always worked until recently. When I run the cell to export a trained model, the error shown in the OP appears.
What is the Docker thing you mentioned? Is it part of Colab? Could you explain what it is and where I can find it?

Since I’m experiencing the issue as well, I’m looking for alternatives.

This tech (and the process to create it) is way too sexy to peter out like this.

I read about the “Docker” method.
Is there an alternative that runs on a Win10 machine (my other PC)?
I vaguely remember looking into it, but it seemed to involve steps that were beyond me, despite my career in ICT in the ’00s :stuck_out_tongue:


In addition to offering the Docker method, I’ve fixed the current AIDA_X_Model_Trainer.ipynb to install known working dependencies instead of relying on the ones Colab decides to install. So the training issues should be solved.

For reference, we tracked the issue here: KeyError: 'test_lossESR_final' running the trainer in google colab · Issue #8 · AidaDSP/Automated-GuitarAmpModelling · GitHub.

There are a few points of discussion:

I need help adjusting AIDA_X_Model_Trainer.ipynb to fit the changes in the next branch, where I have landed some new models (and more will come). These are slightly modified RNNs that need to be investigated. I really wish I had more time for that, but at the moment I’m fully booked.

This activity is tracked here: Adapt AIDA_X_Model_Trainer.ipynb to changes in next branch · Issue #9 · AidaDSP/Automated-GuitarAmpModelling · GitHub. I am already using this branch (next) for local trainings, so it’s somewhat tested.

Regarding Google Colab, as some of you already know, it is not an ideal solution, even in the Pro version. It would be much better to have a cloud machine with GPU access that can simply run our Docker image. I have also created a docker-compose.yml in the next branch. The reasons are no more hassle with dependencies and, more importantly, that we all train with the same torch / cuDNN versions. This setup already works locally (it’s how I run trainings on a dedicated server).


The fix was nice while it lasted. Unfortunately, the Colab model trainer has broken again. I understand that Google Colab is an ever-changing beast, so stuff like this happens. Anyway, my situation is as follows.

After running step 0 (Deps), I run step 1 (Setup), and it fails with (from what I can tell) a ModuleNotFoundError saying “No module named tensorboard”. I added !pip install tensorboard before step 1, and that made step 1 work again.

However, after training, when I run the final step (model export), the following error shows up:
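A `ModuleNotFoundError` like this means the package is not installed in the current runtime, which is why adding `!pip install tensorboard` helped. Below is a small sketch that checks all required modules up front, so a missing dependency shows up as one clear message instead of surfacing midway through a step. The module list is a hypothetical example based on this thread (torch for training, tensorboard for step 1, tensorflow for the export script), not the notebook's real requirements.

```python
import importlib.util

# Hypothetical list for illustration; adjust to what each step imports.
REQUIRED = ["torch", "tensorboard", "tensorflow"]


def missing_modules(names):
    """Return the top-level modules in `names` that cannot be imported here.

    importlib.util.find_spec returns None for a top-level module that is
    not installed, without actually importing anything.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `print(missing_modules(REQUIRED))` at the top of step 1 would list what still needs a `!pip install`.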

> Generating model file: ReAmp 2 - British 45 w MTZ & HMZ_LSTM-16-0.aidax
> Traceback (most recent call last):
>   File "/content/Automated-GuitarAmpModelling/modelToKeras.py", line 4, in <module>
>     from tensorflow import keras
> ModuleNotFoundError: No module named 'tensorflow'
>
> FileNotFoundError                         Traceback (most recent call last)
> in <cell line: 0>()
>      16 get_ipython().system('python3 modelToKeras.py -lm "$model_path"')
>      17 
> ---> 18 shutil.copyfile(os.path.join(model_dir, 'model_keras.json'), os.path.join('/content', os.path.split(model_dir)[-1]+'.aidax'))
>      19 files.download(os.path.join('/content', model_filename))
>      20 
>
> /usr/lib/python3.11/shutil.py in copyfile(src, dst, follow_symlinks)
>     254         os.symlink(os.readlink(src), dst)
>     255     else:
> --> 256         with open(src, 'rb') as fsrc:
>     257             try:
>     258                 with open(dst, 'wb') as fdst:
>
> FileNotFoundError: [Errno 2] No such file or directory: '/content/Automated-GuitarAmpModelling/Results/ReAmp 2 - British 45 w MTZ & HMZ_LSTM-16-0/model_keras.json'

Seeing a ModuleNotFoundError saying tensorflow was missing, I added !pip install tensorflow, hoping it would fix things. It didn’t help; instead, the export error now looks like this:

> Generating model file: ReAmp 2 - British 45 w MTZ & HMZ_LSTM-16-0.aidax
> WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
> E0000 00:00:1737389440.134242   11555 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
> E0000 00:00:1737389440.140494   11555 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
> /usr/local/lib/python3.11/dist-packages/keras/src/layers/core/input_layer.py:26: UserWarning: Argument `input_shape` is deprecated. Use `shape` instead.
>   warnings.warn(
> I0000 00:00:1737389443.033363   11555 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13949 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5
> Traceback (most recent call last):
>   File "/content/Automated-GuitarAmpModelling/modelToKeras.py", line 85, in <module>
>     lstm_layer = keras.layers.LSTM(hidden_size, activation=None, weights=lstm_weights, return_sequences=True, recurrent_activation=None, use_bias=bias_fl, unit_forget_bias=False)
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
>     super().__init__(
>   File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
>     super().__init__(**kwargs)
>   File "/usr/local/lib/python3.11/dist-packages/keras/src/layers/layer.py", line 285, in __init__
>     raise ValueError(
> ValueError: Unrecognized keyword arguments passed to LSTM: {'weights': [array([[ 6.85593943e-13,  2.31633845e-09, -2.08443582e-10,
>          1.29936656e-04,  1.49625051e-03,  1.63036002e-05,
>         -2.15014438e-07,  8.57912710e-07,  5.75254336e-02,
>          2.60625097e-06,  2.29189071e-04,  1.35362102e-03,
>          4.98012360e-03,  2.10718341e-08,  1.38716960e-05,
>          4.30791579e-06,  7.33855956e-13,  2.72265099e-09,
>          1.60017344e-09,  1.53481684e-04,  1.72244280e-03,
>          1.82085496e-05, -8.07840053e-08,  1.14668319e-06,
>          6.26638383e-02,  3.11153008e-06,  2.73769954e-04,
>          1.61592348e-03,  5.36912354e-03,  3.12392352e-08,
>          1.14625336e-05,  6.58163071e-06,  2.33928006e-07,
>          1.55269936e-05,  5.40821093e-05,  1.75052928e-03,
>         -3.43618877e-02, -9.11797746e-04,  4.83917946e-04,
>         -2.18486399e-04, -2.23889872e-01,  4.71613312e-04,
>         -1.41166744e-03, -2.97270878e-03,  8.33799224e-03,
>         -1.13270660e-04,  2.80707551e-04, -1.41778565e-03,
>          7.67831599e-13,  2.39060260e-09,  1.09052345e-09,
>          1.40120741e-04,  1.63832388e-03,  1.69845480e-05,
>         -1.10181531e-07,  9.91846719e-07,  4.91075292e-02,
>          2.85956298e-06,  2.52237951e-04,  1.47175335e-03,
>          4.78533050e-03,  2.77963430e-08,  1.30952685e-05,
>          5.40440169e-06]]), array([[-2.10184836e-19,  5.05299538e-15, -1.59839132e-14, ...,
>         -9.04291900e-14, -4.71948175e-12,  4.93118428e-11],
>        [ 3.41069013e-16, -9.55320892e-14, -9.09858777e-13, ...,
>         -7.30266889e-12, -4.13714535e-10,  1.58821967e-09],
>        [-4.12765826e-16, -2.41629470e-13,  1.34307680e-12, ...,
>          6.40665689e-12, -6.11719009e-10, -2.98495118e-09],
>        ...,
>        [ 1.13641921e-15,  1.19262608e-12, -2.62885894e-12, ...,
>         -1.20345929e-11,  4.07612166e-10,  5.32258904e-09],
>        [-1.98583211e-15, -1.54913912e-11, -2.63567623e-11, ...,
>         -1.36660683e-10, -8.52114379e-09, -4.87000040e-09],
>        [ 4.72560765e-14, -1.20542430e-11, -7.76823814e-11, ...,
>         -3.67972347e-10, -2.12027835e-08,  1.39827581e-08]]), array([-4.52699033e-12, -1.76186798e-08, -6.43415561e-08, -9.97643569e-04,
>        -1.58371236e-02, -1.19860291e-04, -7.88440138e-06, -7.76120123e-06,
>        -8.32401067e-02, -1.89936636e-05, -1.70826912e-03, -1.01945354e-02,
>        -1.20812077e-02, -4.00856493e-07, -6.86088460e-05, -6.27224435e-05,
>        -4.02238616e-12, -1.85308124e-08, -5.93572445e-08, -9.83577571e-04,
>        -1.73679013e-02, -1.24635117e-04, -7.85384964e-06, -7.68170503e-06,
>        -1.75929725e-01, -1.71169613e-05, -1.75282266e-03, -1.01039466e-02,
>        -1.45140574e-02, -3.86889951e-07, -3.88310698e-04, -6.49140493e-05,
>        -1.03784259e-07,  7.41338590e-06, -6.12715894e-06,  1.14381441e-03,
>        -3.11631744e-03, -4.74439905e-04, -6.76144700e-05,  5.92988508e-05,
>         1.31635915e-01, -1.67774479e-04, -1.35820732e-03, -3.23351379e-03,
>        -2.25071632e-02, -5.09746314e-07,  8.76725157e-04,  1.04102190e-04,
>        -4.50363835e-12, -1.76016979e-08, -6.42376747e-08, -9.96393035e-04,
>        -1.58547871e-02, -1.19794153e-04, -7.88150737e-06, -7.74211549e-06,
>        -9.36228335e-02, -1.86432680e-05, -1.70382357e-03, -1.01700909e-02,
>        -1.22231566e-02, -4.00518132e-07, -7.55080400e-05, -6.24794193e-05])]}
>
> FileNotFoundError                         Traceback (most recent call last)
> in <cell line: 0>()
>      16 get_ipython().system('python3 modelToKeras.py -lm "$model_path"')
>      17 
> ---> 18 shutil.copyfile(os.path.join(model_dir, 'model_keras.json'), os.path.join('/content', os.path.split(model_dir)[-1]+'.aidax'))
>      19 files.download(os.path.join('/content', model_filename))
>      20 
>
> /usr/lib/python3.11/shutil.py in copyfile(src, dst, follow_symlinks)
>     254         os.symlink(os.readlink(src), dst)
>     255     else:
> --> 256         with open(src, 'rb') as fsrc:
>     257             try:
>     258                 with open(dst, 'wb') as fdst:
>
> FileNotFoundError: [Errno 2] No such file or directory: '/content/Automated-GuitarAmpModelling/Results/ReAmp 2 - British 45 w MTZ & HMZ_LSTM-16-0/model_keras.json'

I hope that we’ll get a fix for this issue again.
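Installing the newest `tensorflow` fixes the missing module but brings back the original `weights` error. TensorFlow 2.16 and later use Keras 3 by default, and the Keras 3 builds in these logs reject `weights=` in the `LSTM` constructor. Older TensorFlow releases such as 2.12 bundle Keras 2, whose layers still accept it. Below is a small pre-flight check along those lines. `keras_major_version`, `legacy_weights_kwarg_ok` and `installed_keras_version` are hypothetical helpers. The rule "major version below 3" is an assumption drawn from these logs, not from Keras documentation.

```python
from importlib import metadata


def keras_major_version(version):
    """Parse the major version from a string such as '2.12.0' or '3.8.0'."""
    return int(version.split(".")[0])


def legacy_weights_kwarg_ok(version):
    """True if this Keras version is expected to accept LSTM(..., weights=...).

    Assumption: Keras 2.x accepts the keyword and Keras 3.x rejects it,
    as in the tracebacks above.
    """
    return keras_major_version(version) < 3


def installed_keras_version():
    """Installed Keras version string, or None if Keras is not installed."""
    try:
        return metadata.version("keras")
    except metadata.PackageNotFoundError:
        return None
```

Running `legacy_weights_kwarg_ok(installed_keras_version())` before training (when Keras is installed) would flag the incompatibility before hours of training are spent.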

I did a test and successfully trained a model. In cell 0, change tensorflow and tensorboard from 2.11.0 to 2.12.0.
If you don’t want to edit the code every time you load the notebook, you can use my fork:
https://github.com/pilali/Automated-GuitarAmpModelling/blob/aidadsp_devel/AIDA_X_Model_Trainer.ipynb
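Since Colab can change its preinstalled packages at any time, it may help to have the notebook confirm that the pins actually took effect before training starts. Below is a minimal sketch, assuming the 2.12.0 pins from this post. `version_mismatches` is a hypothetical helper; its `lookup` parameter is injectable only so it can be tested without TensorFlow installed.

```python
from importlib import metadata

# The versions reported to work in this post.
PINS = {"tensorflow": "2.12.0", "tensorboard": "2.12.0"}


def version_mismatches(pins, lookup=metadata.version):
    """Map each package whose installed version differs from its pin to
    (expected, installed); installed is None if the package is missing."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems
```

For example, `print(version_mismatches(PINS) or "all pins OK")` could run right after the dependency cell. It reads installed package metadata, so it reflects what is on disk, not what a session that already imported TensorFlow has loaded.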


Thank you. It worked.

I hope this fix lasts until the next problem inevitably comes up. Many such cases in my time using the model trainer Colab page.

Nice. Unfortunately, at some point these little workarounds won’t be enough. But until then…