Error when exporting trained model

Lately, I’ve been getting an error that prevents me from exporting or downloading models after training finishes. I never had this problem with the model trainer until very recently.

Below is a copy of the output that shows up when I try to export a trained model.

> Generating model file: Orange OR120 Crunch Test model.json
> /usr/local/lib/python3.10/dist-packages/keras/src/layers/core/input_layer.py:26: UserWarning: Argument `input_shape` is deprecated. Use `shape` instead.
>   warnings.warn(
> WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
> I0000 00:00:1722906726.918138   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.964890   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.965190   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.965881   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.966147   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906726.966332   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.060654   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.060932   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> I0000 00:00:1722906727.061205   20783 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
> Traceback (most recent call last):
>   File "/content/Automated-GuitarAmpModelling/modelToKeras.py", line 85, in <module>
>     lstm_layer = keras.layers.LSTM(hidden_size, activation=None, weights=lstm_weights, return_sequences=True, recurrent_activation=None, use_bias=bias_fl, unit_forget_bias=False)
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/rnn/lstm.py", line 486, in __init__
>     super().__init__(
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/rnn/rnn.py", line 204, in __init__
>     super().__init__(**kwargs)
>   File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/layer.py", line 266, in __init__
>     raise ValueError(
> ValueError: Unrecognized keyword arguments passed to LSTM: {'weights': [array([[-4.64390498e-03,  8.43106359e-02, -8.31696242e-02,
>         -6.38481677e-02, -3.17510545e-01,  1.49999306e-01,
>         -2.86676466e-01,  2.15041280e-01, -3.72158647e-01,
>          1.41348734e-01,  1.88434303e-01,  8.33668560e-02,
>          2.53838062e-01, -1.26453742e-01,  6.82209013e-03,
>         -3.05161811e-02,  4.76790555e-02,  9.35292244e-02,
>         -8.73495489e-02,  3.67253348e-02, -1.75170109e-01,
>         -4.32963446e-02, -8.66313800e-02,  1.16616800e-01,
>         -5.26590347e-01,  7.48868659e-02, -1.07993223e-01,
>         -8.40747431e-02,  7.63554946e-02, -1.71959952e-01,
>         -1.91733479e-01,  1.20556697e-01,  1.92059621e-01,
>         -2.83274114e-01,  2.73865253e-01,  1.21221140e-01,
>         -6.12211525e-01,  1.23889208e+00, -1.26676500e+00,
>         -6.77534789e-02, -1.23839431e-01,  1.26742080e-01,
>          4.66871142e-01, -4.09075618e-01, -3.19231719e-01,
>          3.89545448e-02, -7.01428890e-01, -5.43421891e-04,
>          1.68996230e-02, -1.66286156e-01, -2.34960631e-01,
>          3.34443823e-02,  1.28992677e-01, -3.36274296e-01,
>         -5.37445173e-02, -4.53085452e-02,  6.39919341e-02,
>         -1.97191145e-02, -3.80552933e-03, -4.51324806e-02,
>         -1.55252978e-01,  1.15365756e+00,  2.24652156e-01,
>         -9.44509953e-02]]), array([[-0.17964645,  0.25568137,  0.21861663, ..., -0.10716321,
>         -0.00238903,  0.38228613],
>        [-0.37276298,  0.01081463,  0.1564104 , ...,  0.06062626,
>         -0.08346827,  0.23109984],
>        [ 0.06336492,  0.11828368, -0.03863488, ...,  0.03618264,
>         -0.48156601, -0.19605348],
>        ...,
>        [ 0.24165745,  0.20604421,  0.58149827, ...,  0.45216867,
>          0.43325123,  0.24461728],
>        [-0.42249793, -0.27418131, -0.15306103, ...,  1.09719026,
>          0.08983386, -0.14855994],
>        [ 0.31684455,  0.17004417, -0.14520794, ..., -0.59670144,
>          0.15913162,  0.03115464]]), array([ 0.03045244,  0.61540875,  0.97076464, -2.04676569, -1.26952249,
>         0.32732394,  0.89703417, -0.36222738, -1.80146372,  0.16542026,
>         0.79674131,  0.50879617,  0.05789363,  0.26923135,  0.02137228,
>         1.48363173,  0.67931215,  0.50625056,  0.18523889,  2.07478809,
>         1.35269082, -0.1282013 ,  0.40214343,  0.45933437,  1.66571057,
>         0.85095823, -0.24459287,  0.46056713,  0.54284786,  1.09635508,
>        -0.16104783,  0.08547631,  0.1655647 , -0.41905095, -0.11920086,
>         0.32213762,  0.13933085, -0.13247236,  0.27529777, -0.41632511,
>        -0.05018181,  0.15617287, -0.20218337, -0.15579043,  0.28067771,
>         0.30057193, -0.03643238,  0.39063578,  0.18349517,  0.28971492,
>         0.07503192,  0.02594861, -0.18928754,  0.59984547,  1.21236187,
>         0.49216588,  0.15279882,  0.27200954,  1.59668618,  1.12000352,
>         0.33180845,  0.26186374,  0.11560587,  1.45965922])]}
> ---------------------------------------------------------------------------
> FileNotFoundError                         Traceback (most recent call last)
> <ipython-input-12-f9ac5a56b131> in <cell line: 60>()
>      58 
>      59 
> ---> 60 shutil.copyfile(os.path.join(model_dir, 'model_keras.json'), os.path.join('/content', model_filename))
>      61 files.download(os.path.join('/content', model_filename))
>      62 
> 
> /usr/lib/python3.10/shutil.py in copyfile(src, dst, follow_symlinks)
>     252         os.symlink(os.readlink(src), dst)
>     253     else:
> --> 254         with open(src, 'rb') as fsrc:
>     255             try:
>     256                 with open(dst, 'wb') as fdst:
> 
> FileNotFoundError: [Errno 2] No such file or directory: '/content/Automated-GuitarAmpModelling/Results/Orange OR120 Crunch_LSTM-16-0/model_keras.json'

I’m obviously no expert when it comes to Google Colab, so if anyone can help me identify the problem here, I would be very thankful.
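
Reading the output above, the second error looks like a consequence of the first: `modelToKeras.py` stops at the `ValueError`, so `model_keras.json` is most likely never written, and the `shutil.copyfile` call in the notebook cell then fails. Below is a minimal sketch of a guard that would make this explicit. `copy_export` is a hypothetical helper for illustration, not part of the notebook.

```python
import os
import shutil


def copy_export(src_path, dst_path):
    """Copy the exported model file, with a clearer error if it is missing.

    Hypothetical helper for illustration; not part of the AIDA-X notebook.
    """
    if not os.path.isfile(src_path):
        # The export step did not produce the file, so point at that step
        # instead of failing somewhere inside shutil.
        raise FileNotFoundError(
            f"{src_path} was not created. Check the output of the export "
            "step above for an earlier error."
        )
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

With a guard like this the cell would still fail, but the message would point at the export step rather than at the copy.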

Anyone???

Because of this error, I cannot export or download any models that I train. This has only started happening recently. Why is this community so quiet?

It’s likely that something changed in the Keras Python library. This has to be addressed in the GitHub repo the Colab script uses.
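
The traceback in the first post hints at what changed: the `keras/src/layers/layer.py` path and the `input_shape` deprecation warning suggest Keras 3, where the `LSTM` constructor rejects the `weights=` keyword that `modelToKeras.py` passes. One possible workaround, sketched below under that assumption, is to create the layer without `weights`, build it, and then call `set_weights`. `expected_lstm_weight_shapes` and `build_lstm_with_weights` are hypothetical names, and this is not necessarily the fix the repo uses.

```python
def expected_lstm_weight_shapes(input_size, hidden_size, use_bias=True):
    """Shapes Keras uses for LSTM weights: kernel, recurrent kernel, bias."""
    shapes = [(input_size, 4 * hidden_size), (hidden_size, 4 * hidden_size)]
    if use_bias:
        shapes.append((4 * hidden_size,))
    return shapes


def build_lstm_with_weights(input_size, hidden_size, lstm_weights, use_bias=True):
    """Create the LSTM layer first, then load the weights into it.

    Assumes Keras 3 is installed and that lstm_weights is a list of NumPy
    arrays in the order [kernel, recurrent_kernel, bias].
    """
    import keras  # imported here so the shape helper works without Keras

    actual = [tuple(w.shape) for w in lstm_weights]
    expected = expected_lstm_weight_shapes(input_size, hidden_size, use_bias)
    if actual != expected:
        raise ValueError(f"weight shapes {actual} do not match {expected}")

    # Same arguments as the failing call, minus the rejected weights= keyword.
    layer = keras.layers.LSTM(
        hidden_size,
        activation=None,
        recurrent_activation=None,
        return_sequences=True,
        use_bias=use_bias,
        unit_forget_bias=False,
    )
    layer.build((None, None, input_size))  # (batch, time, features)
    layer.set_weights(lstm_weights)
    return layer
```

The weights printed in the traceback are a 1×64 kernel, a recurrent kernel that is presumably 16×64 (judging by `LSTM-16` in the folder name), and a 64-element bias, which is what `expected_lstm_weight_shapes(1, 16)` describes.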

Forgive my lack of knowledge, but I’m wondering whose job it is to do that and whether there is any way to let them know about the issue.

I reported this problem as an issue on the AIDA-X GitHub repo, and I hope it’s solved as quickly as possible. This should be a top priority for AIDA and for the Dwarf as a multi-FX device.

Firstly, it looks like a quick fix for someone who understands the Python libraries that the training uses, as the problem seems to be caused by deprecated arguments and name changes inside the Keras library, as @spunktsch pointed out.

Other than that, an amp modeling software/platform needs to have its way of modeling new amps working, and it seems like it’s been 2 weeks since it broke with no fix and no word about it.

I understand how long it can take to fix something like that and that it may not be as simple a problem as it seems, and I don’t feel that I’m owed that fix in any way. What bothers me is the complete lack of communication when a feature that big and important is broken. What is happening behind the scenes about it?

MOD is famous for its lack of communication… we regularly wonder if it’s still alive. An announcement about a substantial investment that could change MOD Audio’s life was made to those who had invested in MOD, but with a confidentiality clause: it was on January 31.

@GrimJim you are right.

In the end, it’s our responsibility that stuff keeps working. I’m also not very happy with a few things, and the fact that we are still just 2 people at AIDA DSP slows things down a lot.

It’s not that we don’t have ideas to make the captures a more straightforward process. But to actually do that, we need more time.

MOD doesn’t seem to exist anymore, at least judging by the recent lack of communication, and not only in the forum.
So there will be no updates to all the links and files that got pushed through MOD.

But rather than just making excuses, I’ll talk to @madmaxwell about what we can do to the script to make it work again on Colab.

This is extraordinarily alarming coming from you, given how closely you worked with them on AIDA integration.

> 2 weeks since it broke

If it’s about training, please open issues here: Issues · AidaDSP/Automated-GuitarAmpModelling · GitHub

AIDA-X is the plugin source code repo and the inference engine. Let’s reference the issue in the training repo and discuss it there!

Regarding the Keras deps: when you work with Colab, the env is updated roughly every month, and stuff breaks. There should be a commented section at the beginning which installs the expected versions. Let’s verify this in the next few days. There is also a Docker Compose app / Docker container, which is how @spunktsch and I perform training! The Docker image will work for sure, since the env is controlled by us.
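
As a rough sketch of what checking for the expected versions could look like in a Colab cell, the helper below compares installed package versions against a pin list using the standard library’s `importlib.metadata`. `installed_version` and `find_mismatches` are hypothetical names, and the pin list is supplied by the caller; nothing here reflects the versions the notebook actually installs.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that is off-pin.

    expected maps package names to the exact version strings you want.
    lookup can be swapped out, which makes the function easy to test.
    """
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Run at the top of a notebook with the versions you expect, `find_mismatches` returns an empty dict when the environment matches and otherwise shows which packages Colab has moved.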

Sorry for your experience. Let’s handle it on GitHub, and thanks for the feedback.

I have always used the following link for model training: Google Colab. That always worked until recently. Now, when I run the cell to export a trained model, the error (as shown in the OP) shows up.
What is the Docker thing you mentioned? Is it part of Colab? Could you explain what it is and where I can find it?

Since I’m experiencing the issue as well, I’m looking for alternatives.

This tech (and the process to create it) is way too sexy to peter out like this.

I read about the “docker” method.
Is there an alternative that runs on a Win10 machine (my other PC)?
I vaguely remember looking into that, but it seemed to contain steps that were beyond me, despite my career in ICT in the ’00s :stuck_out_tongue:

In addition to offering the Docker method, I’ve fixed the current AIDA_X_Model_Trainer.ipynb to install known working dependencies instead of relying on the ones that Colab decides to install. So the issues around training should be solved.

For documentation, we tracked the issue here: KeyError: 'test_lossESR_final' running the trainer in google colab · Issue #8 · AidaDSP/Automated-GuitarAmpModelling · GitHub.

There are a few points of discussion:

I would need help adjusting AIDA_X_Model_Trainer.ipynb to fit the changes in the next branch, where I have landed some new models (and more will come). Those are slightly modified RNN networks that need to be investigated. I really wish I had more time to do that, but at the moment I’m fully booked.

This activity is tracked here: Adapt AIDA_X_Model_Trainer.ipynb to changes in next branch · Issue #9 · AidaDSP/Automated-GuitarAmpModelling · GitHub. I am already using this branch (next) for local training, so it’s kind of tested.

Regarding Google Colab, some of you already know that it is not the ideal solution, even in the Pro version. It would be much better to just have a cloud machine with GPU access and the capability to run our Docker image. I have also created a docker-compose.yml in the next branch. The benefit is that there is no more hassle with dependencies and, more importantly, we are all training with the same torch / cuDNN versions. The solution already works locally (this is how I perform training on a dedicated server).