NAM LSTM experimentations

modep · January 12, 2024, 9:21pm

Comparing my LSTM custom settings to NAM Nano models.
http://coginthemachine.ddns.net/mnt/namhtml/

Zavorra · January 13, 2024, 11:13am

Can you elaborate some more on that?

modep · January 13, 2024, 7:53pm

I now managed to get a quite acceptable quality down to only ~1/10 of the CPU that a WaveNet Standard uses, on my computer. The NAM file is only 7kB and it has been added to the web page.

With this low CPU it is probably possible to use one or two NAM instances on a really cheap SoC/SBC or an ancient computer.

modep · January 14, 2024, 6:55pm

First off the capture files used need to have more things to use when training. Having more dynamic information and different polarities makes the models a lot better in my tests. Try these files and read the containing .txt file: https://fastupload.io/A4tYVcrQlPHbtNd/file

madmaxwell · January 15, 2024, 8:28am

Interesting. Worth a branch here? GitHub - AidaDSP/Automated-GuitarAmpModelling at next we can setup a few tests with @itskais too and compare with our metrics!

modep · January 15, 2024, 10:06am

If it is for AIDA-X then I have to look through the training script and see what parameters being used and maybe not used. I have found out while tweaking parameters that the LSTM parameters are very sensitive and can make a hugh difference.

madmaxwell · January 15, 2024, 11:13am

Okay maybe just point me / invite me to your NAM fork, so I can have a look at the differencies and I can take care of the tests on our training script.

github.com/AidaDSP/AIDA-X

Possible to use more num_layers?

opened 07:35PM - 12 Aug 23 UTC

38github

enhancement

I trained a model using 3 num_layers which improved a fuzz sound from the defaul…t. Using default I got ESR 0.18 but by using 3 num_layers I got 0.10 and after one more round of training it went down to 0.08. The problem is that the plugin can't use it correctly. It does load the model but the sound is very quiet and does not have much distortion. Is it possible to enable more num_layers kind of like how you enabled larger hidden_sizes? 3 num_layers file: [_LSTM-40-0_ESR_0p0886.zip](https://github.com/AidaDSP/AIDA-X/files/12328141/_LSTM-40-0_ESR_0p0886.zip)

is that you correct? I have implemented the necessary modifications on the training script, I just need to finalize the runtime mods to support a couple (for now) of multi-layer rnn. I still need to check CPU consumption

Also listened to your Dataset, honestly I had experienced the best results with our training script by using more musical / guitar oriented DI tracks instead of test signals. Would be worth discussing also this topic with you.

modep · January 15, 2024, 11:27am

Some of the test signals are there to help with compression but I haven’t done much testing with that yet.

That is me, yes. The trainings that I have posted about in this thread is done with NAM though and doesn’t seem to ever need 3 num_layers. AIDA-X needs more tweakable parameters to be able to train like I am doing with NAM, e.g. drop_last and pin_memory which should be set to false. With WaveNet, my experience is if set to false that they reduce high frequencies BUT with LSTM they actually make the higher frequencies “available”. The problem I had with LSTM with default settings in NAM was that it sounded like it had a very audible low-pass filter. The strength it had though was no aliasing sounding artifacts AND the low CPU.

The parameters “train_burn_in” and “train_truncate” should also be able to be changed. Tweaking these and finding the optimal values lowers the ESR/MSE drastically.

The x_test and y_test validation files is taken from a part of the training file that has the loudest amplitude and in my tests has shown to produce most distortion and sag/compression characteristics. When used the part that is commonly used with NAM, which is a lot quieter, the trained model did not model the compression too well.

Adding many loudness and opposite polarity layers seem to have made the models act and feel more like the devices. Feels less forgiving than many models. This is highly subjective though and needs an ABX test.

One interesting thing with ESR is that a WaveNet can have an ESR of 0.0005 and sound much worse to me than an LSTM with ESR 0.05. I think WaveNet has issues with aliasing and ringing artifacts and it can be mitigated with “pre_emph_mrstft_weight” and “pre_emph_mrstft_coef”. Overall, I believe WaveNet models eq curves on full rigs and extreme eq settings better than LSTM. LSTM can in those difficult cases get there 95% with a lot less CPU though.

modep · January 16, 2024, 11:18pm

http://coginthemachine.ddns.net/mnt/namhtml/namconfig.html

Here is all you need. Download NAM Colab file and use the settings and wav files described on that page. Don’t forget to compile your AIDA-X or NAM plugin with CLANG for lower CPU usage. I compiled them with -O3 -lto. On Intel and AMD add -march="x86-64-v3" for a very big speed up if you have a compatible CPU.

I have compiles at:
http://coginthemachine.ddns.net/mnt/nam/software_src_misc/
http://coginthemachine.ddns.net/mnt/aida-x/software_src_misc/

modep · January 22, 2024, 3:12pm

It seems that no one is really interested in this or am I wrong?

spunktsch · January 22, 2024, 5:11pm

It is interesting and appreciated to post you findings.
This forum might not be the best place because there are 3 people max that know about those specifics and what they mean.

Rom · January 22, 2024, 5:12pm

I personally don’t understand what is in play here… A better way to train NAM ? Which will require less CPU but sound better ?
If you have some .json files to test it would be less confusing for me.

CharlyRebell · January 22, 2024, 5:49pm

As spunktsch said, your work is highly appreciated and the improvement in quality looks very promising, but you have to be quite tech-savvy to bring it to use and those who are struggle to find time for some experiments.
Is there a chance your findings could be integrated into the online aida-x training colab? Maybe in some kind of advanced mode that gives access to these parameters?

modep · January 22, 2024, 6:19pm

Yes, exactly. What is needed is NAM Colab file and pasting the config that I have posted.

modep · January 22, 2024, 6:21pm

If I remember correctly AIDA-X is limited to a selection of num_layers and hidden_sizes so things must at least be changed in the application code itself. I don’t know how AIDA-X loads NAM files or if it does.

Techno · January 22, 2024, 7:44pm

I have some experience with Collab and NN so I’ve been reading what you posted with interest. I learned about LSTM because of what you wrote since I was curious.

Unfortunately my knowledge is limited and I can’t contribute unless it’s about running tests on Collab and sending the results or something like that. I have never captured an amp or trained a model either.

As the others said I really appreciate you’re doing this and sharing your results. Thanks a lot!

Techno · January 22, 2024, 7:46pm

And I’m pretty sure it doesn’t?

madmaxwell · January 30, 2024, 3:24pm

Nope, for me it’s just a really busy period. We are all happy when we see involvement on this part, which is freaking difficult to tame as you may had figured out. We all made assumptions to let the ML part as generic and simple to use as possible, but the reality is that every amplifier (device) is a story, with some specific settings helping a lot with some devices and not with others.

The problem is not tweaking the parameters of training to obtain better result on a device, or at least this is the beginning of the story. The problem is understanding if this would work as a generic rule for every training. Does it makes sense in the first place? I’m thinking about injecting device type at the beginning and select a pool of configs based on that…

Can we move the optimization of the runtime to another thread maybe in github and continue here? On small devices we use aidadsp-lv2 runtime not AIDA-X. On the devices we believe we’re building very optimized binaries with all the flags and lto and stuff. See official buildroot recipe or community Yocto recipe.

I believe WaveNet models eq curves on full rigs and extreme eq settings better than LSTM

thanks for saying that. Why are we asking the neural network to implement the eq with no help? We have sweeps and white noise at multiple levels. We could just use them to guess the overall EQ setting and implement it using fir or iir, eventually recycling layers from the engine. Do you have time to look into this with me? I can provide instructions and technical support.

has the loudest amplitude and in my tests has shown to produce most distortion and sag/compression characteristics

just for this amp or you tested among a pool of devices? I think also you are doing your tests in snapshot mode? How about testing the same with conditioned models? Have you checked the guitar volume pot sensibility? Is the model correctly responding to guitar volume pot changes? On this I need more time to perform the tests.

Finally I’m repeating what I was asking in another thread: we need the COCO Dataset equivalent for Model Sims. A pool for carefully recorded devices with no Cabs, offering a wide range of devices: from od/dist pedals to clean amp to cowboys from shell. And we discuss overall score vs single device improvements. Otherwise we become mad you don’t think?

The first thing we should also fix: how NAM produced this Dataset? Was a script? Was a late night intuition? Was from a paper? Because I keep saying that with this Dataset and A-Weighting filter pre-emph, which is Aida default, we get those bassy sounding amps. If we use another Dataset, that we used for our premium models and that is IP (I’m currently busy registering a brand new one with a guitarist in the studio), the models sounds just right to my ears from EQ point of view.

madmaxwell · October 27, 2024, 4:20pm

Hi, regarding this issue, today had time to check. So basically what NAM calls train_burn_in and train_truncate are implemented with different names (because there are no standard names) here

github.com

AidaDSP/Automated-GuitarAmpModelling/blob/6dcf59084012a97064117d92168abab9c85457ec/dist_model.py#L255


      
          network.save_state = True
          patience_counter = 0
          validation_patience_limit_epoch = 0
          
          
# This is where training happens
          # the network records the last epoch number, so if training is restarted it will start at the correct epoch number
          for epoch in tqdm(range(train_track['current_epoch'] + 1, args.epochs + 1)):
              ep_st_time = time.time()
          
          
    # Run 1 epoch of training,
              epoch_loss = network.train_epoch(dataset.subsets['train'].data['input'][0],
                                               dataset.subsets['train'].data['target'][0],
                                               loss_functions, optimiser, args.batch_size, args.init_len, args.up_fr)
          
          
    writer.add_scalar('Time/EpochTrainingTime', time.time()-ep_st_time, epoch)
          
          
    # Run validation
              if epoch % args.validation_f == 0:
                  val_ep_st_time = time.time()
                  val_output, val_loss = network.process_data(dataset.subsets['val'].data['input'][0],
                                                   dataset.subsets['val'].data['target'][0], loss_functions, args.val_chunk)

it’s very easy to change them by simply passing correct args to this script, their default values are

init_len = 200
up_fr = 1000

while NAM here uses respectively 4096 and 512. The default values in Automated are the ones from the paper. Where NAM values come from? Would be good to experiment, if someone has time.

Regardng drop_last and pin_memory, a bit of context

n PyTorch, the drop_last and pin_memory parameters are used in the DataLoader class. Here’s what they do:

drop_last: When set to True, this parameter drops the last incomplete batch if the dataset size is not divisible by the batch size. If set to False, it will include the last incomplete batch.
pin_memory: When set to True, this parameter enables faster data transfer to CUDA-enabled GPUs by allocating the data in page-locked memory. If set to False, it will not use page-locked memory.

but if those variables are set to false, as @modep experiments are reporting, then it’s like not specifying them. NAM set both to true.

Of course I’ve inspected NAM LSTMCore and Nano models.