Training a million models at the same time

For my research, I have written my own code for Monte-Carlo simulations with SGD and a one-parameter model. Specifically, I draw 10^6 data sets (x, y) from a known distribution, with x and y 1-dimensional, and train 10^6 instances of the model on the corresponding data sets by doing SGD directly on a 10^6-element vector of parameters.
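For concreteness, here is a minimal sketch of this kind of vectorized SGD in plain Julia (the linear model y ≈ θ·x, the squared loss, and the Gaussian data-generating process are just illustrative stand-ins):

```julia
n = 10^6
θ = zeros(n)          # one parameter per model
η = 0.01              # learning rate

for step in 1:1_000
    # one fresh 1-dimensional observation per model (hypothetical distribution)
    x = randn(n)
    y = 2 .* x .+ 0.1 .* randn(n)
    # d/dθ (θ*x - y)^2 = 2*(θ*x - y)*x, applied elementwise across all models
    θ .-= η .* 2 .* (θ .* x .- y) .* x
end
```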

Can I do this in Flux somehow too?

What I DON’T want is training each model separately. That would take me approximately 86 hours for a process that right now takes 10-30 seconds.

I also wanted to try training one model with a million parameters by using a custom loss function and treating the 10^6 as a dimension in the data. This doesn’t work either: it runs out of memory as soon as I try to create the model with Dense(10^6 => 10^6, bias = false). The same happens for 10^5, though it takes maybe a minute before it tells me that. Maybe you could use chunks of 10^4, but that would still be much less efficient than my own code, which has no memory issues like this.

Any ideas whether there is a clever way of doing this already or do I need to still use my own code in the end?

A dense layer connects each of the 10^6 inputs to each of the 10^6 outputs, which requires 10^12 weights (about 4 TB in Float32), so it’s no surprise that you run out of memory. Presumably your own code is doing something different from a dense layer.


Ah! Yes, of course. Sorry for the confusion. I need to force the weight matrix to be diagonal somehow - or I suppose I can probably just not use a Dense layer and do it directly.

I’m not really sure I understand the training objective (sounds like a good candidate for using libraries from one of Julia’s PPL ecosystems), but if what you need is literally Dense with a diagonal weight matrix, we have Flux.Scale.
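For reference, a small sketch of Flux.Scale, which stores one weight per element and applies it elementwise, so it behaves like a Dense layer with a diagonal weight matrix (the sizes here are just for demonstration):

```julia
using Flux

m = Flux.Scale(4; bias = false)   # elementwise weights, initialized to ones
m([1f0, 2f0, 3f0, 4f0])           # computes m.scale .* input, here [1, 2, 3, 4]
```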


Perfect! That’s exactly what I need (because you can consider them as 10^6 independent models).
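For anyone finding this later, a minimal end-to-end sketch of training all the models at once with Flux.Scale (the linear model, squared loss, learning rate, and step count are again illustrative stand-ins, matching the plain-Julia sketch above):

```julia
using Flux

n = 10^6
model = Flux.Scale(n; bias = false)            # one weight per independent model
opt_state = Flux.setup(Descent(0.01), model)

for step in 1:1_000
    # one fresh observation per model (hypothetical data-generating process)
    x = randn(Float32, n)
    y = 2f0 .* x .+ 0.1f0 .* randn(Float32, n)
    # agg = sum keeps the per-model losses independent; the default agg = mean
    # would scale every gradient by 1/n
    grads = Flux.gradient(m -> Flux.mse(m(x), y; agg = sum), model)
    Flux.update!(opt_state, model, grads[1])
end
```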