I have written my own code for doing Monte-Carlo simulations with SGD and a one-parameter model, for research. Specifically, I draw 10^6 data sets (x, y) from a known distribution, with x and y one-dimensional, and train 10^6 instances of this model on the corresponding data sets by doing SGD directly on a 10^6-element vector of parameters.
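For concreteness, my hand-rolled approach looks roughly like this (a minimal sketch, not my actual code: the linear model y = θ·x, the squared-error loss, and the specific distribution are placeholder assumptions):

```julia
# Vectorized SGD over N independent one-parameter models at once.
N = 10^6                 # number of simulated data sets / model instances
T = 1000                 # number of SGD steps
η = 0.01                 # learning rate
θ = zeros(N)             # one parameter per data set

for t in 1:T
    x = randn(N)                     # one fresh x per data set
    y = 2 .* x .+ 0.1 .* randn(N)    # placeholder for the known distribution
    # gradient of 0.5 * (θ*x - y)^2 w.r.t. θ, for all N models at once
    g = (θ .* x .- y) .* x
    θ .-= η .* g                     # one SGD step for all N models in parallel
end
```

Since everything is a broadcast over plain vectors, all 10^6 models take one SGD step per iteration with no per-model loop.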
Can I do this in Flux somehow too?
What I DON’T want is training each model separately: that would take approximately 86 hours for a process that currently takes 10–30 seconds.
I also wanted to try training one model with a million parameters, using a custom loss function and treating the 10^6 as a dimension in the data. This doesn’t work either: just creating the model with Dense(10^6 => 10^6, bias = false) throws an out-of-memory error, presumably because Dense allocates the full 10^6 × 10^6 weight matrix (10^12 Float32s, about 4 TB). The same happens for 10^5, except it takes maybe a minute before failing. Maybe one could work in 10^4-sized chunks, but that would still be much less efficient than my own code, which has no memory issues like this.
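For reference, the attempt that runs out of memory is just this (shown only to make the failure mode concrete):

```julia
using Flux

# Constructing the layer alone already fails: Dense materializes the
# full 10^6 × 10^6 weight matrix, i.e. 10^12 Float32s ≈ 4 TB.
m = Dense(10^6 => 10^6, bias = false)   # throws OutOfMemoryError
```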
Any ideas whether there is already a clever way of doing this, or do I need to stick with my own code in the end?