Hi there, I’m thinking about making use of Flux and CuArrays for a new project. However, I was a little troubled to learn about this issue. It’s hard to get a sense from that thread of how often this comes up as a problem for people. How many other folks are pushing CuArrays to a similar limit? How many have done so and run into this problem? Has anyone run into anything similar with KnetArrays using Knet? (On a related note, how realistic is it to use KnetArrays as part of a Flux model? Is that possible?) Just looking to get a sense of what I am walking into.
I have been forced to optimize the KnetArray memory allocator to be able to implement large(ish) models, e.g. https://github.com/knetml/Mac-Network replicating https://arxiv.org/abs/1803.03067. More recently I started playing with large neural machine translation models based on RNNs and Transformers, which provide a very good test for robust memory management, and made additional improvements that will be in the next release (e.g. https://github.com/denizyuret/Knet.jl/commit/a2f91b2294329add8b3f0509c2a100bb63663211). If you try your model and find any problems, let me know.
(To answer the original question in the title: it depends on the problem. The more irregular the array sizes are (think different sentence lengths in NMT) and the larger the minibatch sizes, the more pressure there is on the memory manager, and the greater the risk that things go wacky before gc has a chance to clean up.)
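To make the point about irregular array sizes concrete, here is a minimal sketch in plain Julia (no GPU packages). It assumes, purely for illustration, that each minibatch requests one buffer of `length * batchsize` elements and that a freed buffer can only be reused for a request of exactly the same size. This is not a description of the actual Knet or CuArrays allocators. The names `round_up` and `distinct_sizes`, the sentence lengths, and the bucket width are all made up. Rounding lengths up to a bucket width is one common way to cut the number of distinct sizes requested.

```julia
# Hypothetical helper: round a sequence length up to the next multiple of `bucket`.
round_up(len::Integer, bucket::Integer) = cld(len, bucket) * bucket

# Count how many distinct buffer sizes a stream of minibatches would request.
# Assumption: one buffer of `padded_length * batchsize` elements per minibatch.
function distinct_sizes(lengths, batchsize; bucket=1)
    sizes = Set{Int}()
    for len in lengths
        push!(sizes, round_up(len, bucket) * batchsize)
    end
    return length(sizes)
end

lengths = 5:60                                      # made-up sentence lengths
raw      = distinct_sizes(lengths, 64)              # every length is its own size
bucketed = distinct_sizes(lengths, 64; bucket=8)    # lengths padded to multiples of 8

println("distinct buffer sizes without bucketing: ", raw)
println("distinct buffer sizes with bucket=8:     ", bucketed)
```

Under these assumptions, fewer distinct sizes means a freed buffer is more likely to match a later request, at the cost of some padding.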
Thanks @denizyuret, that’s helpful!
I appreciate your work on this – I’d imagine it isn’t easy.
So this seems like a trend to note and keep in mind for future Julia ML feature development and roadmap planning.
There are 38 results for “ERROR: Out of gpu memory” here: https://discourse.julialang.org/search?q=ERROR%3A%20Out%20of%20gpu%20memory%20
Also relevant is @denizyuret’s work on large neural machine translation models based on RNNs and Transformers, described in “CuArray allocation issue: How often is it a problem?”, along with memory allocation errors like **ERROR: Out of gpu memory** for `function vgg_m(x0,weights)`, described in “Gpu out of memory”.
It is also worth noting that one human genome has approximately 100GB of raw data and 250GB of analyzed files. I think we are starting to see useful large machine learning data sets that are either completely intractable or at best cost prohibitive using anything other than high performance CPUs and relatively less expensive NVMe SSD technologies with fs compression.
For large-memory models I think it is very helpful that Knet has its own mechanism for file IO, so please continue to develop the KnetArray memory allocator so that we can implement machine learning models using CPUs with NVMe SSDs and fs compression for data sets larger than the 11GB GPU memory limit.
Read the last comment in that thread: the allocator is under active development. Recent testing has shown that the performance of Knet+CuArrays is very good, and many of the slowdowns in that thread are caused by excessive allocation patterns that put unusual stress on the GC.
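As a rough illustration of what an "excessive allocation pattern" can look like, the sketch below contrasts a loop that creates a fresh array on every iteration with one that reuses a single preallocated buffer through in-place broadcasting. It uses plain CPU arrays and only Base Julia. The function names and the toy computation are hypothetical, and whether the same rewrite helps for a particular GPU array type or model is an assumption to check by measurement.

```julia
# Allocating version: every iteration materializes a new result array,
# leaving the previous one for the garbage collector.
function step_allocating(x, w, n)
    y = similar(x)
    for _ in 1:n
        y = w .* x .+ 1      # allocates a new array each time
    end
    return y
end

# In-place version: one buffer is allocated by the caller and reused.
function step_inplace!(y, x, w, n)
    for _ in 1:n
        y .= w .* x .+ 1     # fused broadcast writes directly into y
    end
    return y
end

x = rand(Float32, 1024)
w = rand(Float32, 1024)
y = similar(x)               # preallocated once, outside the loop

a = step_allocating(x, w, 10)
b = step_inplace!(y, x, w, 10)
println("results agree: ", a ≈ b)
```

Both versions compute the same values. The difference is only in how many temporary arrays the garbage collector has to track and free.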
Thanks @maleadt, for all of your hard work. I appreciate hearing about the efforts to fix the allocation issues.