CuArray allocation issue: How often is it a problem?

haberdashPI · August 13, 2019, 7:08pm

Hi there, I’m thinking about making use of Flux and CuArrays for a new project. However, I was a little troubled to learn about this issue. It’s hard to get a sense from that thread of how often this comes up as a problem for people. How many other folks are pushing CuArrays to a similar limit? How many have done so and run into this problem? Has anyone run into anything similar with KnetArrays using Knet? (On a related note, how realistic is it to use KnetArrays as part of a Flux model? Is that possible?) Just looking to get a sense of what I am walking into.

denizyuret · August 27, 2019, 6:27pm

I was forced to optimize the KnetArray memory allocator in order to implement large(ish) models, e.g. https://github.com/knetml/Mac-Network, which replicates “Compositional Attention Networks for Machine Reasoning” (arXiv:1803.03067). More recently I started playing with large neural machine translation models based on RNNs and Transformers, which are a very good test of robust memory management, and made additional improvements that will be in the next release (e.g. https://github.com/denizyuret/Knet.jl/commit/a2f91b2294329add8b3f0509c2a100bb63663211). If you try your model and find any problems, let me know.

(To answer the original question in the title: it depends on the problem. The more irregular the array sizes are (think different sentence lengths in NMT) and the larger the minibatch sizes are, the more pressure there is on the memory manager and the greater the risk that things go wacky before the GC has a chance to clear things up.)
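To illustrate why irregular sizes hurt, here is a minimal toy sketch in Python. It is not Knet's or CuArrays' actual allocator: `SizeKeyedPool` and `run` are hypothetical names, and the model simply assumes a caching pool that can only reuse a freed buffer for a request of exactly the same size.

```python
import random
from collections import defaultdict


class SizeKeyedPool:
    """Toy caching allocator (hypothetical): freed buffers are kept in
    per-size free lists and reused only for requests of the same size."""

    def __init__(self):
        self.free = defaultdict(int)  # size in bytes -> cached free buffers
        self.hits = 0                 # requests served from the cache
        self.misses = 0               # requests that needed fresh memory
        self.reserved = 0             # total bytes obtained from the "device"

    def alloc(self, size):
        if self.free[size] > 0:
            self.free[size] -= 1
            self.hits += 1
        else:
            self.misses += 1
            self.reserved += size
        return size

    def release(self, size):
        self.free[size] += 1


def run(sizes):
    """Allocate and immediately release one buffer per requested size."""
    pool = SizeKeyedPool()
    for size in sizes:
        pool.release(pool.alloc(size))
    return pool


rng = random.Random(0)
# Fixed-shape workload: every step asks for the same buffer size.
regular = [64 * 512] * 1000
# Irregular workload: the size varies per step, like sentence lengths in NMT.
irregular = [64 * rng.randint(5, 500) for _ in range(1000)]

regular_pool = run(regular)
irregular_pool = run(irregular)
print("regular:  ", regular_pool.misses, "misses,", regular_pool.reserved, "bytes reserved")
print("irregular:", irregular_pool.misses, "misses,", irregular_pool.reserved, "bytes reserved")
```

In this model the fixed-shape workload needs fresh memory only once, while the irregular one keeps missing the cache and reserving more memory that sits idle in free lists. A real allocator has more tricks (size rounding, splitting, handing memory back under pressure), so treat this only as intuition for where the pressure comes from.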

haberdashPI · August 27, 2019, 6:55pm

Thanks @denizyuret, that’s helpful!

Marc.Cox · September 28, 2019, 1:14am

Hi Deniz,

I appreciate your work on this – I’d imagine it isn’t easy.

So this seems like a trend to note and keep in mind for future Julia ML feature development and roadmap planning: a search for “ERROR: Out of gpu memory” on the JuliaLang forum returns 38 results.

Also relevant are @denizyuret’s work on large neural machine translation models based on RNNs and Transformers, described earlier in this thread (CuArray allocation issue: How often is it a problem?), and memory allocation errors such as `ERROR: Out of gpu memory` for `function vgg_m(x0,weights)`, described in the topic “Gpu out of memory”.

It is also worth noting that one human genome comes to approximately 100GB of raw data and 250GB of analyzed files. I think we are starting to see useful large machine learning data sets that are either completely intractable or, at best, cost prohibitive on anything other than high-performance CPUs paired with relatively inexpensive NVMe SSDs and filesystem compression.

For large-memory models I think it is very helpful that Knet has its own mechanism for file IO, so please continue to develop the KnetArray memory allocator so that we can implement machine learning models on CPUs with NVMe SSDs and filesystem compression for data sets larger than the 11GB GPU memory limit.

maleadt · September 28, 2019, 6:46pm

Read the last comment in that thread: the allocator is under active development. Recent testing has shown that the performance of Knet+CuArrays is very good, and many of the slowdowns in that thread are caused by excessive allocation patterns that put unusual stress on the GC.

haberdashPI · September 30, 2019, 2:40pm

Thanks @maleadt, for all of your hard work. I appreciate hearing about the efforts to fix the allocation issues.
