CUDA: MVectors always allocate memory and cause "Out of Memory Error"

Try this simple code:

using CUDAnative
using CUDAdrv
using CuArrays
using StaticArrays   # MVector is defined here

function kernel(n, arr)
	for i in 1:n
		a = MVector{3,Float32}(undef)   # allocated on every iteration
		arr[i] = a[1] + 1.0f0
	end
	return nothing   # GPU kernels must not return a value
end
N1 = 1
N2 = 10000
arr1 = CuArray(Array{Float32,1}(undef, N2))
arr2 = CuArray(Array{Float32,1}(undef, N2))
@cuda threads = 10 blocks = 10 kernel(N1, arr1) # works on my machine
@cuda threads = 10 blocks = 10 kernel(N2, arr2) # fails on my machine with an out-of-memory error

It seems that every call to the MVector constructor inside the loop allocates a new array, which quickly exhausts GPU memory. This is inconsistent with the behavior on the CPU, where the same kernel does not allocate at all:

N2 = 10000
arr = Array{Float32,1}(undef, N2)
@allocated kernel(N2, arr)
# 0

Some version info:

CuArrays v1.0.2
CUDAdrv v3.0.0
CUDAnative v2.1.0

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Currently I have to allocate the MVector before entering the loop and set each of its elements to zero at the beginning of every iteration to reinitialize it manually.
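
For reference, a minimal sketch of that workaround (the kernel name and launch below are illustrative, not the original code; it assumes the same setup as above):

function kernel_workaround(n, arr)
	a = MVector{3,Float32}(undef)   # single allocation, hoisted out of the loop
	for i in 1:n
		for j in 1:3
			a[j] = 0.0f0            # manually reinitialize every element
		end
		arr[i] = a[1] + 1.0f0
	end
	return nothing
end

@cuda threads = 10 blocks = 10 kernel_workaround(N2, arr2)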

See https://github.com/JuliaGPU/CUDAnative.jl/issues/340; could you try with Julia 1.2?

Yes, this code works fine in Julia 1.2.0-rc1.0. I guess I should move to this new version of Julia!