Strange behavior inside a for-loop with ArrayFire

Hello everyone, I have an issue and I don't even know its exact cause.

I am using ArrayFire to speed up some matrix multiplications. However, I cannot retrieve correct data unless I perform some otherwise useless operation on my data inside the loop. Here is a basic example of working code:

```julia
using ArrayFire

screen = rand(100, 100, 3)
r = rand(50, 3)
β = rand(50)
k₀ = 1

function getE_scat_GPU(screen, r, β, k₀)
    nAtoms = size(r, 1)
    nΘ = size(screen, 1)
    nΦ = size(screen, 2)

    E_scat_gpu = AFArray(zeros(Complex{Float32}, nΘ, nΦ))
    distance = similar(E_scat_gpu)
    xₛ = AFArray(zeros(Float32, nΘ, nΦ))
    yₛ = similar(xₛ)
    zₛ = similar(xₛ)
    Rₛ = similar(xₛ)
    temp_exp = similar(E_scat_gpu)
    all_xₛ = AFArray(screen[:, :, 1])
    all_yₛ = AFArray(screen[:, :, 2])
    all_zₛ = AFArray(screen[:, :, 3])
    for j = 1:nAtoms
        xₛ = all_xₛ - Float32(r[j, 1])
        yₛ = all_yₛ - Float32(r[j, 2])
        zₛ = all_zₛ - Float32(r[j, 3])

        Rₛ = sqrt(xₛ^2 + yₛ^2 + zₛ^2)
        distance = im * Float32(k₀) * Rₛ
        temp_exp = exp(distance)
        E_scat_gpu += β[j] * temp_exp / distance
        dummy = sum(E_scat_gpu) # comment this line out and the result is all NaN
    end
    E_scat = Array(E_scat_gpu)
    return E_scat
end

result_GPU = getE_scat_GPU(screen, r, β, k₀)
```

Don't worry about the rest of the code; just focus on the line with the variable `dummy` at the end of the loop. That operation is unnecessary for the logic of my program.

However, I need this line to get correct results out of the function. It doesn't have to be this particular operation: `println(sum(E_scat_gpu))`, for example, works as well.

If I remove this line, I get a matrix full of NaN.
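One way to confirm that the NaNs come from the GPU path and not from the formula itself is to compute the same sum on the CPU with plain Julia arrays and compare. The function below (`getE_scat_CPU` is a hypothetical name) is a minimal sketch of that reference. It assumes the `xₛ^2` in the GPU code is meant as an elementwise square; with plain Julia matrices that has to be written `.^`, because `^` on a square matrix is a matrix power.

```julia
# Hypothetical CPU reference for the GPU loop, used only to check its output.
# For every screen point it sums β[j] * exp(i*k₀*R) / (i*k₀*R), where R is the
# distance from atom j to that point. Assumes elementwise squaring was intended.
function getE_scat_CPU(screen, r, β, k₀)
    nAtoms = size(r, 1)
    nΘ, nΦ = size(screen, 1), size(screen, 2)
    E_scat = zeros(ComplexF64, nΘ, nΦ)
    for j in 1:nAtoms
        # Coordinates of every screen point relative to atom j
        xₛ = screen[:, :, 1] .- r[j, 1]
        yₛ = screen[:, :, 2] .- r[j, 2]
        zₛ = screen[:, :, 3] .- r[j, 3]
        Rₛ = sqrt.(xₛ .^ 2 .+ yₛ .^ 2 .+ zₛ .^ 2)
        distance = im * k₀ .* Rₛ
        # Accumulate this atom's contribution in place
        E_scat .+= β[j] .* exp.(distance) ./ distance
    end
    return E_scat
end
```

With the inputs from the question, `any(isnan, result_GPU)` flags the failure, and when the GPU result is correct `maximum(abs.(result_GPU .- getE_scat_CPU(screen, r, β, k₀)))` should be small, up to single-precision rounding since the GPU accumulator is `Complex{Float32}`.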

Any clues?

This could be a bug in your version of the ArrayFire library; I don't get the NaNs with this line commented out.

```
> afinfo()
ArrayFire v3.6.2 (CUDA, 64-bit Linux, build dc38ef13)
Platform: CUDA Toolkit 10, Driver: 410.79
```

I updated ArrayFire, and the problem persists :disappointed_relieved:

```
julia> afinfo()
ArrayFire v3.6.2 (OpenCL, 64-bit Linux, build dc38ef1)
[0] AMD: gfx900, 8176 MB
```

Use sync() instead of sum() to work around this bug, and open a ticket with ?
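Applied to the code from the question, the workaround might look like the sketch below, where the unused `dummy = sum(E_scat_gpu)` is replaced by a `sync` call on each iteration. This is an assumption-laden sketch: it assumes ArrayFire.jl's `sync` accepts an `AFArray` and returns it once the pending work on it has run; if your version only offers the zero-argument form from the reply, call `sync()` at the same spot instead. The name `getE_scat_GPU_sync` is hypothetical.

```julia
using ArrayFire

# Hypothetical variant of getE_scat_GPU: the unused `dummy = sum(...)` is
# replaced by a sync call, as suggested in the reply.
# Assumption: `sync(a::AFArray)` waits for `a` to be computed and returns `a`;
# with only the zero-argument form available, call `sync()` instead.
function getE_scat_GPU_sync(screen, r, β, k₀)
    nAtoms = size(r, 1)
    nΘ, nΦ = size(screen, 1), size(screen, 2)

    E_scat_gpu = AFArray(zeros(Complex{Float32}, nΘ, nΦ))
    all_xₛ = AFArray(screen[:, :, 1])
    all_yₛ = AFArray(screen[:, :, 2])
    all_zₛ = AFArray(screen[:, :, 3])

    for j in 1:nAtoms
        # Distance from atom j to every screen point, written as in the question
        xₛ = all_xₛ - Float32(r[j, 1])
        yₛ = all_yₛ - Float32(r[j, 2])
        zₛ = all_zₛ - Float32(r[j, 3])
        Rₛ = sqrt(xₛ^2 + yₛ^2 + zₛ^2)

        distance = im * Float32(k₀) * Rₛ
        E_scat_gpu += β[j] * exp(distance) / distance
        # Workaround: synchronize once per iteration instead of the unused sum
        E_scat_gpu = sync(E_scat_gpu)
    end
    return Array(E_scat_gpu)
end

result_GPU = getE_scat_GPU_sync(rand(100, 100, 3), rand(50, 3), rand(50), 1)
```

Synchronizing every iteration keeps the same per-iteration cost as the `sum` trick but avoids computing a value that is thrown away.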