Populating a 3D array with discretely sampled scalar field

question
quantum
performance

#1

Hi,
I’m trying to populate a 3D array (predetermined size of about a million) with values of a Float64 function on a cartesian grid.
I think the term for this is “discretely sampled volumetric data” in scientific visualization.
Anyway, I wrote 3 different ways to do it in Julia and measured the performance with @time but all of them were less than half the speed of an equivalent C code (gcc -O2).
I fiddled with my Julia code just enough to learn that some allocations need to be pulled out of the loop, that I need to be careful how I select a range of elements while avoiding slowdowns, etc. I basically checked through all of the points in Docs » Performance Tips that apply to me, but the speed isn’t that satisfying.

Perhaps this is the limit of Julia as far as populating an array is concerned…
Could anyone let me know if there’s a faster way? Thanks.


#2
function fill_grid_1!(gridR::Array{Float64,3})
  rVec = zeros(Float64, 3)
  s = 1
  @inbounds for k in 0:zRes
    for j in 0:yRes
      for i in 0:xRes
        rVec[1] = x0 + i*dx
        rVec[2] = y0 + j*dx
        rVec[3] = z0 + k*dx
        gridR[s] = R_5_0(norm(rVec))
        #gridR[i+1,j+1,k+1] = R_5_0(norm(rVec))
        s += 1
      end
    end
  end
  nothing
end

Turning off bounds checking can help a lot in tight loops. Also, give @simd a try, but Julia defaults to -O3 which should add it automatically.

Not sure why the first option is allocating so much more though. I would check to make sure it’s all okay. @inbounds might fix the allocations.


#3

The memory allocation in fill_grid_1!, which really shouldn’t allocate in the loop, cries type instability and code_warntype confirms it. The problem is that the globals aren’t as const as intended. Try

const x0 = const y0 = const z0 = -10.0
const xRes = const yRes = const zRes = 100

#4

Other things worth trying, applicable to the C code as well.

You don’t really need rVec. Compute the norm yourself. (Using the StaticArrays package could be an alternative but doubtful it’s a win here.)

        x = x0 + i*dx
        y = y0 + j*dx
        z = z0 + k*dx
        gridR[s] = R_5_0(sqrt(x * x + y * y + z * z))

Evaluate the polynomial in R_5_0 with Horner’s method. The macro @evalpoly may be helpful.

The division by 300 * √5 could be multiplication by a precomputed constant.


#5

Thanks for your replies.
I did everything said here so far and the Julia code is now more than 4 times faster than the C code.
(fixing const declarations gave the biggest speedup, so I selected that reply as the correct answer)

const x0 = const y0 = const z0 = -10.0
const xRes = const yRes = const zRes = 400

# ...

function R_5_0(r::Float64)
  ρ5 = 0.4*r
  return exp(-0.5*ρ5) * @evalpoly(ρ5, 120, -240, 120, -20.0, 1.0) * 0.0014907119849998597
end

function fill_grid_2!(gridR::Array{Float64,3})
  s = 0
  @inbounds for k in 0:zRes
    for j in 0:yRes
      for i in 0:xRes
        x = x0 + i*dx
        y = y0 + j*dx
        z = z0 + k*dx
        gridR[s += 1] = R_5_0(√(x*x + y*y + z*z))
      end
    end
  end
end

#6

Yes, I don’t know how I missed those globals. Here’s some literature on why this happens, and a few related things to look out for.

http://docs.julialang.org/en/stable/manual/performance-tips/#avoid-global-variables