Calculating 2-point correlation of a large 3D array: `view` seems to make it worse?

I am trying to calculate the 2-point correlation of a large 3D BitArray (the data comes from a micro-CT scan of a porous structure, which I have already thresholded, so each voxel is either 1 or 0). At the moment the array has 512 elements in each dimension, and it may get larger.

My working definition of the 2-point correlation is f(d) = \frac{1}{n}\sum_{i=1}^{n} x_i x_{i+d}; so at a particular voxel separation d, I take two slices of the array offset by that index shift, multiply them elementwise, and sum over the elements.
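
For a 1D vector, that definition amounts to the following (just a toy illustration; the names x and d and the normalization by the number of overlapping pairs are my own, mirroring what the function below does):

x = Bool[1, 0, 1, 1, 0, 1]
d = 2
f_d = sum(x[1:end-d] .* x[1+d:end]) / (length(x) - d)   # shift, multiply, sum, normalize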

The function I have written is:

using Base.Threads: @spawn   # for the @spawn in the parallel loop below

function autocorr_eucl(bindat::BitArray, hmax)
    # First, we need to get all the possible displacements below a certain distance, and arrange them by distance.
    idict = Dict{Float64,Array{Tuple{Int64,Int64,Int64},1}}()
    h_full = ceil(Int, hmax * sqrt(3))   # keep this an Int so the loop and the shifts stay integer-valued
    for hi in 1:h_full
        indcombs = [(j, i, hi-i-j) for i in 0:hi, j in 0:hi if (i + j <= hi) ]
        for ind in indcombs
            dist = round(hypot(ind...), sigdigits=4)
            if dist <= hmax
                if haskey(idict, dist)
                    push!(idict[dist], ind)
                else
                    idict[dist] = [ind]
                end
            end
        end
    end
    ks = collect(keys(idict))
    xs = sort(ks)
    nd = length(xs)
    ys = zeros(nd)       # must start at zero, since we accumulate into it below
    ns = similar(xs)
    odims = size(bindat)
    osize = prod(odims)
    ox, oy, oz = odims
    
    # Now that we have all the possible displacements, 
    # we calculate the correlation at each distance.
    # This is relatively easy to parallelize, so I do that.
    @sync for i in 1:nd
        @spawn begin
            inds = idict[xs[i]]
            ninds = length(inds)
            ns[i] = ninds
            totsize = 0.0
            # smsize = prod(odims .- inds[1])
            for ind in inds
                nx, ny, nz = ind
                sdims = odims .- ind   # size of the overlapping region for this shift
                contrib = sum(view(bindat, 1:ox-nx, 1:oy-ny, 1:oz-nz) .* 
                    view(bindat, 1+nx:ox, 1+ny:oy, 1+nz:oz))
                # contrib = sum(bindat[1:ox-nx, 1:oy-ny, 1:oz-nz] .* 
                #     bindat[1+nx:ox, 1+ny:oy, 1+nz:oz])
                nsize = prod(sdims)
                totsize += nsize / osize
                ys[i] += contrib
                # ys[i] += contrib / nsize / ninds
            end
            ys[i] /= (totsize * osize)
        end
    end
    return xs, ys
end

(I already added some thread parallelism here to try to speed it up a bit, which gave me about a 3x speedup on 4 threads. I’m impressed it was so easy!)

In keeping with the performance tips in the Julia manual, I tried accessing the array slices with views instead of the plain slices. (The key lines are the `contrib = ...` ones; the version with slices is commented out.) However, I find that it takes about 3x longer using views than using the basic slices.
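
To make the comparison concrete, here are the two variants of that inner kernel pulled out on their own (the shift values below are arbitrary placeholders I picked just for timing the kernel in isolation):

bindat = rand(512, 512, 512) .< 0.95
ox, oy, oz = size(bindat)
nx, ny, nz = 1, 2, 3   # arbitrary shift, only for this standalone timing

# slice version: materializes two contiguous BitArrays, then broadcasts over them
slice_kernel(b) = sum(b[1:ox-nx, 1:oy-ny, 1:oz-nz] .* b[1+nx:ox, 1+ny:oy, 1+nz:oz])

# view version: no copies, but the broadcast now works through SubArray wrappers
view_kernel(b) = sum(view(b, 1:ox-nx, 1:oy-ny, 1:oz-nz) .*
                     view(b, 1+nx:ox, 1+ny:oy, 1+nz:oz))

@time slice_kernel(bindat)
@time view_kernel(bindat)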

As a MWE,

exdat = rand(512, 512, 512) .< .95
@time autocorr_eucl(exdat, 5)
@time autocorr_eucl(exdat, 5)

yields

2.221220 seconds (1.16 k allocations: 4.548 GiB, 40.11% gc time)

when I use simple slices, whereas when I use view I find

8.895422 seconds (683 allocations: 1.517 GiB, 5.77% gc time)

What’s going on? I can see that the views cut allocations considerably, yet the function gets much slower. (And if there are other things I can do to improve this code, I’m open to suggestions.)

I’m not 100% certain I have this right, but it looks like you’re trying to compute a distance histogram: given some collection of elements with coordinates, computing their pairwise Euclidean distances and binning them?

If so, it’s generally preferable to compute the distances on the fly and increment counts directly in an integer array of observations. Please let me know if I have it right, and I may be able to help further; I think we have some overlap in domain.
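
Roughly what I have in mind is something like this (just a sketch with names and binning choices of my own, assuming the input is a collection of coordinate tuples):

# Sketch only: pairwise distances are computed on the fly and binned straight
# into an integer count array, with no intermediate per-distance arrays.
function distance_histogram(coords, maxr)
    counts = zeros(Int, maxr)
    for p in coords, q in coords
        d = round(Int, hypot((p .- q)...))
        if 0 < d <= maxr
            counts[d] += 1
        end
    end
    return counts
end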

If I’ve understood you correctly, you’re looking at the first part of my code, which shouldn’t be the biggest performance bottleneck, I think. I am finding all the possible index shifts within a certain radius and binning them by distance, but that is just to facilitate a further calculation.

For a 2D case that I can actually visualize, here is what I am trying to do:

Given a BitArray like the following:

[0 0 1
 1 1 0
 1 1 1]

the two-point correlation is, roughly, an estimate of how well a pixel’s own value predicts the values of its neighbors; the way I have defined it, calculating the two-point correlation at distance 1 from the center value would be f(1)_{2,2} = \frac{1}{4}(a_{2,2}a_{2,1} + a_{2,2}a_{1,2} + a_{2,2}a_{2,3} + a_{2,2}a_{3,2}) = \frac{1}{4}(1*1 + 1*0 + 1*0 + 1*1) = \frac{1}{2}
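
As a quick check of that arithmetic in code (the variable names here are just for this example):

A = [0 0 1; 1 1 0; 1 1 1]
f1_center = (A[2,2]*A[2,1] + A[2,2]*A[1,2] + A[2,2]*A[2,3] + A[2,2]*A[3,2]) / 4   # gives 0.5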
To do this programmatically for every point at once and gather statistics, for the above array I write
f(1) = \frac{1}{12}(a_{1,1}a_{1,2} + a_{1,2}a_{1,3} + a_{2,1}a_{2,2} + a_{2,2}a_{2,3} + a_{3,1}a_{3,2} + a_{3,2}a_{3,3} + a_{1,1}a_{2,1} + a_{1,2}a_{2,2} + a_{1,3}a_{2,3} + a_{2,1}a_{3,1} + a_{2,2}a_{3,2} + a_{2,3}a_{3,3})
f(1) = \frac{1}{12}(0 + 0 + 1 + 0 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 0) = \frac{5}{12}
and representing that in code as

corr = (sum(A[1:2, :] .* A[2:3, :]) + sum(A[:,1:2] .* A[:,2:3]))/12

Once I start looping over all possible index shifts and tying those to particular distances (since I want to calculate this measure separately at each possible distance), it winds up looking like this:

inds = Dict(1.0 => [(1,0), (0,1)])
correlations = Dict(1.0 => 0.0)
nx, ny = 3, 3
for ind in inds[1.0]
    xshift, yshift = ind
    contrib = sum(A[1:end-xshift, 1:end-yshift] .* A[1+xshift:end, 1+yshift:end])
    correlations[1.0] += contrib / (nx-xshift) / (ny-yshift) / length(inds[1.0])
end

I anticipated that allocating the array slices would be a bottleneck of the code, since I am allocating two slices for every possible index shift (and that is a lot of shifts once I go out to 10 or 15 pixels of distance).
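
One thing I may try next, to avoid allocating either slices or broadcast temporaries in the inner kernel, is an explicit loop over the overlap region (untested sketch, not yet wired into the function above):

function shifted_sum(b::BitArray{3}, nx, ny, nz)
    ox, oy, oz = size(b)
    s = 0
    @inbounds for k in 1:oz-nz, j in 1:oy-ny, i in 1:ox-nx
        # both lookups are Bool, so & gives the 0/1 product and s counts the matches
        s += b[i, j, k] & b[i+nx, j+ny, k+nz]
    end
    return s
end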