I am trying to calculate the 2-point correlation of a large 3D `BitArray`

(the data comes from a micro-CT scan of a porous structure, which I have already thresholded, hence each voxel is either 1 or 0). At the moment, I am using an array which has 512 elements in each definition, and that might get bigger.

My working definition of the 2-point correlation is f(d) = \frac{1}{n}\sum_{i=1}^n x_i x_{i+d} ; so at a particular voxel separation d, I apply an index shift to the array both forward and back, multiply the two slices, and sum over the elements.

The function I have written is:

```
function autocorr_eucl(bindat::BitArray, hmax)
# First, we need to get all the possible displacements below a certain distance, and arrange them by distance.
idict = Dict{Float64,Array{Tuple{Int64,Int64,Int64},1}}()
h_full = ceil(hmax * sqrt(3))
for hi in 1:h_full
indcombs = [(j, i, hi-i-j) for i in 0:hi, j in 0:hi if (i + j <= hi) ]
for ind in indcombs
dist = round(hypot(ind...), sigdigits=4)
if dist <= hmax
if haskey(idict, dist)
push!(idict[dist], ind)
else
idict[dist] = [ind]
end
end
end
end
ks = collect(keys(idict))
xs = sort(ks)
nd = length(xs)
ys = similar(xs)
ns = similar(xs)
odims = size(bindat)
osize = prod(odims)
ox, oy, oz = odims
# Now that we have all the possible displacements,
# we calculate the correlation at each distance.
# This is relatively easy to parallelize, so I do that.
@sync for i in 1:nd
@spawn begin
inds = idict[xs[i]]
ninds = length(inds)
ns[i] = ninds
totsize = 0.0
# smsize = prod(odims .- inds[1])
for ind in inds
nx, ny, nz = ind
ndims = odims .- ind
contrib = sum(view(bindat, 1:ox-nx, 1:oy-ny, 1:oz-nz) .*
view(bindat, 1+nx:ox, 1+ny:oy, 1+nz:oz))
# contrib = sum(bindat[1:ox-nx, 1:oy-ny, 1:oz-nz] .*
# bindat[1+nx:ox, 1+ny:oy, 1+nz:oz])
nsize = prod(ndims)
totsize += nsize / osize
ys[i] += contrib
# ys[i] += contrib / nsize / ninds
end
ys[i] /= (totsize * osize)
end
end
return xs, ys
end
```

(I already added some thread parallelism here, to try and speed it up a bit, which did give me about a 3x speedup for 4 threads. I’m impressed it was so easy!)

In keeping with the performance tips in the Julia manual, I tried accessing the array slices with views instead of the original slice. (The key lines here are `contrib = ...`

; the version with slices is commented out.) However, I find that it takes about 3x *longer* using array views than when I use the basic slices.

As a MWE,

```
exdat = rand(512, 512, 512) .< .95
@time autocorr_eucl(exdat, 5)
@time autocorr_eucl(exdat, 5)
```

yields

` 2.221220 seconds (1.16 k allocations: 4.548 GiB, 40.11% gc time)`

when I use simple slices, whereas when I use `view`

I find

`8.895422 seconds (683 allocations: 1.517 GiB, 5.77% gc time)`

What’s going on? I can see that this improves the allocations a lot, but the function gets much slower. (And if there are other things I can do to improve this code, I’m open to suggestions.)