This is a type-stability problem. What you want is:
struct C3D{P, T<:Number}
    Nx::NTuple{P, Array{T,1}}
    Ny::NTuple{P, Array{T,1}}
    Nz::NTuple{P, Array{T,1}}
    wgt::NTuple{P, T}
end
Otherwise every access to a field of elem is type unstable, so the compiler cannot specialize the code that uses those fields and falls back to dynamic dispatch. Your second version is faster because it adds a function barrier, so accumulate! is type stable. With this change (and the corresponding change to the constructor), getϕ_a takes 253.749 μs (4 allocations: 6.67 KiB).
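To make the two fixes concrete, here is a minimal sketch of both on a toy example. Loose, Tight, weighted_sum, weighted_sum_barrier and kernel are hypothetical names made up for illustration; they are not your C3D, getϕ_a or accumulate!, and the Loose struct is only my guess at what an unparameterized version might look like.

```julia
# Hypothetical types for illustration only (not the original C3D).

# Fields without type annotations are stored as `Any`,
# so nothing about them is known at compile time.
struct Loose
    xs
    w
end

# Parametric version: once P and T are fixed, every field type is concrete.
struct Tight{P, T<:Number}
    xs::NTuple{P, Vector{T}}
    w::NTuple{P, T}
end

# Computes the sum over i of w[i] * sum(xs[i]), reading the fields inside the loop.
function weighted_sum(e)
    s = 0.0
    for i in eachindex(e.w)
        s += e.w[i] * sum(e.xs[i])
    end
    return s
end

# Function barrier: pull the fields out once and hand them to a kernel.
# The call to `kernel` dispatches on the runtime types of its arguments,
# so the loop inside it is compiled for concrete types even when `e` is Loose.
weighted_sum_barrier(e) = kernel(e.xs, e.w)

function kernel(xs, w)
    s = 0.0
    for i in eachindex(w)
        s += w[i] * sum(xs[i])
    end
    return s
end

xs = ([1.0, 2.0], [3.0, 4.0])
w  = (0.5, 2.0)

loose = Loose(xs, w)
tight = Tight(xs, w)   # P = 2 and T = Float64 are inferred from the arguments

println(weighted_sum(loose))          # correct result, but field accesses are untyped
println(weighted_sum_barrier(loose))  # same struct, loop runs in the typed kernel
println(weighted_sum(tight))          # concrete fields, no barrier needed
```

All three calls compute the same value; what differs is how much the compiler knows inside the loop. If you want to see the difference yourself, running @code_warntype weighted_sum(loose) and @code_warntype weighted_sum(tight) in the REPL should highlight where inference gives up. The barrier is a workaround for when you cannot change the struct; parameterizing the struct fixes the problem at the source.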