I’m writing distance histograms in Julia, and I think I’ve hit a limit in the performance of my serial code. The problem is simple: given a list of vectors (positions), compute the Euclidean distance between all unique pairs of those vectors, then bin the distances into linear ranges. There are many fast ways of solving the problem, like:
```julia
using Tullio, StatsBase, LinearAlgebra

const arr = rand(1:1.0:100, 2, 10000)
out = zeros(10^4, 10^4)
@tullio out[i, j] = sqrt((arr[1, i] - arr[1, j])^2 + (arr[2, i] - arr[2, j])^2)
fit(Histogram, vec(LowerTriangular(out)), bins)
```
But these are ultimately memory hungry, and `fit(Histogram, ...)` is the limiting step.
Binning on the fly is possible though, as:
```julia
function dist1(r, histo, rmax, nbins)
    @inbounds @fastmath for i in 1:size(r, 1)-1
        for j in i+1:size(r, 1)
            d = 0.0
            @simd for k in 1:2
                tmp = r[i, k] - r[j, k]
                d += tmp * tmp
            end
            d = sqrt(d)
            rIndex = ceil(Int64, d / rmax * nbins)
            if rIndex >= nbins
                continue
            end
            histo[rIndex] += 1
        end
    end
    return histo
end

points = rand(1000, 2)
hist = zeros(Int, 100)
dist1(points, hist, 5, 100)
```
You can also write a comparable cross-correlation distance histogram which finds and bins the distances between two lists of vectors:
```julia
function dist2(r1, r2, histo, rmax::Int64, nbins::Int64)
    @inbounds @fastmath for i in 1:size(r1, 1)
        for j in 1:size(r2, 1)
            d::Float64 = hypot(r1[i, 1] - r2[j, 1], r1[i, 2] - r2[j, 2])
            rIndex = ceil(Int64, d / rmax * nbins)
            if rIndex >= nbins
                continue
            end
            histo[rIndex] += 1
        end
    end
    return histo
end
```
And with these you can tile a large dist1 problem into a series of smaller problems split between dist1 and dist2. I’m not too concerned with the tiling itself (this post is long enough), but I’d love to know if anybody knows a way of computing the results of dist1 and dist2 faster.
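For readers who want to see how the tiles fit together, here is a minimal sketch of one possible layout (my own arrangement, not the author’s blocking code, with simplified re-statements of the two kernels so the snippet runs on its own; the chunk size is arbitrary). The idea: split the points into chunks, and every unique pair then falls either inside one chunk (handled by a dist1-style kernel) or between two distinct chunks (handled by a dist2-style kernel), so the tiles can be processed independently and their histograms summed.

```julia
# Simplified within-chunk kernel (same binning rule as dist1 above).
function pairs_within(r, histo, rmax, nbins)
    for i in 1:size(r, 1)-1, j in i+1:size(r, 1)
        d = hypot(r[i, 1] - r[j, 1], r[i, 2] - r[j, 2])
        rIndex = ceil(Int, d / rmax * nbins)
        1 <= rIndex < nbins && (histo[rIndex] += 1)
    end
    return histo
end

# Simplified cross-chunk kernel (same binning rule as dist2 above).
function pairs_between(r1, r2, histo, rmax, nbins)
    for i in 1:size(r1, 1), j in 1:size(r2, 1)
        d = hypot(r1[i, 1] - r2[j, 1], r1[i, 2] - r2[j, 2])
        rIndex = ceil(Int, d / rmax * nbins)
        1 <= rIndex < nbins && (histo[rIndex] += 1)
    end
    return histo
end

# Tile the full all-pairs problem: each chunk contributes its internal
# pairs once, and each ordered chunk pair (a, b) with a < b contributes
# its cross pairs once, so every unique pair is counted exactly once.
function tiled_hist(points, rmax, nbins; chunk = 250)
    n = size(points, 1)
    ranges = [a:min(a + chunk - 1, n) for a in 1:chunk:n]
    histo = zeros(Int, nbins)
    for (ci, ra) in enumerate(ranges)
        pairs_within(view(points, ra, :), histo, rmax, nbins)
        for rb in ranges[ci+1:end]
            pairs_between(view(points, ra, :), view(points, rb, :), histo, rmax, nbins)
        end
    end
    return histo
end

points = rand(1000, 2)
hist_tiled = tiled_hist(points, 5, 100)
hist_full  = pairs_within(points, zeros(Int, 100), 5, 100)
@assert hist_tiled == hist_full   # tiling reproduces the single-kernel result
```

Each tile writes to an independent histogram in the parallel version, so summing the per-tile histograms at the end avoids any write contention.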
For reference on my PC, with 1000 2d vectors:
```julia
using BenchmarkTools

@btime dist1(points, hist, 5, 100)
#  2.392 ms (0 allocations: 0 bytes)

points2 = points .+ 1
@btime dist2(points, points2, hist, 5, 100)
#  4.660 ms (0 allocations: 0 bytes)
```
Worth noting that comparing a pair of identical points will crash this program (a zero distance maps to bin index 0, which is out of bounds), but my problem doesn’t generate such points. Also happy to share my blocking approach to parallelising this.