Y.A.t.Q : Yet Another @threads Question

If you care for performance of array traversal code, it’s crucial to consider memory layout. In Julia, arrays are column-major by default, and switching to row-major gives almost a 5-fold speedup for me. This is without paralellization:

X_sub_r = PermutedDimsArray(permutedims(X_sub), (2, 1))
X_r = PermutedDimsArray(permutedims(X), (2, 1))
sequenceCosine(X_sub, X)  # 3.6 s
sequenceCosine(X_sub_r, X_r)  # 0.72 s

Writing this function in terms of map can make it absolutely trivial to parallelize:

# sequential - has the same performance as sequenceCosine, 0.7 s for row-major arrays
function sequenceCosine3(X_sub, X)
	map(Iterators.product(eachrow(X_sub), eachrow(X))) do (i, j)
    	cosine_dist(i, j)
	end
end

# parallel - 0.26 s for me
function parallelCosine3(X_sub, X)
	ThreadsX.map(Iterators.product(eachrow(X_sub), eachrow(X))) do (i, j)
    	cosine_dist(i, j)
	end
end
4 Likes