I'm new to Julia and am hoping to use Apple's MLX Framework with Julia

That seems like a lot of allocations. I think many of them are coming from your norm(x .- y)s — that allocates a temporary for x .- y on every call. That’s where I’m sure you could do significantly better, even on a plain old CPU. Changing those to Distances.euclidean from Distances.jl (and nothing else!) gets me down to 18s on my crappy old M1 laptop with way fewer allocations:

julia> @time radii, C_r = grassberger_procaccia(X)
250.868078 seconds (2.13 G allocations: 1.692 TiB, 37.04% gc time, 0.28% compilation time)

julia> @time radii, C_r = grassberger_procaccia_euclidean(X)
 18.588317 seconds (862.84 k allocations: 128.668 MiB, 0.02% gc time, 1.61% compilation time)

You’ll also want to avoid using tid to divide up your threads work; see PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads() for more details there.

2 Likes