Optimization tips for my julia code. Can I make it even faster and/or memory efficient?

Ha, never trust a random guy from the internet. Yes, you are right: `centroids_cnt` should be moved to the outer loop. With this change the results are less amazing, around ~9.5 s, which is still good.
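The pattern being discussed is hoisting the buffer allocation out of the iteration loop and reusing it. The gist code isn't reproduced here; this is a minimal hypothetical sketch (the function and variable names are illustrative, not the gist's):

```julia
# Hypothetical sketch: allocate `centroids_cnt` once, outside the
# iteration loop, and reuse it on every pass instead of reallocating.

function assign_and_count!(labels, centroids_cnt, X, centroids)
    fill!(centroids_cnt, 0)              # reset the reused buffer
    for i in axes(X, 1)                  # each row of X is a point
        best, bestd = 1, Inf
        for k in axes(centroids, 1)
            d = 0.0
            for j in axes(X, 2)
                d += (X[i, j] - centroids[k, j])^2
            end
            if d < bestd
                bestd, best = d, k
            end
        end
        labels[i] = best
        centroids_cnt[best] += 1
    end
    labels, centroids_cnt
end

X = [0.0 0.0; 0.1 0.0; 5.0 5.0]
centroids = [0.0 0.0; 5.0 5.0]
labels = zeros(Int, size(X, 1))
centroids_cnt = zeros(Int, size(centroids, 1))   # hoisted: allocated once
assign_and_count!(labels, centroids_cnt, X, centroids)
# labels == [1, 1, 2], centroids_cnt == [2, 1]
```

Inside an outer k-means loop the `fill!` call makes each pass allocation-free, which is where the savings come from.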

I’ve updated the gist: https://discourse.julialang.org/t/optimization-tips-for-my-julia-code-can-i-make-it-even-faster-and-or-memory-efficient/34614/8


That’s the last one :slight_smile:
After implementing a custom pairwise function, refactoring sum_of_squares, and reordering the loops to respect memory layout, I was finally able to reach ~5 s of execution time (now for real). It’s in the same gist, so I won’t repeat the link here. Even more interesting, the new version can beat the Clustering.jl implementation:

using Clustering
using BenchmarkTools  # for @btime

@btime kmeans(transpose($X), 2) # 187.700 ms (95 allocations: 83.93 MiB)
@btime Kmeans($X, 2, verbose = false, tol=1e-10) # 118.517 ms (27 allocations: 45.78 MiB)
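The loop-reordering point above comes from Julia arrays being column-major: the innermost loop should walk down a column. A hypothetical sketch of a pairwise squared-distance kernel written that way (names are illustrative, not the gist's):

```julia
# Hypothetical sketch: pairwise squared distances with the innermost
# loop running over the feature dimension, i.e. stride-1 column access.

function pairwise_sqdist!(D, X, C)
    # X is d×n (features × points), C is d×k (features × centroids)
    d, n = size(X)
    k = size(C, 2)
    @inbounds for j in 1:k, i in 1:n
        s = 0.0
        for f in 1:d                 # contiguous, column-major access
            s += (X[f, i] - C[f, j])^2
        end
        D[i, j] = s
    end
    D
end

X = [0.0 1.0 4.0;
     0.0 0.0 3.0]            # 2×3: three 2-D points stored in columns
C = [0.0 4.0;
     0.0 3.0]                # 2×2: two centroids in columns
D = Matrix{Float64}(undef, 3, 2)
pairwise_sqdist!(D, X, C)
# D == [0.0 25.0; 1.0 18.0; 25.0 0.0]
```

Storing points in columns (rather than rows) is also why the Clustering.jl call above needs `transpose($X)`: its `kmeans` expects a features × points matrix.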

The next interesting step is the parallelization of this problem, it should be easy enough. Even more interesting would be to try to parallelize it on GPU, something tells me that it is going to be spectacular.
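The assignment step is a natural candidate for that, since each point's nearest centroid is independent of the others. A hypothetical sketch using `Threads.@threads` (run with e.g. `julia -t 4`; names are illustrative):

```julia
# Hypothetical sketch: parallelize the nearest-centroid assignment.
# Each point is independent, so splitting the loop across threads is safe.

function assign_threaded!(labels, X, C)
    # X is d×n, C is d×k, points and centroids stored in columns
    d, n = size(X)
    k = size(C, 2)
    Threads.@threads for i in 1:n
        best, bestd = 1, Inf
        for j in 1:k
            s = 0.0
            @inbounds for f in 1:d
                s += (X[f, i] - C[f, j])^2
            end
            if s < bestd
                bestd, best = s, j
            end
        end
        labels[i] = best           # each thread writes disjoint slots
    end
    labels
end

X = [0.0 1.0 4.0; 0.0 0.0 3.0]
C = [0.0 4.0; 0.0 3.0]
labels = zeros(Int, 3)
assign_threaded!(labels, X, C)   # → [1, 1, 2]
```

The centroid-update step needs more care (per-thread accumulators merged at the end) to avoid data races on the shared sums.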


Maybe I should package it and serve it as a Julia package?

Clustering.jl is an existing and popular package for clustering.
I suggest benchmarking against its current implementation, and if your code is faster, making a PR.


@Skoffer would be the best person for the PR since he came up with the optimizations. Based on all the benchmarking results, this implementation is faster and more memory efficient than the current one in Clustering.jl.

It’s worth making a PR @Skoffer :slight_smile: .