Ha, never trust a random guy from the internet. Yes, you are right: centroids_cnt should be moved to the outer loop. With this change the results are less amazing, around ~9.5s, which is still good.
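To make the fix concrete, here is a minimal sketch of what hoisting the counter out of the inner loop looks like. The names (`labels`, `centroids_cnt`, `k`) are my guesses at the gist's variables, not the actual code:

```julia
# Hypothetical sketch: the per-centroid counter is allocated and filled once
# per iteration (outer loop), instead of being recomputed inside the inner
# distance loop. `labels` maps each point to its nearest centroid index.
function centroid_counts(labels::Vector{Int}, k::Int)
    centroids_cnt = zeros(Int, k)   # allocated once, in the outer scope
    @inbounds for l in labels       # a single pass over the assignments
        centroids_cnt[l] += 1
    end
    return centroids_cnt
end

centroid_counts([1, 2, 2, 1, 2], 2)  # -> [2, 3]
```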
That’s the last one
After implementing a custom pairwise function, refactoring sum_of_squares, and reordering the loops to respect memory layout, I was finally able to achieve ~5s of execution time (now for real). It's in the same gist, so I won't repeat the link here. What is even more interesting, the new version can beat the Clustering.jl implementation.
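The layout point is the key one: Julia arrays are column-major, so the innermost loop should walk down a column. A minimal sketch of a layout-aware pairwise squared-distance kernel, assuming points are stored as columns of a d×n matrix (the function name and signature are illustrative, not the gist's actual API):

```julia
# Sketch of a column-major-friendly pairwise kernel: the inner loop over
# features `f` reads X[f, j] and C[f, i] with stride 1, so memory access
# is contiguous. D[i, j] receives the squared distance from point j to
# centroid i.
function pairwise_sqdist!(D::Matrix{Float64}, X::Matrix{Float64}, C::Matrix{Float64})
    d, n = size(X)
    k = size(C, 2)
    @inbounds for j in 1:n, i in 1:k
        s = 0.0
        @simd for f in 1:d          # contiguous, SIMD-friendly access
            s += (X[f, j] - C[f, i])^2
        end
        D[i, j] = s
    end
    return D
end

X = [0.0 1.0; 0.0 1.0]            # two 2-D points as columns
C = reshape([0.0, 0.0], 2, 1)     # one centroid at the origin
D = zeros(1, 2)
pairwise_sqdist!(D, X, C)         # D == [0.0 2.0]
```

Iterating the other way (features in the outer loop) touches memory with stride d and is what the "reordering loops" step avoids.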
using Clustering
@btime kmeans(transpose($X), 2) # 187.700 ms (95 allocations: 83.93 MiB)
@btime Kmeans($X, 2, verbose = false, tol=1e-10) # 118.517 ms (27 allocations: 45.78 MiB)
The next interesting step is the parallelization of this problem; it should be easy enough. Even more interesting would be to try to parallelize it on the GPU; something tells me that is going to be spectacular.
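The assignment step is indeed embarrassingly parallel: each point's nearest-centroid search is independent, so the outer loop over points can be split across threads without any locking. A rough sketch of what that could look like (names are illustrative; run with `julia -t N` to get multiple threads):

```julia
using Base.Threads

# Hypothetical sketch: parallelizing the label-assignment step with
# Threads.@threads. Points are columns of X, centroids are columns of C;
# each thread writes to disjoint entries of `labels`, so no locks needed.
function assign_labels!(labels::Vector{Int}, X::Matrix{Float64}, C::Matrix{Float64})
    d, n = size(X)
    k = size(C, 2)
    @threads for j in 1:n
        best, bestdist = 1, Inf
        for i in 1:k
            s = 0.0
            @inbounds for f in 1:d
                s += (X[f, j] - C[f, i])^2
            end
            if s < bestdist
                best, bestdist = i, s
            end
        end
        labels[j] = best
    end
    return labels
end

X = [0.0 1.0; 0.0 1.0]   # points (0,0) and (1,1) as columns
C = [0.0 1.0; 0.0 1.0]   # centroids (0,0) and (1,1) as columns
labels = zeros(Int, 2)
assign_labels!(labels, X, C)  # -> [1, 2]
```

The update step (recomputing centroids) needs more care under threading, since multiple points accumulate into the same centroid; per-thread partial sums merged at the end is the usual pattern.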
Clustering.jl is an existing and popular package for clustering.
I suggest benchmarking against its current implementation, and if your code is faster, making a PR.
@Skoffer would be the best person to make the PR, since he came up with the optimizations. Judging from the benchmark results, this implementation is both faster and more memory-efficient than the current one in Clustering.jl.