Improving an algorithm that computes GPS distances

That’s not great, I wonder why? Which versions of Julia and LoopVectorization was this run on? (If you have a minute, a run without loading LoopVectorization would test whether that is the issue; I guess it could be fussy about hardware.)

Thanks for pulling them together. Interesting that Jax uses so little memory; I guess it doesn’t actually materialise diff_lat the way numpy does.

Without LoopVectorization, the result is much more reasonable:

julia> @btime distances_tullio($a, $b);
551.643 ms (648 allocations: 190.91 MiB)

LoopVectorization was v0.7.2

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-suse-linux)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, ivybridge)
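
For reference, distances_tullio is defined in an earlier post; what follows is a minimal sketch of the pattern, assuming haversine distances over (lat, lon) columns in radians and a 6371 km Earth radius (the exact definition in the thread may differ).

using Tullio, LoopVectorization  # Tullio emits @avx kernels when LoopVectorization is loaded

# Sketch only: the column layout of `a` and `b` and the 6371 km radius are assumptions.
function distances_tullio(a, b)
    lat1, lon1 = view(a, :, 1), view(a, :, 2)
    lat2, lon2 = view(b, :, 1), view(b, :, 2)
    @tullio d[i, j] := 2 * 6371 * asin(sqrt(
        sin((lat2[j] - lat1[i]) / 2)^2 +
        cos(lat1[i]) * cos(lat2[j]) * sin((lon2[j] - lon1[i]) / 2)^2))
end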

Jax’s memory footprint is the same as the others’; it just works with Float32 arrays while the others use Float64.
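
One way to check this like for like is to convert the Julia inputs to Float32 before benchmarking; a small sketch, where a and b are the coordinate matrices from the @btime calls above:

using BenchmarkTools

a32, b32 = Float32.(a), Float32.(b)  # same data, half the bytes per element
@btime distances_tullio($a32, $b32)  # the MiB figures should drop roughly in half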

Thanks, that’s better. Looking things up, the i7-3770 does not have AVX2; perhaps that’s the issue.

I meant that, accounting for 32/64, it’s like the efficient Julia algorithms, and unlike the first one / the numpy version, which make some large intermediate arrays. Perhaps this would not surprise someone who knew more about Jax.
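
To illustrate (a sketch with hypothetical names): the numpy-style version materialises n×n intermediates before the result, while a fully fused Julia broadcast allocates only the result.

# Hypothetical setup: n points per set, coordinates in radians.
n = 5_000
lat1, lon1, lat2, lon2 = rand(n), rand(n), rand(n), rand(n)

# numpy-style: each line below allocates an n×n intermediate matrix.
diff_lat = lat1 .- lat2'
diff_lon = lon1 .- lon2'

# Fused broadcast: the differences fold into one kernel, so the only n×n
# allocation is the result itself (6371 km is roughly Earth's radius).
d = 2 .* 6371 .* asin.(sqrt.(
        sin.((lat1 .- lat2') ./ 2) .^ 2 .+
        cos.(lat1) .* cos.(lat2') .* sin.((lon1 .- lon2') ./ 2) .^ 2))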

But it’s written exactly this way in the first post:

Yes, you are right, my fault.

@mcabbott @Vasily_Pisarev I would love to update my test with your new code, but I don’t see most of the functions you are using in this thread. Are they in a gist, or can you post the final version? Thanks!

This is my initial investment -_- Also Julia has this on its main page:

Easy to use
Julia has high-level syntax, making it an accessible language for programmers from any background or experience level. Browse the Julia microbenchmarks to get a feel for the language.

So is it a complex tool or is it easy to use?

Thank you all! @cgarciae and I were trying to get our hands on Julia for a problem we encountered, benchmarking it against numpy and JAX; hopefully we can learn a lot from this thread.

Thanks

Hmm. I’d have to argue with this: I took existing Numpy code and, just by adding the jax.jit decorator, I got all this speedup, and it also runs on GPU if you install Jax with CUDA support. I found it really easy to use.

Vasily collected lots of them in this post above.

BTW, I managed to dig up an old computer with an i5-3427U, on which distances_threaded_simd seems to benefit a little from @avx, but distances_tullio is a disaster. So perhaps this can be narrowed down… LoopVectorization uses CpuId.jl and should ideally detect these things.
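
For anyone who wants to check their own machine, CpuId.jl can report what it detects; a small sketch (which feature flags LoopVectorization actually keys off is an implementation detail):

using CpuId  # the package LoopVectorization consults for hardware detection

cpuinfo()    # prints a summary table, including supported SIMD extensions
simdbytes()  # SIMD register width in bytes: 32 with AVX, 16 without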

It was claimed upthread that Jax only computed the results lazily, which would mean that the comparison wasn’t very relevant. Can you confirm this?

Yes. Both. It’s super easy to use, as evidenced by the fact that I’m a biologist with no CS training and I’m able to be incredibly productive with it. But eking out every ounce of performance is a different matter. The fact that both simplicity and complexity can exist in the same language is its strength.

“Easy to use” usually refers to the learning curve. I don’t think Julia is hard; I am just surprised newcomers like me get “attacked” for not knowing everything from the beginning or not reading the whole manual. That’s just not how you learn these days.

The code is under the “Function definitions” spoiler in the post above.

That’s fine. Just next time, please share your benchmarks here when you ask for help and get it. As we see, it benefits everyone that way.

I think the amount of patience depends a bit on how cocksure the newcomer appears to be. I notice that I sometimes get a bit snarky when someone new to the language shows off their (quite understandably) flawed Julia code vs some other language, and proceeds to loudly proclaim how far behind Julia is :wink:

And this happens quite frequently.

I think most people who ask for help get very friendly treatment, but “with great confidence comes sharper feedback.”

I don’t think people are attacking you, but you did put a bunch of benchmark posts on social media with some pretty inefficient code…

Julia can require some digging to really tune things, but the end result is typically faster than JAX, sometimes rivaling FORTRAN. So when you run benchmarks while still brand new to the language, it’s probably best to say so: “I don’t really know Julia well, but JAX seems fast!” That’s not how your tweets read, though.

There are a lot of myths the Julia community has to continuously battle from the blowhards in the Python community (not everyone is like that, but there’s a lot of reinforcement bias going around). Posts like this just make our lives harder, so that’s probably why you’ve met some friction…

JAX is fast, though, and yeah, it is pretty easy to use. But even though the code is written one way, JAX will be doing a lot of optimizations behind your back. Just make sure you compare apples to apples when sharing research publicly, or people might say “hmmm”…

@Vasily_Pisarev You should use np.asarray at the end, since np.array forces a copy of the data. I made these changes and got these numbers using 8 cores:

distances  1.744 s (40 allocations: 286.18 MiB)
distances_bcast  1.464 s (30 allocations: 95.44 MiB)
distances_threaded  330.340 ms (105 allocations: 190.82 MiB)
distances_threaded_simd  150.763 ms (104 allocations: 190.82 MiB)
dist_np_test  1.413 s (39 allocations: 95.37 MiB)
dist_jax_test 259.303 ms (8 allocations: 320 bytes)

code: final.jl · GitHub
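
For readers who don’t follow the link, here is a hedged sketch of the pattern behind distances_threaded_simd; the real version uses @avx from LoopVectorization (Base’s @simd stands in here), and the haversine details are assumptions:

using Base.Threads

function distances_threaded_simd(lat1, lon1, lat2, lon2)
    d = Matrix{Float64}(undef, length(lat1), length(lat2))
    @threads for j in eachindex(lat2)            # one chunk of columns per thread
        @inbounds @simd for i in eachindex(lat1)
            s = sin((lat2[j] - lat1[i]) / 2)^2 +
                cos(lat1[i]) * cos(lat2[j]) * sin((lon2[j] - lon1[i]) / 2)^2
            d[i, j] = 2 * 6371 * asin(sqrt(s))   # 6371 km ≈ Earth's radius
        end
    end
    return d
end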

Wow! Julia + SIMD is amazing. I am guessing Jax doesn’t use SIMD. As noted on Twitter, the allocation numbers for Jax don’t mean anything.

Thanks all! I’ve learned a lot today :smiley:

If you make the arrays CuArrays then the broadcasted form will act on the GPU. ]add CuArrays will even install CUDA for you. I’m curious about the GPU timings.
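
A sketch of that suggestion, assuming a and b are the coordinate matrices from the CPU benchmarks above:

using CuArrays  # `]add CuArrays` also fetches the CUDA toolkit artifacts

a_gpu, b_gpu = CuArray(a), CuArray(b)  # move the inputs to the GPU once
d_gpu = distances_bcast(a_gpu, b_gpu)  # every dotted op now runs as a CUDA kernel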

Some further tweaking over here gets distances_threaded_simd down to 65 ms (with Float32, on a 6-core CPU) and distances_tullio down to 28 ms.

distances_bcast also works on the GPU (31 ms, on an ancient one), as does distances_tullio via KernelAbstractions, though the latter is very slow right now; not sure why.

In the broadcasted version, won’t cos.(lat1) .* cos.(lat2') cause these cosines to be calculated n^2 times instead of n times, due to the transpose?
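
For what it’s worth, dot fusion does turn that whole expression into a single kernel, evaluating each cosine once per (i, j) pair; hoisting the cosines avoids this (a sketch, names matching the question):

coslat1 = cos.(lat1)      # n cos evaluations
coslat2 = cos.(lat2)      # n cos evaluations
c = coslat1 .* coslat2'   # n^2 multiplications, but no further cos calls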
