I have been trying to step up my game in Julia by thinking more about parallelism and speed in order to write the fastest version of my code, and I find myself faced with a number of not-so-thoroughly-documented options, so I have a few questions:
- I understand that @inbounds drops bounds checking when accessing an array element, but is there ever a use case for *not* using it? I assume every functioning piece of code should never access an array out of bounds. So after testing my code and making sure no out-of-bounds error can happen, isn't it a no-brainer to just add @inbounds to speed things up a little?
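For concreteness, this is the kind of thing I have in mind (a minimal sketch; `sum_skipchecks` is just a made-up name):

```julia
# Sum the elements of a vector with bounds checks disabled inside the loop.
# If the index ever went out of range here, the behavior would be undefined.
function sum_skipchecks(x::AbstractVector{Float64})
    s = 0.0
    @inbounds for i in eachindex(x)
        s += x[i]
    end
    return s
end
```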
- When would @fastmath be a good or a bad idea?
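To make the question concrete, here is the sort of loop where I would consider it (again just a sketch with a made-up function name):

```julia
# @fastmath lets the compiler relax strict IEEE floating-point semantics
# (e.g. reassociate the additions below), which can change results slightly.
function sumsq_fast(x::AbstractVector{Float64})
    s = 0.0
    @fastmath for i in eachindex(x)
        s += x[i] * x[i]
    end
    return s
end
```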
- If I understand correctly, @simd seems to attempt to vectorize a loop when each iteration is independent of the rest. So how does this accelerate things, given Julia's cooperative threading, which (as far as I understand) only switches to another thread when the active thread is blocked or waiting? And how similar or different is this to the Julia backend of GPUArrays and to the @async/@sync pattern?
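Here is roughly what I mean by the two patterns, as far as I understand them (a sketch with made-up names; I am not sure this is the right mental model, which is part of the question):

```julia
# SIMD-style loop: I assert the iterations are independent, so the compiler
# may reorder them and use vector instructions.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @inbounds @simd for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# The @sync/@async pattern I am comparing it to: cooperative tasks that
# yield while waiting (here on `sleep`).
function run_tasks(n)
    @sync for i in 1:n
        @async begin
            sleep(rand())
            println("task $i done")
        end
    end
end
```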
- @spawnat vs. Channel: the first one is about calling a function on another process, and channels manage communication between processes at a more fine-grained level, right? Any nice comparison of these would be appreciated.
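In case it helps, here is the kind of minimal comparison I was picturing (a sketch only; it assumes two local workers and uses a made-up `square_jobs` function):

```julia
using Distributed
addprocs(2)                          # assumption: two local worker processes

# @spawnat: run one expression on a specific worker and fetch the result.
fut = @spawnat 2 sum(rand(10^6))
@show fetch(fut)

# RemoteChannel: explicit, finer-grained message passing between processes.
jobs    = RemoteChannel(() -> Channel{Int}(10))
results = RemoteChannel(() -> Channel{Int}(10))

# A worker loop that squares whatever it takes from `jobs`.
@everywhere function square_jobs(jobs, results, n)
    for _ in 1:n
        put!(results, take!(jobs)^2)
    end
end

remote_do(square_jobs, 2, jobs, results, 5)   # start the loop on worker 2
foreach(j -> put!(jobs, j), 1:5)
println([take!(results) for _ in 1:5])
```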
- MPI.jl is the standard tool for distributed programming in Julia, right? Is there any other, perhaps easier, tool for managing distributed programs, either on a cluster or on AWS?
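For reference, the MPI.jl usage I have in mind is just the standard hello-world, launched with something like `mpiexec -n 4 julia hello.jl`:

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
# Each MPI rank is a separate Julia process; all communication is explicit.
println("Hello from rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)
MPI.Finalize()
```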
As I am sure you can tell, my experience in parallel and distributed programming is somewhere between none and limited. Any other ideas, tools, or docs that I missed and that might be useful to include here would be welcome. I am sure this post will show up in a Google search by anyone in my current shoes in the future, so kindly be thorough. Thanks a lot!