This isn’t an example of advanced high-performance Julia code, apart from the @simd macros and knowing to use length 4 rather than 3. It reads clean and simple to me (I mean that as praise, it’s good code!): just functions and for-loops. This is also the kind of simple code you should be able to get running fast with NumPy, as there isn’t much that the compiler would do differently.
In more advanced high-performance Julia you will see much more use of the type system to pass information to the compiler, maybe @generated functions, and things like the LoopVectorization.jl @avx macro. But as someone who also writes high-performance C++, I find Julia much easier.