MOST likely you call `sum`, then `std`, i.e. separate loops that scan the data twice, BUT technically the functional approach doesn’t rule out optimal performance: scanning just once… In theory the compiler (of functional languages at least) is allowed to reverse the order of the loops, or do even more advanced transformations (because these are pure functions), though in practice it likely doesn’t:
I.e. these are pure functions, and a sufficiently good compiler could merge them, both the naive loops and the better, numerically stable versions as in Base (I at least assume). I have no confidence compilers actually do that, however… and I’m not even sure it would be good: right now you have those two functions precompiled, ready to use, not inlined; for the compiler to do this it would have to inline them and merge/interleave the loads, so they were shared, no longer separate in each loop.
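For concreteness, here is a minimal hand-written sketch of the two versions: the two-pass form you’d actually write, and a fused single scan using the naive (not numerically stable) formula, i.e. roughly what a loop-merging compiler would have to produce. The function names and the naive formula are mine, just for illustration, not what Base does:

```julia
using Statistics

# What you'd normally write: two separate scans over x.
two_pass(x) = (sum(x), std(x))

# Hand-fused single scan (naive sum-of-squares formula, NOT the
# numerically stable algorithm Base's std uses) -- roughly what a
# compiler merging the two loops would have to generate.
function one_pass(x)
    s  = zero(float(eltype(x)))
    sq = zero(float(eltype(x)))
    @inbounds @simd for i in eachindex(x)
        s  += x[i]
        sq += x[i] * x[i]
    end
    n = length(x)
    m = s / n
    sd = sqrt((sq - n * m^2) / (n - 1))  # sample std, Bessel-corrected
    return s, sd
end

x = randn(10^6)
two_pass(x), one_pass(x)  # agree up to floating-point error
```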
So what does the compiler do, or what should it do?
1. It could scan up in memory for `sum`, but for `std` it could scan down, i.e. start from what’s most recently in the cache. If both scan in the same direction (they likely do, which is then suboptimal), your data may already be out of the cache, if not L3, then at least likely L1.
2. You could also start two threads, one to scan for `sum` and the other for `std`; then both will use load instructions, load into the same (or different) L1 D-cache, but at least benefit from the data prefetched into the shared L2/L3 cache. THEN it would be better for both to scan in the same direction, whichever is chosen (see the sketch after the note below)…
Note, 1. and 2. want opposite scan directions, so the algorithms can’t be direction-wise optimal for both. With functional programming the traversal order need not be defined. But I’m pretty sure imperative loop code is used in practice.
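To make option 2 concrete, here is a rough sketch using `Threads.@spawn` (requires starting Julia with more than one thread, e.g. `julia -t 2`); both tasks only read the array, so there is no data race:

```julia
using Statistics

# Option 2: two tasks scanning the same (unmutated) array concurrently.
# They may share prefetched data in L2/L3; whether the scans actually
# overlap depends on the scheduler and the array size.
function sum_and_std_threaded(x)
    t  = Threads.@spawn sum(x)  # one task scans for the sum
    sd = std(x)                 # the std scan runs on the current thread
    return fetch(t), sd
end
```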
A very good compiler could even reverse the order of an imperative loop, since the function is pure, though I very much doubt it would. And I’m even less convinced Numba or other compilers would pull off such a trick.
I’m only trying to show that functional code can be better in theory… though in practice it likely wouldn’t be.
Maybe…, or do you pay in implementation complexity by NOT using functional programming? At least in this case that doesn’t apply, since these are pure functions. If you allow overwriting, you likely disable some possible optimizations: as above, you want to know the array isn’t mutated by yet another function. If it is, you’re likely left with bugs, which is why Julia doesn’t use `@view` by default, unlike languages like C/C++ where views are the default (yes, using views is faster, just potentially dangerous, and you must know what you’re doing, i.e. not mutate from other threads, or else be very careful…).
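To illustrate the `@view` point (the array and index range below are just an example):

```julia
A = rand(10^6)

slice_copy = A[1:1000]        # Julia's default: an independent copy
slice_view = @view A[1:1000]  # no copy; aliases A's memory, faster

A[1] = -1.0                   # mutate the parent array
slice_copy[1]                 # unchanged -- the copy doesn't see it
slice_view[1]                 # -1.0 -- the view does; this is the bug risk
```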
What you pay for with functional programming/persistent data structures (which are otherwise very nice) is more GC pressure, but that doesn’t seem too bad, and Julia’s GC keeps getting better. Julia just needs to free eagerly in loops, when possible with compiler support (rather than doing it manually with Bumper.jl).
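For reference, the manual version that compiler support would ideally make unnecessary looks roughly like this with Bumper.jl (a sketch assuming its documented `@no_escape`/`@alloc` macros; the function and the row-sum task are made up for illustration):

```julia
using Bumper

# A scratch buffer is needed each iteration. With a plain Array that is
# one GC allocation per iteration; with Bumper.jl the buffer comes from a
# task-local bump allocator and is reclaimed when the @no_escape block ends.
function rowsums_bumper(M)
    out = zeros(size(M, 1))
    for i in axes(M, 1)
        @no_escape begin
            tmp = @alloc(Float64, size(M, 2))  # bump-allocated scratch
            tmp .= @view M[i, :]
            out[i] = sum(tmp)
        end
    end
    return out
end
```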