Writing code to count the number of inversions of a permutation, I was surprised that it allocates. Thus I wrote what I thought equivalent code with a for loop, and it does not allocate. Here is the code; why is count so slow?
function inversions(p::Vector{Int})
l=0
for j in 2:length(p)
for i in 1:j-1
l+=(p[i]>p[j])
end
end
l
end
function inversions1(p::Vector{Int})
count(p[i]>p[j] for j in 2:length(p) for i in 1:j-1)
end
p=collect(10:-1:1)
I’m not an expert on when or why an allocation happens, but it seems to me that in inversions1 you’re creating a generator expression depending on p (which may need allocation) and pass that to count, whereas in inversions all you need is the known length of the vector and two loops with the condition. A quick look at @code_lowered seems to confirm that:
I really don’t see the problem here - to me, the loop is much more clear and explicit about what’s going on than the generator expression. It does precisely what you expect without any hidden allocation, whereas the generator by lieu of existing already allocates. You can even go as far as to say that they are distinct by what they’re doing - the second version explicitly constructs that generator (maybe with some rewrite to return it, should all those permutations be needed lazily) whereas the first version only gives the static count of inversions.
My point is just that a “shorter” generator expression does not necessarily lead to the same code as writing an explicit loop getting just what you want. I suspect that in the OP what lead to the allocation was the call to Base.Flatten in the lowered code when generating that generator, which stems from the fact that there are two for in there.
@Jean_Michel It’s not that count allocates, but the generators you’re creating (and that are being flattened) allocate. In Julia, loops are fast so if you just need the count and don’t care about which pair of elements it’s true for, just writing the loop is probably going to be both faster and simpler to write.
I think that it is more subtle than this: the cost of many abstractions can be made negligible or in some cases zero, but this does not always hold and/or is not always done automatically (requiring a bit of prodding, or extra work).
Writing very fast code still requires some work and experience. Coming to Julia and expecting everything to be magically faster is a setup for disappointment.
No, not at all! Why have a language feature when you’re discouraged from using it? For some things generator syntax is indeed more useful e.g. single inline for or when the runtime is dominated by something else or when you can reuse a generator for multiple things. The example here just doesn’t benefit from one of those cases, but rather the double for generator causes an allocation which dominates the runtime. Additionally, no one in this thread has tested whether that generator function has constant overhead or whether it is in O(n). Just 10 elements in p are surely not representative for the general case.
a) It is context dependent. In case running performance is not important it could be written with less keystrokes (ie could have better “finger” performance).
b) it is implementation dependent. Clever compiler could translate this count(p[j-1]>p[j] for j in 2:length(p)) without allocation, couldn’t it? Till than we probably have to warn people not to use it if performance is important.
Would you mind to add MWE? (I don’t want to disturb you and will really understand you don’t! )
a) My post was very much in response to you saying “Wouldn’t be usefult [sic] to propose then in “benchmark tips” to not use generators?”. Of course running time is important. That’s why I mentioned the case where the runtime of the generator is dominated by some other thing. Please don’t twist what I said into something I did not say.
b) I don’t know, I’m not a compiler writer. I can imagine though that not unrolling and leaving that generator there could possibly have benefits e.g. lazy evaluation - the compiler doesn’t really know if evaluating that generator is a complex operation or not.
@code_lowered is just the lowered code. It is purely a syntactic transformation of the original code. Of course there will be new there since that is what is in the syntax of the code. It is the optimizer that has the job of eliminating those, so look at @code_typed instead.
Many benchmarks tips reflect ‘bugs’ in the current version of Julia: like the disaster with closures,etc…
Since Julia promises the possibility to use functional style without cost, these must absolutely be fixed.
The functional code is superior, promising easy parallelization for instance (and less bugs).