Is there any significant performance difference between comprehensions and equivalent for loops for constructing arrays? Similarly, is there any reason to prefer one over the other in terms of performance? The documentation doesn’t seem to discuss this.
(I’m referring to speed of code execution rather than how visually appealing or easy to type one or the other is.)
Broadly speaking, it’s easier for a new user to get good performance for simple tasks with a comprehension than with a for loop because of gotchas like bounds checking and inefficient indexing strategies. However, for complicated (usually nested) problems, there are often things you can do with for loops that are ‘smarter’ than the code a comprehension would produce.
My major difficulty in writing generic code with for loops (or any preallocated container which I fill) is figuring out the result type. Comprehensions, map and broadcasting employ an elaborate machinery for this which is optimized for the compiler, but it is quite hard to replicate is user code for more involved types.
push!! from BangBang.jl will allow you to widen the eltype of a container as needed using a similar strategy to map, broadcast and comprehensions.
There was some discussion about this on Slack recently; it’d be nice if Base defined a generic, interface for widening an eltype on demand like this. This is a pretty complicated wheel that many people have had to independently reinvent out there in the package ecosystem.
Thanks all, this information is really helpful. @Tamas_Papp just to make sure I understand, an advantage of comprehensions then is that they automatically figure out what data type the resulting array will be composed of? So the user doesn’t need to figure out the right data type and preallocate manually?
As a follow-up question, in general will a program be faster if you do happen to know the resulting data type beforehand and do the array preallocation by hand? Just curious for future reference.
Yes, that’s what I consider the main advantage of comprehensions, map (which is currently implemented using comprehensions, eg collect), and broadcasting.
Calculating result types can be tedious and sometimes impossible (a lot of Julia code punts on this, even in Base or the standard libraries, leading to subtle bugs). IMO the best approach is a flexible heuristic, used to implement collect, which can be described as making an educated guess (eg using the first element) and then widening if necessary. For type stable code, the compiler is smart enough to figure out that widening is not needed and that branch is never called.
That said, if you can figure out the result type and reuse preallocated containers, you can avoid the cost of allocation. This is a viable strategy for speeding up some inner loops. OTOH, if you just preallocate every time, there is usually no significant advantage (provided that eg collect can figure out the correct size using the same mechanism).
These are all trade-offs you have to think about in the context of your problem. A lot of code, especially libraries for AD, works best with a non-modifying, “functional” approach. So I generally try to write code using this approach and then maybe preallocate in some tight loops that profiling shows to be significant. In my experience, the gains from using smarter algorithms always outweighs the cost of allocation most of the time, so I generally worry little about this unless there is a compelling reason.