Difference between `for ... end` and `[... for ...]`?

I'm encountering a very strange bug when calling an external library (say, a function `foo`) with the two syntaxes in the title:

julia> data = Any[]; for _=1:10000 push!(data,foo()) end
# OK
julia> data = [foo() for _=1:10000]
# crashes the external library, and brings julia down with it

I guess that I'm not properly safeguarding some memory somewhere, but I thought these two pieces of code were entirely equivalent. Does anyone have an idea where the crash could come from?

It's hard for me to post an MWE because of its reliance on the external library. If someone wants, I can try to set up a non-minimal example to reproduce it.

Many thanks in advance!

I also think that, from the point of view of the external library, these calls should be identical. (From Julia's perspective the difference is mostly in types and allocation patterns.)

Stupid question: are you sure the crash isn't random? Or that it doesn't depend on the total number of calls, or something like that? I.e., does it also crash if you reverse the two lines, so you run the array comprehension first and then the loop? Does the size of the array comprehension matter?


In the first case `data` will be an `Any` array, whereas in the second case the array's type is determined by the return type of `foo()`. If the external library is non-Julia (e.g. a C library), it probably only works with one particular array element type.
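The type difference is easy to check directly. Here `foo` is a hypothetical stand-in for the library function, assumed to return a `Float64`:

```julia
# Hypothetical stand-in for the external library's function,
# assumed here to return a Float64:
foo() = 1.0

data1 = Any[]; for _ = 1:10 push!(data1, foo()) end
data2 = [foo() for _ = 1:10]

eltype(data1)  # Any: each element is boxed behind a pointer
eltype(data2)  # Float64: elements are stored inline and contiguously
```

A C library expecting a contiguous `double*` buffer would only be happy with the second layout.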


Are you able to post the stack trace? Is the crash in the creation of data or in its subsequent usage?

One quick thought: you can bring the two lines of code one step closer by explicitly marking the array comprehension as an `Any` array:

julia> data = Any[foo() for _=1:10000]

Thanks a lot for all these suggestions! I'm trying to get something reproducible, but unfortunately compiler / source package changes make this hard.

One question/comment: are you using threads? Explicitly or implicitly? (I believe Pluto and VS Code may enable them for you.)

It might not matter now, but I'm wondering whether it could in theory, or in a future Julia version.

The former is sequential: it must call `push!` 10000 times. That does however NOT mean the array's storage is enlarged that often, since Julia is clever behind the scenes.

By sequential I mean it must add one element at a time to the end of the array (user-visible), even though in practice it only enlarges (and likely moves) the underlying storage 9 times (growing geometrically, avoiding O(n) allocations):

julia> @time (data = Int64[]; for _=1:10000 push!(data, 1) end)
  0.000566 seconds (9 allocations: 326.547 KiB)
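One way to see the geometric growth directly (a sketch of my own, not from the thread) is to count how often the backing storage actually moves while pushing 10000 elements one at a time:

```julia
# Count reallocations of the backing buffer during 10000 push! calls.
# The buffer only moves when its capacity is exhausted, and capacity
# grows geometrically, so moves stays small.
data = Int[1]
p = pointer(data)
moves = 0
for x = 2:10000
    push!(data, x)
    global moves, p          # needed when run in global (script) scope
    if pointer(data) != p    # buffer was reallocated and moved
        moves += 1
        p = pointer(data)
    end
end
length(data), moves  # 10000 elements, only a handful of moves
```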

The other way is better than that, and differs by more than just the allocation count/speed (though at first it didn't seem so, because of global scope):

julia> @time data = [1 for _=1:10000];
  0.038610 seconds (19.32 k allocations: 1.376 MiB, 99.53% compilation time)

julia> test() = data = [1 for _=1:10000];

julia> @time test();  # I'm not worrying for now why I do not get the expected 1 allocation
  0.000014 seconds (2 allocations: 78.172 KiB)

It's 40x faster, despite the other only having 4.5 times the number of allocations... Why? Probably because it uses SIMD/vectorized instructions (`vmovups`), while the former likely doesn't. In your case you might be enabling a SIMD optimization and hitting a bug (in LLVM). SIMD is a form of parallelism, though not threads. Julia is allowed to use it without the `@simd` macro if it can prove the end result is identical; the macro, if I recall correctly, allows SIMD even when the result would be slightly different.

Now, what I'm thinking: the former must run in order, because of the user-visible `push!`, barring a sufficiently advanced compiler (which doesn't, and likely never will, exist).

Despite the latter using `for` and seemingly being sequential, it wouldn't be if threads or other concurrency were used. You're only asking for the end result, so the compiler should even be allowed to allocate the full array and populate it in reverse order (given a pure `foo`), or from both ends, or split n ways, etc.
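To illustrate that last point: for a pure `foo`, preallocating and filling in reverse order gives the same end result as the comprehension (`foo` here is again a hypothetical pure stand-in):

```julia
# Sketch: populating a preallocated array back to front produces the
# same result as the forward comprehension, provided foo is pure.
foo() = 1.0
n = 10000
data = Vector{Float64}(undef, n)
for i = n:-1:1
    data[i] = foo()   # fill in reverse order
end
data == [foo() for _ = 1:n]  # true
```

Of course, if `foo` has side effects (as a call into an external library typically does), this reordering freedom disappears, which is exactly why the distinction matters here.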

You might think you'd need to annotate locally for threads to be used. I'm not sure; maybe that's required, but it shouldn't be. So, are threads in use? If they are, `foo` needs to be re-entrant. Is it, or does `foo` use `ccall`? Also, are you showing an MWE that actually fails, or a simplification? E.g. if 10000 is not a constant but a value returned by a function, that function might need to be called on each iteration.


@Palli Thanks for the insight! Indeed there's definitely an issue with threads (which the library spawns and perhaps leaves hanging). What I wrote is an oversimplification; I can't create a nice MWE for now.