It’s a small question, but I feel the urge to ask:
I find that generators are lazy:
julia> const a = fill(0, 4);
julia> function f!(i)
           i == 3 && error()
           a[i] = i
           -i
       end;
julia> foreach(println, f!(i) for i = 1:4)
-1
-2
ERROR:
Stacktrace:
[1] error()
@ Base ./error.jl:45
[2] f!
@ ./REPL[3]:2 [inlined]
[3] #2
@ ./none:-1 [inlined]
[4] iterate
@ ./generator.jl:48 [inlined]
[5] foreach(f::typeof(println), itr::Base.Generator{UnitRange{Int64}, var"#2#3"})
@ Base ./abstractarray.jl:3188
[6] top-level scope
@ REPL[4]:1
julia> a
4-element Vector{Int64}:
1
2
0
0
So here you see that -1 and -2 get printed before the second argument of foreach has been fully evaluated: the generator is consumed lazily, one element at a time (as opposed to foreach(println, [f!(i) for i = 1:4]), where the whole array is built first).
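To make the laziness concrete, here is a minimal sketch (the helper g is made up):

```julia
# Generators defer evaluation; comprehensions evaluate eagerly.
g(i) = (println("evaluating ", i); i^2)

gen = (g(i) for i in 1:3)   # nothing printed: no element evaluated yet
vec = [g(i) for i in 1:3]   # prints "evaluating 1/2/3" right now

first(gen)                  # only here does g(1) actually run
```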
Now I want the behavior of the non-lazy [] case. But I worry that it allocates a Vector, which is not a lightweight object in Julia. My intuition says there should be a more efficient container than a disposable [] here, or else a fused foreach method.
So,
is there a solution better than []? (I guess NTuple is not because the length in practice might be large)
PS the background is that I’m doing foreach(wait, [Threads.@spawn(f(j)) for j = 1:1000000])
But according to my above findings, I suspect that it’s incorrect to write waitall(Threads.@spawn(f(j)) for j = 1:1000000).
Am I correct?
I don’t understand what you’re after. If you don’t want a generator, i.e. the “lazy” evaluation, you need to store the output of f! (or @spawn) somewhere. If it doesn’t fit in a tuple/on the stack, it must be heap-allocated. A vector is fine for that.
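For the @spawn case specifically, the eager version matters for a second reason. A sketch, with a made-up workload f:

```julia
f(j) = j^2  # hypothetical workload

# Eager: the comprehension spawns all tasks up front, so they can run
# concurrently; foreach then just waits for each one in turn.
tasks = [Threads.@spawn f(j) for j in 1:8]
foreach(wait, tasks)

# Lazy version for contrast: each task is only spawned when foreach
# reaches it and is waited on immediately -- effectively sequential:
# foreach(wait, (Threads.@spawn f(j) for j in 1:8))
```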
But I think there should be a leaner data structure (something less general) than the standard Vector,
because a Vector has other features, e.g. indexing and variable length, which are not needed for my particular usage.
This sounds a bit like an XY problem, and I don’t really agree with the assessment:
Of course, there are a few bytes required on top of the bare elements of the vector, but it doesn’t quite get more lightweight than this (except maybe Memory which was already mentioned). If your collection has more than a couple elements, the size of the elements will dominate the small overhead of the rest.
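A sketch of that Memory alternative, assuming Julia 1.11+ where Memory{T} is exported:

```julia
# Memory{T} is a flat, fixed-length buffer: like Vector, but without
# the resizability machinery, so with marginally less overhead.
m = Memory{Int}(undef, 4)
for i in eachindex(m)
    m[i] = -i
end
foreach(println, m)   # eager consumption, same as with a Vector
```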
Anyhow, before “optimizing away” something like the allocation of the vector in your example, make sure you have identified the performance bottlenecks of the code as a whole e.g. by profiling it.
It’s hard to say for sure from your isolated example, but if the allocation itself is really taking up a significant portion of time/memory, then you could think about changing the overall algorithm (you can loop over the array twice, once just to detect errors; preallocate a big vector and re-use it all the time; use an IOBuffer to store whatever you want to print until you decide if you actually want to print it; etc.). Otherwise, I wouldn’t bother with this particular allocation (as @sgaure said, if you want to store the results somewhere then you have to also put them somewhere in memory…).
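One of those strategies sketched: preallocate a buffer once and reuse it across calls (the names are made up):

```julia
const buf = Vector{Int}(undef, 4)

function run!(buf, f)
    # First pass: evaluate everything, so errors surface before any output.
    for i in eachindex(buf)
        buf[i] = f(i)
    end
    # Second pass: consume the already-computed values.
    foreach(println, buf)
end

run!(buf, i -> -i)   # repeated calls reuse buf; no per-call allocation
```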
If you want the tasks to fail early, rather than finish all and then realize that one of them had an error, have a look at the failfast keyword of waitall:
help?> waitall
search: waitall waitany wait iswritable

  waitall(tasks; failfast=true, throw=true) -> (done_tasks, remaining_tasks)

  Wait until all the given tasks have been completed.

  If failfast is true, the function will return when at least one of the given tasks is finished by exception. If throw is true, throw CompositeException when one of the completed tasks has failed.

  failfast and throw keyword arguments work independently; when only throw=true is specified, this function waits for all the tasks to complete.

  The return value consists of two task vectors. The first one consists of completed tasks, and the other consists of uncompleted tasks.
foreach(wait, ...) would wait until all tasks are finished and then return, whereas waitall by default errors as soon as one of the tasks errors (or as soon as it notices).
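A sketch of that difference, assuming a Julia version recent enough to have waitall (the workload is made up); passing throw=false lets you inspect the outcome instead of getting a CompositeException:

```julia
tasks = map(1:4) do j
    Threads.@spawn begin
        sleep(0.1j)                     # task j finishes after ~0.1j seconds
        j == 2 && error("task 2 failed")
        j
    end
end

# failfast=true (the default) returns as soon as task 2 errors,
# without waiting for tasks 3 and 4 to finish.
done, remaining = waitall(tasks; throw=false)
```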
You are falling into the premature-optimization trap here, which is causing you to overthink something that has no real impact. Fortunately, for your program to be fast it is not important for everything it does to be as fast as possible; it’s sufficient for the parts that take the most time to be as efficient as possible.
The work you do in the tasks should outweigh the cost of allocating a vector to store their references by many orders of magnitude, so you do not need to waste time/brain cycles thinking about it. Should that not be the case, you should rather change your setup and probably not use Tasks at all.
These characteristics do not get in the way in any meaningful sense. They don’t require any allocations beyond 40 bytes for the Vector wrapper (24 bytes) around a Memory (16 bytes). Indexing calls getindex, which is inlined and eliminated by the compiler, just like getproperty is, and most often the indirection due to the resizability of the Vector is also eliminated. Compared to, say, a bare pointer, these extra characteristics of a Vector can for most purposes be thought of as a minor compile-time cost.