A generator with two "for" keywords is slow

Jan_Dolinsky · November 30, 2017, 9:29am

Hello,

I create a vector of vectors as follows.

v = map(rand, fill(10^4, 5));

I want to compute a mean of all the elements in all 5 vectors, i.e. flattening the “v”. I approached this in the following three different ways.

flatMean_generator(v) = mean( x for u ∈ v for x ∈ u )
flatMean_flatten(v) = mean(Iterators.flatten(v))

function flatMean_explicitLoop(v)
    a = zero(v[1][1])
    
    for u ∈ v, x ∈ u
        a += x
    end
    
    a / sum(length, v)
end

The timings indicate that flatMean_generator() and flatMean_flatten() are a lot slower with considerable number of allocations if compared to flatMean_explicitLoop().

@time getFlatMean_generator(v) # executed several times
0.001598 seconds (50.01 k allocations: 1.526 MiB)
@time getFlatMean_flatten(v)
0.001641 seconds (50.01 k allocations: 1.526 MiB)
@time getFlatMean_explicitLoop(v)
0.000216 seconds (5 allocations: 176 bytes)

Is this a known issue ? Generators with a single “for” keyword are in general as fast as an explicit loop.

Thanks,
Jan

dpsanders · November 30, 2017, 4:09pm

You are timing in global scope. Wrap everything in a function or use BenchmarkTools.jl instead.

EDIT: v is in global scope.

EDIT: Nonetheless, your observation seems to be correct.

dpsanders · November 30, 2017, 4:11pm

Jan_Dolinsky:

flatMean_generator(v) = mean( x for u ∈ v for x ∈ u )
flatMean_flatten(v) = mean(Iterators.flatten(v))

function flatMean_explicitLoop(v)
a = zero(v[1][1])
for u ∈ v, x ∈ u
    a += x
end

a / sum(length, v)
end

Also please make sure that code is runnable. You seem to have changed the name of your functions half way through.

Jan_Dolinsky · December 1, 2017, 3:45pm

Sorry for unintentional changing of the function names. The following should work out of the box.

# Three different approaches to flattening of a vector of vectors.
flatMean_Generator(v) = mean( x for u ∈ v for x ∈ u )
flatMean_Flatten(v) = mean(Iterators.flatten(v))
            
function flatMean_ExplicitLoop(v)
    a = zero(v[1][1])
    
    for u ∈ v, x ∈ u
        a += x
    end
    
    a / sum(length, v)
end

# The data, a vector of 5 vectors of length 10^4.
v = map(rand, fill(10^4, 5))

Using @btime gives the following.

@btime flatMean_Generator(v)
723.562 μs (62314 allocations: 1.90 MiB)

0.4983977752093445

@btime flatMean_Flatten(v)
639.019 μs (62308 allocations: 1.90 MiB)

0.4983977752093445

@btime flatMean_ExplicitLoop(v)
64.727 μs (1 allocation: 16 bytes)

0.4983977752093445

The difference between the first two functions and explicit loop is about 10x.

Thanks.
Jan

p.s. The way the timing is done should be OK as @time or @btime is benchmarking what is happening inside, e.g., flatMean_Generator(v). The fact that “v” is from a global scope should be OK too. Correct me if I am wrong.

kristoffer.carlsson · December 1, 2017, 4:02pm

How is v in global scope? It gets directly passed to the functions?

Jan_Dolinsky · December 1, 2017, 4:14pm

Yes, “v” gets directly passed to the function. To prevent de-focusing. The important thing here is that a generator with double “for” loop in it is about 10x slower than his explicit friend :).

I would like to find out whether this is a known issue.

Thanks,
Jan

Seif_Shebl · December 1, 2017, 4:57pm

A simpler mean(mean(v)) will take up similar time as explicit loops, though (version 0.6.0).

flatMean_Generator(v) = mean( x for u ∈ v for x ∈ u )
flatMean_Flatten(v) = mean(Iterators.flatten(v))
            
function flatMean_ExplicitLoop(v)
    a = zero(v[1][1])
    
    for u ∈ v, x ∈ u
        a += x
    end
        
    a / sum(length, v)
end

# The data, a vector of 5 vectors of length 10^4.
const v = map(rand, fill(10^4, 5))

using  BenchmarkTools

@btime flatMean_Generator(v)
@btime flatMean_Flatten(v)
@btime flatMean_ExplicitLoop(v)
@btime @fastmath mean(mean(v))

  362.564 μs (50008 allocations: 1.53 MiB)
  326.641 μs (50002 allocations: 1.53 MiB)
  37.462  μs (0 allocations: 0 bytes)
  41.054  μs (10 allocations: 391.02 KiB)

Jan_Dolinsky · December 1, 2017, 5:15pm

I believe the mean(mean(v)) should not work. “v” contains 5 elements with each being a vector. The inner mean(v) should not work, and so is the case for mean(mean(v)).

@btime @fastmath mean(mean(v))
DimensionMismatch("dimensions must match")

Seif_Shebl · December 1, 2017, 5:34pm

@Jan_Dolinsky Yes, it works. The inner mean(v) yields a vector of 10^4 elements, and the outer mean yields 0.5010905750825436 for one run as other functions, of course.

Jan_Dolinsky · December 1, 2017, 5:38pm

“v” has 5 elements. mean(v) can not therefore yield 10^4 elements.

Seif_Shebl · December 1, 2017, 6:42pm

@Jan_Dolinsky
mean of five scalars → one scalar,
mean of five vectors → one vector,
mean of five matrices → one matrix, and so on.

So, here v has five elements each of which is a vector, then mean should return one vector as I said, this vector has 10^4 elements, does that make sense?

dpsanders · December 1, 2017, 8:51pm

Yes, sorry, I was wrong, thanks.

nalimilan · December 2, 2017, 11:24am

Here is the allocation profile for mean after calling flatMean_flatten(v) with Julia master:

        - function mean(f::Callable, iterable)
 11248661     state = start(iterable)
        0     if done(iterable, state)
        0         throw(ArgumentError("mean of empty collection undefined: $(repr(iterable))"))
        -     end
        0     count = 1
       32     value, state = next(iterable, state)
        0     f_value = f(value)
        0     total = f_value + zero(f_value)
        0     while !done(iterable, state)
  1599968         value, state = next(iterable, state)
        0         total += f(value)
        0         count += 1
        -     end
        0     return total/count
        - end

So iterating over the Flatten iterator triggers allocations. I’m not sure why this happens, since the code appears to be type stable. Probably worth filing an issue if nobody gives an explanation here.

The fact that the generator is slow reflects the same problem, since it uses Flatten under the hood.

kristoffer.carlsson · December 2, 2017, 11:27am

While typestable, a tuple wrapping mutable state still needs to be allocated on the heap, unless an optimization removes that allocation. Perhaps this is fixed using https://github.com/JuliaLang/julia/pull/24113 which includes the more aggressive allocation elminiation.

Edit: Tried the branch, seems the problem remains.

Jan_Dolinsky · December 4, 2017, 3:53pm

@Seif_Shebl

Thanks for elaborating further. You’re observations are correct. When I re-checked the code today I could do mean(v) and get a vector of 10^4 elements. Sorry for the confusion.

In my original problem each vector is of random length and this was where my confusion was coming from.
Imagine

const v = map(rand, rand(10^3:10^4, 5));

then

mean(v)

will not work. You could do

flatMean(v) = sum(sum, v) / sum(length, v)

I however prefer a generator or a flattening operator because you could define a generator which flattens “v” and then use it in all the functions which accepts an iterator, i.e. you could do

function calculateVariousStats(v)
g = ( x for u ∈ v for x ∈ u )

mean(g), std(g) # var(g), higher order stats, etc.
end

Sorry again for the confusion.

Topic		Replies	Views
For loop Performance Performance question	8	1621	January 11, 2020
Nested for loops vs Iterators.product performance Performance question	16	231	September 16, 2024
Generator slow when passed as argument Performance question , performance , generator , map	6	474	December 20, 2022
Iterating over two vectors with vcat is faster than Iterators.flatten and CatViews? Performance memory-allocation	10	1826	October 1, 2018
For-loop over two arrays, sequentially General Usage	3	1279	May 26, 2022

A generator with two "for" keywords is slow

Related topics