Concatenating iterables without allocating memory

Jakub_Wronowski · January 12, 2020, 11:50pm

Hi, for a library I am working now I need to find minimum and maximum values. I am looking elegant way to express that.

signal1 = rand(10000000)
signal2 = rand(10000000)
@time lo, hi = extrema(vcat(signal1, signal2))
  0.159702 seconds (7 allocations: 152.588 MiB)

Code above looks good but allocates lots of memory just to iterate values, so it’s not making sense for me.

In C# I can write

(lo, hi) = signal1.Concat(signal2).Extrema();

and there is no allocation at all. It there any predefined method in Base that works in a similar way?

ChrisRackauckas · January 12, 2020, 11:54pm

Iterators.flatten

StevenSiew · January 12, 2020, 11:59pm

julia> vec1 = rand(100);

julia> vec2 = rand(100);

julia> extrema1 = extrema(vec1)
(0.019555577401858537, 0.9995712478383711)

julia> extrema2 = extrema(vec2)
(0.016606741252352952, 0.9849117868980357)

julia> extrema_combined = extrema(vcat(collect(extrema1),collect(extrema2)))
(0.016606741252352952, 0.9995712478383711)

ChrisRackauckas · January 13, 2020, 12:01am

collect will allocate, and so will vcat.

Jakub_Wronowski · January 13, 2020, 12:01am

@ChrisRackauckas it still allocates memory

@time Iterators.flatten((signal1, signal2)) |> extrema
  0.285192 seconds (20.00 M allocations: 610.352 MiB, 24.17% gc time)

StevenSiew · January 13, 2020, 12:02am

julia> length(extrema1)
2

Yes it will allocate! I am losing sleep over the remaining of my 8 gigabytes of computer memory.

Jakub_Wronowski · January 13, 2020, 12:04am

@ChrisRackauckas, ok this works for me, I need to read implementations to understand what is going on

@time Iterators.flatten(zip(signal1, signal2)) |> extrema
  0.080821 seconds (8 allocations: 256 bytes)

ChrisRackauckas · January 13, 2020, 12:10am

Just because an example is length 2 doesn’t mean the real code is… it’s still a valid question to compose iterators without allocating intermediates…

Jakub_Wronowski · January 13, 2020, 12:12am

I think @StevenSiew is right here, extrema always returns 2 elements no matter how long input is, and it does not allocate.

Jakub_Wronowski · January 13, 2020, 12:37am

I will stick to solution with Iterators.flatten, I wrote a helper method

function concatenate(iterables...)
    return zip(iterables...) |> Iterators.flatten
end

@time lo, hi = extrema(concatenate(signal1, signal2))
  0.072824 seconds (12 allocations: 352 bytes)
(4.430056321780285e-9, 0.9999998165880861)

I cannot reason why sometimes flatten allocates even more than the size of arrays is

@time Iterators.flatten((signal1, signal2)) |> extrema
  0.281161 seconds (20.00 M allocations: 610.352 MiB, 23.62% gc time)
(4.430056321780285e-9, 0.9999998165880861)

@time Iterators.flatten([signal1; signal2;]) |> extrema
  0.174777 seconds (8 allocations: 152.588 MiB, 11.98% gc time)
(4.430056321780285e-9, 0.9999998165880861)

@time Iterators.flatten(vcat(signal1, signal2)) |> extrema
  0.159507 seconds (8 allocations: 152.588 MiB)

@time Iterators.flatten(zip(signal1, signal2)) |> extrema
  0.073975 seconds (8 allocations: 256 bytes)
(4.430056321780285e-9, 0.9999998165880861)

It seems implementation is here, correct me if I am wrong:
https://github.com/JuliaLang/julia/blob/2d5741174ce3e6a394010d2e470e4269ca54607f/base/iterators.jl#L898
I will try to figure it out tomorrow, it’s very late in my time zone.

Jakub_Wronowski · January 14, 2020, 1:01am

It looks like the problem is with extrema function. I put details in existing github issue. It describes different problem but they can be fixed together. I think I have draft of the solution, if no one will reply I will open pull request.

https://github.com/JuliaLang/julia/issues/31442#issuecomment-573939429

aramirezreyes · January 14, 2020, 4:36am

Is CatViews.jl useful for your purposes?

Jakub_Wronowski · January 14, 2020, 6:00am

Nope, it is slow and allocates memory like a crazy, but thanks, the API is nice.

ulia> @time CatView(signal1, signal2) |> minimum
 24.332661 seconds (320.16 M allocations: 14.611 GiB, 6.30% gc time)
2.4065947235030194e-8

julia> @time CatView(signal1, signal2) |> extrema
 25.199472 seconds (340.00 M allocations: 14.901 GiB, 6.13% gc time)
(2.4065947235030194e-8, 0.9999999501303669)

julia> @time Iterators.flatten((signal1, signal2)) |> extrema
  0.311673 seconds (20.00 M allocations: 610.352 MiB, 25.23% gc time)
(2.4065947235030194e-8, 0.9999999501303669)

aramirezreyes · January 14, 2020, 6:17am

Wow, interesing! I thought that was what CatViews was for!

Argel Ramírez Reyes

DNF · January 14, 2020, 6:51am

In addition to the problems with extrema, you shouldn’t use the @time macro for benchmarking, and especially avoid benchmarking with non-constant globals.

Use the BenchmarkTools package, and remember to interpolate the variables.

tkf · January 14, 2020, 7:21am

Flatten is a typical example where performance is not great with the lazy iterator approach. Fortunately, we’ll have transducers in Julia 1.4. It means that we can write efficient “fused” extrema quite easily:

julia> using BenchmarkTools

julia> @btime extrema(Iterators.flatten(($signal1, $signal2)));
  258.958 ms (20000002 allocations: 610.35 MiB)

julia> @btime mapfoldl(
           x -> (x, x),
           ((a, b), (c, d)) -> (min(a, c), max(b, d)),
           Iterators.flatten(($signal1, $signal2)),
       );
  42.014 ms (0 allocations: 0 bytes)

OK, I lied. Above example actually does not work yet, until this bug fix is merged

https://github.com/JuliaLang/julia/pull/34369

Trying to comment here actually helped me find out this bug

GunnarFarneback · January 14, 2020, 7:48am

This solution is not particularly nice but it is allocation free in released Julia 1.x versions:

julia> @btime extrema(Iterators.flatten(extrema.(($signal1, $signal2))));
  47.482 ms (0 allocations: 0 bytes)

DNF · January 14, 2020, 7:59am

Is this the syntax with transducers? I must admit I find it a bit hard to read.

tkf · January 14, 2020, 8:12am

The ugly mapfoldl I wrote above is just a plain old mapfoldl. It should be executable in all Julia 1.x.

I should’ve clarified that transducers built in Base is just an implementation detail for executing foldl (see Transducer as an optimization: map, filter and flatten by tkf · Pull Request #33526 · JuliaLang/julia · GitHub). Things like sum(x for x in xs if x !== missing) are processed through transducers “behind your back” so that it can be compiled to a better machine code. But there is no stable API for touching transducers in Base.

I think it’s better to use transducers in reduce as well so that you can just write extrema(Iterators.flatten(...)) and it’s fast.

Jakub_Wronowski · January 14, 2020, 9:33am

@tkf, thank you so much, I am happy my example helped you to fix that
I am coming from C# and F# world, and I am really missing those pipelining features. When I first realised that in Julia map(x->2x, someArray) allocates and array for its result I was more than disappointed

Topic		Replies	Views
ANN: Transducers.jl, efficient and composable algorithms for map- and reduce-like operations Package Announcements	24	5401	March 25, 2019
Slow `reduce(vcat, itr)` Performance cat	26	1394	September 13, 2022
Iterating over two vectors with vcat is faster than Iterators.flatten and CatViews? Performance memory-allocation	10	1867	October 1, 2018
ANN: Transducers.jl 0.3. taking "zeros" seriously, type stability improvements, fusible groupby, OnlineStats, "GPU support", and more Package Announcements	29	3137	December 2, 2020
Could Transducers help here? Performance performance , transducers	4	983	January 23, 2019

Concatenating iterables without allocating memory

Related topics