Perhaps it is easiest to not try to be too smart and just do
function combined_extrema(signal1, signal2)  # wrapped in a function so the return statement is valid
    (min1, max1), (min2, max2) = extrema(signal1), extrema(signal2)
    return min(min1, min2), max(max1, max2)
end
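For more than two signals the same idea extends naturally; a minimal sketch (the varargs method is our addition, not from the post):

# per-signal extrema first, then combine the (min, max) pairs
function combined_extrema(signals...)
    pairs = extrema.(signals)
    return minimum(first, pairs), maximum(last, pairs)
end

combined_extrema(rand(10), rand(10), rand(10))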
@mschauer, this approach was already suggested twice in previous posts.
I don't think it is being “too smart”. In every modern language (C# with LINQ, Java with Streams) I am allowed to do such lazy concatenation of arrays without a performance penalty. I think there is a bug or a design flaw that needs to be addressed.
I think this is because you are going “against the grain” of the memory layout here. Compare what you are trying with the “generic” but very natural version using an unzip*):
mins, maxs = unzip(extrema.(signals))
minimum(mins), maximum(maxs)
To me it is more natural to compute the extrema of each series first and then combine the results, rather than to combine the series and compute the extrema. That also respects memory locality better.
*) something is indeed missing: there is no unzip in Base.
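A minimal stand-in for the missing unzip could look like this (our own sketch, assuming an iterable of 2-tuples such as the (min, max) pairs above):

# unzip an iterable of 2-tuples into a tuple of two collections
unzip(itr) = (map(first, itr), map(last, itr))

signals = (rand(1000), rand(1000), rand(1000))
mins, maxs = unzip(extrema.(signals))
minimum(mins), maximum(maxs)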
It doesn't seem so unnatural to me to want to treat two (or more!) series as one and do some operation on it. For example, it could be that they are semantically one series that just happened to be split up for size reasons.
Perhaps the operation even depends on boundary effects, which makes it hard to apply it to the parts separately.
@mschauer I really like your approach.
But please consider these two examples:
const signal1 = rand(10000000)
const signal2 = rand(10000000)
julia> @btime Iterators.flatten(($signal1, $signal2)) |> extrema
283.061 ms (20000002 allocations: 610.35 MiB)
(6.728651613663317e-8, 0.9999999625593397)
julia> @btime Iterators.flatten(zip($signal1, $signal2)) |> extrema
68.612 ms (3 allocations: 64 bytes)
(6.728651613663317e-8, 0.9999999625593397)
In the first one I am going exactly with the memory layout, yet allocation is high and performance is poor. In the second one I am going “against the grain”, interleaving data from both arrays. For some reason that I don't understand, this “against the grain” version is faster and optimal.
Also, when I define my own function:
function my_extrema(itr)
    vmin = vmax = first(itr)  # note: assumes a non-empty iterator
    for item in itr
        vmax = max(item, vmax)
        vmin = min(item, vmin)
    end
    return (vmin, vmax)
end
performance is as good as I would expect from the original extrema in the Julia Base library:
julia> @btime Iterators.flatten(($signal1, $signal2)) |> my_extrema
68.700 ms (2 allocations: 48 bytes)
(6.728651613663317e-8, 0.9999999625593397)
If someone is able to explain it to me, I would be thankful. I am going to investigate it anyway, but I have no time for this right now.
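A plausible first step for such an investigation (our suggestion, not from the thread) is to check what the compiler infers when iterating the flattened iterator:

using InteractiveUtils  # for @code_warntype

it = Iterators.flatten((rand(10), rand(10)))
# red Union/Any entries in the iterator state would indicate type
# instability, a common source of per-element allocations
@code_warntype iterate(it)
Base.return_types(iterate, (typeof(it),))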
Good that you insist. I also agree with you that something is not right when computing extrema of a flattened iterator. It just has nothing to do with flatten (as you also said); something is not right about how extrema works…
Just because I'm currently working with it, here's OnlineStats' performance:
julia> o = Extrema()
Extrema: n=0 | value=(Inf, -Inf)
julia> @btime foreach(x -> fit!(o, x), ($signal1, $signal2))
51.465 ms (0 allocations: 0 bytes)
julia> value(o)
(4.8156351573069855e-8, 0.9999999801147899)
For comparison, here's what flatten results in on my machine:
julia> @btime Iterators.flatten(zip($signal1, $signal2)) |> extrema
43.234 ms (3 allocations: 64 bytes)
(4.8156351573069855e-8, 0.9999999801147899)
So OnlineStats needs zero allocations but takes 20% longer…
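Since online stats are designed to be mergeable, one could also fit each signal into its own Extrema and combine the summaries afterwards; a sketch assuming OnlineStats' merge! works for Extrema as documented:

using OnlineStats

o1, o2 = Extrema(), Extrema()
fit!(o1, signal1)
fit!(o2, signal2)
merge!(o1, o2)  # fold o2's summary into o1
value(o1)       # overall (min, max) of both signals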
@yakir12, thanks for another solution. I am not concerned about those tens or hundreds of bytes; the hundreds of megabytes and gigabytes from the examples above are what is painful for me.
Your example is probably slightly slower because access to local variables can be optimised better than access to fields of a mutable struct. 20% slower is fine for me; 400% slower is not.
Out of curiosity, are you running on a system where you are not allowed to allocate (e.g. a GPU)? If not, why do you care about the total number of allocations and not just the time it takes to run the operation?
@kristoffer.carlsson I am starting a quantitative finance project and I am exploring the possibility of using Julia as a backend for numeric calculations. I am not planning to use a GPU, just the CPU. You're right, I only care about the total time of the calculation, but I can see that there is a correlation between execution time and the amount of allocations. Also, an operation that clearly should not allocate at all allocates a lot (610 MiB for 150 MiB of data processed) in some rare circumstances (Base.extrema) and works as expected in other cases (Base.sum, Base.minimum, my_extrema).
I am willing to investigate it when I find some time for debugging and for reading the generated lowered and native code, but I am also hoping that this is a known issue that someone is already working on.
I agree that it would be great if map, filter, etc. were lazy. But I think the core devs had to balance compilation time, execution time, and usability. I imagine a non-lazy map was an OK solution for this, especially in the early days of the Julia compiler.
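To illustrate the trade-off (our own example, not from the thread): an eager map materialises an intermediate array, while a generator expression stays lazy:

using BenchmarkTools

xs = rand(10^6)
@btime sum(map(x -> x^2, $xs))  # eager: allocates a temporary Vector
@btime sum(x^2 for x in $xs)    # lazy: no intermediate array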
Unlike many languages (except Clojure and what else?), Julia has built-in transducers. I think it's cool that you can do:
julia> using StaticArrays  # for SVector

julia> function demo()
           a = (1, 2, 3)
           b = SVector(4, 5, 6)
           return sum(x for xs in (a, b) for x in xs)
           # another way to write Iterators.flatten
       end
demo (generic function with 1 method)

julia> @code_llvm demo()
;  @ REPL[80]:2 within `demo'
define i64 @julia_demo_19412() {
L29.1:
;  @ REPL[80]:4 within `demo'
  ret i64 21
}
I tried to explain why doing flatten with transducers works so well in my talk at the last JuliaCon (YouTube). See, e.g., this slide.
If you care about the performance of lazy iterators, I think Julia is one of the best languages. Well, at least as long as you use foldl and reduce everywhere.
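For instance (a sketch of ours), the running extrema example can be phrased as a foldl over the lazy flatten, which on Julia ≥ 1.4 goes through the efficient reduce path:

signal1, signal2 = rand(1000), rand(1000)

lo, hi = foldl(Iterators.flatten((signal1, signal2));
               init = (Inf, -Inf)) do (lo, hi), x
    (min(lo, x), max(hi, x))
end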
@tkf, thanks for the links. I saw your recent PRs and I think the work you did is outstanding.
I think I did such things 10 years ago in F#, and I can easily do stuff like this in C# as well. I even prepared a performance comparison with C#.
csharp> var a = new [] {1, 2, 3};
csharp> var b = new [] {4, 5, 6};
csharp> Time(() => new [] {a,b}.SelectMany(x => x).Sum());
00:00:00.0002290
julia> const a = [1,2,3];
julia> const b = [4,5,6];
julia> @btime sum(x for xs in ($a, $b) for x in xs)
54.606 ns (8 allocations: 192 bytes)
21
229 µs vs 54 ns
csharp> var a = new [] {1, 2, 3};
csharp> var b = Enumerable.Repeat(1, 10000000);
csharp> Time(() => new [] {a,b}.SelectMany(x => x).Sum());
00:00:00.1963760
julia> const a = [1,2,3];
julia> const b = [1 for i in 1:10000000];
julia> @btime sum(x for xs in ($a, $b) for x in xs)
8.107 ms (8 allocations: 192 bytes)
10000006
196 ms vs 8 ms
csharp> Time(() => new [] {a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b}.SelectMany(x => x).Sum());
00:00:01.8202740
julia> @btime sum(x for xs in (a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b,a,b) for x in xs)
86.810 ms (44 allocations: 1.17 KiB)
100000060
1820 ms vs 86 ms
I used an interactive C# session; one could argue that compiled code would be faster. I checked: with all optimisations enabled it's 1200 ms (on a Mac; Windows could do better, but not more than twice as well). What I really wanted was to prove that you are wrong, but I failed. Julia seems to be superior in this field, no doubt.
Anyway, I still think this behaviour of the flatten iterator with extrema seems to be a bug, and I opened an issue with more details: Extrema, flattened iterator and too much memory alloation · Issue #34385 · JuliaLang/julia · GitHub
As I said in my first comment, reduce with Iterators.flatten was slow in Julia < 1.4, which did not have transducers. Maybe you are checking this only in a released version of Julia. See:
In 1.5.0-DEV.67:
julia> @btime sum(x for xs in ($a, $b) for x in xs)
5.718 ns (0 allocations: 0 bytes)
In 1.3.1:
julia> @btime sum(x for xs in ($a, $b) for x in xs)
40.254 ns (8 allocations: 192 bytes)
Right, I used 1.3.1. On 1.5, for the last one I get:
43.255 ms (1 allocation: 16 bytes)
100000060
It's more than 25× faster than C#. I wouldn't have believed it if someone had told me. I wonder if more complicated iterators, like linked lists or trees, will be that much faster too.
My guess is that there won't be much improvement for the usual linked lists or trees, as iteration over such structures is probably dominated by pointer chasing. But I think transducers would be a great framework for iterating over “hybrid” data structures like B-trees.