Make EWMA as fast as pandas

nsajko · November 19, 2023, 2:17pm

That’s not a type conversion but a type assertion, so it’s just there for safety.
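The code being replied to isn't quoted in this excerpt, so here is only a generic sketch of the distinction. The function names `f_assert` and `f_convert` are hypothetical. A type assertion `x::T` checks that a value already has type `T` and throws otherwise. `convert(T, x)` produces a new value of type `T`.

```julia
# Type assertion: checks the type, never changes the value.
# Throws a TypeError if x is not already a Float64.
f_assert(x) = (x::Float64) * 2

# Type conversion: builds a Float64 from x (e.g. the Int 1 becomes 1.0).
f_convert(x) = convert(Float64, x) * 2

f_assert(1.5)    # 3.0
f_convert(1)     # 2.0
# f_assert(1)    # TypeError: expected Float64, got a value of type Int64
```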

nsajko · November 19, 2023, 2:28pm

What do you mean? Are you referring to fma? It doesn’t just make the implementation faster, it also makes it more accurate, so I don’t understand your viewpoint.
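Here is a small illustration of the accuracy point, using inputs picked for the demo. `fma(a, b, c)` rounds once, at the end. `a * b + c` rounds the product first and then the sum. The exact result here is `-2^-60`, and the unfused form loses it entirely.

```julia
# Inputs chosen so that the exact product needs more precision than Float64 has.
a = 1.0 + 2.0^-30     # exactly representable
b = 1.0 - 2.0^-30     # exactly representable
c = -1.0
# Exact value: a*b + c = (1 - 2^-60) - 1 = -2^-60

two_roundings = a * b + c      # a*b rounds to 1.0, so the sum is 0.0
one_rounding  = fma(a, b, c)   # single final rounding: exactly -2^-60

println(two_roundings)             # 0.0
println(one_rounding == -2.0^-60)  # true
```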

lmiq · November 19, 2023, 2:47pm

fma (muladd) and Base.OneTo, basically

Whether to use fma has been the subject of long discussions here, and there are sensible reasons it is not the default. I don’t want to revive that discussion here.

But my point of view is that these are strategies that fall outside the scope of essential Julia syntax, so I’m curious about how much they really matter for performance, because they may give the wrong impression that they are needed for writing performant code.

People here in the forum have expressed the view that writing performance-oriented Julia code is like learning a new language. From my experience, I disagree with that.

adienes · November 19, 2023, 2:49pm

In this case, when I tried it, it was a factor-of-2 speedup, so yes, it definitely matters.

nsajko · November 19, 2023, 2:49pm

You misunderstand the situation. The discussions are about whether the compiler should be free to compile expressions like a*b + c into fma(a, b, c). So you’re complaining about the wrong thing.
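Here is a sketch of the three choices Julia offers for this expression. The function names are made up for the example. Plain `a * b + c` is never fused by the compiler on its own. `fma` always fuses. `muladd` lets the compiler pick whichever form is faster on the target.

```julia
# Written out: a rounded multiply followed by a rounded add.
# The compiler is not allowed to contract this into an fma by itself.
mul_then_add(a, b, c) = a * b + c

# Explicit fused multiply-add: always a single rounding
# (falls back to a slow software routine on hardware without FMA).
fused_madd(a, b, c) = fma(a, b, c)

# Opt-in contraction: the compiler may emit either of the two forms above,
# typically an FMA instruction when the hardware has one.
relaxed_madd(a, b, c) = muladd(a, b, c)
```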

Yeah, Base.OneTo is probably not necessary. My stylistic choice is to use it, though.
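For readers unfamiliar with it, here is a quick look at `Base.OneTo`. It behaves like `1:n`, but its start is fixed at 1 by the type. It is also what `axes` returns for ordinary arrays. The helper `sum_first` is hypothetical and only illustrates the loop style.

```julia
# Base.OneTo(n) is the range 1:n; only the stop value n is stored,
# the start of 1 is implied by the type.
Base.OneTo(5) == 1:5        # true: same elements
axes(zeros(3))              # (Base.OneTo(3),): the axes of an ordinary Vector

# Hypothetical helper: sum the first n elements of a 1-based vector,
# looping over Base.OneTo(n) instead of 1:n.
function sum_first(x::AbstractVector, n::Integer)
    s = zero(eltype(x))
    for i in Base.OneTo(n)
        s += x[i]
    end
    return s
end

sum_first([1, 2, 3, 4], 3)  # 6
```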

lmiq · November 19, 2023, 2:51pm

I’m not complaining about anything.

What I think is important is to keep perspective: here, for example, there seems to be a 500x speedup from improving the algorithm vs. a 2x speedup from some lower-level tricks.
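Here is a sketch that makes the algorithmic point concrete, under one assumption. The slower versions from earlier in the thread aren't shown in this excerpt, so `ewma_naive` is a hypothetical stand-in. It recomputes the weighted sums for every output element, which costs O(n²) work. `ewma_recursive` uses the same running `num`/`den` update as `ewma_4_sym2` below, which costs O(n) work. Both compute res[i] = Σ c^(i-k)·x[k] / Σ c^(i-k) over k ≤ i.

```julia
# Hypothetical naive EWMA: O(n^2), recomputes both sums for every i.
# Assumes 1-based indexing (e.g. a plain Vector).
function ewma_naive(x::AbstractVector{T}, c::T) where {T}
    n = length(x)
    res = zeros(T, n)
    for i in 1:n
        num = zero(T)
        den = zero(T)
        for k in 1:i
            w = c^(i - k)      # weight decays with distance from i
            num += w * x[k]
            den += w
        end
        res[i] = num / den
    end
    return res
end

# Recursive EWMA: O(n), carries num and den forward from one step to the next.
function ewma_recursive(x::AbstractVector{T}, c::T) where {T}
    res = similar(x)
    num = zero(T)
    den = zero(T)
    for i in eachindex(x, res)
        num = num * c + x[i]   # sum of c^(i-k) * x[k] for k <= i
        den = den * c + 1      # sum of c^(i-k) for k <= i
        res[i] = num / den
    end
    return res
end
```

Timing both with `@btime` on the same input shows how the gap grows with `length(x)`. No specific speedup is claimed here.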

Just for the record, this version with @fastmath performs similarly to the fastest one for me:

julia> function ewma_4_sym2(x::AbstractArray{T}, c::T) where {T}
         res = zeros(T, size(x))
         num = zero(T)
         den = zero(T)
         @fastmath for i ∈ eachindex(x)
           j = i - 1
           num = num * c + x[begin+j]
           den = den * c + 1
           res[begin + j] = num/den
         end
         return res
       end
ewma_4_sym2 (generic function with 1 method)

julia> @btime ewma_4_sym2($x,$c);
  120.550 μs (2 allocations: 781.30 KiB)

julia> @btime ewma_4_sym($x,$c);
  119.972 μs (2 allocations: 781.30 KiB)

julia> ewma_4_sym2(x,c) ≈ ewma_4_sym(x,c)
true

@fastmath allows the compiler to use fma, AFAIK.

Palli · November 23, 2023, 5:07pm

FYI, Base.OneTo is part of Julia’s public API. There’s nothing wrong with using it, where appropriate.

Previously I would have been skeptical of anything you need to qualify with Base., but I checked, and at least by now OneTo is marked public with the new public keyword, so it’s not internal, non-API.

Yes, fma is not the default for a*b + c, as in recent C/C++ compilers (clang). That doesn’t mean you shouldn’t use it: it’s always (a bit) faster on hardware with FMA instructions, and once you’ve done the analysis showing it’s safe, you can (and should?!) use it.

I didn’t do the fma analysis for this code (note that fma is usually more accurate; what people worry about are the rare exceptions), nor did I really look into

help?> Base.OneTo

lmiq · November 23, 2023, 5:19pm

I was not saying anything there was wrong. It just makes the code less readable for the regular user. Also, the optimizations that can be done by hand with fma can be handled by the @fastmath macro here, which IMO also improves readability.

nsajko · November 23, 2023, 5:22pm

This is backwards, IMO, as @fastmath is unsafe. In this case it has no ill effect, but newbies shouldn’t be encouraged to use the unsafe features of the language.
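If the only goal is to let the compiler emit fma, `muladd` is a narrower opt-in than `@fastmath`. `@fastmath` also permits reassociation and lets the compiler assume there are no NaNs or Infs. `muladd` only permits fusing each multiply-add. The function below, `ewma_muladd`, is a hypothetical variant of `ewma_4_sym2` and has not been benchmarked here. Whether it matches the `@fastmath` timings above is an assumption to check with `@btime`.

```julia
# Hypothetical variant of ewma_4_sym2: @fastmath replaced by explicit muladd,
# so only multiply-add contraction is allowed (no reassociation, NaN/Inf still honored).
function ewma_muladd(x::AbstractVector{T}, c::T) where {T}
    res = similar(x)
    num = zero(T)
    den = zero(T)
    for i in eachindex(x, res)
        num = muladd(num, c, x[i])    # num * c + x[i], may be fused
        den = muladd(den, c, one(T))  # den * c + 1, may be fused
        res[i] = num / den
    end
    return res
end
```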

lmiq · November 23, 2023, 5:29pm

I agree. My advice for new users is to not use these macros for performance at all. In fact, even in my own packages I end up using none of these, as the benefits are very minor relative to what can be gained algorithmically.

What I think is that the code resulting from benchmark competitions scares new users: the problems are usually very simple and thus highly influenced by these micro-optimizations, which gives the impression that writing performant code always diverges this much from simple, straightforward syntax.
