Performance drops 10x if `missing` is used or two values are returned from a function

Hi, while doing GARCH fitting I discovered a strange performance drop:

Case 1: Returning two values instead of one is ~10x slower: 1.5 ms vs 23 μs. See the place marked with CHANGE1. Is that a known issue, and should returning multiple values be avoided?

Case 2: Using missing values slows it down by ~10x: 1.5 ms vs 5 μs. See the place marked with CHANGE2. It seems like a known issue; what's the best workaround, using NaN or a special float value?

Case 3: The multi method is ~1.5x slower than the simple method. I assumed the compiler would optimise both into the same code, but it doesn't.

Benchmark for Cases 1 and 2

using Distributions, BenchmarkTools

struct SimpleVol end

@inline predict_explicit(Q, prev, r) = begin
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)

  v = Q.α*y + (1-Q.α)*prev.v

  (; v), 0.0
  # (; v) # CHANGE1
end

vol_llh_explicit(T, Q, rs) = begin
  n = length(rs)
  l = findfirst(!ismissing, rs)
  state = (; v=abs(rs[l]))
  for t in max(l, 2):n
    rs[t] === missing && continue
    state, _ = predict_explicit(Q, state, rs[t])
    # state = predict_explicit(Q, state, rs[t]) # CHANGE1
  end
end;

returns = rand(Normal(0, 0.015), 10_000);

isweekend(i) = (mod(i,7) == 6) || (mod(i,7) == 0);
returns2 = [isweekend(i) ? missing : returns[i] for i in 1:length(returns)];

# Warmup
vol_llh_explicit(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

# Bench
println("Explicit Method")
@benchmark vol_llh_explicit(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

# CHANGE2 use returns instead of returns2 in benchmark

Benchmark for Case 3

using Distributions, BenchmarkTools

struct SimpleVol end

@inline predict(::Type{SimpleVol}, Q, prev, r) = begin
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)

  v = Q.α*y + (1-Q.α)*prev.v

  (; v), 0.0
  # (; v) # CHANGE1
end

@inline predict_explicit(Q, prev, r) = begin
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)

  v = Q.α*y + (1-Q.α)*prev.v

  (; v), 0.0
  # (; v) # CHANGE1
end

vol_llh_multi(T, Q, rs) = begin
  n = length(rs)
  l = findfirst(!ismissing, rs)
  state = (; v=abs(rs[l]))
  for t in max(l, 2):n
    rs[t] === missing && continue
    state, _ = predict(T, Q, state, rs[t])
    # state = predict(T, Q, state, rs[t]) # CHANGE1
  end
end;

vol_llh_explicit(T, Q, rs) = begin
  n = length(rs)
  l = findfirst(!ismissing, rs)
  state = (; v=abs(rs[l]))
  for t in max(l, 2):n
    rs[t] === missing && continue
    state, _ = predict_explicit(Q, state, rs[t])
    # state = predict_explicit(Q, state, rs[t]) # CHANGE1
  end
end;

returns = rand(Normal(0, 0.015), 10_000);

isweekend(i) = (mod(i,7) == 6) || (mod(i,7) == 0);
returns2 = [isweekend(i) ? missing : returns[i] for i in 1:length(returns)];

# Warmup
vol_llh_explicit(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)
vol_llh_multi(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

# Bench
println("Explicit Method")
@benchmark vol_llh_explicit(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)
println("Multi Method")
@benchmark vol_llh_multi(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

# CHANGE2 use returns instead of returns2 in benchmark

P.S. There is a similar issue on Stack Overflow, but it's a bit unrelated; it's only about the 3D case.

I'm not sure about the exact internal details, but it is related to the fact that, as written, the state type cannot be completely inferred, because rs[l] cannot be inferred to not be missing. That may be what causes the regression.
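One way to confirm the instability (just a diagnostic, assuming the definitions from the first post are loaded) is to inspect the call with @code_warntype and look for the Union{Missing, Float64} and Any entries in the output:

julia> Q = (α = 0.048, μ = 0.0004, ν = 4.203);

julia> @code_warntype vol_llh_explicit(SimpleVol, Q, returns2)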

In any case, if instead of collecting the state into a named tuple you collect it in a custom struct with a concrete, predefined field type, the issue disappears and both versions become faster:

Old code:

# single return value - without returning `state`:
julia> @b vol_llh_explicit($SimpleVol, $((α = 0.048, μ = 0.0004, ν = 4.203)), $returns2)
8.346 μs

# two return values
julia> @b vol_llh_explicit($SimpleVol, $((α = 0.048, μ = 0.0004, ν = 4.203)), $returns2)
615.610 μs (35719 allocs: 558.109 KiB)

# single return value (return state from function)
julia> @b vol_llh_explicit($SimpleVol, $((α = 0.048, μ = 0.0004, ν = 4.203)), $returns2)
486.399 μs (28576 allocs: 446.500 KiB)

New code:

# 2 return values
julia> @b vol_llh_explicit($SimpleVol, $((α = 0.048, μ = 0.0004, ν = 4.203)), $returns2)
8.346 μs

# single return value
julia> @b vol_llh_explicit($SimpleVol, $((α = 0.048, μ = 0.0004, ν = 4.203)), $returns2)
8.343 μs

Code:

struct SimpleVol end

struct State
    v::Float64
end

function predict_explicit(Q, prev, r)
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)
  v = Q.α*y + (1-Q.α)*prev.v
  #(; v), 0.0
  #(; v) # CHANGE1
  #State(v)
  State(v), 0.0
end

function vol_llh_explicit(T, Q, rs)
  n = length(rs)
  l = findfirst(!ismissing, rs)
  state = State(abs(rs[l]))
  #state = (; v = abs(rs[l]))
  for t in max(l, 2):n
    rs[t] === missing && continue
    state, _ = predict_explicit(Q, state, rs[t])
    #state = predict_explicit(Q, state, rs[t]) # CHANGE1
  end
  return state # addition: makes the results more meaningful, probably
end

Edit: the other thing is that if you return state from the function, the results change, because otherwise the compiler can eliminate the loop. Even with the one-value-return function, simply returning state from the outer function introduces lots of allocations. I edited the benchmarks above.


In line with what @lmiq said, a simple type annotation inside predict can also fix the problem.

julia> function predict_explicit_float(Q, prev, r)
         d = r - Q.μ
         y = sqrt(d*d + 1e-6)
         v::Float64 = Q.α*y + (1-Q.α)*prev
         v, 0.0
       end;

julia> function vol_llh_explicit_float(T, Q, rs)
         n = length(rs)
         l = findfirst(!ismissing, rs)
         state = abs(rs[l])
         for t in max(l, 2):n
           rs[t] === missing && continue
           state, _ = predict_explicit_float(Q, state, rs[t])
         end
         return state
       end;

skipmissing might be the best solution, actually.


julia> function predict_explicit(Q, prev, r)
         d = r - Q.μ
         y = sqrt(d*d + 1e-6)
         v = Q.α*y + (1-Q.α)*prev
         v, 0.0
       end;

julia> function vol_llh_explicit_skipmissing(T, Q, rs)
         n = length(rs)
         rs_sm = skipmissing(rs)
         state = abs(first(rs_sm))
         for r in rs_sm
           state, _ = predict_explicit(Q, state, r)
         end
         return state
       end;

Why that one and not skipmissing from Base?

(I think you want the Base one; the one from Missings.jl is not exported.)

Oh sorry, I forgot that skipmissing was in Base. skipmissings (note the s) is an interesting experiment, but it is not exported because it doesn't actually solve all the problems it's meant to solve.
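For reference, a minimal example of the Base skipmissing being discussed (it is a lazy iterator over the non-missing elements):

julia> collect(skipmissing([1.0, missing, 2.0]))
2-element Vector{Float64}:
 1.0
 2.0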


Thanks for the help! Indeed, as you suggested:

The biggest problem seems to be type instability, and adding the type annotation r::Float64 = rs[t] in vol_llh_explicit fixes most of it.

An additional annotation inside predict_explicit, v::Float64 = Q.α*y + (1-Q.α)*prev.v, also speeds it up a little bit.

After these two changes it's fast. The missing and multi-dispatch cases are also resolved and are equally fast.

Fast version: 15 μs, 0 allocations.

using Distributions, BenchmarkTools

struct SimpleVol end

@inline predict(::Type{SimpleVol}, Q, prev, r) = begin
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)
  v::Float64 = Q.α*y + (1-Q.α)*prev.v
  (; v), 0.0
end

vol_llh(T::Type, Q, rs) = begin
  n = length(rs)
  l = findfirst(!ismissing, rs)
  r::Float64 = rs[l]
  state = (; v=abs(r))
  for t in max(l, 2):n
    rs[t] === missing && continue
    r = rs[t]
    state, _ = predict(T, Q, state, r)
  end
  state
end;

returns = rand(Normal(0, 0.015), 10_000);
isweekend(i) = (mod(i,7) == 6) || (mod(i,7) == 0);
returns2 = [isweekend(i) ? missing : returns[i] for i in 1:length(returns)];

# Warmup
vol_llh(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

# Bench
@benchmark vol_llh(SimpleVol, (α = 0.048, μ = 0.0004, ν = 4.203), returns2)

Just a note that you almost never want to annotate on the left-hand side. It has all sorts of weird semantics. Do this instead:

    v = (Q.α*y + (1-Q.α)*prev)::Float64

Thanks, I see: it's conversion vs. assertion, and assertion is preferable. I guess r = rs[t]::Float64 is also preferable to r::Float64 = rs[t].
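For completeness, this is what the loop looks like with assertions instead of left-hand-side annotations (a sketch of the same vol_llh as above, not re-benchmarked):

vol_llh(T::Type, Q, rs) = begin
  n = length(rs)
  l = findfirst(!ismissing, rs)
  state = (; v = abs(rs[l]::Float64))  # assert, don't convert
  for t in max(l, 2):n
    rs[t] === missing && continue
    r = rs[t]::Float64                 # assertion narrows Union{Missing, Float64}
    state, _ = predict(T, Q, state, r)
  end
  state
end;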

I updated the real code and now it’s much faster, thanks!

AD makes it more complicated, since the code has to handle both Dual and Float64.
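One possible way to handle that (a hypothetical sketch, assuming ForwardDiff-style Dual parameters, not tested against the real model) is to derive the element type from the parameters instead of hard-coding Float64:

@inline predict(::Type{SimpleVol}, Q, prev, r) = begin
  T = typeof(Q.α)                      # Float64 for plain parameters, Dual under AD
  d = r - Q.μ
  y = sqrt(d*d + 1e-6)
  v = (Q.α*y + (1-Q.α)*prev.v)::T      # assert against the parameter type
  (; v), zero(T)
end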

P.S.

Another inconvenience is that you can't redefine structs without restarting Julia. NamedTuples work, but they don't look very nice: instead of

@inline predict(
  Q::Params{T}, prev::State{T}, r::Float64, r2::Float64
) where {T<:Real} = begin
  ...
  return ((; v, μ, s))::State{T}, llh::T
end

You have to use

@inline predict(
  Q::@NamedTuple{
    a::T, a_up_neg::T, b::T, p::T, m::T, le_mean::T, wr::T, sr::T,
    up_sharpness::T, soft_abs_eps::T
  },
  prev::@NamedTuple{v::T, μ::T, s::T},
  r::Float64, r2::Float64
) where {T<:Real} = begin
  ...

  return ((; v, μ, s))::@NamedTuple{v::T, μ::T, s::T}, llh::T
end

The skipmissing solution avoids any need for a hard-coded type.

Another strategy here is to manually union-split the tuple. In short, Julia's performance with Tuple{Union{Missing, Float64}, Float64} is significantly worse than with Union{Tuple{Missing, Float64}, Tuple{Float64, Float64}}.

Even if you know that the first element should never be missing, you can typically connive Julia into transforming the tuple-with-a-union into a union-of-tuples with:

if x === missing
    return (missing, y)
else
    return (x, y)
end

Yes, but not just a one-time conversion. If you put a left-hand side type annotation on a variable once, all assignments to that variable in the entire scope will perform conversion, not just the one you annotated. That’s what I meant by “all sorts of weird semantics”.
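A tiny example of that scoping behavior, just to illustrate that the conversion applies to every assignment of the variable in the function:

julia> function f()
         x::Float64 = 1   # declares x as Float64 for the whole function body
         x = 2            # this later assignment is also converted
         return x
       end;

julia> f()
2.0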

Final version: fast, and it allows a variable number of arguments as input::NTuple{N, Float64}, which makes it possible to use different algorithms with different input parameters.

@inline predict(
  ::Type{Vol1d},
  Q::@NamedTuple{
    a::T, a_up_neg::T, b::T, p::T, m::T, le_mean::T, wr::T, sr::T,
    up_sharpness::T, soft_abs_eps::T
  },
  prev::@NamedTuple{v::T, μ::T, s::T},
  input::NTuple{N, Float64},
  r2::Float64
) where {T<:Real, N} = begin
  ...
end