Suggestions for a good example of a realistic type instability for Julia demo?

I’m giving a brief demo on Julia for an audience that has never used it, and wanted a short and sweet example of a typical type-instability one might have in their code, and how to fix it. The problem is, the compiler’s gotten too good :) I’m struggling to find something which is both short (O(5) lines of code), non-contrived, and which produces a noticeably better @benchmark or perhaps noticeably less red @code_warntype once fixed.

Any suggestions?


How about the classical sum example?

function mysum(x)
    s = 0                   # Int zero: inside the loop s becomes Union{Int, eltype(x)}
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

function mysum_fixed(x)
    s = zero(eltype(x))     # zero of the element type: s keeps one concrete type
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

@code_warntype mysum([1.0, 2.0])
@code_warntype mysum_fixed([1.0, 2.0])
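If reading @code_warntype output feels heavy for a first demo, the same instability also shows up in the inferred return type; here is a minimal sketch (Base.return_types is an internal, non-exported helper, and the two definitions are repeated so the snippet runs on its own):

```julia
# Repeated from above so the snippet is self-contained.
function mysum(x)
    s = 0                   # Int zero
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

function mysum_fixed(x)
    s = zero(eltype(x))     # zero of the element type
    for i in 1:length(x)
        s += x[i]
    end
    return s
end

# An empty Vector{Float64} makes mysum return the Int 0, so inference can
# only conclude a Union; the fixed version infers to a concrete Float64.
Base.return_types(mysum, (Vector{Float64},))        # [Union{Float64, Int64}]
Base.return_types(mysum_fixed, (Vector{Float64},))  # [Float64]
```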

Thanks, this is kind of what I was toying with, but the problem is that on Julia 1.1 at least there’s essentially no speed difference between the two except when the array is length 2 (so it’s a bit contrived), and while @code_warntype shows two extra “red” lines, it just doesn’t quite convey the drama I often feel in real code, where I fix one type instability in some inner loop and suddenly I’m like 10x faster.


So just put that example in an inner loop? :)

Yeah, but the percentage speed difference for the length-2 case is still only about 15%, so I’d have to add some other complexity to the loop to bring out any real difference…

I don’t know what brief means in this context, but for anything shorter than ~2 hours I would probably not include an actual example. Yes, type stability is important, but you can demonstrate what it is used for by giving an example with code_warntype (the function, not the macro) where the types are figured out throughout the function just from the argument types.

It is not that difficult to run into type unstable code in real-life problems, but those are usually more involved than ~5LOC, and bringing in the whole benchmarking and diagnostic machinery for them may just be confusing for a Julia-naive audience.
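A minimal sketch of that approach, assuming nothing beyond the standard library (relu_sum is a toy function made up for illustration):

```julia
using InteractiveUtils  # provides code_warntype; preloaded in the REPL

# Toy function made up for this sketch.
relu_sum(x) = sum(xi -> xi > 0 ? xi : zero(xi), x)

# The function form takes only argument *types*, which underlines that
# inference works from types alone: no values are ever involved.
code_warntype(relu_sum, (Vector{Float64},))
```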


Not really an example of type (in)stability, but type-unaware use of arrays still hurts performance. This is similar to the sum example in the sense that you need to be careful about the type of the “zero” of the reducing function.

function bad_positives(xs)
    ys = []                    # untyped []: ys is a Vector{Any}, every element boxed
    for x in xs
        if x > 0
            push!(ys, x)
        end
    end
    return ys
end

function good_positives(xs)
    ys = similar(xs, 0)        # empty vector with the same element type as xs
    for x in xs
        if x > 0
            push!(ys, x)
        end
    end
    return ys
end
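The root cause is the container’s element type, which could be shown on its own in a couple of lines for the demo (a sketch; the variable names are mine):

```julia
xs = randn(10)

ys_any   = []              # an untyped [] literal is a Vector{Any}
ys_typed = similar(xs, 0)  # empty vector that keeps the element type of xs

push!(ys_any, xs[1])       # stores a boxed Float64 behind an Any slot
push!(ys_typed, xs[1])     # stores the Float64 inline

eltype(ys_any)    # Any
eltype(ys_typed)  # Float64
```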

julia> data = randn(1000);

julia> @btime sum(bad_positives($data))
  14.167 μs (1008 allocations: 23.94 KiB)
395.76576278356066

julia> @btime sum(good_positives($data))
  3.342 μs (9 allocations: 8.33 KiB)
395.76576278356083

But both functions are technically type stable. I also came up with

julia> data = randn(1000);

julia> @btime sum(xs -> get(xs, 1, 0.0), (x > 0 ? [x] : [] for x in $data))
  41.299 μs (2518 allocations: 109.88 KiB)
422.3665048562676

julia> @btime sum(xs -> get(xs, 1, 0.0), (x > 0 ? Float64[x] : Float64[] for x in $data))
  24.201 μs (1005 allocations: 86.23 KiB)
422.3665048562676

but this is very contrived :)

Maybe do it from the caller’s perspective:

julia> using BenchmarkTools

julia> aa = Any[1:1000;];

julia> ai = [1:1000;];

julia> @btime sum($aa)
  59.873 μs (1458 allocations: 22.78 KiB)
500500

julia> @btime sum($ai)
  199.573 ns (0 allocations: 0 bytes)
500500

julia> @code_warntype sum(aa)
Body::Any
1 ─ %1 = Base.identity::typeof(identity)
│   %2 = Base.add_sum::typeof(Base.add_sum)
│   %3 = invoke Base._mapreduce(%1::typeof(identity), %2::typeof(Base.add_sum), $(QuoteNode(IndexLinear()))::IndexLinear, _2::Array{Any,1})::Any
└──      return %3

julia> @code_warntype sum(ai)
Body::Int64
1 ── %1  = Base.identity::typeof(identity)
│    %2  = Base.add_sum::typeof(Base.add_sum)
...

Wow, I am amazed. Does anyone know what improvement was responsible for that?

An example that demonstrates that small-union optimization is cool, but still expensive:

julia> @noinline function mymax(init, x)
           s = init
           for i in 1:length(x)
               xi = x[i]
               s = s<xi ? xi : s
           end
           return s
       end
julia> r=rand(Int, 1000);
julia> @btime mymax(-Inf32, r);
  9.255 μs (1 allocation: 16 bytes)
julia> @btime mymax(0x00, r);
  12.723 μs (1 allocation: 16 bytes)
julia> @btime mymax(-Inf, r);
  2.731 μs (1 allocation: 16 bytes)
julia> @btime mymax(typemin(Int), r);
  910.500 ns (1 allocation: 16 bytes)
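For the demo, a simpler manual fix than loop splitting is to promote the accumulator to one concrete type up front; mymax_promoted below is my own sketch, not code from this thread:

```julia
# Sketch: pick a single concrete accumulator type before the loop.
function mymax_promoted(init, x)
    T = promote_type(typeof(init), eltype(x))
    s = convert(T, init)
    for xi in x
        s = max(s, xi)  # Base.max promotes mixed arguments, so s stays a T
    end
    return s
end
```

With this, `mymax_promoted(-Inf, r)` runs the whole loop on `Float64` and `mymax_promoted(typemin(Int), r)` on `Int`, so there is no small-union penalty to begin with.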

In theory, inference could figure out that the type of s in the loop is a state machine with transitions typeof(init) -> typeof(init), typeof(init) -> eltype(x), and eltype(x) -> eltype(x), but lacking the transition eltype(x) -> typeof(init). Hence the loop could be split into two loops, where the second is type-stable and the first has better type stability (the comparison s < xi has known types, but the result of the conditional expression is unstable, and a branch on its type is needed to decide whether to jump into the second loop).

That optimization appears to be missing or unreliable.

PS: The optimized version (that indeed avoids most of the penalty) would be

julia> @noinline function mymax_loopsplit(init, x)
           s = init
           i=1
           while i <= length(x) && s isa typeof(init) 
               xi = x[i]
               s = s<xi ? xi : s
               i += 1
           end
           while i <= length(x) && s isa eltype(x)
               xi = x[i]
               s = s<xi ? xi : s
               i += 1
           end
           return s
       end

It would be cool if the compiler could learn this (e.g. during inference, record the possible transitions between types, where initially everything is Union{}; then collapse strongly connected components so that we get a nice DAG; now we can improve type stability by duplicating code and adding the requisite control flow). I’m not knowledgeable enough about compilers to have an opinion on how hard this would be in Julia, how expensive it would be, or what this optimization is called in the literature.


Nice example, thanks!

The difference in performance can be even higher if bounds checking is avoided. For example:

function mymax(init, x)
    s = init
    for xi in x
        s = s<xi ? xi : s
    end
    return s
end
julia> r = rand(Int, 1000);

julia> @btime mymax(-Inf, $r)
  2.228 μs (0 allocations: 0 bytes)
9219970391413741430

julia> @btime mymax(typemin(Int), $r)
  149.559 ns (0 allocations: 0 bytes)
9219970391413741430

EDIT:

Not sure what happens here, but your function with bounds checking elision is significantly faster than mine in the type-unstable case. I would have expected both versions to perform similarly, as they do in the type-stable case:

function mymax_foobar(init, x)
    s = init
    @inbounds for i in 1:length(x)
        xi = x[i]
        s = s<xi ? xi : s
    end
    return s
end
julia> @btime mymax_foobar(-Inf, $r)
  1.649 μs (0 allocations: 0 bytes)
9219970391413741430

julia> @btime mymax_foobar(typemin(Int), $r)
  149.132 ns (0 allocations: 0 bytes)
9219970391413741430

Thanks for all the suggestions. I went with a combination of @Tamas_Papp’s code_warntype to demonstrate inference (the function, not the macro; on further thought I agree it hammers home better than the macro that only the types matter), and @mauro3’s benchmark to give a dramatic example of where this matters. I think that hit the right level and was well received (this was 30 min for a group of cosmologists, like myself).

I didn’t quite see it in time, but your version of mymax, @foobar_lv2, is great too. It’s basically the right level of 1) a big difference in runtime, 2) easy to understand what the code does and why it’s type-unstable, and 3) @code_warntype output short enough that you can look at it and see and understand the red. Thanks, I think I would use this next time!
