Best practices for dealing with `Union{Nothing,...}`

I’d like learn how to write good code that deals with Union{Nothing,...}. For example what’s the right way to pattern-match a value of this type in such a way that the compiler can infer what all the types are? For example, consider the following code.

module Tmp
const labels1 = ('a','b','c')
const labels1 = ('1','2','3')

function tryparser(s::AbstractString)
    if length(s) != 2
        return nothing
    end
    c1,c2 = s
    i1 = findfirst(isequal(c1), labels1)
    i2 = findfirst(isequal(c2), labels2)
    isnothing(i1) || isnothing(i2) ? nothing : (i1, i2)
end
end

The function tryparser should have a return type of Union{Nothing,Tuple{Int64,Int64}}, however @code_warntype Tmp.tryparser("a1") starts with Body::Union{Nothing, Tuple{Union{Nothing, Int64},Union{Nothing, Int64}}} which I assume means the compiler is unable to infer that when I return a tuple it never has any nothings inside. Perhaps this is just not a problem.

Essentially, my question is: What is the idiomatic way to write code like the above?

Simplifying the logic a little bit seems to help the parser compiler in this case:

(also note that your example above defines labels1 twice and does not define labels2, which interferes with the inference results).

module Tmp
const labels1 = ('a','b','c')
const labels2 = ('1','2','3')

function tryparser(s::AbstractString)
    if length(s) != 2
        return nothing
    end
    c1,c2 = s
    i1 = findfirst(isequal(c1), labels1)
    i2 = findfirst(isequal(c2), labels2)
    if i1 === nothing
      return nothing
    end
    if i2 === nothing
      return nothing
    end
    (i1, i2)
end
end
julia> @code_warntype Tmp.tryparser("hello")
Body::Union{Nothing, Tuple{Int64,Int64}}
3 Likes

You can also always just assert the correct types manually like this:

julia> function tryparser(s::AbstractString)
           if length(s) != 2
               return nothing
           end
           c1,c2 = s
           i1 = findfirst(isequal(c1), labels1)
           i2 = findfirst(isequal(c2), labels2)
           isnothing(i1) || isnothing(i2) ? nothing : (i1::Int, i2::Int)
       end
tryparser (generic function with 1 method)

julia> @code_warntype tryparser("a2")
Body::Union{Nothing, Tuple{Int64,Int64}}
2 Likes

Sorry about the labelsn typo.

Thanks! That’s what I was looking for. Interestingly replacing i1===nothing with isnothing(i1) does not work. Nor does i1===nothing || i2===nothing. So I assume the lesson is that the current compiler (Julia 1.1.1) is somewhat limited in what it can infer.

Interesting – do the asserts (which will always succeed) carry any performance penalty?

No, they’re only for the compiler, so there won’t be any runtime cost. Though of course, if the assertion fails, an error will be thrown

1 Like

But if they can fail at runtime then some checking must be happening, which must carry a runtime cost, right?

So based on @rdeits answer I thought I would “monadify” it as follows:

function bnd(f, x)
    x === nothing ? nothing : f(x)
end

function tryparsemove2(s::AbstractString)
    if length(s) != 2
        return nothing
    end
    c1,c2 = s
    bnd(findfirst(isequal(c2), labels2)) do i2
        bnd(findfirst(isequal(c1), labels1)) do i1
            i1,i2
        end
    end
end

which seems to work:

julia> @code_warntype Tmp.tryparsemove2("a2")
Body::Union{Nothing, Tuple{Int64,Int64}}

Is this good Julia? Horribly inefficient?

The logical conclusion of this monadic style is (I think) this:

function tryparsepairchars(s)
    if length(s) != 2
        return nothing
    end
    c1,c2 = s
    c1,c2    
end

function tryparser3(s::AbstractString)
    bnd(tryparsepairchars(s)) do (c1, c2)
        bnd(findfirst(isequal(c2), labels2)) do i2
            bnd(findfirst(isequal(c1), labels1)) do i1
                i1,i2
            end
        end
    end
end

Which is, I guess, cute, but I’d be curious to know what those with more Julia experience think.

I’m curious why not just do:

function tryparser(s::AbstractString)::Union{Tuple{Int64, Int64}, Nothing}

The recommendation I guess to let the compiler guess EVERYTHING except when it can’t. On the other hand it is sometimes nice with a large code base to be able to look at a function and know what it returns without having to read the code…

1 Like

I agree, indeed this is very similar to the suggestion of @simeonschaub (although that suggestion places the type more locally, rather than at the function signature level). I’d love to get some insight into the relative merits of the various approaches.

For myself I fully “decorate” my methods, which means declaring types and return values where appropriate. What I mean by appropriate is when the value is saved as a type X or passed to a C call as type X, then the method doing the saving/calling specifies that the value is of type X. This then “filter’s up” through the call stack so that value is always a type X. This avoid unnecessary conversions. Converting between primitives is fast, but not instant. This does in practice mean that ALL methods are fully decorated, because most if not all values eventually find their way to disk or the UI (and with GTK I need to specify the types of the values) in an application.

There is an argument here about CPU time vs programmer’s time and which is more expensive. I either spend my time making sure my data types are correct, or I spent the CPU time having it convert the data types. This would say spend the CPU time unless it’s running on a huge cluster and doing billions of conversions, the CPU time is probably cheaper.

On the other hand specifying data types also avoid “unexpected” exceptions like if you need to treat the value as unsigned but but a signed was passed in then you WILL get an exception during that conversion. Granted this could be solved by declaring the type as “unsigned” and just filtering that generic up…which just feels like half-assing it to me, if you are going to specify unsigned why not just specify the size as well?

If I want/need a method to work with different data types, then of course I don’t decorate and let the power of Julia create custom methods for each data type that is needed. However this seems to be rare when writing a program. Building a library it would be done more often.

The down side of this is a side effect of the method matching based on the arguments algorithm. Julia treats any constant, 1234, as an Int64 or, 1234.56, as a Float64 (on a 64bit machine). So if my method takes a Int32 I have write Int32(1234) to pass the value 1234, which is kind of annoying. However it’s basically what has to happen. If I declare foo(a::Int32) and foo(a::UInt32) then make a call to foo(1234) which method should be called? There is no “correct” answer for all situations.

1 Like

I think it’s clever, and if the @code_warntype looks good then it should be efficient. The key observation is that the compiler can make good use of checks like x === nothing (the triple equals part is important) because it is guaranteed that that check is true if and only if x is of type Nothing (because Nothing is a struct with no fields). Thus that comparison can be evaluated based only on the types of the arguments, which the compiler is pretty good at figuring out.

The only issue you might run into is https://github.com/JuliaLang/julia/issues/15276 (you’ll know if you see a bunch of Core.Box types in your @code_warntype). But, again, if the @code_warntype output is good then you should be fine. Most likely the compiler will be able to avoid the overhead of actually calling the bnd function and will do the right thing.

It’s also worth noting that this isn’t a special property of Nothing, but simply a general property of singletons. You can make your very own nothing-like type and get the same kind of performance:

julia> struct MySingleton
       end

julia> const mysingleton = MySingleton()
MySingleton()

julia> function f(x)
         if x === mysingleton
           1.0
         else
           "hello"
         end
       end
f (generic function with 1 method)

julia> @code_warntype(f(mysingleton))
Body::Float64
1 ─     return 1.0

Not that I’m recommending you do that here: using nothing is perfectly fine in your case. I’m just trying to make the point that there’s nothing magic about nothing in particular.

3 Likes

I also want to add that difference in inference of isnothing vs === nothing also applies to missing (which is a singleton as well). Keno explains why that’s the case in https://github.com/JuliaLang/julia/issues/27681#issuecomment-399054971

=== is special in inference and inference happens before inlining

Note that isa is also special; so x === nothing should also be as good as x isa Nothing.

Regarding monadic style, I don’t think it is regarded as “bad” in Julia. For example, there is a discussion about adding a special syntax to lift a function on Union{T, Missing}: Status of question mark syntax for missing values

But my impression is writing something like

i1 = findfirst(isequal(c1), labels1)
i1 === nothing && return nothing
i2 = findfirst(isequal(c2), labels2)
i2 === nothing && return nothing
return (i1, i2)

is more common. This is so common that IterTools.jl has @ifsomething which can be used to write

i1 = @ifsomething findfirst(isequal(c1), labels1)
i2 = @ifsomething findfirst(isequal(c2), labels2)
return (i1, i2)

This expands to something like above.

4 Likes

I think the actual issue here is that conditional expressions do not play well with the type inference engine.

This is imo a shortcoming of the compiler (gimme more pi-nodes), and the workaround is as @rdeits said: Use control flow (if/else/@goto/early return) instead of conditional expressions. Unfortunately I’m not deep enough into the related parts of the optimizer to propose a fix, or debug the precise reason for the current approach failing to optimize this.

A smaller reproducer would be

julia> function ttt(x) 
       xx=x[1]
       yy=x[2]
       xx isa Nothing || yy isa Nothing ? nothing : (xx, yy)
       end

julia> @code_typed ttt([1, nothing])
CodeInfo(
1 ─ %1  = (Base.arrayref)(true, x, 1)::Union{Nothing, Int64}
│   %2  = (Base.arrayref)(true, x, 2)::Union{Nothing, Int64}
│   %3  = (%1 isa Main.Nothing)::Bool
└──       goto #3 if not %3
2 ─       goto #4
3 ─ %6  = (%2 isa Main.Nothing)::Bool
4 ┄ %7  = φ (#2 => %3, #3 => %6)::Bool
└──       goto #6 if not %7
5 ─       return Main.nothing
6 ─ %10 = (Core.tuple)(%1, %2)::Tuple{Union{Nothing, Int64},Union{Nothing, Int64}}
└──       return %10
) => Union{Nothing, Tuple{Union{Nothing, Int64},Union{Nothing, Int64}}}

as opposed to

julia> function ttt2(x) 
       xx=x[1]
       yy=x[2]
       xx isa Nothing && return nothing
       yy isa Nothing && return nothing
       return (xx, yy)
       end
ttt2 (generic function with 1 method)

julia> @code_typed ttt2([1, nothing])
CodeInfo(
1 ─ %1  = (Base.arrayref)(true, x, 1)::Union{Nothing, Int64}
│   %2  = (Base.arrayref)(true, x, 2)::Union{Nothing, Int64}
│   %3  = (%1 isa Main.Nothing)::Bool
└──       goto #3 if not %3
2 ─       return Main.nothing
3 ─ %6  = (%2 isa Main.Nothing)::Bool
└──       goto #5 if not %6
4 ─       return Main.nothing
5 ─ %9  = π (%1, Int64)
│   %10 = π (%2, Int64)
│   %11 = (Core.tuple)(%9, %10)::Tuple{Int64,Int64}
└──       return %11
) => Union{Nothing, Tuple{Int64,Int64}}

Regarding monadic style: All julia style questions are relative to the compiler: if the compiler is bad at optimizing a certain coding pattern, then it cannot be good julia style; as the compiler improves, more coding styles become viable. In my experience, the compiler is currently not very good at optimizing closures, but YMMV.

2 Likes

Actually I think Julia is fantastic at optimizing closures

@inline foldlsomething(op, x) = x
@inline foldlsomething(op, x1, x2) = op(x1, x2)
@inline function foldlsomething(op, x1, x2, xs...)
    y = op(x1, x2)
    if y === nothing
        return y
    else
        return foldlsomething(op, y, xs...)
    end
end

@inline sumif(pred, xs) = foldlsomething(0, xs...) do sum, x
    if pred(x)
        sum + x
    else
        nothing
    end
end

f1() = sumif(isodd, (1, 3, 5))  # Julia const folds
f2() = sumif(isodd, (1, 3, 4))  # ditto
f3() = sumif(isodd, ntuple(i -> 2i + 1, 10))  # Julia doesn't fully const fold but LLVM does it
julia> @code_typed f1()
CodeInfo(
1 ─     return 9
) => Int64

julia> @code_typed f2()
CodeInfo(
1 ─     return
) => Nothing

julia> @code_llvm f3()

;  @ REPL[10]:1 within `f3'
define { %jl_value_t addrspace(10)*, i8 } @julia_f3_16500([8 x i8]* noalias nocapture align 8 dereferenceable(8)) {
post_union_move:
  %1 = bitcast [8 x i8]* %0 to i64*
  store i64 120, i64* %1, align 8
  ret { %jl_value_t addrspace(10)*, i8 } { %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* null to %jl_value_t addrspace(10)*), i8 2 }
}
4 Likes