Performance of dot operator on Nullables

question

#1

I was just comparing the performance of a handcoded lifted version of say addition for Nullables with the version that is in base now using the dot notation. My benchmark code looks like this:

a = Nullable(2)
b = Nullable(4)

@benchmark $a .+ $b

function my_addition{T<:Number}(a::Nullable{T}, b::Nullable{T})
    if isnull(a) || isnull(b)
        return Nullable{T}()
    else
        return Nullable(get(a)+get(b))
    end
end

@benchmark foo($a, $b)

Using the dot notation leads to performance that is about an order of magnitude worse than the hand coded version. Here are my benchmark results, first using the dot notation:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     65.146 ns (0.00% GC)
  median time:      70.344 ns (0.00% GC)
  mean time:        75.946 ns (0.00% GC)
  maximum time:     311.660 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     987
  time tolerance:   5.00%
  memory tolerance: 1.00%

And for my hand coded version:

BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.546 ns (0.00% GC)
  median time:      7.849 ns (0.00% GC)
  mean time:        8.439 ns (0.00% GC)
  maximum time:     34.414 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%

Is there any hope that the dot version might become as fast as a hand coded version? Or is this a known limitation that can’t really be solved?

I was hoping that I could use the dot notation in Query.jl to deal with Nullables, but that won’t really work with this kind of performance characteristic…

PS: And yes, I’m aware of https://github.com/JuliaLang/Juleps/pull/21, but I couldn’t really get an answer to my (narrow) question here about the performance of the dot operation on Nullables.
PPS: CC @nalimilan, who probably just knows the answer :slight_smile:


#2

When https://github.com/JuliaLang/julia/pull/16961 was merged, the dot syntax was as performant as the hand-written version (actually moreso, as it avoids a branch). Something must have happened in the meantime. Can you file an issue?


#3

I looked into it and I believe it is a codegen regression/change that has caused this problem, but I can’t identify which one. I have a quick patch that will fix this regression by changing the Nullable broadcast code.

After the patch I get

julia> @benchmark Nullable(1) .+ Nullable(2)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.338 ns (0.00% GC)
  median time:      1.357 ns (0.00% GC)
  mean time:        1.452 ns (0.00% GC)
  maximum time:     14.030 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%

which I hope you’ll find more pleasant :wink:.


The PR is taking slightly longer to finish than I’d like due to some stray segmentation faults and other concerns. I should have it up for review by the end of the day.


#4

That looks fantastic!

Are there any caveats about using this for lifting in something like Query that I should know about? To me this looks as if it more or less solves the whole lifting debate we had, but I’m a bit worried that I’m overlooking something (because why did we even have https://github.com/JuliaLang/Juleps/pull/21 otherwise?).


#5

In terms of this particular syntax and performance, I’d say there aren’t really any caveats. The unfortunate thing is that we don’t have a nice way to broadcast dot calls, i.e. it’s not easy to do x ..+ y if we have Array{<:Nullable}.

The Julep proposes a nicer and more flexible lower-level representation of Nullable than the existing one and the currently recommended option is not anticipated to really change the existing broadcast methods. (Union{Some{T}, Null})

The other options that would change the semantics of dot-calls would be using Union{T, Null} and Union{T, Null{T}} to represent Nullable{T}, which are currently being actively discussed. However both these solutions are significantly more difficult changes than Union{Some{T}, Null}, so I believe that if the faster union types get into 1.0, we will start with that, and the broadcast methods will remain unchanged.


#6

Thanks, that clarifies things a lot!


#7

I should update that thanks to Jeff’s PR

this performance regression should be gone now. Thanks again for the report!