Inconsistent (unexpected?) behavior in elementwise operators


#1

I have a ratings variable:

ratings = 10×3 DataFrames.DataFrame
│ Row │ Movie title                         │ Dean │ Sam │
├─────┼─────────────────────────────────────┼──────┼─────┤
│ 1   │ "Moonlight (2016)"                  │ 10   │ 0   │
│ 2   │ "Zootopia (2016)"                   │ 0    │ 0   │
│ 3   │ "Arrival (2016)"                    │ 10   │ 10  │
│ 4   │ "Hell or High Water (2016)"         │ 10   │ 0   │
│ 5   │ "La La Land (2016)"                 │ 9    │ 0   │
│ 6   │ "The Jungle Book (2016)"            │ 2    │ 0   │
│ 7   │ "Manchester by the Sea (2016)"      │ 8    │ 0   │
│ 8   │ "Finding Dory (2016)"               │ 4    │ 0   │
│ 9   │ "Captain America: Civil War (2016)" │ 6    │ 9   │
│ 10  │ "Moana (2016)"                      │ 0    │ 0   │

I then apply some elementwise filtering to it:

recommended_movies = ratings[Array(ratings[:Dean] .> 7) .& Array(ratings[:Sam] .== 0), :]

The above code, where I explicitly use 7, works as expected.

However, if I do const minimum_rating = 7 and then

recommended_movies = ratings[Array(ratings[:Dean] .> minimum_rating) .& Array(ratings[:Sam] .== 0), :]

it fails with

ERROR: LoadError: MethodError: no method matching isless(::Int64, ::Nullable{Int64})
Closest candidates are:
  isless(!Matched::Nullable{Union{}}, ::Nullable) at nullable.jl:235
  isless(!Matched::DataArrays.NAtype, ::Any) at /Users/adrian/.julia/v0.6/DataArrays/src/operators.jl:383
  isless(::Real, !Matched::AbstractFloat) at operators.jl:97

:zipper_mouth_face:

It seems to me that this is the kind of refactoring that should Just Work™


#2

It’s really subtle, but Julia does treat those two cases differently. The built-in broadcasting mechanism inlines literals into the function being broadcast. So, for example, if you do:

f.(x, y)

that turns into broadcast(f, x, y), but if you do:

f.(x, 3)

it’s turned into something like broadcast(z -> f(z, 3), x). You can see this by comparing expand(:(f.(x, y))) to expand(:(f.(x, 3))).
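As a rough sanity check, the two forms can be compared by hand. This is only a sketch: f, x, and y are placeholder names made up for illustration, and it checks that the results agree, not what the lowered code looks like. (On Julia 0.7 and later, expand has been replaced by Meta.lower, and the lowering itself may differ from what is described here for 0.6.)

```julia
# Hypothetical two-argument function and inputs, for illustration only.
f(a, b) = a + b
x = [1, 2, 3]
y = [10, 20, 30]

# Two array arguments: dot-call and explicit broadcast give the same result.
r_dot_vec   = f.(x, y)
r_bcast_vec = broadcast(f, x, y)

# Literal second argument: same result as broadcasting a closure
# that has the literal baked in.
r_dot_lit   = f.(x, 3)
r_bcast_lit = broadcast(z -> f(z, 3), x)

println(r_dot_vec == r_bcast_vec)
println(r_dot_lit == r_bcast_lit)
```

With plain Int arrays both pairs agree; the difference only becomes visible when the element type (such as Nullable) needs its own layer of broadcasting.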

That said, I can’t reproduce the result you’ve shown, at least not without knowing exactly how you constructed your data frame and which version of DataFrames this is. My guess is that you’re essentially asking for two levels of broadcasting: one for the elementwise operation and one for unwrapping the Nullables. Somehow, due to the specifics of broadcasting in your version of DataFrames, the inlining of the literal 7 lets the first case work.

In general, elementwise comparison against a vector of Nullables doesn’t work:

julia> x = Nullable.(1:3)
3-element Array{Nullable{Int64},1}:
 1
 2
 3

julia> x .> 2
ERROR: MethodError: no method matching isless(::Int64, ::Nullable{Int64})
Closest candidates are:
  isless(::Nullable{Union{}}, ::Nullable) at nullable.jl:235
  isless(::Missings.Missing, ::Any) at /Users/rdeits/.julia/v0.6/Missings/src/Missings.jl:77
  isless(::Real, ::AbstractFloat) at operators.jl:97
  ...
Stacktrace:
 [1] (::##29#30)(::Nullable{Int64}) at ./<missing>:0
 [2] broadcast_t(::Function, ::Type{Any}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::Array{Nullable{Int64},1}) at ./broadcast.jl:256
 [3] broadcast_c at ./broadcast.jl:319 [inlined]
 [4] broadcast(::Function, ::Array{Nullable{Int64},1}) at ./broadcast.jl:434

Instead, we have to add another layer of broadcasting:

julia> (z -> z .> 2).(x)
3-element Array{Nullable{Bool},1}:
 false
 false
 true

Fortunately, this is a situation that will be resolved by the shift from Nullable{T} to Union{T, Missing}:

julia> x = Union{Int, Missing}[1:3...]
3-element Array{Union{Int64, Missings.Missing},1}:
 1
 2
 3

julia> x .> 2
3-element BitArray{1}:
 false
 false
  true

julia> x[2] = missing
missing

julia> x .> 2
3-element Array{Any,1}:
 false
      missing
  true

Note that the last result is an Array{Any} because I’m on Julia v0.6.1, which lacks the Union optimizations of Julia master.
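If the goal is to use such a comparison as a row filter, the missing entries in the mask have to be turned into a definite true or false first. A minimal sketch, assuming Julia 0.7 or later, where missing and coalesce are available without loading a package; the variable names mask, keep, and selected are made up for illustration:

```julia
# A vector with one missing entry, as in the example above.
x = Union{Int, Missing}[1, missing, 3]

# The elementwise comparison propagates missing.
mask = x .> 2

# Treat "unknown" as "does not match" so the mask is purely Bool.
keep = coalesce.(mask, false)

# A Bool mask can be used for logical indexing.
selected = x[keep]

println(keep)
println(selected)
```

Whether missing should count as false (drop the row) or true (keep it) depends on the filtering you want; false is just the choice made in this sketch.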


#3

For what it’s worth, I strongly suggest you upgrade to DataFrames 0.11.1. This is exactly the sort of thing that is a non-issue in the new version.


#4

Thank you, that is quite an interesting insight, very instructive. I’m still on DataFrames 0.10.1.

Just out of curiosity, is the Missing type defined by DataFrames or by Julia itself (like Nullable)? Are there plans for dropping Nullable in favor of Union and Missing at the language level?


#5

Thank you, yes, I’m planning to. However, the recommendation at release time was to install from scratch, per the release announcement:

Also note that until all packages on your local installation have been ported to DataFrames 0.11.0, they will keep requiring version 0.10.1, and the package manager will not update DataFrames to version 0.11.0. If removing the problematic dependencies is not an option, you can use a separate Julia package directory to test the new framework: just set the JULIA_PKGDIR before starting Julia, and run Pkg.add("DataFrames").

On my current setup, DataFrames hasn’t been updated yet, and I’m reluctant to lose dependent packages that haven’t been ported.


#6

Missing is defined in Missings.jl but is re-exported by the new DataFrames, so using DataFrames will bring it into scope. And yes, Nullable is likely to be moved out of Base: https://github.com/JuliaLang/julia/issues/22682 (but it will continue to live on in Nullables.jl for anyone who still needs that type).


#7

Got it, thank you