Adding a Unitful unit to a vector with missings is slow

#1

I have a vector with a million floats, and some missing values. I want to say that it’s a vector of meters. On Julia 1.0.3:

julia> using BenchmarkTools, Unitful

julia> v = vcat(fill(1.0, 500000), missing, fill(1.0, 500000));

julia> import Base: *

julia> *(::Missing, ::Unitful.Units) = missing
* (generic function with 400 methods)

julia> @btime v .* u"m";
  60.390 ms (999502 allocations: 31.46 MiB)

julia> @btime copy(v);  # for comparison
  1.892 ms (2 allocations: 8.58 MiB)

On Julia 1.1 it should be faster, thanks to efforts by @nalimilan. However, what I would really like is to avoid the copy altogether. This should be a 5 ns operation. I thought of reinterpret, but unfortunately:

julia> reinterpret(Union{typeof(1.0u"m"), Missing}, v)
ERROR: ArgumentError: cannot reinterpret `Union{Missing, Float64}` `Union{Missing, Quantity{Float64,𝐋,FreeUnits{(m,),𝐋,nothing}}}`, type `Union{Missing, Quantity{Float64,𝐋,FreeUnits{(m,),𝐋,nothing}}}` is not a bits type
Stacktrace:
 [1] (::getfield(Base, Symbol("#throwbits#184")))(::Type{Union{Missing, Float64}}, ::Type{Union{Missing, Quantity{Float64,𝐋,FreeUnits{(m,),𝐋,nothing}}}}, ::Type{Union{Missing, Quantity{Float64,𝐋,FreeUnits{(m,),𝐋,nothing}}}}) at ./reinterpretarray.jl:16
 [2] reinterpret(::Type{Union{Missing, Quantity{Float64,𝐋,FreeUnits{(m,),𝐋,nothing}}}}, ::Array{Union{Missing, Float64},1}) at ./reinterpretarray.jl:33
 [3] top-level scope at none:0

Did I miss anything?

#2

I guess that’s https://github.com/JuliaLang/julia/issues/26681

#3

I don’t understand what you’re trying to do nor how missing is involved. Also v .* u"m" throws an error here because * isn’t defined.

#4

Ah, Unitful#208 hasn’t been merged yet. You just need *(::Missing, ::Unitful.Units) = missing to run the code.

Suppose you load a dataframe containing distances, in meters, along with missing values. Feather files can’t contain units, so you need to add the units after loading, hence df[:distance] = df[:distance] .* u"m".
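
For what it's worth, here is a minimal sketch of attaching a unit without defining a new method on Base.* for types I don't own. The helper name attach_unit is made up for illustration and is not part of Unitful. It still allocates a new vector, so it does nothing for the speed problem.

```julia
# Hypothetical helper (not part of Unitful): multiply every non-missing
# element by `u` and pass `missing` through unchanged.
attach_unit(col, u) = map(x -> ismissing(x) ? missing : x * u, col)

# Demo with a plain number standing in for a unit, so it runs without packages.
demo = attach_unit([1.0, missing, 3.0], 2.0)
```

With Unitful loaded, the call for the dataframe column would be attach_unit(df[:distance], u"m").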

If there’s no missing in the vector (here v is a plain Vector{Float64}), reinterpret is glorious:

julia> @btime reinterpret(typeof(1.0u"m"), v);
  51.708 ns (1 allocation: 32 bytes)

#5

OK. Then I’m not sure there’s a good solution. The issue you refer to is about convert, but broadcast and .* will always allocate a new vector AFAIK (changing this would break code). Technically it should be possible to get reinterpret to work for Vector{Union{T,Missing}} when T is a bits type, but there may be reasons not to support it. Maybe file a new issue.
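
In the meantime, one way to avoid the copy is a lazy wrapper instead of reinterpret: the data stays where it is and the factor is applied on each read. This is only a sketch. LazyTimes is a hypothetical name, not something from Base or Unitful, and it assumes Julia 1.3 or later for nonmissingtype. It is read-only, and every access pays for the multiplication and the missing check.

```julia
# Hypothetical read-only wrapper: presents `parent` as if every non-missing
# element had been multiplied by `factor`, without copying the data.
struct LazyTimes{T,N,A<:AbstractArray,S} <: AbstractArray{T,N}
    parent::A
    factor::S
end

function LazyTimes(parent::AbstractArray{E,N}, factor) where {E,N}
    # Element type after multiplication; keep Missing only if the parent allows it.
    Q = typeof(one(nonmissingtype(E)) * factor)
    T = E >: Missing ? Union{Q,Missing} : Q
    return LazyTimes{T,N,typeof(parent),typeof(factor)}(parent, factor)
end

Base.size(v::LazyTimes) = size(v.parent)
Base.IndexStyle(::Type{<:LazyTimes{T,N,A}}) where {T,N,A} = IndexStyle(A)

Base.@propagate_inbounds function Base.getindex(v::LazyTimes, i::Int...)
    x = v.parent[i...]
    # Check for missing explicitly, so no `*(::Missing, ...)` method is needed.
    return ismissing(x) ? missing : x * v.factor
end

# Demo with a plain number as the factor, so it runs without Unitful.
v = [1.0, missing, 3.0]
w = LazyTimes(v, 2.0)
```

With Unitful loaded, LazyTimes(v, u"m") should behave like a vector with element type Union{Missing, typeof(1.0u"m")}, and constructing it does not touch the million elements.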
