Using Unitful.jl with imported dataframes?


#1

I was wondering how to most efficiently write functions that can take different units of measure.

My first approach would be to add a units parameter to the function and use if/elseif logic to switch as needed.

#A goofy example of how many iris petals needed to span a mile:

using RDatasets

iris=dataset("datasets","iris")

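function how_many_in_a_mile(x,units::String="metric")
  if units=="metric"
      return 160934 ./ x   # roughly the number of centimeters in a mile; ./ works elementwise on a column
  elseif units=="imperial"
      return 63360 ./ x    # inches in a mile
  else
      error("unknown units: $units")  # avoid silently returning nothing
  end
end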

how_many_in_a_mile(iris[:PetalLength],"metric")

I did some googling and came across the Unitful.jl package, which leverages the type system and seems very close to what I am hoping to do, but I can’t seem to grasp how you would use it if you are bringing in data from a dataframe.

Is there a way I can add unit type data to a dataframe so that it becomes a type parameter?

Is my thinking correct in that adding units as a type and letting dispatch do the work is better than using an if statement or maintaining two functions?
I am trying to learn to write more “Julian” code but haven’t fully wrapped my head around the type system and all it can do.

#I think I should be writing like this?
function how_many_in_a_mile(x::cm)
    return 160934/x
end

function how_many_in_a_mile(x::inch)
    return 63360/x
end

#2

After spending some more time tinkering, I am able to add units to values in a dataframe (albeit in a rather hacky fashion), but cannot yet successfully figure out how to dispatch on them.

Using the iris dataset:

using RDatasets, Unitful  # Unitful provides the u"..." string macro
iris=dataset("datasets","iris")

for i in 1:4  #I know that the first 4 columns are cm, and the fifth column is a unitless factor variable
  iris[i]=iris[i].*[1u"cm"]
end

iris[1] #so now when we look at the data it has units.

#3

That doesn’t sound like a bad approach. You can change [1u"cm"] to u"cm" to save a few characters. For dispatch, you can do something like f(x::typeof(1u"m")) = ..., and write f.(iris[:PetalLength]).
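As a rough sketch of that suggestion: a plain vector stands in for the dataframe column here, and the name petal_length and its values are made up for illustration.

```julia
using Unitful

# Hypothetical stand-in for iris[:PetalLength]; the values are illustrative.
petal_length = [1.4, 1.3, 4.7] .* u"cm"

# Dispatch on the exact type of a Float64 quantity in centimeters.
# Dividing cm by cm cancels the units, so a plain number comes back.
f(x::typeof(1.0u"cm")) = 160934.4u"cm" / x

counts = f.(petal_length)   # broadcast f over every element
```

Note that this method only accepts Float64 values in centimeters; f(1u"cm") (an Int) or f(1.0u"m") would raise a MethodError.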


#4

Thank you, I would not have thought to add the typeof() call in the type.

Interesting observation: dispatching on typeof(1.0u"cm") works.
When I just check typeof(1.0u"cm") in Juno, I get the long nested type:

Unitful.Quantity{Float64,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)},Unitful.FreeUnits{(Unitful.Unit{:Meter,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}(-2,1//1),),Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}}

but if I do print(typeof(1.0u"cm")) I get the much shorter Quantity{Float64, Dimensions:{𝐋}, Units:{cm}}

but if I try to dispatch on

function how_many_in_a_mile(x::Quantity{Dimensions:{𝐋}, Units:{cm}}) return 160934u"cm"/x end

I get a syntax error:

 syntax: { } vector syntax is discontinued
 in include_string(::String, ::String) at loading.jl:441
 in include_string(::String, ::String, ::Int64) at eval.jl:30
 in include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N}) at eval.jl:34
 in (::LastMain.Atom.##53#56{String,Int64,String})() at eval.jl:50
 in withpath(::LastMain.Atom.##53#56{String,Int64,String}, ::String) at utils.jl:30
 in withpath(::Function, ::String) at eval.jl:38
 in macro expansion at eval.jl:49 [inlined]
 in (::LastMain.Atom.##52#55{Dict{String,Any}})() at task.jl:60

which seems to be attributable to a syntax change between Julia 0.4 and 0.5 (I am working on 0.5.2).

Regardless, I am very excited about this package as a complement to the language. It led me to read about F#'s implementation of units of measure, and that seems like a really powerful ability to have in a language.


#5

For the record, f(x::typeof(1.0u"cm")) requires that the user provide input in centimeters (not any length unit) and as a Float64, which might be overly fussy. Presumably you just want to make sure the user provides a quantity with length units? (Your function can convert it to cm if necessary.)

A more flexible approach to dispatching is recorded here; briefly, it’s f(x::Unitful.Length).
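A minimal sketch of that idea, assuming a reasonably recent Unitful (the function name is borrowed from the original post; letting Unitful do the mile conversion is my own choice, not something prescribed above):

```julia
using Unitful

# Accept any length, whatever its unit or numeric type.
# The ratio mile/length is dimensionless, so converting it to NoUnits
# yields a plain number.
how_many_in_a_mile(x::Unitful.Length) = uconvert(Unitful.NoUnits, 1.0u"mi" / x)

how_many_in_a_mile(1.0u"cm")                # about 160934.4
how_many_in_a_mile(1u"inch")                # about 63360
how_many_in_a_mile.([1.4, 4.7] .* u"cm")    # works elementwise on a column too
```

With this signature a non-length argument such as 1.0u"s" has no matching method, so the mistake surfaces as a MethodError rather than as a wrong number.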


#6

I am imagining this being used where I am certain of the dimension ahead of time, so it needs to dispatch on units rather than on a dimension so that the function's behavior can change (see the example in the original post). But you bring up a really good point that I had not thought of: requiring Float64 may be overly fussy.

typeof(1u"cm")==typeof(1.0u"cm") #false
typeof(u"cm")==typeof(1.0u"cm") #false
typeof(u"cm")==typeof(1u"cm") #false

I guess I would need a unit structure to let it take a Number(?) in that first spot inside the Unitful.Quantity{ } occupied by the Float64?

Unitful.Quantity{Number,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)},Unitful.FreeUnits{(Unitful.Unit{:Meter,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}(-2,1//1),),Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}}

I am not sure I am thinking about this correctly, as I expected the first comparison to evaluate to true.
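For what it's worth, here is a small sketch of what seems to be going on, assuming Julia 0.6 or later for the <: syntax (the variable names are just for illustration). The values and units compare equal, but the types differ in their first parameter, and because Julia's type parameters are invariant, putting Number in that slot does not match Float64 either:

```julia
using Unitful

Qf = typeof(1.0u"cm")   # quantity type with a Float64 value
Qi = typeof(1u"cm")     # quantity type with an Int value

same_type   = Qf == Qi                          # false: first parameter differs
same_value  = 1u"cm" == 1.0u"cm"                # true: the values compare equal
same_unit   = unit(1u"cm") == unit(1.0u"cm")    # true

invariant   = Qf <: Quantity{Number}            # false: parameters are invariant
float_ok    = Qf <: Quantity{<:Number}          # true
int_ok      = Qi <: Quantity{<:Number}          # true
```

So a signature written with Quantity{Number, ...} would reject both values, while Quantity{<:Number, ...} accepts any numeric type in the first slot.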


#7

Yes, if you’re building a container then it’s advisable to try to give each element the same type, if possible. DataFrames is a bit of an exception, as it’s designed to hold heterogeneous types (since that’s so often necessary), but for most other container types you get major performance advantages if all elements can be of the same type. Here, that means both the Float64 and the u"cm".

However, for functions it generally (not always) makes more sense to be permissive about the inputs. Unitful poses special challenges because it’s such a sophisticated example of Julia’s type system. As you noted, Quantity has 3 type parameters: the element type (T), the dimensions (D), and the actual unit (U). To show how you might extract these parameters, let’s do a quick demo. I should acknowledge that Unitful contains its own type-manipulation utilities, some of which are far more sophisticated than these, but the point here is to build from the ground up.

So first some type utilities (using Julia 0.6 syntax):

using Unitful

# Implement the "main" operations on the types themselves
geteltype(::Type{Quantity{T,D,U}}) where {T,D,U} = T
getdimensions(::Type{Quantity{T,D,U}}) where {T,D,U} = D
getunits(::Type{Quantity{T,D,U}}) where {T,D,U} = U

# For instances, we can leverage the methods defined on types
geteltype(x::Quantity) = geteltype(typeof(x))
getdimensions(x::Quantity) = getdimensions(typeof(x))
getunits(x::Quantity) = getunits(typeof(x))

Now let’s use these to write a utility that might be handy for dispatch:

typeof_generalized(x::Quantity) = Quantity{<:Number, getdimensions(x), getunits(x)}

and then try it out:

julia> xfloat = 1.0u"cm"
1.0 cm

julia> xint = 1u"cm"
1 cm

julia> y = 1.0u"m"
1.0 m

julia> getdimensions(xfloat)
Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}

julia> getunits(xfloat)
Unitful.FreeUnits{(Unitful.Unit{:Meter,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}(-2, 1//1),),Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}

julia> getunits(y)
Unitful.FreeUnits{(Unitful.Unit{:Meter,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}(0, 1//1),),Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}

julia> foo(x::typeof_generalized(1.0u"cm")) = 1
foo (generic function with 1 method)

julia> foo(xfloat)
1

julia> foo(xint)
1

julia> foo(y)
ERROR: MethodError: no method matching foo(::Quantity{Float64, Dimensions:{𝐋}, Units:{m}})
Closest candidates are:
  foo(::Unitful.Quantity{#s1,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)},Unitful.FreeUnits{(Unitful.Unit{:Meter,Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}(-2, 1//1),),Unitful.Dimensions{(Unitful.Dimension{:Length}(1//1),)}}} where #s1<:Number) at REPL[16]:1

If you understand this, you’re now a master of Julia’s type system :wink:.
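To tie this back to the original question, here is one way the same trick might be used to switch behavior on the unit while staying permissive about the numeric type. This is only a sketch: any_number_of, Centimeters and Inches are hypothetical names, not part of Unitful.

```julia
using Unitful

# Hypothetical helper: keep the dimension and unit parameters of a quantity
# type, but allow any numeric type in the first slot.
any_number_of(::Type{Quantity{T,D,U}}) where {T,D,U} = Quantity{<:Number,D,U}

const Centimeters = any_number_of(typeof(1.0u"cm"))
const Inches      = any_number_of(typeof(1.0u"inch"))

# One method per unit; dispatch picks the right one, with no if statement.
how_many_in_a_mile(x::Centimeters) = 160934.4u"cm" / x
how_many_in_a_mile(x::Inches)      = 63360u"inch" / x

how_many_in_a_mile(1u"cm")       # Int input is accepted
how_many_in_a_mile(2.0u"inch")   # Float64 input is accepted
```

A length in any other unit, such as 1.0u"m", matches neither method and raises a MethodError, which is the intended strictness in this design.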