How to create a DataFrame from code literals, especially with Unitful?

Hi all! I’m working with some physical quantities and I’d like to use Unitful.jl for that. I am wondering two things:

  • what is the most easily maintainable and/or efficient way to create a DataFrame from code literals (i.e. straight from Julia code, no csv or imports)?
  • how to do this when a column has a type of some Quantity (for instance, u"cm")?
  • bonus: how to avoid the type ending up having Unions or optional (like Float64?, iiuc?) values

As an example, I managed to do this:

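using DataFrames, Unitful

df = DataFrame(Time = String[], Measure = Float64[])
push!(df, ["09:29", 1.1])
push!(df, ["09:30", 1.2])
push!(df, ["09:31", 1.1])
# [...]
# attach the unit afterwards; this replaces the Float64 column with a Quantity column
df[!, :Measure] .*= u"cm"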

but I feel like there is a better way. Thanks!


I’m not sure if I really understand the question you’re asking, but is something like this helpful?

julia> using DataFrames, Unitful 

julia> df = DataFrame([(; Time="09:29", Measure=1.1u"cm"),
                       (; Time="09:30", Measure=1.2u"cm"),
                       (; Time="09:31", Measure=1.1u"cm")])
3×2 DataFrame
 Row │ Time    Measure   
     │ String  Quantity… 
─────┼───────────────────
   1 │ 09:29      1.1 cm
   2 │ 09:30      1.2 cm
   3 │ 09:31      1.1 cm

Expanding on @Mason’s answer, building the table column by column instead of row by row also works:

df = DataFrame(
    Time = ["09:29", "09:30", "09:31"],
    Measure = [1.1u"cm", 1.2u"cm", 1.1u"cm"]
)

3×2 DataFrame
 Row │ Time    Measure   
     │ String  Quantity… 
─────┼───────────────────
   1 │ 09:29      1.1 cm
   2 │ 09:30      1.2 cm
   3 │ 09:31      1.1 cm

Yeah, lots of ways to skin the cat here. You can also avoid repeating the names if you want:

julia> DataFrame([
               ("09:29", 1.1u"cm"),
               ("09:30", 1.2u"cm"),
               ("09:31", 1.1u"cm")
           ],
           ["Time", "Measure"])
3×2 DataFrame
 Row │ Time    Measure
     │ String  Quantity…
─────┼─────────────────────
   1 │ 09:29        1.1 cm
   2 │ 09:30        1.2 cm
   3 │ 09:31        1.1 cm

Yes, you understood correctly. Thank you for your answer, I wasn’t aware of this DataFrame constructor. I was wondering, however, whether this approach could be made more efficient, given that it creates a new NamedTuple for each data entry. On the other hand, I guess the compiler can optimize this away?

Yes, that’s right, the compiler will optimize away the NamedTuple construction. Now that you mention it, though, it won’t optimize away the splitting of the array of rows into one array of Time values and one array of Measure values, so from that point of view, @JonasWickman’s suggestion is likely more efficient.
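
For anyone who wants to check this on their own data, here is a minimal sketch, assuming DataFrames.jl 1.x and Unitful.jl are installed. It builds the same table row-first and column-first, confirms the two results are equal, and inspects the element type of the Measure column. The variable names `rows`, `df_rows` and `df_cols` are only for illustration. On the bonus question: a `?` in the printed header (e.g. `Float64?`) stands for `Union{Missing, Float64}`, so a column built from literals without any `missing` should not show it.

```julia
using DataFrames, Unitful

# Row-first: a vector of NamedTuples, one per measurement
rows = [(; Time="09:29", Measure=1.1u"cm"),
        (; Time="09:30", Measure=1.2u"cm"),
        (; Time="09:31", Measure=1.1u"cm")]
df_rows = DataFrame(rows)

# Column-first: one vector per column; the unit is attached once to the whole vector
df_cols = DataFrame(Time = ["09:29", "09:30", "09:31"],
                    Measure = [1.1, 1.2, 1.1] .* u"cm")

# Both routes should give the same table
same_table = df_rows == df_cols

# The column element type should be a single concrete Quantity type,
# not a Union and not Any
measure_type = eltype(df_cols.Measure)
is_concrete = isconcretetype(measure_type)
```

This says nothing about which route is faster; timing both with a benchmarking tool on realistically sized data would be the way to settle that.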


By the way, your times can also be properly semantic objects with units if you use the Dates standard library:

julia> using Dates, DataFrames, Unitful

julia> df = DataFrame(
           Time = [Time("09:29"), Time("09:30"), Time("09:31")],
           Measure = [1.1u"cm", 1.2u"cm", 1.1u"cm"]
       )
3×2 DataFrame
 Row │ Time      Measure   
     │ Time      Quantity… 
─────┼─────────────────────
   1 │ 09:29:00     1.1 cm
   2 │ 09:30:00     1.2 cm
   3 │ 09:31:00     1.1 cm

and then you can do things like:

julia> diff(df.Time)
2-element Vector{Nanosecond}:
 60000000000 nanoseconds
 60000000000 nanoseconds

Yes, I think the same. Anyway, thanks!

@JonasWickman your solution is probably the most efficient, but I think keeping it row-first helps readability, given that I want to check these data from time to time. Anyway, I’ll keep that in mind, thanks.


I didn’t realize this either, thanks! I guess it’s a matter of whether to keep the schema in the tuples (more robust if you accidentally invert some data), or as an external parameter (which you can share among different DataFrames to make sure they all have a consistent interface). Actually, I would have expected there to be a constructor that accepts the header names first and then the data, given that this is the order in which you normally see them listed. Possibly I can think of a macro to provide names, types and data all at once, since I use this style quite a lot.
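
Before reaching for a macro, a plain function may already cover the header-first style. The following is only a sketch: `rows_to_dataframe` is a hypothetical helper, not something DataFrames.jl provides, and it assumes every row is a tuple with one entry per header name. It collects one vector per column and passes them to the `DataFrame(columns, names)` constructor.

```julia
using DataFrames, Unitful

# Hypothetical helper (not part of DataFrames.jl): header first, then row tuples.
function rows_to_dataframe(header::Vector{Symbol}, rows::Tuple...)
    # Fail early if a row has the wrong number of entries
    all(row -> length(row) == length(header), rows) ||
        throw(ArgumentError("every row must have $(length(header)) entries"))
    # Collect one vector per column; element types come from the values themselves
    columns = [[row[i] for row in rows] for i in eachindex(header)]
    return DataFrame(columns, header)
end

df = rows_to_dataframe([:Time, :Measure],
                       ("09:29", 1.1u"cm"),
                       ("09:30", 1.2u"cm"),
                       ("09:31", 1.1u"cm"))
```

The header vector can be stored in a constant and reused across several tables, which gives the shared-schema behaviour mentioned above while keeping the data row-first for readability.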
