DataFrames.jl (Tables.jl) is smart enough to consume my array of custom types...love it

First, I want to say that I love that this just works:

using DataFrames
using HTTP
using JSON
using Unmarshal

const appid = "YOUR_APPID"  # replace with your own appid
const url = "https://api.openweathermap.org/data/2.5/onecall"

params = Dict(
    "appid" => appid,
    "lat" => 30.33,
    "lon" => -87.14,
    "units" => "imperial",
    "exclude" => "minutely,hourly,alerts,current"
)

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end

struct Response
    lat::Float64
    lon::Float64
    daily::Vector{DayWeather}
end

function fetch_weather(url, params)
    res = HTTP.get(
        url,
        ["Content-Type" => "application/json"],
        query=params
    )
    return unmarshal(Response, JSON.parse(String(res.body)))
end

julia> df = DataFrame(fetch_weather(url, params).daily)
8×3 DataFrame
 Row │ temp                humidity  wind_speed 
     │ Temp                Int64     Float64    
─────┼──────────────────────────────────────────
   1 │ Temp(78.42, 84.4)         76        9.91
   2 │ Temp(78.04, 87.13)        57       10.71
   3 │ Temp(79.18, 87.04)        58       13.42
   4 │ Temp(80.85, 86.72)        61       11.41
   5 │ Temp(81.84, 87.55)        61       12.06
   6 │ Temp(82.51, 87.15)        63        9.51
   7 │ Temp(82.36, 87.87)        58       11.32
   8 │ Temp(82.51, 87.82)        60       10.42

DataFrames just knows how to consume my custom struct without me having to implement the Tables.jl interface… :slightly_smiling_face: :clap:

I do have a question though about how to design this code…I would like the temp field in my DayWeather struct to just hold the max temp from my Temp struct. Obviously I can’t define it like this:

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Temp.max # this won't work
    humidity::Int64
    wind_speed::Float64
end

so what’s the proper way to go about this?


I think what you want is

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Float64
    humidity::Int64
    wind_speed::Float64
end
DayWeather(t::Temp, h, w) = DayWeather(t.max, h, w)

This doesn’t appear to work:

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end

DayWeather(t::Temp, h, w) = DayWeather(t.max, h, w)

julia> day = DayWeather(Temp(1.0,2.0), 1, 1.0)
DayWeather(Temp(1.0, 2.0), 1, 1.0)
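
A possible explanation for the result above, as a minimal sketch: Julia calls the most specific matching method, and the default constructor generated for the struct, which takes (Temp, Int64, Float64), is more specific than the outer (Temp, h, w) method, so the outer one is never reached for these argument types. The names NestedDay and FlatDay are hypothetical and exist only to show both cases side by side.

```julia
struct Temp
    min::Float64
    max::Float64
end

# Hypothetical variant whose first field is still a Temp
struct NestedDay
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end
NestedDay(t::Temp, h, w) = NestedDay(t.max, h, w)

# Hypothetical variant whose first field is a Float64
struct FlatDay
    temp::Float64
    humidity::Int64
    wind_speed::Float64
end
FlatDay(t::Temp, h, w) = FlatDay(t.max, h, w)

# The default (Temp, Int64, Float64) constructor matches exactly, so the outer method is skipped
nested = NestedDay(Temp(1.0, 2.0), 1, 1.0)

# Here the outer method is the most specific match for a Temp first argument
flat = FlatDay(Temp(1.0, 2.0), 1, 1.0)
```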

EDIT: this does work:

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Float64 # <--- had to change this
    humidity::Int64
    wind_speed::Float64
end

DayWeather(t::Temp, h, w) = DayWeather(t.max, h, w)

julia> day = DayWeather(Temp(1.0,2.0), 1, 1.0)
DayWeather(2.0, 1, 1.0)

Uh oh, this breaks the ability to unmarshal from the JSON response to my custom structs though:

julia> df = DataFrame(fetch_weather(url, params).daily)
ERROR: MethodError: no method matching Float64()
Closest candidates are:
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at char.jl:50
  (::Type{T})(::Base.TwicePrecision) where T<:Number at twiceprecision.jl:243
  (::Type{T})(::Complex) where T<:Real at complex.jl:37
  ...
Stacktrace:
 [1] unmarshal(DT::Type, parsedJson::Dict{String, Any}, verbose::Bool, verboseLvl::Int64)
   @ Unmarshal ~\.julia\packages\Unmarshal\pNr0C\src\Unmarshal.jl:135

:pensive:

For now, I’ll just do this:

df = DataFrame(fetch_weather(url, params).daily)

df.temp = [t.max for t in df.temp]
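
A non-mutating variant of the same idea, as a rough sketch: select with ByRow builds a new data frame and leaves the original temp column alone. The small df here is a hand-built stand-in for the API result, using two rows from the output earlier in the thread, and the column name tmax is just a choice for illustration.

```julia
using DataFrames

struct Temp
    min::Float64
    max::Float64
end

# Hand-built stand-in for DataFrame(fetch_weather(url, params).daily)
df = DataFrame(
    temp = [Temp(78.42, 84.4), Temp(78.04, 87.13)],
    humidity = [76, 57],
    wind_speed = [9.91, 10.71],
)

# Pull the max out of each Temp row by row and keep the other columns as they are
flat = select(df, :temp => ByRow(t -> t.max) => :tmax, :humidity, :wind_speed)
```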

I would maybe make two structs, one for holding the stuff, which includes all the nesting, and another for what you actually want to work with, which just examines the day.
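
One way to read the two-struct suggestion, as a sketch only: keep the nested structs for deserialization and convert to a flat struct afterwards. DaySummary and its field names are hypothetical, and the sample values are taken from the output earlier in the thread.

```julia
using DataFrames

struct Temp
    min::Float64
    max::Float64
end

# Mirrors the JSON layout, so automatic deserialization keeps working
struct DayWeather
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end

# Hypothetical flat struct holding only what the analysis needs
struct DaySummary
    maxtemp::Float64
    humidity::Int64
    wind_speed::Float64
end

# Conversion from the nested form to the flat form
DaySummary(d::DayWeather) = DaySummary(d.temp.max, d.humidity, d.wind_speed)

raw = [DayWeather(Temp(78.42, 84.4), 76, 9.91), DayWeather(Temp(78.04, 87.13), 57, 10.71)]
df = DataFrame(DaySummary.(raw))
```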

Also, I would question why you need temp to be a property in the struct. Why not define maxtemp(d::DayWeather) = d.temp.max or similar?
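
A short sketch of the accessor idea, assuming the nested DayWeather struct stays as originally defined. The function name maxtemp follows the suggestion above, and the sample values come from the output earlier in the thread.

```julia
using DataFrames

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end

# Accessor: the struct keeps the full Temp, callers ask for the part they want
maxtemp(d::DayWeather) = d.temp.max

days = [DayWeather(Temp(78.42, 84.4), 76, 9.91), DayWeather(Temp(78.04, 87.13), 57, 10.71)]

df = DataFrame(days)
df.maxtemp = maxtemp.(days)  # broadcast the accessor over the vector of days
```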

Is the issue you really want the automatic destructuring to work?


Yes, was hoping to be able to automatically deserialize the JSON response to my custom types and the JSON response that I get back from the API is like this:

{
    .
    .
    .
    "humidity": 76,
    "dew_point": 74.16,
    "wind_speed": 9.91,
    "temp": {
       "max": 87.5,
       "min": 77.6
     },
    .
    .
    .
}
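
For comparison, a sketch of skipping the struct mapping and picking the wanted keys straight out of the parsed JSON. This swaps in plain JSON.parse plus a NamedTuple instead of Unmarshal. The helper flatten_day is hypothetical, and the sample string is a trimmed, hand-written version of the entry shown above.

```julia
using DataFrames
using JSON

# Trimmed sample shaped like one daily entry from the response above
sample = """
[
  {"humidity": 76, "dew_point": 74.16, "wind_speed": 9.91, "temp": {"max": 87.5, "min": 77.6}}
]
"""

# Hypothetical helper: keep only the wanted keys, reaching into the nested "temp" object
flatten_day(d) = (tmax = d["temp"]["max"], humidity = d["humidity"], wind_speed = d["wind_speed"])

days = JSON.parse(sample)
df = DataFrame(flatten_day.(days))
```

The trade-off is that the typed structs and their validation are gone, so this fits best when only a few fields are needed.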

Is there any issue in doing it this way (inspired by this post):

DayWmax(t::DayWeather) = (tmax = t.temp.max, humidity = t.humidity, windspeed = t.wind_speed)  # accessor function

df = DataFrame(DayWmax.(fetch_weather(url, params).daily))

8×3 DataFrame
 Row │ tmax     humidity  windspeed 
     │ Float64  Int64     Float64   
─────┼──────────────────────────────
   1 │   83.97        74       7.9
   2 │   86.41        58       9.42
   3 │   86.36        64      12.71
   4 │   84.34        67      11.34
   5 │   85.37        69       9.69
   6 │   86.45        66       9.51
   7 │   87.64        59       9.42
   8 │   87.93        60      10.85

Besides not getting column names (which are easy enough to add), that’s a nice solution :slightly_smiling_face:


Fixed above?


Maybe this works?

julia> struct B
           bx
           by
       end;

julia> struct A
           ax::Int
           ay::B
       end;

julia> nested_vec = [A(rand(1:5), B(rand(), rand())) for i in 1:10];

julia> df = DataFrame(nested_vec)
10×2 DataFrame
 Row │ ax     ay                       
     │ Int64  B                        
─────┼─────────────────────────────────
   1 │     3  B(0.522192, 0.260377)
   2 │     4  B(0.956823, 0.148446)
   3 │     1  B(0.78614, 0.448788)
   4 │     2  B(0.806301, 0.886143)
   5 │     5  B(0.00277471, 0.0113921)
   6 │     3  B(0.268688, 0.267194)
   7 │     1  B(0.282506, 0.515099)
   8 │     1  B(0.380532, 0.981713)
   9 │     1  B(0.438347, 0.931338)
  10 │     4  B(0.88685, 0.947232)

julia> transform(df, :ay => Tables.columns => AsTable)
10×4 DataFrame
 Row │ ax     ay                        bx          by        
     │ Int64  B                         Float64     Float64   
─────┼────────────────────────────────────────────────────────
   1 │     3  B(0.522192, 0.260377)     0.522192    0.260377
   2 │     4  B(0.956823, 0.148446)     0.956823    0.148446
   3 │     1  B(0.78614, 0.448788)      0.78614     0.448788
   4 │     2  B(0.806301, 0.886143)     0.806301    0.886143
   5 │     5  B(0.00277471, 0.0113921)  0.00277471  0.0113921
   6 │     3  B(0.268688, 0.267194)     0.268688    0.267194
   7 │     1  B(0.282506, 0.515099)     0.282506    0.515099
   8 │     1  B(0.380532, 0.981713)     0.380532    0.981713
   9 │     1  B(0.438347, 0.931338)     0.438347    0.931338
  10 │     4  B(0.88685, 0.947232)      0.88685     0.947232
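
A related sketch, in case the nested column should disappear after expansion: ByRow with a NamedTuple return value also works with AsTable, and select with Not drops the original column. The structs use typed fields and fixed arbitrary values instead of rand() so the result is reproducible; those values are not from the thread.

```julia
using DataFrames

struct B
    bx::Float64
    by::Float64
end

struct A
    ax::Int
    ay::B
end

# Fixed arbitrary values in place of rand(), for reproducibility
nested_vec = [A(3, B(0.5, 0.25)), A(4, B(0.75, 0.125))]
df = DataFrame(nested_vec)

# Turn each B into a NamedTuple; AsTable spreads its fields into new columns
wide = transform(df, :ay => ByRow(b -> (bx = b.bx, by = b.by)) => AsTable)

# Drop the nested column once it has been expanded
flat = select(wide, Not(:ay))
```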

To be fair, the credit for this behaviour goes to Tables.jl, which considers any vector of structs to be a table with columns being fields.
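
A small offline sketch of that behaviour, with no API call needed: a plain vector of structs goes straight into DataFrame, and the column names match the struct's field names. The two sample rows reuse values from the output at the top of the thread.

```julia
using DataFrames

struct Temp
    min::Float64
    max::Float64
end

struct DayWeather
    temp::Temp
    humidity::Int64
    wind_speed::Float64
end

days = [DayWeather(Temp(78.42, 84.4), 76, 9.91), DayWeather(Temp(78.04, 87.13), 57, 10.71)]

# No Tables.jl methods are defined for DayWeather; the vector is consumed as rows
df = DataFrame(days)

# Column names line up with the struct's field names
same_names = names(df) == collect(string.(fieldnames(DayWeather)))
```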


That’s very nice. I assumed I would have to implement the Tables interface for my structs but before I went down that road I decided to just try passing my vector into DataFrame and I immediately felt as if I’d just stumbled upon a $100 bill walking down the street lol…was very happy that it just worked…I’m really glad the Tables.jl devs made that decision!