I’d like to query a row of a DataFrame and return the result as a Dict. I was able to do it, but the code isn’t pretty (Python programmer converting to Julia here). Here is an MWE:
Input DataFrame:
DataFrame(name=["John", "Sally", "Roger"],
age=[54., 34., 79.],
children=[0, 2, 4])
3×3 DataFrame
│ Row │ name │ age │ children │
│ │ String │ Float64 │ Int64 │
├─────┼────────┼─────────┼──────────┤
│ 1 │ John │ 54.0 │ 0 │
│ 2 │ Sally │ 34.0 │ 2 │
│ 3 │ Roger │ 79.0 │ 4 │
Desired output:
Dict{String,Real} with 2 entries:
"age" => 54.0
"children" => 0
Here is my code:
using DataFrames, Query
df = DataFrame(name=["John", "Sally", "Roger"],
age=[54., 34., 79.],
children=[0, 2, 4])
john = @from i in df begin
@where i.name == "John"
@select {i.age, i.children}
@collect DataFrame
end
# Query the first row of the returned dataframe
john = john[1,:]
johnDict = Dict("age" => john.age, "children" => john.children)
Thanks in advance! I’m new to data-wrangling in Julia.
Dict(names(john) .=> values(john))
Why do you need a dict, though? A DataFrameRow can do everything a dict can.
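To make that concrete, here is a minimal sketch, assuming a reasonably recent DataFrames.jl (1.x, where names returns strings). The variable names row, d_str and d_sym are just for illustration.

```julia
using DataFrames

df = DataFrame(name=["John", "Sally", "Roger"],
               age=[54., 34., 79.],
               children=[0, 2, 4])

# A DataFrameRow is a view into one row of the parent DataFrame
row = df[1, [:age, :children]]

# Dict-like access works directly on the row
row.age             # field-style access
row[:children]      # indexing by column name
haskey(row, :age)   # membership check, as with a Dict

# If a real Dict is still needed:
d_str = Dict(names(row) .=> values(row))  # String keys
d_sym = Dict(pairs(row))                  # Symbol keys
```

The two conversions differ only in key type, so which one to prefer depends on whether the rest of the code looks values up by String or by Symbol.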
Well, true. Because I’m trying to learn the syntax I guess (and I hate being stumped).
Thanks for help. Is there a one-liner for the DataFrame query as well? What I have seems excessive.
Thanks again, Sam
I’m not too familiar with Query, but that seems like the correct code
You can call first() on a DataFrame to get the first row.
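As a rough sketch of how that shortens the original query, the following uses only DataFrames.jl (assuming a 1.x release). The names john, idx and john2 are illustrative, and both variants assume at least one row matches.

```julia
using DataFrames

df = DataFrame(name=["John", "Sally", "Roger"],
               age=[54., 34., 79.],
               children=[0, 2, 4])

# Keep only the matching rows, take the first one, then pick the columns.
# first() throws an error if the filtered DataFrame is empty.
john = first(filter(:name => ==("John"), df))[[:age, :children]]

# Alternative that avoids building an intermediate DataFrame:
idx = findfirst(==("John"), df.name)   # `nothing` if there is no match
john2 = df[idx, [:age, :children]]
```

Both results are DataFrameRow objects, so john.age and john.children work on either one.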
I don’t use Query much either but that does seem correct. You could also use Underscores.jl to pipe things from start to finish as well (you use two underscores to refer to the table). It isn’t really any shorter though.
using DataFrames, Underscores
df = DataFrame(name=["John", "Sally", "Roger"],
age=[54., 34., 79.],
children=[0, 2, 4])
john = @_ df |> filter(:name => isequal("John"),__) |> select(__,:age,:children) |> first
To add on, here is a solution using just DataFrames and Pipe:
julia> df = DataFrame(name=["John", "Sally", "Roger"],
age=[54., 34., 79.],
children=[0, 2, 4])
3×3 DataFrame
│ Row │ name │ age │ children │
│ │ String │ Float64 │ Int64 │
├─────┼────────┼─────────┼──────────┤
│ 1 │ John │ 54.0 │ 0 │
│ 2 │ Sally │ 34.0 │ 2 │
│ 3 │ Roger │ 79.0 │ 4 │
julia> using Pipe
julia> @pipe df |>
filter(r -> r.name == "John", _) |>
select(_, :age, :children) |>
first(_)
DataFrameRow
│ Row │ age │ children │
│ │ Float64 │ Int64 │
├─────┼─────────┼──────────┤
│ 1 │ 54.0 │ 0 │