Hi, I was trying to practice some simple data cleaning with Query.jl by slightly modifying an example from the documentation.
What I intended to do was replace all the missing values in a column with the mean of that column.
using Queryverse
using Statistics
df = DataFrame(a=[1,2,missing], b=["One",missing,"Three"])
myMean = mean(skipmissing(df.a))
q = df |> @replacena (:b=>"Unknown", :a=>myMean) |> DataFrame
Which triggered an error.
julia> q = df |> @replacena (:b=>"Unknown", :a=>myMean) |> DataFrame
ERROR: UndefVarError: myMean not defined
So I decided to try another Query command to see whether it was expected to work that way.
julia> df |> @mutate (c = myMean)
3x3 query result
a │ b │ c
────┼─────────┼────
1 │ "One" │ 1.5
2 │ #NA │ 1.5
#NA │ "Three" │ 1.5
And it did.
Finally, I decided to try without a variable, calling directly the function that returns the value I wanted to replace the missing values with.
q = df |> @replacena (:b=>"Unknown", :a=>mean(skipmissing(_.a))) |> DataFrame
Still got an error.
julia> q = df |> @replacena (:b=>"Unknown", :a=>mean(skipmissing(_.a))) |> DataFrame
ERROR: UndefVarError: mean not defined
Any help will be appreciated.
oheil
August 23, 2020, 9:19am
2
I can only guess: despite your fine MWE, Queryverse doesn’t compile in my environment (python36.dll missing).
The docs
https://www.queryverse.org/Query.jl/stable/standalonequerycommands/#The-@replacena-command-1
aren’t explicit but talk only about values. So my guess is that
q = df |> @replacena(:b=>"Unknown", :a=> $myMean ) |> DataFrame
could do.
https://docs.julialang.org/en/v1/manual/metaprogramming/#man-expression-interpolation-1
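For reference, here is a minimal sketch of what that manual section describes, in plain Julia and outside of any Query macro. The variable names are only illustrative, and it says nothing about how @replacena itself processes its arguments. Inside a quoted expression, `$x` splices in the current value of `x`; a `$` that is not inside a quote is rejected with a `"$" expression outside quote` syntax error.

```julia
x = 1.5

# Interpolating into a larger quoted expression splices the value in.
ex = :(1 + $x)
println(ex)        # prints: 1 + 1.5
println(eval(ex))  # prints: 2.5

# Quoting nothing but an interpolation yields the value itself,
# not an Expr wrapping it.
v = :( $x )
println(v)             # prints: 1.5
println(v isa Float64) # prints: true
```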
Tried with
julia> q = df |> @replacena(:b=>"Unknown", :a=> $myMean ) |> DataFrame
Got
ERROR: syntax: "$" expression outside quote around C:\Users\RoniD\.julia\packages\Query\85Sw7\src\query_translation.jl:58
Also tried with
julia> q = df |> @replacena(:b=>"Unknown", :a=> "$myMean") |> DataFrame
And got another error
ERROR: UndefVarError: myMean not defined
oheil
August 23, 2020, 5:09pm
4
Ok, I had to remove my .julia to get a working Queryverse#master. This is the working syntax:
julia> using DataFrames, Queryverse, Statistics
julia> df = DataFrame(a=[1,2,missing], b=["One",missing,"Three"])
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ String? │
├─────┼─────────┼─────────┤
│ 1 │ 1 │ One │
│ 2 │ 2 │ missing │
│ 3 │ missing │ Three │
julia> myMean = mean(skipmissing(df.a))
1.5
julia> q = df |> @replacena(:b=>"Unknown", :a => :( $myMean ) ) |> DataFrame
3×2 DataFrame
│ Row │ a │ b │
│ │ Any │ Any │
├─────┼─────┼─────────┤
│ 1 │ 1 │ One │
│ 2 │ 2 │ Unknown │
│ 3 │ 1.5 │ Three │
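As a cross-check that does not go through Query at all, the same replacement can be sketched with Base's `coalesce`, which returns its first non-missing argument. This is an alternative technique, not what @replacena does internally, and the plain vectors `a` and `b` below are stand-ins assumed for the columns `df.a` and `df.b`.

```julia
using Statistics

# Stand-ins for the two DataFrame columns from the example.
a = [1, 2, missing]
b = ["One", missing, "Three"]

# Mean of the non-missing entries of a.
myMean = mean(skipmissing(a))

# Broadcast coalesce over each column: missing entries take the fallback value.
a_filled = coalesce.(a, myMean)
b_filled = coalesce.(b, "Unknown")

println(a_filled)
println(b_filled)
```

With a DataFrame, the same calls should apply column by column, e.g. `df.a = coalesce.(df.a, myMean)`.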
Thanks!
It does work now.