I am currently playing around with Julia, coming from Python, and so far I am really thrilled. But today I stumbled over a bug I don’t understand.
Given that I am very new to Julia, I have copied in the entire code, as I am unsure where the bug actually sits.
With the code below, I intend to take a string formula and (eventually) use it for a GLM model.
However, my function parser doesn’t return anything (it is only a dummy function so far). Can anyone help me and explain what is wrong with the return statement in the function parser?
using DataFrame, CSV, GLM
function extract(formula::String)
    strY, strX = [strip(el) for el in split(formula, "~")]
    y = Symbol(strY)
    X = [Symbol(strip(el)) for el in split(strX, "+")]
    vcat(y, X)
end;
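function parser(cs::Array{Symbol,1}, df::DataFrame)
    # df[:, cols] selects columns; plain df[cols] is not supported by DataFrames.jl 1.x
    df[:, [cs[1]]], df[:, cs[2:end]]
end
df = CSV.read("Credit.csv", DataFrame)  # recent CSV.jl versions require the sink argument
cols = extract("Rating ~ Income + Education")
parser(cols, df) # Does not return anything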
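For comparison, the string-splitting step on its own needs only Base Julia. The sketch below is a hypothetical, slightly more defensive variant of extract (the name extract_checked is made up for illustration). It assumes formulas of the form "y ~ x1 + x2" and rejects malformed input with an ArgumentError instead of failing later with a destructuring error.

```julia
# Hypothetical helper (not from the original post): parse "y ~ x1 + x2 + ..."
# into a Vector{Symbol}, response first, rejecting malformed input.
function extract_checked(formula::AbstractString)
    parts = split(formula, "~")
    length(parts) == 2 ||
        throw(ArgumentError("expected exactly one '~' in: $formula"))
    lhs, rhs = strip.(parts)
    isempty(lhs) && throw(ArgumentError("missing response in: $formula"))
    preds = [strip(p) for p in split(rhs, "+")]
    any(isempty, preds) && throw(ArgumentError("empty predictor in: $formula"))
    # Same layout as the original extract(): response, then predictors
    return vcat(Symbol(lhs), Symbol.(preds))
end

cols = extract_checked("Rating ~ Income + Education")
# → [:Rating, :Income, :Education]
```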
I tried your code with a mock DataFrame and it appears to do what you want:
using DataFrames
function extract(formula::String)
    strY, strX = [strip(el) for el in split(formula, "~")]
    y = Symbol(strY)
    X = [Symbol(strip(el)) for el in split(strX, "+")]
    vcat(y, X)
end;
df = DataFrame(Rating = 1:4, Income = ["M", "F", "F", "M"], Education = 6:9)
4×3 DataFrame
│ Row │ Rating │ Income │ Education │
│ │ Int64 │ String │ Int64 │
├─────┼────────┼────────┼───────────┤
│ 1 │ 1 │ M │ 6 │
│ 2 │ 2 │ F │ 7 │
│ 3 │ 3 │ F │ 8 │
│ 4 │ 4 │ M │ 9 │
julia> function parser(cs::Array{Symbol,1}, df::DataFrame)
           df[:, [cs[1]]], df[:, cs[2:end]]
       end
parser (generic function with 1 method)
julia> cols = extract("Rating ~ Income + Education")
3-element Array{Symbol,1}:
:Rating
:Income
:Education
julia> parser(cols, df)
(4×1 DataFrame
│ Row │ Rating │
│ │ Int64 │
├─────┼────────┤
│ 1 │ 1 │
│ 2 │ 2 │
│ 3 │ 3 │
│ 4 │ 4 │, 4×2 DataFrame
│ Row │ Income │ Education │
│ │ String │ Int64 │
├─────┼────────┼───────────┤
│ 1 │ M │ 6 │
│ 2 │ F │ 7 │
│ 3 │ F │ 8 │
│ 4 │ M │ 9 │)
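On DataFrames.jl 1.x, the same split can also be written with select, which always returns a DataFrame. This is a sketch, not the poster's code: parser_nt is a hypothetical name, and returning a NamedTuple is a design choice that lets the caller write either parts.y or y, X = parts.

```julia
using DataFrames

# Hypothetical variant of parser(): response column first, predictors after,
# returned as a NamedTuple so the pieces can be accessed by name or destructured.
function parser_nt(cs::Vector{Symbol}, df::DataFrame)
    y = select(df, cs[1])        # new DataFrame with only the response column
    X = select(df, cs[2:end])    # new DataFrame with the predictor columns
    return (y = y, X = X)
end

df = DataFrame(Rating = 1:4, Income = ["M", "F", "F", "M"], Education = 6:9)
parts = parser_nt([:Rating, :Income, :Education], df)
size(parts.y)   # (4, 1)
size(parts.X)   # (4, 2)
```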
One thing I noticed is that you are using DataFrame instead of DataFrames; is that intentional? Do you see any error? What is the output of the two lines before your call to parser?
The returned value has type Tuple{DataFrame,DataFrame}, so it appears that the output is just not displayed properly. If I use the result in a different cell, the output is correct:
Cell 1:
y, X = parser(cols, df) # output is (, )
Cell 2:
y # output is the correct dataframe
Cell 3:
X # output is the correct dataframe
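Nothing is wrong with the return statement itself: a Julia function returns the value of its last expression, and a, b on that line builds a Tuple, which is exactly what Cell 1 destructures. A Base-only sketch, using a hypothetical first_and_rest in place of the DataFrame indexing:

```julia
# The last expression is returned implicitly; `a, b` builds a 2-tuple.
function first_and_rest(v::Vector{Symbol})
    v[1], v[2:end]    # returned as a Tuple{Symbol, Vector{Symbol}}
end

result = first_and_rest([:Rating, :Income, :Education])

# Destructuring assigns the tuple's elements in order
y, X = result
# y == :Rating, X == [:Income, :Education]
```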
I’m just unsure whether it is something I am doing wrong or whether it is actually a bug in Pluto (which, by the way, is the reason I am playing around with Julia).
@pdeffebach: yes, I know that I don’t need a string. My idea was to use functionality/logic similar to patsy in Python or glm in R, where you can pass a string formula to your df/model. Thanks for the link anyway; I will have a look at it.
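For the patsy-style goal, one option is to build a StatsModels formula from the string at run time with term() instead of the @formula macro. A minimal sketch, assuming StatsModels.jl is installed (GLM.jl builds its formula support on it); formula_from_string is a hypothetical helper and handles only plain "y ~ x1 + x2" strings, not interactions or function calls.

```julia
using StatsModels   # provides term(), FormulaTerm and the ~ / + operators on terms

# Hypothetical helper: turn "Rating ~ Income + Education" into a FormulaTerm
# at run time, without going through the @formula macro.
function formula_from_string(s::AbstractString)
    lhs, rhs = strip.(split(s, "~"))
    response = term(Symbol(lhs))
    predictors = [term(Symbol(strip(p))) for p in split(rhs, "+")]
    # `+` combines terms; foldl works for one or several predictors
    return response ~ foldl(+, predictors)
end

f = formula_from_string("Rating ~ Income + Education")
# With GLM loaded, f could then be passed to a model, e.g. lm(f, df)
```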
I should mention that Pluto is the killer feature that had me test out Julia in the first place. I was forwarded the presentation on Pluto and was thrilled immediately. I still am.