Why does multiple return not work?

Hi all,

I am currently playing around with Julia coming from Python. So far I am really thrilled. But today I stumbled over a bug I don’t understand.

Since I am very new to Julia, I have pasted the entire code, as I am unsure where the bug actually sits.

With the code below I intend to take a formula string and (eventually) use it for a GLM model.

However, my function parser doesn’t return anything (it is only a dummy function so far). Can anyone help me and explain what is wrong with the return statement in the function parser?

using DataFrame, CSV, GLM

function extract(formula::String)
	strY, strX = [strip(el) for el in split(formula, "~")]
	y = Symbol(strY)
	X = [Symbol(strip(el)) for el in split(strX, "+")]
	vcat(y,X)
end;

function parser(cs::Array{Symbol,1}, df::DataFrame)
	df[[cs[1]]], df[cs[2:end]]
end

df = CSV.read("Credit.csv")
cols = extract("Rating ~ Income + Education")
parser(cols, df) # Does not return anything

Thanks

I tried your code with a mock DataFrame and it appears to do what you want:

using DataFrames

function extract(formula::String)
	strY, strX = [strip(el) for el in split(formula, "~")]
	y = Symbol(strY)
	X = [Symbol(strip(el)) for el in split(strX, "+")]
	vcat(y,X)
end;


df = DataFrame(Rating = 1:4, Income = ["M", "F", "F", "M"], Education = 6:9)
4×3 DataFrame
│ Row │ Rating │ Income │ Education │
│     │ Int64  │ String │ Int64     │
├─────┼────────┼────────┼───────────┤
│ 1   │ 1      │ M      │ 6         │
│ 2   │ 2      │ F      │ 7         │
│ 3   │ 3      │ F      │ 8         │
│ 4   │ 4      │ M      │ 9         │

julia> function parser(cs::Array{Symbol,1}, df::DataFrame)
               df[[cs[1]]], df[cs[2:end]]
       end
parser (generic function with 1 method)

julia> cols = extract("Rating ~ Income + Education")
3-element Array{Symbol,1}:
 :Rating
 :Income
 :Education

julia> parser(cols, df)
(4×1 DataFrame
│ Row │ Rating │
│     │ Int64  │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 2      │
│ 3   │ 3      │
│ 4   │ 4      │, 4×2 DataFrame
│ Row │ Income │ Education │
│     │ String │ Int64     │
├─────┼────────┼───────────┤
│ 1   │ M      │ 6         │
│ 2   │ F      │ 7         │
│ 3   │ F      │ 8         │
│ 4   │ M      │ 9         │)

One thing I noticed is that you are using DataFrame instead of DataFrames. Is that intentional? Do you see any error? What is the output of the two lines before your call to parser?


A couple of things that don’t answer your question but may help you later:

  1. CSV.read should error out as written, unless you are using a very old version. It would make sense to upgrade.
  2. The functions are overspecialized. There is no need for the type annotations unless you know exactly why you want them.
  3. You can replace the list comprehensions with broadcasting.
  4. New versions of DataFrames support Strings as column names and no longer support indexing with a single vector argument, as in df[cols].

So, your code can look like

using DataFrames, CSV, GLM

function extract(formula)
	strY, strX = strip.(split(formula, "~"))
	X = strip.(split(strX, "+"))
	vcat(strY, X)
end;

function parser(cs, df)
	df[!, [cs[1]]], df[!, cs[2:end]]
end

df = CSV.read("Credit.csv", DataFrame)
cols = extract("Rating ~ Income + Education")
parser(cols, df)
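
To illustrate point 3, here is a minimal sketch using only Base Julia (no packages). The variable names parts_comprehension and parts_broadcast are made up for this example; the formula string is the one from the original post.

```julia
formula = "Rating ~ Income + Education"

# Comprehension version, as in the original post
parts_comprehension = [strip(el) for el in split(formula, "~")]

# Broadcast version: the dot applies strip to every element of the split result
parts_broadcast = strip.(split(formula, "~"))

# Both approaches give the same elements
parts_comprehension == parts_broadcast
```

Both versions return substrings rather than Symbols, which is fine with recent DataFrames versions since they accept strings as column names.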

Thanks to both of you for your help. Very helpful.

I believe I found the bug. Running the file as a .jl file works, but when I run the code in Pluto it returns an empty tuple. Any idea why this could be?

What do you mean by empty tuple?

It just returns (, ).

What’s the type of the object? typeof(parser(cols, df))

I could have thought of that myself…thanks!

The type is Tuple{DataFrame,DataFrame}. It appears as if the output is just not properly formatted. If I use the output in a different cell the output is actually correct then.

Cell 1:
y, X = parser(cols, df) # output is (, )

Cell 2:
y # output is the correct dataframe

Cell 3:
X # output is the correct dataframe

Just unsure if it is something I do wrong or if it is actually a bug with Pluto (which is by the way the reason I am playing around with Julia).
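
For anyone checking the same thing, here is a small sketch in plain Julia of how a multiple return behaves. split_first is a hypothetical stand-in for parser, not code from this thread; it only shows that the comma builds a tuple and that destructuring works regardless of how the tuple is displayed.

```julia
# Hypothetical stand-in for parser: returns two values at once
function split_first(v)
    v[1], v[2:end]   # the comma builds a Tuple
end

result = split_first([10, 20, 30])

# The return value is an ordinary Tuple, whatever the notebook shows for it
is_tuple = result isa Tuple

# Destructuring assigns each element to its own variable
head, rest = result
```

If is_tuple is true and head and rest hold the expected values, the function itself is fine and only the display is off.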

I think it’s just printing. Check julia --version when you are running Pluto vs. when running via include, and do the same with import Pkg; Pkg.status().

Not sure what you mean. What is it I should exactly check?

My Julia version is 1.5.3. (type julia --version in terminal).

Against what should I compare this?

Hmmm I’m not totally sure. My impression is that this is just some edge case from printing that changed between versions. But I’m not sure.

BTW, you probably don’t need to be parsing a formula string at all. You can construct a formula programmatically with Symbols.

You can also access the matrices created from that formula via modelcols(f.rhs, df) and modelcols(f.lhs, df), where f is a formula that has had a schema applied.
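
A rough sketch of what that could look like, assuming StatsModels and DataFrames are installed. The data below is invented purely for illustration (it is not the Credit.csv file), and all columns are numeric to keep the example simple.

```julia
using DataFrames, StatsModels

# Made-up numeric data, only for illustration
df = DataFrame(Rating = [1.0, 2.0, 3.0, 4.0],
               Income = [10.0, 20.0, 15.0, 30.0],
               Education = [6.0, 7.0, 8.0, 9.0])

# Build the formula from Symbols instead of parsing a string
f = term(:Rating) ~ term(:Income) + term(:Education)

# Attach column types from the data, then materialize response and predictors
f = apply_schema(f, schema(f, df))
y, X = modelcols(f, df)
```

Calling modelcols on the whole formula returns the response and the predictors together, which is roughly what parser was meant to do.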

Thanks to all of you for your help!

@pdeffebach: yes, I know that I don’t need a string. My idea was to use a similar functionality/logic such as patsy in python or glm in R where you can pass a string formula to your df/model. Thanks for the link, anyway. I will have a look at this.

It looks like the (, ) output is a Pluto visualisation bug. I’ve opened an issue: https://github.com/fonsp/Pluto.jl/issues/718

Until it is fixed, you can use the PlutoUI.jl functions Print or with_terminal, as described in that issue.

Sorry!

Why sorry?

I should mention that Pluto is the killer feature that had me test out Julia in the first place. I was forwarded the presentation on Pluto and was thrilled immediately. I still am.


:revolving_hearts: Good to hear!

I fixed the issue in Pluto 0.12.12!


Cool. Thanks a lot! :smiley: