@formula for Lathe.preprocess UndefVarError

Hello Everyone:

I am attempting to perform a logistic regression, and
for some reason the @formula macro is not reading.


using GLM
using Lathe: preprocess


fm = @formula(Outcome ~ Sesh1 + Sesh2 + Sesh3)
logit = glm(fm, train, Binomial(), ProbitLink())

I attempted to prefix @formula, and there are
no package conflict warnings. One thing: I have
been getting some deprecation warnings during
my Lathe install, but in other sessions I have
been able to use the Lathe package and its modules.

Any suggestions?

Could you turn this into an MWE? Or at a minimum say what you mean by “the formula macro is not reading”?

Sure

I am working with the following variables:

ST = ["Sesh1","Sesh2","Sesh3"]
RT = [1,2,3]
Avail = ["Y", "N"]

The data frame I am working with is similar in structure to:

using DataFrames

DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Avail, 400))

Then I encounter the error when I run:

fm = @formula(Available ~ SessionTypes + RequestTypes)
logit = glm(fm, DBH, Binomial(), ProbitLink())

Okay, first of all, it’d be good if you had a look at “Please read: make it easier to help you” to get some tips on how to ask for help most effectively on here. Your example isn’t exactly minimal, I suppose, if your error is related to @formula. Also, what is the actual error you’re seeing?

Your problem is that DBH.Available is a vector of Strings, not booleans. You can do

DBH.Available = DBH.Available .== "Y"

and your code will work.

GLM should definitely throw a better error in this scenario, though.
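Putting those pieces together, a minimal end-to-end sketch might look like the following. It assumes DataFrames.jl and GLM.jl are installed; the `Random.seed!` call is only there to make the random data reproducible. Note that `ProbitLink()` fits a probit model; if a logistic regression is what you are after, GLM's link for that is `LogitLink()`.

```julia
using DataFrames, GLM, Random

Random.seed!(1234)  # reproducible random draws

ST = ["Sesh1", "Sesh2", "Sesh3"]
RT = [1, 2, 3]
Avail = ["Y", "N"]

DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Avail, 400))

# Turn the "Y"/"N" strings into a Bool response (true for "Y", false otherwise)
DBH.Available = DBH.Available .== "Y"

fm = @formula(Available ~ SessionTypes + RequestTypes)

# ProbitLink() gives a probit model; LogitLink() gives a logistic regression
probit_model = glm(fm, DBH, Binomial(), ProbitLink())
logit_model  = glm(fm, DBH, Binomial(), LogitLink())
```

With three session types, the default dummy coding yields an intercept, two SessionTypes columns, and one RequestTypes column, so each model should have four coefficients.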

Thank you @pdeffebach

Yes, I prefixed @formula before your
suggestion and received a similar error
message. I am waiting for my REPL to
load in my browser.

Would you apply these instructions in the
first block as:

Avail .== "Y"

OR

Avail .== ["Y","N"]

If so, will the output vector automatically
assign “Y” as 1 and everything else as 0?

Or block two as:

DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Available .== "Y"))

I want to make sure the DBH.Available values are random.

Thank you,

Definitely this one. The 2nd one will throw an error (which you will soon realize as you try running the command…)

You can also do it during the DataFrame construction.
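For the construction route, a minimal sketch, reusing the ST, RT and Avail vectors defined earlier in the thread and assuming DataFrames.jl is installed, could compare the random draws with "Y" inside the DataFrame call so the column is Bool from the start:

```julia
using DataFrames

ST = ["Sesh1", "Sesh2", "Sesh3"]
RT = [1, 2, 3]
Avail = ["Y", "N"]

# Draw 400 random "Y"/"N" labels, then compare each with "Y" to get a Bool column
DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Avail, 400) .== "Y")
```

If the "Y"/"N" labels are not needed at all, `rand(Bool, 400)` would also produce a random Bool column directly.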

@pdeffebach Appreciated.

What is the logic behind this? How
does the system know to treat “Y”
as a Bool, or is there some other
wizardry at play?

It sounds like you could use a tutorial, or read through the Julia docs. It’s just checking for equality with the string "Y", which returns a Bool, as it does in most other languages; the dot in .== applies that comparison to every element of the vector.
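A small Base-Julia illustration of that point (no packages needed): `==` compares a single value, the dotted `.==` broadcasts the same comparison elementwise, and Bools convert to numbers as 1.0 and 0.0.

```julia
Avail = ["Y", "N"]

# `==` compares one string with "Y" and returns a single Bool
single = "Y" == "Y"            # true

# `.==` broadcasts the comparison over every element, giving a BitVector
flags = Avail .== "Y"          # [true, false]

# Bools convert to numbers: true becomes 1.0 and false becomes 0.0
as_numbers = Float64.(flags)   # [1.0, 0.0]
```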

Please read the documentation so you have a better idea what’s going on!


Okay.

From your review (which you may
have gotten from HERE), the instantiation
and data frame construction should
involve these steps:

ST = ["Sesh1","Sesh2","Sesh3"]
RT = [1,2,3]
Avail .== "Y" 

Then,

DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Avail, 400))

The output should replace the “Y” with 1 and
any other value with 0.

I am still wondering, after reviewing the docs
(at least on Strings), how the 0 is assigned randomly.
We did not instantiate “N” in this example.
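The zeros come from the "N" entries, which only exist if `Avail = ["Y", "N"]` is still defined: `rand(Avail, 400)` samples each entry from that vector, and comparing with "Y" then maps every "N" to false. A Base-Julia sketch (the seed is only for reproducibility):

```julia
using Random

Random.seed!(42)  # reproducible draws

Avail = ["Y", "N"]

# Each of the 400 entries is drawn uniformly at random from Avail,
# so "N" shows up because it is one of the two choices
draws = rand(Avail, 400)

# Every "Y" becomes true (1) and every "N" becomes false (0)
available = draws .== "Y"

n_yes = count(available)              # how many "Y" draws
n_no  = length(available) - n_yes     # how many "N" draws
```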

Avail .== "Y" 

Again, you need to try your code out. This will result in an error if Avail has not been defined previously.

@pdeffebach Understood.

The way you worded your previous answer
made me think that in the FIRST block you were
instantiating Avail as:

Avail .== "Y"

Instead of:

Avail = ["Y","N"]

Then I realized you meant to apply
Avail .== "Y" to the SECOND block, as:

DBH = DataFrame(Id = 1:400,
                SessionTypes = rand(ST, 400),
                RequestTypes = rand(RT, 400),
                Available = rand(Avail .== "Y", 400))

Now I understand why the equality works; before,
I thought it was some obscure instruction.

Thanks.

The error read:

no method matching fit(::Type{GLM.GeneralizedLinearModel}, ::Matrix{Float64}, ::Matrix{Float64}, ::Distributions.Binomial{Float64}, ::GLM.ProbitLink)
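In that signature the third argument, the response, arrives as a `Matrix{Float64}` rather than a vector, which is consistent with the String column having been expanded into indicator columns, so no `fit` method matched. Since `.== "Y"` silently maps any other label (say "y" or " Y") to false, a stricter conversion can help. The helper below is hypothetical (not part of GLM or DataFrames), a sketch in Base Julia:

```julia
# Hypothetical helper: convert "Y"/"N" labels to Bool,
# throwing an ArgumentError on any other value instead of mapping it to false
function yn_to_bool(v::AbstractVector{<:AbstractString})
    unexpected = setdiff(unique(v), ["Y", "N"])
    isempty(unexpected) || throw(ArgumentError("unexpected labels: $(unexpected)"))
    return v .== "Y"
end

yn_to_bool(["Y", "N", "Y"])   # [true, false, true]
```

Applied as `DBH.Available = yn_to_bool(DBH.Available)`, it would fail loudly on a typo in the data rather than producing a misleading response.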