Very basic GLM help

HelgavonLichtenstein · November 9, 2020, 11:30am

Hello, I have been trying to learn how to use GLM.jl but I really don’t understand it. My goal is to show a linear model on a plot. I know how to create the base plot, but not how to fit the linear model or how to draw it on the plot. Please help me - it seems I really need baby-step instructions for this. (Or for AlgebraOfGraphics, I’m not picky, I just need to learn how to do it with something.)

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(:A ~ :C),df)  #doesn't work

Rudi79 · November 9, 2020, 12:58pm

I don’t know too much about linear models, but following the manual of GLM you should try:

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(A ~ C),df)

Note that there is no : in front of the variable names inside @formula.
But if you are only interested in linear regression, most of the plotting packages can do that for you, e.g.,
http://gadflyjl.org/stable/gallery/statistics/#[Stat.smooth](@ref)-1
or
https://www.queryverse.org/VegaLite.jl/stable/examples/examples_advancedcalculations/#Linear-Regression-1

mthelm85 · November 9, 2020, 2:35pm

As @Rudi79 points out, the error with your linear model is with the syntax and if all you want is to add a regression line, you can use the smooth keyword argument in Plots.jl, for example. Have a look at this example:

using DataFrames
using GLM
using StatsPlots

x = rand(10:0.1:30, 30)
f(x) = x + rand(-5:0.1:5)
df = DataFrame(y = f.(x), x = x)

scatter(df.x, df.y, legend=false, smooth=true)

That being said, you stated you are trying to learn GLM.jl, so you can also do something like this:

ols = lm(@formula(y ~ x), df)

You can then get the coefficients by calling coef(ols) which you can use to plot in a couple of different ways:

# create your own function from your coefficients:
model(x) = coef(ols)[1] + coef(ols)[2] * x
scatter(df.x, df.y)
plot!(x, model.(x), legend=false)

# make use of linear algebra to compute all the y values:
scatter(df.x, df.y)
plot!(x, (coef(ols)' * hcat(ones(size(df,1)), x)')', legend=false)

Or, as @nilshg points out:

scatter(df.x, df.y)
plot!(x, predict(ols, DataFrame(x = x)), legend=false)

I’ve personally always found it very annoying that you have to pass in some Table type to the predict function but it is of course still more convenient than the two alternatives I showed.

nilshg · November 9, 2020, 2:46pm

Note that this basically re-invents the predict function in GLM, which can also give you confidence or prediction intervals. See this thread: "Plot the confidence interval for a model fit" (post #4 by nilshg).
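A minimal sketch of what that could look like, reusing the y ~ x setup from the example above. The data here is made up, and the interval and level keywords are assumed to be available in your installed GLM.jl version (the exact return type of predict has varied between releases, so the columns are accessed by name and converted to plain Float64 vectors):

```julia
using DataFrames, GLM, StatsPlots

# Synthetic data for illustration only (not from the thread)
x = collect(10:0.5:30)
df = DataFrame(x = x, y = x .+ 5 .* randn(length(x)))

ols = lm(@formula(y ~ x), df)

# New x values to predict at; the column name must match the formula
newdata = DataFrame(x = range(10, 30, length = 50))

# With interval = :confidence, predict returns the point prediction
# together with lower and upper bounds
pr = predict(ols, newdata; interval = :confidence, level = 0.95)
yhat  = Float64.(pr.prediction)
lower = Float64.(pr.lower)
upper = Float64.(pr.upper)

# ribbon takes (distance below the line, distance above the line)
scatter(df.x, df.y, legend = false)
plot!(newdata.x, yhat, ribbon = (yhat .- lower, upper .- yhat))
```

Passing interval = :prediction instead should give the (wider) band for new observations rather than for the fitted mean.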

HelgavonLichtenstein · November 9, 2020, 2:47pm

Thank you for your reply, removing : let it calculate the formula. I thought y and x could be replaced by column headings.

Now I am trying to plot it but am having difficulties. Basically, what I want to do is what is done in R with the commands geom_smooth(method = "lm") and geom_smooth(method = "loess").

Following the example here

df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C),df)
pred = DataFrame(df(C)); #this does not work
pred.y = predict(model, pred)

Edit: I guess I am confused about when to use x and y and when to use column headings.

Does one always need to state the intervals for the x axis?

nilshg · November 9, 2020, 2:54pm

It seems to me that your problems come from a more fundamental misunderstanding of Julia’s syntax rather than the specifics of GLM or Plots.jl (or any other plotting library).

In the above, you are calling DataFrame(df(C)), which probably should be DataFrame(C = df.C). In Julia, round brackets () denote function calls, so df(C) means “call function df with argument C”, which doesn’t really make sense as df isn’t a function.
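To make the difference concrete, here is a small sketch with a cut-down version of the data frame from the post above; the variable names col, same and pred are just for illustration:

```julia
using DataFrames

df = DataFrame(A = rand(500), C = 1:500)

# Dot syntax accesses a column by name; no function call is involved
col = df.C

# Indexing with ! and a Symbol returns the same column (no copy)
same = df[!, :C]

# Round brackets call a function: here the DataFrame constructor,
# building a new one-column table whose column is named C
pred = DataFrame(C = df.C)
```

Because the formula A ~ C refers to a column called C, the table passed to predict needs a column with that same name, which is why DataFrame(C = df.C) is the form to use.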

HelgavonLichtenstein · November 9, 2020, 5:13pm

Thank you for your replies. I acknowledge that I have only a basic understanding of programming syntax and often make mistakes. I will continue to try and learn.

df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C),df)
pred = DataFrame(C = 1:500);
pred.y = predict(linear_model_df, pred)

plot(df.C,df.A)
plot!(pred.C,pred.y)
