GLM: "no method matching fit"

Hey,

Maybe a dumb question, but I get the following error when trying to run a linear model using the GLM package via StatsKit 0.3.0:

MethodError: no method matching fit(::Type{LinearModel}, ::Array{Float64,2}, ::Array{Float64,2}, ::Bool)

I can’t detect anything at odds with the example in the GLM documentation. After saving the DataFrame as a CSV and loading it into R, the model runs smoothly. Any idea what causes the error? Thanks!

data = DataFrame(A = array_a, B = array_b)

│ Row │   A   │   B   │
│     │  Any  │  Any  │
├─────┼───────┼───────┤
│ 1   │ 19    │  22.5 │
│ 2   │ 34    │  23.5 │
│ 3   │ 70    │  24.2 │
│ 4   │ 122   │  22.1 │
│ ... │ ...   │  ...  │

lm(@formula(A ~ B), data)

Please post the code you are trying to run.

As Kristoffer says, an MWE would be helpful here. The following works for me:

julia> using GLM, DataFrames

julia> data = DataFrame(A = rand(1:122, 50), B = rand(22.1:0.1:22.5, 50))
50×2 DataFrame
│ Row │ A     │ B       │
│     │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1   │ 4     │ 22.1    │
│ 2   │ 117   │ 22.5    │
...

julia> lm(@formula(A ~ B), data)
A ~ 1 + B

Coefficients:
──────────────────────────────────────────────────────────────────────────────
              Estimate  Std. Error    t value  Pr(>|t|)   Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)  -470.991     882.242   -0.533857    0.5959  -2244.86      1302.87
B              23.7686     39.5652   0.600744    0.5508    -55.7826     103.32
──────────────────────────────────────────────────────────────────────────────

(modulo some DataFrames deprecation warnings currently)

Thanks for your reply. This is the exact code (which is equivalent to the code posted above if I’m not mistaken).

using StatsKit

df_new = DataFrame(visitors = markers_visitors, temp = markers_temp)
lm(@formula(visitors ~ temp), df_new)

I tried your code and got

julia> df_new = DataFrame(visitors = markers_visitors, temp = markers_temp)
ERROR: UndefVarError: markers_visitors not defined
Stacktrace:
 [1] top-level scope at REPL[2]:100:

What am I doing wrong?

Whether or not this is equivalent will depend on markers_visitors and markers_temp, which, as Kristoffer says, aren’t available to us. Can you produce an MWE that lets us reproduce the error? If you can’t share the data, you might want to consider generating some random data that matches your DataFrame in terms of the column types (as I did above).

Your initial post shows that your columns are of type Any, so there might be some stray non-numerical entries in your DataFrame?
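
One way to check for stray entries is to compare the declared element type of a column with the types of the values actually stored in it. The following is a minimal sketch using only Base Julia; `col` is a hypothetical stand-in for one column of the DataFrame, not the real data:

```julia
# Hypothetical stand-in for one column of the DataFrame from the first post.
col = Any[19, 34, 70, 122]

# eltype reports the declared element type of the container,
# not the types of the values it happens to hold.
container_type = eltype(col)

# Broadcasting typeof over the elements shows what is actually stored;
# a single entry here means there are no stray values of another type.
stored_types = unique(typeof.(col))

# Collect anything that is not a number at all (strings, missing, nothing, ...).
non_numeric = filter(x -> !(x isa Number), col)

println(container_type)
println(stored_types)
println(non_numeric)
```

With a DataFrame, the same checks should work on `df.A`, and `eltype.(eachcol(df))` should list the declared element type of every column at once.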

Sorry, sure.
The arrays are generated using the following code, which itself depends on other functions. I thought it would help little to post 100+ lines of code, but I see what you mean, sorry.

markers_visitors = []
markers_temp = []
datetime_temp = []
temp_dict = OrderedDict()
for (k, v) in markers_hourvalues
    temp_dict["$k"] = OrderedDict()
    for e in keys(v)
        d = filter(x -> string(hour(x.timestamp)) == e && string(Date(x.timestamp)) == k, temp)[:, :temp][1]   
        temp_dict["$k"]["$e"] = d
        push!(markers_temp, d)
        push!(datetime_temp, DateTime("$(k) $(e)", "yyyy-mm-dd H"))
        push!(markers_visitors, nrow(markers_hourvalues["$k"]["$e"]))
    end
end

This is the output of println(markers_visitors); println(markers_temp):

Any[19, 34, 70, 122, 136, 105, 74, 14, 48, 81, 68, 91, 93, 55, 17, 44, 62, 61, 141, 65, 50, 104, 164, 152, 27, 54, 64, 60, 101, 53, 73]
Any[22.5, 23.5, 24.2, 22.1, 21.4, 20.6, 19.4, 19.8, 22.2, 22.7, 24.1, 24.1, 23.1, 21.9, 24.8, 26.2, 27.0, 23.5, 23.0, 21.5, 19.9, 28.4, 27.2, 24.8, 28.3, 29.7, 31.2, 30.0, 22.2, 31.0, 31.4]

I checked that length(markers_visitors) == length(markers_temp) (true). The typeof of every element in markers_temp is Float64, and every element in markers_visitors is Int64.
Running a model with random values works…
Thanks for your patience!
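
The element checks above can all pass while the container itself is still untyped: an array created with `[]` is a `Vector{Any}`, and `push!` never narrows that, whatever gets pushed. A minimal sketch of the difference, with made-up values and hypothetical variable names:

```julia
# An empty literal [] creates a Vector{Any}; pushing Int64 values
# into it does not change the element type of the container.
untyped = []
for x in (19, 34, 70)
    push!(untyped, x)
end

# Declaring the element type up front gives a concretely typed vector.
typed = Int[]
for x in (19, 34, 70)
    push!(typed, x)
end

println(eltype(untyped))                  # declared type of the container
println(all(x -> x isa Int, untyped))     # every element is still an Int
println(eltype(typed))
```

In the loop above, that would mean initializing the containers as something like `markers_visitors = Int[]` and `markers_temp = Float64[]`, assuming those are the only types that ever get pushed.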

You need to post enough so we can run the code and get the error message you posted. Even with the latest code I get a bunch of errors. It doesn’t have to be the same exact code that you use but it should be representative. Cut down as much as you can while the code still shows the problem.

Okay thanks. Your MWE could have been:

using DataFrames, GLM
markers_visitors = Any[19, 34, 70, 122, 136, 105, 74, 14, 48, 81, 68, 91, 93, 55, 17, 44, 
    62, 61, 141, 65, 50, 104, 164, 152, 27, 54, 64, 60, 101, 53, 73]
markers_temp = Any[22.5, 23.5, 24.2, 22.1, 21.4, 20.6, 19.4, 19.8, 22.2, 22.7, 24.1, 
    24.1, 23.1, 21.9, 24.8, 26.2, 27.0, 23.5, 23.0, 21.5, 19.9, 28.4, 
    27.2, 24.8, 28.3, 29.7, 31.2, 30.0, 22.2, 31.0, 31.4]

df_new = DataFrame(visitors = markers_visitors, temp = markers_temp)
lm(@formula(visitors ~ temp), df_new)

This reproduces your error and lets us narrow down the issue: it is indeed the untyped Any arrays. If you instead use:

markers_visitors = Int64[19, 34, 70, 122, 136, 105, 74, 14, 48, 81, 68, 91, 93, 55, 17, 44, 
    62, 61, 141, 65, 50, 104, 164, 152, 27, 54, 64, 60, 101, 53, 73]
markers_temp = Float64[22.5, 23.5, 24.2, 22.1, 21.4, 20.6, 19.4, 19.8, 22.2, 22.7, 24.1, 
    24.1, 23.1, 21.9, 24.8, 26.2, 27.0, 23.5, 23.0, 21.5, 19.9, 28.4, 
    27.2, 24.8, 28.3, 29.7, 31.2, 30.0, 22.2, 31.0, 31.4]

you should get:

visitors ~ 1 + temp

Coefficients:
─────────────────────────────────────────────────────────────────────────────
              Estimate  Std. Error    t value  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────────
(Intercept)  89.0044      50.6897    1.75587     0.0897  -14.6677   192.676
temp         -0.600152     2.04246  -0.293838    0.7710   -4.77745    3.57714
─────────────────────────────────────────────────────────────────────────────
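
If the arrays already exist as `Any` vectors, they do not have to be retyped by hand. A short sketch of three ways to narrow the element type, shown on truncated copies of the data (the result names are hypothetical):

```julia
# Truncated copies of the untyped arrays from the MWE.
markers_visitors = Any[19, 34, 70, 122]
markers_temp = Any[22.5, 23.5, 24.2, 22.1]

# Broadcasting identity rebuilds the array with the narrowest element
# type that fits the values actually stored in it.
temp_narrowed = identity.(markers_temp)

# convert states the target type explicitly and throws an error
# if any element cannot be converted.
temp_converted = convert(Vector{Float64}, markers_temp)

# Broadcasting a constructor converts element by element.
visitors_typed = Int.(markers_visitors)

println(typeof(temp_narrowed))
println(typeof(temp_converted))
println(typeof(visitors_typed))
```

On an existing DataFrame, an assignment such as `df_new.temp = Float64.(df_new.temp)` should have the same effect on a single column.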

Thanks a lot! I don’t know how the GLM package works but I would have expected it to extract the values out of the arrays and then do stuff with them (in which case the type of the array itself wouldn’t really matter, I guess?). I actually rather expected it to be something with the environment or me being blind to some dumb syntax error. Well, asking questions properly isn’t easy apparently :wink:
Anyway, thank you both for the quick help even if my imprecise question didn’t make it easy…
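
On why the type of the array matters even when every value inside it is a number: Julia selects methods based on the type of the container, and a `Vector{Any}` is not an `AbstractVector{<:Number}`. The sketch below only illustrates that dispatch rule. `column_kind` is a hypothetical function, not part of GLM or StatsModels, and the guess that the formula machinery treats a non-numeric column type as categorical (which would fit the response showing up as `Array{Float64,2}` in the error message) is an assumption rather than something confirmed in this thread:

```julia
# Hypothetical helper: classify a column by its declared element type only.
column_kind(::AbstractVector{<:Number}) = :continuous
column_kind(::AbstractVector) = :categorical

typed_col = Float64[22.5, 23.5, 24.2]
untyped_col = Any[22.5, 23.5, 24.2]

# Same values, different container types, so different methods are chosen.
println(column_kind(typed_col))
println(column_kind(untyped_col))
```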