Nobs() -> Float or Int?


#1

minor question: shouldn’t nobs() return an integer for a GLM lm model?

julia> y= [1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> nobs( GLM.lm(X,y) )
10.0

are there cases where this can become a fraction?


#2

Yes, when non-integer weights are provided. A solution to that would be to parameterize the model on the type of weights, and default to Int when there are no weights.
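To make the idea concrete, here is a minimal sketch of what parameterizing on the weight type could look like. `ToyModel` and `toy_nobs` are hypothetical names invented for this illustration; this is not how GLM is implemented, only a toy under the assumption that an empty weight vector means "no weights".

```julia
# Hypothetical toy type for illustration only; not GLM's actual design.
struct ToyModel{W<:Real}
    y::Vector{Float64}
    wts::Vector{W}   # an empty vector means "no weights were given"
end

# Unweighted constructor: the weight type defaults to Int.
ToyModel(y::AbstractVector{<:Real}) = ToyModel{Int}(collect(Float64, y), Int[])

# Row count when unweighted, sum of the weights otherwise.
# The result type follows the weight type W, so it stays an Int by default.
toy_nobs(m::ToyModel{W}) where {W} = isempty(m.wts) ? W(length(m.y)) : sum(m.wts)

unweighted = ToyModel(1:10)
weighted   = ToyModel([1.0, 2.0, 3.0], [0.5, 1.5, 2.0])
```

With this layout `toy_nobs(unweighted)` is an `Int`, and `toy_nobs(weighted)` is a `Float64` only because the weights themselves are floats.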


#3

thx. Simplicity is better. I would suggest an nobs() function that is truly the number of observations (and always an integer) and a wobs() function that is the sum of the weights. Should I suggest it as an issue on GitHub?


#4

Feel free to file an issue, but I’m afraid it’s more complex than that. For example, with frequency/replicate weights, the apparent “number of observations” (the row count) doesn’t have any meaning; it’s just a result of how the data has been compressed to save space. So it would be misleading to have nobs return that.
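A small sketch of the frequency-weight case, using made-up data and plain Julia (no GLM calls), may help show why the row count is not meaningful there. All variable names are hypothetical.

```julia
# The same 10 observations stored two ways (made-up data).
expanded = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]   # one row per observation
vals     = [1, 2, 3]                         # compressed: one row per distinct value
freqs    = [2, 3, 5]                         # frequency weights for each row

nrows_compressed = length(vals)   # rows stored after compression
nobs_weighted    = sum(freqs)     # observations those rows represent

# The weighted mean of the compressed data equals the plain mean of the expanded data.
mean_expanded   = sum(expanded) / length(expanded)
mean_compressed = sum(vals .* freqs) / sum(freqs)
```

Here the compressed table has 3 rows, but it stands for 10 observations, so a count based on rows alone would understate the sample.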


#5

I would appreciate an option to return the number of rows used in a regression. It’s useful in regressions with missing values where you want to make sure you are getting the right subset of data each time.

It’s pretty easy to define a function though.

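# Assumes the fitted model m exposes its ModelFrame as m.mf with the DataFrame in m.mf.df; nrow comes from DataFrames.
unweightednobs(m) = nrow(m.mf.df)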

#6

thx. agreed with everything.

/iaw