Basic, technical questions on “goodness-of-fit” for regression models…

Using N data points, suppose I fit a regression model with n_\beta parameters \beta to get a predictor \hat{y}_i = \hat{\beta}\phi_i, with prediction error e_i = y_i - \hat{y}_i. The standard deviation \sigma_e is then given by

\sigma_e = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2},

and a *corrected* standard deviation s_e is given by

s_e = \sqrt{\frac{1}{N - n_\beta}\sum_{i=1}^{N} e_i^2},

where, essentially, subtracting n_\beta corrects for the fact that the parameters \hat{\beta} have been estimated from the same data used to compute the standard deviation. [Trivial example… \hat{y}_i = \bar{y}, where n_\beta = 1.]
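For concreteness, here is a minimal sketch of that trivial example (in Python/NumPy for illustration; the Julia version is a direct translation, and the synthetic data are my own assumption). With \hat{y}_i = \bar{y} and n_\beta = 1, the corrected s_e coincides with the usual sample standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=50)  # N = 50 synthetic observations

# Trivial model: y_hat_i = mean(y), i.e. one estimated parameter (n_beta = 1)
y_hat = np.full_like(y, y.mean())
e = y - y_hat

N, n_beta = len(y), 1
sigma_e = np.sqrt(np.sum(e**2) / N)           # uncorrected
s_e = np.sqrt(np.sum(e**2) / (N - n_beta))    # corrected for the estimated mean
```

Here `s_e` equals `np.std(y, ddof=1)` and `sigma_e` equals `np.std(y, ddof=0)`, which is exactly the distinction in question.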

Suppose instead that the n_\beta parameters have been computed from *training* data, while I want to compute the standard deviation over *validation* data, which are *different* from the training data. I'd like to use this standard deviation as a measure of "goodness-of-fit".

Two questions:

- Since I didn’t use validation data to compute \hat{\beta}, would it be correct to compute the standard deviation over the validation data using the expression for \sigma_e? Or should I use the corrected expression, i.e., s_e?

- Is the correction of dividing by N - n_\beta based on an assumption of a *linear* regression model, or is the same idea valid for nonlinear regression methods such as ANNs?
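To make the train/validation setup of the first question concrete, here is a minimal sketch (again Python/NumPy for illustration; the linear model, synthetic data, and 70/30 split are my own assumptions, not a claim about the right answer). It fits \hat{\beta} by least squares on training data only, then computes both \sigma_e and s_e from residuals on the held-out validation data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = 2 + 3x + noise (assumed example)
x = rng.uniform(0.0, 1.0, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

# Split: beta_hat is estimated from training data only
x_tr, y_tr = x[:70], y[:70]
x_va, y_va = x[70:], y[70:]

Phi_tr = np.column_stack([np.ones_like(x_tr), x_tr])   # phi_i = (1, x_i)
beta_hat, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)

# Residuals on held-out validation data
Phi_va = np.column_stack([np.ones_like(x_va), x_va])
e_va = y_va - Phi_va @ beta_hat

N_va, n_beta = len(y_va), len(beta_hat)
sigma_e = np.sqrt(np.sum(e_va**2) / N_va)              # uncorrected
s_e = np.sqrt(np.sum(e_va**2) / (N_va - n_beta))       # corrected
```

The question is then which of the two last lines is the appropriate goodness-of-fit measure, given that `beta_hat` never saw the validation data.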

Sorry for bothering you with such trivial questions – I’m trying to convince some colleagues to take a look at Julia, and plan to use basic regression as a case study for them. (I’d like to understand what is going on in packages before I use them…)