Sorry if this isn’t the right place to post this question, but it seemed like the closest fit for this intersection of concerns.
Most sources I’ve read on data modelling suggest that, when required, values be normalized/standardized to ~ N(0,1). If those values are going to be used in matrices for modelling, though, what are the considerations for floating point error? On my system, the machine epsilon is ~ 1e-7 for 64-bit floats (doubles). For very large data sets, there could be many values affected by roundoff error.
My question is: does it make sense to use a different parameterization of N, say N(0, 100), to avoid roundoff error in this case?
Do you mean for 32-bit floats? Machine epsilon for 64-bit numbers (at 1.0) is 2e-16, and that doesn’t depend on the system, it’s a property of floating point numbers themselves.
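For reference, a quick check (sketched here in Python/NumPy, which is an assumption about tooling; the constants themselves are properties of IEEE 754 and hold in any language) shows that the ~1e-7 figure matches single precision, not double:

```python
import numpy as np

# Machine epsilon is a property of the floating point format, not of the system.
print(np.finfo(np.float32).eps)  # ~1.19e-07  (32-bit single precision)
print(np.finfo(np.float64).eps)  # ~2.22e-16  (64-bit double precision)
```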
@GunnarFarneback Could you expand on that a bit, please? From my understanding, machine epsilon is the smallest difference that the type supports between two values … i.e. there are ranges of the real number line where a value will get rounded up or down. The impact of roundoff will increase the smaller the value being rounded.
If I scale all my features/targets to N(0,1) and do many multiplication operations on them, the values in (-1, 1) will tend to get smaller and the effect of roundoff error will be magnified, no?
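To put some numbers on the scales involved, here is a small sketch (again assuming Python/NumPy, purely for illustration): the gap between adjacent representable doubles shrinks in proportion to the magnitude of the value, so the gap relative to the value stays close to machine epsilon even for numbers well inside (-1, 1).

```python
import numpy as np

# np.spacing(x) is the gap between x and the next representable double.
for x in [1.0, 1e-4, 1e-8]:
    gap = np.spacing(x)
    # The absolute gap shrinks with the magnitude of x,
    # so the relative gap stays on the order of machine epsilon (~2e-16).
    print(f"x = {x:.0e}: absolute gap ~ {gap:.3e}, relative gap ~ {gap / x:.3e}")
```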