Normalizing Values and Floating Point Error

Sorry if this isn’t the right place to post this question, but it was the closest fit I could think of for this intersection of concerns.

Most sources I’ve read on data modelling suggest that, when required, values be normalized/standardized to ~ N(0,1). If those values are going to be used in matrices for modelling, though, what are the considerations for floating point error? On my system, the machine epsilon is ~ 1e-7 for 64-bit floats (doubles). For very large data sets, there could be many values affected by roundoff error.

My question is: does it make sense to use a different parameterization of N, say N(0, 100), to avoid roundoff error in this case?

Do you mean for 32-bit floats? Machine epsilon for 64-bit numbers (at 1.0) is 2e-16, and that doesn’t depend on the system, it’s a property of floating point numbers themselves.
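For what it’s worth, both values are easy to check in the REPL; `eps` with a type argument returns that type’s machine epsilon, and the ~1e-7 figure is the Float32 one:

```julia
# Machine epsilon by type: the ~1e-7 value belongs to Float32,
# while Float64 (a "double") is much finer.
@show eps(Float32)   # 2^-23  ≈ 1.2e-7
@show eps(Float64)   # 2^-52  ≈ 2.2e-16
```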

You are right, I misinterpreted another value and thought epsilon was 1e-7, so my concerns are likely invalid.

I’d still be curious to know whether the conventional wisdom of standardizing to N(0,1) holds for very large data sets (200M+ rows)?

The machine epsilon is a relative measure. Scaling all values (within reasonable limits) won’t make any difference to floating point roundoff errors.
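A quick way to see this in Julia: scaling a value scales the spacing between adjacent floats proportionally, so the *relative* roundoff stays essentially the same (to within a factor of 2, from the stepwise binary exponent):

```julia
# Relative spacing before and after scaling by 100: same order of
# magnitude, so scaling your data buys no extra precision.
x = 0.1
@show eps(x) / x           # relative spacing near 0.1
@show eps(100x) / (100x)   # relative spacing near 10.0
```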

@GunnarFarneback Could you expand on that a bit, please? From my understanding, machine epsilon is the smallest difference that the type supports between two values … i.e. there are ranges of the real number line where a value will get rounded up or down. The impact of roundoff will increase the smaller the value being rounded.

If I scale all my features / targets to N(0,1) and do many multiplication operations on them, the values in (-1,1) will tend to get smaller, and the effect of roundoff error will be magnified, no?

The distance between two “adjacent” floating point values depends on the magnitude of the values.

julia> nextfloat(1.0)
1.0000000000000002

julia> nextfloat(1.0) - 1.0
2.220446049250313e-16

julia> nextfloat(1024.0)
1024.0000000000002

julia> nextfloat(1024.0) - 1024.0
2.2737367544323206e-13

julia> eps(1.0)
2.220446049250313e-16

julia> eps(1024.0)
2.2737367544323206e-13
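Put differently, `eps` grows in proportion to the magnitude (exactly so for powers of two), which is why the relative spacing is nearly constant across the whole normal range:

```julia
# eps scales with magnitude; the proportionality is exact for
# powers of two:
@show eps(1024.0) == 1024 * eps(1.0)

# For any normal Float64 x, the relative spacing eps(x)/x lies in
# the narrow band (eps()/2, eps()]:
for x in (1e-6, 0.1, 3.0, 1e12)
    @show eps(x) / x
end
```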

I see, that’s really helpful, thank you!