# Normalizing Values and Floating Point Error

Sorry if this isn’t the right place to post this question, but it was as close as I could think for the intersection of concerns.

Most sources I’ve read on data modelling suggest that, when required, values be normalized/standardized ~ N(0,1). If those values are going to be used in matrices for modelling, though, what are the considerations for floating point error? On my system, the machine epsilon is ~ 1e-7 for 64-bit floats (doubles). For very large data sets, there could be many values affected by roundoff error.

My question is: does it make sense to use a different parameterization of N, say N(0, 100), to avoid roundoff error in this case?

Do you mean for 32-bit floats? Machine epsilon for 64-bit floats (at 1.0) is about 2.2e-16, and that doesn’t depend on the system; it’s a property of the floating point format itself.

You are right, I misinterpreted another value and thought epsilon was 1e-7, so my concerns are likely invalid.

I’d still be curious to know whether the conventional wisdom of standardizing to N(0,1) holds for very large data sets (200M+ rows).

The machine epsilon is a relative measure. Scaling all values (within reasonable limits) won’t make any difference to floating point roundoff errors.
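To make "relative" concrete, here is a minimal sketch that compares the spacing `eps(x)` to `x` itself at a few magnitudes. The sample magnitudes are arbitrary choices for illustration; the point is only that the ratio stays within a factor of two of `eps(1.0)` across the whole range.

```julia
# Spacing between adjacent Float64 values, relative to the value itself.
# The magnitudes below are arbitrary sample points, not taken from any data set.
magnitudes = [1e-10, 1e-3, 1.0, 100.0, 1e10]
ratios = [eps(x) / x for x in magnitudes]

for (x, r) in zip(magnitudes, ratios)
    # eps(x) grows and shrinks with x, so the ratio stays close to eps(1.0)
    println("x = ", x, "   eps(x) = ", eps(x), "   eps(x)/x = ", r)
end
```

So whether the data is scaled to N(0,1) or N(0,100), roughly the same number of significant digits is available.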

@GunnarFarneback Could you expand on that a bit, please? From my understanding, machine epsilon is the smallest difference that the type supports between two values … i.e. there are ranges of the real number line where a value will get rounded up or down. The impact of roundoff will increase the smaller the value being rounded.

If I scale all my features / targets to N(0,1), and do many multiplication operations on them, the values between (-1,1) will tend to get smaller and the effect of roundoff error will be magnified, no?

The distance between two “adjacent” floating point values depends on the magnitude of the values.

```julia
julia> nextfloat(1.0)
1.0000000000000002

julia> nextfloat(1.0) - 1.0
2.220446049250313e-16

julia> nextfloat(1024.0)
1024.0000000000002

julia> nextfloat(1024.0) - 1024.0
2.2737367544323206e-13

julia> eps(1.0)
2.220446049250313e-16

julia> eps(1024.0)
2.2737367544323206e-13
```
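The multiplication concern can be checked the same way. The sketch below is an illustration rather than a proof: `relative_error` is a hypothetical helper, the input values are arbitrary, and `BigFloat` is used as a stand-in for the exact result. It multiplies a handful of values in (-1, 1), then repeats the product with every value scaled down by `2.0^-20`, and measures the relative error of each `Float64` result.

```julia
# Hypothetical helper: relative error of a Float64 product,
# measured against a higher-precision BigFloat reference.
function relative_error(xs::Vector{Float64})
    approx = prod(xs)            # product in Float64 arithmetic
    exact = prod(big.(xs))       # same product in BigFloat (reference value)
    return Float64(abs(approx - exact) / abs(exact))
end

xs = [0.3, 0.7, 0.11, 0.9, 0.05, 0.6]   # arbitrary values in (-1, 1)
small = xs .* 2.0^-20                   # the same values, made much smaller

err_unit = relative_error(xs)
err_small = relative_error(small)

println("relative error, original values: ", err_unit)
println("relative error, scaled values:   ", err_small)
println("eps(Float64):                    ", eps(Float64))
```

Both relative errors should come out at no more than a few multiples of `eps(Float64)`, even though the second product is far smaller in absolute terms. Products shrinking toward zero do not by themselves magnify roundoff; that only changes once values get close to the underflow limit (around 1e-308 for `Float64`).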

I see, that’s really helpful, thank you!