Normalizing a vector to sum to one does not work?

Dan.phi · October 11, 2021, 11:27am

This is an extremely basic question and I must be missing something, but when I try to normalize the matrix A below so that its rows sum to one, some (small) differences remain:

A = rand(100,4)

A = A./sum(A, dims = 2)

sum(sum(A, dims = 2) .== 1)   # fewer than 100!

Why does this happen? Is there a way to solve this?

goerch · October 11, 2021, 11:35am

The result is due to floating point rounding: the row sums are only approximately equal to one. The following holds:

A = rand(100,4)

A = A./sum(A, dims = 2)

@assert sum(sum(A, dims = 2) .≈ 1) == 100

An alternate name for ≈ is isapprox.
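
As a minimal sketch of the difference between `==` and `≈` (using the classic 0.1 + 0.2 case, which is not taken from the question above), exact comparison fails while approximate comparison succeeds:

```julia
# 0.1, 0.2 and 0.3 have no exact binary floating point representation,
# so the sum differs from the literal 0.3 in the last bit.
a = 0.1 + 0.2

exact  = (a == 0.3)                      # false
approx = (a ≈ 0.3)                       # true; same as isapprox(a, 0.3)

# isapprox also accepts explicit tolerances through the atol and rtol keywords.
loose  = isapprox(a, 0.3; atol = 1e-12)  # true

println((exact, approx, loose))
```

The default relative tolerance of `isapprox` is usually adequate for checks like the row sums above; an explicit `atol` may be preferable when comparing against zero.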

Dan.phi · October 11, 2021, 11:42am

Yes, but I need exact sums: I’m thinking of columns as probabilities.

johnmyleswhite · October 11, 2021, 11:51am

Almost all scientific computing code works with inexact probabilities since the standard level of inexactness offered by floating point is usually not an issue in practice. What level of inexactness creates problems for you?

Dan.phi · October 11, 2021, 12:00pm

Well, the inexactness eventually results in some distance measures between probability distributions being negative. But now that I know the reason, I will just force them to zero ex post.
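
A rough sketch of what forcing the result to zero could look like. The thread does not say which distance measure is used, so the Kullback-Leibler divergence and the helper name `kl` here are assumptions made purely for illustration:

```julia
# Hypothetical helper: KL divergence between two discrete distributions.
# Assumes all entries of p and q are strictly positive.
kl(p, q) = sum(p .* log.(p ./ q))

p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]

d = kl(p, q)

# The divergence is non-negative in exact arithmetic, so any negative value
# can only come from rounding; clamp it to zero after the fact.
d_clamped = max(d, 0.0)

# The same clamp applied to a tiny negative value, as rounding might produce.
tiny_clamped = max(-1e-17, 0.0)
```

Clamping only hides the rounding error rather than removing it, which is typically acceptable when the negative values are on the order of machine epsilon.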

viraltux · October 11, 2021, 12:18pm

As @johnmyleswhite mentions, with Float types we should always expect these tiny inaccuracies, and when dealing with probabilities we usually force the results back into [0,1].

However, if you need exact calculations without forcing the outcome, one option is to declare the probabilities as Rational numbers.

A = rand(1:10_000_000,100,4)
A = A .// sum(A, dims = 2)

sum(sum(A, dims = 2) .== 1)
100

johnmyleswhite · October 11, 2021, 12:33pm

The big limitation here is that rational numbers overflow in ways that are potentially more surprising than floating point numbers:

julia> x = 1//10
1//10

julia> typeof(x)
Rational{Int64}

julia> for i in 1:1_000
           x *= 1//10
           println((i, x))
           end
(1, 1//100)
(2, 1//1000)
(3, 1//10000)
(4, 1//100000)
(5, 1//1000000)
(6, 1//10000000)
(7, 1//100000000)
(8, 1//1000000000)
(9, 1//10000000000)
(10, 1//100000000000)
(11, 1//1000000000000)
(12, 1//10000000000000)
(13, 1//100000000000000)
(14, 1//1000000000000000)
(15, 1//10000000000000000)
(16, 1//100000000000000000)
(17, 1//1000000000000000000)
ERROR: OverflowError: 1000000000000000000 * 10 overflowed for type Int64
Stacktrace:
 [1] throw_overflowerr_binaryop(op::Symbol, x::Int64, y::Int64)
   @ Base.Checked ./checked.jl:154
 [2] checked_mul
   @ ./checked.jl:288 [inlined]
 [3] *(x::Rational{Int64}, y::Rational{Int64})
   @ Base ./rational.jl:334
 [4] top-level scope
   @ ./REPL[3]:2

This is actually part of the general problem – given a fixed number of bits, you can trade off exactness for range or range for exactness. Rational numbers have less range, but are exact; floating point numbers have a much larger range, but are not exact.
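
One way to see the role of the fixed number of bits is to lift that restriction. The sketch below repeats the loop above with `Rational{BigInt}`, which grows as needed instead of overflowing, at the cost of speed and memory. The function name `repeated_tenth` is made up for this example:

```julia
# Multiply x by 1//10 a total of n times.
function repeated_tenth(x, n)
    for _ in 1:n
        x *= 1 // 10   # Rational{Int64} is promoted to the type of x when x is wider
    end
    return x
end

# Starting from a BigInt-backed rational, 1_000 iterations complete without overflow.
x = repeated_tenth(big(1) // 10, 1_000)

println(typeof(x))                        # Rational{BigInt}
println(denominator(x) == big(10)^1001)   # true
```

This keeps exactness and removes the range limit, but numerators and denominators can grow without bound, so it is a sketch of the trade-off rather than a general recommendation.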

Sukera · October 11, 2021, 12:41pm

I think this is relevant here:

StefanKarpinski · October 12, 2021, 2:13pm

24 posts were split to a new topic: Discussion about integer overflow

StefanKarpinski · October 12, 2021, 2:18pm

A post was merged into an existing topic: Discussion about integer overflow
