Inconsistent number of decimal places when printing sum(x, dims=1)

adriano.vilela · March 17, 2022, 10:33pm

For a while, I just couldn’t understand what was going on here:

julia> x = [
7136.52 0.00  7136.52;
6047.88 62.74 6110.62;
2374.32 26.19 2400.51;
2374.32 78.57 2452.89];

julia> sum(x, dims=1)
1×3 Matrix{Float64}:
 17933.0  167.5  18100.5

julia> sum(x[:,1])
17933.04

julia> sum(x[:,2])
167.5

julia> sum(x[:,3])
18100.54

The two decimal places in the matrix represent cents. Note that the values printed for columns 1 and 3 by sum(x, dims=1) differ from the sums of the individual columns.

After a while, I realized this seems to be just a display issue:

julia> matrix_sum = sum(x, dims=1)
1×3 Matrix{Float64}:
 17933.0  167.5  18100.5

julia> matrix_sum[1]
17933.04

julia> matrix_sum[2]
167.5

julia> matrix_sum[3]
18100.54

However, since I was using the terminal as a calculator and copying and pasting results into a text editor, this caused me a big headache.

Do you guys think this is the expected behavior? If matrix x is printed with two decimal places, why is sum(x, dims=1) printed with only one?

It’s interesting to note that this problem doesn’t happen if I create a matrix z of random numbers:

julia> z = rand(4, 4)
4×4 Matrix{Float64}:
 0.314998  0.274576  0.0731202  0.268014
 0.737918  0.873513  0.594399   0.977716
 0.951614  0.757834  0.468538   0.645187
 0.624272  0.134771  0.401739   0.091374

julia> sum(z, dims=1)
1×4 Matrix{Float64}:
 2.6288  2.04069  1.5378  1.98229

mbaz · March 17, 2022, 11:09pm

This is because there are different ways to show values. When an array is printed, a “compact” representation is chosen for its elements. The same thing happens with the random array:

(@v1.7) julia> z=rand(4,4);

(@v1.7) julia> a=sum(z,dims=1);

(@v1.7) julia> show(stdout, a)  # force "non-compact" output
[1.6704264853429793 2.1144597128383222 1.623925634087594 2.104374970136207]

(@v1.7) julia> show(stdout, "text/plain", a)
1×4 Matrix{Float64}:
 1.67043  2.11446  1.62393  2.10437
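A minimal sketch of the same effect on the matrix from the original question, assuming Julia 1.7 or later (the version in the transcript above). Calling sprint with the keyword context = :compact => true renders a value the way it appears inside a printed matrix. Passing :compact => false explicitly appears to keep full precision in the matrix layout, which may help when copying results into a text editor.

```julia
# Matrix from the original question (amounts with two decimal places)
x = [7136.52 0.00  7136.52;
     6047.88 62.74 6110.62;
     2374.32 26.19 2400.51;
     2374.32 78.57 2452.89]

s = sum(x, dims=1)   # 1×3 Matrix{Float64}

# Compact form of one element, as used when a matrix is printed
compact = sprint(show, s[1]; context = :compact => true)

# Default (non-compact) form: shortest string that round-trips to the same Float64
full = sprint(show, s[1])

# Matrix display with compaction explicitly turned off
wide = sprint(show, "text/plain", s; context = :compact => false)

println(compact)   # 17933.0
println(full)      # 17933.04
println(wide)
```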

adriano.vilela · March 18, 2022, 12:00am

I understand that. But why in my first example is the compact representation so compact (just one decimal place instead of, say, four)? Shouldn’t the compact representation always have the same number of decimal places (I’m thinking of the Matlab commands format short and format long)?

stevengj · March 18, 2022, 12:26am

In both cases they are printed with 6 significant digits: 17933.04 becomes 17933.0, while 1.6704264853429793 becomes 1.67043.

In floating-point arithmetic, it generally makes more sense to think in terms of significant digits than decimal places, because the numbers can be very large (1e300) or very small (1e-300).
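To illustrate the point about significant digits, here is a small sketch with arbitrarily chosen values that share the same digit string at different magnitudes. With 6 significant digits in compact form, the number of decimal places shown shrinks as the integer part grows, which is why amounts in the tens of thousands lose their cents.

```julia
# Same digits, different magnitudes (arbitrary example values)
values = [1.2345678, 12.345678, 123.45678, 1234.5678, 12345.678]

# Compact form keeps 6 significant digits, so fewer decimals remain as v grows
shown = [sprint(show, v; context = :compact => true) for v in values]

for (v, c) in zip(values, shown)
    println(rpad(repr(v), 12), c)
end
# compact column: 1.23457, 12.3457, 123.457, 1234.57, 12345.7
```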

Sukera · March 18, 2022, 2:11am

Slightly off-topic, but I’d recommend not using a floating-point data type for financial calculations or display, since some operations (e.g. adding a very small amount in cents to a very large amount in dollars) can make the cents vanish purely because of how floating-point arithmetic works. Summing the same floating-point values in a different order can also give different results, which is probably not desirable.
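A small sketch of the summation-order point, using textbook values rather than amounts from this thread: floating-point addition is not associative, so grouping the same three terms differently can produce two different Float64 results.

```julia
# Floating-point addition is not associative
left  = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6

println(left == right)      # false
println(left, " vs ", right)
```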

For more information, see e.g. here.

As an alternative, I’d recommend either storing amounts as whole cents in an integer type or using a fixed-point representation.
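A minimal sketch of the integer-cents suggestion, reusing the matrix from the original question. The names to_cents and format_cents are hypothetical helpers for illustration, not part of Julia or any package. Amounts are converted to Int cents once (rounding to the nearest cent), all later sums are exact integer arithmetic, and formatting back to dollars always shows two decimals.

```julia
# Hypothetical helper: convert dollar amounts (Float64) to whole cents (Int),
# rounding to the nearest cent to absorb binary representation error
to_cents(dollars) = round.(Int, 100 .* dollars)

# Hypothetical helper: format an integer number of cents as "dollars.cents"
function format_cents(c::Integer)
    sgn = c < 0 ? "-" : ""
    c = abs(c)
    return string(sgn, div(c, 100), ".", lpad(rem(c, 100), 2, '0'))
end

x = [7136.52 0.00  7136.52;
     6047.88 62.74 6110.62;
     2374.32 26.19 2400.51;
     2374.32 78.57 2452.89]

cents  = to_cents(x)          # Matrix{Int}
totals = sum(cents, dims=1)   # exact integer sums: 1793304, 16750, 1810054

println(join(format_cents.(totals), "  "))   # 17933.04  167.50  18100.54
```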

stevengj · March 18, 2022, 2:22am

“Vanishing” cents will only happen from a single addition in double precision for amounts exceeding roughly 90 trillion dollars, at which point the importance of individual pennies becomes questionable. However, because decimal values are not exactly represented, instead of 99 cents you might have 98.99999999999999… cents.
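A rough sketch of both points using only Base functions, with arbitrary example magnitudes. eps(v) gives the spacing between adjacent Float64 values near v; once that spacing exceeds twice a cent, adding 0.01 rounds back to the original amount. Rational(0.99) shows the exact value actually stored for the literal 0.99, which is not 99//100.

```julia
# Spacing between adjacent Float64 values at a few dollar magnitudes
for v in (1.0e13, 1.0e14, 1.0e15)
    println(v, "  eps = ", eps(v), "  v + 0.01 == v? ", v + 0.01 == v)
end

# 0.99 is not exactly representable in binary floating point
stored = Rational(0.99)
println(stored == 99//100)   # false
println(big(0.99))           # exact decimal expansion of the stored value
```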

In situations where you absolutely have to preserve cents exactly (e.g. because of a fiduciary requirement), you can use decimal floating point, which preserves the wide dynamic range and sensible roundoff semantics of floating point (unlike fixed point) while providing exact representations of decimal values.