Mean of integers overflows - bug or expected behaviour?

stillyslalom · February 18, 2020, 4:00pm

Raw A2D converter output is indeed an integer, but it represents a position on a fixed scale. The Julian approach to measurement data is FixedPointNumbers.jl, which already does the safe thing here:

julia> using FixedPointNumbers, Statistics

julia> mean(rand(N0f32, 1000))
0.5003701349572203

Mikhail_Kagalenko · February 18, 2020, 4:00pm

Show me a single ADC converter that outputs floating point values.

braamvandyk · February 18, 2020, 4:01pm

Show me a single thermocouple, flow orifice, pressure gauge, or GC that outputs integers. As I said, horses for courses.

Mikhail_Kagalenko · February 18, 2020, 4:09pm

If they are interfaced to a computer, all of them output integers, whether you realize it or not. For many examples of thermocouples, you can consult the site of one of the largest vendors.

braamvandyk · February 18, 2020, 4:10pm

And if you deal directly with the A2D output, then that is your reality. I deal with the temperature measurement. That is my reality. My values are floats.

stillyslalom · February 18, 2020, 4:11pm

Unless you’re using an analog gauge (and thanks to the profusion of silicon, few of us are), all these voltage/charge/current signals will be converted to fixed-point integers by an ADC somewhere between the sensor and your data acquisition system. I just installed a bunch of Honeywell pressure gauges with an onboard chip that reads the strain bridge measurement as a voltage, converts it to digital with an ADC, performs signal conditioning, and converts it back to an analog voltage level for output. My data acquisition system then takes that analog voltage, converts it back to digital, and performs additional signal conditioning. The amount of back-and-forth is ugly, but none of us at the interface between computers and measurement equipment can ignore the reality of signal discretization.

braamvandyk · February 18, 2020, 4:12pm

Of course, but I suspect very few of us are calculating the mean of the raw A2D output, rather than the resulting float value…

stillyslalom · February 18, 2020, 4:14pm

It really depends on the measurement system in question. Especially for high-bandwidth signals like audio/imagery, there’s a significant advantage to be gained by representing (and storing) signals using only the number of bits required.

improbable22 · February 18, 2020, 4:28pm

While I agree that at some level an A2D gives you an integer, do they even make them with 64-bit ranges? (I think that may be more accurate than any number known to mankind?)

From say 24 bits there would be a lot of headroom. Does mean(::Vector{Int32}) accumulate in Int64? Yes, it seems to:

julia> using Statistics

julia> i32 = abs.(rand(Int32, 10000));

julia> mean(i32) == mean(float, i32) == mean(Int64, i32)
true

julia> i64 = abs.(rand(Int, 10000));

julia> mean(i64) - mean(float, i64)
-4.63382211131584e18
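The headroom remark can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a signed 24-bit converter (the bit depth is only the example figure from the post, not a property of any particular device): it counts how many worst-case samples fit into an Int64 accumulator before the sum can overflow.

```julia
# Largest magnitude of a signed 24-bit sample (assumed bit depth).
max_sample = Int64(2)^23 - 1

# Number of worst-case samples an Int64 accumulator can absorb
# before the running sum exceeds typemax(Int64).
headroom = typemax(Int64) ÷ max_sample

println("max sample: ", max_sample)
println("samples before overflow: ", headroom)
println("at least 2^40 samples: ", headroom >= Int64(2)^40)
```

So for narrow sample types the accumulator is not the practical limit; the overflow in the example above only appears once the elements themselves already span the full 64-bit range.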

stillyslalom · February 18, 2020, 4:50pm

The behavior of sum (and mean, by extension) depends on your system’s word size:

help?> sum
  sum(f, itr)

  Sum the results of calling function f on each element of itr.

  The return type is Int for signed integers of less than system word size,
  and UInt for unsigned integers of less than system word size. For all
  other arguments, a common return type is found to which all arguments
  are promoted.
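A short sketch of what that docstring means in practice, using only Base functions. The variable names are illustrative. It shows sum widening Int32 elements to Int, wrapping around for Int64 elements, and two ways of asking for a wider accumulator explicitly.

```julia
# Int32 elements: sum accumulates in Int (Int64 on a 64-bit system),
# so four copies of typemax(Int32) do not wrap there.
small = fill(typemax(Int32), 4)
s_small = sum(small)

# Int64 elements: the accumulator is Int64 too, so the sum wraps around.
big64 = fill(typemax(Int64), 2)
s_wrapped = sum(big64)

# Widening each element first makes sum accumulate in Int128.
s_wide = sum(widen, big64)

# Converting each element to floating point avoids the wraparound
# at the cost of rounding.
s_float = sum(float, big64)

println(typeof(s_small), " ", s_small)
println(typeof(s_wrapped), " ", s_wrapped)
println(typeof(s_wide), " ", s_wide)
println(typeof(s_float), " ", s_float)
```

Which of the two explicit options is preferable depends on whether exactness or speed matters more for the data at hand.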

Mikhail_Kagalenko · February 18, 2020, 4:52pm

The assumption that mean() is a mere extension of sum is questionable.

Tamas_Papp · February 18, 2020, 4:54pm

Not really. Most of the devices you are linking work via I2C or SPI. Strictly speaking, they are just outputting a bunch of bits.

A particular API may represent them as integers for you, but in this case I would argue that it’s just one possible design, and as your example shows, not even a really good one.

Whether you realize it or not, a particularly elegant Julian way of dealing with this would be converting to a temperature representation at the earliest opportunity. Then mean etc. would just work out of the box, e.g.:

julia> using Unitful, Unitful.DefaultSymbols, Statistics

julia> mean([1u"K", 5u"K"])
3.0 K
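The same "convert early" idea can be sketched without any units package. Everything device-specific here is hypothetical: counts_to_celsius, the 12-bit full scale, and the temperature range are made up for illustration and do not describe a real sensor or API.

```julia
using Statistics

# Hypothetical linear calibration for an assumed 12-bit ADC:
# 0 counts maps to tmin and full_scale counts maps to tmax.
counts_to_celsius(c::Integer; full_scale=4095, tmin=-50.0, tmax=150.0) =
    tmin + (tmax - tmin) * c / full_scale

# Raw readings as they might arrive from the bus (illustrative values).
raw = UInt16[0, 4095, 2048, 2047]

# Convert to a physical quantity first, then do statistics on floats.
temps = counts_to_celsius.(raw)
avg = mean(temps)

println(eltype(temps))
println(avg)
```

Once the values are Float64 temperatures, integer overflow in the accumulator is no longer a concern for the downstream statistics.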

stillyslalom · February 18, 2020, 4:58pm

It’s right in the function definition:

_mean(A::AbstractArray, ::Colon) = sum(A) / length(A)

…which is the textbook definition of the arithmetic mean

Mikhail_Kagalenko · February 18, 2020, 5:01pm

That is one implementation, and it is flawed. sum() and mean() have different signatures: in Haskell the former would have been

[a]->a

whereas the latter

[a]->Float

The type conversion step is not a “mere extension”.

Oscar_Smith · February 18, 2020, 5:04pm

That’s not a correct function signature. For example, the mean of complex numbers should be complex. As such, the correct type signature for the return type is the result of dividing the sum of the collection by its length.

Mikhail_Kagalenko · February 18, 2020, 5:06pm

How do you divide types?

Oscar_Smith · February 18, 2020, 5:08pm

I don’t mean divide types, I mean assuming that there is a type stable division, the return type of mean should be of that type.

Mikhail_Kagalenko · February 18, 2020, 5:09pm

That’s not how Julia’s mean() behaves, though. The type-stable division for integers is div().

stillyslalom · February 18, 2020, 5:14pm

As we’ve discussed at length in this thread, there are pros & cons to widening the accumulator when summing. Type conversion comes from /(a,b), which always returns a float for integer arguments. If we want mean to behave differently, we could define mean(A::AbstractArray{T,N}) where {T <: Integer, N} = sum(A) ÷ length(A), but that doesn’t solve your original issue, and introduces new issues:

julia> mean([1,1]*typemax(Int))
-1

julia> mean([1,2,2])
1

Oscar_Smith · February 18, 2020, 5:17pm

By type stable division, I don’t mean maintains the same type. I mean has a result that is type stable (i.e. Int/Int = Float). I was being a little pedantic to get around the fact that if the length of your collection is a BigInt, for example, the result should be a BigFloat (or complex or unit…).
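To make the point about result types concrete, here is a small sketch using only Base. It inspects the type of sum(x) / n for a few element and length types, which is the quantity the textbook definition computes; the variable names are illustrative.

```julia
ints = [1, 2, 2]

# The sum of Ints stays an Int; dividing by an Int yields a Float64.
t_sum = typeof(sum(ints))
t_real = typeof(sum(ints) / length(ints))

# Complex integers divided by an Int yield a complex float.
t_complex = typeof(sum([1 + 2im, 3 + 4im]) / 2)

# If the divisor is a BigInt, the quotient is a BigFloat.
t_big = typeof(sum(ints) / big(3))

# Integer division, by contrast, keeps the Int type but truncates.
q = div(sum(ints), length(ints))

println(t_sum, " ", t_real, " ", t_complex, " ", t_big, " ", q)
```

In each case the result type is determined by the inputs alone, which is the sense of "type stable" used here.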
