Mean of integers overflows - bug or expected behaviour?

Raw A2D converter output is indeed an integer, but it represents a position on a fixed scale. The Julian approach to measurement data is FixedPointNumbers.jl, which already does the safe thing here:

julia> using FixedPointNumbers, Statistics

julia> mean(rand(N0f32, 1000))
0.5003701349572203

Show me a single ADC that outputs floating point values.

Show me a single thermocouple, flow orifice, pressure gauge, or GC that outputs integers. As I said, horses for courses.


If they are interfaced to a computer, all of them output integers, whether you realize it or not. For many examples of thermocouples, you can consult the site of one of the largest vendors.

And if you deal directly with the A2D output, then that is your reality. I deal with the temperature measurement. That is my reality. My values are floats.


Unless you’re using an analog gauge (and thanks to the profusion of silicon, few of us are), all these voltage/charge/current signals will be converted to fixed-point integers by an ADC somewhere between the sensor and your data acquisition system. I just installed a bunch of Honeywell pressure gauges with an onboard chip that reads the strain bridge measurement as a voltage, converts it to digital with an ADC, performs signal conditioning, and converts it back to an analog voltage level for output. My data acquisition system then takes that analog voltage, converts it back to digital, and performs additional signal conditioning. The amount of back-and-forth is ugly, but none of us at the interface between computers and measurement equipment can ignore the reality of signal discretization.

Of course, but I suspect very few of us are calculating the mean of the raw A2D output, rather than the resulting float value…
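To make the "average the float value, not the raw counts" workflow concrete, here is a minimal sketch. The 12-bit resolution, the 3.3 V reference, the sample readings, and the helper name counts_to_volts are all made up for illustration and do not describe any particular device:

```julia
using Statistics

# Hypothetical converter: map a raw count in 0:(2^nbits - 1) to a voltage in 0..vref.
# nbits = 12 and vref = 3.3 are illustrative assumptions, not values from a datasheet.
counts_to_volts(c::Integer; nbits = 12, vref = 3.3) = vref * c / (2^nbits - 1)

raw   = UInt16[0, 2048, 4095]      # made-up raw readings
volts = counts_to_volts.(raw)      # converted to Float64 before any averaging
avg   = mean(volts)                # the mean is taken on floats, so no integer overflow
```

Because the conversion happens element by element, the accumulation inside mean never sees the integer type at all.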

It really depends on the measurement system in question. Especially for high-bandwidth signals like audio/imagery, there’s a significant advantage to be gained by representing (and storing) signals using only the number of bits required.

While I agree that at some level an A2D gives you an integer, do they even make them with 64-bit ranges? (I think that may be more accurate than any number known to mankind?)

From say 24 bits there would be a lot of headroom. Does mean(::Vector{Int32}) accumulate in Int64? Yes, it seems to:

julia> using Statistics

julia> i32 = abs.(rand(Int32, 10000));

julia> mean(i32) == mean(float, i32) == mean(Int64, i32)
true

julia> i64 = abs.(rand(Int, 10000));

julia> mean(i64) - mean(float, i64)
-4.63382211131584e18
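A deterministic version of the same effect, as a rough sketch: two Int64 values near typemax are enough to make the integer sum wrap, while converting to float or accumulating in a wider integer type avoids it. The plain mean(x) result is deliberately left out, since it may differ between versions of Statistics:

```julia
using Statistics

# Two values near typemax(Int64); their exact sum does not fit in an Int64.
x = [typemax(Int64), typemax(Int64) - 2]

wrapped    = sum(x)                       # Int64 arithmetic wraps around silently
mean_float = mean(float, x)               # each element becomes Float64 before summing
mean_wide  = sum(Int128, x) / length(x)   # accumulate in Int128, then divide
```

The float route loses a few low-order bits at this magnitude, whereas the Int128 accumulator keeps the sum exact until the final division.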

The behavior of sum (and mean, by extension) depends on your system’s word size:

help?> sum
  sum(f, itr)

  Sum the results of calling function f on each element of itr.

  The return type is Int for signed integers of less than system word size,
  and UInt for unsigned integers of less than system word size. For all
  other arguments, a common return type is found to which all arguments
  are promoted.
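The widening rule in that docstring can be checked directly. A small sketch (the variable names are just for illustration):

```julia
# Integer element types narrower than the machine word are widened to Int / UInt by sum.
small_signed   = typeof(sum(Int32[1, 2, 3]))
small_unsigned = typeof(sum(UInt8[1, 2, 3]))

# Element types that are already wider, or not integers, keep their own type.
wide   = typeof(sum(Int128[1, 2, 3]))
single = typeof(sum(Float32[1, 2, 3]))

# Thanks to the widening, this sum does not wrap even though 200 > typemax(Int8).
no_wrap = sum(Int8[100, 100])
```

So narrow integer types get headroom for free, but Int64 on a 64-bit machine does not.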

The assumption that mean() is a mere extension of sum is questionable.


Not really. Most of the devices you are linking work via I2C or SPI. Strictly speaking, they are just outputting a bunch of bits.

A particular API may represent them as integers for you, but in this case I would argue that it’s just one possible design, and as your example shows, not even a really good one.

Whether you realize it or not :wink:, a particularly elegant Julian way of dealing with this would be to convert to a temperature representation at the earliest opportunity. Then mean etc. would just work out of the box, e.g.:

julia> using Unitful, Unitful.DefaultSymbols, Statistics

julia> mean([1u"K", 5u"K"])
3.0 K

It’s right in the function definition:

_mean(A::AbstractArray, ::Colon) = sum(A) / length(A)

…which is the textbook definition of the arithmetic mean


That is one implementation, and it is flawed. sum() and mean() have different signatures; in Haskell the former would have been

[a]->a

whereas the latter

[a]->Float

The type conversion step is not a “mere extension”.

That’s not a correct function signature. For example, the mean of complex numbers should be complex. As such, the correct return type is the type that results from dividing the sum of the collection by its length.


How do you divide types?

I don’t mean dividing types. I mean that, assuming there is a type-stable division, the return type of mean should be the type of its result.

That’s not how Julia’s mean() behaves, though. The type-stable division for integers is div().

As we’ve discussed at length in this thread, there are pros & cons to widening the accumulator when summing. Type conversion comes from /(a,b), which always returns a float for integer arguments. If we want mean to behave differently, we could define mean(A::AbstractArray{T,N}) where {T <: Integer, N} = sum(A) ÷ length(A), but that doesn’t solve your original issue, and introduces new issues:

julia> mean([1,1]*typemax(Int))
-1

julia> mean([1,2,2])
1

By type-stable division, I don’t mean one that maintains the same type. I mean one whose result type is stable (i.e. Int/Int = Float). I was being a little pedantic to get around the fact that if the length of your collection is a BigInt, for example, the result should be a BigFloat (or complex or unit…).
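As a rough illustration of "the return type is whatever sum divided by length gives", here is a small sketch. The helper mean_type is hypothetical and not part of Statistics, and the examples vary the element type rather than the type of the length:

```julia
using Statistics

# Hypothetical helper: the type obtained by dividing the sum of A by its length.
mean_type(A) = typeof(sum(A) / length(A))

t_int     = mean_type([1, 2, 3])           # integers divide to a float
t_complex = mean_type([1 + 2im, 3 + 4im])  # complex integers stay complex
t_big     = mean_type(BigInt[1, 2])        # BigInt divides to BigFloat
t_single  = mean_type(Float32[1, 2])       # Float32 stays Float32

complex_mean = mean([1 + 2im, 3 + 4im])
```

None of these fit a fixed `[a]->Float` signature, which is the point being made about the return type following the division.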
