Sum or fold over fields in a struct

Given an array of RGB{UInt8} values, how can we collect the sums of their independent :r, :g, and :b 8-bit components, without overflowing?

We can write a for loop and do this manually:

for v in arr
   r += v.r
   g += v.g
   b += v.b
end

We can use the sum function:

r = sum(v -> v.r, arr)
g = sum(v -> v.g, arr)
b = sum(v -> v.b, arr)

but this is likely to be much less efficient than the for loop, because it has to iterate three times, correct?

If we write:

rgbsum = sum(arr)

we overflow if arr is an array of RGB structs with 8-bit fields.

Is there a way to ask the sum function to accumulate its sum into an RGB{Float64}, for example, so that rgbsum = sum(arr) would work without overflow and as efficiently as the explicit for loop?

Not sure which RGB type we are talking about (it doesn’t seem to be the one from Colors.jl) but you could try something like:

foldl(+, arr; init=RGB{Float64}(0.0,0.0,0.0))

That is fold/reduce the array via addition with a RGB{Float64} initial value.

Example with a simple fake RGB type:

julia> struct RGB{T}
           r::T
           g::T
           b::T
       end

julia> Base.:+(x::RGB, y::RGB) = RGB(x.r + y.r, x.g + y.g, x.b + y.b)

julia> arr = RGB{UInt8}.(rand(UInt8,10), rand(UInt8,10), rand(UInt8,10));

julia> typeof(arr)
Array{RGB{UInt8},1}

julia> foldl(+, arr; init=RGB{Float64}(0,0,0))
RGB{Float64}(894.0, 1321.0, 1086.0)
2 Likes

Another option, maybe a bit clearer, would be a typecast in sum:

sum(RGB{UInt16}, arr)

But yes, it would help if the example code included the using statement that imported the RGB type.

Ah, thank you. The init argument was what I needed.

I’m sorry for the type confusion. This RGB type comes from the Images package, but I was intending to ask a more general question, which you’ve both answered.

I like the typecast approach of sum(RGB{Float64}, arr), but it seems to perform much worse than the foldl(+, arr, init=RGB{Float64}(0,0,0)) approach. 20ms vs 0.25ms for an array of size (460, 374).

It’ll be faster to retain the underlying UInt representation using a N24f8 or similar instead of doing a floating-point conversion:

julia> using BenchmarkTools, Images

julia> arr = rand(RGB{N0f8}, 460, 374);

julia> @btime foldl(+, $arr; init=RGB{Float64}(0, 0, 0))
  158.500 μs (0 allocations: 0 bytes)
RGB{Float64}(86118.69411764605,85819.37254901891,86040.69019608037)

julia> @btime foldl(+, $arr; init=RGB{N24f8}(0, 0, 0))
  29.900 μs (0 allocations: 0 bytes)
RGB{N24f8}(86118.7,85819.4,86040.7)

This is only an option if you know the sum for each channel will be less than typemax(N24f8)

julia> typemax(N24f8) |> Float64
1.6843009e7

…but you can get around this with N56f8 if need be:

julia> @btime foldl(+, $arr; init=RGB{N56f8}(0, 0, 0))
  54.800 μs (78 allocations: 2.73 KiB)
RGB{N56f8}(86118.7,85819.4,86040.7)
1 Like

That’s worth raising as an issue, unless the difference is compilation time or something like that. Could you please post a complete example, either here or on Github, with the code you ran that printed those times?

At least on v1.6, it’s actually faster than the foldl approach:

julia> @btime sum(RGB{Float64}, $arr)
  76.800 μs (0 allocations: 0 bytes)
RGB{Float64}(86118.69411764707,85819.37254901961,86040.69019607845)

**edit: can confirm that it’s much slower on v1.5.

julia> @btime sum(RGB{Float64}, $arr)
  17.857 ms (516119 allocations: 13.13 MiB)
4 Likes