# Speeding up a color transformation loop

First, a little background. I have an image [2000 x 2000 pixels] I am applying different illuminants to. Towards the end of this process I have to transform the CIE1931 X,Y, and Z coordinate to RGB. These are stored as an array of 3 different 2000x2000 matrices ( colorLayers[1:3][x, y]).

To speed things up, I used CartesianIndices instead of two nested loops.

``````for xy in CartesianIndices(colorLayers)

colorLayers[ xy] = (3.2404542 * XYZlayers[ xy]) + (-1.5371385 * XYZlayers[ xy]) + (-0.4985314* XYZlayers[ xy])
colorLayers[ xy] = (-0.969266 * XYZlayers[ xy]) + (1.8760108 * XYZlayers[ xy]) + (0.041556 * XYZlayers[ xy])
colorLayers[ xy] = (0.0556434 * XYZlayers[ xy]) + (-0.2040259 * XYZlayers[ xy]) + (1.0572252 * XYZlayers[ xy])
end
``````

This takes 3.22018 seconds to perform.

I’ve tried:

1. Putting X,Y, and Z in their own separate loops, hoping it would give an memory access advantage (it didn’t).
3. Using FLoops…

FLoops seems to have a cache issue. The runs with the first few illuminants are fairly fast, but get slower over time:

illuminant-1 0.98
illuminant-2 0.73
illuminant-3 5.54
illuminant-4 5.61
illuminant-5 5.77
illuminant-6 12.44 seconds (31.07 seconds total)

In contrast, the non-FLoops code is a fairly constant ~3.2 seconds per illuminant (19.3 seconds total).

Any ideas how to speed up this code?

FLoops snippet below:

``````@floop	for xy in CartesianIndices(colorLayers)

colorLayers[ xy] = (3.2404542 * XYZlayers[ xy]) + (-1.5371385 * XYZlayers[ xy]) + (-0.4985314* XYZlayers[ xy])
colorLayers[ xy] = (-0.969266 * XYZlayers[ xy]) + (1.8760108 * XYZlayers[ xy]) + (0.041556 * XYZlayers[ xy])
colorLayers[ xy] = (0.0556434 * XYZlayers[ xy]) + (-0.2040259 * XYZlayers[ xy]) + (1.0572252 * XYZlayers[ xy])
end
``````
1. Have you tried `eachindex(colorLayers)` instead of Cartesian indexing?

2. Instead of accessing `XYZlayers[xy]` 3 times, have you tried setting `h1` `h2` `h3` to `XYZlayers[1 to 3][xy]` respectively and then using the `h`s for each calculation.

1 Like

Apart from trying Dan’s suggestions, you should put your code inside of a function. This is mentioned in the performance tips, which you should read given that you’re interested in performance: Performance Tips · The Julia Language

1 Like

Any reason for this indirect storage format? Using an 3 x 2000 x 2000 array your whole operation can be written as a matrix multiplication.

2 Likes

First off, thanks for all of the suggestions. I’ll report back later after I’ve had time to try everyone’s suggestions. In the meantime, I’ll answer some questions, which will give everyone a better background.

Big picture: I’m trying to show what hyperspectral images look like under different illuminants (e.g. daylight vs LED light bulb). Instead of 3 channels corresponding to red, green, and blue, hyperspectral images might have 31 channels, corresponding to 400 through 700nm wavelengths, in steps of 10nm.

Why hyperspectral? Metamerism: two things whose color looks identical, despite reflecting completely different wavelengths. For example, object A might reflect 610nm (orange), whereas object B reflects a mixture of yellow (580nm) and red (640nm), yet both look orange. For example, the objects on the left are rendered under a full spectrum, whereas the objects on the right have a 610nm illuminant. In this case, metamerism fails, as the mixture of yellow and red looks dark gray instead of orange.

nsajko: That’s just a snippet that is part of larger function. Below is an outline of what the function does:

``````function processHyperSpectral( images, CMF, Illuminant)

# images is an array of 31 hyperspectral images (e.g. 400 nm, 410nm…700nm)
# CMF is a color matching function, such as CIE1931. 3 (XYZ) x 31 (nm)
# illuminant is an array of the spectrum of a light source (daylight, incandecent bulb, etc.)

1) turn the CMF into matrix of floats (was an array of xyz)
2) make separate images layers representing X,Y,and Z
3) precalculate the illuminant * CMF (color matching function adapted to the light source).
4) go through each of the 31 hyperspectral images and multiply them by the precalculated CMF
This is summed into separate X,Y, and Z images.
5) Normalize the X, Y, and Z images
6) Transform the X, Y, and Z images to separate R, G, and B images (CURRENT ISSUE)
7) Perform sVGA gamma correction
8) Fuse the 3 images into a single RGB image
``````

Dan: 1) No, I did not try each index, because I already knew the horizontal and vertical dimensions. Earlier iterations used:

``````for x in 1:horiz
for y in 1:vert
``````

… which was slower than the Cartersian method.

1. I’m not sure what you mean by h1, h2, and h3. Please elaborate.

bertschi: Yes, there is a historical reason, but it might work better if I take your suggestion and
go to the next step and set it up as a matrix rather than an array of smaller matrices.
The original code leaned heavily on colors.jl. The constant battle between types
(Gray, RGB, and XYZ) was not only a pain-in-the-ass, but made arrays/matrices
and looping for efficient coding all but impossible. For example, and earlier version
looked like this:

``````for x in 1:horiz

for y in 1:vert

redLayer[x,y] = (3.2404542 * Xlayer[x,y]) + (-1.5371385 * Ylayer[x,y]) + (-0.4985314* Zlayer[x,y])
greenLayer[x,y] = (-0.969266 * Xlayer[x,y]) + (1.8760108 * Ylayer[x,y]) + (0.041556 * Zlayer[x,y])
blueLayer[x,y] = (0.0556434 * Xlayer[x,y]) + (-0.2040259 * Ylayer[x,y]) + (1.0572252 * Zlayer[x,y])
end
end
``````

…because the style of the codebase encouraged (forced?) treating red, green, and blue (or x, y, and z) as separate entities, rather than looping through colors 1,2, and 3. So my first course of action was to try to rip that out, and have a separate array entries for each color (later, during gamma correction, I do loop through the colors). The next step ended up looking like this:

``````for x in 1:horiz
for y in 1:vert
colorLayers[x,y] = (3.2404542 * Xlayer[x,y]) + (-1.5371385 * Ylayer[x,y]) + (-0.4985314* Zlayer[x,y])
colorLayers[x,y] = (-0.969266 * Xlayer[x,y]) + (1.8760108 * Ylayer[x,y]) + (0.041556 * Zlayer[x,y])
colorLayers[x,y] = (0.0556434 * Xlayer[x,y]) + (-0.2040259 * Ylayer[x,y]) + (1.0572252 * Zlayer[x,y])
end
end
``````

In addition, all of the typecasting and stuff really slowed down my code when I just needed to manipulate arrays/matrices of floats.

So, by first translating them to floats, doing my several steps of manipulations, and then translating back to RGB types at the last minute, ended up yielding much faster code.

If you want people to be able to help you well, you’ve got to give self-contained examples of whatever’s the problem. This means including an entire function, along with the data necessary to run it. See Please read: make it easier to help you

But don’t give us the huge function as in your actual code, rather you should produce a self-contained example that’s representative of your issue.

Pretty sure they meant to deduplicate your code by assigning the common expressions to variables:

``````h1 = XYZlayers[xy]
h2 = XYZlayers[xy]
h3 = XYZlayers[xy]
``````

If nothing else, that would make the code read better.

2 Likes

Using Muladd (tweaked from Colors.jl) brought it down from 3.2 to 2.91 seconds, and using Dan’s suggestion of deduplicating by assigning the common expressions to variables brought it down from 2.91 to 2.47 seconds.

This is the current state of my code:

``````for xy in CartesianIndices(colorLayers)
A = XYZlayers[ xy]
B = XYZlayers[ xy]
C = XYZlayers[ xy]
A,
B,
-0.4985314 * C)
)
A,
XYZlayers[ xy],
0.041556 * C)
)
A,
B,
1.0572252 * C)
)

end
``````

Next, I’m going to try the matrix suggestion.

Couldn’t you just use `@fastmath` and let the compiler do it?

1 Like

Do you have a full runnable benchmark? I am getting much faster timings than you and I doubt my computer is faster. Here is a slightly tweaked version of your original:

``````using Images
using BenchmarkTools

function layer_test_data(; N = 2000)
return [rand(Float32, N, N) for _ = 1:3]
end

function transform_layers(XYZlayers)
N = size(layers, 1)
colorLayers = [zeros(Float32, N, N) for _ = 1:3]
for xy in CartesianIndices(XYZlayers)
h1 = XYZlayers[xy]
h2 = XYZlayers[xy]
h3 = XYZlayers[xy]
@fastmath colorLayers[xy] = (3.2404542 * h1) + (-1.5371385 * h2) + (-0.4985314 * h3)
@fastmath colorLayers[xy] = (-0.969266 * h1) + (1.8760108 * h2) + (0.041556 * h3)
@fastmath colorLayers[xy] = (0.0556434 * h1) + (-0.2040259 * h2) + (1.0572252 * h3)
end
colorLayers
end
``````

Which benchmarks pretty well

``````julia> layers = layer_test_data();

julia> @btime transform_layers(\$layers);
734.404 ms (24000010 allocations: 595.09 MiB)
``````

Using a matrix multiply

``````function color_test_data(; N = 2000)
return XYZ.(rand(RGB{Float32}, N, N))
end

function transform_colors(colors)
M = Float32[
3.2404542 -1.5371385 -0.4985314
-0.969266 1.8760108 0.041556
0.0556434 -0.2040259 1.0572252
]
N = size(colors, 1)
colorview(RGB, reshape(M * reshape(channelview(colors), 3, N*N), 3, N, N))
end
``````

is a big speedup

``````julia> ctd = color_test_data();

julia> @btime transform_colors(\$ctd);
15.197 ms (6 allocations: 45.78 MiB)
``````

The transform from `Colors.jl` is in the middle

``````julia> @btime RGB.(\$ctd);
423.400 ms (2 allocations: 45.78 MiB)
``````
2 Likes

My results (MacStudio with M1Max chipset):

`````` transform_layers    82.292 ms
transform_colors  1640.000 ms
colors.jl          799.766 ms
``````

So, your first algorithm wins! But (and strangely), your matrix multiply version (transform_colors) tanks on my machine, and gets outdone by colors.jl.

In practice, when I substituted you fastmath code for mine, it ended up being a little slower (~10%).

In the meantime, I’ll clean up my code and post the entire function.

Source code:

``````function processHyperSpectral( images, CMF, Illuminant)

numWL = length(CMF)				# numWL = number of wavelength bins
cmfArray = zeros( (3, numWL))

for i in eachindex(CMF)			# extract colors.jl CMF so that are hands are not tied by their data types.
cmfArray[1, i] = CMF[i].x
cmfArray[2, i] = CMF[i].y
cmfArray[3,i] = CMF[i].z
end
dimensions = size(images)

horiz = dimensions
vert = dimensions

XYZlayers = []									# Seperate X, Y, and Z layer for destination image
for j in 1:3
push!( XYZlayers, fill(0.0, horiz, vert) )
end

#-----------------------------
# Precalculate Illuminant x CMF
IllumiCMF = zeros( (3,numWL))
for i in eachindex(CMF)
for j in 1:3
IllumiCMF[j, i] = cmfArray[j, i] .* Illuminant[i]
end
end
#-----------------------------
# multiply each image by the CMF/Illuminant combo
for i in eachindex( images)		# 31 10nm images starting at 400 nm (violet)
for j in 1:3
XYZlayers[j] = XYZlayers[j] .+ (images[i] .*IllumiCMF[j,i])		# multiply each image by the CMF/Illuminant combo
end
end
#-----------------------------
# We'll norm these by XYZ, which is the average area under the 3 curves
XYZ =  sum(cmfArray)
XYZ /= 3

for j in 1:3
XYZlayers[j] = XYZlayers[j] ./ XYZ					# last was no sqrt, maxed by mxval
end

#-----------------------------
# Next, convert XYZ to RGB
colorLayers = []				# This will hold the 3 color layers before performing gamma and compositing to RGB
for j in 1:3
push!( colorLayers, fill(0.0, horiz, vert) )
end

#= XYZ to RGB matrix for D65:

3.2404542	-1.5371385	-0.4985314
-0.969266	1.8760108	0.041556
0.0556434	-0.2040259	1.0572252

from http://brucelindbloom.com
=#

for xy in CartesianIndices(colorLayers)
A = XYZlayers[ xy]
B = XYZlayers[ xy]
C = XYZlayers[ xy]
A,
B,
-0.4985314 * C)
)
A,
XYZlayers[ xy],
0.041556 * C)
)
A,
B,
1.0572252 * C)
)
#=		for xy in CartesianIndices(XYZlayers)
h1 = XYZlayers[xy]
h2 = XYZlayers[xy]
h3 = XYZlayers[xy]
@fastmath colorLayers[xy] = (3.2404542 * h1) + (-1.5371385 * h2) + (-0.4985314 * h3)
@fastmath colorLayers[xy] = (-0.969266 * h1) + (1.8760108 * h2) + (0.041556 * h3)
@fastmath colorLayers[xy] = (0.0556434 * h1) + (-0.2040259 * h2) + (1.0572252 * h3)

=#
end

#-----------------------------
# Perform gamma correction and composit images
# 	clamp the values
for j in 1:3
clamp!(colorLayers[j], 0, 1)
end
# 	make the newImage container
newImage = fill(RGB(0.0, .0, 0.0), horiz, vert)
# 	perform gamma correction
for xy in CartesianIndices(colorLayers)
redTemp, greenTemp, blueTemp = sRGBGamma(colorLayers[ xy], colorLayers[ xy], colorLayers[ xy])
newImage[xy] = RGB( redTemp, greenTemp, blueTemp )	# collect changes tuple to an array
end

return newImage
end
#-=================================================
function sRGBGamma(red, green, blue)

if red > 0.0031308
red = 1.055 * (red^(1. / 2.4)) - 0.055
else
red *= 12.92
end
red = clamp(red, 0, 1)
if green > 0.0031308
green = 1.055 * (green^(1. / 2.4)) - 0.055
else
green *= 12.92
end
green = clamp(green, 0, 1)
if blue > 0.0031308
blue = 1.055 * (blue^(1. / 2.4)) - 0.055
else
blue *= 12.92
end
blue = clamp(blue, 0, 1)
return red, green, blue
end
``````

One thing that is odd is that the toy simulation, despite using the same amount of data, runs substantially faster than when it is embedded in the function processHyperSpectral(). For example, in the toy version, it took 101 ms to complete, whereas when that same section is in processHyperSpectral(), it takes 2508ms to complete for the same amount of data.

The only differences between the two are (other than the randomized data in the toy version) the computations that occurred before-hand. I suspect, perhaps naively, that there is some sort of type-checking that is slowing down the computations when it is in processHyperSpectral(). Having said that, I tried to ensure that all matrices and arrays were [implicitly] of type Float64.*

• That’s the reason I stepped away from using the built-in types used by colors.jl, as they seemed to require too much overhead when typecasting from one to another when I could have treated everything as array/matrices of float64. Indeed, when I transformed the initial CMF from using the colors.jl xyz data type to just a float, everything sped up.

You repeatedly use `CartesianIndices`, for no apparent reason. The appropriate thing to use in your case `eachindex` or `for idx = 1:length(A)`.

The difference is that `eachindex` uses linear indexing. That is, each access is just `basepointer + shift * (index-1)`. Cartesian indices are (for a matrix) two indices `i, j` and access at `basepointer + shift * ((i-1) + size(A,1) * (j-1))`. You see the extra addition and multiplication!

Never just use `CartesianIndices` without thinking. Only use it if you have good reasons for using it; this is not a sane default.

1 Like

These are arrays of type `Any`, can you initialize them with the correct types? Like `Int[]` for an array of integers.

edit:

In this case, do:

``````	XYZlayers = Matrix{Float64}[]									# Seperate X, Y, and Z layer for destination image
for j in 1:3
push!( XYZlayers, fill(0.0, horiz, vert) )
end
``````

or, more directly:

``````XYZlayers = [ zeros(horiz, vert) for _ in 1:3 ]
``````

in which case the element type of the array can be deduced directly upon construction (you can use `fill` instead of `zeros` there, but `zeros` seems clearer to me).

1 Like

This is a sign that you may be using some global variable in your actual code that is causing a type instability, that you solved in the toy example.

Your function `processHyperSpectral` is unwieldy, you should separate it into many smaller functions.

There’s a horrid degree of code duplication in `sRGBGamma`.

These snippets should obviously be the body of their own function. If you don’t want to define a function in the global namespace for some reason, you can use a local function, like:

``````function sRGBGamma(red, green, blue)
gc = function(x)
if x > 0.0031308
x = 1.055 * (x^(1. / 2.4)) - 0.055
else
x *= 12.92
end
clamp(x, 0, 1)
end
map(gc, (red, green, blue))
end
``````

Try `IllumiCMF[j, i] .= cmfArray[j, i] .* Illuminant[i]` instead to prevent unnecessary allocation.

Try `XYZlayers[j] .+= images[i] .* IllumiCMF[j,i]` instead.

The same here.

In general, see the Performance tips in the Manual, in particular this section.

1 Like

Another thing: as I understand here `XYZlayers[j]` is an array. Here you are creating a new array on the right side of the equation and replacing the array on that position with this new array. You probably want to update the current array, with `XYXlayers[j] .= ... `(note the dot in `.=`). Alternatively, you can use, for simplicity:

``````     @. XYZlayers[j] = XYZlayers[j] + (images[i] * IllumiCMF[j,i])
``````

edit: Indeed, the same @nsajko posted above.

1 Like

This part would perhaps better be implemented as a polynomial approximation. In any case, I’m guessing some gamma correction functionality should already be available in a package in the ecosystem.

Doesn’t it make more sense for `XYZlayers` to be a 3D Array rather than a vector of matrices? For the latter there is no structural enforcement of the layers having the same size.