# Speeding up a color transformation loop

Huh, interesting! I wonder if this is due to differences in multithreading of BLAS. On my computer, BLAS uses 4 threads:

``````julia> using LinearAlgebra

4
``````

The layers loop can be multithreaded too, simce there are no dependencies between iterations of the loop, it might be interesting to try changing that for loop to

``````    Threads.@threads for xy in CartesianIndices(XYZlayers[1])
``````

And seeing if you get any speedup.

However, the other suggestions in this thread (reducing allocations and removing type instability) will have far bigger returns than optimizing this loop at this stage of development.

Can you make the parameters of your function clearer? Say by running:

``````typeof.((images, CMF, Illuminant))
``````

and

``````size.((images, CMF, Illuminant))
``````

and posting the output?

nsajko Ignore sRGBgamma(). The unweildy code is a side effect of originally relying heavily on colors.jl, which stores an RGB color as a struct of .r, .g, and .b. Most of my code changes those to arrays so that I can loop over them — I just never got around to doing that in sRGBgamma() function (and its not my biggest bottleneck).

Why doesn’t CartesianIndices make sense for going over a 2000x2000 matrix? Please explain.

In the meantime, I tested 3 versions:

``````for x in 1:horiz
for y in 1:vert
ColorSomething[x, y]...
end
end

for y in 1:vert
for x in 1:horiz
ColorSomething[x, y]...
end
end

for xy in CartesianIndices(ColorSomething)
ColorSomething[xy]...
end
``````

…and the CartesianIndices method was the fastest.

There is nothing wrong with `CartesianIndices` per se – they are actually a really nice feature of Julia. It’s just that linear indexing should be faster (as explained by @foobar_lv2). Thus, you might want to try

``````for idx in eachindex(ColorSomething)
ColorSomething[idx] ...
end
``````
1 Like

The first one is bad because you are running in the outer indexes first, and Julia is column major. The second one is good, and probably as fast (or faster) than the third if you add `@inbounds` to it. The third is fine, but you can use also (as mentioned above, @bertschi was faster):

``````for xy in eachindex(ColorSomething)
ColorSomething[xy] = ...
end
``````

which may be faster, but maybe the compiler lowers this to the same thing as the second and third options. All options may (or not) benefit from adding `@inbounds`, as the compiler may or not have that for granted depending on the exact code (but using it with `eachindex` or `CartesianIndexes` is safer, as you are sure that the loop is inbounds).

Dan:

Here’s the output:

``````typeof.((images, CMF, Illuminant)) (Vector{Any}, Vector{XYZ{Float64}}, Vector{Float64})
sizeof.((images, CMF, Illuminant)) (248, 744, 248)
``````

To be honest, I’m baffled as to why it says the size of images = 248. If I step into the debugger, I can see that it is an array of 31 images (each image is grayscale 2000 x 2000)

numWL = 31
type of images Vector{Any}

``````size of images 248
type of images[1] Matrix{Gray{N0f16}}
size of images[1] 8000000
``````

These are 16-bit images, which is why it is 8000000

The 744 for the CMF is because there are 31nm x 3 (8bit) xyz.
The 248 for the illuminant because there 31nm x 8bit (gray).

Thanks. If you look at my post, I’ve used `size` and not `sizeof`, which would have made the output a little easier to parse. But `sizeof` is okay too. Copy-paste is the ultimate friend.

And the `typeof` of parameter `images` is not good !

`Vector{Any}` means type instability. You want it to be `Vector{Matrix{Gray{N0f16}}}`. This could explain part of the performance problem.

I was very confused, because I would make a suggested change, such as using FLoops or rewriting it for matrices, and I was getting inconsistent results. A run consisted of processing 6 images, and sometimes the processing times for the images were all over the place.

Someone suggested that I had a global variable problem, but I didn’t have any global variables in my file. Nonetheless, while using JET to find type inconsistencies, it found a global variable in another file (that I include()-ed in my main file) that had a const global variable that was being redefined. I cleaned up the types, and used @isdefined to prevent the constants from being redefined, and the result is lightening-fast code that went from 38 seconds down to 0.58 seconds to process 6 images! It’s probably faster than that, as I still have some diagnostic code left in that reports timings.

Thanks everyone! The cleaned-up coded is below:

``````function processHyperSpectral( images::Vector{Matrix{Gray{N0f16}}}, CMF::Vector{XYZ{Float64}}, Illuminant::Vector{Float64})

numWL = length(CMF)
cmfArray = zeros( (3,numWL))

for i in eachindex(CMF)
cmfArray[1, i] = CMF[i].x
cmfArray[2, i] = CMF[i].y
cmfArray[3,i] = CMF[i].z
end
dimensions = size(images[1])

horiz = dimensions[1]
vert = dimensions[2]

XYZlayers =Vector{Matrix{Float64}}()
for j in 1:3
push!( XYZlayers, fill(0.0, horiz, vert) )
end
#----------------
# Precalculate Illuminant x CMF
IllumiCMF = zeros( (3,numWL))
for i in eachindex(CMF)
for j in 1:3
IllumiCMF[j, i] = cmfArray[j, i] .* Illuminant[i]
end
end
#------------------
# 5) Multiply hyperspectral by cmf
for i in eachindex( images)
for j in 1:3
XYZlayers[j] .+= images[i] .* IllumiCMF[j,i]            # CHANGE 1B
end
end
#---------------
# We'll norm these by XYZ, which is the average area under the 3 curves
XYZ =  sum(cmfArray)
XYZ /= 3

for j in 1:3
XYZlayers[j] ./= XYZ                    # CHANGE 1c
end
#-----------------------------
# Next, convert XYZ to RGB
#colorLayers = fill(0.5, 3, horiz, vert)
colorLayers = Vector{Matrix{Float64}}()

for j in 1:3
push!( colorLayers, fill(0.0, horiz, vert) )
end

for xy in CartesianIndices(colorLayers[1])
A = XYZlayers[1][ xy]
B = XYZlayers[2][ xy]
C = XYZlayers[3][ xy]
colorLayers[1][ xy] = muladd( 3.2404542,
A,
B,
-0.4985314 * C)
)
colorLayers[2][ xy] = muladd( -0.969266,
A,
XYZlayers[2][ xy],
0.041556 * C)
)
colorLayers[3][ xy] = muladd( 0.0556434,
A,
B,
1.0572252 * C)
)
end
#------------------
# 6) perform gamma correction and composit images into the final image
for j in 1:3
clamp!(colorLayers[j], 0, 1)
end

newImage = fill(RGB(0.0, .0, 0.0), horiz, vert)

for xy in CartesianIndices(colorLayers[1])
redTemp, greenTemp, blueTemp = sRGBGamma(colorLayers[1][ xy], colorLayers[2][ xy], colorLayers[3][ xy])
newImage[xy] = RGB( redTemp, greenTemp, blueTemp )  # collect changes tuple to an array
end

return newImage
end
``````
1 Like

Add a dot there.