Speeding up a color transformation loop

Huh, interesting! I wonder if this is due to differences in multithreading of BLAS. On my computer, BLAS uses 4 threads:

julia> using LinearAlgebra

julia> LinearAlgebra.BLAS.get_num_threads()
4

The layers loop can be multithreaded too, simce there are no dependencies between iterations of the loop, it might be interesting to try changing that for loop to

    Threads.@threads for xy in CartesianIndices(XYZlayers[1])

And seeing if you get any speedup.

However, the other suggestions in this thread (reducing allocations and removing type instability) will have far bigger returns than optimizing this loop at this stage of development.

Can you make the parameters of your function clearer? Say by running:

typeof.((images, CMF, Illuminant))

and

size.((images, CMF, Illuminant))

and posting the output?

nsajko Ignore sRGBgamma(). The unweildy code is a side effect of originally relying heavily on colors.jl, which stores an RGB color as a struct of .r, .g, and .b. Most of my code changes those to arrays so that I can loop over them — I just never got around to doing that in sRGBgamma() function (and its not my biggest bottleneck).

Why doesn’t CartesianIndices make sense for going over a 2000x2000 matrix? Please explain.

In the meantime, I tested 3 versions:

for x in 1:horiz
    for y in 1:vert
       ColorSomething[x, y]...
    end
end

for y in 1:vert
    for x in 1:horiz
       ColorSomething[x, y]...
    end
end

for xy in CartesianIndices(ColorSomething)
    ColorSomething[xy]...
end

…and the CartesianIndices method was the fastest.

There is nothing wrong with CartesianIndices per se – they are actually a really nice feature of Julia. It’s just that linear indexing should be faster (as explained by @foobar_lv2). Thus, you might want to try

for idx in eachindex(ColorSomething)
    ColorSomething[idx] ...
end
1 Like

The first one is bad because you are running in the outer indexes first, and Julia is column major. The second one is good, and probably as fast (or faster) than the third if you add @inbounds to it. The third is fine, but you can use also (as mentioned above, @bertschi was faster):

for xy in eachindex(ColorSomething)
    ColorSomething[xy] = ...
end

which may be faster, but maybe the compiler lowers this to the same thing as the second and third options. All options may (or not) benefit from adding @inbounds, as the compiler may or not have that for granted depending on the exact code (but using it with eachindex or CartesianIndexes is safer, as you are sure that the loop is inbounds).

Dan:

Here’s the output:

typeof.((images, CMF, Illuminant)) (Vector{Any}, Vector{XYZ{Float64}}, Vector{Float64})
sizeof.((images, CMF, Illuminant)) (248, 744, 248)

To be honest, I’m baffled as to why it says the size of images = 248. If I step into the debugger, I can see that it is an array of 31 images (each image is grayscale 2000 x 2000)

numWL = 31
type of images Vector{Any}

size of images 248
type of images[1] Matrix{Gray{N0f16}}
size of images[1] 8000000

These are 16-bit images, which is why it is 8000000

The 744 for the CMF is because there are 31nm x 3 (8bit) xyz.
The 248 for the illuminant because there 31nm x 8bit (gray).

Thanks. If you look at my post, I’ve used size and not sizeof, which would have made the output a little easier to parse. But sizeof is okay too. Copy-paste is the ultimate friend.

And the typeof of parameter images is not good !

Vector{Any} means type instability. You want it to be Vector{Matrix{Gray{N0f16}}}. This could explain part of the performance problem.

I was very confused, because I would make a suggested change, such as using FLoops or rewriting it for matrices, and I was getting inconsistent results. A run consisted of processing 6 images, and sometimes the processing times for the images were all over the place.

Someone suggested that I had a global variable problem, but I didn’t have any global variables in my file. Nonetheless, while using JET to find type inconsistencies, it found a global variable in another file (that I include()-ed in my main file) that had a const global variable that was being redefined. I cleaned up the types, and used @isdefined to prevent the constants from being redefined, and the result is lightening-fast code that went from 38 seconds down to 0.58 seconds to process 6 images! It’s probably faster than that, as I still have some diagnostic code left in that reports timings.

Thanks everyone! The cleaned-up coded is below:

function processHyperSpectral( images::Vector{Matrix{Gray{N0f16}}}, CMF::Vector{XYZ{Float64}}, Illuminant::Vector{Float64})

    numWL = length(CMF)
    cmfArray = zeros( (3,numWL))

    for i in eachindex(CMF)
        cmfArray[1, i] = CMF[i].x
        cmfArray[2, i] = CMF[i].y
        cmfArray[3,i] = CMF[i].z
    end
    dimensions = size(images[1])

    horiz = dimensions[1]
    vert = dimensions[2]

    XYZlayers =Vector{Matrix{Float64}}()
    for j in 1:3
        push!( XYZlayers, fill(0.0, horiz, vert) )
    end
    #----------------
    # Precalculate Illuminant x CMF 
    IllumiCMF = zeros( (3,numWL))
    for i in eachindex(CMF)
        for j in 1:3
            IllumiCMF[j, i] = cmfArray[j, i] .* Illuminant[i]
        end
    end
    #------------------
    # 5) Multiply hyperspectral by cmf
    for i in eachindex( images)
        for j in 1:3
            XYZlayers[j] .+= images[i] .* IllumiCMF[j,i]            # CHANGE 1B
        end
    end
    #---------------
    # We'll norm these by XYZ, which is the average area under the 3 curves
    XYZ =  sum(cmfArray) 
    XYZ /= 3

    for j in 1:3
        XYZlayers[j] ./= XYZ                    # CHANGE 1c
    end
    #-----------------------------
    # Next, convert XYZ to RGB
    #colorLayers = fill(0.5, 3, horiz, vert)    
    colorLayers = Vector{Matrix{Float64}}()

    for j in 1:3
        push!( colorLayers, fill(0.0, horiz, vert) )
    end

    for xy in CartesianIndices(colorLayers[1])
        A = XYZlayers[1][ xy]
        B = XYZlayers[2][ xy]
        C = XYZlayers[3][ xy]
        colorLayers[1][ xy] = muladd( 3.2404542, 
                                A, 
                                muladd(-1.5371385, 
                                B, 
                                -0.4985314 * C)
                                )
        colorLayers[2][ xy] = muladd( -0.969266, 
                                A, 
                                muladd(1.8760108, 
                                XYZlayers[2][ xy], 
                                0.041556 * C)
                                )
        colorLayers[3][ xy] = muladd( 0.0556434, 
                                A, 
                                muladd(-0.2040259, 
                                B, 
                                1.0572252 * C)
                                )   
    end
    #------------------
    # 6) perform gamma correction and composit images into the final image
    for j in 1:3
        clamp!(colorLayers[j], 0, 1)
    end

    newImage = fill(RGB(0.0, .0, 0.0), horiz, vert)

    for xy in CartesianIndices(colorLayers[1])
        redTemp, greenTemp, blueTemp = sRGBGamma(colorLayers[1][ xy], colorLayers[2][ xy], colorLayers[3][ xy]) 
        newImage[xy] = RGB( redTemp, greenTemp, blueTemp )  # collect changes tuple to an array
    end

    return newImage
end
1 Like

Add a dot there.