Rebuilding Data from Heatmaps

Hello, first time posting, might be in the wrong category.

I need to extract scalar values from heatmaps saved as images. I donโ€™t have access to the raw data, only compressed JPGs with inlayed colorbars.

I would think that finding an efficient inverse of the colormap would be the ideal way to retrieve the actual values. What would be the (most) efficient way to compute the inverse/ what would be the most computationally efficient inverse?


UPDATE: Added an example. The images are from a thermal camera.

The Minima and Maxima are present in the picture.

Running the following code

using Images, ImageShow, Plots
frame_flir = Images.load("FLIR/Misc Pics/FLIR1274.jpg")

unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))

plot(unzip(frame_flir[31:209, 309:312]), linecolor = frame_flir[31:209, 310], label = false, xlab = "R", ylab = "G", zlab = "B")

plot(unzip(frame_flir[31:209, 309:312])[1], linecolor = :red, label = ["red" false false false])
plot!(unzip(frame_flir[31:209, 309:312])[2], linecolor = :green, label = ["green" false false false])
plot!(unzip(frame_flir[31:209, 309:312])[3], linecolor = :blue, label = ["blue" false false false])

I think this is a bit noisier than desired because of JPG compression, along the points where there are the black notches in the colorbar.

1 Like

Do you have some sample images you could provide us with? I think that would really increase your chances of getting good feedback. Do you know how the images were generated? Was a common colormap used, or was it something custom? Also, do you know how much data loss there may have been in the compression process? Finally, I assume the plotting library would have normalized the values before mapping them to the colors, so I donโ€™t think youโ€™ll be able to get the original values back without access to the raw data, unless you know min/max values maybe?

1 Like

Hey. I have added a sample image, and a bit of code I used to analyze the colorbar present in said image. I think itโ€™s within reason to infer the colormap using some smoothing interpolation, such as Loess.jl. My issue is more-so with the process of obtaining the inverse map, Temperature=f(R,G,B), which, ideally, should be fast.

Hmmm, what about building a k-d tree or ball tree with the colormap values and then finding the nearest neighbor for each pixel in the image? Once you get something (anything) working, it wonโ€™t be difficult to get advice from the community on how to speed it up.

Hereโ€™s an attempt to just get something working with your image above:

using ColorSchemes
using Images
using NearestNeighbors

flir = Images.load(raw"C:\Users\mthel\OneDrive\Pictures\FLIR.jpeg")
colorbar = flir[31:209, 309:312]
colors = vec(mean(colorbar, dims=2))
colormap = ColorScheme(colors)
kdtree = KDTree(vcat([[red(c) green(c) blue(c)] for c in colors]...)')

function pixel_to_value(pixel, kdtree)
    query_point = [red(pixel), green(pixel), blue(pixel)]
    idx, _ = knn(kdtree, query_point, 1) 
    normalized_val = (idx[1] - 1) / 178
	return 30.0 + (17.6 - 30.0) * normalized_val
end

img_vals = map(pixel -> pixel_to_value(pixel, kdtree), flir)

# img_vals =

240ร—320 Matrix{Float64}:
 21.1528  21.1528  21.1528  21.1528  21.1528  โ€ฆ  21.5708  21.7101  21.2921  21.0135
 21.1528  21.1528  21.1528  21.1528  21.1528     21.7101  21.6404  21.2225  21.0135
 21.0831  21.0831  21.1528  21.1528  21.1528     21.5011  21.2921  21.0831  21.0135
 21.0831  21.0831  21.1528  21.1528  21.1528     20.8045  21.2921  21.2225  21.1528
 21.0831  21.0831  21.0831  21.0831  21.0831     21.6404  21.3618  21.3618  21.4315
 21.0831  21.0831  21.0831  21.0831  21.0831  โ€ฆ  21.7101  21.5011  21.4315  21.4315
 21.1528  21.1528  21.0831  21.0831  21.0831     21.4315  21.2225  20.8045  20.5955
  โ‹ฎ                                           โ‹ฑ                             
 20.4562  20.0382  19.7596  20.5955  20.1079     22.3371  22.3371  22.3371  22.2674
 20.4562  20.6652  19.7596  20.0382  20.8742  โ€ฆ  22.4764  22.4067  22.3371  22.3371
 20.3169  20.3169  20.7348  20.4562  20.6652     22.4764  22.4764  22.4067  22.4067
 20.3169  20.1775  20.8742  20.3169  20.1775     22.5461  22.4764  22.4067  22.4764
 20.3169  20.3169  20.3169  20.5258  20.5258     22.4764  22.4764  22.4067  22.4067
 20.4562  20.3169  20.1775  20.2472  20.0382     22.4764  22.4067  22.4764  22.4067

# convert back the other direction as a sanity check

function value_to_pixel(value, colormap)
    normalized_val = (value - 30.0) / (17.6 - 30.0)
    return get(colormap, normalized_val)
end

recovered_img = map(val -> value_to_pixel(val, colormap), img_vals)

Below is the original image, as well as the recovered image:


You can see just from a visual inspection that there are some problems, but itโ€™s a start!

3 Likes

Given a heatmap as a color image of resolution (rows, cols), and fixing a number of colors, n_colors,
we can perform color image quantization, using K-means, to extract the basic elements that define a heatmap:
an array, z_data, of size (rows, cols), as normalized original data in the heatmap definition, and a colorscheme defined by n_colors.

using Clustering, Images, ColorSchemes
using Plots
plotlyjs()

function image2zvals(img::Union{Matrix{RGB{T}}, Matrix{RGBA{T}}};  
               n_colors=64, maxiter=200, tol=1e-04) where T<:Real
    
    rows, cols = size(img)
    observations = reshape(img, rows*cols)
    kmres = kmeans(observations, n_colors; maxiter = maxiter, tol=tol)
    nclust = nclusters(kmres)
    a = assignments(kmres)/nclust
    z_data = reshape(a, rows, cols)
   
    hcolors =  kmres.centers # kmres.centers is the codebook
    if fieldcount(eltype(img)) == 3
        cscheme = ColorScheme([RGB(c...) for c in eachcol(hcolors)])
    else
         cscheme = ColorScheme([RGB(c[1:3]...) for c in eachcol(hcolors)]) 
    end    
    return z_data, cscheme
end 

img = load("heatmap-forum.jpeg");
zdata, cscheme = image2zvals(img; n_colors=64,  maxiter=200, tol=1e-04);
#Reconstructed heatmap:
plt = heatmap(zdata[end:-1:1,:], c=cgrad(cscheme), cbar=false, size=(375, 300))

To avoid posting here one more image, see this notebook https://nbviewer.org/gist/empet/6f759474f944b6f865897bebc8b5da76 displaying the heatmap recovered from the computed normalized data and the colorscheme, as well as the values of z_data.

The current version of Clustering.js is 0.15.7, and I worked with an older one: 0.14.4.

3 Likes

Sorry for the long gap between updates.

Iโ€™ve tried a number of different approaches and benchmarked the results. The particular problem I tried to optimize is reducing the compute time for a pixel-to-value computation of an array, regardless of the setup time.

1 - Brute force
Using Loess.jl, fit the individual colors, and then sample them at an acceptable density, then compute the 3D distance between pixel color and the sampled colors. The minimum distance index gives the corresponding temperatrue.

Code Snippet
using Images, Loess, Statistics, ThreadsX

fn = "flir.jpg"

const lw, hi, ฮด = 18.0, 280., 0.1
const temp_rng = range(lw, hi, step = ฮด)

frame_flir = Images.load(fn)
idxs = (rows = 31:209, cols = 308:313)
cb = frame_flir[idxs...]

const t_samp_rng = range(lw, hi, length(idxs.rows)) |> collect

unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
R, G, B = unzip(cb)

function (lm::Loess.LoessModel)(x)
    return predict(lm, x)
end

span = 0.1 # 0.125

model_R = loess(t_samp_rng, mean(R, dims = 2) |> vec; span)
model_G = loess(t_samp_rng, mean(G, dims = 2) |> vec; span)
model_B = loess(t_samp_rng, mean(B, dims = 2) |> vec; span)

### method 1, brute force

const samples = (R = model_R(temp_rng), 
                 G = model_G(temp_rng), 
                 B = model_B(temp_rng))

function find_temp(A, samples = samples)
    matB = map(x->(r = Float64(x.r), g = Float64(x.g), b = Float64(x.b)), A) :: Matrix{@NamedTuple{r::Float64, g::Float64, b::Float64}}
    t = ThreadsX.map(pix -> temp_rng[findmin(map((r, g, b) -> (pix.r - r)^2 + (pix.g - g)^2 + (pix.b - b)^2, samples...))[2]], matB) :: Matrix{Float64}
    return t 
end

Using BenchmarkTools.jl, average execution time is ~500ms.

julia> @benchmark find_temp(frame_flir) seconds = 10
BenchmarkTools.Trial: 21 samples with 1 evaluation.
 Range (min โ€ฆ max):  370.657 ms โ€ฆ    1.223 s  โ”Š GC (min โ€ฆ max): 53.07% โ€ฆ 69.91%
 Time  (median):     408.187 ms               โ”Š GC (median):    54.02%   
 Time  (mean ยฑ ฯƒ):   507.752 ms ยฑ 246.115 ms  โ”Š GC (mean ยฑ ฯƒ):  58.57% ยฑ  5.36%

  โ–ˆโ–‚โ–‚ โ–‚
  โ–ˆโ–ˆโ–ˆโ–…โ–ˆโ–โ–ˆโ–โ–โ–…โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–…โ–…โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–… โ–        
  371 ms           Histogram: frequency by time          1.22 s <        

 Memory estimate: 1.51 GiB, allocs estimate: 231131.

2 - Symbolic Regression
If an analytic expression linking the RGB encoding to the scalar value exists, then using it would be the fastest way to convert the entire picture. Using SymbolicRegression.jl, I tried to infer an analytic form starting from both the RGB and HSV colorspaces.

Code Snippet
using Images
import SymbolicRegression: SRRegressor
import MLJ: machine, fit!, predict, report

fn = "flir.jpg"

const lw, hi, ฮด = 18.0, 280., 0.1
const temp_rng = range(lw, hi, step = ฮด)

frame_flir = Images.load(fn)
idxs = (rows = 31:209, cols = 308:313)
cb = frame_flir[idxs...]

const t_samp_rng = range(lw, hi, length(idxs.rows)) |> collect

unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
R, G, B = unzip(cb)

### method 2, symbolic regression

# RGB - colorspace
inRGB = (r = vec(R), g = vec(G), b = vec(B))

# T - temperature
inT = stack(t_samp_rng for i in 1:6) |> vec

model = SRRegressor(
    niterations = 1000,
    binary_operators = [+, -, *, /],
    unary_operators=[exp, log, cos, sin, sqrt],
)

mach = machine(model, inRGB, inT)

fit!(mach)
rep = report(mach)

println(rep.equations[rep.best_idx])

julia> ((log(r + 0.06514106105173974) + sin(sin(b * 2.00389950943762))) * -114.86518971882892) + 117.47144076159321

julia> @benchmark bestf.(frame_flir)
BenchmarkTools.Trial: 2622 samples with 1 evaluation.
 Range (min โ€ฆ max):  1.724 ms โ€ฆ  13.856 ms  โ”Š GC (min โ€ฆ max): 0.00% โ€ฆ 86.34%
 Time  (median):     1.830 ms               โ”Š GC (median):    0.00%
 Time  (mean ยฑ ฯƒ):   1.900 ms ยฑ 481.131 ฮผs  โ”Š GC (mean ยฑ ฯƒ):  4.03% ยฑ  9.53%

  โ–†โ–ƒโ–‚โ–ˆโ–†โ–‚
  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‡โ–…โ–โ–…โ–…โ–ƒโ–โ–โ–โ–โ–โ–ƒโ–ƒโ–„โ–โ–โ–โ–โ–โ–โ–โ–ƒโ–โ–โ–โ–โ–โ–ƒโ–โ–โ–โ–โ–ƒโ–โ–โ–โ–โ–โ–โ–โ–โ–โ–ƒโ–โ–โ–โ–โ–…โ–†โ–‡โ–ˆโ–ˆโ–† โ–ˆ
  1.72 ms      Histogram: log(frequency) by time      3.48 ms <

 Memory estimate: 600.09 KiB, allocs estimate: 4.

which is a ~500x improvement, however thereโ€™s a significant error; maybe itโ€™s possible to refine this approach, but I am no familiar enough with the underlying package.

and HSV

Code Snippet
# HSV - colorspace
H, S, V = convert.(HSV, cb) |> unzip
inHSV = (h = vec(H), s = vec(S), v = vec(V))

# T - temperature
inT = stack(t_samp_rng for i in 1:6) |> vec

model = SRRegressor(
    niterations = 1000,
    binary_operators = [+, -, *, /],
    unary_operators=[exp, log, cos, sin, sqrt],
)

mach = machine(model, inHSV, inT)

fit!(mach)
rep = report(mach)

println(rep.equations[rep.best_idx])

with similar results, both for time to execute and error.

3 - @mthelm85โ€™s approach using k-d tree, with some unsophisticated parallelization via ThreadsX.jl

julia> @benchmark ThreadsX.map(pixel -> pixel_to_value(pixel, kdtree), flir)
BenchmarkTools.Trial: 975 samples with 1 evaluation.                       
 Range (min โ€ฆ max):  3.462 ms โ€ฆ 26.377 ms  โ”Š GC (min โ€ฆ max):  0.00% โ€ฆ  6.33%
 Time  (median):     5.203 ms              โ”Š GC (median):    29.23%        
 Time  (mean ยฑ ฯƒ):   5.106 ms ยฑ  1.761 ms  โ”Š GC (mean ยฑ ฯƒ):  19.10% ยฑ 15.10%

  โ–ˆ          โ–โ–ƒ
  โ–ˆโ–ƒโ–ƒโ–ƒโ–„โ–ƒโ–„โ–ƒโ–‚โ–‚โ–‚โ–ˆโ–ˆโ–†โ–„โ–…โ–…โ–„โ–„โ–ƒโ–ƒโ–ƒโ–‚โ–‚โ–ƒโ–‚โ–‚โ–‚โ–‚โ–‚โ–‚โ–‚โ–‚โ–โ–โ–โ–‚โ–‚โ–‚โ–โ–‚โ–โ–‚โ–โ–‚โ–โ–โ–โ–โ–‚โ–โ–โ–โ–‚โ–โ–โ–โ–‚ โ–ƒ
  3.46 ms        Histogram: frequency by time        11.8 ms <

 Memory estimate: 22.39 MiB, allocs estimate: 615110.

evaluates in ~5ms, which is close to the analytical approach, but the error is much improved.

.
I will also look into the performance when feeding the complete list of points to knn(...).

4 - @empetโ€™s approach using Clustering.jl seems to be a good approach for when the colorbar is missing. The downside is that the colorscheme, and implicitly the underlying scalar is unsorted.

julia> zdata, cscheme = image2zvals(img; n_colors=128,  maxiter=200, tol=1e-04);
julia> cscheme


And the execution time is the longest so far at ~ 2s.

julia> @benchmark image2zvals(img, n_colors = 128, maxiter = 50, tol = 1e-2) seconds = 60        
BenchmarkTools.Trial: 32 samples with 1 evaluation.
 Range (min โ€ฆ max):  1.196 s โ€ฆ    2.685 s  โ”Š GC (min โ€ฆ max): 0.24% โ€ฆ 0.15%
 Time  (median):     1.932 s               โ”Š GC (median):    0.19%
 Time  (mean ยฑ ฯƒ):   1.921 s ยฑ 368.344 ms  โ”Š GC (mean ยฑ ฯƒ):  0.17% ยฑ 0.05%

                       โ–ˆ โ–ƒ โ–ƒ     โ–ˆ      โ–ˆ โ–ƒ
  โ–‡โ–โ–โ–โ–โ–‡โ–‡โ–‡โ–โ–โ–‡โ–โ–โ–โ–‡โ–‡โ–โ–‡โ–โ–โ–‡โ–ˆโ–โ–ˆโ–โ–ˆโ–โ–โ–โ–โ–โ–ˆโ–โ–‡โ–‡โ–โ–‡โ–โ–ˆโ–‡โ–ˆโ–โ–โ–โ–โ–โ–โ–‡โ–‡โ–โ–โ–‡โ–โ–โ–โ–โ–โ–‡ โ–
  1.2 s          Histogram: frequency by time         2.69 s <

 Memory estimate: 47.21 MiB, allocs estimate: 335.
1 Like