Coloring similar entries in a matrix with the same color

I have a matrix A with many entries but many of them are very close to each other. A small example of A could be:

A = [1.0001  2.0003  1.0;
    2.0  3.0001  2.0004;
    1.0001  2.0002  1.0]

The actual matrix A is quite large, but the number of unique elements in the matrix is much smaller subject to the tolerance. I would like to color the numbers that are similar to each other, e.g., for the small example above, I would like to display the matrix A in a manner so that 1.0001, 1.0 are in red (say), 2.0003, 2.0002, 2.0004, 2.0 in blue, 3.0001 in green and so on. Is there a way for Julia to achieve this? I am not sure if there is a package that can achieve this. Any tips/suggestions will be much appreciated.

Given A as defined, the following returns a matrix of colors for the entries (it might even work for other ndims values).

julia> let c = 0, B = similar(A, Int)
    t = foldl((s,n)->(isapprox(last(n[1]),last(n[2]); atol=0.1) || (c += 1;); vcat(s,first(n[2])=>c)),
      IterTools.partition(vcat(CartesianIndex(1, 1)=>NaN,sort(vec(collect(pairs(A))); by = last)), 2, 1); 
      init = Pair{CartesianIndex{ndims(A)},Int}[]
    )
    setindex!.(Ref(B), last.(t), first.(t))
    B
end
3Ă—3 Matrix{Int64}:
 1  2  1
 2  3  2
 1  2  1

The isapprox in the middle determined the tolerance and can be tweaked and, of course, this should be a function.

And this works only for sortable values, even though, any range which has a similarity metric should suffice to do this clustering-like result (but with a different algo).

Thanks @Dan, the code produces the following error:

UndefVarError: `Itr` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] top-level scope
   @ REPL[2]:2

I found a package called https://github.com/KristofferC/Crayons.jl, I will see if I can use it achieve what I want. Thanks again.

Oh yeah… forgot, Itr is what I use for IterTools package.
I’m fixing the post, but you can just do:

import IterTools as Itr

to fix it.

I’m not sure I can follow @Dan’s solution, but here’s one I spent some time optimizing. It treats 0 somehow specially, but maybe that’s not a problem for you:

# this one vectorizes, round does not
function unsafe_round(f::AbstractFloat; scale)
    (x, n) = frexp(f)
    y = unsafe_trunc(Int, scale * x) / scale
    return ldexp(y, n)
end

function _clamp_round!(
    A::AbstractArray;
    atol=Base.rtoldefault(real(eltype(A))),
    sigdigits=floor(Int, -log10(atol))
)
    @inbounds for (i, a) in pairs(A)
        A[i] = ifelse(
            abs(a) < atol,
            zero(a),
            unsafe_round(a, scale=10^sigdigits)
        )
    end
    return A
end

function partition(M::AbstractMatrix)
    l = 0 # number of partitions disregarding 0
    d = Dict(zero(eltype(M)) => l)
    res = zeros(Int, size(M))
    for (v, idx) in zip(M, eachindex(res))
        k = get!(d, v, l + 1)
        l = ifelse(k == l + 1, k, l)
        res[idx] = k
    end
    return l, res
end
julia> let A = deepcopy(A), atol = 1e-3
           A = _clamp_round!(A, atol=atol)
           (nparts, M) = partition(A)
           @info nparts
           M
       end
[ Info: 3
3Ă—3 Matrix{Int64}:
 1  2  1
 2  3  2
 1  2  1

julia> let A = deepcopy(A), atol = 1e-4
           A = _clamp_round!(A, atol=atol)
           (nparts, M) = partition(A)
           @info nparts
           M
       end
[ Info: 4
3Ă—3 Matrix{Int64}:
 1  2  1
 2  3  4
 1  2  1

@Shuvomoy_Das_Gupta How large are your matrices?
If you create an even faster solution, please share it here as well!

My earlier suggestion was a bit too obfuscated. Hopefully this one is easier to read:

function cheapcluster(A, tol)
    v = sort(vec(A))
    return getindex.(Ref(Dict(v .=> cumsum(vcat(1,diff(v) .> tol)))),A)
end

and with this defined and the A in the OP:

julia> cheapcluster(A, 0.1)
3Ă—3 Matrix{Int64}:
 1  2  1
 2  3  2
 1  2  1

julia> cheapcluster(A, 1e-4)
3Ă—3 Matrix{Int64}:
 1  4  1
 2  5  4
 1  3  1

MĂŞme principe avec sortperm

function cheapcluster(A, tol)
    v = vec(A)
    id = sortperm(v)
    sorted_v = @view v[id]
    clusters = zeros(Int, length(v))
    cluster_id = 1
    clusters[1] = cluster_id

    for i in 2:length(sorted_v)
        if sorted_v[i] - sorted_v[i-1] > tol
            cluster_id += 1
        end
        clusters[i] = cluster_id
    end

    result = zeros(Int, size(A))
    result[id] .= clusters
    return results 
end

Given that the intention is to colorize the matrix, I would tackle this from a colormap perspective. Don’t know how representative the A matrix is of the true case, so my solution may not be adequate for it, but for this A

using GMT

C = makecpt(range=(0.5,3.5,1),)   # Colormap with boundaries between the the integers
grdimage(A, cmap=C, show=true) # The image

The fact that the outer pixels seem to be cut is no error. It results from grid being grid registered. A pixel registerd grid can be obtained with

G = mat2grid(Float32.(A), reg=1);  # Better to use Float32 as the image creation implies that conversion under the hood
grdimage(G, cmap=C, show=true)

1 Like

Short answer: probably use round.(A, sigdigits=2) or similar to assign colors. (Adjust sigdigits as desired, or use digits if you want an absolute tolerance.)

Longer answer: approximate comparison is not transitive, so trying to define “uniqueness subject to a tolerance” inevitably will encounter some odd behaviors. This has been a frequently discussed topic. See, for example:

1 Like

This was the original intent behind my question, I am looking into using Crayons.jl as well. Thanks very much for the code.

The largest matrix in my case will be around 50Ă—50. Thanks very much for the code @abulak .