The actual matrix A is quite large, but the number of unique elements in the matrix is much smaller subject to the tolerance. I would like to color the numbers that are similar to each other, e.g., for the small example above, I would like to display the matrix A in a manner so that 1.0001, 1.0 are in red (say), 2.0003, 2.0002, 2.0004, 2.0 in blue, 3.0001 in green and so on. Is there a way for Julia to achieve this? I am not sure if there is a package that can achieve this. Any tips/suggestions will be much appreciated.
Given A as defined, the following returns a matrix of colors for the entries (it might even work for other ndims values).
julia> let c = 0, B = similar(A, Int)
t = foldl((s,n)->(isapprox(last(n[1]),last(n[2]); atol=0.1) || (c += 1;); vcat(s,first(n[2])=>c)),
IterTools.partition(vcat(CartesianIndex(1, 1)=>NaN,sort(vec(collect(pairs(A))); by = last)), 2, 1);
init = Pair{CartesianIndex{ndims(A)},Int}[]
)
setindex!.(Ref(B), last.(t), first.(t))
B
end
3Ă—3 Matrix{Int64}:
1 2 1
2 3 2
1 2 1
The isapprox in the middle determined the tolerance and can be tweaked and, of course, this should be a function.
And this works only for sortable values, even though, any range which has a similarity metric should suffice to do this clustering-like result (but with a different algo).
I’m not sure I can follow @Dan’s solution, but here’s one I spent some time optimizing. It treats 0 somehow specially, but maybe that’s not a problem for you:
# this one vectorizes, round does not
function unsafe_round(f::AbstractFloat; scale)
(x, n) = frexp(f)
y = unsafe_trunc(Int, scale * x) / scale
return ldexp(y, n)
end
function _clamp_round!(
A::AbstractArray;
atol=Base.rtoldefault(real(eltype(A))),
sigdigits=floor(Int, -log10(atol))
)
@inbounds for (i, a) in pairs(A)
A[i] = ifelse(
abs(a) < atol,
zero(a),
unsafe_round(a, scale=10^sigdigits)
)
end
return A
end
function partition(M::AbstractMatrix)
l = 0 # number of partitions disregarding 0
d = Dict(zero(eltype(M)) => l)
res = zeros(Int, size(M))
for (v, idx) in zip(M, eachindex(res))
k = get!(d, v, l + 1)
l = ifelse(k == l + 1, k, l)
res[idx] = k
end
return l, res
end
julia> let A = deepcopy(A), atol = 1e-3
A = _clamp_round!(A, atol=atol)
(nparts, M) = partition(A)
@info nparts
M
end
[ Info: 3
3Ă—3 Matrix{Int64}:
1 2 1
2 3 2
1 2 1
julia> let A = deepcopy(A), atol = 1e-4
A = _clamp_round!(A, atol=atol)
(nparts, M) = partition(A)
@info nparts
M
end
[ Info: 4
3Ă—3 Matrix{Int64}:
1 2 1
2 3 4
1 2 1
@Shuvomoy_Das_Gupta How large are your matrices?
If you create an even faster solution, please share it here as well!
function cheapcluster(A, tol)
v = vec(A)
id = sortperm(v)
sorted_v = @view v[id]
clusters = zeros(Int, length(v))
cluster_id = 1
clusters[1] = cluster_id
for i in 2:length(sorted_v)
if sorted_v[i] - sorted_v[i-1] > tol
cluster_id += 1
end
clusters[i] = cluster_id
end
result = zeros(Int, size(A))
result[id] .= clusters
return results
end
Given that the intention is to colorize the matrix, I would tackle this from a colormap perspective. Don’t know how representative the A matrix is of the true case, so my solution may not be adequate for it, but for this A
using GMT
C = makecpt(range=(0.5,3.5,1),) # Colormap with boundaries between the the integers
grdimage(A, cmap=C, show=true) # The image
Short answer: probably use round.(A, sigdigits=2) or similar to assign colors. (Adjust sigdigits as desired, or use digits if you want an absolute tolerance.)
Longer answer: approximate comparison is not transitive, so trying to define “uniqueness subject to a tolerance” inevitably will encounter some odd behaviors. This has been a frequently discussed topic. See, for example: