How to replace multiple values in a large array with the occurrence of the value?

Bjorn_De · November 25, 2020, 10:55am

Hi!
I have an array and I want to create a new array (or replace the values) with the occurrences of the values from the input array. For example:

using StatsBase

blobs
10×10 Array{Int64,2}:
 1  1  0  2  0  2  0  2  0  2
 1  1  0  2  2  0  2  2  2  2
 0  1  0  0  0  0  0  2  0  0
 1  1  0  0  2  0  2  2  2  2
 0  1  0  0  2  2  2  2  2  0
 1  0  0  2  2  0  0  2  2  0
 0  0  0  0  2  2  2  0  2  0
 0  0  2  0  0  2  2  0  2  2
 0  0  2  0  2  2  2  0  0  0
 0  0  0  2  2  0  2  0  3  0

blobs_occ = countmap(vec(blobs))
Dict{Int64,Int64} with 4 entries:
  0 => 49
  2 => 41
  3 => 1
  1 => 9

blob_counts = zeros(size(blobs));

for (key, value) in blobs_occ
       C=findall(blobs.==key)
       blob_counts[C] .= value
end

blob_counts
10×10 Array{Float64,2}:
  9.0   9.0  49.0  41.0  49.0  41.0  49.0  41.0  49.0  41.0
  9.0   9.0  49.0  41.0  41.0  49.0  41.0  41.0  41.0  41.0
 49.0   9.0  49.0  49.0  49.0  49.0  49.0  41.0  49.0  49.0
  9.0   9.0  49.0  49.0  41.0  49.0  41.0  41.0  41.0  41.0
 49.0   9.0  49.0  49.0  41.0  41.0  41.0  41.0  41.0  49.0
  9.0  49.0  49.0  41.0  41.0  49.0  49.0  41.0  41.0  49.0
 49.0  49.0  49.0  49.0  41.0  41.0  41.0  49.0  41.0  49.0
 49.0  49.0  41.0  49.0  49.0  41.0  41.0  49.0  41.0  41.0
 49.0  49.0  41.0  49.0  41.0  41.0  41.0  49.0  49.0  49.0
 49.0  49.0  49.0  41.0  41.0  49.0  41.0  49.0   1.0  49.0

This works well with small arrays, but with a larger array, e.g. 10000x10000, it could take many hours. Is there a way to do this faster?
Thanks in advance, Bjoern

rfourquet · November 25, 2020, 11:10am

Using map! seems moderately faster: map!(x -> blobs_occ[x], blob_counts, blobs).
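
A minimal sketch of the map! approach on a small array. The hand-rolled counts dictionary stands in for countmap so the snippet runs without StatsBase; that substitution and the sample data are assumptions, not part of the original post.

```julia
# Small stand-in for the `blobs` array from the question (sample data is made up).
blobs = [1 1 0; 0 2 2; 2 2 3]

# Build value => occurrence count with Base only
# (assumed equivalent to what countmap(vec(blobs)) returns).
counts = Dict{Int,Int}()
for x in blobs
    counts[x] = get(counts, x, 0) + 1
end

# Fill a preallocated array of the same shape in a single pass over `blobs`.
blob_counts = similar(blobs)
map!(x -> counts[x], blob_counts, blobs)
```

Unlike the findall loop, this visits every element once instead of scanning the whole array once per distinct value.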

rfourquet · November 25, 2020, 11:29am

Although, if the keys are “dense” like in your example, you can instead store the counts in a Vector, which makes lookups considerably faster:

minkey, maxkey = extrema(keys(blobs_occ))
occ = zeros(Int, maxkey-minkey+1)
for (k, v) in blobs_occ
    occ[k+1-minkey] = v
end
map!(x -> occ[x+1-minkey], blob_counts, blobs)
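
As a variation on the same idea, the counts can be tallied straight into the vector, skipping the dictionary altogether. This is a sketch under the assumption that the values are integers spanning a modest range; the helper name occurrence_counts is hypothetical.

```julia
# Hypothetical helper: replace each value by its occurrence count using a dense lookup table.
# Assumes integer values whose range (minkey:maxkey) is small enough to allocate a vector for.
function occurrence_counts(blobs::AbstractArray{<:Integer})
    minkey, maxkey = extrema(blobs)
    occ = zeros(Int, maxkey - minkey + 1)
    # First pass: tally each value at its offset position in `occ`.
    for x in blobs
        occ[x + 1 - minkey] += 1
    end
    # Second pass: look up the tally for every element.
    return map(x -> occ[x + 1 - minkey], blobs)
end

occurrence_counts([1 1 0; 0 2 2; 2 2 3])
```

If the values are sparse (say, a few keys spread over a range of millions), the lookup vector becomes mostly empty and the dictionary version is likely the better fit.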

jishnub · November 25, 2020, 11:32am

You need to iterate the array keeping the column-major layout in mind. map! would do this for you. You may also loop over the array by yourself:

julia> using StatsBase

julia> blobs = rand(1:10, 10_000, 10_000);

julia> blobs_occ = countmap(vec(blobs))
Dict{Int64,Int64} with 10 entries:
  7  => 9999281
  4  => 10004145
  9  => 9998303
  10 => 10000867
  2  => 10003930
  3  => 10000831
  5  => 10001701
  8  => 9993679
  6  => 9996181
  1  => 10001082

julia> blob_counts = similar(blobs);

julia> function elcount!(blob_counts, blobs, blobs_occ)
           for (ind, el) in enumerate(blobs)
               blob_counts[ind] = blobs_occ[el]
           end
           return blob_counts
       end

julia> @time elcount!(blob_counts, blobs, blobs_occ);
  1.499894 seconds

sijo · November 25, 2020, 11:39am

Or more succinctly:

blobs_occ = countmap(vec(blobs))
[blobs_occ[value] for value in blobs]

Edit: and the same in one line, though less readable I think:

getindex.(Ref(countmap(vec(blobs))), blobs)
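
On the one-liner: Ref(...) wraps the dictionary so that broadcasting treats it as a single value, and only blobs is iterated. A small sketch with a hand-written dictionary (the sample values are made up):

```julia
counts = Dict(0 => 2, 1 => 2, 2 => 4, 3 => 1)
blobs = [1 1 0; 0 2 2; 2 2 3]

# Ref(counts) makes the Dict behave like a scalar under broadcasting,
# so getindex is applied as counts[x] for every element x of blobs.
result = getindex.(Ref(counts), blobs)
```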

rafael.guerra · November 26, 2020, 11:31pm

@rfourquet, your solution is amazingly fast but not the easiest to read.

On another note, how can we @btime a simple but slower in-place double-loop solution as below?

function elcount6!(blobs, blobs_occ)
    @inbounds for j in axes(blobs,2), i in axes(blobs,1)
        blobs[i,j] = blobs_occ[blobs[i,j]];
    end
    return blobs
end

Thanks in advance.

JeffreySarnoff · November 27, 2020, 12:36am

Using @btime to time functions over large arrays is not difficult. Remember to prefix the arguments in the function call with $. This interpolates their values into the benchmark, so BenchmarkTools does not time the lookup of non-constant global variables; what gets measured is the work done inside the function.

using BenchmarkTools

# to see the timing info and the returned values
@btime elcount6!($blobs, $blobs_occ)

# to see the timing info only (suppressing the values)
@btime elcount6!($blobs, $blobs_occ);

rafael.guerra · November 27, 2020, 8:24am

@JeffreySarnoff, thanks for your time and answer, but it throws an error:

function elcount6!(blobs, blobs_occ)
    @inbounds for j in axes(blobs,2), i in axes(blobs,1)
        blobs[i,j] = blobs_occ[blobs[i,j]];
    end
    return blobs
end

blobs = rand(0:9,10_000,10_000);
blobs_occ = countmap(vec(blobs))
@btime elcount6!($blobs, $blobs_occ)

julia> @btime elcount6!($blobs, $blobs_occ)
ERROR: KeyError: key 10002175 not found
Stacktrace:
 [1] getindex at .\dict.jl:467 [inlined]
 ...

Some setup seems to be required to @btime functions that modify their input in place, but I have not yet figured out how to do it for the function above.

sijo · November 27, 2020, 10:18am

Indeed, a fresh blobs array is required for every evaluation. You can do it like this:

function elcount!(blobs, blobs_occ)
	@inbounds for i in eachindex(blobs)
		blobs[i] = blobs_occ[blobs[i]];
	end
	return blobs
end

blobs = rand(0:9,10_000,10_000);
blobs_occ = countmap(vec(blobs));

@btime elcount!(b, $blobs_occ) setup=(b=copy(blobs)) evals=1;
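
The KeyError above comes from running the in-place function more than once on the same array: after the first call the array holds counts, which are generally not keys of the dictionary. A small sketch of that failure mode (the sample data is made up, and elcount! is repeated here without @inbounds so the snippet is self-contained):

```julia
function elcount!(blobs, blobs_occ)
    for i in eachindex(blobs)
        blobs[i] = blobs_occ[blobs[i]]
    end
    return blobs
end

blobs = [0, 0, 0, 1, 1]
blobs_occ = Dict(0 => 3, 1 => 2)

b = copy(blobs)
elcount!(b, blobs_occ)   # b now holds the counts 3, 3, 3, 2, 2

# A second call on the same array looks up the count 3 as a key and fails.
second_call_failed = try
    elcount!(b, blobs_occ)
    false
catch err
    err isa KeyError
end
```

With setup=(b=copy(blobs)) and evals=1, each timed evaluation starts from an untouched copy, so this second lookup never happens.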

rafael.guerra · November 27, 2020, 10:59am

@sijo, much appreciated.
The result shows 0 allocations and 0 bytes. Is it correct?

@btime elcount6!(b, $blobs_occ) setup=(b=copy(blobs)) evals=1;
  910 ms (0 allocations: 0 bytes)

sijo · November 27, 2020, 11:07am

Isn’t that beautiful? It’s expected since the dictionary of counts is created outside of the benchmarked code. The code really has nothing to do except finding and assigning existing values.
