Hi,
I have a matrix A = rand(M,N). I need to compute the maximum by column. Then compute the number of maximands in each row. What is an efficient way for large matrices? Here is what I am doing.
B = (A.==maximum(A,dims=1))
C = sum(B,dims=2)
Hi,
I have a matrix A = rand(M,N). I need to compute the maximum by column. Then compute the number of maximands in each row. What is an efficient way for large matrices? Here is what I am doing.
B = (A.==maximum(A,dims=1))
C = sum(B,dims=2)
Why not just a loop? It avoids allocating the array B = (A.==maximum(A,dims=1))
.
julia> using Random
julia> N = 4;
julia> M = 3;
julia> A = rand(1:10, N, M)
4×3 Matrix{Int64}:
4 9 2
6 9 5
10 2 7
7 5 9
julia> colmax = maximum(A; dims = 1)
1×3 Matrix{Int64}:
10 9 9
julia> C=zeros(Int, N);
julia> for j=1:N
for k=1:M
A[j,k] == colmax[k] && (C[j] += 1);
end
end
julia> @show C
C = [1, 1, 1, 1]
4-element Vector{Int64}:
1
1
1
1
The method you propose does not scale well. I can avoid creating B by just doing the max inside, but the loop with M = 300 and N = 100000 is a lot slower than what I suggested.
You can avoid allocating colmax
too, which at least for these dims
helps a bit:
julia> function mine(A)
N, M = size(A)
C = zeros(Int, N)
@inbounds for col in 1:M
top = typemin(eltype(A))
for row in 1:N
top = max(top, A[row, col])
end
for row in 1:N
C[row] += (A[row, col] == top)
end
end
C
end;
julia> function user(A)
B = (A.==maximum(A,dims=1))
C = sum(B,dims=2)
end;
julia> function henry(A)
colmax = maximum(A; dims = 1)
N, M = size(A)
C=zeros(Int, N)
for j=1:N
for k=1:M
A[j,k] == colmax[k] && (C[j] += 1);
end
end
C
end;
julia> M = 300; N = 100_000; A = rand(1:N, N, M);
julia> vec(user(A)) == henry(A) == mine(A)
true
julia> @btime user($A);
54.233 ms (14 allocations: 4.35 MiB)
julia> @btime henry($A);
49.214 ms (5 allocations: 783.92 KiB)
37.659 ms (5 allocations: 783.92 KiB) # with @inbounds
julia> @btime mine($A);
21.359 ms (2 allocations: 781.33 KiB)