Map! on Base.generator

Suppose that I want to get the maximum of each column of a matrix (the actual task is of course a more complex function).

julia> x = randn(3, 4);

julia> map(maximum, eachcol(x))
4-element Array{Float64,1}:
 0.18983469254515659
 0.6975823542373406
 0.7069073875867887
 0.3085149421010158

However, the following syntax with map! does not work.

julia> t = zeros(4); # destination
julia> map!(maximum, t, eachcol(x))
ERROR: MethodError: no method matching map!(::typeof(maximum), ::Array{Float64,1}, ::Base.Generator{Base.OneTo{Int64},Base.var"#194#195"{Array{Float64,2}}})

But if we create an explicit array via collect, the following code works.

julia> map!(maximum, t, collect(eachcol(x)))
4-element Array{Float64,1}:
 0.18983469254515659
 0.6975823542373406
 0.7069073875867887
 0.3085149421010158

Is there any way to make the second code above work? I want to avoid memory allocation as much as possible since the map! will be called inside a loop.

A simple function like this does not allocate anything:

julia> function maxcol!(matrix,dest)
         i = firstindex(dest)
         for col in eachcol(matrix)
           dest[i] = maximum(col)
           i += 1
         end
       end
maxcol! (generic function with 1 method)

julia> x = rand(10,10);

julia> y = zeros(10);

julia> @btime maxcol!($x,$y)
  195.139 ns (0 allocations: 0 bytes)

1 Like

I maybe wrong, but i remember that in 1.6 there is new Iterators.map which should solve this issue.

I have one 1.7 nightly here and apparently not (same result as in 1.5.3):

julia> @btime map!(maximum,$y,collect(eachcol($x)))
  261.272 ns (1 allocation: 496 bytes)

julia> versioninfo()
Julia Version 1.7.0-DEV.536

Perhaps @Skoffer meant something like (using Julia 1.6)

julia> t .= Iterators.map(maximum, eachcol(x))
4-element Vector{Float64}:
 1.2240641310646176
 2.2647933007084426
 1.0913527176078932
 0.009157100568926244

However, it still allocates:

julia> @btime $t .= Iterators.map($maximum, eachcol($x));
  74.448 ns (1 allocation: 112 bytes)
1 Like

Indeed, it allocates less. For small arrays (~10 cols) it seems to be somewhat slower than the manual loop, but not for larger ones:

julia> x = rand(1000,1000); y = zeros(1000);

julia> function f!(x,y)
         y .= map(maximum,eachcol(x))
         nothing
       end
f! (generic function with 2 methods)

julia> @btime f!($x,$y)
  947.502 μs (1 allocation: 7.94 KiB)

julia> @btime maxcol!($x,$y)
  946.911 μs (0 allocations: 0 bytes)

julia> x = rand(10,10); y = zeros(10);

julia> @btime f!($x,$y)
  250.669 ns (1 allocation: 160 bytes)

julia> @btime maxcol!($x,$y)
  210.236 ns (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.7.0-DEV.536


But that is exactly the same in 1.5.3:

julia> x = rand(1000,1000); y = zeros(1000);

julia> function f!(x,y)
         y .= map(maximum,eachcol(x))
         nothing
       end
f! (generic function with 1 method)

julia> @btime f!($x,$y)
  950.048 μs (1 allocation: 7.94 KiB)

julia> versioninfo()
Julia Version 1.5.3


Edit: I noticed now that you are both mentioning explicit Iterators.map which is not the same function as Base.map, but the result is the same with both in 1.7.

Thanks, @lmiq. Yes, a custom function can simply address the issue. My actual need is that I want to use CUDA.map! (which has the same syntax as map!).

The custom function has to be written into a kernel (with some additional lines) and call with @cuda. This is the reason why I want to use map! directly.

It seems for now that the second code snippet map!(maximum, t, eachcol(x)) does not work. I will go with the custom function (i.e., a custom kernel in CUDA).

If you can, please post the CUDA solution here. I miss examples of CUDA function implementations.

function maxcol!(matrix, dest)
    i = threadIdx().x
    if i > size(matrix, 2)
        return nothing
    end
    m = -Inf32
    @inbounds for j = 1:size(matrix, 1)
        m = max(matrix[j, i], m)
    end
    dest[i] = m
    nothing
end

matrix = CUDA.randn(Float32, 10, 1000)
dest = CUDA.zeros(Float32, 1000);

@cuda threads=1000 maxcol!(matrix, dest)
dest |> Array  # to display

Note that

  • We cannot use maximum inside a kernel directly,; it otherwise tries to launch another kernel by maximum(@view matrix[:,i]) and results in an error. Wondering there are other better solutions rather than writing my own for loop.
  • The above code is just for demonstration is by no way a high-performance one, e.g., does not take memory coalescing into consideration.
1 Like