I would like to know the most efficient way to chunk a multi-dimensional array in Julia.
arr = rand(Float64, (10, 100))
How can I split it into chunks of 30 columns so that I get three (10, 30) arrays and one (10, 10) array?
I have this function:
function chunk_array(arr, N)
    chunked = []
    n_columns = size(arr)[2]
    for i in 1:N:n_columns
        push!(chunked, arr[:, i:min(i+N-1, n_columns)])
    end
    return chunked
end
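To make the later discussion concrete, here is a quick check of what the function above returns for the example input (the function is repeated so the snippet runs on its own). The sizes follow from 100 columns and N = 30; the element type of the result is `Any` because `[]` creates a `Vector{Any}`.

```julia
# The question's function, repeated unchanged so this snippet is self-contained
function chunk_array(arr, N)
    chunked = []                      # Vector{Any}: no element type information
    n_columns = size(arr)[2]
    for i in 1:N:n_columns
        push!(chunked, arr[:, i:min(i+N-1, n_columns)])
    end
    return chunked
end

arr = rand(Float64, (10, 100))
chunks = chunk_array(arr, 30)

println(size.(chunks))    # [(10, 30), (10, 30), (10, 30), (10, 10)]
println(eltype(chunks))   # Any
```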
You can replace size(arr)[2] with size(arr, 2) (this is more of a style improvement). You can also pre-allocate the vector that holds the chunks:
function chunk_array_2(arr::AbstractMatrix, N::Integer)
    n_cols = size(arr, 2)
    n_chunks = ceil(Int, n_cols / N)
    chunks = Vector{typeof(arr)}(undef, n_chunks)
    for i in 1:n_chunks
        from = N * (i - 1) + 1
        to = min(i * N, n_cols)
        chunks[i] = arr[:, from:to]
    end
    return chunks
end
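A small variation on the pre-allocated version, offered as a sketch: Julia's integer ceiling division `cld` gives the chunk count without going through `Float64`, and declaring the element type as `Matrix{eltype(arr)}` (instead of `typeof(arr)`) assumes you want dense copies, which also keeps it working when `arr` is, say, a transposed matrix whose slices are plain `Matrix` objects. The name `chunk_array_cld` is hypothetical.

```julia
# Hypothetical variant of chunk_array_2: integer ceiling division and an
# element type that assumes dense Matrix copies are wanted.
function chunk_array_cld(arr::AbstractMatrix, N::Integer)
    n_cols = size(arr, 2)
    n_chunks = cld(n_cols, N)                       # cld(100, 30) == 4
    chunks = Vector{Matrix{eltype(arr)}}(undef, n_chunks)
    for i in 1:n_chunks
        from = N * (i - 1) + 1
        to = min(i * N, n_cols)
        chunks[i] = arr[:, from:to]                 # slicing with ranges copies
    end
    return chunks
end

arr = rand(Float64, (10, 100))
chunks = chunk_array_cld(arr, 30)
println(length(chunks))                     # 4
println(eltype(chunks) == Matrix{Float64})  # true
```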
If it suits your needs, you can use a view when slicing the matrix, which avoids copying the data:
function chunk_array_3(arr::AbstractMatrix, N::Integer)
    n_cols = size(arr, 2)
    n_chunks = ceil(Int, n_cols / N)
    chunks = Vector{AbstractMatrix}(undef, n_chunks)
    for i in 1:n_chunks
        from = N * (i - 1) + 1
        to = min(i * N, n_cols)
        chunks[i] = @view arr[:, from:to]
    end
    return chunks
end
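A quick way to inspect what the view-based version returns (function repeated for self-containment): each chunk is a `SubArray` whose `parent` is the original matrix, so no data is copied. Note also that the container's declared element type, `AbstractMatrix`, is not a concrete type.

```julia
# chunk_array_3 from above, repeated unchanged
function chunk_array_3(arr::AbstractMatrix, N::Integer)
    n_cols = size(arr, 2)
    n_chunks = ceil(Int, n_cols / N)
    chunks = Vector{AbstractMatrix}(undef, n_chunks)
    for i in 1:n_chunks
        from = N * (i - 1) + 1
        to = min(i * N, n_cols)
        chunks[i] = @view arr[:, from:to]
    end
    return chunks
end

arr = rand(Float64, (10, 100))
chunks = chunk_array_3(arr, 30)
println(chunks[1] isa SubArray)          # true: a view, not a copy
println(parent(chunks[1]) === arr)       # true: it wraps the original matrix
println(isconcretetype(eltype(chunks)))  # false: AbstractMatrix is abstract
```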
Loops in Julia are fine, and what you wrote is fine except for one or two things:
chunked is a Vector{Any}, which will lead to poor downstream performance every time it is accessed, due to type instability. Note that Vector{AbstractMatrix} is basically just as bad. You want the element type to be declared or inferred concretely.
You are making copies of the data when you write arr[:, i:min(i+N-1, n_columns)]. If you want copies, this is fine. If it's okay to alias the input data, consider using @view arr[:, i:min(i+N-1, n_columns)] instead. When aliased, changes to arr will be reflected in chunked and vice versa, as they share memory.
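The difference between copying and aliasing can be seen directly with a small matrix of zeros: writing into a copy leaves `arr` alone, while writing through a view changes `arr`, and changes to `arr` show up in the view.

```julia
arr = zeros(Float64, 10, 100)

copied = arr[:, 1:30]          # independent copy of the first 30 columns
viewed = @view arr[:, 1:30]    # view that shares memory with arr

copied[1, 1] = 1.0             # does not touch arr
println(arr[1, 1])             # 0.0

viewed[1, 1] = 2.0             # writes through to arr
println(arr[1, 1])             # 2.0

arr[2, 2] = 3.0                # changes to arr are visible through the view
println(viewed[2, 2])          # 3.0
```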
I would probably write this function like this:
function chunk_array(arr::AbstractVecOrMat, N)
    chunked = map(Iterators.partition(axes(arr, 2), N)) do cols
        @view arr[:, cols] # remove @view if you want copies that do not alias `arr`
    end
    return chunked
end
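To see how this works on the example input: `Iterators.partition` splits the column indices into groups of at most N, and `map` builds one view per group. Because every view has the same type, the resulting vector has a concrete element type without any manual annotation.

```julia
function chunk_array(arr::AbstractVecOrMat, N)
    chunked = map(Iterators.partition(axes(arr, 2), N)) do cols
        @view arr[:, cols]
    end
    return chunked
end

# The column groups produced for 100 columns and N = 30
println(map(length, Iterators.partition(1:100, 30)))  # [30, 30, 30, 10]

arr = rand(Float64, (10, 100))
chunks = chunk_array(arr, 30)
println(size.(chunks))                   # [(10, 30), (10, 30), (10, 30), (10, 10)]
println(isconcretetype(eltype(chunks)))  # true: all chunks share one SubArray type
```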
Yes, if you declared the element type via chunked = THE_TYPE[] or chunked = Vector{THE_TYPE}(undef, num_chunks), it would be fine. The annoying part is that the type can be a little complicated at times. But in this case, since you want the data copied, it's pretty easy: Matrix{eltype(arr)} in place of THE_TYPE will work.
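Applied to the original `push!` loop, declaring the element type looks roughly like this. It is a sketch that assumes dense copies are wanted; the name `chunk_array_typed` is hypothetical.

```julia
# Hypothetical: the original push! loop with a concrete element type declared
function chunk_array_typed(arr::AbstractMatrix, N::Integer)
    n_columns = size(arr, 2)
    chunked = Matrix{eltype(arr)}[]   # Vector{Matrix{eltype(arr)}} instead of Vector{Any}
    for i in 1:N:n_columns
        push!(chunked, arr[:, i:min(i+N-1, n_columns)])  # each chunk is a copy
    end
    return chunked
end

arr = rand(Float64, (10, 100))
chunks = chunk_array_typed(arr, 30)
println(eltype(chunks) == Matrix{Float64})  # true
```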
But I would still recommend you use my suggested solution with @view deleted. It will make the copies and let the compiler determine the type for you (thanks to map). Declaring types manually can be tedious (and sometimes very difficult) to do correctly.
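With `@view` deleted, the suggested function returns independent copies, and `map` works out the concrete element type from the values it produces. A minimal sketch (the name `chunk_array_copy` is hypothetical):

```julia
# Hypothetical name: the map/Iterators.partition version without @view
function chunk_array_copy(arr::AbstractVecOrMat, N)
    return map(Iterators.partition(axes(arr, 2), N)) do cols
        arr[:, cols]   # plain indexing copies the selected columns
    end
end

arr = rand(Float64, (10, 100))
chunks = chunk_array_copy(arr, 30)

chunks[1][1, 1] = -1.0                      # modifying a copy leaves arr untouched
println(arr[1, 1] != -1.0)                  # true (rand returns values in [0, 1))
println(eltype(chunks) == Matrix{Float64})  # true, determined by map
```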