Broadcasting reshaped data: @. versus .=

I am rewriting the code below using broadcasting to improve performance and clarity.

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
for j in 1:9
	# Extract the slice of data corresponding to the current `j`
	temp = data[((j-1)*109*50) + 1 : (j*109*50)]
	# Reshape the slice and directly assign it to `var`
	var[j, :, :] .= reshape(temp, 109, 50)
end

The code below gives an error:

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
@. var = reshape(data, 9, 109, 50)

ERROR: DimensionMismatch: array could not be broadcast to match destination
Stacktrace:
[1] check_broadcast_shape
@ ./broadcast.jl:547 [inlined]
[2] check_broadcast_axes
@ ./broadcast.jl:550 [inlined]
[3] check_broadcast_axes
@ ./broadcast.jl:553 [inlined]
[4] instantiate
@ ./broadcast.jl:305 [inlined]
[5] materialize!
@ ./broadcast.jl:878 [inlined]
[6] materialize!(dest::Array{…}, bc::Base.Broadcast.Broadcasted{…})
@ Base.Broadcast ./broadcast.jl:875
[7] top-level scope
@ REPL[10]:1
Some type information was truncated. Use show(err) to see complete types.

The following code, on the other hand, works fine, but I see an increase in memory allocation. Why does the code above give an error?

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
var .= reshape(data, 9, 109, 50)

@. adds a dot to reshape as well
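
You can confirm this in the REPL with @macroexpand:

@macroexpand @. var = reshape(data, 9, 109, 50)
# output: :(var .= reshape.(data, 9, 109, 50))

reshape.(data, 9, 109, 50) tries to apply reshape to every element of data, so the broadcast axes are those of the 49050-element vector, which cannot be matched against the 9×109×50 destination; hence the DimensionMismatch. Writing var .= reshape(data, 9, 109, 50) dots only the assignment, so reshape runs first and the shapes agree.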


Note that the loop and the broadcast expression do not do the same thing: they copy the data over in different orders. The loop will also be less efficient, since it doesn’t respect the column-major memory layout of Julia arrays.
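
A small way to see the difference, using hypothetical 2×2×2 sizes so the arrays stay readable:

data = collect(1.0:8.0)
a = zeros(2, 2, 2)
b = zeros(2, 2, 2)
# the loop pattern: each contiguous chunk of data fills a row slice a[j, :, :]
for j in 1:2
    a[j, :, :] .= reshape(data[(j - 1)*4 + 1 : j*4], 2, 2)
end
# the broadcast pattern: b is filled column-major, first index varying fastest
b .= reshape(data, 2, 2, 2)
a == b  # false: the two snippets lay the data out differently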


Yes, broadcasting has better performance. I notice the performance difference between these snippets.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	var .= reshape(data, 9, 109, 50)
end


@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	for j in 1:9
		temp = data[((j-1)*109*50) + 1 : (j*109*50)]
		var[j, :, :] .= reshape(temp, 109, 50)
	end
	var
end

It has nothing to do with broadcasting being faster than loops; you’re just writing the loop wrong. You must loop in the correct, memory-friendly order.

E.g.

for i in eachindex(var, data) 
    var[i] = data[i]
end

How should I do it in my case? Can you explain in detail? In my case, with for j in 1:9, there is no column-ordering issue. :thinking:

It depends on what order you want the data copied over. Your loop and your broadcast expression do it differently, so I don’t know which you actually want. But if the broadcasted version is the one you want (which is also the most efficient order), then do, for example,

for i in eachindex(var, data)
    var[i] = data[i]
end

for the loop, or

var[:] .= data

for broadcasting.

Yes, there is. For performance, the loop over j should be the innermost loop, not the outermost, since Julia arrays are column-major and the first index should vary fastest. But, as I said, your loop code and your broadcasting code copy the data differently, so you will get different results.

Thanks for pointing that out to me. I would like to get the data that I got using the for loop, but I also want to improve performance.

:innocent: But when using broadcasting the way you suggested, I got a flat array instead of a matrix. Please tell me the proper way for the broadcasting case.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	var[:] .= data
end

@timev begin
    for i in eachindex(var, data)
        var[i] = data[i]
    end
    var
end

I have a 1-dimensional array of data, but I want to reshape it into a 3-dimensional array.

Well, it’s not obvious. If you have some data and you want to copy it somewhere else in non-contiguous order, then either the left-hand or the right-hand side must be iterated in a non-optimal order.

It is possible that your first piece of code is good; just slap a @view in front of the slice. But it requires testing to find the best alternative.
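
Applied to your original loop, that suggestion would look like this (a sketch; the @view means each 109×50 chunk is not copied out of data, and reshape on a view does not copy data either):

for j in 1:9
    # view into the contiguous chunk of data instead of slicing out a copy
    temp = @view data[((j-1)*109*50) + 1 : (j*109*50)]
    var[j, :, :] .= reshape(temp, 109, 50)
end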

Oh, also make sure you put your code inside a function; top-level code operating on untyped globals is slow in Julia.

@DNF I would like to get the data in the pattern that I get using the for loop shown below. Every one of your suggestions gives a different pattern.

for j in 1:9
	temp = data[((j-1)*109*50) + 1 : (j*109*50)]
	var[j, :, :] .= reshape(temp, 109, 50)
end

As I said, there are two different patterns, and each of your code snippets produces one of them. I have not introduced any new patterns, only pointed out that your snippets were not consistent with each other.

Now that you have clarified which one you want, you can try

function mycopy!(var, data)
    # don't hardcode 1:9, that will only work for one specific array
    N = size(var, 1) * size(var, 2)
    for j in axes(var, 1)
        temp = @view data[((j-1)*N) + 1 : j*N]
        var[j, :] .= temp
    end
    return var
end

You could also try iterating over contiguous samples in var and discontinuous samples in data to see which is faster.
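
Such a loop might look like the sketch below (mycopy2! is a hypothetical name): j runs innermost, so var is written in contiguous column-major order while data is read with a stride.

function mycopy2!(var, data)
    n1, n2, n3 = size(var)
    for k in 1:n3, i in 1:n2, j in 1:n1
        # j (the first index) varies fastest: contiguous writes into var,
        # strided reads from data
        var[j, i, k] = data[(j - 1)*n2*n3 + (k - 1)*n2 + i]
    end
    return var
end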


Your example doesn’t work. Actually, I have to use N = size(var, 2) * size(var, 3) and var[j, :, :] .= reshape(temp, 109, 50) to get the same output, but there is no performance improvement.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	for j in 1:9
		temp = data[((j-1)*109*50) + 1 : (j*109*50)]
		var[j, :, :] .= reshape(temp, 109, 50)
	end
	var
end

@timev begin
    # don't hardcode 1:9, that will only work for one specific array
    N = size(var, 2) * size(var, 3)
    for j in axes(var, 1)
        temp = @view data[((j-1)*N) + 1 : j*N]
        var[j, :, :] .= reshape(temp, 109, 50)
    end
    var
end

But the code below gives better performance.

@timev begin
    data = rand(9 * 109 * 50)
    reshaped_data = reshape(data, 109, 50, 9)
    var = permutedims(reshaped_data, (3, 1, 2))
end

Is there any better way to improve performance further? Thanks.

I was sure the above would work by unravelling the trailing dimensions, just like var[:] works, but apparently not. I did not have access to a computer to test it, but I was completely certain.

A good solution then is to just make temp into a view, as you have in one of your snippets.

The permutedims snippet does something different from your other snippets: it creates a new array var instead of writing in place into a pre-existing array.

It is indeed quite fast, though, and you can make it in-place by doing:

siz = size(var)
reshaped_data = reshape(data, siz[2], siz[3], siz[1])
permutedims!(var, reshaped_data, (3, 1, 2))

(note the trailing ! in permutedims!)

Also note that your benchmarking is a bit suboptimal. Firstly, you are timing the creation of your input data as well as the copying code, which I assume is not desirable, and secondly, you should put your code in functions, like

function newcopy!(var, data)
    siz = size(var)
    permutedims!(var, reshape(data, siz[2], siz[3], siz[1]), (3, 1, 2))
    return var
end

and then benchmark it, passing the data as input:

@time newcopy!(var, data)

Or, even better, use the BenchmarkTools.jl package. On my computer, @timev estimates 45 µs of runtime for newcopy!, while BenchmarkTools.@btime reports 25 µs, which is probably more correct.
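
For reference, a minimal BenchmarkTools run could look like the following (the $ interpolation keeps the global-variable lookup out of the measurement), together with a quick check that newcopy! reproduces the pattern of your original loop:

using BenchmarkTools

var1 = zeros(Float64, 9, 109, 50)
var2 = zeros(Float64, 9, 109, 50)
data = rand(9 * 109 * 50)

# sanity check: newcopy! produces the same layout as the original loop
for j in 1:9
    var1[j, :, :] .= reshape(data[((j-1)*109*50) + 1 : (j*109*50)], 109, 50)
end
newcopy!(var2, data)
@assert var1 == var2

@btime newcopy!($var2, $data);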
