Broadcasting reshaped data: @. versus .=

I am rewriting the code below using broadcasting to improve performance and clarity.

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
for j in 1:9
	# Extract the slice of data corresponding to the current `j`
	temp = data[((j-1)*109*50) + 1 : (j*109*50)]
	# Reshape the slice and directly assign it to `var`
	var[j, :, :] .= reshape(temp, 109, 50)
end

The code below gives an error:

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
@. var = reshape(data, 9, 109, 50)

ERROR: DimensionMismatch: array could not be broadcast to match destination
Stacktrace:
[1] check_broadcast_shape
@ ./broadcast.jl:547 [inlined]
[2] check_broadcast_axes
@ ./broadcast.jl:550 [inlined]
[3] check_broadcast_axes
@ ./broadcast.jl:553 [inlined]
[4] instantiate
@ ./broadcast.jl:305 [inlined]
[5] materialize!
@ ./broadcast.jl:878 [inlined]
[6] materialize!(dest::Array{…}, bc::Base.Broadcast.Broadcasted{…})
@ Base.Broadcast ./broadcast.jl:875
[7] top-level scope
@ REPL[10]:1
Some type information was truncated. Use show(err) to see complete types.

The following code, on the other hand, works fine, but I see an increase in memory allocation. Why does the code above give an error?

var = zeros(Float64, 9, 109, 50)
data = rand(9*109*50)
var .= reshape(data, 9, 109, 50)

@. adds a dot to reshape as well
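
You can confirm this in the REPL with @macroexpand:

@macroexpand @. var = reshape(data, 9, 109, 50)
# output: :(var .= reshape.(data, 9, 109, 50))

reshape.(data, 9, 109, 50) tries to apply reshape to every element of data, so the broadcast axes are those of the 49050-element vector, which cannot be matched against the 9×109×50 destination; hence the DimensionMismatch. Writing var .= reshape(data, 9, 109, 50) dots only the assignment, so reshape runs first and the shapes agree.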


Note that the loop and the broadcast expression do not do the same thing: they copy the data over in different orders. The loop will also be less efficient, since it doesn’t respect the column-major memory layout of Julia arrays.
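
A small way to see the difference, using hypothetical 2×2×2 sizes so the arrays stay readable:

data = collect(1.0:8.0)
a = zeros(2, 2, 2)
b = zeros(2, 2, 2)
# the loop pattern: each contiguous chunk of data fills a row slice a[j, :, :]
for j in 1:2
    a[j, :, :] .= reshape(data[(j - 1)*4 + 1 : j*4], 2, 2)
end
# the broadcast pattern: b is filled column-major, first index varying fastest
b .= reshape(data, 2, 2, 2)
a == b  # false: the two snippets lay the data out differently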


Yes, broadcasting has better performance. I notice the performance difference between these snippets.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	var .= reshape(data, 9, 109, 50)
end


@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	for j in 1:9
		temp = data[((j-1)*109*50) + 1 : (j*109*50)]
		var[j, :, :] .= reshape(temp, 109, 50)
	end
	var
end

It has nothing to do with broadcasting being faster than loops; you’re just writing the loop wrong. You must loop in the correct, memory-friendly order.

E.g.

for i in eachindex(var, data) 
    var[i] = data[i]
end

How should I do it in my case? Can you explain in detail? In my case, with for j in 1:9, there is no column-ordering issue. :thinking:

It depends on what order you want the data copied over. Your loop and your broadcast expression do it differently, so I don’t know which you actually want. But if the broadcasted version is the one you want (which is also the most efficient order), then do, for example,

for i in eachindex(var, data)
    var[i] = data[i]
end

for the loop, or

var[:] .= data

for broadcasting.

Yes, there is. For performance, the loop over j should be the innermost loop, not the outermost, since Julia arrays are column-major and the first index should vary fastest. But, as I said, your loop code and your broadcasting code copy the data differently, so you will get different results.

Thanks for pointing that out to me. I would like to get the data that I got using the for loop, but I also want to improve performance.

:innocent: But when using broadcasting the way you suggested, I got a flat array instead of a matrix. Please tell me the proper way for the broadcasting case.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	var[:] .= data
end

@timev begin
    for i in eachindex(var, data)
        var[i] = data[i]
    end
    var
end

I have a 1-dimensional array of data, but I want to reshape it into a 3-dimensional array.

Well, it’s not obvious. If you have some data and you want to copy it somewhere else in non-contiguous order, then either the left-hand or the right-hand side must be iterated in a non-optimal order.

It is possible that your first piece of code is good; just slap a @view in front of the slice. But it requires testing to find the best alternative.
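
Applied to your original loop, that suggestion would look like this (a sketch; the @view means each 109×50 chunk is not copied out of data, and reshape on a view does not copy data either):

for j in 1:9
    # view into the contiguous chunk of data instead of slicing out a copy
    temp = @view data[((j-1)*109*50) + 1 : (j*109*50)]
    var[j, :, :] .= reshape(temp, 109, 50)
end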

Oh, also make sure you put your code inside a function; top-level code operating on untyped globals is slow in Julia.

@DNF I would like to get the data in the pattern that I get using the for loop shown below. Every one of your suggestions gives a different pattern.

for j in 1:9
	temp = data[((j-1)*109*50) + 1 : (j*109*50)]
	var[j, :, :] .= reshape(temp, 109, 50)
end

As I said, there are two different patterns, and each of your code snippets produces one of them. I have not introduced any new patterns, only pointed out that your snippets were not consistent with each other.

Now that you have clarified which one you want, you can try

function mycopy!(var, data)
    # don't hardcode 1:9, that will only work for one specific array
    N = size(var, 1) * size(var, 2)
    for j in axes(var, 1)
        temp = @view data[((j-1)*N) + 1 : j*N]
        var[j, :] .= temp
    end
    return var
end

You could also try iterating over contiguous samples in var and discontinuous samples in data to see which is faster.
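
Such a loop might look like the sketch below (mycopy2! is a hypothetical name): j runs innermost, so var is written in contiguous column-major order while data is read with a stride.

function mycopy2!(var, data)
    n1, n2, n3 = size(var)
    for k in 1:n3, i in 1:n2, j in 1:n1
        # j (the first index) varies fastest: contiguous writes into var,
        # strided reads from data
        var[j, i, k] = data[(j - 1)*n2*n3 + (k - 1)*n2 + i]
    end
    return var
end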


Your example doesn’t work. Actually, I have to use N = size(var, 2) * size(var, 3) and var[j, :, :] .= reshape(temp, 109, 50) to get the same output, but there is no performance improvement.

@timev begin
	var = zeros(Float64, 9, 109, 50)
	data = rand(9*109*50)
	for j in 1:9
		temp = data[((j-1)*109*50) + 1 : (j*109*50)]
		var[j, :, :] .= reshape(temp, 109, 50)
	end
	var
end

@timev begin
    # don't hardcode 1:9, that will only work for one specific array
    N = size(var, 2) * size(var, 3)
    for j in axes(var, 1)
        temp = @view data[((j-1)*N) + 1 : j*N]
        var[j, :, :] .= reshape(temp, 109, 50)
    end
    var
end

But the code below gives better performance.

@timev begin
    data = rand(9 * 109 * 50)
    reshaped_data = reshape(data, 109, 50, 9)
    var = permutedims(reshaped_data, (3, 1, 2))
end

Is there any better way to improve performance further? Thanks.

I was sure the above would work by unravelling the trailing dimensions, just like var[:] works, but apparently not. I did not have access to a computer to test it, but I was completely certain.

A good solution then is to just make temp into a view, as you have in one of your snippets.

The permutedims snippet does something different from your other snippets: it creates a new array var instead of writing in place into a pre-existing array.

It is indeed quite fast, though, and you can make it in-place by doing:

siz = size(var)
reshaped_data = reshape(data, siz[2], siz[3], siz[1])
permutedims!(var, reshaped_data, (3, 1, 2))

(note the trailing ! in permutedims!)

Also note that your benchmarking is a bit suboptimal. Firstly, you are timing the creation of your input data as well as the copying code, which I assume is not desirable, and secondly, you should put your code in functions, like

function newcopy!(var, data)
    siz = size(var)
    permutedims!(var, reshape(data, siz[2], siz[3], siz[1]), (3, 1, 2))
    return var
end

and then benchmark it, passing the data as input:

@time newcopy!(var, data)

Or, even better, use the BenchmarkTools.jl package. On my computer, @timev estimates 45 µs of runtime for newcopy!, while BenchmarkTools.@btime reports 25 µs, which is probably more correct.
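
For reference, a minimal BenchmarkTools run could look like the following (the $ interpolation keeps the global-variable lookup out of the measurement), together with a quick check that newcopy! reproduces the pattern of your original loop:

using BenchmarkTools

var1 = zeros(Float64, 9, 109, 50)
var2 = zeros(Float64, 9, 109, 50)
data = rand(9 * 109 * 50)

# sanity check: newcopy! produces the same layout as the original loop
for j in 1:9
    var1[j, :, :] .= reshape(data[((j-1)*109*50) + 1 : (j*109*50)], 109, 50)
end
newcopy!(var2, data)
@assert var1 == var2

@btime newcopy!($var2, $data);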
