Julia v0.6.0. I have the following snippets. I wonder why “snippet_1” is so much more efficient than the others. Is it possible to make the others as efficient as “snippet_1”? Thanks!

```
i_ = 40
j_ = 16
a1 = rand(i_,j_,2);
# snippet_1: 0.0005 seconds (5.20 k allocations: 302.531 KiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = rand(2) .* [1,1]
end
end
# snippet_2: 0.032269 seconds (25.43 k allocations: 1.150 MiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = rand(2) .* 1
end
end
# snippet_3: 0.030712 seconds (44.24 k allocations: 1.946 MiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = a1ij .* [1,1]
end
end
# snippet_4: 0.032377 seconds (24.79 k allocations: 1.091 MiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = a1ij .* 1
end
end
```

You’re creating temporaries by slicing. Use `@view A[i:j]`

etc.

Thanks for the suggesion about `@view`

, but I’m not sure if I’m using it the right way? Using the code below, `@view`

seems to be less efficient.

```
julia> @time x = a1[1,1,:]
0.000005 seconds (6 allocations: 272 bytes)
2-element Array{Float64,1}:
0.228546
0.995806
julia> @time x = @view a1[1,1,:]
0.000074 seconds (37 allocations: 896 bytes)
2-element SubArray{Float64,1,Array{Float64,3},Tuple{Int64,Int64,Base.Slice{Base.OneTo{Int64}}},true}:
0.228546
0.995806
```

Well the other thing is that your indexing is backwards. In a column-major language (Julia), you want to be slicing by columns, i.e. the first index. You want `@view a1[:,1,1]`

. All of your loops are backwards too.

(Don’t benchmark in global scope. Use `.=`

for in-place assignment that fuses with operations like `.*`

.)

Note that for initializing only two components, the overhead even of creating a view, or calling `broadcast`

, or allocating a temporary array like `[1,1]`

will be very significant. For such a smll array in an inner loop, the fastest things are to write out your own loop or to use something like the StaticArrays package.

2 Likes

That is a good point. But my code has to slice it in all dimensions. Am I supposed to use `permutedims`

to set the dimension I want to slice the 1st?

Thanks for the suggestion, but in this case below `.=`

seems less efficient than `=`

```
# snippet_1: 0.0005 seconds (5.20 k allocations: 302.531 KiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = rand(2) .* [1,1]
end
end
# snippet_5: 0.006824 seconds (28.24 k allocations: 762.531 KiB)
@time begin
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] .= rand(2) .* [1,1]
end
end
```

Again, you’re timing globals and compilation times. Put it in a function. And “vectorization” only makes sense when the vectors are sufficiently large (even then, it won’t be as efficient as a loop, just close).

I’m not sure what you mean by put it in a function. This is what I did, below `=`

still seems more efficient than `.=`

. I ran the last two lines a couple times to let it compile before write down the time elapsed.

```
function f1()
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] = rand(2) .* [1,1]
end
end
function f5()
for i in 1:i_, j in 1:j_
a1ij = a1[i,j,:]
a1[i,j,:] .= rand(2) .* [1,1]
end
end
@time f1() # 0.001070 seconds (5.21 k allocations: 302.688 KiB)
@time f5() # 0.008428 seconds (28.25 k allocations: 762.688 KiB)
```

I guess you mean creating a vector by slicing an array isn’t as efficient as a loop? That’s a good point and I’ll remember it for future use. But for my code, I slice it because I need to use the vector for matrix multiplication later, I haven’t thought of a way to avoid slicing it into a vector.

Read the performance tips. In particular you are having problems with global variables, which is what the “put it in a function comment” was meant to address. Your attempt to put it in a function didn’t address the problem, because you were still referring to global variables, rather than passing them as arguments.

Consider the following, the only difference between the two function is that `f1`

takes an argument, and thus can compile to specialized code based on the type of `a`

. Whereas `f2`

refers to a global variable, and since that variable can be changed at will, julia won’t specialize the code of `f2`

based on the type of `a`

. Thus `f2`

is slow.

```
a=rand(40,16,2)
function f1(a)
for i = 1:size(a,1), j=1:size(a,2)
a[i,j,:] = rand(2)
end
end
function f2()
for i = 1:size(a,1), j=1:size(a,2)
a[i,j,:] = rand(2)
end
end
@time f1(a) #0.000263 seconds
@time f2() #0.001927 seconds
```

3 Likes

ook. Now I get it. Thank everyone for the great advice!