Hello community!
As I understand from what I have read in Julia’s optimization section, “Performance critical code should be inside a function”.
My code has several for loops that fill the created arrays, my question is: To make the execution faster, is it better to create the array outside the function or inside the function?
I think that if I create it outside, the function is only in charge of filling it with elements, on the other hand, if I create it inside, the function is in charge of creating the matrix and filling it with elements, which would demand more memory.
I don’t know if it’s a good idea to put my code here since it’s like 600 lines long, so here is an easy example. I hope it helps your understanding.
x::Matrix{Float64} = zeros(1,1000)
y = -1.0
function example(x,y)
for i = 1:1000
x[1,i] = i^2 + 10y
end
return x
end
or it would be faster to:
y = -1.0
function example2(y)
x::Matrix{Float64} = zeros(1,1000)
for i = 1:1000
x[1,i] = i^2 + 10y
end
return x
end
I know i can measure it with @time or @btime, but i would like to know the “computationally” correct answer, not the one i got by repeting the exercise.
When performance matters, it’s good to provide a mutating interface (function). It’s possible to additionally provide an allocating interface (function), which could just call the mutating interface after allocating the data structure.
NB: in Julia, it’s common and preferred to have the name of a function that mutates its arguments end with a bang (!). See Style Guide · The Julia Language
“Allocating” is what you called “the function is in charge of creating the matrix”. The function example2 has to allocate memory to put the matrix into. If instead the function example takes the matrix as an argument, the matrix is pre-allocated so it’s faster.
You can do
# nonallocating function mutates its x argument
function example!(x, y)
for i in eachindex(y)
x[i] = 10 * y[i]
end
x
end
# allocating version creates an x and then calls the nonallocating version
function example(y)
x = similar(y)
example!(x, y)
end
If we take the code at face value i.e. we don’t try to change its structure, then the right answer here is to construct the matrix inside the function. This makes the function self-contained and therefore makes the code cleaner.
As is, you’ve already added a type hint to x, so there shouldn’t be any speed-up by putting the zeros inside the function.
We don’t have enough information to answer definitively. If you only run the function once, it doesn’t matter, performance-wise, whether you allocate it outside or inside. It will anyway be allocated once, and filled once.
The difference occurs when you run the function multiple times. If you can reuse the matrix, then naturally it’s faster to only allocate once outside the function, and repeatedly fill it inside.
I am a little bit suspicious that you are a recent Matlab convert, and that you should be creating a one-dimensional Vector rather than a two-dimensional Matrix. Note that unlike Matlab, Julia supports true, one-dimensional arrays, known as Vectors for short. You would create such an object as follows:
x = zeros(1000)
Note that it is not necessary to provide a type annotation for x—Julia will know the type automatically, because a single, integer argument to the zeros function will generate an array of type Vector{Float64} (an alias for the slightly more verbose Array{Float64,1}).
If these comments are misdirected, please forgive me. I have seen this mistake made frequently in the past.
In addition to the remarks by @PeterSimon, there’s also the risky code snippet above. This will work well for arrays of one particular size, but fail or error for other sizes. Make sure to check the array size inside the function when iterating over its elements.
Depending on globals as a buffer is probably not a good idea. You can use an optional keyword argument for this purpose.
function example(
y;
x::Matrix{Float64} = zeros(1,1000)
)
for i = 1:1000
x[1,i] = i^2 + 10y
end
return x
end
Then you can call the function without providing a keyword argument, which will have a new buffer allocated everytime. You can also provide the keyword argument to use or reuse a buffer.
Mutation is changing a value. E.g, changing the first element of a vector from 13 to 14. A mutating function may mutate one of its arguments.
Allocation is less well defined. Basically a function is allocating if it may cause a heap allocation that may need to be tracked by the garbage collector (GC). Or by another allocator, such as the libc malloc, if you’re using FFI, for example.
Mutation is less safe in principle than working with immutable structures (making a modified copy of an existing data structure), but may be preferred for performance reasons when it allows preventing allocation.
A single function/method may be both mutating and allocating, but the goal is often to have a higher-level, more convenient, allocating function, that doesn’t mutate any of its arguments; and a lower-level, mutating function, whose name ends with a !, and which doesn’t allocate.
In general, editing (mutating) Arrays (Vectors, Matrices etc) is “cheaper” than creating (“allocating”) them. So the general rule is create (allocate) once, edit (mutate) as needed. Here’s one possible way to think about it: If your function is meant to create an Array that wouldn’t otherwise exist, then maybe it’s okay to create and edit the array within the function. If your function needs to do a series of complicated calculations/transformations, it might still be worth creating the array once and editing it multiple times, even if your function is creating an array that wouldn’t otherwise exist.
Here’s a case where it might be more efficient to pass an array to the function as an argument and mutate it: You get a column of data from a csv file and the values represent percentages (are between 0 and 100) and you need to scale the values to 0 to 1 before you do further calculations. Say you read in the csv data and store it in a vector v. if you DON’T need this vector v to stay unchanged in value for later use, you could transform it by doing scale01!(v) where scale01! is a mutating function defined as:
function scale01!(x::Vector{<:Real})
for i in eachindex(x)
x[i] = x[i] / 100
end
return nothing
end
Notice that the function returns nothing, though it would do that even without the explicit return statement.
But, in your example, you’re creating an array of zeros just to store the data, so it might not matter whether you create (allocate) it inside or outside the function.
I’m not an expert on squeezing performance out of Julia, but your example is missing other performance optimizing recommendations:
avoid using global variables, and if you must, declare them constant
use eachindex so that you can safely use @inbounds and avoid bounds checking (making sure you’re not using wrong indices to get elements from your array), though I am not sure if this is helpful for simpler loops. So your example would look like:
const x = zeros(Float64, 1000)
const y = -1.0
example!(x, y) = @inbounds for i in eachindex(x)
x[i] = x[i] ^ 2 + 10 * y
end
You can do more: like doing the 10 * y outside the loop, or if this is actually the computation you want, doing muladd(x[i], x[i], 10y). I’m not sure how much these would help in this case.