# Inlining a function in a struct field

Hi,

I have a question about inlining functions that hopefully somebody here can help me solve. I define an inlined potential function as follows

``````@inline function V(x)
1.0/x^5
end;
``````

and also a user-defined type that will hold it

``````mutable struct mine
x   :: Float64
Pot :: Function
mine() = new()
end
``````

from where I define

``````W     = mine();
W.x   = 1.0;
W.Pot = V;      # here I assign the potential function
Np    = 64;
r     = rand(Np,3);
``````

Now with that I want to evaluate the potential energy with this function

``````function Epot(r,glob::mine)
Ep  = 0.0
ri  = zero(r[1,:])
rij = zero(r[1,:])
N   = size(r,1)
for i in 1:N-1
for id in 1:3
ri[id] = r[i,id]
end
for j in i+1:N
for jd in 1:3
rij[jd] = ri[jd] - r[j,jd]
end
rr  = norm(rij)
Ep += glob.Pot(rr)
# Ep += V(rr)
end
end
Ep
end;
``````

for that I use BenchmarkTools and get

``````@btime Epot(r,W)
237.614 ÎĽs (6052 allocations: 94.94 KiB)
2.0618823575716226e6
``````

Now I can directly use V(x) instead of glob.Pot as in the first commented line at the bottom ofthe function, and get much better results, which way less allocations

``````@btime Epot(r,W)
166.685 ÎĽs (5 allocations: 464 bytes)
2.0618823575716226e6
``````

Since V(x) is already defined to be @inline, my question is why is it that this property (=inlining) is not propagated through my type mine()? Is there a way to keep the function definition in mine(), and still make the compiler understand that the function call in glob.Pot inside function Epot must be inlined? I ask because I see that all the allocations go there, and I get a performance penalty that would like to avoid if possible.

On another (performance) note, please notice that in my function I copy all the coordinates r[:i] of the ith-particle with a for loop. My first approach was to simply replace lines like

``````for id in 1:3
ri[id] = r[i,id]
end
``````

with

``````ri .= r[1,:]
``````

but that also increases the number of allocations, in the later case, from 5 to

``````@btime Epot(r,W)
170.148 ÎĽs (68 allocations: 7.34 KiB)
2.0618823575716226e6
``````

and even worse if I do the same with the rij variable in the j loop.
Why is this happening? Is there a way to use this simple, yet nice vectorial notation, without increasing much the number the allocations?

Best regards and thanks,

Ferran.

1 Like

Iâ€™m not sure about the inlining question, but I think the performance difference might be since `Function` is an abstract type. I think you should parametrize your structure with a type e.g. `F`, and annotate the `Pot` field with `F` instead of `Function`.

3 Likes

Why is this happening? Is there a way to use this simple, yet nice vectorial notation, without increasing much the number the allocations?

My undestanding is that `ri .= r[1,:]` creates a new array every time `r[1,:]` is called.
If you donâ€™t want to create an array use `ri .= view(r,1,:)`. This also allocates a view but it is much less memory.

If you are using views you donâ€™t want to do `ri .= view(r,i,:)` because you donâ€™t need to allocate `ri`. You can simply do `ri = view(r,i,:)` and you donâ€™t need to declare `ri` beforehand. In this case you donâ€™t even need to allocate `ri` beforehand (whether you use views or copies) because `ri` is only used to read data from (you never write to `ri`).

With the for loop:

``````  100.650 ÎĽs (5 allocations: 464 bytes)
``````

With a view

``````  101.431 ÎĽs (3 allocations: 240 bytes)
``````

With a copy

`````` 106.780 ÎĽs (68 allocations: 7.34 KiB)
``````

Usually views are more than enough if you donâ€™t go crazy optimizing everything. There are many things to optimize before you even care about the allocations generated by views.

#### Code for comparing views VS copies

``````function Epot2view(r,glob::mine)
Ep  = 0.0
rij = zero(r[1,:])
N   = size(r,1)

for i in 1:N-1
ri = view(r,i,:)
for j in i+1:N
for jd in 1:3
rij[jd] = ri[jd] - r[j,jd]
end
rr  = norm(rij)
#Ep += glob.Pot(rr)
Ep += V(rr)
end
end
Ep
end;

@btime Epot2view(r,W)
``````

I get `101.331 ÎĽs (3 allocations: 240 bytes)`

``````function Epot2copy(r,glob::mine)
Ep  = 0.0
rij = zero(r[1,:])
N   = size(r,1)

for i in 1:N-1
ri = r[i,:]
for j in i+1:N
for jd in 1:3
rij[jd] = ri[jd] - r[j,jd]
end
rr  = norm(rij)
#Ep += glob.Pot(rr)
Ep += V(rr)
end
end
Ep
end;
@btime Epot2copy(r,W)
``````

I get `104.813 ÎĽs (68 allocations: 7.34 KiB)`

#### Another solution if you donâ€™t like another for loop inside your code

If you want to fill arrays without using view but you donâ€™t want to have more for loops in a complicated function you simply write the for loop in a function and call it.

``````
function fill_ri_from_r_row!(ri, r, i)
for j in 1:length(ri)
ri[j] = r[i,j]
end
end
``````
``````function Epot3(r,glob::mine)
Ep  = 0.0
n_rows = size(r,1)
n_cols = size(r,2)
ri  = zeros(n_cols)
rij = zeros(n_cols)

for i in 1:n_rows-1
fill_ri_from_r_row!(ri,r,i)
for j in i+1:n_rows
for jd in 1:3
rij[jd] = ri[jd] - r[j,jd]
end
rr  = norm(rij)
#Ep += glob.Pot(rr)
Ep += V(rr)
end
end
Ep
end
``````
``````@btime Epot3(r,W)
101.880 ÎĽs (3 allocations: 240 bytes)
``````

#### Creating vectors with zeros

When you create vectors of zeros from slices of a matrix `r` you donâ€™t need to create a new vector doing `r[1,:]`. You can check that this allocates double the memory when you create `ri` and `rij`

``````@allocated ri = zero(r[1,:])
240
@allocated ri = zeros(size(r,2))
112
``````

You can create a vector of zeros with the `zeros` function and the number of elements you need.

2 Likes

Extending/summarizing what has been said above:

``````using LinearAlgebra

V(x) = 1.0/x^5 # no need for inline here.

struct mine{F<:Function} # doesn't need to be mutable, also parametrize on function type
x   :: Float64
Pot :: F
end

W     = mine(1.0, V);
Np    = 64;
r     = rand(Np,3);

function Epot(r,glob::mine)
Ep  = 0.0
T = eltype(r)
ncols = size(r, 2)
ri  = zeros(T, ncols)
rij = zeros(T, ncols)
N   = size(r,1)
@inbounds for i in 1:N-1 # disable bounds checks
for id in 1:3
ri[id] = r[i,id]
end
for j in i+1:N
for jd in 1:3
rij[jd] = ri[jd] - r[j,jd]
end
rr  = norm(rij)
Ep += glob.Pot(rr)
# Ep += V(rr)
end
end
Ep
end;
``````
``````julia> @btime Epot(\$r,\$W); # before
314.199 ÎĽs (6052 allocations: 94.94 KiB)

julia> @btime Epot(\$r,\$W); # after modification of the struct
217.199 ÎĽs (4 allocations: 448 bytes)

julia> @btime Epot(\$r,\$W); # finally, after slight modification of Epot function
210.799 ÎĽs (2 allocations: 224 bytes)
``````

Further speedup could berhaps be achieved by using static arrays.

1 Like

You probably do not need a mutable struct here. Mutable or not, using

``````struct Mine{T}
x::Float64
Pot::T
end

# or

mutable struct Mine{T}
x::Float64
Pot::T
end

x = 1.0
pot = V
W = Mine(x, pot)
``````

gets your speed and allocations where you prefer (notwithstanding the other improvements suggested above).

If you want a rellevant speedup you can also take into account you donâ€™t need the rij vector:

``````V(x) = 1.0/x^5 # no need for inline here.

struct mine{F<:Function} # doesn't need to be mutable, also parametrize on function type
x   :: Float64
Pot :: F
end

W     = mine(21.0, V);
Np    = 64;
r     = rand(Np,3);

function Epot_from_glob(r,glob::mine)
Ep  = 0.0
T = eltype(r)
ncols = size(r, 2)
ri  = zeros(T, ncols)
rij = zeros(T, ncols)
N   = size(r,1)
@inbounds for i in 1:N-1 # disable bounds checks
for id in 1:3
ri[id] = r[i,id]
end
for j in i+1:N
aux = zero(T)
for jd in 1:3
aux += (ri[jd] - r[j,jd])^2
end
rr  = sqrt(aux)
Ep += glob.Pot(rr)
end
end
Ep
end;

``````
``````@btime Epot_from_glob(r,W)
72.134 ÎĽs (3 allocations: 240 bytes)
``````

I guess `aux = zero(T)` should go before the loop, to be safe.

Oh, thank you all guys for your answers. I think that now I understand a couple of things better
However, I still have some questionsâ€¦ The first one is: in the definition of the parametrized type that now can contain a non-abstract function, there is no explicit constructor

``````struct mine{F<:Function}
x   :: Float64
Pot :: F
end
``````

while in my definition, there was one

``````mutable struct mine
x   :: Float64
Pot :: Function
mine() = new()
end
``````

what would be the proper way to include it in the definition above? Iâ€™ve tried something like

``````mine{F}() where {F} = new{F}();
``````

but that does not seem to work, as it complains when I try to build an object of type mine.

Thanks a lot,

Ferran.

PS: and yes, I want to have the explicit constructor in the method definition, as it is easier for me to use it in other situations. Maybe it is not the best practice, but the question is specific as to how to do that, not why

What would you wish the default values of the fields be? If you cannot answer that, then I think you will understand why the constructor without arguments will not work.

1 Like

And once you set a default function type, overwriting that with a different function will not work. If that matters for your use.

Note that although immutable `struct`s can seem a bit inconvenient to handle, itâ€™s much easier to reason about them for the compiler and they can often be allocated on the stack. If you care about performance, you should definitely favor them over mutable ones if possible.

Oh, sorry, I just copied thebstruct. Letâ€™s make it inmutable if needed.
About the constructor, I can imagine what default valyes to useâ€¦ for some of the fields, not all. For instance, the function F will take a Float64 and return also a Float64. Is there a way to build the proper constructor for that or not, then?

I think you are looking for FunctionWrappers.

I am replying because I think nobody addressed the @inline point, and could be a opportunity to Ferran to understand more (if I am not wrong, XD).

â€śIn computing, inline expansion , or inlining , is a manual or compiler optimization that replaces a function call site with the body of the called function.â€ť (Wikipedia, 2019)

As far as I know (that is not saying much), no language allow inline expansion of functions in fields/variables. The C/C++ language family, for example, allows inline expansion if you use the name of the function (as `V(rr)` in your example) because that function name will always refer to the same function and it is know in compilation time. If you create a variable/field with a function pointer, often it can can have different functions stored there through execution (or even a null pointer), and a function that receives such variable/field has no way beforehand to know what there will be in the field, and as such it will need to inspect the pointer and call the function at the call site, the function code will not be already expanded (i.e., basically copy-pasted) there. If the function knew exactly which function would be in the field to do the expansion in compilation time, then you could use the function name, and there would be no reason for passing it as parameter. It is a speed/flexibility trade-off.

Maybe Julia can bypass some of this by compiling a different function (on-the-fly) for every function passed as parameter, but I do not know Julia well enough to know.

It already does this. Each function (closure body, technically) has its own type and methods are specialized in the types of their arguments, so eg `map(f, v)` is specialized in the specific `f` as well as the type of `v`.

3 Likes

Ah, this really explains why typeof works the way it works for functions in Julia.

This raises some questions in my mind:

1. The way `@inline` works is compatible with this kind of inlining? I mean, the extra info given by `@inline` is used if the inlining would happen in a method that received a closure body as parameter and is specilizing on it?
2. `@noespecialize` in a parameter that is a closure body conflicts with `@inline`?
3. I suppose if the parameter is a struct (mutable or immutable), then the situation becomes more complex. The field to be useful (be able to store different values) will need to have the supertype `Function`, and will not allow such inline-specialization, or be parametric. Would this mean that is a good idea to use structs with parametric types (in the place of `Function` fields) using the specific closure-body-typeof to allow Julia to compile a specialized version of the code for every struct with a different closure inside?

You can use a type parameter to specialize a struct on the type of a specific function. Of course, then you canâ€™t change which function is stored there later but if you needed to do that then specialization on the specific function would be wrong anyway.

3 Likes