Inlining a function in a struct field

Hi,

I have a question about inlining functions that hopefully somebody here can help me solve. I define an inlined potential function as follows

@inline function V(x)
    1.0/x^5
end;

and also a user-defined type that will hold it

mutable struct mine
    x   :: Float64
    Pot :: Function
    mine() = new()
end

from where I define

W     = mine();
W.x   = 1.0;
W.Pot = V;      # here I assign the potential function
Np    = 64;
r     = rand(Np,3);

Now with that I want to evaluate the potential energy with this function

using LinearAlgebra   # needed for norm below

function Epot(r,glob::mine)
    Ep  = 0.0
    ri  = zero(r[1,:])
    rij = zero(r[1,:])
    N   = size(r,1)
    for i in 1:N-1
        for id in 1:3
            ri[id] = r[i,id]
        end
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            Ep += glob.Pot(rr)
            # Ep += V(rr)
        end
    end
    Ep
end;      

for that I use BenchmarkTools and get

@btime Epot(r,W)
  237.614 μs (6052 allocations: 94.94 KiB)
2.0618823575716226e6

Now I can directly use V(x) instead of glob.Pot, as in the commented line at the bottom of the function, and get much better results, with far fewer allocations

@btime Epot(r,W)
  166.685 μs (5 allocations: 464 bytes)
2.0618823575716226e6

Since V(x) is already defined with @inline, my question is: why is this property (inlining) not propagated through my type mine? Is there a way to keep the function stored in mine, and still make the compiler understand that the call through glob.Pot inside Epot must be inlined? I ask because I see that all the allocations happen there, and I get a performance penalty that I would like to avoid if possible.
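A quick way to see where the allocations come from (a diagnostic sketch, not from the original post; MineAbstract and g are hypothetical names chosen to avoid clashing with mine) is to ask the compiler what it can infer about a call through the field:

```julia
using InteractiveUtils   # for @code_warntype

@inline V(x) = 1.0 / x^5

mutable struct MineAbstract      # same layout as `mine` above
    x   :: Float64
    Pot :: Function              # abstract field type
    MineAbstract() = new()
end

W = MineAbstract()
W.Pot = V

# The compiler cannot know which function is stored in Pot, so the
# result of the call is inferred as `Any` (hence boxing, one allocation
# per call in the inner loop):
g(glob) = glob.Pot(2.0)
@code_warntype g(W)              # flags the call result as `Any`
```

Because the inferred result is Any, inlining cannot even be considered at that call site, regardless of the @inline annotation on V.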

On another (performance) note, please notice that in my function I copy all the coordinates r[i,:] of the i-th particle with a for loop. My first approach was to simply replace lines like

for id in 1:3
    ri[id] = r[i,id]
end

with

ri .= r[i,:]

but that also increases the number of allocations; in the latter case, from 5 to

@btime Epot(r,W)
  170.148 μs (68 allocations: 7.34 KiB)
2.0618823575716226e6

and even worse if I do the same with the rij variable in the j loop.
Why is this happening? Is there a way to use this simple, yet nice vectorial notation, without greatly increasing the number of allocations?

Best regards and thanks,

Ferran.

I’m not sure about the inlining question, but I think the performance difference might be because Function is an abstract type. I think you should parametrize your structure with a type parameter, e.g. F, and annotate the Pot field with F instead of Function.
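For concreteness, a minimal sketch of that suggestion (MineParam is a hypothetical name, to avoid clashing with the original mine):

```julia
# F becomes part of the struct's type, so the compiler knows exactly
# which function is stored and can devirtualize (and inline) the call.
struct MineParam{F<:Function}
    x   :: Float64
    Pot :: F
end

V(x) = 1.0 / x^5

W = MineParam(1.0, V)   # W isa MineParam{typeof(V)}
W.Pot(2.0)              # concrete, fully inferable call
```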

Why is this happening? Is there a way to use this simple, yet nice vectorial notation, without increasing much the number the allocations?

My understanding is that ri .= r[i,:] creates a new temporary array every time r[i,:] is evaluated.
If you don’t want to create a new array, use ri .= view(r,i,:). This still allocates a view, but a view uses much less memory.

If you are using views you don’t even need ri .= view(r,i,:): you can simply write ri = view(r,i,:), with no need to declare ri beforehand. In fact, whether you use views or copies, ri never needs to be preallocated, because it is only read from (you never write to it).

With the for loop:

  100.650 μs (5 allocations: 464 bytes)

With a view

  101.431 μs (3 allocations: 240 bytes)

With a copy

 106.780 μs (68 allocations: 7.34 KiB)

Usually views are more than enough if you don’t go crazy optimizing everything. There are many things to optimize before you even care about the allocations generated by views.

Code for comparing views vs. copies

function Epot2view(r,glob::mine)
    Ep  = 0.0
    rij = zero(r[1,:])
    N   = size(r,1)
    
    for i in 1:N-1
        ri = view(r,i,:)
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            #Ep += glob.Pot(rr)
            Ep += V(rr)
        end
    end
    Ep
end;      

@btime Epot2view(r,W)

I get 101.331 μs (3 allocations: 240 bytes)

function Epot2copy(r,glob::mine)
    Ep  = 0.0
    rij = zero(r[1,:])
    N   = size(r,1)
    
    for i in 1:N-1
        ri = r[i,:]
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            #Ep += glob.Pot(rr)
            Ep += V(rr)
        end
    end
    Ep
end;     
@btime Epot2copy(r,W)

I get 104.813 μs (68 allocations: 7.34 KiB)

Another solution if you don’t like another for loop inside your code

If you want to fill arrays without using view, but don’t want extra for loops cluttering a complicated function, you can simply put the for loop in a helper function and call it.


function fill_ri_from_r_row!(ri, r, i)
    for j in 1:length(ri)
        ri[j] = r[i,j]
    end
end
function Epot3(r,glob::mine)
    Ep  = 0.0
    n_rows = size(r,1)
    n_cols = size(r,2)
    ri  = zeros(n_cols)
    rij = zeros(n_cols)
    
    for i in 1:n_rows-1
        fill_ri_from_r_row!(ri,r,i)
        for j in i+1:n_rows
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            #Ep += glob.Pot(rr)
            Ep += V(rr)
        end
    end
    Ep
end
@btime Epot3(r,W)
  101.880 μs (3 allocations: 240 bytes)

Creating vectors with zeros

When you create vectors of zeros, you don’t need to build them from slices of the matrix r with zero(r[1,:]), which first materializes a new vector. You can check that this allocates double the memory when you create ri and rij:

@allocated ri = zero(r[1,:])
240
@allocated ri = zeros(size(r,2))
112

You can create a vector of zeros with the zeros function and the number of elements you need.

Extending/summarizing what has been said above:

using LinearAlgebra

V(x) = 1.0/x^5 # no need for inline here.

struct mine{F<:Function} # doesn't need to be mutable, also parametrize on function type
    x   :: Float64
    Pot :: F
end


W     = mine(1.0, V);
Np    = 64;
r     = rand(Np,3);


function Epot(r,glob::mine)
    Ep  = 0.0
    T = eltype(r)
    ncols = size(r, 2)
    ri  = zeros(T, ncols)
    rij = zeros(T, ncols)
    N   = size(r,1)
    @inbounds for i in 1:N-1 # disable bounds checks
        for id in 1:3
            ri[id] = r[i,id]
        end
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            Ep += glob.Pot(rr)
            # Ep += V(rr)
        end
    end
    Ep
end; 
julia> @btime Epot($r,$W); # before
  314.199 μs (6052 allocations: 94.94 KiB)

julia> @btime Epot($r,$W); # after modification of the struct
  217.199 μs (4 allocations: 448 bytes)

julia> @btime Epot($r,$W); # finally, after slight modification of Epot function
  210.799 μs (2 allocations: 224 bytes)

Further speedup could perhaps be achieved by using static arrays.

You probably do not need a mutable struct here. Mutable or not, using

struct Mine{T}
    x::Float64
    Pot::T
end

# or

mutable struct Mine{T}
    x::Float64
    Pot::T
end

x = 1.0
pot = V
W = Mine(x, pot)

gets your speed and allocations where you prefer (notwithstanding the other improvements suggested above).

If you want a relevant speedup, you can also take into account that you don’t need the rij vector:

V(x) = 1.0/x^5 # no need for inline here.

struct mine{F<:Function} # doesn't need to be mutable, also parametrize on function type
    x   :: Float64
    Pot :: F
end


W     = mine(21.0, V);
Np    = 64;
r     = rand(Np,3);


function Epot_from_glob(r,glob::mine)
    Ep  = 0.0
    T = eltype(r)
    ncols = size(r, 2)
    ri  = zeros(T, ncols)
    rij = zeros(T, ncols)
    N   = size(r,1)
    @inbounds for i in 1:N-1 # disable bounds checks
        for id in 1:3
            ri[id] = r[i,id]
        end
        for j in i+1:N
            aux = zero(T)
            for jd in 1:3
                aux += (ri[jd] - r[j,jd])^2
            end
            rr  = sqrt(aux)
            Ep += glob.Pot(rr)
        end
    end
    Ep
end; 

@btime Epot_from_glob(r,W)
  72.134 μs (3 allocations: 240 bytes)

Note that aux = zero(T) has to stay inside the j loop: the accumulator must be reset to zero for every pair.

Oh, thank you all for your answers. I think I now understand a couple of things better :slight_smile:
However, I still have some questions… The first one: in the definition of the parametrized type, which now holds a function of concrete type, there is no explicit constructor

struct mine{F<:Function} 
    x   :: Float64
    Pot :: F
end

while in my definition, there was one

mutable struct mine
    x   :: Float64
    Pot :: Function
    mine() = new()
end

what would be the proper way to include it in the definition above? I’ve tried something like

mine{F}() where {F} = new{F}();

but that does not seem to work, as it complains when I try to build an object of type mine.

Thanks a lot,

Ferran.

PS: and yes, I do want the explicit constructor in the type definition, as it is easier for me to use in other situations. Maybe it is not best practice, but the question is specifically how to do that, not why :slight_smile:

What would you wish the default values of the fields be? If you cannot answer that, then I think you will understand why the constructor without arguments will not work.
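If a sensible default does exist for the function field, one workaround (my sketch, not from the thread; identity is just a placeholder default, and MineP a hypothetical name) is an outer constructor that supplies concrete defaults:

```julia
struct MineP{F<:Function}
    x   :: Float64
    Pot :: F
end

# Outer constructor with concrete default values; the type parameter F
# is then inferred from the default function, so `new{F}()` is never needed.
MineP() = MineP(0.0, identity)

W  = MineP()                    # MineP{typeof(identity)}
W2 = MineP(1.0, sqrt)           # MineP{typeof(sqrt)}
```

Note that W and W2 have different concrete types, which is exactly what lets the compiler specialize on the stored function.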

And once you set a default function type, overwriting that with a different function will not work, if that matters for your use case.

Note that although immutable structs can seem a bit inconvenient to handle, it’s much easier to reason about them for the compiler and they can often be allocated on the stack. If you care about performance, you should definitely favor them over mutable ones if possible.

Oh, sorry, I just copied the struct. Let’s make it immutable if needed.
About the constructor, I can imagine what default values to use… for some of the fields, but not all. For instance, the function F will take a Float64 and also return a Float64. Is there a way to build the proper constructor for that, then?

I think you are looking for FunctionWrappers.
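For reference, a field based on FunctionWrappers.jl could look roughly like this (a sketch, assuming the FunctionWrappers package is installed; MineWrapped is a hypothetical name). The wrapped signature Float64 -> Float64 is itself a concrete type, so the field stays type-stable even though the stored function can change:

```julia
using FunctionWrappers: FunctionWrapper

mutable struct MineWrapped
    x   :: Float64
    # A concrete type encoding "takes a Float64, returns a Float64":
    Pot :: FunctionWrapper{Float64,Tuple{Float64}}
end

V(x) = 1.0 / x^5

W = MineWrapped(1.0, FunctionWrapper{Float64,Tuple{Float64}}(V))
W.Pot(2.0)                       # inferred as Float64

# The field can later be swapped for any other Float64 -> Float64 function:
W.Pot = FunctionWrapper{Float64,Tuple{Float64}}(x -> 2x)
```

The trade-off: the call goes through the wrapper (no inlining of V), but it avoids the type instability of a plain Function field.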

I am replying because I think nobody has addressed the @inline point, and it could be an opportunity for Ferran to understand more (if I am not wrong, XD).

“In computing, inline expansion , or inlining , is a manual or compiler optimization that replaces a function call site with the body of the called function.” (Wikipedia, 2019)

As far as I know (which is not saying much), no language allows inline expansion of functions stored in fields/variables. The C/C++ language family, for example, allows inline expansion when you use the name of the function (as in V(rr) in your example), because that name will always refer to the same function and it is known at compile time. If you create a variable/field holding a function pointer, it can often hold different functions during execution (or even a null pointer), and a function receiving such a variable/field has no way to know beforehand what will be stored there; it must therefore inspect the pointer and call the function at the call site, instead of having the function body already expanded (i.e., basically copy-pasted) there. If the compiler knew exactly which function would be in the field, then you could just use the function name, and there would be no reason to pass it as a parameter. It is a speed/flexibility trade-off.

Maybe Julia can bypass some of this by compiling a different function (on the fly) for every function passed as a parameter, but I do not know Julia well enough to say.

It already does this. Each function (closure body, technically) has its own type and methods are specialized in the types of their arguments, so eg map(f, v) is specialized in the specific f as well as the type of v.
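Applied to this thread, that means another fix is to simply pass the potential as an argument: the method then specializes on typeof(V) and can inline it. A sketch, with a hypothetical helper name:

```julia
V(x) = 1.0 / x^5

# Each distinct `pot` gets its own compiled, specialized method,
# so the call pot(x) can be inlined just like a direct call to V(x).
function total_pot(pot, xs)
    s = 0.0
    for x in xs
        s += pot(x)
    end
    s
end

total_pot(V, [1.0, 2.0])   # 1.0 + 0.03125 = 1.03125
```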

Ah, this really explains why typeof works the way it works for functions in Julia.

This raises some questions in my mind:

  1. Is the way @inline works compatible with this kind of inlining? I mean, is the extra info given by @inline used when the inlining would happen in a method that received a closure body as a parameter and is specializing on it?
  2. Does @nospecialize on a parameter that is a closure body conflict with @inline?
  3. I suppose that if the parameter is a struct (mutable or immutable), the situation becomes more complex. For the field to be useful (able to store different values), it either needs to be annotated with the supertype Function, which will not allow such inline specialization, or be parametric. Would this mean that it is a good idea to use structs with parametric types (in place of Function fields), using the specific closure-body type, to allow Julia to compile a specialized version of the code for every struct with a different closure inside?

You can use a type parameter to specialize a struct on the type of a specific function. Of course, then you can’t change which function is stored there later, but if you needed to do that, specializing on the specific function would be wrong anyway.
