What is the best way to design a long computation?

aortizb · January 5, 2019, 2:18am

In the context of the development of a finite element code, I have a dictionary that is formed after reading an input data file containing the finite element mesh (that is, nodal coordinates and element connectivity). I am in the process of deciding how I am going to unpack the finite element mesh from the dictionary to perform the long computation. Here, I present two approaches to perform the long computation. The codes look something like these:

Approach 1:

function main1()
	file = "inputdata.txt" 
	data = read_input(file)  # data is a dictionary containing a finite element mesh (nodal coordinates, element connectivity, etc.)
	solve1(data)
end

function solve1(data)
	# some auxiliary inexpensive computations
        someoutputs = do_some_short_computations() 
	
	# long computation that depends on data
	loop_over_elements1(data)     
	
	# additional auxiliary inexpensive computations
	do_some_other_short_computations()          
end 

function loop_over_elements1(data)
	# this is an array of a "struct" type containing the data of an element
	elements = data["elements"] 
	
	# the loop
	@inbounds for e = 1:length(elements)
		element = elements[e]  # the struct containing the information of the current element
		nodes = element.nodes  # this is the nodal connectivity of the current element
		do_some_stuff_with_nodes(nodes)
	end
end

Approach 2:

function main2()
	file = "inputdata.txt" 
	data = read_input(file)  # data is a dictionary containing a finite element mesh (nodal coordinates, element connectivity, etc.)
	solve2(data)
end

function solve2(data)
	# some auxiliary inexpensive computations
        someoutputs = do_some_short_computations() 
	
	# long computation
	elements = data["elements"]  # this is an array of a "struct" type containing the data of an element
	loop_over_elements2(elements)     
	
	# additional auxiliary inexpensive computations
	do_some_other_short_computations()          
end 

function loop_over_elements2(elements)	
	@inbounds for e = 1:length(elements)
		element = elements[e]  # the struct containing the information of the current element
		nodes = element.nodes  # this is the nodal connectivity of the current element
		do_some_stuff_with_nodes(nodes)
	end
end

What do you think it is a better (in term of performance) approach? to pass the data dictionary inside the “loop_over_elements” function and unpack the elements inside this function (Approach 1) or to unpack the array of elements before entering the “loop_over_elements” function (Approach 2)?

kevbonham · January 5, 2019, 2:46am

Can you build a toy example of each and then just benchmark it? I usually find it difficult to reason about what will be performant, so this is my approach. That or I make one version that works and then ask people if there’s a better way to do it.

You’re much more likely to get useful responses if you have code that actually runs that people can play with.

aortizb · January 5, 2019, 2:59am

@kevbonham I don’t have it yet, sorry. My question is tailored to figure out the best way to do it in Julia so that it can be optimized. The “elements” array can be huge. I am tempted to think that Approach 2 is better, but perhaps there is no difference. I don’t know. I am new to Julia. Anyway, I’ll do what you say of course when I have it, but in the meantime perhaps someone already did it and can share his/her findings.

PetrKryslUCSD · January 5, 2019, 3:06am

loop_over_elements2() may be written in a way that will nail down the type of the argument so that the interior of this function is type-stable.
Retrieving something from an untyped dictionary (Any) within your computation routine
will not do that.

FinEtools makes sure that all heavy-lifting routines are type stable.

aortizb · January 5, 2019, 4:38am

Thank you @PetrKryslUCSD . Actually, I became interested in Julia when I saw your project for the first time about a month ago. I was browsing your code by then and took some ideas from it. Nice project and very interesting. Now, I had another look at your project and found something that I believe it is similar to what I had in mind. It is given in the function stiffness() in FEMMDeforLinearBaseModule.jl of FinEtools:

function stiffness(self::FEMMDeforLinearAbstract, assembler::A, geom::NodalField{FFlt}, u::NodalField{T}) where {A<:SysmatAssemblerBase, T<:Number}

    fes = self.integdomain.fes

    npts,  Ns,  gradNparams,  w,  pc = integrationdata(self.integdomain);

    dofnums, loc, J, csmatTJ, gradN, D, B, DB, elmat, elvec, elvecfix = buffers(self, geom, u)

    self.material.tangentmoduli!(self.material, D, 0.0, 0.0, loc, 0)

    startassembly!(assembler, size(elmat, 1), size(elmat, 2), count(fes),

    u.nfreedofs, u.nfreedofs);

    for i = 1:count(fes) # Loop over elements

        fill!(elmat,  0.0); # Initialize element matrix

        for j = 1:npts # Loop over quadrature points

            locjac!(loc, J, geom.values, fes.conn[i], Ns[j], gradNparams[j]) 

            Jac = Jacobianvolume(self.integdomain, J, loc, fes.conn[i], Ns[j]);

            updatecsmat!(self.mcsys, loc, J, fes.label[i]);

            At_mul_B!(csmatTJ, self.mcsys.csmat, J); # local Jacobian matrix

            gradN!(fes, gradN, gradNparams[j], csmatTJ);

            Blmat!(self.mr, B, Ns[j], gradN, loc, self.mcsys.csmat);

            add_btdb_ut_only!(elmat, B, Jac*w[j], D, DB)

        end # Loop over quadrature points

        complete_lt!(elmat)

        gatherdofnums!(u, dofnums, fes.conn[i]); # retrieve degrees of freedom

        assemble!(assembler, elmat, dofnums, dofnums); # assemble symmetric matrix

    end # Loop over elements

    return makematrix!(assembler);

end

It appears that self is at some point similar to what I have in “elements” in the code I put in my original post — self appears to be some sort of struct containing the elements and my “elements” is an array of structs representing the element data (each entry in elements is an element struct having the nodes defining the element, their coordinates, etc.). Under this understanding, I think that loop_over_elements2 (Approach 2) is more closer to your stiffness function. As you said, passing the untyped dictionary inside the long computation function appears to be not convenient for type stability. Did I get your message right?

PetrKryslUCSD · January 5, 2019, 5:26am

Yes, that is what I had in mind. The integration domain is parameterized with the type of the element.

mutable struct IntegDomain{S<:FESet, F<:Function}
    fes::S # finite element set object
    integration_rule::IntegRule  # integration rule object
    otherdimension::F # function to compute the "other" dimension (thickness, or cross-sectional area)
    axisymmetric::Bool
end

That makes it possible to achieve type stability.

kevbonham · January 5, 2019, 3:15pm

But it looks to me like what you care about here isn’t actually about your use case, but about how Julia handles different data structures. I’m saying, fill a dictionary containing your struct with random numbers, try out your loops and see what happens…

bennedich · January 5, 2019, 3:42pm

Just to clarify, this statement:

elements = data["elements"]

simply returns a reference to the already existing element array. There’s no unpacking or array construction taking place there, unless I’ve misunderstood something in your code. See this example:

julia> struct Element e end

julia> @time original_elements = Element.(1:1_000_000);
  0.073706 seconds (2.12 M allocations: 43.801 MiB)

julia> @time data = Dict("elements" => original_elements);
  0.000008 seconds (9 allocations: 800 bytes)

julia> @time elements = data["elements"];
  0.000004 seconds (4 allocations: 160 bytes)

So in terms of where the “unpacking” occurs, your two implementations are almost identical. The more relevant questions IMO are 1) type-stability, as @PetrKryslUCSD points out and 2) if one form allows you to unit test / re-use your code better.

aortizb · January 5, 2019, 4:16pm

Sorry @bennedich , I didn’t use correctly the term “unpacking.” I was refering to the process of getting out of the dictionary, which has many other things as well (Dict{String, Any}). When I wrote the original post, I was thinking about the performance issues that would cause the “Any” (I didn’t wrote all I was thinking and I appologize for that), so @PetrKryslUCSD gave me the answer I was expecting. Anyway, I am grateful for your kind reply.

Topic		Replies	Views
I hope you like brainteasers General Usage	0	793	October 12, 2019
Functions in an input file-- how? General Usage	35	2573	May 16, 2017
Implementing parallel computing in finite element problems Numerics	6	2037	August 15, 2018
Data Structures for Finite Volumes Numerics question , package , pde , data_structures	3	102	September 19, 2024
Finite element method interpolation performance Performance memory-allocation , staticarrays	3	425	November 5, 2021

What is the best way to design a long computation?

Related topics