# Can I make my code faster with parallelism, or just plain better coding?

I am trying to optimize a lennard-jones force calculation. When I first wrote it, It used millions of allocations and over 1 giB of memory, and these both increased if I looped over the algorithm. I have gotten it down to 2 allocations and 224 bytes of memory. The biggest factor was wrapping everything (functions and variable initialization) in a big function. However, it is not fast enough yet. I need another order of magnitude or else the code will be too long to execute. It will take weeks to do what Fortran or C can do in a day.

I tried adding @threads and using 4 threads but it speeds it up by less than 10% and the results are highly variable. I do not know if I am using it properly, and I am not knowledgeable enough on the other parallel options.

The code below is actually in two parts since I am also experimenting with Array of Structs vs. Struct of Arrays. They both seem about the same speed… I have labelled all parts clearly. I leave both in (sorry it makes the code seem long) since one method may be preferable.

Any advice on how to either parallelize it, or write the code better would be appreciated! An order of magnitude speedup would be amazing…

Each test has 2 functions, the main begins with `TotalEnergy` and the inner begins with `Single`. These are the only two functions that can be played with, everything else just sets up the necessary structs and default values.

The first 95 lines can be skipped, sorry for the length, but they set it up… The issue is only the functions that start with either `Single` or `TotalEnergy`/

i.e.,

``````function SingleLJ2!(f1::Vector{Float64},f2::Vector{Float64},
coord1::XYZ,coord2::XYZ, ϵ::Float64, σ::Float64, box_size)

dx = vector1D(coord1.x, coord2.x, box_size)
dy = vector1D(coord1.y, coord2.y, box_size)
dz = vector1D(coord1.z, coord2.z, box_size)

rij_sq = dx*dx + dy*dy + dz*dz

if  rij_sq > 1.0
fill!(f1,0.0)
fill!(f2,0.0)
return
else
sr2 = σ^2 / rij_sq
rmag = sqrt(rij_sq)
sr6 = sr2 ^ 3
sr12 = sr6 ^ 2
f = ((24 * ϵ) / (rij_sq)) * (2 * sr12 - sr6)
f1 = -f * dx
f1 = -f * dy
f1 = -f * dz
f2 .= .-f1
end
nothing
end
function TotalEnergyStructOfArrays(array::StructOfArrays,vdwTable::Table,
n::Int64,box_size::Vector{Float64})

f1 = Vector{Float64}(undef,3)
f2 = Vector{Float64}(undef,3)
for k=1:1 # this loop is redundant now, it used to increase memory and allocations
for j = (i+1):n
ti = array.type[i]
tj = array.type[j]
SingleLJ2!(f1,f2,array.r[i],array.r[i],vdwTable.ϵij[ti,tj],
vdwTable.σij[ti,tj],box_size)
array.f[i].x += f1
array.f[i].y += f1
array.f[i].z += f1
array.f[j].x += f2
array.f[j].y += f2
array.f[j].z += f2
end
end
end
return
end
``````

Here is the full code for if anyone wished to run it, but the above segment of code is all that I am interested in optimizing (and the equivalent Array of Structures version) and also parallelizing.

``````#############################################################################
#
#    A general struct (Table) and two functions vector1D and vector.
#        - vector1D and vector are taken from Molly
#     https://github.com/jgreener64/Molly.jl/blob/master/src/md.jl
################################################################################
using Distributed
struct Table  # given dummy values below
ϵij::Array{Float64,2}
σij::Array{Float64,2}
end
mutable struct XYZ
x::Float64
y::Float64
z::Float64
end
mutable struct ArrayOfStructs
r::XYZ
v::XYZ
f::XYZ
type::Int64
cgnr::Int64
name::String
mass::Float64
qq::Float64
mol::String
end

mutable struct StructOfArrays
r::Vector{XYZ}
v::Vector{XYZ}
f::Vector{XYZ}
type::Vector{Int64}
cgnr::Vector{Int64}
name::Vector{String}
mass::Vector{Float64}
qq::Vector{Float64}
mol::Vector{String}
end

"Vector between two coordinate values, accounting for mirror image seperation"
function vector1D(c1::Float64, c2::Float64, box_size::Float64)
if c1 < c2
return (c2 - c1) < (c1 - c2 + box_size) ? (c2 - c1) : (c2 - c1 - box_size)
else
return (c1 - c2) < (c2 - c1 + box_size) ? (c2 - c1) : (c2 - c1 + box_size)
end
end

vector(coords_one::XYZ, coords_two::XYZ, box_size::Float64) =
[vector1D(coords_one.x, coords_two.x, box_size),
vector1D(coords_one.y, coords_two.y, box_size),
vector1D(coords_one.z, coords_two.z, box_size)]

################################################################################
#                  Some general parameters used by all trials
################################################################################
function main()
box_size = [2.0,2.0,2.0]
x = 3011 # dummy number of atoms
m = 4    # dummy number of atom types (for table of parameters...)
vdwTable = Table([rand() for i=1:m,j=1:m],
[rand() for i=1:m,j=1:m]) #,
################################################################################
#                   Start of trials
################################################################################

arrayOfStructs = Vector{ArrayOfStructs}(undef,x)

for i=1:x
arrayOfStructs[i] = ArrayOfStructs( XYZ(box_size * rand(), box_size * rand(), box_size * rand() ),
XYZ(0.0, 0.0, 0.0),
XYZ(0.0, 0.0, 0.0),
ceil(rand()*m),15,"hi",14.1,1.2,"mol" )
end

structOfArrays = StructOfArrays([XYZ(box_size * rand(), box_size * rand(), box_size * rand()) for i=1:x],
[XYZ(0.0, 0.0, 0.0) for i=1:x ],
[XYZ(0.0, 0.0, 0.0) for i=1:x ],
[ceil(rand())*m for i=1:x],
[12 for i=1:x],
["hi" for i=1:x],
[rand()*5.1 for i=1:x],
[rand()*1.3 for i=1:x],
["string" for i=1:x])

################################################################################
#
#         Testing Array of Structs
#
################################################################################
function SingleLJ!(f1::Vector{Float64},f2::Vector{Float64},
coord1::XYZ,coord2::XYZ, ϵ::Float64, σ::Float64, box_size)

# method 1 # 9.99 M allocations: 685.959 MiB
#diff = Vector{Float64}(undef,3)
#diff = vector1D(coord1.x, coord2.x, box_size)
#diff = vector1D(coord1.y, coord2.y, box_size)
#diff = vector1D(coord1.z, coord2.z, box_size)
# method 2 # 9.99 M allocations: 685.959 MiB

#    diff = coord1.x - coord2.x
#    diff = coord1.y - coord2.y
#    diff = coord1.z - coord2.z
#    diff .= diff .- round.(diff./box_size) .* box_size    # technically a bit more rigorous than method 1 and 3 # mirror image seperation
#method 3   29.97 M allocations: 1.414 GiB
#diff = vector(coord1, coord2, box_size) # THis is very heavy on allocations

#rij_sq = sum(diff.^2) # use with methods 1,2,3

# method 4 2 allocations: 224 bytes
dx = vector1D(coord1.x, coord2.x, box_size)
dy = vector1D(coord1.y, coord2.y, box_size)
dz = vector1D(coord1.z, coord2.z, box_size)

rij_sq = dx*dx + dy*dy + dz*dz  # use only with method 4

if  rij_sq > 1.0
fill!(f1,0.0)
fill!(f2,0.0)
return
else
sr2 = σ^2 / rij_sq
rmag = sqrt(rij_sq)
sr6 = sr2 ^ 3
sr12 = sr6 ^ 2
f = ((24 * ϵ) / (rij_sq)) * (2 * sr12 - sr6)
#f1 = -f * diff
#f1 = -f * diff
#f1 = -f * diff
#f2 .= .-f1

f1 = -f * dx
f1 = -f * dy
f1 = -f * dz
f2 .= .-f1
end
nothing
end
function TotalEnergyArrayOfStructs(array::Vector{ArrayOfStructs},vdwTable::Table,
n::Int64,box_size::Vector{Float64})

f1 = Vector{Float64}(undef,3)
f2 = Vector{Float64}(undef,3)
for k=1:1 # this loop is redundant... I do not understand why allocations increase linearly with this loop...
for j = (i+1):n
ti = array[i].type
tj = array[j].type
SingleLJ!(f1,f2,array[i].r,array[j].r,vdwTable.ϵij[ti,tj],
vdwTable.σij[ti,tj],box_size)
array[i].f.x += f1
array[i].f.y += f1
array[i].f.z += f1
array[j].f.x += f2
array[j].f.y += f2
array[j].f.z += f2
end
end
end
return
end
TotalEnergyArrayOfStructs(arrayOfStructs,vdwTable,x,box_size)  # precompile
println("------------------------Start of Testing AoS------------------------------")
print("array of structs: ")
@time TotalEnergyArrayOfStructs(arrayOfStructs,vdwTable,x,box_size)
println("--------------------------End of Testing AoS------------------------------")

################################################################################
#
#                           Testing Struct of Arrays
#
################################################################################

function SingleLJ2!(f1::Vector{Float64},f2::Vector{Float64},
coord1::XYZ,coord2::XYZ, ϵ::Float64, σ::Float64, box_size)

dx = vector1D(coord1.x, coord2.x, box_size)
dy = vector1D(coord1.y, coord2.y, box_size)
dz = vector1D(coord1.z, coord2.z, box_size)

rij_sq = dx*dx + dy*dy + dz*dz

if  rij_sq > 1.0
fill!(f1,0.0)
fill!(f2,0.0)
return
else
sr2 = σ^2 / rij_sq
rmag = sqrt(rij_sq)
sr6 = sr2 ^ 3
sr12 = sr6 ^ 2
f = ((24 * ϵ) / (rij_sq)) * (2 * sr12 - sr6)
f1 = -f * dx
f1 = -f * dy
f1 = -f * dz
f2 .= .-f1
end
nothing
end
function TotalEnergyStructOfArrays(array::StructOfArrays,vdwTable::Table,
n::Int64,box_size::Vector{Float64})

f1 = Vector{Float64}(undef,3)
f2 = Vector{Float64}(undef,3)
for k=1:1 # this loop is redundant now, it used to increase memory and allocations
for j = (i+1):n
ti = array.type[i]
tj = array.type[j]
SingleLJ2!(f1,f2,array.r[i],array.r[i],vdwTable.ϵij[ti,tj],
vdwTable.σij[ti,tj],box_size)
array.f[i].x += f1
array.f[i].y += f1
array.f[i].z += f1
array.f[j].x += f2
array.f[j].y += f2
array.f[j].z += f2
end
end
end
return
end

TotalEnergyStructOfArrays(structOfArrays,vdwTable,x,box_size)  # precompile
println("------------------------Start of Testing SoA------------------------------")
print("Struct of Arrays: ")
@time TotalEnergyStructOfArrays(structOfArrays,vdwTable,x,box_size)
println("--------------------------End of Testing SoA------------------------------")

end

main()

``````

That is a lot of code you expect someone to review. You might get lucky but if not, try to focus your questions to much simpler things.

• Have you read the performance tips section of the manual?
• are the Fortran codes you’re comparing to threaded as well? I would expect that you can get Julia to perform within a factor of 2, probably less. But to be able to judge you need to compare single threaded to single threaded.
1 Like

Here are a few scattershot things to try. Try https://github.com/JuliaArrays/StaticArrays.jl/ for your size-3 arrays. Use `@code_warntype` to make sure there are no funny business going on. Profile to see whether the bottleneck is in the actual computation or if it is just moving memory around. Avoid `mutable` structs.

edit: more advanced micro-optimizations: `x*x*x` might be faster than `x^3` (`x^3` is not allowed to round intermediate results, while `x*x*x` is). `@inbounds` might help. Saving floating point divisions (which are pretty expensive) by computing things like `1/rij_sq` might be faster.

1 Like

it is alot… However, there are only 2 functions that can be editted, and they are each around 15-20 lines, so not big… there is unfortunately a lot of stuff not part of the problem in the first 95 lines of the code.

Very little actually needs to be optimized, only the functions that start with `Single` or `TotalEnergy`. Everything else is unfortunately long, but background noise.

Here is a version of the SoA code which works faster:

``````struct Table  # given dummy values below
ϵij::Array{Float64,2}
σij::Array{Float64,2}
end

mutable struct XYZ
x::Float64
y::Float64
z::Float64
end

struct StructOfArrays
r::Vector{XYZ}
v::Vector{XYZ}
f::Vector{XYZ}
typ::Vector{Int64}
cgnr::Vector{Int64}
name::Vector{String}
mass::Vector{Float64}
qq::Vector{Float64}
mol::Vector{String}
end

function vector1D(c1::Float64, c2::Float64, box_size::Float64)
if c1 < c2
return (c2 - c1) < (c1 - c2 + box_size) ? (c2 - c1) : (c2 - c1 - box_size)
else
return (c1 - c2) < (c2 - c1 + box_size) ? (c2 - c1) : (c2 - c1 + box_size)
end
end

function SingleLJ2!(f1::XYZ, f2::XYZ,
coord1::XYZ,coord2::XYZ, ϵ::Float64, σ::Float64, box_size)

dx = vector1D(coord1.x, coord2.x, box_size)
dy = vector1D(coord1.y, coord2.y, box_size)
dz = vector1D(coord1.z, coord2.z, box_size)

rij_sq = dx*dx + dy*dy + dz*dz

if  rij_sq > 1.0
return
else
sr2 = σ^2 / rij_sq
rmag = sqrt(rij_sq)
sr6 = sr2 ^ 3
sr12 = sr6 ^ 2
f = ((24 * ϵ) / rij_sq) * (2 * sr12 - sr6)

x = -f * dx
y = -f * dy
z = -f * dz

f1.x += x
f1.y += y
f1.z += z
f2.x -= x
f2.y -= y
f2.z -= z
end
nothing
end

function TotalEnergyStructOfArrays(array::StructOfArrays,vdwTable::Table,
n::Int64,box_size)

@inbounds for i = 1:(n-1)
ti = array.typ[i]
for j = (i+1):n
tj = array.typ[j]
SingleLJ2!(array.f[i], array.f[j],
array.r[i], array.r[i],
vdwTable.ϵij[ti,tj],
vdwTable.σij[ti,tj],
box_size)
end
end
return
end

################################################################################
#                  Some general parameters used by all trials
################################################################################
box_size = (2.0, 2.0, 2.0)
x = 3011 # dummy number of atoms
m = 4    # dummy number of atom types (for table of parameters...)
vdwTable = Table([rand() for i=1:m,j=1:m],
[rand() for i=1:m,j=1:m]) #,

################################################################################
#                   Start of trials
################################################################################

structOfArrays = StructOfArrays([XYZ(box_size * rand(), box_size * rand(), box_size * rand()) for i=1:x],
[XYZ(0.0, 0.0, 0.0) for i=1:x ],
[XYZ(0.0, 0.0, 0.0) for i=1:x ],
[ceil(rand())*m for i=1:x],
[12 for i=1:x],
["hi" for i=1:x],
[rand()*5.1 for i=1:x],
[rand()*1.3 for i=1:x],
["string" for i=1:x])

################################################################################
#
#                           Testing Struct of Arrays
#
################################################################################

using BenchmarkTools
println("------------------------Start of Testing SoA------------------------------")
print("Struct of Arrays: ")
@btime TotalEnergyStructOfArrays(\$structOfArrays,\$vdwTable,\$x,\$box_size)
println("--------------------------End of Testing SoA------------------------------")
``````

This version benchmarks at 64ms (no allocations) on my machine, as compared to your original code which took around 210ms (3 allocations).

Some of the most important points to take into account (some of which were mentioned in previous answers):

• avoid using general `Array`s for fixed-size arrays (such as `box_size`)
• in your case the 3-element arrays in `SingleLJ2!` can even be avoided altogether
• `@inbounds` helps some more

IIRC the first two points above took me all the way down to 80ms. `@inbounds` saves an additional 15ms or so in this case. Hopefully the results are not perturbed, but I could not test them easily (it would have been good practice to have a way to easily and automatically check that).

Profiling shows that the bottleneck is in arithmetic operations in `SingleLJ2!`. I did not find any easy way to reduce the cost of those (but did not think about it for long either)…

4 Likes

It probably would be faster and cleaner with StaticArrays. You can avoid using a mutable struct and instead have a regular (immutable) struct, which should be more efficient. For example, you could replace

``````            array.f[i].x += f1
array.f[i].y += f1
array.f[i].z += f1
array.f[j].x += f2
array.f[j].y += f2
array.f[j].z += f2
``````

with

``````array.f[i] += f1
array.f[j] += f2
``````

once you make the appropriate conversions.

Let’s look at it from this angle: Your inner loop is executed `3011*3010÷2 = 4531555` times. Your struct of arrays version runs in ~157 ms on my machine. That’s `157e6 / 4531555 = 34.6` ns per iteration, or `34.6 * 2.9 = 100.3` clock cycles per iteration on my 2.9 GHz machine.

Your inner loop contains several conditional branches, a bunch of memory accesses, division, square root (which seems unused? so probably optimized away by compiler), and quite a bit of other logic. It seems like a bit of a stretch to me to speed that up by an order of magnitude to reach ~10 clock cycles per iteration. Perhaps if you can get the loop to vectorize (SIMD), but with that `vector1D` I don’t know how feasible that is.

Then again, if you have C and Fortran versions of this code running an order of magnitude faster, it is of course possible. (Is that the case? If so, what’s the performance difference, and could you paste that code here?)

A few quick ideas (struct of arrays version):

• Put `@inbounds` on the outer for loop.
– 157 ms --> 142 ms

• Replace `f2 .= .-f1` with `f2 = -f1; f2 = -f1; f2 = -f1`
– 142 ms --> 114 ms

• Put `@inline` infront of `function SingleLJ2!`
– 114 ms --> 84 ms

Not an order of magnitude, but a factor of 2 speedup, with minimal changes.

To speed it up further, I’d experiment with `StaticArrays`, or deal with tuples instead of vectors, or extract all vector elements to individual variables (there’s a small overhead each time you access a vector). I’d also manually inline the functions and see if the code could be simplified/sped up that way.

This is all just micro-optimization, of course you should also look at the big picture, to see if you need to do all of these 4 million iterations, if you can reuse results, etc.

Forget about multi-threading for now. Your code is not thread-safe. Modifying it to be thread-safe might either make it slower (e.g. due to locking), or make it a lot more complicated, which increases the risk of bugs and makes it harder to spot more effective optimizations. Multi-threading is usually the very last optimization step I do (if I do it at all). If this thing is going to run for days, it might just be easier to start several processes of it to run in parallel, with different start parameters, and achieve multi-threading that way.

Btw, as long as you don’t use global variables, I don’t think there’s a need to wrap everything in a `main` function?

8 Likes

How would that be more efficient? Wouldn’t that allocate a new `XYZ` struct for each addition?

1 Like

Is there a bug in your code btw? Shouldn’t this:

``````SingleLJ2!(f1, f2, array.r[i], array.r[i], ...
``````

be:

``````SingleLJ2!(f1, f2, array.r[i], array.r[j], ...
``````

This change makes it ~20 % slower…

I’d strongly recommend adding a simple test that makes sure the results are correct. It’s so easy otherwise to make mistakes like this and think that the code is suddenly faster, when in reality it no longer does the same thing.

Yep you got me there. The results from this aren’t really testable(everything is generated as random numbers, so no physical system comes of this) hence why I didn’t spot it(edit: for real systems, yes, I do have many checks, a significant one being that it reproduces the required thermodynamic properties on test molecules that are very well documented). Also as I will mention in my next message which I will tag you in, I spent my time on the Array of Structures version, it did not have this bug ^^

@bennedich @ffevotte Thanks for all of your help, the speedup is impressive! And, more importantly, I learned a bunch too As a new user I can only tag two people… I also appreciate the comments from "antoine-levitt, mauro3 and aaowens!

I have a question though for bennedich and ffevotte… you both worked on the `Structure of Arrays` version, is there a reason? I personally much prefer working with `Array of Structures`, but I know people say that SoA is better. However, interestingly, when I add your combined recommendations, the AoS is about 4% faster. Not much, but still… Now, should I be wary of possible future expansions of my code where using AoS may come to bite me?

Thanks

1 Like

I think this works as an example:

``````using StaticArrays, BenchmarkTools

mutable struct XYZ
x::Float64
y::Float64
z::Float64
end
A = [XYZ(rand(), rand(), rand()) for i = 1:100000]
B = deepcopy(A)
As = [SVector{3}(rand(), rand(), rand()) for i = 1:100000]
Bs = deepcopy(As)
function test1(A, B)
for i in eachindex(A)
A[i].x += B[i].x
A[i].y += B[i].y
A[i].z += B[i].z
end
end
function test2(A, B)
for i in eachindex(A)
A[i] += B[i]
end
end
julia> @btime test1(\$A, \$B)
446.745 μs (0 allocations: 0 bytes)

julia> @btime test2(\$As, \$Bs)
210.350 μs (0 allocations: 0 bytes)
``````

A struct won’t be allocated like a traditional array. It temporarily lives on the stack. The vector of SVector, `As` does get allocated, but the memory layout is just 3 floats in a row for each element. In contrast, I’m pretty sure the vector of mutable struct `XYZ` allocates a vector of pointers to separate `XYZ` objects . Possibly the compiler optimizes some of this away, I don’t know.

Also, operations on StaticArrays get SIMD vectorization naturally. If I look at the `@code_llvm` for `test2`, I see SIMD operations, but not in `test1`.

2 Likes

I have been wondering about if using the XYZ struct is the way to go… it makes it hard to vectorize things, but variable.x is easier to read and understand… I will try using static arrays on my test set and see how it goes.

Have you thought about using an algorithm that is not O(n^2)? Say a fast-multipole algorithm or similar?

4 Likes

There are a lot of things I want to try, multiple time steps (RESPA) is one of them as well, especially for the electrostatics… I am not looking forward to being crushed by Ewalds, but first I must master the basics, and then I can move on.

Also in simulation we cheat and make neighborlists so we don’t bother interacting with the particles far far away, but alot of time, is still spent on the ones in range… too much time I believe people have been playing around with fast multipole for lennard jones, I know they use them for electrostatics… I should revisit what their progress has been.

When optimizing or refactoring code like this, I always employ a simple form of Golden Master Testing. Basically, all you’re testing then, is that your working version is equal to the original version. Whether it was correct or not to begin with, who knows, but at least it works the same.

This is usually trivial to set up. Example:

``````Random.seed!(0)          # fix random seed, so you always get the same data

A = rand(100, 100)        # create input data
B = complex_algorithm(A)  # output data returned as a matrix

# Now calculate some hash of the output (problem specific)
hash = sum(sum(B))

# And test against what you got with the original version of your code
expected = 8425.2178752625
hash ≈ expected || println("Expected \$expected, got \$hash")
``````

Of course, a sum is not a perfect hash (and if you have `NaN`s it’s worthless), but I don’t think you need to go overboard in creating a perfect hash. A simple sum will almost always catch refactoring errors (like the one you had), and the above takes a minute or so to implement.

(Note: Just pay attention that if you rearrange floating point operations, floats may differ slightly due to rounding artifacts, so don’t do exact comparisons.)

There are many variations of this. A more robust option is to keep a copy of you original code, and call both that and your new code with the given input, and make sure that the results are equal.

5 Likes

@aaowens

How did you get the `SVector` to add `inplace` in a for loop?

When I use your code, it works on my computer, and the timing I get is the same as yours.

But, if I try to update a SVector in my `SingleLJ2!` function, it fails, I get `ERROR: LoadError: setindex! ::SArray{Tuple{3},Float64,1,3}, value, ::Int) is not defined.`

In this case I am passing a `SVector{3}` to a function (`SingleLJ2!`), and in that function I am trying to add to each of its 3 indices in a for loop, just like in your example. You example works… mine does not :S

I get that SVectors are immutable, but why does your example seem to break this? If I add to my `SVector` a `Vector` of size 3, that works, but I get a ton of allocations and memory usage when I do this inside my function!

Actually, all of the extra allocations are because I have to fill the SVector manually as f1 = [-fdx,-fdy,-f*dz]… i then fill the second SVector as f2 = -f1, and this generates no allocations… So I also am curious about this too… how do I avoid allocations if I fill the SVector with an array…

I think you misread @aaowens’s code, this `A[i] += abc` means "copy `abc` to location `i` of vector `A`". This works because `A` is mutable (even though its elements are not). Thus `A[i] += B[i]` would not work because `A[i]` is a field of a SVector and thus cannot be updated.

To fill an SVector, see Constructing SVector with a loop - #7 by tkf. But for your example just do `SVector(-fdx, -fdy, f*dz)`.

2 Likes

Not all allocations are equal. “Allocations” of `isbitstype` structs are on the stack and since they are created from scratch the compiler knows that nothing holds a reference to them and can optimize based on that. Mutable structs can have other references to them so the compiler must defensive.

For things like 3-dimensional vectors in this case I would recommend always using immutable struct and writing the code in a more functional manner (similar as how you would write it in math / physics).

3 Likes