Performance of array of arrays vs. dictionaries when accessing and manipulating data structures

Christian_Brunbjerg · March 14, 2022, 11:41am

I here have two different structs that represent the same data. The main objective is to achieve as fast as possible performance.

#Using array of arrays representation
struct Route_array
    Station::Vector{Int64}
    Tank::Vector{Vector{Int64}}
    Amount::Vector{Vector{Float64}}
end

#Using dictionaries to represent the data
struct Route_dict
    Station::Vector{Int64}
    Tank::Dict{Int64,Vector{Int64}} 
    Amount::Dict{Tuple{Int64,Int64},Float64} 
end

Which of these formulations would be optimal if you have to access and update the values of structs more than say ~10^10 times? The dictionary version would be the preferable one since it would make the following code easier to write, but I am concerned about losing performance.

EDIT: The data structures comes from the an optimization problem where the goal is to find the optimal way to deliver fuel. The route structs that you see is the foundation of a solution to this problem. (The structs here have been significantly reduced, to better understand the difference between the two formulations. In the original struct there are 8 members). A complete solution to my instance of the problem will contain around 3000 of these route structs. These routes are then nested inside another structs (Called Solution) which together with the routes also contain the objective value (given in km) and the current level of fuel in all tanks and all stations.

From here the goal is to improve the solution by, for example, creating or deleting route structs. It could also be by making smaller changes to an existence route struct, this could be done by changing the order in which the stations and tanks are visited or how much fuel (amount) are put in each of these. Again, there are 8 structs members in the original struct, so what you see have been significantly reduced. But essentially, I access the route struct many, many times in which a function adds something to the struct members or change some of the values in the struct. Calculating these changes can involve some operations the scope of which depends to how big the changes are. For example, when creating a new route struct I need to do “a lot” of calculations (Determine amount to each tank, determine if there is an vehicle available, making sure that constraints are not violated, etc…).

Sukera · March 14, 2022, 12:10pm

Naively (without knowing how these structs are used, which is crucial), the former should be faster because there’s no hashing or rehashing involved compared to a Dict. Practically, it depends on how you use them - if the access is even somewhat rare, any real computation done on them will matter more than the time difference in accessing these.

EDIT: Though a slight performance upgrade due to caching effects may be achieved if you have a Matrix instead of a Vector{Vector{Int}} - the former is stored in contiguous memory, while the latter is disjointed and scattered, leading to more cache misses.

rdeits · March 14, 2022, 1:00pm

If you have a contiguous array of anything, I can’t think of any reason to ever prefer using a Dict{Int, X} over a Vector{X}. The Dict layout is more complicated, so lookup and modification will be slower.

If you don’t have a contiguous array, then a Dict might make sense, although there are other choices to consider like a sparse vector.

This seems like the most relevant part. A Vector seems like the obvious answer to the question you’ve posed, but I bet the actual answer will depend a lot on what you mean here.

goerch · March 14, 2022, 8:52pm

Not without reason someone called his book data structures and algorithms = programs. Maybe you should show us something more about what you trying to achieve with your data structures…

Henrique_Becker · March 14, 2022, 9:29pm

However, the two are not equivalent, a Vector{Vector{Int}} may have each inner row/column of different length, the Matrix solution needs to support either Nothing/Missing to do the same and may waste a lot of memory.

goerch · March 14, 2022, 9:36pm

Yeah, sparsity maybe a key element here (and even then there a probably alternate solutions).

Edit: sorry for inappropriate comment.

Topic		Replies	Views
Performance issues when working with dict Performance dictionary	11	1695	November 16, 2022
When to use a Dictionary versus a Struct in Julia? General Usage	3	4529	May 6, 2020
What is the fastest way of making a vector of dictionaries? General Usage	11	330	April 1, 2024
What is faster: sparse vector or a dictionary? General Usage	14	670	April 1, 2024
[ANN] DictArrays: dictionary-based arrays, or wide tables with the Julian interface Package Announcements	0	355	March 20, 2023

Performance of array of arrays vs. dictionaries when accessing and manipulating data structures

Related topics