Performance issue: Axelrod model of dissemination of culture optimization

Hello! I’m implementing an Agent Based Model (axelrod model of dissemination of culture). Every node has an associated vector of “opinions” (`data`) on different subjects (number of features is `feat`). Each step two nodes are chosen random (I get a random node and then a random neighbor using the `neighs` adjacency list which is a vector of vector where `neighs[j]` is the vector of all neighbors of node `j`.

The nodes are compared and a vector of Bool which is true when two nodes disagree on a subject is called `disagreements` and is created with the `compare!` function. Along with that `feats - count(disagreements)` is returned which is the number of overlapping opinions.

Two nodes interact if they have at least 1 and not all opinions in common with probability equal to the number of overlapping opinions. The interaction consist of setting equal one of the opinions where they differ.

Every `nodes` steps all pair of nodes are tested and the number of active links (that is, the links that have at least 1 but not all opinions in common) is returned.

``````function simulate_system!(
data::Vector{Vector{T}},
feats::Int,
maxtime::Int,
storage::Vector{T},
edges::Vector{Tuple{U,U}},
neighs::Vector{Vector{U}},
nodes::Int
) where {T<:Integer,U<:Integer}
disagreements::BitVector = BitVector(undef, feats)
for t in 1:maxtime
if t % nodes == 0
num_active::Int = get_active(data, feats, edges)
setindex!(storage, num_active, div(t, nodes))
if num_active == 0
println("Absorbing state reached in \$t")
break
end
end
node_1 = rand(1:nodes)
node_2 = rand(neighs[node_1])
op_1 = data[node_1]
op_2 = data[node_2]
overlaps::Int = compare!(disagreements, op_1, op_2, feats)
if overlaps != feats && overlaps != 0
if rand() < overlaps / feats
interact!(disagreements, op_1, op_2)
end
end
end
end
``````
Here I have all the functions used in the code if you want to test it out
``````function get_overlaps(data_1::Vector{T}, data_2::Vector{T}) where {T<:Integer}
overlaps::Int = 0
for (el_1, el_2) in zip(data_1, data_2)
if el_1 == el_2
overlaps += 1
end
end
return overlaps
end

function get_active(
data::Vector{Vector{T}},
feats::Int,
edges::Vector{Tuple{U,U}}
) where {T<:Integer,U<:Integer}
num_active::Int = 0
for (node_1, node_2) in edges
overlaps::Int = get_overlaps(data[node_1], data[node_2])
if overlaps != 0 && overlaps != feats
num_active += 1
end
end
return num_active
end

function compare!(
diffs::Union{BitVector,Vector{Bool}},
data_1::Vector{T},
data_2::Vector{T},
feats::Int
) where {T<:Integer}
@inbounds @simd for idx in 1:feats
diffs[idx] = data_1[idx] != data_2[idx]
end
return feats - count(diffs)
end

function interact!(
diffs::Union{BitVector,Vector{Bool}},
data_1::Vector{T},
data_2::Vector{T}
) where {T<:Integer}
idx_diff::Vector{Int} = findall(diffs)
to_agree::Int = rand(idx_diff)
data_1[to_agree] = data_2[to_agree]
end
``````

Data is initialized as random with a Poisson distribution and a 150x150 grid. These are the parameters I used

``````using Graphs
using Distributions: Poisson

side::UInt32 = 150
graph = grid((side, side), periodic=false)
n_nodes::Int = nv(graph)
edgelist = Vector{Tuple{Int,Int}}(Tuple.(edges(graph)))

features = 10
traits = 100
time_steps = 1_000 * n_nodes

opinions = [rand(Poisson(traits), features) for _ = 1:n_nodes]
storage_edges::Vector{Int} = zeros(Int, time_steps ÷ n_nodes)

simulate_system!(
opinions,
features,
time_steps,
storage_edges,
edgelist,
n_nodes
)
``````

While trying to profile using VS cdoe profiler I get the following:

1. 35% in: `overlaps::Int = compare!(disagreements, op_1, op_2, feats)`
2. 18% in: `num_active::Int = get_active(data, feats, edges)`
3. 16% in: `node_2 = rand(neighs[node_1])`
4. 14% in: `interact!(disagreements, op_1, op_2)`
5. 8% in: `op_2 = data[node_2]`

The first four seem fine and work as expected, most of the pairs will have none or all opinions in common so I expect the compare! and the choice of the nodes to be the most impactful functions.

I’m more interested in the 8% due to the second call `data[node_2]` this is weird also because the `op_1 = data[node_1]` cannot even be measured by the profiler. Why is calling a view of a vector the second time slower?

Also if you see anything that could be optimized (especially in the functions) I would be extremely happy because the computation times are very high. Thanks in advance!

This seems like it might be an artefact of sample-based profiling. Does this behavior persist if you re-run the profiler a second time? How about when you profile 10 runs of the function?

1 Like

I’ve tried again putting the simulation in a for loop this way

``````@profview for _ in 1:10 simulate_system!(
opinions,
features,
time_steps,
storage_edges,
edgelist,
n_nodes
) end
``````

But I still get the same result with a varying percentage from 8% to 10%.
Same result also when rerunning

Can you post a screenshot of the flame graph? My second hypothesis would be that `node_1` is type-inferred correctly but not `node_2`, which would lead to runtime dispatch during indexing (a red tile).

1 Like

the fifth from the left is the getindex

Ok that’s not it, and I’m out of hypotheses without taking a closer look at your code ^^

1 Like

Thanks in advance for anybody checking out!

A copy and paste of my code
``````using Graphs
using Distributions: Poisson

side = 150
graph = grid((side, side), periodic=false)

n_nodes::Int = nv(graph)
edgelist = Vector{Tuple{Int,Int}}(Tuple.(edges(graph)))

## imports and Graphs ##

function get_overlaps(data_1::Vector{T}, data_2::Vector{T}) where {T<:Integer}
overlaps::Int = 0
for (el_1, el_2) in zip(data_1, data_2)
if el_1 == el_2
overlaps += 1
end
end
return overlaps
end

function get_active(
data::Vector{Vector{T}},
feats::Int,
edges::Vector{Tuple{U,U}}
) where {T<:Integer,U<:Integer}
num_active::Int = 0
for (node_1, node_2) in edges
overlaps::Int = get_overlaps(data[node_1], data[node_2])
if overlaps != 0 && overlaps != feats
num_active += 1
end
end
return num_active
end

function compare!(
diffs::Union{BitVector,Vector{Bool}},
data_1::Vector{T},
data_2::Vector{T},
feats::Int
) where {T<:Integer}
@inbounds @simd for idx in 1:feats
diffs[idx] = data_1[idx] != data_2[idx]
end
return feats - count(diffs)
end

function interact!(
diffs::Union{BitVector,Vector{Bool}},
data_1::Vector{T},
data_2::Vector{T}
) where {T<:Integer}
idx_diff::Vector{Int} = findall(diffs)
to_agree::Int = rand(idx_diff)
data_1[to_agree] = data_2[to_agree]
end

function simulate_system!(
data::Vector{Vector{T}},
feats::Int,
maxtime::Int,
storage::Vector{T},
edges::Vector{Tuple{U,U}},
neighs::Vector{Vector{U}},
nodes::Int
) where {T<:Integer,U<:Integer}
disagreements::BitVector = BitVector(undef, feats)
for t in 1:maxtime
if t % nodes == 0
num_active = get_active(data, feats, edges)
storage[div(t, nodes)] = num_active
if num_active == 0
println("Absorbing state reached in \$t")
break
end
end
node_1 = rand(1:nodes)
node_2 = rand(neighs[node_1])
op_1 = data[node_1]
op_2 = data[node_2]
overlaps = compare!(disagreements, op_1, op_2, feats)
if overlaps != feats && overlaps != 0
if rand() < overlaps / feats
interact!(disagreements, op_1, op_2)
end
end
end
end

## functions ##

features = 10
traits = 100
time_steps = 1_000 * n_nodes

opinions::Vector{Vector{Int}} =
[rand(Poisson(traits), features) for _ = 1:n_nodes]
storage_edges::Vector{Int} = zeros(Int, time_steps ÷ n_nodes);

## initialization ##

@profview simulate_system!(
opinions,
features,
time_steps,
storage_edges,
edgelist,