Hi,
I have data on users and would like to build a bipartite network that connects them through their consumption decisions. Since the data is rather large, I'm looking for an efficient way to construct the network, so any tips in that direction are appreciated. Basically, I (think I) need a simple weighted graph whose edges are weighted by the number of times a user consumed a particular item. In the end I would like to compute node-level measures such as centrality and add them back to the original DataFrame. So my main question is how best to preserve the identifiability of each node in the network. One option would be to use a MetaGraph and attach an ID to each node. Alternatively, if the nodes are added iteratively, I could keep track of a mapping from each ID to its vertex index (ID => i).
I am just getting started with network analysis, so please correct me on anything I am saying. Any help is appreciated. Thanks!
EDIT:
I should add an example of what I am doing so far:
using DataFrames, Dates, Plots, Random
using LightGraphs, SimpleWeightedGraphs, GraphRecipes

Random.seed!(1)

# Toy data: 15 customers with two purchases each
customers = [randstring(3) for _ in 1:15]
df = DataFrame(customer = vcat(customers, customers),
               item = rand(["apple", "bread", "banana"], 30),
               date = Date(2020, 1, 1) .+ Day.(rand(1:100, 30)))

# Edge weight = number of times a customer bought an item
dfg = combine(groupby(df, [:customer, :item]), nrow => :weight)

# Sorted vertex names; a name's position in `verts` is its vertex index
verts = sort(vcat(unique(df.customer), unique(df.item)))

G = SimpleWeightedDiGraph(length(verts))
labels = Dict() # edge labels for the plot
for row in eachrow(dfg)
    s = searchsortedfirst(verts, row.customer)  # source: customer vertex
    d = searchsortedfirst(verts, row.item)      # destination: item vertex
    w = row.weight
    add_edge!(G, SimpleWeightedEdge(s, d, w))
    labels[(s, d)] = w
end

# Centrality comes back in vertex order, so it lines up with `verts`
centr = eigenvector_centrality(G)
dfn = DataFrame(vertex = verts, centrality = centr)

plot(verts, centr,
     seriestype = :scatter,
     legend = false,
     xrotation = 60,
     xticks = :all)

graphplot(G, names = verts, edgelabel = labels, arrow = true)
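
For the second option mentioned in the question (tracking ID => vertex index), here is a minimal sketch of what that could look like. It is only an illustration under a few assumptions: the toy data and the names `cust_idx` and `item_idx` are made up for the example, the graph is undirected, and `degree_centrality` stands in for whichever centrality measure is actually wanted. Building the graph from index vectors in a single constructor call may also be cheaper than calling `add_edge!` in a loop, though that has not been benchmarked here.

```julia
using DataFrames, LightGraphs, SimpleWeightedGraphs

# Hypothetical toy purchase log, for illustration only
df = DataFrame(customer = ["c1", "c1", "c2", "c3", "c3", "c3"],
               item     = ["apple", "apple", "bread", "apple", "banana", "bread"])

# One row per (customer, item) pair; weight = number of purchases
dfg = combine(groupby(df, [:customer, :item]), nrow => :weight)

# Explicit ID => vertex index maps:
# customers occupy 1:nc, items occupy nc+1:nc+ni
customers = unique(df.customer)
items = unique(df.item)
nc = length(customers)
cust_idx = Dict(c => i for (i, c) in enumerate(customers))
item_idx = Dict(it => nc + j for (j, it) in enumerate(items))

# Translate every edge to integer indices and build the graph in one call
src = [cust_idx[c] for c in dfg.customer]
dst = [item_idx[it] for it in dfg.item]
G = SimpleWeightedGraph(src, dst, Float64.(dfg.weight))

# Any centrality that returns one value per vertex can be mapped back
# through the same dictionaries (degree centrality is just a stand-in)
cent = degree_centrality(G)
df.customer_centrality = [cent[cust_idx[c]] for c in df.customer]
df.item_centrality = [cent[item_idx[it]] for it in df.item]
```

Keeping customers and items in separate index ranges also makes it easy to tell the two sides of the bipartite graph apart later: any vertex index `v <= nc` is a customer, anything above is an item.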