Creating an adjacency matrix from a DataFrame

I want to create an adjacency matrix from the first two columns of a DataFrame, with the weights taken from the third column.
For reference, say the DataFrame looks like:

index  Name_A  Name_B
  0    Adam    Ben
  1    Chris   David
  2    Adam    Chris
  3    Ben     Chris

and I’d like to obtain the adjacency matrix

      Adam Ben Chris David
Adam   0    1    1     0
Ben    0    0    1     0
Chris  0    0    0     1
David  0    0    0     0

Here the weights are random.
It can be done easily in Python, but is there a better way to do it in Julia?

nA=["adam", "chris", "adam", "ben"]  #(=df.name_A)
nB=["ben", "david", "chris", "chris"]   #(=df.name_B)

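# sorted list of all distinct names; these label the rows and columns
n=sort(union(nA,nB))
# entry (i,j) is true when the concatenated pair n[i]*n[j] matches some edge nA[k]*nB[k]
[e in nA.*nB for e in n.*reshape(n,1,:)]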
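One caveat with the comprehension above: matching on concatenated strings can conflate distinct pairs (for example, "ab"*"c" and "a"*"bc" are the same string). A minimal sketch of the same idea using tuples instead; the variable names `edges` and `A` are introduced here for illustration only:

```julia
nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]

n = sort(union(nA, nB))

# set of (source, target) tuples, one per row of the data
edges = Set(zip(nA, nB))

# rows follow `a`, columns follow `b`; an entry is true when (a, b) is an edge
A = [(a, b) in edges for a in n, b in n]
```

The result is a `Matrix{Bool}`; `Int.(A)` would turn it into a 0/1 integer matrix if that is preferred.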

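# alternative: build a sparse matrix from integer indices
using SparseArrays
n=sort(union(nA,nB))
xnA=indexin(nA,n)   # row index of each source name
xnB=indexin(nB,n)   # column index of each target name
# one value per edge; pass the size explicitly so the matrix is square
ad=sparse(xnA,xnB,fill(1,length(xnA)),length(n),length(n))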

n=sort(union(nA,nB))
xnA=indexin(nA,n)
xnB=indexin(nB,n)
sz=length(n)
ad=fill(0, sz,sz)
ad[CartesianIndex.(zip(xnA,xnB))].=1

Thanks for the reply, but let's say I have weights in another column.
How can I add the weights?

I would use SimpleWeightedGraphs.jl (GitHub: JuliaGraphs/SimpleWeightedGraphs.jl, edge-weighted graphs compatible with Graphs.jl) for this.


Could you elaborate on this, maybe by showing an example?
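A minimal sketch of what the SimpleWeightedGraphs.jl route might look like. It assumes the `SimpleWeightedDiGraph(sources, destinations, weights)` constructor and the `weights` function behave as described in the package README; the weight values and the names `nodes`, `idx`, `src`, `dst` are made up for illustration:

```julia
using Graphs, SimpleWeightedGraphs

nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]
w  = [0.5, 1.5, 2.0, 0.25]   # illustrative weights, not from the thread

# map each name to an integer vertex id (vertices are 1-based integers)
nodes = sort(union(nA, nB))
idx = Dict(name => i for (i, name) in enumerate(nodes))
src = [idx[a] for a in nA]
dst = [idx[b] for b in nB]

# directed weighted graph with one edge per row of the data
g = SimpleWeightedDiGraph(src, dst, w)

# weighted adjacency matrix as a dense array: A[i, j] is the weight of edge i -> j
A = Matrix(weights(g))
```

The graph object is mainly worthwhile if graph algorithms (shortest paths, connected components, and so on) are needed afterwards; for the matrix alone, plain arrays are enough.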

Something like this, perhaps?

julia> df=DataFrame(;index,nA,nB,w)
4×4 DataFrame
 Row │ index  nA      nB      w
     │ Int64  String  String  Float64
─────┼──────────────────────────────────
   1 │     0  adam    ben     0.137223
   2 │     1  chris   david   0.0460983
   3 │     2  adam    chris   0.967263
   4 │     3  ben     chris   0.42919

julia> n=sort(union(nA,nB))
4-element Vector{String}:
 "adam"
 "ben"
 "chris"
 "david"

julia> sz=length(n)
4

julia> ad=fill(0., sz,sz)
4×4 Matrix{Float64}:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

julia> ad[CartesianIndex.(zip(xnA,xnB))]=w
4-element Vector{Float64}:
 0.13722275200211942
 0.04609834073111019
 0.9672631957847251
 0.4291902000590766

julia> ad
4×4 Matrix{Float64}:
 0.0  0.137223  0.967263  0.0
 0.0  0.0       0.42919   0.0
 0.0  0.0       0.0       0.0460983
 0.0  0.0       0.0       0.0
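The same weighted matrix can also be built in sparse form, which may be preferable when there are many names and few edges. A sketch using the standard-library SparseArrays; the weights are made-up values, and `Int.(...)` is only there to turn the `Union{Nothing,Int}` result of `indexin` into plain integers (safe here because every name occurs in `n`):

```julia
using SparseArrays

nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]
w  = [0.5, 1.5, 2.0, 0.25]   # illustrative weights, not from the thread

n  = sort(union(nA, nB))
sz = length(n)

# integer row/column positions of each edge's endpoints
xnA = Int.(indexin(nA, n))
xnB = Int.(indexin(nB, n))

# sz×sz sparse matrix with w[k] stored at (xnA[k], xnB[k])
ad = sparse(xnA, xnB, w, sz, sz)
```

One difference to keep in mind: if the same (nA, nB) pair appears more than once, `sparse` adds the weights together by default, whereas the dense indexed assignment keeps only one of them.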

using DataFrames
index=0:3
nA=["adam", "chris", "adam", "ben"]
nB=["ben", "david", "chris", "chris"]
w=rand(4)
df=DataFrame(;index,nA,nB,w)

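# add dummy rows so that "david" also appears as a row key and "adam" as a column key
dfe=vcat(df,DataFrame(;nA="david"),DataFrame(;nB="adam"), cols=:union)

# pivot: one row per nA, one column per nB, cells hold w (0 where there is no edge)
udf=unstack(dfe, :nA,:nB,:w, allowmissing=true, fill=0)

sort!(udf,:nA)

# order the columns by name, then drop the last row and column,
# which here are the `missing` ones created by the dummy rows
select(udf, ["nA"; sort(names(udf)[2:end])])[1:end-1,1:end-1]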

4×5 DataFrame
 Row │ nA       adam      ben       chris     david    
     │ String?  Float64?  Float64?  Float64?  Float64?
─────┼─────────────────────────────────────────────────
   1 │ adam          0.0  0.354848  0.90648   0.0
   2 │ ben           0.0  0.0       0.118272  0.0
   3 │ chris         0.0  0.0       0.0       0.708691
   4 │ david         0.0  0.0       0.0       0.0