I want to create an adjacency matrix from the first two columns of a DataFrame, with the weights taken from the third column.
For reference, say the DataFrame looks like:
index  Name_A  Name_B
0      Adam    Ben
1      Chris   David
2      Adam    Chris
3      Ben     Chris
and I'd like to obtain the adjacency matrix
       Adam  Ben  Chris  David
Adam   0     1    1      0
Ben    0     0    1      0
Chris  0     0    0      1
David  0     0    0      0
Here the weights are random.
It can be done easily in Python, but is there a better way to do it in Julia?
nA=["adam", "chris", "adam", "ben"] #(=df.name_A)
nB=["ben", "david", "chris", "chris"] #(=df.name_B)
n=sort(union(nA,nB))
[e in nA.*nB for e in n.*reshape(n,1,:)]
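The comprehension above tests membership on concatenated names, which could in principle confuse two different pairs whose concatenations coincide (for example "ab"*"c" and "a"*"bc"). A minimal sketch of the same idea using tuples instead of string concatenation follows; the variable name `edges` is an assumption introduced here, not something from the thread.

```julia
nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]

n = sort(union(nA, nB))             # sorted unique node labels
edges = Set(zip(nA, nB))            # set of (source, target) pairs (hypothetical name)

# Bool matrix: entry (i, j) is true when the pair (n[i], n[j]) is an edge
ad = [(a, b) in edges for a in n, b in n]
```

The result is a `Matrix{Bool}` with rows and columns ordered like `n`.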
n=sort(union(nA,nB))
xnA=indexin(nA,n)
xnB=indexin(nB,n)
using SparseArrays
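# one value per edge (length(nA)), and an explicit size so the matrix is square
ad=sparse(xnA,xnB,fill(1,length(nA)),length(n),length(n))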
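A self-contained sketch of the sparse approach, under two assumptions that are not from the thread: the indices returned by `indexin` are converted to plain `Int`s, and the matrix size is passed explicitly. Without the size, `sparse` infers the dimensions from the largest row and column index, so here the result would be 3×4 because "david" never appears in the first column.

```julia
using SparseArrays

nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]

n = sort(union(nA, nB))              # sorted unique node labels
rows = Int.(indexin(nA, n))          # indexin yields Union{Nothing,Int}; convert to Int
cols = Int.(indexin(nB, n))
sz = length(n)

# one entry of 1 per edge, in a square sz×sz matrix
ad = sparse(rows, cols, ones(Int, length(rows)), sz, sz)
```

If a vector of weights with one entry per edge is available, it could be passed in place of the vector of ones; by default `sparse` sums the values of repeated (row, column) pairs.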
n=sort(union(nA,nB))
xnA=indexin(nA,n)
xnB=indexin(nB,n)
sz=length(n)
ad=fill(0, sz,sz)
ad[CartesianIndex.(zip(xnA,xnB))].=1
Thanks for the reply, but let's say I have weights in another column.
How can I add the weights?
Could you elaborate on this, maybe by showing an example?
Is this perhaps what you mean?
julia> df=DataFrame(;index,nA,nB,w)
4×4 DataFrame
 Row │ index  nA      nB      w
     │ Int64  String  String  Float64
─────┼──────────────────────────────────
   1 │ 0      adam    ben     0.137223
   2 │ 1      chris   david   0.0460983
   3 │ 2      adam    chris   0.967263
   4 │ 3      ben     chris   0.42919
julia> n=sort(union(nA,nB))
4-element Vector{String}:
"adam"
"ben"
"chris"
"david"
julia> sz=length(n)
4
julia> ad=fill(0., sz,sz)
4×4 Matrix{Float64}:
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
julia> ad[CartesianIndex.(zip(xnA,xnB))]=w
4-element Vector{Float64}:
0.13722275200211942
0.04609834073111019
0.9672631957847251
0.4291902000590766
julia> ad
4×4 Matrix{Float64}:
 0.0  0.137223  0.967263  0.0
 0.0  0.0       0.42919   0.0
 0.0  0.0       0.0       0.0460983
 0.0  0.0       0.0       0.0
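The weighted construction above can also be wrapped in a small function. This is only a sketch: the function name `adjacency_matrix`, the `Dict` lookup, and the example weights are assumptions made for illustration, and it accumulates the weights of repeated edges rather than overwriting them.

```julia
# Hypothetical helper (not from the thread): dense weighted adjacency matrix
# built from two label vectors and a weight vector of the same length.
function adjacency_matrix(src, dst, w)
    labels = sort(union(src, dst))                        # sorted unique node labels
    pos = Dict(l => i for (i, l) in enumerate(labels))    # label => row/column index
    ad = zeros(eltype(w), length(labels), length(labels))
    for (a, b, x) in zip(src, dst, w)
        ad[pos[a], pos[b]] += x                           # repeated edges accumulate
    end
    return labels, ad
end

nA = ["adam", "chris", "adam", "ben"]
nB = ["ben", "david", "chris", "chris"]
w  = [0.5, 1.5, 2.0, 3.0]    # made-up weights, for illustration only

labels, ad = adjacency_matrix(nA, nB, w)
```

With a DataFrame like the one above, the call would presumably be `adjacency_matrix(df.nA, df.nB, df.w)`.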
using DataFrames
index=0:3
nA=["adam", "chris", "adam", "ben"]
nB=["ben", "david", "chris", "chris"]
w=rand(4)
df=DataFrame(;index,nA,nB,w)
dfe=vcat(df,DataFrame(;nA="david"),DataFrame(;nB="adam"), cols=:union)
udf=unstack(dfe, :nA,:nB,:w, allowmissing=true, fill=0)
sort!(udf,:nA)
select(udf, ["nA"; sort(names(udf)[2:end])])[1:end-1,1:end-1]
4×5 DataFrame
 Row │ nA       adam      ben       chris     david
     │ String?  Float64?  Float64?  Float64?  Float64?
─────┼─────────────────────────────────────────────────
   1 │ adam     0.0       0.354848  0.90648   0.0
   2 │ ben      0.0       0.0       0.118272  0.0
   3 │ chris    0.0       0.0       0.0       0.708691
   4 │ david    0.0       0.0       0.0       0.0