Pagerank for weighted graphs in julia

Hello everyone, I need to use Graphs.jl for a project which requires me to mainly use the pagerank function on an undirected weighted graph. I found SimpleWeightedGraphs.jl provides this functionality. It implements the pagerank function and the docs clearly say that it applies the page rank algorithm on a weighted graph. However, in practice, when I calculated the pagerank values for a weighted graph using this function, the output does not seem to depend on the input at all. I drastically changed the weights of an edge and still the output remains exactly the same.

My code:

julia> using Graphs, SimpleWeightedGraphs

julia> gio = open("jlswg.swg", "r")
IOStream(<file jlswg.swg>)

julia> g = loadgraph(gio, "mygraph", SWGFormat())
{19, 18} undirected simple Int64 graph with Float64 weights

julia> pagerank(g)
19-element Vector{Float64}:
 0.1092146212861009
 0.10356432533049537
 0.10178023275386266
 0.10206193095278274
 0.07852598368760652
 0.10973951356004562
 0.030143177366541946
 0.031102727252357323
 0.031102727252357323
 0.031102727252357323
 0.029902352263017743
 0.029902352263017743
 0.029522712438970138
 0.029522712438970138
 0.02958336377901657
 0.02958336377901657
 0.031215058781161084
 0.031215058781161084
 0.031215058781161084

jlswg.swg looks like this:

LightGraphs.SimpleWeightedGraph,19,18,u,mygraph,1,Int64,Float64,simpleweightedgraph
1,2,1.0
2,3,33.0
3,4,2.0
4,5,5.0
5,6,99.0
5,7,5.0
1,8,2.0
1,9,2.0
1,10,4.0
2,11,12.0
2,12,34.0
3,13,999.0
3,14,8.0
4,15,5.0
4,16,34.0
6,17,2.0
6,18,3.0
6,19,5.0

I tried changing many different values by large amounts, but still the output for pagerank(g) remains the same. What might the issue here? Is it something to do with SimpleWeightedGraphs.jl or my implementation?

Hey @vmpyr, welcome!
I fixed the pagerank function a few months ago, and I added a test that is based on a comparison with networkx, so I’m like 43% confident it is accurate ^^ Is there any way you could test it with the python library to compare results?

2 Likes

Sure, I’ll do that.

Hey @gdalle, I tried pagerank with networkx, for the same file as shown above, and these are the results, you can clearly see the greater rank score of node 3 and node 13 since they are part of the most “weighted” edge by far:

{
 '1': 0.07999848984931547, 
 '2': 0.05603006628782006, 
 '3': 0.1442076878706111, 
 '4': 0.06915320750734422,
 '5': 0.13443522946519423,
 '6': 0.14117357854542822,
 '7': 0.013136636002009195,
 '8': 0.023005526742333755,
 '9': 0.023005526742333755,
 '10': 0.03811631664256225,
 '11': 0.015038603206693566,
 '12': 0.02813569154177212,
 '13': 0.12541081917264824,
 '14': 0.008835806570477979,
 '15': 0.014283774810718273,
 '16': 0.05134019502867372,
 '17': 0.010096463539654842,
 '18': 0.011197326888429629,
 '19': 0.013399053585979207
}

Ok then that’s clearly on us. Can you open an issue on the repo linking this discourse thread? I’ll try to take a look

Update: I just did it `pagerank` is wrong · Issue #47 · JuliaGraphs/SimpleWeightedGraphs.jl · GitHub

1 Like

Thanks for this! I’m sorry I didn’t see this earlier, or I would have opened it.

@gdalle, I did some digging on my own, and I think I know the reason for this error. When I run:

julia> @which pagerank(g)
pagerank(g::AbstractGraph{U}) where U<:Integer
     @ Graphs ~/.julia/packages/Graphs/7SMZs/src/centrality/pagerank.jl:15

So, the function inside of src/overrides.jl in SimpleWeightedGraphs isn’t being used.

Oh I think I know: for some reason SimpleWeightedGraphs.pagerank is only implemented for weighted digraphs. And your graph is undirected, which explains the fallback on the unweighted Graphs.pagerank.
This must be a historical artifact: @mbesancon @simonschoelly @etienne_dg any idea why this was done that way?

Btw I didn’t even pay attention but judging by your code sample the file storage to be from LightGraphs.jl, which is deprecated. Don’t forget to update it to Graphs.jl.

1 Like

And your graph is undirected, which explains the fallback on the unweighted Graphs.pagerank

Oh okay.

Don’t forget to update it to Graphs.jl.

I thought about that, and changing it worked fine as well. However, I should point out that this file was generated by the savegraph() function when I saved it after initially loading the graph with the add_edge!() function, hence, I did not bother to change it. We should change that behaviour in the source code as well. Also, the documentation for SWGFormat still mentions LightGraphs.SimpleWeightedGraph as the file header. That should be changed as well. I’ll open a PR when I get time.

1 Like

Until we fix the undirected case, I think turning your graph into a directed one by adding 2 directed edges for each undirected one should do the trick

2 Likes

Pagerank was devised for digraphs, because it associates the importance of a node i, based on how important are the nodes that have links to i (i.e. they recommend the node i). I may like the answers given here by expert julians, but it doesn’t mean that I’m skilled and have experience in Julia just because I put a link to their representative nodes. This could be the reason why there is no Julia implementation for weighted undirected graphs.
Even for weighted directed graphs the Pagerank algorithm depends on what that graph is modeling.
As example of weighted directed graphs, there exists an old paper published 5 years before the Page&Brin paper, related to ranking footbal teams:
J. P. Keener, The Perron-Frobenius Theorem and the Ranking of Football Teams, SIAM Review, (1993) 35 80-93.
or the Eigenfactor algorithm [C. T. Bergstrom, Eigenfactor: Measuring the value and prestige of scholarly journals, College & Research Libraries News (2007), 5, 68] that estimates the relative influence of an academic journal, based on citations in other journals in the same field, and more other papers.
Each one has a specific setting of weights matrix, a particular teleportation distribution (not necessarily the uniform distribution), and the value of the parameter \alpha is different of 0.85.

That is why even in the case of digraphs the weighted Pagerank algorithm cannot be a general one, but it has to be adapted to the studied graph.

2 Likes

@empet, I beg to differ. A weighted undirected graphs’s pageranks can also be very useful. As for my case, I want to obtain a pagerank vector of companies(nodes) where the edges are volume of transaction, doesn’t mater who payer or payee, so a weighted undirected graphs fits my needs best. And a pagerank algorithm can be generalised to some extent, even for weighted graphs, we can always ask for the specific values of alpha and tolerance as input to function.

Yes, you may implement a pagerank for weighted undirected graphs, and adapt it to your specific data, but what information do you extract from such a ranking in which you involve both inlinks and outlinks?

In this case, a general idea of the money flowing in the company, it doesn’t matter if it is coming in or going out.

I know what you are saying, but this is not an answer to my question: how do you interpret the r-value associated to a node, where r is the coordinate of the pagerank vector corresponding to that node?
I’m not aware on any result concerning the pagerank of weighted undirected graphs. For simple undirected graphs long time it was believed that the pagerank is proportional to the degree distribution. Here [1205.1960] A Note on the PageRank of Undirected Graphs it is proved that in this case the pagerank is just approximated at some extent by this distribution.

1 Like

Ohkay, I now understand what you were trying to convey. Well, I am currently in the process of creating an algorithm for such a task and exploring as many libraries as possible to get an idea. Julia would be best for implementation because I need to create a small-sized binary of the code finally (I can do that in Python, but the file size is huge). I found this anomaly in the SimpleWeightedGraphs library and hence reported it. Thanks for sharing the paper, btw!

So what would be the best solution? Make the implementation in SWG work for directed and undirected graphs alike? Perhaps add a warning for undirected graphs with a link to the paper?

As I pointed out above the procedure to get from the weight matrix,
a stochastic matrix, that ensures the convergence to an equilibrium distribution (pagerank), differs from a graph to another.
To be more convincing I give a few details in the case of Eigenfactor score and the pagerank for football teams:
For the Eigenfactor score the weight matrix, C_{ij} i, j = \overline{1,m}, is the citation matrix, i.e. C_{ij} represents the citations from journal i, published in the year y, of the papers published in journal j, during the target period (the last 5 years, y-5, y-4, y-3, y-2, y-1).
The self-citations are omitted, i.e C_{ii}=0. Each nonezero line of the citation matrix is normalized:
p_{ij}=\displaystyle\frac{C_{ij}}{\sum_{k=1}^m C_{ik}}. Denote by P=(p_{ij}) the corresponding matrix.
If a line i in the matrix P is the null vector, i.e. the journal i corresponds to a dangling page in the WWW network settings, then p[i, :] is replaced by a special probabilistic vector, pv, which is the normalization of the vector {\bf nr}=[nr_1, nr_2, \ldots, nr_m], where nr_i represents the number of papers published by the i^{th} journal in the target period.
(for details see the paper that discusses the Eigenfactor). Let P' the resulted matrix.

Finally the stochatic matrix involved in computing the eigenfactor score is:
E=\alpha\left[\begin{array} p'_{11}&p'_{12}&\ldots&p'_{1m}\\ p'_{21}&p'_{22}&\ldots&p'_{2m}\\ \vdots&\vdots&\ldots&\vdots\\ p'_{m1}&p'_{m2}&\ldots&p'_{mm}\end{array}\right]+(1-\alpha)\left[\begin{array}{cccc} pv_1&pv_2&\ldots&pv_m\\pv_1&pv_2&\ldots&pv_m\\ \vdots&\vdots&\ldots&\vdots\\ pv_1&pv_2&\ldots&pv_m\end{array}\right]
where \alpha\in (0,1) depends on the scientific area (physics, social sciences, etc) of the monitored journals. hence in this case the teleporting distribution is not the uniform one.

On the other hand if we want to rank the football teams T_1, T_2, \ldots, T_n, then we look for the following information:

  • how many matches each pair (T_i, T_j), i\neq j, of teams played. We denote this number by g_{ij}. Obviously, g_{ij}=g_{ji}.
  • \ell_{ij} is the number of matches lost by team T_i from those g_{ij} played with team T_j. \ell_{ij}\neq\ell{ji}.
    We associate the weights matrix of elements W_{ij}=\ell_{ij}/g_{ij}.
    In other words the associated graph has a link from T_i to T_j if T_i lost at least one match in the games played with T_j.
    If a team T_k did not lose any match when it played with the rest of the teams, then on line k all elements are 0. Team T_k is in this case equivalent to a dangling page.
    If a team T_i has never played against the team T_j, then in the position (i,j) and (j,i) of the matrix we have 0.

We normalize the weigths matrix, dividing the elements of each non-zero line, k, by the sum of the elements on that line and replacing a zero line with 1.0/n elements as in the case of dangling pages. In other words, “we force” the team T_k to have lost a percentage of 100/n\% matches with each team.
After these operations, the obtained matrix is a stochastic matrix. Here the teleporting distribution is taken the uniform distribution.
For football teams, it was found that the damping factor \alpha=0.95 leads to a pagerank of the teams close to the index from FIFA for ranking.
These two example, and many more existent on WWW, illustrate
that an implementation of the pagerank-style algorithm for directed or undirected weighted graphs cannot be done, by giving as input the weight matrix. An user should process his data, and decide how the particular elements and dangling lines in that matrix must be defined and which teleporting distribution and damping factor is the most suitable for the his data/graph. The Julia implementation should get as input the stochatic matrix P= \alpha S + (1-\alpha) TD that ensures the convergence to a probability distribution (i.e. the pagerank).

1 Like