MetaGraph node indexing w/metadata?

question

#1

I’m trying to iteratively generate an interaction graph given a table of data, and will often come up on a situation where I need to check an interaction (edge weight) between two nodes is less than some new value, and if so, update it to the new value. In the table, the nodes will have unique names. So for example, given the table:

sp me coef
sp1 me1 1.0
sp2 me1 1.2
sp1 me2 0.3
sp1 me1 1.1

On the first row, I want to add nodes sp1 and me1 with an edge of 1.0. On row 2, I’d like to add node sp2 and link it to me1 with an edge of 1.2. Row 3 adds me2, and row 4 updates the edge of sp1:me1, since 1.1 > 1.0.

One solution so this is to store a separate dictionary that maps the node name to the numerical node index, but I’m wondering if there’s a native and efficient way to do this by giving the nodes something like a :name attribute that could be used for indexing.


#2

It is unclear what (if any) library/data structures you are using. Please post a self-contained minimal working example. That said, a dictionary will give you the fastest lookup if you have more than a few keys, but some wrapper structure could make the whole lookup syntax seamless. But to give more help, we need to see some code.


#3

My bad - I’m trying to use MetaGraphs.jl.

The above would be:

using LightGraphs
using MetaGraphs

G = MetaGraph(4)

# In a loop, the first row would result in:
set_prop!(G, 1, :name, "sp1")
set_prop!(G, 2, :name, "me1")
set_prop!(G, 1, 2, :weight, 1.0)

# second row
set_prop!(G, 3, :name, "sp2")
set_prop!(G, 3, 2, :weight, 1.2)

# etc

The question is whether I can efficiently find the indices (3 and 2 in that last line) using the names (sp2, me1). The best I’ve thought of is to set up a separate dictionary that stores name::String=>indx::Int.


#4

It should be possible to do this using filter_vertices:

filter_vertices(G, :name, "sp2") # select all the nodes with name "sp2"

filter_vertices(G, :name, "me1") # select all the nodes with name "me1"

You can also pass a function in that selects the nodes.

See https://github.com/JuliaGraphs/MetaGraphs.jl/pull/2#issuecomment-316547052 for a quick example and https://juliagraphs.github.io/MetaGraphs.jl/latest/metagraphs.html#MetaGraphs.filter_vertices-Tuple{MetaGraphs.AbstractMetaGraph,Function} for the docs.


#5

Thanks @sbromberger - I have no memory of commenting on that issue! I saw the filtering in the docs, but was wondering if there’s an indexing method that’s as fast as a dictionary lookup would be. I hadn’t tested it - I just assumed the filtering was slower. Sure enough:

using LightGraphs
using MetaGraphs
using BenchmarkTools

function set_up()
    G = MetaGraph(1000)
    sp = Dict("sp$i" => i for i in 1:500)
    me = Dict("me$i" => i+500 for i in 1:500)

    for i in 1:1000
        if i <= 500
            set_prop!(G, i, :name, "sp$i")
        else
            set_prop!(G, i, :name, "me$(i-500)")
        end
    end

    x = rand(1:500, 10000)
    y = rand(501:1000, 10000)
    w = rand(10000)

    return (G, x, y, w)
end

function name_lookup(mgraph::MetaGraph, name::String)
    return collect(filter_vertices(mgraph, :name, name))[1]
end

function dict_assign!(mgraph::MetaGraph, spindex::Vector{Int}, meindex::Vector{Int}, weights::Vector{Float64})
    for i in 1:10000
        spi = sp["sp$(spindex[i])"]
        mei = me["me$(meindex[i]-500)"]
        weight = weights[i]
        (has_prop(mgraph, Edge(spi, mei), :weight) && weight <= get_prop(mgraph, Edge(spi, mei), :weight)) && continue    

        set_prop!(mgraph, Edge(spi, mei), :weight, weight)
        
    end
end

function filter_assign!(mgraph::MetaGraph, spindex::Vector{Int}, meindex::Vector{Int}, weights::Vector{Float64})
    for i in 1:10000
        spi = name_lookup(mgraph, "sp$(spindex[i])")
        mei = name_lookup(mgraph, "me$(meindex[i]-500)")
        weight = weights[i]
        (has_prop(mgraph, Edge(spi, mei), :weight) && weight <= get_prop(mgraph, Edge(spi, mei), :weight)) && continue    
        
        set_prop!(mgraph, Edge(spi, mei), :weight, weight)
    end
end

function test_dict()
    (G, x, y, w) = set_up()

    dict_assign!(G, x, y, w)
end

function test_filter()
    (G, x, y, w) = set_up()

    filter_assign!(G, x, y, w)
end

@benchmark test_dict() 
#=
BenchmarkTools.Trial: 
  memory estimate:  28.45 MiB
  allocs estimate:  377474
  --------------
  minimum time:     34.503 ms (21.11% GC)
  median time:      40.698 ms (24.59% GC)
  mean time:        41.800 ms (28.85% GC)
  maximum time:     53.724 ms (27.73% GC)
  --------------
  samples:          120
  evals/sample:     1
=#
@benchmark test_filter()
#=
BenchmarkTools.Trial: 
  memory estimate:  22.68 GiB
  allocs estimate:  160517950
  --------------
  minimum time:     16.994 s (32.95% GC)
  median time:      16.994 s (32.95% GC)
  mean time:        16.994 s (32.95% GC)
  maximum time:     16.994 s (32.95% GC)
  --------------
  samples:          1
  evals/sample:     1
=#

Not sure if I did it quite right, but using the filter method looks to be substantially slower. I’m wondering if it might be worth adding a metaindex or something that could be used to index a MetaGraph with an arbitrary type (eg String) that would be unique per node. I’d be happy to open an issue or start a PR if that seems like something that would be useful.


#6

Wow. Those benchmarks are certainly discouraging. Let me look into this a bit more, but if you happen across a performance improvement, I’d be grateful for a PR.

Edit: one thing I’d point out is that filter_* does something subtly different than a dictionary lookup: specifically, it does not assume that the values are unique. With a dict lookup, you are limited to a single result. filter_vertices returns an iterator that will find all vertices meeting the condition. That means that (without some other optimizations) it’s O(n) at a minimum.

Edit 2: I’ve made a quick-n-dirty filter_first_vertex function that improves the benchmark to 5s - still not as fast as dict lookup but it will never be:

"""
    filter_first_vertex(g, prop[, val])
    filter_first_vertex(g, fn)

Return the first vertex that has property `prop` defined (optionally
as `val`), or where function `fn` returns `true`.

### Implementation Notes
    Will throw an error if no vertex matches.
"""
function filter_first_vertex(g::AbstractMetaGraph, fn::Function)
    for v in vertices(g)
        fn(g, v) && return v
    end
    error("No vertex matching conditions found")
end

filter_first_vertex(g::AbstractMetaGraph, prop::Symbol) =
    filter_first_vertex(g, (g, x) -> has_prop(g, x, prop))

filter_first_vertex(g::AbstractMetaGraph, prop::Symbol, val) =
    filter_first_vertex(g, (g, x) -> has_prop(g, x, prop) && get_prop(g, x, prop) == val)

I don’t much like the name so if it manages to go into MetaGraphs it will likely be called something different, but you can just include it in your code directly for now. There may be other optimizations possible.


#7

Edit: one thing I’d point out is that filter_* does something subtly different than a dictionary lookup: specifically, it does not assume that the values are unique.

Right - I’m specifically looking for something that would be unique per-node. I think it would require adding something to the type, in a similar vein as the weightfield property (indexfield or something), as well as a dictionary. I’m not sure how performant it would be if you have to check to make sure that two nodes don’t have the same key every time you change it, but that was my idea.


#8

I’d be happy to entertain a PR that allowed “reverse vertex / edge indices” as a dictionary (that optionally enforced value uniqueness), but we’d have to discuss whether this makes sense as part of the metagraph structure or whether it’s better just to have it outside. :slight_smile:


#9

Definitely worth a discussion. I opened a PR with a proof of principle for what I was thinking - would be happy to take feedback (or suggestions for how to structure something outside the MetaGraph package if that’s preferred).