Creating a Weighted Graph

erlebach · June 13, 2020, 11:14am

That is great, that I am in the ballpark. So I am stopping the optimization process, although it might be possible to decrease memory allocation further with more sophisticated datastructures at the outset. A graph, even with 20 million edges should not tax my memory significantly. Your help is much appreciated, @anon94023334!

anon94023334 · June 13, 2020, 4:26pm

A graph with 1 million nodes and 20 million edges weighted by Float64s is trivially represented by a SimpleWeightedGraph in ~ 618 MB. If you squash the vertex indices down to UInt32 (using squash() on the SimpleGraph prior to creating the SWG), you get that down to 462 MB. If you restrict your weights to Float32, it’s 309MB.

So whatever’s happening in your data is likely not due directly to the storage that the graph is requiring.

Henrique_Becker · June 13, 2020, 6:38pm

I just wanna point two things:

You (kinda) can have private fields in Julia if you really want.
Undefined behavior is a term often used by a language specification for code that should not be written because the language definition does not specify what to do in such case and, consequently, each compiler implementation can act differently, possibly with segmentation fault or silently corrupting data. Accessing such fields is probably not undefined behavior, with a specific version of Julia (language, not compiler implementation) and a specific version of the package in question (with source code available), it will behave in a predictably fashion, what is problematic is that you violated the package API and any version change can break the code (because the implementation detail you are accessing can change with minor changes), but if you know what you are doing and have no problem with fixing the package version, it can be a solution (while very restrictive).

anon94023334 · June 14, 2020, 12:42am

That’s very cool. I think it’s a bit too magical right now, but if this turns out to be a bigger issue than it’s been to date perhaps we do it.

You’re right re: undefined behavior vs API contract. I should have chosen my words more carefully but the bottom line still is: don’t do this if you expect your code to work

Tamas_Papp · June 14, 2020, 6:23am

“Consequences are undefined” is a very common way of phrasing this in software documentation. In a library, it is implied that they are undefined by the API. So I think your way of saying it is fine.

Henrique_Becker · June 14, 2020, 3:51pm

Fair enough. However, given any Julia compiler following the language specification and the internals of some open source valid code, you can predict its behavior, with undefined behavior the compiler implementation is free to not even be deterministic/consistent in the behavior.

Also, unless I am missing something, the “software documentation” you linked is a language specification, so…

Tamas_Papp · June 14, 2020, 5:42pm

Sure, but that’s not the relevant question: given that you are talking about computers, you can pretty much always predict everything (except for random seeds, etc).

The key question is what behavior you can predict

given a Julia version,
package versions (mostly major, but minor if you are relying on that),
the API specs of packages which conform to SemVer.

So basically your Project.toml, including the [compat] bounds.

Version 10.3.1 of HypotheticalPackage.jl can define

struct Foo{T,S}
    a::T
    b::S
end

with a given interface, which does not include accessing fields.

Then, if you are accessing fields, and version 10.3.2 is refactored so that the API is unchanged but the fields of Foo are named something else, you are in the “consequences are undefined” situation. It will mostly likely just throw a error in current Julia (so in that sense, it is predictable), it’s just that the HypotheticalPackage.jl API does not define/care what happens.

Yes, you are missing the fact that this is only an example. This kind of language us used pervasively — it is pretty standard.

Finally, given that this is a topic where a package author is kindly giving user support, I am not sure I understand why you started this subdiscussion.

Henrique_Becker · June 14, 2020, 6:22pm

I understand your point while, in my experience, I have only seen the exact term undefined behavior in the context of language specifications.

But this bit:

Finally, given that this is a topic where a package author is kindly giving user support, I am not sure I understand why you started this subdiscussion.

This is just hypocritical. I can as well ask why did you continue this subdiscussion. My first comment in this thread (and only comment that does not answer to you) pointed two things: (1) that you kinda can have private fields, that is a knowledge sbromberger did not had and found positive to know (so I aggregated something to the discussion); (2) criticized the term undefined behavior, in which sbromberger agreed with me. The discussion could have ended here, but was you that found relevant enough to dispute my terminology here. If you are now second-guessing the usefulness of this subdiscussion that was effectively started by you, then you can delete your posts that I will delete my posts answering you. But I think there is value to my initial reply to sbromberger.

erlebach · June 14, 2020, 7:29pm

Hi Seth,

I think I found a bug in SimpleWeightedGraphs (which does not occur in LightGraphs):
If I add a self-edge, in this case from node 1 to node 1, no error is thrown, the number of edges is not incremented (I do not know why one would purposefully impose this limitation), and yet, has_edge returns true.

graph = SimpleWeightedGraph(100)
add_edge!(graph, 1,1)
println("ne(graph): ", ne(graph))
println("has_edge(graph, 1, 1): ", has_edge(graph, 1,1))
rem_edge!(graph, 1, 1)
println("ne(graph): ", ne(graph))
println("has_edge(graph, 1, 1): ", has_edge(graph, 1,1))

anon94023334 · June 15, 2020, 4:54am

This isn’t a bug; it’s just that SimpleWeightedGraphs doesn’t support self-loops (unlike SimpleGraphs, which “sort of” does - most things will work; some won’t.)

The definition of a simple graph is one with no multi-edges, no self-loops, and no weights, so if you stick to making your graphs confirm to that definition when you use Simple*Graphs packages, you won’t run into any problems.

anon94023334 · June 15, 2020, 4:26pm

Thinking about how to handle this since we’re coming up on v2.0, it’s worth a discussion as to whether to disallow self-loops entirely in these packages and instead, if self-loops are needed, have an implementation in a separate graph structure.

erlebach · June 15, 2020, 4:55pm

I think that either self-loops should be allowed or they should not be allowed. I have opinion, except that if they are not allowed, a command such as has_edge(g, i, i) should always return false, contrary to the current implementation of SimpleWeightedGraph.

anon94023334 · June 16, 2020, 4:24pm

a command such as has_edge(g, i, i) should always return false ,

That’s the way it used to work. We made the change at the request of the JuliaBio community and it worked for them, but I don’t know whether they still need it anymore.

erlebach · June 16, 2020, 9:50pm

@anon94023334, here is what I’d like to do, and perhaps you can give me some advice. I’d like to implement the following. Consider two graphs (G1, E1) and (G2, E2), with the same nodes, but different edges. I would like both graphs to share the node structure but with different edge structures. So each graph would have its own adjacency matrix. Of course I could create two independent graphs. Perhaps that would work. But here is the problem. I want the node to have a state index. If the node is in state “S1”, graph G1 would be activated and only its connections available to the program, and if the node is in state “S2”, graph G2 would be activated and only its connections would be available. This is an example of a multiplex network with two layers. This can be generalized, but this is sufficient. Do you think it is easy to create a package based on LightGraphs that can do this? What do you think?