That is great, that I am in the ballpark. So I am stopping the optimization process, although it might be possible to decrease memory allocation further with more sophisticated datastructures at the outset. A graph, even with 20 million edges should not tax my memory significantly. Your help is much appreciated, @anon94023334!
A graph with 1 million nodes and 20 million edges weighted by Float64
s is trivially represented by a SimpleWeightedGraph in ~ 618 MB. If you squash the vertex indices down to UInt32
(using squash()
on the SimpleGraph prior to creating the SWG), you get that down to 462 MB. If you restrict your weights to Float32
, itâs 309MB.
So whateverâs happening in your data is likely not due directly to the storage that the graph is requiring.
I just wanna point two things:
- You (kinda) can have private fields in Julia if you really want.
- Undefined behavior is a term often used by a language specification for code that should not be written because the language definition does not specify what to do in such case and, consequently, each compiler implementation can act differently, possibly with segmentation fault or silently corrupting data. Accessing such fields is probably not undefined behavior, with a specific version of Julia (language, not compiler implementation) and a specific version of the package in question (with source code available), it will behave in a predictably fashion, what is problematic is that you violated the package API and any version change can break the code (because the implementation detail you are accessing can change with minor changes), but if you know what you are doing and have no problem with fixing the package version, it can be a solution (while very restrictive).
Thatâs very cool. I think itâs a bit too magical right now, but if this turns out to be a bigger issue than itâs been to date perhaps we do it.
Youâre right re: undefined behavior vs API contract. I should have chosen my words more carefully but the bottom line still is: donât do this if you expect your code to work
âConsequences are undefinedâ is a very common way of phrasing this in software documentation. In a library, it is implied that they are undefined by the API. So I think your way of saying it is fine.
Fair enough. However, given any Julia compiler following the language specification and the internals of some open source valid code, you can predict its behavior, with undefined behavior the compiler implementation is free to not even be deterministic/consistent in the behavior.
Also, unless I am missing something, the âsoftware documentationâ you linked is a language specification, soâŚ
Sure, but thatâs not the relevant question: given that you are talking about computers, you can pretty much always predict everything (except for random seeds, etc).
The key question is what behavior you can predict
- given a Julia version,
- package versions (mostly major, but minor if you are relying on that),
- the API specs of packages which conform to SemVer.
So basically your Project.toml
, including the [compat]
bounds.
Version 10.3.1 of HypotheticalPackage.jl can define
struct Foo{T,S}
a::T
b::S
end
with a given interface, which does not include accessing fields.
Then, if you are accessing fields, and version 10.3.2 is refactored so that the API is unchanged but the fields of Foo
are named something else, you are in the âconsequences are undefinedâ situation. It will mostly likely just throw a error in current Julia (so in that sense, it is predictable), itâs just that the HypotheticalPackage.jl API does not define/care what happens.
Yes, you are missing the fact that this is only an example. This kind of language us used pervasively â it is pretty standard.
Finally, given that this is a topic where a package author is kindly giving user support, I am not sure I understand why you started this subdiscussion.
I understand your point while, in my experience, I have only seen the exact term undefined behavior in the context of language specifications.
But this bit:
Finally, given that this is a topic where a package author is kindly giving user support, I am not sure I understand why you started this subdiscussion.
This is just hypocritical. I can as well ask why did you continue this subdiscussion. My first comment in this thread (and only comment that does not answer to you) pointed two things: (1) that you kinda can have private fields, that is a knowledge sbromberger did not had and found positive to know (so I aggregated something to the discussion); (2) criticized the term undefined behavior, in which sbromberger agreed with me. The discussion could have ended here, but was you that found relevant enough to dispute my terminology here. If you are now second-guessing the usefulness of this subdiscussion that was effectively started by you, then you can delete your posts that I will delete my posts answering you. But I think there is value to my initial reply to sbromberger.
Hi Seth,
I think I found a bug in SimpleWeightedGraphs (which does not occur in LightGraphs):
If I add a self-edge, in this case from node 1 to node 1, no error is thrown, the number of edges is not incremented (I do not know why one would purposefully impose this limitation), and yet, has_edge
returns true.
graph = SimpleWeightedGraph(100)
add_edge!(graph, 1,1)
println("ne(graph): ", ne(graph))
println("has_edge(graph, 1, 1): ", has_edge(graph, 1,1))
rem_edge!(graph, 1, 1)
println("ne(graph): ", ne(graph))
println("has_edge(graph, 1, 1): ", has_edge(graph, 1,1))
This isnât a bug; itâs just that SimpleWeightedGraphs doesnât support self-loops (unlike SimpleGraphs, which âsort ofâ does - most things will work; some wonât.)
The definition of a simple graph is one with no multi-edges, no self-loops, and no weights, so if you stick to making your graphs confirm to that definition when you use Simple*Graphs packages, you wonât run into any problems.
Thinking about how to handle this since weâre coming up on v2.0, itâs worth a discussion as to whether to disallow self-loops entirely in these packages and instead, if self-loops are needed, have an implementation in a separate graph structure.
I think that either self-loops should be allowed or they should not be allowed. I have opinion, except that if they are not allowed, a command such as has_edge(g, i, i)
should always return false
, contrary to the current implementation of SimpleWeightedGraph
.
a command such as
has_edge(g, i, i)
should always returnfalse
,
Thatâs the way it used to work. We made the change at the request of the JuliaBio community and it worked for them, but I donât know whether they still need it anymore.
@anon94023334, here is what Iâd like to do, and perhaps you can give me some advice. Iâd like to implement the following. Consider two graphs (G1, E1) and (G2, E2), with the same nodes, but different edges. I would like both graphs to share the node structure but with different edge structures. So each graph would have its own adjacency matrix. Of course I could create two independent graphs. Perhaps that would work. But here is the problem. I want the node to have a state index. If the node is in state âS1â, graph G1 would be activated and only its connections available to the program, and if the node is in state âS2â, graph G2 would be activated and only its connections would be available. This is an example of a multiplex network with two layers. This can be generalized, but this is sufficient. Do you think it is easy to create a package based on LightGraphs that can do this? What do you think?