Been playing around with Julia a bit just to practice some topics. I have recently started using Graphs.jl and found some unexpected behaviour. A MWE and the output has been included below.
Long story short, I created a graph with some edges, and I would like to associate a probability to each edge, which is constant. For each edge, I cast the Graphs.SimpleGraphs.SimpleEdge{Int64} to a tuple as the key of the edge probability, whose keys are tuples and values are its probability. However, when checking the key of the dictionary, it shows that there are 16 keys instead of 3 which I was expecting. Any idea what happened?
Thank you all very much in advance!
# MWE
using Distributions, Graphs, Random
G₀ = SimpleGraph(3, 0)
V₀, E₀ = vertices(G₀), edges(G₀)
add_edge!(G₀, 1, 2)
add_edge!(G₀, 2, 3)
add_edge!(G₀, 1, 3)
# Defining probability of connection and probability of disconnection
probs = rand(Uniform(0, 1), G₀.ne)
const edge_prob = Dict{Tuple{Int64, Int64}, Float64}(Tuple.(E₀) .=> probs)
println("The dictionary has length $(length(probs))")
println("The keys has length $(length(edge_prob.keys))")
# Output
The dictionary has length 3
The keys has length 16
The keys method gets you an object that holds the keys. The .keys field is an Array holding the keys and empty slots. I think 16 entries is as low as it ever gets, and more space is allocated to make sure the dictionary isn’t too full. IIRC, hash table performance generally degrades fast beyond 2/3 or 3/4 occupancy.
Julia doesn’t have the concept of private fields of a struct. This means accessing .xxx fields is usually not intended or safe, unless explicitly documented.
julia> d = Dict(1 => 2)
Dict{Int64, Int64} with 1 entry:
1 => 2
help?> keys(d)
keys(a::AbstractDict)
Return an iterator over all keys in a dictionary. collect(keys(a)) returns an array of keys. When
the keys are stored internally in a hash table, as is the case for Dict, the order in which they
are returned may vary. But keys(a) and values(a) both iterate a and return the elements in the
same order.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> D = Dict('a'=>2, 'b'=>3)
Dict{Char, Int64} with 2 entries:
'a' => 2
'b' => 3
julia> collect(keys(D))
2-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
Normally I prefer to avoid leading underscores as much as possible, but in this case it might be a good idea for Base to rename that field to _keys to avoid this gotcha.
Ahh, thanks for the information. I did looked into SimpleWeightedGraphs, but I needed it to have more structure than just the edge probability so I chose to work from basic principles. Besides, it’s a good practice for myself anyway. (:
Thank you. How is the performance and stability of the package right now though, do you know? I was told that it is an option but not very stable but it was a while ago.
I have personally cleaned it up a lot lately, and added more tests for robustness. I’m gonna benchmark it against the historical MetaGraphs.jl in the next few days, but I’m fairly confident that its performance is much better
I suppose you would want PR for such packages too, and at least it’s only a breaking change packages(s), not Julia per se. There’s at least one more dictionary package too, Dictionary.jl.
It might be a good start, to change packages, at least to see how tolerant users are to change, and get them ready.
That said, would there be a way to make my_dict.keys work when keys is gone (without a macro?), to get you keys(my_dict). I believe you can overload the period. I believe the former does actually allow you to change the keys, but I would be ok with read-only, likely covers all uses (and it’s a bug anyway as is). Does the later give you keys you can then change?
It’s not a breaking change to rename internal fields. In this case it’s also pretty difficult to make use of the keys field so the probability that some user has mistakenly (and successfully) dipped into the internals is pretty low.