Dictionary showing invalid keys

gxyau · March 21, 2023, 10:25pm

Hello there,

Been playing around with Julia a bit just to practice some topics. I have recently started using Graphs.jl and found some unexpected behaviour. A MWE and the output has been included below.

Long story short, I created a graph with some edges, and I would like to associate a probability to each edge, which is constant. For each edge, I cast the Graphs.SimpleGraphs.SimpleEdge{Int64} to a tuple as the key of the edge probability, whose keys are tuples and values are its probability. However, when checking the key of the dictionary, it shows that there are 16 keys instead of 3 which I was expecting. Any idea what happened?

Thank you all very much in advance!

# MWE
using Distributions, Graphs, Random

G₀ = SimpleGraph(3, 0)
V₀, E₀ = vertices(G₀), edges(G₀)

add_edge!(G₀, 1, 2)
add_edge!(G₀, 2, 3)
add_edge!(G₀, 1, 3)

# Defining probability of connection and probability of disconnection
probs = rand(Uniform(0, 1), G₀.ne)
const edge_prob = Dict{Tuple{Int64, Int64}, Float64}(Tuple.(E₀) .=> probs)
println("The dictionary has length $(length(probs))")
println("The keys has length $(length(edge_prob.keys))")

# Output

The dictionary has length 3
The keys has length 16

odow · March 21, 2023, 10:42pm

You want keys(edge_prob), not edge_prob.keys.

Benny · March 21, 2023, 11:05pm

The keys method gets you an object that holds the keys. The .keys field is an Array holding the keys and empty slots. I think 16 entries is as low as it ever gets, and more space is allocated to make sure the dictionary isn’t too full. IIRC, hash table performance generally degrades fast beyond 2/3 or 3/4 occupancy.

uniment · March 22, 2023, 12:40am

Not confusing at all.

odow · March 22, 2023, 1:39am

Not confusing at all.

Julia doesn’t have the concept of private fields of a struct. This means accessing .xxx fields is usually not intended or safe, unless explicitly documented.

Here’s the documentation for dictionaries: Collections and Data Structures · The Julia Language

keys is documented:

julia> d = Dict(1 => 2)
Dict{Int64, Int64} with 1 entry:
  1 => 2

help?> keys(d)
  keys(a::AbstractDict)

  Return an iterator over all keys in a dictionary. collect(keys(a)) returns an array of keys. When
  the keys are stored internally in a hash table, as is the case for Dict, the order in which they
  are returned may vary. But keys(a) and values(a) both iterate a and return the elements in the
  same order.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> D = Dict('a'=>2, 'b'=>3)
  Dict{Char, Int64} with 2 entries:
    'a' => 2
    'b' => 3
  
  julia> collect(keys(D))
  2-element Vector{Char}:
   'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
   'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

CameronBieganek · March 22, 2023, 2:00am

Normally I prefer to avoid leading underscores as much as possible, but in this case it might be a good idea for Base to rename that field to _keys to avoid this gotcha.

CameronBieganek · March 22, 2023, 2:07am

I opened an issue with that suggestion:

gdalle · March 22, 2023, 6:04am

For your specific use case you may also be interested in GitHub - JuliaGraphs/SimpleWeightedGraphs.jl: Edge-weighted graphs compatible with Graphs.jl

gxyau · March 22, 2023, 8:08am

Thanks for replying, I got so used to Python and Java at my work that forgot why I liked Julia in the first place lol.

gxyau · March 22, 2023, 8:10am

Ahh, thanks for the information. I did looked into SimpleWeightedGraphs, but I needed it to have more structure than just the edge probability so I chose to work from basic principles. Besides, it’s a good practice for myself anyway. (:

gxyau · March 22, 2023, 8:11am

I see, that’s why some of the key-value pairs are even repeated.

gdalle · March 22, 2023, 10:26am

I agree that it is excellent practice to code it yourself, but in case you need something more reliable, you can always try GitHub - JuliaGraphs/MetaGraphsNext.jl: Graphs with vertex labels and metadata in Julia

gxyau · March 22, 2023, 10:48am

Thank you. How is the performance and stability of the package right now though, do you know? I was told that it is an option but not very stable but it was a while ago.

gdalle · March 22, 2023, 12:19pm

I have personally cleaned it up a lot lately, and added more tests for robustness. I’m gonna benchmark it against the historical MetaGraphs.jl in the next few days, but I’m fairly confident that its performance is much better

Palli · March 22, 2023, 7:09pm

There are also other dictionaries to be aware of, and likely fix, e.g. (SwissDict and RobinDict and) most importantly OrderedDicts (and OrderedSet):

https://juliacollections.github.io/DataStructures.jl/latest/ordered_containers/

I suppose you would want PR for such packages too, and at least it’s only a breaking change packages(s), not Julia per se. There’s at least one more dictionary package too, Dictionary.jl.

It might be a good start, to change packages, at least to see how tolerant users are to change, and get them ready.

That said, would there be a way to make my_dict.keys work when keys is gone (without a macro?), to get you keys(my_dict). I believe you can overload the period. I believe the former does actually allow you to change the keys, but I would be ok with read-only, likely covers all uses (and it’s a bug anyway as is). Does the later give you keys you can then change?

GunnarFarneback · March 22, 2023, 7:37pm

It’s not a breaking change to rename internal fields. In this case it’s also pretty difficult to make use of the keys field so the probability that some user has mistakenly (and successfully) dipped into the internals is pretty low.

Yes, you could but why would you want to?