Reading the Dictionaries.jl readme, it looks like a really nice package that a lot of effort went into. Very glad to have something like this. Thank you for your contribution!
I’ve begun to play with it a bit to learn more about tokens, and I ran into the feeling that the interface could be more ergonomic.
Consider the following:
julia> using Dictionaries
julia> d = Dictionary('a':'c', 1:3)
3-element Dictionary{Char, Int64}
'a' │ 1
'b' │ 2
'c' │ 3
julia> iskey, t = gettoken(d, 'b')
(true, (15, 2))
julia> iskey && gettokenvalue(d, t)
2
julia> iskey && settokenvalue!(d, t, 5)
3-element Dictionary{Char, Int64}
'a' │ 1
'b' │ 5
'c' │ 3
julia> t ∈ tokens(d) # what?
false
julia> d[t] # this should work
ERROR: MethodError: Cannot `convert` an object of type Tuple{Int64, Int64} to an object of type Char
Questions:
- Is there a reason the gettoken function returns a tuple containing whether the key is present? Why not just return a guaranteed-invalid token, which isassigned (or whatever token-checking function) can immediately recognize as invalid, and skip the boolean?
- Is there a reason the token is just a tuple? I get that that’s the minimal amount of information needed to communicate the dictionary access, but it could be nice to have a specialized DictToken type. Then we could type-specialize getindex and setindex! on it, so we wouldn’t need special gettokenvalue and settokenvalue! functions and we could leverage [] index syntax. (I don’t see any reason that a dictionary must maintain the ability to hash DictTokens into keys; that seems like degenerate behavior, and not everything that’s hashable ought to be hashed.) This is basically the hashmap dual of the idea to access arrays by ordinal indices, which was what inspired this thread.
- There seems to be something I don’t understand about tokens: why is t ∉ tokens(d)? And why does iterating tokens(d) yield something so different from what’s shown by its show method?
julia> t
(15, 2)
julia> tokens(d)
3-element Dictionaries.IndicesTokens{Char, Int64, Indices{Char}}
'a' │ (8, 1)
'b' │ (15, 2)
'c' │ (9, 3)
julia> t ∈ tokens(d) # we can't use this to check token validity
false
julia> (tokens(d)...,) # what?
((0, 1), (0, 2), (0, 3))
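Going back to the DictToken question, here is a minimal sketch of what I mean. It is built on a toy dictionary rather than on Dictionaries.jl itself, so ToyDict, DictToken, and toytoken are all hypothetical names and none of this is the package's actual API. The point is only that a dedicated token type lets [] and isassigned dispatch on it, with slot 0 playing the role of the guaranteed-invalid token.

```julia
# Hypothetical toy types for illustration; not part of Dictionaries.jl.
struct DictToken
    slot::Int            # 0 marks a guaranteed-invalid token
end

struct ToyDict{K,V}
    slots::Dict{K,Int}   # key => position in `vals`
    vals::Vector{V}
end

# Convenience constructor from any iterables of keys and values.
ToyDict(ks, vs) = ToyDict(Dict(k => i for (i, k) in enumerate(ks)), collect(vs))

# Look up a token; a missing key yields the invalid token instead of a Bool flag.
toytoken(d::ToyDict, key) = DictToken(get(d.slots, key, 0))

# Because the token has its own type, the Base functions can dispatch on it.
Base.isassigned(d::ToyDict, t::DictToken) = 1 <= t.slot <= length(d.vals)
Base.getindex(d::ToyDict, t::DictToken) = d.vals[t.slot]
Base.setindex!(d::ToyDict, v, t::DictToken) = (d.vals[t.slot] = v; d)

d = ToyDict('a':'e', 1:5)
t = toytoken(d, 'c')
if isassigned(d, t)
    d[t] = d[t] + 1      # plain index syntax, no gettokenvalue/settokenvalue!
end
```

Since the methods are defined on a type I own, there is no clash with key-based indexing here; whether the same could be done on AbstractDictionary without method ambiguities is something I have not checked.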
Now, consider the following extension to the above example:
julia> _, tb = gettoken(d, 'b'); _, tc = gettoken(d, 'c');
julia> unset!(d, 'b')
2-element Dictionary{Char, Int64}
'a' │ 1
'c' │ 3
julia> istokenassigned(d, tb), istokenassigned(d, tc)
(true, false)
julia> set!(d, 'b', 2)
3-element Dictionary{Char, Int64}
'a' │ 1
'c' │ 3
'b' │ 2
julia> istokenassigned(d, tb), istokenassigned(d, tc)
(true, true)
julia> gettokenvalue(d, tb), gettokenvalue(d, tc) # what?
(3, 2)
More questions:
- It seems like a good way to shoot ourselves in the foot is by keeping an old token and attempting to use it after a dictionary has had an insertion or deletion. Is there any zero-cost way to store information in the DictToken that could be used to indicate whether it is still valid, so that attempting to use an outdated token will throw an error?
- Is there a better way to check the validity of tokens that I’m completely missing?
- For ordered dictionaries, would it be meaningful to be able to construct token ranges? For example, I’m imagining that one could write d[token1:token2] and access all elements of the dictionary between those tokens.
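To illustrate the stale-token question, here is one possible scheme, sketched on a toy type: the dictionary carries a version counter that is bumped on every structural change, and each token remembers the version it was created under. VersionedDict, VersionedToken, versioned_token, remove!, and isvalid_token are hypothetical names, not Dictionaries.jl API, and this is not zero-cost (one extra integer per token and one comparison per access), so take it only as a description of the behavior I have in mind.

```julia
# Hypothetical toy types for illustration; not part of Dictionaries.jl.
mutable struct VersionedDict{K,V}
    keys::Vector{K}
    vals::Vector{V}
    version::Int         # incremented on every structural change
end

struct VersionedToken
    slot::Int            # 0 if the key was not found
    version::Int         # dictionary version when the token was made
end

function versioned_token(d::VersionedDict, key)
    slot = findfirst(isequal(key), d.keys)
    return VersionedToken(something(slot, 0), d.version)
end

function remove!(d::VersionedDict, key)
    slot = findfirst(isequal(key), d.keys)
    slot === nothing && return d
    deleteat!(d.keys, slot)
    deleteat!(d.vals, slot)
    d.version += 1       # every token made before this point is now stale
    return d
end

isvalid_token(d::VersionedDict, t::VersionedToken) =
    t.version == d.version && 1 <= t.slot <= length(d.vals)

function Base.getindex(d::VersionedDict, t::VersionedToken)
    isvalid_token(d, t) || throw(ArgumentError("stale or invalid token"))
    return d.vals[t.slot]
end

d = VersionedDict(['a', 'b', 'c'], [1, 2, 3], 0)
tc = versioned_token(d, 'c')
before = d[tc]                       # token is fresh, lookup succeeds
remove!(d, 'b')
still_valid = isvalid_token(d, tc)   # version changed, so the token is rejected
```

With a check like this, the gettokenvalue(d, tc) call in the example above would throw instead of silently returning the value of a different key.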
I would imagine a more ergonomic interface to work like this:
#= simulation =#
d = Dictionary('a':'e', 1:5)
t = gettoken(d, 'c')
if isassigned(d, t) d[t] = d[t] + 1 end
and possibly this for ordered dictionaries:
# currying form of gettoken? just spitballing...
gt = gettoken(d)
trange = gt('b'):gt('d')
isassigned(d, trange) &&
map(d[trange]) do x; x^2 end
Finally, I’d like your opinion if you don’t mind:
- It seems like ordered dictionaries offer a lot of benefits. Would it make sense to introduce an ordered dictionary into Base?
- If a scheme for ordinal indexing of abstract arrays is introduced, would you think it a good idea to also use it for ordered dictionaries?
Thanks again!