How can I distinguish a dictionary lookup that returns nothing from a key that is not found?

get(dict, key, nothing) is ambiguous: a result of nothing can mean either that dict[key] === nothing or that the key was not found.
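A minimal sketch of the ambiguity, using a made-up dictionary whose value type allows nothing (the variable names are only illustrative):

```julia
d = Dict{String,Union{Int,Nothing}}("a" => 1, "b" => nothing)

stored_value = get(d, "b", nothing)   # key exists, stored value is `nothing`
absent_key   = get(d, "c", nothing)   # key does not exist, default is returned

# Both results are `nothing`, so the caller cannot tell the two cases apart.
indistinguishable = stored_value === absent_key
```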

Without a get function returning Union{Some{T}, Nothing}, I have to choose one of the following options, each of which causes trouble in my use case:

  1. I may use get(dict, key) do ... end, but then my code is forced into an inner scope.
    As a consequence, my variable assignments will not affect the outer scope.

  2. I can define a singleton struct like struct MyNothing end, which will be invisible to users, and use get(d, key, MyNothing()) to distinguish “not found” from a stored value of nothing.
    This hurts, as I want to generate code without any runtime dependency.
    Defining a MyNothing struct is feasible, but painful to manage: the generated code can end up redefining MyNothing in many places, so extra effort is needed to analyse the output and deduplicate the definitions.

  3. I can call haskey(d, key) first, but that may be inefficient.
    Can anyone guarantee that calling haskey(d, key) and then d[key] has no overhead?
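For concreteness, options 2 and 3 could be sketched roughly as below. The names NotFound, lookup_sentinel and lookup_haskey are hypothetical and chosen only for this illustration; both helpers return a (found, value) tuple.

```julia
# Hypothetical private sentinel type (assumed name, not part of Base).
struct NotFound end

# Option 2: pass the sentinel as the default and compare with `===`.
function lookup_sentinel(d::AbstractDict, key)
    v = get(d, key, NotFound())
    return v === NotFound() ? (false, nothing) : (true, v)
end

# Option 3: `haskey` followed by indexing; simple, but the key is looked up twice.
function lookup_haskey(d::AbstractDict, key)
    return haskey(d, key) ? (true, d[key]) : (false, nothing)
end

d = Dict{Symbol,Any}(:a => nothing)

present = lookup_sentinel(d, :a)   # key exists, value is `nothing`
absent  = lookup_sentinel(d, :b)   # key does not exist
```

Because both helpers are ordinary functions, the caller can branch on the returned flag in its own scope.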

As you can see, all three approaches above are painful.

Thanks in advance for showing me other workarounds/solutions.

Depending on the context (i.e. what you want to do when the dictionary doesn’t have the key), the get(f, dict, key) or get!(f, dict, key) accessors may provide a workaround.
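A small sketch of the two accessors, with placeholder keys and default values:

```julia
d = Dict{String,Int}("a" => 1)

# `get(f, dict, key)`: `f` is called only when the key is absent; `d` is unchanged.
x = get(() -> 0, d, "b")

# The same call written with do-block syntax.
y = get(d, "b") do
    0
end

# `get!(f, dict, key)`: like `get`, but also stores the result of `f` under the key.
z = get!(() -> 42, d, "c")
```

After these calls, d still has no entry for "b", while "c" has been inserted with the value produced by the function.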

I don’t think that doing haskey is without overhead. An approach to overcome this is outlined in

(see the “tokens” part).

Another technique similar to the “singleton” solution is to use a long random-looking symbol, like const my_secret_token = :Tnukw2GFmwFIiTGoQLWrDL3zGJKP5v, which doesn’t suffer from “redefinitions” (if I understood your problem with this solution correctly). I don’t know how this would compare performance-wise.
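A sketch of how such a token might be used; the helper name is_found is made up for this example:

```julia
# Token copied from the post above; any sufficiently long random symbol would do.
const my_secret_token = :Tnukw2GFmwFIiTGoQLWrDL3zGJKP5v

d = Dict{Symbol,Any}(:a => nothing)

# Symbols are interned, so `===` / `!==` is a cheap identity comparison.
is_found(d, key) = get(d, key, my_secret_token) !== my_secret_token
```

Redefining the same constant with the same symbol in several pieces of generated code yields an identical value each time, which is presumably why it avoids the redefinition issue.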

Also, a PR to follow, with a get(dict, key) interface: https://github.com/JuliaLang/julia/pull/34821

With get(f, dict, key), I can only distinguish the two cases inside f… and since f is a function, my variable assignments there are affected by its scope.
Those libraries look good, but I want to generate code without dependencies (so I prefer the stdlib).
Anyway, thanks.

interesting but this is a little magic and unreliable I think… :joy:

Haha, would it look less magic if I had not used the name “secret”?
FWIW, this is exactly the technique currently used in Base, e.g. for in(::Pair, ::AbstractDict), and it is as reliable as using a singleton (provided there are enough random characters in the token).

Ahhh, I’m not sure…

This shocked me… Isn’t it possible to produce bugs?
I know the possibility is quite insignificant, but…
Thanks, I feel so messy now

Yes, this is the key. When the probability is so insignificant, it’s null for all practical purposes. It’s as likely as e.g. a collision in Julia’s git repository objects (which are content-addressed by cryptographic hash), or a collision in the UUIDs of packages, etc. Collision resistance is an important property on which we rely every day without necessarily being aware of it.

Hey, no need to! Security based on unlikeliness is probably unintuitive.

I remember having this same feeling in my course of stochastic algorithms until my teacher said:

“Every algorithm is a stochastic algorithm. Every time you run a function there is a non-zero chance of a meteorite falling on your computer, or the power going out, or a hardware error, or anything else. The question is: how does the probability of your stochastic algorithm returning the wrong solution compare to these other possibilities? If it is much smaller, then you are worrying about the wrong thing.”

I am not saying that this is an unreasonable sentiment for practical code, but when robust general solutions are available the language should be evolving towards them in the long run as this benefits everyone (kludges do not compose well).

#34821 linked above is such a solution, and so are the tokens of Dictionaries.jl. This particular problem (in various contexts with various trade-offs, nicely enumerated by the original post) has been around for a long time and I hope it will be solved eventually. That will make Julia a better language.

I think I get the unlikeliness argument, but I still think @thautwarm’s concern is legit. For example, if you store the secret token as a global const, you can’t naively use Dict in a module-walking function. Even if you cleverly hide the token in a closure or something, it’s quite visible in Julia if you use reflection, and I’d imagine it’s almost impossible to hide it from compiler hooks like Cassette.jl and IRTools.jl.

It’d be really nice to have a “monadic” interface get(dict, key) :: Union{Some,Nothing} as proposed in #34821.
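To give a feel for what such an interface would allow, here is a rough sketch under stated assumptions. The helper tryget and the sentinel type _Absent are hypothetical names, not part of Base and not the PR’s implementation; the sentinel is only a stand-in to emulate the behaviour with today’s get.

```julia
# Hypothetical private sentinel used only inside `tryget` (assumed name).
struct _Absent end

# Hypothetical helper: `Some(value)` when the key exists, `nothing` otherwise.
function tryget(d::AbstractDict, key)
    v = get(d, key, _Absent())
    return v === _Absent() ? nothing : Some(v)
end

d = Dict{Symbol,Any}(:a => nothing, :b => 1)

r = tryget(d, :a)
if r === nothing
    status = "not found"
else
    value = something(r)   # unwraps `Some`; here the stored value is `nothing`
    status = "found"
end
```

The branching happens in the caller’s own scope, so no closure is needed to tell the two cases apart.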

Yes, the unlikeliness argument does not consider reflection. Strangely, this seems like a case in which it would be better to have a magic number repeated all over the code instead of a constant that can be accessed by reflection.

Oh definitely! I compared the token solution to the singleton solution, which shares the same problem with reflection. But even this problem is quite insignificant (as in: almost no-one is concerned by it) compared to the usability problem: no one wants to bother writing down a secret token or a singleton struct, so the temptation is big to just hope that there won’t be a nothing value in the dict and use nothing as the default.

Yeah, I agree that the singleton solution also shares the same pitfall.

BTW, if you control the Dict type as well, I think the cleanest solution ATM is to use Dict{K,Some{T}} so that get(dict, key, nothing) is equivalent to the get(dict, key) proposed in #34821.
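A minimal sketch of this approach, assuming String keys and values that may be an Int or nothing (the alias V is only for brevity):

```julia
# Every stored value is wrapped in `Some`, so a stored `nothing` becomes `Some(nothing)`.
V = Some{Union{Int,Nothing}}
d = Dict{String,V}("a" => V(nothing), "b" => V(1))

r = get(d, "a", nothing)   # a `Some` wrapping `nothing`: key present
m = get(d, "c", nothing)   # plain `nothing`: key absent

# `something` unwraps the stored value once the key is known to exist.
b = something(get(d, "b", nothing))
```

The cost is that every write has to wrap the value and every read has to unwrap it.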
