Distinguish dictionary lookup from nothing and not found?

thautwarm · May 3, 2020, 8:28am

get(dict, key, nothing) can be ambiguous as it can means dict[key] === nothing or key not found.

Without a get function returning Union{Some{T}, Nothing}, I have to choose the following options, which turns out to trouble me in my use case:

I may use get(key, value) do ..., and then my code will be forced to present in an inner scope.
As a consequence, my variable assignments will not have effects upon the outer scope.
I have to define a singleton struct like struct MyNothing end, which will be invisible to users, and I use get(d, key, MyNothing()) to distinguish “not found” from value is nothing.
This hurts, as I want to generate code without any runtime dependency.
Defining a MyNothing struct can be feasible, but it’s kind of painful to treat: redefinitions of MyNothing can have many occurrences, hence need extra efforts to analyse and reduce the definition generation.
I use haskey(d, key) first, but it may be inefficient.
Anyone could guarantee that using haskey(d, key) first and then do d[key] has no overhead?

As you see, above 3 approaches are all painful.

Thanks in advance for showing me other workarounds/solutions.

Tamas_Papp · May 3, 2020, 8:41am

Depending on the context (ie what you want to do when the dictionary doesn’t have the key), the get(f, dict, key) or get!(f, dict, key) accessors may provide a workaround.

I don’t think that doing haskey is without overhead. An approach to overcome this is outlined in

(see the “tokens” part).

rfourquet · May 3, 2020, 8:53am

Another technique similar to the “singleton” solution is to use a long randomly looking symbol, like const my_secret_token = :Tnukw2GFmwFIiTGoQLWrDL3zGJKP5v, which doesn’t suffer from “redefinitions” (if I understood your problem with this solution correctly). I don’t know how this would compare performance-wise.

Also, a PR to follow, with a get(dict, key) interface: https://github.com/JuliaLang/julia/pull/34821

thautwarm · May 3, 2020, 10:01am

With get(f, dict, key), I can only distinguish things inside f… Inside f it’s a function, the variable assignments get affected.
Those library looks good, but I want to generate code without dependencies(so prefer Stdlib).
Anyway, thanks.

thautwarm · May 3, 2020, 10:02am

interesting but this is a little magic and unreliable I think…

rfourquet · May 3, 2020, 11:01am

Haha, does it look less magic if I had not used the name “secret” ?
FWIW, this is exactly the technique currently used in Base for e.g. in(::Pair, ::AbstractDict), and is as reliable as using a singleton (provided there are enough random characaters in the token).

thautwarm · May 3, 2020, 11:51am

Ahhh, I’m not sure…

This shocked me… Isn’t it possible to produce bugs?
I know the possibility is quite insignificant, but…
Thanks, I feel so messy now

rfourquet · May 3, 2020, 12:25pm

Yes this is the key. When the probability is so insignificant, it’s null for all practical matters. It’s as likely to have e.g. collision in Julia’s git repository objects (which are content-addressed by cryptographic hash), or collisions in the UUIDs of packages, etc. Collision resistance is an important property on which we rely every day without necessarily be aware.

Hey, no need to! Security based on unlikelyness is probably unintuitive.

Henrique_Becker · May 3, 2020, 2:13pm

I remember having this same feeling in my course of stochastic algorithms until my teacher said:

“Every algorithm is a stochastic algorithm. Every time you run a function there is a non-zero chance of a meteorite falling on your computer, or the energy going down, or a hardware error, or anything else. The question is: how does the probability of your stochastic algorithm returning the right solution compares to these other possibilities? If it is much smaller, then you are worrying about the wrong thing.”

Tamas_Papp · May 3, 2020, 2:29pm

I am not saying that this is an unreasonable sentiment for practical code, but when robust general solutions are available the language should be evolving towards them in the long run as this benefits everyone (kludges do not compose well).

#34821 linked above is such a solution, and so are the tokens of Dictionaries.jl. This particular problem (in various contexts with various trade-offs, nicely enumerated by the original post) has been around for a long time and I hope it will be solved eventually. That will make Julia a better language.

tkf · May 3, 2020, 3:23pm

I think I get the unlikelyness argument but I still think @thautwarm’s concern is legit. For example, if you store the secret token as a global const, you can’t naively use Dict in a module-walking functions. Even if you cleverly hide the token in closure or something, it’s quite visible in Julia if you use reflections and I’d imagine it’s almost impossible to hide it from compiler hooks like Cassette.jl and IRTools.jl.

It’d be really nice to have “monadic” interface get(dict, key) :: Union{Some,Nothing} as proposed in #34821.

Henrique_Becker · May 3, 2020, 4:01pm

Yes, the unlikeness argument does not consider reflection. Strangely, seems like a case in which would be better to have a magic number repeated over all code instead of a constant that can be accessed by reflection.

rfourquet · May 3, 2020, 4:07pm

Oh definitely! I compared the token solution to the singleton solution, which share the same problem with reflection. But even this problem is quite insignificant (as in: almost no-one is concerned by it) compared to the usability problem: no one wants to bother writing down a secret token or a singleton struct, so the temptation is big to just hope that there won’t be nothing in the dict and use nothing as the default.

tkf · May 3, 2020, 4:19pm

Yeah, I agree that the singleton solution also shares the same pitfall.

BTW, if you control the Dict type as well, I think the cleanest solution ATM is to use Dict{Some{T}} so that get(dict, key, nothing) is equivalent to #34821 get(dict, key).

complyue · March 31, 2022, 3:53pm

get(dict, key, dict) is one solution, though it’ll bite you one day your users want to use self-referencing dicts for weird things.

complyue · March 31, 2022, 4:01pm

If you do this in a macro, I think you can interpolate your true (some const in your macro’s module) singleton value into the expr returned.

I started serious playing with Julia only since 1.7.2, maybe that’s not a working option at the time you were struggling?

thautwarm · April 12, 2022, 10:50am

Hi, sorry for my late reply. This is an old thread, but it is not resolved so I’d bump. Let me introduce the context a little, it seems trivial but actually not that trivial…

The context is I was doing some macro-based code generation, and there are two requirements.

My package’s first requirement is that I want to distinguish nothing and “not found” from get. Although get(...) do seems working, but when it comes to code generation, due to the lack of a get returning Union{Some{T}, Nothing}, I have to choose one of the following three options:

put the whole subsequent code in the inner function. The problem here is that my code generation involves goto and label statements, which does not work cross-function.
use get(dict, k, default) where default is a secret_table_token. The last few comments in this thread shows secret_table_token works in practice, but as a library author I’d like to avoid including such code in the library. See julia#34821.
use get(dict, k, default) where default is guaranteed to be a different from the value looked up from the generic dict. This can be done via defining a new struct MyNothing.

These options actually do not work. The 1st one is impossible in my case, the 2nd is evil. The 3rd looks nice but unfortunately does not work due to my package’s second requirement: My final goal is to develop a macro package that exists only at development time but not runtime (like babel in JS world), i.e., making a “runtime-free” package.

an example of "runtime-free" macro expansion

The following macro-expanded code can totally replace the original macrocall @switch. If we expand macrocalls at development time, @switch and its source package can be safely removed!

 @macroexpand @switch x begin
        @case Dict(1 => x) && if x < 5 end
             println(x)
        @case _
 end

quote
    true
    var"##323" = x
    #= REPL[33]:2 =#
    if var"##323" isa Dict && (begin
                    var"##324" = if haskey(var"##323", 1)
                            Some(var"##323"[1])
                        else
                            nothing
                        end
                    var"##324" isa Some
                end && (var"##324" !== nothing && begin
                        var"##325" = (var"##324").value
                        let x = var"##325"
                            x < 5
                        end
                    end))
        x = var"##325"
        var"##return#321" = begin
                #= REPL[33]:3 =#
                println(x)
            end
        $(Expr(:symbolicgoto, Symbol("####final#322#326")))
    end
    #= REPL[33]:4 =#
    begin
        var"##return#321" = begin
            end
        $(Expr(:symbolicgoto, Symbol("####final#322#326")))
    end
    (error)("matching non-exhaustive, at #= REPL[33]:1 =#")
    $(Expr(:symboliclabel, Symbol("####final#322#326")))
    var"##return#321"
end

The reason(s) why introducing MyNothing breaks “runtime-free” can be illustrated by the following example:

module MyMod
function f()
     # the first time expanding this macro in `MyMod`, 
     # surprisingly, defines a global struct `MyNothing` in `MyMod`
     @codegenhere! 
end

Above code intends to define MyNothing when expanding codegenhere!. However, it is not “runtime-free”, because obviously you cannot put a struct definition inside a local function, which means that MyNothing should be defined using side-effects in the macrocall. Finally, the expanded code does not contain MyNothing but does reference my MyNothing. We have to provide runtime support that supplies MyNothing.

P.S: see the generated code in the collapse section, I finally choose to generate 1 haskey and 1 getindex instead, which performs 2 hash search and makes MLStyle.jl significant slower than crafted code in this certain case.

complyue · April 12, 2022, 12:49pm

I’m curious how this is done?

As far as I’ve played with Julia, I’m not aware of a way to use dev-time only dependencies, i.e. dependencies of a Julia package are always installed in source form, on any end user’s site. Is there an exception for macros? Or you just generate some source by macro expansion, and only distribute the generated source? But not distributing the very original source seems not open-source adhering, how’s that justified if it’s really the case?

I heard of system image building a thing but never looked into it yet, but even in that scenario, I think you can bundle your MyNothing struct into the built image.

Sukera · April 12, 2022, 1:59pm

This can be achieved by already storing a Some of your elements, no? If you then actually store a nothing, it ends up as Some(nothing), distinguishing it from just nothing.

thautwarm · April 12, 2022, 3:33pm

I have to handle general dictionary pattern matching. It’s not a specific use case. If you want details, see the collapse section of the comment you replied. So far I cannot further optimize the generated code for the pattern matching.

Topic		Replies	Views
Findfirst for Dicts with `nothing` keys General Usage	33	3406	March 4, 2018
Could the usage of `haskey` be unidiomatic/inefficient? Performance dictionary	6	1287	November 13, 2018
Suggestion for more general/performant sentinel for find* functions Internals & Design	17	1639	January 29, 2018
Nothing == absence of keyword argument? General Usage question , keyword-arguments	15	877	June 13, 2025
Understanding the limitations of generated functions General Usage question , performance , metaprogramming	17	1467	July 27, 2020

Distinguish dictionary lookup from nothing and not found?

Related topics