`objectid` and reproducibility

mhinsch · February 20, 2025, 3:48pm

I just spent half a day debugging very weird non-deterministic behaviour in my simulation. The culprit as it turns out was one little piece of code where I used sorting an array of objects by objectid as a way to make getting rid of duplicates more efficient. Apparently objectid is not derministic, though, so the order of objects varied between simulation runs leading to (numerically) different results even with the same random seed.

It’s not difficult to avoid the issue, once you know that it exists, but finding it was a bit of a hassle…

ETA: issue

matthias314 · February 20, 2025, 10:48pm

I’ve run into a related issue: If you define hash for some new type T and that hash involves objectid(T), then dictionaries with key type T are displayed non-deterministically. If such a dictionary shows up in Julia output inside some docstring, then Documenter.jl will complain all the time that the docstring is not accurate anymore.

cstjean · February 21, 2025, 4:26am

Python switched its dicts to ordered dicts by default, and solving those non-determinacy problems is a major benefit. I use OrderedDict by default nowadays (from OrderedCollections.jl), unless I have a very good reason not to.

danielwe · February 21, 2025, 5:35am

I’d try Base.IdSet for this. It lacks a convenient constructor, but you can do

set = union!(Base.IdSet(), arr)

Sets have undefined order, so for reproducibility you probably want something like

ordered_and_unique = sort!(collect(set); by=<something>)

Caveat: Base.IdSet isn’t documented. I don’t know why; it’s built on IdDict the same way Set is built on Dict, and IdDict is well-documented and exported.

Zentrik · February 21, 2025, 8:02am

Fwiw IdSet uses objectid as it’s hashing function.

foobar_lv2 · February 21, 2025, 10:51am

As people pointed out, objectid is simply the (slightly mangled) address for addressable objects (i.e. for objects where pointer_from_objref is doesn’t throw).

This fact looks unlikely to go away soon. The current implementation implicitly guarantees the following very nice property:

If you have two addressable objects that are not identical / egal, then their objectids will be different.

This subtle undocumented fact is internally relied on. For example, the current implementation of ScopedValues uses a persistent dict / HAMT that just doesn’t handle hash-collisions. This is fine, because hash-collisions are fundamentally impossible as long as you remember never to feed in non-addressable objects (objectids can be re-used after the old object dies; that’s the responsibility of allocator/GC)!

So, I guess you should consider asking this question to your operating system / libc, because the exact addresses of virtual memory handed out there is probably the ultimate source of your non-reproducibility.

(yes, this is another complication in moving GC)

I very much second this recommendation. Introducing very-hard-to-reproduce unintentional randomization into a software project sucks so hard!

At work we had one incident where somebody computed a checksum over some data by processing entries in the order they fell out of a java HashMap iterator, to be used for some caching. Later that was upgraded to a sha checksum, and another team thought “these don’t collide, let’s use that as a key/identity for our database”.

That bug laid dormant for years until it exploded into our face, and left us with years of subtly broken/inconsistent history that was impossible to clean up . Upshot was a soft policy of “unordered collections are just as suspect as sun.misc.Unsafe – don’t do it without at least a paragraph of justifying source-code comment”.

nsajko · February 21, 2025, 11:12am

That’s simply a doc test bug (your bug). Use [...] in the doc string where the dictionary is shown: Writing Documentation.

Another solution, adopted by the Go language, is to randomize the iteration order every time.

matthias314 · February 21, 2025, 1:23pm

That doesn’t work if I want the output to appear in the docstring. Think of a function that returns a dictionary. If I want to illustrate what such a function does, then omitting the output doesn’t make sense.

nsajko · February 21, 2025, 1:33pm

Then use julia-repl instead of jldoctest, in addition to unit tests.

barucden · February 21, 2025, 1:58pm

Following on Neven’s suggestion, you could also show the dictionary entries explicitly:

    """
        f(x)

    Returns a dict.

    # Examples
    ```jldoctest
    julia> d = f(1);

    julia> d["a"], d["b"]
    (1, 2)
    ```
    """
    function f(x)
        return Dict("a"=>x, "b"=>1+x)
    end

That way, you don’t (incorrectly) assume any ordering.

Topic		Replies	Views
Different dict iteration order between runs when altering unrelated code New to Julia	3	292	January 19, 2021
Break Dictionary arrange key General Usage	2	292	December 22, 2019
Setting the display order of entries in (unordered) Dict Performance	2	371	November 8, 2023
Is order well-defined for multiple iterations over `Dict`? General Usage question	5	917	February 21, 2017
Replacement for Dict, is anyone using isslotfilled? General Usage	22	1871	October 27, 2020

`objectid` and reproducibility

Related topics