Datastructure for two-way map

Say that I have two vectors U and V defined as

U = ["A", "D", "B"]
V = ["F", "G", "B"]

and I want to map from U to V, and V to U. Both vectors contain only unique elements. For a one-way mapping, this is easy via a Dict:

julia> U_V = Dict(zip(U, V))
Dict{String,String} with 3 entries:
  "B" => "B"
  "A" => "F"
  "D" => "G"

julia> U_V["A"]
"F"

Is there also a data structure to have a two-way mapping? As I see it, the problem with two Dictionaries is that they duplicate the data and can get out of sync.

1 Like

I am not sure if I understand your problem. What prevents you from associating the first element in the first vector with the first element in the second vector and so on?

I think your problem is, in fact, that you want to be able to find the elements in O(1) (or O(log n)) and associate both sets without replicating data. In this case I would suggest having both arrays ordered (what allows for O(log n) binary search) and each of the two arrays has its own auxiliary array of indexes that points to the associated index in the other array. However, this can become a nightmare to maintain if queries are interleaved with insertion and removal of elements.

I am not aware of any out-of-the-box solution solution for this. Did you look at DataStructures.jl?

1 Like

The problem is not runtime but avoiding inconsistent states. Your mention of DataStructures.jl made me realize that this is a general problem indeed. It appears to be called a BiMap:

From the Haskell docs:

An implementation of bidirectional maps between values of two key types. A Bimap is essentially a bijection between subsets of its two argument types.

Each element of the left-hand type is associated with an element of the right-hand type, and vice-versa, such that the two mappings are inverses. Deleting an element will cause its twin to be deleted, and inserting a pair of elements will cause any overlapping bindings to be deleted.

There is a package created and last updated in 2015 at https://github.com/bicycle1885/BiMaps.jl.

I should probably make a PR at DataStructures.jl

1 Like

Again, I do not understand what you mean by this. Do you mean you can insert/remove an element from one vector and forgot to do the same to the associated element? In this case you can wrap your vectors inside a struct BiMap and only interact with them by means of safe methods.

Unfortunately, so often I find Julia kinda of lacking in advanced DataStructures. I myself have written TrackingHeaps.jl to overcome some limitations I found.

If you just need a short-term solution to be optimized for speed later, a simple reverse lookup is plenty fast:

julia> function get_rev(src, val)
         for key in keys(src)
           src[key] == val && return key
         end
         throw(KeyError("$val"))
       end
get_rev (generic function with 1 method)

julia> get_rev(U_V, "F")
"A"

But if you need serious performance, you’d have to have to have a single structure with two hash tables and maintain consistency internal to the api. Personally, I wouldn’t do this up front until I was sure this was going to be my bottleneck.

1 Like

If you’re looking for a bijection, look no further:

https://github.com/scheinerman/Bijections.jl

Example:

julia> using Bijections

julia> d = Dict("A" => "F", "D" => "G", "B" => "B")
Dict{String,String} with 3 entries:
  "B" => "B"
  "A" => "F"
  "D" => "G"

julia> b = Bijection(d)
Bijection{String,String} (with 3 pairs)

julia> b["A"]
"F"

julia> b("G")
"D"
12 Likes

I wanted this a few years back and ended up just implementing my own type with two dicts, as at the time anything else just seemed like too much hassle (plus I didn’t have any performance critical applications).

3 Likes

I just did the same and put it in a package: https://github.com/rikhuijzer/BidirectionalMaps.jl. Only implemented the immutable one for now because that’s good enough for my use case.

Awesome suggestion. I did my own thing, though, because I found the reverse syntax a bit weird for my use-case

julia> b = Bijection{Int,String}()
Bijection{Int64,String} (with 0 pairs)

julia> b[1] = "alpha";

julia> inv = active_inv(b);

julia> inv["alpha"]
1

versus

julia> using BidirectionalMaps

julia> U = ["alpha"];

julia> V = [1];

julia> b = ImmutableBimap{String,Int}(U, V);

julia> b.left["alpha"]
1

julia> b.right[1]
"alpha"

Yes, exactly. When training a statistical model, I sometimes need to convert my labels (strings) to integer. After training the model, I want to be absolutely 100% sure that my conversion is right to avoid messing up the conclusions.

The syntax of the Bijection was probably created with the objective of writing general code that does not care which is domain and which is image. If you want a function using yours implementation to work both from left to right as well as right to left you will probably need to create an wrapper or put ifs all over it. This active_inv method will probably work again over inv giving back the original direction, so there is no concept of a natural direction and you can just call active_inv before passing the Bijection to a function if you want it to work in the opposite direction.

1 Like

@rikh It seems like you might have overlooked the convenient syntax that Bijections.jl provides for accessing the maps in both directions:

julia> using Bijections

julia> b = Bijection("alpha", 1)
Bijection{String,Int64} (with 1 pairs)

julia> b["alpha"]
1

julia> b(1)
"alpha"

Notice that a Bijection is callable. So, to access the “left” map, you index the bijection, and to access the “right” map, you call the bijection.

The only small downside to this is that there is not much visual distinction between b["a"] and b("a").

5 Likes

As a side note, isn’t this what CategoricalArrays.jl is for?

1 Like

Probably! Thanks!