Get element of a SortedDict by index instead of getting by key


I have a SortedDict like

julia> using DataStructures: SortedDict
julia> v = SortedDict(Dict{Float64,Float64}(), Base.Forward)
julia> v[1.32]=10
julia> v[1.33]=20
julia> v
DataStructures.SortedDict{Float64,Float64,Base.Order.ForwardOrdering} with 2 entries:
  1.32 => 10.0
  1.33 => 20.0

I’d like to get a value using its index (its position) instead of its key.

How can I do this using Julia ?

Kind regards

You can always collect() the keys of the dict to get a vector into which you can index:

keys_as_vector = collect(keys(v))

But this allocates a new array to hold the keys, so it may not be the most efficient approach possible.

However, if you’re always accessing the elements by index instead of by key, then perhaps a SortedDict isn’t really the right data structure for your case? Would a vector of Pairs or Tuples work instead?

I need to access values both by index and by key so a vector of Pairs or Tuples can’t work.

I’d prefer to avoid to allocates a new array as it can be inefficient.

I know you can do this with an OrderedDict. I don’t think it’s possible without creating a temporary for the indexing because the internal indexing of a SortedDict is much more difficult. This is because OrderedDict stores everything in arrays while SortedDict stores everything in a BalancedTree. So you have to really step through the tree to find out what the next value is, which is what collecting the iterator does.

Maybe it could be better for you to sort it yourself and use an OrderedDict for this reason? Or maybe find a way to save what the indexes to the BalancedTree mean, and mutate this when a value is added?

Your best bet would probably be to use an OrderedDict, and resort it whenever you need to access an element by position.

If you’re resorting frequently, you should use TimSort from the SortingAlgorithms package, which is faster than other sorts when the data is mostly sorted.

A SortedDict is implemented with a tree, and it might be possible to use a different tree structure which makes indexing by position efficient, by maintaining counts of nodes on each branch, but I’m not specifically familiar with such a tree.

1 Like

If your purpose is to be able to quickly answer a query such as: what is the 145th largest item in the SortedDict, then you are out of luck-- the data structure does not currently provide an efficient algorithm for this type of query. (An inefficient algorithm would be: step through items 1 to 144 by advancing a token.) With a substantial amount of code development, it would be possible to support a query like this in O(log n) time. The code development would involving adding a field to all internal tree nodes that keeps track of the number of descendants of the node.

If your purpose is, given a key-value pair already found, store something like an integer so that you can get direct (i.e., O(1)) access to the (k,v) pair again later on, then you should read up on tokens and semitokens in the documentation.

1 Like

My goal is to simulate an orderbook (with price as key and volume as value).
So I need to be able to update this orderbook by level ie (for example) add or substract a given volume for a given price.
I also need to calculate price for a given volume so I need to iterate increasing (for asks prices) prices and so volumes up to requested volume.

If I understand correctly, you have two main operations: given a new (p,v), where p=price and v=volume, you want to insert this into the sorted dict. If there is already (p,v’) in the sorted dict (same p), then instead you wish to update (p,v’) to (p,v+v’)…

The other operation is, given a volume V, you want to find and delete pairs (p,v) starting from lowest to highest p, until the sequence of v’s removed adds up to V.

Both of these are possible with SortedDict. For the first operation, you find p; if found under semitoken s, then you update sd[s], and if not found, the you push! or insert! the new item. For the second operation, you use a while-loop with the startof function and delete! function.

Your username ‘femtotrader’ suggests that you are interested in high-performance software for use in high-speed financial trades. In this case, you should know that SortedDict in Julia is not as fast as map in C++, primarily because the latter has no safeguards or error-checks, at least when used in the default way.