Sorted list or heaps?

Hello again everyone. I am trying to maintain a sorted list of a custom structure, and broadcast a transformation onto the list whenever needed. Currently I am using BinaryMaxHeap in DataStructure, but it does not allow broadcasting. Consider the following MWE.

using DataStructures
import Base:: isless

const T0 = 1+1/pi

struct MyStruct{T}
    a::T
    b::T
    total::Matrix{T}

    
    function MyStruct(a::U, b::T) where U<:Number where T<:Number
        return new{Float64}(a, b, a+b)
    end
end

function isless(stuct1::MyStruct, struct2::MyStruct)
    return isless(struct1.a, struct2.a)
end

function increase_a(M::MyStruct)
    return MyStruct(M.a + T0, M.b - 2 * T0)
end

struct_list = BinaryMaxHeap{MyStruct}()

push!(struct_list, MyStruct(1, 2))
push!(struct_list, MyStruct(2, 3))

What I want to do is increase_a.(struct_list), but what I get is ERROR: MethodError: no method matching iterate(::BinaryMaxHeap{MyStruct}). Is it possible to do this with heap?

Another idea I have is to maintain a sorted list (maybe something like Vector{MyStruct}), but I don’t see any sorted list implementation on Julia. So if I want to maintain a sorted list then I would have to write custom function to insert into the vector. Is there a reason to have a preference of heap over sorted list or vice versa in Julia? Thank you! (:

The comment below is most useful.

EDIT:
However, if you do want to use a sorted vector: I think people use searchsorted and related functions with a plain old Vector. I thought about adding more of an interface, but I think it’s not worth it. You don’t save much boiler plate. Probably why no one else has done it.

1 Like

The basic problem is that there is no efficient way to insert into a sorted vector: the cost to insert a new element is O(n) for a vector of n elements, because you may have to move many of the existing elements to make room for the new one. This is why people use things like heaps and other data structures.

The basic problem that you have here is that broadcast transformations that change the keys may change the order — you might need to re-sort.

If you are willing to be careful about this, you can use the low-level arrays-as-heaps functions in DataStructures.jl. These let you treat an ordinary array a as a heap, and push to it with heappush!(a, x) or pop from it with heappop!(a). If you do a transformation that changes the ordering, you will need to re-sort the heap by calling heapify!(a).

(These functions used to be documented in the DataStructure manual, but it seems to have been removed from the manual?)

4 Likes

You get a heart regardless. (:

1 Like

Heh! I left my comment because I realized it did answer part of the question.

I am willing to try anything that would work honestly. I do need to update the elements often though, after every iteration in a for loop to be precise. I am not sure if I still get the advantage since it is equivalent of sorting the array again after every iteration.

You may need to re-think your whole algorithm more carefully. What underlying problem are you trying to solve?

Basically maintaining a sorted list that can be updated after every iteration. Or what do you mean??

Why do you need to maintain a sorted list where you are continually changing the sort keys? (Is there an XY problem here?)

Ah, because I needed to access the top two elements (in decreasing order based on a) in each iteration, but the field a decays over time, so I needed to update the list at the end of each iteration. I should also mention that in each iteration, elements may be added or removed (so it’s not a constant list of elements that gets updated at the end of each iteration), so I needed a way to insert or remove elements from the list.

I went back and double check the function for the decay, but it seems that if the decay function is monotonic in a, so the list will and should maintain the same order after the decay. I will double check on this, because if it is true then the solution you mention above would work perfectly.

Will update in a bit, and will accept your answer if that turns out to be true. (:

Edit: Yeah okay, the decay function is monotonic in a, so I can just use a list and the heapify! and related functions as you said.

The SortedSet data structure in DataStructures.jl allows you to find and remove the smallest or largest elements in O(log n) time per element. It does not support in-place key updating. However, you can create a SortedSet in linear time if the keys are already sorted using a special constructor for already-sorted keys. This means that if you want to update all the keys by an order-preserving transformation, then you can create a new SortedSet data structure and discard the old one in linear time.

I believe the correct answer here isn’t a sorted datastructure at all. Maintaining storting will require O(n) per iteration, so instead you can just not sort, and find the top two elements while you do the scan to update all the elements for the decay. This will be faster since all the accesses are linear and you will be doing strictly fewer comparisons.

Granted, if I always want to find the top two elements only. In the future I might need to pair the elements in order (i.e. 1st and 2nd, 3rd and 4th, 5th and 6th, etc), and unsorted collection will then become a problem.

I wrote the top two element because for now I want to implement the most basic case.

That was a really nice idea though, if I didn’t have to consider future generalizations that is.

How is the performance compared to the sorted list? Do you know?

Edit: SortedSet also does not allow for duplicates, which may very well happen in my case. Still curious about the performance of the two data structure though.

I’m not sure what “duplicates” means in this context. Perhaps instead of SortedSet you need SortedMultiDict? As for performance, theoretically, individual operations on the sorted containers are O(c log n), where c is the key comparison time and n is the size of the container. I suggest you run some benchmarks to compare your options.