Why is deepcopy() -ing Sets slower than Arrays of the same size?

bsuwal · January 8, 2021, 1:28am

I have the following code:

function deepcopy_n_times(obj::Set{Int}, n::Int)
    for i in 1:n
        deepcopy(obj)
    end
end

function deepcopy_n_times(obj::Array{Int, 1}, n::Int)
    for i in 1:n
        deepcopy(obj)
    end
end
    
set = Set{Int}([1, 2, 3, 4, 5])
arr = Array{Int, 1}([1, 2, 3, 4, 5])
num_times = 10000000

@time deepcopy_n_times(arr, num_times)
@time deepcopy_n_times(arr, num_times)

@time deepcopy_n_times(set, num_times)
@time deepcopy_n_times(set, num_times)

The output is:

  1.341462 seconds (30.00 M allocations: 4.619 GiB, 17.08% gc time)
  1.335890 seconds (30.00 M allocations: 4.619 GiB, 16.63% gc time)
  3.896371 seconds (80.00 M allocations: 8.941 GiB, 18.57% gc time)
  3.893588 seconds (80.00 M allocations: 8.941 GiB, 18.75% gc time)

Why is copying a set more expensive time-wise? I believe I am betraying my lack of knowledge about how Sets are represented under the hood.

johnmyleswhite · January 8, 2021, 1:33am

Yes, this is a case where reading the implementation (which is very readable Julia code) makes clear that it must strictly be slower since the Set implementation contains an Array inside of it.

If you read the Set definition, you’ll see:

struct Set{T} <: AbstractSet{T}
    dict::Dict{T,Nothing}

    Set{T}() where {T} = new(Dict{T,Nothing}())
    Set{T}(s::Set{T}) where {T} = new(Dict{T,Nothing}(s.dict))
end

At which point you might jump to the Dict definition to see:

mutable struct Dict{K,V} <: AbstractDict{K,V}
    slots::Array{UInt8,1}
    keys::Array{K,1}
    vals::Array{V,1}
    ndel::Int
    count::Int
    age::UInt
    idxfloor::Int  # an index <= the indices of all used slots
    maxprobe::Int

    function Dict{K,V}() where V where K
        n = 16
        new(zeros(UInt8,n), Vector{K}(undef, n), Vector{V}(undef, n), 0, 0, 0, 1, 0)
    end
    function Dict{K,V}(d::Dict{K,V}) where V where K
        new(copy(d.slots), copy(d.keys), copy(d.vals), d.ndel, d.count, d.age,
            d.idxfloor, d.maxprobe)
    end
    function Dict{K, V}(slots, keys, vals, ndel, count, age, idxfloor, maxprobe) where {K, V}
        new(slots, keys, vals, ndel, count, age, idxfloor, maxprobe)
    end
end

Oscar_Smith · January 8, 2021, 1:43am

This sugests we should probably do a re-write of our Set struct. we should be able to use about half the storage if we made Set lower level.

Keno · January 8, 2021, 3:19am

Array{Nothing, 1} uses basically no storage (only metadata), so it’s not that bad, but more notable at these tiny sizes of course

anon56330260 · January 8, 2021, 3:23am

Half storage? How can you do that? Array{Nothing,1} allocates only 80 bytes even if it contains many elements.

julia> @allocated a = Array{Int}(undef,10000000)
80000080

julia> @allocated a = Array{Nothing}(undef,10000000)
80

The problem here is that you have only 5 elements in this set, but a hash table with n = 16 slots (with 16 UInt). Also, if you check the keys of the set, you will find it contains 16 keys:

julia> set.dict.keys
16-element Vector{Int64}:

So another half time is spent to copy the slots and these additional keys.

Topic		Replies	Views
Why would I ever use deepcopy? General Usage question	5	5250	June 28, 2018
Why is copying using a loop is much slower than `copy` for large arrays? Performance question , copy	10	1417	November 27, 2022
Set performance question New to Julia	3	572	May 8, 2020
Array broadcasting slower than numpy? Performance	20	717	June 4, 2022
True power set of a set is slower than power array of array Performance	3	721	April 17, 2021

Why is deepcopy() -ing Sets slower than Arrays of the same size?

Related topics