# Why is deepcopy() -ing Sets slower than Arrays of the same size?

I have the following code:

``````function deepcopy_n_times(obj::Set{Int}, n::Int)
for i in 1:n
deepcopy(obj)
end
end

function deepcopy_n_times(obj::Array{Int, 1}, n::Int)
for i in 1:n
deepcopy(obj)
end
end

set = Set{Int}([1, 2, 3, 4, 5])
arr = Array{Int, 1}([1, 2, 3, 4, 5])
num_times = 10000000

@time deepcopy_n_times(arr, num_times)
@time deepcopy_n_times(arr, num_times)

@time deepcopy_n_times(set, num_times)
@time deepcopy_n_times(set, num_times)
``````

The output is:

``````  1.341462 seconds (30.00 M allocations: 4.619 GiB, 17.08% gc time)
1.335890 seconds (30.00 M allocations: 4.619 GiB, 16.63% gc time)
3.896371 seconds (80.00 M allocations: 8.941 GiB, 18.57% gc time)
3.893588 seconds (80.00 M allocations: 8.941 GiB, 18.75% gc time)
``````

Why is copying a set more expensive time-wise? I believe I am betraying my lack of knowledge about how Sets are represented under the hood.

Yes, this is a case where reading the implementation (which is very readable Julia code) makes clear that it must strictly be slower since the Set implementation contains an Array inside of it.

If you read the Set definition, you’ll see:

``````struct Set{T} <: AbstractSet{T}
dict::Dict{T,Nothing}

Set{T}() where {T} = new(Dict{T,Nothing}())
Set{T}(s::Set{T}) where {T} = new(Dict{T,Nothing}(s.dict))
end
``````

At which point you might jump to the Dict definition to see:

``````mutable struct Dict{K,V} <: AbstractDict{K,V}
slots::Array{UInt8,1}
keys::Array{K,1}
vals::Array{V,1}
ndel::Int
count::Int
age::UInt
idxfloor::Int  # an index <= the indices of all used slots
maxprobe::Int

function Dict{K,V}() where V where K
n = 16
new(zeros(UInt8,n), Vector{K}(undef, n), Vector{V}(undef, n), 0, 0, 0, 1, 0)
end
function Dict{K,V}(d::Dict{K,V}) where V where K
new(copy(d.slots), copy(d.keys), copy(d.vals), d.ndel, d.count, d.age,
d.idxfloor, d.maxprobe)
end
function Dict{K, V}(slots, keys, vals, ndel, count, age, idxfloor, maxprobe) where {K, V}
new(slots, keys, vals, ndel, count, age, idxfloor, maxprobe)
end
end
``````
2 Likes

This sugests we should probably do a re-write of our `Set` struct. we should be able to use about half the storage if we made `Set` lower level.

`Array{Nothing, 1}` uses basically no storage (only metadata), so it’s not that bad, but more notable at these tiny sizes of course

Half storage? How can you do that? Array{Nothing,1} allocates only 80 bytes even if it contains many elements.

``````julia> @allocated a = Array{Int}(undef,10000000)
80000080

julia> @allocated a = Array{Nothing}(undef,10000000)
80
``````

The problem here is that you have only 5 elements in this set, but a hash table with `n = 16` slots (with 16 UInt). Also, if you check the keys of the set, you will find it contains 16 keys:

``````julia> set.dict.keys
16-element Vector{Int64}:
``````

So another half time is spent to copy the slots and these additional keys.