Consider the following struct:
struct MetaTuple
    data::Tuple{Vararg{Union{Tuple{}, Tuple{Int16}, Tuple{Int16, Int16}, Tuple{Int16, Int16, Int16}, Tuple{Int16, Int16, Int16, Int16}}}}
end
Its data is a tuple of tuples. The outer tuple is of variable length (on the order of 1000), but the ‘inner’ tuples all have length in 0:4.
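To make the "shape" point concrete, here is a minimal sketch of how the concrete type of the data field encodes the length of every inner tuple, while the struct type itself stays the same. The values a and b are made up purely for illustration; real instances have on the order of 1000 inner tuples.

```julia
struct MetaTuple
    data::Tuple{Vararg{Union{Tuple{}, Tuple{Int16}, Tuple{Int16, Int16}, Tuple{Int16, Int16, Int16}, Tuple{Int16, Int16, Int16, Int16}}}}
end

# Two hypothetical payloads with the same inner tuples in a different order.
a = ((), (Int16(1),), (Int16(2), Int16(3)))
b = ((Int16(1),), (), (Int16(2), Int16(3)))

ma = MetaTuple(a)
mb = MetaTuple(b)

# The wrapper type is identical for both instances...
same_wrapper = typeof(ma) === typeof(mb)

# ...but the runtime type of the field differs, because a tuple type
# lists the type (and hence the length) of each of its elements.
same_payload = typeof(ma.data) === typeof(mb.data)

println(typeof(ma.data))
println(typeof(mb.data))
```

So any method that ends up specialized on typeof(mt.data) sees an essentially new type for almost every instance.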
Actually, consider that I have millions of them[1], and that I would like to either 1) serialize them directly or 2) transform them into simpler structs for serialization.
I am running into the same problem in either case: because no two of the metatuples have the same shape, every call to serialize or flatten (defined in a moment) spends more than 99% of its time compiling. It takes multiple seconds per struct to flatten them, no matter how many I have.
I know that this is what @nospecialize is for, but I haven’t been able to figure out how to make it work. I’ve tried a lot of approaches, to no avail. Here’s an example of code that I thought might work, but which doesn’t[2]:
module Scratch

using Random

struct MetaTuple
    data::Tuple{Vararg{Union{Tuple{}, Tuple{Int16}, Tuple{Int16, Int16}, Tuple{Int16, Int16, Int16}, Tuple{Int16, Int16, Int16, Int16}}}}
end

# Build a random MetaTuple: 500 to 1500 inner tuples, each of length 0 to 4.
function fakeData()
    ntups = rand(500:1500)
    tlengths = rand(0:4, ntups)
    data = [tlen == 0 ? () : Int16.(Tuple(rand(1:ntups, tlen))) for tlen in tlengths]
    return MetaTuple(Tuple(data))
end

# Flat representation: the inner lengths, plus all values concatenated.
struct FlatStruct
    L::Tuple{Vararg{Int8}}
    data::Tuple{Vararg{Int16}}
end

function flatten(@nospecialize(mt))
    L = length.(mt.data)
    data = Vector{Int16}(undef, sum(L))
    n = 1
    # Copy every inner value into the flat buffer, in order.
    for ix in eachindex(mt.data)
        for x in mt.data[ix]
            data[n] = x
            n += 1
        end
    end
    return FlatStruct(Tuple(L), Tuple(data))
end

mt = fakeData()
@time flatten(mt)
mt = fakeData()
@time flatten(mt)

end
Julia is still compiling from scratch the second (and every subsequent) time I call flatten.
4.683906 seconds (8.90 M allocations: 539.773 MiB, 4.67% gc time, 99.85% compilation time)
2.469198 seconds (6.29 M allocations: 362.959 MiB, 3.08% gc time, 99.85% compilation time)
My best guess is that it’s the value of mt.data that needs to be @nospecialized, rather than mt itself. But that approach doesn’t seem to work either:
function flatten(@nospecialize(X))
    L = length.(X)
    data = Vector{Int16}(undef, sum(L))
    n = 1
    for ix in eachindex(X)
        for x in X[ix]
            data[n] = x
            n += 1
        end
    end
    return FlatStruct(Tuple(L), Tuple(data))
end

mt = fakeData()
@time flatten(mt.data)
mt = fakeData()
@time flatten(mt.data)
⇒
1.677197 seconds (5.15 M allocations: 290.645 MiB, 3.50% gc time, 99.84% compilation time)
4.487556 seconds (8.76 M allocations: 529.944 MiB, 3.50% gc time, 99.61% compilation time)
So there’s something I don’t understand about @nospecialize. Do you know what I’m doing wrong?