Ok, while trying to check the original example in the OP with primitive type, I noticed that your (@Maurizio_Tomasi) string2ntuple function is what’s causing half the ruckus. You’re returning a large mixed tuple instead of an NTuple{N, UInt8}. This forces the compiler to insert conversions in your constructor, which probably kills performance when creating Foo. Julia strings are UTF-8, and a Char is a Unicode code point (which may be larger than a byte), not a UInt8:
julia> s1 = string2ntuple("test_tag", 32)
(0x74, 0x65, 0x73, 0x74, 0x5f, 0x74, 0x61, 0x67, '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0')
The tuple that’s returned above is a Tuple{UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char}, which isn’t the same as NTuple{32, UInt8}.
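To make the mismatch concrete, here’s a small sketch. The padding closures are only my guess at how a mixed tuple like this comes about (I’m assuming the padding is written as the Char '\0' rather than the byte 0x00), and mixed and bytes are just illustrative names:

```julia
# Hypothetical reconstruction: pad with the Char '\0' vs. the byte 0x00.
mixed = ntuple(i -> i <= 2 ? 0x74 : '\0', 4)
bytes = ntuple(i -> i <= 2 ? 0x74 : 0x00, 4)

typeof(mixed)                # Tuple{UInt8, UInt8, Char, Char}
typeof(bytes)                # NTuple{4, UInt8}

mixed isa NTuple{4, UInt8}   # false
bytes isa NTuple{4, UInt8}   # true

# A Char is a 4-byte value, and one character can span several UTF-8 code units.
sizeof(Char)                 # 4
ncodeunits("\u00e9")         # 2
```

So a single Char in the padding is enough to change the tuple’s type, even though every element prints like a small number.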
With that fixed, creating Foo works like a charm:
julia> struct Foo
           id::UInt64
           tag::NTuple{32, UInt8}
           start_comment::NTuple{4096, UInt8}
           end_comment::NTuple{4096, UInt8}
       end
julia> string2ntuple(s::String, len) = ntuple(i -> i <= ncodeunits(s) ? UInt8(codeunit(s, i)) : 0x0, len)
string2ntuple (generic function with 1 method)
julia> @time data = Foo(
           0,
           string2ntuple("test_tag", 32),
           string2ntuple("start_comment", 4096),
           string2ntuple("end_comment", 4096),
       ); # silence output
0.009649 seconds (2.93 k allocations: 257.717 KiB, 96.70% compilation time)
julia> @time data = Foo(
           0,
           string2ntuple("test_tag", 32),
           string2ntuple("start_comment", 4096),
           string2ntuple("end_comment", 4096),
       ); # silence output, second run so compilation is done
0.000276 seconds (9 allocations: 89.016 KiB)
julia> data.tag |> collect |> String
"test_tag\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
But creating the array is still running as I write this (it also didn’t work with your string2ntuple version):
julia> @time dataset = Foo[data]
I’ll see what it reports once it’s done, but I’d say there’s a bug here. I’m also not sure whether the Char -> UInt8 conversion should cost this much in your original string2ntuple; there may be a better way to write that function in the first place.
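For what it’s worth, here’s one way that function could be written a bit more defensively, plus a decoder that drops the \0 padding seen in the data.tag output above. This is only a sketch: string2ntuple_checked and ntuple2string are names I made up, and erroring on over-long input (instead of silently truncating) is my own choice. It doesn’t do anything about the slow Foo[data].

```julia
# Sketch only: string2ntuple_checked and ntuple2string are hypothetical names.

# Encode a string as a zero-padded NTuple{len, UInt8}, refusing to truncate.
function string2ntuple_checked(s::AbstractString, len::Integer)
    bytes = codeunits(String(s))   # UTF-8 code units, each one a UInt8
    n = length(bytes)
    n <= len || throw(ArgumentError("string needs $n bytes, but only $len are available"))
    return ntuple(i -> i <= n ? bytes[i] : 0x00, len)
end

# Decode again, stopping at the first 0x00 so the padding is dropped.
function ntuple2string(t::NTuple{N, UInt8}) where {N}
    n = 0
    while n < N && t[n + 1] != 0x00
        n += 1
    end
    return String(UInt8[t[i] for i in 1:n])
end

tag = string2ntuple_checked("test_tag", 32)
tag isa NTuple{32, UInt8}   # true
ntuple2string(tag)          # "test_tag"
```

Since every element comes from codeunits or is the literal 0x00, no Char ever ends up in the tuple, so the constructor has nothing to convert.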