Using large NTuples makes Julia hang

OK, while checking the original example from the OP with a primitive type, I noticed that your (@Maurizio_Tomasi) string2ntuple function is causing half the ruckus. It returns a large mixed tuple instead of an NTuple{N, UInt8}. This forces the compiler to insert conversions in your constructor, which probably kills performance when creating Foo. Julia strings are UTF-8 encoded, and a Char is a Unicode code point (which may need more than one byte), not a UInt8:

julia> s1 = string2ntuple("test_tag", 32)
(0x74, 0x65, 0x73, 0x74, 0x5f, 0x74, 0x61, 0x67, '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0')

The tuple that’s returned above is a Tuple{UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char, Char}, which isn’t the same as NTuple{32, UInt8}.
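A quick way to see the mismatch is to test the tuple against NTuple{32, UInt8} with `isa`. The sketch below uses a hypothetical `mixed_string2ntuple` that pads with the Char `'\0'`. It is only a guess at the kind of definition that yields the output above, not the original function from the thread, and `bytes_string2ntuple` is likewise just an illustrative name.

```julia
# Hypothetical reconstruction: pads with the Char '\0' instead of the byte 0x00,
# so the result mixes UInt8 and Char elements.
mixed_string2ntuple(s::String, len) =
    ntuple(i -> i <= ncodeunits(s) ? codeunit(s, i) : '\0', len)

# Pads with a UInt8 zero, so every element has the same type.
bytes_string2ntuple(s::String, len) =
    ntuple(i -> i <= ncodeunits(s) ? codeunit(s, i) : 0x00, len)

mixed = mixed_string2ntuple("test_tag", 32)
bytes = bytes_string2ntuple("test_tag", 32)

println(mixed isa NTuple{32, UInt8})  # false: the padding elements are Char
println(bytes isa NTuple{32, UInt8})  # true
println(sizeof(Char), " vs ", sizeof(UInt8))  # a Char takes 4 bytes, a UInt8 takes 1
```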

With that fixed, creating Foo works like a charm:

julia> struct Foo
           id::UInt64
           tag::NTuple{32, UInt8}
           start_comment::NTuple{4096, UInt8}
           end_comment::NTuple{4096, UInt8}
       end

julia> string2ntuple(s::String, len) = ntuple(i -> i <= ncodeunits(s) ? UInt8(codeunit(s, i)) : 0x0, len)
string2ntuple (generic function with 1 method)

julia> @time data = Foo(
               0,
               string2ntuple("test_tag", 32),
               string2ntuple("start_comment", 4096),
               string2ntuple("end_comment", 4096),
           ); # silence output
  0.009649 seconds (2.93 k allocations: 257.717 KiB, 96.70% compilation time)

julia> @time data = Foo(
               0,
               string2ntuple("test_tag", 32),
               string2ntuple("start_comment", 4096),
               string2ntuple("end_comment", 4096),
           ); # silence output, second run so compilation is done
  0.000276 seconds (9 allocations: 89.016 KiB)

julia> data.tag |> collect |> String
"test_tag\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
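The round trip above keeps the zero padding in the resulting string. If the padding is unwanted, one option is to stop at the first zero byte. This is a minimal sketch; `ntuple2string` is a hypothetical helper name, and it assumes the stored text never contains an embedded NUL byte, since anything after the first one would be dropped.

```julia
# Hypothetical helper: decode a zero-padded byte tuple back into a String,
# stopping at the first 0x00 byte (or using the whole tuple if there is none).
function ntuple2string(t::NTuple{N, UInt8}) where {N}
    bytes = collect(t)                   # Vector{UInt8} copy of the tuple
    stop = findfirst(iszero, bytes)      # index of the first padding byte, if any
    n = stop === nothing ? N : stop - 1  # number of bytes to keep
    return String(bytes[1:n])
end

padded = ntuple(i -> i <= 8 ? codeunit("test_tag", i) : 0x00, 32)
println(ntuple2string(padded))
```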

But creating the array is still running as I write this (this step also didn’t work with your version of string2ntuple):

julia> @time dataset = Foo[data]

I’ll see what it reports once it’s done, but I’d say there’s a bug here. I’m also not sure whether the Char -> UInt8 conversion should take this much time with your original string2ntuple; there may be a better way to write that function in the first place.
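As one possible alternative, here is a sketch that fills a zeroed byte buffer and converts it to a tuple. Unlike the one-liner above, which silently truncates strings longer than `len`, it raises an error when the string does not fit. The name `string2ntuple_checked` is hypothetical, and I have not benchmarked it against the `ntuple` version, so treat it as an illustration rather than a recommendation.

```julia
# Hypothetical variant: copy the UTF-8 bytes into a zero-filled buffer,
# then turn the buffer into a tuple of UInt8.
function string2ntuple_checked(s::String, len::Integer)
    units = codeunits(s)  # the string's UTF-8 bytes as an AbstractVector{UInt8}
    length(units) <= len ||
        throw(ArgumentError("string needs $(length(units)) bytes, but only $len are available"))
    buf = zeros(UInt8, len)  # zero padding comes for free
    copyto!(buf, units)      # overwrite the leading bytes with the string data
    return Tuple(buf)        # a tuple with len elements, all UInt8
end

tag = string2ntuple_checked("test_tag", 32)
println(tag isa NTuple{32, UInt8})
```

Note that the length check counts bytes, not characters, so a string with non-ASCII characters needs more room than its character count suggests.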
