Bikeshedding: Name of package with AbstractStrings based on NTuples

I’m in the process of creating a new package for strings based on NTuples. It is similar in concept to InlineStrings.jl, but it uses NTuple{N,UInt8} for storage instead of primitive types, which allows for more variation in N. In part, it is intended for interop with C, where fixed-size strings are sometimes represented by char str[N].
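
To make the idea concrete, here is a minimal sketch of a string type backed by NTuple{N,UInt8}. The name TupleBackedString is hypothetical and this is not the package's actual implementation; it assumes UTF-8 code units and simply delegates decoding to String, which is slow but keeps the example short.

```julia
# Hypothetical illustration type, not the package's real implementation.
struct TupleBackedString{N} <: AbstractString
    data::NTuple{N,UInt8}   # raw UTF-8 code units, stored inline
end

# Copy the code units of an existing string into a tuple.
TupleBackedString(s::AbstractString) =
    TupleBackedString(Tuple(codeunits(String(s))))

# Convert back to an ordinary String (allocates).
Base.String(s::TupleBackedString) = String(UInt8[s.data...])

# Minimal AbstractString interface, delegating decoding to String.
Base.ncodeunits(::TupleBackedString{N}) where {N} = N
Base.codeunit(::TupleBackedString) = UInt8
Base.codeunit(s::TupleBackedString, i::Int) = s.data[i]
Base.isvalid(s::TupleBackedString, i::Int) = isvalid(String(s), i)
Base.iterate(s::TupleBackedString, i::Int=1) = iterate(String(s), i)

t = TupleBackedString("h\u00e9llo")   # "héllo": 5 characters, 6 code units
```

Because the only field is a tuple of UInt8, the type is isbits, so it has the same memory layout as a C char[N] field.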

Currently, I have called this package NStrings.

Others have proposed

  • TupleStrings
  • StaticStrings

Do you have another idea? Please let me know below. Let the bikeshedding begin [1].

[1] Tone is hard to convey in text. I’m being jovial. That said, I would earnestly like to hear your ideas.

StaticCharStrings or CharStrings?

Why Char? The tuples do not consist of Chars.

I mean…

UInt8 and C char are the same thing, and I doubt you’re going to use this for C Unicode (if it were Unicode, it wouldn’t be char str[N] in C).

CCharTuples?

C interop is a particular application, but this is meant to cover Julia strings in general.

julia> str = "This is a Julia string with multiple NUL bytes \0\0\0\0"
"This is a Julia string with multiple NUL bytes \0\0\0\0"

julia> NString(Tuple(codeunits(str)))
"This is a Julia string with multiple NUL bytes \0\0\0\0"

julia> NString(Tuple(codeunits(str))) == str
true

julia> length(str)
51

julia> Tuple(codeunits(str))
(0x54, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, 0x61, 0x20, 0x4a, 0x75, 0x6c, 0x69, 0x61, 0x20, 0x73, 0x74, 0x72, 0x69, 0x6e, 0x67, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x6d, 0x75, 0x6c, 0x74, 0x69, 0x70, 0x6c, 0x65, 0x20, 0x4e, 0x55, 0x4c, 0x20, 0x62, 0x79, 0x74, 0x65, 0x73, 0x20, 0x00, 0x00, 0x00, 0x00)

I disagree, I think you’d definitely want the storage to consist of UTF-8 code units so that it can store unicode.

This is also important for C interoperability, since C char* strings these days are typically UTF-8-encoded Unicode; i.e., a char is a code unit, not a Char (Unicode code point). (This is basically true everywhere but Windows, and even on Windows people are starting to use UTF-8 more.)
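
A short sketch of the code unit vs. Char distinction, using only Base functions. The example string is my own choice for illustration.

```julia
s = "na\u00efve"              # "naïve"; the ï is written as an escape

n_chars = length(s)           # number of Chars (code points): 5
n_units = ncodeunits(s)       # number of UTF-8 code units (bytes): 6

# The ï occupies two code units, 0xc3 and 0xaf.
ï_bytes = (codeunit(s, 3), codeunit(s, 4))

# A tuple of code units is what a char str[6] would hold on the C side.
t = Tuple(codeunits(s))
```

So a tuple-backed string needs N to be the number of code units, not the number of characters.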

UTF8TupleStrings?

Julia’s ordinary String type is UTF-8, so I think it’s reasonable to assume that this is the default for any new string type and doesn’t need to be in the type name unless you’re using a different encoding.

I’d advocate in favour of StaticStrings, because I think it’s really informative.

When I see the name StaticStrings, I immediately associate it with packages like StaticArrays.jl, and this is essentially the StaticArray version of a string anyways.

NStrings.jl on the other hand is a really uninformative name as far as I’m concerned. All strings have N elements. What’s important and useful about these strings is not N, but the fact that they’re static.

Isn’t this essentially the same thing/type as FixedSizeStrings.jl (JuliaComputing/FixedSizeStrings.jl on GitHub)? That also seems like a good name :wink:

Yes, although FixedSizeStrings.jl seems to assume that a Char corresponds to a single code unit, which is wrong for UTF-8.

julia> s = "สวัสดี"
"สวัสดี"

julia> using NStrings, FixedSizeStrings

julia> NString(s)
"สวัสดี"

julia> FixedSizeString(s)
ERROR: InexactError: trunc(UInt8, 3626)
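
The number in that error message is the code point of the first character. I have not checked the FixedSizeStrings.jl internals, so the following is only a sketch of what the message suggests is happening: a whole code point is being narrowed to UInt8, instead of the string being split into UTF-8 code units.

```julia
c = '\u0e2a'                     # Thai letter 'ส', first Char of the string above
@assert codepoint(c) == 3626     # same number as in the error message

# Narrowing the code point to one byte fails, since 3626 > 255.
threw = try
    UInt8(codepoint(c))
    false
catch err
    err isa InexactError
end

# Storing UTF-8 code units instead always fits in UInt8.
s = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"   # "สวัสดี"
n_chars = length(s)              # 6 code points
n_units = ncodeunits(s)          # 18 bytes, three per code point here
```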

Another approach would be merging with an existing package. Does anyone have any opinions on that?

I think you might be able to cover both representations by using an additional type parameter for the encoding, see for example this implementation

In this case we cover both UTF8 and ASCII string types. BTW thanks for making this package, maybe Zarr.jl can use it so we don’t have to maintain our own implementation there anymore.
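
For anyone who hasn’t seen the pattern, a rough sketch of an encoding type parameter could look like the following. All names here (Encoding, ASCIIEncoding, UTF8Encoding, EncodedTupleString, encoded, check) are hypothetical and are not taken from Zarr.jl or from the package under discussion.

```julia
# Hypothetical names throughout; a sketch, not an existing API.
abstract type Encoding end
struct ASCIIEncoding <: Encoding end
struct UTF8Encoding <: Encoding end

# N code units of storage, tagged with the encoding E.
struct EncodedTupleString{N,E<:Encoding}
    data::NTuple{N,UInt8}
end

# Validate the input against the requested encoding.
check(::Type{ASCIIEncoding}, s::String) =
    isascii(s) || throw(ArgumentError("not ASCII: $(repr(s))"))
check(::Type{UTF8Encoding}, s::String) =
    isvalid(s) || throw(ArgumentError("invalid UTF-8"))

# Build a tagged tuple string from an ordinary String.
function encoded(::Type{E}, s::String) where {E<:Encoding}
    check(E, s)
    data = Tuple(codeunits(s))
    return EncodedTupleString{length(data),E}(data)
end

a = encoded(ASCIIEncoding, "abc")
u = encoded(UTF8Encoding, "na\u00efve")
```

With this layout the encoding is part of the type, so methods can dispatch on it, for instance to index by byte when E is ASCIIEncoding.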

Could this be considered a rewrite of FixedSizeStrings.jl? If so, that’s a pretty good name.

UTF-8 is a backwards compatible superset of ASCII, so StaticStrings.jl does support both just as Julia Strings do.
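
A quick check of that claim with Base functions only: for pure ASCII text, the UTF-8 code units are exactly the ASCII byte values, one per character.

```julia
s = "abc"

ascii_bytes = UInt8[0x61, 0x62, 0x63]     # ASCII values of 'a', 'b', 'c'
same_bytes = codeunits(s) == ascii_bytes  # UTF-8 encoding is byte-identical

# One code unit per character, and every byte is below 0x80.
one_to_one = ncodeunits(s) == length(s)
all_low = all(b -> b < 0x80, codeunits(s))
```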