Is there a way to make a SharedArray of fixed-length Strings?

If you do SharedArray{AbstractString}(10, 1)

you get the following error:

ERROR: ArgumentError: type of SharedArray elements must be bits types, got AbstractString
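For context: the error comes from SharedArray only accepting element types for which isbitstype returns true, i.e. immutable values stored inline with no pointers to heap data. A quick check in plain Julia (no packages needed) shows why String and AbstractString are rejected while fixed-size tuples of bytes or Chars would pass:

```julia
# SharedArray requires isbitstype(T) == true for its element type T.
@show isbitstype(String)             # false: a String refers to heap-allocated data
@show isbitstype(AbstractString)     # false: abstract types are never bits types
@show isbitstype(NTuple{10,UInt8})   # true: 10 bytes stored inline
@show isbitstype(NTuple{10,Char})    # true: 10 Chars stored inline
```

So an element type such as NTuple{10,UInt8} should be accepted by SharedArray; that is the basic idea behind the fixed-length string types discussed below.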

Is there a package for making fixed-length strings so that they can be used in SharedArrays?

Thanks!


Edit: Is there a way to make it work with LegacyStrings.jl?

What about an SVector{N, Char} from StaticArrays.jl?

julia> using StaticArrays

julia> isbitstype(SVector{10, Char})
true

How do you treat that as a string though?

And is there a way to build in rpad()-style padding?

julia> s = SVector{4,Char}(b"abcd")
4-element SVector{4,Char}:
 'a'
 'b'
 'c'
 'd'

julia> convert(String, s)
"abcd"

As for padding, just fill the remaining characters with spaces?
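A rough Base-only sketch of that padding idea, assuming an NTuple{N,Char} (the same storage an SVector{N,Char} wraps) with spaces as padding. The helper names tochars and fromchars are hypothetical, not from any package:

```julia
# Hypothetical helper: store a string as exactly N Chars (UTF-32 style),
# right-padded with spaces like rpad, in an isbits NTuple{N,Char}.
function tochars(s::AbstractString, ::Val{N}) where {N}
    chars = collect(s)                          # one Char per character
    length(chars) <= N || throw(ArgumentError("string has more than $N characters"))
    padded = vcat(chars, fill(' ', N - length(chars)))
    return ntuple(i -> padded[i], Val(N))
end

# Hypothetical inverse: join the Chars and drop the padding spaces.
# Caveat: this also drops trailing spaces that were part of the original string.
fromchars(t::NTuple{N,Char}) where {N} = String(rstrip(join(t)))

t = tochars("l♡ve", Val(6))
println(fromchars(t))
```

If you prefer StaticArrays, SVector{N,Char}(t) should wrap the same tuple. The main catch with space padding is that it cannot distinguish padding from genuine trailing spaces.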


Might be nice to have a package implementing a fixed-length SString{n,T} <: AbstractString type on top of SVector{n,T} (supporting both T = Char for UTF-32 and T = UInt8 for ASCII). If you don’t need to support strings containing NUL, then you could also use NUL-padding to give a string of length ≤ n with fixed-length storage.
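A minimal sketch of the NUL-padding variant, using a plain NTuple{N,UInt8} instead of an SVector and storing UTF-8 bytes. The type name NulPadded and its methods are hypothetical, not part of any package:

```julia
# Hypothetical type: up to N bytes of UTF-8 stored inline; unused bytes are 0x00.
struct NulPadded{N}
    data::NTuple{N,UInt8}
end

# Copy the string's bytes and fill the rest with NUL.
function NulPadded{N}(s::String) where {N}
    n = ncodeunits(s)
    n <= N || throw(ArgumentError("string needs $n bytes, capacity is $N"))
    '\0' in s && throw(ArgumentError("NUL padding cannot represent a string containing NUL"))
    return NulPadded{N}(ntuple(i -> i <= n ? codeunit(s, i) : 0x00, Val(N)))
end

# The logical length is the number of bytes before the first NUL.
function Base.String(p::NulPadded{N}) where {N}
    n = 0
    while n < N && p.data[n + 1] != 0x00
        n += 1
    end
    return String(UInt8[p.data[i] for i in 1:n])
end
```

Compared with storing an explicit length, NUL padding keeps the storage at exactly N bytes, at the cost of rejecting strings that contain '\0' and scanning for the first NUL to recover the length.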


Can you guys take a look at this package I wrote up?

It’s probably more of a proof of concept than an efficient design at this point.

// I basically just started with @stevengj’s LaTeXStrings.jl and worked my way backward.


There is also: https://github.com/JuliaComputing/FixedSizeStrings.jl

Well damn… Can anyone discuss the pros & cons of the two implementations?

I’m cool with letting the package I wrote die; I just want to know it’s for the right reasons.

Well, yours doesn’t really seem to be optimized for performance since you don’t properly propagate the fixed size.
Also, this is a pretty crazy use of parse and eval:

You do realize that you can just do SVector{ncodeunits(str), UInt8}(Vector{UInt8}(str))? :slight_smile:
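One caveat with that one-liner: the static length has to be the number of bytes (ncodeunits), not the number of characters (length), because non-ASCII characters take several bytes in UTF-8. A Base-only sketch with a plain NTuple (tobytetuple is a made-up name for illustration):

```julia
# Base-only analogue of the SVector one-liner: a tuple of the string's UTF-8 bytes.
# The tuple length must be ncodeunits(s), the byte count, not length(s).
tobytetuple(s::String) = ntuple(i -> codeunit(s, i), ncodeunits(s))

s = "l♡ve"
@show length(s)        # 4 characters
@show ncodeunits(s)    # 6 bytes: '♡' takes 3 bytes in UTF-8
@show tobytetuple(s)
```

Note that ntuple with a run-time count like this is not type-stable, since the tuple length is part of its type.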

In general, I always found it useful to have a fixed size string, that looks more like this:

struct FString{N}
    length::Int
    data::NTuple{N, UInt8}
end

That way one can have a Vector{FString{32}} holding strings of different lengths (up to 32 bytes each), which seems to be a shortcoming of FixedSizeStrings.
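A minimal Base-only sketch that fills in the FString{N} layout above with a constructor and a conversion back to String. These methods are assumptions for illustration, not the code from the fixed_strings.jl prototype:

```julia
# N bytes of inline storage plus the number of bytes actually in use.
struct FString{N}
    length::Int               # bytes in use (<= N)
    data::NTuple{N,UInt8}     # UTF-8 bytes, zero-filled past `length`
end

# Hypothetical constructor: copy a String's bytes into the fixed buffer.
function FString{N}(s::String) where {N}
    n = ncodeunits(s)
    n <= N || throw(ArgumentError("string needs $n bytes, capacity is $N"))
    return FString{N}(n, ntuple(i -> i <= n ? codeunit(s, i) : 0x00, Val(N)))
end

# Convert back by copying only the bytes in use.
Base.String(f::FString) = String(UInt8[f.data[i] for i in 1:f.length])

v = [FString{32}(s) for s in ["foo", "blärg", "l♡ve"]]
println(String.(v))
```

Every FString{32} has the same size regardless of its length field, so a Vector{FString{32}} stores all data inline, and FString{32} should also pass SharedArray's isbits check.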

I made a prototype for such a type for my GPU blog post: fixed_strings.jl

None of these packages support UTF-8, so a package that offered that would be nice.

In general, I recommend just extending the existing package with more functionality, especially when both packages are still that small!


I wholeheartedly agree with this. There was just no way to find your code?

It seems like you only came forward because of Cunningham’s Law:

“The best way to get the right answer on the Internet is not to ask a question, it’s to post the wrong answer.”


What do you think the best route is moving forward?

I just don’t really like writing pointless code and wasting time.

// And judging from “ANN: HigherPrecision” and “Pruning and quality control for the package ecosystem”, this kind of thing seems to happen to Julians all the time :frowning:


There was just no way to find your code?

Sorry :wink:
I meant extending FixedSizeStrings, which isn’t my package. I should probably contribute my code to that package as well :slight_smile:

You could figure out what’s missing in FixedSizeStrings for your use case and make a PR; feel free to ping me for guidance :slight_smile: I should also figure out what I could add…

Is there a definitive solution for SharedArrays of fixed-length strings?

I ran into a use case for symbols recently, too.

This may be of interest: https://github.com/JuliaComputing/FixedSizeStrings.jl/pull/7

ShortStrings.jl should work if your strings are relatively short.
It is super fast for short strings but becomes much slower the longer the string is. I would not use it for strings over 255 bytes at the most.
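A rough sketch of the idea of packing a short string's bytes into one unsigned integer, which as far as I know is the general approach ShortStrings.jl takes (its actual bit layout may well differ; the 15-byte UInt128 layout and the names pack15/unpack15 are hypothetical):

```julia
# Hypothetical layout: up to 15 bytes in the high bytes of a UInt128
# (first byte most significant), byte count in the lowest byte.
function pack15(s::String)
    n = ncodeunits(s)
    n <= 15 || throw(ArgumentError("at most 15 bytes fit, got $n"))
    x = UInt128(n)
    for i in 1:n
        x |= UInt128(codeunit(s, i)) << (128 - 8i)   # byte i -> bits (128-8i):(135-8i)
    end
    return x
end

function unpack15(x::UInt128)
    n = Int(x & 0xff)                                 # byte count from the lowest byte
    return String(UInt8[(x >> (128 - 8i)) % UInt8 for i in 1:n])
end

println(unpack15(pack15("l♡ve")))
```

Putting the first byte in the most significant position means that comparing two packed values as integers gives the same order as comparing the strings byte by byte, so operations like sorting reduce to integer comparisons.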


Note that if you want a fixed number of bytes, then you can use a StringView of an SVector:

julia> using StringViews, StaticArrays

julia> s = StringView(SVector{4,UInt8}(b"abcd"))
"abcd"

It is probably more useful to have a fixed upper bound on the number of bytes, in which case you can use a StringView of a SubArray of an SVector.

julia> s = StringView(@view SVector{8,UInt8}(b"abcd    ")[1:4])
"abcd"

For example, given an array a of strings, you can convert it to an array of strings with fixed-length inline (isbits) storage via:

julia> function tofixedstr(s::String, nbytes)
           n = ncodeunits(s)
           StringView(@view SVector{nbytes}(codeunits(s * ' '^(nbytes-n)))[Base.OneTo(n)])
       end
tofixedstr (generic function with 1 method)

julia> function tofixedstr(a::AbstractVector{String})
           npad = maximum(ncodeunits, a)
           tofixedstr.(a, npad)
       end
tofixedstr (generic function with 2 methods)

julia> a = ["foo", "blärg", "l♡ve"]
3-element Array{String,1}:
 "foo"
 "blärg"
 "l♡ve"

julia> b = tofixedstr(a)
3-element Array{StringView{SubArray{UInt8,1,SArray{Tuple{6},UInt8,1,6},Tuple{Base.OneTo{Int64}},true}},1}:
 "foo"
 "blärg"
 "l♡ve"

Note that this is a bits type, so all of the data is stored inline, as needed for SharedArray:

julia> isbits(b[1])
true

julia> println(reinterpret(UInt8, b))
UInt8[0x66, 0x6f, 0x6f, 0x20, 0x20, 0x20, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x62, 0x6c, 0xc3, 0xa4, 0x72, 0x67, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x6c, 0xe2, 0x99, 0xa1, 0x76, 0x65, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

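(Note that 0x20 is the ' ' padding byte in each underlying SVector. Each element takes 32 bytes: the 6 SVector bytes, 2 bytes of alignment padding, and three Ints stored by the SubArray: the OneTo stop (0x03 or 0x06), the offset (0) and the stride (1).)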
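To see where the 32 bytes per element come from, here is an illustrative Base-only struct with the same field sizes as one element of b. LayoutDemo is a made-up stand-in, not the actual StringView/SubArray type:

```julia
# Stand-in with the same field sizes as StringView{SubArray{UInt8,1,SVector{6,UInt8},...}}.
struct LayoutDemo
    bytes::NTuple{6,UInt8}   # the inline SVector{6,UInt8} data
    stop::Int                # Base.OneTo stop (3 or 6 in the dump)
    offset1::Int             # SubArray offset (0 in the dump)
    stride1::Int             # SubArray stride (1 in the dump)
end

@show isbitstype(LayoutDemo)
@show fieldoffset(LayoutDemo, 2)   # 8 on 64-bit: 2 alignment bytes follow the 6 data bytes
@show sizeof(LayoutDemo)           # 32 on 64-bit
```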

All the usual string operations should work, e.g.

julia> sort(b)
3-element Array{StringView{SubArray{UInt8,1,SArray{Tuple{6},UInt8,1,6},Tuple{Base.OneTo{Int64}},true}},1}:
 "blärg"
 "foo"
 "l♡ve"