Do we have a string type that wraps a (sub-)vector of bytes in some package? Something a bit safer than WeakRefString (so not pointer based)?
julia> String(rand(UInt8, 16))
"9#N\xab>x\xe2VD\xa2\x84-B\r\x15\xbd"
From the question, I would guess that the following behavior is not desired:
julia> a = rand(UInt8, 16);
julia> b = String(a)
"-\xfb\xe2d\x922]\xbc\xac\xc2\xf1\xa6\xa7F3;"
julia> a
0-element Array{UInt8,1}
but more something like a mutable String
mutable struct MutableString{A <: AbstractVector{<:UInt8}}
    v::A
end
But the point is that an AbstractString must not be mutable.
So you should rather write:
a = rand(UInt8, 16)
b = String(copy(a))
I think.
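A small sketch of the difference, assuming the documented behavior that String(v) takes over a Vector{UInt8} and truncates it to zero length (the byte values are made up for illustration):

```julia
# Illustrative bytes spelling "hi"; any Vector{UInt8} behaves the same way.
a = UInt8[0x68, 0x69]

# Copy first: the String owns the copy, so `a` keeps its contents.
b = String(copy(a))
println(b, " / a still has ", length(a), " bytes")

# No copy: String takes over the buffer and `a` is truncated.
c = String(a)
println(c, " / a now has ", length(a), " bytes")
```

The copy costs one allocation per string, which is exactly what the original question is trying to avoid.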
Perhaps the original poster is looking for codeunits.
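For reference, a quick sketch of what codeunits gives you: a read-only byte-vector wrapper around an existing string, without copying the data. The sample string is arbitrary.

```julia
s = "héllo"              # 'é' takes two bytes in UTF-8

cu = codeunits(s)        # read-only AbstractVector{UInt8} wrapping s
println(typeof(cu))
println(length(cu))      # number of code units (bytes), not characters
println(cu[1] == 0x68)   # first byte is 'h'
```

Note that this goes from string to bytes; it does not wrap an existing byte array as a string.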
Correct - this is for decoding some binary data with embedded strings (representing C++ type information, it’s ROOT files with custom streamers). I want to check what’s in these strings with minimum memory allocation and copies, so I’d like to just wrap a SubArray into something that implements an AbstractString. But the string may be passed around a bit (in a very limited fashion), so I’d like to avoid WeakRefString.
I’m kinda looking for the reverse of codeunits: a copy-free way to wrap a sub-array into something string-like, without invalidating the original array.
If I understand it correctly, this is deliberately disallowed because strings are treated as immutable. It would not be hard to create an AbstractVector with the behavior you are talking about.
Perhaps you could describe a bit about what you are trying to achieve? It seems to me that you could just go ahead and do whatever you like with a Vector{UInt8} and later call String on it. This is only inadequate if you need to repeatedly manipulate the string. Unicode gets a bit tricky, but again, it depends on what you are trying to do.
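If the goal is only to check what an embedded string contains, one option is to compare the bytes directly and never build a String at all. A minimal sketch, where the buffer contents, the offsets, and the helper name bytes_match are all made up for illustration:

```julia
# Hypothetical buffer with a type name embedded at bytes 3:7.
buf = Vector{UInt8}("\x01\x05TTree\x00\x00")

# Hypothetical helper: does this byte range spell out `expected`?
bytes_match(bytes::AbstractVector{UInt8}, expected::AbstractString) =
    bytes == codeunits(expected)

field = view(buf, 3:7)           # SubArray, shares memory with buf
println(bytes_match(field, "TTree"))
println(bytes_match(field, "TList"))
println(length(buf))             # buf is untouched
```

This only covers exact byte equality; anything that needs real string operations would still want a string-like wrapper.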
It should be possible to create a StringView type that takes a region of a byte array and wraps it to act like a string. One would want to reuse much of the code for String and SubString{String}, which is non-trivial, so the ideal way to do this might require a little refactoring of the code to make it possible to reuse, or it might be easier to add StringView to Base and add it to the dispatch on the relevant methods.
Sure, normally we definitely want Strings to be immutable. However, this is for a binary parsing application, and I’m checking the contents of strings embedded in a buffer that will be destroyed afterwards. In the interest of performance, I don’t want to turn these embedded strings into actual Strings, as that would result in unnecessary memory allocation.
WeakRefString has been created for scenarios like this, but in my case I don’t want something pointer based, at least not explicitly (I may use UnsafeArrays at a higher level). Of course it’s not hard to create something similar that is backed by an AbstractVector{UInt8} instead of a Ptr{UInt8} - I just wondered if someone had already done so in some package, to avoid duplication of work.
Kinda like that, yes. I just wondered if anybody had already done something like it. Basically something like
struct StringView{BV<:AbstractVector{<:UInt8}}
    data::BV
end
When used with an Array or SubArray, it would be completely safe (GC-wise), but immutability of the string’s content would not be guaranteed. When used with an UnsafeArray, it would be an allocation-free bitstype like WeakRefString, but still guarded by the automatic GC.@preserve of UnsafeArrays.@uviews.
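A rough sketch of how such a wrapper could plug into the AbstractString interface. To keep it short, it assumes single-byte characters (each byte is mapped to one Char, which is only right for ASCII/Latin-1 data) and a 1-based byte vector; proper UTF-8 decoding is the non-trivial part that would need code reuse from Base. The name ByteStringView is hypothetical, chosen so it is not mistaken for any existing type.

```julia
# Hypothetical type: a string-like view over any byte vector, no copy, no pointers.
struct ByteStringView{BV<:AbstractVector{UInt8}} <: AbstractString
    data::BV
end

# Minimal AbstractString interface.
Base.ncodeunits(s::ByteStringView) = length(s.data)
Base.codeunit(::ByteStringView) = UInt8
Base.codeunit(s::ByteStringView, i::Int) = s.data[i]
# Assumption: one byte == one character, so every in-range index is valid.
Base.isvalid(s::ByteStringView, i::Int) = 1 <= i <= ncodeunits(s)

function Base.iterate(s::ByteStringView, i::Int=1)
    i > ncodeunits(s) && return nothing
    return (Char(s.data[i]), i + 1)
end

buf = Vector{UInt8}("xxhello worldxx")
sv = ByteStringView(view(buf, 3:7))   # wraps a SubArray of buf

println(sv == "hello")
println(startswith(sv, "he"))
println(length(buf))                  # the original array is still intact

buf[3] = UInt8('j')                   # content is NOT guaranteed immutable
println(String(sv))
```

Generic fallbacks such as ==, startswith and String(...) work through iteration here, so they are correct but not necessarily fast; the mutation at the end shows the caveat about immutability.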
Update: There is now a package for this on GitHub, JuliaStrings/StringViews.jl: String-like views of arbitrary Julia byte arrays.
It has some code duplication with Base, but it would be possible to reduce this significantly by a tiny amount of refactoring in a future Julia version.
Thanks a lot, @stevengj!