Constructing String(x::Vector{UInt8}) empties the vector?

data = Vector{UInt8}("hello, world!")
@assert(!isempty(data))
String(data)
@assert(!isempty(data)) # boom!

Is this expected?

I found this issue: https://github.com/JuliaLang/julia/issues/24388, which was eventually resolved by copying the data when converting a string to a vector. Perhaps constructing a string from a vector should also copy?
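For reference, here is a minimal sketch of the string-to-vector direction mentioned above, assuming Julia 1.x, where Vector{UInt8}(s) returns a fresh copy of the string's bytes. The variable names are only illustrative.

```julia
s = "hello"

# Converting a String to a Vector{UInt8} copies the bytes.
bytes = Vector{UInt8}(s)

# Mutating the copy leaves the original string untouched.
bytes[1] = UInt8('j')

# String(copy(bytes)) builds a new string without emptying `bytes`.
changed = String(copy(bytes))
```

After this runs, s is still "hello", while changed holds the modified bytes.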

Actually, after reading that thread more thoroughly, I see that Jeff Bezanson suggested (and implemented?) precisely this behaviour in that same issue ("Conversion of String to Vector{UInt8}", JuliaLang/julia #24388).

So I guess this is “by design” because Julia doesn’t have move().

Still, it’s mind-blowing that the String constructor is allowed to do that.

I agree this is surprising, but it is explicitly documented:

help?> String
search: String string StringIndexError Cstring Cwstring bitstring SubString include_string setrounding unsafe_string AbstractString escape_string unescape_string AbstractUnitRange SubstitutionString setprecision AbstractIrrational

  String(v::AbstractVector{UInt8})

  Create a new String object from a byte vector v containing UTF-8 encoded characters. If v is Vector{UInt8} it will be truncated to zero length and future modification of v cannot affect the contents of the resulting string. To avoid
  truncation of Vector{UInt8} data, use String(copy(v)); for other AbstractVector types, String(v) already makes a copy.
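A short sketch of what that docstring describes, assuming Julia 1.x semantics as quoted above. The variable names are made up for illustration.

```julia
data = Vector{UInt8}("hello, world!")

# Copying first keeps `data` intact.
kept = String(copy(data))
len_after_copy = length(data)

# Other AbstractVector{UInt8} types, such as a view, are copied by String itself.
prefix = String(@view data[1:5])
len_after_view = length(data)

# Passing the Vector{UInt8} directly hands its bytes to the string and truncates it.
taken = String(data)
len_after_take = length(data)
```

Only the last call empties data; the two earlier ones leave it at its full length.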

I guess it’s slightly out of line with convention. The proper name would be String!, and then there would be no confusion.

If you want a non-destructive string view of a byte array, another option is https://github.com/JuliaStrings/StringViews.jl

The problem is people will read this documentation after their tests fail in a weird way, not before (if they have tests, that is).

Hi. I am one of those people =) I agree that it should either behave as expected or be called String!. I don’t think it matters much that it is documented, since the behaviour is so surprising. Nobody would check the documentation of the string constructor before something fails.