While working on https://github.com/JuliaCollections/DataStructures.jl/pull/759, which I thought would be a funny solution to a Project Euler riddle, I noticed that String is not a subtype of AbstractVector{Char}. Are there good reasons against making AbstractString <: AbstractVector? Or even <: AbstractVector{AbstractChar}?
Not an answer, but this seems related to your last question: Vector{Int} <: Vector{Real} is false??? · JuliaNotes.jl
The <: operator sometimes behaves a little differently than expected with regard to element types…
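To make the point about element types concrete, here is a small sketch of how <: treats type parameters. The variable names are only for illustration; the behaviour shown is the standard invariance of parametric types in Julia.

```julia
# Element types can be subtypes of each other ...
elem_ok = Int <: Real                                # true

# ... but parametric types are invariant in their parameters.
invariant = Vector{Int} <: Vector{Real}              # false
covariant_query = Vector{Int} <: Vector{<:Real}      # true

# The same applies to the question asked above about characters.
char_exact = Vector{Char} <: AbstractVector{Char}              # true
char_abstract = Vector{Char} <: AbstractVector{AbstractChar}   # false
char_bounded = Vector{Char} <: AbstractVector{<:AbstractChar}  # true
```

So even if strings were vectors, AbstractVector{AbstractChar} would probably not be the supertype one wants; AbstractVector{<:AbstractChar} is the usual way to express "a vector of some character type".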
There are various reasons for considering strings something totally different from vectors of chars. For me, an important one is that vectors are mutable objects, whereas strings are not. Also, indexing the characters of a string the way one indexes a vector (i.e. x[i] with i being an integer) does not, in general, give the i-th character in strings with non-ASCII Unicode characters.
Another big problem with Strings compared to Vectors is that not all indices in the range begin:end are valid:
julia> a="∀b"
"∀b"
julia> a[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'∀', [4]=>'b'
Stacktrace:
[1] string_index_err(s::String, i::Int64)
@ Base ./strings/string.jl:12
[2] getindex_continued(s::String, i::Int64, u::UInt32)
@ Base ./strings/string.jl:233
[3] getindex(s::String, i::Int64)
@ Base ./strings/string.jl:226
[4] top-level scope
@ REPL[5]:1
(nextind(a,1) should be used instead)
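As a rough sketch of what walking the valid indices looks like, assuming a string that starts with a three-byte character (the name s is only for illustration):

```julia
s = "∀b"                      # '∀' takes 3 bytes in UTF-8, 'b' takes 1

first_idx = firstindex(s)     # 1
second_idx = nextind(s, first_idx)   # 4, the next valid index after 1

# isvalid tells you whether a byte index starts a character.
valid_flags = [isvalid(s, i) for i in 1:ncodeunits(s)]

# eachindex yields only the valid indices.
valid_indices = collect(eachindex(s))

second_char = s[second_idx]   # 'b'
```

Here valid_flags is [true, false, false, true] and valid_indices is [1, 4], which matches the "valid nearby indices" reported in the error message.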
Another difference: Strings are treated as scalars by broadcasting:
jl> println.("Hello, ", 1:7);
Hello, 1
Hello, 2
Hello, 3
Hello, 4
Hello, 5
Hello, 6
Hello, 7
vs
jl> println.(['H', 'e', 'l', 'l', 'o', ',', ' '], 1:7);
H1
e2
l3
l4
o5
,6
 7
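The same contrast can be seen without printing, by building strings instead. This is just a sketch using string and collect; the variable names are illustrative.

```julia
# A String is a scalar under broadcasting: it is reused for every element of 1:2.
as_scalar = string.("ab", 1:2)

# Collecting into a Vector{Char} makes broadcasting pair characters with numbers.
as_vector = string.(collect("ab"), 1:2)
```

Under these assumptions as_scalar is ["ab1", "ab2"], while as_vector is ["a1", "b2"]. If String were an AbstractVector, the first call would presumably have to behave like the second.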
Not necessarily: ranges like 1:10 are immutable AbstractVectors, as are SVectors from StaticArrays.jl and many other examples.
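A quick way to check this for ranges is sketched below. The helper try_setindex! is a hypothetical name introduced only for this example; it is not part of Base.

```julia
# Hypothetical helper: returns true if x[i] = v succeeds, false if it throws.
function try_setindex!(x, v, i)
    try
        x[i] = v
        return true
    catch
        return false
    end
end

r = 1:10
range_is_vector = r isa AbstractVector{Int}      # true: a range is an AbstractVector
range_settable = try_setindex!(r, 5, 1)          # false: ranges cannot be modified
vector_settable = try_setindex!([1, 2, 3], 5, 1) # true: a plain Vector can
string_settable = try_setindex!("abc", 'x', 1)   # false: no setindex! for String
```

So immutability alone does not rule out being an AbstractVector; it only means setindex! is unavailable.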
I think the main technical reason that AbstractString is not a subtype of AbstractVector is that the indices of a String are not necessarily consecutive, and in consequence there is no O(1) algorithm to "give me the n-th character of a string" (str[nextind(str, 0, n)] in Julia).
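The expression above can be wrapped in a small function to see the difference between byte indices and character positions. The name nthchar is made up for this sketch; it has to scan from the start of the string, so it takes time proportional to n.

```julia
# Hypothetical helper: the n-th character of str, found by stepping over
# n characters starting from index 0 (O(n), not O(1)).
nthchar(str::AbstractString, n::Integer) = str[nextind(str, 0, n)]

str = "∀b"
first_char = nthchar(str, 1)        # '∀'
second_char = nthchar(str, 2)       # 'b'
second_index = nextind(str, 0, 2)   # 4, not 2
```

A vector, by contrast, is expected to return its n-th element directly from index n.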
Conceptually, however, one rarely views a String as a collection of characters, in part because the concept of a "character" is itself ambiguous in Unicode. For example, "no\u00EBl" == "noël" and "noe\u0308l" == "noël" are canonically equivalent strings in Unicode, but the former has length 4 and the latter has length 5 (i.e., different numbers of codepoints, depending on whether a combining character is used to make the ë). For a similar reason, it's not generally useful to ask for the "n-th character of a string" where n is chosen at random.
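The two spellings can be compared directly. This sketch uses the Unicode standard library for normalization and grapheme iteration; the variable names are illustrative.

```julia
using Unicode

precomposed = "no\u00EBl"    # ë as a single codepoint
decomposed = "noe\u0308l"    # e followed by a combining diaeresis

len_pre = length(precomposed)    # 4 codepoints
len_dec = length(decomposed)     # 5 codepoints

# As sequences of codepoints the two strings differ ...
same_raw = precomposed == decomposed

# ... but they agree after normalizing to a common form (NFC here).
same_nfc = Unicode.normalize(decomposed, :NFC) == precomposed

# Counting user-perceived characters (graphemes) gives 4 for both.
graphemes_pre = length(Unicode.graphemes(precomposed))
graphemes_dec = length(Unicode.graphemes(decomposed))
```

Which of these counts is "the number of characters" depends on what you mean by a character, which is part of why a vector-of-characters view is awkward.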
(It is useful to be able to read from an index that was located previously, e.g. by a find function, but in that case the index is just an arbitrary position indicator in the string and you don't care how many characters it corresponds to.)
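For instance, an index returned by findfirst can be used as an opaque position, as in this sketch (the string and names are made up for illustration):

```julia
text = "∀x: b"

# findfirst returns a valid byte index, or nothing if there is no match.
pos = findfirst(isequal('b'), text)

found_char = text[pos]     # reading at the located index is cheap
tail = text[pos:end]       # slicing from it works as well

char_count = length(text)  # 5 characters, yet pos is 7
```

The value of pos is only meaningful as a position in this particular string; it does not say that 'b' is the seventh character.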
(Conversely, codeunits(str) for a string str does give a subtype of AbstractVector, but the elements of this array are not characters, but rather the elementary units of the Unicode encoding of the string: bytes, for String with its UTF-8 encoding.)
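A short sketch of what codeunits gives for a String, with illustrative variable names:

```julia
str = "∀b"
cu = codeunits(str)

is_byte_vector = cu isa AbstractVector{UInt8}   # true for String
n_units = length(cu)                            # 4 bytes for 2 characters
bytes = collect(cu)                             # the raw UTF-8 bytes

# Every integer index in 1:length(cu) is valid here, unlike for str itself.
second_byte = cu[2]
```

Here bytes is [0xe2, 0x88, 0x80, 0x62]: three bytes for '∀' and one for 'b'.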
Nit: immutable
Thanks, fixed.
Thanks for the update and quick response, it's very helpful.
Just wondered what is the best way to convert between string and vector of characters:
julia> s = "toto"
"toto"
julia> vc = collect(s)
4-element Vector{Char}:
't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
't': ASCII/Unicode U+0074 (category Ll: Letter, lowercase)
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
julia> s = String(vc)
"toto"
Those are good ways to do it. Note that collecting a string into a vector of characters can increase the storage by up to 4x, since each Char takes 4 bytes while an ASCII character takes 1 byte in a String.
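The storage difference can be checked with sizeof, as in this sketch for an ASCII-only string (names are illustrative):

```julia
s = "toto"
vc = collect(s)                 # Vector{Char}

bytes_string = sizeof(s)        # 4: one byte per ASCII character
bytes_vector = sizeof(vc)       # 16: four bytes per Char

# Both String(vc) and join(vc) rebuild the original string.
roundtrip_string = String(vc) == s
roundtrip_join = join(vc) == s
```

For strings with many multi-byte characters the ratio is smaller, since those characters already take 2 to 4 bytes each in UTF-8.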