String indexing

Hi all, I'm playing with Julia today, coming from some limited Python and R experience. How does Julia index from the end of a string, i.e. take everything from the end minus some value up to the end, but without failing when the string is shorter than that?

Example: strings = ["name", "verylargenameforname"]

Python example:
strings[0][-11:] will give me "name"
strings[1][-11:] will give me "nameforname"

R example:
library(tidyverse)
strings = c("name", "verylargenameforname")
str_sub(strings[1], start = -11)
str_sub(strings[2], start = -11)

Julia:
strings[1][end-10:end] → throws a BoundsError, since the string has fewer than 11 characters
strings[2][end-10:end] → "nameforname"


You may have to define a function to do this:

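julia> function f(s,n)
           try
               return s[end-n:end]   # last n+1 bytes (= characters for ASCII text)
           catch e
               # only fall back when the string is too short; rethrow anything else,
               # e.g. a StringIndexError from landing inside a multi-byte character
               e isa BoundsError || rethrow()
               return s
           end
       end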
f (generic function with 1 method)

julia> f("name",11)
"name"

julia> f("ultralargename",11)
"tralargename"

I guess you could avoid the try block by doing calculations on the indices and string length, but personally I’m afraid of dealing with codepoints and other string black magic.


Just to show a simple one-liner alternative (not that much black magic involved):

julia> strings[1][max(end-10,1):end]
"name"

julia> strings[2][max(end-10,1):end]
"nameforname"

julia> endstr(s,n) = s[max(end-n,1):end]
endstr (generic function with 1 method)

julia> endstr(strings[1],10)
"name"

julia> endstr(strings[2],10)
"nameforname"


Nice, I wish I had thought of using max 🤣

Be careful when indexing into strings:

julia> s = "αβγδϵhelloω"
"αβγδϵhelloω"

julia> s[end-10:end]
ERROR: StringIndexError("αβγδϵhelloω", 6)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:249
 [3] top-level scope at REPL[110]:1

This fails because string indices are byte (UTF-8 code unit) offsets, not character positions. α takes two bytes, so index 2 falls in the middle of a character:

julia> s[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> s[2]
ERROR: StringIndexError("αβγδϵhelloω", 2)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:220
 [3] getindex(::String, ::Int64) at ./strings/string.jl:213
 [4] top-level scope at REPL[114]:1
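A quick way to see which integers are valid indices into a String is to compare the character count with the byte count and list the valid indices. The sketch below uses only Base functions; the values in the comments hold for this particular string:

```julia
s = "αβγδϵhelloω"

# Six two-byte Greek letters and five one-byte ASCII letters.
@show length(s)        # 11 characters
@show ncodeunits(s)    # 17 bytes (UTF-8 code units)
@show lastindex(s)     # 16: the byte where the final character 'ω' starts

# Only byte offsets that start a character are valid indices.
@show collect(eachindex(s))   # [1, 3, 5, 7, 9, 11, 12, 13, 14, 15, 16]
@show isvalid(s, 6)           # false: byte 6 is inside 'γ'

# thisind rounds a byte offset down to the start of its character,
# which is why s[end-10:end] (i.e. s[6:16]) fails but s[5:16] would not.
@show thisind(s, 6)           # 5
```

So `end-10` means "10 bytes before the last index", not "10 characters before the end"; the two only coincide for ASCII text.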

I’m not completely confident about this code, but you could try:

julia> foo(str, n) = chop(str; head=max(0, length(str)-n), tail=0)

julia> foo("name", 7)
"name"

julia> foo("nameforname", 7)
"forname"

julia> foo("αβγδϵhelloω", 7)
"ϵhelloω"

If you want to use indexing in particular, you should look into the prevind and nextind functions:

julia> bar(str, n) = str[prevind(str, lastindex(str), min(length(str), n)-1):end]

julia> bar("name", 7)
"name"

julia> bar("nameforname", 7)
"forname"

julia> bar("αβγδϵhelloω", 7)
"ϵhelloω"
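To see what `bar` relies on: `prevind(s, i, n)` steps back `n` characters from index `i`, skipping continuation bytes. Once it passes the start of the string it does not error but keeps subtracting 1 per remaining step, which is why `bar` caps `n` at `length(str)`. A few checks, as a sketch (the last line shows clamping with `max(1, ...)` as one possible alternative to the `min`):

```julia
s = "αβγδϵhelloω"

# One step back from the last index lands on the start of 'o' (single byte).
@show prevind(s, lastindex(s))        # 15

# Six characters back from 'ω' (index 16) is the start of 'ϵ',
# skipping the continuation bytes of the two-byte characters.
@show prevind(s, lastindex(s), 6)     # 9

# Stepping past the start does not error: each extra step subtracts 1,
# so the result can be 0 or negative and is no longer a valid index.
@show prevind("name", 4, 3)           # 1
@show prevind("name", 4, 5)           # -1

# Clamping the start index to 1 keeps a too-short string whole.
@show "name"[max(1, prevind("name", 4, 6)):end]   # "name"
```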

Thanks, everyone! I was looking for the simplest solution, and Niclas's solution fits the bill. Still, I can see that having to write a function for such a common operation could be an obstacle for many newcomers. I understand that negative indexing is a big no in Julia for safety reasons, which makes sense. But there is a reason why Python and R have it built in: it makes for ease of use, interactive data work, and quick prototyping. There should be a simple way to grab elements starting from the end (and not just for strings). A middle-ground solution would be great, such as a module similar to R's stringr, if one doesn't exist already (I did not see this addressed in the Strs package).

Looking forward to playing with the language!

As I tried to demonstrate, that solution will not work for strings in general, because end-10 can land in the middle of a multi-byte character.

Note that there is a chop function which can remove a number of characters from the beginning or end of a string while handling Unicode characters correctly:

chop(s::AbstractString; head::Integer = 0, tail::Integer = 1)

Remove the first head and the last tail characters from s. The call chop(s) removes the last character from s. If it is requested to remove more characters than length(s) then an empty string is returned.
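A few calls illustrating that docstring; `head` and `tail` count characters, not bytes, so multi-byte text is handled safely:

```julia
# chop(s; head, tail) drops `head` characters from the front and
# `tail` characters from the back of s.
println(chop("hello"))                        # hell  (defaults: head = 0, tail = 1)
println(chop("hello"; head=1, tail=2))        # el
println(chop("αβγ"; head=1, tail=0))          # βγ
println(isempty(chop("hi"; head=3, tail=0)))  # true: removing more than length(s) gives ""
```

Passing `head = max(0, length(s) - n)` and `tail = 0`, as in `foo` above, never asks chop to remove more characters than the string has, so a short string comes back unchanged.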


Chop would take care of the Unicode issue DNF mentioned, but it still returns an empty string where the desired result is the string itself (if the string is shorter than the amount to chop, return the string). Also, you frequently don't know where you need to chop (head = unknown); you just want a number of characters from the end, while keeping intact the strings that are shorter than that number (which would otherwise be out of bounds).

Maybe something like this:
string_chop(string, from = end - 10, to = end, max_length = true)

where max_length is a Boolean that makes it return the whole string if end-10 is out of bounds. But currently chop doesn't accept the "end" keyword.
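For what it's worth, Julia 1.x Base already provides `last(s, n)` (and its counterpart `first(s, n)`), which return the last (first) `n` characters of a string, counting characters rather than bytes. Its start index is clamped to 1, so a string shorter than `n` comes back whole instead of raising an error, which seems close to the max_length behavior described above. A small sketch using the strings from the original question:

```julia
strings = ["name", "verylargenameforname"]

# last(s, n): the last n characters of s, or all of s if it is shorter.
tails = [last(s, 11) for s in strings]
println(tails)                     # ["name", "nameforname"]

# Characters, not bytes, are counted, so multi-byte text is safe.
println(last("αβγδϵhelloω", 7))    # ϵhelloω

# first(s, n) does the same from the beginning of the string.
println(first("αβγδϵhelloω", 3))   # αβγ
```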

Niclas's solution works for ASCII strings, but as DNF pointed out, it doesn't work for Unicode.

I’m missing something. Can you give an example? I thought my suggestions covered your requirements.

Yes, I am all set as far as strings go.

The problem statement is unclear (to me at least) so I’m not sure what the desired behavior is. Just mentioning chop which is generally useful for chopping the head or tail off a string.
