String indexing

georgegi86 · April 11, 2020, 4:43pm

Hi all, I’m playing with Julia today, with some limited Python and R experience. How does Julia index from the end of a string, taking everything from the end minus some offset up to the end, when the string may be shorter than that offset?

Example: strings = ["name", "verylargenameforname"]

Python example:
strings[0][-11:] will give me "name"
strings[1][-11:] will give me "nameforname"

R example:
library(tidyverse)
strings = c("name", "verylargenameforname")
str_sub(strings[1], start = -11)
str_sub(strings[2], start = -11)

Julia:
strings[1][end-10:end] → throws a BoundsError since the string is shorter than 11 characters
strings[2][end-10:end]

mbaz · April 11, 2020, 5:49pm

You may have to define a function to do this:

julia> function f(s,n)
           try
               return s[end-n:end]
           catch
               return s
           end
       end
f (generic function with 1 method)

julia> f("name",11)
"name"

julia> f("ultralargename",11)
"tralargename"

I guess you could avoid the try block by doing calculations on the indices and string length, but personally I’m afraid of dealing with codepoints and other string black magic.
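
One caveat about the bare catch above: it swallows every exception, not only the out-of-bounds case, so other failures (for example a StringIndexError on non-ASCII text) would silently return the whole string. A minimal sketch that falls back only on BoundsError; the name tail_or_whole is hypothetical and not from this thread:

```julia
# Hypothetical variant of f: return the whole string only when the requested
# range is out of bounds; any other exception is rethrown to the caller.
function tail_or_whole(s, n)
    try
        return s[end-n:end]
    catch err
        err isa BoundsError || rethrow()
        return s
    end
end

tail_or_whole("name", 11)            # "name" (range out of bounds, so fallback)
tail_or_whole("ultralargename", 11)  # "tralargename"
```

Like f, this returns n+1 characters for ASCII input, because the range end-n:end includes both endpoints.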

NiclasMattsson · April 11, 2020, 5:58pm

Just to show a simple one-liner alternative (not that much black magic involved):

julia> strings[1][max(end-10,1):end]
"name"

julia> strings[2][max(end-10,1):end]
"nameforname"

julia> endstr(s,n) = s[max(end-n,1):end]
endstr (generic function with 1 method)

julia> endstr(strings[1],10)
"name"

julia> endstr(strings[2],10)
"nameforname"

mbaz · April 11, 2020, 6:00pm

Nice, I wish I had thought of using max

DNF · April 11, 2020, 6:14pm

Be careful when indexing into strings:

julia> s = "αβγδϵhelloω"
"αβγδϵhelloω"

julia> s[end-10:end]
ERROR: StringIndexError("αβγδϵhelloω", 6)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:249
 [3] top-level scope at REPL[110]:1

This won’t work because string indices are byte offsets rather than character positions, so not every integer is a valid index:

julia> s[1]
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> s[2]
ERROR: StringIndexError("αβγδϵhelloω", 2)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex_continued(::String, ::Int64, ::UInt32) at ./strings/string.jl:220
 [3] getindex(::String, ::Int64) at ./strings/string.jl:213
 [4] top-level scope at REPL[114]:1
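
To make the index layout concrete, here is a small sketch on a shorter string (the example string is my own, not from the posts above). It shows which integers are valid indices and how nextind and prevind step between them:

```julia
s = "αβγ"                # each of these Greek letters takes 2 bytes in UTF-8

length(s)                # 3 characters
ncodeunits(s)            # 6 bytes
collect(eachindex(s))    # [1, 3, 5], the only valid character indices
lastindex(s)             # 5, which is what `end` means inside s[...]
isvalid(s, 2)            # false, index 2 falls in the middle of 'α'
nextind(s, 1)            # 3, the index of the character after s[1]
prevind(s, 5)            # 3, the index of the character before s[5]
```

So arithmetic such as end-10 counts bytes, and it only lands on a character boundary by luck unless the string is pure ASCII.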

I’m not completely confident about this code, but you could try

julia> foo(str, n) = chop(str; head=max(0, length(str)-n), tail=0)

julia> foo("name", 7)
"name"

julia> foo("nameforname", 7)
"forname"

julia> foo("αβγδϵhelloω", 7)
"ϵhelloω"

If you want to use indexing in particular, you should look into the prevind and nextind functions:

julia> bar(str, n) = str[prevind(str, lastindex(str), min(length(str), n)-1):end]

julia> bar("name", 7)
"name"

julia> bar("nameforname", 7)
"forname"

julia> bar("αβγδϵhelloω", 7)
"ϵhelloω"

georgegi86 · April 11, 2020, 8:04pm

Thanks, fellows! I was looking for the simplest solution, and Niclas’s solution fits the bill! That said, I can see that having to write a function for such a common operation could be an obstacle for many newcomers. I understand that negative indexing is a big no in Julia because of safety issues, which makes sense. But there is a reason why Python and R have it built in: it allows for ease of use, data interactivity, and quick prototyping. There should be a simple way to grab elements starting from the end (and not just for strings). A middle-ground solution would be great, such as a module similar to R’s stringr, if one doesn’t exist already (I did not see this addressed in the Strs package).

Looking forward to playing with the language!

DNF · April 11, 2020, 8:41pm

As I tried to demonstrate, that solution will not work for strings in general.
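
A small sketch of what goes wrong, reusing the endstr one-liner and the Greek test string from earlier in the thread. Depending on n, the byte arithmetic either lands on a character boundary and returns fewer characters than it would for ASCII, or lands inside a character and throws:

```julia
endstr(s, n) = s[max(end-n, 1):end]   # the one-liner from earlier in the thread

s = "αβγδϵhelloω"     # 11 characters, 17 bytes; `end` is byte index 16

endstr("nameforname", 7)   # "eforname", 8 characters for ASCII input
endstr(s, 7)               # "ϵhelloω", only 7 characters (index 9 is valid)

# Index 10 is the second byte of 'ϵ', so this call throws StringIndexError:
# endstr(s, 6)
```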

StefanKarpinski · April 11, 2020, 8:45pm

Note that there is a chop function which can remove a number of characters from the beginning or end of a string while handling Unicode characters correctly:

chop(s::AbstractString; head::Integer = 0, tail::Integer = 1)
Remove the first head and the last tail characters from s. The call chop(s) removes the last character from s. If it is requested to remove more characters than length(s) then an empty string is returned.
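
A short sketch of that behavior, using the test string from earlier in the thread. The helper keep_tail is a hypothetical name; it is the same clamping idea as the foo function above, shown here to illustrate that limiting head with max(0, ...) keeps short strings whole:

```julia
s = "αβγδϵhelloω"

chop(s)                          # "αβγδϵhello", drops the last character
chop(s; head=5, tail=1)          # "hello"
chop("name"; head=10, tail=0)    # "", more characters requested than exist

# Hypothetical helper: never ask chop to remove more than the string has.
keep_tail(str, n) = chop(str; head=max(0, length(str) - n), tail=0)

keep_tail("name", 10)            # "name"
keep_tail(s, 7)                  # "ϵhelloω"
```

Note that chop returns a SubString that shares memory with the original; wrap the result in String(...) if an independent copy is needed.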

georgegi86 · April 11, 2020, 9:59pm

chop would take care of the Unicode issue DNF mentioned, but it still returns an empty string when the desired result is the string itself (if the string is shorter than the amount to chop, return the whole string). Also, you frequently don’t know where to chop (head is unknown); you just want a number of characters counted from the end, while keeping whole any strings shorter than the requested amount (which would otherwise be out of bounds).

maybe something like this:
string_chop(string, from = end - 10 , to = end, max_length = True)

where max_length is a Boolean to return the string if end-10 is out of bound. But currently chop doesn’t use the “end” keyword.

Niclas’s solution works for ASCII strings, but as DNF pointed out, it doesn’t work for Unicode in general.
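
The behavior asked for here, the last n characters with shorter strings returned whole, appears to match the two-argument last method for strings in Base. A brief sketch, assuming a Julia 1.x release:

```julia
last("name", 7)            # "name", shorter strings come back whole
last("nameforname", 7)     # "forname"
last("αβγδϵhelloω", 7)     # "ϵhelloω", counts characters rather than bytes
```

There is a matching first(s, n) for the leading characters.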

DNF · April 11, 2020, 10:03pm

I’m missing something. Can you give an example? I thought my suggestions covered your requirements.

georgegi86 · April 11, 2020, 10:20pm

Yes, I am all set as far as strings go.

StefanKarpinski · April 11, 2020, 10:47pm

The problem statement is unclear (to me at least) so I’m not sure what the desired behavior is. Just mentioning chop which is generally useful for chopping the head or tail off a string.
