Rstrip minor bug

Since I’m the one who started this, I would want to fix it myself. However, I don’t think I know enough to do this well. Others would be much more productive than me. Please go ahead run with it if that’s ok with the team. I hope I can contribute more in the future.

I’d recommend going ahead and starting the PR to add a test, you know you will get any help you need, the community is good about that.
If you haven’t tried it yet, Gitter (JuliaLang/julia - Gitter) is a good place to get really quick help.

1 Like

Now that I’m thinking about it, I’m uncertain whether the fix was correct. The PR was created based upon the premise raised by @nalimilan above -

I think strings are now expected to use 1 as their first index

I don’t know the history of that but based upon @ScottPJones’s test case above, I would create a custom “reverse” String type that start from the end and end at the the beginning. So start() returns the end index and endof() returns 1.

In that case, the rstrip code would not work since 1) it would take the char from the wrong end, and 2) the while condition is always ≤ sign and it would never ends!

My other question is how languages that read from right to left is represented in AbstractString? Is that just a display thing or is the string is stored in reverse order? I would tend to think that rstrip should really strip from the right hand side after the display is set. That should be universal.

No, endof is defined to return the last index, it’s completely different from start, which returns the internal iteration state (which is allowed to differ from indices).

OK. Consider the following implementation where iteration is in reverse but indexing is kept as-is, where endof returns the true end-index of the original string.

struct RevAsciiString <: AbstractString
    orig::Vector{UInt8}
    RevAsciiString(s) = new(unsafe_wrap(Vector{UInt8}, s))
end

Base.ncodeunits(s::RevAsciiString) = length(s.orig)
Base.codeunit(s::RevAsciiString) = Type{UInt8} 
Base.isvalid(s::RevAsciiString, i::Int64) = 1 ≤ i ≤ length(s.orig)
Base.sizeof(s::RevAsciiString) = length(s.orig)
Base.endof(s::RevAsciiString) = length(s.orig)

# iteration - ok to be different than index order
Base.start(s::RevAsciiString) = length(s.orig)
Base.next(s::RevAsciiString, i::Int64) = (Char(s.orig[i]), i-1)
Base.done(s::RevAsciiString, i::Int64) = i < 1

Base.getindex(s::RevAsciiString, i::Int64) = Char(s.orig[length(s.orig)-i+1])

julia> s
"edcba"

julia> s[1]
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)

Now, when rstrip is called, it returns a SubString type, which has a custom iteration that’s incompatible, resulting in a crash :worried: It seems that the AbstractString contract needs iteration to be in the natural order (from 1 to n) or else a separate “substring-able” contract would be required. Perhaps that could be a separate discussion.

julia> rstrip(s)
"aError showing value of type SubString{RevAsciiString}:
ERROR: BoundsError: attempt to access "a
Stacktrace:
 [1] checkbounds at ./strings/basic.jl:179 [inlined]
 [2] next at ./strings/substring.jl:65 [inlined]
 [3] escape_string(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}, ::String) at ./strings/io.jl:271
 [4] print_quoted(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}) at ./strings/io.jl:300
...

Let’s go back to our original goal. We want to create proper test cases for rstrip but it does not depend on start anymore. Further, the current test suite relies on the fact that strip uses both lstrip and rstrip and so the only test cases are for the strip function. That wouldn’t be my style as I would have done it more explicitly on each function but nonetheless it should be sufficient. Wouldn’t it not?

From test/strings/util.jl

    @test strip("") == ""
    @test strip(" ") == ""
    @test strip("  ") == ""
    @test strip("   ") == ""
    @test strip("\t  hi   \n") == "hi"
    @test strip("foobarfoo", ['f','o']) == "bar"
    @test strip("foobarfoo", ('f','o')) == "bar"