Since I’m the one who started this, I would want to fix it myself. However, I don’t think I know enough to do this well. Others would be much more productive than me. Please go ahead run with it if that’s ok with the team. I hope I can contribute more in the future.
I’d recommend going ahead and starting the PR to add a test, you know you will get any help you need, the community is good about that.
If you haven’t tried it yet, Gitter (JuliaLang/julia - Gitter) is a good place to get really quick help.
Now that I’m thinking about it, I’m uncertain whether the fix was correct. The PR was created based upon the premise raised by @nalimilan above -
I think strings are now expected to use 1 as their first index
I don’t know the history of that but based upon @ScottPJones’s test case above, I would create a custom “reverse” String type that start from the end and end at the the beginning. So start() returns the end index and endof() returns 1.
In that case, the rstrip code would not work since 1) it would take the char from the wrong end, and 2) the while condition is always ≤ sign and it would never ends!
My other question is how languages that read from right to left is represented in AbstractString? Is that just a display thing or is the string is stored in reverse order? I would tend to think that rstrip should really strip from the right hand side after the display is set. That should be universal.
No, endof
is defined to return the last index, it’s completely different from start
, which returns the internal iteration state (which is allowed to differ from indices).
OK. Consider the following implementation where iteration is in reverse but indexing is kept as-is, where endof
returns the true end-index of the original string.
struct RevAsciiString <: AbstractString
orig::Vector{UInt8}
RevAsciiString(s) = new(unsafe_wrap(Vector{UInt8}, s))
end
Base.ncodeunits(s::RevAsciiString) = length(s.orig)
Base.codeunit(s::RevAsciiString) = Type{UInt8}
Base.isvalid(s::RevAsciiString, i::Int64) = 1 ≤ i ≤ length(s.orig)
Base.sizeof(s::RevAsciiString) = length(s.orig)
Base.endof(s::RevAsciiString) = length(s.orig)
# iteration - ok to be different than index order
Base.start(s::RevAsciiString) = length(s.orig)
Base.next(s::RevAsciiString, i::Int64) = (Char(s.orig[i]), i-1)
Base.done(s::RevAsciiString, i::Int64) = i < 1
Base.getindex(s::RevAsciiString, i::Int64) = Char(s.orig[length(s.orig)-i+1])
julia> s
"edcba"
julia> s[1]
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
Now, when rstrip
is called, it returns a SubString type, which has a custom iteration that’s incompatible, resulting in a crash It seems that the AbstractString contract needs iteration to be in the natural order (from 1 to n) or else a separate “substring-able” contract would be required. Perhaps that could be a separate discussion.
julia> rstrip(s)
"aError showing value of type SubString{RevAsciiString}:
ERROR: BoundsError: attempt to access "a
Stacktrace:
[1] checkbounds at ./strings/basic.jl:179 [inlined]
[2] next at ./strings/substring.jl:65 [inlined]
[3] escape_string(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}, ::String) at ./strings/io.jl:271
[4] print_quoted(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}) at ./strings/io.jl:300
...
Let’s go back to our original goal. We want to create proper test cases for rstrip
but it does not depend on start
anymore. Further, the current test suite relies on the fact that strip
uses both lstrip
and rstrip
and so the only test cases are for the strip
function. That wouldn’t be my style as I would have done it more explicitly on each function but nonetheless it should be sufficient. Wouldn’t it not?
From test/strings/util.jl
@test strip("") == ""
@test strip(" ") == ""
@test strip(" ") == ""
@test strip(" ") == ""
@test strip("\t hi \n") == "hi"
@test strip("foobarfoo", ['f','o']) == "bar"
@test strip("foobarfoo", ('f','o')) == "bar"