Rstrip minor bug

tk3369 · January 14, 2018, 7:36pm

Since I’m the one who started this, I would want to fix it myself. However, I don’t think I know enough to do this well. Others would be much more productive than me. Please go ahead run with it if that’s ok with the team. I hope I can contribute more in the future.

ScottPJones · January 14, 2018, 7:57pm

I’d recommend going ahead and starting the PR to add a test, you know you will get any help you need, the community is good about that.
If you haven’t tried it yet, Gitter (JuliaLang/julia - Gitter) is a good place to get really quick help.

tk3369 · January 16, 2018, 4:08am

Now that I’m thinking about it, I’m uncertain whether the fix was correct. The PR was created based upon the premise raised by @nalimilan above -

I think strings are now expected to use 1 as their first index

I don’t know the history of that but based upon @ScottPJones’s test case above, I would create a custom “reverse” String type that start from the end and end at the the beginning. So start() returns the end index and endof() returns 1.

In that case, the rstrip code would not work since 1) it would take the char from the wrong end, and 2) the while condition is always ≤ sign and it would never ends!

github.com

JuliaLang/julia/blob/master/base/strings/util.jl#L179-L188


      
          
          Create a function that checks whether its argument contains `needle`, i.e.
          a function equivalent to `haystack -> contains(haystack, needle)`.
          
          The returned function is of type `Base.Fix2{typeof(contains)}`, which can be
          used to implement specialized methods.
          """
          contains(needle) = Base.Fix2(contains, needle)
          
          """

My other question is how languages that read from right to left is represented in AbstractString? Is that just a display thing or is the string is stored in reverse order? I would tend to think that rstrip should really strip from the right hand side after the display is set. That should be universal.

nalimilan · January 16, 2018, 8:10am

No, endof is defined to return the last index, it’s completely different from start, which returns the internal iteration state (which is allowed to differ from indices).

tk3369 · January 16, 2018, 11:49pm

OK. Consider the following implementation where iteration is in reverse but indexing is kept as-is, where endof returns the true end-index of the original string.

struct RevAsciiString <: AbstractString
    orig::Vector{UInt8}
    RevAsciiString(s) = new(unsafe_wrap(Vector{UInt8}, s))
end

Base.ncodeunits(s::RevAsciiString) = length(s.orig)
Base.codeunit(s::RevAsciiString) = Type{UInt8} 
Base.isvalid(s::RevAsciiString, i::Int64) = 1 ≤ i ≤ length(s.orig)
Base.sizeof(s::RevAsciiString) = length(s.orig)
Base.endof(s::RevAsciiString) = length(s.orig)

# iteration - ok to be different than index order
Base.start(s::RevAsciiString) = length(s.orig)
Base.next(s::RevAsciiString, i::Int64) = (Char(s.orig[i]), i-1)
Base.done(s::RevAsciiString, i::Int64) = i < 1

Base.getindex(s::RevAsciiString, i::Int64) = Char(s.orig[length(s.orig)-i+1])

julia> s
"edcba"

julia> s[1]
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)

Now, when rstrip is called, it returns a SubString type, which has a custom iteration that’s incompatible, resulting in a crash It seems that the AbstractString contract needs iteration to be in the natural order (from 1 to n) or else a separate “substring-able” contract would be required. Perhaps that could be a separate discussion.

julia> rstrip(s)
"aError showing value of type SubString{RevAsciiString}:
ERROR: BoundsError: attempt to access "a
Stacktrace:
 [1] checkbounds at ./strings/basic.jl:179 [inlined]
 [2] next at ./strings/substring.jl:65 [inlined]
 [3] escape_string(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}, ::String) at ./strings/io.jl:271
 [4] print_quoted(::IOContext{Base.Terminals.TTYTerminal}, ::SubString{RevAsciiString}) at ./strings/io.jl:300
...

Let’s go back to our original goal. We want to create proper test cases for rstrip but it does not depend on start anymore. Further, the current test suite relies on the fact that strip uses both lstrip and rstrip and so the only test cases are for the strip function. That wouldn’t be my style as I would have done it more explicitly on each function but nonetheless it should be sufficient. Wouldn’t it not?

From test/strings/util.jl

    @test strip("") == ""
    @test strip(" ") == ""
    @test strip("  ") == ""
    @test strip("   ") == ""
    @test strip("\t  hi   \n") == "hi"
    @test strip("foobarfoo", ['f','o']) == "bar"
    @test strip("foobarfoo", ('f','o')) == "bar"

Topic		Replies	Views
Any difference between : or , in the SubString() method? New to Julia	2	280	September 24, 2020
Do not allow SubString to grow Internals & Design proposal	7	1137	July 2, 2017
Replacing strings in specific position indices as in `str_sub()` in `stringr` General Usage strings , unicode	9	1678	November 14, 2022
Breakage due to changes in `String` slicing in v0.7 Internals & Design	35	2305	February 12, 2018
String slicing General Usage	3	2714	October 25, 2018

Rstrip minor bug

Related topics