Rstrip minor bug

This code seems wrong though should be fine under normal circumstances… SubString(s, 1:i) should have been SubString(s, a:i)?

From base/strings/util.jl

function rstrip(s::AbstractString, chars::Chars=_default_delims)
    a = start(s)
    i = endof(s)
    while a ≤ i
        c = s[i]
        j = prevind(s, i)
        c in chars || return SubString(s, 1:i)
        i = j
    end
    SubString(s, a, a-1)
end

I think strings are now expected to use 1 as their first index, so that should always work. But it’s indeed a bit silly to call start in one place and use 1 in another. Would you make a pull request to use a? You can edit the file directly on GitHub.

That’s wrong in any case. start does NOT give you the first index.

So the correct fix is to just replace a with 1.

1 Like

Also ref https://github.com/JuliaLang/julia/pull/25458 for the real way to get non-1 first index.

yuyichao is correct here: it is the last line that should be fixed, to be either Substring(s, 1:0) or simply "".
I think the second might be preferable, because you don’t want to keep around a unnecessary pointer to s, that might prevent it from getting GCed laster.

This is a common confusion, that the iterator state == index, which isn’t always true.

Yeah, that’s right. This is what happens when you try to write generic code without any actual implementations for which indices and iteration state differ. :slight_smile: Since strings are now required to use 1 as their first index, we can as well use 1 everywhere instead of a.

However, we can’t return "" because of type stability.

1 Like

True - for my Strs.jl code, I have an empty_str(::Type{S}) where {S<:...} = ... for that case, so that all “” values are the same (which is fine since these strings are immutable).
Possibly S("") would work, if you have function rstrip(s::S, chars::Chars=_default_delims) where {S<:AbstractString}

Base already defines one for this. (Since * is string concatenation, one returns the empty string.) But it is not applicable in this case because of type stability: rstrip must return a SubString in all cases.

We could return SubString(one(s), 1:0) though.

1 Like

I’d missed that (rather strange IMO) change, to define one for a string.
I suppose I’ll need to add that to Strs.jl for my string types.

It is an inevitable consequence of using * for string concatenation (which has been discussed ad nauseam, so let’s please not re-open that debate here), since one is defined as the multiplicative identity. Add `one` for AbstractString by bdeonovic · Pull Request #19548 · JuliaLang/julia · GitHub

(If we used + for string concatenation, then zero would return the empty string for the same reason.)

I’ve never suggested using + for string concatenation, I, like many others, think that * is a bad form of punning, and I see that it was said that defining one for strings should be taken as an argument against that sort of punning. Add `one` for AbstractString by bdeonovic · Pull Request #19548 · JuliaLang/julia · GitHub and Add `one` for AbstractString by bdeonovic · Pull Request #19548 · JuliaLang/julia · GitHub.

Also, if this was so inevitable, why did it take 8 years since RepString for efficient representation of repeated strings. · JuliaLang/julia@b7faaf9 · GitHub for it to be added?

This has all been debated ad nauseam elsewhere. I’m not interested in re-iterating.

2 Likes

I just wanted to point out that having to define one (or zero) for strings was not such an obvious or inevitable consequence of the choice of * for concatenation.
Is it documented, as part of the API that needs to be implemented by anybody adding a new AbstractString type, as I am doing?

I guess if you define convert, this fallback should take care of it.

1 Like

Opened PR. I hope I did it correctly… first timer :stuck_out_tongue:

https://github.com/JuliaLang/julia/pull/25505

1 Like

Nice! I just verified, the fallback works just fine for my strings package - one(UTF32Str) gives me a “” of type UTF32Str, as expected, without me having to had a method for it.

Welcome! :smiley:

I see the PR was merged without a test! :open_mouth: When would the previous version fail?

Could anyone suggest a test case?

I think it would be an issue of a string type where the value of start(str) is not 1.
You could create a reversed string type, that returned a start value of sizeof(str), for example, and used prevind to walk backwards, for example, to show the bug.

I guess we could change the GenericString type, which is used to test that the string functions do not make more assumptions than the interface allows, to have start return something different from 1.

4 Likes