Strange comparison in Base's split method

Moderator’s note: this was split from Non-friendly documentation - #49 by rjmoses, but the initial post is mirrored here as an edited-in quote:

It seems to be a weird way of checking that r is not an empty array:

julia> typeof(0:-1)
UnitRange{Int64}

julia> collect(0:-1)
0-element Array{Int64,1}

julia> a = rand(3)
3-element Array{Float64,1}:
 0.7815764925085253
 0.8167278058427023
 0.6665929228579621

julia> a == 0:-1
false

julia> b = Array{Float64,1}()
0-element Array{Float64,1}

julia> b == 0:-1
true
1 Like

It seems that way to me as well, but it is straight from the Base:split() function:


r = something(findfirst(splitter,str), 0)
if r != 0:-1
j, k = first(r), nextind(str,last(r))
while 0 < j <= n && length(strs) != limit-1

As near as I can guess, findfirst() returns nothing or the index of the character and then something() returns either the index or 0 if nothing.

Then, the if tests the return against the range of 0 or not 0.

I wonder why the author used the range instead of just saying “if r != 0…”.

I wonder how this bit of obfuscation got in?

Really nice catch. And this points to a big thing I see in many beginners (albeit in reverse). Error attribution is hard when you’re learning a new language. In this case, the error is in Julia source, but you’re thinking your mental model is wrong. More frequently I see the reverse.

On this bug: I don’t believe it affects behavior, but it is now a missing optimization I think. It’d be awesome if you’d want to correct it! I think it should be written:

r =findfirst(splitter,str)
if r !== nothing

This mistake has been here for two years and has survived multiple touches from core devs. :slight_smile: It probably wasn’t caught because it’s not going to trip any unit tests, and will only really show up as a performance issue if you have a really long string you’re splitting and the splitter isn’t found.

2 Likes

We are not looking at the same code, are we? I see this at https://github.com/JuliaLang/julia/blob/9b7ea2f0b3bf4f1062eb2cee86f3cfcb197e54eb/base/strings/util.jl#L328

No, that’s exactly where I’m looking! Here’s the archaeology: It had been like this:

https://github.com/JuliaLang/julia/blob/69f9c2ea4738ef24c17fd10151f11bf3ea12f43a/base/strings/util.jl#L293-L294

The sentinel value for search’s not found used to be 0:-1. Two separate API changes moved this to findfirst with a sentinel value of nothing, but those were both huge changes that touches many lines of code and required lots of menial fixes… so this one got missed.

1 Like

Oh, I am looking at Julia latest. The line there reads r = something(findfirst(splitter,str), 0).

Right, the something is part of the bug. What I wrote is how I think it should be written.

1 Like

The “if r !== nothing…” makes it even clearer

I will change my local source and check it out.

Thanks

1 Like

And it passed in the first place because == falls back to === for disparate types, which seems like a bug-breeder. Is there a document providing the rationale for this fallback?

Amusingly enough, I stumbled across a comment (by myself) about this from last year while reviewing a separate PR. My initial assessment in this thread isn’t the whole story because findfirst("asdf", "") returns 1:0, so that branch actually is doing something in real code!

https://github.com/JuliaLang/julia/pull/32348/files#r294443966

1 Like