Strange comparison in Base's split method

PetrKryslUCSD · April 26, 2020, 1:55pm

Moderator’s note: this was split from Non-friendly documentation - #49 by rjmoses, but the initial post is mirrored here as an edited-in quote:

It seems to be a weird way of checking that r is not an empty array:

julia> typeof(0:-1)
UnitRange{Int64}

julia> collect(0:-1)
0-element Array{Int64,1}

julia> a = rand(3)
3-element Array{Float64,1}:
 0.7815764925085253
 0.8167278058427023
 0.6665929228579621

julia> a == 0:-1
false

julia> b = Array{Float64,1}()
0-element Array{Float64,1}

julia> b == 0:-1
true

rjmoses · April 26, 2020, 4:01pm

It seems that way to me as well, but it is straight from the Base:split() function:

…
r = something(findfirst(splitter,str), 0)
if r != 0:-1
j, k = first(r), nextind(str,last(r))
while 0 < j <= n && length(strs) != limit-1
…

As near as I can guess, findfirst() returns nothing or the index of the character and then something() returns either the index or 0 if nothing.

Then, the if tests the return against the range of 0 or not 0.

I wonder why the author used the range instead of just saying “if r != 0…”.

PetrKryslUCSD · April 26, 2020, 11:34pm

I wonder how this bit of obfuscation got in?

mbauman · April 27, 2020, 4:32pm

Really nice catch. And this points to a big thing I see in many beginners (albeit in reverse). Error attribution is hard when you’re learning a new language. In this case, the error is in Julia source, but you’re thinking your mental model is wrong. More frequently I see the reverse.

On this bug: I don’t believe it affects behavior, but it is now a missing optimization I think. It’d be awesome if you’d want to correct it! I think it should be written:

r =findfirst(splitter,str)
if r !== nothing

This mistake has been here for two years and has survived multiple touches from core devs. It probably wasn’t caught because it’s not going to trip any unit tests, and will only really show up as a performance issue if you have a really long string you’re splitting and the splitter isn’t found.

PetrKryslUCSD · April 27, 2020, 5:46pm

We are not looking at the same code, are we? I see this at https://github.com/JuliaLang/julia/blob/9b7ea2f0b3bf4f1062eb2cee86f3cfcb197e54eb/base/strings/util.jl#L328

mbauman · April 27, 2020, 5:55pm

No, that’s exactly where I’m looking! Here’s the archaeology: It had been like this:

https://github.com/JuliaLang/julia/blob/69f9c2ea4738ef24c17fd10151f11bf3ea12f43a/base/strings/util.jl#L293-L294

The sentinel value for search’s not found used to be 0:-1. Two separate API changes moved this to findfirst with a sentinel value of nothing, but those were both huge changes that touches many lines of code and required lots of menial fixes… so this one got missed.

PetrKryslUCSD · April 27, 2020, 6:01pm

Oh, I am looking at Julia latest. The line there reads r = something(findfirst(splitter,str), 0).

mbauman · April 27, 2020, 6:04pm

Right, the something is part of the bug. What I wrote is how I think it should be written.

rjmoses · April 28, 2020, 12:39am

The “if r !== nothing…” makes it even clearer

I will change my local source and check it out.

Thanks

Ralph_Smith · April 28, 2020, 2:01am

And it passed in the first place because == falls back to === for disparate types, which seems like a bug-breeder. Is there a document providing the rationale for this fallback?

mbauman · April 28, 2020, 1:33pm

Amusingly enough, I stumbled across a comment (by myself) about this from last year while reviewing a separate PR. My initial assessment in this thread isn’t the whole story because findfirst("asdf", "") returns 1:0, so that branch actually is doing something in real code!

https://github.com/JuliaLang/julia/pull/32348/files#r294443966

Topic		Replies	Views
The colon punctuation: Why doesn't 5:1 return [5, 4, 3, 2, 1]? New to Julia question , range , unitrange	8	1083	December 16, 2023
Question about range operator (":") General Usage question	7	574	March 6, 2024
`in(x, y)` and `y in x` behave differently? General Usage question	8	378	April 15, 2024
Redefining x - y = 0 General Usage question	18	2020	November 3, 2018
If and symbol, bug or feature? General Usage syntax , parser	6	578	February 3, 2022

Strange comparison in Base's split method

Related topics