Split inconsistency?

julia> split(" aa bb")
2-element Vector{SubString{String}}:
 "aa"
 "bb"

julia> split(" aa bb", ' ')
3-element Vector{SubString{String}}:
 ""
 "aa"
 "bb"

julia> split(" aa bb", " ")
3-element Vector{SubString{String}}:
 ""
 "aa"
 "bb"

From the docstring

If dlm is omitted, it defaults to isspace.

yet

julia> split(" aa bb", isspace)
3-element Vector{SubString{String}}:
 ""
 "aa"
 "bb"

julia> split(" aa bb")
2-element Vector{SubString{String}}:
 "aa"
 "bb"

File an issue.

Oh, wait!

  split(str::AbstractString, dlm; limit::Integer=0, keepempty::Bool=true)
  split(str::AbstractString; limit::Integer=0, keepempty::Bool=false)

It’s the keepempty kwarg which has a different default.

julia> split(" aa bb"; keepempty=true)
3-element Vector{SubString{String}}:
 ""
 "aa"
 "bb"

julia> split(" aa bb"; keepempty=false)
2-element Vector{SubString{String}}:
 "aa"
 "bb"

I can kinda see where that comes from, even if it is a bit confusing.
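The two signatures suggest that the one-argument form behaves like an explicit `isspace` delimiter with `keepempty=false`. The short script below is a sketch that checks this on the string from the original post. The variable names are illustrative only.

```julia
s = " aa bb"

# One-argument form: the delimiter defaults to isspace and keepempty defaults to false
default_form = split(s)

# The same call with both defaults spelled out explicitly
explicit_form = split(s, isspace; keepempty=false)

# Passing the delimiter explicitly switches the keepempty default to true,
# so the leading space produces an empty first field
explicit_keep = split(s, isspace)

println(default_form == explicit_form)            # true
println(explicit_keep == ["", "aa", "bb"])        # true
```

So the difference comes from the `keepempty` default alone. The delimiter is `isspace` in both forms.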

Yes, you are right, it’s documented

 keepempty: whether empty fields should be kept in the result. Default is false without a dlm argument, true with a dlm argument.

but …
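If you want a character delimiter but also want the empty fields dropped, you can pass `keepempty=false` explicitly. The sketch below uses a made-up string with a leading space, a doubled space and a trailing space. It shows that, with the default `keepempty=true`, every occurrence of the delimiter starts a new field, even when that field is empty.

```julia
s = " aa  bb "   # leading space, two spaces in the middle, trailing space

# Explicit delimiter: keepempty defaults to true, so empty fields are kept
kept = split(s, ' ')

# Same delimiter, but empty fields are discarded
dropped = split(s, ' '; keepempty=false)

println(kept == ["", "aa", "", "bb", ""])   # true
println(dropped == ["aa", "bb"])            # true
```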

Changing documented behavior would probably break some existing code, so I don't expect this to change in the near future.

This behaviour was probably chosen because it’s the behaviour of Perl and Python.