Sorting names. Surprise

What am I missing here?
This is wrong.

julia> sort(["Vila Flor", "Vila da Flor"])
2-element Vector{String}:
 "Vila Flor"
 "Vila da Flor"

The correct output should be:

 "Vila da Flor"
 "Vila Flor"

What makes you say this is wrong?


Capitalization does matter here, since:

julia> 'F' < 'd'
true
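The reason is that strings are compared character by character by Unicode code point, and every uppercase ASCII letter has a smaller code point than every lowercase one. A quick REPL check of the two characters involved and of the full comparison:

```julia
# Code points of the two characters compared above
cF = Int('F')   # 70
cd = Int('d')   # 100

# Both names share the prefix "Vila ", then 'F' (70) is compared with 'd' (100),
# so "Vila Flor" sorts before "Vila da Flor".
flor_first = "Vila Flor" < "Vila da Flor"   # true
```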

Note that you can use the by keyword to supply your own function, which is applied to each element before it is compared.
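A minimal sketch of how by behaves: the function only produces the comparison key, while the original strings are what end up in the result. Using a tuple as the key (an illustrative choice, not something from this thread) breaks ties between names that differ only in case, so the final order is fully determined:

```julia
places = ["Vila Flor", "Vila da Flor", "vila flor"]

# Case-insensitive sort: compares lowercase(s), returns the original strings.
# "Vila Flor" and "vila flor" have equal keys, so their relative order is a tie.
by_case = sort(places, by = lowercase)

# Tuple key: primary key is the lowercased name; ties are broken by the
# original string in plain code-point order ('V' < 'v'), so the result is deterministic.
by_case_then_raw = sort(places, by = s -> (lowercase(s), s))
# by_case_then_raw == ["Vila da Flor", "Vila Flor", "vila flor"]
```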


Thanks, that was it. I had tried Windows' name sorting, which gave me what I was expecting, but possibly only because file names on Windows are case-insensitive.

julia> sort(["Vila Flor", "Vila da Flor"], by=uppercase)
2-element Vector{String}:
 "Vila da Flor"
 "Vila Flor"

Note that simply converting to uppercase may not sort correctly outside of ASCII (correct uppercasing is itself language-dependent). If you need proper locale-dependent sorting (“collation”), I think you can find it in the StrICU package.
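To illustrate the point about uppercasing being language-dependent: Julia's uppercase applies the same Unicode case mapping regardless of locale. For example (an illustration, not from the thread), the Turkish dotless ı and the dotted i are different letters, yet both uppercase to I, so they become indistinguishable as sort keys:

```julia
# U+0131 (dotless ı) and U+0069 (dotted i) are distinct characters...
distinct = 'ı' != 'i'           # true

# ...but the locale-independent Unicode mapping sends both to 'I'.
up_dotless = uppercase("ı")     # "I"
up_dotted  = uppercase("i")     # "I"

# A Turkish-aware uppercasing would instead map 'i' to 'İ' (U+0130),
# which this function does not do.
```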

Right, but these are Portuguese names, so they are all within ASCII. And I have now checked that the sorting does what is expected by comparing the names against those in another file that was already sorted.

It might be worth keeping in mind that:

julia> sort(["a", "ã", "â", "c", "ç", "e", "ê", "o", "õ", "z"], by=uppercase)
10-element Array{String,1}:
 "a"
 "c"
 "e"
 "o"
 "z"
 "â"
 "ã"
 "ç"
 "ê"
 "õ"
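If full ICU collation is more than you need, one rough approximation (a swapped-in technique, not real locale-aware collation and not what StrICU does) is to strip diacritics and fold case with the Unicode standard library before comparing, keeping the original string as a tie-breaker. The helper name accent_key below is hypothetical:

```julia
using Unicode

# Hypothetical helper: a sort key that ignores accents and case, e.g. "Ç" -> "c".
# stripmark removes combining marks (accents, cedilla); casefold folds case.
# The original string is the secondary key, so accented forms follow the plain letter.
# This approximates, but is NOT, proper Portuguese collation.
accent_key(s) = (Unicode.normalize(s, stripmark = true, casefold = true), s)

letters = ["a", "ã", "â", "c", "ç", "e", "ê", "o", "õ", "z"]
by_accent = sort(letters, by = accent_key)
# by_accent == ["a", "â", "ã", "c", "ç", "e", "ê", "o", "õ", "z"]
```

Unlike by=uppercase, this keeps ç next to c and õ next to o instead of pushing all accented letters after z.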