Sorting strings containing numbers, so that "A2" < "A10"?


I need to sort strings containing numbers. Is there a simple way to sort them, such that, for example, "A2" < "A10"? That is, I want a lexicographic compare, such that numbers in the string are compared by value.

I guess this should be a version compare. Is it implemented in Julia, maybe in a package?


You could try it along this way:

julia> list(s::AbstractString) = zip(split(s, r"[\d]+"), parse.(split(s, r"[^\d]+")))
list (generic function with 1 method)
julia> lex2(s::AbstractString) = map(x -> x[2] == nothing ? x[1] : x[2], list(s))
lex2 (generic function with 1 method)
julia> islesscustom(a,b) = begin
             for x in (zip(lex2(a), lex2(b)))
                 x[1] < x[2] && return true
islesscustom (generic function with 1 method)
julia> islesscustom("A20", "A3")

julia> islesscustom("A3", "A20")


Depending on the flexibility needed, here is another method that is a bit faster:

function custom_cmp(x::String)
   number_idx = findfirst(isdigit, x)
   str, num = SubString(x, 1, number_idx-1), SubString(x, number_idx, length(x))
   return str, parse(Int, num)

l = ["A5", "A10", "A200", "A20"]

sort(l, by = custom_cmp)
@btime sort(l, lt = islesscustom)
  405.706 μs (577 allocations: 25.81 KiB)

@btime sort(l, by = custom_cmp)
  2.292 μs (33 allocations: 1.31 KiB)


What if there are multiple numbers, “A10B3cd”? Ideally we should have a lexicographic comparison, that detects al the numbers and compares them by value.


As I implied, you need to more clearly define exactly what you expect to put into the comparison. How should “5” vs “A” compare? How should “A53” vs “53A” etc.

julia> function naturalsort(x::Vector{String})
           f = text -> all(isnumber, text) ? Char(parse(Int, text)) : text
           sorter = key -> join(f(c) for c in matchall(r"[0-9]+|[^0-9]+", key))
           sort(x, by=sorter)
naturalsort (generic function with 1 method)

julia> l = ["A5a", "A10b", "A10a", "A200", "A20"];

julia> @btime naturalsort($l)
  10.196 μs (233 allocations: 6.67 KiB)
5-element Array{String,1}:

with inspiration from here.


I don’t know if this will help you now, but if you are the one generating the strings, a good technique is to fill the number with zeros.

The following function is useful

numstring(x, n=3) = string(x + 10^n)[2:end]

With this function,

julia> x = "A" .* numstring.([rand(1:500) for i in 1:10], 5)
10-element Array{String,1}:

julia> sort(x)
10-element Array{String,1}:

This has the added benefit when naming files, they will appear in order in the file system.


You have a trick, but ls, e.g. ls -lv (can’t speak for Windows…) already has what you need to avoid naming like this:

-v natural sort of (version) numbers within text


I did not know that, cool. Thanks. But is still nice to left pad the numbers with zeros