# Get shortest/longest string in array

What is the best way to retrieve the shortest or longest string in an array? If there are multiple candidates having the same minimum/maximum length, it’s okay to return just the first one.

This can be solved with the snippet below. Is there a better way to do it, though? Thanks in advance.

``````arr = ["hello", "beautiful", "world", "!"]
minl, maxl = extrema(length, arr)

shortest = filter(x -> length(x) == minl, arr) |> first
longest = filter(x -> length(x) == maxl, arr) |> first

# or even...
shortest = arr[findfirst(x -> length(x) == minl, arr)]
longest = arr[findfirst(x -> length(x) == maxl, arr)]
``````
1 Like

I would certainly use `findfirst` rather than `filter` here (since the latter creates a new array unnecessarily). But you could also easily write a function to do this in a single pass:

``````function extrema_elements(f, itr)
i = iterate(itr)
i === nothing && throw(ArgumentError("iterator must be nonempty"))
xmin = xmax = i[1]
vmin = vmax = f(xmin)
while true
i = iterate(itr, i[2])
i === nothing && return (xmin, xmax)
x = i[1]; v = f(x)
if v < vmin
xmin, vmin = x, v
elseif v > vmax
xmax, vmax = x, v
end
end
end
``````

giving:

``````julia> extrema_elements(length, arr)
("!", "beautiful")
``````
3 Likes

I have a feeling this can be done elegantly with Transducers.jl based on this example

1 Like

Thank you for the prompt reply, Professor. The custom function you suggest is great, especially for other collections besides arrays. For example, if instead of an array of strings our input was a set of strings, the filtering from my original post would work, and so would the `findfirst`, but it would require to first collect the set to an array; however, your function would work regardless and without unnecessary copies!

Note also that there is some ambiguity in the definition of “length”:

• `length(s)` is the number of Unicode characters, which might not be what you expect if the text can contain combining characters like diacritical marks, e.g. `length("naïve") == 5` but `length("naïve") == 6`. (If this confuses you, compare `collect("naïve")` to `collect("naïve")` and read about Unicode equivalence.)

• `sizeof(s)` is quicker to compute than `length` (it is `O(1)` rather than `O(n)` where `n` is the length of `s`) and gives the size in bytes. (This equals the `length` for ASCII strings, but not for other characters; for example, `sizeof("🐨") == 4` because the UTF-8 encoding of this emoji character requires 4 bytes. And `sizeof("naïve") == 6` while `sizeof("naïve") == 7`!)

• `textwidth(s)` gives the length of the string as measured in terminal character cells (roughly), so that `textwidth("naïve") == 5`. However, `textwidth("🐨") == 2` because emoji are typically displayed as taking up 2 terminal cells.

• `length(Unicode.graphemes(s))` gives the number of Unicode “graphemes” in `s`, which more closely corresponds to what readers would typically consider “characters”. e.g. `length(Unicode.graphemes("naïve")) == 5` and `length(Unicode.graphemes("🐨")) == 1`. But see the next point:

• The number of on-screen symbols (“glyphs”) is ultimately determined by the font and text-rendering system. For example, the FiraCode font renders the ASCII string `"->"` (2 characters, 2 bytes, 2 graphemes, textwidth 2) as a single `→` glyph via a ligature.

7 Likes

You could also use `foldl`:

``````function extrema_elements(f, itr)
x = first(itr); v = f(x)
val = foldl(itr, init=(x,v,x,v)) do minmax, x
v = f(x)
xmin, vmin, xmax, vmax = minmax
if v < vmin
xmin, vmin = x, v
elseif v > vmax
xmax, vmax = x, v
end
(xmin, vmin, xmax, vmax)
end
return val[1], val[3]
end
``````

I’m not sure whether this is clearer than an explicit loop, however.

Another way (for the non ambiguous strings):
PS: replaced `length` by `sizeof` for speed

``````arr = ["hello", "beautiful", "world", "!"]
arr[sortperm(sizeof.(arr))][1:length(arr)-1:end]
2-element Vector{String}:
"!"
"beautiful"
``````

This variant allocates a bit less:

``````sortperm(sizeof.(arr)) |> z -> (arr[z[1]], arr[z[end]])
("!", "beautiful")
``````
1 Like

You can reduce allocation even further with

``````sortperm(arr, by = length)
``````
1 Like

Why introduce O(n log n) sorting to solve a fundamentally O(n) problem?

3 Likes

The naive implementation is actually hard to beat:

``````function minmax_elements(arr)
minl = maxl = arr[1]
smin = smax = sizeof(minl)
for a in arr
sa = sizeof(a)
if sa > smax
maxl, smax = a, sa
elseif sa < smin
minl, smin = a, sa
end
end
minl, maxl
end

arr = ["hello", "beautiful", "world", "!"];
@btime minmax_elements(\$arr)
5.700 ns (0 allocations: 0 bytes)
("!", "beautiful")
``````
2 Likes

@Seif_Shebl, you have activated the Julia turbo mode!

2 Likes

@Jeff_Emanuel, maybe for the pleasure of learning Julia?
PS: and thanks for the insight.

2 Likes

``````julia> a = ["!","beautiful"]
2-element Vector{String}:
"!"
"beautiful"

julia> findmax(length,a)
(9, "beautiful")

julia> a = ["!","beautiful", "beautifil"]
3-element Vector{String}:
"!"
"beautiful"
"beautifil"

julia> findmax(length,a)
(9, "beautiful")
``````
4 Likes

This is actually slightly suboptimal, as it reads the first element twice.

1 Like

@aramirezreyes, I liked your solution before trying it. It issues error in Win10 Julia 1.6.0-beta1.0:

``````julia> findmax(length,a)
ERROR: MethodError: no method matching findmax(::typeof(length), ::Vector{String})
``````

Any clues?

I thought this was coming in 1.6, but actually it’s not here until 1.7: julia/NEWS.md at master · JuliaLang/julia · GitHub

2 Likes

Not a big deal in exchange for elegance. This minimal implementation is still 10X faster than `findmin` , `findmax` though.

The strange thing is that your implementation is faster than @stevengj’s when using the `sizeof` function. I would have thought those two implementations to be identical, except that the latter only evaluates the first element once, so should be marginally faster, but it’s the opposite.

This one does not look bad:

``````sizeof.(arr) |> z -> arr[argmin(z)], arr[argmax(z)]
``````
1 Like

In Julia 1.5, mine is about 13% slower on my machine. However, in Julia 1.6, mine is 13% faster for the original 4-element array, and 24% faster for a longer array `arr2 = repeat(arr, 100)`. I’m not sure why, though; I suspect it gets into the weeds of when the compiler is inlining things etc.

2 Likes