Cleaning up strings via piping...?

In general, I would avoid creating a whole sequence of intermediate arrays. Using map with a do block is pretty clear and processes each element in a single pass:

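using Unicode  # standard library; provides Unicode.normalize

convert_clean(arr) = map(arr) do x
    s = string(x)                               # convert the element to a String
    s = Unicode.normalize(s, stripmark=true)    # strip accents and other diacritics
    s = replace(s, r"[^a-zA-Z0-9_]" => "")      # keep only ASCII letters, digits, and _
end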

for example.
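
To make the behavior concrete, here is a small self-contained sketch of the map/do version applied to some sample input. The input values are made up for illustration; only convert_clean itself comes from the code above.

```julia
using Unicode  # standard library; provides Unicode.normalize

convert_clean(arr) = map(arr) do x
    s = string(x)
    s = Unicode.normalize(s, stripmark=true)
    s = replace(s, r"[^a-zA-Z0-9_]" => "")
end

# Sample input (illustrative only): mixed strings and a number.
cleaned = convert_clean(["café", "hello world!", 42])
# cleaned == ["cafe", "helloworld", "42"]
```

Each element goes through all three steps before the next one is touched, so the only array allocated is the final result.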

You could also use .|> here to apply a bunch of functions elementwise, but it is a bit awkward because of the need to explicitly construct anonymous functions (best wrapped in parentheses, since the body of -> otherwise extends as far right as possible and swallows the rest of the chain):

convert_clean(arr) = arr .|> string .|>
         (s -> Unicode.normalize(s, stripmark=true)) .|>
         (s -> replace(s, r"[^a-zA-Z0-9_]" => ""))

Hopefully, someday you will be able to use a "magic underscore" as shorthand for these anonymous functions:

convert_clean(arr) = arr .|> string .|>
         Unicode.normalize(_, stripmark=true)  .|>
         replace(_, r"[^a-zA-Z0-9_]" => "")

but not yet.
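
In the meantime, one way to keep the .|> chain readable might be to give each step a name, so no anonymous functions appear in the pipeline at all. This is only a sketch: strip_marks and keep_word_chars are hypothetical helper names invented here, not part of Julia or the Unicode standard library.

```julia
using Unicode  # standard library; provides Unicode.normalize

# Hypothetical helper names, introduced only for this sketch.
strip_marks(s) = Unicode.normalize(s, stripmark=true)
keep_word_chars(s) = replace(s, r"[^a-zA-Z0-9_]" => "")

# Each .|> applies the next function elementwise.
convert_clean(arr) = arr .|> string .|> strip_marks .|> keep_word_chars

cleaned = convert_clean(["Ångström", "a_b c"])
# cleaned == ["Angstrom", "a_bc"]
```

The trade-off is two extra top-level definitions in exchange for a pipeline that reads left to right without parentheses.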
