What is the library for arrays of strings?


#1

I often look in the docs to see what methods are available for arrays of strings, but there don’t seem to be many. Can anyone point me to a useful standard library for strings? methodswith(Vector{String}) gives only a single result (evalfile); it doesn’t even list join, because the signature of join is join(io::IO, strings, delim), where strings is typed as Any.

Here are some things I’d like to do:

  1. Find all strings with a given substring.
    desired interface: something like grep(strings::Vector{String}, pattern::Union{String, Regex}), e.g. grep(strings, "word")
    what I usually have to do instead: find(_-> ! (search(_, pattern) == 0:-1), strings) #or filter

  2. Replace all instances of a pattern.
    desired interface: something like gsub(strings, pattern, replacement)
    what I usually have to do instead: map(_->replace(_, pattern, replacement), strings)

Thanks!


#2

I don’t know of any libraries that do what you’re suggesting, but in 0.5 you can take advantage of generators. For 1. you can write

find(search(_, pattern) != 0:-1 for _ in strings)

For 2. you can use comprehensions,

[replace(_, pattern, replacement) for _ in strings]

or alternatively the . broadcast syntax

replace.(strings, [pattern], [replacement])

(It would be nice if we didn’t need the [] wrapper around the last 2 arguments though).
-simon


#3

Also, you can use contains instead of search(...) != 0:-1, though we should probably combine that function with ismatch (see #19250).
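
For readers on Julia 1.x: contains and ismatch were later merged into a single function, occursin, which takes the needle first and the haystack second and accepts either a plain string or a Regex. A minimal sketch, assuming Julia 1.0 or later (the sample strings are made up for illustration):

```julia
# Assumes Julia 1.0+; the sample data below is purely illustrative.
strings = ["apple pie", "banana", "pineapple"]

# occursin(needle, haystack): the pattern comes first, the string second.
# The dot broadcasts over `strings`; the pattern is treated as a scalar.
has_sub = occursin.("apple", strings)   # substring test per element
has_re  = occursin.(r"^b", strings)     # the same call works with a Regex

# Keep the matching strings, or get their indices.
matches = filter(s -> occursin("apple", s), strings)
idx     = findall(s -> occursin("apple", s), strings)
```

filter returns the matching strings themselves, while findall plays the role that find played in 0.5 and returns their positions.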


#4

You don’t need the [...] in 0.6 (see https://github.com/JuliaLang/julia/issues/16966):

julia> replace.(["a234", "Foo", "!"], r"[a-z]", "_")
3-element Array{String,1}:
 "_234"
 "F__" 
 "!"   

#5

Nice, I did not know the trick of wrapping the other arguments in an array to get the broadcast dot to work for strings (still, it’s great news that 0.6 will make this unnecessary). The replace.(...) and contains.(strings, pattern) syntaxes are perfect, and for the filter version, [string for string in strings if contains(string, pattern)] or strings[contains.(strings, pattern)] is not too bad :)
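
If someone still wants the grep/gsub interface from the first post, it can be wrapped in two one-liners. This is only a sketch, assuming Julia 1.x; grep and gsub are hypothetical helper names taken from the question, not functions that exist in Base or in any package I know of:

```julia
# Hypothetical helpers (not part of Base); assumes Julia 1.0+.

# Return every string that contains `pattern` (a substring or a Regex).
grep(strings::AbstractVector{<:AbstractString},
     pattern::Union{AbstractString,Regex}) =
    filter(s -> occursin(pattern, s), strings)

# Replace every occurrence of `pattern` in each string.
# Julia 1.x spells the replacement as a Pair: pattern => replacement.
gsub(strings::AbstractVector{<:AbstractString},
     pattern::Union{AbstractString,Regex},
     replacement::AbstractString) =
    map(s -> replace(s, pattern => replacement), strings)

words    = ["a234", "Foo", "!"]
kept     = grep(words, r"[a-z]")        # strings with a lowercase letter
replaced = gsub(words, r"[a-z]", "_")   # same example as in post #4
```

map is used in gsub instead of the dot syntax only to keep the sketch independent of how a given Julia version broadcasts the pattern => replacement pair.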