What is the library for arrays of strings?


#1

I often look in the docs to see what methods are available for arrays of strings, but there don’t seem to be many. Can anyone point me to a useful standard library for strings? methodswith(Vector{String}) gives only a single result (evalfile). It doesn’t even list join, because the signature of join is join(io::IO, strings, delim), where strings is untyped (Any).

Here are some things I’d like to do:

  1. Find all strings with a given substring.
    desired interface: something like grep(strings::Vector{String}, pattern::Union{String, Regex}), e.g. grep(strings, "word")
    what I usually have to do instead: find(_-> ! (search(_, pattern) == 0:-1), strings) #or filter

  2. Replace all instances of a pattern.
    desired interface: something like gsub(strings, pattern, replacement)
    what I usually have to do instead: map(_->replace(_, pattern, replacement), strings)
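On Julia 1.x (not the version this thread started on), both wishes can be sketched as thin wrappers. `grep` and `gsub` below are hypothetical helper names taken from the desired interface above; they are not functions in Base. The sketch assumes `occursin(needle, haystack)` and the Pair form `replace(s, pattern => replacement)`, which replaced the three-argument `replace` in 0.7/1.0.

```julia
# Hypothetical helpers mirroring the interface wished for above (not part of Base).
# The pattern may be a plain String or a Regex.

# Return the strings that contain `pattern`.
grep(strings::AbstractVector{<:AbstractString}, pattern::Union{AbstractString,Regex}) =
    filter(s -> occursin(pattern, s), strings)

# Replace every match of `pattern` in each string with `replacement`.
gsub(strings::AbstractVector{<:AbstractString}, pattern::Union{AbstractString,Regex},
     replacement::AbstractString) =
    [replace(s, pattern => replacement) for s in strings]

words = ["wordplay", "sword", "swim", "Word"]
grep(words, "word")     # case-sensitive:       ["wordplay", "sword"]
grep(words, r"word"i)   # case-insensitive:     ["wordplay", "sword", "Word"]
gsub(words, "w", "W")   # every "w" replaced:   ["Wordplay", "sWord", "sWim", "Word"]
```

Both are one-liners over `filter` and a comprehension, which is probably why Base does not ship them under separate names.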

Thanks!


#2

I don’t know of any libraries that do what you’re suggesting, but in 0.5 you can take advantage of generators to write item 1 as

find(search(_, pattern) != 0:-1 for _ in strings)

For item 2 you can use a comprehension,

[replace(_, pattern, replacement) for _ in strings]

or, alternatively, the dot (.) broadcast syntax

replace.(strings, [pattern], [replacement])

(It would be nice if we didn’t need the [] wrapper around the last two arguments, though.)
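For readers on Julia 1.x: the comprehension still works with the Pair form of `replace`, and the one-element-array wrapper can be replaced by `Ref`, which tells broadcasting to treat its contents as a single scalar argument. A minimal sketch with made-up example strings:

```julia
strings = ["cat", "concat", "dog"]
pattern, replacement = "cat", "bird"

# Comprehension: call replace once per element.
a = [replace(s, pattern => replacement) for s in strings]

# Broadcast: Ref(...) wraps the Pair so the whole Pair is passed to every call,
# playing the role of the [pattern] / [replacement] wrappers above.
b = replace.(strings, Ref(pattern => replacement))

a == b == ["bird", "conbird", "dog"]   # true
```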
-simon


#3

Also, you can use contains instead of search(...) != 0:-1, though we should probably merge that function with ismatch (see #19250).
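In Julia 0.7/1.0 the two were indeed combined: `contains(haystack, needle)` and `ismatch(regex, s)` were both deprecated in favor of `occursin(needle, haystack)`, which accepts either a string or a `Regex` as the needle. Julia 1.5 later reintroduced `contains(haystack, needle)` with the haystack first. A small check on an illustrative string:

```julia
s = "num123"

occursin("123", s)     # substring test (formerly contains(s, "123"))
occursin(r"\d+$", s)   # regex test     (formerly ismatch(r"\d+$", s))

# The newer contains exists only on Julia 1.5 and later.
if VERSION >= v"1.5"
    contains(s, "123")
end
```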


#4

You don’t need the [...] in 0.6 (see https://github.com/JuliaLang/julia/issues/16966):

julia> replace.(["a234", "Foo", "!"], r"[a-z]", "_")
3-element Array{String,1}:
 "_234"
 "F__" 
 "!"   

#5

Nice, I did not know the trick of wrapping the other arguments in an array to make the broadcast dot work with strings (still, great news that 0.6 will make this unnecessary). The replace.(...) and contains.(strings, pattern) syntaxes are perfect, and [string for string in strings if contains(string, pattern)] or strings[contains.(strings, pattern)] (for the filter version) are not too bad 🙂
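The filtering idioms above carry over to Julia 1.x with `occursin(pattern, s)` in place of `contains(s, pattern)`, and they all select the same elements. A sketch with example data; broadcasting treats a `String` pattern as a scalar, so no wrapper is needed in the mask version:

```julia
strings = ["apple", "banana", "cherry", "grape"]
pattern = "ap"

# 1. Comprehension with a filter clause.
c = [s for s in strings if occursin(pattern, s)]

# 2. Boolean-mask indexing: occursin.(pattern, strings) yields a BitVector.
m = strings[occursin.(pattern, strings)]

# 3. filter with a closure.
f = filter(s -> occursin(pattern, s), strings)

c == m == f == ["apple", "grape"]   # true
```

The mask version allocates an intermediate BitVector; the comprehension and `filter` do not, which matters little for short vectors.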


#6

Hello,

I’m reviving this old topic because I’m trying to do the same as the OP in 0.7.

For the OP’s first question I’m using

somestrings = ["file1", "hello", "num123", "foo_bar"]
filter(x -> occursin(pattern, x), somestrings)

This works fine, but I’m wondering whether there is a more idiomatic way in 0.7.
I’ve seen that find and search are deprecated and that replace was changed.

Many thanks,


#7

I think filter and a closure is perfectly idiomatic. If you want the indices, you can use findall.
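To make the difference concrete, using `somestrings` from the question and an assumed example pattern of `"o"`: `filter` returns the matching strings, while `findall` returns their indices.

```julia
somestrings = ["file1", "hello", "num123", "foo_bar"]
pattern = "o"   # assumed example pattern

matches = filter(x -> occursin(pattern, x), somestrings)   # ["hello", "foo_bar"]
idx     = findall(x -> occursin(pattern, x), somestrings)  # [2, 4]
```

Indexing with the result of `findall` recovers the `filter` result, so `somestrings[idx] == matches`.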

The find* family of functions underwent an API redesign. You are doing the right thing by using v0.7 to upgrade your code; after that you can switch to v1.0.
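One example of that redesign for strings: `search(s, pattern)` returned `0:-1` when nothing matched, which is why the earlier posts compare against `0:-1`. Its 1.x counterpart, `findfirst(pattern, s)`, returns the range of the first match or `nothing`. A short sketch:

```julia
s = "hello"

findfirst("lo", s)     # 4:5, range of the first match
findfirst(r"l+", s)    # 3:4, Regex patterns work too
findlast("l", s)       # 4:4, last match

# A miss is now `nothing` rather than 0:-1, so test for it with `===`
# (or skip the range entirely and use occursin).
findfirst("z", s) === nothing   # true
```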


#8

Thanks for the reply!