How to remove all two-letter words from an Array of Strings

How can I remove all two-letter words from an Array of Strings? The numbers should remain.
The data look like this:

julia> baza1[:,5]
1074955-element Array{String,1}:
 "test"
 "mm gastro"
 "mm gastro"
 "mm gastro"
 "pps ps sc"
 "00-254"
 "00asd asda"

The two-letter words mm, ps and sc should be removed.

From a little bit of debugging on regex101.com I came up with

(^|\s)([a-z][a-z])(\s|$)

where the second capture group holds the two-letter words you don’t want.
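Here is a minimal sketch of how that pattern could be used from Julia. The variable names are illustrative only. `eachmatch` lists the matched words, and the substitution string `s"\1\3"` puts the surrounding whitespace back. One caveat: the trailing `(\s|$)` consumes the space after a match. A second two-letter word directly following the first therefore has no leading `\s` left and is skipped.

```julia
# Pattern from regex101: group 2 is the two-letter word,
# groups 1 and 3 are the whitespace (or string edge) around it.
pat = r"(^|\s)([a-z][a-z])(\s|$)"

# Collect the two-letter words found in one string.
found = [m.captures[2] for m in eachmatch(pat, "mm gastro")]

# Remove them, re-inserting the captured whitespace (groups 1 and 3).
cleaned = replace("mm gastro", pat => s"\1\3")

# Caveat: after " ps " is consumed, "sc" has no leading \s left,
# so it is NOT removed by this pattern.
partial = replace("pps ps sc", pat => s"\1\3")
```

Here `cleaned` is `" gastro"` and `partial` is `"pps  sc"`. Zero-width lookarounds avoid the overlap problem, as the later replies in this thread do.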


An exact answer would depend on more detail, such as what exactly you want to consider a word, and what to do with the whitespace around the removed words (e.g., what should be the output for "abc de\tfg,\nhi!"?).
Here’s a regex-based solution that keeps all whitespace:

julia> replace("foo bar 12 ab cd αβ 34", r"\b\p{L}{2}\b"=>"")
"foo bar 12    34"
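Here is a sketch of one possible policy for the leftover whitespace. It assumes that collapsing runs of whitespace to a single space and trimming the ends is acceptable, which the question does not confirm. The helper name `drop_two_letter_words` is hypothetical. The dot in `drop_two_letter_words.(data)` broadcasts it over the whole vector.

```julia
# Hypothetical helper: remove two-letter words, then tidy the whitespace.
function drop_two_letter_words(s::AbstractString)
    t = replace(s, r"\b\p{L}{2}\b" => "")  # delete words of exactly two letters
    t = replace(t, r"\s+" => " ")           # collapse runs of whitespace to one space
    return String(strip(t))                 # trim leading/trailing whitespace
end

data = ["test", "mm gastro", "pps ps sc", "00-254", "00asd asda"]
cleaned = drop_two_letter_words.(data)
```

With this policy `cleaned` becomes `["test", "gastro", "pps", "00-254", "00asd asda"]`. Digits are untouched because `\p{L}` only matches letters.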

It would be very helpful if you could include what the desired output would be, given your vector.


Try
(?<=\s|^)\w{2}(?=\s|$)


Note that this would match 00 in

00 asd asda

but it’s unclear from the provided examples if that’s the desired output.
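To make the difference concrete, here is a small comparison on a made-up string. The `\w{2}` version matches digits as well as letters. A variant with `\p{L}{2}` (any Unicode letter) leaves number pairs alone. Which behaviour is wanted is the open question above.

```julia
s = "00 asd ab asda"

# \w matches letters, digits and underscore, so "00" is removed along with "ab"
with_w = replace(s, r"(?<=\s|^)\w{2}(?=\s|$)" => "")

# \p{L} matches letters only, so "00" survives
letters_only = replace(s, r"(?<=\s|^)\p{L}{2}(?=\s|$)" => "")
```

Here `with_w` is `" asd  asda"` and `letters_only` is `"00 asd  asda"`.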


Maybe try

julia> baza1 = ["test", "mm gastro", "mm gastro", "mm gastro", "pps ps sc", "00-254", "00asd asda", "00 asd asda"];

julia> replace.(baza1, r"(?<=\s|^)\D{2}(?=\s|$)"=>"")
8-element Array{String,1}:
 "test"
 " gastro"
 " gastro"
 " gastro"
 "pps  "
 "00-254"
 "00asd asda"
 "00 asd asda"
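If the goal is to overwrite the existing vector rather than bind a new one, one option is broadcast assignment with `.=`. The individual `String`s are immutable and are not edited in place. Instead, each slot of the vector receives a new string. Base's `replace!(A, old => new)` does exist for collections, but as far as I know it swaps whole elements equal to `old` rather than performing regex substitution inside each string.

```julia
baza1 = ["test", "mm gastro", "pps ps sc", "00-254", "00asd asda"]

# Write the cleaned strings back into the same vector (no new array is bound).
baza1 .= replace.(baza1, r"(?<=\s|^)\D{2}(?=\s|$)" => "")
```

Afterwards `baza1` holds `["test", " gastro", "pps  ", "00-254", "00asd asda"]`.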

Just the same Array{String}, only without the short words.
Something like replace!?
Paul
On 2020-09-06 at 19:33, Matt Helm via JuliaLang wrote: