How can I remove all two-letter words from an Array{String}? Numbers should remain.
The data looks like this:
julia> baza1[:,5]
1074955-element Array{String,1}:
"test"
"mm gastro"
"mm gastro"
"mm gastro"
"pps ps sc"
"00-254"
"00asd asda"
The words mm, ps and sc should be removed.
From a little bit of debugging on regex101.com I came up with
(^|\s)([a-z][a-z])(\s|$)
where the second capture group holds the two-letter words you don’t want.
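To see what that pattern captures in Julia, here is a small sketch. The helper name `two_letter_words` is made up for illustration. Note that the pattern consumes the whitespace on both sides of a match, so in a run of adjacent two-letter words only every other one is found.

```julia
# The pattern from the post above; capture group 2 holds the two-letter word.
re = r"(^|\s)([a-z][a-z])(\s|$)"

# Hypothetical helper: collect the second capture of every match in a string.
two_letter_words(s) = [String(m.captures[2]) for m in eachmatch(re, s)]

two_letter_words("mm gastro")   # finds "mm"
two_letter_words("pps ps sc")   # finds "ps" only: the space before "sc" was already consumed
```

If adjacent short words matter, a pattern that does not consume the surrounding whitespace (word boundaries or lookarounds) is probably the safer choice.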
An exact answer would depend on more detail, like what exactly you want to consider as words, and what to do with the whitespace around words you removed (e.g., what should be the output for "abc de\tfg,\nhi!"?).
Here’s a regex-based solution that keeps all whitespace:
julia> replace("foo bar 12 ab cd αβ 34", r"\b\p{L}{2}\b"=>"")
"foo bar 12    34"
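Applied to the whole vector, that might look like the sketch below. The function name `drop_two_letter_words` is hypothetical, and the whitespace cleanup is an assumption about the desired output: it collapses every run of whitespace (tabs and newlines included) into a single space and trims the ends, which may or may not be what you want.

```julia
# Sample data copied from the question.
baza1 = ["test", "mm gastro", "mm gastro", "mm gastro", "pps ps sc", "00-254", "00asd asda"]

# Hypothetical helper: remove two-letter words, then tidy the leftover whitespace.
# split(s) with no delimiter splits on runs of whitespace and drops empty pieces.
function drop_two_letter_words(s::AbstractString)
    stripped = replace(s, r"\b\p{L}{2}\b" => "")
    return join(split(stripped), " ")
end

# The dot broadcasts the function over every element and returns a new vector.
cleaned = drop_two_letter_words.(baza1)
```

Since `\p{L}` matches letters only, entries such as "00-254" and "00asd asda" pass through unchanged.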
It would be very helpful if you could include what the desired output would be, given your vector.
Try
(?<=\s|^)\w{2}(?=\s|$)
Note that this would also match 00 in "00 asd asda", but it’s unclear from the provided examples whether that is the desired output.
Maybe try
julia> baza1 = ["test", "mm gastro", "mm gastro", "mm gastro", "pps ps sc", "00-254", "00asd asda", "00 asd asda"];
julia> replace.(baza1, r"(?<=\s|^)\D{2}(?=\s|$)"=>"")
8-element Array{String,1}:
"test"
" gastro"
" gastro"
" gastro"
"pps  "
"00-254"
"00asd asda"
"00 asd asda"
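If the goal is to overwrite the existing vector rather than build a new one, broadcast assignment is one option. This is only a sketch: the `strip` call is an assumption that leading and trailing spaces are unwanted, and it does not touch doubled spaces inside a string. As far as I know, `replace!` in Base swaps whole elements of a collection and does not do regex substitution inside strings, so it is not used here.

```julia
baza1 = ["test", "mm gastro", "mm gastro", "mm gastro", "pps ps sc",
         "00-254", "00asd asda", "00 asd asda"]

# Same pattern as above: two non-digit characters delimited by whitespace
# or by the start/end of the string.
pattern = r"(?<=\s|^)\D{2}(?=\s|$)"

# `.=` writes the results back into the existing vector instead of allocating a new one.
# strip trims only the leading and trailing whitespace left behind by the removal.
baza1 .= strip.(replace.(baza1, pattern => ""))
```

Keep in mind that `\D` matches any non-digit character, so two punctuation characters standing alone would be removed as well.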
I just want the same Array{String}, only without the short words.
Is there a replace! for this?
Paul
On 2020-09-06 at 19:33, Matt Helm via JuliaLang wrote: