Regex: what is wrong?

programista · September 4, 2020, 1:36pm

Why is the last row wrong?

v=["sp. z o.o. asdas"
"sp. z o.o asdasd" 
"sp. z oo asdas"
"sp. zo.o. asdasd"
"sp. zoo. asddfa"
"sp. z o.o. afdasf"
"sp zoo. afdasf"
"sp.zoo. afdasf"
"spzoo afdasf"]

julia> occursin.(r"sp.+z.+o", v)
9-element BitArray{1}:
 1
 1
 1
 1
 1
 1
 1
 1
 0

Thanks, Paul

oheil · September 4, 2020, 1:42pm

+ means one or more times. In the last string there is nothing between sp and z, so the match fails.
You can use * for zero or more times, like:
occursin.(r"sp.*z.+o", v)
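
A minimal sketch of the difference, using two of the strings from the question (the variable names are only illustrative):

```julia
# Two strings from the question: one with separators after "sp", one without.
samples = ["sp. z o.o. asdas", "spzoo afdasf"]

# `.+` needs at least one character between "sp" and "z", so "spzoo" fails.
plus_hits = occursin.(r"sp.+z.+o", samples)

# `.*` also accepts zero characters there, so both strings match.
star_hits = occursin.(r"sp.*z.+o", samples)

println(plus_hits)
println(star_hits)
```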

programista · September 4, 2020, 1:53pm

Thanks!
But now there is one more row, the last one: “zespol szkol w przem”. With that row this solution gives a wrong result. How can I find only strings similar to sp. z o.o.? I think the gap between the characters must be no longer than 1-2 places. How can I do that?

Thanks for the star!
But:

julia> v=["sp. z o.o. asdas"
       "sp. z o.o asdasd"
       "sp. z oo asdas"
       "sp. zo.o. asdasd"
       "sp. zoo. asddfa"
       "sp. z o.o. afdasf"
       "sp zoo. afdasf"
       "sp.zoo. afdasf"
       "spzoo afdasf"
       "zespol szkol w przem"]

julia> occursin.(r"sp.*z.+o", v)
10-element BitArray{1}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1

pdeffebach · September 4, 2020, 1:53pm

Have you tested this on Regex101.com? Always super helpful for debugging regex

programista · September 4, 2020, 1:58pm

Thanks, I am still practicing at https://regexr.com/ but it’s not easy.

Paul

oheil · September 4, 2020, 2:00pm

What is exactly your desired outcome?
E.g. . in your regex means “any character”, but a literal . also appears in the strings in the array. So it’s not clear what you want to match.
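
To illustrate the point about the dot, here is a small sketch. The test string "spaz" is made up for the example and does not come from the data; the variable names are also hypothetical.

```julia
# Unescaped `.` matches any single character; `\.` matches only a literal dot.
any_char = r"sp.z"
literal_dot = r"sp\.z"

# "spaz" is a hypothetical string, added only to show the difference.
strings = ["sp.zoo", "spaz", "sp zoo"]

# Each tuple is (matches with any_char, matches with literal_dot).
dot_results = [(occursin(any_char, s), occursin(literal_dot, s)) for s in strings]

println(dot_results)
```

Only the first string contains a real dot between "sp" and "z", so only it matches the escaped version.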

Henrique_Becker · September 4, 2020, 2:52pm

Do you mean that they must start with “sp”? In that case, just use ^ as the first character of the regex. This means the regex must match from the start of the string. “zespol szkol w przem” is currently being matched because its substring “spol szko” matches the pattern.
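
One way to check which substring is responsible is to look at the match itself. This is a small sketch with illustrative variable names:

```julia
s = "zespol szkol w przem"

# `match` returns a RegexMatch (or `nothing`); its `.match` field is the matched substring.
m = match(r"sp.*z.+o", s)
println(m.match)

# With `^`, the pattern has to start at the beginning of the string.
anchored_bad = occursin(r"^sp.*z.+o", s)
anchored_ok = occursin(r"^sp.*z.+o", "sp. z o.o. asdas")
```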

danielw2904 · September 4, 2020, 4:12pm

See

programista · September 5, 2020, 3:45pm

Thanks, ^ works… but the phrase can be anywhere in the string…
At the moment I have a new pattern, r"sp.+z.+o.{1}", but it gives wrong results for rows 5 and 6… Please help.


julia> v=["sf asd sp. z o.o. asdas "
       "dfs sp. z o.o asdasd "
       "ss sp.zoo. afdasf"
       "ssssp.zoo. afdasf"
       "ss spzoo afdasf"
       "ss  ds zespol szkol w przem"]
6-element Array{String,1}:
 "sf asd sp. z o.o. asdas "
 "dfs sp. z o.o asdasd "
 "ss sp.zoo. afdasf"
 "ssssp.zoo. afdasf"
 "ss spzoo afdasf"
 "ss  ds zespol szkol w przem"

julia>

julia> occursin.(r"sp.+z.+o.{1}", v)
6-element BitArray{1}:
 1
 1
 1
 1
 0
 1

Henrique_Becker · September 5, 2020, 4:14pm

I am not sure what your question is. Are you asking why $ makes the regex not match any of the strings? This happens because you have required the second-to-last character to be an o, which is not true for any of the strings. Did you mean to use r"^sp.*z.+o.*$"? I do not think there is a reason to add a $ if you are going to put a .* (or .+) right before it.

programista · September 5, 2020, 4:20pm

In my language there is a very important official abbreviation, “sp. z o. o.”, but people make many mistakes ;) I have to find every mistaken variant, like: spzoo, sp. zoo …
At the moment I have found this solution.

Why does it give a wrong result with these 10 rows? (Row 9 I can remove in a second step.)

julia> v=["sf asd sp. z o.o. asdas "
       " dfs sp. z o.o asdasd "
       " ds sp. .z. oo asdas"
       "dsfs sp. zo.o. asdasd"
       "d sp. zoo. asddfa"
       "s sp. z o.o. afdasf"
       "ss sp.zoo. afdasf"
       "ssssp.zoo. afdasf"
       "ss spzoo afdasf"
       "ss  ds zespol szkol w przem"]


julia> occursin.(r"s?p.+z.+o.{0}",v)
10-element BitArray{1}:
 1
 1
 1
 1
 1
 1
 1
 1
 0
 1

Paul

jling · September 5, 2020, 4:37pm

Maybe try debugging with one of the online visual regex editors, such as https://regex101.com

danielw2904 · September 5, 2020, 4:52pm

I think what you are looking for is

julia> spzoo = r"s[.\s]*p[.\s]*z[.\s]*o[.\s]*o\.?";

julia> occursin.(spzoo, v)
10-element BitArray{1}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 0

The [.\s]* part allows for optional . and whitespace in between the characters. \.? is not strictly necessary for occursin, but with match it will also return the trailing . if it is in the string.

To see what this matches exactly:

julia> [match(spzoo, s).match for s in v if !isnothing(match(spzoo, s))] 
9-element Array{SubString{String},1}:
 "sp. z o.o."
 "sp. z o.o"
 "sp. .z. oo"
 "sp. zo.o."
 "sp. zoo."
 "sp. z o.o."
 "sp.zoo."
 "sp.zoo."
 "spzoo"

EDIT:
You might want to add i after the regex to make it case insensitive

julia> spzoo2 = r"s[.\s]*p[.\s]*z[.\s]*o[.\s]*o\.?"i;
julia> occursin(spzoo2, "s Sp. Z o.O. afdasf")
true
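
The effect of the trailing \.? can be checked with a short sketch. The names with_dot and without_dot are illustrative only, and the string is one of the rows from the question:

```julia
with_dot = r"s[.\s]*p[.\s]*z[.\s]*o[.\s]*o\.?"
without_dot = r"s[.\s]*p[.\s]*z[.\s]*o[.\s]*o"

s = "d sp. zoo. asddfa"

# Both patterns find the abbreviation, so `occursin` gives the same answer...
both_found = occursin(with_dot, s) && occursin(without_dot, s)

# ...but only the first pattern includes the trailing dot in the matched text.
m1 = match(with_dot, s).match
m2 = match(without_dot, s).match

println((m1, m2))
```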

programista · September 5, 2020, 5:18pm

A big help and a good lesson. Thanks!

programista · September 6, 2020, 2:35pm

Too many lines … I need only the first 5 lines

rx=r"\b\d{2}-\d{3}\b"

julia> baza1[occursin.(rx,baza1)]
1077386-eleme
 "45-367"
 "45-367"
 "a 45-367"
 "0 45-367 0"
 "a 45-367 b"
 "tel. 91-321 28 81"
 "58-531 54 52"
 "58-531 54 52"
 "58-531 54 52"
 "58-531 54 52"
 "58-531 54 52"
 "91-321 28 81"
 "12-289 13"
 "12-289 13 31"
 "12-289 13 32"
 "12-289 13 31"
 "12-289 13 31"
 "67-286 24 80"
 "12-289 13 31"
...

Henrique_Becker · September 6, 2020, 2:48pm

The regex below only matches if the dd-ddd is preceded by at most one character and a space, and/or followed by at most a space and one character.

rx=r"^(. )?\d{2}-\d{3}( .)?$"

This is at least the pattern I have seen. In your initial regex you considered that the extremities could only have base 10 digits (this is what the \d means), but in the first 5 lines there are lines in which the extremities are letters like a and b (should this be hex?).
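
As a rough comparison of the two patterns, here is a sketch on a handful of lines copied from the output above. The names lines, loose and strict are made up for the example. Note that \b only requires a word boundary next to the digits, so it does not restrict what else appears on the line, while ^ and $ tie the pattern to the whole line.

```julia
# A few lines taken from the output shown in the question.
lines = ["45-367", "a 45-367", "0 45-367 0", "a 45-367 b",
         "tel. 91-321 28 81", "58-531 54 52", "12-289 13"]

loose = r"\b\d{2}-\d{3}\b"            # dd-ddd anywhere in the line
strict = r"^(. )?\d{2}-\d{3}( .)?$"   # whole line, at most one extra character per side

loose_hits = lines[occursin.(loose, lines)]
strict_hits = lines[occursin.(strict, lines)]

println(length(loose_hits))
println(strict_hits)
```

On this sample the loose pattern keeps every line, while the strict one keeps only the first four.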

programista · September 6, 2020, 3:07pm

On 2020-09-06 at 16:53, Henrique Becker via JuliaLang wrote:

Not hex! All the data are just UTF-8 strings.

Paul

Tamas_Papp · September 6, 2020, 3:13pm

(I moved this to Offtopic since the discussion is about constructing regexes, not Julia code.)

Henrique_Becker · September 6, 2020, 3:16pm

That… was not what I meant. I was asking if you considered a and b to be digits (as you were trying to match them with \d) because the numbers were in base 16 instead of base 10.
