Thanks!
But now there is one row too many: “zespol szkol w przem”, the last one. A solution that matches this last row is wrong. How can I find only the strings similar to sp. z o.o.? I think the gap between the characters must be no longer than 1-2 places. How can I do that?
Thanks, stars !
But julia> v=["sp. z o.o. asdas"
"sp. z o.o asdasd"
"sp. z oo asdas"
"sp. zo.o. asdasd"
"sp. zoo. asddfa"
"sp. z o.o. afdasf"
"sp zoo. afdasf"
"sp.zoo. afdasf"
"spzoo afdasf"
"zespol szkol w przem"]
julia> occursin.(r"sp.*z.+o", v)
10-element BitArray{1}:
1
1
1
1
1
1
1
1
1
1
What exactly is your desired outcome?
For example, . in your regex means “any character”, but . is also a literal part of the strings in the array. So it’s not clear what you want to match.
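To make the difference concrete, here is a small sketch (the test strings "spXz" and "sp.z" are made up for illustration): an unescaped . matches any single character, while \. matches only a literal dot.

```julia
# "." is a regex metacharacter: it matches any single character (except a newline).
any_char = occursin(r"sp.z", "spXz")      # true: "." matched "X"

# "\." is an escaped dot: it matches only a literal "."
literal_miss = occursin(r"sp\.z", "spXz") # false: "X" is not a dot
literal_hit  = occursin(r"sp\.z", "sp.z") # true
```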
Do you mean that they must start with “sp”? In that case, just use ^ as the first character of the regex, which means the regex must match from the start of the string. “zespol szkol w przem” is currently matched because its substring “spol szko” matches the pattern.
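A quick sketch of both points, reusing the array v from the question: match shows which substring of the last row is responsible, and adding ^ excludes that row.

```julia
v = ["sp. z o.o. asdas", "sp. z o.o asdasd", "sp. z oo asdas",
     "sp. zo.o. asdasd", "sp. zoo. asddfa", "sp. z o.o. afdasf",
     "sp zoo. afdasf", "sp.zoo. afdasf", "spzoo afdasf",
     "zespol szkol w przem"]

# Without an anchor, the regex may match anywhere inside the string.
culprit = match(r"sp.*z.+o", v[end]).match   # "spol szko"

# With ^ the match has to begin at the first character of the string.
anchored = occursin.(r"^sp.*z.+o", v)        # true for rows 1-9, false for row 10
```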
I am not sure what your question is. Are you asking why $ makes the regex fail to match any of the strings? This happens because you have required the second-to-last character to be an o, which is not true for any of the strings. Did you mean to use r"^sp.*z.+o.*$"? I do not think there is any reason to add a $ if you are going to put a .* (or .+) in front of it.
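As a rough illustration, assuming the regex that was tried looked like r"^sp.*z.+o.$" (this is my guess; the exact attempt is not shown in the thread):

```julia
v = ["sp. z o.o. asdas", "sp. z o.o asdasd", "sp. z oo asdas",
     "sp. zo.o. asdasd", "sp. zoo. asddfa", "sp. z o.o. afdasf",
     "sp zoo. afdasf", "sp.zoo. afdasf", "spzoo afdasf",
     "zespol szkol w przem"]

# Assumed attempt: "o", then exactly one more character, then end of string.
# No string in v has "o" as its second-to-last character, so nothing matches.
none_match = !any(occursin.(r"^sp.*z.+o.$", v))   # true

# A trailing ".*$" adds nothing for occursin: same result as leaving it out.
same_result = occursin.(r"^sp.*z.+o.*$", v) == occursin.(r"^sp.*z.+o", v)   # true
```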
In my language the official abbreviation “sp. z o. o.” is very important, but people make many mistakes ;) I have to find every mistaken variant, like: spzoo, sp. zoo …
At the moment I have found this solution
Why does it go wrong for row 10? (Row 9 I can remove in a second step.)
The [.\s]* part allows for optional . and whitespace in between the characters. \.? is not strictly necessary for occursin, but with match it will return the trailing . if it is in the string.
To see what this matches exactly:
julia> [match(spzoo, s).match for s in v if !isnothing(match(spzoo, s))]
9-element Array{SubString{String},1}:
"sp. z o.o."
"sp. z o.o"
"sp. .z. oo"
"sp. zo.o."
"sp. zoo."
"sp. z o.o."
"sp.zoo."
"sp.zoo."
"spzoo"
EDIT:
You might want to add i after the regex to make it case-insensitive.
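For reference, a minimal sketch of the kind of pattern described in this post: [.\s]* between the letters, an optional trailing \. and the i flag. The name spzoo_sketch and the upper-case test string are my own assumptions, and this is not necessarily the exact regex used above.

```julia
v = ["sp. z o.o. asdas", "sp. z o.o asdasd", "sp. z oo asdas",
     "sp. zo.o. asdasd", "sp. zoo. asddfa", "sp. z o.o. afdasf",
     "sp zoo. afdasf", "sp.zoo. afdasf", "spzoo afdasf",
     "zespol szkol w przem"]

# Hypothetical pattern: optional dots/whitespace between the letters,
# an optional trailing dot, and the i flag for case-insensitive matching.
spzoo_sketch = r"sp[.\s]*z[.\s]*o[.\s]*o\.?"i

hits = occursin.(spzoo_sketch, v)        # rows 1-9 match, "zespol szkol w przem" does not
first_match = match(spzoo_sketch, v[1]).match           # "sp. z o.o."
upper_ok = occursin(spzoo_sketch, "SP. Z O.O. FIRMA")   # true thanks to the i flag
```

If the gap between letters should be capped at two characters, as asked earlier in the thread, [.\s]{0,2} could be used in place of [.\s]*.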
The regex below matches only if the dd-ddd is preceded by at most one arbitrary character plus a space, and/or followed by at most a space plus one arbitrary character.
rx=r"^(. )?\d{2}-\d{3}( .)?$"
This is the pattern I have seen, at least. In your initial regex you assumed that the extremities could only be base-10 digits (this is what \d means), but among the first 5 lines there are some whose extremities are letters like a and b (should these be hex?).
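A short sketch of how rx behaves. The input strings here are made up for illustration, since the original lines are not shown in this post:

```julia
rx = r"^(. )?\d{2}-\d{3}( .)?$"

# Hypothetical inputs that fit the pattern
bare      = occursin(rx, "12-345")      # true
prefixed  = occursin(rx, "a 12-345")    # true: one character plus a space in front
both_ends = occursin(rx, "a 12-345 b")  # true

# Hypothetical inputs that do not fit
two_chars = occursin(rx, "ab 12-345")   # false: two characters before the space
too_long  = occursin(rx, "12-3456")     # false: four digits after the dash

# \d does not match letters such as "a", which is why a and b need the "." instead
letter_is_digit = occursin(r"^\d$", "a")  # false
```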
That… was not what I meant. I was asking if you considered a and b to be digits (as you were trying to match them with \d) because the numbers were in base 16 instead of base 10.