I want to extract the numbers that come immediately before each `S` in the string `c = "9S8M13S"`, i.e. return 9 and 13. I tried the following:

``````julia> c="9S8M13S"
"9S8M13S"
julia> fs=findall(r"\d*S",c)
2-element Vector{UnitRange{Int64}}:
1:2
5:7
julia> first.(fs)
2-element Vector{Int64}:
1
5
julia> last.(fs).-1
2-element Vector{Int64}:
1
6
julia> z=zip(first.(fs),last.(fs).-1)
zip([1, 5], [1, 6])

a=Int[]

for (i,j) in z
push!(a,parse(Int,c[i:j]))
end

julia> a
2-element Vector{Int64}:
9
13
``````

Is there a more direct way to do this? My approach feels a bit indirect.


You can use lookaround operators

``````julia> c = "9S45M11S"
"9S45M11S"

julia> eachmatch(r"\d+(?=S)", c)
Base.RegexMatchIterator(r"\d+(?=S)", "9S45M11S", false)
``````

They actually match only the numbers:

``````julia> collect(eachmatch(r"\d+(?=S)", c))
2-element Vector{RegexMatch}:
RegexMatch("9")
RegexMatch("11")
``````
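To see what the lookahead changes, here is a small sketch comparing a plain pattern with the lookaround versions on the same string. The variable names are only for illustration.

```julia
c = "9S45M11S"

# Plain pattern: the trailing S is part of the match.
plain = [m.match for m in eachmatch(r"\d+S", c)]

# Lookahead: an S must follow, but it is not consumed by the match.
ahead = [m.match for m in eachmatch(r"\d+(?=S)", c)]

# Lookbehind works the same way in the other direction:
# digits that come right after an M.
behind = [m.match for m in eachmatch(r"(?<=M)\d+", c)]
```

With the plain pattern you would still have to strip the `S` before parsing; with the lookahead the match is already just the digits.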

Now, I would have expected this to work, but sadly, and confusingly, it doesn’t:

``````julia> parse.(Int, eachmatch(r"\d+(?=S)", c))
ERROR: MethodError: no method matching parse(::Type{Int64}, ::RegexMatch)
``````

Instead, it seems I must dig around in the internals of the match object, for example like this:

``````julia> [parse(Int, m.match) for m in eachmatch(r"\d+(?=S)", c)]
2-element Vector{Int64}:
9
11
``````

This should be very efficient, but it’s not nice to have to dig into the internal `match` field of the object, and it always seemed to me like an outlier in the language.
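If the comprehension feels too verbose, two other spellings should give the same result. This is only a sketch: both still read the `match` field, they just hide it behind `getproperty` or an anonymous function.

```julia
c = "9S45M11S"
re = r"\d+(?=S)"

# Broadcast getproperty over the matches, then broadcast parse.
a = parse.(Int, getproperty.(eachmatch(re, c), :match))

# Or map a small anonymous function over the match iterator.
b = map(m -> parse(Int, m.match), eachmatch(re, c))
```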


You may use capture groups instead of lookarounds:

``````julia> [parse(Int, m[1]) for m in eachmatch(r"(\d+)S", c)]
2-element Vector{Int64}:
9
13
``````

``````julia> parse.(Int, first.(eachmatch(r"(\d+)S", c)))
2-element Vector{Int64}:
9
11
``````

I do wonder, though, why `match` and `eachmatch` don’t simply return the actual matches. There could be a separate `captures`/`eachcapture`, no? When I ask for the match, that’s what I want.
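For reference, a short sketch of what a `RegexMatch` exposes for the capture-group pattern; the named group `n` in the second pattern is just an illustrative choice.

```julia
c = "9S8M13S"
m = match(r"(\d+)S", c)

whole  = m.match     # the full matched substring, including the S
group  = m[1]        # first capture group: only the digits
groups = m.captures  # all capture groups as a vector
start  = m.offset    # index in c where the match begins

# Named capture groups can be indexed by name instead of position.
mn = match(r"(?<n>\d+)S", c)
named = mn["n"]
```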

OK, I will try that. Thanks for the response.

Using lookahead seems to be slightly faster, with fewer allocations, though:

``````julia> @btime [parse(Int, m[1]) for m in eachmatch(r"(\d+)S", $c)]
788.506 ns (11 allocations: 656 bytes)
2-element Vector{Int64}:
9
11

julia> @btime [parse(Int, m.match) for m in eachmatch(r"\d+(?=S)", $c)]
699.291 ns (9 allocations: 560 bytes)
2-element Vector{Int64}:
9
11
``````

What do you want it to return? Isn’t this a match?

``````julia> eachmatch(r"(\d)", "a1b2")|>first
RegexMatch("1", 1="1")
``````

I’d like it to return an `AbstractString`, like `SubString`, which is what `m.match` is. I don’t like that I need to reach into a field to get access to the actual string match. It seems un-idiomatic.


You’re right, usually struct fields are “implementation details”. But in this case, note that docs tell us that this is the intended way for users to access the matching substring.

Search for the first match of the regular expression `r` in `s` and return a `RegexMatch` object containing the match, or nothing if the match failed. The matching substring can be retrieved by accessing `m.match` and the captured sequences can be retrieved by accessing `m.captures`. The optional `idx` argument specifies an index at which to start the search.
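As an illustration of the `idx` argument and the `nothing` return value described there, here is a minimal sketch that walks the string with repeated `match` calls. The function name `numbers_before_S` is made up for this example; in practice `eachmatch` does this bookkeeping for you.

```julia
# Hypothetical helper: collect every number directly followed by S
# by calling match repeatedly with a moving start index.
function numbers_before_S(s)
    out = Int[]
    idx = firstindex(s)
    while idx <= ncodeunits(s)
        m = match(r"(\d+)S", s, idx)
        m === nothing && break          # no further match
        push!(out, parse(Int, m[1]))
        # continue right after the end of this match
        idx = m.offset + ncodeunits(m.match)
    end
    return out
end
```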

Yeah, I know. I just don’t like that it is different from what I think of as “idiomatic”.


Here is a possible alternative to regex.
It seems to be efficient from the example tested, but there might be a catch.

``````numberstrip1(c) = [parse(Int, m.match) for m in eachmatch(r"\d+(?=S)", c)]

function numberstrip2(c)
v = split(c, 'S')[1:end-1]
i = findlast.(!isdigit, v)
return parse.(Int, [isnothing(j) ? u : u[j+1:end] for (j,u) in zip(i,v)])
end

c = repeat("9S8M13S1",1000)

@btime numberstrip1($c)                     # 503 μs (6008 allocs: 398 KiB)
@btime numberstrip2($c)                     # 232 μs (  17 allocs: 288 KiB)
``````

In recent Julia versions, a `RegexMatch` can be indexed by capture group:

``````julia> match(r"(\d)", "a5b6")[1]
"5"
``````

This isn’t quite what you asked for: `m.match` is still needed for the whole match, but `.captures` isn’t needed for the groups.

In fact, there are some inputs that `numberstrip2` does not handle.

``````julia> ss ="S91S87M4SgS43SS"
"S91S87M4SgS43SS"

julia> numberstrip2(ss)
ERROR: ArgumentError: input string is empty or only contains whitespace
Stacktrace:

julia> digit_S(ss)
3-element Vector{Any}:
91
4
43

ss ="S91S87M4mSSgS4S3SMS"

digit_S(ss) == ns(ss)
``````
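One way to pin down the expected behavior on awkward inputs is to write the cases out against the regex version, used here as a reference. This is a sketch: the list of cases is my own choice and certainly not exhaustive, and `check` is a hypothetical helper.

```julia
# Regex-based reference, same definition as ns in this thread.
ns(s) = [parse(Int, m.match) for m in eachmatch(r"\d+(?=S)", s)]

# Inputs with leading S, repeated S, no digits, or digits not followed by S.
cases = [
    "S91S87M4SgS43SS"     => [91, 4, 43],
    "S91S87M4mSSgS4S3SMS" => [91, 4, 3],
    "SSS"                 => Int[],
    "12"                  => Int[],
    ""                    => Int[],
]

# Apply a candidate function f to every case and report whether all agree.
check(f) = all(f(input) == expected for (input, expected) in cases)

reference_ok = check(ns)
```

Any alternative implementation can then be passed to `check` before it is benchmarked.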

Here is another scheme to test (to see whether it works in “all” cases); it seems faster.
With more careful reasoning it can probably be simplified further and made faster still.

``````
function digit_S(s)
    prev_digit=false
    res=Int[]
    v=0
    for c in s
        if isdigit(c)
            v=v*10+codepoint(c)-0x30
            prev_digit=true
        elseif c=='S'&& prev_digit
            push!(res,v)
            v=0
            prev_digit=false
        else
            v=0
            prev_digit=false
        end
    end
    res
end

julia> s ="91S87M4Sg43"
"91S87M4Sg43"

ns(s)=[parse(Int, m.match) for m in eachmatch(r"\d+(?=S)", s)]

julia> s10t = repeat(s,10^4)
...
julia> @btime digit_S($s10t);
261.600 μs (9 allocations: 326.55 KiB)

julia> @btime numberstrip2($s10t);
3.139 ms (20 allocations: 2.20 MiB)

julia> @btime ns($s10t);
5.430 ms (60010 allocations: 3.68 MiB)

julia> ns(s10t)==numberstrip2(s10t)==digit_S(s10t)
true

``````
``````
function digit_cuS(s)
    prev_digit=false
    res=Int[]
    v=0
    cus=codeunits(s)
    for c in cus
        if 0x30 <= c <= 0x39
            v=v*10+c-0x30
            prev_digit=true
        elseif c==codepoint('S')&& prev_digit
            push!(res,v)
            v=0
            prev_digit=false
        else
            v=0
            prev_digit=false
        end
    end
    res
end

julia> @btime digit_cuS($s10t);
195.300 μs (9 allocations: 326.55 KiB)
julia> @btime ns($s10t);
5.351 ms (60010 allocations: 3.68 MiB)
``````
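Scanning code units instead of characters relies on a UTF-8 property: ASCII digits and `S` are single bytes below 0x80, while every byte of a multibyte character is 0x80 or above, so the two cannot be confused. A small sketch to illustrate, with an example string of my own:

```julia
s = "4Sα13S"

nchars = length(s)       # number of characters
nbytes = ncodeunits(s)   # number of bytes; α takes two

# Every byte of the non-ASCII character is >= 0x80 ...
alpha_bytes_high = all(b -> b >= 0x80, codeunits("α"))

# ... so a byte-wise digit test only ever hits real ASCII digits.
ndigits = count(b -> 0x30 <= b <= 0x39, codeunits(s))

# 'S' as a byte, for comparison against code units.
S_byte = UInt8('S')
```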
``````julia> function digit_cuS1(s)
    prev_digit=false
    # fixed-size buffer: exactly enough for the 20000 matches in s10t;
    # an input with more matches would overflow it
    res=Vector{Int}(undef,20000)
    v=0
    cus=codeunits(s)
    i=1
    for c in cus
        if 0x30 <= c <= 0x39
            v=v*10+c-0x30
            prev_digit=true
        elseif c==codepoint('S')&& prev_digit
            res[i]=v #push!(res,v)
            v=0
            prev_digit=false
            i+=1
        else
            v=0
            prev_digit=false
        end
    end
    res[1:i-1]
end
digit_cuS1 (generic function with 1 method)

julia> @btime digit_cuS1($s10t);
98.000 μs (4 allocations: 312.59 KiB)
``````
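The preallocated version above uses a buffer of 20000 entries, which fits this benchmark string. A possible variant, sketched here under the hypothetical name `digit_cuS2`, sizes the buffer from the number of `S` bytes (an upper bound on the number of results) and shrinks it at the end. I have not benchmarked it.

```julia
function digit_cuS2(s)
    cus = codeunits(s)
    S = UInt8('S')
    # There can be at most one result per S in the input.
    res = Vector{Int}(undef, count(==(S), cus))
    n = 0              # number of results written so far
    v = 0              # value of the digit run being read
    prev_digit = false
    for c in cus
        if 0x30 <= c <= 0x39
            v = v * 10 + (c - 0x30)
            prev_digit = true
        else
            if c == S && prev_digit
                n += 1
                res[n] = v
            end
            v = 0
            prev_digit = false
        end
    end
    return resize!(res, n)   # drop the unused tail in place
end
```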
``````parse.(Int,filter(!isempty,[ isnothing(findlast(!isdigit, e)) ? e : e[findlast(!isdigit, e)+1:end] for e in split(s10t,'S') ]))
``````

I tried your function and it works well!

But how can I make it work for multiple characters, i.e. a word? For example, to get the numbers before the word Freya:

(It only works for a single letter…)

``````s ="91S87M4Sg43S18FreyaS18S"

function codebreaker(s)
    prev_digit=false
    res=Vector{Int}(undef,20000)
    v=0
    cus=codeunits(s)
    i=1
    for c in cus
        if 0x30 <= c <= 0x39
            v=v*10+c-0x30
            prev_digit=true
        elseif c==codepoint('S')&& prev_digit
            res[i]=v #push!(res,v)
            v=0
            prev_digit=false
            i+=1
        else
            v=0
            prev_digit=false
        end
    end
    res[1:i-1]
end

# Type codebreaker(s)
``````

Great question! Keep asking questions like this; we all learn a lot from each other.

``````function codebreaker(s, pattern)
    p=codeunits(pattern)
    cus=codeunits(s)
    prev_digit=false
    res=Vector{Int}(undef,count(==(p[1]),cus))
    v=0
    i=1
    for (j,c) in enumerate(cus[1:end-size(p,1)+1])
        if 0x30 <= c <= 0x39
            v=v*10+c-0x30
            prev_digit=true
        elseif cus[j:j+size(p,1)-1]==p && prev_digit
            res[i]=v #push!(res,v)
            v=0
            prev_digit=false
            i+=1
        else
            v=0
            prev_digit=false
        end
    end
    res[1:i-1]
end

julia> s ="91S87FreyaM97878Freyauy78Freya5Freya999Frey"
"91S87FreyaM97878Freyauy78Freya5Freya999Frey"

julia> codebreaker(s,"Freya")
4-element Vector{Int64}:
87
97878
78
5
``````
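For checking results on word patterns, the regex approach from earlier in the thread can be generalized too. This is a sketch, assuming the pattern should be taken literally: it is wrapped in PCRE's `\Q...\E` quoting, which would break if the pattern itself contained `\E`. The name `codebreaker_regex` is made up here.

```julia
# Hypothetical regex-based reference for multi-character patterns.
function codebreaker_regex(s, pattern)
    # \Q...\E makes the regex engine treat the pattern text literally.
    re = Regex("\\d+(?=\\Q$(pattern)\\E)")
    return [parse(Int, m.match) for m in eachmatch(re, s)]
end

s = "91S87FreyaM97878Freyauy78Freya5Freya999Frey"
result = codebreaker_regex(s, "Freya")
```

It is likely slower than the hand-written loop, but it is short enough to serve as a cross-check.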

@rocco_sprmnt21, you seem to have turned on the turbo on your Ferrari!

Thanks for spotting the limitation. I’ve done a little maintenance here on my Fiat Cinquecento, just trying to get to the destination in one piece:

numberstrip3()
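``````function numberstrip3(c, pattern)
    d = split(c, pattern)
    pop!(d)                        # drop the tail after the last pattern
    v = filter(!isempty, d)
    ix = findlast.(!isdigit, v)    # last non-digit in each piece, if any
    s = Int[]
    for (i,u) in zip(ix,v)
        if isnothing(i)
            push!(s, parse(Int,u))
        else
            # nextind instead of i+1: the character at i may be multibyte
            x = tryparse(Int, u[nextind(u,i):end])
            !isnothing(x) && push!(s, x)
        end
    end
    return s
end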
``````
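Slicing after an index returned by `findlast` deserves some care, because Julia string indices are byte offsets and a non-ASCII character occupies more than one of them. A small sketch of the difference between `i+1` and `nextind`, using an example string of my own:

```julia
u = "α98"

# Index of the last non-digit character; α starts at byte 1.
i = findlast(!isdigit, u)

# nextind steps over the whole character, however many bytes it has.
tail = u[nextind(u, i):end]

# i+1 points into the middle of α, which is not a valid index.
plus_one_throws = try
    u[i+1:end]
    false
catch err
    err isa StringIndexError
end
```

For pure ASCII input the two give the same slice, so the issue only shows up with strings like the `bads` example.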

Paradoxically, generalizing the algorithm from a single character to a string makes it more efficient:

function codebreaker1(s, pattern)
    p=codeunits(pattern)
    cus=codeunits(s)
    is_prev_digit=false
    res=Vector{Int}(undef,count(==(p[1]),cus))
    v=0
    i=j=1
    while j <=size(cus,1)-size(p,1)+1
        if 0x30 <= cus[j] <= 0x39
            v=v*10+cus[j]-0x30
            is_prev_digit=true
        elseif cus[j:j+size(p,1)-1]==p && is_prev_digit
            res[i]=v
            v=0
            is_prev_digit=false
            i+=1
            j+=size(p,1)-1
        else
            v=0
            is_prev_digit=false
        end
        j+=1
    end
    res[1:i-1]
end

``````
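One thing that might still be trimmed is the slice `cus[j:j+size(p,1)-1]`, which copies bytes on every comparison. Below is a sketch that compares byte by byte instead; `matches_at` and `codebreaker2` are hypothetical names, the code assumes the pattern does not start with a digit, and I have not measured whether it is actually faster.

```julia
# Hypothetical helper: does p occur in cus starting at byte j?
function matches_at(cus, p, j)
    j + length(p) - 1 <= length(cus) || return false
    for k in 1:length(p)
        cus[j + k - 1] == p[k] || return false
    end
    return true
end

function codebreaker2(s, pattern)
    p = codeunits(pattern)
    cus = codeunits(s)
    res = Int[]
    isempty(p) && return res
    v = 0
    prev_digit = false
    j = 1
    while j <= length(cus)
        b = cus[j]
        if 0x30 <= b <= 0x39
            v = v * 10 + (b - 0x30)
            prev_digit = true
            j += 1
        elseif prev_digit && matches_at(cus, p, j)
            push!(res, v)
            v = 0
            prev_digit = false
            j += length(p)      # skip over the whole pattern
        else
            v = 0
            prev_digit = false
            j += 1
        end
    end
    return res
end
```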

Try this:
`bads="565Freyaiu87Frϵyα98wey78FreyauuFreya"`

The Cinquecento spits out:

``````numberstrip3(bads,"Freya")
2-element Vector{Int64}:
565
78
``````

The same result as the Ferrari, but at its own pace.
