Don't capturing groups within non-capturing groups work in regex?

I’m looking to capture a pattern like "=2.4.7.9_" and be able to extract the numbers.

I tried to match it with r"=(\d+)(:\.(\d+))*_" but that doesn’t seem to work. I expected each number to end up in a capture group, but instead it doesn’t match at all. Why is this? What should I write instead?

That’s incorrect syntax for non-capturing groups. It should be =(\d+)(?:\.(\d+))*_ instead (note the ?).
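To see the difference, here is a minimal sketch in Python's re module. That choice is an assumption, since the thread doesn't say which language is in use, but the r"..." literals are valid there too. Without the ?, the "(:" just opens an ordinary capturing group that expects a literal colon, which is why the original pattern never matches.

```python
import re

text = "=2.4.7.9_"

# "(:" opens a normal capturing group that must start with a literal ':'
broken = re.compile(r"=(\d+)(:\.(\d+))*_")
# "(?:" opens a non-capturing group
fixed = re.compile(r"=(\d+)(?:\.(\d+))*_")

print(broken.search(text))                # no match: there is no ':' in the text
print(fixed.search(text) is not None)     # the corrected pattern matches
print(broken.search("=2:.4_") is not None)  # the broken one wants input like this
print(broken.groups, fixed.groups)        # number of capturing groups in each
```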

For some reason I was sure I knew the syntax, but evidently not. I even looked up the syntax at pcre.org, but I still couldn’t see what I had done wrong.

Thanks for spotting it.

For people who might stumble across this topic in the future, I would like to add that using r"=(\d+)(?:\.(\d+))*_" the way I intended doesn’t work, anyway.

The number of capture groups in the match object is fixed by the number of capturing parentheses that appear in the regular expression itself, not by how many times they end up matching. That means that each time the non-capturing group in the above regular expression repeats, the second capture group of the match object is overwritten, so only the last repetition survives.
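A quick way to see the overwriting, again sketched with Python's re module (an assumption; the thread doesn't name the language, though other engines I know of also keep only the last repetition):

```python
import re

m = re.search(r"=(\d+)(?:\.(\d+))*_", "=2.4.7.9_")

# Two capturing groups in the pattern, so two slots in the result,
# no matter how often the repeated group matched.
print(m.groups())   # ('2', '9')

# If the repeated group never matches, its slot stays empty.
print(re.search(r"=(\d+)(?:\.(\d+))*_", "=2_").groups())   # ('2', None)
```

The 4 and the 7 are matched but not kept; only the final 9 is left in group 2.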

I wasn’t aware that you can label the capturing groups and use a dictionary-like interface to the match objects, but had I been, this behaviour would have been self-evident, since keys have to be unique. I ended up capturing the whole collection of numbers and dots in one fell swoop and using split instead.
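For anyone wanting the capture-then-split approach spelled out, here is a rough sketch. It assumes Python's re module, and the function name extract_numbers and the group label numbers are made up for illustration, not taken from my actual code.

```python
import re

# One named group grabs the whole run of digits and dots.
# "numbers" is a hypothetical label chosen for this example.
PATTERN = re.compile(r"=(?P<numbers>\d+(?:\.\d+)*)_")

def extract_numbers(text):
    """Return the dot-separated integers between '=' and '_', or [] if absent."""
    m = PATTERN.search(text)
    if m is None:
        return []
    # Split the single captured string instead of relying on repeated groups.
    return [int(part) for part in m.group("numbers").split(".")]

print(extract_numbers("=2.4.7.9_"))   # [2, 4, 7, 9]
print(extract_numbers("no match"))    # []
```

The regex only has to find and validate the span; split does the per-number extraction, so the fixed number of capture groups no longer matters.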

[Image: screenshot of the pattern being tested on regex101.com]

I usually find it super useful to check regexps against e.g. regex101.com (see above) or regexpal.com.

Yup. Added them to my bookmarks. Ta!