Non-greedy regex match

The regex engine starts from the left and finds the first <a href="files/. From there it adds as few characters as possible (since .*? is indeed non-greedy) until the rest of the pattern matches, which gives you a total match of <a href="files/Zamren.gml">GML</a> <a href="files/Zamren.graphml">. So the syntax for non-greedy matches is indeed *?, but in this case “non-greedy” doesn’t do what you think it does (find a shorter match later in the string): it only minimizes how far the match extends from its starting point, and the leftmost possible start always wins. An easy mistake to make (took me a while, too), and one reason regexes are often quite tricky.
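To make that concrete, here is a small Python sketch. The input string and both patterns are my reconstruction from the match quoted above, not the original poster's exact code, so treat them as assumptions. It shows the lazy pattern still starting at the first link, and one possible fix that stops the wildcard from crossing a closing quote.

```python
import re

# Hypothetical input, reconstructed from the match quoted above.
html = ('<a href="files/Zamren.gml">GML</a> '
        '<a href="files/Zamren.graphml">GraphML</a>')

# Assumed pattern: a lazy .*? between the prefix and the .graphml suffix.
# The engine anchors at the FIRST '<a href="files/' and only then extends
# as little as possible, so the match spans both links.
lazy = re.search(r'<a href="files/.*?\.graphml">', html)
print(lazy.group(0))
# <a href="files/Zamren.gml">GML</a> <a href="files/Zamren.graphml">

# One possible fix: [^"]* cannot run past the closing quote of the href,
# so the attempt at the first link fails and the second link is found.
tight = re.search(r'<a href="files/[^"]*\.graphml">', html)
print(tight.group(0))
# <a href="files/Zamren.graphml">
```

The negated character class works here only because the href value contains no quotes; it is a patch for this one case, not a general way to handle HTML.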

I would strongly second using a proper HTML parser instead of regexes.
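For what it's worth, a rough sketch of the parser route using only Python's standard library html.parser module might look like the following. The LinkCollector class and the sample input are made up for illustration; a third-party parser would work just as well.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical input, modeled on the links discussed in this thread.
html = ('<a href="files/Zamren.gml">GML</a> '
        '<a href="files/Zamren.graphml">GraphML</a>')

collector = LinkCollector()
collector.feed(html)

# Filter on the parsed attribute value instead of matching raw markup.
graphml_links = [h for h in collector.hrefs if h.endswith(".graphml")]
print(graphml_links)
# ['files/Zamren.graphml']
```

Because the filter runs on attribute values the parser has already extracted, attribute order, extra whitespace, or other tags between the links no longer matter.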
