The regex engine starts from the left and finds the first <a href="files/. From there, it keeps adding the minimum number of characters (since .*? is indeed non-greedy) needed to complete the match, which gives you a total match of "<a href=\"files/Zamren.gml\">GML</a> <a href=\"files/Zamren.graphml\">". So the syntax for non-greedy matching is indeed *?, but in this case "non-greedy" doesn't do what you think it does, namely find a shorter match that starts later in the string. The engine always prefers the leftmost starting point, and non-greediness only controls how far the match extends from there. It's an easy mistake to make (it took me a while, too), and part of why regexes are often quite tricky.
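To see this behaviour for yourself, here is a small sketch in Python. I'm guessing at your pattern (something like <a href="files/.*?\.graphml">) and at the surrounding HTML, so treat both as assumptions rather than your actual code. The second search shows one common workaround: replacing .*? with [^"]* so the match can't run past the closing quote of the href.

```python
import re

# Assumed input, reconstructed from the match quoted above.
html = '<a href="files/Zamren.gml">GML</a> <a href="files/Zamren.graphml">GraphML</a>'

# Assumed pattern: the non-greedy .*? still starts at the FIRST '<a href="files/'.
lazy_match = re.search(r'<a href="files/.*?\.graphml">', html)
print(lazy_match.group(0))
# <a href="files/Zamren.gml">GML</a> <a href="files/Zamren.graphml">

# Workaround: [^"]* cannot cross the closing quote, so the first link fails
# to match and the engine moves on to the second one.
tight_match = re.search(r'<a href="files/[^"]*\.graphml">', html)
print(tight_match.group(0))
# <a href="files/Zamren.graphml">
```

The workaround only holds as long as the attribute value never contains a double quote and the tag is written exactly like this, which is one more reason not to lean on it.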
I would strongly second using a proper HTML parser instead of regexes.
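As a rough idea of what that could look like, here is a minimal sketch using html.parser from Python's standard library. The LinkCollector class and the sample input are made up for illustration, and a third-party parser would work just as well; the point is only that you collect the href attributes and filter them afterwards.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Assumed input, reconstructed from the match quoted above.
page = '<a href="files/Zamren.gml">GML</a> <a href="files/Zamren.graphml">GraphML</a>'

collector = LinkCollector()
collector.feed(page)

# Filter on the file extension instead of encoding it in a regex.
graphml_links = [href for href in collector.hrefs if href.endswith(".graphml")]
print(graphml_links)  # ['files/Zamren.graphml']
```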