Regex to find unmatched parentheses
The short answer is that you can't find unmatched parentheses with regular expressions in the formal sense. Regular expressions encode regular languages, whereas the language of all properly matched parentheses is a context-free language.
Here's a sort-of-regex-based solution :)
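# Returns true when every `open` has a matching `close` in the right order.
def balanced?(str, open = '(', close = ')')
  # Regexp.union escapes both delimiters, whatever characters they are.
  re = Regexp.union(open, close)
  str.scan(re).inject(0) do |lv, c|
    # A negative level means a close appeared before its open.
    break :overclosed if lv < 0
    lv + (c == open ? 1 : -1)
  end == 0
end

s1 = "one) ((two) (three) four) (five)))"
s2 = "((one) ((two) (three) four) (five))"
s3 = "((one) ((two) (three) four) (five)"

puts balanced?(s1), #=> false
     balanced?(s2), #=> true
     balanced?(s3)  #=> false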
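The counter above only reports whether a string is balanced. To locate the offending parentheses, one option is to keep a stack of indices instead of a single level. The following is a minimal sketch, not part of the original answer; `unmatched_parens` is a hypothetical helper name chosen for illustration.

```ruby
# Hypothetical helper: returns the indices of every unmatched parenthesis.
def unmatched_parens(str, open = '(', close = ')')
  stack = []      # indices of opens still waiting for a close
  unmatched = []  # indices of closes that had no open to pair with
  str.each_char.with_index do |c, i|
    if c == open
      stack.push(i)
    elsif c == close
      # pop returns nil on an empty stack, so this close is unmatched
      unmatched << i if stack.pop.nil?
    end
  end
  # Whatever is left on the stack was never closed.
  (unmatched + stack).sort
end
```

For example, `unmatched_parens("a)b(c")` returns `[1, 3]`, and a balanced string returns an empty array.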
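Ruby's Oniguruma regex engine goes beyond regular languages: it supports recursive subexpression calls (\g<name>), which are enough to match nested structures such as simple HTML-like markup. Citing the README: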
r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
(?<element> \g<stag> \g<content>* \g<etag> ){0}
(?<stag> < \g<name> \s* > ){0}
(?<name> [a-zA-Z_:]+ ){0}
(?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
(?<etag> </ \k<name+1> >){0}
\g<element>
__REGEXP__
p r.match('<foo>f<bar>bbb</bar>f</foo>').captures
The above code is of course much simpler than a real HTML parser, but it matches nested tags. Also, you should note that it is incredibly simple to make a regex which would be very slow (in the range of minutes to parse an 80-symbol string).
It's better to use a real parser like Treetop for this task.
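For completeness, the same subexpression-call feature (\g<name>) can be pointed at the original parenthesis problem. This is a rough sketch rather than a recommendation; the constant name `BALANCED_PARENS` and the group name `paren` are made up for this example, and the pattern only answers yes or no without saying where the mismatch is.

```ruby
# Hypothetical pattern: matches only strings whose parentheses are balanced.
BALANCED_PARENS = /
  \A
  (?: [^()]                                        # any non-parenthesis character
    | (?<paren> \( (?: [^()] | \g<paren> )* \) )   # a group that may nest via recursion
  )*
  \z
/x

s1 = "one) ((two) (three) four) (five)))"
s2 = "((one) ((two) (three) four) (five))"
s3 = "((one) ((two) (three) four) (five)"

puts BALANCED_PARENS.match?(s1), #=> false
     BALANCED_PARENS.match?(s2), #=> true
     BALANCED_PARENS.match?(s3)  #=> false
```

The performance caveat above still applies: on long or deeply nested input a backtracking pattern like this can degrade, so the counter or a real parser remains the safer choice.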