PHP: Regex to ignore escaped quotes within quotes
For most strings, you need to allow escaped anything (not just escaped quotes). For example, you most likely need to allow escaped characters like "\n" and "\t", and of course the escaped escape: "\\".
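To see why this matters, here is a minimal PHP sketch. The `$quotesOnly` pattern is a hypothetical counter-example (not from this answer) that only treats `\"` as an escape. The `$anyEscape` pattern is Version 1 below, written with a non-capturing group. The subject string is made up: its first literal ends in an escaped backslash.

```php
<?php
// Hypothetical subject: the first literal ends with an escaped backslash,
// so the actual text is:  x = "a\\" + "b"
$subject = 'x = "a\\\\" + "b"';

// Hypothetical counter-example: only \" counts as an escape,
// and a backslash on its own is just an ordinary character.
$quotesOnly = '/"(?:\\\\"|[^"])*"/';

// Version 1 from the answer (non-capturing): any backslash pair is consumed together.
$anyEscape = '/"(?:[^"\\\\]|\\\\.)*"/';

preg_match_all($quotesOnly, $subject, $bad);
preg_match_all($anyEscape, $subject, $good);

print_r($bad[0]);  // one over-long match
print_r($good[0]); // the two literals, separately
```

With the quotes-only pattern, the first backslash is taken as an ordinary character, so the second backslash and the closing quote are read as an escaped quote and the match runs on to the next `"`. The any-escape pattern returns `"a\\"` and `"b"` separately.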
This is a frequently asked question, and one which was solved (and optimized) long ago. Jeffrey Friedl covers this question in depth (as an example) in his classic work, Mastering Regular Expressions (3rd Edition). Here is the regex you are looking for:
Good:
"([^"\\]|\\.)*"
Version 1: Works correctly but is not terribly efficient.
Better:
"([^"\\]++|\\.)*"
or "((?>[^"\\]+)|\\.)*"
Version 2: More efficient if your regex engine supports possessive quantifiers or atomic groups (see sin's correct answer, which uses the atomic group method).
Best:
"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: More efficient still. Implements Friedl's "unrolling-the-loop" technique. Does not require possessive quantifiers or atomic groups (i.e. this can be used in JavaScript and other less-featured regex engines).
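A quick PHP check (a sketch, using a made-up subject string) that the Good, Better and Best patterns above agree on input containing `\"`, `\\` and `\n` escapes. The patterns are copied from above and written as single-quoted PHP strings, so every regex backslash is doubled; the array keys are just labels.

```php
<?php
// Patterns from the answer, as single-quoted PHP strings (regex backslashes doubled).
$patterns = [
    'good'              => '/"([^"\\\\]|\\\\.)*"/',
    'better/possessive' => '/"([^"\\\\]++|\\\\.)*"/',
    'better/atomic'     => '/"((?>[^"\\\\]+)|\\\\.)*"/',
    'best/unrolled'     => '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/',
];

// Hypothetical subject; the actual text is:  say "a \"b\" \\ c\n" then "d"
$subject = 'say "a \\"b\\" \\\\ c\\n" then "d"';

$results = [];
foreach ($patterns as $name => $re) {
    preg_match_all($re, $subject, $m);
    $results[$name] = $m[0]; // full matches, quotes included
    echo $name, ': ', implode(' | ', $m[0]), "\n";
}
```

All four patterns find the same two literals on this input; the versions differ in how efficiently the engine gets there, not in what a successful match returns here.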
Here are the recommended regexes in PHP syntax for both double- and single-quoted substrings:
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
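A minimal usage sketch for the two patterns above. The subject is a hypothetical line of PHP-like source; a nowdoc (`<<<'SRC'`) is used so the backslashes reach the regex exactly as written.

```php
<?php
// Patterns copied from the answer above.
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";

// Hypothetical input; the nowdoc keeps every backslash exactly as written.
$source = <<<'SRC'
$a = "say \"hi\""; $b = 'it\'s';
SRC;

preg_match_all($re_dq, $source, $dq);
preg_match_all($re_sq, $source, $sq);

print_r($dq[0]); // double-quoted literals, quotes included
print_r($sq[0]); // single-quoted literals, quotes included
```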
Try a regex like this:
'/"(\\\\[\\\\"]|[^\\\\"])*"/'
A (short) explanation:
" # match a `"`
( # open group 1
\\\\[\\\\"] # match either `\\` or `\"`
| # OR
[^\\\\"] # match any char other than `\` and `"`
)* # close group 1, and repeat it zero or more times
" # match a `"`
The following snippet:
<?php
$text = 'abc "string \\\\ \\" literal" def';
preg_match_all('/"(\\\\[\\\\"]|[^\\\\"])*"/', $text, $matches);
echo $text . "\n";
print_r($matches);
?>
produces:
abc "string \\ \" literal" def
Array
(
[0] => Array
(
[0] => "string \\ \" literal"
)
[1] => Array
(
[0] => l
)
)
as you can see on Ideone.
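Group 1 in the output above is `l` because a repeated capturing group only keeps its last iteration. If you want everything between the quotes, one option (a sketch, not part of the original answer) is to make the alternation non-capturing and wrap the whole repetition in a single group:

```php
<?php
// Same subject as the snippet above; the actual text is:
//   abc "string \\ \" literal" def
$text = 'abc "string \\\\ \\" literal" def';

// Original pattern: the repeated group 1 only keeps its last iteration.
preg_match('/"(\\\\[\\\\"]|[^\\\\"])*"/', $text, $m1);

// Variant: non-capturing alternation inside one capturing group around the repetition.
preg_match('/"((?:\\\\[\\\\"]|[^\\\\"])*)"/', $text, $m2);

echo $m1[1], "\n"; // l
echo $m2[1], "\n"; // string \\ \" literal
```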
This has possibilities:
/"(?>(?:(?>[^"\\]+)|\\.)*)"/
/'(?>(?:(?>[^'\\]+)|\\.)*)'/
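A usage sketch for the two atomic-group patterns above in PHP. Inside a single-quoted PHP string each regex backslash has to be doubled (and the `'` delimiters of the second pattern escaped); the subject string is hypothetical.

```php
<?php
// The two patterns above as single-quoted PHP strings.
$re_dq = '/"(?>(?:(?>[^"\\\\]+)|\\\\.)*)"/';
$re_sq = '/\'(?>(?:(?>[^\'\\\\]+)|\\\\.)*)\'/';

// Hypothetical subject; the actual text is:  echo "a\"b" . 'c\'d';
$subject = 'echo "a\\"b" . \'c\\\'d\';';

preg_match($re_dq, $subject, $dq);
preg_match($re_sq, $subject, $sq);

echo $dq[0], "\n"; // "a\"b"
echo $sq[0], "\n"; // 'c\'d'
```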
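This will leave the quotes outside the match:
(?<=['"])(.*?)(?=["'])
and use a global search to match all groups (the /g flag in JavaScript; PHP's preg functions have no /g modifier, so use preg_match_all there).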
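A sketch of the lookaround pattern with `preg_match_all()` (both subject strings are made up). The second call shows something to keep in mind: once a match stops at a closing quote, the text right after that quote is also preceded by a quote, so the gap between two quoted substrings is returned too.

```php
<?php
// The lookaround pattern as a single-quoted PHP string.
$re = '/(?<=[\'"])(.*?)(?=["\'])/';

// One quoted substring: only the text between the quotes is returned.
preg_match_all($re, 'say "hello" now', $one);
print_r($one[1]);

// Two quoted substrings: the gap between them (" and ") also matches,
// because its first character is preceded by a quote as well.
preg_match_all($re, 'say "hi" and "bye"', $two);
print_r($two[1]);
```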
This seems to be as fast as the unrolled loop, based on some cursory benchmarks, but is much easier to read and understand. It doesn't require any backtracking in the first place.
"[^"\\]*(\\.[^"\\]*)*"
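A minimal PHP usage sketch for this pattern. It has the same shape as the "Best" (unrolled-loop) pattern in the accepted answer, except that the repeated part is a capturing group; `$m[0]` holds the full literals. The JSON-like subject is hypothetical.

```php
<?php
// The pattern from this answer as a single-quoted PHP string.
$re = '/"[^"\\\\]*(\\\\.[^"\\\\]*)*"/';

// Hypothetical subject; the actual text is:  {"key": "va\"l\\ue"}
$subject = '{"key": "va\\"l\\\\ue"}';

preg_match_all($re, $subject, $m);
print_r($m[0]); // full literals, quotes included
```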
According to W3 resources: https://www.w3.org/TR/2010/REC-xpath20-20101214/#doc-xpath-StringLiteral
The general regex is:
"(\\.|[^"])*"
(There is no need to exclude backslashes inside [^"], because the \\. alternative is checked first.)
Explanation:
"..."       # any match between quotes
(...)*      # the inside can have any length, from 0 to infinity
\\.|[^"]    # first accept any char that has a backslash before it, OR (|) else accept any char that is not a quote
The PHP version of the regex, with better grouping for handling both kinds of quotes, can look like this:
<?php
$str='"First \\" \n Second" then \'This \\\' That\'';
echo $str."\n";
// "First \" \n Second" then 'This \' That'
$RX_inQuotes='/"((\\\\.|[^"])*)"/';
preg_match_all($RX_inQuotes,$str,$r,PREG_SET_ORDER);
echo $r[0][1]."\n";
// First \" \n Second
$RX_inAnyQuotes='/("((\\\\.|[^"])*)")|(\'((\\\\.|[^\'])*)\')/';
preg_match_all($RX_inAnyQuotes,$str,$r,PREG_SET_ORDER);
echo $r[0][2]." --- ".$r[1][5];
// First \" \n Second --- This \' That
?>
Try it: http://sandbox.onlinephpfunctions.com/code/4328cc4dfc09183f7f1209c08ca5349bef9eb5b4
Important note: nowadays, when you can't be sure of the contents, you should add the u flag at the end of the regex, like /.../u, to avoid breaking multi-byte strings such as UTF-8, or use functions like mb_ereg_match.
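A small sketch of what the `u` flag changes (the subject is hypothetical). Without `u`, PCRE works on bytes, so `.` matches a single byte of a multi-byte UTF-8 character; with `u`, pattern and subject are treated as UTF-8 and `.` matches the whole character.

```php
<?php
// "\u{00E9}" is é, which UTF-8 encodes as two bytes (0xC3 0xA9).
$subject = "\"\u{00E9}\"";

// Without /u the pattern works on bytes: (.) matches one byte, so this fails.
$bytes = preg_match('/^"(.)"$/', $subject);

// With /u the pattern and subject are treated as UTF-8: (.) matches the
// whole character é and the match succeeds.
$chars = preg_match('/^"(.)"$/u', $subject, $m);

var_dump($bytes, $chars, $m[1] === "\u{00E9}"); // int(0), int(1), bool(true)
```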