UTF-8 in PHP regular expressions
Updated answer:
This is now tested and working
$post = '9999, škofja loka';
echo preg_match('/^\d{4},[\s\p{L}]+$/u', $post);
\w is not the right choice here: it does not reliably cover all Unicode letters, and it also matches [0-9_] in addition to letters.
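A quick way to see the second point is to test a string that contains a digit and an underscore. This is only a sketch; the sample strings are made up for illustration and are not from the question.

```php
<?php
// Illustrative inputs (hypothetical, not taken from the question).
$samples = ['loka', 'loka_9'];

$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        'w' => preg_match('/^\w+$/u', $s),     // word characters: letters, digits, underscore
        'L' => preg_match('/^\p{L}+$/u', $s),  // Unicode letters only
    ];
}

print_r($results);
// 'loka'   is accepted by both patterns.
// 'loka_9' is accepted by \w but rejected by \p{L}.
```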
The u modifier is also important: it activates Unicode mode.
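To see what the u modifier changes, compare how a single dot treats one non-ASCII letter. This small sketch assumes the script file itself is saved as UTF-8, so that š is stored as two bytes.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'š' occupies two bytes.
$char = 'š';

$bytes    = strlen($char);                 // byte length of the string
$withoutU = preg_match('/^.$/', $char);    // without u, the dot consumes a single byte
$withU    = preg_match('/^.$/u', $char);   // with u, the dot consumes a whole code point

echo "bytes=$bytes withoutU=$withoutU withU=$withU", PHP_EOL;
// prints: bytes=2 withoutU=0 withU=1
```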
If letters and whitespace can both appear after the comma, put them into the same character class. Your regex allows zero or more whitespace characters after the comma, followed only by letters.
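The difference shows up as soon as the text after the comma contains more than one word. A minimal comparison, assuming the file is saved as UTF-8 (the variable names are just for illustration):

```php
<?php
$post = '9999, škofja loka';

// Whitespace first, then letters only: the space between the two words is not allowed.
$separate = preg_match('/^\d{4},\s*\p{L}+$/u', $post);

// Whitespace and letters in one class: they may appear in any order.
$combined = preg_match('/^\d{4},[\s\p{L}]+$/u', $post);

echo "separate=$separate combined=$combined", PHP_EOL;
// prints: separate=0 combined=1
```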
See http://www.regular-expressions.info/php.html for php regex details
The \p{L} property matches any Unicode letter.
The end-of-string anchor $ is also important: it ensures that the complete string is verified. Without it, the pattern can succeed after matching only the beginning of the input (for example, just the first whitespace after the comma) and ignore the rest.
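The effect of the anchor is easiest to see with input that starts out valid and ends with something that should be rejected. The input below is a made-up example, and the file is assumed to be UTF-8:

```php
<?php
// Hypothetical input: valid prefix, followed by digits that should not be accepted.
$post = '9999, škofja 123';

$unanchored = preg_match('/^\d{4},[\s\p{L}]+/u', $post);   // stops after ' škofja ' and reports success
$anchored   = preg_match('/^\d{4},[\s\p{L}]+$/u', $post);  // must reach the end of the string

echo "unanchored=$unanchored anchored=$anchored", PHP_EOL;
// prints: unanchored=1 anchored=0
```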
[a-zA-Z] matches only letters in the ranges a-z and A-Z. Your input contains non-US-ASCII letters, so your regex won't match, regardless of the /u modifier. You need to use the word character escape sequence (\w) instead.
$post = '9999,škofja loka';
echo preg_match('/^[0-9]{4},[\s]*[\w]+/u', $post);
The problem is your regular expression. You are explicitly saying that you will only accept a b c ... z A B C ... Z. š is not in the a-z set. Remember, š is as different from s as any other character.
So if you really just want a sequence of letters, you need to test for the Unicode letter property, e.g.:
echo preg_match('/^[0-9]{4},[\s]*\p{L}+/u', $post);
That should work because \p{L} matches any Unicode character that is considered a letter, not just A through Z.
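For comparison, here is a small sketch that runs both character classes against a single word containing š. It assumes the file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
$word = 'škofja';

$asciiOnly = preg_match('/^[a-zA-Z]+$/u', $word);  // š is outside a-z and A-Z
$anyLetter = preg_match('/^\p{L}+$/u', $word);     // š has the Unicode "letter" property

echo "asciiOnly=$asciiOnly anyLetter=$anyLetter", PHP_EOL;
// prints: asciiOnly=0 anyLetter=1
```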
Add the u modifier, and remember the trailing slash:
echo preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+/u', $post);
Edited:
echo preg_match('/^\d{4},(?:\s|\w)+/u', $post);