[ACCEPTED]-Regex to only allow alphanumeric, comma, hyphen, underscore and semicolon-preg-replace

Accepted answer
Score: 31

It's not the comma and semicolon causing your problem; it's the hyphen. Look at the parts of your character class and consider what they mean:

0-9 # Anything from '0' to '9', meaning 0, 1, 2, ... 9
A-Z # Anything from 'A' to 'Z', meaning A, B, C, ... Z
_-, # Anything from '_' to ',', meaning...uh...hmmm.

There's no valid progression from _ to ,: the underscore (ASCII 95) sorts after the comma (ASCII 44), so the range is out of order and the regex engine can't make sense of it. In character classes, if you want a hyphen to be interpreted literally, it needs to be at the very beginning or end of the class, or be escaped with a backslash. Any of those three placements will work.
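
A minimal PHP sketch of the difference, assuming a negated class similar to the one discussed above. The sample string in `$input` and the `$patterns` array are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$input = 'abc-DEF_123,x;y!@#';

// Out-of-order range: '_' (0x5F) sorts after ',' (0x2C), so PCRE refuses
// to compile the pattern. preg_replace() emits a warning (suppressed here
// with @) and returns null.
$broken = @preg_replace('/[^a-zA-Z0-9_-,;]/', '', $input);
var_dump($broken === null); // bool(true)

// Three ways to make the hyphen literal.
$patterns = [
    'hyphen first'   => '/[^-a-zA-Z0-9_,;]/',
    'hyphen last'    => '/[^a-zA-Z0-9_,;-]/',
    'hyphen escaped' => '/[^a-zA-Z0-9_\-,;]/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // Each variant strips everything outside the allowed set.
    $results[$label] = preg_replace($pattern, '', $input);
    echo $label, ': ', $results[$label], PHP_EOL; // abc-DEF_123,x;y
}
```

All three variants should produce the same cleaned string, which suggests the choice between them is mostly a matter of readability.
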


As for trimming off the end, you can do all of this in one regex replace:

$data = preg_replace('/[^,;a-zA-Z0-9_-]|[,;]$/s', '', $data);
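
To see what that single call does, here is a small sketch run against two made-up inputs. The strings and the helper name `clean_once` are illustrative assumptions; only the pattern comes from the answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function clean_once(string $data): ?string
{
    // First alternative: any character outside the allowed set.
    // Second alternative: a comma or semicolon at the very end.
    return preg_replace('/[^,;a-zA-Z0-9_-]|[,;]$/s', '', $data);
}

// Disallowed characters are stripped and the trailing comma is dropped.
$a = clean_once('foo bar!,baz_qux-1;2,');
echo $a, PHP_EOL; // foobar,baz_qux-1;2

// Both alternatives are tested against the original string, so a separator
// that is followed by stripped characters is not at the end when matched.
$b = clean_once('a,!');
echo $b, PHP_EOL; // a,
```

The second case suggests that when trailing junk may follow the last separator, running the trim as a separate second step is the safer choice.
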
Score: 2

I believe it's the placement of the hyphen that matters: it has to be at the start or end of the character class to be a literal hyphen; otherwise it's being used to define a range.
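
A quick way to check this is to compare a class where the hyphen sits between two characters with one where it comes last. This is only a sketch; the helper `in_class` and the test characters are assumptions chosen for illustration.

```php
<?php
// Returns true when the single character $ch is a member of the class in $pattern.
function in_class(string $pattern, string $ch): bool
{
    return preg_match($pattern, $ch) === 1;
}

// Hyphen in the middle: the range from ',' (0x2C) to '_' (0x5F), which
// covers digits, uppercase letters and a good deal of punctuation.
$asRange = '/^[,-_]$/';

// Hyphen last: exactly three literal characters, namely ',', '_' and '-'.
$asLiteral = '/^[,_-]$/';

foreach (['@', '7', '-', 'a'] as $ch) {
    printf(
        "%s  range: %s  literal: %s\n",
        $ch,
        var_export(in_class($asRange, $ch), true),
        var_export(in_class($asLiteral, $ch), true)
    );
}
```

Here `@` and `7` fall inside the range but are not in the literal class, `-` is in both, and `a` is in neither.
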

Score: 1

You can also escape the hyphen as \- and then put it anywhere in the character class.

As for the trailing semicolons and commas, try /[,;]+$/. It should match any commas and semicolons at the end, even if there are several.
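
A short sketch of why the `+` matters, using a made-up input that ends in a run of separators (the string and variable names are assumptions for illustration):

```php
<?php
// Hypothetical input ending in several separators.
$data = 'a,b;c,;,';

// Without '+', only the final character is removed.
$single = preg_replace('/[,;]$/', '', $data);
echo $single, PHP_EOL; // a,b;c,;

// With '+', the whole trailing run is removed in one match.
$run = preg_replace('/[,;]+$/', '', $data);
echo $run, PHP_EOL; // a,b;c
```

Separators in the middle of the string are untouched in both cases, because `$` anchors the match to the end.
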
