Should I avoid regular expressions?
If you can easily do the same thing with common string operations, then you should avoid using a regular expression.
In most situations, regular expressions are used where the same operation would require a substantial amount of common string operations; in that case there is of course no point in avoiding regular expressions.
Don't avoid them. They're an excellent tool, and when used appropriately can save you a lot of time and effort. Moreover, a good implementation used carefully should not be particularly CPU-intensive.
Overhyped? No. They're extremely powerful and flexible.
Overused? Absolutely. Particularly when it comes to parsing HTML (which frequently comes up here).
This is another of those "right tool for the job" scenarios. Some go too far and try to use regexes for everything.
You are right, though, that you can do many things with substring and/or split. You will often reach a point with those where what you're doing gets so complicated that you have to change method, or you just end up writing too much fragile code. Regexes are (relatively) easy to expand.
But handwritten code will nearly always be faster. A good example of this is the question "Putting char into a java string for each N characters". The regex solution is terser, but it has some issues that a handwritten loop doesn't, and it is much slower.
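To make the comparison concrete, here is a minimal Python sketch of the same task (inserting a separator after every N characters) done both ways. It is an analogue written for illustration, not the Java code from the linked question; the function names are hypothetical, and it assumes `n` is at least 1.

```python
import re


def insert_every_regex(text: str, n: int, sep: str) -> str:
    """Insert sep after every n characters using a regex substitution."""
    # (?!\Z) stops a separator from being appended at the very end.
    pattern = r"(.{%d})(?!\Z)" % n
    # A function replacement avoids escaping problems if sep contains backslashes.
    return re.sub(pattern, lambda m: m.group(1) + sep, text, flags=re.DOTALL)


def insert_every_loop(text: str, n: int, sep: str) -> str:
    """Same result with plain slicing: cut into chunks of n and join them."""
    chunks = [text[i:i + n] for i in range(0, len(text), n)]
    return sep.join(chunks)
```

The regex version is a single expression, but the reader has to know what the lookahead is for; the slicing version spells out the chunking. Which one is faster depends on the language and engine, so it is worth measuring rather than assuming.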
You can substitute "regex" in your question with pretty much any technology and you'll find people who poorly understand the technology, or are too lazy to learn it, making such claims.
There is nothing heavy-weight about regular expressions. The most common way that programmers get themselves into trouble using regular expressions is that they try to do too much with a single regular expression. If you use regular expressions for what they're intended (simple pattern matching), you'll be hard-pressed to write procedural code that's more efficient than the equivalent regular expression. Given decent proficiency with regular expressions, the regular expression takes much less time to write, is easier to read, and can be pasted into tools such as RegexBuddy for visualization.
As a basic parser or validator, use a regular expression unless the parsing or validation code you would otherwise write would be easier to read.
For complex parsers (i.e. recursive descent parsers) use regex only to validate lexical elements, not to find them.
The bottom line is, the best regex engines are well tuned for validation work, and in some cases may be more efficient than the code you yourself could write, and in others your code would perform better. Write your code using handwritten state machines or regex as you see fit, but change from regex to handwritten code if performance tests show you that regex is significantly inefficient.
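A performance test of that kind can be quite small. The sketch below, in Python, times a compiled regex against a handwritten loop for one toy validation (is the string all ASCII digits?). The function names and the validation rule are assumptions chosen for illustration; the numbers it produces depend entirely on your engine, language, and data.

```python
import re
import timeit

DIGITS = re.compile(r"[0-9]+")


def is_digits_regex(s: str) -> bool:
    # fullmatch requires the whole string to match the pattern.
    return DIGITS.fullmatch(s) is not None


def is_digits_loop(s: str) -> bool:
    # Handwritten equivalent: non-empty and every character in '0'..'9'.
    if not s:
        return False
    for ch in s:
        if ch < "0" or ch > "9":
            return False
    return True


def compare(sample: str, repeats: int = 10_000) -> dict:
    """Return the total seconds each variant takes for `repeats` calls."""
    return {
        "regex": timeit.timeit(lambda: is_digits_regex(sample), number=repeats),
        "loop": timeit.timeit(lambda: is_digits_loop(sample), number=repeats),
    }


if __name__ == "__main__":
    print(compare("1234567890" * 10))
```

Run it on inputs that look like your real data before deciding to replace one variant with the other.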
"When you have a hammer, everything looks like a nail."
Regular expressions are a very useful tool, but I agree that they're not necessary for every single place they're used. One positive factor is that, because they tend to be complex and very heavily used where they are, the algorithms that apply regular expressions tend to be rather well optimized. That said, the overhead involved in learning regular expressions can be... high. Very high.
Are regular expressions the best tool to use for every applicable situation? Probably not, but on the other hand, if you work with string validation and search all the time, you probably use regular expressions a lot; and once you do, you already have the knowledge necessary to use the tool probably more efficiently and quickly than any other tool. But if you don't have that experience, learning it is effectively a drag on your productivity for that implementation. So I think it depends on the amount of time you're willing to put into learning a new paradigm, and the level of rush involved in your project. Overall, I think regular expressions are very worth learning, but at the same time, that learning process can, frankly, suck.
I think that if you learn programming in a language that speaks regular expressions natively, you'll gravitate toward them because they just solve so many problems. I.e., you may never learn to use split because regexec() can solve a wider set of problems and once you get used to it, why look anywhere else?
On the other hand, I bet C and C++ programmers will for the most part look at other options first, since it's not built into the language.
You know, given the fact that I'm what many people call "young", I've heard too much criticism about RegEx. You know, "he had a problem and tried to use regex, now he has two problems".
Seriously, I don't get it. It is a tool like any other. If you need a simple website with some text, you don't need PHP/ASP.NET/STG44. Still, there's no discussion on whether any of those should be avoided. How odd.
In my experience, RegEx is probably the most useful tool I've ever encountered as a developer. It's the most useful tool when it comes to the #1 security issue: parsing user input. It has saved me hours if not days of coding and creating potentially buggy (read: crappy) code.
With modern CPUs, I don't see what the performance issue is here. I'm quite willing to sacrifice some cycles for some quality and security. (It's not always the case, though, but I think those cases are rare.)
Still, RegEx is very powerful. With great power comes great responsibility. That doesn't mean you should use it whenever you can, only where its power is worth using.
As someone mentioned above, HTML parsing with RegEx is like Russian roulette with a fully loaded gun. Don't overdo anything, RegEx included.
You should also avoid floating-point numbers at all cost. That is, when you're programming in an embedded environment.
Seriously: if you're in normal software development, you should actually use regex if you need to do something that can't be achieved with simpler string operations. I'd say that any normal programmer won't be able to implement something that's best done using regexps in a way that is faster than the corresponding regular expression. Once compiled, a regular expression works as a state machine that is optimized to near perfection.
Overhyped? No
Under-Utilized Properly? Yes
If more people knew how to use a decent parser generator, there would be fewer people using regular expressions.
In my belief, they are overused by people quite a bit (I've had this discussion a number of times on SO).
But they are a very useful construct because they deliver a lot of expressive power in a very small piece of code.
You only have to look at an example such as a Western Australian car registration number. The RE would be
re.match("[1-9] [A-Z]{3} [0-9]{3}")
whilst the code to check this would be substantially longer, in either a simple 9-if-statement or slightly better looping version.
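For comparison, here is a rough Python sketch of both versions, taking the plate format from the expression above at face value (one digit 1-9, a space, three capital letters, a space, three digits). The function names are hypothetical, and `fullmatch` is used so that trailing characters are rejected.

```python
import re

PLATE_RE = re.compile(r"[1-9] [A-Z]{3} [0-9]{3}")


def is_plate_regex(s: str) -> bool:
    # The whole string must match, not just a prefix.
    return PLATE_RE.fullmatch(s) is not None


def is_plate_manual(s: str) -> bool:
    # Hand-coded check of the same nine character positions.
    if len(s) != 9:
        return False
    if not "1" <= s[0] <= "9":
        return False
    if s[1] != " " or s[5] != " ":
        return False
    if not all("A" <= c <= "Z" for c in s[2:5]):
        return False
    return all("0" <= c <= "9" for c in s[6:9])
```

The manual version is several times longer but makes each positional rule explicit, which is the trade-off being described.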
I hardly ever use complex REs in my code because:
- I know how the RE engines work and I can use domain knowledge to code up faster solutions (that 9-if variant would almost certainly be faster than a one-shot RE compile/execute cycle); and
- I find code more readable if it's broken up logically and commented. This isn't easy with most REs (although I have seen one that allows inline comments).
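On the point about inline comments: Python's `re` module is one engine that supports them, through the `re.VERBOSE` flag. A minimal sketch, reusing the registration-number format from earlier in this answer (the constant name is hypothetical):

```python
import re

# In verbose mode, unescaped whitespace in the pattern is ignored and '#'
# starts a comment, so a literal space has to be written as [ ] or "\ ".
PLATE_VERBOSE = re.compile(
    r"""
    [1-9]       # leading digit, never zero
    [ ]         # a single literal space
    [A-Z]{3}    # three uppercase letters
    [ ]         # a single literal space
    [0-9]{3}    # three digits
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    print(bool(PLATE_VERBOSE.fullmatch("1 ABC 123")))
```

This keeps the expressive power of the RE while letting each piece be commented, at the cost of a longer pattern.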
I have seen people suggest the use of REs for extracting a fixed-size substring at a fixed location. Why these people don't just use substring() is beyond me. My personal thought is that they're just trying to show how clever they are (but it rarely works).
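As a small illustration in Python (where slicing plays the role of substring()), here are the two approaches side by side. The function names and the sample record are made up for the example.

```python
import re


def field_by_slice(record: str, start: int, length: int) -> str:
    # Plain slicing: the direct way to take a fixed-size field.
    return record[start:start + length]


def field_by_regex(record: str, start: int, length: int) -> str:
    # The regex detour: skip `start` characters, then capture `length` more.
    m = re.match(r".{%d}(.{%d})" % (start, length), record, re.DOTALL)
    return m.group(1) if m else ""


if __name__ == "__main__":
    print(field_by_slice("2024-05-17", 5, 2))
    print(field_by_regex("2024-05-17", 5, 2))
```

Note that the two are not even strictly equivalent: on a record that is too short, the slice returns whatever partial field exists, while this regex version returns an empty string.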
Don't avoid them, but ask yourself if they're the best tool for the task you have to solve. Regexes are sometimes difficult to use or debug, but they're really useful in some situations. The point is to use the appropriate tool for each task, and usually this is not obvious.
There is a very good reason to use regular expressions in scripting languages (such as Ruby, Python, Perl, JavaScript and Lua): parsing a string with a carefully optimized regular expression executes faster than the equivalent custom while loop which scans the string character by character. For compiled languages (such as C and C++, and also C# and Java most of the time) usually the opposite is true: the custom while loop executes faster.
One more reason why regular expressions are so popular: they express the programmer's intention in an extremely compact way. A single-line regexp can do as much as a 10- or 20-line while loop.
Overhyped? No. If you have ever taken a Parsing or Compiler course, you would understand that this is like saying addition and multiplication are overhyped for math problems.
It is a system for solving parsing problems.
Some problems are simpler and don't require regular expressions; some are harder and require better tools.
I've seen so many people argue about whether a given regex is correct or not that I'm starting to think the best way to write one is to ask how to do it on StackOverflow and then let the regex gurus fight it out.
I think they're especially useful in JavaScript. JavaScript is transmitted (so should be small) and interpreted from text (although this is changing in the new browsers with V8 and JIT compilation), so a good internal regex engine has a chance to be faster than a handwritten algorithm.
I'd say if there is a clear and easy way to do it with string operations, use the string operations. But if you can do a nice regex instead of writing your own state machine interpreter, use the regex.
Regular expressions are one of the most useful things programmers can learn; they allow you to speed up and minimize your code if you know how to handle them.
Regular expressions are often easier to understand than the non-regex equivalent, especially in a language with native regular expressions, and especially in a code section where other things that need to be done with regexes are present.
That doesn't mean they're not overused. The only time string.match(/\?/) is better than string.contains('?') is if it's significantly more readable with the surrounding code, or if you know that .contains is implemented with regexes anyway.
I often use regex in my IDE to quick-fix code. Try to do the following without regex.
glVector3f( -1.0f, 1.0f, 1.0f ); -> glVector3f( center.x - 1.0f, center.y + 1.0f, center.z + 1.0f );
Without regex, it's a pain, but WITH regex...
s/glVector3f\((.*?),(.*?),(.*?)\)/glVector3f(center.x+$1,center.y+$2,center.z+$3)/g
Awesome.
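Outside an IDE, the same kind of bulk rewrite can be scripted. The following is a rough Python sketch using `re.sub` with the same three capture groups; the function name is hypothetical, and like the one-liner it simply prefixes each argument, so a negative argument comes out as `center.x + -1.0f` rather than `center.x - 1.0f`.

```python
import re

# Three lazily matched, comma-separated arguments inside glVector3f( ... ).
CALL = re.compile(r"glVector3f\((.*?),(.*?),(.*?)\)")


def offset_by_center(source: str) -> str:
    """Rewrite every glVector3f(a, b, c) call to be relative to `center`."""
    def rewrite(m: re.Match) -> str:
        x, y, z = (g.strip() for g in m.groups())
        return f"glVector3f( center.x + {x}, center.y + {y}, center.z + {z} )"
    return CALL.sub(rewrite, source)


if __name__ == "__main__":
    print(offset_by_center("glVector3f( -1.0f, 1.0f, 1.0f );"))
```

This only works for calls whose arguments contain no commas or parentheses of their own, which is the usual limit of this sort of quick fix.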
I'd agree that regular expressions are sometimes used inappropriately. Certainly for very simple cases like what you're describing, but also for cases where a more powerful parser is needed.
One consideration is that sometimes you have a condition that needs to do something simple, like test for the presence of a question mark character. But it's often true that the condition becomes more complex. For example, to find a question mark character that isn't preceded by a space or beginning-of-line, and isn't followed by an alphanumeric character. Or the character may be either a question mark or the Spanish "¿" (which may appear at the start of a word). You get the idea.
If conditions are expected to evolve into something that's less simple to do with a plain call to String.contains("?"), then it could be easier to code it using a very simple regular expression from the start.
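To show how that evolution might look, here is a minimal Python sketch of the simple test next to one reading of the more complex condition described above. The function names are hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re


def has_question_simple(s: str) -> bool:
    # The starting point: a plain substring test.
    return "?" in s


# '?' or '¿' that is preceded by some character other than a space or a
# newline (so not at the start of the string or of a line), and that is
# not followed by an ASCII letter or digit.
EVOLVED = re.compile(r"(?<=[^ \n])[?¿](?![0-9A-Za-z])")


def has_question_evolved(s: str) -> bool:
    return EVOLVED.search(s) is not None
```

Going from the first function to the second with only string operations would mean index arithmetic and several boundary checks, whereas the regex version grows by a lookbehind, a character class, and a lookahead.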
It comes down to the right tool for the job.
I usually hear two arguments against regular expressions: 1) They're computationally inefficient, and 2) They're hard to understand.
Honestly, I can't understand how either is a legitimate claim.
1) This may be true in an academic sense. A complex expression can double back on itself many times over. Does it really matter, though? How many millions of computations a second can a server processor do these days? I've dealt with some crazy expressions, and I've never seen a regexp be the bottleneck. By far it's DB interaction, followed by bandwidth.
2) Hard for about a week. The most complicated regexp is no more complex than HTML - it's just a familiarity problem. If you needed HTML once every 3 months, would you get it 100% right each time? Work with them on a daily basis and they're just as clear as any other language syntax.
I write validation software. Regexps are second nature. Every fifth line of code has a regexp, and for the life of me I can't understand why people make a big deal about them. I've never seen a regexp slow down processing, and I've seen even the most dull 'programmers' pick up the syntax.
Regexps are powerful, efficient, and useful. Why avoid them?
I wouldn't say avoid them entirely, as they are QUITE handy at times. However, it is important to realize the fundamental mechanisms underneath. Depending on your implementation, you could have up to exponential run-time for a search, but since searches are usually bounded by some constant number of backtracks, you can end up with the slowest linear run-time you ever saw.
If you want the best answer, you'll have to examine your particular implementation as well as the data you intend to search on.
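One way to examine a backtracking implementation is to time a pattern against inputs that are built to fail. The Python sketch below uses a textbook nested-quantifier pattern, `(a+)+$`, next to a flat pattern that accepts the same strings; the names are hypothetical and no particular timings are claimed, since they vary by engine and machine.

```python
import re
import time

NESTED = re.compile(r"(a+)+$")  # nested quantifiers: many ways to split the run of a's
FLAT = re.compile(r"a+$")       # accepts the same strings without the nesting


def time_failed_match(pattern: re.Pattern, n: int):
    """Match against n a's followed by 'b', which can never succeed."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Keep n small: with a backtracking engine the nested pattern's
    # running time grows very quickly as n increases.
    for n in (8, 12, 16, 20):
        _, nested_s = time_failed_match(NESTED, n)
        _, flat_s = time_failed_match(FLAT, n)
        print(f"n={n:2d}  nested={nested_s:.6f}s  flat={flat_s:.6f}s")
```

If the nested column grows much faster than the flat one on your engine, that pattern shape is worth avoiding on untrusted or long input.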
From memory, Wikipedia has a decent article on regular expressions and the underlying algorithms.