What regular expression features are supported by Solr edismax?

Accepted answer
Score: 15

Version 4.0 of Lucene will support regex queries directly in the standard query parser using special syntax. I verified that it works on an instance of Solr I am running, built from the subversion trunk in February.

Jira ticket 2604 describes the extension of the standard query parser with a special regex syntax that uses forward slashes to delimit the regex, similar to the syntax in JavaScript. It seems to be using the underlying RegexpQuery parser.

So a brief example:

body:/[0-9]{5}/

will match on a five-digit zip code in the textual corpus I have indexed. But, oddly, body:/\d{5}/ did not work for me, and ^ failed as well.
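For anyone sending such a query over HTTP, here is a minimal sketch of building the request URL. The host, the core name `mycore`, and the helper `build_regex_query_url` are made up for illustration; `defType=lucene` selects the standard query parser, and the backslash-escaping of `/` inside the pattern is how I understand the parser's regex-term syntax, so check it against your version.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_regex_query_url(base_url, field, pattern):
    """Build a Solr select URL for a slash-delimited regex query.

    base_url is a hypothetical core URL such as
    http://localhost:8983/solr/mycore (an assumption, not from the answer).
    """
    # The regex is delimited by forward slashes, so a literal "/" inside
    # the pattern is escaped with a backslash (assumed parser behaviour).
    escaped = pattern.replace("/", "\\/")
    params = {
        "q": f"{field}:/{escaped}/",
        "defType": "lucene",  # standard query parser, not (e)dismax
        "wt": "json",
    }
    # urlencode percent-encodes ":", "/", "[", "]", "{" and "}" for us.
    return f"{base_url}/select?{urlencode(params)}"


url = build_regex_query_url("http://localhost:8983/solr/mycore", "body", "[0-9]{5}")
print(url)

# Round-trip check: decoding the URL gives back the original query string.
decoded_q = parse_qs(urlparse(url).query)["q"][0]
print(decoded_q)
```

Letting `urlencode` do the escaping avoids hand-encoding the brackets and braces, which is easy to get wrong in a browser address bar or a shell.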

The regex dialect would have to be Java's, but I'm not sure if everything in it works, since I have only done a cursory examination. One would probably have to look carefully at the RegexpQuery code to understand what works and what doesn't.
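One possible explanation for the \d and ^ failures, as far as I understand it: RegexpQuery is backed by Lucene's own automaton-based RegExp syntax rather than java.util.regex, the pattern is matched against a whole indexed term (so anchors are implied and not part of the syntax), and shorthand classes such as \d were not available at the time. Under that assumption, a rough sketch of rewriting a Java-style pattern into the more conservative form that worked above could look like this. The helper name `to_term_pattern` is hypothetical, and it handles only `\d` and anchors at the very start and end of the pattern.

```python
def to_term_pattern(pattern):
    """Rewrite a Java-style regex into a more conservative form.

    Assumption: the target dialect matches whole terms and lacks \\d.
    Only leading "^", trailing "$", and the \\d shorthand are handled.
    """
    # Anchors are implied when the whole term must match, so drop them.
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]

    out = []
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "d":
                # Inside a character class only the range is needed.
                out.append("0-9" if in_class else "[0-9]")
            else:
                # Keep every other escape (including "\\\\") untouched.
                out.append(ch + nxt)
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


zip_pattern = to_term_pattern(r"^\d{5}$")
print(f"body:/{zip_pattern}/")
```

This is only a convenience for the two cases mentioned in the answer; anything beyond that still calls for reading the RegexpQuery code or its documentation.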

Score: 4

Regular expressions and (e)dismax are not really comparable. Dismax is meant to work directly with common end-user input, while regular expressions are not typical end-user input.

Also, matching regular-expression-like things with dismax depends largely on text analysis settings and schema design, not on dismax itself. With Solr you typically tailor the schema and text analysis to the concrete search need, possibly doing much of the work at index-time. Regular expressions are at odds with this and even with the basic structure of Lucene inverted indices.
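To make the point about analysis and the inverted index concrete, here is a toy sketch in Python. It is not Lucene code: the analyzer, the sample documents, and all function names are made up, and Python's `re` stands in for Lucene's regex dialect. The assumption it illustrates is that a regex query is evaluated against individual indexed terms, not against the original document text.

```python
import re
from collections import defaultdict


def analyze(text):
    # Hypothetical analyzer: lowercase, then split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    # Inverted index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def regex_search(index, pattern):
    # The pattern must match an entire term. This toy version walks
    # every term in the index; a real engine can be smarter than that.
    compiled = re.compile(pattern)
    hits = set()
    for term, doc_ids in index.items():
        if compiled.fullmatch(term):
            hits |= doc_ids
    return hits


docs = {
    1: "Ship to Beverly Hills, CA 90210",
    2: "Order #123456 shipped",
    3: "ZIP: 10001-1234",
}
index = build_index(docs)

zip_hits = regex_search(index, r"[0-9]{5}")     # five-digit terms only
phrase_hits = regex_search(index, r"ca 90210")  # spans two terms
print(sorted(zip_hits), sorted(phrase_hits))
```

The second query finds nothing because no single term contains a space, even though the text "CA 90210" is in document 1. Change the analyzer and the set of terms changes with it, which is why the outcome of a regex query depends on the schema as much as on the pattern.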

Still, Lucene provides RegexQuery and the newer RegexpQuery. As far as I know, these are not integrated with Solr, but they could be. Start a new item in the Solr issue tracker and happy coding! :)

Keep in mind that regex queries will probably always be slow... but they could have acceptable performance in your case.
