[ACCEPTED]-Best HashTag Regex-twitter
If you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags.
For example, take the following call to statuses/show:
In the resultant JSON, notice the entities object.
You can use the above to locate the specific entities in the tweet (which occur between the string positions denoted by the indices property) and transform them appropriately.
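As a rough sketch of that transformation, the snippet below rewrites each hashtag using its indices pair. The sample tweet is hand-written to mirror the shape of the entities object (a hashtags list whose items carry text and indices); it is not real API output, and the link format and the link_hashtags helper are hypothetical names chosen for illustration.

```python
def link_hashtags(text, hashtag_entities):
    """Wrap each hashtag span in an anchor tag, using the entity indices."""
    # Work from the end of the string so that earlier indices stay valid
    # after each replacement changes the string length.
    ordered = sorted(hashtag_entities, key=lambda e: e["indices"][0], reverse=True)
    for entity in ordered:
        start, end = entity["indices"]  # end is exclusive
        # Hypothetical link format, used only for illustration.
        link = '<a href="https://example.com/hashtag/{0}">#{0}</a>'.format(entity["text"])
        text = text[:start] + link + text[end:]
    return text


# Hand-written sample shaped like the entities object; not real API output.
tweet = {
    "text": "Loving #regex and #python",
    "entities": {
        "hashtags": [
            {"text": "regex", "indices": [7, 13]},
            {"text": "python", "indices": [18, 25]},
        ]
    },
}

print(link_hashtags(tweet["text"], tweet["entities"]["hashtags"]))
```

Processing the entities from last to first avoids having to adjust the remaining indices after every substitution.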
If you just need the regular expression to locate the hashtags, Twitter provides these in an open source library.
Hashtag Match Pattern
After looking at the previous answers here and making some test tweets to see what Twitter liked, I think I've come up with a solid regular expression that should do the trick. It requires lookaround functionality in the regular expression engine, so it might not work with all engines out there. It should still work fine for .NET and PCRE.
According to RegexBuddy, this does the following:
And again, according to RegexBuddy, here is what it matches:
Anything highlighted is part of the match. The darker highlighted part indicates what is returned from the capture.
Edit Dec 2014:
Here's a slightly simplified version from zero323 that should be functionally equivalent:
It depends on whether you want to match hashtags inside other strings ("Some#Word") or things that probably aren't hashtags ("We're #1"). The regex you gave
#\w+ will match in both these cases. If you slightly modify your regex to
\B#\w\w+, you can eliminate these cases and only match hashtags of length greater than 1 on word boundaries.
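A quick way to see the difference is to run both patterns over a string containing the two problem cases. This is a minimal sketch using Python's re module; the sample sentence is made up for the comparison.

```python
import re

# Made-up sample containing an embedded "#", a one-character tag, and a normal tag.
text = "Some#Word We're #1 and #python rocks"

# The original pattern matches every "#" followed by word characters.
loose = re.findall(r"#\w+", text)

# \B rejects a "#" that directly follows a word character, and \w\w+
# requires at least two word characters after the "#".
strict = re.findall(r"\B#\w\w+", text)

print(loose)   # ['#Word', '#1', '#python']
print(strict)  # ['#python']
```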
I tweeted a string with randomly placed hash tags, saw what Twitter did with it, and then tried to match it with a regular expression. Here's what I got:
#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa 9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a *#1j3rj3
As far as I can tell, this pattern works the best. The others posted here don't take into account that a hashtag starting with numbers is invalid. Please ensure that you only use the second capturing group when you extract the hashtag.
Note, I've also explicitly limited lookaheads and lookbehinds because of their performance penalties.
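The pattern itself did not survive on this page, so the following is only an illustrative stand-in written for this example, not the author's regex. It assumes a hashtag must be preceded by whitespace or the start of the string and must not begin with a digit, it avoids lookarounds, and it keeps the hashtag text in the second capturing group. It does not claim to reproduce Twitter's behavior exactly.

```python
import re

# Illustrative stand-in pattern (an assumption, not the original answer's regex):
#   group 1: start of string or a whitespace character
#   group 2: hashtag text, which must start with a letter or underscore
HASHTAG = re.compile(r"(^|\s)#([A-Za-z_]\w*)")

tweet = ("#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa "
         "9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a *#1j3rj3")

# Only the second capturing group holds the hashtag itself.
tags = [m.group(2) for m in HASHTAG.finditer(tweet)]
print(tags)
```

With this stand-in, tags that start with a digit (#1, #919, #34239) and tags glued to preceding text (somethin#idfsjoa, *#1j3rj3) are skipped.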
This is what I use:
This is the one I wrote. It looks for word boundaries and only matches the hash text.
reference: Unicode Table