[ACCEPTED]-Best HashTag Regex-twitter

Accepted answer
Score: 39

If you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags.

For example, take the following call to statuses/show:

http://api.twitter.com/1/statuses/show/60183527282577408.json?include_entities=true

In the resulting JSON, notice the entities object.

"entities":{"urls":[{"expanded_url":null,"indices":[68,88],"url":"http:\/\/bit.ly\/gWZmaJ"}],"user_mentions":[],"hashtags":[{"text":"wordpress","indices":[89,99]}]}

You can use the above to locate the specific entities in the tweet (which occur between the string positions denoted by the indices property) and transform them appropriately.

If you just need the regular expression to locate the hashtags, Twitter provides these in an open source library.
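As a rough illustration of using the indices property, here is a minimal Python sketch. The function name `link_hashtags`, the sample tweet text, and the `/search?q=` link format are all made up for the example; only the shape of the `entities` dictionary follows the JSON above. It also assumes the indices count characters the same way Python string slicing does.

```python
def link_hashtags(text, entities):
    """Wrap each hashtag in an anchor tag, using the supplied indices."""
    # Replace from the last hashtag to the first so that earlier
    # indices are still valid after each substitution.
    hashtags = sorted(entities.get("hashtags", []),
                      key=lambda h: h["indices"][0], reverse=True)
    for tag in hashtags:
        start, end = tag["indices"]
        # Hypothetical link format, chosen only for illustration.
        link = '<a href="/search?q=%23{0}">#{0}</a>'.format(tag["text"])
        text = text[:start] + link + text[end:]
    return text


# Made-up tweet; "#wordpress" occupies positions 14 to 24.
tweet = "Reading about #wordpress today"
entities = {"hashtags": [{"text": "wordpress", "indices": [14, 24]}]}
print(link_hashtags(tweet, entities))
```

Working backwards through the string is the main design choice here: substituting front to back would shift every later index.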

Hashtag Match Pattern

(^|[^&\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\p{L}\p{M}][\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)

The above pattern can be pieced together from this Java file (retrieved 2015-11-23). Validation tests for this pattern are located in this file around line 128.

Score: 30

After looking at the previous answers here and making some test tweets to see what Twitter liked, I think I've come up with a solid regular expression that should do the trick. It requires lookaround functionality in the regular expression engine, so it might not work with all engines out there. It should still work fine for .NET and PCRE.

(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)

According to RegexBuddy, this does the following: (screenshot: RegexBuddy Create View)

And again, according to RegexBuddy, here is what it matches: (screenshot: RegexBuddy Test View)

Anything highlighted is part of the match. The darker highlighted part indicates what is returned from the capture.
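Since the screenshots may not be visible, here is a small Python sketch of the same pattern. The sample string is invented for the example, and Python's `re` module is just one engine that accepts this form of lookbehind.

```python
import re

# The pattern from this answer: "#" must be at the start of the string
# or directly after whitespace, and the tag needs a letter or underscore.
HASHTAG_LOOKBEHIND = re.compile(r"(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)")

# Invented sample text.
sample = "#first some#inner #2nd #123 and #last_1"

# findall returns the contents of the single capture group.
print(HASHTAG_LOOKBEHIND.findall(sample))  # ['first', '2nd', 'last_1']
```

Here `some#inner` is skipped because the `#` follows a letter, and `#123` is skipped because it contains no letter or underscore.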

Edit Dec 2014:
Here's a slightly simplified version from zero323 that should be functionally equivalent:

(?<=\s|^)#(\w*[A-Za-z_]+\w*)

Score: 12

It depends on whether you want to match hashtags inside other strings ("Some#Word") or things that probably aren't hashtags ("We're #1"). The regex you gave, #\w+, will match in both these cases. If you slightly modify your regex to \B#\w\w+, you can eliminate these cases and only match hashtags of length greater than 1 on word boundaries.
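To make the difference concrete, here is a short Python comparison of the two patterns on the two cases mentioned above, plus one invented ordinary tweet. The variable names are chosen for the example.

```python
import re

loose = re.compile(r"#\w+")        # the pattern from the question
strict = re.compile(r"\B#\w\w+")   # the modified pattern

# The last string is a made-up ordinary tweet for contrast.
for text in ["Some#Word", "We're #1", "Loving #regex today"]:
    print(repr(text), loose.findall(text), strict.findall(text))
```

The `\B` rejects `Some#Word` because a word character directly precedes the `#`, and `\w\w+` rejects `#1` because it requires at least two word characters.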

Score: 5

I tweeted a string with randomly placed hash tags, saw what Twitter did with it, and then tried to match it with a regular expression. Here's what I got:

\B#\w*[a-zA-Z]+\w*

#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa 9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a *#1j3rj3
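For readers who want to reproduce this, the following Python sketch runs the pattern over that test string (with stray page-numbering digits removed, which is an assumption about the original tweet). It shows what the regex matches, not what Twitter itself linked.

```python
import re

HASHTAG_B = re.compile(r"\B#\w*[a-zA-Z]+\w*")

# Test string from this answer, assumed to be free of extraction residue.
TEST_TWEET = ("#face #Fa!ce something #iam#1 #1 #919 #jifdosaj "
              "somethin#idfsjoa 9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 "
              "#34239 #jkf #a *#1j3rj3")

matches = HASHTAG_B.findall(TEST_TWEET)
print(matches)
# ['#face', '#Fa', '#iam', '#jifdosaj', '#jklfdsajl34', '#jkf', '#a', '#1j3rj3']
```

Tags made only of digits and tags glued to a preceding word character are skipped, while `#Fa!ce` is cut at the `!`.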

Score: 1

As far as I can tell, this pattern works the best. The others posted here don't take into account that a hashtag starting with numbers is invalid. Please ensure that you only use the second capturing group when you extract the hashtag.

(^|\s)#([A-Za-z_][A-Za-z0-9_]*)

Note, I've also explicitly limited lookaheads and lookbehinds because of their performance penalties.
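Because the first group captures the leading whitespace (or the start of the string), extraction has to read group 2. A minimal Python sketch, with an invented sample string and helper name:

```python
import re

HASHTAG_GROUPS = re.compile(r"(^|\s)#([A-Za-z_][A-Za-z0-9_]*)")


def extract_tags(text):
    """Return hashtag names (without '#'), taken from capture group 2."""
    return [m.group(2) for m in HASHTAG_GROUPS.finditer(text)]


# Invented sample: "#2fast" starts with a digit and is therefore rejected.
print(extract_tags("#one two #2fast #three_3"))  # ['one', 'three_3']
```

Using `finditer` with `group(2)` avoids the tuples that `findall` would return for a pattern with two groups.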

Score: 1

This is what I use:

/#(\w*[0-9a-zA-Z]+\w*[0-9a-zA-Z])/g

Score: 0

This is the one I wrote. It looks for word boundaries and matches only the hashtag text: (?<=#)\w*?(?=\W)
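One behaviour worth checking before adopting this pattern: the `(?=\W)` lookahead needs a non-word character after the tag, so a hashtag at the very end of the string is not matched. A short Python sketch with an invented sample string:

```python
import re

TAG_TEXT = re.compile(r"(?<=#)\w*?(?=\W)")

# Invented sample with one tag mid-string and one at the very end.
text = "a #tag, then #end"

print(TAG_TEXT.findall(text))        # ['tag']
# Appending any non-word character lets the final tag match too.
print(TAG_TEXT.findall(text + " "))  # ['tag', 'end']
```

Padding the input is only one possible workaround; it is shown here to illustrate the behaviour, not as a recommendation from the original answer.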

Score: 0

/#((\w|[\u00C0-\uFFDF])+)/g

reference: Unicode Table
