[ACCEPTED]-Creating your own TinyURL-mysql
The tiny url people like to use random tokens 29 because then you can't just troll the tiny 28 url links. "Where does #2 go?" "Oh, cool!" "Where 27 does #3 go?" "Even cooler!" You can type 26 in random characters but it's unlikely you'll 25 hit a valid value.
Since the key is rather 24 sparse (4 values each having 36* possibilities 23 gives you 1,679,616 unique values, 5 gives 22 you 60,466,176) the chance of collisions 21 is small (indeed, it's a desired part of 20 the design) and a good SQL index will make 19 the lookup be trivial (indeed, it's the 18 primary lookup for the url so they optimize 17 around it).
If you really want to avoid the 16 lookup and just unse auto-increment you 15 can create a function that turns an integer 14 into a string of seemingly-random characters 13 with the ability to convert back. So "1" becomes 12 "54jcdn" and "2" becomes "pqmw21". Similar 11 to Base64-encoding, but not using consecutive 10 characters.
(*) I actually like using less 9 than 36 characters -- single-cased, no vowels, and 8 no similar characters (1, l, I). This prevents 7 accidental swear words and also makes it 6 easier for someone to speak the value to 5 someone else. I even map similar charactes 4 to each other, accepting "0" for "O". If 3 you're entirely machine-based you could 2 use upper and lower case and all digits 1 for even greater possibilities.
In the database table, there is an index 7 on the
unique_chars field, so I don't see why that would 6 be slow or inefficient.
UNIQUE KEY `unique_chars` (`unique_chars`)
Don't rush to do 5 premature optimization on something that 4 you think might be slow.
Also, there may 3 be some benefit in a url shortening service 2 that generates random urls instead of sequential 1 urls.
I don't know why you'd bother. The premise 27 of the tutorial is to create a "random" URL. If 26 the random space is large enough, then you 25 can simply rely on pure, dumb luck. If you 24 random character space is 62 characters 23 (A-Za-z0-9), the the 4 characters they use, given 22 a reasonable random number generator, is 21 1 in 62^4, which is 1 in 14,776,336. Five 20 characters is 1 in 916,132,832. So, a conflict 19 is, literally, "1 in a billion".
Obviously, as 18 the documents fill, your odds increase for 17 the chance of a collision.
With 10,000 documents, it's 16 1 in 91,613, almost 1 in 100,000 (for round 15 numbers).
That means, for every new document, you 14 have a 1 in 91,613 chance of hitting the 13 DB again for another pull on the slot machine.
It 12 is not deterministic. It's random. It's 11 luck. In theory, you can hit a string of 10 really, really, bad luck and just get collision 9 after collision after collision. Also, it 8 WILL, eventually, fill up. How many URLs 7 do you plan on hashing?
But if 1 in 91,613 6 odds isn't good enough, boosting it to 6 5 chars makes it more than 1 in 5M for 10,000 4 documents. We're talking almost LOTTO odds 3 here.
Simply put, make the key big enough 2 (7 characters? 8?) and the problem pretty 1 much "wishes" itself out of existence.
Couldn't you encode the URL as Base36 when 12 it's generated, and then decode it when 11 visited - that would allow you to remove 10 the database completely?
A snippet from Channel9:
The 9 formula is simple, just turn the Entry 8 ID of our post, which is a long into a 7 short string by Base-36 encoding it and 6 then stick 'http://ch9.ms/' onto the front of it. This 5 produces reasonably short URLs, and can 4 be computed at either end without any 3 need for a database look up. The result, a 2 URL like http://ch9.ms/A49H is then used in creating the 1 twitter link.
I solved a similar problem by implementing 9 an alogirthm that used to generate serial 8 numbers one-by-one in base36. I had my own oredring of base36 characters all 7 of which are unique. Since it was generating 6 numbers serially I did not have to worry 5 about duplication. Complexity and randomness 4 of the number depends on the ordering of 3 base36 numbers[characters]... that too for 2 public only becuase to my application they 1 are serial numbers :)
Check out this guys functions - http://www.pgregg.com/projects/php/base_conversion/base_conversion.php source 9 - http://www.pgregg.com/projects/php/base_conversion/base_conversion.inc.phps
You can use any base you like, for example 8 to convert 554512 to base 62, call
$tiny = base_base2base(554512, 10, 62); and 7 that evaluates to
$tiny = '2KFk'.
So, just pass in the 6 unique id of the database record.
In a project 5 I used this in a removed a few characters 4 from the
$sChars string, and am using base 58. You 3 can also rearrange the characters in the 2 string if you want the values to be less 1 easy to guess.
You could of course add ordering by simply 4 numbering the urls:
and so on. But if the 3 hash key is indexed in the database (which 2 it obviously should be), the performance 1 boost would be minimal at best.
I wouldn't bother doing ordered enumeration 5 for two reasons:
1) SQL servers are very 4 effective at checking such hash collisions 3 (given correct indexes)
2) That might hurt 2 privacy, as users would be able to easily 1 figure out what other users are tinyurl-ing.
That might work, but the easiest way to 10 accomplish the problem would probably be 9 with hashing. Theoretically speaking, hashing 8 runs in O(1) time, as in, it only has to 7 perform the hash, and then does only one 6 actual hit to the database to retrieve the 5 value. Then, you would introduce complications 4 for checking for hash collisions, but it 3 seems like this is probably what most of 2 the tinyurl providers do. And, a good hash 1 function isn't terribly hard to write.
Use autoincrement on the database, and get 1 the latest id as described by http://www.acuras.co.uk/articles/24-php-use-mysqlinsertid-to-get-the-last-entered-auto-increment-value
Perhaps this is a bit off-answer, but, my 7 general rule for creating always unique keys is 6 simple md5( time() * 100 + rand( 0, 100 5 ) ); There is a one in 100,000 chance that 4 if two people are using the same service 3 at the same second they will get the same 2 result (nie impossible).
That said, md5( rand( 0, n 1 ) ) works too.
I have also created small tinyurl service.
I 19 wrote a script in Python that was generating 18 keys and store in MySQL table named tokens 17 with status U(Unused).
But, I am doing it 16 in offline mode. I have a corn job on my 15 VPS. It runs a script every 10 minutes. The 14 script check if there are less than 1000 13 keys in the table, it keep generating keys 12 and inserting them if they are unique and 11 not already exists in the table until the 10 key's count up to 1000.
For my service, 1000 9 keys for 10 minutes are more than enough, you 8 can set the timing or number of keys generated 7 according to your need.
Now when any tiny 6 url needs to be created on my website, my 5 PHP script just fetch any key which is unused 4 from the table and marked its status as 3 T(taken). PHP script does not have to bother 2 about its uniqueness as my python script 1 already populated only unique keys.
More Related questions