Finding how similar two strings are
Ok, so the standard algorithms are:
1) Hamming distance. Only good for strings of the same length, but very efficient. It simply counts the number of positions at which the characters differ. Not useful for fuzzy searching of natural-language text.
2) Levenshtein distance. The Levenshtein distance measures distance in terms of the number of "operations" required to transform one string into another. These operations are insertion, deletion and substitution. The standard approach to calculating the Levenshtein distance is to use dynamic programming.
3) Generalized Levenshtein (Damerau–Levenshtein) distance. This distance also takes into consideration transpositions of characters in a word, and is probably the edit distance most suited for fuzzy matching of manually entered text. The algorithm to compute the distance is a bit more involved than for the Levenshtein distance (detecting transpositions is not easy). Many common implementations are a modification of the bitap algorithm (as used by agrep).
In general you would probably want to consider an implementation of the third option, embedded in some sort of nearest-neighbour search based on a k-d tree.
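To make the first two distances above concrete, here is a minimal Python sketch. The function names `hamming` and `levenshtein` are my own choice for illustration and do not come from any particular library; the Levenshtein version uses the usual dynamic-programming recurrence but keeps only two rows of the table.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertion, deletion and substitution (two-row DP)."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]
```

For example, `hamming("karolin", "kathrin")` gives 3 and `levenshtein("kitten", "sitting")` gives 3.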
- Levenshtein distance
- Hamming distance
- soundex
- metaphone
The Damerau-Levenshtein distance is similar to the Levenshtein distance, but also includes two-character transposition. The Wikipedia page for it includes pseudocode that should be fairly trivial to implement.
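As a rough illustration of how the transposition step fits into the dynamic-programming table, here is a Python sketch of the restricted variant, usually called optimal string alignment. This is an assumption on my part about which variant is wanted: it never edits the same substring twice, so it can return a larger value than the unrestricted Damerau-Levenshtein distance. The name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "ab" <-> "ba"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this sketch, `osa_distance("teh", "the")` is 1, whereas the plain Levenshtein distance for the same pair is 2.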