Calculating Levenshtein distance for fun and profit

April 16th, 2012 Comments Off on Calculating Levenshtein distance for fun and profit

JavaScript has some pretty weak tools for strings. What method would you use to compare simliar-but-different strings? Say, for example, you have a signup form and when someone is typing in their email address you want to be helpful and check to see if they may have typed “gmial.com” instead of “gmail.com” or “em.com” instead of “me.com”. I give you the Levenshtein distance!

The Levenshtein distance is an oft-used algorithm for determining the similarity of two strings, measured in the number of changes you’d have to make to one string to make it into the other. So strings a distance of 5 would require five letter changes to be the same, a distance of 7 would need seven edits and so on.

For reference, I have used Carlos Rodrigues’ implementation of the Levenshtein distance in previous projects with great success. Yes, it is attached to the String prototype. No, I don’t have a problem with that, nor should you. Arguments about prototype purity will be saved for another day.

Levenshtein, how does it work?

In practice finding the distance between two strings is easy. Just do this:

var barDiff = 'bar'.levenshtein('baz');
console.log('difference: ' + barDiff);
//Outputs: "difference: 1"
var jsDiff = 'javascript'.levenshtein('JavaScript');
console.log('difference: ' + jsDiff);
//Outputs: "difference: 2"

Did you see that? Yes, Levenshtein is a case-sensitive comparison, just like the rest of JavaScript. Keep that in mind and sanitize/mutate your comparisons accordingly.

That’s it! Take this information and go write something clever and useful for your users. Also, keep an eye on my GitHub account for a Backbone extension that will automatically suggest proper email provider spellings for you users.

Comments are closed.