Calculating Levenshtein distance for fun and profit

April 16th, 2012 § Comments Off on Calculating Levenshtein distance for fun and profit § permalink

JavaScript has some pretty weak tools for strings. What method would you use to compare simliar-but-different strings? Say, for example, you have a signup form and when someone is typing in their email address you want to be helpful and check to see if they may have typed “gmial.com” instead of “gmail.com” or “em.com” instead of “me.com”. I give you the Levenshtein distance!

The Levenshtein distance is an oft-used algorithm for determining the similarity of two strings, measured in the number of changes you’d have to make to one string to make it into the other. So strings a distance of 5 would require five letter changes to be the same, a distance of 7 would need seven edits and so on.

For reference, I have used Carlos Rodrigues’ implementation of the Levenshtein distance in previous projects with great success. Yes, it is attached to the String prototype. No, I don’t have a problem with that, nor should you. Arguments about prototype purity will be saved for another day.

Levenshtein, how does it work?

In practice finding the distance between two strings is easy. Just do this:

var barDiff = 'bar'.levenshtein('baz');
console.log('difference: ' + barDiff);
//Outputs: "difference: 1"
var jsDiff = 'javascript'.levenshtein('JavaScript');
console.log('difference: ' + jsDiff);
//Outputs: "difference: 2"

Did you see that? Yes, Levenshtein is a case-sensitive comparison, just like the rest of JavaScript. Keep that in mind and sanitize/mutate your comparisons accordingly.

That’s it! Take this information and go write something clever and useful for your users. Also, keep an eye on my GitHub account for a Backbone extension that will automatically suggest proper email provider spellings for you users.

Way more action, much less to read!

April 16th, 2012 § Comments Off on Way more action, much less to read! § permalink

Blog posts are such a huge investment for me.

If you want to see the day-to-day stuff I’m working on, come find me on my Twitter.

Side note: If you think something basic like natural date parsing in JavaScript is a) simple and b) not yet solved you are wrong on both counts.