Tuesday, December 11, 2007

Search by SOUNDEX and DIFFERENCE


In late nineteenth century, the United States Census decided to create a method by which they can search people even if they don't know the exact spelling of name. This method was called Soundex Phonetic Filling System. This system used simple phonetic algorithm that was capable of reducung any word to a four character alphanumeric code in which the First character of name is not changed and rest of the characters are converted to a numeral.

Now this technique has been introduced in almost every DBMS. We shall discuss two functions from MS-SQL that help us recognise Vikramjit and Vikramjeet as same. But remember that different database systems have implemented different algorithms, which means that phonetic code of MS-SQL may not match to code of ORACLE or MySQL.

SOUNDEX FUNCTION
This function is having a very simple syntax. Just write...
SELECT * FROM <table_name> WHERE SOUNDEX(<var1>) = SOUNDEX(<var2>)
As a result, it will Compare Var1 and Var2. If Var1 has value Vikramjit and Var2 has value VIKRAMJEET, they will be treated equally. Similarily, Ajay=Ajoy, Dave=Daev and Chopsey=Chopsi

DIFFERENCE FUNCTION
If you want to work with Threshold and want to perform mathematical magic on names, this is the function on you.
SELECT * FROM <table_name>WHERE DIFFERENCE(<var1>,<var2>) <= 2
The difference function returns a value between 1 and 4 where 1 is least similar and 4 is most similar.

Happy Querying...


1 comments:

Anonymous said...

Cool!!!