By: Rex Allen

Found a simple and easy to use algorithm for comparing similar spelled words:

http://www.catalysoft.com/articles/StrikeAMatch.html

I came across this today because one of our clients got a FHA mortgage lead and it didn't match any FHA products and wondered why. It turns out the client spelled the property county incorrectly on their application and as a result, it did not match against the FHA table.
Note: FHA has a table of State, County and loan limits on their website.

To solve this problem we will be implementing a two-part approach - SOUNDEX and the Adjacent Character Pairs algorithm on our FHA Query.

First, we begin by using the SQL function SOUNDEX which resolves similar spelled words to a code. While this narrows down the potential matches, we still have about 140 combinations on a State-by-State level that creates problems - e.g. MO with ST. CLAIR, STE. GENEVIEVE, ST. LOUIS, etc.

Second, we apply the Adjacent Character Pairs algorithm through a combination of CFMX and SQL. SQL isn't very good at looping so when we build the query, we'll let CFMX build the algorithm in SQL. Lets say the user entered 'ST LOUIS CETY' on the application (bad spelling). Our approach will start with a SQL template as follows:

SELECT TOP 1
State, County, SOUNDEX(County), ...,
xAlogorithmx
FROM FHA_LOAN_LIMITS
WHERE STATE='MO' and SOUNDEX(COUNTY) = SOUNDEX('ST LOUIS CETY')
ORDER BY
xAlogorithmx
DESC

Our algorithm essentially breaks up the character pairs on the user entered county and does the formula as follows:

100 * [ (2 * No of Found Character Pairs) /
( Number of Character Pairs in User County +
Number of Character Pairs in db County ) ]

The operating assumption is that the number of character pairs in a word is always the length of the word minus 1.

Using CFMX to parse the user entered County and generate the SQL snippet for the algorithm, the generated SQL string looks like:

(200 * (
(select
case when charindex('ST',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex(' L',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('LO',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('OU',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('UI',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('IS',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('S ',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex(' C',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('CE',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('ET',COUNTY) > 0
Then 1 Else 0 End)
+
(select
case when charindex('TY',COUNTY) > 0
Then 1 Else 0 End)
) ) / ( (LEN('ST LOUIS CETY')-1) + (LEN(COUNTY)-1) )
as SimScore

Now we can drop it into our SQL template replacing xAlgorithmx with the
generated code and then run it in CFQUERY.

Which should return the record for ST. LOUIS CITY with a SimScore of 72
(highest).

About This Tutorial
Author: Rex Allen
Skill Level: Beginner 
 
 
 
Platforms Tested: CFMX
Total Views: 89,589
Submission Date: March 03, 2006
Last Update Date: June 05, 2009
All Tutorials By This Autor: 1
Discuss This Tutorial
  • It looks very good. I have yet to try it but I think i will need it someday. Thanks for that.

Advertisement

Sponsored By...
Deep Tissue Massage and Swedish Massage Services just $39 for a 50 minute massage!