May 22, 2024

Sometimes you receive user input from an input form, but you want to compare to known values and find the closest one.

A beginning computer programmer will asume user perfection and compare two or more strings. But what if there is no exact match… welcome to the real world.

What you need is to compute the *distance* between the two strings. Using the *Levenshtein distance,* that’s the minimum number of characters you need to add or delete or change to convert from on string to the others.

For example

Text a Text ba

is one character apart. You can just add one *b* and get from *Text a* to *Text ab*

Often people mix up case, insert extra spaces, and mix up a few letters. I needed to find best fits despite those challenges.

In my example here, I accept two inputs, convert all strings to lower case, so case will not be considered a difference. And I keep converting multiple consequtive spaces to single spaces until there are not multiple spaces between words.

Notice how the second line has an accented E.

A distance of 2 for such long strings is indicative of a pretty good fit.

So for both TEdit’s, I point them to code which computes the Levenshtein Distnce and print it out. In this case, you would delete the E and add an accented E.

You can try it out here

If you compare against many strings, the one with the lowest Levenshtein distance is the closest one.

procedure TForm1.Edit2Change(Sender: TObject); var distance : integer; begin distance := levenshteindistance( LowerCase( edit1.Text ), LowerCase(edit2.text) ); label1.Caption := 'The distance is ' + IntToStr( distance ); end;

So now all you need is the levenshtein algorithm.

unit levenstein; type tlevarr1 = array of integer; tlevarr2 = array of tlevarr1; Const cost = 2; Function LevenshteinDistance(str1, str2: String): integer; Var mat: tlevarr2; i, j, n, m: integer; Begin // remove all mutliple spcaes repeat str1 := strreplace( str1,' ',' ', True ); str2 := strreplace( str2,' ',' ', True ); until (Pos(' ', str1 ) = 0 ) and ( Pos(' ', str2 ) = 0 ); n := length(str1); m := length(str2); setlength( mat, n+1 ); for i := 0 to n+1 do setlength( mat[i], m+1 ); For i := 0 To n Do Begin mat[i][0] := i; End; For j := 0 To m Do Begin mat[0][ j] := j; End; For i := 1 To n Do begin For j := 1 To m Do begin Begin If str1[i] = str2[j] Then mat[i][ j] := mat[i-1][ j-1] Else mat[i][ j] := min(mat[i-1][j]+1,min(mat[i][j-1]+1, mat[i-1][j-1]+cost)); End; end; end; result := mat[n][m]; End;