May 22, 2024

Sometimes you receive user input from an input form, but you want to compare to known values and find the closest one.

A beginning computer programmer will asume user perfection and compare two or more strings. But what if there is no exact match…​ welcome to the real world.

What you need is to compute the distance between the two strings. Using the Levenshtein distance, that’s the minimum number of characters you need to add or delete or change to convert from on string to the others.

For example

Text a
Text ba

is one character apart. You can just add one b and get from Text a to Text ab

Often people mix up case, insert extra spaces, and mix up a few letters. I needed to find best fits despite those challenges.

In my example here, I accept two inputs, convert all strings to lower case, so case will not be considered a difference. And I keep converting multiple consequtive spaces to single spaces until there are not multiple spaces between words.

sample code

Notice how the second line has an accented E.

A distance of 2 for such long strings is indicative of a pretty good fit.

So for both TEdit’s, I point them to code which computes the Levenshtein Distnce and print it out. In this case, you would delete the E and add an accented E.

You can try it out here

If you compare against many strings, the one with the lowest Levenshtein distance is the closest one.

procedure TForm1.Edit2Change(Sender: TObject);
var
  distance : integer;
begin
  distance := levenshteindistance( LowerCase( edit1.Text ), LowerCase(edit2.text) );
  label1.Caption := 'The distance is ' + IntToStr( distance );
end;

So now all you need is the levenshtein algorithm.

unit levenstein;

type
  tlevarr1 = array of integer;
  tlevarr2 = array of tlevarr1;

Const cost = 2;

Function LevenshteinDistance(str1, str2: String): integer;
Var
  mat: tlevarr2;
  i, j, n, m: integer;

Begin
  // remove all mutliple spcaes
  repeat
     str1 := strreplace( str1,'  ',' ', True );
     str2 := strreplace( str2,'  ',' ', True );
  until (Pos('  ', str1 ) = 0 ) and ( Pos('  ', str2 ) = 0 );

  n := length(str1);
  m := length(str2);

  setlength( mat, n+1 );
  for i := 0 to n+1 do
      setlength( mat[i], m+1 );
  For i := 0 To n Do
    Begin
      mat[i][0] := i;
    End;

  For j := 0 To m Do
    Begin
      mat[0][ j] := j;
    End;

  For i := 1 To n Do  begin
    For j := 1 To m Do begin
      Begin
        If str1[i] = str2[j] Then
          mat[i][ j] := mat[i-1][ j-1]
        Else
          mat[i][ j] := min(mat[i-1][j]+1,min(mat[i][j-1]+1, mat[i-1][j-1]+cost));
      End;
    end;
  end;
  result := mat[n][m];
End;