d ← L ##.dist R ⍝ Levenshtein distance. From Jay Foad, [dist] returns the "distance" between simple vectors L and R. The O(m×n) algorithm compares sequences, calculating the minimum number of edits required to transform one into the other. The notional costs of the edit operat- ions are: Insert item: 1 Delete item: 1 Replace item: 1 Copy item: 0 For example, comparing vectors 'Sunday' and 'Saturday' implies the following table, where each item is: 0 + its NW neighbour, if the corresponding items match. 1 + the minimum of its N, W and NW neighbours, otherwise. S a t u r d a y S 0 1 2 3 4 5 6 7 u 1 1 2 2 3 4 5 6 n 2 2 2 3 3 4 5 6 d 3 3 3 3 4 3 4 5 a 4 3 4 4 4 4 3 4 y 5 4 4 5 5 5 4 3 Each number in the table is the edit distance between the sub-strings above and to its left. For example, the 2 in the second row, fourth column, is the dist- ance from 'Su' to 'Satu': two copies and two insertions. Note that, as the cost of insertion and deletion is the same, the function is commutative. 'Satu' → 'Su' : two copies and two deletions. The Levenshtein distance between the two sequences is thus the lower-right table item (3 in the above example). See: http://en.wikipedia.org/wiki/Levenshtein_distance Technical note: dist←{ ⍝ Levenshtein distance. a←(n+1)⍴(⍴⍺)+n←⍴⍵ ⍝ first row of matrix f←⍵{⌊\⍵⌊(⊃⍵),(¯1↓⍵)-1+⍺=⍺⍺} ⍝ iteration step z←⊃f/(⌽⍺),⊂a ⍝ last row of matrix ⊃⌽z } [dist] calculates a row of the matrix at a time. The / iterates over successive characters from ⍺, generating a new matrix row each time. The ⌊\ is key to calc- ulating a whole row quickly. Examples 'Sunday' dist 'Saturday' ⍝ distance between strings. 3 'sitting' dist 'kitten' ⍝ examples from Wikipedia. 3 mons ← 'January' 'February' 'March' mons,← 'April' 'May' 'June' mons,← 'July' 'August' 'September' mons,← 'October' 'November' 'December' mons ∘.dist mons ⍝ distances between months 0 4 6 7 5 5 4 6 9 7 8 8 4 0 7 7 6 7 6 7 8 7 8 7 6 7 0 4 3 5 5 6 9 7 8 8 7 7 4 0 5 5 5 5 8 7 8 8 5 6 3 5 0 4 3 6 9 7 8 8 5 7 5 5 4 0 2 5 8 6 7 7 4 6 5 5 3 2 0 5 9 7 8 8 6 7 6 5 6 5 5 0 9 7 8 8 9 8 9 8 9 8 9 9 0 5 4 3 7 7 7 7 7 6 7 7 5 0 5 4 8 8 8 8 8 7 8 8 4 5 0 3 8 7 8 8 8 7 8 8 3 4 3 0 fuzzy←{({⍵⍳⌊/⍵}(lcase ⍺)∘dist∘lcase¨⍵)⊃⍵} ⍝ fuzzy selection fuzzy∘mons¨ 'dcmbr' 'marching' 'febury' ⍝ fuzzy matches December March February See also: lcase Back to: contents Back to: Workspaces