« Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching
January 31, 2024, 11:00 AM - 12:00 PM
Location:
Conference Room 301
Rutgers University
CoRE Building
96 Frelinghuysen Road
Piscataway, NJ 08854
Michal Koucký, Charles University
In this talk I will present a new locally consistent decomposition of strings with applications to sketching of edit distance and approximate pattern matching. Edit distance is a string similarity measure that counts how many characters have to be deleted, inserted or substituted in one string to get another one. Our ultimate goal is to be able to compute for each string a short sketch such that from sketches of two strings we can estimate their edit distance. Our new string decomposition allows to build such sketches of certain length.
For a parameter k, each string x is decomposed into blocks that can be described by grammars of size O(k) (using some amount of randomness). If we take two strings x and y of edit distance at most k then their block decomposition uses the same number of grammars and the i-th grammar of x is the same as the i-th grammar of y except for at most k indexes i. The edit distance of x and y equals to the sum of edit distances of pairs of blocks where x and y differ. This decomposition can be used to build edit distance sketches and streaming approximate pattern matching algorithms.
Joint work with Sudatta Bhattacharya.