« search calendars« Theoretical Computer Science Seminar

« Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching

Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching

January 31, 2024, 11:00 AM - 12:00 PM

Location:

Conference Room 301

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Michal Koucký, Charles University

In this talk I will present a new locally consistent decomposition of strings with applications to sketching of edit distance and approximate pattern matching. Edit distance is a string similarity measure that counts how many characters have to be deleted, inserted or substituted in one string to get another one. Our ultimate goal is to be able to compute for each string a short sketch such that from sketches of two strings we can estimate their edit distance. Our new string decomposition allows to build such sketches of certain length.

For a parameter k, each string x is decomposed into blocks that can be described by grammars of size O(k) (using some amount of randomness). If we take two strings x and y of edit distance at most k then their block decomposition uses the same number of grammars and the i-th grammar of x is the same as the i-th grammar of y except for at most k indexes i. The edit distance of x and y equals to the sum of edit distances of pairs of blocks where x and y differ. This decomposition can be used to build edit distance sketches and streaming approximate pattern matching algorithms.

Joint work with Sudatta Bhattacharya.