Dynamic Algorithms for Edit Distance - Part I

May 08, 2024, 9:30 AM - 10:30 AM

Location:

DIMACS Center

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Click here for map.

Tomasz Kociumaka, Max Planck Institute for Informatics

The edit distance (also known as the Levenshtein distance) of two strings is the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. The textbook algorithm determines the edit distance of two length-n strings in O(n²) time, and any polynomial-factor improvement upon this runtime would violate the Orthogonal Vectors Hypothesis (OVH). An established way of circumventing this lower bound is to consider the bounded edit distance problem, where the running time is expressed in terms of not only the length n of the input strings but also the value k of the edit distance: the classic algorithm of Landau and Vishkin [JCSS’88] achieves O(n + k²) time, which is optimal (up to sub-polynomial factors and conditioned on OVH) as a function of n and k. Another direction, with multiple ground-breaking results over the last decade, is approximating edit distance, where the runtime has been brought down to truly sub-quadratic, almost-linear, and even sub-linear at the cost of introducing approximation ranging from constant-factor to sub-polynomial in n.

In this tutorial, I will focus on the dynamic version of the edit distance problem, which asks to maintain the edit distance of two strings that change dynamically, with each update modeled as a single edit (character insertion, deletion, or substitution). First, I will introduce some basic concepts, such as the edit-distance alignment graph, and a folklore approach that combines the Landau–Vishkin algorithm with a classic dynamic strings implementation to achieve Õ(k²) time per update, where the Õ(·) notation hides factors poly-logarithmic in n. I will then show how the framework of Tiskin [SODA'10], built around efficient min-plus multiplication of unit-Monge matrices, can be applied to achieve a dynamic edit distance algorithm that takes Õ(n) time per update [Charalampopoulos, Kociumaka, and Mozes; CPM'20]. While this update time is conditionally optimal in terms of n, a very recent preprint [Gorbachev and Kociumaka; arXiv'24] achieves Õ(k) time per update using a novel divide-and-conquer approach combined with an application of weight-balanced straight-line programs [Charikar, Lehman, Liu, Panigrahy, Prabhakaran, Sahai, and Shelat; STOC'02]. After providing a high-level overview of this result, I will move on to approximating edit distance in the dynamic setting, where a black-box application of existing tools suffices to achieve O(n^{0.5+ε}) update time for maintaining a constant-factor approximation. Building upon the approach of Andoni, Krauthgamer, and Onak [FOCS'10], the current state-of-the-art algorithm [Kociumaka, Mukherjee, and Saha; FOCS'23] maintains an n^{o(1)}-factor approximation in n^{o(1)} time per update.

Videos: [Part 1]  [Part 2]