Storing and Retrieving Labeled Interval Data for Speech and Multimedia

Pierre Wellner

                        AT&T Labs -- Research
                       Florham Park, NJ 07932
                       pierre@research.att.com

Written communication is often stored and archived, but speech communication is rarely recorded, despite the fact that digital speech is not all that expensive to store. A year of telephone conversations, for example, can fit on a tape costing less than fifty dollars. Why not record all those phone calls? The reason most people don't record and keep gigabytes of speech is that there are no flexible, efficient ways to search through it, retrieve selected conversations, and then listen to them quickly. A solution to this problem will enable a wide range of applications and services, and it will involve integration of state-of-the-art technology from the fields of speech processing, information retrieval, and user-interface design.

In the long term, vast amounts of speech and multimedia communication will be routinely recorded, just as both paper and electronic documents are archived today. User interfaces and technology for archiving and accessing these data will become critical to making them useful. At AT&T Labs -- Research, we are investigating a number of applications that store and retrieve large amounts of speech and multimedia data. One key component of these systems is a database for storing and retrieving labeled interval data. These are data that describe properties of specific intervals within the speech or other multimedia stream. Examples include who is talking when, pauses in speech, telephone call-control data, video scene changes, and automatic speech-recognition output.
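To make the notion of labeled interval data concrete, the following is a minimal sketch in Python. It is an illustration only, not the IDB or its API: the record fields (stream_id, start, end, kind, value), the function name overlapping, and the sample records are all assumptions introduced here, and the linear scan stands in for whatever indexing a real interval database would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledInterval:
    """One labeled interval (hypothetical record layout, not the IDB schema)."""
    stream_id: str  # which recording the interval belongs to
    start: float    # seconds from the start of the stream, inclusive
    end: float      # seconds from the start of the stream, exclusive
    kind: str       # e.g. "speaker", "pause", "scene_change"
    value: str      # e.g. a speaker name or a recognized word


def overlapping(intervals, stream_id, start, end, kind=None):
    """Return intervals of one stream that overlap [start, end), sorted by start.

    Linear scan for clarity; a real store would index the intervals.
    """
    hits = [
        iv for iv in intervals
        if iv.stream_id == stream_id
        and iv.start < end and iv.end > start
        and (kind is None or iv.kind == kind)
    ]
    return sorted(hits, key=lambda iv: iv.start)


# Made-up sample data for two recordings.
records = [
    LabeledInterval("call-1", 0.0, 4.2, "speaker", "Alice"),
    LabeledInterval("call-1", 4.2, 5.0, "pause", ""),
    LabeledInterval("call-1", 5.0, 9.5, "speaker", "Bob"),
    LabeledInterval("call-2", 0.0, 3.0, "speaker", "Alice"),
]

# Who is talking between seconds 3 and 6 of call-1?
who = [iv.value for iv in overlapping(records, "call-1", 3.0, 6.0, kind="speaker")]
```

Treating intervals as half-open keeps adjacent labels, such as a pause that begins exactly when a speaker stops, from being reported as overlapping each other.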

We call this component the IDB (for interval database). This paper discusses how applications will use interval data and the requirements that these applications will place on the IDB. We briefly describe and demonstrate two example applications: one for storing and retrieving conference-call recordings and another for storing and retrieving broadcast news television programs. We then discuss the CORBA API we have designed to support these types of applications and describe how we are implementing the IDB so that it can scale up to the size and performance required of a fully deployed AT&T service.

Joint work with David Gibbon, Chris Macey, and David Weimer.