Co-sponsored by DIMACS and the IEEE Information Theory Society.
1.
Julia Abrahams, Rutgers University
Tutorial - Huffman Code Trees and Problem Variants:
Some Open Questions
Huffman code trees will be reviewed as the solution to the
minimum-average-codeword-length coding problem, and also as the
solution to the minimum-average-number-of-tests sequential
diagnosis or search problem. Variants of the underlying "word
problem" which impose costs or constraints on the paths through
the tree tend to arise in coding problems while costs or constraints
on the interior nodes of the tree tend to arise in testing problems,
although some variants are plausible problems of interest in both
contexts. Several examples of problem variants will be categorized
as instances of constrained Huffman problems or as examples of
Huffman problems with an imposed cost structure.
In order for a Huffman variant problem to be considered resolved,
we would like to have a parsimonious characterization of the set
of candidate trees, an entropy-type bound on the cost of the optimal
tree, and an efficient algorithm for the construction of the optimal
tree. Particular problems which are open in this sense include the
case that partial order constraints are imposed on the leaf nodes
of the tree, the case that the edges in the paths through the tree
occur according to a finite state cost model, and the case in which
costs are associated with setting up distinct tests. These three
examples will be presented in some detail as research challenges.
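For reference, the unconstrained problem is solved by the classic greedy
construction; a minimal Python sketch (illustrative only, not the talk's
subject matter, which concerns the constrained variants):

    import heapq

    def huffman_tree(weights):
        # classic Huffman: repeatedly merge the two smallest weights;
        # heap entries are (weight, tie_breaker, subtree), a leaf being
        # its index and an internal node a (left, right) pair
        heap = [(w, i, i) for i, w in enumerate(weights)]
        heapq.heapify(heap)
        count = len(weights)
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)
            w2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
            count += 1
        return heap[0][2]

    def code_lengths(tree, depth=0):
        # map each leaf index to its codeword length (path depth)
        if isinstance(tree, int):
            return {tree: depth}
        lengths = {}
        for child in tree:
            lengths.update(code_lengths(child, depth + 1))
        return lengths

    probs = [0.4, 0.2, 0.2, 0.1, 0.1]
    lengths = code_lengths(huffman_tree(probs))
    print(sum(probs[i] * l for i, l in lengths.items()))   # 2.2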
2.
Joy Thomas, IBM
Huffman Algebras
In this talk we define a class of two-operator algebras for which the
Huffman algorithm is optimal, based on a rearrangement inequality of
Hardy, Littlewood, and Polya. These algebras, called Huffman algebras,
include both real numbers and independent random variables. Many
examples of such algebras are given. For the case of random leaf
weights, we prove the optimality of the tree constructed by the
power-of-two rule, i.e., the Huffman algorithm run with identical
weights, when the weights of the leaves are independent and
identically distributed. As a natural extension of our algebras, we
consider tree extensions, and derive an equivalent of entropy for some
of the algebras considered in the talk. Applications of these
algebras include not only the source coding problem, but also circuit
design and scheduling in parallel computers. (Joint work with C.S. Chang)
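To illustrate the flavor of the generalization, here is a sketch of the
greedy merge with a pluggable combine operator; the axioms under which
this remains optimal are the subject of the talk, and the circuit-delay
operator below is only an assumed example of the kind of application
mentioned:

    import heapq

    def generalized_huffman(weights, combine):
        # greedy Huffman-style merging with a pluggable combine operator;
        # returns the root value and all merged (internal) node values
        heap = list(weights)
        heapq.heapify(heap)
        merged = []
        while len(heap) > 1:
            a, b = heapq.heappop(heap), heapq.heappop(heap)
            m = combine(a, b)
            merged.append(m)
            heapq.heappush(heap, m)
        return heap[0], merged

    # classical source coding (combine = +): the sum of the internal
    # node weights is the optimal weighted path length
    root, merged = generalized_huffman([0.4, 0.2, 0.2, 0.1, 0.1],
                                       lambda a, b: a + b)
    print(sum(merged))                      # 2.2

    # a circuit-delay flavoured operator: two signals ready at times a
    # and b feed a unit-delay gate ready at max(a, b) + 1; the root
    # value is then the overall circuit delay
    root, _ = generalized_huffman([3, 1, 1, 2], lambda a, b: max(a, b) + 1)
    print(root)                             # 4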
3.
Kingo Kobayashi, University of Electro-Communications
Coding of Trees: From the Information-Theoretic Point of View
Tree structures have long played a central role in both computer
science and information theory. Various representations for several
classes of trees have been devised in computer science. These
representations can be compared by their "coding" efficiency, an
important concept in information theory. Here, from this coding
point of view, we give some results on the representation of trees.
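As a simple instance of comparing tree representations by coding
efficiency, consider the standard preorder bit encoding of binary trees
(a sketch under the usual conventions; not necessarily one of the
talk's representations):

    from math import comb, log2

    def encode(tree):
        # preorder bit encoding: '1' for an internal node, '0' for a leaf
        if tree is None:
            return "0"
        left, right = tree
        return "1" + encode(left) + encode(right)

    def decode(bits, pos=0):
        # inverse of encode; returns (tree, next unread position)
        if bits[pos] == "0":
            return None, pos + 1
        left, pos = decode(bits, pos + 1)
        right, pos = decode(bits, pos)
        return (left, right), pos

    tree = (None, ((None, None), None))
    bits = encode(tree)
    print(bits, decode(bits)[0] == tree)    # 1011000 True

    # a tree with n internal nodes needs 2n + 1 bits; the
    # information-theoretic minimum is log2(Catalan(n)) = 2n - O(log n)
    n = 10
    print(2 * n + 1, log2(comb(2 * n, n) / (n + 1)))   # 21 vs ~14.04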
4.
Lawrence L. Larmore, University of Nevada - Las Vegas
Optimal Trees and the Monge Property
In several cases, it is possible to reduce the problem of finding
an optimal tree to a seemingly unrelated problem in which the Monge
property applies. The first and simplest example is the reduction
of the Huffman coding problem to the concave least weight subsequence
problem. Although no speedup is achieved in this simplest case, it is
nevertheless instructive. When the technique is extended to more
complex problems, there is typically a speedup of an order of magnitude.
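For concreteness, the least weight subsequence problem can be solved by
the straightforward quadratic dynamic program below; the Monge
(quadrangle) property of the cost is what the faster algorithms
exploit. A sketch, with an assumed example cost:

    from math import sqrt

    def least_weight_subsequence(n, w):
        # f(j) = min over i < j of f(i) + w(i, j), with f(0) = 0;
        # when w satisfies the concave quadrangle (Monge-type)
        # inequality, as costs of the form g(j - i) with g concave do,
        # this DP can be sped up to near-linear time
        INF = float("inf")
        f = [0.0] + [INF] * n
        for j in range(1, n + 1):
            f[j] = min(f[i] + w(i, j) for i in range(j))
        return f[n]

    # a concave gap cost: here a single long jump is optimal
    print(least_weight_subsequence(10, lambda i, j: sqrt(j - i)))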
5.
Rudolf Fleischer, Max-Planck Institut Saarbrucken
Self-Organizing Data Structures and Compression: A Survey
Self-organizing data structures such as linear lists (with the
Move-to-Front rule, for example) or Splay Trees have been proposed
for use in efficient implementations of encoders in compression
algorithms. But they can also be used directly to encode data,
yielding good compression rates, in particular when used together
with the Burrows-Wheeler transform.
In this talk, I will give a short survey on self-organizing lists and
Splay Trees, and how these data structures can be used for data
compression.
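A minimal sketch of the Move-to-Front rule used as an encoder (the
variant typically paired with the Burrows-Wheeler transform; the talk's
schemes may differ in details):

    def mtf_encode(data, alphabet):
        # each symbol is replaced by its current list index, then moved
        # to the front; recently seen symbols get small indices, so
        # locally repetitive input (e.g. a Burrows-Wheeler transformed
        # string) becomes a low-entropy sequence of small integers
        table = list(alphabet)
        out = []
        for ch in data:
            i = table.index(ch)
            out.append(i)
            table.insert(0, table.pop(i))
        return out

    print(mtf_encode("bananaaa", "abn"))   # [1, 1, 2, 1, 1, 1, 0, 0]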
6.
Mike Fredman, Rutgers University
(with David M. Cohen, IDA)
Weighted Binary Trees for Concurrent Searching
A traditional cost measure for binary search trees is given by
weighted path length, which measures the expected cost of a single
random search. In this paper we investigate a generalization, the
k-cost, which is suitable for applications involving independent
parallel processors each utilizing a common search tree. The k-cost
is defined to be the expected value of the maximum length of the
search paths encountered during the course of k independently
performed random searches. For k = 1, the k-cost reduces to the
traditional weighted path length.
Our results include efficient algorithms for constructing binary
search trees having near optimal k-cost. An interesting feature
of one of our algorithms concerns the fact that it constructs a
search tree that simultaneously has near optimal k-cost for all k;
in fact, the structure of this tree is determined only by the order
permutation of the sequence of probability weights that are to be
assigned to the tree nodes. This stands in contrast to an example
which shows that a tree which is precisely optimal relative to
weighted path length (1-cost) can have k-cost which is exponentially
greater than the k-cost of the optimal tree.
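The k-cost of a given tree is easy to evaluate exactly via the identity
E[max] = sum over d >= 1 of P(max >= d); a small sketch, with node
depths and access probabilities as assumed inputs:

    def k_cost(depths, probs, k):
        # exact k-cost: E[max depth over k independent random searches],
        # using E[max] = sum over d >= 1 of (1 - P(depth < d)^k);
        # depths are counted in edges, probs[j] is the probability that
        # a single search ends at node j
        max_d = max(depths)
        cdf = [sum(p for dd, p in zip(depths, probs) if dd <= d)
               for d in range(max_d + 1)]
        return sum(1 - cdf[d - 1] ** k for d in range(1, max_d + 1))

    depths = [0, 1, 1]               # a root and its two children
    probs = [0.5, 0.25, 0.25]
    print(k_cost(depths, probs, 1))  # 0.5, the weighted path length
    print(k_cost(depths, probs, 2))  # 0.75, the deeper of two searches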
7.
Bob Sedgewick, Princeton University
Is Quicksort Optimal?
This talk describes some relatively bold theoretical results
and some relatively new practical implementations that imply
tight relationships between the number of comparisons used by
a properly implemented Quicksort and the entropy of the set of
keys being sorted.
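One way to see the relationship empirically: instrument a three-way
quicksort and compare its comparison count with n times the entropy of
the key distribution (a toy experiment, not the talk's analysis):

    import random
    from math import log2
    from collections import Counter

    def quicksort3(a, counter):
        # three-way (fat pivot) quicksort, counting key comparisons;
        # each element is charged one three-way comparison per pass
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        lt, eq, gt = [], [], []
        for x in a:
            counter[0] += 1
            if x < pivot:
                lt.append(x)
            elif x > pivot:
                gt.append(x)
            else:
                eq.append(x)
        return quicksort3(lt, counter) + eq + quicksort3(gt, counter)

    n = 100_000
    keys = [random.randrange(16) for _ in range(n)]   # many equal keys
    freq = Counter(keys)
    H = -sum(c / n * log2(c / n) for c in freq.values())
    counter = [0]
    quicksort3(keys, counter)
    # comparisons per key vs entropy: the ratio comes out roughly near
    # the familiar quicksort constant 2 ln 2 ~ 1.39
    print(counter[0] / n, H)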
8.
Veronique Bruyere, University of Mons-Hainaut Le Pentagone
Tutorial - Codes: A Language Theoretic Approach
The topic of this tutorial is the theory of variable-length codes.
This theory was initiated by Schutzenberger in the fifties. It takes
its origin in the coding and information theory developed by Shannon.
In the first part of the tutorial, we survey the definitions and
basic results about codes, with an emphasis on the interconnections
with finite automata.
In the second part, we discuss some recently obtained results in
this direction, together with some open problems that may be of
interest to the information theory and algorithms communities. When
the proofs are simple, we present them in a way that gives the
audience a flavor of the tools used in this domain.
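As a taste of the algorithmic side, the basic decision problem, whether
a finite set of words is a code, is settled by the classical
Sardinas-Patterson procedure; a compact sketch:

    def is_code(words):
        # Sardinas-Patterson test for unique decodability
        C = set(words)

        def quotient(xs, ys):
            # left quotients: suffixes s with x + s = y, x in xs, y in ys
            return {y[len(x):] for x in xs for y in ys if y.startswith(x)}

        s = quotient(C, C) - {""}    # dangling suffixes of codeword pairs
        seen = set()
        while s and frozenset(s) not in seen:
            if "" in s:
                return False         # an ambiguous parse exists
            seen.add(frozenset(s))
            s = quotient(C, s) | quotient(s, C)
        return True

    print(is_code(["0", "01", "11"]))  # True: a code, though not prefix-free
    print(is_code(["a", "ab", "ba"]))  # False: "aba" parses two ways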
9.
Dominique Perrin, Universite de Marne-la-Vallee
Bifix Codes
Bifix codes are codes which have the prefix property in both
directions. Many interesting (and not widely known) results hold
for this particular class of prefix codes introduced long ago by
Gilbert and Moore and by Schutzenberger. In this talk, I will present
several constructions that allow a bifix code to be embedded into a maximal
one. The results are due to Zhang and Shen (1995) and to Veronique
Bruyere and myself.
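The defining property is easy to state operationally; a trivial check
(maximality and the embedding constructions are the hard part the talk
addresses):

    def is_bifix(words):
        # no codeword may be a prefix or a suffix of another codeword
        return not any(u != v and (v.startswith(u) or v.endswith(u))
                       for u in words for v in words)

    print(is_bifix(["0", "11", "101"]), is_bifix(["0", "01"]))  # True False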
10.
Ian Munro, Waterloo
(with Venkatesh Raman)
Succinct Tree Representations for Fast Navigation
We consider applications of large, mostly static, tree structures and
the implementation of abstract data types for the objects: binary tree,
rooted ordered tree and balanced parenthesis expression. We review a
number of representations, and present some new ones, that use an amount
of space within a lower order term of the information theoretic minimum
and support, in constant time, a rich set of navigational operations.
In the case of binary trees, for instance, we can move from a node to
its left or right child or to the parent in constant time while retaining
knowledge of the size of the subtree at which we are positioned.
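For intuition, here is the balanced parenthesis view with naive
scanning implementations of two operations; the point of the talk's
representations is to answer these in constant time using only
lower-order extra space:

    def findclose(bp, i):
        # index of the ')' matching the '(' at position i (naive scan;
        # the succinct structures answer this in O(1))
        depth = 0
        for j in range(i, len(bp)):
            depth += 1 if bp[j] == "(" else -1
            if depth == 0:
                return j

    def subtree_size(bp, i):
        # nodes in the subtree whose opening parenthesis is at i:
        # half the length of the enclosed parenthesis sequence
        return (findclose(bp, i) - i + 1) // 2

    bp = "((()())())"    # a rooted ordered tree on 5 nodes
    print(subtree_size(bp, 0), subtree_size(bp, 1))   # 5 3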
11.
Erik D. Demaine, Waterloo
(with Ian Munro and Alex Lopez-Ortiz, University of New Brunswick)
Adaptive Computation of the Intersection of a Collection of Sets
This talk discusses the problem of computing the intersection of a
collection of sets, which is motivated by the goal of efficiently
evaluating Boolean queries in text retrieval systems. An optimal
adaptive algorithm is proposed to solve this problem. Information
theory is an important tool for analyzing this algorithm. In
particular, we study sets of comparisons that prove correctness of
an answer, and study optimal binary encodings of such proofs. The
lengths of these encodings form an information theoretic lower bound.
We present matching upper and lower bounds on the worst-case running
time with respect to the information theoretic lower bound.
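A sketch in this adaptive spirit: a round-robin candidate scheme with
doubling ("galloping") searches, whose cost adapts to how interleaved
the sets are. This is a standard scheme shown for illustration, not
necessarily the talk's exact algorithm:

    from bisect import bisect_left

    def gallop(a, lo, x):
        # first index i >= lo with a[i] >= x: double a bound, then
        # binary search; cost is logarithmic in the distance jumped
        bound = 1
        while lo + bound < len(a) and a[lo + bound] < x:
            bound *= 2
        return bisect_left(a, x, lo, min(lo + bound + 1, len(a)))

    def intersect(lists):
        # intersect sorted lists by promoting a candidate element and
        # galloping for it in each list in turn
        lists = sorted(lists, key=len)
        pos = [0] * len(lists)
        out = []
        x = lists[0][0] if lists[0] else None
        while x is not None:
            matched = True
            for k, a in enumerate(lists):
                pos[k] = gallop(a, pos[k], x)
                if pos[k] == len(a):
                    return out           # some list is exhausted
                if a[pos[k]] != x:
                    x = a[pos[k]]        # new, larger candidate
                    matched = False
                    break
            if matched:
                out.append(x)
                pos[0] += 1
                if pos[0] == len(lists[0]):
                    return out
                x = lists[0][pos[0]]
        return out

    print(intersect([[1, 3, 5, 7, 9],
                     [2, 3, 4, 7, 10],
                     [3, 7, 20]]))       # [3, 7]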
12.
Stott Parker, UCLA
The Construction of Huffman Codes is a Submodular ('Convex')
Problem Over a Lattice of Binary Trees
We show that the space of all binary Huffman codes for a finite
alphabet defines a lattice, ordered by the imbalance of the code
trees. Representing code trees as path-length sequences, we show that
the imbalance ordering is closely related to a majorization ordering on
real-valued sequences that correspond to discrete probability density
functions. Furthermore, this tree imbalance is a partial ordering that
is consistent with the total orderings given by either the external
path length (sum of tree path lengths), or the entropy determined by
the tree structure. On the imbalance lattice, we show the weighted
path-length of a tree (the usual objective function for Huffman coding)
is a submodular function, as is the corresponding function on the
majorization lattice. Submodular functions are discrete analogues of
convex functions. These results give perspective on Huffman coding,
and suggest new approaches to coding as optimization over a lattice.
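For concreteness, a path-length sequence induces a dyadic probability
sequence, and the orderings mentioned can be checked directly; a small
sketch (the lattice operations themselves are beyond this snippet):

    def kraft_sum(lengths):
        # equals 1 exactly when the length sequence comes from a full
        # binary code tree
        return sum(2.0 ** -l for l in lengths)

    def majorizes(p, q):
        # p majorizes q: sorted decreasingly, every prefix sum of p is
        # at least the corresponding prefix sum of q (equal totals)
        p, q = sorted(p, reverse=True), sorted(q, reverse=True)
        sp = sq = 0.0
        for a, b in zip(p, q):
            sp, sq = sp + a, sq + b
            if sp < sq - 1e-12:
                return False
        return True

    # two full trees on 4 leaves; the more unbalanced tree's dyadic
    # distribution majorizes the balanced one's
    balanced, skewed = [2, 2, 2, 2], [1, 2, 3, 3]
    print(kraft_sum(balanced), kraft_sum(skewed))        # 1.0 1.0
    print(majorizes([2.0 ** -l for l in skewed],
                    [2.0 ** -l for l in balanced]))      # True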
13.
Mihai Sipitca, Georgia Tech
(with David Gillman)
Lossless Encoding of DCT Coefficients for Video Compression
The last stage of many image or video coders is to losslessly encode
an orthogonal transformation of an image or a residual image,
after the transformed image has been quantized, or reduced in numerical
precision. Traditionally, image and video coders use a fixed table for
entropy coding the tokens that represent the quantized transform
coefficients. However, the characteristics of images are highly
non-stationary: different portions of images and different images
have different statistics, which would require different entropy
coding tables for more efficient encoding.
In our presentation, we will investigate the improvement obtained when
using multiple entropy tables instead of one. Issues addressed include:
1) choosing the variance as the characteristic of a block,
2) table compaction - keeping the memory requirements reasonable at a
minimum coding efficiency cost,
3) presenting the trade-off between memory requirements and the coding
efficiency improvement.
The results presented were obtained for resolutions between SIF
(similar to VHS) and CCIR-601 (broadcast TV). Future work is going to
target the low bit rate range.
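A sketch of the block classification step in item 1, with hypothetical
variance thresholds (the actual thresholds and table designs are the
subject of the experiments):

    import random
    from statistics import pvariance

    def pick_table(block, thresholds):
        # classify a block of quantized DCT coefficients by variance,
        # the block characteristic proposed above, into one of several
        # entropy-coding tables; thresholds are assumed tuning values
        v = pvariance(block)
        for table_id, t in enumerate(thresholds):
            if v <= t:
                return table_id
        return len(thresholds)       # highest-variance table

    smooth = [0] * 64                                   # flat block
    busy = [random.randint(-8, 8) for _ in range(64)]   # high activity
    print(pick_table(smooth, [0.5, 4.0]))               # 0
    print(pick_table(busy, [0.5, 4.0]))                 # almost surely 2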
14.
Sajal Das, U. North Texas
(with Amiya Bhattacharya)
Predictive Location Management Using Lossless Compression
A user in a cellular or PCS system, while enjoying the freedom to
be mobile, creates uncertainty about the exact location of the
registered mobile terminal being carried. The system has to search
for the mobile against this uncertainty, to ensure successful call
delivery within a time constraint. The collector network deployed is
usually organized as a three-tier tree-like hierarchy, with a mobile
switching exchange at the root, a number of base station controllers
at the second tier, and several base stations as leaves at the third
tier. Each base station is dedicated to handling the air link within
one cell.
On a call arrival, the search for the target mobile is initiated by
the system by paging or polling the cells where the user can possibly
be found. Paging messages originate from the mobile switching
exchange; when a message reaches a base station, it is broadcast so
that the target mobile can respond if present. To limit the level of
uncertainty, the
mobile can report its position from time to time. This mobile-initiated
reporting is called a location update or registration. Both paging and
update contribute to the location management cost, and the location
management problem aims at minimizing this total cost. Any message
exchange beyond what is necessary wastes valuable resources such as
wireless bandwidth and mobile equipment power, in effect increasing
the operational cost that the user has to bear.
In this paper, we propose a predictive location management technique
that makes use of path updates as opposed to the widely used position
updates. To enhance predictability, the mobility profile for each mobile
user must be built in the system database. For this, the path traced by
the user in the past needs to be made completely known to the system.
Cost minimization comes from carrying out this information exchange
at its information-theoretic limit. The technique is built on mapping
the user's trail onto a string, so that reducing the location
management cost translates to lossless compression of this string. The
compressibility achieved by the fixed-to-variable-length encodings of
the well-known Lempel-Ziv family of algorithms reduces the update
cost, whereas their built-in predictive power can be effectively used
to reduce the paging cost.
What we end up with is a family of adaptive on-line algorithms, which are
asymptotically optimal if the user mobility can be modeled as an ergodic
source. We discuss the underlying data structures used for building the
profile on the fly, and how good paging strategies can be designed based on
the profile. We take into consideration the limit on the memory available
on a mobile terminal for executing the proposed class of algorithms.
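A sketch of the core mechanism under these assumptions: the movement
string is parsed LZ78-style into phrases, each completed phrase
triggering one path update, and the resulting trie's visit counts drive
the paging order (a toy model, not the paper's exact data structures):

    class Trie:
        def __init__(self):
            self.children = {}   # move symbol -> Trie
            self.count = 0       # times this node was traversed

    def lz78_update(root, moves):
        # LZ78 incremental parsing: walk the trie until the current
        # phrase falls off it, add one leaf, restart at the root
        node, phrases = root, 0
        for m in moves:
            if m in node.children:
                node = node.children[m]
                node.count += 1
            else:
                node.children[m] = Trie()
                node.children[m].count = 1
                node, phrases = root, phrases + 1
        return phrases

    def paging_order(root, context):
        # poll the likeliest next cells first: rank children of the
        # current context node by visit count
        node = root
        for m in context:
            if m not in node.children:
                return []
            node = node.children[m]
        return sorted(node.children, key=lambda m: -node.children[m].count)

    root = Trie()
    print(lz78_update(root, "ababababab"))  # 5 phrases: a, b, ab, aba, ba
    print(paging_order(root, "a"))          # ['b'], the predicted next move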
15.
Yuriy Reznik, RealNetworks
On Compact Level Partitioning of Digital Trees
A modification of the digital tree structure is proposed that
supports a non-uniform distribution of digits between levels (i.e.,
each level may have nodes of different sizes). We call this
distribution of digits a level partitioning, and the resulting data
structure an LP-tree. We study general properties of LP-trees and
develop an algorithm that, for any given binary tree, constructs an
equivalent LP-tree using the minimum amount of memory. We call the
resulting trees compact LP-trees and perform a detailed analysis of
their properties. It turns out that, on average, the compact LP-tree
uses approximately 20.65% less memory than a binary tree and, even
more importantly, the average complexity of a successful search in
such a tree is O(1) (vs. O(log N) for the standard binary trie).
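A toy model of level partitioning, where level i consumes a chosen
number of key digits at once (a hypothetical layout; the compact
LP-trees of the talk choose the partition to minimize memory):

    def build_lp_trie(keys, spans):
        # level i consumes spans[i] key bits at once, so nodes at
        # different levels have different sizes
        root = {}
        for key in keys:
            node, pos = root, 0
            for span in spans:
                chunk, pos = key[pos:pos + span], pos + span
                node = node.setdefault(chunk, {})
        return root

    def lp_search(root, key, spans):
        node, pos = root, 0
        for span in spans:
            chunk, pos = key[pos:pos + span], pos + span
            if chunk not in node:
                return False
            node = node[chunk]
        return True

    trie = build_lp_trie(["0110", "0111", "1010"], [2, 2])
    print(lp_search(trie, "0111", [2, 2]),
          lp_search(trie, "0000", [2, 2]))   # True False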
16.
Mamoru Hoshi, University of Electro-Communications
Homophonic Coding by the Interval Algorithm: Preliminary Considerations
We shall show that the idea of the successive refinement of interval
partitions, which plays the key role in the interval algorithm for
random number generation [Han and Hoshi], is directly applicable to
homophonic coding as well.
We propose an efficient and very simple algorithm for homophonic coding
which produces an i.i.d. sequence. Lower and upper bounds on the
expected length of the code generated by the algorithm are given.
In a special case, the expected length of the code generated by the
algorithm is bounded by H(X)/log M + 3, where H(X) is the entropy of
the source X and M is the size of the output alphabet. Homophonic
coding with cost is also discussed.
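One classical construction in this spirit, sketched below for binary
output: decompose each source symbol's interval into maximal dyadic
pieces and pick a piece at random with probability proportional to its
length, so that over an i.i.d. source the emitted bits are i.i.d. fair
coin flips (the talk's interval algorithm may differ in details):

    from fractions import Fraction
    import random

    def dyadic_pieces(a, b):
        # split [a, b) into maximal dyadic intervals [k/2^n, (k+1)/2^n);
        # assumes a and b are dyadic rationals so the loop terminates
        pieces = []
        while a < b:
            n = 0
            while True:
                step = Fraction(1, 2 ** n)
                if a % step == 0 and a + step <= b:
                    pieces.append((a, n))
                    a += step
                    break
                n += 1
        return pieces

    def homophonic_encode(symbol_intervals, x):
        # pick one dyadic piece of x's source interval, weighted by its
        # length, and emit that piece's n-bit binary address
        a, b = symbol_intervals[x]
        pieces = dyadic_pieces(a, b)
        weights = [float(Fraction(1, 2 ** n)) for _, n in pieces]
        start, n = random.choices(pieces, weights)[0]
        k = int(start * 2 ** n)
        return format(k, f"0{n}b") if n else ""

    # a source with P(A) = 3/4 and P(B) = 1/4
    iv = {"A": (Fraction(0), Fraction(3, 4)),
          "B": (Fraction(3, 4), Fraction(1))}
    print([homophonic_encode(iv, s) for s in "AABAB"])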
17.
Andrej Brodnik, Lulea Technical University and IMFM,
and Svante Carlsson, Lulea Technical University
Sub-Linear Huffman Decoding: Some Theoretical and Practical Results
In the first part of our presentation we describe a succinct data
structure storing the Huffman encoding that permits sub-linear
decoding in the number of transmitted bits ([BroCar98]). The extra
storage required by the new data structure, beyond the storage of the
symbols of the alphabet, is
O(l log N) bits,
where l is the length of the longest Huffman codeword and N is the
number of symbols in the alphabet. A similar solution was independently
presented by Moffat and Turpin ([MofTur97]).
We use the presented data structure to describe two different decoding
algorithms -- one running in time O(l') and the other in O(log l'),
where l' is the number of distinct Huffman codeword lengths. We proceed
by showing that our algorithm decodes Huffman codes on average in time
O(log log N) for any probability distribution.
In the second part we present practical results obtained from the
implementation of the described algorithms. We show that with such an
efficient decoding algorithm, handling of the input stream of bits
(Huffman codes) becomes as expensive as the decoding itself.
To improve the latter, we developed a more detailed machine model
that better captures the memory hierarchy. The model was influenced by
the work on external-memory data structures (e.g., [Arge96, ArgVit96]).
We conclude the presentation with some comments and open questions
related to the dynamic Huffman encoding ([Vitter87]).
The presented algorithms will be used in the fall release of the
``Phone Book of Slovenia'' ([PhBook]).
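To give the flavor of the O(log l') decoder: with a canonical code, one
comparison of the next l bits (left-justified) against the first code
of each length suffices, so a binary search over the l' distinct
lengths finds the codeword length. A sketch with an assumed table
layout (the succinct structure of [BroCar98] differs in details):

    from bisect import bisect_right

    def decode_one(bits, pos, lengths, first_code, first_sym, symbols):
        # decode one canonical codeword starting at bit position pos;
        # lengths: the l' distinct codeword lengths, increasing;
        # first_code[i]: value of the first code of length lengths[i],
        #   left-justified in L = max length bits;
        # first_sym[i]: rank of that code's symbol in canonical order
        L = lengths[-1]
        window = int(bits[pos:pos + L].ljust(L, "0"), 2)
        i = bisect_right(first_code, window) - 1     # O(log l') search
        l = lengths[i]
        offset = (window >> (L - l)) - (first_code[i] >> (L - l))
        return symbols[first_sym[i] + offset], pos + l

    # canonical code a=0, b=10, c=110, d=111 (lengths 1, 2, 3, 3)
    tables = ([1, 2, 3], [0b000, 0b100, 0b110], [0, 1, 2], "abcd")
    bits, pos, out = "110010111", 0, []
    while pos < len(bits):
        sym, pos = decode_one(bits, pos, *tables)
        out.append(sym)
    print("".join(out))                              # cabd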
References
----------
[Arge96] L. Arge. Efficient External-Memory Data Structures and
Applications. PhD thesis, University of Aarhus, 1996.
[ArgVit96] L. Arge and J.S. Vitter. Optimal dynamic interval
management in external memory. In 37th FOCS, 1996.
[BroCar98] A. Brodnik and S. Carlsson. Sub-linear decoding of Huffman
codes almost in-place. Technical Report IMFM-(1998)-PS-600, IMFM, 1998.
(ftp://www.ijp.si/pub/preprints/ps/98/pp600.ps)
[MofTur97] A. Moffat and A. Turpin. On implementation of minimum
redundancy prefix codes. IEEE Transactions on Communications,
45(10):1200--1207, 1997.
[Vitter87] J.S. Vitter. Design and analysis of dynamic Huffman
codes. JACM, 34(4):825--845, October 1987.
[PhBook] Telekom Slovenije & ADACTA. Telefonski imenik Slovenije. URL:
http://tis.telekom.si.