DIMACS Workshop on Codes and Trees: Algorithmic and Information Theoretic Approaches

October 5 - 7, 1998
DIMACS Center, CoRE Building (Room 431), Rutgers University, Busch Campus, Piscataway, NJ


Presented under the auspices of the Special Year on Massive Data Sets.

Co-sponsored by DIMACS and the IEEE Information Theory Society.



Julia Abrahams, Rutgers University 
Tutorial - Huffman Code Trees and Problem Variants:
Some Open Questions

Huffman code trees will be reviewed as the solution to the
minimum-average-codeword-length coding problem, and also as the 
solution to the minimum-average-number-of-tests sequential 
diagnosis or search problem. Variants of the underlying "word 
problem" which impose costs or constraints on the paths through 
the tree tend to arise in coding problems while costs or constraints 
on the interior nodes of the tree tend to arise in testing problems, 
although some variants are plausible problems of interest in both 
contexts.  Several examples of problem variants will be categorized 
as instances of constrained Huffman problems or as examples of 
Huffman problems with an imposed cost structure.  

In order for a Huffman variant problem to be considered resolved, 
we would like to have a parsimonious characterization of the set 
of candidate trees, an entropy-type bound on the cost of the optimal 
tree, and an efficient algorithm for the construction of the optimal 
tree.  Particular problems which are open in this sense include the
case that partial order constraints are imposed on the leaf nodes
of the tree, the case that the edges in the paths through the tree 
occur according to a finite state cost model, and the case in which 
costs are associated with setting up distinct tests.  These three
examples will be presented in some detail as research challenges. 
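As background for these variants, the unconstrained minimum-average-codeword-length problem has the familiar greedy solution. The following is a minimal Python sketch of that baseline (not part of the talk, and ignoring the constrained variants discussed above); the function name and representation are illustrative:

```python
import heapq

def huffman_code(weights):
    """Greedy Huffman construction: repeatedly merge the two lightest
    subtrees; returns a prefix-free {symbol: codeword} map."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    tick = len(heap)  # tie-breaker so the code dicts are never compared
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # lightest subtree
        w1, _, c1 = heapq.heappop(heap)  # second lightest
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tick, merged))
        tick += 1
    return heap[0][2]
```

For dyadic weights such as {a: 1/2, b: 1/4, c: 1/8, d: 1/8} the resulting codeword lengths (1, 2, 3, 3) meet the entropy bound exactly; the constrained variants above restrict which merges (or which trees) are admissible.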


Joy Thomas, IBM

Huffman Algebras

In this talk we define a class of two-operator algebras for which the
Huffman algorithm is optimal, based on a rearrangement inequality of
Hardy, Littlewood and Polya.  These algebras, called Huffman algebras,
include both real numbers and independent random variables.  Many
examples of such algebras are given.  For the case of random leaf
weights, we prove the optimality of the tree constructed by the
power-of-two rule (i.e., the Huffman algorithm assuming identical
weights) when the leaf weights are independent and identically
distributed.  As a natural extension of our algebras, we consider
tree extensions, and derive an equivalent of entropy for some of the
algebras considered in the talk.  Applications of these algebras
include not only the source coding problem, but also circuit design
and scheduling in parallel computers.  (Joint work with C.S. Chang)


Kingo Kobayashi, University of Electro Communications 

Coding of Trees: From the Information-Theoretic Point of View

The tree structure has often played a central role in both computer
science and information theory.  Various expressions for several
classes of trees have been devised in computer science.  We can
compare these expressions by the efficiency of "coding", an important
concept of information theory.  Here, from the coding point of view,
we give some results on the expression of trees.


Lawrence L. Larmore, University of Nevada - Las Vegas

Optimal Trees and the Monge Property

In several cases, it is possible to reduce the problem of finding
an optimal tree to a seemingly unrelated problem in which the Monge
property applies.  The first and simplest example is the reduction
of the Huffman coding problem to the concave least weight subsequence
problem.  Although in this one simplest case no speedup is achieved,
it is nevertheless instructive.  When the technique is extended to more
complex problems, there is typically a speedup of an order of magnitude.


Rudolf Fleischer, Max-Planck Institut Saarbrucken

Self-Organizing Data Structures and Compression: A Survey

Self-organizing data structures like linear lists (with the Move-to-Front
rule, for example) or Splay Trees have been proposed to be used in 
efficient implementations of encoders in compression algorithms. But
they can also be used directly to encode data yielding good compression
rates, in particular if used together with the Burrows-Wheeler transform.

In this talk, I will give a short survey of self-organizing lists and
Splay Trees, and of how these data structures can be used for data
compression.

Mike Fredman, Rutgers University
(with David M. Cohen, IDA)

Weighted Binary Trees for Concurrent Searching

A traditional cost measure for binary search trees is given by
weighted path length, which measures the expected cost of a single 
random search.  In this paper we investigate a generalization, the 
k-cost, which is suitable for applications involving independent 
parallel processors each utilizing a common search tree.  The k-cost 
is defined to be the expected value of the maximum length of the 
search paths encountered during the course of k independently 
performed random searches.  For example, if k = 1, then the k-cost 
represents the traditional weighted path length.  

Our results include efficient algorithms for constructing binary
search trees having near optimal k-cost.  An interesting feature
of one of our algorithms concerns the fact that it constructs a 
search tree that simultaneously has near optimal k-cost for all k; 
in fact, the structure of this tree is determined only by the order
permutation of the sequence of probability weights that are to be 
assigned to the tree nodes.  This stands in contrast to an example 
which shows that a tree which is precisely optimal relative to 
weighted path length (1-cost) can have k-cost which is exponentially 
greater than the k-cost of the optimal tree.
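For intuition, the k-cost of a given tree can be computed exactly from the depth distribution of a single random search, using the identity E[max] = sum over t of P(max > t) for nonnegative integer random variables. A sketch (this evaluates the cost of a fixed tree; it is not one of the paper's construction algorithms):

```python
def k_cost(depth_probs, k):
    """Exact k-cost: E[max of k i.i.d. search path lengths].
    depth_probs maps a path length to the probability that a single
    random search has that length."""
    dmax = max(depth_probs)
    cdf = 0.0       # P(single search length <= t)
    expected = 0.0
    for t in range(dmax):
        cdf += depth_probs.get(t, 0.0)
        expected += 1.0 - cdf ** k   # P(max of k searches exceeds t)
    return expected
```

With k = 1 this reduces to the ordinary expected path length, matching the remark above that the 1-cost is the traditional weighted path length.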


Bob Sedgewick, Princeton University

Is Quicksort Optimal?

This talk describes some relatively old theoretical results
and some relatively new practical implementations that imply
tight relationships between the number of comparisons used by
a properly implemented Quicksort and the entropy of the set of
keys being sorted.


Veronique Bruyere, University of Mons-Hainaut Le Pentagone

Tutorial - Codes: A Language Theoretic Approach

The topic of this tutorial is the theory of variable-length codes. 
This theory was initiated by Schutzenberger in the fifties. It takes 
its origin in the coding and information theory developed by Shannon.

In the first part of the tutorial, we survey the definitions and 
basic results about codes, with an emphasis on the interconnections 
with finite automata.

In the second part, we discuss some of the results recently obtained
in this direction, together with some open problems that might be
interesting for the information theory community and the algorithms
community.  When the proofs are simple, we present them in a way that
gives the audience a flavor of the tools used in this domain.


Dominique Perrin, Universite de Marne-la-Vallee

Bifix Codes

Bifix codes are codes which have the prefix property in both
directions.  Many interesting (and not widely known) results hold
for this particular class of prefix codes, introduced long ago by
Gilbert and Moore and by Schutzenberger.  In this talk, I will present
several constructions that allow one to embed a bifix code into a
maximal one.  The results are due to Zhang and Shen (1995) and to
Veronique Bruyere and myself.
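The defining property admits a direct check: a set of words is a bifix code when no word is a prefix or a suffix of another. A small illustrative sketch:

```python
def is_bifix_code(words):
    """A set of words is a bifix code if it is both prefix-free
    (no word is a prefix of another) and suffix-free."""
    words = list(words)
    for u in words:
        for v in words:
            if u != v and (v.startswith(u) or v.endswith(u)):
                return False
    return True
```

Fixed-length codes are the simplest examples; the constructions discussed in the talk concern embedding an arbitrary bifix code into a maximal one.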


Ian Munro, Waterloo
(with Venkatesh Raman)

Succinct Tree Representations for Fast Navigation

We consider applications of large, mostly static, tree structures and 
the implementation of abstract data types for the objects: binary tree, 
rooted ordered tree and balanced parenthesis expression. We review a 
number of representations, and present some new ones, that use an amount 
of space within a lower order term of the information theoretic minimum 
and support, in constant time, a rich set of navigational operations.
In the case of binary trees, for instance, we can move from a node to
its left or right child or to the parent in constant time while retaining
knowledge of the size of the subtree at which we are positioned.
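One such representation encodes an ordered tree as a balanced parenthesis string, one '(' and ')' per node, which already reaches the information theoretic minimum of about 2 bits per node. The sketch below implements the navigational operations with naive linear scans (the representations in the talk support these in constant time with sub-linear auxiliary structures); the encoding and helper names are illustrative:

```python
def find_close(bp, i):
    """Index of the ')' matching the '(' at position i (linear scan)."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == '(' else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced string")

def subtree_size(bp, i):
    """Number of nodes in the subtree rooted at the '(' at position i."""
    return (find_close(bp, i) - i + 1) // 2

def parent(bp, i):
    """Position of the enclosing '(' of the node at i, or None at the root."""
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if bp[j] == ')' else -1
        if depth < 0:
            return j
    return None
```

For example, "((()())())" encodes a 5-node tree whose root (position 0) has a 3-node first subtree (position 1) and a leaf second child (position 7).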


Erik D. Demaine, Waterloo
(with Ian Munro and Alex Lopez-Ortiz, University of New Brunswick)

Adaptive Computation of the Intersection of a Collection of Sets

This talk discusses the problem of computing the intersection of a
collection of sets, which is motivated by the goal of efficiently
evaluating boolean queries in text retrieval systems.  An optimal
adaptive algorithm is proposed to solve this problem.  Information
theory is an important tool for analyzing this algorithm. In 
particular, we study sets of comparisons that prove correctness of 
an answer, and study optimal binary encodings of such proofs. The 
lengths of these encodings form an information theoretic lower bound.  
We present matching upper and lower bounds on the worst-case running 
time with respect to the information theoretic lower bound.
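For a flavor of adaptive intersection, the sketch below intersects sorted lists, galloping (doubling then binary search) past long runs of non-members so that easy instances cost far less than a full merge. This is a standard adaptive technique, not necessarily the optimal algorithm of the talk:

```python
from bisect import bisect_left

def gallop(arr, x, lo):
    """Smallest index >= lo with arr[index] >= x: double the step,
    then binary-search the final window."""
    step = 1
    while lo + step < len(arr) and arr[lo + step] < x:
        step *= 2
    return bisect_left(arr, x, lo, min(lo + step + 1, len(arr)))

def intersect(lists):
    """Intersect sorted lists, driving from the shortest one."""
    lists = sorted(lists, key=len)
    pos = [0] * len(lists)   # resume point in each list
    out = []
    for x in lists[0]:
        present = True
        for i in range(1, len(lists)):
            pos[i] = gallop(lists[i], x, pos[i])
            if pos[i] == len(lists[i]) or lists[i][pos[i]] != x:
                present = False
                break
        if present:
            out.append(x)
    return out
```

The comparisons such a run performs are exactly the kind of "proof of the answer" whose optimal binary encoding supplies the information theoretic lower bound mentioned above.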


Stott Parker, UCLA

The Construction of Huffman Codes is a Submodular ('Convex')
Problem Over a Lattice of Binary Trees

We show that the space of all binary Huffman codes for a finite
alphabet defines a lattice, ordered by the imbalance of the code
trees.  Representing code trees as path-length sequences, we show that
the imbalance ordering is closely related to a majorization ordering on
real-valued sequences that correspond to discrete probability density
functions.  Furthermore, this tree imbalance is a partial ordering that
is consistent with the total orderings given by either the external
path length (sum of tree path lengths), or the entropy determined by
the tree structure.  On the imbalance lattice, we show the weighted
path-length of a tree (the usual objective function for Huffman coding)
is a submodular function, as is the corresponding function on the
majorization lattice.  Submodular functions are discrete analogues of
convex functions.  These results give perspective on Huffman coding,
and suggest new approaches to coding as optimization over a lattice.
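The majorization ordering can be made concrete on the dyadic probabilities 2^(-depth) induced by a code tree's path-length sequence. A sketch of the comparison (the lattice operations themselves are not implemented here; names are illustrative):

```python
from fractions import Fraction

def leaf_probs(depths):
    """Dyadic probabilities 2^{-d} induced by a path-length sequence."""
    return [Fraction(1, 2 ** d) for d in depths]

def majorizes(p, q):
    """True if p majorizes q: every prefix sum of p (sorted descending)
    dominates the corresponding prefix sum of q, with equal totals."""
    p, q = sorted(p, reverse=True), sorted(q, reverse=True)
    if len(p) != len(q) or sum(p) != sum(q):
        return False
    sp = sq = Fraction(0)
    for a, b in zip(p, q):
        sp, sq = sp + a, sq + b
        if sp < sq:
            return False
    return True
```

For the two full 4-leaf trees, the imbalanced one (depths 1, 2, 3, 3) majorizes the complete one (depths 2, 2, 2, 2) but not conversely, which matches the intuition that majorization ranks trees by imbalance.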


Mihai Sipitca, Georgia Tech
(with David Gillman)

Lossless Encoding of DCT Coefficients for Video Compression

The last stage of many image or video coders is to losslessly encode
an orthogonal transformation of an image or a residual image,
after the transformed image has been quantized, or reduced in numerical
precision.  Traditionally, image and video coders use a fixed table for
entropy coding the tokens that represent the quantized transform
coefficients.  However, the characteristics of images are
highly non-stationary: different portions of an image, and different
images, have different statistics, which call for different entropy
coding tables for more efficient encoding.

In our presentation, we will investigate the improvement obtained when
using multiple entropy tables instead of one. Issues addressed include:
1) choosing the variance as the characteristic of a block,
2) table compaction - keeping the memory requirements reasonable at a
minimum coding efficiency cost,
3) presenting the trade-off between memory requirements and the coding
efficiency improvement.

The results presented were obtained for resolutions between SIF
(similar to VHS) and CCIR-601 (broadcast TV).  Future work will
target the low bit rate range.


Sajal Das, U. North Texas
(with Amiya Bhattacharya)

Predictive Location Management Using Lossless Compression

A user in a cellular or PCS system, while enjoying the freedom to 
be mobile, creates uncertainty about the exact location of the 
registered mobile terminal being carried.  The system has to search 
for the mobile against this uncertainty, to ensure successful call 
delivery within a time constraint.  The collector network deployed is 
usually organized as a three-tier tree-like hierarchy, viz. with a 
mobile switching exchange at the root, a number of base station 
controllers at the second tier, and several base stations as leaves 
at the third tier.  Each base station is dedicated to handling the 
air-link within one cell. 

On a call arrival, the search for the target mobile is initiated by 
the system by paging or polling the cells where the user can possibly 
be found.  Paging messages originate from the mobile switching 
exchange and, on reaching a base station, are broadcast so that the 
target mobile can respond if present.  To limit the level of uncertainty, the 
mobile can report its position from time to time. This mobile-initiated 
reporting is called a location update or registration. Both paging and 
update contribute to the location management cost, and the location 
management problem aims at minimizing this total cost.  Any message 
exchange beyond what is necessary wastes valuable resources such as 
wireless bandwidth and mobile equipment power, in effect increasing 
the operational cost that the user has to bear.

In this paper, we propose a predictive location management technique 
that makes use of path updates as opposed to the widely used position 
updates. To enhance predictability, the mobility profile for each mobile 
user must be built in the system database. For this, the path traced by 
the user in the past needs to be made completely known to the system. 
The cost minimization comes from achieving this information exchange 
at its information theoretic limit.  The technique evolves out of 
mapping the user's trail onto a string, as reducing the location 
management cost translates to lossless compression of this string. 
The compressibility of the fixed-to-variable length encoding of the 
well-known Lempel-Ziv family of algorithms reduces the update cost, 
whereas their built-in predictive power can be effectively used to 
reduce the paging cost.
What we end up with is a family of adaptive on-line algorithms, which are
asymptotically optimal if the user mobility can be modeled as an ergodic 
source. We discuss the underlying data structures used for building the 
profile on the fly, and how good paging strategies can be designed based on 
the profile. We take into consideration the limit on the memory available 
on a mobile terminal for executing the proposed class of algorithms.
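The string-parsing step can be illustrated with the classical LZ78 incremental parse, whose phrase trie is the kind of on-the-fly profile structure described above (a sketch only; the paper's update and paging strategies are not reproduced here):

```python
def lz78_parse(path):
    """Incremental (LZ78) parse of a movement string into phrases,
    each a previously seen phrase extended by one new symbol.
    The phrase trie built here is what a predictor can query."""
    root, phrases = {}, []
    node, cur = root, ""
    for cell in path:
        if cell in node:              # extend the current known phrase
            node, cur = node[cell], cur + cell
        else:                         # new phrase ends here
            node[cell] = {}
            phrases.append(cur + cell)
            node, cur = root, ""
    if cur:                           # possibly incomplete final phrase
        phrases.append(cur)
    return phrases, root
```

For an ergodic mobility source the phrase count grows slowly relative to the path length, which is the sense in which update cost approaches the information theoretic limit.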


Yuriy Reznik, RealNetworks

On Compact Level Partitioning of Digital Trees

A modification of the digital tree structure is proposed that 
supports a non-uniform distribution of digits between levels (i.e., 
each level may have nodes of different sizes).  We call this 
distribution of digits level partitioning, and the resulting data 
structure an LP-tree.  We study general properties of LP-trees, and 
develop an algorithm that, for any given binary tree, constructs an 
equivalent LP-tree that uses a minimum amount of memory.  We call the 
resulting trees compact LP-trees, and perform a detailed analysis of 
their properties.  It turns out that, on average, the compact LP-tree 
uses approximately 20.65% less memory than a binary tree and, even 
more importantly, the average complexity of a successful search in 
such a tree is O(1) (vs. O(log N) for the standard binary trie). 


Mamoru Hoshi, University of Electro Communications

Homophonic Coding by the Algorithm: Preliminary Considerations

We shall show that the idea of the successive refinement of interval 
partitions, which plays the key role in the interval algorithm for 
random number generation [Han and Hoshi], is also directly applicable 
to homophonic coding.

We propose an efficient and very simple algorithm for homophonic 
coding which produces an i.i.d. sequence.  Lower and upper bounds on 
the expected length of the code generated by the algorithm are given. 
In a special case, the expected length is bounded by H(X)/log M + 3, 
where H(X) is the entropy of the source X.  Homophonic coding with 
cost is also discussed. 


Andrej Brodnik, Lulea Technical University and IMFM,
and Svante Carlsson, Lulea Technical University

Sub-Linear Huffman Decoding: Some Theoretical and Practical Results

In the first part of our presentation we describe a succinct data
structure storing the Huffman encoding that permits sub-linear
decoding in the number of transmitted bits ([BroCar98]).  Beyond the
storage of the symbols of the alphabet themselves, the extra storage
required by the new data structure is

			  O(l log N) bits,

where l is the length of the longest Huffman code and N is the number
of symbols in the alphabet.  A similar solution was independently
presented by Moffat and Turpin ([MofTur97]).

We use the presented data structure to describe two different
decoding algorithms, one running in time O(l') and the other in
O(log l'), where l' is the number of distinct lengths of Huffman
codes.  We proceed by showing that our algorithm decodes Huffman
codes on average in time O(log log N) for any probability
distribution.
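The role of the distinct code lengths can be illustrated with canonical Huffman codes, where the codewords of each length form a contiguous block of integers, so a decoder only needs to test the accumulated value at lengths that actually occur. The sketch below is illustrative only, not the succinct data structure of the talk:

```python
def canonical_codes(lengths):
    """Canonical code assignment: symbols sorted by (length, symbol);
    returns the first code value at each length and a lookup table
    keyed by (length, offset from that first code)."""
    first, table = {}, {}
    code, prev = 0, 0
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):
        l = lengths[s]
        code <<= (l - prev)          # step down to the next length
        first.setdefault(l, code)
        table[(l, code - first[l])] = s
        code, prev = code + 1, l
    return first, table

def decode(bits, first, table):
    """Accumulate bits; at each occurring code length, test whether the
    value read so far is a codeword of that length."""
    out, val, l = [], 0, 0
    for b in bits:
        val, l = (val << 1) | b, l + 1
        if l in first and (l, val - first[l]) in table:
            out.append(table[(l, val - first[l])])
            val, l = 0, 0
    return out
```

For lengths {a: 1, b: 2, c: 3, d: 3} this assigns a=0, b=10, c=110, d=111, and decoding touches at most one table entry per distinct length, the quantity l' above.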

In the second part we present practical results obtained from the
implementation of the described algorithms.  We show that with such
an efficient decoding algorithm, handling the input stream of bits
(Huffman codes) becomes as expensive as the decoding itself.

To improve the latter we developed a more detailed machine model
which better captures the memory hierarchy.  The model was influenced
by work on external-memory data structures (e.g., [Arge96,
ArgVit96]).

We conclude the presentation with  some  comments and open questions
related to the dynamic Huffman encoding ([Vitter87]).

The presented algorithms will be used in the fall release of the
``Phone Book of Slovenia'' ([PhBook]).

[Arge96]   L. Arge.  Efficient External-Memory Data Structures and
  Applications.  PhD thesis, University of Aarhus, 1996.
[ArgVit96] L. Arge and J.S. Vitter.  Optimal dynamic interval
  management in external memory.  In 37th FOCS, 1996.
[BroCar98] A. Brodnik and S. Carlsson.  Sub-linear Decoding of Huffman
  Codes Almost In-Place.  IMFM Technical Report IMFM-(1998)-PS-600.
[MofTur97] A. Moffat and A. Turpin.  On the implementation of minimum
  redundancy prefix codes.  IEEE Transactions on Communications,
  45(10):1200-1207, 1997.
[Vitter87] J.S. Vitter.  Design and analysis of dynamic Huffman
  codes.  JACM, 34(4):825-845, October 1987.
[PhBook]   Telekom Slovenije & ADACTA.  Telefonski imenik Slovenije.  URL:

Document last modified on October 2, 1998.