DIMACS Workshop on Internet and WWW Measurement, Mapping and Modeling

February 13-15, 2002
DIMACS Center, CoRE Bldg., Room 431, Rutgers University, Piscataway, NJ

Organizers:: John Byers, Boston University, byers@cs.bu.edu; Danny Raz, Technion, raz@cs.technion.ac.il; Yuval Shavitt, Tel Aviv University, shavitt@eng.tau.ac.il

Presented under the auspices of the Special Year on Networking.

Abstracts:

Emergence of scaling in the Internet and the www
Albert Laszlo Barabasi

The Internet is like a Jellyfish
Michalis Faloutsos, U.C. Riverside
(joint work with L. Tauro, G. Siganos, C. Palmer)

Abstract - We propose a conceptual visual model for the
Internet inter-domain topology. Recently, power-laws were used to
describe the topology concisely. Despite their success, the power-laws
do not help us visualize the topology, i.e., draw the topology on
paper by hand. Here, we deal with the following questions:

a. Can we identify a hierarchy in the Internet?
b. How can I represent the network in an abstract graphical way?

The focus of this talk is threefold. First, we characterize nodes
using three metrics of topological "importance"', which we later use
to identify a sense of hierarchy. Second, we identify some new
topological properties. We then observe that the Internet has a highly
connected core and identify layers of nodes in decreasing importance
surrounding the core. Finally, we show that our observations suggest
an intuitive model. The topology can be seen as a jellyfish, where
the core is in the middle of the cap, and one-degree nodes form its
legs. In addition, we argue for the need of simple intuitive models to
deal with the overwhelming complexity of the real world.

Finite metric spaces and clustering large data sets
Nati Linial

In the last several years there has been
very intense interest in the study of finite metric
spaces. These spaces and their low-disotrion
embeddings have played a key role in the design of
new approximation algorithms for hard problems. In
this talk, I will discuss their role in the analysis
of large data sets. If time permits, I will also discuss
several open questions pertaining to the theoretical
foundations of clutering.

Protocols for building low diameter peer-to-peer networks
Gopal Panduragan

In this talk, I will describe a distributed protocol for building
connected, low-diameter Peer-to-Peer (P2P) networks. These properties
are important for performing search (the most common P2P application to date)
in P2P networks, especially Gnutella-like networks. Current P2P
protocols are typically ad hoc and often lead to network fragmentation.

An important feature of our protocol is that it operates with no global
knowledge of all the nodes in the network.
It is also simple to implement and is easily scalable.
We analyze the protocol under a stochastic model (which can be of independent
interest) and prove that it results in connected networks of
constant degree and logarithmic diameter.

This is joint work with Prabhakar Raghavan and Eli Upfal.

Graph Theoretic Enhancements of Internet Topology Generators
Milena Mihail, Georgia Tech

New families of internet topology generators, typified by Brite
and Inet, are primarily trying to match the skewed degree sequence
statistics of the internet topology at the AS level.

(a) We have developed algorithms that, given a target degree sequence,
produce topologies meeting the degree sequence but differing substantially
from each other and from the real data. We can also produce ``random''
instances matching the real data. Our techniques use graph theoretic
primitives as well as the Markov chain simulation method,
and substantially enrich the output of existing generators.
(They also suggest that degree sequence alone is not sufficient
to capture all topology characteristics).

(b) For modeling and simulation purposes, it is particularly important
to identify global characteristics of AS topologies, such as hierarchy
and clustering. Hierarchy has been addressed in several prior works.
In this work we initiate a study of clustering using spectral filtering
(Singular Value Decomposition - Principal Vector Analysis) methods.
We have identified many clusters on the real data (expressing geography
or business and cutting across different hierarchical levels).
Synthetic data appear to posses significantly weaker clustering
properties.

Joint work with Chitsos Gkantsidis, Amin Saberi, and Ellen Zegura.

On the Marginal Utility of Deploying Measurement Infrastructure
Mark Crovella

The cost and complexity of deploying measurement infrastructure in the
Internet for the purpose of analyzing its structure and behavior is
considerable. Basic questions about the utility of increasing the
number of measurements and measurement sites have not yet been
addressed which has led to a ``more is better'' approach to wide-area
measurement studies. In this paper, we step toward a more
quantifiable understanding of the marginal utility of performing
wide-area measurements in the context of Internet topology discovery.
We characterize the observable topology in terms of nodes, links, node
degree distribution, and distribution of end-to-end flows.

We classify nodes discovered on the routes between a set of 8 sources
and 1277 destinations to differentiate nodes which make up the so
called ``backbone'' from those which border the backbone and those on
links between the border nodes and destination nodes. This process
includes reducing nodes that advertise multiple interfaces to single
IP addresses. We show that the utility of adding sources beyond the
second source quickly diminishes from the perspective of interface,
node, link and node degree discovery. We also show that the utility
of adding destinations is constant for interfaces, nodes, links and
node degree indicating that it is more important to add destinations
than sources.

On the Influence of Phased Growth of the Internet on Power Laws in the Topology
Adam O'Donnell

Empirical analysis has revealed several basic large-scale power-law
relationships in the Internet topology. Recent
work has focused on discerning the underlying phenomena that give
rise to the observed scale-free topological properties. Incremental
growth with preferential connectivity is one of the phenomena that
has been said to have contributed to the power-law relationships.
In this work, we present a conjecture that the Internet has undergone
not one but several distinct growth phases, each with a different growth
process. We present simulation results to corroborate our conjecture
and show that a 2-phase growth with preferential connectivity only
in the second phase leads to stronger power-law relationships closer
to those observed in the real Internet.

Explaining Power Laws on the Internet
Christos H. Papadimitriou

Authors: Alex Fabrikant, Elias Koutsoupias, Milena Mihail, Christos H. Papadimitriou

We present a simple model of network creation that predicts power
law-distributed degrees as observed by Faloutsos et al. In particular,
we assume that nodes arrive at random in a unit square, and each attaches
itself to the already existing node that minimizes a linear
combination of the Euclidean distance and the hop distance from the
center. We show that, if the coefficient is in a particular range,
power laws in the degree distribution arise with high probability.
Outside this range, degrees are highly concentrated. We also
point out that the power law in the distributions of eigenvalues
observed in the same paper are a corollary of the degree distributions.

Network Topologies: Large-Scale Structure and Hierarchy
Ramesh Govindan

It has long been thought that the Internet, and its constituent
networks, are hierarchical in nature. Consequently, the network
topology generators most widely used by the Internet research
community, GT-ITM and Tiers, create networks with a deliberately
hierarchical structure. However, recent work by Faloutsos et
al. revealed that the Internet's degree distribution --- the
distribution of the number of connections routers or Autonomous
Systems (ASs) have --- is a power-law. The degree distributions
produced by the GT-ITM and Tiers generators are not power-laws.

To rectify this problem, several new network generators have recently
been proposed that produce more realistic degree distributions; these
new generators do not attempt to create a hierarchical structure but
instead focus solely on the degree distribution. There are thus two
families of network generators, structural generators that treat
hierarchy as fundamental and degree-based generators that treat
the degree distribution as fundamental.

In this talk we use several topology metrics to compare the networks
produced by these two families of generators to current measurements
of the Internet graph. We find that the degree-based generators
produce better models, at least according to our topology metrics, of
both the AS-level and router-level Internet graphs. We then seek to
resolve the seeming paradox that while the Internet certainly has
hierarchy, it appears that the Internet graphs are better modeled by
generators that do not explicitly construct hierarchies.

10.

On the Completeness of Observed AS-level Internet Topology
Sugih Jamin

Researchers have charactered AS-level Internet topology
by exploiting connectivity information contained in BGP
routing tables, mainly from those collected by the University
of Oregon. The AS-level topology constructed from
these routing tables are believed to reflect existing
AS-level Internet connectivity reasonably well.
However, the completeness of the Oregon topology hasn't been
validated by any kind of formal proof or actual measurements.
Our measurement result shows that a nonnegligible number of
existing AS-level connections are not advertised in many BGP
routing tables. We found that the observability of AS-level
connectivities in BGP routing tables depends to a large extent
on existing AS relationships. Our overall result suggests that
the Oregon route server may not capture a significant number of
existing AS-level connections.

11.

Analysis of Real and Synthetic Internet AS Topology Graphs
Danica Vukadinovic, Polly Huang, Thomas Erlebach

We study the Internet topology on the autonomous system (AS) level
from a graph-theoretic point of view. Our goal is to determine
characteristic properties of the real AS graphs and to develop a
methodology for comparing real and synthetic topologies. On the one
hand, we study many graph parameters that are well-known in graph
theory but have rarely been considered in previous investigations
of Internet graphs. On the other hand, we find that the normalized
Laplacian spectrum provides a concise fingerprint of a network
topology. Our results indicate that current state-of-the-art
Internet topology generators fail to capture certain distinctive
properties of the real AS graphs. This is important for
topology-sensitive Internet applications and suggests new directions
towards more realistic generators.

12.

Observations from Router-level Internet Traces
Lisa Amini

13.

Characterizing the Internet Hierarchy from Multiple Vantage Points
Jennifer Rexford

The delivery of IP traffic through the Internet depends on the complex
interactions between thousands of Autonomous Systems (ASes) that
exchange routing information using the Border Gateway Protocol (BGP).
This paper investigates the topological structure of the Internet in
terms of customer-provider and peer-peer relationships between ASes,
as manifested in BGP routing policies. We describe a technique for
inferring AS relationships by exploiting partial views of the AS graph
available from different vantage points. Next we apply the technique
to a collection of ten publicly-available BGP routing tables to infer
the relationships between neighboring ASes. Based on these results,
we analyze the hierarchical structure of the Internet and propose a
five-level classification of ASes. Our analysis differs from previous
characterization studies by focusing on the commercial relationships
between ASes rather than simply the connectivity between the nodes.

This is joint work with Lakshminarayanan Subramanian, Sharad Agarwal,
and Randy Katz at the University of California at Berkeley. A paper
describing the work is available at
http://www.research.att.com/~jrex.
The BGP routing data and our inference results are available at
http://www.cs.berkeley.edu/~sagarwal/research/BGP-hierarchy/

14.

On Characterizing BGP Routing Table Growth
Tian Bu

The Internet has experienced explosive growth since its commercialization.
The sizes of the routing tables have increased by an order of magnitude
over the past six years. This dramatic growth of the routing table can
decrease the packet forwarding speed and demand more router memory space.
In this work, we explore the extent that various factors contribute to
the routing table growth. We first perform measurement study to determine
the extent that factors such as multi-homing, failure to aggregate,
load balancing, and address fragmentation contribute to routing table
size, and find that only 20-30% of prefixes are due to multi-homing,
15 - 20% of prefixes are due to failure to aggregate, 20 - 25% of
prefixes are due to load balancing, and more than 75% of prefixes are
due to address fragmentation. This leads us to group all prefixes that
are not aggregated due to either failure to aggregate or address
fragmentation. We find that the number of prefix clusters is no more than
20% of the number of prefixes. We explore the extent that load
balancing contributes to the number of prefix clusters. To the best of our
knowledge, this is the first study on the explosive growth of routing
tables by systematically comparing factors that contribute to the growth
and by observing routing table growth patterns.

15.

A web graph bestiary
Andrei Broder, IBM Research Division

Experiments conducted a couple of years ago on a series crawls from
AltaVista revealed the intricate structure of the enormous link graph
induced by the corpus of a major search engine. But clearly no search
engine covers all the pages on the web, and even defining precisely
what is "the entire web graph" is challenging. Furthermore, any
crawling process is driven by some rules that influence the final
result. Last but not least, the web is intentionally created: there
are incentives (some economic and some not so) to perturb the
"natural" process of birth and death of nodes and links. These
perturbation might explain the discrepancies between some reasonable
models of web expansion and the observed reality. The main aim of
this talk is to present a bestiary of these perturbations; among the
monsters to be presented are soft 404's, infinite paths, seldom
connected servers, link spamming, robot submitters,

cookie crumbs, content generators, and others.

16.

Web community identification from hyperlink topology: clustering and ranking algorithms
Chris Ding

We study methods for effectively identifying web communities
and ranking algorithms for placing most informative webpages on the
top. Web communities can be discovered with an unsupervised learning
method by clustering web pages into clusters utilizing
information contained in hyperlink topology, text, and co-citation.
The subgraph components are generally sparsely-connected
with their sizes following Zipf distribution.
We provide a comprehensive analysis on
PageRank and HITS ranking algorithms, with
improvements and generalizations. Closed-form solutions
are obtained which proves the fundamental notion that
top authorities always have high in-degrees and
top hubs always have high out-degrees.

Joint work with Xiaofeng He, Hongyuan Zha, Parry Husbands, Horst Simon.

17.

Measurement and modelling of the WWW link graph
Andrew Tomkins

18.

A general model of web graphs
Alan Frieze

We describe a general model of a random graph whose
degree sequence obeys a power law. Such laws have recently been observed in
graphs associated with the world wide web.

Joint work with Colin Cooper.

19.

Fine Grain Observation of Traffic Dynamic in the Sprint IP Backbone
Christophe Diot

Network traffic measurements provide essential data for networking
research and network management. We describe a passive
monitoring system designed to capture the dynamic aspects of the traffic
at the finest possible level of granularity. Our system is deployed
in 3 POPs in the Sprint E|Solutions IP backbone. It taps fibers
at different backbone links and timestamps each IP packet with a GPS-based
global clock signal, allowing packet correlation at different locations
on the network. We present a set of results to both demonstrate the
strength of the system, and identify new characteristics of the Internet
traffic. The results include traffic statistics collected on specific
OC-12 links in terms of bytes, packets, and flows, as well as single-hop
and multiple-hop packet delay. Recent measurement issued from IS-IS
and BGP speakers deployed in the same backbone are also discussed.

20.

Linking Internet traffic dynamics with the BGP topology:
the view of two stub ISPs
Steve Uhlig

In this presentation, we study two one-week long traces of all the
interdomain traffic received by 2 Belgian stub-ISPs. We correlate
the traffic received by these two very different ISPs with their
BGP routing table. First, we present the topological distribution
of their traffic throughout the Internet and show that most of their
traffic is received from sources located at a small AS hop distance.
Then, we study the ``activity'' of the interdomain traffic sources
and show that this ``activity'' does not vary much with the considered
timescale. Finally, we show that the distribution of the most
important traffic sources in terms of the traffic volume is stable.

21.

Using Simulation of Particle Mechanics for Calculating Coordinates of Large Distance Maps
Tomer Tankel

IDMaps is architecture for estimating Internet distances using Instrumentation boxes, placed in the Internet.
Host-to-host distances are calculated as
sum of several measured segments because there is no means to 'triangulate' these segments.
Ng and Zhang suggested coordinate-based approach to provide this triangulation, however
their technique suffers from significant inaccuracy when the number of instrumentation boxes is larger than
about 15. We suggest a novel graph embedding technique that can handle hundreds of such boxes
at high accuracy.

We model network nodes as particles traveling under force field derived from embedding error.
Particles escape local minima due to their kinetic energy and achieve global minimum of embedding error.
By the use several error 'forces' in different stages of the calculation we converge very fast
to very low embedding error. We will present results both power-law networks as well as for
traditional Waxman networks. We will also show our embedding is superior in practice to the one
based on the theoretical results of Linial, London, and Rabinovich.

Joint work with Yuval Shavitt.

22.

Inference and Labelling of Metric-Induced Network Topologies
Azer Bestavros

The deployment of distributed network-aware applications over the
Internet requires an accurate representation of the conditions of
underlying network resources. To be effective, this representation
must be possible at multiple resolutions relative to a metric of
interest. In this talk, I will describe an approach for the
construction of such representations using end-to-end measurement
of internal network loss, delay, and bandwidth characteristics.

This work is done in collaboration with John Byers and Khaled Harfoush.

23.

Maximum Likelihood Estimation of Network Topology from End-to-End Measurements
Robert Nowak

Authors: Rui Castro, Mark Coates, Rob Nowak

One of the predominant schools of thought in networking today is that
monitoring and control of large scale networks is most practical at
the edge. Identifying or estimating the routing topology is crucial
to edge-based approaches. The focus of this work is a new Maximum
Likelihood criterion for topology identification that makes use only
of unicast measurements performed between host computers and requires
no special support (e.g., ICMP responses) from internal routers.
The measurements are based on a novel unicast probing strategy, and
theoretical analysis of the likelihood criterion leads to a new,
fast algorithm for topology identification.

24.

Network Tomography Using Passive End-to-End Measurements
Venkata N. Padmanabhan and Lili Qiu

We discuss Network Tomography, which refers to the inference of characteristics
of internal links in a network using end-to-end measurements. A novel aspect of
our work is that we do the inference based on passive observation of existing
end-to-end traffic, rather than doing active measurements. We focus on inferring
link loss rate and describe two techniques to this end. We evaluate the accuracy
of these techniques using packet traces gathered at a busy Web site and also
simulation. The initial results are encouraging: our techniques can identify
70 - 80% of the lossy links in a network with a manageable false positive rate.

25.

Measurement and Classification of Out-of-Sequence Packets in a Tier-1 Backbone
Sharad Jaiswal

In this study we present a comprehensive study and classification of
out-of-sequence packets based on TCP connections observed over the Sprint
IP backbone, using a monitoring infrastructure present in different
backbone POPs.

We describe the measurement and classification methodology. We determine
and present the relative frequency of events causing of out-of-sequence
packets, including packet losses, reordering and duplicates within the
network. We also consider the distribution of out-of-sequence packets
among the measured flows, based on flow size (e.g., mice versus
elephants), and POP location.

Analyzing out-of-sequence TCP packets is important since they have a
direct bearing on the performance of applications. This study is of
considerable interest because (i) the data we analyze is unprecedented in
size and scope, (ii) it provides a starting point to address several
interesting additional questions about congestion in the Internet.

26.

Performance Monitoring for Wide-Area Applications
Louiqa Raschid

Recent technology advances have enabled wide area applications
with Internet accessible sources. A challenge to such applications
is the unpredictable behavior of sources in the dynamic WAN.
There can be wide variability in the latency (end-to-end delay) of
accessing these sources, and this could depend on network
and server workloads.
The success of these applications will depend on their
ability to monitor and predict end-to-end client-side performance.
Our objective is to explore open technology for passive performance
monitoring to develop performance profiles for clusters of clients
and servers. Such technology must be scalable to large numbers of
clients and servers.

27.

Comparative Analysis of Traffic Matrix Estimation Methods
Alberto Medina

Knowledge about the traffic matrix of a network is very valuable
to perform traffic engineering tasks such as load balancing,
routing protocol configuration, load balancing, provisioning,
pricing, etc. Traffic matrices must be estimated based on limited
available data and to date, very few techniques exist for doing so.
We conduct a comparative study of all of these techniques. We
discuss the strengths and weaknesses of these techniques in the
context of real commercial networks. The analysis of a real environment
is done in the context of the Sprint Tier-1 backbone network,
and make use of our measurement infrastructure to validate the
techniques.

28.

Connection-Level Modeling of Network Traffic
Rudolf Riedi

Aggregate network traffic exhibits strong burstiness and non-Gaussian
marginals, which popular models like fractional Gaussian noise (fGn) fail to
capture. To better understand the cause of traffic burstiness, we look into
connection-level information of traffic traces. A careful study reveals that
traffic burstiness is directly related to the heterogeneity in connection
bandwidths (and round-trip times), and that a small number of high-bandwidth
connections are solely responsible for burstiness. This decomposition has
far-reaching implications on network control and leads to a new model for
network traffic: the mixture of fGn and stable L\'evy noise, which captures
both long-range dependence and burstiness.

29.

A New Methodological Approach to Cross-Traffic Estimation
Kave Salamatian

This paper applies to cross traffic estimation, a new methodology for
analyzing and interpreting measurement collected over Internet. This new
approach is based on inferring cross traffic characteristics that lead
to the observed losses by using an associated a priori constructive model.

The constructive model used in this paper is an MMPP/M/1/N single server
bottleneck. The originality of this solution is that we start with
observed loss process and infer inputs that have lead to these observations.

Methods presented in the paper provide a powerful solution to address
the complexity of interpreting IP active measurement and empirical
network modelling.

30.

Internet Traffic: Statistical Multiplexing Gains
Jin Cao, William S. Cleveland, Dong Lin, and Don X. Sun
Bell Labs, Murray Hill, NJ
{cao, wsc, dong, dxsun}@research.bell-labs.com

As the number of active connections on an Internet link increases,
the long-range dependence of packet traffic changes due to increased
statistical multiplexing of packets from different active connections.
Four packet traffic variables are studied -- inter-arrival times, sizes,
packet counts, and byte counts. Results are based on the following:
(1) the mathematical theory of marked point processes; (2) empirical
study of 2526 packet traces, 5 min or 90 sec in duration, from 6 Internet
monitors measuring 15 interfaces ranging from 100 mbps to 622 mbps;
(3) network simulation with NS; and (4) simple statistical models
for the traffic variables.

31.

Investigating the Structure of the Internet via Game Theory
Scott Shenker

Auhtors: Elitza Maneva, Christos H. Papadimitriou, Scott Shenker, Kunal Talwar

We study network creation as a game between nodes. Each node can create
an edge to any other at a fixed cost per edge, and, once the graph
corresponding to the choices of all nodes is formed, it incurs an
additional cost equal to the sum of the distances to the other nodes.
In another variant, the costs correspond to VCG charges, minus VCG income.
We are interested in questions such as: How bad are the Nash
equilibria? And can bad instances of VCG routing arise as Nash equilibria?

32.

The diameter of a scale-free random graph
Bela Bollobas

We consider a random graph process in which vertices are added to the graph
one at a time and joined to each earlier vertex $v$ with probability
proportional to the degree of $v$.
This process was introduced by Barabasi and Albert,
as a simple model of the growth of real-world graphs such as the
world wide web.
Computer experiments presented by Barabasi, Albert, and Jeong
and heuristic arguments given by Newman, Strogatz and Watts suggested
that after $n$ steps the resulting graph
should have diameter approximately $\log n$.

Recently, with Oliver Riordan,
we determined the asymptotic value of the diameter; the main
aim of the talk is to sketch a proof of this result. We shall also
present some related results obtained with Riordan, Spencer and
Tusnady about the degree sequence of this random graph.

33.

On the power of off line data in approximating Internet distances
Prasun Sinha

Estimate of Internet distances can be used for finding good
solutions to problems such as cache placement and dynamic selection
of the nearest mirror site. Proposed approaches for Internet
distance measurement are based on either using geographic
distance as a metric, or extrapolation using background distance
measurements between a fixed set of sites. Only using geographic
distances have been shown to have limitations and the latter
technique of active distance measurement has several associated
problems such as the requirement of dedicated machines for
measurements and introduction of additional network traffic.

In this work, we explore the use of multi-variable linear
regression techniques for distance estimation using multiple metrics
such as distance to the nearest backbone network and geographic
distance. For our studies, we have used the set of public libraries
and the set of universities in U.S. as clients and a fixed set of 8
servers. In particular we study the accuracy of picking up the
nearest server from a set of servers using our distance estimation
technique.

Our preliminary results indicate that off line information allows
choosing the nearest server in about 80% of the cases.

Joint work with Danny Raz.

34.

Comparative Analysis of Traffic Matrix Estimation Methods
Alberto Medina

Authors: Alberto Medina, Nina Taft, Kave Salamatian, Supratik
Bhattacharyya and Christophe Diot.

Previous: Participation

Next: Registration

Workshop Index

DIMACS Homepage

Contacting the Center

Document last modified on February 7, 2002.