Trying to Identify the Principles Underlying Web Growth

"It's already enormous, and the scary thing is, it is only the beginning," says Fan Chung Graham, a professor of Mathematics and Computer Science at UCSD and a principal investigator in Calit². She's talking about the World Wide Web. As of February 2001, the number of Internet hosts was estimated at 109 million and growing at more than 50% per year. In the same timeframe, the number of Web pages indexed by search engines was more than 1.3 billion with an estimated 5,000 Web sites created every day.

As a result, the Web has caught the attention of mathematicians who want to determine the fundamental structural properties of such massive and dynamic "graphs." Graham and colleagues want to answer such questions as: What are the size and diameter of the largest Web component? Are there interesting structural properties that govern the development and use of such networks? Graham admits that answering such questions may not be easy.
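
To make the first of those questions concrete, here is a minimal Python sketch of how the size and diameter of the largest component could be computed on a tiny, made-up graph. It is an illustration only, not Graham's method: the graph `toy_graph` and the function names are hypothetical, links are treated as undirected, and the brute-force diameter calculation would not scale to anything close to the size of the Web.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return the hop count from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def largest_component(graph):
    """Return the set of nodes in the largest connected component."""
    seen = set()
    best = set()
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def diameter(graph, component):
    """Longest shortest path between any two nodes of the component."""
    return max(max(bfs_distances(graph, n).values()) for n in component)


# Hypothetical toy graph: a chain a-b-c-d plus a separate pair x-y.
toy_graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "x": ["y"], "y": ["x"],
}

component = largest_component(toy_graph)
print(len(component), diameter(toy_graph, component))  # prints: 4 3
```

Here the largest component has four nodes and a diameter of three hops. Part of what makes the real question hard is that running a search from every node, as this sketch does, is impractical on a graph with more than a billion pages.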

But because similar questions arise in the physical, biological, and social sciences, answers here may have far-reaching implications for other disciplines. Graham hopes such answers can be arrived at through an interplay between analyzing experimental data and mathematical modeling.

By mathematically modeling the Web, then comparing the results with real data, she is trying to extract the principles underlying Web growth. Graham maintains that graph theory (the study of objects and relations between these objects) and many other branches of mathematics will become increasingly important to analyzing the information being produced by the exploding numbers of computers, embedded processors, sensing devices, etc., because they make it possible to detect underlying patterns in otherwise intractable amounts of data. This work also has obvious implications for the design of future networks and for network management and optimization.

Graham points out that fundamental principles, such as general relativity and quantum mechanics, govern behavior in the physical world. But we know relatively little about such principles in the information world. "The principles are harder to identify," she says, "because, in this discrete mathematical world, one small change of the input can affect the outcome profoundly. So to control the information explosion depends on our ability to understand and describe its governing principles."

It turns out that the Web has a structure very similar to an airline routing map: It is organized hierarchically with a relatively small number of hubs located at key "connection points" (compare with United Airlines' Chicago hub or American Airlines' Dallas hub), connected to a much larger number of terminal end points in the more remote hinterlands. This organization is said to follow a "power law distribution," in which the number of nodes of a certain size (that is, with a certain number of connections) is proportional to the inverse of some power of that size. Researchers realized a couple of years ago that this power law closely describes the behavior of huge graphs such as the Web.
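
The power law can be written as count(k) = C / k^gamma, reading a node's "size" as its number of links k, which is an interpretation assumed here. The Python sketch below generates node counts from that formula and then recovers the exponent from the slope of a straight-line fit on a log-log scale. The constant `c = 1_000_000` and the exponent `gamma = 2.1` are illustrative values chosen for the example, not measurements of the Web, and the function names are hypothetical.

```python
import math


def degree_counts(c, gamma, max_degree):
    """Number of nodes with k links under count(k) = c * k**(-gamma)."""
    return {k: c * k ** (-gamma) for k in range(1, max_degree + 1)}


def fit_exponent(counts):
    """Estimate gamma as minus the least-squares slope of log(count) vs. log(k)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(n) for n in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Illustrative parameters only (assumed, not taken from real Web data).
counts = degree_counts(c=1_000_000, gamma=2.1, max_degree=1000)

# A few hubs, many end points: nodes with 1 link vastly outnumber nodes with 100.
print(round(counts[1]), round(counts[100]))
print(round(fit_exponent(counts), 3))
```

On real data the points scatter around the line rather than sitting exactly on it, so a fit like this gives only a rough estimate of the exponent.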

Interestingly, the Web is not just a physical network but also a social network reflecting the spread and increasing interconnectedness of human relationships -- in effect, it is an evolving map of human collaboration. "And don't think the sociologists and historians haven't already noticed that fact," says Graham.

--Stephanie Sides, Director of Communications, Calit²