Networks, Randomness, and Primes
Much of the world can be described as networks, consisting of discrete elements with connections between certain pairs of them. Some examples described in the recent book Large Networks and Graph Limits, by László Lovász, include:

• The Internet with computers connected by links
• The World Wide Web with webpages and hyperlinks
• Social networks like Facebook with users and friendships
• Chemical networks like imperfect crystals with atoms and chemical bonds
• Biological networks like the brain with neurons and synapses
• Engineered networks like integrated circuits with transistors and wires.

Understanding the structure of these networks can yield critical insights on topics ranging from the spread of diseases to the properties of complex crystals. However, it is often difficult to analyze extremely large networks; each of the examples given above has over a billion elements. This major challenge has led to exciting research in recent years.

In graph theory, networks are called graphs, the discrete elements are called vertices, and the connections between the elements are called edges. Graph theory is an old subject with a history that goes as far back as 18th century mathematician Leonhard Euler. An important modern direction of research in graph theory is developing mathematical tools for studying very large graphs. Central to this area is a powerful result of Endre Szemerédi known as the regularity lemma. The regularity lemma provides a rough structural description of all large graphs. It states that the vertices of any graph can be partitioned into a bounded number of parts such that the edges between almost every pair of parts behaves in a random-like fashion. This result created a paradigm shift in how we view and study networks, and it has become a central tool in discrete mathematics with diverse applications in mathematics, computer science, and elsewhere.

As an example, Szemerédi used an early version of the regularity lemma to prove a famous conjecture of Paul Erdős and Paul Turán. The density of a set of integers represents the proportion of the integers contained within that set. For example, the set of even integers has density 1/2. An arithmetic progression is a sequence of evenly spaced numbers such as 4, 7, 10, 13, 16. The length of the arithmetic progression is the number of terms, which in this example is five. Szemerédi’s theorem states that every set of integers with positive density contains arbitrarily long arithmetic progressions. This theorem was then instrumental in the celebrated result of Ben Green and Terence Tao that there are arbitrarily long arithmetic progressions of prime numbers; an example of length six is 7, 37, 67, 97, 127, and 157. This problem in number theory had been open for several hundred years. In 2012, Szemerédi was awarded the Abel prize, considered the mathematics equivalent of the Nobel Prize, in large part for his work on the regularity lemma.

While the regularity lemma has spurred significant research progress, it also has some notable limitations. One major drawback of applying the regularity lemma is that the “bounded” number of parts is actually a function M(ε) of an approximation parameter ε, which is an exponential tower of 2s of height ε-2.

## fox-figure.jpg

Even for ε = 1/3, the number of parts is much larger than the number of particles in the universe. Unfortunately, this yields weak and impractical quantitative estimates in its various applications. For almost two decades, there was hope that the bound could be substantially improved; however, in 1997, Timothy Gowers used a probabilistic construction to show that a tower-type bound is indeed necessary. This work was called a “tour de force” in the Fields Medal citation for Gowers. Very recently, László Miklós Lovász and I gave a tight lower bound construction for the order on the tower height, showing that a tower of height about ε-2 is indeed needed. The proof reverse engineers Szemerédi’s upper bound argument and shows that it is essentially best possible.

While the bound in the regularity lemma keeps it from being directly useful for real-world applications, it has been used to solve many beautiful problems that originally appeared intractable. These include problems in theoretical computer science, combinatorics, number theory, and discrete geometry. For many of these results, new proofs were later discovered with much better quantitative bounds.

Most of the real-world examples and some of the most interesting theoretical problems involve sparse graphs, where almost all pairs of vertices are not edges. However, Szemerédi’s regularity lemma is not directly useful for studying sparse graphs, as in this case the error term is considerably larger than the term one wishes to bound. Kohayakawa and Rödl proved a sparse regularity lemma as part of a program to extend the regularity method to sparse graphs. Many of the most interesting applications of the regularity lemma also use a counting lemma, which shows that the counts of small subgraphs are close to what is expected based on the edge densities in the regular partition of the graph. The combination of these tools is commonly referred to as the regularity method. David Conlon, Yufei Zhao, and I recently proved a sparse counting lemma and used it to prove sparse extensions of some of the classical results in combinatorics.

The proof of the Green-Tao theorem that the primes contain arbitrarily long arithmetic progressions has two steps. The first step is establishing a relative Szemerédi theorem, which states that any relatively dense subset of a pseudorandom set of integers must contain arbitrarily long arithmetic progressions. Roughly, a set is pseudorandom if it satisfies certain properties that a random set of the same density typically satisfies. The second part is finding a pseudorandom set of “almost primes” which contains, but is not much larger than, the primes`. The research on the sparse counting lemma led Conlon, Zhao, and me to prove a new relative Szemerédi theorem that requires a substantially weaker pseudorandomness condition. This simplifies the proof of the Green-Tao theorem.

While various applications now have new proofs that avoid using the regularity lemma and give substantially better bounds, these bounds are in many cases still quite far from what is conjectured to hold. It remains a significant challenge to better understand these problems and close the large gaps.

SUPPORT THE SCHOOL OF SCIENCE

Assistant Dean for Development
t: 617-253-8903