Thesis for the degree

Doctor of Philosophy

Submitted to the Scientific Council of the
Weizmann Institute of Science
Rehovot, Israel

By
Igor Shinkar

Topics on local-to-global phenomena in large combinatorial objects

Advisor: Prof. Irit Dinur

May 2014

TO MY FAMILY

Acknowledgments

First of all I would like to thank my advisor Irit Dinur. Thank you for believing in me, and for giving me the freedom to do anything I found interesting during my PhD period. Thank you for making me feel comfortable talking to you about anything that could possibly worry me. Your help and support gave me everything I needed in an advisor.

I have been very lucky to collaborate with Oded Goldreich. Your insistence on simplifying complicated ideas, and writing them as clearly as possible, had a great impact on me and my work. Thank you for being such a wonderful teacher; I truly hope I learnt something from you.

I especially thank Itai Benjamini. Thank you for the millions of questions you ask: in person, by email, in chat, by phone. Thank you for having the door of your office open for me at any time, for any reason, and most of the time without a reason at all. Thank you for the many hours spent there working, talking, and reading the book with Ben.

I thank my good friends at Weizmann: Elazar Goldenberg, Itai Dinur, Tal Orenshtein, and Daniel Reichman. Thank you for all the jokes we laughed at, all the cookies we ate, and all the long walks we had.

I thank my other fellow students and postdocs at Weizmann: Gil Cohen, Roee David, Anat Ganor, Alon Ivtsan, Rani Izsak, Shlomo Jozeph, Lior Kamma, Merav Parter, Ron Rothblum, Avishay Tal, and Gilad Tsur. Each of them, in his or her own way, made it a fun and creative environment, and it has been a pleasure to be part of it.

Most importantly, I am grateful to Eva, who convinced me to do a PhD at Weizmann when I didn't know what I wanted to do, and to Ben for being the light of my life.

Contents

Introduction                                                            1

I   Two-Sided Error Proximity Oblivious Testing                         6

1   Introduction to Part I                                              7
    1.1  The notion of a Proximity Oblivious Tester (POT)               7
    1.2  On the power of two-sided error POTs                           8
    1.3  An overview of our results                                     9
    1.4  Organization                                                  12

2   Classes of Boolean Functions                                       13
    2.1  A generic tester and its analysis                             13
    2.2  Generalization of Theorem 2.2                                 16
    2.3  POTs can test only intervals                                  17
    2.4  Proof of Theorem 1.2                                          18

3   Graph Properties (in the Adjacency Matrix Model)                   18
    3.1  The class of k-regular graphs                                 19
    3.2  Other regular graph properties                                20
    3.3  The class of regular graphs                                   23
    3.4  Bounded density of induced copies                             27
    3.5  Towards a characterization                                    29
    3.6  Impossibility results                                         33

4   In the Bounded-Degree Graph Model                                  33

5   Classes of Non-binary Distributions                                38
    5.1  Characterizing the classes of distributions that have a POT   38
    5.2  Closure under disjoint union                                  41
    5.3  Positive corollaries                                          42
    5.4  Negative corollaries                                          44

6   More on Graph Properties in the Adjacency Rep. Model               49
    6.1  Testing CC(D) for finite sets D ⊆ ∆(t)                        51
    6.2  Testing CC(D) when D(N) ⊆ ∆(2) is tiny                        54
    6.3  Testing CC(D) when D(N) ⊆ ∆(2) is almost everything           57
    6.4  Impossibility results for subclasses of CC_{≤2}               63

II  Greedy Random Walk                                                 68

7   Introduction to Part II                                            69
    7.1  Our Results                                                   70
    7.2  Notation                                                      71

8   Edge Cover Time of Finite Graphs                                   72
    8.1  The Complete Graph                                            74
    8.2  Expander graphs                                               75
    8.3  Hypercube {0,1}^d                                             80
    8.4  d-regular trees                                               81

9   Greedy Random Walk on Z^d                                          83
    9.1  GRW on Z^d for d ≠ 2                                          84
    9.2  GRW on Z^2 and the mirror model                               86

10  Remarks and Open Problems                                          87
    10.1 A Conjecture Regarding Theorem 8.2                            87
    10.2 Rules on Vertices Instead of Edges                            88
    10.3 Open Problems                                                 88

III Acquaintance Time of a Graph                                       90

11  Introduction to Part III                                           91
    11.1 Our results                                                   92

12  Definitions and Notation                                           92

13  Some Concrete Examples                                             94
    13.1 Separating AC(G) From Other Parameters                        96
    13.2 Exact Computation of AC for the path and the barbell graph    97

14  The Range of AC(G)                                                 98

15  NP-Hardness Results                                               104
    15.1 Towards stronger hardness results                            108

16  Graphs with AC(G) = 1                                             109
    16.1 Algorithmic results                                          111

17  Other Variants and Open Problems                                  114

IV  Bi-Lipschitz Bijection between the Boolean Cube and the
    Hamming Ball                                                      116

18  Introduction to Part IV                                           117
    18.1 Our Results                                                  118
    18.2 The Complexity of Distributions                              120
    18.3 Proof Overview                                               122

19  Proof of the Main Theorem                                         123
    19.1 The De Bruijn-Tengbergen-Kruyswijk Partition                 123
    19.2 The Bijection ψ                                              124
    19.3 Proof of Theorem 18.2                                        126
    19.4 Proof of Missing Claims                                      128

20  The Mapping ψ is Computable in DLOGTIME-uniform TC^0              130

21  All But the Last Output Bit Depend Essentially on a Single
    Input Bit                                                         132

22  Concluding Remarks and Open Problems                              135

Appendices: Proofs of Technical Claims from Part I                    138
    A.1  Proof of Claim 3.2                                           138
    A.2  Proofs of Propositions 3.5 and 3.6                           139
    A.3  Proof of Proposition 5.5                                     143
    A.4  Strengthening Corollary 5.8                                  145
    A.5  Testable classes of distributions are not closed under
         taking complements                                           147
    A.6  Proof of Claim 6.1                                           149
    A.7  Proof of Lemma 6.5                                           149

Publications Not Included in the Thesis                               155

Introduction

This thesis deals with the study of local-to-global phenomena in combinatorial objects, and with local algorithms in general. Local algorithms are algorithms that get a large input, but read only a small part of it, and make their decisions according to the small view they see. Such phenomena occur naturally, and are studied in many different contexts, including property testing, sublinear time algorithms, coding theory, probabilistically checkable proofs, hardness of approximation, random walks, and more. In this thesis we study these phenomena from several different aspects.

One of the aspects is in the area of property testing, where the goal is to infer some global property of a given large object while having access only to local views of it. A typical task in property testing deals with the problem of distinguishing between (combinatorial) objects that have some predetermined property and objects that are far from having the property, while making only a small number of (random) queries to the object.

Another part of this thesis deals with random walks on graphs. Here we imagine a person walking on a large graph without actually knowing the entire graph, with the goal of covering the entire graph as quickly as possible. The walker makes only local (random) decisions, based on her current location and the list of the neighbouring vertices. One of the most well-studied rules for making such decisions is known as the simple random walk, where the walker chooses her next step uniformly among the neighbours of her current vertex. In this work we suggest a slight variant of the simple random walk, where the walker is allowed to use some memory locally in each vertex. We show that in many cases this allows the walker to cover the graph more quickly than the time it takes the simple random walk.

Next we consider a problem where we have one walker sitting on every vertex of the graph.
At each time step the walkers move on the graph according to some rule, and their goal is to meet each other as quickly as possible, where two walkers are said to have met if they shared a common edge at some point in time. The original motivation for this problem came from trying to understand this question when each of the walkers performs a simple random walk, each time making only local decisions, independent of the rest of the walkers and of the past. Later, however, our study took a different route, and the results discussed in this thesis focus on global strategies rather than on the local behaviour of the walkers.

The fourth part of the thesis shows the impossibility of a certain local-to-global result: we exhibit a local similarity between two graphs that are globally very different. Specifically, we show a bi-Lipschitz bijection between the Boolean Cube and the Hamming Ball. In simple words this means that the vertices of the Boolean Cube can be embedded into the Hamming Ball so that neighbouring vertices in one graph are quite close to one another in the other graph. This looks somewhat surprising, as typically these two graphs are considered to


be as different as possible in certain precise senses.

Part I

In the first part of the thesis we study Proximity Oblivious Testers with two-sided error. It is based on a joint work with Oded Goldreich [GS12]. As explained above, a typical task in the area of property testing deals with the problem of distinguishing between objects that satisfy some property and objects that are far from having the property, while making only a small number of queries to the object. More precisely, a q-query tester T for a property P ⊆ {0,1}^* is a randomized procedure that gets as input a proximity parameter ε > 0 and oracle access to x ∈ {0,1}^n. The tester makes q queries to x, and satisfies the following conditions:

YES If x ∈ P, then Pr[T^x(ε) = 1] > 2/3.

NO If x is ε-far from P, then Pr[T^x(ε) = 1] < 1/3.

That is, the goal in this problem is to infer some global property of a given large object from access only to its local views. A systematic study of property testing was initiated by Goldreich, Goldwasser, and Ron in [34], and the area has since received a lot of attention. For an exposition of this subject see the excellent survey of Ron [66].

In this work we consider testers that make only a constant number of queries, independent of the proximity parameter ε, and need to distinguish with some non-negligible probability between the objects that have some property and those far from having the property. Proximity-oblivious testing takes the foregoing local-to-global phenomenon to the extreme by considering local views whose size is independent of the distance. In other words, it refers to the smallest local view that may provide information about the global property. Besides being interesting in their own right, a motivation for studying POTs is that understanding this natural subclass of testers may shed light on property testing at large. This work extends the work of Goldreich and Ron [37], who studied POTs with one-sided error.
In the context of testing distributions, we give a full characterization of the properties of distributions with finite support that can be tested with a constant number of queries. Studying two-sided error POTs for graph properties in the adjacency matrix model, we present a few examples of properties that have a two-sided error POT, including the class of regular graphs, regular graphs of a prescribed degree, and subclasses of such regular graphs (such as regular graphs that consist of a collection of bicliques). For example, for testing regularity we show a 2-query tester that accepts all regular graphs with probability 0.5, while every graph that is ε-far from regular is accepted with probability at most 0.5 − ε².

This work is a first exploration of the notion of two-sided error POTs, and leaves many problems unsolved. In particular, we leave open the problem of giving a characterization of the

graph properties that admit a two-sided error POT.

Part II

The second part of the thesis is based on a joint work with Tal Orenshtein [OS11], in which we define a non-Markovian random walk on graphs, which we call the Greedy Random Walk (GRW). The Greedy Random Walk is defined as follows. A walker is located initially at some vertex, and, as time evolves, each vertex maintains the set of adjacent edges touching it that have not yet been crossed by the walker. At each step, the walker picks an adjacent edge among the edges it has not traversed thus far, according to some (deterministic or randomized) rule, and jumps along this edge. If all the adjacent edges have already been traversed, then an adjacent edge is chosen uniformly at random.

One can think of GRW as an algorithmic task, where the walker wishes to cover the graph as fast as possible, and is allowed to make some local computation in each vertex she visits (e.g., mark the last edge that the walker used to reach the current vertex, and also mark the edge that the walker is going to use in the next step), but is not allowed to transfer information between vertices.

We show that for some natural families of graphs the expected edge cover time of GRW is linear in the number of edges. Examples of such graphs include the complete graph, even-degree expanders of logarithmic girth, and the hypercube graph. These results should be compared with the general lower bound of Feige [31] for the simple random walk, saying that for any graph with n vertices the cover time of the simple random walk is at least Ω(n log n).

We are also interested in the behavior of GRW on infinite graphs. It is well known that SRW on Z^d is transient if d ≥ 3, and recurrent otherwise. We prove that GRW is transient on Z^d for d ≥ 3; that is, with positive probability the walk never returns to the origin. The case of d = 2 remains open, and we show that it is equivalent to the notorious two-dimensional mirror model problem [24].
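To make the rule concrete, the following minimal Python sketch (the function names and the choice of the complete graph K_8 are ours, for illustration only) simulates GRW and measures its edge cover time; on the complete graph the average comes out close to linear in the number of edges, as the result above states.

```python
import random

def greedy_random_walk(adj, start, rng):
    """Edge cover time of a greedy random walk: at each vertex, prefer a
    uniformly random incident edge that was never crossed; if none remains,
    move along a uniformly random incident edge (as in the simple random walk)."""
    unused = {v: set(adj[v]) for v in adj}          # per-vertex local memory
    uncovered = {frozenset((u, v)) for u in adj for v in adj[u]}
    cur, steps = start, 0
    while uncovered:
        fresh = unused[cur]
        nxt = rng.choice(sorted(fresh)) if fresh else rng.choice(adj[cur])
        unused[cur].discard(nxt)
        unused[nxt].discard(cur)                    # the edge is now crossed
        uncovered.discard(frozenset((cur, nxt)))
        cur = nxt
        steps += 1
    return steps

# Complete graph K_8 (|E| = 28): the expected edge cover time of GRW is
# linear in the number of edges.
n = 8
adj = {v: [u for u in range(n) if u != v] for v in range(n)}
rng = random.Random(0)
times = [greedy_random_walk(adj, 0, rng) for _ in range(200)]
avg = sum(times) / len(times)
```

Every run must take at least |E| = 28 steps, since each step crosses exactly one edge; the interesting point is that the average stays within a small constant factor of that.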
Part III

The third part of the thesis introduces a parameter of connected graphs, called the acquaintance time of a graph. This part is based on a work with Itai Benjamini and Gilad Tsur [BST13], and on a work with Omer Angel [AS13]. The work deals with "social connectivity", where agents walk on a graph, meeting each other. Specifically, we define the following parameter of connected graphs. For a given graph G = (V, E) we place one agent at each vertex. Every pair of agents sharing a common edge is declared to be acquainted. In each step we choose a matching of G, and for each edge in the matching the agents on this edge switch places; again, every pair of agents sharing a common edge get acquainted. We define the acquaintance time parameter of G, denoted by AC(G), as the minimal number of steps required for all pairs of agents to get acquainted.
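A minimal simulation of this process (our own illustrative code; the 4-vertex path P_4 and the particular matchings are hypothetical choices, not taken from the thesis) shows, for example, that two rounds suffice to acquaint all pairs of agents on P_4, so AC(P_4) ≤ 2.

```python
from itertools import combinations

def acquaintance_rounds(edges, n, matchings):
    """Simulate the acquaintance process: agent i starts on vertex i; after
    each round of swaps along a matching, the agents on adjacent vertices
    meet.  Returns the set of acquainted (unordered) agent pairs."""
    pos = list(range(n))                      # pos[v] = agent sitting on vertex v
    def meet(acq):
        for (u, v) in edges:
            acq.add(frozenset((pos[u], pos[v])))
    acq = set()
    meet(acq)                                 # the initial placement already counts
    for matching in matchings:
        for (u, v) in matching:
            pos[u], pos[v] = pos[v], pos[u]   # agents on a matched edge switch places
        meet(acq)
    return acq

# Path 0-1-2-3: two rounds acquaint all C(4,2) = 6 agent pairs.
path = [(0, 1), (1, 2), (2, 3)]
acq = acquaintance_rounds(path, 4, [[(0, 1), (2, 3)], [(1, 2)]])
all_pairs = {frozenset(p) for p in combinations(range(4), 2)}
```

The same simulator also makes the lower bound below tangible: each round adds at most |E| newly acquainted pairs.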


It is easy to see that AC(G) ≥ max{diam(G)/2, (|V| choose 2)/|E| − 1} for every graph G = (V, E), where diam(G) is the maximal distance between two vertices of G. For an upper bound, it is shown in [BST13] that every n-vertex graph G with maximal degree ∆ satisfies AC(G) ≤ O(n²/∆). In addition, in a work with Omer Angel [AS13] we show that for all n-vertex graphs G with maximal degree ∆ it holds that AC(G) ≤ O(∆n). Combining the two bounds we get that every graph G with n vertices satisfies AC(G) ≤ O(n^{1.5}). This bound is tight in general up to a multiplicative constant: for all n ∈ N and all positive integers k ≤ n^{1.5} there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c·k for some constant c ≥ 1.

We also consider the computational problem of computing/approximating AC(G) for a given graph G. We show that the AC problem is NP-complete, by a reduction from the coloring problem. In fact, we show that for all t ≥ 1 it is NP-hard to decide for a given graph G whether AC(G) ≤ t or AC(G) ≥ 2t. We conjecture that AC(G) is NP-hard to approximate within any multiplicative constant, and suggest an appealing conjecture on the hardness of a variant of the graph coloring problem that, if true, would yield a hardness result for AC. On the algorithmic side, we give a probabilistic polynomial time algorithm that, when given a graph G with AC(G) = 1, finds a solution consisting of O(log n) matchings.

Part IV

In a joint work with Itai Benjamini and Gil Cohen [BCS13] we construct a bi-Lipschitz bijection between the Boolean cube and the Hamming ball. The Boolean cube is just the set {0,1}^n, and by the Hamming ball, denoted B_n, we refer to the set of points in {0,1}^{n+1} whose distance from the string 11···1 is at most n/2, i.e., B_n = {x ∈ {0,1}^{n+1} : |x| > n/2}.
Our result says that for all even n ∈ N there exists an explicit bijection f : {0,1}^n → B_n such that for every x ≠ y ∈ {0,1}^n it holds that

    1/5 ≤ dist(f(x), f(y)) / dist(x, y) ≤ 4,

where dist(·,·) denotes the Hamming distance. This gives a negative answer to an open problem of Lovett and Viola [55], who raised the question as an attempt to prove that there exists no bijection g from {0,1}^n to the Hamming ball in {0,1}^{n+1} such that each output bit of g is in AC^0. In fact, the function f we construct is in uniform TC^0.

This result shows a somewhat surprising similarity between the Boolean cube and the Hamming ball, since we are used to thinking of these objects as quite opposite. For example, from the Boolean functions perspective, the indicator of {0,1}^n embedded in {0,1}^{n+1} is commonly referred to as the dictator function, and the indicator of the Hamming ball is the majority function, and it is a recurring theme in the analysis of Boolean functions that they are, in some senses,

opposites of one another. Another interesting consequence of this result is that it implies that the Hamming ball is bi-Lipschitz transitive. As a key building block in our construction we use a classical partition of {0,1}^n into symmetric chains, due to De Bruijn, Tengbergen, and Kruyswijk [23], and show that this partition satisfies the following property: if two points x, y ∈ {0,1}^n are close, then the entire chains C_x and C_y containing x and y, respectively, must also be close to each other.
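A basic sanity check one can run (illustrative code of our own, not from the thesis): such a bijection can exist at all only because the two sets have equal cardinality. For even n the weight distribution of (n+1)-bit strings has no middle term, so the ball {x : |x| > n/2} contains exactly half of {0,1}^{n+1}, i.e. |B_n| = 2^n.

```python
from math import comb

def hamming_ball_size(n):
    """|B_n| = #{x in {0,1}^(n+1) : |x| > n/2}, i.e. the set of (n+1)-bit
    strings within Hamming distance n/2 of the all-ones string."""
    return sum(comb(n + 1, k) for k in range(n // 2 + 1, n + 2))

# For even n there is no weight class sitting exactly on the threshold, so
# by the symmetry k <-> n+1-k the ball is exactly half of {0,1}^(n+1).
sizes_match = all(hamming_ball_size(n) == 2 ** n for n in range(0, 21, 2))
```

For odd n the middle weight class breaks this symmetry, which is why the theorem is stated for even n.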


Part I

Two-Sided Error Proximity Oblivious Testing

Abstract

Loosely speaking, a proximity-oblivious (property) tester is a randomized algorithm that makes a constant number of queries to a tested object and distinguishes objects that have a predetermined property from those that lack it. Specifically, for some threshold probability c, objects having the property are accepted with probability at least c, whereas objects that are ε-far from having the property are accepted with probability at most c − ρ(ε), where ρ : (0,1] → (0,1] is some fixed monotone function. (We stress that, in contrast to standard testers, a proximity-oblivious tester is not given the proximity parameter.)

The foregoing notion, introduced by Goldreich and Ron (STOC 2009), was originally defined with respect to c = 1, which corresponds to one-sided error (proximity-oblivious) testing. Here we study the two-sided error version of proximity-oblivious testers; that is, the (general) case of arbitrary c ∈ (0,1]. We show that, in many natural cases, two-sided error proximity-oblivious testers are more powerful than one-sided error proximity-oblivious testers; that is, many natural properties that have no one-sided error proximity-oblivious tester do have a two-sided error proximity-oblivious tester.


1  Introduction to Part I

In the last two decades, the area of property testing has attracted much attention (see, e.g., a couple of recent surveys [65, 66]). Loosely speaking, property testing typically refers to sub-linear time probabilistic algorithms for deciding whether a given object has a predetermined property or is far from any object having this property. Such algorithms, called testers, obtain local views of the object by performing queries; that is, the object is seen as a function and the testers get oracle access to this function (and thus may be expected to work in time that is sub-linear in the length of the object).

The foregoing description refers to the notion of "far away" objects, which in turn presumes a notion of distance between objects as well as a parameter determining when two objects are considered to be far from one another. The latter parameter is called the proximity parameter, and is often denoted ε; that is, one typically requires the tester to reject with high probability any object that is ε-far from the property. Needless to say, in order to satisfy the aforementioned requirement, any tester (of a reasonable property) must obtain the proximity parameter as auxiliary input (and determine its actions accordingly). A natural question, first addressed systematically by Goldreich and Ron [37], is what the tester does with this parameter (or how the parameter affects the actions of the tester). A very minimal effect is exhibited by testers that, based on the value of the proximity parameter, determine the number of times that a basic test is invoked, where the basic test is oblivious of the proximity parameter. Such basic tests, called proximity-oblivious testers, are indeed at the focus of the study initiated in [37].

1.1  The notion of a Proximity Oblivious Tester (POT)

Loosely speaking, a proximity-oblivious tester (POT) makes a number of queries that does not depend on the proximity parameter, but the quality of its ruling does depend on the actual distance of the tested object to the property.[1] (A standard tester of constant error probability can be obtained by repeatedly invoking a POT for a number of times that depends on the proximity parameter.) The original presentation (in [37]) focused on POTs that always accept objects having the property. Indeed, the setting of one-sided error probability is the most appealing and natural setting for the study of POTs. Still, one can also define a meaningful notion of two-sided error probability proximity-oblivious testers (POTs) by generalizing the definition (i.e., [37, Def. 2.2]) as follows:[2]

[1] A formal definition is presented below (cf. Definition 1.1).
[2] For simplicity, we define POTs as making a constant number of queries, and this definition is used throughout the current work. However, as in [37], the definition may be extended to allow the query complexity to depend on n.


Definition 1.1 (POT, generalized): Let Π = ⋃_{n∈N} Π_n, where Π_n contains functions defined over the domain [n] := {1, ..., n}, and let ρ : (0,1] → (0,1] be monotone. A two-sided error POT with detection probability ρ for Π is a probabilistic oracle machine T that makes a constant number of queries and satisfies the following two conditions, with respect to some constant c ∈ (0,1]:

1. For every n ∈ N and f ∈ Π_n, it holds that Pr[T^f(n) = 1] ≥ c.

2. For every n ∈ N and f : [n] → {0,1}^* not in Π_n, it holds that Pr[T^f(n) = 1] ≤ c − ρ(δ_{Π_n}(f)), where δ_{Π_n}(f) := min_{g∈Π_n} δ(f, g) and δ(f, g) := |{x ∈ [n] : f(x) ≠ g(x)}|/n.

The constant c is called the threshold probability (of T).

Indeed, one-sided error POTs (i.e., [37, Def. 2.2]) are obtained as a special case by letting c = 1. Furthermore, for every c ∈ (0,1], every property having a one-sided error POT also has a two-sided error POT of threshold probability c (e.g., consider a generalized POT that activates the one-sided error POT with probability c and rejects otherwise). Likewise, every property having a (two-sided error) POT has a two-sided error POT of threshold probability 1/2. Lastly, a standard property tester is obtained by repeatedly invoking such a POT O(1/ρ(ε)²) times, where ε is the value of the proximity parameter given to the tester. (Indeed, in the case of a one-sided error POT, we obtain a one-sided error property tester by O(1/ρ(ε)) invocations.)

Motivation. Property testing can be thought of as relating local views to global properties, where the local view is provided by the queries and the global property is the distance to a predetermined set. Proximity-oblivious testing takes this relation to an extreme by making the local view independent of the distance. In other words, it refers to the smallest local view that may provide information about the global property (i.e., the distance to a predetermined set).
Hence, POTs are a natural context for the study of the relation between local views and global properties of various objects. In addition, a major concrete motivation for the study of POTs is that understanding a natural subclass of testers (i.e., those obtained via POTs) may shed light on property testing at large. This motivation was advocated in [37], while referring to one-sided error POTs, but it extends to the generalized notion defined above.
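The reduction from a POT to a standard tester mentioned after Definition 1.1 can be sketched in a few lines (our own illustrative code; the two toy POTs are hypothetical black boxes realizing conditions 1 and 2 with threshold c and detection gap ρ(ε)): run the POT O(1/ρ(ε)²) times and accept iff the empirical acceptance rate clears the midpoint threshold c − ρ(ε)/2.

```python
import random

def standard_tester_from_pot(pot, c, rho_eps, reps_const=16):
    """Amplify a two-sided error POT (threshold probability c, detection gap
    rho_eps = rho(epsilon)) into a standard tester: invoke it O(1/rho_eps^2)
    times and accept iff the empirical acceptance rate exceeds c - rho_eps/2."""
    reps = int(reps_const / rho_eps ** 2)
    accepts = sum(pot() for _ in range(reps))
    return accepts / reps > c - rho_eps / 2

rng = random.Random(1)
c, gap = 0.5, 0.2
yes_pot = lambda: rng.random() < c          # object in the property: accept prob >= c
far_pot = lambda: rng.random() < c - gap    # epsilon-far object: accept prob <= c - rho(eps)

yes_decisions = sum(standard_tester_from_pot(yes_pot, c, gap) for _ in range(50))
far_decisions = sum(standard_tester_from_pot(far_pot, c, gap) for _ in range(50))
```

By a Chernoff bound the empirical rate concentrates within ρ(ε)/2 of its mean, which is why O(1/ρ(ε)²) repetitions suffice for constant error probability.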

1.2  On the power of two-sided error POTs

The first question that arises is whether the latter generalization (i.e., from one-sided to two-sided error POTs) is a generalization at all (i.e., does it increase the power of POTs). This is not obvious, and for some time the first author implicitly assumed that the answer is negative. However, considering the issue seriously, one may realize that two-sided error POTs exist also

for properties that have no one-sided error POT. A straightforward example is the property of Boolean functions that have at least a τ fraction of 1-values, for any constant τ ∈ (0,1). But this example is quite artificial and contrived, and the real question is whether there exist more natural examples. In this work we provide a host of such examples.

The current work reports on several natural properties that have two-sided error POTs, although they have no one-sided error POTs. A partial list of such examples includes:

1. Properties of Boolean functions that refer to the fraction of 1-values (i.e., the density of the preimage of 1). Each such property is specified by a constant number of subintervals of [0,1], and a function satisfies such a property if the fraction of 1-values (of the function) resides in one of these subintervals.

2. Testing graph properties in the adjacency matrix model. One class of properties refers to regular graphs, regular graphs of a prescribed degree, and to subclasses of such regular graphs (e.g., regular graphs that consist of a collection of bicliques). Another class refers to graphs in which some fixed graph occurs a bounded number of times (e.g., at most 1% of the vertex triplets form triangles).

3. Testing graph properties in the bounded-degree model. One class of properties refers to graphs that contain a fraction of isolated vertices that falls in a predetermined set of densities (as in the foregoing Item 1).

It is evident that none of the foregoing properties has a one-sided error POT.[3] The point is showing that they all have two-sided error POTs. A more detailed account of these and other results is provided next.
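For the straightforward example above (at least a τ fraction of 1-values), a 1-query two-sided error POT with threshold c = τ and detection probability ρ(ε) = ε simply queries a uniformly random position and accepts iff it holds a 1. A minimal sketch (our own illustrative code) estimates its acceptance probabilities on a function in the property and on one that is 0.25-far from it:

```python
import random

def one_query_pot(f, rng):
    """Accept iff a uniformly random position of f holds a 1.  The acceptance
    probability equals the density of 1-values, hence it is >= tau on functions
    in the property and <= tau - eps on functions eps-far from it."""
    return f[rng.randrange(len(f))] == 1

rng = random.Random(0)
trials = 20000
f_in = [1] * 500 + [0] * 500     # density exactly tau = 0.5: in the property
f_far = [1] * 250 + [0] * 750    # density 0.25: 0.25-far from the property
p_in = sum(one_query_pot(f_in, rng) for _ in range(trials)) / trials
p_far = sum(one_query_pot(f_far, rng) for _ in range(trials)) / trials
```

No one-sided error POT exists for this property (see footnote 3), yet the two-sided version is as simple as a tester can be.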

1.3  An overview of our results

In this section and throughout the rest of this part, unless stated differently, a POT is a two-sided error one. We first consider POTs for symmetric properties of Boolean functions, where a property Π = ⋃_{n∈N} Π_n is symmetric if for every f ∈ Π_n and every permutation π : [n] → [n] it holds that f ∘ π ∈ Π_n (where (f ∘ π)(x) := f(π(x))). Each symmetric property of Boolean functions Π = ⋃_{n∈N} Π_n is characterized by a sequence of sets (S_n)_{n∈N} such that for every f : [n] → {0,1} it holds that f ∈ Π_n if and only if |{x ∈ [n] : f(x) = 1}| ∈ S_n. We say that

[3] Consider, for example, the task of testing the set of Boolean functions that have at least a τ fraction of 1-values, for any constant τ ∈ (0,1). A hypothetical one-sided error POT for this property is required to accept each function that has exactly a τ fraction of 1-values with probability 1, which implies that it must accept regardless of the answers it obtains (since each sequence of answers is consistent with such a function). But then this POT accepts each Boolean function with probability 1, which means that it is a POT for the trivial property (rather than for the aforementioned one).


a set of natural numbers is t-consecutive if it can be partitioned into at most t sequences of consecutive numbers (e.g., {1, 2, 3, 5, 7} is 3-consecutive but not 2-consecutive).

Theorem 1.2 (POTs for symmetric properties of Boolean functions): Let Π = ⋃_{n∈N} Π_n be a symmetric property that is characterized by the sequence of sets (S_n)_{n∈N}. Then, Π has a POT if and only if there exists a constant t such that each S_n is t-consecutive.

We stress that subintervals are allowed to have length zero (i.e., [0.5, 0.5] is a valid subinterval). Theorem 1.2 is proved by relating uniform symmetric properties of Boolean functions to properties of distributions that assume values in {0,1}; a characterization of the binary distributions that have a POT is provided in Theorem 2.4. Jumping ahead, we mention that this relation generalizes to the relation between functions with range Σ and distributions that assume values in Σ.

We next turn to testing graph properties in the adjacency matrix model (as defined in [34]). Here we present POTs for several properties that refer to regular graphs, including all regular graphs, regular graphs of a prescribed degree, and some subclasses of the latter.

Theorem 1.3 (POTs for certain classes of regular graphs, in the adjacency matrix model): The following graph properties have a POT.

1. The set of all regular graphs.

2. The set of all κ·N-regular N-vertex graphs, for any constant κ.

3. The set of all regular complete t-partite graphs, for any constant t ≥ 2.

Item 1 of Theorem 1.3 appears as Theorem 3.7, Item 2 appears as Theorem 3.1, and Item 3 is derived by combining Theorem 3.3 (which states a general condition) with Proposition 3.4 (which shows that the condition holds for complete t-partite graphs).

An altogether different class of properties that have POTs is the class of properties that upper-bound the density of the occurrences of some fixed graph as an induced subgraph.
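As an aside, the t-consecutive condition of Theorem 1.2 is easy to decide: the smallest such t is simply the number of maximal runs of consecutive integers in the set. A tiny illustrative sketch (our own code):

```python
def min_consecutive_t(s):
    """Smallest t such that the set s is t-consecutive, i.e. the number of
    maximal runs of consecutive integers in s."""
    xs = sorted(s)
    return sum(1 for i, x in enumerate(xs) if i == 0 or x != xs[i - 1] + 1)

# The example from the text: {1, 2, 3, 5, 7} is 3-consecutive but not 2-consecutive.
t_example = min_consecutive_t({1, 2, 3, 5, 7})
```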
Specifically, for any fixed graph H and a generic graph G, let dnsH (G) denote the density of H as an induced subgraph of G. Let ΠH,τ denote the set of graphs G that satisfy dnsH (G) ≤ τ . Recall that Alon et al. [6] showed that, for every fixed H, the class ΠH,0 has a one-sided error POT, albeit their lowerbound on the detection probability of this POT is very weak (i.e., a graph that is δ-far from ΠH,τ is rejected with probability 1/T(poly(1/δ)), where T(m) is a tower of m exponents). Here we provide a much sharper bound for the case of τ > 0 (while using an elementary proof and a two-sided error POT, which is necessary in this case).
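The notion of t-consecutive sets used in Theorem 1.2 can be decided mechanically by counting maximal runs of consecutive integers. The following is a minimal sketch (the function names are ours, not from the text):

```python
def min_consecutive_parts(s):
    """Smallest t such that the set s is t-consecutive, i.e., the number
    of maximal runs of consecutive integers in s."""
    xs = sorted(s)
    if not xs:
        return 0
    # a new run starts whenever the gap to the previous element exceeds 1
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > 1)

def is_t_consecutive(s, t):
    return min_consecutive_parts(s) <= t
```

For instance, min_consecutive_parts({1, 2, 3, 5, 7}) returns 3, matching the example above.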


Theorem 1.4 (a POT for ΠH,τ, still in the adjacency matrix model): For every n-vertex graph H and τ > 0, the property ΠH,τ has a POT. Furthermore, this POT accepts each graph in ΠH,τ with probability at least 1 − τ and accepts graphs that are δ-far from ΠH,τ with probability at most 1 − τ − (τn/3)·δ, provided that the tested graph has more than 6/δ vertices.

Theorem 1.4 follows from Theorem 3.10, which relates the distance of a graph from ΠH,τ to the density of H as an induced subgraph in it.

We also consider testing graph properties in the bounded-degree graph model (as defined in [35]). In this case, our results are obtained by simple reductions to the problem of testing binary distributions. Loosely speaking, the main result in this model is a POT for properties that refer to the number of isolated subgraphs that equal one of the graphs in some fixed family of graphs. For details, see Section 4 (and Theorem 4.3).

Theorems 1.3, 1.4 and 4.3 refer to the density of the occurrence of some specific patterns in the tested graph (e.g., Theorem 1.3 refers to the density of edges incident at various vertices, and Theorem 1.4 refers to the density of occurrences of a fixed graph as an induced subgraph). These densities correspond to binary distributions, but when one wishes to refer to a number of densities that correspond to the occurrences of different patterns, then multi-valued distributions arise. Indeed, a property may be defined by an arbitrary condition of the form "pattern A occurs between 10%-20% of the time, whereas pattern B occurs at least twice as often as pattern C." This motivates the study of POTs for properties of distributions over an arbitrary fixed-size domain (rather than over a binary domain). This study is initiated in Section 5, and its results are applied to graph properties in Section 6. It turns out that POTs for properties of multi-valued distributions are more exceptional than their binary-valued analogues.

As hinted above, Theorem 2.4 asserts that properties (of binary distributions) that correspond to intervals (bounding the probability that the outcome is 1) have POTs. It is tempting to hope that properties of ternary distributions that correspond to rectangles (bounding the probabilities of the outcomes 1 and 2, respectively) also have POTs; however, as shown in Section 5, this is typically not the case! In contrast, properties of multi-valued distributions that correspond to ellipsoidal regions do have POTs. In general, the question of whether or not a property of r-valued distributions has a POT is closely related to the question of whether there exists a polynomial that is non-negative exactly on the distributions in the property (viewed as a set in R^r).

Theorem 1.5 (POTs for testing multi-valued distributions, a coarse version of Theorem 5.1): Let Π be an arbitrary class of distributions over [r], viewed as the set of all non-negative r-sequences that sum up to 1. Then, Π has a POT if and only if there is a polynomial P : R^r → R such that for every distribution q = (q1, ..., qr) it holds that P(q1, ..., qr) ≥ 0 if and only if q ∈ Π. Furthermore, if the total degree of P is t, then Π has a two-sided error POT that makes t queries and has polynomial detection probability, where the power of the polynomial depends on P.

In light of these limitations of POTs for properties of multi-valued distributions, we focus in Section 6 on very simple properties of graphs (in the adjacency matrix representation). Specifically, for every sequence of intervals (IN)_{N∈N} such that IN ⊆ [0, 0.5], we consider the set of graphs consisting of two isolated cliques such that the density of the smaller clique resides in IN, where N denotes the number of vertices in the graph. Testing these properties is reduced to testing the distribution of subgraphs induced by three random vertices, which is a distribution that assumes four possible values. We show that if IN = I for all N ∈ N, where I ⊆ [0, 0.5] is an interval whose length lies strictly between 0 and 0.5 (i.e., I is neither a singleton nor all of [0, 0.5]), then the corresponding property has no POT. In contrast, if the length of IN is either smaller than N^{−1/2} or larger than 0.5 − N^{−1/2}, then the corresponding property has a POT.

Conclusion. The current work does not provide conclusive answers regarding the scope of two-sided error POTs, although some of our results aim in that direction. In particular, Theorem 2.4 provides a characterization of the binary distributions having a POT, Theorem 5.1 provides a less effective characterization w.r.t. multi-valued distributions, and Theorem 3.11 may be viewed as a programmatic step in the context of graph properties. Indeed, the current work is merely a first exploration of the notion of two-sided error POTs.

1.4 Organization

In order to facilitate developing an intuition regarding the power of two-sided error POTs, we partitioned the exposition into two parts. The first part is organized in three sections, which correspond to three domains: Section 2 deals with properties of Boolean functions, Section 3 deals with testing graph properties in the adjacency matrix model (of [34]), and Section 4 deals with testing graph properties in the bounded-degree model (of [35]). In the second part, we revisit the study of classes of distributions (see Section 5), which underlies the study presented in the first part, and apply the results to a further study of graph properties in the adjacency matrix model (see Section 6). The second part is far more technical than the first part, and we chose to present it later in order to allow the reader to go from simple examples to more complex ones. Also, for the sake of readability, the proofs of many technical claims (especially in the second part) were moved to the appendix.


2 Classes of Boolean Functions

As mentioned above, a simple example of a property of Boolean functions that has a (two-sided error) POT is provided by the set of all functions that have at least a τ fraction of 1-values, for any constant τ ∈ (0, 1). In this case, the POT may query the function at a single uniformly chosen preimage and return the function's value. Indeed, every function in the foregoing set is accepted with probability at least τ, whereas every function that is ε-far from the set is accepted with probability at most τ − ε.

A more telling example refers to the set of Boolean functions having a fraction of 1-values that is at least τ1 but at most τ2, for any 0 < τ1 < τ2 < 1. This property has a two-sided error POT that selects uniformly two samples in the function's domain, obtains the function values on them, and accepts with probability αi if the sum of the answers equals i, where (α0, α1, α2) = (0, 1, 2(τ1+τ2−1)/(τ1+τ2)) if τ1 + τ2 ≥ 1, and (α0, α1, α2) = (2(1−τ1−τ2)/(2−τ1−τ2), 1, 0) otherwise.

In general, we consider properties that are each specified by a sequence of t density thresholds, denoted τ = (τ1, ..., τt), such that 0 < τ1 < τ2 < · · · < τt < 1. The corresponding property, denoted Bτ, consists of all Boolean functions f : [n] → {0, 1} such that for some i ≤ ⌈t/2⌉ it holds that τ_{2i−1} ≤ Pr_{r∈[n]}[f(r) = 1] ≤ τ_{2i}, where (by definition) τ_{t+1} = 1 for odd t. We observe that the foregoing testing task, which refers to Boolean functions, can be reduced to testing 0-1 distributions when the tester is given several samples of the tested distribution (i.e., these samples are independently and identically distributed according to the tested distribution; in this case, the distance between distributions is merely the standard notion of statistical distance). Specifically, the corresponding class of distributions, denoted Dτ, consists of all 0-1 random variables X such that for some i ≤ ⌈t/2⌉ it holds that τ_{2i−1} ≤ Pr[X = 1] ≤ τ_{2i}. Indeed, (uniformly selected) queries made to a Boolean function (when testing Bτ) correspond to samples obtained from the tested distribution.

2.1 A generic tester and its analysis

A generic tester for Dτ obtains k samples from the tested distribution, where k may (but need not) equal t, and outputs 1 with probability αi if exactly i of the samples have value 1. That is, this generic tester is parameterized by the sequence α = (α0, α1, ..., αk). The question, of course, is how many samples do we need (i.e., how is k related to t and/or to other parameters); in other words, whether it is possible to select a (k + 1)-long sequence α such that the resulting tester, denoted Tα, is a POT for Dτ. (We shall show that k = t is sufficient and necessary.)

The key quantity to analyze is the probability that this tester (i.e., Tα) accepts a distribution that is 1 with probability q. This accepting probability, denoted Pα(q), satisfies

    Pα(q) = Σ_{i=0}^{k} C(k,i) · q^i · (1 − q)^{k−i} · αi,    (1)

where C(k,i) denotes the binomial coefficient.

Indeed, the function Pα is a degree k polynomial. Noting that 0-1 distributions are determined by the probability that they assume the value 1, we associate these distributions with the corresponding probabilities (e.g., we may say that q is in Dτ and mean that the distribution that is 1 with probability q is in Dτ). Thus, Tα is a POT for Dτ if every distribution that is ε-far from Dτ is accepted with probability at most c − ϱ(ε), where c = min_{q∈Dτ}{Pα(q)} and ϱ : (0, 1] → (0, 1] is some monotone function. One necessary condition for the foregoing condition to hold is that for every i ∈ [t] it holds that Pα(τi) = c, because otherwise a tiny shift from some τi to outside Dτ will not reduce the value of Pα(·) below c. Another necessary condition is that Pα(·) is not a constant function. We first show that there exists a setting of α for which both conditions hold (and, in particular, one with k = t).

Proposition 2.1 (on the existence of α such that Pα is "good"): For every sequence τ = (τ1, ..., τt) such that 0 < τ1 < τ2 < · · · < τt < 1, there exists a sequence α = (α0, α1, ..., αt) ∈ [0, 1]^{t+1} such that the following two conditions hold:

1. For every i ∈ [t], it holds that Pα(τi) = Pα(τ1).

2. The function Pα is not a constant function.

Proof: Fixing any q, we view (1) as a linear expression in the αi's. Thus, Condition 1 yields a system of t − 1 linear equations in the t + 1 variables α0, α1, ..., αt. This system is not contradictory, since the uniform vector, denoted u, is a solution (i.e., α = ((t + 1)^{−1}, ..., (t + 1)^{−1}) satisfies Pα(τi) = (t + 1)^{−1}). Thus, this (t − 1)-dimensional system also has a solution that is linearly independent of u. Denoting such a solution by s, consider arbitrary β ≠ 0 and γ such that α = βs + γu ∈ [0, 1]^{t+1} \ {0^{t+1}}. Note that α satisfies the linear system and is not spanned by u. To establish Condition 2, we show that only vectors α that are spanned by u yield a constant function Pα.
To see this fact, write Pα(q) as a polynomial in q, obtaining

    Pα(q) = Σ_{d=0}^{t} (−1)^d · C(t,d) · ( Σ_{i=0}^{d} (−1)^i · C(d,i) · αi ) · q^d.    (2)

Hence, if Pα is a constant function, then for every d ∈ [t] it holds that Σ_{i=0}^{d} (−1)^i · C(d,i) · αi = 0, which yields a system of t linearly independent equations in t + 1 unknowns. Thus, the only solutions to this system are vectors that are spanned by u, and the claim follows.

We next prove that the sequence α guaranteed by Proposition 2.1 yields a POT for Dτ.
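Before turning to the analysis, we remark that the construction in the proof of Proposition 2.1 can be carried out exactly over the rationals: set up the t − 1 linear equations Pα(τi) = Pα(τ1), take a nullspace vector that is not a multiple of the uniform vector u, and shift it affinely (along u) into [0, 1]^{t+1}. A sketch, using only the standard library (all helper names are ours):

```python
from fractions import Fraction
from math import comb

def bernstein_row(t, x):
    # coefficients of alpha_0..alpha_t in P_alpha(x), cf. Equation (1) with k = t
    return [comb(t, j) * x**j * (1 - x)**(t - j) for j in range(t + 1)]

def nullspace(rows, n):
    """Basis of the nullspace of a rational matrix given as a list of rows."""
    m = [row[:] for row in rows]
    pivots, r = [], 0
    for c in range(n):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -m[i][free]
        basis.append(v)
    return basis

def find_alpha(taus):
    """An alpha in [0,1]^{t+1} with P_alpha(tau_i) = P_alpha(tau_1) for all i,
    and alpha not spanned by the uniform vector (so P_alpha is non-constant)."""
    t = len(taus)
    taus = [Fraction(x) for x in taus]
    first = bernstein_row(t, taus[0])
    rows = [[a - b for a, b in zip(bernstein_row(t, x), first)] for x in taus[1:]]
    for s in nullspace(rows, t + 1):
        lo, hi = min(s), max(s)
        if lo != hi:  # s is not a multiple of the all-ones vector u
            # u itself solves the system, so this affine shift stays a solution
            return [(v - lo) / (hi - lo) for v in s]
    raise ValueError("only constant solutions found")

def p_alpha_exact(alpha, q):
    t = len(alpha) - 1
    return sum(comb(t, j) * q**j * (1 - q)**(t - j) * alpha[j] for j in range(t + 1))
```

This is only an illustration of the existence argument; the proof above does not prescribe a specific algorithm.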

Theorem 2.2 (analysis of Tα): For every sequence τ = (τ1, ..., τt) such that 0 < τ1 < τ2 < · · · < τt < 1, there exists a sequence α = (α0, α1, ..., αt) ∈ [0, 1]^{t+1} such that Tα is a POT with linear detection probability for Dτ.

Proof: Let α = (α0, α1, ..., αt) ∈ [0, 1]^{t+1} be as guaranteed by Proposition 2.1. Then, the (degree t) polynomial Pα "oscillates" in [0, 1], while obtaining the value Pα(τ1) on the t points τ1, τ2, ..., τt (and only on these points). Thus, for every i ∈ [t] and all sufficiently small ε > 0, exactly one of the values Pα(τi − ε) and Pα(τi + ε) is larger than Pα(τ1) (and the other is smaller than it). Without loss of generality, it holds that Pα(q) ≥ Pα(τ1) for every q in Dτ and Pα(q) < Pα(τ1) otherwise (otherwise, use 1 − Pα instead of Pα).

Furthermore, we claim that there exists a constant γ such that, for any probability q that is ε-far from Dτ, it holds that Pα(q) ≤ Pα(τ1) − γ·ε. This claim can be proved by considering the Taylor expansion of Pα; specifically, expanding Pα(q) based on the value at τi yields

    Pα(q) = Pα(τi) + P′α(τi)·(q − τi) + Σ_{j=2}^{t} (P^{(j)}α(τi)/j!)·(q − τi)^j,

where P′α is the derivative of Pα and P^{(j)}α is the j-th derivative of Pα. By the above, P′α(τi) ≠ 0 (for all i ∈ [t]). Let v = min_{i∈[t]}{|P′α(τi)|} > 0 and w = max_{i∈[t],j≥2}{|P^{(j)}α(τi)|/j!}. Then, for all sufficiently small ε > 0 (say for ε ≤ min(1, v)/3w) and |q − τi| = ε, the sum Σ_{j=2}^{t} (P^{(j)}α(τi)/j!)·(q − τi)^j is upper bounded in absolute value by Σ_{j≥2} w·(εv/3w)·(1/3)^{j−2} = v·ε/2; and so, for every i ≤ ⌈t/2⌉, it holds that Pα(τ_{2i−1} − ε) < Pα(τ_{2i−1}) − v·ε/2 and Pα(τ_{2i} + ε) < Pα(τ_{2i}) − v·ε/2. Using γ = min(1, v)/3tw, the claim holds for all ε ≤ 1.

Sample optimality: We have analyzed a generic tester that uses k = t samples for testing a property parameterized by t thresholds (i.e., τ = (τ1, ..., τt)). The proof of Theorem 2.2 implies that using t samples (i.e., k ≥ t) is necessary, because for α = (α0, α1, ..., αk) we need the (non-constant) degree k polynomial Pα to attain the same value on t points (i.e., the τi's).

The case of t = 2: The considerations underlying the proof of Theorem 2.2 imply that in this case (i.e., t = 2) the polynomial Pα is quadratic and equals Pα(q) = α0 − 2(α0 − α1)·q + (α0 − 2α1 + α2)·q² (cf. (2)). Thus, Pα obtains its maximum at the point τ̄ = (τ1 + τ2)/2, which in turn equals 2(α0 − α1)/(2·(α0 − 2α1 + α2)). The derivative of Pα at τ2 (and likewise −P′α(τ1)) equals

    P′α(τ2) = −2(α0 − α1) + 2·(α0 − 2α1 + α2)·τ2
            = −2(α0 − α1) + 2·((α0 − α1)/τ̄)·τ2
            = (2/τ̄)·(τ2 − τ̄)·(α0 − α1)
            = −((τ2 − τ1)/τ̄)·(α1 − α0),

where the second equality is due to α0 − α1 = τ̄·(α0 − 2α1 + α2) (and the last to τ̄ = (τ1 + τ2)/2). Thus, we wish to maximize α1 − α0 subject to α0, α1, α2 ∈ [0, 1]. Using α0 − α1 = τ̄·(α0 − 2α1 + α2) again, we obtain α2 = ((1 − τ̄)·α0 + (2τ̄ − 1)·α1)/τ̄. Hence, if τ̄ ≥ 1/2, then we may just set α0 = 0 and α1 = 1 (and α2 = (2τ̄ − 1)/τ̄ ∈ [0, 1]). On the other hand, if τ̄ ≤ 1/2, then the maximum of α1 − α0 subject to α0, α1, α2 ∈ [0, 1] is obtained at α2 = 0, which implies (1 − τ̄)·α0 = (1 − 2τ̄)·α1 (i.e., setting α1 = 1 and α0 = (1 − 2τ̄)/(1 − τ̄) ∈ [0, 1]). In both cases, letting γ = max(τ̄, 1 − τ̄) ∈ [0.5, 1), we obtain

    −P′α(τ2) = ((τ2 − τ1)/τ̄)·(α1 − α0) = (τ2 − τ1)/γ ≥ τ2 − τ1,

which means that distributions that are ε-far from Dτ are accepted with probability at most Pα(τ1) − (τ2 − τ1)·ε.
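The closed-form setting derived above for the case t = 2 can be checked directly in exact arithmetic; a small sketch (helper names are ours):

```python
from fractions import Fraction

def alpha_for_interval(tau1, tau2):
    """(alpha0, alpha1, alpha2) as derived above, with tau_bar = (tau1+tau2)/2."""
    tau_bar = (Fraction(tau1) + Fraction(tau2)) / 2
    if tau_bar >= Fraction(1, 2):
        return (Fraction(0), Fraction(1), (2 * tau_bar - 1) / tau_bar)
    return ((1 - 2 * tau_bar) / (1 - tau_bar), Fraction(1), Fraction(0))

def p_alpha(alpha, q):
    # the quadratic P_alpha(q) for a two-sample tester
    a0, a1, a2 = alpha
    return a0 * (1 - q)**2 + a1 * 2 * q * (1 - q) + a2 * q**2
```

For example, τ1 = 1/5 and τ2 = 3/5 give α = (1/3, 1, 0), and Pα attains the same value 8/15 at both thresholds, is larger inside [τ1, τ2], and smaller outside.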

2.2 Generalization of Theorem 2.2

So far we considered distribution classes Dτ such that τ = (τ1, ..., τt) and 0 < τ1 < τ2 < · · · < τt < 1. Recall that this class contains the distribution q if and only if τ_{2i−1} ≤ q ≤ τ_{2i} for some i ≤ ⌈t/2⌉. Here we also allow τ1 = 0, which corresponds to including in Dτ all distributions X such that Pr[X = 1] ≤ τ2. In such a case we define the sequence α = (α0, α1, ..., α_{t−1}) as guaranteed by Proposition 2.1 such that the polynomial Pα of degree t − 1 is non-constant and satisfies Pα(τi) = Pα(τ2) for all i ≥ 2. Analogously we treat the case of t being even and τt = 1, which corresponds to including in Dτ all distributions X such that Pr[X = 1] ≥ τ_{t−1}. In both cases the induced tester Tα is a POT for Dτ.

We also consider the case that τ_{2i−1} = τ_{2i} (for some i's); that is, some of the allowed intervals may be collapsed to single points. Consider, for example, the distribution classes Dτ,τ, for some τ ∈ (0, 1). The foregoing design of a POT for D_{τ1,τ2} can be easily adapted to the case of Dτ,τ. Specifically, rather than ensuring that Pα(τ1) = Pα(τ2), we ensure that Pα obtains a maximum at τ (equivalently, P′α(τ) = 0), which is actually what we did in the case of t = 2 in Section 2.1 for τ̄ = (τ1 + τ2)/2. Thus, we again get τ = (α0 − α1)/(α0 − 2α1 + α2), which implies (α0, α1, α2) = (0, 1, (2τ − 1)/τ) if τ ≥ 1/2 and (α0, α1, α2) = ((1 − 2τ)/(1 − τ), 1, 0) otherwise. Next, we approximate Pα at τ + ε by Pα(τ) + P^{(2)}α(τ)·ε²/2, where P^{(2)}α(τ) = 2·(α0 − 2α1 + α2) = 2·(α0 − α1)/τ. (Note that a much simpler test and analysis suggest themselves in the case that t = 2 and τ1 = τ2 = 1/2: such a POT may just select two random samples and accept if and only if exactly one of them assumed the value 1, since the probability that this test accepts the distribution q equals 2q(1 − q) = 1/2 − 2(q − 0.5)².) More generally, we get:

Theorem 2.3 (Theorem 2.2, generalized): For every sequence τ = (τ1, ..., τt) ∈ [0, 1]^t such that τ1 ≤ τ2 < τ3 ≤ τ4 < · · · < τ_{t−1} ≤ τt, there exists a sequence α = (α0, α1, ..., αt) ∈ [0, 1]^{t+1} such that Tα is a POT with quadratic detection probability for Dτ. Furthermore, if τ_{2i−1} = τ_{2i} for every i ∈ [⌈t/2⌉], then Pα(q) = Pα(τ1) for every q in Dτ.

Proof: Let J = {j : τ_{2j−1} = τ_{2j}}. Then, the system of equations regarding the αi's contains t − |J| − 1 equations that arise from the equalities imposed on the values of Pα at t − |J| different points, and |J| additional equations that arise from the equalities imposed on the values of P′α at |J| different points. The same considerations (as in the proof of Theorem 2.2) imply the existence of a solution α such that Pα is not a constant function, but here the analysis of Pα(τj ± ε) depends on whether or not ⌈j/2⌉ ∈ J: The case of ⌈j/2⌉ ∉ J is handled as in the proof of Theorem 2.2, whereas the case of ⌈j/2⌉ ∈ J relies on the fact that P^{(2)}α(τ_{2⌈j/2⌉}) < 0.

2.3 POTs can test only intervals

In this section we show that the only testable classes of Boolean distributions are those defined by a finite collection of intervals in [0, 1], where intervals of length zero (i.e., points) are allowed. This means that the only properties of Boolean distributions that have a POT are those covered in Theorem 2.3.

Theorem 2.4 (characterization of the Boolean distributions having a POT): Let DS be a property of Boolean distributions associated with a set S ⊆ [0, 1] such that the distribution X is in DS if and only if Pr[X = 1] ∈ S. Then, the property DS has a POT if and only if S consists of a finite union of subintervals of [0, 1].

Proof: The "if" direction follows from Theorem 2.3. For the other direction, assume that T is a POT for DS that makes k queries. Then, for a view b = (b1, ..., bk) ∈ {0, 1}^k, the tester T accepts this view with some probability, denoted αb ∈ [0, 1]. Note that when testing a distribution X such that Pr[X = 1] = p, the probability of seeing this view is p^{w(b)}·(1 − p)^{k−w(b)}, where w(b) = Σ_j bj denotes the number of 1's in b. Hence, when given a distribution X such that Pr[X = 1] = p, the acceptance probability of T on X is

    Pr[T accepts X] = Σ_{i=0}^{k} ( Σ_{b∈{0,1}^k : w(b)=i} αb ) · p^i · (1 − p)^{k−i},    (3)

which is a polynomial of degree k (in p). Thus, for every r ∈ R, the set of points p ∈ [0, 1] on which the value of this polynomial is at least r equals a union of up to ⌈(k + 1)/2⌉ intervals. In particular, this holds for r = c, where c denotes the threshold probability of T, in which case this set of points equals the set S (because T is a POT for DS). The theorem follows.
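The structural fact used in the proof, namely that the superlevel set {p : acceptance probability ≥ c} of a k-query tester is a union of at most ⌈(k + 1)/2⌉ intervals, can be observed numerically. The sketch below assumes, for simplicity, a tester whose decision depends only on the number of 1's seen (as in Equation (1)); the names are ours:

```python
from math import comb

def acceptance_poly(alpha):
    """p -> acceptance probability of a tester that accepts with probability
    alpha[i] upon seeing i ones among k = len(alpha) - 1 samples."""
    k = len(alpha) - 1
    return lambda p: sum(comb(k, i) * p**i * (1 - p)**(k - i) * alpha[i]
                         for i in range(k + 1))

def count_superlevel_runs(P, c, grid=10001):
    """Number of maximal runs of grid points p in [0, 1] with P(p) >= c."""
    flags = [P(j / (grid - 1)) >= c for j in range(grid)]
    return sum(1 for j, f in enumerate(flags) if f and (j == 0 or not flags[j - 1]))
```

For k = 2 and α = (1, 0, 1), the polynomial (1 − p)² + p² exceeds 0.6 on two intervals (near 0 and near 1), matching the bound ⌈3/2⌉ = 2.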

2.4 Proof of Theorem 1.2

As should be clear by now, the positive part of Theorem 2.4 implies the positive part of Theorem 1.2: That is, a POT for the symmetric property Π is obtained by sampling elements in [n], querying the Boolean function for their values, and invoking the corresponding distribution-POT. The opposite direction requires a little more care, since a tester for the function class Π may avoid repeated samples, whereas a distribution-tester may not. Furthermore, the behavior of the former may depend on n, whereas the behavior of the latter may not depend on the unknown sample space. Still, by considering a sufficiently large n, these effects become negligible, and we may just mimic the argument used in the proof of the corresponding part of Theorem 2.4. The key observation is that the probability that a k-query tester accepts a random function f : [n] → {0, 1} that evaluates to 1 on exactly m inputs equals

    Σ_{i=0}^{k} ( Σ_{b∈{0,1}^k : w(b)=i} αb ) · ∏_{j=0}^{i−1} ((m − j)/(n − j)) · ∏_{j=0}^{k−i−1} ((n − m − j)/(n − j)),    (4)

where the αb are as in (3) (except that they may depend on n), and we assume (w.l.o.g.) that the tester always makes k queries and never makes the same query twice. Letting ρ = m/n denote the density of 1-values in f, observe that (4) is a polynomial of degree k in ρ. The theorem follows (exactly as in the case of Theorem 2.4).
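The products in (4) approach the corresponding binomial expression of (3) as n grows, which is the sense in which the two arguments coincide for sufficiently large n. A small numeric check (function names are ours):

```python
def eq4_factor(n, m, k, i):
    """The without-replacement probability factor for weight i, per the
    products appearing in Equation (4)."""
    p = 1.0
    for j in range(i):
        p *= (m - j) / (n - j)
    for j in range(k - i):
        p *= (n - m - j) / (n - j)
    return p

def eq3_factor(q, k, i):
    """The corresponding with-replacement factor q^i (1 - q)^{k-i} of (3)."""
    return q**i * (1 - q)**(k - i)
```

For instance, with n = 10^6 and m/n = 0.3, the two factors for k = 3 and i = 1 already agree to within 10^{-4}.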

3 Graph Properties (in the Adjacency Matrix Model)

Symmetric properties of Boolean functions induce graph properties (in the adjacency matrix model of [34]), and so the statistical properties of the previous section yield analogous properties that refer to the edge densities of graphs. The question addressed in this section is whether the study of two-sided error POTs can be extended to "genuine" graph properties. The first property that we consider is degree regularity.

Recall that, in the adjacency matrix model, an N-vertex graph G = ([N], E) is represented by the Boolean function g : [N] × [N] → {0, 1} such that g(u, v) = 1 if and only if u and v are adjacent in G (i.e., {u, v} ∈ E). Distance between graphs is measured in terms of this representation (i.e., as the fraction of (the number of) differing matrix entries over N²), but occasionally we shall use the more intuitive notion of the fraction of (the number of) edges over N(N − 1)/2.

3.1 The class of k-regular graphs

For every function k : N → N, we consider the set R^{(k)} = ∪_{N∈N} R^{(k)}_N such that R^{(k)}_N is the set of all k(N)-regular N-vertex graphs. That is, G ∈ R^{(k)}_N if and only if G is a simple N-vertex graph in which each vertex has degree k(N). Clearly, R^{(k)} has no one-sided error POT, provided that 0 < k(N) < N − 1 (cf. [37, Sec. 4]). (Indeed, the characterization in [37, Thm. 4.7] implies that it suffices to show that R^{(k)} is not a subgraph freeness property. Assume, without loss of generality, that k(N) > N/2. Then, the subgraphs disallowed in R^{(k)} cannot contain a clique, and it follows that the N-vertex clique is in R^{(k)}, which contradicts k(N) < N − 1.) In contrast, we show that R^{(k)} has a two-sided error POT.

Theorem 3.1 (a POT for R^{(k)}): For every function k : N → N such that k(N) = κN for some fixed constant κ ∈ (0, 1), the property R^{(k)} has a two-sided error POT. Furthermore, all graphs in R^{(k)} are accepted with equal probability.

Proof: We may assume that N·k(N) is an even integer (since otherwise the tester may reject without making any queries). On input N and oracle access to an N-vertex graph G = ([N], E), the tester sets τ = k(N)/N = κ and proceeds as follows.

1. Selects uniformly a vertex s ∈ [N] and considers the Boolean function fs : [N] → {0, 1} such that fs(v) = 1 if and only if {s, v} ∈ E.

2. Invokes the POT of Theorem 2.3 to test whether the function fs has density τ; that is, it tests whether the random variable Xs, defined by selecting v uniformly in [N] and setting Xs = fs(v), is in the class Dτ,τ. Recall that this POT takes two samples of Xs and accepts with probability αi when seeing i values of 1. (The values of (α0, α1, α2) are set based on τ; recall from Section 2.2 that we may use the setting outlined at the end of Section 2.1, namely α1 = 1, and if τ ≤ 1/2 then α2 = 0 and α0 = (1 − 2τ)/(1 − τ), and otherwise α0 = 0 and α2 = (2τ − 1)/τ.)


The implementation of Step 2 calls for taking two samples of Xs, which amounts to selecting uniformly two vertices and checking whether or not each of them neighbors s. Thus, we make two queries to the graph G.

Turning to the analysis of the foregoing tester, let P(q) denote the probability that the POT invoked in Step 2 accepts a random variable X such that Pr[X = 1] = q. Then, the probability that our graph tester accepts the graph G equals

    (1/N) · Σ_{s∈[N]} P(dG(s)/N),    (5)

where dG(v) denotes the degree of vertex v in G. Thus, every k(N)-regular N-vertex graph G is accepted with probability P(τ). As we shall show, the following claim (which improves over a similar claim in [34, Apdx D]) implies that every graph that is ε-far from R^{(k)}_N is accepted with probability P(τ) − Ω(ε²).

Claim 3.2: If Σ_{v∈[N]} |dG(v) − k(N)| ≤ ε′·N², then G is 6ε′-close to R^{(k)}_N.

The proof of Claim 3.2 is presented in Appendix A.1. Note that the claim is non-trivial, since it asserts that small local discrepancies (in the vertex degrees) imply small distance to regularity. The converse is indeed trivial.

Using Claim 3.2, we infer that if G is ε-far from R^{(k)}_N, then Σ_{v∈[N]} |dG(v) − k(N)| > ε·N²/6. On the other hand, by Theorem 2.3 (or the analysis of the case t = 2 that precedes it), we have, for some γ > 0,

    (1/N) · Σ_{s∈[N]} P(dG(s)/N) ≤ (1/N) · Σ_{s∈[N]} ( P(τ) − γ·((dG(s) − k(N))/N)² )
                                 = P(τ) − (γ/N²) · (1/N) · Σ_{s∈[N]} (dG(s) − k(N))²
                                 ≤ P(τ) − (γ/N²) · ( Σ_{s∈[N]} |dG(s) − k(N)| / N )²,

where the last inequality follows from the Cauchy-Schwarz inequality. Now, using Σ_{v∈[N]} |dG(v) − k(N)| > ε·N²/6, we conclude that G is accepted with probability at most P(τ) − γ·(ε/6)². The theorem follows.

3.2 Other regular graph properties

The two-sided error POT guaranteed by Theorem 3.1 can be combined with one-sided error POTs for other graph properties to yield two-sided error POTs for the intersection. This combination is

possible whenever the two properties behave nicely with respect to intersection, in the sense that being close to both properties (i.e., to both R^{(k)} and Π) implies being close to their intersection (i.e., to R^{(k)} ∩ Π). Recall that, as pointed out in [34], in general it may not be the case that objects that are close to two properties are also close to their intersection.

Theorem 3.3 (a POT for R^{(k)} ∩ Π): Let Π be a graph property that has a one-sided error POT, and let k(N) = κN for some fixed constant κ ∈ (0, 1). Suppose that there exists a monotone function F : (0, 1] → (0, 1] such that if G is δ-close to both Π and R^{(k)} then G is F(δ)-close to Π ∩ R^{(k)}. Then, Π ∩ R^{(k)} has a two-sided error POT.

Note that the condition made in Theorem 3.3 may not hold in general. For example, consider Π that consists of all bicliques as well as all graphs that each consist of two isolated cliques. Then, for k(N) = N/2, it holds that R^{(k)} ∩ Π consists of the N-vertex bicliques with N/2 vertices on each side, and so the graph G consisting of two N/2-vertex cliques is 0.49-far from R^{(k)} ∩ Π. On the other hand, G is in Π and is 1/N-close to R^{(k)} (by virtue of adding a perfect matching between the two cliques). In contrast, it can be shown that the condition in Theorem 3.3 holds with respect to k(N) = 2N/3 and the set Π consisting of all complete tripartite graphs (see either Proposition 3.4 or Proposition 3.5).

Proof: On input N and oracle access to an N-vertex graph G = ([N], E), the tester proceeds as follows (while assuming that κN is an integer and κN² is even; otherwise, the tester rejects upfront, since no N-vertex graph can be κN-regular).

1. Invokes the POT for R^{(k)} and rejects if it halts while rejecting. Otherwise, proceeds to the next step.

2. Invokes the POT for Π and halts with its verdict.

The analysis relies crucially on the fact that the (two-sided error) POT for R^{(k)} accepts any graph in R^{(k)} with the same probability, denoted c. It follows that any N-vertex graph in Π ∩ R^{(k)} is accepted with probability c · 1 = c.

Next, we show that graphs that are far from Π ∩ R^{(k)} are accepted with probability that is significantly smaller than c. Let G be a graph that is δ-far from Π ∩ R^{(k)}. Then, by the hypothesis regarding Π and R^{(k)}, either G is F^{−1}(δ)-far from Π or G is F^{−1}(δ)-far from R^{(k)}. In the first case, G is accepted with probability at most c·(1 − ϱ1(F^{−1}(δ))), where ϱ1 is the detection probability function of the one-sided error POT for Π. (Note that here we rely on the fact that the (two-sided error) POT for R^{(k)} accepts any graph with probability at most c.) In the second case (i.e., G is far from R^{(k)}), it holds that G is accepted with probability at most c − ϱ2(F^{−1}(δ)), where ϱ2 is the detection probability function of the two-sided error POT for R^{(k)}. The claim follows.

Corollaries. One natural question is which properties Π satisfy the condition of Theorem 3.3 and what properties arise from their intersection with R^{(k)}. Recall that by the characterization result of [37], the property Π must be defined in terms of subgraph freeness (since only such properties have a one-sided error POT). However, the intersection Π ∩ R^{(k)} may not be easy to characterize in general. Furthermore, as indicated above, some subgraph freeness properties satisfy the condition of Theorem 3.3 while others do not. We consider this issue in the context of two specific classes of properties, studied in [36].

The first class consists of all complete t-partite graphs, where a graph is called complete t-partite if its vertex set can be partitioned into t (independent) sets such that two vertices are connected by an edge if and only if they belong to different sets.

Proposition 3.4 (on regular complete t-partite graphs): Let t ≥ 2 be an integer and k(N) = (t − 1)N/t.

1. The set of k-regular complete t-partite graphs equals the set of complete t-partite graphs in which each part (i.e., independent set) has density 1/t.

2. If a graph G = ([N], E) is δ-close to both the set of complete t-partite graphs and to R^{(k)}, then G is O(√δ)-close to some k-regular complete t-partite graph.

Thus, the set of k-regular complete t-partite graphs has a two-sided error POT.

Proof: Let Π denote the set of complete t-partite graphs. First, we show that R^{(k)} ∩ Π equals the set of all k-regular complete t-partite graphs, which we denote by Π′. This follows by considering the t-partition (V1, ..., Vt) of an arbitrary N-vertex graph in Π, and observing that the degree condition implies that for every i ∈ [t] such that Vi ≠ ∅ it holds that Σ_{j≠i} |Vj| = k(N). Thus, for every such i ∈ [t] it holds that |Vi| = N/t, and Item 1 follows.

Turning to the proof of Item 2, we note that if G is δ-close to both Π and R^{(k)}, then there exists G′ ∈ Π that is 2δ-close to R^{(k)}. Let I1, ..., It be the partition of G′ into t independent sets such that there is a complete bipartite graph between every two Ij's. Then, we have Σ_{i≠j} |Ii|·|Ij| ≥ k(N)·N − 4δN², which implies Σ_{i∈[t]} xi² ≤ (1/t) + 4δ, where xi = |Ii|/N. It follows that Σ_{i∈[t]} (xi − (1/t))² ≤ 4δ, and thus G′ is O(√δ)-close to Π′.

The second class, studied in [36], is the class of super-cycle collections, where a super-cycle (of length ℓ) is a graph consisting of a sequence of disjoint sets of vertices, called clouds, such that two vertices are connected if and only if they reside in neighboring clouds (i.e., denoting the ℓ clouds by S0, ..., S_{ℓ−1}, vertices u, v ∈ ∪_{i∈{0,1,...,ℓ−1}} Si are connected by an edge if and only if for some i ∈ {0, 1, ..., ℓ − 1} and j ∈ {i − 1 mod ℓ, i + 1 mod ℓ} it holds that u ∈ Si and v ∈ Sj). Note that a biclique that has at least two vertices on each side can be viewed as a super-cycle of length four (by partitioning each of its sides into two parts).

We denote the set of graphs that consist of a collection of isolated super-cycles of length ℓ by SCℓC. As is shown in the next two propositions, for every ℓ ≥ 3, there is a dichotomy in the behavior of the set SCℓC ∩ R(k): for the integers t in the set Tℓ defined in (6) below (and k(N) = 2N/tℓ), the sets SCℓC and R(k) satisfy the conditions of Theorem 3.3, whereas for the remaining values of t the conditions of Theorem 3.3 are not satisfied.

Proposition 3.5 (on SCℓC ∩ R(k) and testing it, for some values of ℓ, t and k(N) = 2N/tℓ): Let ℓ ≥ 3 be an integer and t ∈ Tℓ, where

    Tℓ =  {1, 2, 3}   if ℓ ≡ 1 (mod 2)
          {1}         if ℓ ≡ 2 (mod 4)                    (6)
          ℕ           if ℓ ≡ 0 (mod 4)

and k(N) = 2N/tℓ. Then:
1. The set SCℓC ∩ R(k) equals the set of graphs that consist of t super-cycles of length ℓ, each containing N/t vertices, such that clouds that are at distance four apart have equal size. Furthermore, if ℓ ≢ 0 (mod 4), then each cloud has size N/tℓ.
2. If a graph G = ([N], E) is δ-close to both SCℓC and R(k), then G is O(√δ)-close to SCℓC ∩ R(k), where the hidden constant depends polynomially on tℓ.
Thus, SCℓC ∩ R(k) has a two-sided error POT. The proof of Proposition 3.5 is quite tedious and is deferred to Appendix A.2.

Proposition 3.6 (on SCℓC ∩ R(k) in other cases): Let ℓ ≥ 3 and t ∈ ℕ \ Tℓ, where Tℓ is as in (6). Then, for any integer k(N) = 2N/tℓ, there exists an N-vertex graph in R(k) that is O(1/N)-close to SCℓC but Ω(1)-far from SCℓC ∩ R(k).

Indeed, Proposition 3.6 is non-trivial only in the case that ℓ ≢ 0 (mod 4). We stress that Proposition 3.6 does not assert that in certain cases SCℓC ∩ R(k) has no POT, but rather that the conditions stated in Theorem 3.3 are not satisfied (and so the approach suggested by its proof will not work). The proof of Proposition 3.6 refers to some elements of the proof of Proposition 3.5, and is thus also deferred to Appendix A.2.

3.3

The class of regular graphs

We consider the class REG of regular graphs. Note that this class strictly contains the classes considered in Section 3.1, since we make no restriction on the degrees. (Still, this does not mean that testing REG is either easier or harder than testing these subclasses.) Clearly, REG has no one-sided error POT. In contrast, we show that it has a two-sided error POT.

Theorem 3.7 (a POT for REG): The class REG has a two-sided error POT. Furthermore, all graphs in REG are accepted with equal probability.

Recall that a standard tester of regularity can be obtained by estimating the degrees of random vertices (cf. [34, Prop. 10.2.1.3]), where these estimates are related to the proximity parameter. However, such good approximations are not possible in the context of proximity oblivious testing. Still, as in Section 3.1, crude approximations (which are obtained by a constant number of queries) turn out to be sufficiently good. Specifically, we construct a POT that picks two random vertices in the given graph, and checks that these two vertices have the same degree in a proximity oblivious manner. This check is reduced to the problem of testing equality between two Boolean distributions, where in the reduction the distributions correspond to the densities of the neighbor sets of the two chosen vertices.11 We first show that equality of two Boolean distributions can be tested in a proximity oblivious manner (and will return to Theorem 3.7 later).

Proposition 3.8 (a POT for EQ): Let EQ = {(P, Q) : Pr[P = 1] = Pr[Q = 1]} be the class that consists of pairs of equal Boolean distributions, and let the distance of a pair (P, Q) from EQ be defined as dist((P, Q), EQ) = |Pr[P = 1] − Pr[Q = 1]|. Then, the property EQ has a two-sided error POT. Given two distributions, the tester makes two queries to each of them, and has quadratic detection probability. Moreover, all pairs of equal distributions are accepted with the same probability. As shown below (see Proposition 3.9), the property EQ has no POT that (always) makes fewer than two queries to one of the two distributions.
Proof. Following the recipe of Section 2.1, the design of the desired POT calls for choosing a sequence of α(i,j)'s, where α(i,j) represents the acceptance probability when seeing i ones in the (2-element) sample of P and j ones in the (2-element) sample of Q. The corresponding acceptance probability is a polynomial of individual degree 2 in p and q, where p = Pr[P = 1] and q = Pr[Q = 1]. The goal is thus to choose the α(i,j)'s such that this polynomial evaluates to c for every (p, q) such that p = q, and evaluates to c − Ω((p − q)²) otherwise. Let c, δ ∈ (0, 1) be two parameters such that c − 2δ, c + δ ∈ [0, 1]; indeed, we may choose c = 0.5 and δ = 0.25. For every (i, j) ∈ {0, 1, 2}² define

    α(i,j) =  c − 2δ   if (i, j) ∈ {(0, 2), (2, 0)}
              c + δ    if (i, j) = (1, 1)                   (7)
              c        otherwise

11 This reduction is analogous to the proof of Theorem 3.1, in which we check that the degree of a vertex equals some k ∈ ℕ that is fixed and known to the tester.

Given a pair of distributions (P, Q), the tester, denoted T, proceeds as follows:
1. Make two queries to each distribution. Denote by i the number of ones obtained from P, and by j the number of ones obtained from Q.
2. Accept with probability α(i,j).
Letting p = Pr[P = 1] and q = Pr[Q = 1], the acceptance probability of the tester is

    Pr[T accepts (P, Q)] = Σ_{i,j∈{0,1,2}} α(i,j) · \binom{2}{i} p^i (1−p)^{2−i} · \binom{2}{j} q^j (1−q)^{2−j}.        (8)

Note that almost all the α(i,j)'s in (7) are equal to c, with the exception of α(2,0) = α(0,2) and α(1,1). Thus, the tester "penalizes" a highly unbalanced view (i.e., (2, 0) or (0, 2)) and "rewards" a balanced view. Indeed, plugging the parameters into (8), we get

    Pr[T accepts (P, Q)] = c − (p²(1−q)² + q²(1−p)²) · 2δ + 4p(1−p)q(1−q) · δ
                         = c − 2δ · (p²(1−q)² + q²(1−p)² − 2p(1−p)q(1−q))
                         = c − 2δ · (p − q)².

The proposition follows.

Proof of Theorem 3.7: Given an N-vertex graph G = ([N], E), the tester proceeds as follows.

1. Select uniformly two vertices v1, v2 ∈ [N] and consider the Boolean functions fv1 : [N] → {0, 1} and fv2 : [N] → {0, 1} such that fvi(w) = 1 if and only if {vi, w} ∈ E.

2. Invoke the POT of Proposition 3.8 to test whether the functions fv1 and fv2 have the same density, and act according to its answer. That is, test whether the random variables Xv1 and Xv2, defined over [N] such that Xvi(w) = fvi(w), are equal (i.e., whether the pair (Xv1, Xv2) is in EQ). Recall that this POT accepts all pairs of equal distributions with the same probability c (which is an absolute constant independent of dG(v1) and dG(v2)).

In the implementation of Step 2 the tester takes two samples of Xv1 and two samples of Xv2, which amounts to selecting four vertices uniformly and checking whether the first two are adjacent to v1 and the last two are adjacent to v2. Thus, we make four queries to the graph G.

Clearly, if G is a regular graph, then for every pair of vertices (v1, v2) chosen in Step 1, the POT of Step 2 accepts with the same probability c (which is also independent of the degree of G). Suppose now that a graph G = ([N], E) is accepted with probability c − ε². Then, by an averaging argument, there are at least (1 − ε)N² pairs of vertices (v1, v2) such that when these vertex-pairs are chosen in Step 1, Step 2 accepts with probability at least c − ε. Thus, there exists a vertex v1 ∈ [N] and a subset V2 ⊆ [N] of size at least (1 − ε)N such that for every vertex v2 ∈ V2 the acceptance probability of Step 2 when applied to the pair (v1, v2) is at least c − ε. Proposition 3.8 implies that for every such vertex v2 it holds that |Pr[Xv1 = 1] − Pr[Xv2 = 1]| = O(√ε), and therefore |dG(v2) − dG(v1)| < O(√ε · N). Let K = dG(v1) be the degree of v1. We show that the graph is close to being K-regular. Indeed, using the fact that |V2| ≥ (1 − ε)N and that for every v2 ∈ V2 it holds that |dG(v2) − K| < O(√ε · N), we have

    Σ_{v∈[N]} |dG(v) − K| = Σ_{v∈V2} |dG(v) − K| + Σ_{v∈[N]\V2} |dG(v) − K| < O(√ε · N²).

Therefore, by applying Claim 3.2 we get that the graph G is O(√ε)-close to being K-regular. This completes the proof of Theorem 3.7.
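The two testers just analyzed are short enough to sketch in code. The Python fragment below (a sketch under our own naming, not from the text) implements the EQ tester of Proposition 3.8 with c = 0.5 and δ = 0.25, checks the closed form of Eq. (8) numerically, and implements the four-query REG tester of Theorem 3.7 on top of it.

```python
import random
from math import comb

C, DELTA = 0.5, 0.25  # the parameters chosen in the proof of Proposition 3.8

# Acceptance probabilities alpha_(i,j) from Eq. (7).
ALPHA = {(i, j): C for i in range(3) for j in range(3)}
ALPHA[(0, 2)] = ALPHA[(2, 0)] = C - 2 * DELTA
ALPHA[(1, 1)] = C + DELTA

def eq_pot(sample_p, sample_q):
    # One run of the EQ tester: two samples from each Boolean
    # distribution, then accept with probability alpha_(i,j).
    i = sample_p() + sample_p()
    j = sample_q() + sample_q()
    return random.random() < ALPHA[(i, j)]

def accept_prob(p, q):
    # The acceptance probability of Eq. (8); it collapses to c - 2*delta*(p-q)^2.
    return sum(ALPHA[(i, j)]
               * comb(2, i) * p**i * (1 - p)**(2 - i)
               * comb(2, j) * q**j * (1 - q)**(2 - j)
               for i in range(3) for j in range(3))

assert abs(accept_prob(0.3, 0.3) - C) < 1e-12                       # equal pair: exactly c
assert abs(accept_prob(0.2, 0.7) - (C - 2 * DELTA * 0.25)) < 1e-12  # c - 2*delta*(p-q)^2

def reg_pot(adj, n):
    # Four-query POT for REG (proof of Theorem 3.7): compare the
    # neighborhood densities of two random vertices via the EQ tester.
    v1, v2 = random.randrange(n), random.randrange(n)
    return eq_pot(lambda: int(random.randrange(n) in adj[v1]),
                  lambda: int(random.randrange(n) in adj[v2]))
```

The assertions confirm that all equal pairs are accepted with exactly the same probability c, the feature on which the analysis of the REG tester relies.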

Comment. A result analogous to Theorem 3.3 can be proved in the current context. That is, Π ∩ REG has a two-sided error POT if Π has a one-sided error POT and there exists a monotone function F : (0, 1] → (0, 1] such that every graph that is δ-close to both Π and REG is F(δ)-close to Π ∩ REG.

Proposition 3.9 The property EQ has no two-sided error POT that always makes at most one query to the first (resp., second) distribution.

Indeed, Proposition 3.9 does not rule out the possibility that a POT for EQ can sometimes (i.e., depending on its coin tosses) make fewer than two queries to the first (resp., second) distribution. In fact, there exists a POT for EQ that always makes two queries (in total) such that with probability 1/4 it makes two queries to the first (resp., second) distribution, and otherwise it makes a single query to each distribution.12

Proof. Suppose, without loss of generality, that there exists a POT, denoted T, that always makes t queries to distribution P and a single query to distribution Q. Let us denote the threshold probability of T by c. Note that, without loss of generality, the behavior of any POT for EQ depends only on the number of ones that it sees among the P samples and among the Q samples; that is, when it sees i ones in the sample of P and j ones in the sample of Q, it accepts with probability α(i,j) (e.g., let α(i,j) be the average acceptance probability taken over

12 This tester can be obtained by a reduction to testing a property of an auxiliary 4-valued distribution R such that R = (1, P) with probability 1/2 and R = (2, Q) otherwise. Testing whether P equals Q reduces to testing whether Pr[R = (1, 1)] = Pr[R = (2, 1)], whereas (by Corollary 5.9) this property has a POT that uses two samples. We comment that a similar POT can be obtained for testing the equality of t ≥ 2 distributions.

all the corresponding cases). Now, denoting (again) p = Pr[P = 1] and q = Pr[Q = 1], the acceptance probability of T equals

    A(p, q) := Pr[T accepts (P, Q)] = Σ_{i∈{0,1,...,t}} \binom{t}{i} p^i (1−p)^{t−i} · (q·α(i,1) + (1−q)·α(i,0)).        (9)

Letting δ = q − p, we have A(p, q) = B(p) + δ · D(p), where

    B(p) := Σ_{i∈{0,1,...,t}} \binom{t}{i} p^i (1−p)^{t−i} · (p·α(i,1) + (1−p)·α(i,0))
    D(p) := Σ_{i∈{0,1,...,t}} \binom{t}{i} p^i (1−p)^{t−i} · (α(i,1) − α(i,0)).

The following observations rely only on the fact that the acceptance probability A(p, p + δ) is linear in δ.
1. For every p it holds that B(p) ≥ c, because (P, P) must be accepted with probability at least c (i.e., because A(p, p) ≥ c).
2. For every p it holds that B(p) ≤ c, because otherwise for some δ with |δ| > 0 the no-instance (P, Q) is accepted with probability at least B(p) − |δ| > c (using |D(p)| ≤ 1), which violates the requirement that A(p, q) < c.
3. For every p it holds that D(p) ≠ 0, because otherwise for some δ with |δ| > 0 the no-instance (P, Q) is accepted with probability B(p) = c.
Using the third observation, we note that for every p ∈ (0, 1) and ε > 0 such that p − ε, p + ε ∈ [0, 1], either (p, p − ε) or (p, p + ε) is accepted with probability greater than B(p) = c (since either D(p) > 0 or −D(p) > 0). This violates the requirements from a POT for EQ, and the proposition follows.
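The linear-in-δ decomposition at the heart of this proof can be verified mechanically. Below is a small Python check (using the convention δ = q − p; the α values are random placeholders, not from the text) that A(p, p + δ) = B(p) + δ · D(p) holds identically.

```python
import random
from math import comb

random.seed(7)
t = 3  # number of queries made to P (a single query is made to Q)
alpha = {(i, j): random.random() for i in range(t + 1) for j in (0, 1)}

def A(p, q):
    # Eq. (9): acceptance probability with t queries to P and one to Q.
    return sum(comb(t, i) * p**i * (1 - p)**(t - i)
               * (q * alpha[(i, 1)] + (1 - q) * alpha[(i, 0)])
               for i in range(t + 1))

def D(p):
    # The coefficient of delta = q - p in A(p, q).
    return sum(comb(t, i) * p**i * (1 - p)**(t - i)
               * (alpha[(i, 1)] - alpha[(i, 0)])
               for i in range(t + 1))

B = lambda p: A(p, p)

for p in (0.2, 0.5, 0.8):
    for delta in (-0.1, 0.05, 0.2):
        assert abs(A(p, p + delta) - (B(p) + delta * D(p))) < 1e-12
```

Since the identity holds for arbitrary α's, the three observations above apply to every POT of this query pattern, exactly as the proof requires.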

3.4

Bounded density of induced copies

We now turn to a different type of graph properties; specifically, to sets of graphs in which a fixed graph appears as an induced subgraph for a bounded number of times. Fixing any n-vertex graph H, denote by dnsH(G) the density of H as an induced subgraph of G; that is, dnsH(G) is the probability that a random sample of n vertices of G induces the subgraph H. For any graph H and τ ∈ [0, 1], we consider the graph property ΠH,τ := {G : dnsH(G) ≤ τ}; in particular, ΠH,0 is the class of H-free graphs. Alon et al. [6] showed that, for some monotone function Fn : (0, 1] → (0, 1], if G is δ-far from the class of H-free graphs, then dnsH(G) >

Fn(δ). Here we provide a much sharper bound for the case of τ > 0 (while using an elementary proof).13

Theorem 3.10 (distance from ΠH,τ yields dnsH > τ): For every n-vertex graph H and τ > 0, if G = ([N], E) is δ-far from ΠH,τ, then dnsH(G) > (1 + (δn/3)) · τ, provided that δ > 6/N.

It follows that ΠH,τ has a two-sided error POT, which just inspects a random sample of n vertices and checks whether the induced subgraph is isomorphic to H. This POT accepts a graph in ΠH,τ with probability at least 1 − τ, whereas it accepts any graph that is δ-far from ΠH,τ with probability at most 1 − τ − (τn/3) · δ if δ > 6/N (and with probability at most 1 − τ − (δ/6)^n otherwise).

Proof. Let us consider first the case that H contains no isolated vertices. Setting G0 = G, we proceed in iterations while preserving the invariant that Gi is (δ − 2i/N)-far from ΠH,τ. In particular, we enter the ith iteration with a graph Gi−1 not in ΠH,τ, and infer that Gi−1 contains a vertex, denoted vi, that participates in at least M := τ · \binom{N−1}{n−1} copies of H. We obtain a graph Gi that is (N − 1)/\binom{N}{2}-close to Gi−1 by omitting from Gi−1 all edges incident at vi. We stress that the M copies of H counted in the ith iteration are different from the copies counted in the prior i − 1 iterations, because all copies counted in the ith iteration touch the vertex vi and do not touch the vertices v1, ..., vi−1, since the latter vertices are isolated in Gi−1 (whereas H contains no isolated vertices). Also note that the copies of H counted in the ith iteration also occur in G, since they contain no vertex pair on which Gi−1 differs from G. Thus, after t := ⌊δN/2⌋ iterations, we obtain a graph Gt ∉ ΠH,τ, which contains more than τ · \binom{N}{n} copies of H that are disjoint from the t · M copies of H counted in the t iterations. It follows that

    dnsH(G) ≥ τ + t · M/\binom{N}{n} = τ + ⌊δN/2⌋ · (n·τ/N) > τ + ((δn/2) − (n/N)) · τ,

and the claim follows (using δ > 6/N). Recall, however, that the foregoing relies on the hypothesis that H has no isolated vertices. If this hypothesis does not hold, then the complement graph of H has no isolated vertices, and we can proceed analogously. In other words, if H has an isolated vertex, then no vertex in H is connected to all the other vertices. In this case, we consider the graph Gi obtained from Gi−1 by connecting the vertex vi to all other vertices in the graph. Also in this case, H-copies in Gi cannot touch v1, ..., vi−1 (this time because each vertex in v1, ..., vi−1 is connected to all vertices in Gi−1), and we can proceed as before.

13 In contrast, the proof of Alon et al. [6] relies on Szemerédi's Regularity Lemma.
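The POT derived from Theorem 3.10 is a one-liner given a subroutine for checking induced isomorphism; the sketch below (a brute-force isomorphism test, which is fine for constant n; names are ours) makes it concrete.

```python
import random
from itertools import permutations

def induces_copy(G_edges, sample, H_edges, n):
    # Does the subgraph of G induced by `sample` equal H up to relabeling?
    induced = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)
               if frozenset((sample[i], sample[j])) in G_edges}
    return any({frozenset((perm[a], perm[b])) for a, b in H_edges} == induced
               for perm in permutations(range(n)))

def pot_for_bounded_density(G_edges, N, H_edges, n):
    # One run of the POT for Pi_{H,tau}: accept iff a random n-vertex
    # sample does NOT induce a copy of H.  A graph in Pi_{H,tau} is thus
    # accepted with probability at least 1 - tau.
    sample = random.sample(range(N), n)
    return not induces_copy(G_edges, sample, H_edges, n)

triangle = [(0, 1), (1, 2), (0, 2)]
K3 = {frozenset(e) for e in triangle}
assert induces_copy(K3, [0, 1, 2], triangle, 3)              # a triangle induces K3 ...
assert not induces_copy(K3, [0, 1, 2], [(0, 1), (1, 2)], 3)  # ... but not the 2-path
```

Note that the tester is oblivious of τ: the threshold enters only through the acceptance-probability analysis of Theorem 3.10.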

3.5

Towards a characterization

The foregoing results beg the question of characterizing the class of graph properties that have a two-sided error POT, and also suggest that such a characterization may be related to the densities with which various fixed-size graphs appear as induced subgraphs of the graph. In the current section we pursue these ideas. Recall that, for an n-vertex graph H, we denote by dnsH(G) the density of H as an induced subgraph of G (i.e., dnsH(G) equals the probability that a random sample of n vertices of G induces the subgraph H). We consider graph properties that are each parameterized by a sequence of weights w = (wH) and by b ∈ [0, 1], where wH ∈ [0, 1] for each n-vertex (unlabeled) graph H. The corresponding graph property is denoted Πw,b, and a graph G is in Πw,b if and only if Σ_H wH · dnsH(G) ≤ b. Note that the case of b = 0 corresponds to F-freeness for F = {H : wH > 0}. More generally, if for every H it holds that wH ≥ b, then Πw,b equals the set of F-free graphs, where F = {H : wH > b}. Another interesting case is where wH0 = 1 for a unique graph H0 and wH = 0 otherwise (i.e., for every H ≠ H0): in this case the property Πw,b corresponds to having an H0-density that does not exceed b (i.e., in this case G ∈ Πw,b if and only if dnsH0(G) ≤ b, which is the case studied in Section 3.4). In the rest of this section, we shall discard the case of a uniform sequence w (i.e., wH = w for some w and all H's), since in this case the property is trivial. We conjecture that, for any b > 0 and w, the property Πw,b has a two-sided error POT, but we are only able to establish this in special cases (see Theorem 3.12). On the other hand, we show that any graph property having a two-sided error POT is essentially of the foregoing type. The latter statement requires some clarification. Recall that it was shown in [37, Thm. 4.7] that a graph property has a one-sided error POT if and only if it is a subgraph freeness property.
However, the equivalence is not to F-freeness where F is a fixed set of forbidden subgraphs, but rather to an infinite sequence of subgraph freeness properties that correspond to different graph sizes. Specifically, it was shown that Π = ∪N ΠN has a one-sided error POT if there exist a constant n and an infinite sequence (FN)_{N∈ℕ} such that for every N it holds that (1) all graphs in FN have size n, and (2) ΠN equals the set of all N-vertex FN-free graphs. Note that in the latter context there are only finitely many possible sets FN, whereas in our context there are infinitely many possible sequences w = (wH) (and ditto b's). In other words, for every fixed N, the number of possible properties of N-vertex graphs that arise from such (2^{\binom{n}{2}} + 1)-long sequences depends on N (and is not upper bounded by a function of n). For example, for every m(N) ∈ {0, 1, ..., \binom{N}{2}}, we may consider the property of N-vertex graphs


having at most m(N) edges.14 Another difficulty that arises regarding the foregoing properties is that, in general, it is not clear how the following two notions of violating the property Πw,b are related:
1. The graph G is δ-far from Πw,b; that is, every G′ ∈ Πw,b differs from G on at least a δ fraction of the vertex pairs.
2. The graph G satisfies Σ_H wH · dnsH(G) ≥ b + ε.
Indeed, δ > 0 if and only if ε > 0 (since G ∉ Πw,b if and only if Σ_H wH · dnsH(G) > b). It also holds that ε ≤ \binom{n}{2} · δ (since the probability that a random sample of n vertices hits a pair of vertices on which two graphs differ can be upper bounded in terms of the distance between the graphs). But what is missing is a general bound in the opposite direction, although we do have such bounds in special cases (e.g., either b = 0 or |{H : wH > 0}| = 1; see Section 3.4).15 In light of this state of affairs, a first step towards a characterization is provided by the following result (where V(H) denotes the vertex set of H).

Theorem 3.11 (a kind of characterization): Let Π = ∪N ΠN be a graph property. Then, Π has a two-sided error POT if and only if there exist an integer n, a number b ∈ [0, 1], and a function F such that for every N there exists a sequence w = (wH)_{H:|V(H)|=n} that satisfies the following two conditions:
1. ΠN equals the set of N-vertex graphs in Πw,b.
2. If G is δ-far from ΠN, then Σ_H wH · dnsH(G) ≥ b + F(δ).

Indeed, the second condition drastically limits the usefulness of the current characterization; still, Theorem 3.12 (which generalizes Theorem 3.10) presents cases in which this condition holds. Note that while one direction of Theorem 3.11 is quite obvious (i.e., that properties that correspond to such sequences of Πw,b's have a POT), the opposite direction requires a proof (i.e., that having a POT implies a correspondence to such sequences of Πw,b's).

Proof. The proof follows the outline of the proof of [37, Thm. 4.7]. Suppose that Π has a constant-query (two-sided error) POT. Then, by following the proof of [38, Thm. 4.5] (see also [39]), we can obtain a POT that inspects the subgraph induced by a random set of n = O(1) vertices and accepts with probability αH if the induced subgraph seen is isomorphic to H. Note that n equals twice the query complexity of the original POT, and that the resulting POT maintains the acceptance probability of the original POT (on any random isomorphic copy

14 This property can be represented by setting b = m(N)/\binom{N}{2}, n = 2, and wH = 1 if H is a connected 2-vertex graph (i.e., an edge) and wH = 0 if H consists of two isolated vertices.
15 A more general result is presented in Theorem 3.12.


of any fixed graph G).16 Let c be the acceptance threshold of the original POT (i.e., c := min_{G∈Π} {Pr[Test^G(|V(G)|) = 1]}). Then, ΠN = {G : |V(G)| = N ∧ Σ_H dnsH(G) · αH ≥ c}, which equals the set of N-vertex graphs in Πw,b for wH = 1 − αH and b = 1 − c. That is, this w satisfies the first condition. Furthermore (by the POT guarantee), if the N-vertex graph G′ is ε′-far from Π, then Σ_H dnsH(G′) · αH ≤ c − ρ(ε′), where ρ is the guaranteed detection probability function. That is, this w satisfies the second condition with respect to F := ρ, since Σ_H dnsH(G′) · wH ≥ b + F(ε′). Thus, we obtained n, b and F such that for every N there exists a sequence of wH's that satisfies both conditions.

Suppose, on the other hand, that for some n, b and F, it holds that for every N there exists a sequence w = (wH)_{H:|V(H)|=n} that satisfies the two conditions (i.e., (i) ΠN equals the set of N-vertex graphs in Πw,b, and (ii) if G is δ-far from ΠN then Σ_H wH · dnsH(G) ≥ b + F(δ)). Our goal is to derive a constant-query two-sided error POT for Π, which we achieve using the following natural test: the test selects a random set of n vertices, inspects the induced subgraph, and accepts with probability 1 − wH when seeing a subgraph isomorphic to H. Clearly, every graph in ΠN is accepted with probability at least c := 1 − b, whereas if G is δ-far from ΠN then it is accepted with probability at most Σ_H (1 − wH) · dnsH(G) ≤ c − F(δ). Thus, this test is a two-sided error POT with ρ = F.
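For small graphs, the quantity Σ_H wH · dnsH(G) that drives the natural test can be computed exactly by enumeration. The sketch below (illustrative names; the weights follow footnote 14's edge-counting representation with n = 2) makes the correspondence concrete.

```python
from itertools import combinations, permutations

def dns(G_edges, N, H_edges, n):
    # Exact density of H as an induced n-vertex subgraph of G.
    hits = total = 0
    for sample in combinations(range(N), n):
        total += 1
        induced = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)
                   if frozenset((sample[i], sample[j])) in G_edges}
        if any({frozenset((p[a], p[b])) for a, b in H_edges} == induced
               for p in permutations(range(n))):
            hits += 1
    return hits / total

# Footnote 14's example with n = 2: weight 1 on the single edge and 0 on
# the non-edge, so sum_H w_H * dns_H(G) is exactly the edge density.
path = {frozenset((0, 1)), frozenset((1, 2))}   # the 3-vertex path
edge_density = 1 * dns(path, 3, [(0, 1)], 2) + 0 * dns(path, 3, [], 2)
assert edge_density == 2 / 3                    # 2 edges out of C(3,2) = 3 pairs
```

The natural test of the proof then accepts with probability Σ_H (1 − wH) · dnsH(G), i.e., 1 minus the weighted density computed here.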

Discussion. As admitted upfront, Theorem 3.11 leaves open the question of which graph properties can be captured by sequences of wH's that satisfy the second condition (i.e., that being δ-far from Πw,b means satisfying Σ_H wH · dnsH(·) ≥ b + F(δ)). Generalizing Theorem 3.10, it is easy to prove the following.

Theorem 3.12 (Theorem 3.10, generalized): Let b ∈ [0, 1] and w = (wH)_{H:|V(H)|=n}. If the set {H : wH ≥ b} contains only graphs with no isolated vertices, then for every graph G that is δ-far from Πw,b it holds that Σ_H wH · dnsH(G) ≥ b + (dn/3) · δ, where d := b − max_{H:wH<b} {wH}, provided that δ > 6/|V(G)|. The same holds if {H : wH ≥ b} contains only graphs in which no vertex neighbors all other vertices.

When applying the argument used in proving Theorem 3.10, each iteration reduces the value of Σ_H wH · dnsH(·) by at least d · \binom{N−1}{n−1}/\binom{N}{n} = d · n/N. Thus, we obtain Σ_H wH · dnsH(G) ≥ b + (δn/3) · d.

In contrast to Theorem 3.12, we observe that not every Πw,b satisfies the second condition of Theorem 3.11. Specifically, we show the following.

Proposition 3.13 (violating the second condition of Theorem 3.11): There exist b ∈ (0, 1) and w = (wH)_{H:|V(H)|=O(1)} such that for every N there exists an N-vertex graph G that is Ω(1)-far from Πw,b and yet Σ_H wH · dnsH(G) = b + O(1/N).16

16 We avoid the final step in [38, Sec. 4] (and [37]), where each αH > 0 is replaced by αH = 1, yielding a deterministic decision (which in turn corresponds to F-freeness).


Note that this does not say that Πw,b has no POT, since such a POT may use an alternative characterization of the same property (i.e., Πw,b may equal Πw′,b′ such that the former violates the second condition of Theorem 3.11 whereas the latter satisfies this very condition). In Proposition 6.14 (see also Corollary 6.13) we provide an example of a property Π = ∪_{N∈ℕ} ΠN that does not have a POT, although for some b ∈ (0, 1) and every N ∈ ℕ there is a sequence w = (wH)_{H:|V(H)|=O(1)} such that ΠN is exactly the set of all N-vertex graphs in Πw,b. Thus, Proposition 6.14 asserts that a property Πw,b that satisfies the first condition of Theorem 3.11 does not necessarily have a POT.17

Proof. Consider any of the properties SCℓC ∩ R(k) asserted in Proposition 3.6, where k(N) = 2N/ℓt for some t ∈ ℕ \ Tℓ. Next, consider the tester for SCℓC ∩ R(k) described in the proof of Theorem 3.3: this tester selects a random sample of n = O(1) vertices, and inspects the corresponding induced subgraph, denoted H. Specifically, let H′ be the subgraph induced by the first three vertices and H″ be the subgraph induced by the other n − 3 vertices. Then, this tester accepts with probability α′H′ if H″ ∈ AS and rejects otherwise, where (α′H′)_{H′} denotes the sequence of probabilities used by the POT of R(k) and AS denotes the set of all possible induced subgraphs of graphs in SCℓC. Recall that, for some c′ ∈ (0, 1) and every graph G ∈ R(k), it holds that Σ_{H′} dnsH′(G) · α′H′ = c′ (since all graphs in R(k) are accepted by the corresponding POT with exactly the same probability). Denoting by H(H′, H″) the set of all n-vertex graphs H that induce the graphs H′ and H″ as above, we observe that for every graph G it holds that Σ_{H∈H(H′,H″)} dnsH(G) = dnsH′(G) · dnsH″(G) ± O(n/N). Note that for every H ∈ H(H′, H″), the test accepts a graph when seeing the induced subgraph H with probability αH = α′H′ if H″ ∈ AS and αH = 0 otherwise.
Then, for any N-vertex graph G, we have

    Σ_{H:|V(H)|=n} αH · dnsH(G)
        = Σ_{H′:|V(H′)|=3} Σ_{H″:|V(H″)|=n−3} Σ_{H∈H(H′,H″)} αH · dnsH(G)
        = Σ_{H′:|V(H′)|=3} α′H′ · Σ_{H″∈AS} ( Σ_{H∈H(H′,H″)} dnsH(G) )
        = Σ_{H′:|V(H′)|=3} α′H′ · Σ_{H″∈AS} [ dnsH′(G) · dnsH″(G) ± O(n/N) ]
        = ( Σ_{H′} α′H′ · dnsH′(G) ) · ( Σ_{H″∈AS} dnsH″(G) ) ± O(2^{n²}/N).

Recall that n = O(1) and thus O(2^{n²}/N) = O(1/N). Then, for the N-vertex graph G asserted in Proposition 3.6, we have Σ_{H:|V(H)|=n} αH · dnsH(G) = c′ − O(1/N), because G ∈ R(k) implies Σ_{H′} α′H′ · dnsH′(G) = c′, whereas the fact that G is O(1/N)-close to SCℓC implies that the density (in G) of induced subgraphs not in AS is at most O(n²/N). Finally, using the same translation as in the proof of Theorem 3.11 (i.e., b = 1 − c′ and wH = 1 − αH), we conclude that although G is Ω(1)-far from Πw,b it holds that Σ_H wH · dnsH(G) = b + O(1/N). The claim follows.

17 We note that Proposition 3.13 is not subsumed by Proposition 6.14, since the parameters w used in the former are independent of N, whereas in the latter w depends on N.

3.6

Impossibility results

It is easy to derive impossibility results regarding general POTs by considering two distributions on N-vertex graphs such that the following two conditions hold: (1) the two distributions cannot be distinguished by a constant number of queries, and (2) the first distribution is concentrated on graphs that have the property whereas the second distribution is concentrated on graphs that do not have the property.18 For example, wishing to prove that bipartiteness has no constant-query POT, we consider, for each constant q, the following two distributions that refer to ℓ = 2⌈q/2⌉ + 1: the first distribution consists of random isomorphic copies of an N-vertex graph that is obtained by a balanced blow-up of a single 2ℓ-cycle, and the second distribution is analogously obtained by a balanced blow-up of two ℓ-cycles. Thus, each graph in each of the two distributions consists of 2ℓ clouds such that each cloud is an independent set of size N/2ℓ, and the clouds are arranged either in a single 2ℓ-cycle or in two disjoint ℓ-cycles. Clearly, these distributions cannot be distinguished by an algorithm that makes fewer than ℓ queries, but graphs in the first distribution are bipartite whereas graphs in the second distribution are far from being bipartite. Thus, we get:

Theorem 3.14 Bipartiteness has no two-sided error POT.
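The pair of distributions used in this argument is easy to realize. The following sketch (construction helpers are ours) builds the two blow-up graphs for q = 4, so ℓ = 5, and confirms that the first is bipartite while the second is not; the indistinguishability claim (fewer than ℓ queries) is from the text and is not re-verified here.

```python
def blowup_cycles(num_cycles, length, cloud):
    # Disjoint union of `num_cycles` cycles of the given length, with each
    # cycle vertex blown up into an independent cloud of `cloud` vertices.
    edges = set()
    for c in range(num_cycles):
        for i in range(length):
            for u in range(cloud):
                for v in range(cloud):
                    edges.add(frozenset(((c, i, u), (c, (i + 1) % length, v))))
    return edges

def is_bipartite(edges):
    # Standard 2-coloring check by graph search.
    adj, color = {}, {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for s in adj:
        if s in color:
            continue
        color[s], stack = 0, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return False
    return True

ell = 5  # ell = 2*ceil(q/2) + 1 for query bound q = 4
assert is_bipartite(blowup_cycles(1, 2 * ell, cloud=3))   # one 2l-cycle: bipartite
assert not is_bipartite(blowup_cycles(2, ell, cloud=3))   # two odd l-cycles: not
```

Both graphs have the same number of vertices (here 30) and the same cloud size, matching the "2ℓ clouds of size N/2ℓ" description above.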

4

In the Bounded-Degree Graph Model

The bounded-degree graph model refers to a fixed degree bound, denoted d ≥ 2. An N-vertex graph G = ([N], E) (of maximum degree d) is represented in this model by a function g : [N] × [d] → {0, 1, ..., N} such that g(v, i) = u ∈ [N] if u is the ith neighbor of v, and g(v, i) = 0 if v has fewer than i neighbors. Distance between graphs is measured in terms of this representation (i.e., as the fraction of array entries, out of dN, on which the representations differ). The straightforward method for showing impossibility results (outlined in Section 3.6) is applicable also in the current (bounded-degree) model. To demonstrate this, we show that

18 The foregoing method directly establishes the non-existence of a two-sided error POT. Alternatively, one may use this method to show that the first condition in Theorem 3.11 is not satisfied. Indeed, using Theorem 3.11 merely allows replacing (1) by (1') the two distributions have the same densities of various induced subgraphs of constant size.


(for any constant q) the connectivity property has no q-query (two-sided error) POT in this model. The two distributions that we consider are: (1) a random isomorphic copy of the graph consisting of a single N-vertex Hamiltonian cycle, and (2) a random isomorphic copy of the graph consisting of N/(q + 1) isolated (q + 1)-vertex cycles. Thus, we get:

Theorem 4.1 Connectivity has no two-sided error POT (in the bounded-degree graph model, for any d ≥ 2).

Turning to positive results, we note that the properties of distributions studied in Section 2 give rise to graph properties that have POTs in the bounded-degree model. The first type of such graph properties refers to the edge densities of graphs, where in the current section densities are measured as a fraction of dN/2. (Note that a Boolean function f : [N] × [d] → {0, 1} can be defined such that f(v, i) = 1 if and only if g(v, i) ∈ [N].)19 As in Section 3, we are more interested in "genuine" graph properties, and the first type of properties that we consider refers to the density of isolated vertices in the graph. Recall that, for any sequence of t density thresholds, denoted τ = (τ1, ..., τt) ∈ [0, 1]^t, such that τ1 ≤ τ2 < τ3 ≤ τ4 < ··· ≤ τt, we considered (in Section 2) the class of distributions, denoted Dτ, consisting of all 0-1 random variables X such that for some i ≤ ⌈t/2⌉ it holds that τ2i−1 ≤ Pr[X = 1] ≤ τ2i. The corresponding class of bounded-degree graphs consists of graphs whose fraction of isolated vertices corresponds to a distribution in Dτ. That is, Gτ contains the N-vertex graph G if and only if G contains M isolated vertices such that the fraction M/N (viewed as a probability) is in Dτ.

Theorem 4.2 (POT for Gτ): For every τ = (τ1, ..., τt), the property Gτ has a two-sided error POT.

Proof. On input N and oracle access to an N-vertex graph G = ([N], E) of degree bound d, the tester proceeds as follows.

1. Select uniformly and independently t vertices, denoted u1, ..., ut, and explore their immediate neighborhoods; that is, for each i determine whether or not ui is isolated in G.

2. Let j ∈ {0, 1, ..., t} denote the number of isolated vertices seen in Step 1. Then, the tester accepts with probability αj, where (α0, α1, ..., αt) is the sequence of probabilities used by the POT guaranteed by Theorem 2.3 (i.e., the tester for Dτ).

Let c be the threshold probability associated with the tester of Theorem 2.3. Then, each graph G ∈ Gτ is accepted with probability at least c. On the other hand, we shall show that if G is

19 Thus, the fraction of 1-values in f equals the fraction of edges in the graph represented by g.


On the other hand, we shall show that if G is ε-far from being in Gτ , then the fraction of isolated vertices in G is (ε/4)-far from Dτ , and the theorem follows. Actually, the validity of this claim presupposes that all the thresholds in τ are multiples of 1/N , and we shall defer this issue to the end of the proof.

We shall prove the contrapositive (i.e., if the fraction of isolated vertices in G is ε-close to Dτ , then G is 4ε-close to Gτ ). Suppose that G is an N -vertex graph with M isolated vertices such that there exists p ∈ Dτ that satisfies |p − (M/N )| ≤ ε. If M > pN (and M < N ),[20] then we may decrement the number of isolated vertices by connecting any isolated vertex v to some non-isolated vertex. (If some non-isolated vertex u has degree smaller than d, then we connect v to u; else we connect v to an arbitrary vertex of degree d while omitting one of its current edges.) The case M < pN is slightly more complex, since we wish to turn some non-isolated vertex v into an isolated vertex. If each of the neighbors of v has degree at least two, then there is no problem. Otherwise, we may need to connect these neighbors among themselves so as to prevent them from becoming isolated. The details are omitted.

Note that the foregoing argument presupposes that pN is an integer, which is indeed the case when all the thresholds in τ are multiples of 1/N . Thus, our argument needs to be augmented to deal with the general case, in which the latter presumption does not hold. We distinguish between threshold pairs of the form τ2i−1 < τ2i and pairs of the form τ2i−1 = τ2i . In the first case, ignoring finitely many N 's, we may replace p ∈ [τ2i−1 , τ2i ] by p′ ∈ [τ2i−1 , τ2i ] ∩ {j/N : j = 0, 1, ..., N } (while increasing ε by at most 1/N , which is fine since it suffices to establish an upper bound of 4(ε + (1/N ))). In the second case, we should actually modify the algorithm and omit the pair (τ2i−1 , τ2i ) from τ . That is, the algorithm will refer to a modified τ that contains a pair of the form τ2i−1 = τ2i if and only if such a pair is a multiple of 1/N (for the current N ).[21]
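The decrement step (for the case M > pN) can be sketched in code. This is a minimal illustration, assuming graphs are given as adjacency-set dicts `adj` with degree bound `d` (these names are mine); like the sketch in the text, the degree-d branch assumes that some neighbor of the chosen vertex has degree greater than one:

```python
def decrement_isolated(adj, d):
    """Connect one isolated vertex v to a non-isolated vertex, reducing
    the number of isolated vertices by one while keeping all degrees <= d."""
    v = next(u for u in adj if not adj[u])        # an isolated vertex
    others = [u for u in adj if adj[u]]           # the non-isolated vertices
    for u in others:                              # a vertex with spare degree?
        if len(adj[u]) < d:
            adj[u].add(v); adj[v].add(u)
            return
    # All non-isolated vertices have degree d: free a slot on some u by
    # omitting an edge, choosing a neighbor w of degree > 1 so that the
    # omission does not create a new isolated vertex.
    u = others[0]
    w = next(x for x in adj[u] if len(adj[x]) > 1)
    adj[u].remove(w); adj[w].remove(u)
    adj[u].add(v); adj[v].add(u)
```

Each call performs O(1) edge modifications, matching the accounting in the proof.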

[20] If M = N , then it must be that pN ≤ N − 2, and thus connecting a pair of isolated vertices is fine.
[21] Indeed, this means that the algorithm may use up to 2^{t/2} different sequences τ , each having its own corresponding POT. This requires scaling the threshold probabilities of all these POTs so that they are all equal, and it is indeed crucial that we are dealing with a finite number of algorithms (or threshold probabilities).

Generalization. The foregoing treatment can be extended to properties that refer to the density of certain isolated patterns in the graph. Specifically, for any fixed family of graphs H, we denote by #H (G) the number of connected components in G that are isomorphic to some graph in H. Next, for any τ as above, we may consider the property GH,τ that consists of N -vertex graphs G such that the fraction #H (G)/N is in Dτ . (Indeed, Gτ is the special case obtained by letting H be a singleton consisting of the 1-vertex graph.) The integrality issue (i.e., the τi 's not necessarily being multiples of 1/N ) dealt with at the end of the proof of Theorem 4.2 takes a more acute form in the current setting, since if H consists only of n-vertex graphs then #H (G)/N resides in the interval [0, 1/n] (rather than in [0, 1]). Therefore, letting s(H) denote

the (number of vertices in the) smallest graph in H, we may restrict our attention to the interval [0, 1/s(H)].

Theorem 4.3 (POT for GH,τ ): For every H and every τ = (τ1 , ..., τt ) such that τt ≤ 1/s(H), the property GH,τ has a two-sided error POT.

Proof  We build on the proof of Theorem 4.2, while somewhat adapting both the tester and its analysis. For starters, the tester should look for isolated copies of graphs in H (rather than isolated vertices), and count them in proportion to their size (which reflects the probability that a uniformly selected vertex hits such a copy). Let n = n(H) denote the (number of vertices in the) largest graph in H. Then, on input N and oracle access to an N -vertex graph G = ([N ], E) of degree bound d, the modified tester proceeds as follows.

1. Select uniformly and independently t vertices, denoted u1 , ..., ut , and explore the neighborhood of each vertex ui till discovering at most n + 1 vertices. For each i ∈ [t], let pi = 1/|V (H)| if ui resides in a connected component of G that is isomorphic to some H ∈ H, and pi = 0 otherwise.

2. For each i ∈ [t], let ci = 1 with probability pi and ci = 0 otherwise, and let j = Σ_{i=1}^{t} ci . Then, the tester accepts with probability αj , where (α0 , α1 , ..., αt ) is the sequence of probabilities used by the POT that is guaranteed by Theorem 2.3 (i.e., the tester for Dτ ).

Let c be the threshold probability associated with the tester of Theorem 2.3. Then, each graph G ∈ GH,τ is accepted with probability at least c, since for each i it holds that Pr[ci = 1] = #H (G)/N . On the other hand, we shall show that if G is ε-far from being in GH,τ , then #H (G)/N is Ω(ε)-far from Dτ , and the theorem follows. Following the proof of Theorem 4.2, we show how to decrement and increment the number of good connected components in a graph, where a component is called good if it is isomorphic to some H ∈ H (and is bad otherwise).
We consider two cases that refer to whether or not the single-vertex graph is in H (i.e., whether or not s(H) = 1). We start with the case that s(H) > 1 (i.e., an isolated vertex is a bad component). In this case, we can decrement the number of good components by omitting all edges that appear in an arbitrary good component, turning this component into a collection of isolated vertices (which are bad components in this case). To increment the number of good components, we may combine s(H) vertices that are taken from bad components, while keeping each of these components bad by either maintaining its connectivity (by adding edges, if it contains more than n vertices) or replacing it by isolated vertices (if this component contains at most n vertices). Thus, each decrement or increment operation is charged with O(n^2) edge modifications. This completes the treatment of the case s(H) > 1.

We now turn to the case that s(H) = 1 (i.e., an isolated vertex is a good component). If we wish to decrement the number of good components, then we pick a (largest) good component, and connect it to any bad connected component (or to another good component if all components are good). (This connection is made via a pair of vertices of degree less than d, and if no such vertex exists in the relevant component then we create it by omitting an arbitrary edge.) This operation either decreases the number of good components or increases the size of the largest good component, and so we can decrease the number of good components by O(n) edge modifications. (Note that in case we connect two good components, the number of good components may decrease by two units.)

If we wish to increment the number of good components, then we select a vertex that belongs to any bad connected component (or to a non-singleton good component if all components are good), and disconnect it from its current neighborhood, thus creating a new isolated vertex (which is a good component). When disconnecting this vertex from its neighbors, we may add edges so as to maintain the connectivity of this component. Note that when modifying the said component, we may turn a bad component into a good one (or turn a good one into a bad one). Thus, either the number of good components increases (by either one or two units) or a bad component is created and can be used in our next attempt.

The foregoing description suffices for getting the number of good components to either equal the desired number or be one unit below the desired number. To close this final gap, we make two observations.

1. Suppose that the graph contains at least n + 2 vertices in bad components. Then, by picking at most (n + 2)/2 bad components that contain m ≥ n + 2 vertices, we can form a new collection of connected components with exactly one good component (by creating a single isolated vertex and a single bad component that contains all the other m − 1 ≥ n + 1 vertices).

2. Suppose that the graph contains at least n + 2 good non-singleton components. Then, by picking n + 2 such components, we can form a new collection of connected components with exactly n + 3 good components (by creating n + 3 isolated vertices and a single bad component that contains all the other vertices, the number of which is at least 2(n + 2) − (n + 3) > n).

In both cases, O(n^2) edge modifications are used. The only case where we cannot apply either of these observations is when the number of isolated vertices is N − O(n^2). Fortunately, we can ignore this case, because it may occur only if 1 ∈ Dτ , and in such a case we may just increase the number of isolated vertices to N in the trivial manner. This completes the treatment of the case s(H) = 1.
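The sampling step of this tester can be sketched as follows. This is my illustrative reconstruction (the names `component_up_to`, `isomorphic`, and `p_value` are assumptions, and the brute-force isomorphism check is reasonable only for constant-size patterns, as in the setting of the proof):

```python
from itertools import permutations

def component_up_to(adj, u, limit):
    """Explore the connected component of u, giving up (returning None)
    once more than `limit` vertices have been discovered."""
    seen, stack = {u}, [u]
    while stack:
        w = stack.pop()
        for x in adj[w]:
            if x not in seen:
                seen.add(x)
                if len(seen) > limit:
                    return None
                stack.append(x)
    return seen

def isomorphic(adj, comp, H):
    """Brute-force isomorphism check of the component `comp` of `adj`
    against H (an adjacency dict on vertices 0..k-1); fine for tiny H."""
    comp, k = sorted(comp), len(H)
    if len(comp) != k:
        return False
    Hedges = {(a, b) for a in H for b in H[a]}
    for perm in permutations(range(k)):
        m = dict(zip(comp, perm))
        if all((b in adj[a]) == ((m[a], m[b]) in Hedges)
               for a in comp for b in comp if a != b):
            return True
    return False

def p_value(adj, u, Hs):
    """Step 1 of the tester: p_i = 1/|V(H)| if the sampled vertex u lies in
    a component isomorphic to some H in Hs, and p_i = 0 otherwise."""
    n = max(len(H) for H in Hs)          # the largest pattern size n
    comp = component_up_to(adj, u, n)    # stop after discovering n+1 vertices
    if comp is None:
        return 0.0
    for H in Hs:
        if isomorphic(adj, comp, H):
            return 1.0 / len(H)
    return 0.0
```

The weighting by 1/|V(H)| is exactly what makes E[c_i] equal the component density #H(G)/N, since a uniformly selected vertex hits a given good component with probability |V(H)|/N.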


5  Classes of Non-binary Distributions

In this section we generalize the results from Section 2 to distributions over larger (finite) domains. We give a characterization of the classes of distributions that have a two-sided error POT. For r ∈ N we shall identify a distribution q = (q1 , . . . , qr ) on [r] with a point in ∆(r) , where

    ∆(r) = {(q1 , . . . , qr ) ∈ [0, 1]^r : Σ_{i∈[r]} qi = 1}.    (10)

Similarly, a class of distributions with domain [r] will be identified with a subset of ∆(r) in a natural way. The special case of boolean distributions discussed in Section 2 corresponds to r = 2, for which ∆(2) = {(p, 1 − p) : p ∈ [0, 1]}.

5.1  Characterizing the classes of distributions that have a POT

The following result asserts that a class of distributions has a POT if and only if there exists a polynomial that is non-negative exactly on the points that correspond to distributions in that class. Thus, the question of whether or not there exists a POT for Π ⊆ ∆(r) reduces to whether or not some polynomial can be non-negative on Π and negative on ∆(r) \ Π.

Theorem 5.1 (POT and polynomials in the context of distribution testing): Let Π be an arbitrary class of distributions q = (q1 , . . . , qr ) with domain [r]; that is, Π ⊆ ∆(r) . Then, Π has a two-sided error POT if and only if there is a polynomial P : ∆(r) → R such that for every distribution q = (q1 , . . . , qr ) ∈ ∆(r) it holds that

    P(q1 , . . . , qr ) ≥ 0  ⇐⇒  q ∈ Π.    (11)

If the total degree of P is t, then Π has a two-sided error POT TΠ that makes t queries and has polynomial detection probability ϱ(ε) = Ω(ε^C ), where C < t^{O(r)} .[22] Moreover, the acceptance probability of TΠ when testing q ∈ ∆(r) can be written as

    Pr[TΠ accepts q] = 1/2 + δ · P(q1 , . . . , qr )    (12)

for some constant δ > 0 that depends only on the degree of P and on an upper bound on the absolute value of all coefficients of P.

Proof  The "only if" direction is proved by using the independence of the samples of the given distribution. Consider a POT TΠ for Π that makes t sampling queries and accepts each distribution in Π with probability at least c.

[22] The constant in the Ω() notation depends on P, while the O() notation hides some absolute constant.


When testing q = (q1 , . . . , qr ), for every view v = (v1 , . . . , vt ) ∈ [r]^t , the probability of seeing this view is ∏_{i=1}^{t} q_{v_i} . Denoting by αv the probability that the tester accepts the view v = (v1 , . . . , vt ), we have

    Pr[TΠ accepts q] = Σ_{v=(v1,...,vt)∈[r]^t} ( ∏_{i=1}^{t} q_{v_i} ) · αv .

Define a polynomial P by

    P(q1 , . . . , qr ) = ( Σ_{v∈[r]^t} αv ∏_{i=1}^{t} q_{v_i} ) − c.

Then, by the definition of the tester, P satisfies (11).

For the other direction, let P : ∆(r) → R be a polynomial of degree t. We show that the class

    Π = {(q1 , . . . , qr ) ∈ ∆(r) : P(q1 , . . . , qr ) ≥ 0}    (13)

has a POT that makes t queries and has polynomial detection probability. In order to simplify the proof, we shall slightly modify P, while making sure that the modification does not affect Π in (13). Specifically, we multiply each monomial of degree d < t (of P) by (Σ_{i∈[r]} qi )^{t−d} . This does not change the value of P on ∆(r) , and hence does not affect Π.[23]

[23] This grouping of monomials into homogeneous monomials maps at most 2^t monomials to a single homogeneous monomial, and thus the coefficients of P may grow by a factor of at most 2^t .

Henceforth we shall assume that P is a homogeneous polynomial of degree t, and therefore it can be written as

    P(q1 , . . . , qr ) = Σ_{v∈[r]^t} αv ∏_{i=1}^{t} q_{v_i}    (14)

for some coefficients αv ∈ R. Assume that Π is non-trivial; this implies that not all coefficients αv are zero. Given (14), we define a POT TΠ for Π as follows. The tester makes t queries to the given distribution, gets t samples, denoted v = (v1 , . . . , vt ), and accepts with probability

    βv = (1 + δ · αv )/2,


where we choose δ = 1/(2 · max{|αv | : v ∈ [r]^t }) > 0, in order to ensure that βv ∈ [0, 1] for all v. Therefore, when testing q = (q1 , . . . , qr ) the acceptance probability of the test is

    Pr[TΠ accepts q] = Σ_{v∈[r]^t} βv ∏_{i=1}^{t} q_{v_i} = 1/2 + δ · Σ_{v∈[r]^t} αv ∏_{i=1}^{t} q_{v_i} ,

and hence, by (14), the equality above becomes

    Pr[TΠ accepts q] = 1/2 + δ · P(q1 , . . . , qr ).    (15)

Next, we analyze the acceptance probability in (15). If q ∈ Π, then, by (13), we have P(q1 , . . . , qr ) ≥ 0, and therefore

    Pr[TΠ accepts q] ≥ 1/2.

Assume q is ε-far from Π. Then, in particular, q ∉ Π, and hence P(q1 , . . . , qr ) < 0. Thus, using (15), we have Pr[TΠ accepts q] < 1/2. In order to prove that TΠ is a POT, we need to show that Pr[TΠ accepts q] is bounded away from 1/2 by some function that depends on ε. This type of result is known in real algebraic geometry as the Łojasiewicz inequality (see [19, Chapter 2.6]). Specifically, we use the following theorem of Solernó [69].

Theorem 5.2 (Effective Łojasiewicz inequality): Let P : ∆(r) → R be a polynomial, and let Π = {(p1 , . . . , pr ) ∈ ∆(r) : P(p1 , . . . , pr ) ≥ 0}. Assume that for q = (q1 , . . . , qr ) ∈ ∆(r) it holds that

    dist(q, Π) = inf{ (1/2) Σ_{i∈[r]} |qi − pi | : (p1 , . . . , pr ) ∈ Π } > ε.

Then, P(q1 , . . . , qr ) < −Ω(ε^C ) for some constant C < deg(P)^{O(r)} , where the constant in the Ω() notation depends on P, and the O() notation hides some absolute constant.

By applying Theorem 5.2 to (15), we conclude that if q ∈ ∆(r) is ε-far from Π, then Pr[TΠ accepts q] < 1/2 − Ω(ε^C ), where C < deg(P)^{O(r)} . This completes the proof of Theorem 5.1.

Corollaries: As hinted upfront, Theorem 5.1 provides a tool towards proving both positive and negative results regarding the existence of POTs for various properties. First, we use this tool to show that classes of distributions that have a POT are closed under taking disjoint unions. Next, we use it to present POTs for some concrete properties of interest, and lastly we use it to derive impossibility results about other concrete properties of interest.

5.2  Closure under disjoint union

Recall that in the standard property testing model, as well as in the one-sided error POT model, testable properties are closed under union. However, for properties of distributions with two-sided error POTs, closure under union does not hold: indeed, in Proposition 5.13 (see Section 5.4), we show two properties that have two-sided error POTs, but their union does not have a POT. Nevertheless, we prove next that if two disjoint classes of distributions have two-sided error POTs, then so does their union.

Corollary 5.3 (closure under disjoint union): Let Π and Π′ be two disjoint classes of distributions with domain [r], and suppose that both Π and Π′ have a two-sided error POT. Then their union Π ∪ Π′ also has a two-sided error POT.

Proof  By Theorem 5.1, if Π has a POT, then there is a polynomial P : ∆(r) → R such that Π = {q ∈ ∆(r) : P(q) ≥ 0}. Similarly, there is a polynomial P′ : ∆(r) → R such that Π′ = {q ∈ ∆(r) : P′(q) ≥ 0}. Define a polynomial Punion : ∆(r) → R by Punion (q) = −P(q) · P′(q). Since Π and Π′ are disjoint subsets of ∆(r) , it holds that Punion (q) ≥ 0 if and only if q ∈ Π ∪ Π′: indeed, if q ∈ Π, then q ∉ Π′ (since the classes are disjoint), and therefore Punion (q) = −P(q) · P′(q) ≥ 0. Similarly Punion (q) ≥ 0 for q ∈ Π′. On the other hand, if q ∉ Π ∪ Π′, then P(q) < 0 and P′(q) < 0, and hence Punion (q) = −P(q) · P′(q) < 0. By Theorem 5.1, the class Π ∪ Π′ has a two-sided error POT.

By applying Corollary 5.3 repeatedly, we get

Corollary 5.4  Let Π1 , . . . , Πk be disjoint classes of distributions with domain [r], and suppose that each of the classes Πi has a two-sided error POT. Then their union Π = ∪_{i=1}^{k} Πi also has a two-sided error POT.

In Proposition 5.5 we strengthen the foregoing corollary. Specifically, we prove a result specifying (and improving) the query complexity and the detection probability of a POT for a disjoint union of properties. We defer the proof of the proposition to Appendix A.3.
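The sign argument in the proof of Corollary 5.3 can be sanity-checked numerically. In the sketch below, the two polynomials, chosen for the disjoint classes {q : q1 ≥ 0.7} and {q : q1 ≤ 0.2} over ∆(3) (with q1 being the first coordinate, `q[0]` in the 0-indexed code), are illustrative choices of mine, not examples from the text:

```python
def P(q):        # polynomial for Pi  = {q in Delta(3) : q[0] >= 0.7}
    return q[0] - 0.7

def Pprime(q):   # polynomial for Pi' = {q in Delta(3) : q[0] <= 0.2}
    return 0.2 - q[0]

def P_union(q):  # Corollary 5.3: non-negative exactly on the union
    return -P(q) * Pprime(q)

# Sweep a grid of ternary distributions and verify the sign condition.
steps, ok = 50, True
for i in range(steps + 1):
    for j in range(steps + 1 - i):
        q = (i / steps, j / steps, (steps - i - j) / steps)
        in_union = P(q) >= 0 or Pprime(q) >= 0
        ok = ok and ((P_union(q) >= 0) == in_union)
```

The disjointness is what makes the sign of −P · P′ work out: on either class, exactly one factor is non-negative and the other is negative, so the product of the two is non-positive.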


Proposition 5.5 (Corollary 5.4, revisited): Let Π1 , . . . , Πk be disjoint classes of distributions with domain [r]. Suppose that for each i ∈ [k] the class Πi has a two-sided error POT that makes ti queries and has detection probability ϱi . Then their union Π = ∪_{i∈[k]} Πi has a two-sided error POT that makes Σ_{i∈[k]} ti queries and has detection probability Ω(min{ϱi : i ∈ [k]}).

Closure under complementation: It is natural to ask whether properties having POTs are closed under taking the complement. Note, however, that if Π has a POT, then Π = {q ∈ ∆(r) : P(q) ≥ 0} for some polynomial P : ∆(r) → R, and thus Π is a closed[24] subset of ∆(r) . Hence, its complement is an open set, and cannot have a POT. Still, it would be natural to conjecture that the closure[25] of the complement, denoted cl(∆(r) \ Π), has a POT. In Appendix A.5 we show that this is not true in general, by presenting a class of distributions Π ⊆ ∆(3) that has a POT, such that cl(∆(3) \ Π) does not have one.

5.3  Positive corollaries

In this section we give several concrete examples of properties that have two-sided error POTs. We first note that (a weaker quantitative version of) Theorem 2.3 can be derived by observing that for every segment there exists a quadratic polynomial that is non-negative on this segment and negative outside it. Indeed, this claim appears explicitly in Section 2 and underlies all results presented there. Still, the quadratic detection probability in Theorem 2.3 does not follow from Theorem 5.1, and requires explicit calculations as done in Section 2. For similar reasons, the proof of the following corollary requires an explicit calculation beyond the application of Theorem 5.1.

Corollary 5.6 (classes containing a single distribution have POTs): For a fixed r ≥ 2, and a distribution p = (p1 , . . . , pr ) with domain [r], let Π(p) be the class consisting of the single distribution p. Then, the property Π(p) has a two-sided error POT that makes two queries and has quadratic detection probability.

Proof  Define a quadratic polynomial P in r variables that is negative for all (q1 , . . . , qr ) ∈ ∆(r) \ {(p1 , . . . , pr )} and equals zero at (p1 , . . . , pr ). Specifically, let

    P(q1 , . . . , qr ) = − Σ_{i∈[r]} (qi − pi )^2 .

[24] A set A ⊆ ∆(r) is a closed subset of ∆(r) if the complement set ∆(r) \ A is open in ∆(r) , where a set B is open in ∆(r) if each point in B has a small neighborhood that is contained in B; that is, for every q ∈ B there exists an ε > 0 such that every q′ ∈ ∆(r) that is at distance at most ε from q is actually in B.
[25] For a set A ⊆ ∆(r) , the closure of A is the set of all q ∈ ∆(r) that are arbitrarily close to A; that is, q is in the closure of A if for every ε > 0 there is q′ ∈ A such that dist(q, q′) < ε.


It is immediate that P satisfies the required conditions. Therefore, by Theorem 5.1, we get a POT for Π(p) that makes 2 queries. In order to show that the POT has quadratic detection probability, by Theorem 5.1 it is enough to show that for any q ∈ ∆(r) that satisfies dist(q, p) > ε it holds that P(q1 , . . . , qr ) < −Ω(ε^2 ). Indeed, by the Cauchy–Schwarz inequality we have

    P(q1 , . . . , qr ) = − Σ_{i∈[r]} (qi − pi )^2 ≤ − (1/r) · ( Σ_{i∈[r]} |qi − pi | )^2 < − (1/r) · ε^2 ,

which completes the proof.

By combining Corollary 5.6 with Proposition 5.5 we can test any property that consists of finitely many distributions.

Corollary 5.7 (finite classes of distributions have POTs): Fix r ≥ 2 and k ≥ 2, and let Π be a property that contains exactly k distributions with domain [r]. Then, Π has a POT that makes 2k queries and has quadratic detection probability.

POTs for infinite classes of distributions. By slightly generalizing the proof of Corollary 5.6, we show POTs for classes that contain infinitely many distributions. Specifically, we first show that any class of distributions that corresponds to an ellipsoid has a POT. Let p = (p1 , . . . , pr ) be a distribution, and let B = (B0 ; B1 , . . . , Br ) ∈ R^{r+1} be such that B0 ≥ 0 and Bi > 0 for all i ∈ [r]. Let Π(p,B) be the class of distributions that lie within an ellipsoid centered at p = (p1 , . . . , pr ) with radii (√(B0 /B1 ), . . . , √(B0 /Br )). That is,

    Π(p,B) = { q = (q1 , . . . , qr ) ∈ ∆(r) : Σ_{i∈[r]} Bi (qi − pi )^2 ≤ B0 }.    (16)

In the special case of B0 = 0, the property Π(p,B) contains exactly one distribution; that is, it corresponds to the properties discussed in Corollary 5.6.

Corollary 5.8 (classes that correspond to ellipsoids have POTs): Fix r ≥ 2, and let p = (p1 , . . . , pr ) and B = (B0 ; B1 , . . . , Br ) be as above. Then, the property Π(p,B) has a two-sided error POT that makes two queries and has polynomial detection probability.

Proof  As in the proof of Corollary 5.6, define a polynomial P in r variables that is non-negative for all points (q1 , . . . , qr ) in the ellipsoid and negative outside the ellipsoid. Specifically, let

    P(q1 , . . . , qr ) = B0 − Σ_{i∈[r]} Bi (qi − pi )^2 .


By (16), it is immediate that P satisfies the required conditions. The corollary follows by applying Theorem 5.1.

In Appendix A.4 we strengthen Corollary 5.8; specifically, we show that for B0 > 0 the POT given in the foregoing proof has linear detection probability (rather than some polynomial detection probability, as guaranteed by Theorem 5.1).

Corollary 5.9 (POT for classes of distributions that satisfy polynomial equalities): Let P1 , ..., Pt : R^r → R be polynomials of total degree at most d. Then, the class of distributions

    { q = (q1 , . . . , qr ) ∈ ∆(r) : (∀i ∈ [t]) Pi (q1 , ..., qr ) = 0 }    (17)

has a two-sided error POT. Furthermore, this POT makes 2d queries and has polynomial detection probability.

An appealing special case of Corollary 5.9 refers to the case that each Pi asserts the equality of two arguments (i.e., Pi (q1 , . . . , qr ) = q_{i1} − q_{i2} ). This case corresponds to the class of distributions in which some outcomes occur with the same probability (i.e., the probability that the outcome is i1 equals the probability that the outcome is i2 ).

Proof  Consider the polynomial P(q1 , . . . , qr ) = − Σ_{i∈[t]} Pi (q1 , . . . , qr )^2 . Clearly, P(q1 , . . . , qr ) ≥ 0 if and only if for every i ∈ [t] it holds that Pi (q1 , . . . , qr ) = 0. The corollary follows by applying Theorem 5.1.
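A tiny sanity check of this special case (domain size r = 3 and the single constraint q1 = q2 are my illustrative choices, not examples from the text):

```python
def P(q):
    # Corollary 5.9 with the single constraint P1(q) = q[0] - q[1]: the
    # class is {q in Delta(3) : q[0] = q[1]}, and P(q) = -P1(q)^2.
    return -(q[0] - q[1]) ** 2

# P is non-negative exactly on the class (zero on it, negative off it),
# so Theorem 5.1 yields a POT; here d = 1, so it makes 2d = 2 queries.
on_class = [(0.3, 0.3, 0.4), (0.5, 0.5, 0.0)]
off_class = [(0.6, 0.4, 0.0), (0.2, 0.5, 0.3)]
```

Note that P never takes positive values, which is fine: Theorem 5.1 only requires the sign condition P(q) ≥ 0 ⇔ q ∈ Π.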

5.4  Negative corollaries

In this section we give examples of classes of distributions that do not have a POT. We start with a simple claim.

Claim 5.10 (a polynomial cannot be non-negative only in one quarter): Let B ⊆ R2 be a neighborhood[26] of (0, 0) ∈ R2 . Then, there is no real polynomial P : R2 → R such that for all (x, y) ∈ B it holds that

1. P(x, y) ≥ 0, if x < 0 and y < 0;
2. P(x, y) ≤ 0, if either x > 0 or y > 0.

[26] A set B ⊆ R^n is a neighborhood of x ∈ R^n if for some δ > 0 it holds that {y ∈ R^n : dist(x, y) < δ} ⊆ B, where dist(x, y) = (1/2) Σ_{i∈[n]} |xi − yi |.


Proof  Assume towards contradiction that there exists a polynomial satisfying the conditions of the claim, and let P be such a polynomial of minimal degree. The conditions on P, together with the continuity of P, imply that P(x, 0) = 0 for all negative x of sufficiently small absolute value (since for all sufficiently small δ > 0 it holds that (x, δ), (x, −δ) ∈ B, which implies P(x, δ) ≤ 0 and P(x, −δ) ≥ 0). By considering P(x, 0) as a univariate polynomial in x, we conclude that for some small enough δ > 0 it holds that P(x, 0) = 0 for all x ∈ [−δ, 0]. Thus, this univariate polynomial must be identically zero (i.e., P(x, 0) = 0 for all x ∈ R), which means that P(x, y) can be written as P(x, y) = y · P′(x, y) for some polynomial P′ : R2 → R of degree smaller than deg(P). Then, for all (x, y) ∈ B, we have

1. P′(x, y) ≤ 0, if x < 0 and y < 0;
2. P′(x, y) ≤ 0, if x < 0 and y > 0;
3. P′(x, y) ≥ 0, if x > 0 and y < 0;
4. P′(x, y) ≤ 0, if x > 0 and y > 0.

Thus, by considering the polynomial P″(x, y) = P′(−x, y), we get a polynomial that satisfies the conditions of the claim, whose degree is strictly smaller than deg(P). This contradicts the choice of P.

By "shifting", we obtain the following corollary to Claim 5.10.

Corollary 5.11  Let (x0 , y0 ) ∈ R2 and let B ⊆ R2 be a neighborhood of (x0 , y0 ). Then, there is no real polynomial P : R2 → R such that for all (x, y) ∈ B it holds that

1. P(x, y) ≥ 0, if x < x0 and y < y0 ;
2. P(x, y) ≤ 0, if either x > x0 or y > y0 .

Proof  Assume towards contradiction that there exists a polynomial P : R2 → R satisfying the foregoing conditions. Define a polynomial P′(x, y) = P(x + x0 , y + y0 ), and let B′ = {(x, y) : (x + x0 , y + y0 ) ∈ B} be a neighborhood of (0, 0). The set B′ satisfies the conditions of Claim 5.10, whereas the polynomial P′ violates its conclusion. This contradicts the hypothesis that the polynomial P exists.

Now, using Theorem 5.1 and Corollary 5.11, we show a negative result for a large family of classes of distributions that correspond to polytopes in ∆(r) . For the sake of simplicity, we restrict ourselves to ternary distributions (i.e., r = 3).[27] A more general result will appear in Proposition 5.15.

[27] Note that by Theorem 2.2 every class of binary distributions that corresponds to a polytope in ∆(2) has a POT (since such a polytope must be a segment in R2 ). Thus, r ≥ 3 is necessary for a result of the kind of Proposition 5.12.


Proposition 5.12 (a simple impossibility result): For β, γ ∈ (0, 1) let Πβ,γ be the class of distributions q = (q1 , q2 , q3 ) with domain {1, 2, 3} such that q1 ≤ β and q2 ≤ γ; that is, Πβ,γ = {q = (q1 , q2 , q3 ) : q1 ≤ β, q2 ≤ γ}. If β + γ < 1, then the class Πβ,γ does not have a POT.

Proof  Let β, γ ∈ (0, 1) be such that β + γ < 1, and assume, towards a contradiction, that Πβ,γ has a POT. Then, according to Theorem 5.1, there is a polynomial P : ∆(3) → R such that

1. P(q1 , q2 , q3 ) ≥ 0, if q1 ≤ β and q2 ≤ γ;
2. P(q1 , q2 , q3 ) < 0, otherwise.

By substituting q3 = 1 − q1 − q2 , the polynomial P induces a bi-variate polynomial P′ : B → R, where B = {(q1 , q2 ) ∈ R2 : q1 , q2 ≥ 0 and q1 + q2 ≤ 1}, such that

1. P′(q1 , q2 ) ≥ 0, if q1 ≤ β and q2 ≤ γ;
2. P′(q1 , q2 ) < 0, otherwise.

Now, if β + γ < 1, then B is a neighborhood of (β, γ) ∈ R2 . Hence, the existence of such a P′ contradicts Corollary 5.11. Therefore the property Πβ,γ does not have a POT.

Remark: The restriction β + γ < 1 in Proposition 5.12 is necessary. Indeed, if β + γ ≥ 1, then the class Πβ,γ = {q = (q1 , q2 , q3 ) : q1 ≤ β, q2 ≤ γ} has a two-sided error POT, as can be seen by considering the polynomial Pβ,γ (q1 , q2 , q3 ) = (β − q1 )(γ − q2 ). The reader can easily verify that for all (q1 , q2 , q3 ) ∈ ∆(3) it holds that Pβ,γ (q1 , q2 , q3 ) ≥ 0 if and only if q1 ≤ β and q2 ≤ γ. Thus, by Theorem 5.1, for β + γ ≥ 1, the property Πβ,γ has a POT.

Next, we use a similar argument to present two classes of distributions that have two-sided error POTs, whose union does not have one.

Proposition 5.13 (failure of closure under non-disjoint union): Fix β, γ > 0 that satisfy β + γ < 1. Let Π1 = {q = (q1 , q2 , q3 ) : q1 ≥ β} and Π2 = {q = (q1 , q2 , q3 ) : q2 ≥ γ} be classes of distributions with domain {1, 2, 3}. Then, both Π1 and Π2 have a two-sided error POT, while their union does not have one.

Proof  Clearly Π1 has a two-sided error POT that makes one query and has linear detection probability:[28] specifically, let the test make one query to the given distribution and accept if and only if the outcome is 1. If q ∈ Π1 , then the test accepts with probability at least β, while any distribution q ∈ ∆(3) that is ε-far from Π1 is accepted with probability at most β − ε. Similarly Π2 has a two-sided error POT.

[28] Indeed, this test is analogous to the first example that was presented in the introduction (and labeled "straightforward").

We prove that Π1 ∪ Π2 = {q = (q1 , q2 , q3 ) : q1 ≥ β or q2 ≥ γ} does not have a POT by, essentially, repeating the argument in the proof of Proposition 5.12. Assume towards contradiction that Π1 ∪ Π2 has a POT. Then, there is a polynomial P : ∆(3) → R such that P(q1 , q2 , q3 ) ≥ 0 if and only if either q1 ≥ β or q2 ≥ γ. By substituting q3 = 1 − q1 − q2 , the polynomial P induces a bi-variate polynomial P′ : B → R, where B = {(q1 , q2 ) ∈ R2 : q1 , q2 ≥ 0 and q1 + q2 ≤ 1}, such that

1. P′(q1 , q2 ) ≥ 0, if either q1 ≥ β or q2 ≥ γ;
2. P′(q1 , q2 ) < 0, if q1 < β and q2 < γ.

By letting P″(q1 , q2 ) = −P′(q1 , q2 ) we get a polynomial that violates the conclusion of Corollary 5.11. Hence, we conclude that the polynomials P′ and P cannot exist, thus contradicting the hypothesis that Π1 ∪ Π2 has a POT.

Finally, in the following Proposition 5.15 we generalize Proposition 5.12. Specifically, Proposition 5.15 makes assertions regarding the boundaries of properties having a POT, which in turn may lead to impossibility results (regarding POTs). We will need the following definition of the boundary of a class of distributions.

Definition 5.14 (boundary of subsets of ∆(r) ): Let Π ⊆ ∆(r) be a class of distributions with domain [r]. The boundary of Π, denoted ∂Π, is the set of all distributions q = (q1 , . . . , qr ) ∈ ∆(r) that are arbitrarily close both to Π and to ∆(r) \ Π. That is, q ∈ ∂Π if, for every ε > 0, there is a distribution q′ ∈ Π such that dist(q, q′) < ε, and there is a distribution q″ ∈ ∆(r) \ Π such that dist(q, q″) < ε. In particular, the boundary of ∆(r) is the empty set.

Proposition 5.15 (on the boundaries of properties that have a POT): Let Π ⊆ ∆(r) be a property of distributions and suppose that Π has a POT.

1. If P : ∆(r) → R is a polynomial that satisfies Π = {q ∈ ∆(r) : P(q) ≥ 0}, then P(q) = 0 for all q ∈ ∂Π. In particular, ∂Π ⊆ Π.

2. Let S ⊆ ∆(r) be a non-trivial segment.[29] If S ⊆ ∂Π, then Π contains the entire line that goes through S (restricted to ∆(r) ).

[29] A non-trivial segment S is defined by two distinct points p, q ∈ ∆(r) , and is the set of all convex combinations of these points; that is, S = {λp + (1 − λ)q : λ ∈ [0, 1]}. The line that goes through S is the set {λp + (1 − λ)q : λ ∈ R}.


3. More generally, let S ⊆ ∆(r) be a convex set[30] containing more than a single point. If S ⊆ ∂Π, then Π contains the entire affine hull[31] of S (restricted to ∆(r) ).

Proposition 5.12 can be derived by using a special case of the second item: indeed, for β + γ < 1, the segment S = {(β, t, 1 − β − t) : t ∈ [0, γ]} is contained in the boundary of Πβ,γ (which consists of S ∪ {(t, γ, 1 − t − γ) : t ∈ [0, β]}). On the other hand, the line {q = (β, t, 1 − β − t) : t ∈ [0, 1 − β]} that contains S is not contained in Πβ,γ (since (β, 1 − β, 0) ∉ Πβ,γ , as 1 − β > γ). Thus, Πβ,γ violates the assertion of this item, and so it cannot have a POT. More generally, Proposition 5.15 implies, for example, that a large family of classes of distributions Π ⊆ ∆(r) that correspond to polytopes do not have POTs (see Corollary 5.16 below).

Proof  Let P : ∆(r) → R be a polynomial such that Π = {q ∈ ∆(r) : P(q) ≥ 0}, and let q ∈ ∂Π. Then, for every ε > 0, there is some q′ ∈ ∆(r) such that dist(q, q′) < ε and q′ ∈ Π, and hence P(q′) ≥ 0. Therefore, by the continuity of P, we have P(q) ≥ 0. Similarly, for every ε > 0, there is some q′ ∈ ∆(r) such that dist(q, q′) < ε and q′ ∉ Π, and hence P(q′) < 0. Therefore, again by the continuity of P, we have P(q) ≤ 0. We conclude that P(q) = 0 for all q ∈ ∂Π, and in particular ∂Π ⊆ Π. This completes the proof of the first part.

For the second part, let S ⊆ ∂Π be a segment, and let P : ∆(r) → R be a polynomial such that Π = {q ∈ ∆(r) : P(q) ≥ 0} (as guaranteed by Theorem 5.1). By the first part we have P(q) = 0 for every q ∈ S. Consider the restriction of P to the line containing S. The univariate polynomial describing this restriction is zero on the (infinitely many points residing on the) segment S, and thus it must be zero on the entire line containing S. That is, P is non-negative on the entire line containing S, and so it must be the case that the entire line (restricted to ∆(r) ) is contained in Π.

For the third part, denote by aff(S) the affine hull of S. For any p ∈ aff(S) there are two distinct points q, q′ ∈ S such that p belongs to the line containing both q and q′.[32] Since S is convex, the segment defined by q and q′ is contained in S. Thus, by the second part, Π contains the entire line that goes through this segment, and in particular contains the point p itself. Therefore aff(S) ⊆ Π, as required.

Corollary 5.16 (in general, polytopes have no POT): Let r ≥ 3, and let Π ⊂ ∆(r) be a non-trivial polytope[33] that has a vertex v that is internal to ∆(r) (i.e., v is not a convex combination of Π \ {v} and all coordinates of v are positive). Then, Π does not have a POT.

[30] A set S ⊆ R^r is said to be convex if for every p, q ∈ S and every λ ∈ [0, 1], it holds that λp + (1 − λ)q ∈ S. In particular, every segment is a convex set.
[31] For a set S ⊆ R^r , the affine hull of S is the smallest affine subspace of R^r containing S. For example, if S contains just two points (or the segment between them), then the affine hull of S is the line that goes through these points.
[32] This fact follows from the convexity of S and the minimality of aff(S).
[33] A non-trivial polytope Π is a set in R^r of more than a single point (i.e., |Π| > 1) that satisfies a system of linear inequalities; that is, each point in Π satisfies all the inequalities.


Proof. We shall prove that there exists a non-trivial segment S that satisfies the following conditions: (1) S is contained in ∂Π, and (2) the entire line that passes through S is not contained in Π. Thus, by the contrapositive of the second item of Proposition 5.15, we will conclude that Π does not have a POT. Actually, as we shall shortly see, such a segment can be found under more general conditions.

We first dispose of the case that the polytope is a line segment. In that case, the polytope itself may serve as the segment S, and we are done by Proposition 5.15. Otherwise, the polytope has (non-trivial) faces. Recall that Π can be expressed as the intersection of t half-spaces (corresponding to linear conditions), such that the i-th half-space is given by

    H_i = {(q_1, ..., q_r) ∈ R^r : Σ_{j∈[r]} α_j^{(i)} q_j ≤ β_i}.

Claim 5.17 There exist i_1, i_2 ∈ [t] and points p, q ∈ Π that are internal to ∆(r) such that p satisfies Σ_{j∈[r]} α_j^{(i_k)} p_j = β_{i_k} for k = 1, 2, whereas q satisfies Σ_{j∈[r]} α_j^{(i_1)} q_j = β_{i_1} and Σ_{j∈[r]} α_j^{(i_2)} q_j < β_{i_2}.

Indeed, we may use p = v and let q ≠ v be an arbitrary point that is internal to a face that contains v but is on the boundary of the polytope (w.r.t. ∆(r)). The existence of such a face follows from the hypothesis that v has no zero coordinates and is a vertex of a multi-dimensional polytope (which resides inside ∆(r)). Note that picking q as internal to this face guarantees that q too has no zero coordinates.

Without loss of generality, we may assume that i_1 = 1 and i_2 = 2, and proceed with any two points p and q that satisfy the above claim. We first observe that the line segment S defined by p and q is on the boundary of Π, because for every p′ ∈ S and ε > 0 there exists a vector d that is shorter than ε such that p′ + d ∈ ∆(r) \ Π (i.e., in particular, Σ_{j∈[r]} α_j^{(1)} (p′_j + d_j) > β_1). Next, observe that the entire line that passes through S is not contained in Π, because for some ε > 0 it holds that p′ = (1 + ε)p − εq ∈ ∆(r) \ Π (i.e., in particular, Σ_{j∈[r]} α_j^{(2)} p′_j > β_2). Thus, the corollary follows by (the second item of) Proposition 5.15.

6  More on Graph Properties in the Adjacency Rep. Model

In this section we utilize the POTs for general distributions, presented in Section 5, in order to obtain additional results regarding POTs for graph properties (in the adjacency matrix model, aka the dense graph model). Let us start with a useful claim regarding the distributions of random induced subgraphs of two close graphs.


Claim 6.1 Let H = ([n], F) be a fixed graph. For every two graphs G = ([N], E) and G′ = ([N], E′) such that N ≥ n, if G and G′ are ε-close, then

    |ρ_H(G) − ρ_H(G′)| = |ind_H(G) − ind_H(G′)| / (N choose n) ≤ ε·n²,

where ind_H(G) = ρ_H(G) · (N choose n) denotes the number of induced copies of H in G.
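For small graphs, the quantities ρ_H and ind_H can be checked by brute force. The sketch below is our own illustration (not part of the thesis); it counts induced copies of P3, using the fact that a 3-vertex graph with exactly two edges is always a path:

```python
from itertools import combinations

def induced_p3_count(n, edge_list):
    """ind_{P3}(G): the number of 3-vertex subsets of G = ([n], edge_list)
    that induce exactly two edges (every 3-vertex graph with two edges
    is a path P3)."""
    edges = {frozenset(e) for e in edge_list}
    count = 0
    for triple in combinations(range(n), 3):
        induced = sum(frozenset(pair) in edges for pair in combinations(triple, 2))
        if induced == 2:
            count += 1
    return count

# The 4-vertex path 0-1-2-3 contains exactly two induced copies of P3:
# {0,1,2} and {1,2,3}.
print(induced_p3_count(4, [(0, 1), (1, 2), (2, 3)]))  # -> 2
```

Flipping a single edge of G changes this count by at most N − 2 (the number of triples containing the flipped pair), in line with the bound of Claim 6.1.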

The straightforward proof of Claim 6.1 appears in Appendix A.6. Note that the converse of Claim 6.1 does not hold in general (e.g., consider a regular N-vertex graph consisting of a single super-cycle of length 2n + 2 versus one that consists of two disjoint super-cycles of length n + 1).[34] However, in certain special cases that we shall encounter, a (quantitatively weaker) converse does hold.

Testing collections of cliques. We consider subclasses of the property called “clique collection”, which was studied in [36, 37]. A graph is a clique collection if it consists of isolated cliques, and we shall be interested in graphs that are further restricted.

Definition 6.2 (density-parameterized clique collections): Denote by CC≤t the class of all graphs that consist of at most t isolated cliques. For ρ = (ρ_1, ..., ρ_t) ∈ ∆(t), where ∆(t) is as in (10), denote by CC(ρ) the class of graphs that consist of exactly t isolated cliques of densities ρ_1, ..., ρ_t. For D ⊆ ∆(t) define CC(D) = ∪_{ρ∈D} CC(ρ). More generally, let D : N → P(∆(t)) be a function that assigns to each N ∈ N a subset of ∆(t). Denote by CC(D) the subclass of CC≤t such that an N-vertex graph G that consists of t isolated cliques of densities ρ_1, ..., ρ_t belongs to CC(D) if and only if (ρ_1, ..., ρ_t) ∈ D(N).

We first show in Proposition 6.6 that for any finite set D ⊆ ∆(t) the property CC(D) has a POT. This result generalizes Proposition 3.4, which deals with the special case where all cliques are required to have equal sizes (i.e., ρ_i = 1/t for all i ∈ [t]).[35] Next, we restrict ourselves to t = 2 and construct POTs for a larger family of subclasses of CC≤2. Specifically, we consider D : N → P(∆(2)) such that for every N ∈ N the set D(N) consists of a constant number of intervals (or segments) of ∆(2).
We obtain POTs for CC(D) in two extreme cases: (1) when these intervals cover a relatively small portion of ∆(2) (see Proposition 6.9), and (2) when these intervals cover almost all of ∆(2) (see Proposition 6.11). In contrast, when these intervals cover a (non-trivial) constant fraction of ∆(2), no POT exists for CC(D) (see Proposition 6.14).

[34] See the definition preceding Proposition 3.5.
[35] Actually, Proposition 3.4 refers to the complementary graphs (i.e., regular complete t-partite graphs).


6.1  Testing CC(D) for finite sets D ⊆ ∆(t)

Let us start by sketching the proposed POT for CC(D), where D ⊆ ∆(t) is a finite set. On an input graph G, the tester checks that the distribution of the subgraphs of G induced by t + 1 vertices fits a distribution that would have been observed when considering some graph in CC(D). (This fit will be checked by a distribution tester as described in Section 5.) In particular, a graph G ∈ CC≤t must be {P3, I_{t+1}}-free, where P3 is a path with 3 vertices and I_{t+1} is an independent set on t + 1 vertices. Now, consider two cases for a graph G that is far from CC(D).

1. Suppose first that G is far from CC≤t. Then, by the result of [37], the graph G contains many induced copies of either P3 or I_{t+1}, which means that our test will observe a distribution different from any of the expected ones (i.e., those corresponding to some graph in CC(D)). Thus, the distribution tester will accept with probability that is noticeably smaller than its threshold probability.

2. Suppose now that G belongs to CC≤t (or is very close to it), and yet is far from CC(D). Intuitively, in such a case the distribution induced by (t + 1)-vertex subgraphs of G is far from all distributions induced by CC(D). (This intuition, which reverses Claim 6.1 in the special case of graphs in CC≤t, is proved in Lemma 6.5.) Thus, again, the distribution tester will accept with probability that is noticeably smaller than the threshold probability.

Hence, for any D ⊆ ∆(t) (not merely a finite one), constructing a POT for the graph property CC(D) reduces to constructing a POT for the class of distributions of random (t + 1)-vertex induced subgraphs of graphs in CC(D), where the reduction amounts to sampling the distribution of subgraphs induced by a random set of t + 1 vertices in the graph.[36] Indeed, we shall rely on the above conclusion both in this subsection and in subsequent ones.
For the sake of technical convenience (or simplicity), we shall consider a distribution that slightly differs from the one given by the induced subgraphs of G on t + 1 vertices. Specifically, we shall consider the distribution of subgraphs induced by selecting t + 1 random vertices in the graph with repetitions (rather than without repetitions), and treat the (rare) event in which two sampled vertices collide as if these vertices were different but connected by an edge.

Definition 6.3 (S_G^k): For a given graph G = ([N], E), define the distribution S_G^k as follows: The distribution is supported on unlabeled graphs with k vertices. The sampler picks k vertices v_1, ..., v_k ∈ [N] uniformly at random with repetitions, and outputs the graph ([k], E_S), where (i, j) ∈ E_S if and only if either (v_i, v_j) ∈ E or v_i = v_j. For ρ = (ρ_1, ..., ρ_t) ∈ ∆(t) and a graph G = ([N], E) ∈ CC(ρ), define S_ρ^{t+1} to be S_G^{t+1}.

[36] Actually, it suffices to construct a POT for a class of distributions that contains all distributions induced by graphs in CC(D) and none of the distributions induced by graphs that are not in CC(D). This relaxation of the POT-construction task simplifies our exposition.
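A direct way to read Definition 6.3 is as a sampling procedure. The following sketch is our own illustration (assuming an edge-list representation of G); it draws one sample from S_G^k:

```python
import random

def sample_sgk(n, edge_list, k, rng=random):
    """One draw from S_G^k (Definition 6.3): pick k vertices of
    G = ([n], edge_list) uniformly at random *with* repetitions, and output
    the edge set of the induced k-vertex graph, treating a collision
    (v_i = v_j) as an edge between sample positions i and j."""
    edges = {frozenset(e) for e in edge_list}
    vs = [rng.randrange(n) for _ in range(k)]
    sample_edges = set()
    for i in range(k):
        for j in range(i + 1, k):
            if vs[i] == vs[j] or frozenset((vs[i], vs[j])) in edges:
                sample_edges.add((i, j))
    return sample_edges

# For G a single 4-clique, every 3-vertex sample is a triangle (K3),
# regardless of collisions -- as in the examples following Definition 6.3.
clique4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
assert all(len(sample_sgk(4, clique4, 3)) == 3 for _ in range(100))
```

Note how the collision rule makes the sample of a clique collection always P3-free, matching the text's claim that Pr[S_G^{t+1} contains an induced P3] = 0 for G ∈ CC(ρ).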


Note that the latter definition is independent of N, and depends only on ρ. For a graph H = ([k], E_H), we shall use interchangeably the notations Pr[S_G^k = H] and S_G^k(H) to denote the probability that S_G^k is isomorphic to H. For example, if G ∈ CC(ρ), we may write S_G^{t+1}(I_{t+1}) = 0, S_G^{t+1}(K_{t+1}) = Σ_{i∈[t]} ρ_i^{t+1}, and Pr[S_G^{t+1} contains an induced P3] = 0.

One can easily adapt the proof of Claim 6.1 to show an analogous claim for S_G^k (and all H's):

Claim 6.4 Fix k ≥ 2. For every two graphs G = ([N], E) and G′ = ([N], E′), if G and G′ are ε-close, then for every graph H on k vertices it holds that

    |S_G^k(H) − S_{G′}^k(H)| ≤ k²·ε.

Hence, the statistical distance between S_G^k and S_{G′}^k is bounded by dist(S_G^k, S_{G′}^k) ≤ O(ε), where the constant in the O() notation depends only on k.

A key step in the proof of Proposition 6.6 (and other results in this section) is that for graphs in CC≤t (a weak quantitative version of) the converse claim also holds; that is, if G, G′ ∈ CC≤t and dist(S_G^{t+1}, S_{G′}^{t+1}) is small, then the graphs G and G′ are close. Quantitatively, we prove the following lemma.

Lemma 6.5 Fix t ≥ 2 and let ρ = (ρ_1, ..., ρ_t). Let G′ = ([N], E′) ∈ CC≤t, and assume that dist(S_ρ^{t+1}, S_{G′}^{t+1}) ≤ ε^t. Then, G′ is O(√ε)-close to CC(ρ), where the constant in the O() notation depends only on t.

The lemma generalizes the structural claims of Proposition 3.4, which deals with the special case in which all ρ_i's are equal. We defer the proof of the lemma to Appendix A.7, and show how it implies the existence of a POT for CC(D), where D is a finite subset of ∆(t).

Proposition 6.6 (POT for CC(D) when D ⊆ ∆(t) is finite): Fix t ≥ 1 and let D ⊆ ∆(t) be a finite set of size k. Then, the property CC(D) has a POT that makes k·(t² + t) queries and has polynomial detection probability.

Proof. On input G, the POT for CC(D) tests that the distribution S_G^{t+1} belongs to the set S_D = {S_ρ^{t+1} : ρ ∈ D}.
That is, by considering distributions whose domain is the set of unlabeled graphs with t + 1 vertices, our goal is to test that S_G^{t+1} belongs to the finite set S_D. Using Corollary 5.7, we obtain a test that takes 2k independent samples of S_G^{t+1}, accepts all distributions in S_D with probability at least c (for some c > 0), and has quadratic detection probability. Therefore, for G ∈ CC(D), the test accepts G with probability at least c. Since

each sample from S_G^{t+1} requires (t+1 choose 2) queries to G, the test makes a total of 2k·(t+1 choose 2) = k·(t² + t) queries to G.

Assume that on input G = ([N], E) the test accepts with probability c − ε. Our aim is to show that G is O(ε^{1/t²})-close to CC(D). We first note that, by Corollary 5.7, if the test accepts with probability c − ε, then there is some ρ ∈ D such that

    dist(S_G^{t+1}, S_ρ^{t+1}) = O(√ε).   (18)

In particular,

    Pr[S_G^{t+1} = I_{t+1}] = O(√ε) and Pr[S_G^{t+1} contains an induced P3] = O(√ε).

Thus, by [37, Proposition 4.11], the graph G is O(ε^{1/t})-close to CC≤t. Let us fix a graph G′ = ([N], E′) ∈ CC≤t such that G′ is O(ε^{1/t})-close to G. Then, according to Claim 6.4, the statistical distance between the distributions S_G^{t+1} and S_{G′}^{t+1} is

    dist(S_G^{t+1}, S_{G′}^{t+1}) = O(ε^{1/t}).   (19)

Hence, by the triangle inequality, the statistical distance between S_{G′}^{t+1} and S_ρ^{t+1} is bounded by

    dist(S_{G′}^{t+1}, S_ρ^{t+1}) ≤ dist(S_{G′}^{t+1}, S_G^{t+1}) + dist(S_G^{t+1}, S_ρ^{t+1}) ≤ O(ε^{1/t}),   (20)

where in the second inequality we used (18) and (19).

So far we have shown that G′ = ([N], E′) ∈ CC≤t satisfies dist(S_{G′}^{t+1}, S_ρ^{t+1}) ≤ O(ε^{1/t}) for some ρ ∈ D. By applying Lemma 6.5 with G′ and ρ, we conclude that G′ is O(ε^{1/t²})-close to CC(ρ). Recalling that G′ is O(ε^{1/t})-close to G, we infer, using the triangle inequality, that G is O(ε^{1/t²})-close to CC(ρ). This completes the proof of the proposition.

Digest. For the sake of future use, it is good to distill the essence of the proof of Proposition 6.6. We start by establishing the following corollary to Lemma 6.5, and then conclude with a reduction of testing CC(D) to testing S_D^{t+1} =def {S_G^{t+1} : G ∈ CC(D)}. (Actually, we may also reduce the testing of CC(D) to testing any class of distributions that contains S_D^{t+1} but does not contain any distribution S_G^{t+1} such that G ∉ CC(D).)

Corollary 6.7 Fix t ≥ 2 and D ⊆ ∆(t). Suppose that G = ([N], E) is ε′-close to CC≤t but ε-far from CC(D). Then, S_G^{t+1} is (Ω(ε^t) − O(ε′))-far from S_D^{t+1}, where the constants in the Ω() and O() notation depend only on t.

Proof. Let G′ ∈ CC≤t be ε′-close to G, and suppose that S_{G′}^{t+1} is δ-close to S_D^{t+1}. Then, by Lemma 6.5, the graph G′ is O(δ^{1/t})-close to CC(D). It follows that G is (ε′ + O(δ^{1/t}))-close to CC(D), which implies δ = Ω((ε − ε′)^t). On the other hand, by Claim 6.4, we have dist(S_G^{t+1}, S_{G′}^{t+1}) = O(ε′), and hence S_G^{t+1} is (Ω((ε − ε′)^t) − O(ε′))-far from S_D^{t+1}. The claim follows.

Corollary 6.8 Fix t ≥ 2 and D ⊆ ∆(t). If G = ([N], E) is ε-far from CC(D), then S_G^{t+1} is Ω(ε^{t²})-far from S_D^{t+1}.

Proof. Let ε′ = ε^{2t}. If G is ε′-far from CC≤t, then S_G^{t+1} is Ω((ε′)^{t/2})-far from S_D^{t+1} (by [37, Proposition 4.11]). On the other hand, if G is ε′-close to CC≤t, then S_G^{t+1} is (Ω(ε^t) − O(ε′))-far from S_D^{t+1} (by Corollary 6.7). In both cases the claim follows.

6.2  Testing CC(D) when D(N) ⊆ ∆(2) is tiny

In this section we restrict ourselves to subclasses of CC≤2, namely to graphs consisting of two disjoint cliques. Recall that in the previous section we presented a POT for CC(D), where D ⊆ ∆(t) is a finite set. Here we extend Proposition 6.6 in the special case of t = 2 by considering subclasses of CC≤2 in which the number of N-vertex graphs in the subclass may grow (slowly) with N.

We introduce the following notation. Let α, β : N → [0, 1/2] be two functions that satisfy 0 ≤ α(N) ≤ β(N) ≤ 1/2 for all N ∈ N. For each N ∈ N define D_{α(N),β(N)} ⊆ ∆(2) to be the set of all pairs (ρ, 1 − ρ) ∈ ∆(2) such that min(ρ, 1 − ρ) ∈ [α(N), β(N)], and define D_{α,β} : N → P(∆(2)) by D_{α,β}(N) = D_{α(N),β(N)}. Then, the class CC(D_{α,β}) consists of all graphs in CC≤2 such that an N-vertex graph G belongs to CC(D_{α,β}) if and only if G ∈ CC(D_{α(N),β(N)}), i.e., the density of its smaller clique belongs to the interval [α(N), β(N)].

Similarly, let α, β : N → [0, 1/2]^t be such that α_1(N) ≤ β_1(N) < · · · < α_t(N) ≤ β_t(N) for all N ∈ N. Define D_{α,β} : N → P(∆(2)) by D_{α,β}(N) = ∪_{i∈[t]} D_{α_i,β_i}(N). That is, (ρ, 1 − ρ) ∈ D_{α,β}(N) if and only if min(ρ, 1 − ρ) ∈ [α_i(N), β_i(N)] for some i ∈ [t]. Then, the class CC(D_{α,β}) consists of all graphs in CC≤2 such that an N-vertex graph belongs to CC(D_{α,β}) if and only if the density of the smaller clique belongs to [α_i(N), β_i(N)] for some i ∈ [t], i.e., CC(D_{α,β}) = ∪_{i∈[t]} CC(D_{α_i,β_i}).

We show in Proposition 6.9 that for any d ∈ (0, 1] and for any α, β : N → [0, 1/2]^t as above that satisfy β_i(N) − α_i(N) ≤ N^{−d} for all i ∈ [t], the class CC(D_{α,β}) has a POT whose parameters depend on d and t. The special case of d = 1 corresponds to D_{α,β} being a finite set, which has already been covered in Proposition 6.6. Therefore, the result of this section generalizes Proposition 6.6 in the special case of CC≤2.
Proposition 6.9 (POT for CC(D_{α,β}) when Σ_i (β_i(N) − α_i(N)) < N^{−Ω(1)}): Let α, β : N → [0, 1/2]^t and d ∈ (0, 1] be such that for every N ∈ N it holds that 0 ≤ β_i(N) − α_i(N) ≤ N^{−d} for

all i ∈ [t]. Then, the class CC(D_{α,β}) has a POT that makes O(t/d) queries and has detection probability ϱ(ε) = Ω(ε^{O(t/d)}).

For the sake of simplicity we shall first prove Proposition 6.9 in the special case of t = 1, and later discuss the generalization to larger values of t.

Proposition 6.10 (Proposition 6.9, the case of t = 1): Let α, β : N → [0, 1/2] and d ∈ (0, 1] be such that for every N ∈ N it holds that 0 ≤ β(N) − α(N) ≤ N^{−d}. Then, the class CC(D_{α,β}) has a POT that makes O(1/d) queries and has detection probability ϱ(ε) = Ω(ε^{O(1/d)}).

Proof. In light of Corollary 6.8, it suffices to present a POT for the class of distributions S³_{D_{α,β}} (which equals {S_G³ : G ∈ CC(D_{α,β})}), since testing G (for the graph property CC(D_{α,β})) is reduced to testing the distribution S_G³ (induced by random 3-subgraphs of the input graph G). We identify such a distribution with S_G³ = (q0, q1, q2, q3) ∈ ∆(4), where q_i denotes the density in G of the subgraph with 3 vertices and i edges. The existence of a POT for S³_{D_{α,β}} ⊆ ∆(4) is proved by presenting a polynomial P : ∆(4) → R such that P is non-negative on distributions in S³_{D_{α,β}} and is negative on distributions that are not in S³_{D_{α,β}}. The POT for S³_{D_{α,β}} is obtained by invoking Theorem 5.1. (Actually, we shall present a POT for a superset of S³_{D_{α,β}} that contains no distribution S_G³ such that G ∉ CC(D_{α,β}).)

Denote by G_ρ the N-vertex graph consisting of two cliques of densities ρ and 1 − ρ, and let κ3(ρ) = ρ³ + (1 − ρ)³ be the K3-density in G_ρ. Let m(N) = (κ3(α(N)) + κ3(β(N)))/2 be the average of κ3(α(N)) and κ3(β(N)), and let f(N) = |κ3(α(N)) − κ3(β(N))|/2 be the distance from m(N) to each of κ3(α(N)) and κ3(β(N)). The hypothesis 0 ≤ β(N) − α(N) ≤ N^{−d} implies that f(N) < 3 · N^{−d}.

Using the above notation, it suffices to construct a polynomial P : ∆(4) → R such that P(q0, q1, q2, q3) is non-negative if and only if (1) q0 + q2 < N^{−3} and (2) |q3 − m(N)| ≤ f(N).
When applying P to a distribution induced by 3-subgraphs of any graph G, condition (1) implies that G ∈ CC≤2, and condition (2) implies that G has the right K3-density. Now, define P : ∆(4) → R to be

    P(q0, q1, q2, q3) = f(N)^k − (q3 − m(N))^k − q0 − q2,   (21)

where k ∈ 2N is an even integer such that f(N)^k < N^{−3}/3. By our choice of f(N), and using the fact that f(N) < 3N^{−d}, it is enough to take k = O(1/d). Note that P is a polynomial of total degree k = O(1/d), and all its coefficients are upper bounded by a constant that does not depend on N, e.g., by (k choose k/2) < 2^{O(1/d)}. Thus, although the polynomial P depends on N, its degree and an upper bound on its coefficient sizes do not depend on N. This observation is important, because we want to use the POT derived from P in order to derive a POT for CC(D_{α,β}), while the latter POT invokes the former POT with a varying value of N.
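To make the construction concrete, here is a small numerical sketch of (21) (our own illustration, not part of the thesis): κ3, the parameters m and f for a fixed N, and the resulting polynomial, evaluated on the idealized distributions (0, 3ρ(1−ρ), 0, κ3(ρ)) of two-clique graphs (ignoring the O(1/N) effect of repeated vertices in S_G³).

```python
def kappa3(rho):
    # K3-density of the two-clique graph G_rho (cliques of density rho, 1 - rho)
    return rho**3 + (1 - rho)**3

def make_poly(alpha, beta, k):
    """The polynomial P of (21) for a fixed N: m is the average of
    kappa3(alpha) and kappa3(beta), and f the distance from m to either;
    k must be even (in the text, k = O(1/d) with f(N)^k < N^-3 / 3)."""
    m = (kappa3(alpha) + kappa3(beta)) / 2
    f = abs(kappa3(alpha) - kappa3(beta)) / 2
    return lambda q0, q1, q2, q3: f**k - (q3 - m)**k - q0 - q2

def s3(rho):
    # Idealized S^3 distribution (q0, q1, q2, q3) of G_rho
    return (0.0, 3 * rho * (1 - rho), 0.0, kappa3(rho))

P = make_poly(0.2, 0.3, k=4)
assert P(*s3(0.25)) >= 0   # smaller-clique density inside [0.2, 0.3]
assert P(*s3(0.45)) < 0    # density outside the interval: P is negative
```

The two assertions mirror the two directions verified in the proof: P is non-negative exactly on the distributions of graphs in CC(D_{α,β}).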


Still, by Theorem 5.1, there exists a distribution tester T that makes k queries, whose acceptance probability when given a distribution q = (q0, q1, q2, q3) ∈ ∆(4) is

    Pr[T accepts q] = 0.5 + δ · (f(N)^k − (q3 − m(N))^k − q0 − q2)   (22)

for some absolute constant δ > 0 that depends only on d (since all coefficients of the polynomial P have absolute value at most 2^{O(1/d)}).[37] We cannot rely on Theorem 5.1 in order to obtain a bound on the detection probability of T, since the bound provided in the theorem may depend arbitrarily on the polynomial P, which in turn depends on N. Instead, we shall lower bound the detection probability of T by directly referring to (22).

But first, we verify that, for every graph G, the polynomial P is non-negative on (q0, q1, q2, q3) = S_G³ if and only if G ∈ CC(D_{α,β}). Indeed, suppose first that G = ([N], E) ∈ CC(D_{α,β}). Then, the distribution S_G³ = (q0, q1, q2, q3) induced by 3-subgraphs of G satisfies q0 = q2 = 0 (since G ∈ CC≤2) and |q3 − m(N)| ≤ f(N) (since G = G_ρ for some ρ ∈ [α(N), β(N)], and so q3 = κ3(ρ) ∈ [κ3(α(N)), κ3(β(N))] = [m(N) ± f(N)]). Therefore, by (21), we have P(q0, q1, q2, q3) ≥ f(N)^k − f(N)^k − 0 = 0.

On the other hand, suppose that for some N-vertex graph G the polynomial P is non-negative on the distribution S_G³ = (q0, q1, q2, q3). Since k is even, this implies that q0 + q2 ≤ f(N)^k < N^{−3} as well as |q3 − m(N)| ≤ f(N). Noting that each q_i comes in multiples of N^{−3}, the fact that q0 + q2 ≤ f(N)^k < N^{−3} implies that q0 = q2 = 0 and hence G ∈ CC≤2 (since CC≤2 coincides with the class of graphs that are both P3-free and I3-free). Hence, G = G_ρ for some ρ ∈ [0, 0.5]. Using |q3 − m(N)| ≤ f(N), it follows that κ3(ρ) = q3 ∈ [m(N) ± f(N)] = [κ3(α(N)), κ3(β(N))], which implies that G ∈ CC(D_{α,β}).

It is left to provide an upper bound on the value that P takes on an arbitrary (q0, q1, q2, q3) ∈ ∆(4) that is ε-far from the set of distributions on which P is non-negative. Again, it actually suffices to provide such a bound for distributions of the form S_G³ for some N-vertex graph G ∉ CC(D_{α,β}). Suppose that S_G³ is at distance ε > 0 from the non-negative region (of P), which implies that it is ε-far from S³_{D_{α(N),β(N)}}. Since the coordinates of S_G³ (and of any distribution in S³_{D_{α(N),β(N)}}) are multiples of N^{−3}, it follows that ε ≥ N^{−3}. Now, let (p0, p1, p2, p3) ∈ S³_{D_{α(N),β(N)}} be closest to (q0, q1, q2, q3). Recall that (p0, p1, p2, p3) ∈ S³_{D_{α(N),β(N)}} implies p0 = p2 = 0 and |p3 − m(N)| ≤ f(N). Then, either q0 + q2 ≥ ε/2 or |q3 − p3| ≥ ε/2. In the first case (i.e., q0 + q2 ≥ ε/2), it follows that P(S_G³) ≤ f(N)^k − ε/2 < −ε/6 (since f(N)^k < N^{−3}/3 and ε ≥ N^{−3}). In the second case (i.e., |q3 − p3| ≥ ε/2), it must be that |q3 − m(N)| > f(N) (since, otherwise, p′0 = p′2 = 0 and p′3 = q3 yields (p′0, p′1, p′2, p′3) ∈ S³_{D_{α(N),β(N)}} that is closer to (q0, q1, q2, q3)). Likewise, it must be

[37] The proof of Theorem 5.1 asserts that δ > 2^{−k}/(3B), where B is an upper bound on the absolute value of all coefficients of the polynomial P (and k is its degree).
Since from the non-negative region (of P), which implies that it is -far from SD α(N ),β(N ) 3 the coordinates of SG3 (and of any distribution in SD ) are multiples of N −3 , it follows α(N ),β(N ) 3 that  ≥ N −3 . Now, let (p0 , p1 , p2 , p3 ) ∈ SD be closest to (q0 , q1 , q2 , q3 ). Recall that α(N ),β(N ) 3 (p0 , p1 , p2 , p3 ) ∈ SD implies p0 = p2 = 0 and |p3 − m(N )| ≤ f (N ). Then, either α(N ),β(N ) q0 + q2 ≥ /2 or |q3 − p3 | ≥ /2. In the first case (i.e., q0 + q2 ≥ /2), it follows that P(SG3 ) ≤ f (N )k − /2 < −/6 (since f (N )k < N −3 /3 and  ≥ N −3 ). In the second case (i.e., |q3 − p3 | ≥ /2), it must be that |q3 − m(N )| > f (N ) (since, otherwise, p00 = p02 = 0 and 3 p03 = q3 yields (p00 , p01 , p02 , p03 ) ∈ SD that is closer to (q0 , q1 , q2 , q3 )). Likewise, it must be α(N ),β(N ) The proof of Theorem 5.1 asserts that δ > 2−k /3B, where B is an upper bound on the absolute value of all coefficients of the polynomial P (and k is its degree). 37


that p3 = m(N) + sgn(q3 − m(N)) · f(N). It follows that |q3 − m(N)| ≥ f(N) + ε/2 (since |q3 − m(N)| = |p3 − m(N)| + |q3 − p3| = f(N) + |q3 − p3|), which implies that P(S_G³) ≤ f(N)^k − (f(N) + ε/2)^k < −(ε/2)^k.

Finally, combining the above analysis with Corollary 6.8, we conclude that each graph in CC(D_{α,β}) is accepted with probability at least 1/2, whereas each graph that is ε-far from CC(D_{α,β}) is accepted with probability at most 0.5 − Ω(ε^{O(1/d)}). The claim follows.

Generalization to t ≥ 2: We now show how the proof of Proposition 6.10 can be generalized to larger values of t, and so establish Proposition 6.9. The generalization follows the proof idea of Corollary 5.4 (regarding disjoint unions of testable classes of distributions). For each i ∈ [t], define the parameters m_i(N) = (κ3(α_i(N)) + κ3(β_i(N)))/2 and f_i(N) = |κ3(α_i(N)) − κ3(β_i(N))|/2 (analogously to the proof of Proposition 6.10). Then, analogously to (21), define the polynomial P : ∆(4) → R to be

    P(q0, q1, q2, q3) = (−1)^{t+1} · Π_{i∈[t]} (f_i(N)^k − (q3 − m_i(N))^k − q0 − q2)

for k ∈ 2N chosen as in the proof of Proposition 6.10. By following the proof of Proposition 6.10 (and the proof of Corollary 5.3), the reader can verify that this polynomial P yields a POT for CC(D_{α,β}), which makes O(t/d) queries and has detection probability ϱ(ε) = Ω(ε^{O(t/d)}). We omit the details.
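The sign pattern of the product can be checked numerically; the sketch below (our own illustration, continuing the conventions of the earlier sketch) shows why the factor (−1)^{t+1} is needed: on a point inside the i-th interval (with q0 = q2 = 0), the i-th factor is non-negative while the other t − 1 factors are negative, so the sign correction makes P non-negative exactly on the union.

```python
def kappa3(rho):
    # K3-density of the two-clique graph G_rho
    return rho**3 + (1 - rho)**3

def make_poly_multi(intervals, k):
    """Product polynomial for CC(D_{alpha,beta}) with t intervals (a sketch):
    P(q) = (-1)^(t+1) * prod_i (f_i^k - (q3 - m_i)^k - q0 - q2)."""
    params = [((kappa3(a) + kappa3(b)) / 2, abs(kappa3(a) - kappa3(b)) / 2)
              for (a, b) in intervals]
    t = len(intervals)

    def P(q0, q1, q2, q3):
        prod = 1.0
        for m, f in params:
            prod *= f**k - (q3 - m)**k - q0 - q2
        return (-1) ** (t + 1) * prod

    return P

def s3(rho):
    # Idealized S^3 distribution of the two-clique graph G_rho
    return (0.0, 3 * rho * (1 - rho), 0.0, kappa3(rho))

P = make_poly_multi([(0.10, 0.15), (0.35, 0.40)], k=4)
assert P(*s3(0.12)) >= 0   # inside the first interval
assert P(*s3(0.25)) < 0    # between the two intervals
```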

6.3  Testing CC(D) when D(N) ⊆ ∆(2) is almost everything

Recall that by the characterization result of [37], the class CC≤2 has a one-sided error POT. However, if we remove from CC≤2 even just a single N-vertex graph for every N ∈ N, we obtain a class that does not have a one-sided error POT. Nevertheless, as we prove in this section, such a class has a two-sided error POT. This fact is a special case of Proposition 6.11.

Recall that for α, β : N → [0, 1/2]^t such that α_1(N) ≤ β_1(N) < · · · < α_t(N) ≤ β_t(N) for all N ∈ N, the class CC(D_{α,β}) = ∪_{i∈[t]} CC(D_{α_i,β_i}) consists of all graphs in CC≤2 such that an N-vertex graph belongs to CC(D_{α,β}) if and only if the density of the smaller clique belongs to [α_i(N), β_i(N)] for some i ∈ [t]. For example, the classes considered in the first paragraph of this section can be described as CC(D_{α,β}) for some α, β : N → [0, 1/2]².

We show in Proposition 6.11 that for α(N), β(N) ∈ [0, 1/2]^t as above, if for each N ∈ N almost all graphs on N vertices in CC≤2 belong to CC(D_{α,β}), then CC(D_{α,β}) has a two-sided error POT. In particular, the class of graphs obtained from CC≤2 by removing any constant number of graphs on N vertices (for every N ∈ N) has a two-sided error POT.


Proposition 6.11 (POT for CC(D_{α,β}) when Σ_i (β_i(N) − α_i(N)) > 1/2 − N^{−Ω(1)}): Let α, β : N → [0, 1/2]^t be such that for every N ∈ N it holds that α_1(N) ≤ β_1(N) < · · · < α_t(N) ≤ β_t(N). If for some d ∈ (0, 1] and all N ∈ N it holds that Σ_{i∈[t]} (β_i(N) − α_i(N)) > 1/2 − N^{−d}, then the class CC(D_{α,β}) has a POT that makes 6t queries and has polynomial detection probability (where the exponent of the polynomial depends on t and d).

Similarly to the previous section, we shall first prove Proposition 6.11 in the special case of t = 1, and later discuss the generalization to larger values of t.

Proposition 6.12 (Proposition 6.11, the case of t = 1): Let α, β : N → [0, 1/2] and d ∈ (0, 1] be such that for every N ∈ N it holds that α(N) < N^{−d} and β(N) > 1/2 − N^{−d}. Then, the class CC(D_{α,β}) has a POT that makes 6 queries and has detection probability ϱ(ε) = Ω(ε^{8/d}).

Proof. In order to develop some intuition, note that the assumption α(N) < N^{−d} and β(N) > 1/2 − N^{−d} implies that no graph G ∈ CC≤2 is too far from CC(D_{α,β}). Indeed, suppose that G consists of two cliques of sizes K and N − K, with K ≤ N − K, such that either K/N < α(N) or K/N > β(N). Then, the graph G is N^{−d}-close to CC(D_{α,β}). Hence, graphs in CC≤2 \ CC(D_{α,β}) are N^{−d}-close to CC(D_{α,β}), and it suffices to detect such graphs with probability 1/poly(N) (since this probability is polynomially related to their distance from the property being tested). On the other hand, any graph that is ε-far from CC(D_{α,β}) for ε > 2/N^d must be (ε/2)-far from CC≤2, and it suffices to detect such graphs with probability poly(ε). Thus, it is enough to design a tester that satisfies the following three conditions, with respect to some threshold c:

1. Graphs in CC(D_{α,β}) are accepted with probability at least c;
2. Graphs that are ε-far from CC≤2 are accepted with probability lower than c − Ω(ε²);
3. Graphs that are not in CC(D_{α,β}) are accepted with probability lower than c − Ω(N^{−8}).
For the sake of simplicity, we first consider the simpler case in which either α = 0 or β = 1/2. In this case, the test we propose and analyze examines the subgraph induced by three random vertices and accepts according to some (carefully chosen) predetermined probabilities. That is, we associate four probabilities with the four possible 3-vertex subgraphs that can be seen, such that p_i denotes the probability that the test accepts when seeing a subgraph with i edges. Since we wish to accept only graphs in CC≤2, we may set p0 = p2 = 0 and p1, p3 > 0. Thus, in designing such a test, our only freedom is in the choice of min(p1, p3) > 0, since without loss of generality we may have max(p1, p3) = 1. Note that in order to satisfy the relation between


Items 1 and 2, we must require p1 ≈ p3 (e.g., |p1 − p3| < 1/N³).[38] Picking p1 and p3 such that |p1 − p3| < 1/N³ means that each N-vertex graph in CC≤2 is accepted with probability approximately p1, whereas N-vertex graphs that are ε-far from CC≤2 are accepted with probability at most p1 − poly(ε), which follows from [37, Proposition 4.11] using the bound ε > 1/N² (i.e., each N-vertex graph not in a class is at least 1/N²-far from it). Finally, setting p1 > p3 favors graphs in CC≤2 that consist of two cliques of (approximately) the same size, whereas p1 < p3 favors graphs in CC≤2 that consist of two cliques such that one is significantly larger than the other. Note, however, that this favoring amounts to at most |p1 − p3|, but nevertheless this will suffice for Item 3.

In the general case, where both α > 0 and β < 1/2, we view the distribution induced by 3-subgraphs of G as an element of ∆(4). Thus, the tester needs to check that this distribution lies in some (carefully chosen) predetermined subset of ∆(4). Note that graphs in CC≤2 are associated with distributions (q0, q1, q2, q3) ∈ ∆(4) such that q0 = q2 = 0, where q_i is the probability that a random induced 3-vertex subgraph has i edges. Furthermore, graphs in CC(D_{α,β}) are associated with a segment I_{α,β} of the line {(0, q1, 0, 1 − q1) : q1 ∈ [0, 1]}. Thus, the desired test may be thought of as a test of I_{α,β} ⊆ ∆(4) that is required to (1) accept points on I_{α,β} with probability at least c, (2) accept points that are ε-far from the entire line (which passes through I_{α,β}) with probability at most c − poly(ε), and (3) accept points that are on the line but outside I_{α,β} with probability at most c − poly(1/N). Corollary 5.8 suggests that such a test can be obtained by considering an ellipsoid that contains a Θ(1/N³)-neighborhood of I_{α,β}. (This ellipsoid has a very long axis in the direction of I_{α,β}, and is very slim in all the directions orthogonal to I_{α,β}.)
Note, however, that in the analysis we cannot just invoke Corollary 5.8, because we need stronger bounds; i.e., for points that are far from the long axis of the ellipsoid we need bounds that do not depend on the volume of the ellipsoid. (Such bounds are readily obtained by using the ideas that underlie the proof of Corollary 5.8.)

We now turn to the actual proof. Let us fix N ∈ N, and denote for simplicity α = α(N) and β = β(N). We restrict ourselves to designing a POT for N-vertex graphs of CC(D_{α,β}). Similarly to the proof of Proposition 6.10, the POT for CC(D_{α,β}) relies on testing the distribution induced by 3-subgraphs of a given graph G. Specifically, we consider the distribution S_G³ = (q0, q1, q2, q3) ∈ ∆(4) induced by 3-subgraphs of G as given in Definition 6.3, where q_i denotes the density in G of the subgraph with 3 vertices and i edges. As in the proof of Proposition 6.10, we show that there exists a polynomial P : ∆(4) → R with bounded coefficients such that P

[38] To see why p1 ≈ p3 must hold, consider the graph G_ρ consisting of two cliques such that the smaller one has density ρ ≤ 1/2. This graph is accepted with probability κ3(ρ)·p3 + (1 − κ3(ρ))·p1 = p3 − (p3 − p1)(1 − κ3(ρ)), where κ3(ρ) = ρ³ + (1 − ρ)³ ∈ [0.25, 1] is the K3-density of G_ρ. On the other hand, a graph obtained from G_ρ by the omission of a single edge is accepted with probability that is Θ(N^{−2}) smaller. Thus, |p3 − p1| = O(N^{−2}) is required for guaranteeing that both G_{0.01} and G_{0.49} are accepted with probability higher than either of the graphs obtained from them by omission of a single edge.


is positive on distributions induced by graphs in CC(D_{α,β}), and is bounded below zero for distributions induced by graphs that are far from CC(D_{α,β}). The POT is derived from such a polynomial by following the recipe given in Theorem 5.1.

For every ρ ∈ [0, 1/2] let G_ρ be the graph consisting of two cliques of densities ρ and 1 − ρ. Then, the distribution induced by G_ρ is

    S³_{G_ρ} = S³_{(ρ,1−ρ)} = (0, 3ρ(1 − ρ), 0, ρ³ + (1 − ρ)³).

Thus, by considering the segment I_{α,β} ⊆ ∆(4) contained in the line {(0, q1, 0, 1 − q1) : q1 ∈ [0, 1]} between the points S³_{(α,1−α)} and S³_{(β,1−β)}, we see that G ∈ CC(D_{α,β}) if and only if S_G³ ∈ I_{α,β}.

Next, we define an ellipsoid E_{α,β} such that for every N-vertex graph G it holds that S_G³ ∈ I_{α,β} if and only if S_G³ ∈ E_{α,β}. This is done by taking an ellipsoid that contains only distributions that are N^{−3}/2-close to I_{α,β} (since every two distinct graphs induce distributions that are Ω(N^{−2})-far from each other). Specifically, let m = (m0, m1, m2, m3) = ½·(S³_{(α,1−α)} + S³_{(β,1−β)}) be the midpoint of the interval I_{α,β}, and let r denote the L2-distance between this midpoint and either endpoint of the segment I_{α,β}. (Note that for α ≈ 0 and β ≈ 1/2 we have I_{α,β} ≈ [(0, 0, 0, 1), (0, 0.75, 0, 0.25)] and so r ≈ √(2 · (0.75/2)²) ≈ 0.53.) Since we wish the ellipsoid to contain I_{α,β} and no point (q0, q1, q2, q3) that is (1/2N³)-far from this segment, we relax the conditions q0 = q2 = 0 and (q1 − m1)² + (q3 − m3)² ≤ r² into

    (q0² + q2²)·(2N³)² + (q1 − m1)²/r² + (q3 − m3)²/r² ≤ 1.

This yields the following ellipsoid E_{α,β} (which contains I_{α,β}, has radius r in the direction of I_{α,β}, and radius 1/2N³ in the orthogonal directions):

    E_{α,β} = {(q0, q1, q2, q3) ∈ ∆(4) : 4N⁶·q0² + (q1 − m1)²/r² + 4N⁶·q2² + (q3 − m3)²/r² ≤ 1}.

As noted above, by applying Corollary 5.8 we obtain a POT for E_{α,β} whose detection probability depends on the parameters of the ellipsoid, and in particular depends on N. Since we are interested in a detection probability that is independent of N, we use a more careful analysis of the tester for E_{α,β}. Specifically, we define a polynomial P that is non-negative on E_{α,β} and is bounded below zero for distributions that are far from E_{α,β}. The ellipsoid E_{α,β} naturally defines a polynomial P_0 : ∆^(4) → R,

P_0(q0, q1, q2, q3) = 1 − ( 4N^6·q0^2 + (q1 − m1)^2/r^2 + 4N^6·q2^2 + (q3 − m3)^2/r^2 ),

such that P_0(q0, q1, q2, q3) ≥ 0 if and only if q ∈ E_{α,β}. However, in order to apply Theorem 5.1 it is required that the coefficients of P_0 be upper bounded independently of N. Such a polynomial P : ∆^(4) → R is obtained from P_0 by normalizing the coefficients so that all coefficients of P


are bounded by 1 in absolute value:

P(q0, q1, q2, q3) = 1/(4N^6) − ( q0^2 + (q1 − m1)^2/(4r^2·N^6) + q2^2 + (q3 − m3)^2/(4r^2·N^6) ).    (23)
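As a quick numerical sanity check (not part of the thesis), one can verify that the polynomial P of (23) is non-negative precisely when the induced K3-distribution of the two-clique graph G_ρ lies in the ellipsoid, i.e., when ρ ∈ [α, β]. The parameter values below (α = 0.1, β = 0.4, N = 10) are arbitrary illustrative choices.

```python
# Hypothetical sanity check (not from the thesis): the polynomial P of (23)
# should be non-negative exactly when the K3-distribution of G_rho lies in
# the ellipsoid E_{alpha,beta}, i.e., when rho is in [alpha, beta].

def k3_dist(rho):
    """Induced K3-distribution (q0, q1, q2, q3) of the two-clique graph G_rho."""
    return (0.0, 3 * rho * (1 - rho), 0.0, rho ** 3 + (1 - rho) ** 3)

def make_P(alpha, beta, N):
    """Build P of (23) from the segment endpoints S^3_(alpha,1-alpha), S^3_(beta,1-beta)."""
    a, b = k3_dist(alpha), k3_dist(beta)
    m1, m3 = (a[1] + b[1]) / 2, (a[3] + b[3]) / 2       # midpoint m
    r2 = ((a[1] - b[1]) ** 2 + (a[3] - b[3]) ** 2) / 4  # r^2 = (half-length)^2
    def P(q0, q1, q2, q3):
        return 1 / (4 * N ** 6) - (q0 ** 2 + (q1 - m1) ** 2 / (4 * r2 * N ** 6)
                                   + q2 ** 2 + (q3 - m3) ** 2 / (4 * r2 * N ** 6))
    return P

P = make_P(0.1, 0.4, N=10)      # arbitrary illustrative parameters
assert P(*k3_dist(0.25)) >= 0   # rho inside [alpha, beta]
assert P(*k3_dist(0.05)) < 0    # rho outside [alpha, beta]
```

Note that the distributions S^3_{G_ρ} are collinear (q0 = q2 = 0 and q1 + q3 = 1), so the sign of P indeed changes exactly at the segment endpoints.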

Clearly, for every q = (q0, q1, q2, q3) ∈ ∆^(4) it holds that P(q0, q1, q2, q3) ≥ 0 if and only if q ∈ E_{α,β}. By Theorem 5.1, since all coefficients of the polynomial P of degree deg(P) = 2 are bounded in absolute value, there exists a POT T that accepts each distribution q = (q0, q1, q2, q3) ∈ ∆^(4) with probability

Pr[T accepts q] = 0.5 + δ · P(q0, q1, q2, q3)

for some absolute constant δ > 0 (that depends only on the degree of P and the maximal coefficient of P). We claim that such a tester gives us a POT for CC(D_{α,β}) with threshold probability 0.5. Indeed, if G ∈ CC(D_{α,β}), then the distribution S^3_G = (q0, q1, q2, q3) belongs to E_{α,β}, and hence P(q0, q1, q2, q3) ≥ 0, thus implying that the tester accepts G with probability at least 1/2. Assume now that an N-vertex graph G is ε-far from CC(D_{α,β}) for some ε > 0, and let S^3_G = (q0, q1, q2, q3). We shall prove that P(q0, q1, q2, q3) ≤ −Ω(ε^{8/d}), thus implying that the tester accepts G with probability at most 0.5 − Ω(ε^{8/d}). The proof is partitioned into two cases, depending on ε.

Case 1: ε > 2N^{−d}. Then, by the observation in the beginning of the proof, the graph G is ε/2-far from CC≤2. Thus, by [37, Proposition 4.11] it holds that q0 + q2 = Ω(ε), and hence, using ε/2 > N^{−d} ≥ N^{−1}, we get

P(q0, q1, q2, q3) ≤ 1/(4N^6) − (q0^2 + q2^2) ≤ 1/(4N^6) − Ω(ε^2) = −Ω(ε^2),

where the constants in the different Ω() notations might be different.

Case 2: ε ≤ 2N^{−d}. We shall show next that G ∉ CC(D_{α,β}) implies P(q0, q1, q2, q3) = −Ω(N^{−8}), which is −Ω(ε^{8/d}) by the case hypothesis. We consider two subcases.

1. If G ∉ CC≤2, then either q0 ≥ N^{−3} or q2 ≥ N^{−3} (because q0 + q2 > 0 whereas each comes in multiples of N^{−3}). Therefore, P(q0, q1, q2, q3) ≤ 1/(4N^6) − 1/N^6, which is definitely smaller than −Ω(N^{−8}).

2. If G ∈ CC≤2, then q0 = q2 = 0 and q1 = 1 − q3. In this case, G consists of two cliques of sizes K and N − K such that either K ≤ αN − 1 or βN + 1 ≤ K ≤ ⌊N/2⌋. Assume that K ≥ βN + 1 (the case K ≤ αN − 1 is handled analogously). Then,


the K3-density of G is

q3 = (K/N)^3 + (1 − K/N)^3 ≤ (β + 1/N)^3 + (1 − (β + 1/N))^3 ≤ β^3 + (1 − β)^3 − Ω(N^{−2}),

where the first inequality is due to the monotonicity (it is decreasing) of the function x^3 + (1 − x)^3 for x ∈ [0, 0.5], and the second inequality follows from β ≤ (K − 1)/N ≤ 0.5 − N^{−1}. Hence, the distance of q = (0, 1 − q3, 0, q3) from m is r + Ω(N^{−2}), where Ω(N^{−2}) is the distance of q to S^3_{(β,1−β)} and r is the distance of S^3_{(β,1−β)} to m (since the three points lie on the line (0, 1 − x, 0, x)). Hence, (q1 − m1)^2 + (q3 − m3)^2 = (r + Ω(N^{−2}))^2 > r^2 + Ω(r^2·N^{−2}), which implies the required bound on P because

P(q0, q1, q2, q3) ≤ 1/(4N^6) − (q1 − m1)^2/(4r^2·N^6) − (q3 − m3)^2/(4r^2·N^6) < 1/(4N^6) − (r^2 + Ω(r^2·N^{−2}))/(4r^2·N^6) < −Ω(N^{−8}).

We have shown that if G is ε-far from CC(D_{α,β}), then the tester accepts G with probability at most 0.5 − Ω(ε^{8/d}), and thus the tester has detection probability ϱ(ε) = Ω(ε^{8/d}), as required. Since deg(P) = 2, given a graph G the distribution tester needs 2 samples from S^3_G, thus making 6 queries to the graph G itself. This completes the proof of the proposition.

Remark regarding the bounds on α(N) and β(N): Note that we can relax the restrictions on α(N) and β(N) at the cost of the detection probability of the tester. Specifically, let µ : R → [0, 1] be a monotone function such that µ(x) → 0 as x grows to infinity. If we assume in Proposition 6.12 that α(N) < µ(N) and β(N) > 1/2 − µ(N) for all N ∈ N, then the POT described in the proof has detection probability ϱ(ε) = poly(1/µ^{−1}(ε)), where µ^{−1} denotes the inverse function of µ.

Generalization to t ≥ 2: In order to generalize the above proof to larger values of t, we need to define a collection of t segments {I_{αi,βi} : i ∈ [t]} and t ellipsoids {E_{αi,βi} : i ∈ [t]}, instead of a single segment (and ellipsoid) as done in the proof of Proposition 6.12. The desired POT is obtained by constructing a tester analogous to the one in Proposition 5.5 for a disjoint union of ellipsoids.

The following corollary follows from the proof of Proposition 6.12. Note that this corollary uses only trivial conditions regarding α and β (i.e., 0 ≤ α(N) ≤ β(N) ≤ 0.5), but yields no POT.

Corollary 6.13 (For the discussion in Section 3.5): Let α, β : N → [0, 1/2]^t be such that α1(N) ≤ β1(N) < · · · < αt(N) ≤ βt(N) for every N ∈ N. Then, there is a

constant b ∈ (0, 1) and for every N ∈ N there are w = (wH )H:|V (H)|=O(1) such that the set of N -vertex graphs in CC (Dα,β ) is exactly the set of all N -vertex graphs in Πw,b . Proof By the proof above there is a universal constant c and a (weak) tester for CC (Dα,β ) that accepts any graph G ∈ CC(Dα,β ) with probability at least c, and accepts any graph G ∈ / CC (Dα,β ) with probability smaller than c. The corollary follows by letting b = 1 − c and using the proof of Theorem 3.11 to obtain a characterization of CC (Dα,β ) in terms of Πw,b .

6.4  Impossibility results for subclasses of CC≤2

In this section we prove that for any constants 0 ≤ α < β ≤ 1/2 such that β − α < 0.5, the class CC(D_{α,β}) does not have a two-sided error POT. This impossibility result complements Propositions 6.9 and 6.11 by explaining why the POTs provided by the latter results apply to α, β : N → [0, 0.5] such that lim_{N→∞}(β(N) − α(N)) ∈ {0, 0.5}. The argument uses Theorem 3.11, which allows us to consider only potential testers that, on input a graph G, decide based on the distribution induced by O(1)-vertex subgraphs of G. We show that if such a potential tester T provides a characterization of CC(D_{α,β}) with respect to some threshold c (i.e., T accepts G with probability at least c if and only if G ∈ CC(D_{α,β})), then there exist infinitely many graphs G that are Ω(1)-far from CC(D_{α,β}) such that T accepts G with probability at least c − O(1/|V(G)|^2). It follows that T cannot be a POT.

Proposition 6.14 (classes CC(D_{α,β}) that have no POT): Let 0 ≤ α < β ≤ 1/2 be constants such that either α > 0 or β < 1/2. Then, the class CC(D_{α,β}) does not have a two-sided error POT.

Proof We start with an overview of the proof, where we assume towards contradiction that there is a constant-query tester T for CC(D_{α,β}) with threshold probability c. Then (similarly to the proof of Theorem 3.11), we may assume that, for some constant t, the tester T reads a subgraph of G induced by t uniformly chosen random vertices, and accepts a view H = ([h], E_H) with probability w_H ∈ [0, 1]. Hence the probability that T accepts G can be written as ∑_H w_H · dns_H(G), where the sum is over all t-vertex graphs, dns_H(G) denotes the density of H as an induced subgraph of G, and w_H ∈ [0, 1] for each t-vertex graph H.

We first claim that all N-vertex graphs G ∈ CC(D_{α,β}) are accepted with probability at most c + O(N^{−2}). This follows from the fact that every graph G ∈ CC(D_{α,β}) is (N^{−2})-close to a graph G′ that is not in CC(D_{α,β}), since we can remove an edge from the larger clique of G to obtain such a G′. Therefore, the distribution (dns_H(G′))_{H:|V(H)|=t} is O(N^{−2})-close to the distribution (dns_H(G))_{H:|V(H)|=t}, and so ∑_H w_H · dns_H(G′) < c (since G′ ∉ CC(D_{α,β})) implies ∑_H w_H · dns_H(G) ≤ c + O(N^{−2}).


In the second step we shall claim (see proof outline below) that since all graphs G ∈ CC(D_{α,β}) are accepted with probability that deviates from c by at most O(N^{−2}), it must be the case that all graphs in CC≤2 are accepted by T with probability that is O(N^{−2})-close to c, and hence with probability at least c − O(N^{−2}). In particular, if β < 1/2, then the graph consisting of two cliques, each of density 1/2, is Ω(1)-far from CC(D_{α,β}), yet it is accepted with probability at least c − O(N^{−2}). Similarly, if α > 0, then the graph G = K_N is Ω(1)-far from CC(D_{α,β}), yet it is accepted with probability at least c − O(N^{−2}). Therefore, the detection probability of T, on some N-vertex graphs that are Ω(1)-far from the property, is at most O(N^{−2}), thus implying that T is not a POT for CC(D_{α,β}).

The second step is proven by focusing on the behavior of T on the various graphs in CC≤2, while noting that this behavior (or rather T's acceptance probability) depends only on the density of the smaller clique, denoted ρ, which in turn determines a unique N-vertex graph in CC≤2, denoted G_ρ. Recall that the probability that T accepts the graph G_ρ is a linear combination (with coefficients in [0, 1]) of the corresponding densities (dns_H(G_ρ))_{H:|V(H)|=t}. Moreover, for every t-vertex graph H, the density dns_H(G_ρ) is a polynomial (in ρ) of degree at most t (see Footnote 39). Therefore, the probability that T accepts G_ρ can be written as a polynomial (in ρ) of degree at most t. Let us denote this polynomial by P : [0, 1/2] → R. Recall that (by the first step) P is almost constant on the interval [α, β]. We claim that this implies that P is almost constant also on the entire interval [0, 1/2], where the "almost" in the conclusion depends on deg(P), on the length β − α of the interval, and on the "almost" parameter in the hypothesis. (This claim follows almost immediately from Claim 6.15, which in turn follows by polynomial interpolation.)

Now, using the first step, by which P(ρ) is O(1/N^2)-close to c for every ρ ∈ [α, β], we infer that P(ρ) is O(1/N^2)-close to c for every ρ ∈ [0, 1/2], where the constants in the O() notations might differ. Since, for every ρ ∈ [0, 1/2], we have Pr[T accepts G_ρ] = P(ρ), it follows that all graphs in CC≤2 are accepted with probability that is O(1/N^2)-close to the threshold. As explained above, this implies that T is not a POT for CC(D_{α,β}). The detailed argument is given next.

Assume towards contradiction that CC(D_{α,β}) has a POT T with threshold probability c. Then, as explained in the proof overview, we may assume that for some constant t and for every N ∈ N there exists a sequence w = (w_H)_{H:|V(H)|=t} taking values in [0, 1], such that the acceptance probability of T when given a graph G on N vertices can be written as

Pr[T accepts G] =

∑_{H : |V(H)| = t} w_H · dns_H(G),    (24)

39. For example, for t ≥ 3 the density of K_t in G_ρ is dns_{K_t}(G_ρ) = [C(ρN, t) + C((1−ρ)N, t)] / C(N, t), and the density of the graph H that consists of K_{t−1} with an additional isolated vertex is dns_H(G_ρ) = [C(ρN, t−1)·(1−ρ)N + C((1−ρ)N, t−1)·ρN] / C(N, t), where C(a, b) denotes the binomial coefficient.


where the sum is over all unlabeled t-vertex graphs H, and dns_H(G) denotes the density of H as a subgraph of G. Note that for every N-vertex graph G ∈ CC(D_{α,β}) we can drop an internal edge of the larger clique to obtain a graph G′ that does not belong to CC(D_{α,β}). Hence, the graphs G and G′ are (N^{−2})-close. Therefore, by Claim 6.1, the densities dns_H(G) and dns_H(G′) differ by at most t^2·N^{−2} for every H. We conclude that T accepts the graphs G and G′ with almost the same probability. That is:

|Pr[T accepts G] − Pr[T accepts G′]| = | ∑_H w_H · dns_H(G) − ∑_H w_H · dns_H(G′) |
                                     ≤ ∑_H w_H · |dns_H(G) − dns_H(G′)|
                                     ≤ r·t^2·N^{−2},

where r < 2^{t^2} denotes the number of unlabeled t-vertex graphs. Since any N-vertex graph G′ ∉ CC(D_{α,β}) must be accepted with probability smaller than c, we conclude that any N-vertex graph G ∈ CC(D_{α,β}) is accepted by T with probability at most c + O(N^{−2}), where the constant in the O() notation depends only on t. This implies the following inequality:

c ≤ Pr[T accepts G] ≤ c + r·t^2·N^{−2}    for every G ∈ CC(D_{α,β}).    (25)

In order to complete the proof we shall prove that for all N-vertex graphs G ∈ CC≤2 the acceptance probability of T is O(N^{−2})-close to the threshold c, where the constant in the O() notation depends only on t and β − α. As explained in the proof overview, since α, β are constants, there is a graph in CC≤2 that is Ω(1)-far from CC(D_{α,β}). Yet, since this graph is in CC≤2, it is accepted with probability at least c − O(N^{−2}), thus implying that T is not a POT for CC(D_{α,β}). In light of the above, we now focus on the behavior of T only on N-vertex input graphs that are in CC≤2. For every ρ ∈ [0, 1/2] such that ρN ∈ N, let G_ρ be the N-vertex graph in CC≤2 with cliques of densities ρ and 1 − ρ. Then, as noted in the proof overview, for every t-vertex graph H the density of H in G_ρ is a polynomial (in ρ) of degree at most t, and thus, by (24), the probability that T accepts the input graph G_ρ is also a polynomial of degree at most t. Consider the polynomial P : [0, 1/2] → R defined as P(ρ) = Pr[T accepts G_ρ] − c. Recall that we have shown, in the first step of the proof, that Pr[T accepts G_ρ] is O(N^{−2})-close


to c for every ρ ∈ [α, β] that satisfies ρN ∈ N. Specifically, by (25), the polynomial P satisfies the following condition:

P(ρ) ∈ [0, r·t^2/N^2]    for all ρ ∈ [α, β] that satisfy ρN ∈ N.    (26)

It follows from the next claim that a polynomial P(ρ) that satisfies (26) cannot deviate from zero by more than O(N^{−2}) on the larger interval [0, 1/2].

Claim 6.15 Let P : [0, 1/2] → R be a polynomial of degree at most t. Assume that for some ε, δ > 0 there are t + 1 points ρ1, ..., ρ_{t+1} ∈ [0, 1/2] such that |ρi − ρj| ≥ δ for all i ≠ j ∈ [t + 1], and |P(ρi)| ≤ ε for all i ∈ [t + 1]. Then, for every x ∈ [0, 1/2], it holds that |P(x)| < (t + 1) · ε · (1/(2δ))^t.

Before proving the claim, let us see how it allows us to complete the proof of Proposition 6.14. Let δ = (β − α)/t, and assume that N is large enough (e.g., N > 4/δ). Let us choose t + 1 values α ≤ ρ1 < ρ2 < · · · < ρ_{t+1} ≤ β that satisfy ρiN ∈ N for all i ∈ [t + 1] and |ρ_{i+1} − ρi| ≥ δ − 2/N > δ/2 for all i ∈ [t] (see Footnote 40). Then, by applying Claim 6.15 (and recalling that P(ρ) ∈ [0, r·t^2/N^2] for all ρ ∈ [α, β] that satisfy ρN ∈ N), we conclude that

|P(ρ)| ≤ (t + 1) · (1/δ)^t · r·t^2/N^2 = (t + 1) · (t/(β − α))^t · r·t^2 · N^{−2}    for all ρ ∈ [0, 1/2].    (27)

Therefore, the tester T accepts all N -vertex graphs Gρ ∈ CC ≤2 with probability Pr[T accepts Gρ ] = c + P(ρ) > c − O(N −2 ), where the constant in the O() notation depends only on t and β − α. Since there are graphs in CC ≤2 that are Ω(1)-far from CC (Dα,β ), we conclude that T is not a POT for CC (Dα,β ).

Extension to smaller intervals (i.e., β(N) = α(N) + N^{−o(1)}). The proof of Proposition 6.14 extends also to the case that α and β are functions that are relatively close. The point is that the only dependence on β − α occurs when we use the hypothesis that (t + 1)·(t/(β − α))^t·r·t^2 = o(N^2), which implies that Ω(1)-far graphs are accepted with probability c − o(1). Recalling that t and r are constants (which are determined by the query complexity of the potential tester), we infer that the argument holds as long as β(N) = α(N) + ω(N^{−2/t}). Since we should fail each potential POT (i.e., each constant t), we can support any β(N) = α(N) + N^{−o(1)}, which perfectly complements Proposition 6.10.

40. This can be done by letting ρi = α + (i − 1)δ ∈ [α, β] for all i ∈ [t + 1] (recall that δ = (β − α)/t). Then |ρi − ρj| ≥ δ for all i ≠ j ∈ [t + 1]. Note that the ρi's might not satisfy the condition ρiN ∈ N. However, by modifying each ρi by at most 1/N we can obtain ρi ∈ [α, β] for which ρiN ∈ N holds. Such a modification changes the distance between ρi and ρj by at most 2/N.


We return to the proof of Claim 6.15.

Proof of Claim 6.15 The proof uses interpolation of polynomials. Specifically, if we are given the values of P at t + 1 points ρ1, ..., ρ_{t+1} ∈ [0, 1/2], then the polynomial P can be written as

P(x) = ∑_{i∈[t+1]} ( ∏_{j≠i} (x − ρj)/(ρi − ρj) ) · P(ρi).

Therefore, for every x ∈ [0, 1/2] we can upper bound |P(x)| as follows:

|P(x)| ≤ ∑_{i∈[t+1]} ∏_{j≠i} |(x − ρj)/(ρi − ρj)| · |P(ρi)|
       ≤ ∑_{i∈[t+1]} ∏_{j≠i} |x − ρj|/δ · ε        [using |ρi − ρj| ≥ δ and |P(ρi)| ≤ ε]
       ≤ ∑_{i∈[t+1]} ∏_{j≠i} 1/(2δ) · ε            [using x, ρj ∈ [0, 1/2]]
       = (t + 1) · ε · (1/(2δ))^t.

This completes the proof of the claim.
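The interpolation bound of Claim 6.15 is easy to check numerically. The sketch below (an illustration, not code from the thesis) evaluates the Lagrange interpolant through t + 1 points with spacing δ and values of magnitude at most ε, and confirms that it never exceeds the bound (t + 1)·ε·(1/(2δ))^t on [0, 1/2]; the concrete parameters are arbitrary.

```python
import random

def lagrange_eval(xs, ys, x):
    """Evaluate the unique interpolating polynomial of degree <= t at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # Lagrange basis polynomial factor
        total += term
    return total

t, eps, delta = 3, 1e-4, 0.1                   # illustrative parameters
xs = [0.05 + i * delta for i in range(t + 1)]  # t+1 points with spacing delta
rng = random.Random(0)
ys = [rng.uniform(-eps, eps) for _ in xs]      # |P(rho_i)| <= eps
bound = (t + 1) * eps * (1 / (2 * delta)) ** t
worst = max(abs(lagrange_eval(xs, ys, 0.0005 * j)) for j in range(1001))  # grid on [0, 0.5]
assert worst < bound                           # the bound of Claim 6.15 holds
print(worst, bound)
```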

Part II

Greedy Random Walk

Abstract

We study a discrete-time self-interacting random process on graphs, which we call Greedy Random Walk (GRW). The walker is initially located at some vertex. As time evolves, each vertex maintains the set of adjacent edges touching it that have not been crossed yet by the walker. At each step, the walker, being at some vertex, picks an adjacent edge among the edges that it has not traversed thus far, according to some (deterministic or randomized) rule. If all the adjacent edges have already been traversed, then an adjacent edge is chosen uniformly at random. After picking an edge the walker jumps along it to the neighboring vertex. We show that the expected edge cover time of the greedy random walk is linear in the number of edges for certain natural families of graphs. Examples of such graphs include the complete graph, even-degree expanders of logarithmic girth, and the hypercube graph. We also show that GRW is transient in Z^d for all d ≥ 3.


7  Introduction to Part II

Greedy Random Walk (GRW) on a graph is a discrete-time random process, with transition law defined as follows. The walker is initially located at some vertex of the graph. As time evolves, each vertex in the graph maintains the set of all adjacent edges that the walker has not crossed yet. At each step the walker picks an unvisited edge, among the edges adjacent to its current location, according to some rule. If all the adjacent edges have already been visited, an adjacent edge is picked uniformly at random. The walker then jumps to a neighboring vertex along the chosen edge. We think of the process as trying to cover the graph as fast as possible by using a greedy rule that prefers to walk along an unvisited edge whenever possible. This suggests the name Greedy Random Walk. Formally, for an undirected graph G = (V, E) a GRW with a (possibly randomized) rule R on G is a sequence X0, X1, X2, ... of random variables defined on V with the following transition probabilities. For each t ≥ 0 define

H_t = {(X_{s−1}, X_s) ∈ E : 0 < s ≤ t}

(28)

to be the set of all the edges traversed by the walk up to time t. For every vertex v ∈ V and time t ≥ 0 define

J_t(v) = {e ∈ E : v ∈ e and e ∉ H_t}    (29)

to be the set of all the edges touching v that have not been traversed by the walk up to time t. Denoting by N_v the set of neighbors of v in G, the transition probabilities are given by:

Pr[X_{t+1} = w | (X_i)_{i≤t}] =
    R(w | (X_i)_{i≤t})    if J_t(X_t) ≠ ∅ and {X_t, w} ∈ J_t(X_t),
    1/|N_{X_t}|           if J_t(X_t) = ∅ and w ∈ N_{X_t},
    0                     otherwise,

where R(w | (X_i)_{i≤t}) denotes the probability of choosing w ∈ N_{X_t} conditioned on the information regarding the process so far. A natural rule R is to choose uniformly at random an edge among the adjacent unvisited edges J_t(v) of the current vertex v = X_t. We shall denote this rule by R_RAND. One can think of GRW as a random walk where the walker wishes to cover the graph as fast as possible and is allowed to make some local computation at each vertex she visits (e.g., mark the last edge that the walker used to reach the current vertex, and also mark the edge that the walker is going to use in the next step), but is not allowed to transfer information between vertices. A motivation for the study of GRW arises from distributed computation, in which an agent sits on every vertex of a graph. Each agent has a list of neighbors and is allowed

to communicate only with them. The goal is to let all the agents use some resource as fast as possible, while using only the local information at each vertex, and no extra information regarding the graph and the vertices that have already been visited. An agent has a list of the neighbors who have communicated with her thus far during the process, and each time the agent receives the resource, she is allowed to perform only local computations before moving it to one of her neighbors. We will see that the GRW protocol performs better than the simple random walk (SRW) on some families of graphs. The main difficulty in analyzing such a random process comes from the fact that GRW is self-interacting, i.e., it is not a Markov chain (meaning that the probability distribution of the next step depends not only on the current position of the walker, but also on the entire walk thus far). Although in many cases a certain property of self-interacting random walks can be observed in simulations or seems to be suggested by a "heuristic proof", typically it is much harder to give robust proofs for random walks that do not have the Markov property. Related models include RW with choice [10], non-backtracking RW [4], RW with neighborhood exploration [15], excited RW [14], reinforced RW [62], rotor-router RW [33], and more. Recently this model has been considered independently by Berenbrink et al. [16]. They showed that if G is an even-degree expander graph such that every vertex is contained in a vertex-induced cycle of logarithmic length, then the expected vertex cover time of GRW is linear for any rule R.
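The process is straightforward to simulate. The sketch below (an illustration, not code from the thesis) implements GRW with the uniform rule R_RAND on an undirected graph given as adjacency lists, and returns the edge cover time C_E(G).

```python
import random

def greedy_random_walk(adj, start, rng=random):
    """Simulate GRW with the uniform rule R_RAND on an undirected graph given
    as adjacency lists {v: [neighbors]}; returns the edge cover time C_E(G)."""
    unvisited = {v: set(nbrs) for v, nbrs in adj.items()}  # J_t(v) for each v
    m = sum(len(nbrs) for nbrs in adj.values()) // 2       # |E|
    covered, v, t = 0, start, 0
    while covered < m:
        if unvisited[v]:                      # greedy part: pick an uncrossed edge
            w = rng.choice(sorted(unvisited[v]))
        else:                                 # simple part: uniform random neighbor
            w = rng.choice(adj[v])
        if w in unvisited[v]:                 # the edge {v, w} is crossed for the first time
            unvisited[v].discard(w)
            unvisited[w].discard(v)
            covered += 1
        v, t = w, t + 1
    return t

# K5: the walk must take at least |E| = 10 steps to cover all edges
K5 = {i: [j for j in range(5) if j != i] for i in range(5)}
print(greedy_random_walk(K5, 0))
```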

7.1  Our Results

In Section 8 we study the edge cover time of GRW on finite graphs. Obviously, the edge cover time of any graph G = (V, E) is at least |E|, as the walker must cross every edge at least once. We prove bounds on the edge cover time of GRW by analyzing the "overhead" of the walk, i.e., the difference between the expected edge cover time of the walk and the number of edges in the graph. For example, we establish that the expected time it takes for GRW to traverse all edges of K_n, the complete graph on n vertices, is C(n, 2) + (1 + o(1))·n·log(n), where C(n, 2) = n(n−1)/2 is the number of edges. Therefore, the aforementioned "overhead" in the case of K_n is (1 + o(1))·n·log(n). In particular, all edges of K_n are covered by GRW in time (1 + o(1))·C(n, 2), which is asymptotically faster than Θ(n^2 log n), the expected edge cover time of SRW. We show that for certain families of graphs the expected edge cover time of GRW is asymptotically faster than that of SRW. In particular, we establish that the expected edge cover time of GRW is linear in the number of edges for the complete graph, for the hypercube graph, and for constant even-degree expanders with logarithmic girth. The latter result is claimed in the paper of Berenbrink et al. [16]. Another interesting result is given in Lemma 8.9, which bounds the edge cover time of an even-degree graph by GRW in terms of its vertex cover time by SRW. Specifically, we show


that for any graph G = (V, E) all of whose vertices have even degrees, if its expected vertex cover time by SRW is C, then the expected edge cover time of G using GRW is at most |E| + C. Therefore, for even-degree graphs of logarithmic degree, whose vertex cover time is O(n log(n)), we obtain a bound on the edge cover time which is linear in the number of edges. These results should be compared with the general lower bound on the expected cover time of graphs by SRW. Recall that Feige [31] has shown that for any graph with n vertices the expected vertex cover time by a simple random walk is at least (1 − o(1))·n log n. Analogously, for all graphs the expected edge cover time is at least Ω(|E| log(|E|)) (see [75], [1]). In this direction, a result of Benjamini, Gurel-Gurevich and Morris [12] says that for bounded-degree graphs a linear cover time is exponentially unlikely. We are also interested in the behavior of GRW on infinite graphs. It is well known that SRW on Z^d is transient if d ≥ 3, and recurrent otherwise. We prove that GRW is transient on Z^d for d ≥ 3. The case of d = 2 remains open, and it is shown to be equivalent to the notorious two-dimensional mirror model problem [68, 24]. Our proof holds for all graphs with even degrees on which SRW is transient. This leaves unsolved the question of transience of GRW in lattices with odd degrees. These and other related results are discussed in Section 9, which can be read independently of the rest of the paper.

General remarks:

The choice of the rule R. In the first version of this paper we considered GRW that uses only the rule R_RAND. After the first version of our work was uploaded to arxiv.org, Berenbrink et al. [16] independently published their work, in which they consider GRW with any (deterministic or randomized) rule, even adversarial ones that try to slow down the process. After reading their results we noticed that, in fact, our proofs for bounding the edge cover time from above are independent of R, and hold for any rule as well.

The choice of the starting vertex. In all of our results on cover time the bounds are independent of the starting vertex. Also, in most cases the considered graphs are vertex-transitive, and so specification of the starting vertex is unnecessary.

7.2  Notation

We use the standard notations of asymptotic growth rates. For two functions f, g : N → R+ we write f = O(g) when there is a positive constant C ∈ R such that f(n) < C·g(n) for all sufficiently large values of n. The notation f = Ω(g) means there is a positive constant c > 0 such that f(n) > c·g(n) for all sufficiently large values of n, and f = Θ(g) means both f = O(g) and f = Ω(g). We write f = o(g) if lim_{n→∞} f(n)/g(n) = 0.


8  Edge Cover Time of Finite Graphs

In GRW the choice of the next move depends on the history of the walk with respect to the adjacent edges of the current vertex. Hence, it seems more natural to ask about the edge cover time, rather than the vertex cover time. We show that for some common families of graphs the greedy walk covers the edges asymptotically faster than the simple random walk. Let G = (V, E) be a connected undirected graph on n vertices. Denote by C_E(G) the edge cover time of GRW, i.e., the number of steps it takes for GRW to traverse all edges of G. Note that since the graph G is finite, the edge cover time C_E(G) is a.s. finite. The basic idea behind the analysis is as follows. Divide the random discrete time interval [0, C_E(G)] into two (random) parts:

1. The greedy part: all times in which the walker is at a vertex that has an adjacent edge yet to be covered, i.e., all times t ∈ [0, C_E(G)] such that {X_t, X_{t+1}} ∉ H_t.

2. The simple part: all times in which the walker is positioned at a vertex all of whose adjacent edges have already been covered previously, i.e., all times t ∈ [0, C_E(G)] such that {X_t, X_{t+1}} ∈ H_t. At these times the choice of the next move has the same distribution as that of a simple random walk.

Roughly speaking, the GRW typically looks as follows. It starts at t_0 = 0 in a greedy time part. This part lasts until reaching, at time s_1, a vertex v_1 all of whose adjacent edges have already been covered. We say in this situation that the walk got stuck. This means that the last step before getting stuck covered the last edge touching v_1. Since at time s_1 all edges touching v_1 have already been covered, the walker picks an edge at random among these edges. In other words, the walk is now in a simple time part, which started at time s_1. This part lasts until the walker reaches, at time t_1, a vertex u_1 that has an adjacent edge which has not been covered yet.
By definition, the next step belongs to a greedy part, and the walk continues greedily until reaching at time s_2 some vertex v_2, all of whose adjacent edges have already been covered, thus starting the second simple part. The walk continues in this way until all edges are covered, and then becomes a simple random walk. Formally, define the times t_0, s_1, t_1, s_2, t_2, ..., s_n recursively, where the intervals [t_{i−1}, s_i) denote the i-th greedy part, and the intervals [s_i, t_i) denote the i-th simple part of the walk:

t_0 = 0,

s_{i+1} = inf{ t_i < t ≤ C_E(G) : J_t(X_t) = ∅ }    if there is such t, and s_{i+1} = C_E(G) otherwise,

t_{i+1} = inf{ s_{i+1} < t ≤ C_E(G) : J_t(X_t) ≠ ∅ }    if there is such t, and t_{i+1} = C_E(G) otherwise.

We say the walk got stuck at time t if t = s_i for some i ∈ N. It should be clear from the description that the vertices X_{s_i} must all be distinct: X_{s_i} is the vertex at which the walk got stuck for the i-th time, and it is impossible to get stuck at the same vertex twice. Therefore, it is enough to define the times t_i and s_i only for i ≤ n (where n denotes the number of vertices in G). This gives a random partition

0 = t_0 < s_1 < t_1 < s_2 < t_2 < · · · < t_{k−1} < s_k = t_k = · · · = s_n = C_E(G)

of the time segment [0, C_E(G)], where the random variable k ≤ n is the first i for which s_i = C_E(G), i.e., the first i by which all edges of G are covered. Note that the total time the walker spends in the greedy parts equals the number of edges |E|, implying the following expression for the edge cover time:

C_E(G) = |E| + ∑_{i=1}^{n} (t_i − s_i).

By linearity of expectation we have the following simple expression for the expected edge cover time, which will be the key formula in our proofs.

Proposition 8.1 (Key formula) Let G = (V, E) be a graph with n vertices, and let t_0, s_1, t_1, s_2, t_2, ... be the random times defined above. Then, the expected edge cover time of GRW on G is

E[C_E(G)] = |E| + ∑_{i=1}^{n} E[t_i − s_i].    (30)

Thus, in order to bound E[C_E(G)], it is enough to bound the expected total size of all simple parts, i.e., E[∑_{i=1}^{k} (t_i − s_i)]. In order to apply Proposition 8.1, the following notation will be convenient. For i = 1, ..., n let

B_i = {v ∈ V : J_{s_i}(v) = ∅}

be the set of vertices all of whose adjacent edges are covered by time s_i (B stands for "bad": if the walker is at some vertex in B, then the next step will be along an edge that has already been crossed, thus increasing the edge cover time). By the definition of s_i and t_i, we note that B_i = {v ∈ V : J_t(v) = ∅} for every t ∈ [s_i, t_i]. Note also that B_i ⊆ B_j for all i < j, and the vertex v_j = X_{s_j} at which the walker got stuck at time s_j does not belong to B_i for i < j, as at any time t < s_j the vertex v_j still had an adjacent edge which had not been covered yet. Thus the containment B_i ⊊ B_j is strict for all i < j ≤ k, i.e., the sets B_i form a strictly increasing


chain until it stabilizes at B_k = V:

B_1 ⊊ B_2 ⊊ · · · ⊊ B_k = B_{k+1} = · · · = B_n = V.    (31)

In particular,

|B_i| < n if and only if i < k.    (32)

Conditioned on B_i and X_{s_i}, the length of the time segment [s_i, t_i] is distributed as the escape time of a simple random walk from B_i when started at X_{s_i}. That is, conditioned on B_i and X_{s_i}, the random variable (t_i − s_i) has the same distribution as T(X_{s_i}, B_i), where T(v, B) = min{t : Y_t ∉ B | Y_0 = v}, and Y_0, Y_1, ... is a simple random walk on G started at Y_0 = v. By applying known bounds on the expected escape time of SRW, we shall use Proposition 8.1 to upper bound the expected edge cover time of GRW.
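For intuition, the escape time T(v, B) can be estimated by simulation. On the complete graph K_n the walker at a vertex of B moves to a uniform vertex among the other n − 1, so it stays inside B with probability (|B| − 1)/(n − 1) at each step, and T(v, B) is geometric with mean (n − 1)/(n − |B|). The sketch below (illustrative, with arbitrary parameters; not code from the thesis) checks this numerically.

```python
import random

def kn_escape_time(n, b, trials=20_000, rng=random.Random(1)):
    """Monte Carlo estimate of E[T(v, B)] for SRW on K_n with |B| = b < n.
    From any vertex of B, the next vertex is uniform among the other n - 1
    vertices, so the walk remains inside B with probability (b - 1)/(n - 1)."""
    total = 0
    for _ in range(trials):
        steps = 1
        while rng.random() < (b - 1) / (n - 1):  # remain inside B
            steps += 1
        total += steps
    return total / trials

n, b = 20, 15                                    # arbitrary illustrative parameters
print(kn_escape_time(n, b), (n - 1) / (n - b))   # empirical vs. (n-1)/(n-|B|)
```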

8.1  The Complete Graph

We prove in this section that for the complete graph with n vertices the expected edge cover time is (1 + o(1))·C(n, 2). Specifically, we prove the following result.

Theorem 8.2 For any rule R the expected edge cover time of GRW on K_n is bounded by E[C_E(K_n)] ≤ |E| + (1 + o(1))·n log n.

This is an improvement over the Θ(n^2 log n) time of the SRW, which follows from the coupon collector argument.

Proof Consider the complete n-vertex graph G = K_n. The proof relies on the following simple observation. For any set of vertices B ⊆ V, the escape time of SRW from B depends only on the size of B, and has a geometric distribution. Specifically, for each i = 1, ..., n, the quantity t_i − s_i conditioned on B_i is distributed geometrically:

ti − si ∼

 G( n−|Bi | )

if |Bi | < n

0

otherwise.

n−1

(33)

Denote by T_i the expected escape time from the subset B_i. Then,

T_i = E(t_i − s_i | B_i) = (n−1)/(n−|B_i|) if |B_i| < n, and T_i = 0 otherwise.  (34)

By averaging over Bi ’s, the quantity n X

E[ti − si ] =

i=1

n X

Pn

i=1

E[ti − si ] is equal to

E[E(ti − si |Bi )] =

i=1

n X i=1

" k−1 # X n−1 E[Ti ] = E , n − |Bi | i=1

where the last equality follows from linearity of expectation, together with (34). In order to bound the sum in the expectation, let bi = |Bi |, and note that we have an increasing sequence of natural numbers b1 < b2 < · · · < bk so that b1 ≥ 1 and bk = n for some k ≤ n. For any such sequence it holds that n−1 k−1 X X n−1 n−1 ≤ . (35) n − b n − i i i=i i=i To see this note that all summands are positive, and each one on the left hand side of the inequality, appears also on the right hand side. Therefore, we can upper bound the quantity Pn i=1 E[ti − si ] by n X

E[ti − si ] ≤

i=1

n−1 X n−1 i=1

n−i

= (1 + o(1))n log n

Applying Proposition 8.1 gives the desired result. Remark We conjecture that if the rule in the greedy part is RRAND (in which an edge is chosen uniformly at random among the adjacent unvisited edges of the current vertex), then for odd values of n, i.e., when the degree is even, the overhead for clique is O(n), i.e., E[CE (Kn )] ≤ |E| + O(n). For a related discussion see Section 10.
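As a numeric sanity check (ours, not part of the proof): the right-hand side of (35) equals (n−1) times a harmonic number, which is exactly where the (1 + o(1)) n log n estimate comes from.

```python
import math

def overhead_upper_bound(n):
    """Right-hand side of (35): sum_{i=1}^{n-1} (n-1)/(n-i) = (n-1) * H_{n-1}."""
    return sum((n - 1) / (n - i) for i in range(1, n))

n = 10_000
bound = overhead_upper_bound(n)
# H_m = ln(m) + gamma + o(1), with gamma the Euler-Mascheroni constant
harmonic_estimate = (n - 1) * (math.log(n - 1) + 0.5772156649)
```

For n = 10,000 the exact sum and the harmonic estimate agree to within a fraction of a percent, and both are roughly 1.06 · n ln n.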

8.2 Expander graphs

We apply the same method as in the previous section to expander graphs. Let G = (V, E) be a d-regular graph on n vertices, and let A = A(G) ∈ [0, 1]^{V×V} be its normalized adjacency matrix, namely

A(u, v) = 1/d if (u, v) ∈ E, and A(u, v) = 0 if (u, v) ∉ E.

It is a standard fact that A has real eigenvalues, all lying in the interval [−1, 1]. Denote the eigenvalues by 1 = λ_1 ≥ λ_2 ≥ ··· ≥ λ_n ≥ −1, and let λ(G) be the spectral radius of G, defined as

λ(G) = max_{i=2,...,n} |λ_i|.


We say a d-regular graph G is an (n, d, λ)-expander if λ(G) ≤ λ < 1 (for more details see the excellent survey [48]). We are able to show that for d = Ω(log n) the expected edge cover time of the GRW is linear in the number of edges. This is faster than a simple random walk, which covers the edges in Ω(|E| log |E|) steps, as mentioned in the introduction. Specifically, we prove the following theorem.

Theorem 8.3 Let G be an (n, d, λ)-expander graph. Then, for any rule R the expected edge cover time is

E[C_E(G)] ≤ |E| + O( n log n / (1−λ) ).

In particular, for an expander with d = Ω(log n) the expected edge cover time of the GRW is linear in the number of edges.

Proof The key observation here is that, as in the case of the complete graph, E(t_i − s_i | B_i) can be bounded in terms of the size of B_i, independently of its structure. We use the following lemma of Broder and Karlin.

Lemma 8.4 ([22, Lemma 3]) Let G be an (n, d, λ)-expander and let S ⊊ V be a non-empty set of vertices. Consider a simple random walk Y_0, Y_1, ... on G, starting at some v ∈ S (i.e., Y_0 = v). Let T(v, S) be the escape time of the walk from S when started from v. Then

E[T(v, S)] ≤ (C/(1−λ)) · ( log n + n/(n−|S|) )

for some absolute constant C.

Denoting by T_i the expected escape time from the subset B_i, by Lemma 8.4, for all i = 1, ..., n we have

T_i := E(t_i − s_i | B_i) ≤ (C/(1−λ)) · ( log n + n/(n−|B_i|) ) if |B_i| < n, and T_i = 0 otherwise,  (36)

for some absolute constant C ∈ ℝ. In order to upper bound ∑_{i=1}^{n} E[t_i − s_i] we apply an analysis similar to that in the proof of Theorem 8.2. Specifically, by averaging over the B_i's, the quantity ∑_{i=1}^{n} E[t_i − s_i] equals

∑_{i=1}^{n} E[t_i − s_i] = ∑_{i=1}^{n} E[E(t_i − s_i | B_i)] = ∑_{i=1}^{n} E[T_i] = E[ ∑_{i=1}^{n} T_i ],

where the last equality follows from linearity of expectation. Using (36) we obtain

∑_{i=1}^{n} E[t_i − s_i] ≤ (C/(1−λ)) · E[ ∑_{i=1}^{k−1} ( log n + n/(n−|B_i|) ) ]
  ≤ (C/(1−λ)) · n log n + (C/(1−λ)) · E[ ∑_{i=1}^{k−1} n/(n−|B_i|) ]
  ≤ O( n log n / (1−λ) ),

where the bound ∑_{i=1}^{k−1} n/(n−|B_i|) ≤ O(n log n) in the last inequality follows using the same proof as (35). Using Proposition 8.1, we have

E[C_E(G)] ≤ |E| + ∑_{i=1}^{n} E[t_i − s_i] = |E| + O( n log n / (1−λ) ),

which completes the proof of the theorem.

Next, we strengthen Theorem 8.3 by showing that for constant-degree expanders with logarithmic girth all of whose vertices have even degrees, the expected edge cover time is linear in the number of vertices. Recall that the girth of a graph G, denoted girth(G), is the minimal length of a cycle in G. This result is claimed in [16] without proof.

Theorem 8.5 Let G be an (n, d, λ)-expander graph such that d ∈ ℕ is even, and girth(G) = g. Then, for any rule R the expected edge cover time is

E[C_E(G)] ≤ |E| + O( |E| · log n / ((1−λ) g) ).

In particular, if G = (V, E) is an expander of constant even degree with girth(G) = Ω(log n), then the expected edge cover time of the GRW is linear in the number of vertices.

The proof relies on the following simple observation. Suppose that a greedy part starts at some vertex v. Then, using the fact that all degrees of G are even, we conclude that this greedy part will end at the same vertex v. Indeed, by an Euler-path type argument, if a vertex has even degree and the walker entered this vertex along a new edge that has not been visited so far, then by parity there must be another unvisited edge for the walker to leave the vertex. In particular, the range covered by each greedy part forms a (not necessarily simple) cycle. We summarize this observation below:

Observation 8.6 If all degrees of a graph G = (V, E) are even, then in each greedy time part [t_i, s_{i+1}] it holds that X_{t_i} = X_{s_{i+1}}, i.e., every greedy part ends at the same vertex it started from.

Therefore, since in the greedy time parts the walker crosses no edge twice, in each greedy part [t_i, s_{i+1}] the walker traverses some (not necessarily simple) cycle, and thus the number of steps in each greedy time part is at least girth(G). We now turn to the proof of Theorem 8.5.

Proof As in the proof of Theorem 8.3, the expected edge cover time can be bounded from above by

E[C_E(G)] = |E| + O(1/(1−λ)) · E[ ∑_{i=1}^{k−1} ( log n + n/(n−|B_i|) ) ].  (37)

By Observation 8.6 it follows that the random number k of greedy parts is upper bounded by |E|/g. In order to bound the terms n/(n−|B_i|), note that for all i ≤ k it holds that k ≤ i + d·(n−|B_i|)/g. Indeed, if at time s_i the number of vertices all of whose adjacent edges have already been covered is |B_i|, then the number of edges that have not been traversed so far is at most d·(n−|B_i|), and hence, by the assumption on the girth of G, the number of remaining greedy parts is at most d·(n−|B_i|)/g. Therefore, for all i ≤ k we have

n/(n−|B_i|) ≤ dn/((k−i) g) = 2|E|/((k−i) g).

By (37) we have

E[C_E(G)] = |E| + O(1/(1−λ)) · E[k log n] + O(1/(1−λ)) · E[ ∑_{i=1}^{k−1} n/(n−|B_i|) ]
  = |E| + O( |E| · log n / ((1−λ) g) ) + O(1/(1−λ)) · E[ ∑_{i=1}^{k−1} 2|E|/((k−i) g) ]
  ≤ |E| + O( |E| · log n / ((1−λ) g) ),

where the last inequality uses the assumption that k ≤ n and the fact that ∑_{i=1}^{k−1} 1/(k−i) ≤ log(k) + O(1). Theorem 8.5 follows.

We show below that the assumption in Theorem 8.5 that the graph has logarithmic girth is necessary. Specifically, we present a 6-regular expander graph G and a rule R such that GRW with the rule R covers all the edges of G in expected time Ω(n log n). In fact, the graph G satisfies an additional property: every vertex of G is contained in some induced cycle of logarithmic length. This should be compared with the result of Berenbrink et al. [16], who have shown that if G is an even-degree expander such that every vertex of G is contained in some induced cycle of logarithmic length, then the expected vertex cover time of GRW is linear for any rule R. This shows a gap between the edge cover time and the vertex cover time of GRW.

Theorem 8.7 For every n ≡ 0 (mod 3) there exists a 6-regular expander graph G = (V, E) with |V| = n vertices such that every vertex of G is contained in an induced cycle of logarithmic length, and there exists a rule R such that the expected edge cover time of G by GRW with the rule R is Ω(n log n).

Proof Let H = (U, F) be a 4-regular expander graph on n/3 vertices such that every vertex of H is contained in an induced cycle of length ε log n for some constant ε > 0.⁴¹ Define a graph G = (V, E) to be the cartesian product of H with the graph K_3. Namely, the vertices of G are V = U × {1, 2, 3}, and ((u, i), (u′, j)) ∈ E if and only if either (1) (u, u′) ∈ F and i = j, or (2) u = u′ and i ≠ j. By the properties of H, the graph G is a 6-regular expander, and it satisfies the property that every vertex of G is contained in some induced cycle of length at least ε log n. The vertices of G are naturally partitioned into 3 subsets V = V_1 ∪ V_2 ∪ V_3, where V_i = {(u, i) : u ∈ U} for i = 1, 2, 3.

The rule R is defined so that the first greedy part covers all edges of the form ((u, i), (v, i)) for all (u, v) ∈ F and i ∈ {1, 2, 3}. Assume now that GRW starts from some arbitrary vertex (u_0, 1) ∈ V_1. The walker walks along some Eulerian cycle of V_1, covering all edges induced by V_1. Indeed, this can be done, as the graph induced by V_1 is isomorphic to H, and hence its vertices have even degrees. After completing the cycle in V_1 and returning to the initial vertex (u_0, 1), the walker moves to (u_0, 2), performs a walk along some Eulerian cycle on V_2, and returns back to (u_0, 2). Similarly, the walker then moves to (u_0, 3), covers all edges induced by V_3, and returns to (u_0, 3). Finally, the walker moves back to (u_0, 1), and gets stuck for the first time.
Note that at this point all edges induced by each of the V_i's have already been covered by GRW, and the remaining edges form disjoint triangles of the form {(u, 1), (u, 2), (u, 3)}, one for each of the vertices u ∈ U ∖ {u_0}. Hence, each subsequent greedy part will consist of 3 steps, covering one triangle per part, and the order is determined by the first time that the SRW reaches some vertex of a triangle {(u, i) : i = 1, 2, 3}. Noting that, by ignoring all steps from (u, i) to (u, j), the random walk on G induces a SRW on H, it follows that in order to cover all triangles the SRW needs to cover all the vertices of a copy of H. Since by the theorem of Feige [31] the expected vertex cover time of every graph by SRW is at least Ω(n log n), this bound also holds for the edge cover time of G. This completes the proof of the theorem.

41. Such a graph can be obtained by choosing a random 4-regular graph. For a reference see [20, Chapter II.4].


8.3 Hypercube {0, 1}^d

The hypercube graph Q_d = (V, E) is the graph whose vertices are V = {0, 1}^d, with (u, v) ∈ E if and only if d(u, v) = 1, where d(·, ·) denotes the Hamming distance between two strings. We show that for even dimension d the edge cover time of the hypercube is linear in the number of edges.

Proposition 8.8 Let d ∈ ℕ be even, and let Q_d = (V, E) be the d-dimensional hypercube graph. Then, for any rule R the expected edge cover time of Q_d is bounded by E[C_E(Q_d)] = O(|E|).

Proof The proposition follows from the following lemma.

Lemma 8.9 Let G = (V, E) be a graph all of whose vertices have even degrees, and suppose that the expected vertex cover time of SRW on G is C. Then the expected edge cover time of GRW on G is at most E[C_E(G)] ≤ |E| + C.

Since the number of edges in Q_d is |E| = (1/2) d · 2^d, and the expected vertex cover time of the hypercube by SRW is C = O(d · 2^d), Proposition 8.8 follows from Lemma 8.9.

We now turn to the proof of Lemma 8.9.

Proof [Proof of Lemma 8.9] The proof proceeds by coupling a SRW and a GRW so that the number of steps made by the GRW exceeds the number of steps made by the SRW by at most |E|. As observed above in Observation 8.6, for graphs with even degrees we have X_{t_i} = X_{s_{i+1}} for all i ≤ k, i.e., every greedy part finishes at the same vertex it started from. This implies that the simple parts can be concatenated, as the end of the ith simple part is X_{t_i}, and the beginning of the (i+1)st simple part is X_{s_{i+1}}. The coupling between the SRW and the GRW is the natural one, where the SRW performs all the steps that the GRW makes in its simple parts. Clearly, the number of steps made by the GRW exceeds the number of steps made by the SRW by at most the total number of steps made in the greedy parts, which is bounded by |E|. Observe that whenever the SRW reaches some vertex v, either (1) all edges adjacent to v have already been covered by GRW, or (2) the vertex v is the last vertex in the current simple part, and thus, using the property X_{t_i} = X_{s_{i+1}} for all i, the next greedy part will cover all edges adjacent to v. This implies that by the time the SRW covers all vertices of G, the GRW has either already covered all edges of G, or will do so in the next greedy part. Therefore, the edge cover time of GRW exceeds the vertex cover time of SRW by at most |E|. This completes the proof of the lemma.
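Observation 8.6, on which the coupling rests, can be checked empirically. Below is our own sketch (with uniform choices standing in for an arbitrary rule) that runs GRW on the 4-dimensional hypercube, an even-degree graph, and confirms that every greedy part is a closed walk.

```python
import random

def grw_greedy_part_endpoints(adj, start, rng):
    """Run GRW (uniform rule) to full edge coverage; return the list of
    (start_vertex, end_vertex) pairs, one per greedy part."""
    uncovered = {frozenset((u, w)) for u in adj for w in adj[u]}
    v = start
    parts, part_start, in_greedy = [], start, True
    while uncovered:
        fresh = [w for w in adj[v] if frozenset((v, w)) in uncovered]
        if fresh:
            if not in_greedy:              # a new greedy part begins at v
                part_start, in_greedy = v, True
            w = rng.choice(fresh)
            uncovered.discard(frozenset((v, w)))
        else:
            if in_greedy:                  # the current greedy part ends at v
                parts.append((part_start, v))
                in_greedy = False
            w = rng.choice(adj[v])
        v = w
    parts.append((part_start, v))          # the final greedy part
    return parts

def hypercube(d):
    """Adjacency lists of the d-dimensional hypercube on {0, ..., 2^d - 1}."""
    return {u: [u ^ (1 << b) for b in range(d)] for u in range(1 << d)}
```

On any even-degree graph every returned pair should have equal endpoints, in line with Observation 8.6.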

We also remark (without proof) on the edge cover time of a generalization of the hypercube graph.

Remark Define a generalization of the hypercube by connecting two vertices in {0, 1}^d if the distance between them is at most some parameter ℓ ≥ 2. Specifically, for ℓ ≥ 2, let Q_d^{(≤ℓ)} = (V, E_ℓ), where V = {0, 1}^d and (x, y) ∈ E_ℓ iff d(x, y) ≤ ℓ. Denoting the number of vertices in the graph by n = 2^d, the spectral radius of Q_d^{(≤ℓ)} is bounded from above by λ ≤ 1 − ℓ/log n. Therefore, by Theorem 8.3, for ℓ ≥ 2 the expected edge cover time of GRW on Q_d^{(≤ℓ)} is |E_ℓ| + O(n log² n), where the constant in the O() notation depends on ℓ. Noting that the number of edges in Q_d^{(≤ℓ)} is |E_ℓ| = O(n · log^ℓ n), this implies that for ℓ = 2 the edge cover time is linear in the number of edges |E_2| = O(n · log² n), and for ℓ ≥ 3 the edge cover time is (1 + o(1))|E_ℓ|.

8.4 d-regular trees

In this section we provide an upper bound on the edge cover time of GRW on trees. We are able to describe the behavior of GRW quite accurately, and subsequently provide a tight bound on the cover time.

Theorem 8.10 Let G = (V, E) be a tree rooted at a vertex denoted by r. For any v ∈ V denote by T_v the subtree rooted at v, and let |T_v| denote the number of edges in T_v. Then, for any rule R the GRW edge cover time of G is

E[C_E(G)] = |E| + O( ∑_{u ∈ V∖{r}} |T_u| ).

If the rule for GRW is R_RAND and deg(r) ≥ 2, then there is a matching lower bound, namely

E[C_E(G)] = |E| + Θ( ∑_{u ∈ V∖{r}} |T_u| ).

The following corollary is immediate from Theorem 8.10.

Corollary 8.11 If G is a d-regular tree with n vertices, then the expected edge cover time is O(n log_d n).

Comparing Corollary 8.11 to the cover time of SRW on d-regular trees, we again see an asymptotic speed-up over the Θ(n log_d² n) time of the SRW [2].

Proof In order to use the tree structure of the graph, let us first give an overview of the behavior of GRW on trees. The walker starts at the root r and goes down greedily (i.e., an unvisited edge is traversed in every new step) until reaching a leaf. Since it got stuck at a leaf,

it performs a simple random walk until reaching its lowest ancestor with an adjacent edge that has not been covered yet. The non-covered edge is necessarily from the ancestor to one of its children (as its parent has already been visited on the way down). The walker continues by moving down greedily until reaching another leaf not covered thus far, and then performs an SRW until again reaching its lowest ancestor with a child that has not been visited thus far by the walk. The walk continues in the same manner until covering all edges, getting stuck only at the leaves. In fact, the walk gets stuck exactly once in each leaf, and the time C_E(G) is the time when the walker visits the last leaf of the tree. Note that when visiting some vertex v, the walk will cover the entire subtree of v before returning to v's parent. This property is what makes the cover time of GRW asymptotically faster than the cover time of SRW. The order in which the vertices are visited for the first time defines some preorder traversal of the tree (first the root, then the subtrees), where for each vertex the order of the subtrees is chosen according to the rule R. We observe that the vertices (X_{s_1}, X_{s_2}, ..., X_{s_k}) define some order on the leaves of the tree, induced by the preorder traversal described above (and in particular, k equals the number of leaves). In addition, for every i < k, the vertex X_{t_i} is the lowest ancestor of X_{s_i} such that at time s_i not all of its descendants have been visited by the walk. Hence, E[t_i − s_i] equals the expected time it takes for the simple random walk starting at X_{s_i} to visit this ancestor. This implies that for every edge (u, v), where u is the parent of v, there is at most one i ∈ {1, ..., k} such that the edge (u, v) lies on the shortest path from X_{s_i} to X_{t_i}.
Therefore, if w is the leaf where the walk got stuck for the ith time, that is, X_{s_i} = w, and v is its lowest ancestor whose subtree is not covered yet, then the expected time to reach v starting from w is

E[t_i − s_i] = H(w, v) = ∑_{(u_1, u_2) ∈ P(w,v)} H(u_1, u_2),

where H(x, y) denotes the expected number of steps required for SRW starting at x to visit y, and the sum is over all edges on the shortest path from w to v (using the convention that the edge (u_1, u_2) means that u_2 is the parent of u_1). Going over all leaves in the graph, and using the observation that the walk gets stuck in each leaf exactly once (stopping at the last visited leaf at time s_k), and finishing the corresponding simple part at the lowest ancestor whose subtree has not been covered yet, we observe that for each i < k the shortest paths from X_{s_i} to X_{t_i} are disjoint. Furthermore, the union of all these paths covers all edges of the graph except for the path from the last covered leaf, denoted by l = X_{s_k}, to the root of the tree. Let P(r,l) denote the shortest path from l to r. Then

E[ ∑_{i=1}^{k} (t_i − s_i) ] = E[ ∑_{(u,v) ∈ E∖P(r,l)} H(u, v) ] ≤ ∑_{(u,v) ∈ E} H(u, v),  (38)

where H(u, v) denotes the expected number of steps required for SRW starting at u to visit v for the first time, and the summation is over all edges (u, v), where v is the parent of u. It is well known (see, e.g., [3, Lemma 1]) that if (u, v) is an edge in a tree with v the parent of u, then H(u, v) = 2|T_u| + 1. Proposition 8.1 together with (38) proves the upper bound of the theorem. Note that if we allow the walker to return to the origin after covering the tree, then the expected return time is equal to

|E| + ∑_{(u,v) ∈ E} (2|T_u| + 1) = 2 ∑_{u ∈ V} |T_u| = 2 ∑_{u ∈ V} depth(u),

where depth(u) is the distance of the vertex u from the root.

If GRW uses the rule R_RAND, then the subtrees rooted at the children of r are explored completely one after another (the order of the children is random), and the walk returns to r from all but the last subtree. Therefore, for each child u of r, the subtree rooted at u is completely explored by GRW with probability (deg(r) − 1)/deg(r), and hence every edge of the tree belongs to P(r,l) with probability at most 1/deg(r). Therefore, by applying the formula in (38) we get

E[ ∑_{i=1}^{k} (t_i − s_i) ] = E[ ∑_{(u,v) ∈ E∖P(r,l)} H(u, v) ]
  ≥ (1 − 1/deg(r)) ∑_{(u,v) ∈ E} H(u, v)
  ≥ (1 − 1/deg(r)) ∑_{u ∈ V∖{r}} 2|T_u|.

This completes the proof of the theorem.

Remark Note that in the second part we can remove the condition deg(r) ≥ 2 and obtain a bound of E[C_E(G)] = |E| + Θ( ∑_u |T_u| ), where the sum is over all vertices u ∈ V that either have at least two children or have an ancestor with at least two children.
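The identity H(u, v) = 2|T_u| + 1 used above follows from the first-step recursion H(u, parent(u)) = deg(u) + ∑_{c child of u} H(c, u), which telescopes to the closed form. The following sketch (our own check on a small hand-built tree) verifies this.

```python
# First-step analysis on a tree: H(u, parent) = deg(u) + sum_c H(c, u),
# which telescopes to the closed form H(u, parent) = 2|T_u| + 1.

def hitting_to_parent(children, u):
    """Expected SRW hitting time from a non-root vertex u to its parent."""
    deg = len(children[u]) + 1          # children plus the parent edge
    return deg + sum(hitting_to_parent(children, c) for c in children[u])

def subtree_edges(children, u):
    """|T_u|: the number of edges in the subtree rooted at u."""
    return sum(1 + subtree_edges(children, c) for c in children[u])

# a small rooted tree: 0 is the root; 1 and 2 are its children; 1 has 3 and 4
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
for u in (1, 2, 3, 4):                  # every non-root vertex
    assert hitting_to_parent(children, u) == 2 * subtree_edges(children, u) + 1
```

For example, a leaf u has |T_u| = 0 and reaches its parent in 1 step, while vertex 1 above has |T_1| = 2 and expected hitting time 5.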

9 Greedy Random Walk on Z^d

In this section we study the behavior of GRW on infinite graphs. Specifically, we ask whether the walk is recurrent or transient on different graphs.


9.1 GRW on Z^d for d ≠ 2

Obviously, GRW on Z visits every vertex at most once. We show that for d ≥ 3 the greedy random walk on Z^d is transient.

Theorem 9.1 Let G = (V, E) be an infinite graph all of whose vertices have even degree. If the simple random walk on G is transient, then for any rule R the greedy random walk is also transient. In particular, for d ≥ 3 the greedy random walk on Z^d returns to the origin only finitely many times almost surely.

Proof Partition the time interval [0, +∞] into two types of parts, greedy parts and simple parts, by defining times t_0 = 0, s_1, t_1, s_2, t_2, ... ∈ ℕ ∪ {+∞} as follows:

t_0 = 0,
s_{i+1} = inf{ t_i ≤ t < +∞ : J_t(X_t) = ∅ } if such t exists, and +∞ otherwise,
t_{i+1} = inf{ s_{i+1} ≤ t < +∞ : J_t(X_t) ≠ ∅ } if such t exists, and +∞ otherwise.

(An analogous partition underlies the results in Section 8. The difference here is that the times can take the value +∞.) For the reader's convenience we restate Observation 8.6, adapted to the case of infinite graphs.

Observation 9.2 If all degrees of a graph G = (V, E) are even and s_{i+1} < ∞, then X_{t_i} = X_{s_{i+1}}.

Assume that the event that s_i or t_i equals +∞ for some i ≥ 1, with t_k the first such time, has positive probability. Conditioning on this event, the walk remains in a simple part from time s_k on, and hence performs a simple random walk from this time onwards. Since SRW is transient on G, the walk will return to X_0 only finitely many times a.s. Actually, since the random range R = {X_t : 0 ≤ t ≤ s_k} is finite, and the SRW is transient, conditioning on R, the SRW leaves R in finite time a.s., and so t_k is a.s. finite, contradicting the assumption. Similarly, if the event that s_i or t_i equals +∞ for some i ≥ 1, with s_k the first such time, has positive probability, then, conditioning on this event, the walk is in a greedy part from time t_{k−1} onwards. In other words, from time t_{k−1} onwards the walker crosses each edge at most once. Hence, as the degree of X_0 is finite, the maximal number of returns to X_0 is at most deg(X_0)/2 + t_{k−1}, and in particular is a.s. finite.

Assume now that the event that s_i, t_i < +∞ for all i ≥ 1 has positive probability, and condition on this event. Using the assumption that all vertices of the graph have even degrees, it follows from Observation 9.2 that X_{t_i} = X_{s_{i+1}} for all i ≥ 0. Therefore, for all i ≥ 0 the walk in the time segments [s_i, t_i] and [s_{i+1}, t_{i+1}] can be concatenated. Hence, the walk restricted to the times ∪_{i≥0} [s_i, t_i] is distributed as a SRW on G, and so, by transience, returns to X_0 finitely many times almost surely. Since in all the greedy parts together the walker can visit X_0 at most deg(X_0)/2 times, the entire walk returns to X_0 finitely many times a.s.

Note that we strongly used the fact that all vertices in our graph have even degree. The following proposition shows a similar result under a slightly relaxed assumption.

Proposition 9.3 Let G = (V, E) be a graph obtained from Z^d by removing at most r^{d−2−ε} edges from any box of radius r around the origin, for some ε > 0. Then the greedy random walk on G is transient.

The proof generalizes the concatenation argument of Theorem 9.1. Unlike the previous proof, which relied on the fact that all vertices had even degrees, in our case some vertices have odd degrees. Hence it is possible that the simple parts cannot be concatenated into one walk. However, we can divide the simple parts into classes, such that in each class the parts can be concatenated into one simple random walk. The proof uses the fact that a simple random walker starting from a point at distance r from the origin visits the origin with probability O(1/r^{d−2}). Therefore, if there are r^{d−2−ε} independent simple random walkers started on a sphere of radius r around the origin, then the total number of visits to the origin by all the walkers is almost surely finite.

Proof We start with a time partition (t_0 = 0, s_1, t_1, s_2, t_2, ...) as in the proof of Theorem 9.1. Call a vertex v a new start if v = X_{s_i} for some i and X_{t_j} ≠ v for all j < i.
As in the proof of Theorem 9.1, the concatenation argument implies that every new start vertex must either be the origin or have odd degree. Consider the walk restricted to the segments [s_i, t_i]. The indices i ≥ 1 can be partitioned into classes C_1, C_2, ... such that in each class C_j the segments [s_i, t_i], i ∈ C_j, can be concatenated into one walk that starts at a new start vertex. Namely, if C_j = {i_1 < i_2 < i_3 < ...}, then X_{s_{i_1}} is a new start and X_{s_{i_2}} = X_{t_{i_1}}, X_{s_{i_3}} = X_{t_{i_2}}, and so on. Setting m_j = min C_j, the vertex X_{s_{m_j}} is necessarily a new start, and therefore is either the origin or a vertex of odd degree. Moreover, the times {s_{m_j}}_j are all distinct. For each C_j, restricting the walk to the times ∪{[s_i, t_i] : i ∈ C_j} gives us a simple random walk (possibly finite) starting from X_{s_{m_j}}. Therefore there are at most O(r^{d−2−ε}) simple random walks starting from a box of radius r around the origin. Using the fact that a random walk in Z^d starting from a vertex at distance r from the origin hits it with probability O(1/r^{d−2}), we conclude that the sum of the probabilities of hitting zero converges, when summing over all the random

walks. More precisely, let P_v be the probability that a SRW starting at v reaches the origin, and let ODD be the set of all vertices of odd degree. Then

∑_{v ∈ ODD} P_v = ∑_{n=1}^{∞} ∑_{v ∈ ODD, 2^{n−1} ≤ ‖v‖ < 2^n} P_v ≤ ∑_{n} (2^n)^{d−2−ε} · O( 1/(2^n)^{d−2} ) = O( ∑_{n=1}^{∞} 2^{−nε} ) < ∞.

By the first Borel-Cantelli lemma the event that only finitely many of the walks will reach the origin happens with probability 1. Therefore, GRW on this graph is transient, as required.

9.2 GRW on Z² and the mirror model

The following observation, relating the behavior of GRW on Z² to the mirror model, is due to Omer Angel. In the mirror model, introduced by Ruijgrok and Cohen [68], a mirror is placed randomly at each vertex of Z² by aligning it along one of the two diagonal directions with probability 1/3 each, or placing no mirror with probability 1/3. A particle moves along the edges of the lattice and is reflected by the mirrors according to the law of reflection. See, e.g., [24] for details. A major open problem in this area is to determine whether every orbit is periodic almost surely. We claim below that this question is equivalent to determining whether GRW with the rule R_RAND is recurrent on Z². (Recall that under the rule R_RAND an edge is chosen uniformly at random among the adjacent unvisited edges of the current vertex.)

Let (X_t)_{t≥0} be GRW on Z² with the rule R_RAND. Then there exists a coupling between (X_t)_{t≥0} and the particle motion in the planar mirror model until the first time they return to the origin. Indeed, if at time t ≥ 0 GRW reaches a vertex X_t that has not been visited so far, then in both GRW and the mirror model the next step is chosen in a non-backtracking manner, giving equal probabilities of 1/3 to each of the adjacent vertices (except for X_{t−1}). In the mirror model this uniquely defines the alignment of the mirror at the vertex X_t, and hence the move of the particle at its next visit to this place, given that the orbit is not periodic: it will go to the unvisited neighboring vertex. On the other hand, if at time t ≥ 0 we reach a vertex X_t that has already been visited previously, then the next step is uniquely determined: it is the move along the edge that has not been traversed so far. This defines a coupling of the two models up to the first return time to zero.

Claim 9.4 The probability that GRW with the rule R_RAND on Z² returns to the origin at least once is equal to the probability that a particle returns to the origin in the planar mirror model.
From Claim 9.4 we infer the following theorem.


Theorem 9.5 GRW with the rule R_RAND on Z² returns to the origin infinitely often almost surely if and only if the orbit in the mirror model on Z² is periodic almost surely.

Proof Note first that GRW on Z² returns to the origin infinitely often if and only if every greedy part is finite. Indeed, if there is an infinite greedy part, then there are finitely many returns to the origin, as every vertex is visited at most twice in total in all greedy time parts. In the other direction, assume that all greedy time parts are finite. Then, by the concatenation argument, which follows from Observation 9.2, the simple parts form an infinite subsequence distributed as SRW on Z² starting at the origin. The latter returns to the origin infinitely often almost surely, and hence so does GRW. Therefore, it is enough to show that the orbit in the mirror model on Z² is periodic a.s. if and only if every greedy part is finite.

Suppose first that every orbit in the mirror model on Z² is periodic almost surely, and suppose that GRW starts the ith greedy part at some time t_i. Then conditioning on the first t_i steps of GRW determines the orientation of the mirrors at the vertices visited up till now. Since the number of visited vertices is finite, the conditioning is on an event of positive probability, and so the trajectory of the particle starting from X_{t_i} is a.s. periodic. Therefore, by considering the coupling between GRW and the mirror model conditioned on that event, analogously to Claim 9.4, it follows that with probability 1 the ith greedy part is finite.

Assume now that every greedy part of GRW is finite. Note that by translation invariance it is enough to show that the trajectory of a single particle starting at the origin is periodic almost surely.⁴² Indeed, since GRW returns to the origin twice a.s., it follows from the coupling in Claim 9.4 that the trajectory of a particle starting at the origin is periodic almost surely, as required.
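As a toy illustration (ours; the open question above concerns the infinite lattice Z², where nothing of the sort is known), on a finite torus the mirror dynamics is an invertible map on the finite set of (position, direction) states, so every orbit is genuinely periodic:

```python
import random

def mirror_orbit_period(n, seed):
    """Period of the particle orbit in the mirror model on an n x n torus.

    Each cell independently holds a '/' mirror, a '\\' mirror, or no mirror
    (probability 1/3 each). The state (x, y, direction) evolves by an
    invertible map on a finite set, so the orbit must return to its start.
    """
    rng = random.Random(seed)
    mirror = {(x, y): rng.choice(['/', '\\', None])
              for x in range(n) for y in range(n)}
    x, y, dx, dy = 0, 0, 1, 0           # start at the origin, heading east
    start = (x, y, dx, dy)
    steps = 0
    while True:
        m = mirror[(x, y)]
        if m == '/':                     # reflect: east <-> north, west <-> south
            dx, dy = dy, dx
        elif m == '\\':                  # reflect: east <-> south, west <-> north
            dx, dy = -dy, -dx
        x, y = (x + dx) % n, (y + dy) % n
        steps += 1
        if (x, y, dx, dy) == start:
            return steps
```

The period is at most 4n², the number of states; on Z² the state space is infinite and the pigeonhole argument breaks down, which is precisely the difficulty.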

10 Remarks and Open Problems

10.1 A Conjecture Regarding Theorem 8.2

Recall Observation 8.6, used in the proof of Theorem 8.5. It seems to be potentially useful for proving stronger bounds on the edge cover time of GRW. To illustrate how this observation can be useful, let us consider the GRW on the complete graph K_n for odd values of n. In the proof of Theorem 8.2 we only used the fact that the "bad" sets B_i grow by at least one each time, thus allowing us to bound the "overhead" by E[ ∑_{i=1}^{k−1} (n−1)/(n−|B_i|) ] ≤ E[ ∑_{i=1}^{k−1} (n−1)/(n−i) ] ≤ n log n.

42. Indeed, if the trajectory of a particle starting at the origin is periodic almost surely, then, by translation invariance, the trajectory of a particle starting at any vertex and moving in any direction is periodic almost surely. Thus, by placing 4 particles at each vertex of the graph and letting them move in the 4 possible directions, it follows that with probability 1 the trajectory of each of them is periodic, as this event is an intersection of countably many probability-1 events. Therefore, the trajectory of a particle is periodic almost surely if and only if all orbits are periodic almost surely.


We suspect, however, that the sets B_i grow linearly in n, since by the time the walker gets stuck for the first time, i.e., visits the starting vertex n/2 times, the number of vertices that have already been visited n/2 times will be linear in n. The situation, however, becomes more complicated when trying to analyze the set B_2, as it seems to require some understanding of the subgraph of K_n that has not been covered by the time s_1, when the walker got stuck for the first time. If this is indeed true, and the sets B_i grow linearly in each step, we would obtain a stronger bound E[ ∑ (t_i − s_i) ] = O(n). We make the following, rather bold, conjecture.

Conjecture 10.1 The expected edge cover time of GRW on K_n is E[C_E(K_n)] = |E| + Θ(n).

An interesting result in this direction is a recent result of Omer Angel and Yariv Yaari. They showed that for the complete graph K_n with n odd, i.e., when the graph K_n has even degree, the expected number of unvisited edges in K_n until the first time the walk gets stuck (i.e., up to time s_1) is linear in n [74].

10.2 Rules on Vertices Instead of Edges

In this paper we have considered the edge cover time of graphs, rather than the vertex cover time. This seems to be a natural quantity to analyze due to the transition rule of GRW. A naïve modification of GRW to speed up the vertex cover time is the following. At each step, the walker at vertex v picks an unvisited neighbor of v according to some rule and jumps there. If all neighbors have already been visited, the next move is chosen uniformly at random among the neighbors of v. For example, it is obvious that in the complete graph K_n this walk covers all vertices in n steps. Note that when the walker is allowed to make some local computations at a vertex, and each vertex has the information regarding its neighbors, then one can define a rule that forces the walk to perform a depth first search on the graph. Such a walk crosses each edge of some spanning tree at most twice, thus visiting all vertices of the graph in less than 2n steps.
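The DFS-style walk described above can be sketched as follows (our own illustration of the folklore argument): move to an unvisited neighbor when one exists, otherwise backtrack one edge along the stack; each spanning-tree edge is crossed at most twice, so all n vertices are visited within 2n − 2 moves.

```python
def dfs_walk(adj, start):
    """Cover all vertices of a connected graph: step to an unvisited neighbor
    when possible, otherwise backtrack one edge along the DFS stack.
    Returns the list of moves (edges traversed, in order)."""
    visited = {start}
    stack = [start]
    moves = []
    v = start
    while stack:
        nxt = next((w for w in adj[v] if w not in visited), None)
        if nxt is not None:            # descend along a tree edge
            visited.add(nxt)
            stack.append(nxt)
            moves.append((v, nxt))
            v = nxt
        else:                          # backtrack along the same tree edge
            stack.pop()
            if stack:
                moves.append((v, stack[-1]))
                v = stack[-1]
    return moves
```

Only the n − 1 spanning-tree edges discovered by the search are ever used, each at most once in each direction, which gives the 2n − 2 bound.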

10.3 Open Problems

In order to avoid trivialities, in the questions below consider GRW with the rule RRAND.

1. Give a tight bound for the “overhead” of GRW on the complete graph. Specifically, is it true that E[CE(Kn)] = \binom{n}{2} + Θ(n)?


2. Show upper bounds on CE(G) for other families of graphs. One interesting example to look at could be the d-dimensional torus.

3. It also seems interesting to analyze the GRW on graphs with a power-law degree distribution. On such graphs there are hubs of very large degree, and when visiting them the GRW is expected to be efficient.

4. Show that for any finite transitive graph the expected edge cover time of the GRW is not asymptotically larger than that of the SRW. We know that this is true for vertex-transitive graphs of even degree.

5. Give bounds on the expected vertex cover time of the GRW for finite graphs.

6. Give bounds on the expected hitting time of GRW for different graphs.

7. Define the GRW mixing time and show that it is as fast as that of SRW. Here [4] is relevant, and [58] may also be found useful.

The remaining problems concern recurrence/transience of GRW on infinite graphs.

8. Is GRW on Z2 recurrent? Is GRW diffusive on Zd for all d ≥ 2? (See the discussion in Section 9.2.)

9. Is GRW on the ladder Z × Z2 recurrent?

10. Prove that GRW is transient on any graph that is roughly isometric to Z3. In particular, show it for odd-degree lattices.

11. Show that GRW is transient on non-amenable infinite graphs.

12. Consider GRW on a vertex-transitive graph. Is there a zero-one law for the event that the walker returns to its initial location infinitely often? Note that, similarly to the argument in the proof of Theorem 9.5, the event above happens almost surely if and only if the walker returns to its initial location almost surely.


Part III

Acquaintance Time of a Graph

Abstract

We define the following parameter of connected graphs. For a given graph G = (V, E) we place one agent in each vertex v ∈ V. Every pair of agents sharing a common edge is declared to be acquainted. In each round we choose some matching of G (not necessarily a maximal matching), and for each edge in the matching the agents on this edge swap places. After the swap, again, every pair of agents sharing a common edge become acquainted, and the process continues. We define the acquaintance time of a graph G, denoted by AC(G), to be the minimal number of rounds required until every two agents are acquainted. We first study the acquaintance time of some natural families of graphs, including the path, expanders, and the complete bipartite graph. We also show that for all n ∈ N and for all positive integers k ≤ n^{1.5} there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c · k for some universal constant c ≥ 1. We also prove that the exponent 1.5 is the best possible, by showing that for all n-vertex connected graphs G we have AC(G) = O(n^{1.5}). Studying the computational complexity of this problem, we prove that for any constant t ≥ 1 the problem of deciding whether a given graph G has AC(G) ≤ t or AC(G) ≥ 2t is NP-complete. That is, AC(G) is NP-hard to approximate within a multiplicative factor of 2, as well as within any additive constant. On the algorithmic side, we give a deterministic algorithm that, given an n-vertex graph G with AC(G) = 1, finds a strategy for acquaintance that consists of ⌈n/c⌉ matchings in time n^{c+O(1)}. We also design a randomized polynomial time algorithm that, given an n-vertex graph G with AC(G) = 1, finds with high probability an O(log(n))-rounds strategy for acquaintance.


11 Introduction to Part III

In this work we deal with the following problem: agents walk on a graph meeting each other, and our goal is to make every pair of agents meet as fast as possible. Specifically, we introduce the following parameter of connected graphs. For a given graph G = (V, E) we place one agent in each vertex of the graph. Every pair of agents sharing a common edge is declared to be acquainted. In each round we choose some matching of G (not necessarily a maximal matching), and for each edge in the matching the agents on this edge swap places. After the swap, again, every pair of agents sharing a common edge become acquainted, and the process continues until every two agents are acquainted with each other. Such a sequence is called a strategy for acquaintance in G. We define the acquaintance time of a graph G, denoted by AC(G), to be the minimal number of rounds in a strategy for acquaintance in G. In order to get some feeling for this parameter, note that if for a given graph G a list of matchings (M1, . . . , Mt) is a witness-strategy for the assertion that AC(G) ≤ t, then the inverse list (Mt, . . . , M1) is also a witness-strategy for this assertion. We remark that in general a witness-strategy is not commutative in the order of the matchings.⁴³ For a trivial bound on AC(G) we have AC(G) ≥ ⌊diam(G)/2⌋, where diam(G) is the maximal distance between two vertices of G. It is also easy to see that for every graph G = (V, E) with n vertices it holds that AC(G) ≥ \binom{n}{2}/|E| − 1. Indeed, before the first round exactly |E| pairs of agents are acquainted, and in each round at most |E| new pairs get acquainted. This implies that |E| + AC(G) · |E| ≥ \binom{n}{2}, since in any solution the total number of pairs that met up to time AC(G) is \binom{n}{2}. For an upper bound, for every graph G with n vertices we have AC(G) ≤ 2n^2, as every agent can meet all others by traversing the graph along some spanning tree in at most 2n rounds.
Note that for t ∈ N the problem of deciding whether a graph G has AC(G) ≤ t is in NP, and the natural NP-witness is a strategy for acquaintance in G. This problem is different from many classical NP-complete problems, such as graph coloring or vertex cover, in the sense that checking an NP-witness for AC(G) is “dynamic”, and involves evolution in time. Several problems of similar flavor have been studied in the past, including the well studied problems of Gossiping and Broadcasting (see the survey of Hedetniemi, Hedetniemi, and Liestman [46] for details), Collision-Free Network Exploration (see [25]), and the Target Set Selection Problem (see, e.g., [50, 26, 64]). One such problem of particular relevance is Routing Permutations on Graphs via Matchings, studied by Alon, Chung, and Graham in [5]. In this problem the input is a graph G = (V, E) and a permutation of the vertices σ : V → V, and the goal is to route all agents to their respective destinations according to σ; that is, the agent sitting originally in the

⁴³ For example, let G = (V = {1, 2, 3, 4}, E = {(1, 2), (2, 3), (3, 4)}) be the path of length 4. Then the sequence (M1 = {(1, 2)}, M2 = {(3, 4)}, M3 = {(1, 2)}) is a strategy for acquaintance in G, whereas the sequence (M1 = {(1, 2)}, M3 = {(1, 2)}, M2 = {(3, 4)}) is not.
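The footnote's example is small enough to check mechanically. The following Python sketch (our own helper, purely illustrative) simulates a sequence of matchings and records which pairs of agents become acquainted:

```python
from itertools import combinations

def acquainted_pairs(edges, matchings):
    """Simulate a sequence of matchings: one agent starts on each vertex,
    agents swap along each round's matching, and two agents become
    acquainted whenever they occupy the two endpoints of an edge."""
    vertices = sorted({v for e in edges for v in e})
    at = {v: v for v in vertices}          # at[v] = agent currently on vertex v
    met = {frozenset((at[u], at[w])) for u, w in edges}
    for matching in matchings:
        for u, w in matching:
            at[u], at[w] = at[w], at[u]
        met |= {frozenset((at[u], at[w])) for u, w in edges}
    return met

E = [(1, 2), (2, 3), (3, 4)]
all_pairs = {frozenset(p) for p in combinations([1, 2, 3, 4], 2)}
# (M1, M2, M3) acquaints every pair of agents ...
assert acquainted_pairs(E, [[(1, 2)], [(3, 4)], [(1, 2)]]) == all_pairs
# ... but the same matchings in the order (M1, M3, M2) do not
assert acquainted_pairs(E, [[(1, 2)], [(1, 2)], [(3, 4)]]) != all_pairs
```

In the reordered sequence the two swaps of (1, 2) cancel out, and the agents starting at vertices 1 and 4 never share an edge.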


vertex v should be routed to the vertex σ(v) for all v ∈ V . In our setting we encounter a similar routing problem, where we route the agents from some set of vertices S ⊆ V to some T ⊆ V without specifying the target location in T of each of the agents.

11.1 Our results

We start this work by providing asymptotic computations of the acquaintance time for some interesting families of graphs. For instance, if Pn is the path of length n, then AC(Pn) = n − 2. In particular, this implies that AC(H) ≤ n − 2 for all Hamiltonian graphs H with n vertices. We also prove that for constant degree expanders G = (V, E) on n vertices the acquaintance time is O(n), which is tight, as |E| = O(n) and AC(G) = Ω(n^2/|E|). More examples include the complete bipartite graph and the barbell graph. We then provide examples of graphs with different ranges of the acquaintance time. We show in Theorem 14.1 that for all n ∈ N and for all positive integers k ≤ n^{1.5} there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c · k for some universal constant c ≥ 1. We also prove that the exponent 1.5 is the best possible, by showing that for all n-vertex connected graphs G we have AC(G) = O(n^{1.5}). We also study the problem of computing/approximating AC(G) for a given graph G. As noted above, for t ∈ N the problem of deciding whether a given graph G has AC(G) ≤ t is in NP, and the natural NP-witness is a sequence of t matchings that allows every two agents to get acquainted. We prove that the acquaintance time problem is NP-complete, by showing a reduction from the graph coloring problem. Specifically, Theorem 15.1 says that for every t ≥ 1 it is NP-hard to distinguish whether a given graph G has AC(G) ≤ t or AC(G) ≥ 2t. Hence, AC(G) is NP-hard to approximate within a multiplicative factor of 2, as well as within any additive constant. In fact, we conjecture that it is NP-hard to approximate AC within any multiplicative factor. On the algorithmic side we study graphs whose acquaintance time equals 1. We show that there is a deterministic algorithm that, when given an n-vertex graph G with AC(G) = 1, finds a ⌈n/c⌉-rounds strategy for acquaintance in G in time n^{c+O(1)}.
We also design a randomized polynomial time algorithm that when given an n-vertex graph G with AC(G) = 1 finds with high probability an O(log(n))-rounds strategy for acquaintance.

12 Definitions and Notation

Throughout the paper all graphs are simple and undirected. We use standard notations for the standard parameters of graphs. Given a graph G = (V, E) and two vertices u, v ∈ V the distance between u and v, denoted by dist(u, v), is the length of a shortest path from u to v


in G. For a vertex v and a set of vertices U ⊆ V the distance of v from U is defined to be dist(v, U) = min_{u∈U} dist(v, u). The diameter of the graph G, denoted by diam(G), is the maximal distance between two vertices of the graph. For a vertex u ∈ V the set of neighbors of u is denoted by N(u) = {w ∈ V : (u, w) ∈ E}. Similarly, for a set U ⊆ V the set of neighbors of U is N(U) = {w ∈ V : (u, w) ∈ E for some u ∈ U}. The independence number of G, denoted by α(G), is the cardinality of the largest independent set, that is, a set of vertices in the graph no two of which are adjacent. The chromatic number of G, denoted by χ(G), is the minimal number c ∈ N such that there is a coloring f : V → [c] of the vertices that satisfies f(v) ≠ f(w) for all edges (v, w) ∈ E. The equi-chromatic number of G, denoted by χeq(G), is the minimal number c ∈ N such that there is a balanced coloring f : V → [c] that satisfies f(v) ≠ f(w) for all edges (v, w) ∈ E, where a coloring f : V → [c] is said to be balanced if |f⁻¹(i)| = |f⁻¹(j)| for all i, j ∈ [c].

For a given graph G = (V, E) the acquaintance time is defined as follows. We place one agent in each vertex v ∈ V. Every pair of agents sharing a common edge is declared to be acquainted. In each round we choose a matching of G, and for each edge in the matching the agents on this edge swap places. After the swap, again, every pair of agents sharing a common edge become acquainted, and the process continues. A sequence of matchings in the graph is called a strategy. A strategy that allows every pair of agents to meet is called a strategy for acquaintance in G. The acquaintance time of G, denoted by AC(G), is the minimal number of rounds required by such a strategy. As mentioned in the introduction, this problem is related to a certain routing problem studied in [5]. Specifically, we are interested in the routing task summarized in the following claim.
For a given tree G = (V, E) the claim gives a strategy for fast routing of the agents from some set of vertices S ⊆ V to T ⊆ V without specifying the target location in T of each of the agents.

Claim 12.1 Let G = (V, E) be a tree, and let S, T ⊆ V be two subsets of the vertices of equal size k = |S| = |T|. Let ℓ = max_{v∈S,u∈T} dist(v, u) be the maximal distance between a vertex in S and a vertex in T. Then there is a strategy of ℓ + 2(k − 1) matchings that routes all agents from S to T.

Proof Let G = (V, E) be a tree, and let S, T ⊆ V be two subsets of the vertices of G. The proof is by induction on k. For the case k = 1 the statement is trivial, as ℓ rounds are enough to route a single agent. For the induction step let k ≥ 2, and assume for simplicity that the only agents in the graph are those sitting in S, and our goal is to route them to T. Let span(S) be the minimal subtree of G containing all vertices s ∈ S, and define span(T) analogously. Note first that if span(S) = span(T), then by minimality of the span any leaf s* of span(S) belongs to both S and T. Therefore, we may apply the induction hypothesis to route the agents

from S \ {s*} to T \ {s*} in the subtree span(S) \ {s*}, leaving the agent from s* in place, which proves the induction step for the case span(S) = span(T). Otherwise, we may assume without loss of generality that there is some s* ∈ S that is not contained in span(T). (If not, we can consider the problem of routing the agents from T to S, and note that viewing this strategy in the reverse order produces a strategy for routing from S to T.) Let t* ∈ T be a vertex such that dist(s*, t*) = dist(s*, T), and let P = (s* = p0, p1, . . . , pr, pr+1 = t*) be the shortest path from s* to t* in G (note that r ≤ ℓ by the definition of ℓ). By the induction hypothesis there is a strategy consisting of ℓ + 2(k − 2) rounds that routes the agents from S \ {s*} to T \ {t*}. In such a strategy, after the last step all agents are in T \ {t*}, and thus the vertices {p1, . . . , pr} contain no agents (since pi ∉ T \ {t*} for all i ∈ [r]). After round number ℓ + 2(k − 2) − 1, i.e., one step before the last, the vertices {p1, . . . , pr−1} contain no agents, because dist(pi, T) ≥ 2 for all i ≤ r − 1. Analogously, for all j ≤ r the vertices {p1, . . . , pr−j} contain no agents after round number ℓ + 2(k − 2) − j. Therefore, we can augment the strategy by moving the agent from s* to t* along the path P. Specifically, for all i = 0, . . . , r we move the agent from pi to pi+1 in round ℓ + 2(k − 2) − r + i + 2, which adds two rounds to the strategy. The claim follows.

13 Some Concrete Examples

We start with an easy example, showing that for the graph Pn, a path of length n, the acquaintance time is Θ(n).

Proposition 13.1 (AC of a path): Let Pn be a path of length n. Then AC(Pn) = Θ(n).

Remark In Theorem 13.8 we compute AC(Pn) exactly and show that AC(Pn) = n − 2. Nonetheless, we find the strategy given in the proof below interesting, and hence include it here.

Proof Clearly AC(Pn) ≥ ⌊diam(Pn)/2⌋ = ⌊(n − 1)/2⌋. For the upper bound denote the vertices of Pn by v1, . . . , vn, where vi is connected to vi+1 for all i ∈ [n − 1], and denote by pi the agent sitting initially in the vertex vi. Consider the following strategy that works in O(n) rounds:

1. Apply Claim 12.1 in order to route the agents p1, . . . , p⌊n/2⌋ to the vertices v⌈n/2⌉+1, . . . , vn, and the agents p⌊n/2⌋+1, . . . , pn to the vertices v1, . . . , v⌈n/2⌉. This can be done in O(n) rounds. Note that after this sequence every pair of agents (pi, pj) with 1 ≤ i ≤ ⌊n/2⌋ < j ≤ n have already met each other.

2. Repeat the above procedure recursively on each of the two halves (v1, . . . , v⌈n/2⌉) and (v⌈n/2⌉+1, . . . , vn) simultaneously.

To bound the total time T(n) of the procedure, note that we spend O(n) rounds in the first part, and at most T(⌈n/2⌉) in the remaining parts. This gives a bound of T(n) = O(n) + T(⌈n/2⌉) = O(n), as required.

The following corollary is immediate from Proposition 13.1.

Corollary 13.2 Let G be a Hamiltonian graph with n vertices. Then AC(G) = O(n).

We next prove that for constant degree expanders the acquaintance time is also linear in the size of the graph. For α > 0, a d-regular graph G = (V, E) with n vertices is said to be an (n, d, α)-expander if for every subset S ⊆ V of size |S| ≤ |V|/2 it holds that |N(S) \ S| ≥ α · |S|.

Proposition 13.3 (AC of expander graphs): Let G = (V, E) be an (n, d, α)-expander graph for some α > 0. Then AC(G) = Θ(n), where the multiplicative constant in the Θ() notation depends only on α and d but not on n.

Proof Recall that AC(G) = Ω(n^2/|E|) = Ω(n), since the expander is of constant degree. For the upper bound we shall need the following theorem, due to Björklund, Husfeldt and Khanna [17], saying that every expander graph contains a simple path of linear length.

Theorem 13.4 ([17, Theorem 4]) Let G be an (n, d, α)-expander graph. Then G contains a simple path of length Ω((α/d) · n).

Let P be a simple path of even length ℓ in G, where ℓ = Ω(n). Such a path exists by Theorem 13.4. Partition the agents into c = ⌈2n/ℓ⌉ disjoint subsets C1 ∪ · · · ∪ Cc, each of size at most ℓ/2. Then, for every pair i, j ∈ [c] we use the strategy from Claim 12.1 to place the agents from the two subsets Ci ∪ Cj on P, and then apply the strategy from Proposition 13.1 so that every pair of agents from Ci ∪ Cj meet. By repeating this strategy for every i, j ∈ [c], we make sure that every pair of agents on G meet each other. In order to analyze the total length of the strategy, note that for a single pair i, j ∈ [c] the total time is at most n + O(ℓ), and hence the total length of the strategy is at most

AC(G) ≤ \binom{c}{2} · O(n + ℓ),

which is linear in n, since ℓ = Ω(n) and c = O(n/ℓ) = O(1).


13.1 Separating AC(G) From Other Parameters

In this section we provide several additional examples, which separate AC(G) from other parameters of graphs. Our first example shows a graph with low diameter and low clique cover number (equivalently, whose complement has low chromatic number) such that AC(G) is large.

Proposition 13.5 (AC of the barbell graph): Let G be the barbell graph. That is, G consists of two cliques of size n connected by a single edge, called the bridge. Then AC(G) = Θ(n).

Proof The upper bound follows from the Hamiltonicity of G (see Corollary 13.2). For the lower bound, denote the vertices of the two cliques by A and B, and denote the bridge by (a0, b0), where a0 ∈ A and b0 ∈ B. Then, in any strategy for acquaintance either all agents from A visited a0, or all agents from B visited b0, and the proposition follows.

A more interesting example shows the existence of a Ramsey graph G with AC(G) = 1, where by a Ramsey graph we refer to a graph that contains neither a clique nor an independent set of logarithmic size. For more details regarding graphs with AC(G) = 1 see Section 16.

Proposition 13.6 (Ramsey graph with AC(G) = 1): There is a graph G on n vertices that contains neither a clique nor an independent set of size Ω(log(n)) such that AC(G) = 1.

Proof Let H = (U = {u1, . . . , un/2}, F) be a Ramsey graph on n/2 vertices that contains neither a clique nor an independent set of size O(log(n)). We construct G = (V, E) as follows. The vertices of G are two copies of U, i.e., V = {u1, . . . , un/2} ∪ {v1, . . . , vn/2}. The edges of G are the following.

1. The vertices {u1, . . . , un/2} induce a copy of H. That is, (ui, uj) ∈ E if and only if (ui, uj) ∈ F.

2. The vertices {v1, . . . , vn/2} induce the complement of H. That is, (vi, vj) ∈ E if and only if (ui, uj) ∉ F.

3. For each i, j ∈ [n/2] we have (ui, vj) ∈ E.

By the properties of H it follows that G is also a Ramsey graph.
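The construction can be checked mechanically on small instances. In the Python sketch below (helper names are ours), we substitute a random graph for the Ramsey graph H, since only the copy/complement/bipartite structure matters for the acquaintance property, and verify that swapping along the matching {(ui, vi)} acquaints every pair of agents:

```python
import random
from itertools import combinations

def one_round_suffices(m, seed=0):
    """Build G from a graph H on m vertices: an H-copy on the u_i, a
    complement-copy on the v_i, and all bipartite edges (u_i, v_j).
    Check that swapping along M = {(u_i, v_i)} acquaints all agents."""
    rng = random.Random(seed)
    F = {frozenset(p) for p in combinations(range(m), 2) if rng.random() < 0.5}

    def adjacent(x, y):
        (sx, i), (sy, j) = x, y
        if sx != sy:
            return True                       # complete bipartite edges
        if sx == 'u':
            return frozenset((i, j)) in F     # copy of H
        return frozenset((i, j)) not in F     # complement of H

    agents = [(s, i) for s in ('u', 'v') for i in range(m)]
    swap = {('u', i): ('v', i) for i in range(m)}
    swap.update({('v', i): ('u', i) for i in range(m)})
    met = set()
    for pos in ({a: a for a in agents}, swap):   # before / after the swap
        met |= {frozenset((a, b)) for a, b in combinations(agents, 2)
                if adjacent(pos[a], pos[b])}
    return met == {frozenset(p) for p in combinations(agents, 2)}

assert one_round_suffices(8)
```

Every pair of u-agents shares an edge either in the H-copy before the swap or in the complement-copy after it, and symmetrically for v-agents; mixed pairs are adjacent from the start.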
Now it is straightforward to check that the matching M = {(ui, vi) : i ∈ [n/2]} is a 1-round strategy for acquaintance.

The proof of Proposition 13.3 may suggest that a small routing number (as defined by Alon et al. [5]) implies fast acquaintance time. The following example shows a separation between the two parameters for the complete bipartite graph Kn,n. It was shown in [5] that in Kn,n, for any permutation of the vertices σ : V → V, the agents can be routed from v ∈ V to the destination σ(v) in 4 rounds. We prove next that AC(Kn,n) = Θ(log(n)).

Proposition 13.7 (AC of Kn,n): Let n = 2^r for some r ∈ N. Let Kn,n = (A, B, E) be the complete bipartite graph with |A| = |B| = n. Then AC(Kn,n) = log2(n).

Proof Assign each agent a distinct string x = (x0, x1, . . . , xr) ∈ {0, 1}^{r+1} such that all agents who started on the same side have the same first bit x0. We now describe an r-rounds strategy for acquaintance. In the i’th round move all agents with xi = 0 to A and all agents with xi = 1 to B. Now, if two agents are assigned strings x and x′ such that xi ≠ x′i for some i ≥ 1, then in the i’th round they are on different sides of the graph, and hence become acquainted; if their strings differ only in the bit x0, then they started on different sides, and were acquainted before the first round.

We now claim that r rounds are also necessary. Indeed, suppose we have a t-rounds strategy for acquaintance. Assign each agent a string x = (x0, x1, . . . , xt) ∈ {0, 1}^{t+1}, where for i ≤ t we set xi = 0 if and only if in the i’th round the agent was in A. Note that two agents met during the t rounds if and only if their strings are different. Since every two agents must meet, all 2n strings are distinct, which implies 2^{t+1} ≥ 2n, and thus t ≥ r, as required.
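The upper-bound argument can be illustrated with a short Python sketch (ours): label the 2n agents with the distinct strings of {0,1}^{r+1} and place each agent on side x_i in round i; every pair of distinct labels then sits on opposite sides at some time, so all pairs meet:

```python
from itertools import combinations, product

def all_pairs_meet(r):
    """Label the 2n agents (n = 2**r) with the distinct strings of
    {0,1}**(r+1); at time t agent a sits on side a[t] (0 = A, 1 = B).
    Two agents meet iff they are on opposite sides at some time t."""
    agents = list(product([0, 1], repeat=r + 1))
    return all(any(a[t] != b[t] for t in range(r + 1))
               for a, b in combinations(agents, 2))

assert all_pairs_meet(4)   # n = 16: every pair meets within r = log2(n) rounds
```

The lower bound is the same counting in reverse: any t-rounds strategy assigns each agent a side-history string in {0,1}^{t+1}, and agents with equal strings never meet.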

13.2 Exact Computation of AC for the path and the barbell graph

In this section we compute the acquaintance time of the path graph and of the barbell graph.

Theorem 13.8 Let Pn be a path with n vertices, and let Bn be the barbell graph consisting of cliques of sizes ⌈n/2⌉ and ⌊n/2⌋ connected by a single edge. Then AC(Pn) = AC(Bn) = n − 2.

Proof We first prove that AC(Pn) ≤ n − 2 by describing an (n − 2)-rounds strategy for acquaintance in Pn. Then we prove that AC(Bn) ≥ n − 2. This is enough for the proof of the theorem: Pn is a spanning subgraph of Bn, so any strategy for acquaintance in Pn is also a strategy for acquaintance in Bn, and hence n − 2 ≤ AC(Bn) ≤ AC(Pn) ≤ n − 2.

In order to prove that AC(Pn) ≤ n − 2 consider the strategy that in the odd-numbered rounds swaps along all edges {(i, i + 1) : i odd}, and in the even-numbered rounds swaps along all edges {(i, i + 1) : i even}. Consider the walk performed by an agent that begins in some odd-indexed vertex under this strategy. The agent will move one step up in each round until reaching the vertex n, will stay there for one round, and then move down one step in each round. Similarly, an agent starting at an even vertex will move down until reaching the vertex 1, stay there for one round, and then move up. After n rounds the agent who started in position i is in position n + 1 − i, and in particular every pair of agents have already met. We claim that in fact all agents are acquainted two rounds earlier. Indeed, consider two agents pi and pj who started in non-adjacent vertices i and j with i ≤ j − 2. The proof follows by considering the following 3 cases.


1. |i − j| is even: Assume for concreteness that i and j are odd (the case of i and j even is handled similarly). Then pi meets pj within the first n − i − 1 rounds, since after the (n − i − 1)’st round the agent pi reaches the vertex n − 1.

2. i is odd and j is even: In this case the agents move towards each other, and hence meet within the first j − i − 2 rounds.

3. i is even and j is odd: Then the agent pi reaches the vertex 1 after i − 1 rounds, stays there for another round, and then moves up. Therefore, in the t’th round the agent pi visits the vertex t − i + 1, for all i ≤ t ≤ n − 2. Analogously, for all n − j < t ≤ n − 2 the agent pj visits in the t’th round the vertex 2n − (t + j − 1). This implies that in round number t = n − (j − i + 1)/2 the agents pi and pj are located in the neighboring vertices n − (i + j − 1)/2 and n − (i + j − 1)/2 + 1, respectively.

This completes the first part of the proof, namely AC(Pn) ≤ n − 2.

For the lower bound consider the barbell graph Bn consisting of two disjoint cliques of sizes ⌈n/2⌉ and ⌊n/2⌋ connected by a single edge, called the bridge. We claim that AC(Bn) ≥ n − 2. Suppose there is an m-rounds strategy for acquaintance in Bn with k swaps across the bridge. Any agent involved in such a swap is immediately acquainted with all others; call these agents good. If the strategy has k swaps across the bridge, then 2k of the m + 1 configurations (those before and after the bridge-swaps) have good agents at both endpoints of the bridge. Note that a second consecutive swap across the bridge achieves nothing, and that there is also no point in swapping along edges not incident to the bridge. Hence, if there are k swaps across the bridge, then the numbers of bad agents in the two cliques are at least ⌈n/2⌉ − k and ⌊n/2⌋ − k, respectively. Two bad agents from different cliques can only become acquainted by being at the two endpoints of the bridge simultaneously, which requires at least (⌈n/2⌉ − k) · (⌊n/2⌋ − k) configurations. Therefore, we get

m + 1 ≥ 2k + (⌈n/2⌉ − k)(⌊n/2⌋ − k) = k^2 − (n − 2)k + ⌈n/2⌉⌊n/2⌋.
This is minimized at k = n/2 − 1, giving a lower bound of m + 1 ≥ n − 1 for even values of n, and m + 1 ≥ n − 5/4 for odd n. In both cases m ≥ n − 2, since m is an integer.
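The upper-bound strategy above is easy to simulate. The following Python sketch (our own code) runs the alternating odd/even swaps on the n-vertex path, here 0-indexed, and confirms that every pair of agents is acquainted after exactly n − 2 rounds, matching Theorem 13.8:

```python
from itertools import combinations

def path_acquaintance_rounds(n):
    """Run the alternating strategy on the path with vertices 0..n-1
    (odd-numbered rounds swap edges (0,1), (2,3), ...; even-numbered
    rounds swap (1,2), (3,4), ...) and return the number of rounds
    until every two agents have occupied adjacent vertices."""
    at = list(range(n))                        # at[v] = agent currently on vertex v
    met = {frozenset((at[v], at[v + 1])) for v in range(n - 1)}
    goal = {frozenset(p) for p in combinations(range(n), 2)}
    rounds = 0
    while met != goal:
        rounds += 1
        start = 0 if rounds % 2 == 1 else 1
        for v in range(start, n - 1, 2):
            at[v], at[v + 1] = at[v + 1], at[v]
        met |= {frozenset((at[v], at[v + 1])) for v in range(n - 1)}
    return rounds

assert all(path_acquaintance_rounds(n) == n - 2 for n in range(3, 40))
```

The strategy never finishes earlier than n − 2 rounds, consistent with the barbell lower bound (Pn is a spanning subgraph of Bn, so a faster path strategy would contradict AC(Bn) ≥ n − 2).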

14 The Range of AC(G)

In this section we provide examples of families of graphs on n vertices whose acquaintance time ranges from constant to n^{1.5}.

Theorem 14.1 For all n ∈ N and for all positive integers k ≤ n^{1.5} there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c · k for some universal constant c ≥ 1.

The proof of the theorem is divided into two parts. Proposition 14.2 takes care of k ≤ n, and Proposition 14.3 takes care of n ≤ k ≤ n^{1.5}.

Proposition 14.2 For all n ∈ N and for all positive integers k ≤ n there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c · k for some universal constant c ≥ 1.

Proof In order to prove the proposition for n ∈ N such that n ≡ 0 (mod k), consider the graph Gk,ℓ = (V, E) with vertices V = {vi,j : i ∈ [k], j ∈ [ℓ]}, where the vertices {vi,j : j ∈ [ℓ]} form a clique for all i ∈ [k], and, in addition, for every i, i′ ∈ [k] such that |i − i′| = 1 we have (vi,j, vi′,j) ∈ E for all j ∈ [ℓ]. That is, the vertices are divided into k cliques, each of size ℓ, and the edges between adjacent cliques form a perfect matching. We claim that AC(Gk,ℓ) = Θ(k). The lower bound AC(Gk,ℓ) = Ω(k) follows from diam(Gk,ℓ) = k. For the upper bound consider first the case k = 2, that is, the graph consisting of two disjoint cliques, each of size ℓ, with ℓ edges between them forming a perfect matching. Then AC(G2,ℓ) = O(1), which can be witnessed by swapping ℓ/2 vertices in one clique with ℓ/2 vertices in the other clique a constant number of times. The bound AC(Gk,ℓ) = O(k) is obtained by using a strategy similar to the one for Pk described in Proposition 13.1, where we consider each clique as a single block, and each swap in Pk corresponds to a swap of blocks rather than of single vertices. The only difference is that even if two blocks of size ℓ are adjacent, it does not imply that all 2ℓ agents in the two blocks have met. In order to make them meet we apply the O(1)-rounds strategy above for the graph G2,ℓ. This completes the proof for n ≡ 0 (mod k). In order to generalize the example above to arbitrary n ∈ N, let ℓ = ⌊n/k⌋ and r = n (mod k). Consider the graph G obtained from Gk,ℓ by adding to it an r-clique and connecting it to r vertices in the last clique by a matching of size r.
That is, the graph consists of k cliques of size ℓ and another clique of size r, with matchings of maximal size between consecutive cliques. The lower bound AC(G) ≥ ⌊diam(G)/2⌋ = Ω(k) still holds, whereas for the upper bound we can first let the agents from the r-clique meet all other agents by visiting all the cliques and returning back in 2k rounds, and then apply the strategy for Gk,ℓ. This gives AC(G) = Θ(k).
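The diameter claim used for the lower bound can be double-checked by brute force. The following Python sketch (helper names are ours) builds Gk,ℓ and computes its diameter by BFS; for ℓ ≥ 2, reaching a vertex with a different clique-internal index requires crossing k − 1 matching edges plus one clique edge:

```python
from collections import deque

def build_G(k, ell):
    """k cliques of size ell arranged in a row; a perfect matching
    (v_{i,j}, v_{i+1,j}) joins each pair of adjacent cliques."""
    adj = {(i, j): set() for i in range(k) for j in range(ell)}
    for i in range(k):
        for j in range(ell):
            adj[(i, j)] |= {(i, j2) for j2 in range(ell) if j2 != j}  # clique edges
            if i + 1 < k:
                adj[(i, j)].add((i + 1, j))                           # matching edges
                adj[(i + 1, j)].add((i, j))
    return adj

def diameter(adj):
    def ecc(s):
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        return max(dist.values())
    return max(ecc(v) for v in adj)

# k-1 matching edges plus one clique edge are necessary and sufficient
assert diameter(build_G(5, 4)) == 5
assert diameter(build_G(7, 3)) == 7
```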

Proposition 14.3 For all n ∈ N and for all positive integers k ∈ [n, n^{1.5}] there exists an n-vertex graph G such that k/c ≤ AC(G) ≤ c · k for some universal constant c ≥ 1.

Proof Consider the graph Or,ℓ that consists of r cliques of size ℓ, together with an additional vertex z, called the center, where in each clique one of the vertices is connected to the center. We claim that AC(Or,ℓ) = Θ(min(nℓ, nr)). Since the total number of vertices in the graph is n = rℓ + 1, by choosing r ≈ k/n and ℓ ≈ n^2/k we get AC(Or,ℓ) = Θ(k), as required.

In order to prove an upper bound of O(nr), note that solving the acquaintance problem on Or,ℓ can be reduced to solving \binom{r}{2} problems on Hamiltonian graphs of size 2ℓ + 1, where each problem corresponds to a pair of cliques together with the center z. By Hamiltonicity, each such problem is solved in O(ℓ) rounds. In order to prove an upper bound of O(nℓ), we can bring every agent to the center, where all other agents can meet him in O(ℓ) rounds, using the vertices connected to z.

For the lower bound define, for every agent pi and every time t ∈ N, the variable φt(pi) to be the number of agents that pi has met up to time t. For h = AC(Or,ℓ) there is a strategy such that ∑_{i∈[n]} φh(pi) = n · (n − 1), since every agent has met every other agent by time h. On the other hand, in each time step t the sum ∑_{i∈[n]} φt(pi) increases by at most 2r + ℓ, as the only agents who could potentially affect the sum are those who moved to the center (contributing at most r to the sum), an agent who moved from the center to one of the cliques (contributing at most ℓ to the sum), and the r neighbors of the center (each contributing at most 1 to the sum). This implies the lower bound AC(Or,ℓ) · (2r + ℓ) = Ω(n^2), as required.

In order to construct an n-vertex graph for general n and k as in the assumption, take r = ⌊k/n⌋, ℓ = ⌊(n − 1)/r⌋, and t = n − 1 (mod r). Consider the n-vertex graph, analogous to the construction above, that consists of r − t cliques of size ℓ, t cliques of size ℓ + 1, and a center connected to one of the vertices in each clique. The argument above proves Proposition 14.3.

Building on the lower bound in the proof of Proposition 14.3, we show that bottlenecks in graphs imply high acquaintance time.

Proposition 14.4 Let G = (V, E) be a graph with n vertices. Suppose there is a subset of vertices S ⊆ V such that when removing S from G each connected component of the remaining graph is of size at most ℓ. Then AC(G) = Ω((\binom{n}{2} − |E|) / (|S| · ℓ + ∑_{s∈S} deg(s))).

Proof Denote by G[V \ S] the graph obtained by removing from G the vertices in S and the edges touching them. Define, for every agent pi and every time t ∈ N, the set φt(pi) ⊆ {pj : j ∈ [n]} containing all agents that pi has met up to time t, as well as all agents who have shared a connected component of G[V \ S] with pi up to time t. By the definition of AC, for h = AC(G) there is a strategy such that ∑_{i∈[n]} |φh(pi)| = n · (n − 1), since every agent has met every other agent by time h. Note that in the t’th round the increment of φt(pi) compared to φt−1(pi) is either because pi entered S and met new agents in S and in its connected components of G[V \ S], or because an agent left S and entered one of the connected components. Thus, in each time step t the sum ∑_{i∈[n]} |φt(pi)| increases by at most |S| · ℓ + ∑_{s∈S} deg(s), where |S| · ℓ upper bounds the number of meetings that were added because of agents moving out of S, while ∑_{s∈S} deg(s) bounds the number of meetings accounted for by agents that entered S in round t. This implies a lower bound of Ω((\binom{n}{2} − |E|) / (|S| · ℓ + ∑_{s∈S} deg(s))), which completes the proof of Proposition 14.4.

Next, we show that for every graph G with n vertices the acquaintance time is in fact asymptotically smaller than the trivial bound of 2n^2. We start with the following theorem.

Theorem 14.5 For every graph G with n vertices it holds that AC(G) = O(n^2 · log log(n)/log(n)).

Below, in Theorem 14.8, we prove a stronger result, namely that for all n-vertex graphs G it holds that AC(G) = O(n^{1.5}). Still, we think that the proof of Theorem 14.5 is also interesting, and we include it here. The proof of Theorem 14.5 relies on the following two claims.

Claim 14.6 Let G be a graph with n vertices. If G contains a simple path of length ℓ, then AC(G) = O(n^2/ℓ).

Claim 14.7 Let G be a graph with n vertices. If G has a vertex of degree ∆, then AC(G) = O(n^2/∆).

We postpone the proofs of the claims until later, and first show how to deduce Theorem 14.5 from them.

Proof of Theorem 14.5 Let k = Θ(log(n)/log log(n)) be the largest integer such that k^k ≤ n. For such a choice of k the graph G either contains a simple path of length k, or it contains a vertex of degree at least k (indeed, a graph with maximum degree less than k and no simple path of length k has fewer than k^k vertices). In the former case, by Claim 14.6 we have AC(G) = O(n^2/k). In the latter case we use Claim 14.7 to conclude that AC(G) = O(n^2/k). The theorem follows.

We now prove Claims 14.6 and 14.7.

Proof of Claim 14.6: Assume without loss of generality that ℓ is the length of the longest simple path in G. Then, in particular, we have dist(u, v) ≤ ℓ for every two vertices u, v ∈ V. We shall also assume that G is a tree containing a path of length ℓ (if not, apply the argument below to a spanning tree of G; this is enough, as AC(G) is upper bounded by the AC of any of its spanning trees). In order to prove the claim we apply Claim 12.1 together with Proposition 13.1, similarly to the proof of Proposition 13.3. Divide the agents into c = O(n/ℓ) subsets C1, . . . , Cc of size at most ⌊ℓ/2⌋ each.
For every pair i, j ∈ [c] we use Claim 12.1 to route the agents from the two subsets Ci ∪ Cj to a path of length ℓ, and then apply the strategy from Proposition 13.1 so that every pair of agents from Ci ∪ Cj meet. By repeating this strategy for every i, j ∈ [c], we make sure that every pair of agents on G meet each other. Since dist(u, v) ≤ ℓ for every two vertices u, v ∈ V, by Claim 12.1 the agents Ci ∪ Cj can be routed to a path of length ℓ in O(ℓ) rounds. Then, using the strategy from Proposition 13.1 every pair of agents from Ci ∪ Cj meet in at most O(ℓ) rounds. Therefore, the acquaintance time of G can be upper bounded by

AC(G) ≤ (c choose 2) · O(ℓ) = O(n²/ℓ),

as required.

Proof of Claim 14.7: Assume without loss of generality that G is a tree rooted at a vertex r of degree ∆. (This can be done by considering a spanning tree of G.) Denote the children of r by v1, . . . , v∆, and let p1, . . . , p∆ be the agents originally located at these vertices. We claim that there is an O(n)-rounds strategy that allows p1, . . . , p∆ to meet all agents. Given such a strategy, we apply it on G repeatedly, with ∆ new agents in v1, . . . , v∆ in each iteration. The agents can be placed there in n + 2∆ rounds using Claim 12.1. Repeating the process we get that AC(G) ≤ Σ_{i=1}^{⌈n/∆⌉} O(n + ∆) = O(n²/∆), as required.

Next, we describe an O(n)-rounds strategy that allows p1, . . . , p∆ to meet all agents. For any 1 ≤ i ≤ ∆ consider the subtree Ti = (Vi, Ei) of G rooted at vi. It is enough to show how the agents p1, . . . , p∆ can meet all agents from Ti in O(|Ti|) rounds. First, let pi meet all agents in Ti in O(|Ti|) steps and return back to vi. This can be done by running pi along a DFS of Ti. It is enough now to find an O(|Ti|)-rounds strategy that allows all agents of Ti to visit the root r. This task can be reduced to the routing problem considered in Claim 12.1. Specifically, define a tree T′ on 2|Vi| + 1 vertices rooted at r that contains the tree Ti with additional |Vi| vertices, each connected only to the root r. By Claim 12.1 there is an O(|Ti|)-rounds strategy in T′ that routes all agents from the copy of Ti to the additional |Vi| vertices. It is easy to see that this strategy can be turned into a strategy that allows all agents of Ti to visit the root r by disregarding the edges between r and the additional (imaginary) vertices. This completes the proof of Claim 14.7.

Our next goal is to prove the following stronger theorem.

Theorem 14.8 For all n-vertex graphs G it holds that AC(G) = O(n^{1.5}).

The key step in the proof of Theorem 14.8 is the following lemma.
Lemma 14.9 Let G = (V, E) be a graph with n vertices, and suppose that the maximal degree of G is ∆. Then AC(G) ≤ 20∆n.

Indeed, Theorem 14.8 is an immediate application of Lemma 14.9 together with Claim 14.7, as they imply that for any n-vertex graph with maximal degree ∆ it holds that AC(G) ≤ min(O(n²/∆), O(n∆)) ≤ O(n^{1.5}). We now turn to the proof of Lemma 14.9.

Proof Clearly, removing edges from G can only increase its acquaintance time. Thus, in order to upper bound AC(G) we may fix a spanning tree of G and use only the edges of the tree, and so, we henceforth assume that G is an n-vertex tree. A contour of the tree is a cycle that crosses each edge exactly twice, and visits each vertex v a number of times equal to its degree. Such a contour is obtained by considering a DFS walk on G (see Figure 1). We remove

Figure 1: A tree with a marked contour.

an edge from the contour to get a path Γ in G of length 2n − 3, that visits every vertex at most ∆ times. Let π be the projection from Γ to G. We first argue that it is possible to choose n vertices on the path Γ that project to distinct (and hence all) vertices of G, so that the gaps between the chosen consecutive vertices along Γ are at most 3. (A similar statement appears in Lemma 2.4 in [11].) To do this, we need to pick one vertex of Γ from π⁻¹(x) for each x ∈ G. Fix a root for the tree at which Γ starts. For a vertex x at an even level we pick the first vertex of Γ projecting to x. For x at an odd level we pick the last one. See Figure 1 for an example. Note that Γ visits each leaf of the tree only once. Between leaves the contour descends some levels towards the root, and then ascends. Along the descent vertices are visited for the last time, and so every other vertex is selected. Along the ascent vertices are visited for the first time. Consequently, it is not possible to have more than three steps of Γ between consecutive marked vertices.

Consider the following n-rounds strategy. In even rounds we swap the edges {(i, i + 1) : i even}, and in odd rounds we swap the edges {(i, i + 1) : i odd}. It is easy to see that after n rounds the agents are in reversed order on the path, and so every pair of agents must have swapped places. The n agents on the vertices of G can be seen as being on the vertices of Γ, where we use the marks specified above to decide which vertex of Γ is occupied. In order to present an O(∆ · n)-rounds strategy for acquaintance in G we emulate the strategy for the path Γ, except that our goal is to make the n agents located in the marked vertices of Γ swap places, and hence meet. This is done by simulating each round of the strategy for Γ by a sequence of at most 20∆

matchings. In order to swap a consecutive pair of agents pi and pj in vertices i and j we can perform a sequence of swaps in Γ, namely (i, i + 1), . . . , (j − 1, j), which brings the agent pi to the vertex j, followed by the sequence (j − 1, j − 2), . . . , (i + 1, i), bringing the agent pj to the vertex i. This projects to swaps on G that exchange the agents at π(i) and π(j) and leave all others unchanged. The gaps between consecutive agents are at most 3, so it takes at most 5 steps on G to perform such a swap. The difficulty is that swapping a pair of agents pi and pj could interfere with swapping another pair pi′ and pj′, which can happen if the projections of the intervals [i, j] and [i′, j′] of the path Γ intersect in G. If not for this problem, we would have a 5n-rounds acquaintance strategy for G. In order to solve this problem, we shall separate each round into several sub-rounds, so that conflicting pairs are in different sub-rounds. Since Γ visits each vertex of G at most ∆ times, and since the intervals [i, j] of Γ that we care about are disjoint, each vertex of G is contained in at most ∆ such intervals. Each interval consists of at most 4 vertices of G, and therefore each pair [i, j] is in conflict with less than 4∆ other pairs [i′, j′]. We can assign each pair one of 4∆ colors, so that conflicting pairs have different colors. We now split the round into 20∆ sub-rounds, where in 5 consecutive sub-rounds we swap all pairs of color i that are to be swapped in that round of the path strategy. Each round of the strategy on Pn can be simulated by 20∆ rounds on G, and hence AC(G) ≤ 20∆n. This completes the proof of Lemma 14.9.
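The odd-even swap strategy on the path, which the proof emulates, is easy to check directly. The following simulation is our own sketch (not part of the thesis; the function name is hypothetical): it verifies that after n rounds of unconditional alternating-parity swaps the agents on Pn end up in reversed order, and that every pair of agents was adjacent (and hence met) at some point.

```python
# Simulate the n-rounds strategy on a path with n vertices: in rounds of
# one parity swap the matched edges {(i, i+1) : i even}, in rounds of
# the other parity the edges {(i, i+1) : i odd}.
from itertools import combinations

def path_strategy(n):
    agents = list(range(n))                    # agents[v] = agent at vertex v
    met = set()

    def record():                              # agents on adjacent vertices meet
        for v in range(n - 1):
            met.add(frozenset((agents[v], agents[v + 1])))

    record()                                   # meetings before any round
    for r in range(n):
        for i in range(r % 2, n - 1, 2):       # the matching of parity r
            agents[i], agents[i + 1] = agents[i + 1], agents[i]
        record()
    return agents, met

final, met = path_strategy(8)
assert final == list(reversed(range(8)))                          # order reversed
assert met == {frozenset(p) for p in combinations(range(8), 2)}   # all pairs met
```

The check that all pairs meet relies on the fact that two agents can only exchange relative order by a direct swap, at which moment they are adjacent.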

15 NP-Hardness Results

In this section we show that the acquaintance time problem is NP-hard. Specifically, we prove the following theorem.

Theorem 15.1 For every t ≥ 1 it is NP-hard to distinguish whether a given graph G has AC(G) ≤ t or AC(G) ≥ 2t.

Before actually proving the theorem, let us first see the proof for the special case of t = 1.

Special case of t = 1: We start with the following NP-hardness result, saying that for a given graph G it is hard to distinguish between graphs with small chromatic number and graphs with small independence number. Specifically, Lund and Yannakakis [56] prove the following result.

Theorem 15.2 ([56, Theorem 2.8]) For every K ∈ N sufficiently large the following gap problem is NP-hard. Given a graph G = (V, E) distinguish between the following two cases:

• χeq(G) ≤ K; i.e., there exists a K-coloring of the vertices of G with color classes of size |V|/K each. [44]

• α(G) ≤ n/(2K). [45]

We construct a reduction from the problem above to the acquaintance time problem, that given a graph G outputs a graph H so that (1) if χeq(G) ≤ K, then AC(H) = 1, and (2) if α(G) ≤ n/(2K), then AC(H) ≥ 2. Given a graph G = (V, E) with n vertices V = {vi : i ∈ [n]}, the reduction outputs a graph H = (V′, E′) as follows. The graph H contains |V′| = 2n vertices, partitioned into two parts V′ = V ∪ U, where |V| = n and U = U1 ∪ · · · ∪ UK with |Uj| = n/K for all j ∈ [K]. The vertices V induce the complement graph of G. For each j ∈ [K] the vertices of Uj form an independent set. In addition, we set edges between every pair of vertices (v, u) ∈ V × U, as well as between every pair of vertices (u, u′) ∈ Uj × Uj′ for all j ≠ j′. This completes the description of the reduction.

Completeness: We first prove the completeness part, namely, if χeq(G) = K, then AC(H) = 1. Suppose that the color classes of G are V = V1 ∪ · · · ∪ VK with |Vj| = n/K for all j ∈ [K]. Note that each color class Vj induces a clique in H. Consider the matching that for each j ∈ [K] swaps the agents in Uj with the agents in Vj. (This is possible since by the assumption |Vj| = n/K, and all vertices of Uj are connected to all vertices of Vj.) In order to verify that such a matching allows every pair of agents to meet each other, let us denote by pv the agent sitting originally in vertex v. Note that before the swap all pairs listed below have already met.

1. For all j ∈ [K] and every v, v′ ∈ Vj the pair of agents (pv, pv′) have met.
2. For all j ≠ j′ and for every u ∈ Uj, u′ ∈ Uj′ the pair of agents (pu, pu′) have met.
3. For all v ∈ V and u ∈ U the pair of agents (pv, pu) have met.

After the swap the following pairs meet.

1. For all j ≠ j′ and for every v ∈ Vj, v′ ∈ Vj′ the agents pv and pv′ meet using an edge between Uj and Uj′.
2. For all j ∈ [K] and every u, u′ ∈ Uj the agents pu and pu′ meet using an edge in Vj.

This completes the completeness part.
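For concreteness, the t = 1 reduction above can be sketched in code. This is our own illustration (the vertex naming scheme is hypothetical, not from the thesis):

```python
# Build H from G: V' = V ∪ U, where V induces the complement of G,
# U = U_1 ∪ ... ∪ U_K with each U_j an independent set of size n/K,
# all edges between V and U, and all edges between distinct U_j's.

def reduction_t1(n, edges_G, K):
    assert n % K == 0
    V = [('v', i) for i in range(n)]
    U = {j: [('u', j, a) for a in range(n // K)] for j in range(K)}
    E = set()
    for i in range(n):                       # complement of G inside V
        for k in range(i + 1, n):
            if (i, k) not in edges_G and (k, i) not in edges_G:
                E.add(frozenset((V[i], V[k])))
    for v in V:                              # complete bipartite V-U
        for j in U:
            for u in U[j]:
                E.add(frozenset((v, u)))
    for j in U:                              # complete multipartite between U_j's
        for jp in U:
            if j < jp:
                for u in U[j]:
                    for up in U[jp]:
                        E.add(frozenset((u, up)))
    return V, U, E
```

For example, on the 4-cycle with K = 2 the resulting H has 2 complement edges inside V, 16 edges between V and U, and 4 edges between U1 and U2.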

[44] The statement of Theorem 2.8 in [56] says that G is K-colorable. However, it follows from the proof that in fact G is equi-K-colorable.
[45] The statement of Theorem 2.8 in [56] says that χ(G) ≥ 2K. However, the proof implies that in fact α(G) ≤ n/(2K).


Soundness: For the soundness part assume that AC(H) = 1. We claim that α(G) > n/(2K). Note first that if there is a single matching that allows all agents to meet, then for every j ∈ [K] all but at most K agents from Uj must have been moved by the matching to V. (This holds since U does not contain a clique on K + 1 vertices.) Moreover, all the agents from Uj who moved to V must have moved to a clique induced by V. This implies that V contains a clique of size at least n/K − K > n/(2K), which implies that α(G) > n/(2K). This completes the proof of Theorem 15.1 for the special case of t = 1.

The proof of Theorem 15.1 for general t ≥ 2 is quite similar, although it requires some additional technical details.

Proof of Theorem 15.1: We start with the following NP-hardness result due to Khot [51], saying that for a given graph G it is hard to distinguish between graphs with small chromatic number and graphs with small independence number. Specifically, Khot proves the following result.

Theorem 15.3 ([51, Theorem 1.6]) For every t ∈ N, and for every K ∈ N sufficiently large (it is enough to take K = 2^{O(t)}) the following gap problem is NP-hard. Given a graph G = (V, E) distinguish between the following two cases:

• χeq(G) ≤ K.

• α(G) ≤ n/(4t^{2t+1} · K^{2t}).

We construct a reduction from the problem above to the acquaintance time problem, that given a graph G outputs a graph H so that (1) if χeq(G) ≤ K, then AC(H) ≤ t, and (2) if α(G) ≤ n/(4t^{2t+1} · K^{2t}), then AC(H) ≥ 2t.

Given a graph G = (V, E) with n vertices, the reduction r(G) outputs a graph H = (V′, E′) as follows. The graph H contains |V′| = (t + 1)n vertices, partitioned into two parts V′ = V ∪ U, where |V| = n and U = ∪_{i∈[t],j∈[K]} Ui,j with |Ui,j| = n/K for all i ∈ [t], j ∈ [K]. The vertices V induce the complement graph of G. For each i ∈ [t], j ∈ [K] the vertices of Ui,j form an independent set. In addition, we set edges between every pair of vertices (v, u) ∈ V × U, as well as between every pair of vertices (u, u′) ∈ Ui,j × Ui′,j′ for all (i, j) ≠ (i′, j′). This completes the description of the reduction.

Completeness: We first prove the completeness part, namely, if χeq(G) = K, then AC(H) ≤ t. Suppose that the color classes of G are V = V1 ∪ · · · ∪ VK with |Vj| = n/K for all j ∈ [K]. Note that each color class Vj induces a clique in H. We show that AC(H) ≤ t, which can be achieved as follows: For all i ∈ [t], in the i’th round the agents located in the vertices of Ui,j swap places with the agents in Vj for all j ∈ [K]. (This is possible since by the assumption |Ui,j| = |Vj| = n/K, and all vertices of Ui,j are connected to all vertices of Vj.)


We next verify that this strategy allows every pair of agents to meet each other. Indeed, denoting by pv the agent sitting originally in vertex v, the only pairs who did not meet each other before the first round are contained in the following two classes:

1. For all j ≠ j′ ∈ [K] and for every v ∈ Vj, v′ ∈ Vj′ the pair of agents (pv, pv′).
2. For each i ∈ [t] and j ∈ [K] and every u, u′ ∈ Ui,j the pair of agents (pu, pu′).

Then, when the agents move along the prescribed matchings, the pairs from the first class meet after the first round. And for each round i ∈ [t] the pairs from the second class that correspond to u, u′ ∈ Ui,j for some j ∈ [K] meet after the i’th round. This proves the completeness part of the reduction.

Soundness: For the soundness part assume that AC(H) ≤ 2t − 1, and consider the corresponding (2t − 1)-rounds strategy for acquaintance in H. By a counting argument there are n/2 agents who originally were located in U and visited V at most once. By averaging, there are n/(2(2t − 1)) agents who either never visited V or visited V simultaneously, and this was their only visit to V. Let us denote this set of agents by P0. The following claim completes the proof of the soundness part.

Claim 15.4 Let P0 be a set of agents of size n/(2(2t − 1)). Suppose they visited V at most once simultaneously, and visited U at most 2t times. If every pair of agents from P0 met each other during these rounds, then α(G) ≥ n/(2(2t − 1)(tK)^{2t}) > n/(4t^{2t+1} · K^{2t}).

Proof Let us assume for concreteness that P0 stayed in U until the last round, and then moved to V. Associate with each agent the sequence of sets Ui,j, of length 2t, which he visited during the first 2t − 1 rounds. This defines a natural partition of P0 into (tK)^{2t} clusters, where the agents are in the same cluster if and only if they have the same sequence. That is, the agents meet each other if and only if they are in different clusters. Thus, at least one of the clusters is of size at least |P0|/(tK)^{2t} = n/(2(2t − 1)(tK)^{2t}). If we assume that every pair of agents from P0 met each other eventually, then it must be the case that in the last round each cluster moved to some clique induced by V. Since V induces the complement of G, this gives an independent set in G of size n/(2(2t − 1)(tK)^{2t}). The claim follows.

We have shown a reduction from the coloring problem to the acquaintance time problem, that given a graph G outputs a graph H so that (1) if χeq(G) ≤ K, then AC(H) ≤ t, and (2) if α(G) ≤ n/(4t^{2t+1} · K^{2t}), then AC(H) ≥ 2t. This completes the proof of Theorem 15.1.


15.1 Towards stronger hardness results

We conjecture that, in fact, a stronger hardness result holds, compared to the one stated in Theorem 15.1.

Conjecture 15.5 For every constant t ∈ N it is NP-hard to decide whether a given graph G has AC(G) = 1 or AC(G) ≥ t.

Below we describe a gap problem, similar in spirit to the hardness results of Lund and Yannakakis and that of Khot, whose NP-hardness implies Conjecture 15.5. In order to describe the gap problem we need the following definition.

Definition 15.6 Let t ∈ N and β > 0. A graph G = (V, E) is said to be (β, t)-intersecting if for every t subsets (not necessarily disjoint) of the vertices S1, . . . , St ⊆ V of size βn and for every t bijections πi : Si → [βn] there exist j, k ∈ [βn] such that all pre-images of the pair (j, k) are edges in E, i.e., for all i ∈ [t] it holds that (πi⁻¹(j), πi⁻¹(k)) ∈ E.

Note that a graph G is (β, 1)-intersecting if and only if G does not contain an independent set of size βn. In addition, note that if G is (β, t)-intersecting then it is also (β′, t′)-intersecting for β′ ≥ β and t′ ≤ t, and in particular α(G) < βn. We remark without proof that the problem of deciding whether a given graph G is (β, t)-intersecting is coNP-complete.

We make the following conjecture regarding NP-hardness of distinguishing between graphs with small chromatic number and (β, t)-intersecting graphs.

Conjecture 15.7 For every t ∈ N and for all K ∈ N sufficiently large it is NP-hard to distinguish between the following two cases for a given graph G = (V, E):

• χeq(G) ≤ K.
• The graph G is (1/K^t, t)-intersecting.

Remark Conjecture 15.7 does not seem to follow immediately from the result of Khot stated in Theorem 15.3. One reason for that is due to the fact that Khot’s hard instances for the problem are bounded degree graphs, and we suspect that such graphs cannot be (β, t)-intersecting for arbitrarily small β > 0 even in the case of t = 2.

Theorem 15.8 Conjecture 15.7 implies Conjecture 15.5.
The proof of this implication is analogous to the proof of Theorem 15.1, and we omit it. The reduction is exactly the same as described in the proof of Theorem 15.1 for the special case of t = 1. The analysis is similar to the proof of Theorem 15.1 for general t ≥ 1, where instead of using the assumption that α(G) is small we use the stronger assumption in the NO-case of Conjecture 15.7.
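For intuition, in the t = 1 case Definition 15.6 says exactly that every set of βn vertices spans an edge, i.e., α(G) < βn. A brute-force check for small graphs (our own sketch, not part of the thesis) makes this concrete:

```python
# (beta, 1)-intersecting means: every subset of `size` = beta*n vertices
# contains an edge, i.e., G has no independent set of that size.
from itertools import combinations

def is_beta1_intersecting(n, edges, size):
    E = {frozenset(e) for e in edges}
    return all(any(frozenset(p) in E for p in combinations(S, 2))
               for S in combinations(range(n), size))

# A triangle has no independent set of size 2; the path 0-1-2 has one ({0, 2}).
assert is_beta1_intersecting(3, [(0, 1), (1, 2), (2, 0)], 2)
assert not is_beta1_intersecting(3, [(0, 1), (1, 2)], 2)
```

For general t the analogous brute-force check additionally enumerates the bijections πi, which is what makes the decision problem naturally sit in coNP.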

16 Graphs with AC(G) = 1

In this section we study graphs whose acquaintance time equals 1. We state some structural results for such graphs, and use them to give efficient approximation algorithms for AC on them. Specifically, for a graph G with AC(G) = 1 and a constant c we give a deterministic algorithm that returns an (n/c)-rounds strategy for acquaintance in G whose running time is n^{c+O(1)}. We also give a randomized polynomial time algorithm that returns an O(log(n))-rounds strategy for such graphs. These results appear in Section 16.1.

Definition 16.1 Let G = (V, E) be a graph, and let V = A ∪ B ∪ C be a partition of the vertices with A = {ai}_{i=1}^k and B = {bi}_{i=1}^k for some k ∈ N. The tuple (A, B, C) is called a one-matching-witness for G if it satisfies the following conditions.

1. (ai, bi) ∈ E for all i ∈ [k].
2. Either (ai, bj) ∈ E or (aj, bi) ∈ E for all i ≠ j ∈ [k].
3. Either (ai, aj) ∈ E or (bi, bj) ∈ E for all i ≠ j ∈ [k].
4. The vertices of C induce a clique in G.
5. For all c ∈ C and for all i ∈ [k] we have either (c, ai) ∈ E or (c, bi) ∈ E.

(In items 2, 3, and 5 the either-or condition is not exclusive.)

Claim 16.2 A graph G = (V, E) satisfies AC(G) = 1 if and only if it has a one-matching-witness.

Proof Suppose first that AC(G) = 1, and let M = {(a1, b1), . . . , (ak, bk)} be a matching that witnesses the assertion AC(G) = 1. Let A = {ai}_{i=1}^k, B = {bi}_{i=1}^k, and C = V \ (A ∪ B). Then (A, B, C) is a one-matching-witness for G. For the other direction, if (A, B, C) is a one-matching-witness for G, then the matching M = {(a1, b1), . . . , (ak, bk)} is a 1-round strategy for acquaintance in G.

The following corollary is immediate from Claim 16.2.

Corollary 16.3 Let G = (V, E) be an n-vertex graph that satisfies AC(G) = 1, and suppose that (A = {ai}_{i=1}^k, B = {bi}_{i=1}^k, C) is a one-matching-witness for G. Then,

1. For all i ∈ [k] it holds that deg(ai) + deg(bi) ≥ 2k + |C| = n.
2. For all c ∈ C we have deg(c) ≥ k + |C| − 1 ≥ ⌊n/2⌋.
3.
There are at least ⌊n/2⌋ vertices v ∈ V with deg(v) ≥ ⌊n/2⌋.
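The conditions of Definition 16.1 are straightforward to verify mechanically. The following checker is our own sketch (not from the thesis) and tests a candidate witness against all five items:

```python
# Check whether (A, B, C) is a one-matching-witness for the graph with
# edge set E (a set of frozenset pairs). A and B are lists with A[i]
# matched to B[i]; C holds the remaining vertices.

def is_one_matching_witness(E, A, B, C):
    adj = lambda x, y: frozenset((x, y)) in E
    k = len(A)
    if len(B) != k:
        return False
    return (all(adj(A[i], B[i]) for i in range(k))                        # item 1
            and all(adj(A[i], B[j]) or adj(A[j], B[i])
                    for i in range(k) for j in range(k) if i != j)        # item 2
            and all(adj(A[i], A[j]) or adj(B[i], B[j])
                    for i in range(k) for j in range(i + 1, k))           # item 3
            and all(adj(c, d) for c in C for d in C if c != d)            # item 4
            and all(adj(c, A[i]) or adj(c, B[i])
                    for c in C for i in range(k)))                        # item 5

# The path 0-1-2 has AC = 1, witnessed by A = [0], B = [1], C = [2].
path = {frozenset((0, 1)), frozenset((1, 2))}
assert is_one_matching_witness(path, [0], [1], [2])
assert not is_one_matching_witness(path, [0], [2], [1])
```

By Claim 16.2, a graph has AC = 1 exactly when some partition passes this check, so the checker also gives a (brute-force) decision procedure for AC(G) = 1 on tiny graphs.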

Claim 16.4 Let G = (V, E) be a graph with n vertices that satisfies AC(G) = 1, and let U ⊆ V be the set of vertices v ∈ V such that deg(v) ≥ ⌊n/2⌋. Then, for every W ⊆ V \ U there exists a matching of size |W| between U and W.

Proof Let (A, B, C) be a one-matching-witness for G. Note that by Item 2 of Corollary 16.3 we have C ⊆ U, and thus W ⊆ A ∪ B. By Item 1 of Corollary 16.3 for every i ∈ [k] it holds that deg(ai) + deg(bi) ≥ n, and thus either ai or bi belongs to U. Therefore, the required matching is given by M = {(ai, bi) : i ∈ [k] such that either ai ∈ W or bi ∈ W}.

The following proposition gives additional details on the structure of graphs with AC(G) = 1. It will be used later for the analysis of a (randomized) approximation algorithm for acquaintance in such graphs (see Theorem 16.8).

Proposition 16.5 Let G = (V, E) be a graph with n vertices that satisfies AC(G) = 1, and let u, v ∈ V be two vertices of degree at least n/2. Then, either |N(u) ∩ N(v)| = Ω(n) or |E[N(u), N(v)]| = Ω(n²), where E[N(u), N(v)] = {(a, b) ∈ E : a ∈ N(u), b ∈ N(v)} denotes the set of edges between N(u) and N(v).

Proof If |N(u) ∩ N(v)| ≥ 0.1n, then we are done. Assume now that |N(u) ∩ N(v)| < 0.1n. Therefore |N(u) ∪ N(v)| > 0.9n, as |N(u)| + |N(v)| ≥ n. Define two disjoint sets N′(u) = N(u) \ N(v) and N′(v) = N(v) \ N(u), and note that by disjointness we have |N′(u)| ≥ 0.4n and |N′(v)| ≥ 0.4n. It suffices to prove that |E[N′(u), N′(v)]| = Ω(n²). Suppose that (A = {ai}_{i=1}^k, B = {bi}_{i=1}^k, C) is a one-matching-witness for G. Consider the indices I = {i ∈ [k] : ai, bi ∈ N′(u) ∪ N′(v)}, and define a partition I = Iu ∪ Iv ∪ Iu,v, where Iu = {i ∈ [k] : ai, bi ∈ N′(u)}, Iv = {i ∈ [k] : ai, bi ∈ N′(v)}, and Iu,v = I \ (Iu ∪ Iv). Also, define Cu = C ∩ N′(u), and Cv = C ∩ N′(v). Note that |N′(u)| = |Cu| + 2|Iu| + |Iu,v|, and analogously |N′(v)| = |Cv| + 2|Iv| + |Iu,v|.
Using this partition we have

|E[N′(u), N′(v)]| ≥ |Cu| · |Cv| + |Iu,v|²/2 + |Iu| · |Iv| + |Iu| · |Cv| + |Iv| · |Cu|,

where the first term follows from the fact that C induces a clique (Definition 16.1 Item 4), the second and third terms follow from Item 2 of Definition 16.1, and the last two terms follow from Item 5 of Definition 16.1. Now, if |Iu,v| > 0.2n, then |E[N′(u), N′(v)]| ≥ |Iu,v|²/2 ≥ 0.02n², as required. Otherwise, we have |Cu| + |Iu| ≥ 0.1n and |Cv| + |Iv| ≥ 0.1n, and therefore |E[N′(u), N′(v)]| ≥ (|Cu| + |Iu|) · (|Cv| + |Iv|) ≥ 0.01n², as required.


16.1 Algorithmic results

Recall that (unless P = NP) there is no polynomial time algorithm that, when given a graph G with AC(G) = 1, finds a 1-round strategy for acquaintance of G. In this section we provide two approximation algorithms regarding graphs whose acquaintance time equals 1. In Theorem 16.7 we give a deterministic algorithm that finds an (n/c)-rounds strategy for acquaintance in such graphs whose running time is n^{c+O(1)}. In Theorem 16.8 we give a randomized algorithm that finds an O(log(n))-rounds strategy for acquaintance in such graphs. We start with the following simple deterministic algorithm.

Proposition 16.6 There is a deterministic polynomial time algorithm that when given as input an n-vertex graph G = (V, E) such that AC(G) = 1 outputs an n-rounds strategy for acquaintance in G.

Proof The algorithm works by taking one agent at a time and finding a 1-round strategy that allows this agent to meet all others. For each agent p the algorithm works as follows. If the location of p is the vertex v ∈ V, then for each possible destination u ∈ N(v) ∪ {v} for p the algorithm constructs the bipartite graph Hv,u = (A ∪ B, F) where A = V \ (N(v) ∪ {v}) and B = N(u) \ {v}, and there is an edge (a, b) ∈ A × B in F if and only if it is contained in E. The algorithm then looks for a matching of size |A| in Hv,u. Such a matching, if it exists, can be found in polynomial time (e.g., using an algorithm for maximum flow). We claim below that such a matching, augmented with the edge (v, u) if needed (that is, if v ≠ u), gives a 1-round strategy that allows p to meet all other agents. Repeating this procedure for all agents gives an n-rounds strategy for acquaintance in G.

In order to prove correctness of the algorithm we claim that for all v ∈ V there is some u ∈ N(v) ∪ {v} such that the graph Hv,u contains a matching of size |A|. Furthermore, any such matching, augmented with the edge (v, u) if needed, gives a 1-round strategy that allows p to meet all other agents.
Indeed, Hv,u contains all edges from the agents who didn’t meet p before the first round to the neighbours of u. Consider a 1-round strategy for acquaintance in G, and let u be the vertex in which p is located after this round. This strategy (restricted to the edges of Hv,u) induces a matching of size |A|, and so there exists some u ∈ N(v) ∪ {v} such that Hv,u contains a matching of size |A|. To finish the proof of correctness note that any matching of size |A| in Hv,u gives a 1-round strategy that allows p to meet all other agents.

We modify the proof above to get the following stronger result.

Theorem 16.7 There is an algorithm that when given as input c ∈ N and an n-vertex graph G = (V, E) with AC(G) = 1 outputs an ⌈n/c⌉-rounds strategy for acquaintance in G in time n^{c+O(1)}.
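A single iteration of the algorithm of Proposition 16.6 (the c = 1 case of what follows) can be sketched as below, with a standard augmenting-path routine standing in for the maximum-flow step. This is our own sketch; the function names are hypothetical:

```python
# For agent p at vertex v and candidate destination u, build the
# bipartite graph H_{v,u} with A = V \ (N(v) ∪ {v}), B = N(u) \ {v},
# edges inherited from E, and test for a matching saturating A.

def max_matching_size(A, nbrs):
    match = {}                                  # B-side vertex -> A-side vertex

    def augment(a, seen):
        for b in nbrs(a):
            if b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[b] = a
                    return True
        return False

    return sum(augment(a, set()) for a in A)

def has_saturating_matching(V, E, v, u):
    adj = lambda x, y: frozenset((x, y)) in E
    A = [w for w in V if w != v and not adj(v, w)]   # agents p has not met yet
    B = [w for w in V if adj(u, w) and w != v]       # N(u) \ {v}
    return max_matching_size(A, lambda a: [b for b in B if adj(a, b)]) == len(A)

# On the path 0-1-2 (AC = 1), the agent at vertex 2 can stay put (u = 2):
path = {frozenset((0, 1)), frozenset((1, 2))}
assert has_saturating_matching([0, 1, 2], path, v=2, u=2)
assert not has_saturating_matching([0, 1, 2], path, v=2, u=1)
```

The outer loop of the algorithm simply tries every u ∈ N(v) ∪ {v} until such a saturating matching is found.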


Proof Modifying the algorithm in the proof of Proposition 16.6 we may take c agents at a time, and find one matching that allows each of them to meet all agents. Given a set P = {p1, . . . , pc} of agents located in the vertices {v1, . . . , vc} respectively, the matching is found as follows. The algorithm goes over all possible destinations {ui ∈ N(vi) ∪ {vi} : i ∈ [c]} for the agents pi. If (vi, vj) ∉ E and (ui, uj) ∉ E for some i ≠ j, then the agents pi and pj do not meet, and we skip to the next potential destination. Otherwise, define a bipartite graph H = (A ∪ B, F) where A = V \ ((∩_{i∈[c]} N(vi)) ∪ {vi, ui : i ∈ [c]}) and B = (∪_{i∈[c]} N(ui)) \ {vi, ui : i ∈ [c]}. We add the edge (a, b) ∈ A × B to H if the agent located originally in a can move to b and meet all agents from P (who moved in the same round from vi to ui). Formally, we add (a, b) ∈ A × B to H if and only if (a, b) ∈ E and for all i ∈ [c] either a ∈ N(vi) or b ∈ N(ui) (or both). The algorithm then looks for a matching of size |A| in H, and when found, augments it with {(vi, ui) : i ∈ [c]} and adds it to the strategy for acquaintance in G. The correctness is very similar to the argument in the proof of Proposition 16.6, showing that any 1-round strategy induces the desired matching in H for some choice of u1, . . . , uc, and that any matching of size |A| in H gives a 1-round strategy that allows each of the agents in P to meet every agent of G.

We now turn to a randomized polynomial time algorithm with the following guarantee.

Theorem 16.8 There is a randomized polynomial time algorithm which when given a graph G with AC(G) = 1 finds an O(log(n))-rounds strategy for acquaintance in G with high probability.

Proof Let G = (V, E) be an n-vertex graph, and let U ⊆ V be the set of vertices of degree at least ⌊n/2⌋. By Item 3 of Corollary 16.3 we have |U| ≥ ⌊n/2⌋. The following lemma describes a key step in the algorithm.

Lemma 16.9 Let PU be the agents originally located in U.
Then, there exists a polynomial time randomized algorithm that finds an O(log(n))-rounds strategy that makes every two agents in PU meet with high probability.

Now, consider all the agents P in G. For every subset P′ ⊆ P of size |P′| ≤ |U| we can use the aforementioned procedure to produce an O(log(n))-rounds strategy that allows all agents in P′ to meet with high probability. Let us partition the agents P into a constant number c = ⌈2|V|/|U|⌉ of disjoint subsets P = P1 ∪ · · · ∪ Pc with at most ⌊|U|/2⌋ agents in each Pi, and apply the procedure to each pair Pi ∪ Pj separately. By Claim 16.4 we can transfer any Pi ∪ Pj to U in one step. When all pairs have been dealt with, all agents have already met each other. This gives us an O(log(n))-rounds strategy for the acquaintance problem in graphs with AC(G) = 1 that can be found in randomized polynomial time.

We return to the proof of Lemma 16.9.

Proof of Lemma 16.9 We describe a randomized algorithm that finds an O(log(n))-rounds strategy that allows every two agents in PU to meet. Consider the following algorithm for constructing a matching M.

1. Select a random ordering σ : {1, . . . , |U|} → U of U.
2. Start with the empty matching M = ∅.
3. Start with an empty set of vertices S = ∅. The set will include the vertices participating in M, as well as some of the vertices that will not move.
4. For each i = 1, . . . , |U| do
   (a) Set ui = σ(i).
   (b) Select a vertex ui′ ∈ N(ui) ∪ {ui} as follows.
       i. With probability 0.5 let ui′ = ui.
       ii. With probability 0.5 pick ui′ ∈ N(ui) uniformly at random.
   (c) If ui ∉ S and ui′ ∉ S, then // (ui, ui′) will be used in the current step
       i. S ← S ∪ {ui, ui′}.
       ii. If ui ≠ ui′, then M ← M ∪ {(ui, ui′)}.
5. Output M.

The following claim bounds the probability that a pair of agents in PU meet after a single step of the algorithm.

Claim 16.10 For every u, v ∈ U, let pu and pv be the agents located in u and v respectively. Then,

Pr[the agents pu, pv meet after one step] ≥ c

for some absolute constant c > 0 that does not depend on n or G.

In order to achieve an O(log(n))-rounds strategy that allows every two agents in PU to meet, apply the matching constructed above, and then return the agents to their original positions (by applying the same matching again). Repeating this random procedure independently ⌈3 log(n)/c⌉ times will allow every fixed pair of agents to meet with probability at least 1 − 1/n³. Therefore, by a union bound all pairs of agents pu, pv ∈ PU will meet with probability at least 1 − 1/n. This completes the proof of Lemma 16.9.

Proof of Claim 16.10 We claim first that for every i ≤ |U| and for every vertex w ∈ U ∪ N(U) the probability that in step 4 of the algorithm the vertex w has been added to S

before the i’th iteration is upper bounded by 4i/n. Indeed,

Pr[∃ j < i such that w ∈ {uj, uj′}] ≤ Pr[w ∈ {uj : j < i}] + Pr[w ∈ {uj′ : j < i}]
≤ i/n + Σ_{j<i} Pr[uj′ = w]
≤ i/n + i · 1/⌊n/2⌋
≤ 4i/n    [for n > 1]

where the bound Pr[uj′ = w] ≤ 1/⌊n/2⌋ follows from the assumption that deg(uj) ≥ ⌊n/2⌋ for all uj ∈ U, and hence, the probability of picking uj′ to be w is at most 1/deg(uj) ≤ 1/⌊n/2⌋. Let T ∈ {2, . . . , |U|} be a parameter to be chosen later. Now, let i ≤ |U| be the (random) index such that σ(i) = u, and let j ≤ |U| be the (random) index such that σ(j) = v. Then,

Pr[i ≤ T and j ≤ T] = (T choose 2) · (|U| − 2)!/|U|! = T(T − 1)/(2 · |U| · (|U| − 1)) ≥ T²/(4n²).

Conditioning on this event, the probability that either u or u′ has been added to S before iteration i is upper bounded by 4i/n ≤ 4T/n, and similarly the probability that either v or v′ has been added to S before iteration j is at most 4T/n. Therefore, with probability at least (T²/(4n²)) · (1 − 8T/n) both (u, u′) and (v, v′) will be used in the current step. Therefore,

Pr[the agents pu, pv meet after one step] ≥ (T²/(4n²)) · (1 − 8T/n) · Pr[(u′, v′) ∈ E].

In order to lower bound Pr[(u′, v′) ∈ E] we use Proposition 16.5, saying that for every two vertices u, v ∈ U it holds that either |N(u) ∩ N(v)| ≥ αn or |E[N(u), N(v)]| ≥ α · n² for some constant α > 0 that does not depend on n or G. Therefore, for every u, v ∈ U it holds that Pr[(u′, v′) ∈ E] ≥ α/4. Letting T = αn/12 we get that Pr[the agents pu, pv meet after one step] = Ω(α³), as required.
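The matching construction from the proof of Lemma 16.9 translates directly into code. A minimal sketch (ours, not from the thesis), assuming the graph is given as a map N from each vertex to a nonempty neighbor list:

```python
import random

# One execution of the random-matching construction of Lemma 16.9:
# U is the list of high-degree vertices, N maps vertices to neighbors.

def random_matching(U, N):
    order = random.sample(U, len(U))        # step 1: random ordering of U
    M, S = [], set()                        # steps 2 and 3
    for u in order:                         # step 4
        if random.random() < 0.5:           # (b)i: stay put w.p. 1/2
            u2 = u
        else:                               # (b)ii: random neighbor w.p. 1/2
            u2 = random.choice(N[u])
        if u not in S and u2 not in S:      # (c): (u, u2) is used this step
            S.update((u, u2))
            if u != u2:
                M.append((u, u2))
    return M

# Sanity check on the complete graph K6: the output is always a matching.
N6 = {v: [w for w in range(6) if w != v] for v in range(6)}
M = random_matching(list(range(6)), N6)
ends = [x for e in M for x in e]
assert len(ends) == len(set(ends))          # no vertex matched twice
assert all(b in N6[a] for a, b in M)        # only graph edges are used
```

The set S guarantees the output is a valid matching: a pair is added only when neither endpoint was already reserved by an earlier iteration.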

17 Other Variants and Open Problems

There are several variants of the problem that one may consider.

1. The problem of maximizing the number of pairs that meet when some predetermined number t ∈ N of matchings is allowed. Clearly, this problem is also NP-complete, even in the case t = 1.

114

2. There are two graphs G and H. The agents move along matchings of H, but meet if they share an edge in G. In particular, this looks natural if H is contained in G.

3. Instead of choosing a matching in each round, one may choose a vertex-disjoint collection of cycles, and move agents one step along the cycle. This is a generalization of the problem discussed in this paper, where we allow only collections of 2-cycles.

One may also consider a more game-theoretic variant of the problem: Let G = (V, E) be a fixed graph with one agent sitting in each vertex of G. In each round every agent pu sitting in a vertex u ∈ V chooses a neighbor u′ ∈ N(u) according to some strategy. Then, for every edge (v, w) ∈ E the agents pv and pw swap places if the choice of the agent pv was w and the choice of pw was v. Suppose that the graph is known, but the agents have no information regarding their location in the graph (e.g., G is an unlabeled vertex-transitive graph). Find an optimal strategy for the agents so that everyone will meet everyone else as quickly as possible. The question also makes sense in the case where the graph is not known to the agents.

We conclude with a list of open problems.

Problem 17.1 Find the AC of the hypercube graph. Recall that AC(Hypercube) is between Ω(n/log(n)) and O(n), where the lower bound is trivial from the number of edges, and the upper bound follows from Hamiltonicity of the graph (Corollary 13.2).

Problem 17.2 Prove Conjecture 15.5, namely, that for every constant t ∈ N it is NP-hard to decide whether a given graph G has AC(G) = 1 or AC(G) ≥ t. Recall that it follows from Conjecture 15.7.

Problem 17.3 Prove stronger inapproximability results. Is it true that AC is hard to approximate within a factor of log(n)? How about n^{0.01}? How about n^{0.99}? Note that the upper bound AC(G) ≤ n²/∆ from Claim 14.7 together with the lower bound AC(G) ≥ n/∆ gives an O(n)-approximation algorithm for the problem.
Problem 17.4 Give a polynomial time algorithm that, given a graph G, outputs a graph H such that AC(H) = f(AC(G)) for a super-linear function f : N → N (e.g., f(n) = n^2). Such an algorithm could be useful for hardness of approximation results for the AC problem.

Problem 17.5 Derandomize the algorithm given in the proof of Theorem 16.8.

Problem 17.6 Give a structural result regarding graphs with small constant values of AC(G), similar to Claim 16.2. Also, is there an efficient O(log(n))-approximation algorithm for such graphs?

Problem 17.7 Give algorithmic results on approximation of AC on special families of graphs. For example, we do not know whether it is possible to O(1)-approximate AC on trees.

Part IV

Bi-Lipschitz Bijection between the Boolean Cube and the Hamming Ball

Abstract

We construct a bi-Lipschitz bijection from the Boolean cube to the Hamming ball of equal volume. More precisely, we show that for all even n ∈ N there exists an explicit bijection ψ : {0, 1}^n → {x ∈ {0, 1}^{n+1} : |x| > n/2} such that for every x ≠ y ∈ {0, 1}^n it holds that

1/5 ≤ dist(ψ(x), ψ(y)) / dist(x, y) ≤ 4,

where dist(·, ·) denotes the Hamming distance. In particular, this implies that the Hamming ball is bi-Lipschitz transitive. This result gives a strong negative answer to an open problem of Lovett and Viola [CC 2012], who raised the question in the context of sampling distributions in low-level complexity classes. The conceptual implication is that the problem of proving lower bounds in the context of sampling distributions requires ideas beyond the sensitivity-based structural results of Boppana [IPL 97]. We study the mapping ψ further and show that both ψ and its inverse are computable in DLOGTIME-uniform TC0, but not in AC0. Moreover, we prove that ψ is "approximately local" in the sense that all but the last output bit of ψ are essentially determined by a single input bit.


18 Introduction to Part IV

The Boolean cube {0, 1}^n and the Hamming ball Bn = {x ∈ {0, 1}^{n+1} : |x| > n/2}, equipped with the Hamming distance, are two fundamental combinatorial structures that exhibit, in some aspects, different geometric properties. As a simple illustrative example, for an even integer n ∈ N, consider the vertex and edge boundaries^46 of {0, 1}^n and Bn, when viewed as subsets of {0, 1}^{n+1} of equal density 1/2. The Boolean cube is easily seen to maximize the vertex boundary among all subsets of equal density (since all its vertices lie on the boundary), whereas Harper's vertex-isoperimetric inequality [42] implies that the Hamming ball is in fact the unique minimizer. The same phenomenon occurs for the edge boundary, though interestingly, the roles are reversed: among all monotone sets^47 of density 1/2, the Poincaré inequality implies that the Boolean cube is the unique minimizer of the edge boundary, whereas a classical result of Hart shows that the Hamming ball is the unique maximizer [43]. From the Boolean functions perspective, the indicator of {0, 1}^n embedded in {0, 1}^{n+1} is commonly referred to as the dictator function, and the indicator of Bn ⊂ {0, 1}^{n+1} is the majority function; it is a recurring theme in the analysis of Boolean functions that the two are, in some senses, opposites of one another.

Lovett and Viola [55] suggested to utilize the opposite structure of the Boolean cube and the Hamming ball for proving lower bounds on sampling by low-level complexity classes such as AC0 and TC0. In particular, Lovett and Viola were interested in proving that for any even n, any bijection f : {0, 1}^n → Bn has a large average stretch, where

avgStretch(f) = E_{x∼{0,1}^n, i∼[n]}[dist(f(x), f(x + e_i))],

and dist(·, ·) denotes the Hamming distance. To be more precise, Lovett and Viola raised the following open problem.

Problem 18.1 ([55], Open Problem 4.1) Let n ∈ N be an even integer. Prove that for any bijection f : {0, 1}^n → Bn, it holds that

avgStretch(f) = (log n)^{ω(1)}.        (39)

A positive answer to Problem 18.1 would demonstrate yet another scenario in which the Boolean cube and the Hamming ball have a different geometric structure: any bijection from the former to the latter does not respect distances. Furthermore, a positive answer to Problem 18.1 would have applications to lower bounds for sampling in AC0; even a weaker claim, where the right hand side in Equation (39) is replaced with ω(1), would have implications for sampling in the lower class NC0. We discuss this further in Section 18.2.

Arguably, the simplest and most natural bijection ϕ : {0, 1}^n → Bn to consider is the following:

ϕ(x) = flip(x) ◦ 1    if |x| ≤ n/2,
       x ◦ 0          otherwise,

where flip(x) denotes the bit-wise complement of x. It is straightforward to verify that avgStretch(ϕ) = Θ(√n). To see this, note that any edge (x, y) in {0, 1}^n, where |x| = n/2 and |y| = n/2 + 1, contributes n to the average stretch, whereas all other edges contribute 1. The assertion then follows since a Θ(1/√n) fraction of the edges are of the first type. In fact, the maximum stretch of ϕ is n, where

maxStretch(ϕ) = max_{x∈{0,1}^n, i∈[n]} dist(ϕ(x), ϕ(x + e_i)).

^46 The edge boundary of a subset A ⊂ {0, 1}^{n+1} is the set of edges with one endpoint in A and one outside A. The vertex boundary of A is the set of vertices outside A that are endpoints of boundary edges.
^47 Recall that a subset A ⊂ {0, 1}^{n+1} is monotone if x ∈ A implies y ∈ A for all y ⪰ x.

As far as we know, prior to our work this simple bijection achieved the best-known upper bound on the average stretch between {0, 1}^n and Bn, and no non-trivial (i.e., sublinear) upper bounds on the maximum stretch were known. For a survey on metric embeddings of finite spaces see [54]. In particular, a lot of research has been done on the question of embedding into the Boolean cube. For example, see [7, 45] for some work on embeddings between random subsets of the Boolean cube, and [40] for isometric embeddings of arbitrary graphs into the Boolean cube.
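The Θ(√n) estimate for ϕ can be checked directly by brute force for small n. The following sketch (the function names are ours) enumerates all edges of the cube:

```python
from itertools import product

def phi(x):
    """The simple bijection from {0,1}^n to B_n: if |x| <= n/2, output
    the bit-wise complement of x followed by 1; otherwise append 0."""
    n = len(x)
    if sum(x) <= n // 2:
        return tuple(1 - b for b in x) + (1,)
    return x + (0,)

def dist(u, v):
    """Hamming distance."""
    return sum(a != b for a, b in zip(u, v))

def avg_stretch(f, n):
    """Average of dist(f(x), f(x+e_i)) over uniform x in {0,1}^n, i in [n]."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            total += dist(f(x), f(y))
    return total / (2 ** n * n)

for n in (4, 6, 8):
    print(n, avg_stretch(phi, n))  # grows like sqrt(n)
```

For n = 4, for instance, every edge between the two middle layers is stretched to distance n = 4, while all other edges keep distance 1.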

18.1 Our Results

The main result of this paper is a strong negative answer to Problem 18.1.

Theorem 18.2 (Main theorem) For all even integers n, there exists a bijection ψ : {0, 1}^n → Bn with maxStretch(ψ) ≤ 4 and maxStretch(ψ^{−1}) ≤ 5.

Theorem 18.2 highlights a surprising geometric resemblance between the Boolean cube and the Hamming ball. In the language of metric geometry, Theorem 18.2 says that there is a bi-Lipschitz bijection between the two spaces.

Corollary 18.3 (A bi-Lipschitz bijection between {0, 1}^n and Bn) For all even integers n, there exists a bijection ψ : {0, 1}^n → Bn such that for every x ≠ y ∈ {0, 1}^n it holds that

1/5 ≤ dist(ψ(x), ψ(y)) / dist(x, y) ≤ 4.

As a corollary of Theorem 18.2, we obtain that the subgraph of {0, 1}^{n+1} induced by the vertices of Bn is bi-Lipschitz transitive. Informally speaking, this says that any two points of Bn have roughly the same "view", even the unique point with Hamming weight n + 1 and the boundary points of weight n/2 + 1. More formally:

Corollary 18.4 (The Hamming balls are uniformly bi-Lipschitz transitive) For all even integers n, and for every two vertices x, y ∈ Bn, there is a bijection f : Bn → Bn such that f(x) = y, f(y) = x, and for every z ≠ w ∈ Bn it holds that

1/20 ≤ dist(f(z), f(w)) / dist(z, w) ≤ 20.

To see this, first note that Bn is a convex subset of {0, 1}^{n+1}, and thus the distances between vertices in Bn are the same as their distances as a subset of the cube. Now, for a given pair x, y ∈ Bn, let x′ = ψ^{−1}(x) and y′ = ψ^{−1}(y), where ψ is the function from Theorem 18.2, and define f : Bn → Bn as f(z) = ψ(ψ^{−1}(z) ⊕ x′ ⊕ y′). It is easy to see that f indeed satisfies the requirements of Corollary 18.4.

Approximating ψ_i. We highlight another consequence of our main theorem that is perhaps somewhat surprising: the bijection ψ of Theorem 18.2 is "approximately local", in the sense that almost all of its output bits are essentially determined by only a constant number of input bits. To see this, we view the bijection ψ : {0, 1}^n → Bn as a vector of Boolean functions ⟨ψ_1, . . . , ψ_{n+1}⟩, where ψ_i(x) is the i-th output bit of ψ on input x ∈ {0, 1}^n. Recall that the total influence of a Boolean function ψ_i : {0, 1}^n → {0, 1} is the quantity

Inf[ψ_i] = E_{x∼{0,1}^n}[#{j ∈ [n] : ψ_i(x) ≠ ψ_i(x + e_j)}].

By linearity of expectation, Theorem 18.2 implies that a typical ψ_i has bounded total influence:

E_{i∼[n+1]}[Inf[ψ_i]] ≤ avgStretch(ψ) ≤ maxStretch(ψ) ≤ 4.        (40)

Next we recall Friedgut's Junta Theorem, which states that a Boolean function with bounded total influence is well approximated by another Boolean function that depends on only a constant number of input bits. More precisely:

Theorem 18.5 (Friedgut's Junta Theorem [32]) Let f : {0, 1}^n → {0, 1} be a Boolean function. For every ε > 0 there exists a Boolean function g : {0, 1}^n → {0, 1} such that g is a 2^{O(Inf[f]/ε)}-junta^48 and Pr[f(x) ≠ g(x)] ≤ ε.

^48 Recall that a k-junta is a Boolean function that depends on at most k of its input bits.

Combining Equation (40) with Friedgut's Junta Theorem, we see that for any constants δ, ε > 0, all but a δ-fraction of the ψ_i's are ε-approximated by O(1)-juntas. A similar argument appeared recently in the paper of Austin [9], where he studies bi-Lipschitz functions F : [0, 1]^N → [0, 1]^M. In fact, we give a direct proof that the function ψ in Theorem 18.2 satisfies the following stronger property.

Proposition 18.6 For all i ∈ [n] it holds that

Pr_x[ψ_i(x) = x_i] > 1 − O(1/√n).

That is, all but the last output bit of ψ are essentially determined by a single input bit.

The complexity of ψ. Since the original motivation for constructing ψ comes from efficient sampling of distributions, Theorem 18.2 is of greater interest if the bijection ψ (and ψ^{−1}) can be computed by low-level circuits.

Proposition 18.7 The bijections ψ and ψ^{−1} are computable in DLOGTIME-uniform TC0.

Remark. In fact, we show that for all i ∈ [n + 1] there is an NC0-reduction from majority to ψ_i. That is, TC0 is the "correct" complexity of ψ, and in particular, ψ is not in AC0. See Proposition 20.1 and the remark following it for details.

18.2 The Complexity of Distributions

Lower bounds in circuit complexity are usually concerned with showing that a family of functions {f_n : {0, 1}^n → {0, 1}}_{n∈N} cannot be computed by a family of circuits {C_n}_{n∈N} belonging to some natural class of circuits, such as AC0 or TC0. Taking a broader interpretation of computation, it is often interesting to show that a class of circuits cannot perform a certain natural task beyond just computing a function. One such natural task, introduced by Viola [71], is that of sampling distributions. In this problem, for a given distribution D supported on {0, 1}^n, we are looking for a function f : {0, 1}^m → {0, 1}^n that samples (or approximates) D; that is, for a uniformly random x ∼ {0, 1}^m, the distribution of f(x) is equal (or close) to D, and furthermore, each output bit f_i of the function f belongs to some low-level complexity class, such as AC0 or TC0. As a concrete example, let U⊕ be the uniform distribution over the set {(x, parity(x)) : x ∈ {0, 1}^{n−1}} ⊆ {0, 1}^n. Note that although the parity function is not computable in AC0


(see [44] and references therein), there is a function f : {0, 1}^{n−1} → {0, 1}^n that samples U⊕, such that each output bit depends on only two input bits:

f(x_1, . . . , x_{n−1}) = (x_1, x_1 + x_2, x_2 + x_3, . . . , x_{n−2} + x_{n−1}, x_{n−1}).

Motivated by the foregoing somewhat surprising example, Viola [71] suggested to replace parity above with majority, the other notoriously hard function for AC0. The following two problems were stated in [55].

Problem 18.8 Let n ∈ N be even. Does there exist a bijection g : {0, 1}^n → Bn such that each output bit of g is computable in AC0?

Problem 18.9 Let n ∈ N be odd. Does there exist a bijection h : {0, 1}^n → {(x, majority(x)) : x ∈ {0, 1}^n} such that each output bit of h is computable in AC0?

Note that a positive answer to Problem 18.8 implies a positive answer to Problem 18.9. Indeed, if g : {0, 1}^n → Bn is a bijection as in Problem 18.8, then the function h : {0, 1}^{n+1} → {0, 1}^{n+2} defined as

h(x_1, . . . , x_n, x_{n+1}) = g(x_1, . . . , x_n) ◦ 1          if x_{n+1} = 1,
                              flip(g(x_1, . . . , x_n)) ◦ 0    if x_{n+1} = 0,

gives a bijection as required in Problem 18.9. Therefore, a negative answer to Problem 18.9 implies a negative answer to Problem 18.8. In the other direction, if a function h : {0, 1}^n → {(x, majority(x)) : x ∈ {0, 1}^n} gives a positive answer to Problem 18.9, then the function g : {0, 1}^n → {0, 1}^n defined as^49

g(x_1, . . . , x_n) = (h(x_1, . . . , x_n))_{[1,...,n]}          if (h(x_1, . . . , x_n))_{n+1} = 1,
                      flip((h(x_1, . . . , x_n))_{[1,...,n]})    otherwise,

samples B_{n−1} using an input of length n, which almost^50 answers Problem 18.8.

^49 We use the following notation: for a string s ∈ {0, 1}^n and integers i ≤ j in [n], the string s_{[i,...,j]} denotes the substring s_i s_{i+1} · · · s_j.
^50 Problem 18.8 (for the ball B_{n−1}) asks for a function that takes n − 1 bits as input.

On the positive side, Viola [71] showed an explicit AC0 circuit C : {0, 1}^{poly(n)} → {0, 1}^n of size poly(n) whose output distribution has statistical distance 2^{−n} from the uniform distribution on {(x, majority(x)) : x ∈ {0, 1}^n}. Problem 18.1 was raised by Lovett and Viola [55] in an attempt to prove a lower bound for Problem 18.8. A positive answer to Problem 18.1 would imply a lower bound for Problem 18.8,


since by the result of [21], any function f : {0, 1}^n → {0, 1}^{n+1} computable by a polynomial-size Boolean circuit of constant depth has average stretch at most log^{O(1)}(n). As we resolved Problem 18.1 negatively in a strong sense, it seems that new ideas beyond the sensitivity-based structural results of [21] will be required in order to resolve Problems 18.8 and 18.9. Toward a lower bound, Viola [70] gave an explicit construction of a function b : {0, 1}^n → {0, 1} such that (x, b(x)) cannot be sampled by AC0 circuits. That is, his result gives a negative answer to Problem 18.9 if we replace majority by the function b. Nonetheless, we feel that it would still be interesting to give a negative answer to Problem 18.9 for the majority function itself, since it is the more natural function.

18.3 Proof Overview

In this section we give a high-level description of the proof of Theorem 18.2. A full proof is given in Section 19. Let n ∈ N be an even integer. Our goal is to map {0, 1}^n to Bn in a way that the two endpoints of every edge in {0, 1}^n are mapped to close vertices in Bn. The key building block we use is a classical partition of the vertices of {0, 1}^n into symmetric chains, due to De Bruijn, Tengbergen, and Kruyswijk [23], where a symmetric chain is a path {c_k, c_{k+1}, . . . , c_{n−k}} in {0, 1}^n such that each c_i has Hamming weight i (see Figure 2). As a first step, we study the chains in the partition of De Bruijn et al. Roughly speaking, we show^51 that adjacent chains move closely to each other. More precisely, if two adjacent vertices x and y belong to different chains, then x and y have the same distance from the top of their respective chains, up to some additive constant. Moreover, the lengths of the two chains differ by at most some additive constant, and the i-th vertex in one chain, counting from the top, is O(1)-close to the i-th vertex in the other chain (if such a vertex exists). We now describe how to map {0, 1}^n to Bn based on the partition of De Bruijn et al. Consider a chain c_k, c_{k+1}, . . . , c_{n−k}. Our mapping will "squeeze" the vertices into the top half of the cube while exploiting the extra dimension. In particular, every vertex will climb up its chain half the distance it has from the top, and then the collision between two vertices is resolved by setting the extra last bit to 1 for the first vertex and to 0 for the second vertex. More precisely, the vertex c_{n−k}, which is at the top of its chain, is mapped to c_{n−k} ◦ 1, while c_{n−k−1} is mapped to c_{n−k} ◦ 0. The third vertex from the top, c_{n−k−2}, is mapped to c_{n−k−1} ◦ 1, while c_{n−k−3} is mapped to c_{n−k−1} ◦ 0, and so on. The vertex c_k at the bottom of the chain is mapped to c_{n/2} ◦ 1, which is indeed in Bn. Consider now two adjacent vertices x, y in {0, 1}^n.
By the above, these vertices reside in "close" chains with roughly the same length, and they have roughly the same distance from the top of their respective chains. Thus, in the climbing process, both x and y will be mapped to vertices that have roughly the same distance from the top of their respective chains, and hence, by the discussion above, their images will be O(1)-close.

^51 This is somewhat implicit in our proofs, and is mentioned here mainly in order to build an intuition.

19 Proof of the Main Theorem

In this section we prove the main theorem. In Section 19.1 we describe the De Bruijn-Tengbergen-Kruyswijk partition. In Section 19.2 we define the mapping ψ and prove basic facts about it. In Section 19.3 we give the proof of Theorem 18.2, omitting some technical details that can be found in Section 19.4.

19.1 The De Bruijn-Tengbergen-Kruyswijk Partition

Definition 19.1 Let n be an even integer. A symmetric chain in {0, 1}^n is a sequence of vertices C = {c_k, c_{k+1}, . . . , c_{n−k}} such that |c_i| = i for i = k, k + 1, . . . , n − k, and dist(c_i, c_{i+1}) = 1 for i = k, k + 1, . . . , n − k − 1.

We say that a symmetric chain is monotone if it satisfies the following property: if c_{i−1} and c_i differ in the j-th coordinate, and c_i and c_{i+1} differ in the j′-th coordinate, then j < j′.

We shall represent a monotone symmetric chain as follows. Let y ∈ {0, 1, t}^n be such that m = |{i : y_i = t}| satisfies m ≡ n (mod 2), and let k = (n − m)/2. The monotone symmetric chain C_y = {c_k, c_{k+1}, . . . , c_{n−k}} is specified by y as follows: for i = k, k + 1, . . . , n − k, the string c_i is obtained by replacing the m − (i − k) leftmost symbols t of y by 0 and the remaining i − k symbols t by 1. Note that C_y is indeed a monotone symmetric chain.

De Bruijn, Tengbergen, and Kruyswijk [23] suggested a recursive algorithm that partitions {0, 1}^n into monotone symmetric chains. We will follow the presentation of the algorithm described in [72] (see Problem 6E in Chapter 6). The algorithm gets as input a string x ∈ {0, 1}^n and computes a string y ∈ {0, 1, t}^n which encodes the monotone symmetric chain C_y that contains x. The algorithm is iterative. During the running of the algorithm, every coordinate of x is either marked or unmarked, where we denote a marked 0 by ˆ0 and a marked 1 by ˆ1. In each step, the algorithm chooses a consecutive pair 10, marks it as ˆ1ˆ0, temporarily deletes it, and repeats the process. The algorithm halts once there is no such pair, that is, when the remaining string is of the form 00 . . . 01 . . . 11. We call this stage of the algorithm the marking stage, and denote the marked string by mark(x) ∈ {0, 1, ˆ0, ˆ1}^n. The string y is then defined as follows: if the i-th bit of x was marked, then y_i = x_i; otherwise, y_i = t. For example, consider the string x = 01100110.
At the first iteration, the algorithm may mark the third and fourth bits to obtain 01ˆ1ˆ00110. Then, the second and fifth bits are marked, yielding 0ˆ1ˆ1ˆ0ˆ0110. Lastly, the rightmost two bits are marked, and we obtain the marked string mark(x) = 0ˆ1ˆ1ˆ0ˆ01ˆ1ˆ0. Hence y = t1100t10 and C_y = {01100010, 01100110, 11100110}.

[Figure 2: The De Bruijn-Tengbergen-Kruyswijk partition for n = 4.]

Thus, the algorithm induces a partition of {0, 1}^n into monotone symmetric chains. We stress that although the algorithm has some degree of freedom when choosing a 10 pair out of, possibly, many pairs in a given iteration, the output y of the algorithm is independent of the specific choices that were made. That is, y is a function of x, and does not depend on the specific order in which the algorithm performs the marking. This assertion can be proven easily by induction on n. As a consequence, we may choose the ordering of the 10 pairs as we wish. We will use this fact in the proof of Theorem 18.2.
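For concreteness, the marking stage can be sketched in Python as a naive iterative procedure (the function names are ours, and '*' plays the role of the symbol t):

```python
def mark_string(x):
    """Iteratively mark consecutive 1,0 pairs, where 'consecutive' ignores
    coordinates that were marked (temporarily deleted) earlier.
    Returns a boolean list: marked[i] is True iff coordinate i is marked."""
    n = len(x)
    marked = [False] * n
    while True:
        live = [i for i in range(n) if not marked[i]]
        pair = next(((live[j], live[j + 1]) for j in range(len(live) - 1)
                     if x[live[j]] == 1 and x[live[j + 1]] == 0), None)
        if pair is None:        # remaining string has the form 0...01...1
            return marked
        marked[pair[0]] = marked[pair[1]] = True

def chain_encoding(x):
    """The string y encoding the chain containing x: y_i = x_i on marked
    coordinates and the wildcard '*' (the symbol t) elsewhere."""
    m = mark_string(x)
    return ''.join(str(x[i]) if m[i] else '*' for i in range(len(x)))

print(chain_encoding((0, 1, 1, 0, 0, 1, 1, 0)))  # prints *1100*10
```

This reproduces the worked example above: for x = 01100110 the encoding is t1100t10.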

19.2 The Bijection ψ

We define the mapping ψ as follows. Let n ∈ N be an even integer. For an input x ∈ {0, 1}^n, let C = {c_k, c_{k+1}, . . . , c_{n−k}} be the symmetric chain from the partition of De Bruijn et al. that contains x, and let j be the index such that x = c_j. Define

ψ(x) = c_{((n−k)+j)/2} ◦ 1        if j ≡ n − k (mod 2),
       c_{((n−k)+j+1)/2} ◦ 0      if j ≢ n − k (mod 2).        (41)

Claim 19.2 The mapping ψ is a bijection from {0, 1}^n to Bn.

Proof We first show that the range of ψ is contained in Bn. Consider x ∈ {0, 1}^n and let C = {c_k, c_{k+1}, . . . , c_{n−k}} be the symmetric chain that contains x. Suppose that x = c_j for some k ≤ j ≤ n − k. If j ≡ n − k (mod 2), then, using the fact that j ≥ k,

|ψ(x)| = |c_{(n−k+j)/2} ◦ 1| = (n − k + j)/2 + 1 > n/2.

Otherwise, j ≢ n − k (mod 2). Since n is even, it follows that j ≢ k (mod 2), and thus j ≥ k + 1. Hence,

|ψ(x)| = |c_{(n−k+j+1)/2} ◦ 0| = (n − k + j + 1)/2 > n/2.

In both cases ψ(x) ∈ Bn. We conclude the proof by describing the inverse mapping ψ^{−1} : Bn → {0, 1}^n. For z ∈ Bn, write z = x ◦ z_{n+1}, where x ∈ {0, 1}^n and z_{n+1} is the (n + 1)-st bit of z. Let C = {c_k, c_{k+1}, . . . , c_{n−k}} be the symmetric chain that contains x, and let j be the index such that x = c_j (note that j ≥ n/2). Then,

ψ^{−1}(z) = c_{2j−(n−k)}        if z_{n+1} = 1,
            c_{2j−(n−k)−1}      if z_{n+1} = 0.        (42)

It is straightforward to verify that this is indeed the inverse mapping of ψ.

In order to understand the mapping ψ better, consider x ∈ {0, 1}^n and let y ∈ {0, 1, t}^n be the encoding of the chain that contains x. Note that if 1 ≤ i_1 < i_2 < · · · < i_t ≤ n are the coordinates in which y contains t, then there exists some 0 ≤ ℓ ≤ t such that x_{i_1} = · · · = x_{i_ℓ} = 0 and x_{i_{ℓ+1}} = · · · = x_{i_t} = 1. That is, x is located at the (ℓ + 1)-st position of the chain C_y, counting from the top. The function ψ outputs the vertex located at the (⌊ℓ/2⌋ + 1)-st position in the chain, concatenated with 1 or 0, depending on the parity of ℓ. In other words, we obtain ψ(x) by keeping intact all the bits of x in the coordinates other than i_{⌊ℓ/2⌋+1}, . . . , i_ℓ, and by setting ψ(x)_{i_{⌊ℓ/2⌋+1}} = · · · = ψ(x)_{i_ℓ} = 1. Then, we append 1 to the obtained string if ℓ is even, and append 0 otherwise. For example, let us consider the Boolean cube {0, 1}^4, whose partition is presented in Figure 2, and write explicitly where each vertex in the chain of length 5 is mapped under ψ:

ψ(1111) = 1111 ◦ 1,
ψ(0111) = 1111 ◦ 0,
ψ(0011) = 0111 ◦ 1,
ψ(0001) = 0111 ◦ 0,
ψ(0000) = 0011 ◦ 1,

where we write the concatenation mark ◦ only for the sake of readability. The following claim is immediate from the definition of ψ.

Claim 19.3 Fix a string x ∈ {0, 1}^n, and let M ⊆ [n] be the set of marked coordinates in mark(x). Then:

• For every i ∈ M it holds that ψ(x)_i = x_i.
• For every j ∈ [n] \ M, the j-th coordinate of ψ(x) does not depend on any of the bits {x_i}_{i∈M}.

We are now ready to prove Theorem 18.2.

19.3 Proof of Theorem 18.2

Proof [Proof of Theorem 18.2] We first show that maxStretch(ψ) ≤ 4. Take x ∈ {0, 1}^n and i ∈ [n] such that x_i = 0. Our goal is to show that dist(ψ(x), ψ(x + e_i)) ≤ 4. As mentioned in Section 19.1, the output of the algorithm on input x is independent of the order in which the algorithm marks the 10 pairs. Therefore, given an input x, we may perform the marking stage in three steps:

1. Perform the marking stage on the prefix of x of length i − 1.
2. Perform the marking stage on the suffix of x of length n − i.
3. Perform the marking stage on the resulting, partially marked, string.

Since x and x + e_i agree on all but the i-th coordinate, the runs of the marking stage in steps 1 and 2 yield the same marking. That is, prior to the third step the strings x and x + e_i have the same bits marked. Denote by s ∈ {0, 1, ˆ0, ˆ1}^{i−1} and t ∈ {0, 1, ˆ0, ˆ1}^{n−i} the two partially marked strings such that the resulting strings after the second step on inputs x and x + e_i are s ◦ 0 ◦ t and s ◦ 1 ◦ t, respectively. Let us suppose for concreteness that the string s contains a unmarked zeros and b unmarked ones, and the string t contains c unmarked zeros and d unmarked ones. Recall that at the end of the marking stage, all unmarked zeros are to the left of all unmarked ones in both s and t. By Claim 19.3, the only coordinates that may contribute to dist(ψ(x), ψ(x + e_i)) are the coordinates that are unmarked prior to the third step, and so

dist(ψ(x), ψ(x + e_i)) = dist(ψ(0^a 1^b ◦ 0 ◦ 0^c 1^d), ψ(0^a 1^b ◦ 1 ◦ 0^c 1^d)).^52

^52 Note that ψ on the right hand side is applied to inputs of length not necessarily n. However, for the sake of readability, we do not indicate the input length when applying ψ. In other words, ψ is a shorthand for a family of functions {ψ_n}_{n∈N}.

Therefore, it is enough to bound the right hand side from above by 4. At this point, it is fairly easy to be convinced that the right hand side is bounded by some constant. Proving that the constant is 4 is done by a somewhat tedious case analysis, according to the relations between a, b, c and d. We defer the proof of the following claim to Section 19.4.

Claim 19.4 For every a, b, c, d ∈ N, we have dist(ψ(0^a 1^b ◦ 0 ◦ 0^c 1^d), ψ(0^a 1^b ◦ 1 ◦ 0^c 1^d)) ≤ 4.

This completes the proof that maxStretch(ψ) ≤ 4. We now prove that maxStretch(ψ^{−1}) ≤ 5, using the description of ψ^{−1} given in Equation (42). In order to bound maxStretch(ψ^{−1}), let us fix an edge in Bn; that is, take z ∈ Bn and i ∈ [n + 1] such that z_i = 0, and show that dist(ψ^{−1}(z), ψ^{−1}(z + e_i)) ≤ 5. By the proof of Claim 19.2, if i = n + 1 then ψ^{−1}(z) and ψ^{−1}(z + e_i) are consecutive vertices in some monotone symmetric chain, and thus dist(ψ^{−1}(z), ψ^{−1}(z + e_i)) = 1. Therefore, we shall assume henceforth that i ≠ n + 1. Let z = x ◦ z_{n+1} and z + e_i = (x + e_i) ◦ z_{n+1} for some x ∈ {0, 1}^n and z_{n+1} ∈ {0, 1}. Similarly to the proof that maxStretch(ψ) ≤ 4, we perform the marking stage by first performing it on the prefix of x of length i − 1, then on the suffix of x of length n − i, and finally on the resulting, partially marked, string. Denote by s ∈ {0, 1, ˆ0, ˆ1}^{i−1} and t ∈ {0, 1, ˆ0, ˆ1}^{n−i} the two partially marked strings such that the resulting strings after the second step on inputs x and x + e_i are s ◦ 0 ◦ t and s ◦ 1 ◦ t, respectively. Suppose again for concreteness that the string s contains a unmarked zeros and b unmarked ones, and the string t contains c unmarked zeros and d unmarked ones. By Claim 19.3, the only coordinates that may contribute to dist(ψ^{−1}(z), ψ^{−1}(z + e_i)) are the unmarked coordinates in s and t, and so

dist(ψ^{−1}(z), ψ^{−1}(z + e_i)) = dist(ψ^{−1}(0^a 1^b ◦ 0 ◦ 0^c 1^d ◦ z_{n+1}), ψ^{−1}(0^a 1^b ◦ 1 ◦ 0^c 1^d ◦ z_{n+1})).

Therefore, it is enough to upper bound the right hand side by 5. We first note that a + c + 1 ≤ b + d. To see this, recall that |z| > n/2 and that 0^a 1^b ◦ 0 ◦ 0^c 1^d was obtained from x = z_1 . . . z_n (that is, z without its last bit z_{n+1}) by deleting the same number of zeros and ones.

Claim 19.5 For every a, b, c, d ∈ N such that a + c + 1 ≤ b + d, and for every z_{n+1} ∈ {0, 1}, it holds that dist(ψ^{−1}(0^a 1^b ◦ 0 ◦ 0^c 1^d ◦ z_{n+1}), ψ^{−1}(0^a 1^b ◦ 1 ◦ 0^c 1^d ◦ z_{n+1})) ≤ 5.

Therefore, by Claim 19.5 we have maxStretch(ψ −1 ) ≤ 5. This completes the proof of Theorem 18.2.
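The bounds of Theorem 18.2 can be verified exhaustively for small even n. The sketch below (the function names are ours) implements the marking via standard bracket matching, which is equivalent to iteratively pairing consecutive 10's and makes the order-independence explicit:

```python
from itertools import product

def marked_set(x):
    """Canonical DBTK marking via bracket matching (1 = '(', 0 = ')');
    equivalent to repeatedly marking consecutive 10 pairs in any order."""
    m, stack = [False] * len(x), []
    for i, b in enumerate(x):
        if b == 1:
            stack.append(i)
        elif stack:
            m[stack.pop()] = m[i] = True
    return m

def psi(x):
    """The bijection of Theorem 18.2: among the unmarked bits 0^a 1^b,
    keep floor(a/2) zeros, and record the parity of a in an extra bit."""
    m = marked_set(x)
    free = [i for i in range(len(x)) if not m[i]]
    a = sum(1 for i in free if x[i] == 0)
    out = list(x)
    for t, i in enumerate(free):
        out[i] = 0 if t < a // 2 else 1
    return tuple(out) + ((a + 1) % 2,)          # even(a)

def psi_inv(z):
    """Inverse: recover the distance from the chain top and climb down."""
    x, eps = z[:-1], z[-1]
    m = marked_set(x)
    free = [i for i in range(len(x)) if not m[i]]
    ell = 2 * sum(1 for i in free if x[i] == 0) + (1 - eps)
    out = list(x)
    for t, i in enumerate(free):
        out[i] = 0 if t < ell else 1
    return tuple(out)

def dist(u, v):
    return sum(p != q for p, q in zip(u, v))

n = 6
cube = list(product((0, 1), repeat=n))
ball = {z for z in product((0, 1), repeat=n + 1) if sum(z) > n // 2}
assert {psi(x) for x in cube} == ball           # psi is a bijection onto B_n
assert all(psi_inv(psi(x)) == x for x in cube)
ms = max(dist(psi(x), psi(x[:i] + (1 - x[i],) + x[i + 1:]))
         for x in cube for i in range(n))
print("maxStretch(psi) for n = 6:", ms)
```

A similar loop over the edges of B_n can be used to check that maxStretch(ψ^{−1}) ≤ 5 as well.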

19.4 Proof of Missing Claims

We now return to the proofs of Claim 19.4 and Claim 19.5.

Proof [Proof of Claim 19.4] Let w = 0^a 1^b ◦ 0 ◦ 0^c 1^d and w′ = 0^a 1^b ◦ 1 ◦ 0^c 1^d. We prove the claim using the following case analysis. It will be convenient to introduce the function even : N → {0, 1}, defined as even(n) = 1 if n is even, and even(n) = 0 otherwise.

Case 1 (b = c). In this case we have w = 0^a ◦ 1^b 0^b ◦ 0 1^d and w′ = 0^a 1 ◦ 1^b 0^b ◦ 1^d. After the marking stage we get mark(w) = 0^a ◦ ˆ1^b ˆ0^b ◦ 0 1^d and mark(w′) = 0^a 1 ◦ ˆ1^b ˆ0^b ◦ 1^d. Therefore,

ψ(w) = 0^{⌊(a+1)/2⌋} 1^{a−⌊(a+1)/2⌋} ◦ 1^b 0^b ◦ 1^{d+1} ◦ even(a + 1)

and

ψ(w′) = 0^{⌊a/2⌋} 1^{⌈a/2⌉+1} ◦ 1^b 0^b ◦ 1^d ◦ even(a).

By inspection, one can now easily verify that dist(ψ(w), ψ(w′)) ≤ 4 in this case.

Case 2 (b > c). In this case we have w = 0^a ◦ 1^{b−c−1} ◦ 1^{c+1} 0^{c+1} ◦ 1^d and w′ = 0^a ◦ 1^{b−c+1} ◦ 1^c 0^c ◦ 1^d. After the marking stage we get mark(w) = 0^a 1^{b−c−1} ◦ ˆ1^{c+1} ˆ0^{c+1} ◦ 1^d and mark(w′) = 0^a 1^{b−c+1} ◦ ˆ1^c ˆ0^c ◦ 1^d. Therefore,

ψ(w) = 0^{⌊a/2⌋} 1^{⌈a/2⌉+b−c−1} ◦ 1^{c+1} 0^{c+1} ◦ 1^d ◦ even(a)

and

ψ(w′) = 0^{⌊a/2⌋} 1^{⌈a/2⌉+b−c+1} ◦ 1^c 0^c ◦ 1^d ◦ even(a).

Therefore, in this case, dist(ψ(w), ψ(w′)) ≤ 1.

Case 3 (b < c and a ≥ c − b). In this case we have w = 0^a ◦ 1^b 0^b ◦ 0^{c−b+1} ◦ 1^d and w′ = 0^a ◦ 1^{b+1} 0^{b+1} ◦ 0^{c−b−1} ◦ 1^d. After the marking stage we get mark(w) = 0^a ◦ ˆ1^b ˆ0^b ◦ 0^{c−b+1} 1^d and mark(w′) = 0^a ◦ ˆ1^{b+1} ˆ0^{b+1} ◦ 0^{c−b−1} 1^d. By the assumption that a ≥ c − b we have a ≥ ⌊(a + c − b + 1)/2⌋, and so

ψ(w) = 0^{⌊(a+c−b+1)/2⌋} 1^{a−⌊(a+c−b+1)/2⌋} ◦ 1^b 0^b ◦ 1^{d+c−b+1} ◦ even(a + c − b + 1)

and

ψ(w′) = 0^{⌊(a+c−b−1)/2⌋} 1^{a−⌊(a+c−b−1)/2⌋} ◦ 1^{b+1} 0^{b+1} ◦ 1^{d+c−b−1} ◦ even(a + c − b − 1).

Therefore, by inspection we have dist(ψ(w), ψ(w′)) ≤ 4 for this case.

Case 4 (b < c and a < c − b). Just like in the previous case, we have mark(w) = 0^a ◦ ˆ1^b ˆ0^b ◦ 0^{c−b+1} 1^d and mark(w′) = 0^a ◦ ˆ1^{b+1} ˆ0^{b+1} ◦ 0^{c−b−1} 1^d. By the assumption that a < c − b, we have a ≤ ⌊(a + c − b − 1)/2⌋, and so

ψ(w) = 0^a ◦ 1^b 0^b ◦ 0^{⌊(a+c−b+1)/2⌋−a} 1^{c−b+1+d−⌊(a+c−b+1)/2⌋+a} ◦ even(a + c − b + 1)

and

ψ(w′) = 0^a ◦ 1^{b+1} 0^{b+1} ◦ 0^{⌊(a+c−b−1)/2⌋−a} 1^{c−b−1+d−⌊(a+c−b−1)/2⌋+a} ◦ even(a + c − b − 1).

Therefore, in this case, dist(ψ(w), ψ(w′)) ≤ 2. This completes the proof of Claim 19.4.

We now turn to the proof of Claim 19.5.

Proof [Proof of Claim 19.5] Let w = 0^a 1^b ◦ 0 ◦ 0^c 1^d and w′ = 0^a 1^b ◦ 1 ◦ 0^c 1^d. Our goal is to show that dist(ψ^{−1}(w ◦ z_{n+1}), ψ^{−1}(w′ ◦ z_{n+1})) ≤ 5. Let us suppose for simplicity that z_{n+1} = 1. The case z_{n+1} = 0 is handled similarly, and the same bound is achieved. We prove the claim using the following case analysis.

Case 1 (b = c). In this case we have w = 0^a ◦ 1^b 0^b ◦ 0 1^d and w′ = 0^a 1 ◦ 1^b 0^b ◦ 1^d. After the marking stage we get mark(w) = 0^a ◦ ˆ1^b ˆ0^b ◦ 0 1^d and mark(w′) = 0^a 1 ◦ ˆ1^b ˆ0^b ◦ 1^d. The assumption a + c + 1 ≤ b + d implies that in this case d − a − 1 ≥ 0. Therefore,

ψ^{−1}(w ◦ 1) = 0^a ◦ 1^b 0^b ◦ 0^{a+2} 1^{d−a−1}   and   ψ^{−1}(w′ ◦ 1) = 0^{a+1} ◦ 1^b 0^b ◦ 0^{a−1} 1^{d−a+1}.

Therefore, dist(ψ^{−1}(w ◦ 1), ψ^{−1}(w′ ◦ 1)) ≤ 4.

Case 2 (b < c). In this case we have w = 0^a ◦ 1^b 0^b ◦ 0^{c−b+1} 1^d and w′ = 0^a ◦ 1^{b+1} 0^{b+1} ◦ 0^{c−b−1} 1^d. After the marking stage we get mark(w) = 0^a ◦ ˆ1^b ˆ0^b ◦ 0^{c−b+1} 1^d and mark(w′) = 0^a ◦ ˆ1^{b+1} ˆ0^{b+1} ◦ 0^{c−b−1} 1^d. Therefore,

ψ^{−1}(w ◦ 1) = 0^a ◦ 1^b 0^b ◦ 0^{a+2(c−b+1)} 1^{d−(a+c−b+1)}

and

ψ^{−1}(w′ ◦ 1) = 0^a ◦ 1^{b+1} 0^{b+1} ◦ 0^{a+2(c−b−1)} 1^{d−(a+c−b−1)}.

Therefore, dist(ψ^{−1}(w ◦ 1), ψ^{−1}(w′ ◦ 1)) ≤ 3.

Case 3 (b > c). In this case, w = 0^a 1^{b−c−1} ◦ 1^{c+1} 0^{c+1} ◦ 1^d and w′ = 0^a 1^{b−c+1} ◦ 1^c 0^c ◦ 1^d. After the marking stage we get mark(w) = 0^a 1^{b−c−1} ◦ ˆ1^{c+1} ˆ0^{c+1} ◦ 1^d and mark(w′) = 0^a 1^{b−c+1} ◦ ˆ1^c ˆ0^c ◦ 1^d.

Subcase 3.1 (a < b − c). Here

ψ^{−1}(w ◦ 1) = 0^{2a} 1^{b−c−a−1} ◦ 1^{c+1} 0^{c+1} ◦ 1^d   and   ψ^{−1}(w′ ◦ 1) = 0^{2a} 1^{b−c−a+1} ◦ 1^c 0^c ◦ 1^d.

Thus, dist(ψ^{−1}(w ◦ 1), ψ^{−1}(w′ ◦ 1)) ≤ 1.

Subcase 3.2 (a = b − c). Here

ψ^{−1}(w ◦ 1) = 0^{2a−1} ◦ 1^{c+1} 0^{c+1} ◦ 0 1^{d−1}   and   ψ^{−1}(w′ ◦ 1) = 0^{2a} 1 ◦ 1^c 0^c ◦ 1^d.

Therefore, in this case, dist(ψ^{−1}(w ◦ 1), ψ^{−1}(w′ ◦ 1)) ≤ 3.

Subcase 3.3 (a > b − c). Here

ψ^{−1}(w ◦ 1) = 0^{a+b−c−1} ◦ 1^{c+1} 0^{c+1} ◦ 0^{a−b+c+1} 1^{d−(a−b+c+1)}

and

ψ^{−1}(w′ ◦ 1) = 0^{a+b−c+1} ◦ 1^c 0^c ◦ 0^{a−b+c−1} 1^{d−(a−b+c−1)}.

Thus, dist(ψ^{−1}(w ◦ 1), ψ^{−1}(w′ ◦ 1)) ≤ 5. This completes the proof of Claim 19.5.

20 The Mapping ψ is Computable in DLOGTIME-uniform TC0

In this section we analyze the complexity of the bijection ψ described in the proof of Theorem 18.2. We first claim that each output bit of ψ (and of ψ^{−1}) can be computed in DLOGTIME-uniform TC0. In Proposition 20.1 and in the remark following it, we show that TC0 is indeed the "correct" class for ψ.

Proposition 18.7 (restated). The bijections ψ and ψ^{−1} are computable in DLOGTIME-uniform TC0.

We prove the proposition only for ψ. The proof for ψ^{−1} is very similar, and we omit it.

Proof We divide the proof into two steps. First we show that the marking stage can be implemented in TC0. Then, given the marking of an input, we show how to compute ψ in TC0. Both steps can easily be seen to be DLOGTIME-uniform. Throughout the proof, the output of the marking stage is represented by two bits for each coordinate, encoding a symbol in {0, 1, ˆ0, ˆ1}, where one bit represents the Boolean symbol, and the other indicates whether the coordinate is marked or not.

Implementing the marking stage. Let x ∈ {0, 1}^n. In order to implement the marking stage in TC0, we observe that the i-th coordinate of x is marked if and only if there are coordinates s_i ≤ i ≤ e_i such that

1. the number of ones in x_{[s_i,...,e_i]} is equal to the number of zeros in x_{[s_i,...,e_i]}, and
2. for every k ∈ {s_i, . . . , e_i}, the number of ones in the prefix x_{[s_i,...,k]} is greater than or equal to the number of zeros in x_{[s_i,...,k]}.

Fix i ∈ [n] and fix s_i, e_i ∈ [n] such that s_i ≤ i ≤ e_i. Thinking of the bit 1 as '(' and of the bit 0 as ')', the above two conditions are equivalent to checking whether the string x_{[s_i,...,e_i]} of parentheses is balanced, or in other words, deciding whether x_{[s_i,...,e_i]} is in the Dyck language. It is well known that the Dyck language can be recognized in TC0 [57]. In fact, it is not hard to show that deciding whether a string of length m is in the Dyck language can be carried out by a DLOGTIME-uniform TC0 circuit of size O(m). Now, for each i ∈ [n], we go over all choices of s_i, e_i in parallel, and take the OR of the O(n^2) results. Thus, for each i ∈ [n], there is a DLOGTIME-uniform TC0 circuit of size O(n^3) that decides whether the i-th coordinate is marked or not.
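The window characterization is exactly the Dyck condition, and it can be cross-checked against the iterative marking stage by exhaustive search over small n (a sketch with our own function names):

```python
from itertools import product

def greedy_marked(x):
    """The iterative marking stage: repeatedly mark a pair 1,0 that is
    consecutive once previously marked coordinates are ignored."""
    n, m = len(x), [False] * len(x)
    while True:
        live = [i for i in range(n) if not m[i]]
        pair = next(((live[j], live[j + 1]) for j in range(len(live) - 1)
                     if x[live[j]] == 1 and x[live[j + 1]] == 0), None)
        if pair is None:
            return m
        m[pair[0]] = m[pair[1]] = True

def window_marked(x, i):
    """The TC0 criterion: coordinate i is marked iff it lies inside some
    window x[s..e] that is a balanced parenthesis (Dyck) string, reading
    1 as '(' and 0 as ')'."""
    n = len(x)
    for s in range(i + 1):
        for e in range(i, n):
            depth = 0
            for b in x[s:e + 1]:
                depth += 1 if b == 1 else -1
                if depth < 0:
                    break           # a prefix has more zeros than ones
            else:
                if depth == 0:      # equally many ones and zeros
                    return True
    return False

# the two characterizations agree on every string of length 8
for x in product((0, 1), repeat=8):
    g = greedy_marked(x)
    assert all(g[i] == window_marked(x, i) for i in range(8))
print("greedy marking matches the Dyck-window criterion for n = 8")
```

The circuit in the proof simply evaluates all O(n^2) windows in parallel, whereas this sketch checks them sequentially.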
Computing ψ(x) from mark(x): In order to compute ψ(x), let mark(x) ∈ {0, 1, ˆ0, ˆ1}^n be the marking of x. Since every marked coordinate remains unchanged, we need to consider only the unmarked coordinates. Recall also that the unmarked bits form a sequence of zeros followed by a sequence of ones. That is, if we ignore the marked coordinates, then we get a string of the form 0^a 1^b for some a = a(x), b = b(x), and the output should be 0^{⌊a/2⌋} 1^{⌈a/2⌉+b} ◦ even(a) (recall that even(a) = 1 if a is even, and even(a) = 0 otherwise). This can be implemented as follows.

1. Let a be the number of unmarked zeros in mark(x).

2. For each i ∈ [n], let u_i = u_i(x) be the number of unmarked coordinates among {1, . . . , i}.

3. For every unmarked coordinate i ∈ [n], if 2u_i ≤ a, then set the ith bit of the output to 0. Otherwise, set the ith bit to 1.

4. Set the (n + 1)st bit of the output to even(a).

It is easy to verify that, given mark(x), checking whether the inequality 2u_i ≤ a holds can be done in TC0, and so the entire second step can be carried out by a TC0 circuit.

We remark that the bijection ψ cannot be computed in AC0. Specifically, we prove that the first output bit of ψ cannot be computed in AC0.

Proposition 20.1 The function majority is NC0-reducible to ψ_1, i.e., majority ≤_{NC0} ψ_1. In particular ψ_1 ∉ AC0.

Proof We first note that ψ_1(x) = 0 if and only if x_1 = 0 and mark(x) contains at least two unmarked zeros. For odd n, we construct a reduction r : {0, 1}^n → {0, 1}^{3n+1} that, on input x ∈ {0, 1}^n, outputs a string r(x) ∈ {0, 1}^{3n+1} as follows. Let x′ ∈ {0, 1}^{2n} be the string obtained from x by replacing each 0 of x with 10, and each 1 of x with 00. Define r(x) = 0 ◦ 1^n ◦ x′. For example, if x = 01101, then x′ = 10 ◦ 00 ◦ 00 ◦ 10 ◦ 00, and r(x) = 0 ◦ 1^5 ◦ 10 ◦ 00 ◦ 00 ◦ 10 ◦ 00. By the definition of r, it is clear that each bit of r(x) depends on at most one bit of x. It is straightforward to check that majority(x) = ψ_1(r(x)), and the assertion follows.

Remark Note that the reduction above also gives majority ≤_{NC0} ψ_{n+1}. A similar proof shows that majority ≤_{NC0} ψ_i for all i ∈ [n + 1].
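The four steps above can be run as-is on small inputs. A sketch (helper names are ours; note that the comparison is written as 2u_i ≤ a, which is the version that makes exactly ⌊a/2⌋ unmarked coordinates receive a zero, matching the intended output 0^{⌊a/2⌋} 1^{⌈a/2⌉+b} ◦ even(a)):

```python
from itertools import product

def is_dyck(bits):
    depth = 0
    for b in bits:
        depth += 1 if b == 1 else -1
        if depth < 0:
            return False
    return depth == 0

def mark(x):
    n = len(x)
    return {i for i in range(n)
            if any(is_dyck(x[s:e + 1]) for s in range(i + 1) for e in range(i, n))}

def psi(x):
    marked = mark(x)
    # a = number of unmarked zeros; the unmarked bits always read 0^a 1^b.
    a = sum(1 for i, b in enumerate(x) if i not in marked and b == 0)
    out, u = [], 0
    for i, b in enumerate(x):
        if i in marked:
            out.append(b)           # marked coordinates are copied unchanged
        else:
            u += 1                  # u = number of unmarked coordinates so far
            out.append(0 if 2 * u <= a else 1)
    out.append(1 - a % 2)           # the extra (n+1)st bit is even(a)
    return tuple(out)

# psi should be a bijection from {0,1}^n onto the Hamming ball in {0,1}^(n+1).
for n in (2, 4, 6):
    image = {psi(x) for x in product((0, 1), repeat=n)}
    ball = {y for y in product((0, 1), repeat=n + 1) if sum(y) >= n // 2 + 1}
    assert image == ball
```

The final loop confirms bijectivity onto the Hamming ball for n = 2, 4, 6.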

21

All But the Last Output Bit Depend Essentially on a Single Input Bit

In this section we prove Proposition 18.6. We recall it here for convenience.

Proposition 18.6 (restated). For all i ∈ [n] it holds that

Pr_x[ψ_i(x) = x_i] > 1 − O(1/√n).


Before proving the proposition, we need to further study the structure of the De Bruijn-Tengbergen-Kruyswijk partition described in Section 19.1. We start with the following claim. (Throughout, C(n, k) denotes the binomial coefficient, interpreted as 0 for k < 0 or k > n.)

Claim 21.1 Let n be an integer, and let P be a partition of {0, 1}^n into symmetric chains. For every 1 ≤ t ≤ n + 1, let M_t be the number of symmetric chains of length t in P. Then,

M_t = C(n, (n−t+1)/2) − C(n, (n−t−1)/2)   if t ≢ n (mod 2);   M_t = 0 otherwise.
Proof Note first that if C = {c_k, c_{k+1}, . . . , c_{n−k}} is a symmetric chain, then its length is n − 2k + 1. In particular, this implies that there are no symmetric chains of length t with t ≡ n (mod 2), and hence M_t = 0 for such t.

Next, we prove the claim for t ≢ n (mod 2). This is done by backward induction on t. For t = n + 1 we clearly have a unique symmetric chain starting at 0^n and ending at 1^n, and hence M_{n+1} = 1, as claimed. Before actually doing the induction step, let us consider the next case, namely, t = n − 1. Note that only one of the vertices of Hamming weight 1 is contained in the unique chain of length n + 1, and so, since distinct vertices with equal weight are contained in distinct symmetric chains, there are n − 1 chains with bottom vertex of Hamming weight 1. Therefore M_{n−1} = n − 1, as claimed.

For the general induction step, suppose that the claim holds for all t′ larger than t. We prove the assertion for t ≢ n (mod 2). Every symmetric chain of length t must be of the form C = {c_k, c_{k+1}, . . . , c_{n−k}}, where k = (n−t+1)/2. Since the chains of length greater than t are disjoint, and each contains a vertex with Hamming weight k, it follows that the number of vertices with Hamming weight k that are contained in chains of length greater than t is Σ_{t′>t} M_{t′} = C(n, k−1). The remaining C(n, k) − C(n, k−1) vertices must be contained in chains of length t, and so, since distinct vertices of Hamming weight k are contained in distinct symmetric chains, it follows that there are C(n, k) − C(n, k−1) chains of length t.

The following corollary is immediate from the observation that any x ∈ {0, 1}^n such that mark(x) contains exactly a unmarked zeros and b unmarked ones is contained in a unique chain of length a + b + 1 in the De Bruijn-Tengbergen-Kruyswijk partition.

Corollary 21.2 Let n, a, b ∈ N be such that a + b ≡ n (mod 2) and a + b ≤ n. Then,
1. The number of x ∈ {0, 1}^n such that mark(x) contains exactly a unmarked zeros and b unmarked ones is C(n, (n−a−b)/2) − C(n, (n−a−b−2)/2).

2. The number of x ∈ {0, 1}^n such that mark(x) contains exactly a unmarked zeros (and any number of unmarked ones) is C(n, ⌊(n−a)/2⌋).
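Both counting statements can be verified exhaustively for small n, using the fact that a string with u unmarked bits lies on a chain of length u + 1, and that each such chain contributes exactly t = u + 1 strings. A brute-force sketch (helper names are ours):

```python
from itertools import product
from math import comb

def is_dyck(bits):
    depth = 0
    for b in bits:
        depth += 1 if b == 1 else -1
        if depth < 0:
            return False
    return depth == 0

def unmarked_bits(x):
    n = len(x)
    return [x[i] for i in range(n)
            if not any(is_dyck(x[s:e + 1]) for s in range(i + 1) for e in range(i, n))]

def num_unmarked(x):
    return len(unmarked_bits(x))

def c(n, k):
    # Binomial coefficient, taken to be 0 outside the range 0 <= k <= n.
    return comb(n, k) if 0 <= k <= n else 0

for n in (4, 5, 6):
    xs = list(product((0, 1), repeat=n))
    # Claim 21.1: a chain of length t has t elements, each with t - 1 unmarked bits.
    for t in range(1, n + 2):
        members = sum(1 for x in xs if num_unmarked(x) + 1 == t)
        assert members % t == 0
        expected = 0 if (t - n) % 2 == 0 else c(n, (n - t + 1) // 2) - c(n, (n - t - 1) // 2)
        assert members // t == expected
    # Corollary 21.2, item 2: count strings with exactly a unmarked zeros.
    for a in range(n + 1):
        count = sum(1 for x in xs if unmarked_bits(x).count(0) == a)
        assert count == c(n, (n - a) // 2)
```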


We are now ready to prove Proposition 18.6.

Proof [Proof of Proposition 18.6] Let x ∈ {0, 1}^n, and let mark(x) be its marking. Suppose that the unmarked coordinates in mark(x) are i_1 < i_2 < · · · < i_t, and let 0 ≤ ℓ ≤ t be such that x_{i_1} = · · · = x_{i_ℓ} = 0 and x_{i_{ℓ+1}} = · · · = x_{i_t} = 1. Note that ψ_i(x) ≠ x_i if and only if the ith coordinate is unmarked in mark(x) and i = i_j for some j ∈ {⌊ℓ/2⌋ + 1, . . . , ℓ}.

As in the proof of Theorem 18.2, it will be convenient to perform the following partial marking of x. First, perform the marking stage on the prefix of x of length i − 1, and denote the resulting string by s ∈ {0, 1, ˆ0, ˆ1}^{i−1}. Then, perform the marking stage on the suffix of x of length n − i, and denote the resulting string by t ∈ {0, 1, ˆ0, ˆ1}^{n−i}. Suppose for concreteness that the string s contains a unmarked zeros and b unmarked ones, and the string t contains c unmarked zeros and d unmarked ones. By the definition of ψ we have ψ_i(x) ≠ x_i if and only if x_i = 0, b = 0 and a ≥ c. Since each bit of x is chosen independently, the partially marked strings s, t and the bit x_i are independent as well, and so

Pr[ψ_i(x) ≠ x_i] = Pr[x_i = 0] · Pr[b = 0, a ≥ c] = (1/2) · Σ_{k=0}^{n−i} Σ_{j=k}^{i} Pr[a = j, b = 0] · Pr[c = k].

By Corollary 21.2, for j ≢ i (mod 2) we have

Pr[a = j, b = 0] = (1/2^{i−1}) · ( C(i−1, (i−j−1)/2) − C(i−1, (i−j−3)/2) )

and

Pr[c = k] = (1/2^{n−i}) · C(n−i, ⌊(n−i−k)/2⌋),

where C(·, ·) denotes the binomial coefficient (interpreted as 0 outside its range). Therefore, for every k ≤ i we have

Σ_{j=k}^{i} Pr[a = j, b = 0] = (1/2^{i−1}) · Σ_{k≤j≤i, j≢i (mod 2)} ( C(i−1, (i−j−1)/2) − C(i−1, (i−j−3)/2) ) = (1/2^{i−1}) · C(i−1, ⌊(i−k−1)/2⌋),

and so

Pr[ψ_i(x) ≠ x_i] ≤ (1/2^n) · Σ_{k=0}^{min(i,n−i)} C(n−i, ⌊(n−i−k)/2⌋) · C(i−1, ⌊(i−k−1)/2⌋).

Let us assume that i ≥ n/2 (the case of i < n/2 is handled similarly). Then, using the fact that C(i−1, ⌊(i−k−1)/2⌋) ≤ O(2^i/√i) for all k, we have

Pr[ψ_i(x) ≠ x_i] = O(1/√i) · (1/2^{n−i}) · Σ_{k=0}^{n−i} C(n−i, ⌊(n−i−k)/2⌋).

By the identity

Σ_{k=0}^{n−i} C(n−i, ⌊(n−i−k)/2⌋) = Σ_{j=0}^{n−i} C(n−i, j) = 2^{n−i},

we get Pr[ψ_i(x) ≠ x_i] = O(1/√i), and so, since we assumed that i ≥ n/2, we get Pr[ψ_i(x) ≠ x_i] = O(1/√n), as required.
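The exact flip probability and the binomial-sum upper bound can both be computed for small n, giving a direct numerical check of the counting steps (a brute-force sketch; the 1/2^n normalization matches the product of the two probability formulas from Corollary 21.2):

```python
from itertools import product
from math import comb

def is_dyck(bits):
    depth = 0
    for b in bits:
        depth += 1 if b == 1 else -1
        if depth < 0:
            return False
    return depth == 0

def mark(x):
    n = len(x)
    return {i for i in range(n)
            if any(is_dyck(x[s:e + 1]) for s in range(i + 1) for e in range(i, n))}

def psi(x):
    marked = mark(x)
    a = sum(1 for i, b in enumerate(x) if i not in marked and b == 0)
    out, u = [], 0
    for i, b in enumerate(x):
        if i in marked:
            out.append(b)
        else:
            u += 1
            out.append(0 if 2 * u <= a else 1)
    out.append(1 - a % 2)
    return tuple(out)

def c(n, k):
    return comb(n, k) if 0 <= k <= n else 0

n = 8
pairs = [(x, psi(x)) for x in product((0, 1), repeat=n)]
brute_probs, bounds = [], []
for i in range(1, n + 1):
    brute_probs.append(sum(1 for x, y in pairs if y[i - 1] != x[i - 1]) / 2 ** n)
    bounds.append(sum(c(n - i, (n - i - k) // 2) * c(i - 1, (i - k - 1) // 2)
                      for k in range(min(i, n - i) + 1)) / 2 ** n)
assert all(b <= u + 1e-9 for b, u in zip(brute_probs, bounds))
```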

22

Concluding Remarks and Open Problems

Bi-Lipschitz bijection between balanced halfspaces. Let a_0, . . . , a_n ∈ R. The halfspace determined by the a_i's is the set of all points (x_1, . . . , x_n) ∈ {−1, 1}^n such that a_0 + a_1x_1 + · · · + a_nx_n ≥ 0.⁵³ A balanced halfspace is a halfspace with a_0 = 0. The Boolean cube {−1, 1}^n, embedded in the natural way in {−1, 1}^{n+1}, and the Hamming ball {x ∈ {−1, 1}^{n+1} : x_1 + · · · + x_{n+1} ≥ 0} are two examples of balanced halfspaces. We showed a bi-Lipschitz bijection between them. It is therefore natural to ask the following question.

Problem 22.1 Is there a bi-Lipschitz bijection between any two balanced halfspaces? Or even a bijection with constant average stretch from the Boolean cube {−1, 1}^n to any balanced halfspace in {−1, 1}^{n+1}?

In the terminology of functions, the Boolean cube {−1, 1}^n embedded in {−1, 1}^{n+1} is indicated by the dictator function, while the Hamming ball is indicated by the majority function. Problem 22.1 refers more generally to linear threshold functions. One attempt at solving Problem 22.1 positively would be to generalize the partition of De Bruijn et al. to general halfspaces.

Another interesting problem, inspired by Corollary 18.4, is the following.

Problem 22.2 Is it true that any halfspace is bi-Lipschitz transitive?

Bi-Lipschitz bijection of the hypercube mapping the half cube to the Hamming ball The following problem was suggested to us by Daniel Varga. It asks whether the bijection given in Theorem 18.2 can be strengthened in the following way.

Problem 22.3 Let n be even. Is there a bi-Lipschitz bijection f : {0, 1}^{n+1} → {0, 1}^{n+1} that maps the half cube to the Hamming ball? That is, for all x ∈ {0, 1}^{n+1} such that x_1 = 1, the bijection satisfies f(x) ∈ B_n.

53 The {−1, 1}^n representation of the Boolean cube is more natural in the context of halfspaces.


Tightness of the stretch from the Boolean cube to the Hamming ball. One may ask whether the constants 4 and 5 in Theorem 18.2 are tight. By a slight variation on the proof of Theorem 18.2, we can show that there exists a bijection φ : {0, 1}^n → B_n with maxStretch(φ) ≤ 3, improving on Theorem 18.2 in this respect. However, the maximum stretch of φ^{−1} is unbounded.

Theorem 22.4 For all even integers n, define the bijection φ : {0, 1}^n → B_n as follows. Let x ∈ {0, 1}^n, and let C = {c_k, c_{k+1}, . . . , c_{n−k}} be the symmetric chain from the partition of De Bruijn et al. that contains x. Let j be the index such that x = c_j. Define

φ(x) = c_{n−j} ◦ 1   if j ≤ n/2;   φ(x) = c_j ◦ 0   otherwise.

Then, maxStretch(φ) = 3 and avgStretch(φ^{−1}) = 2 + o(1).

The proof of Theorem 22.4 is similar to the proof of Theorem 18.2, and thus we omit it. One can easily see that any bijection f : {0, 1}^n → B_n has maximum stretch at least 2. Indeed, let y = f(x) ∈ B_n be a point with Hamming weight n/2 + 1. Then y has only n/2 neighbors in B_n, which cannot accommodate all n neighbors of x ∈ {0, 1}^n. We do not know whether the stretch 3 of φ in Theorem 22.4 is tight, and leave this as an open problem. What is the smallest possible stretch of a bijection from B_n to {0, 1}^n? Are the constants 4 and 5 optimal if one considers only bi-Lipschitz bijections? Is the constant 20 in Corollary 18.4 optimal?

Lower bounds on average and maximum stretch

Problem 22.5 Exhibit an explicit subset A ⊂ {0, 1}^{n+1} of density 1/2 such that any bijection f : {0, 1}^n → A has avgStretch(f) = ω(1), or prove that no such subset exists.

As a concrete candidate, we suggest considering sets A = {x : f(x) = 1}, where f is a monotone noise-sensitive function (e.g., Tribes⁵⁴ or Recursive-Majority-of-Three). A sufficiently strong positive answer to this question would imply a lower bound for sampling the uniform distribution on A by low-level complexity classes.

54 We note that Tribes has density close to 1/2.

Bijections from the Gale-Shapley algorithm for the stable marriage problem

Let A, B be two subsets of {0, 1}^{n+1} with density 1/2. Consider the Gale-Shapley algorithm for the stable marriage problem, where each vertex v ∈ A ranks all the vertices in B according to their distance to v (breaking ties according to some rule). What can be said about the average stretch of the bijection obtained from this algorithm? Two interesting settings are (1) A = {0, 1}^n, B = B_n and (2) A, B are random subsets of {0, 1}^n of density 1/2. For related work in this direction see Holroyd [47]. Another natural bijection to consider, suggested to us by Avishay Tal, is the one induced by the Hungarian method for the assignment problem [53].
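The chain-based bijection φ of Theorem 22.4 can be realized concretely under the bracket-matching view of the De Bruijn et al. partition (matched coordinates are fixed along a chain, while the unmarked coordinates range over the patterns 0^{u−s} 1^s). A sketch, with helper names of our choosing, that checks bijectivity and the stretch bound for n = 6:

```python
from itertools import product

def is_dyck(bits):
    depth = 0
    for b in bits:
        depth += 1 if b == 1 else -1
        if depth < 0:
            return False
    return depth == 0

def unmarked(x):
    n = len(x)
    return [i for i in range(n)
            if not any(is_dyck(x[s:e + 1]) for s in range(i + 1) for e in range(i, n))]

def chain_element(x, w):
    """Element of weight w on the chain through x: matched coordinates are kept,
    and the unmarked ones are set to the pattern 0^(u-s) 1^s with s = w - k."""
    free = unmarked(x)
    k = sum(x) - sum(x[i] for i in free)   # weight of the chain's bottom element
    s = w - k
    y = list(x)
    for t, i in enumerate(free):
        y[i] = 1 if t >= len(free) - s else 0
    return tuple(y)

def phi(x):
    n, j = len(x), sum(x)
    return chain_element(x, n - j) + (1,) if j <= n // 2 else x + (0,)

n = 6
cube = list(product((0, 1), repeat=n))
image = {phi(x) for x in cube}
ball = {y for y in product((0, 1), repeat=n + 1) if sum(y) >= n // 2 + 1}
assert image == ball                      # phi maps the cube onto the Hamming ball

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

max_stretch = max(dist(phi(x), phi(x[:i] + (1 - x[i],) + x[i + 1:]))
                  for x in cube for i in range(n))
assert max_stretch <= 3                   # consistent with Theorem 22.4
```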


Appendices: Proofs of Technical Claims from Part I

A.1

Proof of Claim 3.2

The following claim improves over a similar claim that appeared in [34, Apdx D].

Claim 3.2, restated: If NK is even and Σ_{v∈[N]} |d_G(v) − K| ≤ ε_0 · N², then G is 6ε_0-close to the set of K-regular N-vertex graphs.

Proof We modify G in three stages, while keeping track of the number of edge modifications. In the first stage we reduce all vertex degrees to at most K, by scanning all vertices and omitting d_G(v) − K edges incident at each vertex v ∈ H def= {u : d_G(u) > K}. Since Σ_{v∈H}(d_G(v) − K) ≤ ε_0·N², we obtain a graph G′ that is 2ε_0-close to G such that d_{G′}(v) ≤ K holds for each vertex v, because every omitted edge reduces Σ_{v∈H}(d_G(v) − K) by at least one unit. Furthermore, Σ_{v∈[N]} |d_{G′}(v) − K| ≤ ε_0 · N², because each omitted edge {u, v} reduces either |d(u) − K| or |d(v) − K| (while possibly increasing the other by one unit).

In the second stage, we insert an edge between every pair of vertices that are currently non-adjacent and both have degree smaller than K. Thus, we obtain a graph G′′ that is ε_0-close to G′ such that {v : d_{G′′}(v) < K} is a clique (and d_{G′′}(v) ≤ K for all v).

In the third stage, we iteratively increase the degrees of vertices that have degree less than K, while preserving the degrees of all other vertices. Denoting by Γ(v) the current set of neighbours of vertex v, we distinguish two cases.

Case 1: There exists a single vertex of degree less than K. Denoting this vertex by v, we note that |Γ(v)| ≤ K − 2 must hold (indeed, since NK is even and all other N − 1 vertices have degree K, the degree of v has the same parity as K, and so, being smaller than K, it is at most K − 2). We shall show that there exist two vertices u, w such that {u, w} is an edge in the current graph but u, w ∉ Γ(v) ∪ {v}. Adding the edges {u, v} and {w, v} to the graph, while omitting the edge {u, w}, we increase |Γ(v)| by two, while preserving the degrees of all other vertices. We show the existence of two such vertices by starting with an arbitrary vertex u ∉ Γ(v) ∪ {v}. Vertex u has K neighbors (since, by the case hypothesis, all vertices other than v have degree K), and these neighbors cannot all be in Γ(v) ∪ {v} (which has size at most K − 1). Thus, there exists w ∈ Γ(u) \ (Γ(v) ∪ {v}), and we are done.

Case 2: There exist at least two vertices of degree less than K. Let v_1 and v_2 be two vertices such that |Γ(v_i)| ≤ K − 1 holds for both i = 1, 2. Note that {v_1, v_2} is an edge in the current graph, since the set of vertices of degree less than K constitutes a clique. We shall show that there exist two vertices u_1, u_2 such that {u_1, u_2} is an edge in the current graph but neither {v_1, u_1} nor {v_2, u_2} are edges (and so |Γ(u_1)| = |Γ(u_2)| = K). Adding the edges {u_1, v_1} and {u_2, v_2} to the graph, while omitting the edge {u_1, u_2}, we increase |Γ(v_i)| by one (for each i = 1, 2), while preserving the degrees of all other vertices. We show the existence of two such vertices by starting with an arbitrary vertex u_1 ∉ Γ(v_1) ∪ {v_1, v_2}. Such a vertex exists since v_2 ∈ Γ(v_1), and so |Γ(v_1) ∪ {v_1, v_2}| ≤ K < N. Vertex u_1 has K neighbors (since u_1 ∉ Γ(v_1), whereas all vertices of lower degree are neighbors of v_1). Note that Γ(u_1) cannot be contained in Γ(v_2) ∪ {v_2}, because v_1 ∉ Γ(u_1) whereas v_1 ∈ Γ(v_2) (and so Γ(u_1) ⊆ Γ(v_2) ∪ {v_2} would have implied Γ(u_1) ⊆ Γ(v_2) ∪ {v_2} \ {v_1}, which is impossible since |Γ(u_1)| = K whereas |Γ(v_2) ∪ {v_2} \ {v_1}| ≤ K − 1). Thus, there exists u_2 ∈ Γ(u_1) \ (Γ(v_2) ∪ {v_2}).

Thus, in each step of the third stage, we decrease Σ_{v∈[N]} |d_{G′′}(v) − K| by two units, while preserving both invariants established in the second stage (i.e., {v : d_{G′′}(v) < K} is a clique and d_{G′′}(v) ≤ K for all v). Since in each step we modify three edges, we conclude that G′′ is 3ε_0-close to R_N^{(K)}, and the claim follows (by recalling that G is 3ε_0-close to G′′).
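The three-stage repair can be sketched as executable code. The representation (adjacency dict of sets) and the helper names `remove_edge`, `add_edge`, `make_regular` are ours, and the example perturbed circulant graph is a hypothetical instance; the rewiring rules follow the two cases of the proof.

```python
def remove_edge(adj, u, v):
    adj[u].discard(v); adj[v].discard(u)

def add_edge(adj, u, v):
    adj[u].add(v); adj[v].add(u)

def make_regular(adj, K):
    """Three-stage repair from the proof of Claim 3.2."""
    # Stage 1: reduce every degree to at most K.
    for v in adj:
        while len(adj[v]) > K:
            remove_edge(adj, v, max(adj[v]))
    # Stage 2: connect low-degree non-adjacent pairs until they form a clique.
    changed = True
    while changed:
        changed = False
        low = [v for v in sorted(adj) if len(adj[v]) < K]
        for a in low:
            for b in low:
                if a < b and b not in adj[a] and len(adj[a]) < K and len(adj[b]) < K:
                    add_edge(adj, a, b)
                    changed = True
    # Stage 3: local rewirings that raise low degrees, preserving all others.
    while True:
        low = [v for v in sorted(adj) if len(adj[v]) < K]
        if not low:
            return
        if len(low) == 1:                    # Case 1 of the proof
            v = low[0]
            u = min(w for w in adj if w != v and w not in adj[v])
            w = min(z for z in adj[u] if z != v and z not in adj[v])
            remove_edge(adj, u, w)
            add_edge(adj, u, v)
            add_edge(adj, w, v)
        else:                                # Case 2 (v1 and v2 are adjacent)
            v1, v2 = low[0], low[1]
            u1 = min(w for w in adj if w not in adj[v1] and w not in (v1, v2))
            u2 = min(z for z in adj[u1] if z not in adj[v2] and z != v2)
            remove_edge(adj, u1, u2)
            add_edge(adj, u1, v1)
            add_edge(adj, u2, v2)

# Example: a 4-regular circulant graph on 12 vertices, perturbed by a rewiring.
N, K = 12, 4
adj = {v: {(v + d) % N for d in (1, 2, N - 1, N - 2)} for v in range(N)}
remove_edge(adj, 0, 2); remove_edge(adj, 1, 3); add_edge(adj, 0, 3)
make_regular(adj, K)
assert all(len(adj[v]) == K for v in adj)
```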

A.2

Proofs of Propositions 3.5 and 3.6

Proposition 3.5, restated: Let ℓ ≥ 3 be an integer and t ∈ T_ℓ, where

T_ℓ = {1, 2, 3} if ℓ ≡ 1 (mod 2);   T_ℓ = {1} if ℓ ≡ 2 (mod 4);   T_ℓ = N if ℓ ≡ 0 (mod 4),   (43)

and k(N) = 2N/tℓ. Then:

1. The set SC_ℓC ∩ R^(k) equals the set of graphs that consist of t super-cycles of length ℓ, each containing N/t vertices, such that clouds that are at distance four apart have equal size. Furthermore, if ℓ ≢ 0 (mod 4), then each cloud has size N/tℓ.

2. If a graph G = ([N], E) is δ-close to both SC_ℓC and R^(k), then G is O(√δ)-close to SC_ℓC ∩ R^(k), where the hidden constant depends polynomially on tℓ.

Thus, SC_ℓC ∩ R^(k) has a two-sided error POT.

Proof Once Item 2 is proved, we use Theorem 3.3 to conclude that SC_ℓC ∩ R^(k) has a two-sided error POT. The proof of Item 2 is facilitated by Item 1, which anyhow serves as a good warm-up towards Item 2. Our exposition breaks into several cases (i.e., ℓ = 4, other ℓ ≡ 0 (mod 4), and ℓ ≢ 0 (mod 4)), where in each case we first prove Item 1 and then prove Item 2.


We start with the case ℓ = 4. Recalling that a super-cycle of length four can be viewed as a bi-clique, we consider the class BCC ∩ R^(k), where BCC denotes the class of graphs that are each a collection of isolated bi-cliques. Suppose that G ∈ BCC ∩ R^(k), and consider the sequences of pairs (S_0^(1), S_1^(1)) through (S_0^(m), S_1^(m)) that are guaranteed by G ∈ BCC. That is, vertices u and v are connected in G if and only if there exist j ∈ [m] and i ∈ {0, 1} such that u ∈ S_i^(j) and v ∈ S_{1−i}^(j). Using G ∈ R^(k), we infer that for every j ∈ [m] and every i ∈ {0, 1} it holds that |S_i^(j)| = k(N) = N/2t. (Indeed, if k(N) = N/2t is not an integer, then BCC ∩ R^(k) = ∅.) Thus, G may be viewed as a collection of t super-cycles of length four in which each cloud has size N/4t, and Item 1 follows.

Towards Item 2, suppose that G is δ-close to both BCC and R^(k), and let G′ ∈ BCC be δ-close to G. Then G′ is 2δ-close to R^(k), and thus Σ_{v∈[N]} |d_{G′}(v) − (N/2t)| ≤ 2δ·N², where d_{G′}(v) denotes the degree of v in G′. Fixing any integer q (e.g., q = Θ(1/√δ)), we call a vertex v good if |d_{G′}(v) − (N/2t)| ≤ N/q, and note that all but at most 2qδ·N vertices are good. Consider the sequences of pairs (S_0^(1), S_1^(1)) through (S_0^(m), S_1^(m)) that are guaranteed by G′ ∈ BCC. If vertex v ∈ S_i^(j) is good, then S_{1−i}^(j) must have size at least (N/2t) − (N/q) > 2δN, and so must also contain a good vertex. Hence, any bi-clique containing a good vertex has (N/2t) ± (N/q) vertices on each side (i.e., in each cloud). Letting m′ denote the number of bi-cliques that contain good vertices, and assuming q > 2(t+1)² and q < 1/√δ, it follows that m′ = t (because m′ > t is ruled out by 2(t+1)·((N/2t) − (N/q)) > N, whereas m′ < t is ruled out by 2(t−1)·((N/2t) + (N/q)) < N − 2qδN). Moving all vertices to these m′ = t bi-cliques and modifying the edges accordingly, we obtain a graph G′′ that is 2qδ-close to G′.
Furthermore, each bi-clique of G′′ has (N/2t) ± (N/q) good vertices on each side. Thus, by moving at most 2t·(N/q) + 2qδN = 2(t/q + qδ)N vertices and modifying the edges accordingly, we obtain a graph in BCC ∩ R^(k). It follows that G is (δ + 4qδ + 2t/q)-close to BCC ∩ R^(k). Using q = 1/√(4δ), the claim follows in the current case (i.e., for ℓ = 4).

We now turn to the general case of ℓ ≡ 0 (mod 4). Starting with Item 1, suppose that G ∈ SC_ℓC ∩ R^(k), and consider the sequences of clouds (S_0^(1), ..., S_{ℓ−1}^(1)) through (S_0^(m), ..., S_{ℓ−1}^(m)) that are guaranteed by G ∈ SC_ℓC. Using G ∈ R^(k), we infer that for every j ∈ [m] and every i ∈ {0, 1, ..., ℓ−1} it holds that

|S^(j)_{i−1 mod ℓ}| + |S^(j)_{i+1 mod ℓ}| = k(N),   (44)

and so |S^(j)_i| = |S^(j)_{i+4 mod ℓ}| holds for every i, j. Combining the latter with (44), it follows that Σ_i |S^(j)_i| = (ℓ/2)·k(N) = N/t for every j, which establishes Item 1. (Indeed, if 2N/tℓ is not an integer, then SC_ℓC ∩ R^(k) = ∅.)

Turning to Item 2, suppose that G is δ-close to both SC_ℓC and R^(k), and let G′ ∈ SC_ℓC be δ-close to G. Then, G′ is 2δ-close to R^(k), and thus all but at most 2qδ · N vertices have


degree k(N) ± (N/q). (We shall again use q = Θ(1/√δ).) We call these non-exceptional vertices good. Consider the sequences of clouds (S_0^(1), ..., S_{ℓ−1}^(1)) through (S_0^(m), ..., S_{ℓ−1}^(m)) that are guaranteed by G′ ∈ SC_ℓC. We say that a cloud S^(j)_i is small (resp., big) if |S^(j)_i| < 3qδN (resp., if |S^(j)_i| ≥ 3qδN), and note that big clouds contain good vertices. For each j ∈ [m], we consider the following three cases.

Case 1: all clouds in the jth super-cycle are small. Assuming that k(N) − N/q > 6qδ·N, we conclude that vertices on such super-cycles are not good, and thus their total number is at most 2qδN.

Case 2: the jth super-cycle contains four consecutive clouds that are big; that is, there exists an i_0 such that the clouds S^(j)_{i_0}, S^(j)_{i_0+1 mod ℓ}, S^(j)_{i_0+2 mod ℓ}, S^(j)_{i_0+3 mod ℓ} are all big. In this case, we can proceed analogously to the perfect case (where all degrees equal k(N)), and infer that |S^(j)_i| + |S^(j)_{i+2 mod ℓ}| = k(N) ± N/q holds for every i, and the number of vertices residing on this super-cycle is (ℓ/2)·(k(N) ± (N/q)) = (N/t) ± (ℓN/2q). Details follow.

The foregoing claim is established in ℓ − 3 iterations, where in the (i+1)st iteration we use hypotheses regarding the sets S^(j)_{i_0+i mod ℓ}, S^(j)_{i_0+i+1 mod ℓ}, S^(j)_{i_0+i+2 mod ℓ}, S^(j)_{i_0+i+3 mod ℓ}, and make an inference regarding the set S^(j)_{i_0+i+4 mod ℓ}. Specifically, we assume that the sets S^(j)_{i_0+i+1 mod ℓ} and S^(j)_{i_0+i+3 mod ℓ} contain good vertices and that |S^(j)_{i_0+i}| ≥ 3qδN − (i−1)N/2q, and infer that |S^(j)_{i_0+i+4 mod ℓ}| = |S^(j)_{i_0+i}| ± 2N/q and thus |S^(j)_{i_0+i+4 mod ℓ}| ≥ 3qδN − 2iN/q. Using i < ℓ and 2ℓN/q < qδN, it follows that S^(j)_{i_0+i+4 mod ℓ} contains good vertices. Having inferred that all clouds in the jth super-cycle contain good vertices, we infer that |S^(j)_i| + |S^(j)_{i+2 mod ℓ}| = k(N) ± N/q holds for every i.

Case 3: the jth super-cycle has big clouds but no four consecutive clouds are big. We shall show that in this case the set of these super-cycles is close to one that satisfies the conclusion of Case 2. Focusing on one such super-cycle (i.e., the jth one), suppose that the ith cloud is big and the (i−1)st cloud is small. Then, the (i+1)st cloud must be big (since |S^(j)_{i−1 mod ℓ}| + |S^(j)_{i+1 mod ℓ}| ≥ k(N) − N/q > 6qδN), and so either the (i+3)rd or the (i+2)nd cloud must be small (because otherwise Case 2 holds). It follows that |S^(j)_{i+1 mod ℓ}| = k(N) ± N/q ± 3qδN and |S^(j)_i| + |S^(j)_{i+2 mod ℓ}| = k(N) ± N/q, which means that these three sets form a bi-clique with approximately k(N) vertices on each side and approximately 2k(N) = 4N/tℓ vertices in total.

Considering the approximate number of vertices counted in each of the three cases, we may ignore Case 1 and conclude that each super-cycle of Case 2 contributes (approximately) N/t vertices, whereas the contribution of Case 3 comes in multiples of 4·(N/tℓ). Thus, if we have t′ ≤ t super-cycles in Case 2, then we must have (t − t′)·ℓ/4 bi-cliques in Case 3. But in such a case, we can rearrange these bi-cliques into t − t′ super-cycles of length ℓ such that each super-cycle contains ℓ/4 bi-cliques and ℓ/2 small clouds, with each two consecutive bi-cliques connected via a super-path of two small clouds (i.e., the resulting super-cycle will consist of ℓ clouds such that the ith cloud has size k(N) ± (N/q) ± 3qδN if i mod 4 ∈ {0, 1}, and is small otherwise). Indeed, this may require changing the edges of all vertices that reside in small clouds, but the number of such vertices is less than tℓ·3qδN. Thus, G′ is 3tℓqδ-close to a graph that consists of t super-cycles in which each vertex has degree k(N) ± (N/q) ± 3qδN. It follows that G′ is tℓ·(6qδ + 1/q)-close to R^(k), and the claim follows (for any ℓ ≡ 0 (mod 4)).

Finally, we turn to the case of ℓ ≢ 0 (mod 4). Using notation as in the case of ℓ ≡ 0 (mod 4), we again have |S^(j)_i| = |S^(j)_{i+4 mod ℓ}| for every i, j. However, here (using ℓ ≢ 0 (mod 4)) we can infer that |S^(j)_i| = |S^(j)_{i+2 mod ℓ}|, and (combining this with (44)) it follows that |S^(j)_i| = k(N)/2 = N/tℓ holds for each i, j. This establishes Item 1. (Indeed, if N/tℓ is not an integer, then SC_ℓC ∩ R^(k) = ∅.)

When proving Item 2, we again let G be δ-close to both SC_ℓC and R^(k), and let G′ ∈ SC_ℓC be δ-close to G. Using the same notions of 'good' and 'small' as before, we again consider the same three cases regarding each super-cycle (S^(j)_0, ..., S^(j)_{ℓ−1}) of G′, where again q = Θ(1/√δ).

Case 1: all clouds in the jth super-cycle are small. Again, we conclude that the total number of vertices on such super-cycles is at most 2qδN.

Case 2: the jth super-cycle contains four consecutive clouds that are big. Again, it follows that all clouds on this super-cycle contain good vertices, and |S^(j)_i| + |S^(j)_{i+2 mod ℓ}| = k(N) ± N/q for every i. However, here we can also infer that each cloud has size (N/tℓ) ± (ℓN/q). This holds because, using ℓ ≢ 0 (mod 4), it holds that |S^(j)_i| = |S^(j)_{i+2 mod ℓ}| ± 2(ℓ−1)N/q for every i, and |S^(j)_i| = (k(N) ± (2ℓN/q))/2 = (N/tℓ) ± (ℓN/q) follows. This means that, in this case, all clouds are big.

Case 3: the jth super-cycle has both small and big clouds. Here we shall show that this case is actually impossible. As in the case of ℓ ≡ 0 (mod 4), we first observe that the approximate number of vertices residing on super-cycles that satisfy Case 3 is a multiple of 4N/tℓ. Thus, if we have t′ ≥ 0 super-cycles in Case 2, then t′·ℓ ≡ tℓ (mod 4), whereas t′ ≤ t ∈ T_ℓ. By considering both cases of ℓ ≢ 0 (mod 4), we infer that t′ = t must hold,⁵⁶ which leaves no room for Case 3.

56 If ℓ ≡ 1 (mod 2), then t′ℓ ≡ tℓ (mod 4) implies t′ ≡ t (mod 4), which implies t′ = t (since t < 4). If ℓ ≡ 2 (mod 4), then t′ℓ ≡ tℓ (mod 4) implies t′ ≡ t (mod 2), which implies t′ = t (since t = 1).

We conclude that almost all vertices reside in one of t super-cycles, which in turn contain clouds that are each of size (N/tℓ) ± (ℓN/q). Thus, G′ is O(tℓ²/q)-close to SC_ℓC ∩ R^(k), and the claim follows (when using q = Θ(1/√δ) again).

Proposition 3.6, restated: Let ℓ ≥ 3 and t ∈ N \ T_ℓ, where T_ℓ is as in (43). Then, for any integer k(N) = 2N/tℓ, there exists an N-vertex graph in R^(k) that is O(1/N)-close to SC_ℓC but Ω(1)-far from SC_ℓC ∩ R^(k).

Proof Fix any ℓ ≢ 0 (mod 4) and t ∈ N \ T_ℓ. Then, SC_ℓC ∩ R^(k) consists of super-cycles of length ℓ such that each cloud has size N/tℓ (see the proof of Item 1 of Proposition 3.5, and note that it does not use the hypothesis t ∈ T_ℓ).⁵⁷ Indeed, if N/tℓ is not an integer, then SC_ℓC ∩ R^(k) = ∅ and we are done. Otherwise, let i = 4 if ℓ is odd and i = 2 otherwise, and note that i ≤ t (since t ≥ 4 if ℓ is odd and t ≥ 2 if ℓ ≡ 2 (mod 4)). Note that iℓ ≡ 0 (mod 4), and consider the N-vertex graph G ∈ R^(k) that consists of t − i super-cycles of length ℓ with clouds of size N/tℓ and iℓ/4 bi-cliques with 2N/tℓ vertices on each side. Observing that we can transform each such bi-clique into a super-cycle of length ℓ by moving ℓ − 2 vertices into singleton clouds, it follows that G is O(1/N)-close to SC_ℓC (i.e., the distance of G to SC_ℓC is smaller than 2iℓ²·(2k(N)/N²) ≤ 32ℓ/tN). The claim follows by observing that G is Ω(1)-far from SC_ℓC ∩ R^(k) (because the collection of iℓ/4 bi-cliques in G is far from a collection of i super-cycles of length ℓ with clouds of size N/tℓ).

A.3

Proof of Proposition 5.5

Proposition 5.5, restated: Let Π_1, . . . , Π_k be disjoint classes of distributions with domain [r]. Assume that for each i ∈ [k] the class Π_i has a two-sided error POT that makes t_i queries and has detection probability ρ_i. Then, their union Π = ∪_{i∈[k]} Π_i has a two-sided error POT that makes Σ_{i∈[k]} t_i queries and has detection probability Ω(min{ρ_i : i ∈ [k]}).

Before proving the proposition, let us comment on the proof of Theorem 5.1. The proof gives us more information than what is stated in the theorem. Specifically, it gives us some relations between the class of distributions Π and the polynomial P in the statement of the theorem. This extra information is summarized in Proposition A.7 below. The following definition will be convenient for the statement of the proposition.

Definition A.6 Let P(q_1, . . . , q_r) : ∆^(r) → R be a polynomial of total degree at most t. Write

P(q_1, . . . , q_r) = Σ_{m : deg(m)≤t} α_m · m(q_1, . . . , q_r),

57 The hypothesis t ∈ T_ℓ is only used when establishing Item 2 of Proposition 3.5 (for the case of ℓ ≢ 0 (mod 4)).


where the sum is over monomials m of total degree at most t, and α_m ∈ R for each monomial m. The polynomial P is called γ-normalized if |α_m| ≤ γ for each monomial m.

Note that, for every γ-normalized polynomial P : ∆^(r) → R of degree t, there is a homogeneous 2^t·γ-normalized polynomial P∗ of the same degree such that P(q) ≥ 0 if and only if P∗(q) ≥ 0 (for every q in the domain). Specifically, suppose that P(q_1, . . . , q_r) = Σ_m α_m · m(q_1, . . . , q_r) is a non-zero polynomial of degree t. By multiplying each monomial m of degree d < t by (Σ_{i∈[r]} q_i)^{t−d}, we obtain a homogeneous polynomial P∗(q_1, . . . , q_r) = Σ_{v=(v_1,...,v_t)∈[r]^t} β_v · Π_{i=1}^{t} q_{v_i} of degree t that agrees with P on every point in ∆^(r). Note that for each v ∈ [r]^t we have β_v = Σ_m α_m, where the sum is over all monomials m that divide Π_{i=1}^{t} q_{v_i}. (Indeed, the above reproduces an argument that was outlined in the proof of Theorem 5.1.) We note that the proof of Theorem 5.1 actually establishes the following.

Proposition A.7 (Theorem 5.1, revised): A property Π of distributions with domain [r] has a two-sided error POT that makes t queries and has detection probability ρ : (0, 1] → (0, 1] if and only if there exists a 1-normalized polynomial P : ∆^(r) → R of degree t satisfying the following conditions:

1. P(q) ≥ 0 for every q ∈ Π.

2. If q ∈ ∆^(r) is ε-far from Π, then P(q) < −ρ(ε).

Moreover, if P is a γ-normalized polynomial of degree t, then Π = {q ∈ ∆^(r) : P(q) ≥ 0} has a two-sided error POT T_Π that makes t queries, whose acceptance probability when testing q ∈ ∆^(r) can be written as

Pr[T_Π accepts q] = 1/2 + P(q)/(2^{t+1}·γ).

We now turn to the proof of Proposition 5.5.

Proof We give the proof for the special case of k = 2; the proof for larger values of k follows by induction on k. Since each of the classes Π_i (i = 1, 2) has a two-sided error POT, by Proposition A.7 there are 1-normalized polynomials P_i : ∆^(r) → R of degree t_i such that P_i(q) ≥ 0 for all q ∈ Π_i, and P_i(q) < −ρ_i(ε) for any q ∈ ∆^(r) that is ε-far from Π_i. Define a polynomial P : ∆^(r) → R of degree t_1 + t_2 by P(q) = −δ·(P_1(q)·P_2(q)), where δ > 0 is some constant (e.g., δ = 2^{−2(t_1+t_2+1)}) assuring that P is 2^{−t_1−t_2−1}-normalized. By applying Proposition A.7 with the polynomial P, we obtain a POT T_Π for Π = Π_1 ∪ Π_2 that makes t_1 + t_2 queries, as the total degree of P is at most t_1 + t_2.

We show below that the detection probability of T_Π is Ω(min(ρ_1, ρ_2)). By Proposition A.7 it is enough to show that any q ∈ ∆^(r) that is ε-far from Π satisfies the inequality P(q) < −Ω(min(ρ_1(ε), ρ_2(ε))). By continuity of P_1, since Π_1 = {q : P_1(q) ≥ 0} is the preimage of the closed set [0, ∞) ⊆ R under P_1, it follows that Π_1 is a closed subset of ∆^(r). Similarly, Π_2 is also a closed subset of ∆^(r). Therefore, since Π_1 and Π_2 are disjoint closed sets, there exists some γ > 0 such that dist(q^(1), q^(2)) > γ for all q^(1) ∈ Π_1 and q^(2) ∈ Π_2. Fix ε < γ/2 and let q = (q_1, . . . , q_r) be a distribution satisfying dist(q, Π) = ε. If dist(q, Π_1) = ε, then, letting η = dist(q, Π_2), the triangle inequality gives us η > γ/2. Therefore, by using the bounds P_1(q) < −ρ_1(ε) and P_2(q) < −ρ_2(η), we have

P(q) = −δ·(P_1(q)·P_2(q)) < −δ·ρ_1(ε)·ρ_2(η) ≤ −δ·ρ_1(ε)·ρ_2(γ/2),

where the second inequality follows from the monotonicity of ρ_2. Similarly, if dist(q, Π_2) = ε, then P(q) < −δ·ρ_2(ε)·ρ_1(γ/2). In both cases we have P(q) < −Ω(min(ρ_1(ε), ρ_2(ε))), where the constant in the Ω() notation depends only on Π_1 and Π_2 (i.e., it is δ·min(ρ_1(γ/2), ρ_2(γ/2))). This completes the proof of Proposition 5.5.
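The sign behavior of the product polynomial P = −δ·P_1·P_2 can be seen on a hypothetical one-parameter example: two disjoint classes of distributions on a two-element domain, each certified by a linear polynomial in q_1 (with q_2 = 1 − q_1). The thresholds 0.9 and 0.1 and the constant δ are illustrative choices of ours.

```python
# Pi_1 = {q : q1 >= 0.9}, certified by P1(q) = q1 - 0.9 (>= 0 exactly on Pi_1);
# Pi_2 = {q : q1 <= 0.1}, certified by P2(q) = 0.1 - q1 (>= 0 exactly on Pi_2).
# As in the proof, P = -delta * P1 * P2 is >= 0 exactly on the union.
delta = 0.5

def P1(q1): return q1 - 0.9
def P2(q1): return 0.1 - q1
def P(q1): return -delta * P1(q1) * P2(q1)

for q1 in [i / 100 for i in range(101)]:
    in_union = q1 >= 0.9 or q1 <= 0.1
    # Off the union, both factors are negative, so the product is positive
    # and P is negative; on the union, exactly one factor is non-negative.
    assert (P(q1) >= 0) == in_union, q1
```

The assertion walks a grid of the parameter range and confirms that P certifies exactly the union Π_1 ∪ Π_2.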

A.4

Strengthening Corollary 5.8

In this section we strengthen Corollary 5.8 on testing classes of distributions defined by an ellipsoid. Recall the definition of Π(p,B) in (16). For p = (p1, …, pr) and B = (B0; B1, …, Br) ∈ R^{r+1}, such that B0 ≥ 0 and Bi > 0 for all i ∈ [r], let Π(p,B) be the ellipsoid in ∆(r) defined as

Π(p,B) = {q = (q1, …, qr) : Σ_{i∈[r]} Bi·(qi − pi)^2 ≤ B0}.

Proposition A.8 Fix r ≥ 2, and let p = (p1, …, pr) and B = (B0; B1, …, Br) ∈ R^{r+1} be such that Bi > 0 for all i = 0, 1, …, r. Then the property Π(p,B) has a two-sided error POT that makes two queries and has linear detection probability.

Proof As in the proof of Corollary 5.8, define a polynomial P in r variables that is nonnegative for all points (q1, …, qr) in the ellipsoid, and negative outside the ellipsoid. Namely,

P(q1, …, qr) = B0 − Σ_{i∈[r]} Bi·(qi − pi)^2.

Clearly Π(p,B) = {q : P(q) ≥ 0}. The following claim completes the proof of the proposition.

Claim A.9 For any q = (q1, …, qr) that is ε-far from Π(p,B), it holds that

Σ_{i∈[r]} Bi·(qi − pi)^2 > B0 + Ω(ε),     (45)

where the constant in the Ω() notation depends only on B.

By Claim A.9, for any q = (q1, …, qr) ∈ ∆(r) that is ε-far from Π(p,B) it holds that P(q1, …, qr) < −Ω(ε). The proposition follows by normalizing P and applying the characterization given in Proposition A.7. We now turn to the proof of Claim A.9.

Proof Consider the distribution q′ = (q′1, …, q′r), which is a convex combination of p and q such that Σ_{i∈[r]} Bi·(q′i − pi)^2 = B0. The expression Σ_{i∈[r]} Bi·(qi − pi)^2 can be bounded from below as follows:

Σ_{i∈[r]} Bi·(qi − pi)^2 = Σ_{i∈[r]} Bi·((q′i − pi) + (qi − q′i))^2
                        = Σ_{i∈[r]} Bi·(q′i − pi)^2 + Σ_{i∈[r]} Bi·(qi − q′i)^2 + 2·Σ_{i∈[r]} Bi·(q′i − pi)·(qi − q′i)
                        ≥ Σ_{i∈[r]} Bi·(q′i − pi)^2 + 2·Σ_{i∈[r]} Bi·(q′i − pi)·(qi − q′i).

Since q′ is a convex combination of p and q, for each i ∈ [r] it holds that q′i ≥ pi if and only if q′i ≤ qi, and in particular (q′i − pi)·(qi − q′i) ≥ 0. Therefore Σ_{i∈[r]} Bi·(qi − pi)^2 can be bounded from below by

Σ_{i∈[r]} Bi·(qi − pi)^2 ≥ Σ_{i∈[r]} Bi·(q′i − pi)^2 + 2·min_{i∈[r]}{Bi} · Σ_{i∈[r]} (q′i − pi)·(qi − q′i).     (46)

By the choice of q′ ∈ ∆(r), the first sum equals B0. Therefore, in order to prove the claim it is enough to show that Σ_{i∈[r]} (q′i − pi)·(qi − q′i) > Ω(ε), where the constant in the Ω() notation depends only on B. Using again the fact that q′ is a convex combination of p and q, we conclude that the vectors q′ − p and q − q′ are co-linear, and thus

Σ_{i∈[r]} (q′i − pi)·(qi − q′i) = √(Σ_{i∈[r]} (q′i − pi)^2) · √(Σ_{i∈[r]} (qi − q′i)^2).     (47)

We bound from below each term of the product separately. The first term is bounded from below as follows:

√(Σ_{i∈[r]} (q′i − pi)^2) ≥ √(Σ_{i∈[r]} (Bi / max_{j∈[r]}{Bj}) · (q′i − pi)^2) = √(B0 / max_{j∈[r]}{Bj}),

where the equality is by the choice of q′, which satisfies Σ_{i∈[r]} Bi·(q′i − pi)^2 = B0. The second term can be bounded from below by applying the Cauchy-Schwarz inequality:

√r · √(Σ_{i∈[r]} (qi − q′i)^2) ≥ Σ_{i∈[r]} |qi − q′i| > 2ε,

where the second inequality uses the assumption that q′ ∈ Π(p,B) and q is ε-far from Π(p,B). This implies a lower bound on (47). Specifically, we have

Σ_{i∈[r]} (q′i − pi)·(qi − q′i) ≥ (2/√r) · √(B0 / max_{j∈[r]}{Bj}) · ε.

Then, by plugging this into (46), we complete the proof of the claim.
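The sign behavior of the ellipsoid polynomial is easy to check on a small example. The center p and weights B below are illustrative choices, not parameters from the thesis:

```python
# Toy instance of the two-query POT polynomial for an ellipsoid property
# (the center p and weights B are illustrative, not from the thesis):
# Pi_(p,B) = {q in Delta(3) : sum_i B_i (q_i - p_i)^2 <= B_0}.
p = (0.5, 0.3, 0.2)            # center of the ellipsoid
B = (0.01, 1.0, 2.0, 1.0)      # (B0; B1, B2, B3)

def P(q):
    # Nonnegative exactly on the ellipsoid, negative outside of it.
    return B[0] - sum(Bi * (qi - pi) ** 2 for Bi, qi, pi in zip(B[1:], q, p))

assert P((0.5, 0.3, 0.2)) >= 0     # the center itself lies inside
assert P((0.8, 0.1, 0.1)) < 0      # a distribution far from the center
```

By Claim A.9, for distributions that are ε-far from the ellipsoid the polynomial is not just negative but bounded away from zero by Ω(ε), which is what yields the linear detection probability.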

A.5

Testable classes of distributions are not closed under taking complements

Following the remark at the end of Section 5.2, we describe a class of ternary distributions Π ⊆ ∆(3) that has a POT, while cl(∆(3) \ Π) does not have one.⁵⁸ We start with the following claim.

Claim A.10 Let D = {(x, y) ∈ [0, 1]^2 : x + y ≤ 1} be a subset of R^2. For α ∈ (0, 1) let A = {(x, y) ∈ D : P(x, y) ≥ 0}, where P : D → R is the polynomial P(x, y) = y^2 − (x − α)·x^2. Then there is no real polynomial Q such that cl(D \ A) = {(x, y) ∈ D : Q(x, y) ≥ 0}, where cl(D \ A) is the closure of the complement⁵⁹ of A in D.

Proof Note first that A can be written as A = {(x, y) ∈ D : x ≤ α} ∪ B,

⁵⁸ By cl(∆(3) \ Π) we refer to the set of all (q1, q2, q3) ∈ ∆(3) such that for all ε > 0 there is (q′1, q′2, q′3) ∈ ∆(3) \ Π that satisfies (1/2)·(|q1 − q′1| + |q2 − q′2| + |q3 − q′3|) < ε.
⁵⁹ By cl(D \ A) we refer to the set of all (x, y) ∈ D such that for all ε > 0 there is (x′, y′) ∈ D \ A that satisfies (1/2)·(|x − x′| + |y − y′|) < ε.

where

B = {(x, y) ∈ D : x ≥ α, y ≥ x·√(x − α)}.

In particular, the boundary of A is ∂A = {(x, x·√(x − α)) : x ∈ [α, 1]}, which can also be written as ∂A = {(x^2, x^2·√(x^2 − α)) : x ∈ [√α, 1]}.

Assume towards contradiction that there is a polynomial Q that satisfies the condition stated in the claim, namely cl(D \ A) = {(x, y) ∈ D : Q(x, y) ≥ 0}. Then, in particular, (1) Q must be zero on the boundary of A, and (2) for any (x, y) ∈ A \ ∂A it must be the case that Q(x, y) < 0. We prove below that no polynomial satisfies these two conditions simultaneously. Specifically, we show that any polynomial satisfying (1) must vanish at the point (0, 0) ∈ A \ ∂A, thus contradicting condition (2).

Let Q be a polynomial that vanishes on ∂A. Note that the polynomial P is irreducible,⁶⁰ and the two polynomials P and Q agree on the curve ∂A = {(x^2, x^2·√(x^2 − α)) : x ∈ [√α, 1]}. Therefore, since the two polynomials have infinitely many common zeros, by Bezout's theorem they have a common non-trivial factor; i.e., there is a non-constant polynomial R such that P = R·P0 and Q = R·Q0 for some polynomials P0 and Q0. However, since P is irreducible, we conclude that P0 is a constant polynomial and R = cP for some non-zero constant c ∈ R, and thus Q can be written as Q = c·P·Q0. Therefore, since P vanishes at (0, 0), it follows that Q also vanishes at (0, 0). The claim follows.

Using Claim A.10 we exhibit a property of ternary distributions Π that has a POT, while cl(∆(3) \ Π) does not have one.

Proposition A.11 Let α ∈ (0, 1) and let P(x, y) = y^2 − (x − α)·x^2 be as in Claim A.10. Define Π ⊆ ∆(3) to be Π = {(q1, q2, q3) ∈ ∆(3) : P(q1, q2) ≥ 0}. Then Π has a two-sided error POT, while the property cl(∆(3) \ Π) does not have one.

Proof Clearly, by Theorem 5.1, Π has a two-sided error POT. In order to prove that Π′ := cl(∆(3) \ Π) does not have a two-sided error POT, it is enough to show that there is no polynomial P′ : ∆(3) → R that satisfies Π′ = {(q1, q2, q3) ∈ ∆(3) : P′(q1, q2, q3) ≥ 0}, which follows easily from Claim A.10. Details follow.

⁶⁰ Namely, P cannot be written as a product of two polynomials of smaller degree. This can be checked by writing P either as P(x, y) = (y^2 + a·y + Σ_{i=0}^{3} bi·x^i)·(Σ_{i=0}^{3} ci·x^i) or as P(x, y) = (y + Σ_{i=0}^{3} di·x^i)·(y + Σ_{i=0}^{3} ei·x^i), and verifying that P has no non-trivial factorizations.

Assume towards contradiction that such a polynomial P′ exists. Define a real polynomial Q : D → R by Q(x, y) = P′(x, y, 1 − x − y), where D = {(x, y) ∈ [0, 1]^2 : x + y ≤ 1} is as in Claim A.10. Note that (x, y, 1 − x − y) ∈ ∆(3) for all (x, y) ∈ D, and thus Q is well defined. Let A = {(x, y) ∈ D : P(x, y) ≥ 0}. It is a routine exercise to show that Q(x, y) ≥ 0 if and only if (x, y) ∈ cl(D \ A), using the definition of Q as well as the definitions of cl(∆(3) \ Π) and cl(D \ A). By Claim A.10 no such polynomial Q exists, thus contradicting the assumption that Π′ has a POT.
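Two facts used in the proof of Claim A.10 are easy to confirm numerically; the value of α below is an arbitrary choice in (0, 1):

```python
# Numerical check of two facts from the proof of Claim A.10 (alpha is an
# arbitrary choice in (0, 1)): the polynomial P(x, y) = y^2 - (x - alpha) * x^2
# vanishes on the curve {(x, x * sqrt(x - alpha)) : x in [alpha, 1]}, and it
# also vanishes at (0, 0) -- which is why any Q divisible by P must vanish
# there as well, contradicting Q < 0 on the interior of A.
import math

alpha = 0.25

def P(x, y):
    return y ** 2 - (x - alpha) * x ** 2

# P vanishes along the boundary curve of A (up to floating-point error).
for k in range(101):
    x = alpha + (1 - alpha) * k / 100
    assert abs(P(x, x * math.sqrt(x - alpha))) < 1e-12

# P vanishes at the interior point (0, 0) of A.
assert P(0.0, 0.0) == 0.0
```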

A.6

Proof of Claim 6.1

Claim 6.1, restated: Let H = ([n], F) be a fixed graph. For every two graphs G = ([N], E) and G′ = ([N], E′) such that N ≥ qn, if G and G′ are ε-close, then

|ρH(G) − ρH(G′)| = |indH(G) − indH(G′)| / (N choose n) ≤ ε·n^2,

where indH(G) = ρH(G)·(N choose n) denotes the number of induced copies of H in G.

Proof Assume that G and G′ are ε-close. Then there is a sequence of graphs on N vertices (G0 = G, G1, …, Gt = G′), where t ≤ ε·N^2, such that Gi and Gi+1 differ by exactly one edge for every i ∈ {0, …, t − 1}. Note that for every such i we have

|indH(Gi) − indH(Gi+1)| ≤ (N−2 choose n−2),

as one pair of vertices is contained in at most (N−2 choose n−2) subgraphs on n vertices. Therefore, by the triangle inequality, we have

|indH(G) − indH(G′)| / (N choose n) ≤ Σ_{i=0}^{t−1} |indH(Gi) − indH(Gi+1)| / (N choose n) ≤ ε·N^2 · (N−2 choose n−2) / (N choose n) ≤ ε·n^2.

The claim follows.
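The one-edge-flip step of the proof can be checked directly on a small example. Here H is taken to be P3 (the path on n = 3 vertices) and G is an arbitrary 6-vertex graph, so the bound (N−2 choose n−2) equals N − 2:

```python
# Sanity check of the one-edge bound used above: removing a single edge
# changes the number of induced copies of a 3-vertex H (here H = P3, the
# path on 3 vertices) by at most (N-2 choose n-2) = N - 2.  The graph G
# below is an arbitrary example, not one from the thesis.
from itertools import combinations
from math import comb

def count_induced_p3(n_vertices, edges):
    # Induced copies of P3 are exactly the vertex triples spanning two edges.
    E = {frozenset(e) for e in edges}
    return sum(
        1
        for triple in combinations(range(n_vertices), 3)
        if sum(frozenset(pair) in E for pair in combinations(triple, 2)) == 2
    )

N = 6
G = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
before = count_induced_p3(N, G)
after = count_induced_p3(N, G[:-1])            # drop the edge (1, 4)
assert abs(before - after) <= comb(N - 2, 1)   # = N - 2
```

Only triples containing both endpoints of the flipped edge can change status, and there are exactly (N−2 choose n−2) of them, which is the counting step of the proof.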

A.7

Proof of Lemma 6.5

Lemma 6.5, restated: Fix t ≥ 2 and let ρ = (ρ1, …, ρt). Let G′ = ([N], E′) ∈ CC≤t, and assume that dist(S^{t+1}_ρ, S^{t+1}_{G′}) ≤ ε′. Then G′ is O((ε′)^{1/t})-close to CC(ρ), where the constant in the O() notation depends only on t.

We first observe that for any ρ = (ρ1, …, ρt) and ρ′ = (ρ′1, …, ρ′t), the distance between (N-vertex) graphs in CC≤t that have the corresponding clique densities is upper bounded by δ := Σ_{i∈[t]} |ρi − ρ′i|. This holds because it suffices to move δN vertices among the cliques of one graph in order to obtain the other. This observation, which will be used several times in this section, is summarized in the following claim.

Claim A.12 Fix t ≥ 2. For given ρ = (ρ1, …, ρt) and ρ′ = (ρ′1, …, ρ′t), let G = ([N], E) ∈ CC(ρ) and G′ = ([N], E′) ∈ CC(ρ′). If Σ_{i∈[t]} |ρi − ρ′i| < ε, then G is ε-close to G′.

A warm-up (the case t = 2). Before proving Lemma 6.5 for all t ≥ 2, let us consider the special case t = 2, i.e., the graph G′ consists of two cliques of densities ρ′ and 1 − ρ′. Then

α′0 := Pr[S^3_{G′} = I3] = 0,
α′1 := Pr[S^3_{G′} is a graph with exactly one edge] = ρ′·(1 − ρ′),
α′2 := Pr[S^3_{G′} = P3] = 0,
α′3 := Pr[S^3_{G′} = K3] = (ρ′)^3 + (1 − ρ′)^3.

By the hypothesis of Lemma 6.5 we have |ρ′·(1 − ρ′) − ρ·(1 − ρ)| < ε′, which implies either |ρ − ρ′| < O(√ε′) or |ρ − (1 − ρ′)| < O(√ε′). By applying Claim A.12 we conclude the lemma for the special case t = 2.

The general case (i.e., t ≥ 2). We shall proceed in two steps, corresponding to the two parts of the following lemma, while noting that the second part coincides with Lemma 6.5.

Lemma A.13 (Lemma 6.5, rephrased) Fix t ≥ 2 and let G = ([N], E) ∈ CC≤t. Then:

1. The distribution S^{t+1}_G uniquely defines the graph G. That is, the distribution S^{t+1}_G uniquely determines ρ such that G ∈ CC(ρ).

2. If G′ = ([N], E′) ∈ CC≤t and the statistical distance between S^{t+1}_G and S^{t+1}_{G′} is at most ε′, then G′ is O((ε′)^{1/t})-close to G, where the constant in the O() notation depends only on t.

Proof Since G ∈ CC≤t, it holds that G ∈ CC(ρ) for some ρ = (ρ1, …, ρt). In the first item we need to show that S^{t+1}_G uniquely defines ρ.
In order to achieve this goal, we use the following two claims (which will be proved later).

Claim A.14 There are s1, s2, …, st ∈ [0, 1], depending only on S^{t+1}_G, such that

Σ_{i∈[t]} (ρi)^k = sk for k = 1, …, t.     (48)

More specifically, each sk can be expressed as

sk = Σ_H αk(H)·S^{t+1}_G(H),     (49)

where the sum is over graphs H with t + 1 vertices, and αk(H) depends only on k and H.

Claim A.15 For any given s1, …, st ∈ R, the system of equations in (48) (in the variables ρ1, …, ρt) has a unique solution over the complex numbers, up to a permutation of the variables.

Given the two claims, consider the unique solution of the system (48). Since the clique densities of G constitute a solution, it follows that they are the unique solution, which proves the first item of the lemma.

For the second item, let G′ ∈ CC≤t. Then we can write G′ ∈ CC(ρ′) for some ρ′ = (ρ′1, …, ρ′t). Define s′k analogously to Claim A.14, so that Σ_{i∈[t]} (ρ′i)^k = s′k for k = 1, …, t. Applying expression (49) for every k = 1, …, t, we have

|sk − s′k| = |Σ_H αk(H)·S^{t+1}_G(H) − Σ_H αk(H)·S^{t+1}_{G′}(H)|
          ≤ Σ_H αk(H)·|S^{t+1}_G(H) − S^{t+1}_{G′}(H)|
          ≤ 2·max_H{αk(H)}·dist(S^{t+1}_G, S^{t+1}_{G′})
          = O(ε′),

where the constant in the O() notation depends only on t. The following claim (to be proved later) allows us to complete the proof of the lemma. (Actually, given Claim A.14, the following claim is the actual core of the lemma.)

Claim A.16 Let s1, …, st, s′1, …, s′t ∈ [0, 1] be such that |sk − s′k| ≤ O(ε′) for every k ∈ [t]. Assume that (ρ1, …, ρt) ∈ [0, 1]^t satisfies

Σ_{i∈[t]} (ρi)^k = sk for k = 1, …, t,     (50)

and (ρ′1, …, ρ′t) ∈ [0, 1]^t satisfies

Σ_{i∈[t]} (ρ′i)^k = s′k for k = 1, …, t.     (51)

Then there is a permutation π of the index set {1, …, t} that satisfies |ρi − ρ′π(i)| = O((ε′)^{1/t}) for every i ∈ [t], where the constant in the O() notation depends only on t.

By Claim A.16 above (reordering ρ′ according to π), we have G ∈ CC(ρ) and G′ ∈ CC(ρ′) such that

Σ_{i∈[t]} |ρi − ρ′i| = O((ε′)^{1/t}).

By applying Claim A.12, we conclude that G and G′ are O((ε′)^{1/t})-close, where the constant in the O() notation depends only on t. This completes the proof of the lemma.

We now turn to the proofs of the subclaims stated during the proof of Lemma A.13. The proofs rely on basic results regarding symmetric polynomials, power sums, and the continuity of the roots of algebraic equations.

Proof of Claim A.14 For the special case k = 1 we have s1 = 1, as Σ_{i∈[t]} ρi = 1. Therefore we let k ∈ {2, …, t} and consider the following polynomial:

(Σ_{i∈[t]} (ρi)^k) · (Σ_{j∈[t]} ρj)^{t+1−k}.     (52)

Since Σ_{i∈[t]} ρi = 1, the expression in (52) equals Σ_{i∈[t]} (ρi)^k. Thus it is enough to show that (52) can be written as Σ_H αk(H)·S^{t+1}_G(H) for some coefficients αk(H). Let H = ([t+1], EH) ∈ CC≤t be a graph with at most t cliques of sizes (c1, …, ct), where some of the ci's may be zero. Then

S^{t+1}_G(H) = (1/KH) · Σ_{σ∈St} ∏_{i∈[t]} (ρσ(i))^{ci},     (53)

where St denotes the set of permutations on [t], and KH ≥ 1 is a constant⁶¹ that depends only on (c1, …, ct). It is a standard fact that the set {S^{t+1}_G(H) : H = ([t+1], EH) ∈ CC≤t}, considered as polynomials in the variables ρ1, …, ρt, forms a basis for the space of symmetric homogeneous polynomials of degree t + 1. Thus, since the expression in (52) is a symmetric homogeneous polynomial of degree t + 1, it can be written as a linear combination of the polynomials in (53). Therefore, for every k the power sum Σ_{i∈[t]} (ρi)^k can be written as Σ_H αk(H)·S^{t+1}_G(H) for some coefficients αk(H).

⁶¹ In fact, KH = k1!·k2!···kt!, where ki = |{j : cj = i}| is the number of i-cliques in H.

Proof of Claim A.15 The system of equations (48) can be solved using Newton's identities, by reducing the problem to a univariate monic polynomial of degree t. More explicitly, define the matrix Mk(s) ∈ R^{k×k} as

         ⎡ s1     1      0      …   0    0   ⎤
         ⎢ s2     s1     2      …   0    0   ⎥
Mk(s) =  ⎢ s3     s2     s1     …   0    0   ⎥   for k = 1, …, t.     (54)
         ⎢ ⋮      ⋮      ⋮      ⋱   ⋮    ⋮   ⎥
         ⎢ sk−1   sk−2   sk−3   …   s1   k−1 ⎥
         ⎣ sk     sk−1   sk−2   …   s2   s1  ⎦

Define Λk(s) to be

Λk = Λk(s) = ((−1)^k / k!) · det[Mk(s)],     (55)

and consider the polynomial Ps(x) = x^t + Λ1·x^{t−1} + Λ2·x^{t−2} + ⋯ + Λt. Then the t roots of Ps are the unique (up to a permutation) solution of the system (48) (for details see, e.g., [73]).

Proof of Claim A.16 As explained in the proof of Claim A.15, the vector (ρ1, …, ρt) consists exactly of all the roots of the monic polynomial

Ps(x) = x^t + Λ1·x^{t−1} + Λ2·x^{t−2} + ⋯ + Λt,     (56)

where the Λk's are given in (55); that is, Ps(x) = ∏_{i=1}^{t} (x − ρi). Similarly, the vector (ρ′1, …, ρ′t) contains exactly all the roots of the polynomial

Ps′(x) = x^t + Λ′1·x^{t−1} + Λ′2·x^{t−2} + ⋯ + Λ′t,     (57)

with Λ′k = Λk(s′) defined analogously, i.e., Ps′(x) = ∏_{i=1}^{t} (x − ρ′i). Next, we claim that

|Λk − Λ′k| = O(ε′) for k = 1, …, t.     (58)

Indeed, by the assumption of the claim we have |si − s′i| < O(ε′) for every i = 1, …, t. Thus, considering (54) and (55), we have

|Λk − Λ′k| = |((−1)^k/k!)·det[Mk(s)] − ((−1)^k/k!)·det[Mk(s′)]|
           ≤ (1/k!)·Σ_{σ∈Sk} |∏_{i=1}^{k} (Mk(s))_{i,σ(i)} − ∏_{j=1}^{k} (Mk(s′))_{j,σ(j)}|     [triangle inequality]
           ≤ max_{σ∈Sk} |∏_{i=1}^{k} (Mk(s))_{i,σ(i)} − ∏_{j=1}^{k} (Mk(s′))_{j,σ(j)}|
           ≤ O(ε′),

where the last inequality relies on the fact that all entries of the matrices Mk(s) and Mk(s′) are between 0 and k − 1, together with the observation that for all m1, …, mk ∈ [0, k − 1] and ε1, …, εk such that |εi| < O(ε′) for all i, we have |∏_{i∈[k]} (mi + εi) − ∏_{i∈[k]} mi| = O(ε′).

Having established (58), we have two monic polynomials Ps and Ps′ in (56) and (57) whose coefficients differ by at most O(ε′). Recall that the roots of Ps are (ρ1, …, ρt), and the roots of Ps′ are (ρ′1, …, ρ′t). The claim follows from the continuity of the roots of monic polynomials. Specifically, we quote a theorem due to Ostrowski [61, Appendix A].

Theorem A.17 Let f(z) = Σ_{i=0}^{t} ai·z^i and g(z) = Σ_{i=0}^{t} bi·z^i be two monic polynomials, i.e., at = bt = 1. Let x1, …, xt be the t roots of f(z) (with multiplicities), and let y1, …, yt be the roots of g(z). For

γ = max_i{|xi|, |yi|},     (59)

introduce the expression

ε̃ = (Σ_{i=0}^{t−1} |bi − ai|·γ^i)^{1/t}.     (60)

Then the roots xi and yi can be ordered in such a way that

|xi − yi| < 2t·ε̃ for i = 1, …, t.     (61)

We apply Theorem A.17 to the polynomials Ps and Ps′. By (58), the coefficients of the polynomials differ by at most O(ε′). Recall that the roots of the polynomials are the densities of the cliques in G and G′, and hence they all lie in the interval [0, 1]. Therefore γ in (59) is bounded by 1, and hence ε̃ in (60) is bounded by O((ε′)^{1/t}). Hence, by Theorem A.17, there is an ordering of the roots such that |ρi − ρ′i| < O((ε′)^{1/t}). This completes the proof of Claim A.16.
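The recovery pipeline behind Lemma A.13 can be illustrated for t = 2 with arbitrary example densities. The sketch below uses the recursive form of Newton's identities (equivalent to the determinant formula (55), chosen here purely for brevity) and then checks the root-stability behavior predicted by Theorem A.17 under a small perturbation of the power sums:

```python
# Sketch of the recovery argument of Lemma A.13 for t = 2, with illustrative
# clique densities rho = (0.3, 0.7) (not values from the thesis):
# power sums -> coefficients Lambda_k via Newton's identities -> roots,
# plus a perturbation check in the spirit of Theorem A.17.
import math

def coeffs_from_power_sums(s):
    # Given power sums s[k-1] = sum_i rho_i^k for k = 1..t, return
    # (Lambda_1, ..., Lambda_t) so that P_s(x) = x^t + Lambda_1 x^{t-1} + ...
    # Newton's identities: k * e_k = sum_{i=1}^{k} (-1)^{i-1} e_{k-i} s_i,
    # and Lambda_k = (-1)^k e_k.
    t = len(s)
    e = [1.0]
    for k in range(1, t + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * s[i - 1]
                     for i in range(1, k + 1)) / k)
    return [(-1) ** k * e[k] for k in range(1, t + 1)]

def roots_of_monic_quadratic(L):
    # Roots of x^2 + L[0] * x + L[1]; real roots are assumed in this demo.
    d = math.sqrt(L[0] ** 2 - 4 * L[1])
    return sorted(((-L[0] - d) / 2, (-L[0] + d) / 2))

rho = (0.3, 0.7)
s = [sum(r ** k for r in rho) for k in (1, 2)]        # power sums s_1, s_2
recovered = roots_of_monic_quadratic(coeffs_from_power_sums(s))
assert max(abs(a - b) for a, b in zip(recovered, rho)) < 1e-9

# Perturb the power sums by eps' and check that the recovered densities move
# by O(sqrt(eps')), as the theorem predicts (gamma <= 1 since rho_i in [0, 1]).
eps = 1e-6
perturbed = roots_of_monic_quadratic(
    coeffs_from_power_sums([s[0] + eps, s[1] - eps]))
assert max(abs(a - b) for a, b in zip(perturbed, rho)) < 10 * math.sqrt(eps)
```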

Publications Not Included in the Thesis

This section includes a short summary of my publications that have not been included in this thesis.

Direct Sum Testing In a joint work with David et al. [DDG+13] we study the Direct Sum Testing Problem. The k-fold direct sum encoding of a string a ∈ {0, 1}^n is the function f which takes as input subsets S ⊆ [n] of size k, and whose output on such an S is f(S) = Σ_{i∈S} ai (mod 2). In the Direct Sum Testing Problem, we are given a function f, and our goal is to test whether f is (close to) a direct sum encoding of some string a ∈ {0, 1}^n. Identifying the subsets of [n] with vectors in {0, 1}^n in the natural way, this problem can be thought of as linearity testing of functions whose domain is restricted to the k'th layer of the hypercube. For this problem we consider a variant of the natural 3-query linearity test introduced by Blum, Luby, and Rubinfeld [18]. The test works by picking inputs S, T ⊆ [n] of size k that are correlated so that their symmetric difference S∆T is also of size k, and checking that f(S) + f(T) = f(S∆T). The proof proceeds via a new proof for linearity testing on the hypercube, which extends to our setting as well. An interesting step in the proof is a relation between the Direct Sum Testing Problem and the well-studied Direct Product Testing Problem [27, 49, 29].

On the Conditional Hardness of Coloring In a joint work with Irit Dinur [DS10] we study the work of [28], which describes a reduction from Unique Games to the graph coloring problem. We find that (under an appropriate conjecture) a careful calculation of the parameters in [28] implies hardness of coloring a 4-colorable graph with log^c(log n) colors for some constant c > 0. By giving a tighter analysis of the reduction we show hardness of coloring a 4-colorable graph with log^c(n) colors for some constant c > 0.
The main technical contribution of the paper is a variant of the Majority is Stablest Theorem [59], which says that among all balanced functions in which every coordinate has o(1) influence, the Majority function has the largest noise stability. We adapt the theorem for our applications to get a better dependency between the parameters required for the reduction.

A Note on Subspace Evasive Sets This is a joint work with Avraham Ben-Aroya [BAS13]. A subspace-evasive set over a field F is a subset of F^n that has small intersection with any low-dimensional affine subspace of F^n. More formally, we say that a set S ⊆ F^n is (k, c)-subspace evasive if for every affine subspace W ⊆ F^n of dimension k it holds that |S ∩ W| ≤ c. Interest in subspace evasive sets began in the work of Pudlák and Rödl [63] in the context of explicit constructions of Ramsey graphs. More recently, Guruswami [41] showed that obtaining such sets over large fields can be used to construct capacity-achieving list-decodable codes with a


constant list size. In the latter setting we think of the field F as being of size polynomial in n; that is, O(log n) bits are required to specify an element of F. A standard probabilistic argument shows that over any finite field F there exist (k, O(k/ε))-subspace evasive sets of size |F|^{(1−ε)n}. Recently, Dvir and Lovett [30] gave an elegant algebraic construction of subspace evasive sets. Their construction gives sets of size |F|^{(1−ε)n} that are (k, (k/ε)^k)-subspace evasive. In this work we construct subspace evasive sets with slightly better parameters. Specifically, we give an explicit construction of a set S ⊆ F^n of size |F|^{(1−ε)n} which is (k, 2^k)-subspace evasive, thus slightly improving the parameters of Dvir and Lovett. Note that our construction is not optimal in terms of intersection size. Namely, the probabilistic construction gives a set that intersects every subspace of dimension k in at most O(k/ε) points, while our construction gives an intersection that is exponential in k. This means that in order to get an explicit subspace-evasive set with optimal parameters one needs to have a more "global" construction. This seems like an interesting (and possibly challenging) open problem for future work. The second part of the paper shows that for a certain range of the parameters the subspace-evasive sets obtained using the probabilistic method are almost optimal (up to a multiplicative constant factor). Roughly speaking, we prove that for any k and ε, and for any set S ⊆ F^n of size |F|^{(1−ε)n}, there exists a subspace W ⊆ F^n of dimension dim(W) = k that intersects S in at least Ω(k/ε) points. The proof relies on the Kővári-Sós-Turán theorem, which says that any dense enough bipartite graph contains a large bi-clique.

Excited Random Walk with Periodic Cookies In a joint work with Gady Kozma and Tal Orenshtein [KOS13] we analyze a random walk on Z, where the transition probabilities at each step depend on the number of visits to the current vertex so far.
Specifically, we study a random walk on Z defined by parameters (p1, …, pM) ∈ [0, 1]^M for some positive integer M, where the walker, upon the i-th visit to z ∈ Z, moves to z + 1 with probability p_{i mod M}, and moves to z − 1 with probability 1 − p_{i mod M}. We give an explicit formula in terms of the parameters (p1, …, pM) which determines whether the walk is recurrent, transient to the left, or transient to the right. In particular, in the case that (1/M)·Σ_{i=1}^{M} pi = 1/2 all behaviors are possible, and may depend on the order of the pi. The basic approach we use is due to Kesten, Kozlov, and Spitzer, which reduces the question of recurrence/transience of an excited random walk to a question about certain Markov chains on N, which are simpler to analyze using relatively standard techniques.
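The walk's mechanics can be sketched in a few lines; this is only a simulation of the model described above, not the analysis of [KOS13]:

```python
# A minimal simulation of the periodic-cookie walk described above (a sketch
# of the model, not the analysis of [KOS13]): on the i-th visit to a site z,
# the walker steps right with probability p[(i - 1) % M] and left otherwise.
# The only per-site state the walk keeps is its visit count.
import random
from collections import defaultdict

def walk(p, steps, rng=random):
    M = len(p)
    visits = defaultdict(int)   # completed visits per site
    z = 0
    for _ in range(steps):
        i = visits[z]           # this is visit number i + 1 to site z
        visits[z] += 1
        z += 1 if rng.random() < p[i % M] else -1
    return z

# Degenerate sanity checks: all-1 cookies force a deterministic walk to the
# right, and all-0 cookies force it to the left.
assert walk((1.0, 1.0), 50) == 50
assert walk((0.0,), 50) == -50
```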


List of Publications

[AS13] Omer Angel and Igor Shinkar. A Tight Upper Bound on Acquaintance Time of Graphs. Submitted, 2013. Available at arXiv:1307.6029.

[BAS13] Avraham Ben-Aroya and Igor Shinkar. A Note on Subspace Evasive Sets. Submitted, 2012. Available at ECCC TR12-095.

[BCS13] Itai Benjamini, Gil Cohen, and Igor Shinkar. Bi-Lipschitz Bijection between the Boolean Cube and the Hamming Ball. In Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS '2014), 2014. Available at ECCC TR13-138.

[BST13] Itai Benjamini, Igor Shinkar, and Gilad Tsur. Acquaintance Time of a Graph. SIAM J. Discrete Math., 28(2):767–785, 2014. Available at arXiv:1302.2787.

[DDG+13] Roee David, Irit Dinur, Elazar Goldenberg, Guy Kindler, and Igor Shinkar. Direct Sum Testing. Submitted, 2013. Available at ECCC TR14-002.

[DS10] Irit Dinur and Igor Shinkar. On the Conditional Hardness of Coloring a 4-Colorable Graph with Super-Constant Number of Colors. In Proceedings of the 13th Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX '2010), 2010. Available at ECCC TR13-148.

[GS12] Oded Goldreich and Igor Shinkar. Two-Sided Error Proximity Oblivious Testing. In Proceedings of the 16th International Workshop on Randomization and Computation (RANDOM '2012), 2012. Available at ECCC TR12-021.

[KOS13] Gady Kozma, Tal Orenshtein, and Igor Shinkar. Excited Random Walk with Periodic Cookies. Submitted, 2013. Available at arXiv:1311.7439.

[OS11] Tal Orenshtein and Igor Shinkar. Greedy Random Walk. Combinatorics, Probability and Computing, 23(2):269–289, 2014. Available at arXiv:1101.5711.


References

[1] D. J. Aldous. Lower Bounds for Covering Times for Reversible Markov Chains and Random Walks on Graphs. Journal of Theoretical Probability, 2:91–100, 1989.

[2] D. J. Aldous. Random Walk Covering of Some Special Trees. Journal of Mathematical Analysis and Applications, 157:271–283, 1991.

[3] R. Aleliunas, R. M. Karp, R. J. Lipton, L. Lovász, and C. Rackoff. Random Walks, Universal Traversal Sequences, and the Complexity of Maze Problems. In Proceedings of the 20th Annual Symposium on Foundations of Computer Science, pages 218–223, 1979.

[4] N. Alon, I. Benjamini, E. Lubetzky, and S. Sodin. Non-backtracking Random Walks Mix Faster. Communications in Contemporary Mathematics, 9:585–603, 2007.

[5] N. Alon, F. R. K. Chung, and R. L. Graham. Routing permutations on graphs via matchings. SIAM Journal on Discrete Mathematics, 7(3):513–530, 1994.

[6] N. Alon, E. Fischer, M. Krivelevich, and M. Szegedy. Efficient Testing of Large Graphs. Combinatorica, Vol. 20, pages 451–476, 2000.

[7] O. Angel and I. Benjamini. A phase transition for the metric distortion of percolation on the hypercube. Combinatorica, 27(6):645–658, 2007.

[8] O. Angel and I. Shinkar. A tight upper bound on acquaintance time of graphs. Available from http://arxiv.org/abs/1307.6029, 2013.

[9] T. Austin. On the failure of concentration for the ℓ∞-ball. 2013. Available from http://arxiv.org/abs/1309.3315.

[10] C. Avin and B. Krishnamachari. The Power of Choice in Random Walks: An Empirical Study. Computer Networks, 52(1):44–60, 2008.

[11] G. Barnes and U. Feige. Short random walks on graphs. SIAM Journal on Discrete Mathematics, 9(1):19–28, 1996.

[12] I. Benjamini, O. Gurel-Gurevich, and B. Morris. Linear Cover Time is Exponentially Unlikely. Probability Theory and Related Fields, 155(1-2):451–461, 2013.

[13] I. Benjamini, I. Shinkar, and G. Tsur. Acquaintance time of a graph. Available from http://arxiv.org/abs/1302.2787, 2013.

[14] I. Benjamini and D. B. Wilson. Excited Random Walk. Elect. Comm. in Probab., 8:86–92, 2003.

[15] P. Berenbrink, C. Cooper, R. Elsässer, T. Radzik, and T. Sauerwald. Speeding up random walks with neighborhood exploration. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1422–1435, Philadelphia, PA, USA, 2010.

[16] P. Berenbrink, C. Cooper, and T. Friedetzky. Random walks which prefer unvisited edges, and exploring high girth even degree expanders in linear time. In Proceedings of the 31st Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, 2012.

[17] A. Björklund, T. Husfeldt, and S. Khanna. Approximating longest directed paths and cycles. In Proceedings of the 31st International Colloquium on Automata, Languages and Programming, pages 222–233, 2004.

[18] M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (STOC '90), pages 73–83, 1990.

[19] J. Bochnak, M. Coste, and M. Roy. Real Algebraic Geometry. Springer, 1998.

[20] B. Bollobás. Random Graphs. Cambridge University Press, 2001.

[21] R. Boppana. The average sensitivity of bounded-depth circuits. Information Processing Letters, 63(5):257–261, 1997.

[22] A. Z. Broder and A. R. Karlin. Bounds on the cover time. Journal of Theoretical Probability, 2:101–120, 1989.

[23] N. G. de Bruijn, C. van Ebbenhorst Tengbergen, and D. Kruyswijk. On the set of divisors of a number. Nieuw Arch. Wiskunde (2), 23:191–193, 1951.

[24] L. A. Bunimovich and S. E. Troubetzkoy. Recurrence properties of Lorentz lattice gas cellular automata. Journal of Statistical Physics, 67:289–302, 1992.

[25] J. Czyzowicz, D. Dereniowski, L. Gasieniec, R. Klasing, A. Kosowski, and D. Pajak. Collision-free network exploration. In Proceedings of the 11th Latin American Theoretical INformatics Symposium, 2014.

[26] N. Chen. On the approximability of influence in social networks. SIAM Journal on Discrete Mathematics, 23(5):1400–1415, 2009.

[27] I. Dinur and E. Goldenberg. Locally testing direct product in the low error range. In Proceedings of the 49th IEEE Symposium on Foundations of Computer Science (FOCS '2008), pages 613–622, 2008.

[28] I. Dinur, E. Mossel, and O. Regev. Conditional hardness for approximate coloring. SIAM Journal on Computing, 39(3):843–873, 2009.

[29] I. Dinur and D. Steurer. Direct product testing. In Proceedings of the 29th IEEE Conference on Computational Complexity (CCC '14), 2014.

[30] Z. Dvir and S. Lovett. Subspace evasive sets. In Proceedings of the 44th ACM Symposium on Theory of Computing, 2012.

[31] U. Feige. A tight lower bound on the cover time for random walks on graphs. Random Structures & Algorithms, 6(1):433–438, 1995.

[32] E. Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.

[33] T. Friedrich and T. Sauerwald. The cover time of deterministic random walks. In Proceedings of the 16th Annual International Conference on Computing and Combinatorics (COCOON '10), pages 130–139, Berlin, Heidelberg, 2010. Springer-Verlag.

[34] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, pages 653–750, July 1998. Extended abstract in 37th FOCS, 1996.

[35] O. Goldreich and D. Ron. Property Testing in Bounded Degree Graphs. Algorithmica, Vol. 32 (2), pages 302–343, 2002. Extended abstract in 29th STOC, 1997.

[36] O. Goldreich and D. Ron. Algorithmic Aspects of Property Testing in the Dense Graphs Model. SIAM Journal on Computing, Vol. 40, No. 2, pages 376–445, 2011. Extended abstract in 13th RANDOM, LNCS 5687, 2009.

[37] O. Goldreich and D. Ron. On Proximity Oblivious Testing. SIAM Journal on Computing, Vol. 40, No. 2, pages 534–566, 2011. Extended abstract in 41st STOC, 2009.

[38] O. Goldreich and L. Trevisan. Three theorems regarding testing graph properties. Random Structures and Algorithms, Vol. 23 (1), pages 23–57, August 2003.

[39] O. Goldreich and L. Trevisan. Errata to [38]. Manuscript, August 2005. Available from http://www.wisdom.weizmann.ac.il/~oded/p_ttt.html

[40] R. L. Graham. Isometric embeddings of graphs. Selected Topics in Graph Theory, 3:133–150, 1988.

[41] V. Guruswami. Linear-algebraic list decoding of folded Reed-Solomon codes. In Proceedings of the 26th IEEE Conference on Computational Complexity (CCC '11), 2011.

[42] L. H. Harper. Optimal numbering and isoperimetric problems on graphs. Journal of Combinatorial Theory, (1):385–393, 1966.

[43] S. Hart. A note on the edges of the n-cube. Discrete Mathematics, 14(2):157–163, 1976.

[44] J. Hastad. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, pages 6–20. ACM, 1986.

[45] J. Hastad, T. Leighton, and M. Newman. Reconfiguring a hypercube in the presence of faults. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pages 274–284, 1987.

[46] S. T. Hedetniemi, S. M. Hedetniemi, and A. Liestman. A survey of gossiping and broadcasting in communication networks. Networks, 18(4):319–349, 1988.

[47] A. E. Holroyd. Geometric properties of Poisson matchings. Probability Theory and Related Fields, 150(3–4):511–527, 2011.

[48] S. Hoory, N. Linial, and A. Wigderson. Expander Graphs and Their Applications. Bull. Amer. Math. Soc., 43:439–561, 2006.

[49] R. Impagliazzo, V. Kabanets, and A. Wigderson. New direct-product testers and 2-query PCPs. SIAM J. Comput., 41(6):1722–1768, 2012.

[50] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146, 2003.

[51] S. Khot. Improved inapproximability results for max-clique, chromatic number and approximate graph coloring. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 600–609, 2001.

[52] W. B. Kinnersley, D. Mitsche, and P. Prałat. A note on the acquaintance time of random graphs. The Electronic Journal of Combinatorics, 20(3), 2013.

[53] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.

[54] N. Linial. Finite metric spaces - combinatorics, geometry and algorithms. In Proceedings of the International Congress of Mathematicians III, pages 573–586, 2002.

[55] S. Lovett and E. Viola. Bounded-depth circuits cannot sample good codes. Computational Complexity, 21(2):245–266, 2012.

[56] C. Lund and M. Yannakakis. On the hardness of approximating minimization problems. Journal of the ACM, 41(5):960–981, 1994.

[57] N. Lynch. Log space recognition and translation of parenthesis languages. Journal of the ACM, 24(4):583–590, 1977.

[58] N. Madras and C. C. Wu. Self-avoiding walks on hyperbolic graphs. Comb. Probab. Comput., 14(4):523–548, 2005.

[59] E. Mossel, R. O'Donnell, and K. Oleszkiewicz. Noise stability of functions with low influences: Invariance and optimality. Annals of Mathematics, 171(1):295–341, 2010.

[60] T. Müller and P. Prałat. The acquaintance time of (percolated) random geometric graphs. Available from http://arxiv.org/abs/1312.7170, 2013.

[61] A. M. Ostrowski. Solution of Equations and Systems of Equations. Academic Press, 1960.

[62] R. Pemantle. A survey of random processes with reinforcement. Probability Surveys, 4:1–79, 2007.

[63] P. Pudlák and V. Rödl. Pseudorandom sets and explicit construction of Ramsey graphs. Quaderni di Matematica, 13:327–346, 2004.

[64] D. Reichman. New bounds for contagious sets. Discrete Mathematics, 312(10):1812–1814, 2012.

[65] D. Ron. Property testing: A learning theory perspective. Foundations and Trends in Machine Learning, 1(3):307–402, 2008.

[66] D. Ron. Algorithmic and analysis techniques in property testing. Foundations and Trends in TCS, 5(2):73–205, 2009.

[67] R. Rubinfeld and M. Sudan. Robust characterization of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252–271, 1996.

[68] T. Ruijgrok and E. Cohen. Deterministic lattice gas models. Physics Letters A, 133(7–8):415–418, 1988.

[69] P. Solernó. Effective Łojasiewicz inequalities in semialgebraic geometry. Applicable Algebra in Engineering, Communication and Computing, 2(1):1–14, 1990.

[70] E. Viola. Extractors for circuit sources. In IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), pages 220–229. IEEE, 2011.


[71] E. Viola. The complexity of distributions. SIAM Journal on Computing, 41(1):191–218, 2012.

[72] J. H. van Lint and R. M. Wilson. A Course in Combinatorics. Cambridge University Press, Cambridge, 2001.

[73] Y. Wu and C. Hadjicostis. On solving composite power polynomial equations. Math. Comput., 74(250):853–868, 2005.

[74] Y. Yaari. M.Sc. Thesis, Weizmann Institute of Science, Israel, 2011.

[75] D. Zuckerman. A technique for lower bounding the cover time. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (STOC '90), pages 254–259, New York, NY, USA, 1990. ACM.

