lec8.pdf (rutgers.edu)

Lecture 1 (mit.edu)

# Introduction

Suppose we have a Boolean function $f:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\}$. It defines a communication problem as follows: there are two players, say Alice and Bob, who get inputs $x\in\{0,1\}^n$ and $y\in\{0,1\}^n$, respectively, and their goal is to compute $f(x,y)$.

For instance, in the equality problem, Alice and Bob would like to evaluate:

$\text{EQ}(x,y)=\begin{cases} 1&x=y\\ 0&x\neq y \end{cases}$

Since neither Alice nor Bob has the entire input on their own, the players need to communicate with each other to determine the value of $f(x, y)$. The communication proceeds according to a protocol, as follows:

Alice sends a single bit to Bob based solely on her input; after receiving this bit, Bob responds with a single bit of his own, which is a function of his input and Alice's message; the players continue alternating this way throughout the protocol. We assume that the last bit communicated by either player is the output of the protocol.

The main measure of efficiency in this model is the communication cost of the protocols, defined as follows.

Definition(Communication cost). The communication cost of a protocol $\pi$, denoted by $\|\pi\|$, is the worst-case number of bits communicated between Alice and Bob in $\pi$ over any choice of inputs $x,y$.

Definition(Deterministic communication complexity). The deterministic communication complexity of function $f$ is defined as $D(f)=\min_{\pi}\|\pi\|$, where $\pi$ ranges over all protocols that can solve $f$.

Note that $D(f)\leq 2n$ for any function: Alice can send her $n$-bit input to Bob one bit at a time, with Bob responding with $n-1$ dummy bits in between, and Bob can then compute $f(x,y)$ himself and output the answer with one final bit.
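A minimal simulation of this trivial protocol, assuming nothing beyond the description above (the function names are illustrative):

```python
def trivial_protocol(x, y, f):
    """Alice sends her n-bit input one bit per round; Bob replies with
    dummy 0 bits to keep the alternation, then his final bit is f(x, y)."""
    transcript = []
    for bit in x:                  # Alice's n bits, one per round
        transcript.append(bit)     # Alice speaks
        transcript.append(0)       # Bob's dummy reply (last one replaced below)
    transcript[-1] = f(x, y)       # Bob now knows all of x; his last bit is the answer
    return transcript[-1], len(transcript)

EQ = lambda x, y: int(x == y)      # example function the players agreed on
out, cost = trivial_protocol((0, 1, 1), (0, 1, 1), EQ)
assert out == 1 and cost == 2 * 3  # 2n bits for n = 3
```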

# Protocol Trees

An easy way of describing a protocol between Alice and Bob is by using a protocol tree:

At the root of the tree, the left child contains all $(x, y)$ pairs for which Alice sends bit 0, and the right child contains all $(x, y)$ pairs for which she sends bit 1. We continue this way, so that whenever we are at a node $N$, the root-to-$N$ path corresponds to the bits communicated so far.

Theorem(Rectangle Property). For a node $N$ in the protocol tree, the set $S\subseteq\{0,1\}^n\times\{0,1\}^n$ assigned to it forms a combinatorial rectangle, i.e. $S=X\times Y$ for some $X,Y\subseteq\{0,1\}^n$. In other words:

$\begin{cases} (x,y)\in S\\ (a,b)\in S \end{cases} \Rightarrow \begin{cases} (x,b)\in S\\ (a,y)\in S\\ \end{cases}$

Proof:

There is a video on YouTube illustrating protocol trees and combinatorial rectangles (strongly recommended). Let's assume $M$ is a matrix representing the function $f$:

When Alice sends a bit, some rows of the matrix are eliminated, since the bit she sends is determined only by her input and the messages already sent/received. The transcript so far is represented by the path from the root to the current node, which is fixed. So as we track the path from the root to a node $N$, we eliminate some rows of the matrix, then some columns, and so on, since the protocol and the transcript are known.

For example, if Alice sends 0 while the protocol dictates she would send 1 on input 00, then her input cannot be 00, and we erase that row of the matrix. Every root-to-leaf path thus corresponds to a sequence of alternating row and column removals, leading to a submatrix whose entries are all equal (called a monochromatic combinatorial rectangle, where "monochromatic" means all entries have the same value):

at which point the value of $f$ is determined. The theorem says that the surviving entries form a submatrix: if entries $(x,y)$ and $(a,b)$ survive, then so do $(x,b)$ and $(a,y)$.

$\square$.

Let $C(f)$ denote the minimum number of monochromatic combinatorial rectangles needed to partition $M_f$ (the matrix representation of $f$).

Theorem: The number of leaves in a protocol tree is at least $C(f)$, thus the height of the tree is at least $\lceil\log C(f)\rceil$.

Example: For function $\text{EQ}$,

$M_{\text{EQ}}=\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}$

Each diagonal 1-entry must lie in its own monochromatic rectangle, since a rectangle containing two diagonal entries $(x,x)$ and $(y,y)$ with $x\neq y$ would also contain the 0-entry $(x,y)$. Thus $C(\text{EQ})\geq 2^n$, and therefore $D(\text{EQ})\geq n$.
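The claim that no monochromatic rectangle contains two diagonal 1-entries can be checked by brute force for small $n$ (a sketch with a hypothetical helper name):

```python
from itertools import product

n = 3
inputs = list(product([0, 1], repeat=n))

def is_monochromatic_rectangle(X, Y):
    """Check that the rectangle X x Y is monochromatic for EQ."""
    vals = {int(x == y) for x in X for y in Y}
    return len(vals) == 1

# Two distinct diagonal entries (x, x) and (y, y) can never share a
# monochromatic rectangle: it would also contain (x, y), where EQ = 0.
for x in inputs:
    for y in inputs:
        if x != y:
            assert not is_monochromatic_rectangle({x, y}, {x, y})
```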

# A Streaming Lower Bound for the Distinct Elements Problem

Theorem. Any deterministic streaming algorithm for computing DE exactly requires $\Omega(\min\{n,m\})$ bits, given a stream of $n$ elements from the universe $[m]$.

Proof: The proof is by a reduction from Equality.

Consider an instance $(x,y)$ of the EQ problem. Define the sub-streams:

$\sigma_A:=\{i:x_i=1\},\quad\sigma_B:=\{j:y_j=1\}$

where $x_i,y_j$ denote the $i$-th and $j$-th bits of $x$ and $y$, and let $\sigma:=\sigma_A\circ\sigma_B$ be the concatenation of the two sub-streams.

An important property is that when $\text{EQ}(x,y)=1$, we have $\text{DE}(\sigma)=\text{DE}(\sigma_A)=\text{DE}(\sigma_B)$, while when $\text{EQ}(x,y)=0$, either $\text{DE}(\sigma)\neq\text{DE}(\sigma_A)$ or $\text{DE}(\sigma)\neq\text{DE}(\sigma_B)$.

Assume there is a deterministic streaming algorithm $\mathcal{A}$ for computing DE that uses $s$ bits of space, and consider the following protocol $\pi$ for EQ based on $\mathcal{A}$. Alice generates $\sigma_A$ and Bob generates $\sigma_B$ locally; Alice runs $\mathcal{A}$ on $\sigma_A$, then sends the memory contents and $\text{DE}(\sigma_A)$ to Bob, which takes $s+\log n$ bits.

Bob continues to run $\mathcal{A}$ on $\sigma_B$ starting from Alice's memory contents, obtaining the answer for $\text{DE}(\sigma)$; he computes $\text{DE}(\sigma_B)$ locally. Thus Bob knows $\text{DE}(\sigma)$, $\text{DE}(\sigma_A)$, and $\text{DE}(\sigma_B)$, and can output $\text{EQ}(x,y)$ by the property mentioned above.

Since $D(\text{EQ})\geq n$, we get $s+\log n\geq n$, which leads to $s=\Omega(n)$.

$\square$.
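The reduction can be sanity-checked with an exact distinct-elements counter standing in for $\mathcal{A}$ (this verifies only the correctness of the reduction, not the space bound; the names are illustrative):

```python
def distinct_elements(stream):
    """Exact DE: a stand-in for the streaming algorithm A. Its 'memory'
    here is simply the set of elements seen so far."""
    return len(set(stream))

def eq_via_de(x, y):
    """Decide EQ(x, y) using DE on the sub-streams, as in the proof."""
    sigma_A = [i for i, b in enumerate(x) if b == 1]
    sigma_B = [j for j, b in enumerate(y) if b == 1]
    de_A = distinct_elements(sigma_A)
    de_B = distinct_elements(sigma_B)
    de_full = distinct_elements(sigma_A + sigma_B)
    # EQ(x, y) = 1 exactly when concatenation adds no new distinct element
    # relative to either side.
    return int(de_full == de_A == de_B)

assert eq_via_de([1, 0, 1], [1, 0, 1]) == 1
assert eq_via_de([1, 0, 1], [1, 1, 1]) == 0
```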

# Randomized Communication Complexity

There are two ways of introducing random bits into the communication model:

• the private coin/randomness model where Alice and Bob have access to separate sources of randomness on their own.
• the public coin/randomness model where players have access to a shared source of randomness.

Similar to other settings, we require that a randomized protocol for a problem $f$ output the correct answer $f(x, y)$ for any inputs $x, y$ given to Alice and Bob, with probability at least 2/3 (or some other constant strictly greater than half).

Newman’s Theorem: Any public coin protocol $\pi$ for a communication problem $f$ with probability of success at least $1-\epsilon$ can be simulated by a private coin protocol $\theta$ such that $\|\theta\|\leq \|\pi\|+O(\log n+\log 1/\delta)$ and $\theta$ outputs the correct answer to $f$ with probability at least $1-\epsilon-\delta$.

Proof:

Let $\pi(x,y,r)$ denote the deterministic output of protocol $\pi$ on inputs $x,y$ and public randomness $r$.

Lemma: For some $t=O(n/\delta^2)$, there exist strings $r_1,...,r_t$ such that for all $(x,y)\in\{0,1\}^n\times\{0,1\}^n$, we have

$\mathop{\text{Pr}}_{i\leftarrow[t]}\left [\pi(x,y,r_i)\neq f(x,y)\right ]\leq\epsilon+\delta$

Assuming the lemma, let us see how to design the protocol. Given $\pi$, we can hardcode $r_1,...,r_t$ into both players, since these strings are independent of the input $(x,y)$. Alice samples an index $i\in[t]$ using her private coins and sends it to Bob, which takes $\log t=O(\log n+\log 1/\delta)$ bits; then both players simulate $\pi$ with $r_i$ as the public randomness. By the lemma, the resulting protocol errs with probability at most $\epsilon+\delta$.

Proof of Lemma:

Define the indicator random variable $Z(x,y,r)\in\{0,1\}$ for the event $\pi(x,y,r)\neq f(x,y)$. By the guarantee of the protocol $\pi$, we have that:

$\forall x,y,\quad \mathbb{E}_r[Z(x,y,r)]=\text{Pr}[\pi\text{ errs on }(x,y)]\leq\epsilon$

Now, suppose we sample public coins $t=\lceil 2n/\delta^2\rceil$ times. For any fixed $(x,y)$, by the Chernoff bound:

\begin{aligned} \mathop {\text{Pr}}_{r_1,...,r_t}\left [\frac{\sum_{i=1}^tZ(x,y,r_i)}{t}\geq\epsilon+\delta\right ] &\leq\mathop {\text{Pr}}_{r_1,...,r_t}\left [\frac{\sum_{i=1}^tZ(x,y,r_i)}{t}-\mathbb{E}\left [\frac{\sum_{i=1}^tZ(x,y,r_i)}{t}\right ]\geq\delta\right ]\\ &\leq\exp (-2\delta^2t)\\ &\leq \exp(-4n) \end{aligned}

As such, by union bound over all choices of $(x,y)\in\{0,1\}^n\times\{0,1\}^n$, we have,

$\mathop{\text{Pr}}_{r_1,...,r_t}\left [\exists(x,y)\text{ s.t. } \frac{\sum_{i=1}^tZ(x,y,r_i)}{t}\geq\epsilon+\delta\right ]\leq 2^{2n}\exp(-4n)<2^{2n}\cdot 2^{-4n}=2^{-2n}<1$

On the other hand, if for every choice of $r_1,...,r_t$ there existed some $(x,y)\in\{0,1\}^n\times\{0,1\}^n$ with $\mathop{\text{Pr}}_{i\leftarrow[t]}\left [\pi(x,y,r_i)\neq f(x,y)\right ]\geq\epsilon+\delta$, then it would hold that

$\mathop{\text{Pr}}_{r_1,...,r_t}\left [\exists(x,y)\text{ s.t. } \frac{\sum_{i=1}^tZ(x,y,r_i)}{t}\geq\epsilon+\delta\right ]=1$

which contradicts the bound above. So there exist strings $r_1,...,r_t$ as claimed.

$\square$.

Remark: Since we only care about communication complexity, we do not care about how to find $r_1,...,r_t$; we assume the players have unlimited local computing resources.

Remark: Newman’s theorem can be seen as a very basic pseudorandom number generator (PRNG): we were able to reduce the number of random bits needed by $\pi$ to only $O(\log n + \log (1/\delta))$ (and thus communicate them between the players) at the cost of only an additive $\delta$ in the error probability. Note that aside from transforming public coins to private coins, this theorem also implies that any constant-error protocol can be made to work with only $O(\log n)$ bits of randomness.

Definition(Communication cost). The communication cost of a randomized protocol $\pi$ is the worst-case number of bits communicated between Alice and Bob in $\pi$ over any choice of inputs $x,y$ and the randomness of the protocol.

We can view a public coin protocol as a distribution over deterministic protocols obtained by first using the public coins to sample the deterministic protocol and then running the deterministic protocol on the input. As such, we can alternatively define the communication cost of $\pi$ as the maximum communication cost of any deterministic protocol in the support of this distribution.

Definition(Randomized communication complexity). The randomized communication complexity of function $f$ is defined as $R(f)=\min_\pi\|\pi\|$, where $\pi$ ranges over all randomized protocols that solve $f$ with probability of success at least 2/3.

Definition. Let $\mu$ be a distribution on inputs $(x, y)$. We define the distributional communication complexity of $f$ over distribution $\mu$ as $D_\mu(f) = \min_\pi\|\pi\|$ where $\pi$ ranges over all deterministic protocols that output the correct answer on inputs sampled from $\mu$ with probability at least 2/3.

Proposition(Yao’s minmax principle). For any communication problem $f:\{0,1\}^n\times\{0,1\}^n\rightarrow\{0,1\}$:

• $D_\mu(f)\leq R(f)$ for all input distributions $\mu$.
• $D_\mu(f)=R(f)$ for some input distribution $\mu$.

Yao’s minimax principle gives us a way of proving lower bounds for randomized protocols by instead considering deterministic ones on random inputs (we typically only use the first part, which follows from a simple averaging argument; the second part implies that this approach can always give us the tightest possible bound if we are able to find the “right” distribution $\mu$).

## An Example: Randomized Communication Complexity of Equality

Algorithm:

1. Generate a public random string $a\in_R\{0,1\}^n$ uniformly at random.
2. Alice sends $c=\sum_{i=1}^n a_ix_i\mod 2$ to Bob.
3. Bob also computes $c'=\sum_{i=1}^n a_iy_i\mod 2$, and outputs $\begin{cases}0&\text{with prob. }1/3\\ \begin{cases}1&c'=c\\0&c'\neq c\end{cases} &\text{with prob. }2/3\end{cases}$.

Analysis:

When $x\neq y$:

$\mathop{\text{Pr}}_{a\sim\{0,1\}^n}\left [\sum_{i=1}^n a_ix_i=\sum_{i=1}^n a_iy_i\mod 2\right ]=\frac{1}{2}$

Thus when $x=y$, with probability $2/3$ Bob outputs 1, and when $x\neq y$ , with probability $1/3+2/3\cdot 1/2=2/3$ Bob outputs 0.

The randomized communication cost is $O(1)$, since Alice only needs to send one bit to Bob. And with Newman’s theorem, we can turn the above protocol into a private-coin protocol.
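The analysis can be checked exhaustively for small $n$ (a minimal sketch; `agreement_prob` is a hypothetical helper that enumerates all public strings $a$ rather than sampling):

```python
from itertools import product

def inner_prod_bit(a, z):
    """Alice's/Bob's one-bit message: <a, z> mod 2."""
    return sum(ai * zi for ai, zi in zip(a, z)) % 2

def agreement_prob(x, y):
    """Probability over the public string a that c = c'."""
    strings = list(product([0, 1], repeat=len(x)))
    return sum(inner_prod_bit(a, x) == inner_prod_bit(a, y)
               for a in strings) / len(strings)

# When x = y the parities always agree; when x != y they agree with
# probability exactly 1/2, matching the analysis above.
n = 4
for x in product([0, 1], repeat=n):
    for y in product([0, 1], repeat=n):
        if x == y:
            assert agreement_prob(x, y) == 1.0
        else:
            assert agreement_prob(x, y) == 0.5
```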

It is also worth mentioning that one can prove an $\Omega(\log n)$ bit lower bound on the communication cost of any private coin protocol for the equality problem. This in turn implies the tightness of Newman’s theorem (for constant $\delta$).

# One-Way Communication Complexity: The Index Problem

Problem: In the index communication problem Ind, Alice gets a string $x\in\{0,1\}^n$ and Bob gets an index $i\in[n]$; the goal of the players is to output $x_i$, i.e., $\text{Ind}(x,i)=x_i$.

Theorem. Any randomized streaming algorithm for computing DE exactly requires $\Omega(R(\text{Ind})-\log n)$ bits of space, where Ind is on domain $\{0,1\}^n$.

Proof:

Assume there is a randomized streaming algorithm $\mathcal{A}$ for computing DE that requires $s$ bits of space. Now consider an instance $(x, i)$ of Ind problem. Define the sub-streams:

$\sigma_A:=\{j:x_j=1\},\quad\sigma_B:=\{i\}$

and let $\sigma:=\sigma_A\circ\sigma_B$ be the concatenation of the two sub-streams. An important property is that when $\text{Ind}(x,i)=1$, we have $\text{DE}(\sigma)=\text{DE}(\sigma_A)$, while when $\text{Ind}(x,i)=0$, we have $\text{DE}(\sigma)=\text{DE}(\sigma_A)+1$.

Similar to the previous proof, this gives a reduction from Ind to DE: a one-way protocol for Ind in which Alice runs $\mathcal{A}$ on $\sigma_A$ and sends its memory contents together with $\text{DE}(\sigma_A)$, and Bob finishes the run on $\sigma_B$ and compares $\text{DE}(\sigma)$ to $\text{DE}(\sigma_A)$. The communication is $s+O(\log n)$, so $R(\text{Ind})\leq s+O(\log n)$, i.e., $s=\Omega(R(\text{Ind})-\log n)$.

$\square$.
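As in the previous section, the reduction can be sanity-checked with an exact distinct-elements counter standing in for $\mathcal{A}$ (correctness only, not the space bound):

```python
def distinct_elements(stream):
    """Exact DE; its 'memory' (the set of seen elements) plays the role
    of the streaming algorithm's state handed from Alice to Bob."""
    return len(set(stream))

def ind_via_de(x, i):
    """Solve Ind(x, i) via the reduction, with 1-based indices as in the notes."""
    sigma_A = [j for j, b in enumerate(x, start=1) if b == 1]
    de_A = distinct_elements(sigma_A)
    de_full = distinct_elements(sigma_A + [i])
    # Appending i creates a new distinct element exactly when x_i = 0.
    return 1 if de_full == de_A else 0

x = [1, 0, 1]
assert [ind_via_de(x, i) for i in (1, 2, 3)] == [1, 0, 1]
```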

Definition(One-way communication complexity). In the one-way communication model, we only allow Alice to send a single message to Bob and Bob then needs to output the answer.

We then define

• One-way deterministic communication complexity: $\overrightarrow{D}(f):=\min_\pi\|\pi\|$ where $\pi$ ranges over all deterministic one-way protocols for $f$.
• One-way randomized communication complexity: $\overrightarrow{R}(f):=\min_\pi\|\pi\|$ where $\pi$ ranges over all randomized one-way protocols for $f$ with probability of success at least $2/3$.
• One-way distributional communication complexity: $\overrightarrow{D}_\mu(f):=\min_\pi\|\pi\|$ where $\pi$ ranges over all deterministic one-way protocols for $f$ with probability of success at least $2/3$ over the distribution $\mu$.

The bound $\overrightarrow{D}(\text{Ind})\geq n$ is immediate by the pigeonhole principle: if Alice's message had fewer than $n$ bits, then at least two different strings $x\neq y\in\{0,1\}^n$ would be mapped to the same message $M$. Now let $i$ be an index where $x_i\neq y_i$. The output of Bob is a deterministic function of $M$ and $i$, and is thus wrong for one of $x_i$ and $y_i$, a contradiction.

Theorem. $\overrightarrow{R}(\text{Ind})=\Omega(n)$.

Proof:

Let’s define the Hamming distance:

$\Delta(x,y):=|\{i\mid x_i\neq y_i\}|$

Lemma: For any parameter $\delta\in(0,\frac{1}{4})$, there exists a subset $\mathcal{F}\subseteq\{0,1\}^n$ with size $\exp(n\delta^2/4)$ such that for any two $x\neq y\in\mathcal{F}$, $\Delta(x,y)>n/2-\delta n$.

Proof:

If $x,y$ are uniformly sampled from $\{0,1\}^n$, we have:

$\mathbb{E}[\Delta(x,y)]=\sum_{i=1}^n\text{Pr}[x_i\neq y_i]=\frac{n}{2}$

By Chernoff bound,

$\text{Pr}[\Delta(x,y)\leq n/2-\delta n]\leq\text{Pr}[|\Delta(x,y)-\mathbb{E}[\Delta(x,y)]|\geq \delta n]\leq\exp(-\frac{\delta^2}{2}\cdot n)$

Let $t=\exp(n\delta^2/4)$ and suppose we sample strings $x_1,...,x_t$ independently and uniformly from $\{0,1\}^n$. By a union bound:

$\text{Pr}[\exists i\neq j,\Delta(x_i,x_j)\leq n/2-\delta n]\leq \binom{t}{2}\cdot\exp(-\delta^2 n/2)<1$

Thus there must exist $x_1,...,x_t$ satisfying the requirement; take $\mathcal{F}=\{x_1,...,x_t\}$.

$\square$.

The lemma gives a construction of a “large” family of strings in $\{0,1\}^n$ that are “far” from each other.
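A quick empirical companion to the lemma (a sketch: greedy rejection sampling, with hypothetical names; at these small parameters the greedy search typically finds an even larger family than the lemma guarantees):

```python
import random

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def far_family(n, delta, size, trials=100000):
    """Greedily collect random n-bit strings whose pairwise Hamming
    distance exceeds n/2 - delta*n (the lemma's threshold)."""
    threshold = n / 2 - delta * n
    family = []
    for _ in range(trials):
        cand = tuple(random.randint(0, 1) for _ in range(n))
        if all(hamming(cand, w) > threshold for w in family):
            family.append(cand)
            if len(family) == size:
                break
    return family

random.seed(0)
F = far_family(n=64, delta=0.1, size=20)
assert len(F) == 20
assert all(hamming(u, v) > 0.4 * 64 for u in F for v in F if u != v)
```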

We will use Yao’s minimax principle and show that there is a distribution $\mu$ with $\overrightarrow{D}_\mu(\text{Ind})=\Omega(n)$, even for protocols that output the correct answer with probability 0.9 as opposed to 2/3 (by standard amplification, this only changes the randomized complexity by a constant factor).

Applying the lemma above with $\delta=0.1$, we obtain a set $\mathcal{F}$ with $|\mathcal{F}|=2^{\Omega(n)}$ such that $\Delta(x,y)>0.4n$ for all distinct $x,y\in\mathcal{F}$. In the distribution $\mu$, we sample $x\in\mathcal{F}$ uniformly at random and give it to Alice, and independently sample $i\in[n]$ uniformly at random and give it to Bob.

Now consider any deterministic one-way protocol $\pi$ for Ind on the distribution $\mu$ with probability of success at least 0.9. By definition:

$\mathop{\text{Pr}}_{x\in\mathcal{F}}\mathop{\text{Pr}}_{i\in[n]}[\pi\text{ errs on input }(x,i)]\leq 0.1$

Let $\mathcal{F}'$ be the following subset of $\mathcal{F}$:

$\mathcal{F}'=\{x\in\mathcal{F}\mid \mathop{\text{Pr}}_{i\in[n]}[\pi\text{ errs on input }(x,i)]\leq 0.2\}$

We have $|\mathcal{F}'|\geq|\mathcal{F}|/2$ as otherwise:

\begin{aligned} \mathop{\text{Pr}}_{x\in\mathcal{F}}\mathop{\text{Pr}}_{i\in[n]}[\pi\text{ errs on input }(x,i)]&\geq\frac{|\mathcal{F}\setminus\mathcal{F}'|}{|\mathcal{F}|}\cdot\min_{x\in\mathcal{F}\setminus\mathcal{F}'}\mathop{\text{Pr}}_{i\in[n]}[\pi\text{ errs on input }(x,i)]\\ &>\frac{1}{2}\cdot 0.2=0.1 \end{aligned}

contradicting the success guarantee of $\pi$.

For any $x\in\mathcal{F}'$, let $M(x)$ denote the message sent by Alice to Bob when her input is $x$ (note that $M(x)$ is a deterministic function of $x$). We claim that given only $M(x)$, for any $x\in\mathcal{F}'$, we can recover the string $x$ entirely. The proof is as follows.

Given $M(x)$, define the vector $y(x)\in\{0,1\}^n$ such that for any $i\in [n]$, $y(x)_i$ is the output of the protocol when Alice's message is $M(x)$ and Bob's input is $i$. Note that $y(x)$ is a deterministic function of $M(x)$ only. Moreover, we have

$\Delta(y(x),x)\leq 0.2n$

by the definition of $\mathcal{F}'$, as at most a 0.2 fraction of the entries of $y(x)$ can differ from those of $x$. On the other hand, for any $z\in\mathcal{F}$ with $z\neq x$, we have

\begin{aligned} \Delta(y(x),z)&\geq \Delta(z,x)-\Delta(y(x),x)\qquad\text{(triangle inequality)}\\ &>0.4n-0.2n=0.2n\\ &\geq \Delta(y(x),x) \end{aligned}

As such, given $y(x)$, we can simply find the string $w\in\mathcal{F}$ that minimizes $\Delta(y(x),w)$; by the two bounds above, this string must be $x$, allowing us to recover $x$ from $y(x)$. Since $y(x)$ itself is a deterministic function of $M(x)$, we obtain a one-to-one mapping $x\mapsto M(x)$ on $\mathcal{F}'$. This implies that Alice must send $|\mathcal{F}'|=2^{\Omega(n)}$ different messages, so the length of her message must be $\Omega(n)$ bits, concluding the proof.

$\square$.
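The recovery argument is just nearest-codeword decoding; a toy sketch (the family `F` below is hypothetical, with pairwise distance $>0.4n$ for $n=10$):

```python
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode(y, family):
    """Return the family member closest to y in Hamming distance."""
    return min(family, key=lambda w: hamming(y, w))

# Pairwise distances in F are at least 5 > 0.4n, so corrupting at most
# 0.2n = 2 coordinates of any member is always corrected.
F = [(0,) * 10, (1,) * 10, (0, 1) * 5, (1, 0) * 5]
for x in F:
    y = list(x)
    y[0] ^= 1
    y[5] ^= 1          # flip two coordinates
    assert decode(tuple(y), F) == x
```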

## A Reduction from Nearest Neighbor Search to the Index Problem

Problem(nearest neighbor search, NNS). Given a stream of points $a_1,a_2,...,a_m$ in the 2-dimensional Euclidean grid $[n]^2$, followed by a query point $b$ at the end of the stream, and assuming $m>n$, compute:

$\min_{i=1,...,m}\|a_i-b\|$

Intuitively, if we could solve this problem with little space, then with the same space we could determine whether a given element belongs to the set of stream elements or not, which seems quite difficult.

Theorem. Any streaming algorithm that solves NNS exactly requires $\Omega(n)$ bits of storage.

Proof:

The proof is by a reduction from Ind. Assume there exists a streaming algorithm $\mathcal{A}$ computing NNS with $s$ bits of space. Consider an instance $(x,i)$ of the Ind problem and define the points:

\begin{aligned} &a_j=(j,x_j)\text{ for }j=1,...,n\\ &b=(i,0) \end{aligned}

and sub-streams:

$\sigma_A=\{a_1,a_2,...,a_n\}\quad \sigma_B=\{b\}$

Let Alice run $\mathcal{A}$ on $\sigma_A$ and send its memory contents to Bob, which needs $s$ bits. (No answer is needed at this point; the output of NNS on $\sigma_A$ alone is irrelevant to the reduction.)

Then Bob continues to run $\mathcal{A}$ on $\sigma_B$. If $\mathcal{A}$ outputs $0$, there exists a point $a_j$ satisfying:

$\|a_j-b\|=0\Rightarrow j=i\text{ and }x_i=0$

thus Bob outputs 0 as the answer to Ind; otherwise, he outputs 1. Thus, we have $s\geq\overrightarrow{R}(\text{Ind})=\Omega(n)$.

$\square$.
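The reduction can be simulated end to end with an exact NNS solver standing in for $\mathcal{A}$ (its 'memory' here is the whole point set, so this only checks correctness, not space):

```python
def ind_via_nns(x, i):
    """Solve Ind(x, i) via the reduction: stream a_j = (j, x_j), query b = (i, 0)."""
    a = [(j, xj) for j, xj in enumerate(x, start=1)]   # Alice's sub-stream
    b = (i, 0)                                         # Bob's query point
    nearest2 = min((px - b[0]) ** 2 + (py - b[1]) ** 2 for px, py in a)
    # Distance 0 means (i, 0) appears in the stream, i.e. x_i = 0.
    return 0 if nearest2 == 0 else 1

x = [1, 0, 1, 1, 0]
for i in range(1, len(x) + 1):
    assert ind_via_nns(x, i) == x[i - 1]
```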