# Can You Paint by Number?

## This Week's Fiddler

When we start a new cluster, we have one square guaranteed, and then there's a 50% chance the next square will continue the cluster. We want to find the average size of said cluster.

It's a geometric distribution with a $0.5$ probability of failure! In essence, the cluster continues to grow so long as there is "success". While the failure itself does not contribute to the count, we start counting at 1, evening out the ends of any off-by-one offset. Thus, the expected value is $\frac{1}{1-0.5}=2$.

Interestingly, we can also reason about this another way. We know that the chance of a cluster being size 1 is $\frac{1}{2}$, the chance of it being size 2 is $\frac{1}{2^2}$, the chance of it being size 3 is $\frac{1}{2^3}$, etc. Thus, the expected cluster size is $\sum_{n=1}^{\infty}\frac{n}{2^n}$. This is a known identity (using calculus) which evaluates to 2, confirming our answer.

**Answer:** 2 squares

## Extra Credit

Let's consider a rectangular "chunk": the resulting rectangle obtained when drawing two vertical lines in the strip such that the lines do not go through any cluster. Specifically, let's consider the minimum such chunk, that is, the rectangular chunks that contain no smaller rectangular chunks within them. For example, if the example in the problem were to be chunked, it would look like this:

### How many squares per chunk?

Note that chunks stop if and only if each row, going from one column to the next, flips color. Since there are 2 rows and 2 colors, the chance of this happening on each column border is $(\frac{1}{2})^2=\frac{1}{4}$, meaning we expect the average chunk to contain 4 columns. As there are 2 rows per column, this makes for 8 squares per chunk.

However, that's not what we want to find out; we want to find out the average number of squares per *cluster*. Luckily, while chunks can contain multiple clusters, we have constructed them in such a way that they cleanly divide up clusters; that is, we can count how many clusters we expect per chunk.

### How many clusters per chunk?

For each column, let's define two possibilities: $S$ (the two squares have the same color) and $D$ (the two squares have different colors). Let's try to find $E(S)$ and $E(D)$, the expected number of clusters in a chunk expected to be added after a column with an $S$ column and $D$ column, respectively.

Let's say our chunk starts with an $S$ column. The next column could be a nonmatching $S$ column ($\frac{1}{4}$ chance), a matching $S$ column ($\frac{1}{4}$ chance), or a $D$ column ($\frac{1}{2}$ chance). If it is nonmatching, the chunk ends, and the value is 0. If it is matching, no new clusters are added, so the value is $E(S)$. If it is a $D$ column, one cluster is added (whichever color is unmatching), making the value $1+E(D)$. All together, we get that $E(S)=\frac{1}{4}E(S)+\frac{1}{2}(1+E(D))=\frac{2}{3}(1+E(D))$.

Now let's say that our chunk starts with a $D$ column. The next column could be nonmatching $D$ column ($\frac{1}{4}$ chance), a matching $D$ column ($\frac{1}{4}$ chance), or an $S$ column ($\frac{1}{2}$ chance). Like before, a nonmatching column ends the chunk, making the value 0. A matching column adds no new clusters, so the value is $E(D)$. If it is an $S$ column, no new cluster is added, so the value is $E(S)$. All together, we have $E(D)=\frac{1}{4}E(D)+\frac{1}{2}E(S)=\frac{2}{3}E(S)$.

With substitution, we get that $E(S)=\frac{2}{3}(1+\frac{2}{3}E(S))=\frac{2}{3}+\frac{4}{9}E(S)=\frac{2}{3}\cdot\frac{9}{5}=\frac{6}{5}$, and $E(D)=\frac{2}{3}E(S)=\frac{2}{3}\cdot\frac{6}{5}=\frac{4}{5}$.

How does this translate to chunks overall? To find the expected number of regions per chunk, $E(*)$, we consider $S$ as the first column, which yields $1+E(S)$ clusters, and $D$ as the first column, which yields $2+E(D)$ clusters (as $D$ necessitates the start of two differently colored clusters). Each has an equal chance, so $E(*)=\frac{1}{2}\bigl(1+E(S)+2+E(D)\bigl)=\frac{1}{2}\bigl(1+\frac{6}{5}+2+\frac{4}{5}\bigl)=\frac{5}{2}$.

This means we expect 2.5 clusters per chunk.

### So how many squares per cluster?

With 8 squares per chunk and 2.5 clusters per chunk, we can expect $\frac{8}{2.5}=\frac{16}{5}=3.2$ squares per cluster!

To verify my answer, I made a simulation:

## Code

```
import random
random.seed(12)
T = 10**7
all_regions: list[int] = []
a, b = random.randrange(2), random.randrange(2)
regions = [2 - a - b, a + b]
for t in range(T):
c, d = random.randrange(2), random.randrange(2)
if c == d:
regions[c] += 2
all_regions.append(regions[1 - c])
regions[1 - c] = 0
else:
if a == b or a == c:
regions[c] += 1
regions[d] += 1
else:
all_regions.extend(regions)
regions[c] = 1
regions[d] = 1
a, b = c, d
all_regions.extend(regions)
whole_regions = [region for region in all_regions if region]
print(sum(whole_regions) / len(whole_regions))
```

Output: `3.2002681184361506`

Seems about right!

**Answer:** 3.2 squares

## Beyond

If I had more time, I'd love to explore what implications this has for strips with more than 2 rows... or more than 2 colors! Sitting tight for the person/people who end up doing that :)

— Emily