• Emily Zhang
  • Fiddler

Can You Paint by Number?

2024 Apr 52024 April 5 Problem Back

This Week's Fiddler

When we start a new cluster, we have one square guaranteed, and then there's a 50% chance the next square will continue the cluster. We want to find the average size of said cluster.

It's a geometric distribution with a 0.50.50.5 probability of failure! In essence, the cluster continues to grow so long as there is "success". While the failure itself does not contribute to the count, we start counting at 1, evening out the ends of any off-by-one offset. Thus, the expected value is 11−0.5=2\frac{1}{1-0.5}=21−0.51​=2.

Interestingly, we can also reason about this another way. We know that the chance of a cluster being size 1 is 12\frac{1}{2}21​, the chance of it being size 2 is 122\frac{1}{2^2}221​, the chance of it being size 3 is 123\frac{1}{2^3}231​, etc. Thus, the expected cluster size is ∑n=1∞n2n\sum_{n=1}^{\infty}\frac{n}{2^n}∑n=1∞​2nn​. This is a known identity (using calculus) which evaluates to 2, confirming our answer.

Answer: 2 squares


Extra Credit

Let's consider a rectangular "chunk": the resulting rectangle obtained when drawing two vertical lines in the strip such that the lines do not go through any cluster. Specifically, let's consider the minimum such chunk, that is, the rectangular chunks that contain no smaller rectangular chunks within them. For example, if the example in the problem were to be chunked, it would look like this:

Chunked 2-cm-wide strip

How many squares per chunk?

Note that chunks stop if and only if each row, going from one column to the next, flips color. Since there are 2 rows and 2 colors, the chance of this happening on each column border is (12)2=14(\frac{1}{2})^2=\frac{1}{4}(21​)2=41​, meaning we expect the average chunk to contain 4 columns. As there are 2 rows per column, this makes for 8 squares per chunk.

However, that's not what we want to find out; we want to find out the average number of squares per cluster. Luckily, while chunks can contain multiple clusters, we have constructed them in such a way that they cleanly divide up clusters; that is, we can count how many clusters we expect per chunk.


How many clusters per chunk?

For each column, let's define two possibilities: SSS (the two squares have the same color) and DDD (the two squares have different colors). Let's try to find E(S)E(S)E(S) and E(D)E(D)E(D), the expected number of clusters in a chunk expected to be added after a column with an SSS column and DDD column, respectively.

Let's say our chunk starts with an SSS column. The next column could be a nonmatching SSS column (14\frac{1}{4}41​ chance), a matching SSS column (14\frac{1}{4}41​ chance), or a DDD column (12\frac{1}{2}21​ chance). If it is nonmatching, the chunk ends, and the value is 0. If it is matching, no new clusters are added, so the value is E(S)E(S)E(S). If it is a DDD column, one cluster is added (whichever color is unmatching), making the value 1+E(D)1+E(D)1+E(D). All together, we get that E(S)=14E(S)+12(1+E(D))=23(1+E(D))E(S)=\frac{1}{4}E(S)+\frac{1}{2}(1+E(D))=\frac{2}{3}(1+E(D))E(S)=41​E(S)+21​(1+E(D))=32​(1+E(D)).

Now let's say that our chunk starts with a DDD column. The next column could be nonmatching DDD column (14\frac{1}{4}41​ chance), a matching DDD column (14\frac{1}{4}41​ chance), or an SSS column (12\frac{1}{2}21​ chance). Like before, a nonmatching column ends the chunk, making the value 0. A matching column adds no new clusters, so the value is E(D)E(D)E(D). If it is an SSS column, no new cluster is added, so the value is E(S)E(S)E(S). All together, we have E(D)=14E(D)+12E(S)=23E(S)E(D)=\frac{1}{4}E(D)+\frac{1}{2}E(S)=\frac{2}{3}E(S)E(D)=41​E(D)+21​E(S)=32​E(S).

With substitution, we get that E(S)=23(1+23E(S))=23+49E(S)=23⋅95=65E(S)=\frac{2}{3}(1+\frac{2}{3}E(S))=\frac{2}{3}+\frac{4}{9}E(S)=\frac{2}{3}\cdot\frac{9}{5}=\frac{6}{5}E(S)=32​(1+32​E(S))=32​+94​E(S)=32​⋅59​=56​, and E(D)=23E(S)=23⋅65=45E(D)=\frac{2}{3}E(S)=\frac{2}{3}\cdot\frac{6}{5}=\frac{4}{5}E(D)=32​E(S)=32​⋅56​=54​.

How does this translate to chunks overall? To find the expected number of regions per chunk, E(∗)E(*)E(∗), we consider SSS as the first column, which yields 1+E(S)1+E(S)1+E(S) clusters, and DDD as the first column, which yields 2+E(D)2+E(D)2+E(D) clusters (as DDD necessitates the start of two differently colored clusters). Each has an equal chance, so E(∗)=12(1+E(S)+2+E(D))=12(1+65+2+45)=52E(*)=\frac{1}{2}\bigl(1+E(S)+2+E(D)\bigl)=\frac{1}{2}\bigl(1+\frac{6}{5}+2+\frac{4}{5}\bigl)=\frac{5}{2}E(∗)=21​(1+E(S)+2+E(D))=21​(1+56​+2+54​)=25​.

This means we expect 2.5 clusters per chunk.


So how many squares per cluster?

With 8 squares per chunk and 2.5 clusters per chunk, we can expect 82.5=165=3.2\frac{8}{2.5}=\frac{16}{5}=3.22.58​=516​=3.2 squares per cluster!

To verify my answer, I made a simulation:

Code
import random


random.seed(12)

T = 10**7
all_regions: list[int] = []
a, b = random.randrange(2), random.randrange(2)
regions = [2 - a - b, a + b]
for t in range(T):
    c, d = random.randrange(2), random.randrange(2)
    if c == d:
        regions[c] += 2
        all_regions.append(regions[1 - c])
        regions[1 - c] = 0
    else:
        if a == b or a == c:
            regions[c] += 1
            regions[d] += 1
        else:
            all_regions.extend(regions)
            regions[c] = 1
            regions[d] = 1
    a, b = c, d
all_regions.extend(regions)
whole_regions = [region for region in all_regions if region]
print(sum(whole_regions) / len(whole_regions))

Output: 3.2002681184361506

Seems about right!

Answer: 3.2 squares


Beyond

If I had more time, I'd love to explore what implications this has for strips with more than 2 rows... or more than 2 colors! Sitting tight for the person/people who end up doing that :)


— Emily

Back to Fiddler list
© 2024 Emily Bradon Zhang·Support Gaza Recovery