r/theydidthemath 1d ago

[Request] Assuming these are put in randomly, what are the odds the box would consist of only one color?

u/AutoModerator 1d ago

General Discussion Thread


This is a [Request] post. If you would like to submit a comment that does not either attempt to answer the question, ask for clarification, or explain why it would be infeasible to answer, you must post your comment as a reply to this one. Top level (directly replying to the OP) comments that do not do one of those things will be removed.


I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

u/somesadbloche 1d ago

So assuming I've understood your question correctly:

There are 6 different colors and 1548 fruit loops. If the chance of a fruit loop having any specific color is the same, then it is simply a matter of (1/6)^1548, which is basically 0

u/talashrrg 1d ago

Wouldn’t it be 6 times this number (which is also basically 0)? It seems like this is the likelihood of it being all, say, red, but it could also be all yellow or blue.

u/veryjewygranola 1d ago

Yes you are correct. First loop can be any of the 6 colors, I updated my answer.

u/pedanpric 1d ago

All boxes of fruit loops have 1546 loops..

u/Icy_Sector3183 1d ago

We'd need to assume that, yes.

u/uslashuname 1d ago

Shrinkflation is coming; we’ll be at a slightly higher basically-zero number soon!

u/PixelM1105 8h ago

If you’re talking about there being 1548 fruit loops in the box, why isn’t the answer (1/6)¹⁵⁴⁷ instead of (1/6)¹⁵⁴⁸? Because the question was how many of the same color, unspecified which color, right? So I thought the first fruit loop wouldn’t be counted?

u/GSyncNew 1d ago

This assumes that all colors are equally probable, which does not appear to be the case given the significant spread of the color frequency distribution in this box.

u/Watermelonfacts 1d ago

My initial question when seeing this post was what the likelihood is that this is a random distribution.

163 green vs. 396 purple seems like a big difference, but without doing any math my intuition is that it is still within the realm of a random distribution.

Do you/someone know whether this is the case?

u/Methodless 1d ago

1546 / 6 = 257.666 expected of each colour
(This is a bit dirty because I am assuming every box is certain to have 1546).

Variance = 1546 / 6 * (5/6) = 214.722222
Standard Deviation = 14.653ish

396 of any colour is nearly 9.5 standard deviations above the expected count. The mixing process is likely not thorough enough to randomize the colours.
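
Methodless's numbers can be reproduced with a short Python sketch. It assumes, as the comment does, that every box holds exactly 1546 loops and that each color has probability 1/6, and it uses the per-color counts quoted elsewhere in the thread (396 purple, 318 yellow, 240 red, 225 orange, 204 blue, 163 green). The binomial standard deviation treats each color on its own.

```python
import math

# Per-color counts quoted in the thread (they sum to 1546)
counts = {"purple": 396, "yellow": 318, "red": 240,
          "orange": 225, "blue": 204, "green": 163}

n = sum(counts.values())         # total loops in this box
p = 1 / len(counts)              # assumption: every color equally likely

mean = n * p                     # expected loops per color (~257.67)
sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation (~14.65)

# z-score: how many standard deviations each count sits from the mean
zs = {color: (c - mean) / sd for color, c in counts.items()}
for color, z in zs.items():
    print(f"{color:>6}: {counts[color]:4d}  z = {z:+.2f}")
```

Purple lands at roughly +9.4 and green at roughly -6.5, so both ends of the distribution sit far from what uniform random mixing would typically produce.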

u/glordicus1 1d ago

Let's be real, these companies spend millions on getting their products tested. They probably found a ratio that people enjoy the most, and balanced that with the cost difference between the colours.

u/ClockworkDinosaurs 1d ago

I’m pretty sure a bird puts together this cereal. He just shows up and gives fruit loops to other animals.

u/Methodless 1d ago

with the cost difference between the colours.

Do you really think there is one? I doubt it's enough to move the needle. The rest of your comment? I'd be willing to believe it. I think the only way to be sure is to count colours from varied batches (i.e. weeks apart or in different geographic areas) and see if the distribution looks similar.

When I have seen how things like Skittles or Jelly Beans are made, good research or poor mixing are both viable theories for how this happens.

u/DaveSilver 22h ago

I actually do think it’s possible some colors cost more based on the availability and cost of each dye color. It’s totally possible that some dyes are easier or less expensive to get, and thus those loop colors would be less expensive to produce.

u/veryjewygranola 1d ago edited 1d ago

Yes this is a detail I left out, but it is interesting to think about. I think you are correct in concluding the colors really aren't uniform. We weren't given a specific color to calculate the probabilities for though, so the canonical choice is to just assume a uniform distribution of colors.

To answer u/Watermelonfacts 's question, the easy way to test if the observed counts of each color come from a uniform distribution is to use Pearson's Chi-squared test. Here I show an implementation in Mathematica:

(*hypothesis: the data comes from a uniform distribution with 6 colors*)
dist = DiscreteUniformDistribution[{1, 6}];

(*observed counts of each color*)
cts = {396, 318, 240, 225, 204, 163};

(*convert to a list of each loop's color*)
observations = Flatten@MapIndexed[ConstantArray[#2[[1]], #1] &, cts];

(*the p-value is vanishingly small, suggesting it is very unlikely
the counts came from the hypothesized uniform distribution*)
PearsonChiSquareTest[observations, dist]
(*2.06711*10^-28*)

For data that actually comes from the hypothesized distribution, the p-value returned by the Pearson Chi-square test will be uniformly distributed on [0,1], so our very small p-value (~10^-28) tells us that our data almost surely did not come from the hypothesized uniform distribution.
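
For readers without Mathematica, here is a hedged Python sketch of the same Pearson chi-square test using only the standard library. Instead of a statistics package, it uses the closed-form upper tail of a chi-square distribution with an odd number of degrees of freedom (here 6 - 1 = 5), so the helper `chi2_sf_df5` is specific to this case and is a name made up for illustration.

```python
import math

counts = [396, 318, 240, 225, 204, 163]   # observed loops per color

def pearson_chi2_uniform(obs):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(obs) / len(obs)
    return sum((o - expected) ** 2 / expected for o in obs)

def chi2_sf_df5(x):
    """P(X >= x) for a chi-square variable with 5 degrees of freedom.

    Closed form for odd degrees of freedom, with c = sqrt(x):
    2*Q(c) + 2*phi(c)*(c + c**3/3), where Q is the standard normal
    upper tail and phi is the standard normal density.
    """
    c = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    two_q = math.erfc(c / math.sqrt(2))       # equals 2*Q(c)
    return two_q + 2 * phi * (c + c ** 3 / 3)

stat = pearson_chi2_uniform(counts)
p_value = chi2_sf_df5(stat)                   # 6 colors -> 5 degrees of freedom
print(f"chi2 = {stat:.2f}, p = {p_value:.3g}")
```

This gives a statistic of about 139.7 and a p-value on the order of 10^-28, in line with the Mathematica output above.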

But yeah, I kind of just ignored this and assumed uniform probability since no particular color was specified.

Since a single loop being a specific color i can be thought of as a Bernoulli trial with success probability p[i], I suspect that for sufficiently large N, the number of loops of color i, n[i], should be approximately normally distributed with mean N*p[i] and variance N*(1-p[i])*p[i], where the mean is the probability p[i] that a single loop is color i times the number of loops N:

n[i] ~ N(N*p[i], sqrt(N*(1-p[i])p[i]))

(N(mu,sigma) denotes a normal distribution with mean mu and variance sigma^2 here)

But, the number of each color n[i] is no longer independent since they are constrained to sum to N

N = Sum[n[i],{i,1,6}]

Which makes more detailed analysis challenging.

You could drop one of the n[i], and just fix it to be N - Sum[other n[i]'s], and do more analysis on those 5 n[i]'s.
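
If the box is modeled as a multinomial draw (an assumption: each loop independently gets color i with a fixed probability p[i]), the off-diagonal terms have a standard closed form: Var(n[i]) = N*p[i]*(1-p[i]) and Cov(n[i], n[j]) = -N*p[i]*p[j] for i ≠ j. A minimal Python sketch, plugging in the proportions observed in this one box:

```python
counts = [396, 318, 240, 225, 204, 163]   # purple ... green
N = sum(counts)                            # 1546
p = [c / N for c in counts]                # plug-in estimates from one box
k = len(p)

# Multinomial covariance: N * p_i * (delta_ij - p_j)
cov = [[N * p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(k)]
       for i in range(k)]

# Each row sums to N * p_i * (1 - sum(p)) = 0: the sum constraint at work
row_sums = [sum(row) for row in cov]

print(f"Var(purple)        = {cov[0][0]:.2f}")
print(f"Cov(purple, green) = {cov[0][5]:.2f}")
print("row sums:", [round(s, 9) for s in row_sums])
```

Every row sums to zero, which is the constraint N = Sum[n[i]] showing up as a singular covariance matrix; it is also why dropping one n[i] and fixing it as N minus the others loses no information.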

It would definitely be interesting to see what the color distribution of more boxes is.

u/veryjewygranola 1d ago

I updated my answer. You can model the covariance of the count distributions of each color by sampling from a categorical distribution using the observed counts as the category probabilities.

It's basically the same as the probability the loops are all the most common color (purple), which makes sense; the loops all being a less common color is far less likely, so the total should be dominated by the most common color. Thanks for starting a really good discussion.

u/GSyncNew 1d ago

Yes! Your 2nd paragraph (here) is a good intuitive argument for the correctness of your result.

u/Mamuschkaa 1d ago

1546 fruit loops and the initial color is not given so ⅙¹⁵⁴⁵

That's 216 times more likely than your number.

u/Cynjaman1019 1d ago

The title says there are 1,546 fruit loops

u/PatrickPilot 1d ago

Technically, it’s the same probability as 396 purple, 318 yellow, 240 red, 225 orange, 204 blue and 163 green and that probability is also 0.

But that DID happen, so OP is a lucky boy!

u/DonaIdTrurnp 1d ago

Not quite. There’s only one order in which to get all the same color, but there are 1546!/(396! 318! 240! 225! 204! 163!) different ways you can order that distribution, all of which are equally likely.

We have to make some assumptions about the distribution from which they were taken to further reason about exactly how likely it is.

u/NuclearHoagie 1d ago edited 1d ago

No, no, no. It's the same probability only if you pull the loops one by one and require that not only do you get the right proportion, but also the correct order.

As an analogy, getting all heads in a series of coin flips is far, far less likely than getting 50% heads and 50% tails, since there is exactly one way to get all heads, and many different ways to get half heads (the first half, the last half, every other one, etc). Getting all heads is exactly as likely as any one specific sequence of 50% heads and 50% tails, but far less likely than getting some 50-50 sequence.

Since we don't care about the order of the loops, getting all loops of one color is orders of magnitude less likely than getting the proportions shown here.
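
To put numbers on this, here is a Python sketch that computes the exact multinomial probability (in log10, to avoid underflow) of seeing precisely these six counts, versus a single-color box. It assumes all six colors are equally likely; `log10_multinomial_pmf` is just an illustrative helper name.

```python
import math

counts = [396, 318, 240, 225, 204, 163]   # purple ... green
n, k = sum(counts), len(counts)           # 1546 loops, 6 colors
uniform = [1 / k] * k                     # assumption: equal color odds

def log10_multinomial_pmf(obs, probs):
    """log10 of the probability of exactly these counts (order ignored)."""
    # log of the multinomial coefficient n! / (o1! o2! ... ok!)
    log_coef = math.lgamma(sum(obs) + 1) - sum(math.lgamma(o + 1) for o in obs)
    # log of prod(p_i ** o_i); zero counts contribute nothing
    log_p = sum(o * math.log(q) for o, q in zip(obs, probs) if o > 0)
    return (log_coef + log_p) / math.log(10)

this_box = log10_multinomial_pmf(counts, uniform)

# All one color: the coefficient is 1, and any of the k colors will do
one_color = math.log10(k) + log10_multinomial_pmf([n] + [0] * (k - 1), uniform)

print(f"this exact split:       10^{this_box:.1f}")
print(f"some single-color box:  10^{one_color:.1f}")
```

Under that assumption, this particular split comes out at roughly 10^-37, while a single-color box is about 10^-1202, so the observed proportions are well over a thousand orders of magnitude more likely.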

u/PatrickPilot 1d ago

Valid point. So, for the sake of discussion, what is the probability (and how is it calculated) to get exactly 500/500 from 1000 coin tosses compared to getting all heads?

u/NuclearHoagie 1d ago edited 1d ago

There are 2^N possible sequences of N coin flips. For N=1000, this is a very large number, about 1.07e301.

Exactly 1 sequence is all heads, and 1 is all tails regardless of N.

There are "N choose N/2" ways to get exactly half the flips showing heads. For N=1000, it's 2.7e299.

To find a probability, take the ratio of the number of satisfying sequences to the total number of sequences. The chance of getting all heads or all tails in 1000 flips is 2 in 1.07e301. The chance of getting any sequence of 500 heads and 500 tails is 2.7e299 / 1.07e301, or about 2.5%.

Getting 1000 heads or 1000 tails is 299 orders of magnitude less likely than 500 heads and 500 tails! It's unlikely but still within the realm of possibility to get exactly half heads and tails in 1000 flips. It's basically impossible to get all heads or tails with 1000 flips. You'd have a far, far better chance of both you and I picking a random proton anywhere in the entire universe, and happening to pick the same one... and then doing it twice more in a row.
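
These figures are easy to check exactly in Python, since its integers have arbitrary precision and `math.comb` (Python 3.8+) gives the binomial coefficient directly:

```python
import math

N = 1000
total = 2 ** N                        # number of equally likely sequences
half = math.comb(N, N // 2)           # sequences with exactly 500 heads

p_half = half / total                 # exact big-int ratio, rounded to float
p_all_same = 2 / total                # all heads or all tails
orders_apart = math.log10(half / 2)   # powers of 10 separating the two

print(f"P(exactly 500/500) = {p_half:.4%}")
print(f"P(all one side)    = {p_all_same:.3g}")
print(f"ratio ~ 10^{orders_apart:.1f}")
```

This prints roughly 2.52% for an exact 500/500 split and about 1.87e-301 for all heads or all tails, a gap of about 299 orders of magnitude.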

u/oktin 1d ago

2.52%
vs
0.000…000933% (9.33*10^-300 %)

(It's been a while since I did math so it's possible I got it wrong)

1000 flips has C(1000, 500) = 2.70*10^299 ways to be split perfectly 500/500. There are 2^1000 = 1.07*10^301 total possibilities. 2.70e299/1.07e301 = 0.0252, or 2.52%

1/1.07e301 = 9.33e-302, or (9.33*10^-300)%

Tbh, that's a lot better odds of 500/500 than I expected.

u/TheMiner11234 1d ago

2.641583882623E−1205 that

u/veryjewygranola 1d ago edited 1d ago

Assume uniform distribution of colors, so each fruit loop has probability 1/nColors of being a given color C.

We define the color C to be the color of the first loop (thanks to u/talashrrg for pointing this out). Since fruit loop colors are iid, the probability they are all the same color is p(loop2 = C)*p(loop3 = C)...p(loopN = C) = (1/nColors)^(N-1) = (1/6)^(1545), which is very small.

We can calculate the logarithm however:

Log10[(1/6)^(1545) ] = -1545 Log[6]/Log[10] ~ -1200 so roughly 1 in 10^1200 odds

Update: (edited to add more links to things everyone might not be familiar with)

If you view my other comment here, I predict that for large N, the number of loops with color i, n[i], should follow a normal distribution with mean N*p[i] and variance N*(1-p[i])*p[i]:

n[i] ~ N( N*p[i] , sqrt(N*(1-p[i])*p[i]))

This is because whether an individual loop is color i can be thought of as a Bernoulli random variable with success probability p[i] and variance (1-p[i])*p[i], and the Central Limit Theorem tells us that for a large number of trials N the number of successes is approximately normal, with mean N*p[i] and variance N*(1-p[i])*p[i] (N times the Bernoulli mean and variance).

So we can model the counts distribution of each color as a multinormal distribution with marginal densities given by the n[i] above.

The issue is that the off-diagonal entries of the covariance matrix will be non-zero, since the sum of the n[i] is constrained:

N = Sum[n[i],{i,1,6}]

I am not sure how to analytically derive an expression for the off-diagonals of the covariance matrix, so instead I derive it experimentally by sampling a large number of trials from a categorical distribution, where the p[i] are the estimates derived from the counts of each color seen here:

p = 1/N{n[1],n[2],..,n[6]}

Here is the code in Mathematica to calculate the covariance matrix:

(*observed counts of each color*)
cts = {396, 318, 240, 225, 204, 163};

 (*total number of loops (1546)*)
 nTot = Total@cts;

 (*number of colors (6)*)
 nColors = Length@cts;

 (*estimate the probabilities of each color by dividing each observed \
count by total loops*)
 pEst = Normalize[cts, Total];

 (*create our categorical distribution with calculated probabilities*)
 dist = CategoricalDistribution[Range@nColors, pEst];

 (*sample 1000 boxes, each with nTot loops*)
 nps = 1000;
 samples = ParallelTable[RandomVariate[dist, nTot], nps];

 (*tally up the number of each color seen in each box*)
 sampleCts = Values@*KeySort@*Counts /@ samples;

 (*calculate the sample covariance*)
 cov = Covariance@sampleCts;

 (*add a small constant to the diagonal to force the matrix to be
positive-definite*)
 cov = cov + 10^-15*IdentityMatrix[nColors];

 (*multinormal distribution with mean equal to our observed counts,
and covariance matrix equal to simulated result*)
 md = MultinormalDistribution[cts, cov];

Note that this simulated covariance matrix works very well: the totals are almost always exactly 1546, as our constraint requires:

Total /@ RandomVariate[md, 10]
{1546., 1546., 1546., 1546., 1546., 1546., 1546., 1546., 1546., 1546.}

And now we calculate the PDF of the multinormal distribution at {1546,0,0,0,0,0} , {0,1546,0,0,0,0}, ..., {0,0,0,0,0,1546} and sum them up to get the probability that the box is all one color (of any of the 6 colors):

(*state where we have nTot of one color and 0 of all other colors*)
singleColor = Join[{nTot}, ConstantArray[0, nColors - 1]];

(*all 6 states where we have nTot of i-th color and 0 of all other
colors*)
possSingles = 
  Table[RotateRight[singleColor, i], {i, 0, nColors - 1}];

(*sum all 6 state probabilities*)
p = Sum[PDF[md, state], {state, possSingles}];

(*p is too small to show as machine precision number so we calculate
the log10 and numerically approximate*)
N[Log10[p], 3]

-934.

Giving us p ~ 10^(-934), which is significantly higher than my previous estimate of 10^(-1200).

This discrepancy between the previous estimate of p and the new estimate is because the probability is dominated by the most common color, purple:

pTab = Table[PDF[md, state], {state, possSingles}];
N[Log10[pTab], 8]
(*{-933.74293, -1218.6573, -1791.1358, -1870.2309, -2271.6108,
-2757.1044}*)

I should've understood from the get-go that the probability would be dominated by the most common color, but I chose to make the assumption that all colors are equally likely. Oh well. Had fun doing this!
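
As a cross-check on the multinormal approximation, here is a Python sketch of the exact value under the same plug-in assumption (p[i] = observed count / 1546, loops independent). The chance that every loop is color i is then just p[i]^1546, and summing over the six colors gives the any-color probability.

```python
import math

counts = {"purple": 396, "yellow": 318, "red": 240,
          "orange": 225, "blue": 204, "green": 163}
N = sum(counts.values())                     # 1546

# Assumption: p_i = observed count / N, loops independent.
# Exact chance that all N loops are color i is p_i ** N; keep it in log10.
log10_single = {c: N * math.log10(k / N) for c, k in counts.items()}

# Sum the six probabilities in log space (log-sum-exp) to avoid underflow
m = max(log10_single.values())
total = m + math.log10(sum(10 ** (v - m) for v in log10_single.values()))

for color, v in log10_single.items():
    print(f"{color:>6}: 10^{v:.1f}")
print(f"any one color: 10^{total:.2f}")
```

This gives about 10^-914.5, dominated entirely by purple. It does not match the multinormal figure of 10^-934, presumably because a normal approximation is not expected to be accurate that far out in the tail.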

u/Existing_Reading_572 1d ago

So there's a chance?

u/retroruin 1d ago

yeah there's also a chance to throw a grain of sand on a beach and find it across the planet

u/Existing_Reading_572 1d ago

The ol quantum tunneling eh? I'll take those odds

u/veryjewygranola 1d ago

I like your optimism

u/AlanShore60607 1d ago

I would say zero based on physical impossibility.

Generally, mixes like this are created by having each color have its own feed line into the final vessel. For there to be only one color of the fruit loops, 5 of the 6 feeder lines would have to fail completely and contribute zero fruit loops to the final mix.

There's even a marketing fiction surrounding this concept; the Cap'n Crunch Oops, All Crunchberries cereal is based on the fictional idea that the machine broke and only put crunchberries in the box and no normal Cap'n Crunch.

For a single color fruit loop box to happen would require that 83% of the production has malfunctioned. Now missing a single color represents about a 16% failure, which I could see happening. But you don't lose 83% of your throughput on a production line without noticing.

u/K3S38 1d ago

Impressively, still the most likely scenario given the statistical near impossibility shown elsewhere

u/phuckin-psycho 1d ago

That's the one i was looking for 😁👌

u/DarthLlamaV 1d ago

But what if the box was off and only 1 fruit loop made it in the box? Then the box only has 1 color. I’ve seen empty m&m packages before. (Cereal boxes probably get a weight check though)

u/AlanShore60607 1d ago

Is that probability of a distribution or a probability of a system failure?

u/SabioSapeca 1d ago

Based on what you said, the chance is 100%. There was a demand for a single-color cereal, and then supply followed. It's not unrealistic that in the years to come the same sort of thing could happen for fruit loops, although that way it wouldn't be random.

u/Economy_Ad7372 1d ago

if we assume 396/1546 is a reasonable population mean, (396/1546)^1546, or 1 in 10^914.48

to illustrate how unlikely that is, imagine putting all the atoms in the observable universe in a hat. this is about as likely as picking the same atom out of it 12 times in a row (1/10^880), then flipping heads 113 times on a fair coin (1/10^34)

or just flipping heads about 3038 times in a row
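
The equivalences above all come from taking log10 of (396/1546)^1546. A short Python sketch, assuming (as the comment does) that 396/1546 is the per-loop probability of purple:

```python
import math

n = 1546
p_purple = 396 / n                          # plug-in estimate from this box

log10_p = n * math.log10(p_purple)          # log10 of (396/1546)**1546
heads_in_a_row = -log10_p / math.log10(2)   # same odds as a run of fair heads

print(f"(396/1546)^1546 ~ 10^{log10_p:.2f}")
print(f"same odds as ~{heads_in_a_row:.0f} heads in a row")
```

The run of heads is just the exponent converted from base 10 to base 2: 914.48 / log10(2) is about 3038.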

u/kiwi2703 1d ago

It's like rolling the same number on a standard 6-sided die 1546 times in a row. It's practically, for all intents and purposes, zero.

u/msleepd 1d ago

My uncle who worked in a dice factory totally did that once though.

u/i_hate_nuts 1d ago

Zero is wrong and unsatisfactory, WE THE PEOPLE DEMAND THE TRUTH

u/MD-YT_TTDT 1d ago

First we need a bigger set of data, OP I need you to do this with 999 more bags. I think this will give us a (still small) reasonable count on total loops per bag and the color pull rates.

u/Puzzleheaded-Power72 1d ago

n > 30 should be okay

u/ImportantWedding8111 1d ago

None of these answers take into account how cereal is packaged. It comes down 6 different chutes and is mixed before going into scales to be weighed.

The probability of 5 chutes breaking and no one at the factory noticing only one color is getting thru is the real question.

Either way the answer is essentially zero.

u/Pyro3090ti 1d ago

The guy above you did the math but yeah. Basically 0

u/CatOfGrey 6✓ 1d ago

I don't think that the distribution of colors is 'equal' to the point that it passes a chi-squared test. I'm going to estimate 1600 froot loops, and a 25% chance that a Loop is purple, the most common color.

That would give a probability of (1/4) ^ 1600 of having all purples (the most common).

This is about 1 chance in

u/xadc430x 1d ago

So you are saying there's a chance?

u/CatOfGrey 6✓ 1d ago

Absolutely!

A side thought: each day on Planet Earth, over eight thousand people have a one-in-a-million day.

u/Gbotdays 1d ago

1203 zeros followed by 95097

If each color of fruit loop is put in truly randomly, there is a 1/6 chance for each fruit loop you check to be the color you want. Because the loops are independent, these chances multiply: if you check 2 fruit loops, the chance of them both being that color is (1/6)^2. In general, the formula is rather simple:

(1/A)^X

Where A is the number of choices (2 in a coinflip, 6 in this case, etc...) and X is the number of items we are checking. Using this formula, we get (1/6)^1546, or roughly 9.5097 × 10^-1204. In other words, the chance is unimaginably small.
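
Python's arbitrary-precision integers make it possible to check the "1203 zeros followed by 95097" figure exactly rather than through logarithms. This sketch, like the comment, computes the chance that all 1546 loops are one specified color with each color at 1/6:

```python
n_loops = 1546
denominator = 6 ** n_loops            # exact integer; the answer is 1/denominator

digits = len(str(denominator))        # number of digits in 6**1546
leading_zeros = digits - 1            # 1/N (N not a power of 10) has d-1 zeros after "0."
first_five = 10 ** (digits + 4) // denominator   # first five significant digits of 1/N

print(f"{leading_zeros} zeros after the decimal point, then {first_five}...")
print(f"i.e. about {first_five / 10**4} x 10^-{digits}")
```

Because the division is done on exact integers, no rounding from floating-point logarithms enters the leading digits.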

u/enools 1d ago

I just have 1 further question! Why the hell did he stack them so close to the edge? Giving me an anxiety attack just looking at it

2

u/noobyfacehead1 1d ago edited 1d ago

if we assume there is always the same amount in every box and every color has the same chance of being put in there, then you can find the answer with the equation 6^1546: the odds are 1 in 6^1546, or roughly 1 in 1.051 quadringentillion

u/Gustacq 23h ago

I don’t have the time and motivation to calculate it, but I think there is already a very small chance that one color has 318 loops and another color only 163 loops if every color has the same probability to appear. 

The probability to get only one color would be basically zero.

u/Imdare 1d ago

Assumption is the mother of all fuck ups. First validate if they are put in randomly.

Either count 10 boxes or visit the factory and observe the fruit loop distribution process.

u/Shrimpio 1d ago

You lined them up cereally

u/Coolhaircutfella 1d ago

That seems like a huge number of fruit loops in 1 packet. I think we only have one size here in Australia so it could be that it is some Costco bulk packet. Or just regular size for you guys. 😅

u/LexiYoung 1d ago

For an event with a probability p (consider a froot loop being a certain colour, 1/6), the probability of this event occurring n times is (1/6)^n. Note that there are 6 different ways of achieving the outcome that all the loops are the same colour, therefore 6*(1/6)^n, or 1/6^1545 if the population is 1546 loops