Akshay Agrawal committed on
Commit 4957931 · unverified · 2 Parent(s): d9d5845 39a7ead

Merge pull request #62 from marimo-team/haleshot/11_expectation

Files changed (1)
  1. probability/11_expectation.py +860 -0
probability/11_expectation.py ADDED
@@ -0,0 +1,860 @@
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "matplotlib==3.10.0",
#     "numpy==2.2.3",
#     "scipy==1.15.2",
# ]
# ///

import marimo

__generated_with = "0.11.19"
app = marimo.App(width="medium", app_title="Expectation")


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        # Expectation

        _This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/expectation/), by Stanford professor Chris Piech._

        A random variable is fully represented by its Probability Mass Function (PMF), which describes each value the random variable can take on and the corresponding probabilities. However, a PMF can contain a lot of information, so it's often useful to summarize a random variable with a single value.

        The most common, and arguably the most useful, summary of a random variable is its **Expectation** (also called the expected value or mean).
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Definition of Expectation

        The expectation of a random variable $X$, written $E[X]$, is the average of all the values the random variable can take on, each weighted by the probability that the random variable will take on that value.

        $$E[X] = \sum_x x \cdot P(X=x)$$

        Expectation goes by many other names: Mean, Weighted Average, Center of Mass, 1st Moment. All of these are calculated using the same formula.
        """
    )
    return

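@app.cell
def _():
    # A minimal sketch of the definition above (an illustration, not part of
    # the original commit): assuming the PMF is given as a dict mapping each
    # value to its probability, E[X] is the probability-weighted sum.
    def expectation(pmf):
        """Compute E[X] = sum over x of x * P(X = x)."""
        return sum(x * p for x, p in pmf.items())

    # Sanity check on a fair six-sided die: should print 3.50
    print(f"{expectation({x: 1 / 6 for x in range(1, 7)}):.2f}")
    return (expectation,)
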
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Intuition Behind Expectation

        The expected value represents the long-run average value of a random variable over many independent repetitions of an experiment.

        For example, if you roll a fair six-sided die many times and calculate the average of all rolls, that average will approach the expected value of 3.5 as the number of rolls increases.

        Let's visualize this concept:
        """
    )
    return


@app.cell(hide_code=True)
def _(np, plt):
    # Set random seed for reproducibility
    np.random.seed(42)

    # Simulate rolling a die many times
    exp_num_rolls = 1000
    exp_die_rolls = np.random.randint(1, 7, size=exp_num_rolls)

    # Calculate the running average
    exp_running_avg = np.cumsum(exp_die_rolls) / np.arange(1, exp_num_rolls + 1)

    # Create the plot
    plt.figure(figsize=(10, 5))
    plt.plot(range(1, exp_num_rolls + 1), exp_running_avg, label='Running Average')
    plt.axhline(y=3.5, color='r', linestyle='--', label='Expected Value (3.5)')
    plt.xlabel('Number of Rolls')
    plt.ylabel('Average Value')
    plt.title('Running Average of Die Rolls Approaching Expected Value')
    plt.legend()
    plt.grid(alpha=0.3)
    plt.xscale('log')  # Log scale to better see convergence

    # Add annotations
    plt.annotate('As the number of rolls increases,\nthe average approaches the expected value',
                 xy=(exp_num_rolls, exp_running_avg[-1]), xytext=(exp_num_rolls / 3, 4),
                 arrowprops=dict(facecolor='black', shrink=0.05, width=1.5))

    plt.gca()
    return exp_die_rolls, exp_num_rolls, exp_running_avg

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""## Properties of Expectation""")
    return


@app.cell(hide_code=True)
def _(mo):
    mo.accordion(
        {
            "1. Linearity of Expectation": mo.md(
                r"""
                $$E[aX + b] = a \cdot E[X] + b$$

                Where $a$ and $b$ are constants (not random variables).

                This means that if you multiply a random variable by a constant, the expectation is multiplied by that constant. And if you add a constant to a random variable, the expectation increases by that constant.
                """
            ),
            "2. Expectation of the Sum of Random Variables": mo.md(
                r"""
                $$E[X + Y] = E[X] + E[Y]$$

                This is true regardless of the relationship between $X$ and $Y$. They can be dependent, and they can have different distributions. This also applies with more than two random variables:

                $$E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$$
                """
            ),
            "3. Law of the Unconscious Statistician (LOTUS)": mo.md(
                r"""
                $$E[g(X)] = \sum_x g(x) \cdot P(X=x)$$

                This allows us to calculate the expected value of a function $g(X)$ of a random variable $X$ when we know the probability distribution of $X$ but don't explicitly know the distribution of $g(X)$.

                This theorem has the humorous name "Law of the Unconscious Statistician" (LOTUS) because it's so useful that you should be able to employ it unconsciously.
                """
            ),
            "4. Expectation of a Constant": mo.md(
                r"""
                $$E[a] = a$$

                Sometimes in proofs, you'll end up with the expectation of a constant (rather than a random variable). Since a constant doesn't change, its expected value is just the constant itself.
                """
            ),
        }
    )
    return

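@app.cell
def _():
    # A small numeric check of property 2 (a sketch, not part of the original
    # notebook; the names here are illustrative). Let X be a fair die and
    # Y = 7 - X, so Y is completely dependent on X — yet E[X + Y] still
    # equals E[X] + E[Y].
    dep_e_x = sum(x * (1 / 6) for x in range(1, 7))                 # E[X] = 3.5
    dep_e_y = sum((7 - x) * (1 / 6) for x in range(1, 7))           # E[Y] = 3.5
    dep_e_sum = sum((x + (7 - x)) * (1 / 6) for x in range(1, 7))   # E[X + Y] = 7
    print(f"E[X] = {dep_e_x:.2f}, E[Y] = {dep_e_y:.2f}, E[X + Y] = {dep_e_sum:.2f}")
    return dep_e_sum, dep_e_x, dep_e_y
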
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Calculating Expectation

        Let's calculate the expected value for some common examples:

        ### Example 1: Fair Die Roll

        For a fair six-sided die, the PMF is:

        $$P(X=x) = \frac{1}{6} \text{ for } x \in \{1, 2, 3, 4, 5, 6\}$$

        The expected value is:

        $$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6} = 3.5$$

        Let's implement this calculation in Python:
        """
    )
    return


@app.cell
def _():
    def calc_expectation_die():
        """Calculate the expected value of a fair six-sided die roll."""
        exp_die_values = range(1, 7)
        exp_die_probs = [1/6] * 6

        exp_die_expected = sum(x * p for x, p in zip(exp_die_values, exp_die_probs))
        return exp_die_expected

    exp_die_result = calc_expectation_die()
    print(f"Expected value of a fair die roll: {exp_die_result}")
    return calc_expectation_die, exp_die_result

@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ### Example 2: Sum of Two Dice

        Now let's calculate the expected value for the sum of two fair dice. First, we need the PMF:
        """
    )
    return


@app.cell
def _():
    def pmf_sum_two_dice(y_val):
        """Returns the probability that the sum of two dice is y_val."""
        # Count the number of ways to get sum y_val
        exp_count = 0
        for dice1 in range(1, 7):
            for dice2 in range(1, 7):
                if dice1 + dice2 == y_val:
                    exp_count += 1
        return exp_count / 36  # There are 36 possible outcomes (6×6)

    # Test the function for a few values
    exp_test_values = [2, 7, 12]
    for exp_test_y in exp_test_values:
        print(f"P(Y = {exp_test_y}) = {pmf_sum_two_dice(exp_test_y)}")
    return exp_test_values, exp_test_y, pmf_sum_two_dice


@app.cell
def _(pmf_sum_two_dice):
    def calc_expectation_sum_two_dice():
        """Calculate the expected value of the sum of two dice."""
        exp_sum_two_dice = 0
        # Sum of dice can take on the values 2 through 12
        for exp_x in range(2, 13):
            exp_pr_x = pmf_sum_two_dice(exp_x)  # PMF gives P(sum is x)
            exp_sum_two_dice += exp_x * exp_pr_x
        return exp_sum_two_dice

    exp_sum_result = calc_expectation_sum_two_dice()

    # Round to 2 decimal places for display
    exp_sum_result_rounded = round(exp_sum_result, 2)

    print(f"Expected value of the sum of two dice: {exp_sum_result_rounded}")

    # Let's also verify this with a direct calculation
    exp_direct_calc = sum(x * pmf_sum_two_dice(x) for x in range(2, 13))
    exp_direct_calc_rounded = round(exp_direct_calc, 2)

    print(f"Direct calculation: {exp_direct_calc_rounded}")

    # Verify that this equals 7
    print(f"Is the expected value exactly 7? {abs(exp_sum_result - 7) < 1e-10}")
    return (
        calc_expectation_sum_two_dice,
        exp_direct_calc,
        exp_direct_calc_rounded,
        exp_sum_result,
        exp_sum_result_rounded,
    )

@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ### Visualizing Expectation

        Let's visualize the expectation for the sum of two dice. The expected value is the "center of mass" of the PMF:
        """
    )
    return


@app.cell(hide_code=True)
def _(plt, pmf_sum_two_dice):
    # Create the visualization
    exp_y_values = list(range(2, 13))
    exp_probabilities = [pmf_sum_two_dice(y) for y in exp_y_values]

    dice_fig, dice_ax = plt.subplots(figsize=(10, 5))
    dice_ax.bar(exp_y_values, exp_probabilities, width=0.4)
    dice_ax.axvline(x=7, color='r', linestyle='--', linewidth=2, label='Expected Value (7)')

    dice_ax.set_xticks(exp_y_values)
    dice_ax.set_xlabel('Sum of two dice (y)')
    dice_ax.set_ylabel('Probability: P(Y = y)')
    dice_ax.set_title('PMF of Sum of Two Dice with Expected Value')
    dice_ax.grid(alpha=0.3)
    dice_ax.legend()

    # Add probability values on top of bars
    for exp_i, exp_prob in enumerate(exp_probabilities):
        dice_ax.text(exp_y_values[exp_i], exp_prob + 0.001, f'{exp_prob:.3f}', ha='center')

    plt.tight_layout()
    plt.gca()
    return dice_ax, dice_fig, exp_i, exp_prob, exp_probabilities, exp_y_values

@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Demonstrating the Properties of Expectation

        Let's demonstrate some of these properties with examples:
        """
    )
    return


@app.cell
def _(exp_die_result):
    # Demonstrate linearity of expectation (1)
    # E[aX + b] = a*E[X] + b

    # For a die roll X with E[X] = 3.5
    prop_a = 2
    prop_b = 10

    # Calculate E[2X + 10] using the property
    prop_expected_using_property = prop_a * exp_die_result + prop_b
    prop_expected_using_property_rounded = round(prop_expected_using_property, 2)

    print(f"Using linearity property: E[{prop_a}X + {prop_b}] = {prop_a} * E[X] + {prop_b} = {prop_expected_using_property_rounded}")

    # Calculate E[2X + 10] directly
    prop_expected_direct = sum((prop_a * x + prop_b) * (1/6) for x in range(1, 7))
    prop_expected_direct_rounded = round(prop_expected_direct, 2)

    print(f"Direct calculation: E[{prop_a}X + {prop_b}] = {prop_expected_direct_rounded}")

    # Verify they match
    print(f"Do they match? {abs(prop_expected_using_property - prop_expected_direct) < 1e-10}")
    return (
        prop_a,
        prop_b,
        prop_expected_direct,
        prop_expected_direct_rounded,
        prop_expected_using_property,
        prop_expected_using_property_rounded,
    )

@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ### Law of the Unconscious Statistician (LOTUS)

        Let's use LOTUS to calculate $E[X^2]$ for a die roll, which will be useful when we study variance:
        """
    )
    return


@app.cell
def _():
    # Calculate E[X^2] for a die roll using LOTUS (3)
    lotus_die_values = range(1, 7)
    lotus_die_probs = [1/6] * 6

    # Using LOTUS: E[X^2] = sum(x^2 * P(X=x))
    lotus_expected_x_squared = sum(x**2 * p for x, p in zip(lotus_die_values, lotus_die_probs))
    lotus_expected_x_squared_rounded = round(lotus_expected_x_squared, 2)

    expected_x_squared = 3.5**2
    expected_x_squared_rounded = round(expected_x_squared, 2)

    print(f"E[X^2] for a die roll = {lotus_expected_x_squared_rounded}")
    print(f"(E[X])^2 for a die roll = {expected_x_squared_rounded}")
    return (
        expected_x_squared,
        expected_x_squared_rounded,
        lotus_die_probs,
        lotus_die_values,
        lotus_expected_x_squared,
        lotus_expected_x_squared_rounded,
    )


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        /// note
        Note that $E[X^2] \neq (E[X])^2$.
        ///
        """
    )
    return

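@app.cell
def _(lotus_expected_x_squared):
    # A brief preview sketch (not in the original notebook): the gap between
    # E[X^2] and (E[X])^2 is exactly the variance, the subject of the next
    # notebook. For a fair die: 91/6 - 3.5^2 ≈ 2.92.
    die_variance = lotus_expected_x_squared - 3.5**2
    print(f"E[X^2] - (E[X])^2 = {die_variance:.4f}")
    return (die_variance,)
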
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Interactive Example

        Let's explore how the expected value changes as we adjust the parameters of common probability distributions. This interactive visualization focuses specifically on the relationship between distribution parameters and expected values.

        Use the controls below to select a distribution and adjust its parameters. The graph will show how the expected value changes across a range of parameter values.
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    # Create UI elements for distribution selection
    dist_selection = mo.ui.dropdown(
        options=[
            "bernoulli",
            "binomial",
            "geometric",
            "poisson"
        ],
        value="bernoulli",
        label="Select a distribution"
    )
    return (dist_selection,)


@app.cell(hide_code=True)
def _(dist_selection):
    dist_selection.center()
    return


@app.cell(hide_code=True)
def _(dist_description):
    dist_description
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md("""### Adjust Parameters""")
    return


@app.cell(hide_code=True)
def _(controls):
    controls
    return

@app.cell(hide_code=True)
def _(
    dist_selection,
    lambda_range,
    np,
    param_lambda,
    param_n,
    param_p,
    param_range,
    plt,
):
    # Initialize names that only some branches assign, so the return
    # statement at the end of the cell never hits an undefined variable
    n = int(param_n.value)
    p_min, p_max = param_range.value
    lambda_min, lambda_max = lambda_range.value
    max_y = None

    # Calculate expected values based on the selected distribution
    if dist_selection.value == "bernoulli":
        param_values = np.linspace(p_min, p_max, 100)

        # E[X] = p for Bernoulli
        expected_values = param_values
        current_param = param_p.value
        current_expected = round(current_param, 2)
        x_label = "p (probability of success)"
        title = "Expected Value of Bernoulli Distribution"
        formula = "E[X] = p"

    elif dist_selection.value == "binomial":
        param_values = np.linspace(p_min, p_max, 100)

        # E[X] = np for Binomial
        expected_values = [n * p for p in param_values]
        current_param = param_p.value
        current_expected = round(n * current_param, 2)
        x_label = "p (probability of success)"
        title = f"Expected Value of Binomial Distribution (n={n})"
        formula = f"E[X] = n × p = {n} × p"

    elif dist_selection.value == "geometric":
        # Ensure p is not 0 for geometric distribution
        p_min = max(0.01, p_min)
        param_values = np.linspace(p_min, p_max, 100)

        # E[X] = 1/p for Geometric
        expected_values = [1/p for p in param_values]
        current_param = param_p.value
        current_expected = round(1 / current_param, 2)
        x_label = "p (probability of success)"
        title = "Expected Value of Geometric Distribution"
        formula = "E[X] = 1/p"

    else:  # Poisson
        param_values = np.linspace(lambda_min, lambda_max, 100)

        # E[X] = lambda for Poisson
        expected_values = param_values
        current_param = param_lambda.value
        current_expected = round(current_param, 2)
        x_label = "λ (rate parameter)"
        title = "Expected Value of Poisson Distribution"
        formula = "E[X] = λ"

    # Create the plot
    dist_fig, dist_ax = plt.subplots(figsize=(10, 6))

    # Plot the expected value function
    dist_ax.plot(param_values, expected_values, 'b-', linewidth=2, label="Expected Value Function")

    dist_ax.plot(current_param, current_expected, 'ro', markersize=10, label=f"Current Value: E[X] = {current_expected}")

    dist_ax.hlines(current_expected, param_values[0], current_param, colors='r', linestyles='dashed')

    dist_ax.vlines(current_param, 0, current_expected, colors='r', linestyles='dashed')

    dist_ax.fill_between(param_values, 0, expected_values, alpha=0.2, color='blue')

    dist_ax.set_xlabel(x_label, fontsize=12)
    dist_ax.set_ylabel("Expected Value: E[X]", fontsize=12)
    dist_ax.set_title(title, fontsize=14, fontweight='bold')
    dist_ax.grid(True, alpha=0.3)

    # Move legend to lower right to avoid overlap with formula
    dist_ax.legend(loc='lower right', fontsize=10)

    # Add formula text box in upper left
    dist_props = dict(boxstyle='round', facecolor='white', alpha=0.8)
    dist_ax.text(0.02, 0.95, formula, transform=dist_ax.transAxes, fontsize=12,
                 verticalalignment='top', bbox=dist_props)

    if dist_selection.value == "geometric":
        max_y = min(50, 2 / max(0.01, param_values[0]))
        dist_ax.set_ylim(0, max_y)
    elif dist_selection.value == "binomial":
        dist_ax.set_ylim(0, n + 1)
    else:
        dist_ax.set_ylim(0, max(expected_values) * 1.1)

    annotation_x = current_param + (param_values[-1] - param_values[0]) * 0.05
    annotation_y = current_expected

    # Adjust annotation position if it would go off the chart
    if annotation_x > param_values[-1] * 0.9:
        annotation_x = current_param - (param_values[-1] - param_values[0]) * 0.2

    dist_ax.annotate(
        f"Parameter: {current_param:.2f}\nE[X] = {current_expected}",
        xy=(current_param, current_expected),
        xytext=(annotation_x, annotation_y),
        arrowprops=dict(facecolor='black', shrink=0.05, width=1.5, alpha=0.7),
        bbox=dist_props
    )

    plt.tight_layout()
    plt.gca()
    return (
        annotation_x,
        annotation_y,
        current_expected,
        current_param,
        dist_ax,
        dist_fig,
        dist_props,
        expected_values,
        formula,
        lambda_max,
        lambda_min,
        max_y,
        n,
        p_max,
        p_min,
        param_values,
        title,
        x_label,
    )

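@app.cell
def _(stats):
    # Cross-check of the closed-form means used in the plot above against
    # scipy.stats (a sketch with arbitrary parameter values; the names here
    # are illustrative, not from the original notebook).
    check_p, check_n, check_lam = 0.3, 10, 5.0
    print(f"bernoulli: {stats.bernoulli.mean(check_p)} vs p = {check_p}")
    print(f"binomial:  {stats.binom.mean(check_n, check_p)} vs n*p = {check_n * check_p}")
    print(f"geometric: {stats.geom.mean(check_p):.4f} vs 1/p = {1 / check_p:.4f}")
    print(f"poisson:   {stats.poisson.mean(check_lam)} vs λ = {check_lam}")
    return check_lam, check_n, check_p
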
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Expectation vs. Mode

        The expected value (mean) of a random variable is not always the same as its most likely value (mode). Let's explore this with an example:
        """
    )
    return


@app.cell(hide_code=True)
def _(np, plt, stats):
    # Create a skewed distribution
    skew_n = 10
    skew_p = 0.25

    # Binomial PMF
    skew_x_values = np.arange(0, skew_n + 1)
    skew_pmf_values = stats.binom.pmf(skew_x_values, skew_n, skew_p)

    # Find the mode (most likely value)
    skew_mode = skew_x_values[np.argmax(skew_pmf_values)]

    # Calculate the expected value
    skew_expected = skew_n * skew_p
    skew_expected_rounded = round(skew_expected, 2)

    skew_fig, skew_ax = plt.subplots(figsize=(10, 5))
    skew_ax.bar(skew_x_values, skew_pmf_values, alpha=0.7, width=0.4)

    # Add vertical lines for mode and expected value
    skew_ax.axvline(x=skew_mode, color='g', linestyle='--', linewidth=2,
                    label=f'Mode = {skew_mode} (Most likely value)')
    skew_ax.axvline(x=skew_expected, color='r', linestyle='--', linewidth=2,
                    label=f'Expected Value = {skew_expected_rounded} (Mean)')

    skew_ax.annotate('Mode', xy=(skew_mode, 0.05), xytext=(skew_mode - 2.0, 0.1),
                     arrowprops=dict(facecolor='green', shrink=0.05, width=1.5), color='green')
    skew_ax.annotate('Expected Value', xy=(skew_expected, 0.05), xytext=(skew_expected + 1, 0.15),
                     arrowprops=dict(facecolor='red', shrink=0.05, width=1.5), color='red')

    # Compute these unconditionally so the return statement below always
    # sees defined names, even when mode and int(expected value) coincide
    min_x = min(skew_mode, skew_expected)
    max_x = max(skew_mode, skew_expected)
    mid_x = (skew_mode + skew_expected) / 2

    if skew_mode != int(skew_expected):
        skew_ax.axvspan(min_x, max_x, alpha=0.2, color='purple')

        # Add text explaining the difference
        skew_ax.text(mid_x, max(skew_pmf_values) * 0.5,
                     f"Difference: {abs(skew_mode - skew_expected_rounded):.2f}",
                     ha='center', va='center', bbox=dict(facecolor='white', alpha=0.7))

    skew_ax.set_xlabel('Number of Successes')
    skew_ax.set_ylabel('Probability')
    skew_ax.set_title(f'Binomial Distribution (n={skew_n}, p={skew_p})')
    skew_ax.grid(alpha=0.3)
    skew_ax.legend()

    plt.tight_layout()
    plt.gca()
    return (
        max_x,
        mid_x,
        min_x,
        skew_ax,
        skew_expected,
        skew_expected_rounded,
        skew_fig,
        skew_mode,
        skew_n,
        skew_p,
        skew_pmf_values,
        skew_x_values,
    )

@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        /// note
        For the sum of two dice we calculated earlier, we found the expected value to be exactly 7, and 7 is also the mode (most likely outcome) of that distribution. The two agree there because the distribution of the sum is symmetric, not because mean and mode coincide in general.

        As we can see from the binomial distribution above, the expected value (2.50) and the mode (2) are often different values (this is common in skewed distributions). The expected value represents the "center of mass" of the distribution, while the mode represents the most likely single outcome.
        ///
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## 🤔 Test Your Understanding

        Decide whether each statement below is true, then expand it to check your answer:

        <details>
        <summary>The expected value of a random variable is always one of the possible values the random variable can take.</summary>
        ❌ False! The expected value is a weighted average and may not be a value the random variable can actually take. For example, the expected value of a fair die roll is 3.5, which is not a possible outcome.
        </details>

        <details>
        <summary>If X and Y are independent random variables, then E[X·Y] = E[X]·E[Y].</summary>
        ✅ True! For independent random variables, the expectation of their product equals the product of their expectations.
        </details>

        <details>
        <summary>The expected value of a constant random variable (one that always takes the same value) is that constant.</summary>
        ✅ True! If X = c with probability 1, then E[X] = c.
        </details>

        <details>
        <summary>The expected value of the sum of two random variables is always the sum of their expected values, regardless of whether they are independent.</summary>
        ✅ True! This is the linearity of expectation property: E[X + Y] = E[X] + E[Y], which holds regardless of dependence.
        </details>
        """
    )
    return

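@app.cell
def _():
    # A numeric check of the product-rule question above (a sketch, not part
    # of the original notebook). For two independent fair dice E[XY] equals
    # E[X]E[Y]; for the fully dependent pair Y = X it does not.
    prod_e_x = sum(x * (1 / 6) for x in range(1, 7))
    prod_e_xy_indep = sum(x * y * (1 / 36) for x in range(1, 7) for y in range(1, 7))
    prod_e_xy_dep = sum(x * x * (1 / 6) for x in range(1, 7))
    print(f"independent: E[XY] = {prod_e_xy_indep:.4f}, E[X]E[Y] = {prod_e_x**2:.4f}")
    print(f"dependent (Y = X): E[XY] = {prod_e_xy_dep:.4f}, E[X]E[Y] = {prod_e_x**2:.4f}")
    return prod_e_x, prod_e_xy_dep, prod_e_xy_indep
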
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Practical Applications of Expectation

        Expected values show up everywhere - from investment decisions and insurance pricing to machine learning algorithms and game design. Engineers use them to predict system reliability, data scientists to understand customer behavior, and economists to model market outcomes. They're essential for risk assessment in project management and for optimizing resource allocation in operations research.
        """
    )
    return

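@app.cell
def _():
    # Toy illustration of the insurance-pricing application mentioned above
    # (a sketch with made-up numbers): the break-even premium for a policy is
    # the expected payout, E[payout] = claim_size * P(claim).
    claim_size, claim_prob = 10_000, 0.02
    break_even_premium = claim_size * claim_prob
    print(f"Break-even premium: ${break_even_premium:.2f}")
    return break_even_premium, claim_prob, claim_size
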
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Key Takeaways

        Expectation gives us a single value that summarizes a random variable's central tendency - it's the weighted average of all possible outcomes, where the weights are probabilities. The linearity property makes expectations easy to work with, even for complex combinations of random variables. While a PMF gives the complete probability picture, expectation provides an essential summary that helps us make decisions under uncertainty. In our next notebook, we'll explore variance, which measures how spread out a random variable's values are around its expectation.
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""#### Appendix (containing helper code)""")
    return


@app.cell(hide_code=True)
def _():
    import marimo as mo
    return (mo,)


@app.cell(hide_code=True)
def _():
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import stats
    import collections
    return collections, np, plt, stats


@app.cell(hide_code=True)
def _(dist_selection, mo):
    # Parameter controls for probability-based distributions
    param_p = mo.ui.slider(
        start=0.01,
        stop=0.99,
        step=0.01,
        value=0.5,
        label="p (probability of success)",
        full_width=True
    )

    # Parameter control for binomial distribution
    param_n = mo.ui.slider(
        start=1,
        stop=50,
        step=1,
        value=10,
        label="n (number of trials)",
        full_width=True
    )

    # Parameter control for Poisson distribution
    param_lambda = mo.ui.slider(
        start=0.1,
        stop=20,
        step=0.1,
        value=5,
        label="λ (rate parameter)",
        full_width=True
    )

    # Parameter range sliders for visualization
    param_range = mo.ui.range_slider(
        start=0,
        stop=1,
        step=0.01,
        value=[0, 1],
        label="Parameter range to visualize",
        full_width=True
    )

    lambda_range = mo.ui.range_slider(
        start=0,
        stop=20,
        step=0.1,
        value=[0, 20],
        label="λ range to visualize",
        full_width=True
    )

    # Display appropriate controls based on the selected distribution
    if dist_selection.value == "bernoulli":
        controls = mo.hstack([param_p, param_range], justify="space-around")
    elif dist_selection.value == "binomial":
        controls = mo.hstack([param_p, param_n, param_range], justify="space-around")
    elif dist_selection.value == "geometric":
        controls = mo.hstack([param_p, param_range], justify="space-around")
    else:  # poisson
        controls = mo.hstack([param_lambda, lambda_range], justify="space-around")
    return controls, lambda_range, param_lambda, param_n, param_p, param_range


@app.cell(hide_code=True)
def _(dist_selection, mo):
    # Create distribution descriptions based on selection
    if dist_selection.value == "bernoulli":
        dist_description = mo.md(
            r"""
            **Bernoulli Distribution**

            A Bernoulli distribution models a single trial with two possible outcomes: success (1) or failure (0).

            - Parameter: $p$ = probability of success
            - Expected Value: $E[X] = p$
            - Example: Flipping a coin once (p = 0.5 for a fair coin)
            """
        )
    elif dist_selection.value == "binomial":
        dist_description = mo.md(
            r"""
            **Binomial Distribution**

            A Binomial distribution models the number of successes in $n$ independent trials.

            - Parameters: $n$ = number of trials, $p$ = probability of success
            - Expected Value: $E[X] = np$
            - Example: Number of heads in 10 coin flips
            """
        )
    elif dist_selection.value == "geometric":
        dist_description = mo.md(
            r"""
            **Geometric Distribution**

            A Geometric distribution models the number of trials until the first success.

            - Parameter: $p$ = probability of success
            - Expected Value: $E[X] = \frac{1}{p}$
            - Example: Number of coin flips until first heads
            """
        )
    else:  # poisson
        dist_description = mo.md(
            r"""
            **Poisson Distribution**

            A Poisson distribution models the number of events occurring in a fixed interval.

            - Parameter: $\lambda$ = average rate of events
            - Expected Value: $E[X] = \lambda$
            - Example: Number of emails received per hour
            """
        )
    return (dist_description,)


if __name__ == "__main__":
    app.run()