Haleshot committed on
Commit
13db4ac
·
unverified ·
1 Parent(s): 6c07e48

Add `random-variables` notebook

  1. probability/09_random_variables.py +552 -0
probability/09_random_variables.py ADDED
@@ -0,0 +1,552 @@
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "matplotlib==3.10.0",
#     "numpy==2.2.3",
#     "scipy==1.15.2",
# ]
# ///

import marimo

__generated_with = "0.11.9"
app = marimo.App(width="medium", app_title="Random Variables")


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _():
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import stats
    return np, plt, stats


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
# Random Variables

_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/rvs/), by Stanford professor Chris Piech._

Random variables are functions that map outcomes from a probability space to numbers. This mathematical abstraction allows us to:

- Work with numerical outcomes in probability
- Calculate expected values and variances
- Model real-world phenomena quantitatively
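
To make the "function" view concrete, here is a minimal sketch, assuming a sample space of three fair coin flips. The names `sample_space`, `num_heads`, and `pmf` are illustrative and are not used by the other cells in this notebook.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 8 equally likely outcomes of three coin flips
sample_space = list(product("HT", repeat=3))

def num_heads(outcome):
    """A random variable: maps an outcome such as ('H', 'T', 'H') to a number."""
    return outcome.count("H")

# Distribution of X = num_heads: P(X = x) = (number of outcomes mapped to x) / 8
counts = Counter(num_heads(outcome) for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
print(pmf)
```

The random variable itself is just `num_heads`; the probabilities come from the sample space it is applied to.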
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Types of Random Variables

### Discrete Random Variables
- Take on countable values (finite or infinite)
- Described by a probability mass function (PMF)
- Example: Number of heads in 3 coin flips

### Continuous Random Variables
- Take on uncountable values in an interval
- Described by a probability density function (PDF)
- Example: Height of a randomly selected person
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Properties of Random Variables

Each random variable has several key properties that help us understand and work with it:

| Property | Description | Example |
|----------|-------------|---------|
| Meaning | Semantic description | Number of successes in n trials |
| Symbol | Notation used | $X$, $Y$, $Z$ |
| Support/Range | Possible values | $\{0,1,2,...,n\}$ for binomial |
| Distribution | PMF or PDF | $p_X(x)$ or $f_X(x)$ |
| Expectation | Weighted average | $E[X]$ |
| Variance | Measure of spread | $Var(X)$ |
| Standard Deviation | Square root of variance | $\sigma_X$ |
| Mode | Most likely value | argmax$_x$ $p_X(x)$ |

Additional properties include:

- [Entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) (measure of uncertainty)
- [Median](https://en.wikipedia.org/wiki/Median) (middle value)
- [Skewness](https://en.wikipedia.org/wiki/Skewness) (asymmetry measure)
- [Kurtosis](https://en.wikipedia.org/wiki/Kurtosis) (tail heaviness measure)
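
As a rough illustration of how several of these properties can be read off a PMF, the sketch below assumes $X$ is the number of heads in 3 fair coin flips. The dict `pmf` and the variable names are hypothetical and chosen only for this example.

```python
import math

# Assumed example: X = number of heads in 3 fair coin flips
pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

support = sorted(pmf)                                   # possible values
expectation = sum(x * p for x, p in pmf.items())        # weighted average
variance = sum((x - expectation) ** 2 * p for x, p in pmf.items())
std_dev = math.sqrt(variance)
max_p = max(pmf.values())
modes = [x for x in support if pmf[x] == max_p]         # ties are possible
entropy_bits = -sum(p * math.log2(p) for p in pmf.values() if p > 0)

print(support, expectation, variance, modes)
print(f"std = {std_dev:.4f}, entropy = {entropy_bits:.4f} bits")
```

Note that this distribution has two modes (1 and 2), so "the" most likely value is not always unique.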
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Probability Mass Functions (PMF)

For discrete random variables, the PMF $p_X(x)$ gives the probability that $X$ equals $x$:

$p_X(x) = P(X = x)$

Properties of a PMF:

1. $p_X(x) \geq 0$ for all $x$
2. $\sum_x p_X(x) = 1$
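
Both properties are easy to check mechanically. The sketch below assumes a PMF stored as a dict from value to probability; `is_valid_pmf` and the two example distributions are illustrative names, not part of this notebook's cells.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check the two PMF properties for a dict mapping value -> probability."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

candidate = {0: 0.2, 1: 0.5, 2: 0.3}   # sums to 1, all entries non-negative
broken = {0: 0.7, 1: 0.4}              # sums to 1.1, so not a PMF

print(is_valid_pmf(candidate))
print(is_valid_pmf(broken))
```

The tolerance is there because floating-point sums rarely land on exactly 1.
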

Let's implement a PMF for rolling a fair die:
        """
    )
    return


@app.cell
def _(np, plt):
    def die_pmf(x):
        if x in [1, 2, 3, 4, 5, 6]:
            return 1/6
        return 0

    # Plot the PMF
    _x = np.arange(1, 7)
    probabilities = [die_pmf(i) for i in _x]

    plt.figure(figsize=(8, 4))
    plt.bar(_x, probabilities)
    plt.title("PMF of Rolling a Fair Die")
    plt.xlabel("Outcome")
    plt.ylabel("Probability")
    plt.grid(True, alpha=0.3)
    plt.gca()
    return die_pmf, probabilities


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Probability Density Functions (PDF)

For continuous random variables, we use a PDF $f_X(x)$. The probability of $X$ falling in an interval $[a,b]$ is:

$P(a \leq X \leq b) = \int_a^b f_X(x)dx$

Properties of a PDF:

1. $f_X(x) \geq 0$ for all $x$
2. $\int_{-\infty}^{\infty} f_X(x)dx = 1$
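
As a quick numerical sanity check of these formulas, the sketch below approximates the integrals with a midpoint rule. It assumes an exponential density with rate 1 purely as a convenient example; `exp_pdf` and `integrate` are hypothetical helpers written for this illustration, and the upper limit 50 stands in for infinity.

```python
import math

def exp_pdf(x):
    """Density of an exponential random variable with rate 1 (zero for x < 0)."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

p_1_to_2 = integrate(exp_pdf, 1, 2)      # P(1 <= X <= 2)
exact = math.exp(-1) - math.exp(-2)      # closed form for comparison
total_area = integrate(exp_pdf, 0, 50)   # mass beyond 50 is negligible

print(f"P(1 <= X <= 2) ~ {p_1_to_2:.6f} (closed form {exact:.6f})")
print(f"total area ~ {total_area:.6f}")
```
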

Let's look at the normal distribution, a common continuous random variable:
        """
    )
    return


@app.cell
def _(np, plt, stats):
    # Generate points for plotting
    _x = np.linspace(-4, 4, 100)
    _pdf = stats.norm.pdf(_x, loc=0, scale=1)

    plt.figure(figsize=(8, 4))
    plt.plot(_x, _pdf, 'b-', label='PDF')
    plt.fill_between(_x, _pdf, where=(_x >= -1) & (_x <= 1), alpha=0.3)
    plt.title("Standard Normal Distribution")
    plt.xlabel("x")
    plt.ylabel("Density")
    plt.grid(True, alpha=0.3)
    plt.legend()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Expected Value

The expected value $E[X]$ is the long-run average of a random variable.

For discrete random variables:
$E[X] = \sum_x x \cdot p_X(x)$

For continuous random variables:
$E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)dx$

Properties:

1. $E[aX + b] = aE[X] + b$
2. $E[X + Y] = E[X] + E[Y]$
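
The two properties can be checked exactly on a small case. This is a minimal sketch assuming fair six-sided dice and using `Fraction` to avoid rounding; the constants 2 and 3 and the variable names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

e_x = sum(x * p for x in faces)                   # E[X] for one die
e_affine = sum((2 * x + 3) * p for x in faces)    # E[2X + 3]
# E[X + Y] over all 36 equally likely pairs of two dice
e_sum = sum((x + y) * p * p for x, y in product(faces, repeat=2))

print(e_x, e_affine, e_sum)
```

The two dice here happen to be independent, but the second property holds for any pair of random variables.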
        """
    )
    return


@app.cell
def _(np):
    def expected_value_discrete(x_values, probabilities):
        return sum(x * p for x, p in zip(x_values, probabilities))

    # Example: Expected value of a fair die roll
    die_values = np.arange(1, 7)
    die_probs = np.ones(6) / 6

    E_X = expected_value_discrete(die_values, die_probs)
    return E_X, die_probs, die_values, expected_value_discrete


@app.cell
def _(E_X):
    print(E_X)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Variance

The variance $Var(X)$ measures the spread of a random variable around its mean:

$Var(X) = E[(X - E[X])^2]$

This can be computed as:
$Var(X) = E[X^2] - (E[X])^2$

Properties:

1. $Var(aX) = a^2Var(X)$
2. $Var(X + b) = Var(X)$
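
To see the shortcut formula and both properties agree on a concrete case, here is a small exact check, assuming a fair six-sided die. The helpers `mean` and `variance` and the constants 3 and 10 are illustrative only.

```python
from fractions import Fraction

p = Fraction(1, 6)        # each of the six listed values is equally likely
x = list(range(1, 7))     # faces of a fair die

def mean(values):
    return sum(v * p for v in values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 * p for v in values)

var_x = variance(x)                                   # definition: E[(X - E[X])^2]
shortcut = mean([v * v for v in x]) - mean(x) ** 2    # E[X^2] - (E[X])^2
var_scaled = variance([3 * v for v in x])             # Var(3X)
var_shifted = variance([v + 10 for v in x])           # Var(X + 10)

print(var_x, shortcut, var_scaled, var_shifted)
```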
        """
    )
    return


@app.cell
def _(E_X, die_probs, die_values, np):
    def variance_discrete(x_values, probabilities, expected_value):
        squared_diff = [(x - expected_value)**2 for x in x_values]
        return sum(d * p for d, p in zip(squared_diff, probabilities))

    # Example: Variance of a fair die roll
    var_X = variance_discrete(die_values, die_probs, E_X)
    std_X = np.sqrt(var_X)
    return std_X, var_X, variance_discrete


@app.cell(hide_code=True)
def _(mo, std_X, var_X):
    mo.md(
        f"""
### Examples of Variance Calculation

For our fair die example:

- Variance: {var_X:.2f}
- Standard Deviation: {std_X:.2f}

Roughly speaking, a typical roll lands about {std_X:.2f} units away from the mean (3.5).

Let's look at another example, this time for a fair coin:
        """
    )
    return


@app.cell
def _(variance_discrete):
    # Fair coin (X = 0 or 1)
    coin_values = [0, 1]
    coin_probs = [0.5, 0.5]
    coin_mean = sum(x * p for x, p in zip(coin_values, coin_probs))
    coin_var = variance_discrete(coin_values, coin_probs, coin_mean)
    return coin_mean, coin_probs, coin_values, coin_var


@app.cell
def _(np, stats, variance_discrete):
    # Standard normal (discretized for example)
    normal_values = np.linspace(-3, 3, 100)
    normal_probs = stats.norm.pdf(normal_values)
    normal_probs = normal_probs / sum(normal_probs)  # normalize
    normal_mean = 0
    normal_var = variance_discrete(normal_values, normal_probs, normal_mean)
    return normal_mean, normal_probs, normal_values, normal_var


@app.cell
def _(np, variance_discrete):
    # Uniform on [0,1] (discretized for example)
    uniform_values = np.linspace(0, 1, 100)
    uniform_probs = np.ones_like(uniform_values) / len(uniform_values)
    uniform_mean = 0.5
    uniform_var = variance_discrete(uniform_values, uniform_probs, uniform_mean)
    return uniform_mean, uniform_probs, uniform_values, uniform_var


@app.cell(hide_code=True)
def _(coin_var, mo, normal_var, uniform_var):
    mo.md(
        f"""
Let's look at some calculated variances:

- Fair coin (X = 0 or 1): Var(X) = {coin_var:.4f}
- Standard normal distribution (discretized): Var(X) ≈ {normal_var:.4f}
- Uniform distribution on [0,1] (discretized): Var(X) ≈ {uniform_var:.4f}
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Interactive Example: Comparing PMF and PDF

This example shows the relationship between a Binomial distribution (discrete) and its Normal approximation (continuous).
The parameters control both distributions:

- **Number of Trials**: Controls the range of possible values and the shape's width
- **Success Probability**: Affects the distribution's center and skewness
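
To put rough numbers on how closely the two curves agree, the sketch below evaluates both at the integers 0..n and reports the largest gap. It uses only the standard library; `binom_pmf`, `normal_pdf`, and `max_gap` are illustrative helpers, and the largest pointwise gap is just one of several possible closeness measures.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def max_gap(n, p):
    """Largest |binomial PMF - normal PDF| over k = 0..n (requires 0 < p < 1)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return max(abs(binom_pmf(k, n, p) - normal_pdf(k, mu, sigma)) for k in range(n + 1))

for n, p in [(5, 0.5), (20, 0.5), (20, 0.1)]:
    print(f"n={n}, p={p}: max gap = {max_gap(n, p):.4f}")
```

Comparing the three printed gaps gives a feel for the effect of each parameter on the quality of the approximation.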
        """
    )
    return


@app.cell
def _(mo, n_trials, p_success):
    mo.hstack([n_trials, p_success], justify='space-around')
    return


@app.cell(hide_code=True)
def _(mo):
    # Distribution parameters
    n_trials = mo.ui.slider(1, 20, value=10, label="Number of Trials")
    p_success = mo.ui.slider(0, 1, value=0.5, step=0.05, label="Success Probability")
    return n_trials, p_success


@app.cell(hide_code=True)
def _(n_trials, np, p_success, plt, stats):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Discrete: Binomial PMF
    k = np.arange(0, n_trials.value + 1)
    pmf = stats.binom.pmf(k, n_trials.value, p_success.value)
    ax1.bar(k, pmf, alpha=0.8, color='#1f77b4', label='PMF')
    ax1.set_title(f'Binomial PMF (n={n_trials.value}, p={p_success.value})')
    ax1.set_xlabel('Number of Successes')
    ax1.set_ylabel('Probability')
    ax1.grid(True, alpha=0.3)

    # Continuous: Normal PDF approx.
    mu = n_trials.value * p_success.value
    sigma = np.sqrt(n_trials.value * p_success.value * (1-p_success.value))
    x = np.linspace(max(0, mu - 4*sigma), min(n_trials.value, mu + 4*sigma), 100)
    pdf = stats.norm.pdf(x, mu, sigma)

    ax2.plot(x, pdf, 'r-', linewidth=2, label='PDF')
    ax2.fill_between(x, pdf, alpha=0.3, color='red')
    ax2.set_title(f'Normal PDF (μ={mu:.1f}, σ={sigma:.1f})')
    ax2.set_xlabel('Continuous Approximation')
    ax2.set_ylabel('Density')
    ax2.grid(True, alpha=0.3)

    # Set consistent x-axis limits for better comparison
    ax1.set_xlim(-0.5, n_trials.value + 0.5)
    ax2.set_xlim(-0.5, n_trials.value + 0.5)

    plt.tight_layout()
    plt.gca()
    return ax1, ax2, fig, k, mu, pdf, pmf, sigma, x


@app.cell(hide_code=True)
def _(mo, n_trials, np, p_success):
    mo.md(f"""
**Current Distribution Properties:**

- Mean (μ) = {n_trials.value * p_success.value:.2f}
- Standard Deviation (σ) = {np.sqrt(n_trials.value * p_success.value * (1-p_success.value)):.2f}

Notice how the Normal distribution (right) approximates the Binomial distribution (left) better when:

1. The number of trials is larger
2. The success probability is closer to 0.5
    """)
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Common Distributions

1. Bernoulli Distribution
    - Models a single success/failure experiment
    - $P(X = 1) = p$, $P(X = 0) = 1-p$
    - $E[X] = p$, $Var(X) = p(1-p)$

2. Binomial Distribution

    - Models number of successes in $n$ independent trials
    - $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$
    - $E[X] = np$, $Var(X) = np(1-p)$

3. Normal Distribution

    - Bell-shaped curve defined by mean $\mu$ and variance $\sigma^2$
    - PDF: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
    - $E[X] = \mu$, $Var(X) = \sigma^2$
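
The Binomial formulas above can be cross-checked directly from the PMF. The following is a minimal sketch using only the standard library; the parameters `n = 10` and `p = 0.3` are arbitrary illustrative choices, and setting `n = 1` reduces it to the Bernoulli case.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"sum    = {total:.6f}")
print(f"E[X]   = {mean:.6f} (formula np      = {n * p:.6f})")
print(f"Var(X) = {variance:.6f} (formula np(1-p) = {n * p * (1 - p):.6f})")
```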
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Practice Problems

### Problem 1: Discrete Random Variable
Let $X$ be the sum when rolling two fair dice. Find:

1. The support of $X$
2. The PMF $p_X(x)$
3. $E[X]$ and $Var(X)$

<details>
<summary>Solution</summary>
Let's solve this step by step:
```python
def two_dice_pmf(x):
    outcomes = [(i,j) for i in range(1,7) for j in range(1,7)]
    favorable = [pair for pair in outcomes if sum(pair) == x]
    return len(favorable)/36

# Support: {2,3,...,12}
# E[X] = 7
# Var(X) = 5.83
```
</details>

### Problem 2: Continuous Random Variable
For a uniform random variable on $[0,1]$, verify that:

1. The PDF integrates to 1
2. $E[X] = 1/2$
3. $Var(X) = 1/12$

Try solving this yourself first, then check the solution below.
        """
    )
    return


@app.cell
def _():
    # DIY
    return


@app.cell(hide_code=True)
def _(mktext, mo):
    mo.accordion({"Solution": mktext}, lazy=True)
    return


@app.cell(hide_code=True)
def _(mo):
    mktext = mo.md(
        r"""
Let's solve each part:

1. **PDF integrates to 1**:
$\int_0^1 1 \, dx = [x]_0^1 = 1 - 0 = 1$

2. **Expected Value**:
$E[X] = \int_0^1 x \cdot 1 \, dx = [\frac{x^2}{2}]_0^1 = \frac{1}{2} - 0 = \frac{1}{2}$

3. **Variance**:
$Var(X) = E[X^2] - (E[X])^2$

First calculate $E[X^2]$:
$E[X^2] = \int_0^1 x^2 \cdot 1 \, dx = [\frac{x^3}{3}]_0^1 = \frac{1}{3}$

Then:
$Var(X) = \frac{1}{3} - (\frac{1}{2})^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$
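
If you would like an empirical cross-check of parts 2 and 3, one option is a quick simulation. This sketch assumes Python's `random.Random.random()` as the source of uniform draws on [0, 1); the seed and sample size are arbitrary choices, and the results will only match the exact values approximately.

```python
import random

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
samples = [rng.random() for _ in range(100_000)]  # draws from Uniform[0, 1)

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

print(f"sample mean     = {sample_mean:.4f}  (theory: {1 / 2:.4f})")
print(f"sample variance = {sample_var:.4f}  (theory: {1 / 12:.4f})")
```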
        """
    )
    return (mktext,)


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## 🤔 Test Your Understanding

Pick which of these statements about random variables you think are correct:

<details>
<summary>The probability density function can be greater than 1</summary>
✅ Correct! Unlike PMFs, PDFs can exceed 1 as long as the total area equals 1.
</details>

<details>
<summary>The expected value of a random variable must equal one of its possible values</summary>
❌ Incorrect! For example, the expected value of a fair die is 3.5, which is not a possible outcome.
</details>

<details>
<summary>Adding a constant to a random variable changes its variance</summary>
❌ Incorrect! Adding a constant shifts the distribution but doesn't affect its spread.
</details>
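
The first statement can feel surprising, so here is a small sketch of a density that exceeds 1. It assumes a uniform distribution on the interval [0, 0.25], chosen only for illustration; `uniform_pdf` is a hypothetical helper, not a function from this notebook.

```python
def uniform_pdf(x, a, b):
    """Density of a Uniform(a, b) random variable."""
    return 1 / (b - a) if a <= x <= b else 0.0

a, b = 0.0, 0.25                   # illustrative interval of width 1/4
height = uniform_pdf(0.1, a, b)    # density inside the interval
area = height * (b - a)            # rectangle area = total probability

print(height, area)
```

The density is 4 everywhere on the interval, yet the total probability is still 1 because the interval is narrow.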
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        """
## Summary

You've learned:

- The difference between discrete and continuous random variables
- How PMFs and PDFs describe probability distributions
- Methods for calculating expected values and variances
- Properties of common probability distributions

In the next lesson, we'll explore Probability Mass Functions in detail, focusing on their properties and applications.
        """
    )
    return


if __name__ == "__main__":
    app.run()