Haleshot committed on
Commit 5e6566d (unverified) · 1 parent: a488e86

add `Poisson distribution` notebook

A notebook that explores the Poisson distribution, including its definition, properties, and relationship to the binomial distribution. It features interactive visualizations, etc.

probability/15_poisson_distribution.py ADDED
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "matplotlib==3.10.0",
#     "numpy==2.2.4",
#     "scipy==1.15.2",
#     "altair==5.2.0",
#     "wigglystuff==0.1.10",
#     "pandas==2.2.3",
# ]
# ///

import marimo

__generated_with = "0.11.24"
app = marimo.App(width="medium", app_title="Poisson Distribution")


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
# Poisson Distribution

_This notebook is a computational companion to ["Probability for Computer Scientists"](https://chrispiech.github.io/probabilityForComputerScientists/en/part2/poisson/), by Stanford professor Chris Piech._

A Poisson random variable counts the number of events in a fixed interval of time (or space); its distribution gives the probability of each possible count. It relies on the Poisson assumption: events occur at a known, constant mean rate and independently of the time since the last event.
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Poisson Random Variable Definition

$X \sim \text{Poisson}(\lambda)$ represents a Poisson random variable where:

- $X$ is our random variable (number of events)
- $\text{Poisson}$ indicates it follows a Poisson distribution
- $\lambda$ is the rate parameter (average number of events per time interval)

```
X ~ Poisson(λ)
↑   ↑       ↑
|   |       +-- Rate parameter:
|   |           average number of
|   |           events per interval
|   +-- Indicates Poisson
|       distribution
|
Our random variable
counting number of events
```

The Poisson distribution is particularly useful when:

1. Events occur independently of each other
2. The average rate of occurrence is constant
3. Two events cannot occur at exactly the same instant
4. The probability of an event in a short interval is proportional to the length of that interval
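
To see these assumptions produce Poisson counts, here is a small simulation sketch that is not part of the original notebook. It assumes the standard construction of a Poisson process: the gaps between consecutive events are independent exponential random variables with rate $\lambda$. Counting the events that land in each unit interval should then give a sample whose mean and variance are both close to $\lambda$. The helper `count_events`, the rate $\lambda = 5$, and the seed are illustrative choices.

```python
import random
import statistics

def count_events(rate, rng):
    # Count arrivals in one unit interval when the gaps between
    # consecutive events are independent Exponential(rate) draws
    t, count = rng.expovariate(rate), 0
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count

rng = random.Random(0)  # fixed seed so the run is reproducible
lam = 5.0
counts = [count_events(lam, rng) for _ in range(20_000)]

print(f"sample mean     = {statistics.fmean(counts):.3f}")
print(f"sample variance = {statistics.variance(counts):.3f}")
```

Exponential gaps encode the first three conditions directly. Each gap is drawn independently, the rate never changes, and with continuous arrival times two events essentially never coincide.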
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Properties of the Poisson Distribution

| Property | Value |
|----------|-------|
| Notation | $X \sim \text{Poisson}(\lambda)$ |
| Description | Number of events in a fixed time frame if (a) events occur with a constant mean rate and (b) they occur independently of time since last event |
| Parameters | $\lambda \in \mathbb{R}^{+}$, the constant average rate |
| Support | $x \in \{0, 1, \dots\}$ |
| PMF equation | $P(X=x) = \frac{\lambda^x e^{-\lambda}}{x!}$ |
| Expectation | $E[X] = \lambda$ |
| Variance | $\text{Var}(X) = \lambda$ |

Note that unlike most other common distributions, the Poisson distribution has equal mean and variance: both are $\lambda$.
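
To check the table numerically, here is a small sketch that evaluates the PMF formula directly with Python's standard library. It sums over $x = 0, 1, \dots, 99$, a truncation we assume is harmless because the mass beyond 99 is negligible for $\lambda = 5$. The function name `poisson_pmf` is our own; the notebook itself uses `scipy.stats.poisson.pmf`.

```python
import math

def poisson_pmf(x, lam):
    # P(X = x) = lam**x * exp(-lam) / x!, evaluated in log space to avoid overflow
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

lam = 5.0
support = range(100)  # truncated support; mass beyond x = 99 is negligible for lam = 5
probs = [poisson_pmf(x, lam) for x in support]

total = sum(probs)
mean = sum(x * p for x, p in zip(support, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(support, probs))

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```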

Let's explore how the Poisson distribution changes with different rate parameters.
        """
    )
    return


@app.cell(hide_code=True)
def _(TangleSlider, mo):
    # Create an interactive rate control using TangleSlider
    lambda_slider = mo.ui.anywidget(TangleSlider(
        amount=5,
        min_value=0.1,
        max_value=20,
        step=0.1,
        digits=1,
        suffix=" events"
    ))

    # interactive controls (the slider already shows the " events" suffix)
    _controls = mo.vstack([
        mo.md("### Adjust the Rate Parameter to See How the Poisson Distribution Changes"),
        mo.hstack([
            mo.md("**Rate parameter (λ):** "),
            lambda_slider,
            mo.md("**per interval.** Higher values shift the distribution to the right and spread it out.")
        ], justify="start"),
    ])
    _controls
    return (lambda_slider,)


@app.cell(hide_code=True)
def _(lambda_slider, np, plt, stats):
    _lambda = lambda_slider.amount

    # PMF values: show k = 0..20, or up to 3*lambda if that is larger
    _max_x = max(20, int(_lambda * 3))
    _x = np.arange(0, _max_x + 1)
    _pmf = stats.poisson.pmf(_x, _lambda)

    # Key statistics
    _mean = _lambda  # For Poisson, mean = lambda
    _variance = _lambda  # For Poisson, variance = lambda
    _std_dev = np.sqrt(_variance)

    # plot
    _fig, _ax = plt.subplots(figsize=(10, 6))

    # PMF as bars
    _ax.bar(_x, _pmf, color='royalblue', alpha=0.7, label='PMF: P(X=k)')

    # Line through the PMF values
    _ax.plot(_x, _pmf, 'ro-', alpha=0.6, label='PMF line')

    # Vertical line at the mean
    _ax.axvline(x=_mean, color='green', linestyle='--', linewidth=2,
                label=f'Mean: {_mean:.2f}')

    # Shade one standard deviation on either side of the mean
    _ax.axvspan(_mean - _std_dev, _mean + _std_dev, alpha=0.2, color='green',
                label=f'±1 Std Dev: {_std_dev:.2f}')

    _ax.set_xlabel('Number of Events (k)')
    _ax.set_ylabel('Probability: P(X=k)')
    _ax.set_title(f'Poisson Distribution with λ={_lambda:.1f}')

    # annotations
    _ax.annotate(f'E[X] = {_mean:.2f}',
                 xy=(_mean, stats.poisson.pmf(int(_mean), _lambda)),
                 xytext=(_mean + 1, max(_pmf) * 0.8),
                 arrowprops=dict(facecolor='black', shrink=0.05, width=1))

    _ax.annotate(f'Var(X) = {_variance:.2f}',
                 xy=(_mean, stats.poisson.pmf(int(_mean), _lambda) / 2),
                 xytext=(_mean + 1, max(_pmf) * 0.6),
                 arrowprops=dict(facecolor='black', shrink=0.05, width=1))

    _ax.grid(alpha=0.3)
    _ax.legend()

    plt.tight_layout()
    plt.gca()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Poisson Intuition: Relation to Binomial Distribution

The Poisson distribution can be derived as a limiting case of the [binomial distribution](http://marimo.app/https://github.com/marimo-team/learn/blob/main/probability/14_binomial_distribution.py).

Let's work through a practical example: predicting the number of ride-sharing requests in a specific area over a one-minute interval. From historical data, we know that the average number of requests per minute is $\lambda = 5$.

We could approximate this using a binomial distribution by dividing our minute into smaller intervals. For example, we can divide a minute into 60 seconds and treat each second as a [Bernoulli trial](http://marimo.app/https://github.com/marimo-team/learn/blob/main/probability/13_bernoulli_distribution.py): either there's a request (success) or there isn't (failure).

Let's visualize this concept:
        """
    )
    return


@app.cell(hide_code=True)
def _(fig_to_image, mo, plt):
    # Visualize one minute divided into 60 one-second intervals
    _fig, _ax = plt.subplots(figsize=(12, 2))

    # Example events at 2.75s and 7.12s
    _events = [2.75, 7.12]

    # Draw 60 rectangles, highlighting the seconds that contain an event
    for i in range(60):
        _color = 'royalblue' if any(i <= e < i + 1 for e in _events) else 'lightgray'
        _ax.add_patch(plt.Rectangle((i, 0), 0.9, 1, color=_color))

    # markers for events
    for e in _events:
        _ax.plot(e, 0.5, 'ro', markersize=10)

    # labels
    _ax.set_xlim(0, 60)
    _ax.set_ylim(0, 1)
    _ax.set_yticks([])
    _ax.set_xticks([0, 15, 30, 45, 60])
    _ax.set_xticklabels(['0s', '15s', '30s', '45s', '60s'])
    _ax.set_xlabel('Time (seconds)')
    _ax.set_title('One Minute Divided into 60 Second Intervals')

    plt.tight_layout()

    # Convert plot to image for display
    _img = mo.image(fig_to_image(_fig), width="100%")

    # explanation
    _explanation = mo.md(
        r"""
In this visualization:

- Each rectangle represents a 1-second interval
- Blue rectangles indicate intervals where an event occurred
- Red dots show the actual event times (2.75s and 7.12s)

If we treat this as a binomial experiment with 60 trials (seconds), we can calculate probabilities using the binomial PMF. But there's a problem: what if multiple events occur within the same second? To address this, we can divide our minute into smaller intervals.
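
To get a feel for how much this matters, here is a rough sketch. It assumes requests really do arrive as a Poisson process at 5 per minute, so each of the $n$ equal slots independently receives a Poisson number of events with mean $5/n$. Under that assumption, it computes the chance that at least one slot in the minute gets two or more events, which a single Bernoulli trial per slot cannot represent. The function name `prob_some_slot_overflows` is hypothetical.

```python
import math

def prob_some_slot_overflows(rate_per_minute, n_slots):
    # Expected events per slot, assuming a Poisson process over one minute
    mu = rate_per_minute / n_slots
    # P(a given slot has >= 2 events) = 1 - P(0) - P(1)
    p_two_plus = 1.0 - math.exp(-mu) * (1.0 + mu)
    # Slots are independent, so P(at least one slot overflows) = 1 - P(none do)
    return 1.0 - (1.0 - p_two_plus) ** n_slots

for n in (60, 600, 6000):
    print(f"n = {n:>4}: P(some slot gets 2+ events) = {prob_some_slot_overflows(5, n):.4f}")
```

With 60 one-second slots, this chance is far from negligible. It shrinks roughly tenfold each time the slots get ten times finer.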
        """
    )

    # Display the chart together with its explanation
    mo.vstack([_img, _explanation])
    return e, i


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
The total number of requests received over the minute can be approximated as the sum of sixty indicator variables, one per second (1 if that second contains a request, 0 otherwise). This conveniently matches the description of a binomial: a sum of Bernoullis.

Specifically, if we define $X$ to be the number of requests in a minute, $X$ is a binomial with $n=60$ trials. What is the probability, $p$, of a success on a single trial? To make the expectation of $X$ equal the observed historical average $\lambda$, we should choose $p$ so that:

\begin{align}
\lambda &= E[X] && \text{Expectation matches historical average} \\
\lambda &= n \cdot p && \text{Expectation of a Binomial is } n \cdot p \\
p &= \frac{\lambda}{n} && \text{Solving for $p$}
\end{align}

In this case, since $\lambda=5$ and $n=60$, we should choose $p=\frac{5}{60}=\frac{1}{12}$ and state that $X \sim \text{Bin}(n=60, p=\frac{5}{60})$. Now we can calculate the probability of different numbers of requests using the binomial PMF:

$P(X = x) = {n \choose x} p^x (1-p)^{n-x}$

For example:

\begin{align}
P(X=1) &= {60 \choose 1} (5/60)^1 (55/60)^{60-1} \approx 0.0295 \\
P(X=2) &= {60 \choose 2} (5/60)^2 (55/60)^{60-2} \approx 0.0790 \\
P(X=3) &= {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389
\end{align}
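
These values are easy to reproduce. Here is a minimal standard-library sketch. The helper name `binomial_pmf` is our own; the notebook elsewhere uses `scipy.stats.binom.pmf`.

```python
import math

def binomial_pmf(x, n, p):
    # P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

lam, n = 5, 60
p = lam / n  # chosen so that E[X] = n * p equals the historical average
for x in (1, 2, 3):
    print(f"P(X={x}) = {binomial_pmf(x, n, p):.4f}")
```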

This is a good approximation, but it doesn't account for the possibility of multiple events in a single second. One solution is to divide our minute into even more fine-grained intervals. Let's try 600 deciseconds (tenths of a second):
        """
    )
    return


@app.cell(hide_code=True)
def _(fig_to_image, mo, plt):
    # Visualize one minute divided into 600 deciseconds
    # (only the first 100 are drawn, for clarity)
    _fig, _ax = plt.subplots(figsize=(12, 2))

    # Example events at 2.75s and 7.12s, converted to deciseconds
    _events = [27.5, 71.2]

    # Draw the first 100 of the 600 rectangles
    for _i in range(100):
        _color = 'royalblue' if any(_i <= _e < _i + 1 for _e in _events) else 'lightgray'
        _ax.add_patch(plt.Rectangle((_i, 0), 0.9, 1, color=_color))

    # Mark each event that falls inside the visible range
    for _e in _events:
        if _e < 100:
            _ax.plot(_e, 0.5, 'ro', markersize=10)

    # Add labels
    _ax.set_xlim(0, 100)
    _ax.set_ylim(0, 1)
    _ax.set_yticks([])
    _ax.set_xticks([0, 20, 40, 60, 80, 100])
    _ax.set_xticklabels(['0s', '2s', '4s', '6s', '8s', '10s'])
    _ax.set_xlabel('Time (first 10 seconds shown)')
    _ax.set_title('One Minute Divided into 600 Decisecond Intervals (first 100 shown)')

    plt.tight_layout()

    # Convert plot to image for display
    _img = mo.image(fig_to_image(_fig), width="100%")

    # Add explanation
    _explanation = mo.md(
        r"""
With $n=600$ and $p=\frac{5}{600}=\frac{1}{120}$, we can recalculate our probabilities:

\begin{align}
P(X=1) &= {600 \choose 1} (5/600)^1 (595/600)^{600-1} \approx 0.0333 \\
P(X=2) &= {600 \choose 2} (5/600)^2 (595/600)^{600-2} \approx 0.0837 \\
P(X=3) &= {600 \choose 3} (5/600)^3 (595/600)^{600-3} \approx 0.1403
\end{align}

As we make our intervals smaller (increasing $n$), our approximation becomes more accurate.
        """
    )

    # Display the chart together with its explanation
    mo.vstack([_img, _explanation])
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## The Binomial Distribution in the Limit

What happens if we continue dividing our time interval into smaller and smaller pieces? Let's explore how the probabilities change as we increase the number of intervals:
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    # slider for number of intervals; a step of 20 keeps 60, 600 and 10000 all reachable
    intervals_slider = mo.ui.slider(
        start=60,
        stop=10000,
        step=20,
        value=600,
        label="Number of intervals to divide a minute into"
    )
    return (intervals_slider,)


@app.cell(hide_code=True)
def _(intervals_slider):
    intervals_slider
    return


@app.cell(hide_code=True)
def _(intervals_slider, np, pd, plt, stats):
    # number of intervals from the slider
    n = intervals_slider.value
    _lambda = 5  # Fixed lambda for our example
    p = _lambda / n

    # Binomial probabilities for k = 0..14
    _x_values = np.arange(0, 15)
    _binom_pmf = stats.binom.pmf(_x_values, n, p)

    # Exact Poisson probabilities for comparison
    _poisson_pmf = stats.poisson.pmf(_x_values, _lambda)

    # DataFrame for a side-by-side comparison
    df = pd.DataFrame({
        'Events': _x_values,
        f'Binomial(n={n}, p={p:.6f})': _binom_pmf,
        'Poisson(λ=5)': _poisson_pmf,
        'Difference': np.abs(_binom_pmf - _poisson_pmf)
    })

    # Plot both PMFs
    fig, _ax = plt.subplots(figsize=(10, 6))

    # Bars for the binomial
    _ax.bar(_x_values - 0.2, _binom_pmf, width=0.4, alpha=0.7,
            color='royalblue', label=f'Binomial(n={n}, p={p:.6f})')

    # Bars for the Poisson
    _ax.bar(_x_values + 0.2, _poisson_pmf, width=0.4, alpha=0.7,
            color='crimson', label='Poisson(λ=5)')

    # Add labels and title
    _ax.set_xlabel('Number of Events (k)')
    _ax.set_ylabel('Probability')
    _ax.set_title(f'Comparison of Binomial and Poisson PMFs with n={n}')
    _ax.legend()
    _ax.set_xticks(_x_values)
    _ax.grid(alpha=0.3)

    plt.tight_layout()
    return df, fig, n, p


@app.cell(hide_code=True)
def _(df, fig, fig_to_image, mo):
    # Largest absolute gap between the two PMFs
    _max_diff = df['Difference'].max()

    # output: chart, summary line, and a rounded table
    # (pandas' Styler would need jinja2, which is not a declared dependency)
    _chart = mo.image(fig_to_image(fig), width="100%")
    _explanation = mo.md(f"**Maximum absolute difference between distributions: {_max_diff:.6f}**")
    _table = mo.ui.table(df.round(6))

    mo.vstack([_chart, _explanation, _table])
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
As you can see from the interactive comparison above, the binomial distribution approaches the Poisson distribution as the number of intervals increases. This is not a coincidence: the Poisson distribution is the limiting case of the binomial distribution when:

- The number of trials $n$ approaches infinity
- The probability of success $p$ approaches zero
- The product $np = \lambda$ remains constant

This relationship is why the Poisson distribution is so useful: it's easier to work with than a binomial with a very large number of trials and a very small probability of success.
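
The same convergence can be checked without the slider. This sketch assumes $\lambda = 5$, as in the ride-sharing example, and looks at $k = 0, \dots, 14$ like the comparison table. It reports the largest absolute gap between the binomial and Poisson PMFs as $n$ grows by factors of ten. The helper names are our own.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 5
ks = range(15)  # same range of event counts as the comparison table
gaps = {}
for n in (60, 600, 6000, 60000):
    p = lam / n  # keep n * p = lam fixed while n grows
    gaps[n] = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in ks)
    print(f"n = {n:>5}: max |Binomial - Poisson| = {gaps[n]:.2e}")
```

The gap shrinks by roughly a factor of ten each time $n$ does.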

## Derivation of the Poisson PMF

Let's derive the Poisson PMF by taking the limit of the binomial PMF as $n \to \infty$, with $p = \lambda / n$. We start with:

$P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}$

While this expression looks intimidating, it simplifies nicely:

\begin{align}
P(X=x)
&= \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x}
&& \text{Start: binomial in the limit}\\
&= \lim_{n \rightarrow \infty}
{n \choose x} \cdot
\frac{\lambda^x}{n^x} \cdot
\frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}}
&& \text{Expanding the power terms} \\
&= \lim_{n \rightarrow \infty}
\frac{n!}{(n-x)!x!} \cdot
\frac{\lambda^x}{n^x} \cdot
\frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}}
&& \text{Expanding the binomial term} \\
&= \lim_{n \rightarrow \infty}
\frac{n!}{(n-x)!x!} \cdot
\frac{\lambda^x}{n^x} \cdot
\frac{e^{-\lambda}}{(1-\lambda/n)^{x}}
&& \text{Using limit rule } \lim_{n \rightarrow \infty}(1-\lambda/n)^{n} = e^{-\lambda}\\
&= \lim_{n \rightarrow \infty}
\frac{n!}{(n-x)!x!} \cdot
\frac{\lambda^x}{n^x} \cdot
\frac{e^{-\lambda}}{1}
&& \text{As } n \to \infty \text{, } (1-\lambda/n)^{x} \to 1\\
&= \lim_{n \rightarrow \infty}
\frac{n!}{(n-x)!} \cdot
\frac{1}{x!} \cdot
\frac{\lambda^x}{n^x} \cdot
e^{-\lambda}
&& \text{Rearranging terms}\\
&= \lim_{n \rightarrow \infty}
\frac{n^x}{1} \cdot
\frac{1}{x!} \cdot
\frac{\lambda^x}{n^x} \cdot
e^{-\lambda}
&& \text{As } n \to \infty \text{, } \frac{n!}{(n-x)!} = n(n-1)\cdots(n-x+1) \approx n^x\\
&= \lim_{n \rightarrow \infty}
\frac{\lambda^x}{x!} \cdot
e^{-\lambda}
&& \text{Canceling } n^x\\
&=
\frac{\lambda^x \cdot e^{-\lambda}}{x!}
&& \text{Simplifying}
\end{align}

This gives us our elegant Poisson PMF formula: $P(X=x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
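
The two limits that carry this derivation can be checked numerically. This sketch fixes $\lambda = 5$ and $x = 3$, both arbitrary illustrative values. It shows $(1-\lambda/n)^n$ approaching $e^{-\lambda}$ and $\frac{n!}{(n-x)!\,n^x}$ approaching 1 as $n$ grows. It relies on `math.perm(n, x)`, which computes $\frac{n!}{(n-x)!}$ exactly.

```python
import math

lam, x = 5, 3
rows = []
for n in (10, 100, 1000, 10000, 100000):
    exp_gap = abs((1 - lam / n) ** n - math.exp(-lam))  # distance from e^(-lam)
    ratio = math.perm(n, x) / n**x                      # n!/((n-x)! n^x), should tend to 1
    rows.append((n, exp_gap, ratio))
    print(f"n = {n:>6}: |(1-lam/n)^n - e^-lam| = {exp_gap:.2e}, ratio = {ratio:.6f}")
```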
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Poisson Distribution in Python

Python's `scipy.stats` module provides functions to work with the Poisson distribution. Let's see how to calculate probabilities and generate random samples.

First, let's calculate some probabilities for our ride-sharing example with $\lambda = 5$:
        """
    )
    return


@app.cell
def _(stats):
    # Set lambda parameter
    _lambda = 5

    # Calculate probabilities for X = 1, 2, 3
    p_1 = stats.poisson.pmf(1, _lambda)
    p_2 = stats.poisson.pmf(2, _lambda)
    p_3 = stats.poisson.pmf(3, _lambda)

    print(f"P(X=1) = {p_1:.5f}")
    print(f"P(X=2) = {p_2:.5f}")
    print(f"P(X=3) = {p_3:.5f}")

    # Calculate cumulative probability P(X ≤ 3)
    p_leq_3 = stats.poisson.cdf(3, _lambda)
    print(f"P(X≤3) = {p_leq_3:.5f}")

    # Calculate probability P(X > 10)
    p_gt_10 = 1 - stats.poisson.cdf(10, _lambda)
    print(f"P(X>10) = {p_gt_10:.5f}")
    return p_1, p_2, p_3, p_gt_10, p_leq_3


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""We can also generate random samples from a Poisson distribution and visualize their distribution:""")
    return


@app.cell(hide_code=True)
def _(np, plt, stats):
    # 1000 random samples from Poisson(lambda=5)
    _lambda = 5
    _samples = stats.poisson.rvs(_lambda, size=1000)

    # theoretical PMF
    _x_values = np.arange(0, max(_samples) + 1)
    _pmf_values = stats.poisson.pmf(_x_values, _lambda)

    # histogram vs. theoretical PMF
    _fig, _ax = plt.subplots(figsize=(10, 6))

    # samples as a histogram (unit-width bins centered on the integers)
    _ax.hist(_samples, bins=np.arange(-0.5, max(_samples) + 1.5, 1),
             alpha=0.7, density=True, label='Random Samples')

    # theoretical PMF
    _ax.plot(_x_values, _pmf_values, 'ro-', label='Theoretical PMF')

    # labels and title
    _ax.set_xlabel('Number of Events')
    _ax.set_ylabel('Relative Frequency / Probability')
    _ax.set_title(f'1000 Random Samples from Poisson(λ={_lambda})')
    _ax.legend()
    _ax.grid(alpha=0.3)

    # annotations
    _ax.annotate(f'Sample Mean: {np.mean(_samples):.2f}',
                 xy=(0.7, 0.9), xycoords='axes fraction',
                 bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.3))
    _ax.annotate(f'Theoretical Mean: {_lambda:.2f}',
                 xy=(0.7, 0.8), xycoords='axes fraction',
                 bbox=dict(boxstyle='round,pad=0.5', fc='lightgreen', alpha=0.3))

    plt.tight_layout()
    plt.gca()
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Changing Time Frames

One important property of the Poisson distribution is that the rate parameter $\lambda$ scales linearly with the time interval. If events occur at a rate of $\lambda$ per unit time, then over a period of $t$ units, the rate parameter becomes $\lambda \cdot t$.

For example, if a website receives an average of 5 requests per minute, what is the distribution of requests over a 20-minute period?

The rate parameter for the 20-minute period would be $\lambda = 5 \cdot 20 = 100$ requests.
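
Here is a quick numerical check of this scaling. It is a sketch, not the notebook's method. It assumes the 20 one-minute counts are independent $\text{Poisson}(5)$ variables, as the Poisson model implies. Their total then has the PMF obtained by convolving the 20 per-minute PMFs, which should match $\text{Poisson}(100)$ term by term. The helper names and the truncation limits (61 and 250 terms) are our own choices.

```python
import math

def poisson_pmf(k, lam):
    # Log-space evaluation avoids overflowing lam**k and k! when lam is large
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def convolve(a, b, size):
    # PMF of the sum of two independent counts, truncated to `size` entries
    out = [0.0] * size
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j >= size:
                break
            out[i + j] += pa * pb
    return out

rate, minutes, size = 5, 20, 250
per_minute = [poisson_pmf(k, rate) for k in range(61)]  # Poisson(5); mass above 60 is negligible
total = [1.0] + [0.0] * (size - 1)  # start from "0 events with certainty"
for _ in range(minutes):
    total = convolve(total, per_minute, size)

direct = [poisson_pmf(k, rate * minutes) for k in range(size)]
max_gap = max(abs(x - y) for x, y in zip(total, direct))
print(f"max |convolution - Poisson(100)| = {max_gap:.2e}")
```

The two PMFs agree to within floating-point rounding. This is the additivity of independent Poisson variables, applied once per minute.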
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    # sliders for the rate and time period
    rate_slider = mo.ui.slider(
        start=0.1,
        stop=10,
        step=0.1,
        value=5,
        label="Rate per unit time (λ)"
    )

    time_slider = mo.ui.slider(
        start=1,
        stop=60,
        step=1,
        value=20,
        label="Time period (t units)"
    )

    # Display the controls as this cell's output
    _controls = mo.vstack([
        mo.md("### Adjust Parameters to See How Time Scaling Works"),
        mo.hstack([rate_slider, time_slider], justify="space-between")
    ])
    _controls
    return rate_slider, time_slider


@app.cell(hide_code=True)
def _(fig_to_image, mo, np, plt, rate_slider, stats, time_slider):
    # parameters from sliders
    _rate = rate_slider.value
    _time = time_slider.value

    # scaled rate parameter
    _lambda = _rate * _time

    # PMF values
    _max_x = max(30, int(_lambda * 1.5))
    _x = np.arange(0, _max_x + 1)
    _pmf = stats.poisson.pmf(_x, _lambda)

    # plot
    _fig, _ax = plt.subplots(figsize=(10, 6))

    # PMF as bars
    _ax.bar(_x, _pmf, color='royalblue', alpha=0.7,
            label=f'PMF: Poisson(λ={_lambda:.1f})')

    # vertical line for mean
    _ax.axvline(x=_lambda, color='red', linestyle='--', linewidth=2,
                label=f'Mean = {_lambda:.1f}')

    # labels and title
    _ax.set_xlabel('Number of Events')
    _ax.set_ylabel('Probability')
    _ax.set_title(f'Poisson Distribution Over {_time} Units (Rate = {_rate}/unit)')

    # when lambda is large, zoom in on mean ± 4 standard deviations
    if _lambda > 10:
        _ax.set_xlim(_lambda - 4 * np.sqrt(_lambda), _lambda + 4 * np.sqrt(_lambda))

    _ax.legend()
    _ax.grid(alpha=0.3)

    plt.tight_layout()

    # additional information
    _info = mo.md(
        f"""
When the rate is **{_rate}** events per unit time and we observe for **{_time}** units:

- The expected number of events is **{_lambda:.1f}**
- The variance is also **{_lambda:.1f}**
- The standard deviation is **{np.sqrt(_lambda):.2f}**
- P(X=0) = {stats.poisson.pmf(0, _lambda):.4f} (probability of no events)
- P(X≥10) = {1 - stats.poisson.cdf(9, _lambda):.4f} (probability of 10 or more events)
        """
    )

    # Show the plot followed by the summary; the cell output is its last expression
    mo.vstack([mo.image(fig_to_image(_fig), width="100%"), _info])
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## 🤔 Test Your Understanding
Pick which of these statements about Poisson distributions you think are correct:

/// details | The variance of a Poisson distribution is always equal to its mean
✅ Correct! For a Poisson distribution with parameter $\lambda$, both the mean and variance equal $\lambda$.
///

/// details | The Poisson distribution can be used to model the number of successes in a fixed number of trials
❌ Incorrect! That's the binomial distribution. The Poisson distribution models the number of events in a fixed interval of time or space, not a fixed number of trials.
///

/// details | If $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$
✅ Correct! The sum of independent Poisson random variables is also a Poisson random variable with parameter equal to the sum of the individual parameters.
///

/// details | As $\lambda$ increases, the Poisson distribution approaches a normal distribution
✅ Correct! For large values of $\lambda$ (a common rule of thumb is $\lambda > 10$), the Poisson distribution is approximately normal with mean $\lambda$ and variance $\lambda$.
///

/// details | The probability of zero events in a Poisson process is always less than the probability of one event
❌ Incorrect! For $\lambda < 1$, the probability of zero events ($e^{-\lambda}$) is greater than the probability of one event ($\lambda e^{-\lambda}$), and the two are equal when $\lambda = 1$.
///

/// details | The Poisson distribution has a single parameter $\lambda$, which always equals the average number of events per time period
✅ Correct! The parameter $\lambda$ represents the average rate of events, and it uniquely defines the distribution.
///
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
## Summary

The Poisson distribution is one of those incredibly useful tools that shows up all over the place. I've always found it fascinating how such a simple formula can model so many real-world phenomena, from website traffic to radioactive decay.

What makes the Poisson really cool is that it emerges naturally as we try to model rare events occurring over a continuous interval. Remember that visualization where we kept dividing time into smaller and smaller chunks? As we showed, when you take a binomial distribution and let the number of trials approach infinity while keeping the expected value constant, you end up with the elegant Poisson formula.

The key things to remember about the Poisson distribution:

- It models the number of events occurring in a fixed interval of time or space, assuming events happen at a constant average rate and independently of each other

- Its PMF is given by the simple formula $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$

- Both the mean and variance equal the parameter $\lambda$, which represents the average number of events per interval

- It's related to the binomial distribution as a limiting case when $n \to \infty$, $p \to 0$, and $np = \lambda$ remains constant

- The rate parameter scales linearly with the length of the interval: if events occur at rate $\lambda$ per unit time, then over $t$ units, the parameter becomes $\lambda t$

From modeling website traffic and customer arrivals to defects in manufacturing and radioactive decay, the Poisson distribution provides a powerful and mathematically elegant way to understand random occurrences in our world.
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""Appendix code (helper functions, variables, etc.):""")
    return


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell(hide_code=True)
def _():
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats
    import pandas as pd
    import altair as alt
    from wigglystuff import TangleSlider
    return TangleSlider, alt, np, pd, plt, stats


@app.cell(hide_code=True)
def _():
    import io
    import base64
    from matplotlib.figure import Figure

    # Helper: render a matplotlib figure as a base64-encoded PNG data URL for mo.image
    def fig_to_image(fig):
        buf = io.BytesIO()
        fig.savefig(buf, format='png')
        buf.seek(0)
        data = f"data:image/png;base64,{base64.b64encode(buf.read()).decode('utf-8')}"
        return data
    return Figure, base64, fig_to_image, io


if __name__ == "__main__":
    app.run()