Raine Hoang committed on
Commit 083415e · 1 Parent(s): 46fe9ce

installed packages

Files changed (1)
  1. polars/02_dataframes.py +75 -71
polars/02_dataframes.py CHANGED
@@ -2,11 +2,15 @@
  # requires-python = ">=3.11"
  # dependencies = [
  # "marimo",
  # ]
  # ///
  import marimo

- __generated_with = "0.12.5"
  app = marimo.App(width="medium")

 
@@ -23,13 +27,13 @@ def _():
  def _(mo):
  mo.md(
  r"""
- # DataFrames
- Author: Raine Hoang

- In this tutorial, we will go over the central data structure for structured data: DataFrames. There are a multitude of packages that work with DataFrames, but we will focus on how Polars uses them and the different options it provides.

- **Note**: The following tutorial has been adapted from the Polars [documentation](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html).
- """
  )
  return
 
@@ -38,10 +42,10 @@ def _(mo):
  def _(mo):
  mo.md(
  """
- ## Defining a DataFrame

- At the most basic level, all you need to do to create a DataFrame in Polars is call the `pl.DataFrame()` constructor and pass some data to its `data` parameter. However, there are restrictions on what exactly you can pass in.
- """
  )
  return
 
@@ -56,14 +60,14 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- There are 5 kinds of data that can be converted into a DataFrame:
-
- 1. Dictionary
- 2. Sequence
- 3. NumPy Array
- 4. Series
- 5. Pandas DataFrame
- """
  )
  return
 
@@ -72,10 +76,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- #### Dictionary

- Dictionaries are structures that store data as key:value pairs. Let's say we have the following dictionary:
- """
  )
  return
 
@@ -104,10 +108,10 @@ def _(dct_data, pl):
  def _(mo):
  mo.md(
  r"""
- In this case, Polars turned each of the lists in the dictionary into a column in the DataFrame.

- The other data structures follow a similar pattern when they are converted to DataFrames.
- """
  )
  return
 
@@ -116,10 +120,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- ##### Sequence

- Sequences are data structures that contain collections of items, which can be accessed by their index. Examples of sequences are lists, tuples, and strings. We will use a list of lists to demonstrate how to convert a sequence into a DataFrame.
- """
  )
  return
 
@@ -148,10 +152,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- ##### NumPy Array

- NumPy arrays are also sequences of items that can be accessed by their index. An important thing to note is that all of the items in an array must have the same data type.
- """
  )
  return
 
@@ -180,10 +184,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- ##### Series

- A Series stores a single column of a DataFrame, and all entries in a Series must have the same data type. You can combine several Series to form one DataFrame.
- """
  )
  return
 
@@ -199,17 +203,17 @@ def _(pl):
  def _(pl, pl_series):
  series_df = pl.DataFrame(data = pl_series)
  series_df
- return (series_df,)


  @app.cell(hide_code=True)
  def _(mo):
  mo.md(
  r"""
- ##### Pandas DataFrame

- Another popular package that uses DataFrames is pandas. By passing a pandas DataFrame to `pl.DataFrame()`, you can easily convert it into a Polars DataFrame.
- """
  )
  return
 
@@ -229,7 +233,7 @@ def _(pd_df, pl):
  pl_df = pl.DataFrame(data = pd_df)

  pl_df
- return (pl_df,)


  @app.cell(hide_code=True)
@@ -242,10 +246,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- ## DataFrame Structure

- Let's recall one of the DataFrames we defined earlier.
- """
  )
  return
 
@@ -266,17 +270,17 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- ## Parameters

- On top of the "data" parameter, there are 6 additional parameters you can specify:

- 1. schema
- 2. schema_overrides
- 3. strict
- 4. orient
- 5. infer_schema_length
- 6. nan_to_null
- """
  )
  return
 
@@ -285,10 +289,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- #### Schema

- Let's recall the DataFrame we created using a sequence.
- """
  )
  return
 
@@ -345,10 +349,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- #### Schema_Overrides

- If you only want to specify the data types of certain columns and let Polars infer the rest, you can use the schema_overrides parameter. It takes a dictionary that maps each column name to a data type. Unlike with the schema parameter, the column names must match names already present in the DataFrame, since that is how Polars identifies which column's data type you want to set. If you use a column name that doesn't already exist, Polars won't be able to change the data type.
- """
  )
  return
 
@@ -363,10 +367,10 @@ def _(pl, seq_data):
  def _(mo):
  mo.md(
  r"""
- Notice that only the data type of the first column changed, while Polars inferred the rest.

- It is important to note that if you only use the schema_overrides parameter, you are limited in how much you can change the data type. In the example above, we were able to change the data type from int32 to int16 without any further parameters, since the data type is still an integer. However, if we wanted to change the first column to a string, we would get an error, because Polars enforces data types strictly by default and will not convert integer values to strings.
- """
  )
  return
 
@@ -387,10 +391,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- #### Strict

- The strict parameter controls how strictly a column's data type is enforced. When set to `True` (the default), Polars raises an error if a value doesn't match the data type the column expects. It does not attempt to cast the value to the correct data type, because Polars prioritizes converting all of the data without any loss or error. When set to `False`, Polars attempts to cast the data to the data type the column expects. If a value cannot be converted, it is replaced with a null value.
- """
  )
  return
 
@@ -406,7 +410,7 @@ def _(pl):
  data = [[1, "a", 2]]

  pl.DataFrame(data = data, strict = True)
- return (data,)


  @app.cell
@@ -431,10 +435,10 @@ def _(mo):
  def _(mo):
  mo.md(
  """
- #### Orient

- Let's recall the DataFrame we made using an array, along with the data used to make it.
- """
  )
  return
 
@@ -485,10 +489,10 @@ def _(pl, seq_data):
  def _(mo):
  mo.md(
  r"""
- #### Infer_Schema_Length

- Without a schema set explicitly, Polars uses the data provided to infer the data types of the columns. It does this by scanning the rows of the data; by default, it looks at the first 100 rows. You can tell Polars how many rows to look at with the infer_schema_length parameter. For example, if you set this parameter to 5, Polars will use the first 5 rows to infer the schema. Setting it to `None` lets Polars scan all of the data.
- """
  )
  return
 
@@ -497,10 +501,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
- #### NaN_To_Null

- If the data contains np.nan values, you can convert them to null values by setting the nan_to_null parameter to `True`. This only has an effect when the data comes from NumPy arrays.
- """
  )
  return
 
 
@@ -2,11 +2,15 @@
  # requires-python = ">=3.11"
  # dependencies = [
  # "marimo",
+ # "numpy==2.2.5",
+ # "pandas==2.2.3",
+ # "polars==1.29.0",
  # ]
  # ///
+
  import marimo

+ __generated_with = "0.13.6"
  app = marimo.App(width="medium")

 
 
@@ -23,13 +27,13 @@ def _():
  def _(mo):
  mo.md(
  r"""
+ # DataFrames
+ Author: Raine Hoang

+ In this tutorial, we will go over the central data structure for structured data: DataFrames. There are a multitude of packages that work with DataFrames, but we will focus on how Polars uses them and the different options it provides.

+ **Note**: The following tutorial has been adapted from the Polars [documentation](https://docs.pola.rs/api/python/stable/reference/dataframe/index.html).
+ """
  )
  return
 
 
@@ -38,10 +42,10 @@ def _(mo):
  def _(mo):
  mo.md(
  """
+ ## Defining a DataFrame

+ At the most basic level, all you need to do to create a DataFrame in Polars is call the `pl.DataFrame()` constructor and pass some data to its `data` parameter. However, there are restrictions on what exactly you can pass in.
+ """
  )
  return
 
 
@@ -56,14 +60,14 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ There are 5 kinds of data that can be converted into a DataFrame:
+
+ 1. Dictionary
+ 2. Sequence
+ 3. NumPy Array
+ 4. Series
+ 5. Pandas DataFrame
+ """
  )
  return
 
 
@@ -72,10 +76,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ #### Dictionary

+ Dictionaries are structures that store data as key:value pairs. Let's say we have the following dictionary:
+ """
  )
  return
 
 
@@ -104,10 +108,10 @@ def _(dct_data, pl):
  def _(mo):
  mo.md(
  r"""
+ In this case, Polars turned each of the lists in the dictionary into a column in the DataFrame.

+ The other data structures follow a similar pattern when they are converted to DataFrames.
+ """
  )
  return
 
 
@@ -116,10 +120,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ ##### Sequence

+ Sequences are data structures that contain collections of items, which can be accessed by their index. Examples of sequences are lists, tuples, and strings. We will use a list of lists to demonstrate how to convert a sequence into a DataFrame.
+ """
  )
  return
 
 
@@ -148,10 +152,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ ##### NumPy Array

+ NumPy arrays are also sequences of items that can be accessed by their index. An important thing to note is that all of the items in an array must have the same data type.
+ """
  )
  return
 
 
@@ -180,10 +184,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ ##### Series

+ A Series stores a single column of a DataFrame, and all entries in a Series must have the same data type. You can combine several Series to form one DataFrame.
+ """
  )
  return
 
 
@@ -199,17 +203,17 @@ def _(pl):
  def _(pl, pl_series):
  series_df = pl.DataFrame(data = pl_series)
  series_df
+ return


  @app.cell(hide_code=True)
  def _(mo):
  mo.md(
  r"""
+ ##### Pandas DataFrame

+ Another popular package that uses DataFrames is pandas. By passing a pandas DataFrame to `pl.DataFrame()`, you can easily convert it into a Polars DataFrame.
+ """
  )
  return
 
 
@@ -229,7 +233,7 @@ def _(pd_df, pl):
  pl_df = pl.DataFrame(data = pd_df)

  pl_df
+ return


  @app.cell(hide_code=True)
 
@@ -242,10 +246,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ ## DataFrame Structure

+ Let's recall one of the DataFrames we defined earlier.
+ """
  )
  return
 
 
@@ -266,17 +270,17 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ ## Parameters

+ On top of the "data" parameter, there are 6 additional parameters you can specify:

+ 1. schema
+ 2. schema_overrides
+ 3. strict
+ 4. orient
+ 5. infer_schema_length
+ 6. nan_to_null
+ """
  )
  return
 
 
@@ -285,10 +289,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ #### Schema

+ Let's recall the DataFrame we created using a sequence.
+ """
  )
  return
 
 
@@ -345,10 +349,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ #### Schema_Overrides

+ If you only want to specify the data types of certain columns and let Polars infer the rest, you can use the schema_overrides parameter. It takes a dictionary that maps each column name to a data type. Unlike with the schema parameter, the column names must match names already present in the DataFrame, since that is how Polars identifies which column's data type you want to set. If you use a column name that doesn't already exist, Polars won't be able to change the data type.
+ """
  )
  return
 
 
@@ -363,10 +367,10 @@ def _(pl, seq_data):
  def _(mo):
  mo.md(
  r"""
+ Notice that only the data type of the first column changed, while Polars inferred the rest.

+ It is important to note that if you only use the schema_overrides parameter, you are limited in how much you can change the data type. In the example above, we were able to change the data type from int32 to int16 without any further parameters, since the data type is still an integer. However, if we wanted to change the first column to a string, we would get an error, because Polars enforces data types strictly by default and will not convert integer values to strings.
+ """
  )
  return
 
 
@@ -387,10 +391,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ #### Strict

+ The strict parameter controls how strictly a column's data type is enforced. When set to `True` (the default), Polars raises an error if a value doesn't match the data type the column expects. It does not attempt to cast the value to the correct data type, because Polars prioritizes converting all of the data without any loss or error. When set to `False`, Polars attempts to cast the data to the data type the column expects. If a value cannot be converted, it is replaced with a null value.
+ """
  )
  return
 
 
@@ -406,7 +410,7 @@ def _(pl):
  data = [[1, "a", 2]]

  pl.DataFrame(data = data, strict = True)
+ return


  @app.cell
 
@@ -431,10 +435,10 @@ def _(mo):
  def _(mo):
  mo.md(
  """
+ #### Orient

+ Let's recall the DataFrame we made using an array, along with the data used to make it.
+ """
  )
  return
 
 
@@ -485,10 +489,10 @@ def _(pl, seq_data):
  def _(mo):
  mo.md(
  r"""
+ #### Infer_Schema_Length

+ Without a schema set explicitly, Polars uses the data provided to infer the data types of the columns. It does this by scanning the rows of the data; by default, it looks at the first 100 rows. You can tell Polars how many rows to look at with the infer_schema_length parameter. For example, if you set this parameter to 5, Polars will use the first 5 rows to infer the schema. Setting it to `None` lets Polars scan all of the data.
+ """
  )
  return
 
 
@@ -497,10 +501,10 @@ def _(mo):
  def _(mo):
  mo.md(
  r"""
+ #### NaN_To_Null

+ If the data contains np.nan values, you can convert them to null values by setting the nan_to_null parameter to `True`. This only has an effect when the data comes from NumPy arrays.
+ """
  )
  return