first push
- Dockerfile +11 -0
- Metadata_Country_API_Download_DS2_en_csv_v2_5657328.csv +9 -0
- Metadata_Indicator_API_Download_DS2_en_csv_v2_5657328.csv +0 -0
- README.md +101 -5
- docker-compose.yml +13 -0
- remark_slides.md +15 -0
- requirements.txt +8 -0
- scripts_build/01_read.py +49 -0
- scripts_build/02_explore.py +82 -0
- scripts_build/03_holoviews.py +54 -0
- streamlit.py +82 -0
Dockerfile
ADDED
@@ -0,0 +1,11 @@
1 |
+
FROM python:3.9-slim
|
2 |
+
EXPOSE 8501
|
3 |
+
WORKDIR /app
|
4 |
+
RUN apt-get update && apt-get install -y \
|
5 |
+
build-essential \
|
6 |
+
software-properties-common \
|
7 |
+
git \
|
8 |
+
&& rm -rf /var/lib/apt/lists/*
|
9 |
+
COPY . /app
|
10 |
+
RUN pip3 install -r requirements.txt
|
11 |
+
ENTRYPOINT ["streamlit", "run", "streamlit.py", "--server.port=8501", "--server.address=0.0.0.0"]
|
Metadata_Country_API_Download_DS2_en_csv_v2_5657328.csv
ADDED
@@ -0,0 +1,9 @@
1 |
+
"Country Code","Region","IncomeGroup","SpecialNotes","TableName",
|
2 |
+
"COD","Sub-Saharan Africa","Low income","The World Bank systematically assesses the appropriateness of official exchange rates as conversion factors. In this country, multiple or dual exchange rate activity exists and must be accounted for appropriately in underlying statistics. An alternative estimate (“alternative conversion factor” - PA.NUS.ATLS) is thus calculated as a weighted average of the different exchange rates in use in the country. Doing so better reflects economic reality and leads to more accurate cross-country comparisons and country classifications by income level. For this country, this applies to the period 1999-2004. Alternative conversion factors are used in the Atlas methodology and elsewhere in World Development Indicators as single-year conversion factors.","Congo, Dem. Rep.",
|
3 |
+
"GHA","Sub-Saharan Africa","Lower middle income","The World Bank systematically assesses the appropriateness of official exchange rates as conversion factors. In this country, multiple or dual exchange rate activity exists and must be accounted for appropriately in underlying statistics. An alternative estimate (“alternative conversion factor” - PA.NUS.ATLS) is thus calculated as a weighted average of the different exchange rates in use in the country. Doing so better reflects economic reality and leads to more accurate cross-country comparisons and country classifications by income level. For this country, this applies to the period 1974-1987. Alternative conversion factors are used in the Atlas methodology and elsewhere in World Development Indicators as single-year conversion factors.","Ghana",
|
4 |
+
"KEN","Sub-Saharan Africa","Lower middle income","Fiscal year end: June 30; reporting period for national accounts data: CY.","Kenya",
|
5 |
+
"NGA","Sub-Saharan Africa","Lower middle income","The World Bank systematically assesses the appropriateness of official exchange rates as conversion factors. In this country, multiple or dual exchange rate activity exists and must be accounted for appropriately in underlying statistics. An alternative estimate (“alternative conversion factor” - PA.NUS.ATLS) is thus calculated as a weighted average of the different exchange rates in use in the country. Doing so better reflects economic reality and leads to more accurate cross-country comparisons and country classifications by income level. For this country, this applies to 1970-2020. Alternative conversion factors are used in the Atlas methodology and elsewhere in World Development Indicators as single-year conversion factors.","Nigeria",
|
6 |
+
"ZAF","Sub-Saharan Africa","Upper middle income","Fiscal year end: March 31; reporting period for national accounts data: CY.","South Africa",
|
7 |
+
"ZWE","Sub-Saharan Africa","Lower middle income","National Accounts data are reported in Zimbabwean Dollar (ZWL). Before 2017, one ZWL is set to be equal to one USD.
|
8 |
+
|
9 |
+
The World Bank systematically assesses the appropriateness of official exchange rates as conversion factors. In this country, multiple or dual exchange rate activity exists and must be accounted for appropriately in underlying statistics. An alternative estimate (“alternative conversion factor” - PA.NUS.ATLS) is thus calculated as a weighted average of the different exchange rates in use in the country. Doing so better reflects economic reality and leads to more accurate cross-country comparisons and country classifications by income level. For this country, this applies to the period 2017-2022. Alternative conversion factors are used in the Atlas methodology and elsewhere in World Development Indicators as single-year conversion factors.","Zimbabwe",
|
Metadata_Indicator_API_Download_DS2_en_csv_v2_5657328.csv
ADDED
The diff for this file is too large to render.
|
|
README.md
CHANGED
@@ -1,11 +1,107 @@
|
|
1 |
---
|
2 |
-
title: Docker
|
3 |
-
emoji:
|
4 |
-
colorFrom:
|
5 |
-
colorTo:
|
6 |
sdk: docker
|
7 |
pinned: false
|
8 |
license: apache-2.0
|
|
|
9 |
---
|
10 |
|
11 |
-
1 |
---
|
2 |
+
title: Streamlit Docker
|
3 |
+
emoji: 🐨
|
4 |
+
colorFrom: indigo
|
5 |
+
colorTo: red
|
6 |
sdk: docker
|
7 |
pinned: false
|
8 |
license: apache-2.0
|
9 |
+
app_port: 8501
|
10 |
---
|
11 |
|
12 |
+
|
13 |
+
## Introduction to Data Science with Python
|
14 |
+
|
15 |
+
## Overview
|
16 |
+
|
17 |
+
Location: Accra, Ghana. When: July 31 and August 1, 2023.
|
18 |
+
|
19 |
+
This material focuses on [Polars](https://pola-rs.github.io/polars-book/user-guide/), [Parquet files](https://parquet.apache.org/docs/), [Plotly Express](https://plotly.com/python/plotly-express/), and [Streamlit](https://streamlit.io/) to introduce the data science process.
|
20 |
+
|
21 |
+
## Installing the tools
|
22 |
+
|
23 |
+
We will need [Visual Studio Code](https://code.visualstudio.com/download) and [Python](https://www.python.org/downloads/) installed for this short course. Each tool has additional packages/extensions that we will need to install as well.
|
24 |
+
|
25 |
+
|
26 |
+
### Visual Studio Code Extensions
|
27 |
+
|
28 |
+
See [Managing Extensions in Visual Studio Code](https://code.visualstudio.com/docs/editor/extension-marketplace) to learn how to install extensions and for more background on them. We will rely heavily on the [Python](https://marketplace.visualstudio.com/items?itemName=ms-python.python) extension from the Visual Studio Marketplace.
|
29 |
+
|
30 |
+
#### VS Code Interactive Python Window
|
31 |
+
|
32 |
+
An open-source project called [Jupyter](http://jupyter-notebook.readthedocs.io/en/latest/) is the standard method for interactive Python use for data science or scientific computing. However, there are [some issues](https://towardsdatascience.com/5-reasons-why-jupyter-notebooks-suck-4dc201e27086) with its use in a development environment. VS Code provides a way for us to have the best of Python and Jupyter Notebooks with their [Python Interactive Window](https://code.visualstudio.com/docs/python/jupyter-support-py).
|
33 |
+
|
34 |
+
VS Code is fairly intelligent in responding to your needs. If you open a `.py` file, a window should pop up asking if you would like to prepare your Python experience. You will need to install the [jupyter python package](https://jupyter.readthedocs.io/en/latest/install.html). If VS Code doesn't install it, you can install it with `pip` or `pip3` so that the interactive Python window works.
|
35 |
+
|
36 |
+
Using the VS Code functionality, you will work with a standard `.py` file instead of the `.ipynb` extension typically used with Jupyter notebooks. The Python extension in VS Code will recognize `# %%` as a cell or chunk of Python code and add notebook options to ‘Run Cell’ as well as other actions. The code example below, together with the image of its view in VS Code, shows how this looks. [Microsoft’s documentation](https://code.visualstudio.com/docs/python/jupyter-support-py) goes into more detail.
|
37 |
+
|
38 |
+
To make the interactive window more useful, press `ctrl + ,` (`cmd + ,` on a Mac) to open the settings. From there, search for **‘Send Selection to Interactive Window’** and make sure the box is checked. You will then be able to use `shift + return` to send a selected chunk of code or an entire cell.
|
39 |
+
|
40 |
+
```python
|
41 |
+
# %%
|
42 |
+
msg = "Hello World"
|
43 |
+
print(msg)
|
44 |
+
|
45 |
+
# %%
|
46 |
+
msg = "Hello again"
|
47 |
+
print(msg)
|
48 |
+
```
|
49 |
+
|
50 |
+

|
51 |
+
|
52 |
+
### Python Packages
|
53 |
+
|
54 |
+
#### `pip` overview
|
55 |
+
|
56 |
+
*The standard command*, `pip install polars[all] plotly streamlit`, is run in your Terminal, Command Window, or a `New Terminal` opened from the `Terminal` menu in VS Code. If you are using a Mac you will most likely use `pip3 install polars[all] plotly streamlit`. If your shell is zsh (the macOS default), quote the extra so the brackets are not treated as a pattern: `pip3 install 'polars[all]' plotly streamlit`. In your interactive Python environment in VS Code (Jupyter server) you can run `!pip install polars[all] plotly streamlit` as explained [here](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/#How-to-use-Pip-from-the-Jupyter-Notebook).
|
57 |
+
|
58 |
+
Either of the following two commands can be used in the interactive Python window in VS Code to install packages.
|
59 |
+
|
60 |
+
```python
|
61 |
+
!pip install polars[all] plotly streamlit
|
62 |
+
```
|
63 |
+
|
64 |
+
or
|
65 |
+
|
66 |
+
```python
|
67 |
+
import sys
|
68 |
+
!{sys.executable} -m pip install polars[all] plotly streamlit
|
69 |
+
```
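After installing, it can help to confirm that the packages are importable from the interpreter VS Code is using. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the package list simply mirrors the install command above.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["polars", "plotly", "streamlit"]
missing = missing_packages(required)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All course packages are importable.")
```

If a package is reported as missing even though `pip` succeeded, the install most likely went to a different Python than the one the interactive window runs, which is the situation the `sys.executable` variant is meant to avoid.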
|
70 |
+
|
71 |
+
#### `pip` commands
|
72 |
+
|
73 |
+
- `pip install polars[all] plotly streamlit` should install all needed packages.
|
74 |
+
|
75 |
+
You could install them individually using the following commands.
|
76 |
+
|
77 |
+
- `pip install polars[all]` for [Polars](https://pola-rs.github.io/polars-book/user-guide/installation/)
|
78 |
+
- `pip install streamlit` for [Streamlit](https://docs.streamlit.io/library/get-started/main-concepts)
|
79 |
+
- `pip install plotly` for [plotly in Python](https://plotly.com/python/getting-started/)
|
80 |
+
|
81 |
+
## Repo Navigation
|
82 |
+
|
83 |
+
### `guides` folder
|
84 |
+
|
85 |
+
The `guides` folder will allow us to explore these packages if the internet connection is down during our course.
|
86 |
+
|
87 |
+
- PDF Files: The pdf files should have most of the commands we will need during the course. The `polars_website.pdf` is a full pdf build of their website guide as of July 2023.
|
88 |
+
- `streamlit_md` folder: This folder has the markdown files used to build their website guide. It is a little harder to navigate.
|
89 |
+
- `polars_site` folder: This folder has the fully built website for the polars package as of July 2023. From your OS file explorer open the `index.html` file to see the full site.
|
90 |
+
|
91 |
+
### `data` folder
|
92 |
+
|
93 |
+
This folder has the data we will be using for the short course. Read more about [the data folder](data/readme.md).
|
94 |
+
|
95 |
+
### Scripts folder
|
96 |
+
|
97 |
+
The scripts folder has the starting scripts for each of the activities we will complete during the short course.
|
98 |
+
|
99 |
+
### Markdown links
|
100 |
+
|
101 |
+
- [plotly.md](plotly.md): links to the primary functions we will use as we create charts with Plotly Express.
|
102 |
+
- [polars.md](polars.md): links to the key methods we will leverage for data import and munging.
|
103 |
+
- [streamlit.md](streamlit.md): links to the dashboard functions and concepts we will use with Streamlit.
|
104 |
+
|
105 |
+
## Slides
|
106 |
+
|
107 |
+
The slides are available as [HTML slides](https://hathawayj.github.io/ghana_datascience/) and as [PDF slides](https://github.com/hathawayj/ghana_datascience/blob/slides/slides.pdf).
|
docker-compose.yml
ADDED
@@ -0,0 +1,13 @@
1 |
+
version: '3.9'
|
2 |
+
services:
|
3 |
+
streamlit:
|
4 |
+
build:
|
5 |
+
dockerfile: Dockerfile
|
6 |
+
context: .
|
7 |
+
container_name: streamlit-example
|
8 |
+
cpus: 2
|
9 |
+
mem_limit: 2048m
|
10 |
+
ports:
|
11 |
+
- "8501:8501"
|
12 |
+
volumes:
|
13 |
+
- ".:/app:rw"
|
remark_slides.md
ADDED
@@ -0,0 +1,15 @@
1 |
+
# GitHub Pages Slideshow with [Remark](https://github.com/gnab/remark)
|
2 |
+
|
3 |
+
This template is built on [Remark](https://github.com/gnab/remark), an open source tool for creating and displaying slideshows from markdown. For questions, see [Remark's documentation](https://github.com/gnab/remark). I have added a GitHub Action that converts the slides to a PDF in the `slides` branch.
|
4 |
+
|
5 |
+
The most important things to know are:
|
6 |
+
- Enable GitHub Pages from `master` for the slides to work
|
7 |
+
- Once enabled, the slides will be visible at `https://USERNAME.github.io/REPOSITORY-NAME/#1`, like https://brianamarie.github.io/slideshow-on-pages/#1
|
8 |
+
- Edit the `index.html` file to edit the slides
|
9 |
+
- Slides are separated by `---`
|
10 |
+
- Presenter notes go after `???` within a slide
|
11 |
+
- Toggle presenter notes during presentation with `P`
|
12 |
+
- Read the full guide to [remark markdown](https://github.com/gnab/remark/wiki)
|
13 |
+
- Press `C` to clone a display; then press `P` to switch to presenter mode. Open help menu with `h`
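
The slide conventions in this list can be sketched in a few lines of Python. This is only an illustration, assuming slides are separated by a line of three dashes (the separator Remark documents) and that presenter notes follow a line containing `???`; `parse_slides` is a hypothetical helper, not part of Remark.

```python
def parse_slides(markdown: str):
    """Split a Remark-style deck into (content, notes) pairs."""
    slides = []
    current = []
    for line in markdown.splitlines():
        if line.strip() == "---":
            # A separator line closes the current slide.
            slides.append(current)
            current = []
        else:
            current.append(line)
    slides.append(current)

    parsed = []
    for lines in slides:
        text = "\n".join(lines)
        # Everything after a line containing only ??? is treated as presenter notes.
        content, _, notes = text.partition("\n???\n")
        parsed.append((content.strip(), notes.strip()))
    return parsed


deck = "# Title\n\nWelcome\n???\nGreet the room\n---\n# Second slide\n\n- point"
for content, notes in parse_slides(deck):
    print(repr(content), "| notes:", repr(notes))
```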
|
14 |
+
|
15 |
+
Fork this repository to get started!
|
requirements.txt
ADDED
@@ -0,0 +1,8 @@
1 |
+
altair
|
2 |
+
polars
|
3 |
+
pandas
|
4 |
+
streamlit
|
5 |
+
scikit-learn
|
6 |
+
numpy
|
7 |
+
plotly
|
8 |
+
lets-plot
|
scripts_build/01_read.py
ADDED
@@ -0,0 +1,49 @@
1 |
+
# %%
|
2 |
+
import polars as pl
|
3 |
+
# Notice that the World Bank leaves missing values as blanks in the CSV. We need to tell Polars that blanks are missing values, not strings.
|
4 |
+
|
5 |
+
dat = pl.read_csv("../data/API_Download_DS2_en_csv_v2_5657328.csv", skip_rows=4, null_values = "")
|
6 |
+
# We don't like the World Bank's wide format. Let's clean it up to long format.
|
7 |
+
dat_long = dat.melt(id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"])
|
8 |
+
# Now we need to fix the year column and give it a better name.
|
9 |
+
# we could have fixed the name as an argument in `.melt()` as well.
|
10 |
+
# https://docs.pola.rs/user-guide/expressions/casting/#overflow
|
11 |
+
dat_long = dat_long\
|
12 |
+
.with_columns(
|
13 |
+
pl.col("variable").cast(pl.Int64, strict=False).alias("variable"),
|
14 |
+
pl.col("value").cast(pl.Float32, strict=False).alias("value"))\
|
15 |
+
.rename({"variable":"year"})\
|
16 |
+
.filter(pl.col("value").is_not_null())
|
17 |
+
|
18 |
+
|
19 |
+
# %%
|
20 |
+
# Can we split out the information in the Indicator Code?
|
21 |
+
indicator_columns = dat_long\
|
22 |
+
.select(
|
23 |
+
pl.col("Indicator Code"),
|
24 |
+
pl.col("Indicator Code")\
|
25 |
+
# split string: example VC.IDP.TOCV becomes struct fields field_0="VC", field_1="IDP", field_2="TOCV" (unused fields are null)
|
26 |
+
.str.split_exact(".", 6).alias("split")).unnest("split")\
|
27 |
+
.unique()
|
28 |
+
|
29 |
+
|
30 |
+
# What should we call these columns?
|
31 |
+
# https://datahelpdesk.worldbank.org/knowledgebase/articles/201175-how-does-the-world-bank-code-its-indicators
|
32 |
+
new_names = {"field_0":"topic", "field_1":"general_subj", "field_2":"specific_subj",
|
33 |
+
"field_3":"ext_1", "field_4":"ext_2", "field_5":"ext_3", "field_6":"ext_4"}
|
34 |
+
indicator_columns = indicator_columns.rename(new_names)
|
35 |
+
|
36 |
+
# %%
|
37 |
+
# now we need to finalize our munge and write our data
|
38 |
+
dat_final = dat_long.join(indicator_columns, how="left", on="Indicator Code")
|
39 |
+
# Now I want to reorder the columns
|
40 |
+
names = dat_final.columns
|
41 |
+
new_order = [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 3, 0]
|
42 |
+
name_order = [names[i] for i in new_order]
|
43 |
+
dat_final = dat_final.select(name_order)
|
44 |
+
|
45 |
+
# %%
|
46 |
+
# write data
|
47 |
+
dat_final.write_parquet("../data/dat_munged.parquet")
|
48 |
+
|
49 |
+
|
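The indicator-code split in `scripts_build/01_read.py` can be illustrated without Polars. The sketch below is a plain-Python stand-in for `split_exact(".", 6)` followed by the rename, not the script's own code: it pads short codes with `None` so every code yields the same seven named parts. `split_indicator_code` and `FIELD_NAMES` are hypothetical names; the field labels are taken from `new_names` in the script.

```python
FIELD_NAMES = ["topic", "general_subj", "specific_subj",
               "ext_1", "ext_2", "ext_3", "ext_4"]


def split_indicator_code(code: str) -> dict:
    """Split a dotted indicator code into seven named parts, padding with None."""
    parts = code.split(".")
    # Keep at most seven parts, then pad so short codes still fill every key.
    padded = parts[:len(FIELD_NAMES)] + [None] * (len(FIELD_NAMES) - len(parts))
    return dict(zip(FIELD_NAMES, padded))


for code in ["VC.IDP.TOCV", "NY.GDP.PCAP.PP.KD"]:
    print(code, "->", split_indicator_code(code))
```

Padding to a fixed width is what lets the split result be joined back onto the long table as ordinary columns.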
scripts_build/02_explore.py
ADDED
@@ -0,0 +1,82 @@
1 |
+
# %%
|
2 |
+
import polars as pl
|
3 |
+
import plotly.express as px
|
4 |
+
import plotly.io as pio
|
5 |
+
pio.templates.default = "simple_white"
|
6 |
+
|
7 |
+
from lets_plot import *
|
8 |
+
LetsPlot.setup_html()
|
9 |
+
# ["plotly", "plotly_white", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"]
|
10 |
+
|
11 |
+
dat = pl.read_parquet("../data/dat_munged.parquet")
|
12 |
+
info = pl.read_csv("../data/Metadata_Indicator_API_Download_DS2_en_csv_v2_5657328.csv").rename({"INDICATOR_CODE":"Indicator Code", "INDICATOR_NAME":"Indicator Name"})
|
13 |
+
|
14 |
+
# %%
|
15 |
+
dat_vars = dat\
|
16 |
+
.group_by("Indicator Name", "Indicator Code", "Country Code").len()\
|
17 |
+
.pivot(values="len", index=["Indicator Name", "Indicator Code"], on="Country Code", aggregate_function="first")\
|
18 |
+
.with_columns((pl.col("COD").fill_null(0) + pl.col("GHA").fill_null(0) +
|
19 |
+
pl.col("KEN").fill_null(0) + pl.col("NGA").fill_null(0) +
|
20 |
+
pl.col("ZAF").fill_null(0) + pl.col("ZWE").fill_null(0)).alias("total"))\
|
21 |
+
.sort(pl.col("total"), descending=True)\
|
22 |
+
.filter(pl.col("total") > 25)
|
23 |
+
|
24 |
+
dat_vars.write_parquet("../data/dat_vars.parquet")
|
25 |
+
|
26 |
+
# %%
|
27 |
+
ggplot(
|
28 |
+
dat.filter(pl.col("Indicator Code").is_in(["NY.GDP.PCAP.PP.KD"])),
|
29 |
+
aes(x="year", y="value", color="Country Code")) +\
|
30 |
+
geom_line() +\
|
31 |
+
geom_point(size=1.2)
|
32 |
+
|
33 |
+
# %%
|
34 |
+
# Access to fuels
|
35 |
+
# Access to internet
|
36 |
+
# GDP
|
37 |
+
|
38 |
+
drop_country = ["ZAF"]
|
39 |
+
indicator_code = "NY.GDP.PCAP.PP.KD"
|
40 |
+
title_text = dat_vars\
|
41 |
+
.filter(pl.col("Indicator Code") == indicator_code)\
|
42 |
+
.select("Indicator Name")\
|
43 |
+
.to_series()[0]
|
44 |
+
subtitle_text = info\
|
45 |
+
.filter(pl.col("Indicator Code") == indicator_code)\
|
46 |
+
.select("SOURCE_NOTE")\
|
47 |
+
.to_series()[0]
|
48 |
+
subtitle_text = subtitle_text[:100] + "..."
|
49 |
+
chart_title = title_text + "<br><sup>" + subtitle_text + "</sup>"
|
50 |
+
y_axis_title = chart_title[chart_title.find("(")+1:chart_title.find(")")]
|
51 |
+
chart_dat = dat\
|
52 |
+
.filter(
|
53 |
+
(pl.col("Indicator Code").is_in([indicator_code])) &
|
54 |
+
(~pl.col("Country Code").is_in(drop_country)))
|
55 |
+
|
56 |
+
chart_plotly = px.line(chart_dat,
|
57 |
+
x="year", y="value", color="Country Code", markers=True,
|
58 |
+
labels = {"year":"Year", "value":y_axis_title},
|
59 |
+
title = chart_title)
|
60 |
+
|
61 |
+
chart_lp = ggplot(chart_dat, aes(x="year", y="value", color="Country Code")) +\
|
62 |
+
geom_point(shape=21, size=1.25, tooltips="none") +\
|
63 |
+
geom_line(tooltips=layer_tooltips()\
|
64 |
+
.format('value', '{.0f}')) +\
|
65 |
+
labs(
|
66 |
+
x="Year", y=y_axis_title,
|
67 |
+
title=title_text,
|
68 |
+
subtitle=subtitle_text) +\
|
69 |
+
scale_x_continuous(format='.0f') +\
|
70 |
+
theme(legend_position="bottom")
|
71 |
+
chart_lp
|
72 |
+
|
73 |
+
# %%
|
74 |
+
# https://2001-2009.state.gov/r/pa/ho/time/pcw/98678.htm#:~:text=Apartheid%2C%20the%20Afrikaans%20name%20given,a%20democratic%20government%20in%201994.
|
75 |
+
# What happened in South Africa in the 1990s?
|
76 |
+
# What happened in Nigeria in early 2000?
|
77 |
+
# What happened in Ghana in 2010s?
|
78 |
+
# sp.add_annotation(
|
79 |
+
# x=1994, y=4100,
|
80 |
+
# text="Democratic Government",
|
81 |
+
# showarrow=True,
|
82 |
+
# yshift=10)
|
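Both `scripts_build/02_explore.py` and `streamlit.py` derive the y-axis label from the text between the first pair of parentheses in the indicator name. When a name has no parentheses, `find` returns `-1` and the slice silently drops the last character of the name. The sketch below is a hedged alternative with an explicit fallback; `unit_from_title` and its `default` argument are hypothetical, and the sample titles are illustrative strings.

```python
def unit_from_title(title: str, default: str = "value") -> str:
    """Return the text inside the first (...) of a title, or a default label."""
    start = title.find("(")
    end = title.find(")", start + 1)
    if start == -1 or end == -1:
        # No complete pair of parentheses: fall back instead of slicing oddly.
        return default
    return title[start + 1:end]


print(unit_from_title("GDP per capita, PPP (constant 2017 international $)"))
print(unit_from_title("Population, total"))
```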
scripts_build/03_holoviews.py
ADDED
@@ -0,0 +1,54 @@
1 |
+
# %%
|
2 |
+
import polars as pl
|
3 |
+
import holoviews as hv
|
4 |
+
import panel as pn
|
5 |
+
from holoviews import opts
|
6 |
+
hv.extension('bokeh')
|
7 |
+
dat = pl.read_parquet("../data/dat_munged.parquet")
|
8 |
+
info = pl.read_csv("../data/Metadata_Indicator_API_Download_DS2_en_csv_v2_5657328.csv").rename({"INDICATOR_CODE":"Indicator Code", "INDICATOR_NAME":"Indicator Name"})
|
9 |
+
|
10 |
+
dat_vars = dat\
|
11 |
+
.group_by("Indicator Name", "Indicator Code", "Country Code").len()\
|
12 |
+
.pivot(values="len", index=["Indicator Name", "Indicator Code"], on="Country Code", aggregate_function="first")\
|
13 |
+
.with_columns((pl.col("COD").fill_null(0) + pl.col("GHA").fill_null(0) +
|
14 |
+
pl.col("KEN").fill_null(0) + pl.col("NGA").fill_null(0) +
|
15 |
+
pl.col("ZAF").fill_null(0) + pl.col("ZWE").fill_null(0)).alias("total"))\
|
16 |
+
.sort(pl.col("total"), descending=True)\
|
17 |
+
.filter(pl.col("total") > 25)
|
18 |
+
|
19 |
+
# %%
|
20 |
+
hv.Scatter(dat.filter(pl.col("Indicator Code").is_in(["NY.GDP.PCAP.PP.KD"])).to_pandas(), "year", "value")\
|
21 |
+
.opts(width=500)
|
22 |
+
|
23 |
+
# %%
|
24 |
+
def select_row(row=0):
|
25 |
+
return dat.slice(row, 1)
|
26 |
+
|
27 |
+
app = pn.interact(select_row, row=(0, dat.select(pl.len()).item()))
|
28 |
+
print(app)
|
29 |
+
# %%
|
30 |
+
pn.Column("## Choose a row", pn.Row(app[0], app[1]))
|
31 |
+
# %%
|
32 |
+
color_picker = pn.widgets.ColorPicker()
|
33 |
+
color_picker
|
34 |
+
# %%
|
35 |
+
html = pn.pane.HTML('', width=200, height=200, styles={'background-color': color_picker.value})
|
36 |
+
color_picker.link(html, value="background-color")
# %%
|
37 |
+
html
|
38 |
+
|
39 |
+
# %%
|
40 |
+
x = pn.widgets.IntSlider(name='x', start=0, end=100)
|
41 |
+
background = pn.widgets.ColorPicker(name='Background', value='lightgray')
|
42 |
+
|
43 |
+
def square(x):
|
44 |
+
return f'{x} squared is {x**2}'
|
45 |
+
|
46 |
+
def styles(background):
|
47 |
+
return {'background-color': background, 'padding': '0 10px'}
|
48 |
+
|
49 |
+
pn.Column(
|
50 |
+
x,
|
51 |
+
background,
|
52 |
+
pn.pane.Markdown(pn.bind(square, x), styles=pn.bind(styles, background))
|
53 |
+
)
|
54 |
+
# %%
|
streamlit.py
ADDED
@@ -0,0 +1,82 @@
1 |
+
# %%
|
2 |
+
# packages
|
3 |
+
import streamlit as st
|
4 |
+
import polars as pl
|
5 |
+
import plotly.express as px
|
6 |
+
import plotly.io as pio
|
7 |
+
pio.templates.default = "simple_white"
|
8 |
+
# %%
|
9 |
+
# Data
|
10 |
+
dat = pl.read_parquet("data/dat_munged.parquet")
|
11 |
+
info = pl.read_csv("data/Metadata_Indicator_API_Download_DS2_en_csv_v2_5657328.csv").rename({"INDICATOR_CODE":"Indicator Code", "INDICATOR_NAME":"Indicator Name"})
|
12 |
+
dat_vars = pl.read_parquet("data/dat_vars.parquet")
|
13 |
+
|
14 |
+
# %%
|
15 |
+
# Example Chart
|
16 |
+
# drop_country = ["ZAF"]
|
17 |
+
# indicator_code = "NY.GDP.PCAP.PP.KD"
|
18 |
+
list_name = dat_vars.select("Indicator Name").to_series().to_list()
|
19 |
+
list_code = dat_vars.select("Indicator Code").to_series().to_list()
|
20 |
+
list_country_code = ["ZAF", "ZWE", "KEN", "NGA", "GHA", "COD"]
|
21 |
+
list_country_name = ["South Africa", "Zimbabwe", "Kenya", "Nigeria", "Ghana", "Congo, Dem. Rep."]
|
22 |
+
|
23 |
+
|
24 |
+
drop_country = st.sidebar.multiselect("Remove Country (Country Code)", list_country_code)
|
25 |
+
|
26 |
+
checked_var = st.sidebar.checkbox("Use Variable Name")
|
27 |
+
|
28 |
+
|
29 |
+
if checked_var:
|
30 |
+
indicator_name = st.sidebar.selectbox("Select your variable", list_name)
|
31 |
+
indicator_code = dat_vars.filter(pl.col("Indicator Name") == indicator_name).select("Indicator Code").to_series()[0]
|
32 |
+
else:
|
33 |
+
indicator_code = st.sidebar.selectbox("Select your variable", list_code)
|
34 |
+
indicator_name = dat_vars.filter(pl.col("Indicator Code") == indicator_code).select("Indicator Name").to_series()[0]
|
35 |
+
|
36 |
+
title_text = indicator_name
|
37 |
+
subtitle_text = info.filter(pl.col("Indicator Code") == indicator_code).select("SOURCE_NOTE").to_series()[0]
|
38 |
+
|
39 |
+
y_axis_title = indicator_name[indicator_name.find("(")+1:indicator_name.find(")")]
|
40 |
+
|
41 |
+
use_dat = dat.filter((pl.col("Indicator Code").is_in([str(indicator_code)])) & (~pl.col("Country Code").is_in(drop_country)) & (pl.col("value").is_not_null()))
|
42 |
+
|
43 |
+
sp = px.line(use_dat.to_pandas(),
|
44 |
+
x="year", y="value", color="Country Name", markers=True,
|
45 |
+
labels = {"year":"Year", "value":y_axis_title},
|
46 |
+
title = title_text)
|
47 |
+
|
48 |
+
st.markdown("## Country performance over time")
|
49 |
+
|
50 |
+
st.markdown("__" + title_text + "__")
|
51 |
+
|
52 |
+
st.markdown(subtitle_text)
|
53 |
+
|
54 |
+
st.markdown("### Chart")
|
55 |
+
st.markdown("_Use the expand arrows visible when you hover over the upper right corner of the chart to see it in full screen._")
|
56 |
+
|
57 |
+
sp
|
58 |
+
|
59 |
+
st.markdown("### Table: " + title_text)
|
60 |
+
|
61 |
+
display_dat = use_dat.select("Country Code", "Indicator Name", "year", "value")
|
62 |
+
|
63 |
+
st.dataframe(
|
64 |
+
display_dat\
|
65 |
+
.pivot(index="year", on="Country Code", values="value", aggregate_function="first")\
|
66 |
+
.sort(pl.col("year"),descending=True), hide_index=True,
|
67 |
+
use_container_width=True,
|
68 |
+
column_config={
|
69 |
+
"value": y_axis_title,
|
70 |
+
"year": st.column_config.NumberColumn(
|
71 |
+
"Year",
|
72 |
+
help="Year of data",
|
73 |
+
format="%.0f"
|
74 |
+
)})
|
75 |
+
|
76 |
+
|
77 |
+
def convert_df(df):
|
78 |
+
return df.write_csv().encode('utf-8')
|
79 |
+
|
80 |
+
csv = convert_df(display_dat)
|
81 |
+
|
82 |
+
st.download_button("Download Data", data = csv, file_name = "data.csv", mime="text/csv")
|
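The sidebar logic in `streamlit.py` resolves an indicator either by name or by code, depending on the checkbox. A minimal sketch of that lookup with plain dictionaries is shown below. It is an illustration only, not the app's implementation: `resolve_indicator` is a hypothetical helper and the two entries in `name_to_code` are sample values.

```python
def resolve_indicator(selection: str, use_name: bool, name_to_code: dict) -> tuple:
    """Return (code, name) for a selection made either by name or by code."""
    if use_name:
        return name_to_code[selection], selection
    # Invert the mapping when the user picked a code instead of a name.
    code_to_name = {code: name for name, code in name_to_code.items()}
    return selection, code_to_name[selection]


# Illustrative sample mapping; the app builds its lists from dat_vars instead.
name_to_code = {
    "GDP per capita, PPP (constant 2017 international $)": "NY.GDP.PCAP.PP.KD",
    "Population, total": "SP.POP.TOTL",
}

print(resolve_indicator("Population, total", True, name_to_code))
print(resolve_indicator("NY.GDP.PCAP.PP.KD", False, name_to_code))
```

Either branch ends with the same pair of values, which is why the rest of the app can use `indicator_code` and `indicator_name` without caring how the selection was made.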