pkiage committed on
Commit
74f6cdb
·
1 Parent(s): 79d8ed7

docs update - code block

  1. README.md +72 -30
README.md CHANGED
@@ -5,49 +5,75 @@
5
  An interactive tool demonstrating credit risk modelling.
6
 
7
  Emphasis on:
8
- * Building models
9
- * Comparing techniques
10
- * Interpreting results
 
11
 
12
  ## Built With
13
 
14
  - [Streamlit](https://streamlit.io/)
15
 
16
- ### Hardware initially built on:
 
17
  Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80Ghz, 2803 Mhz, 4 Core(s), 8 Logical Processor(s)
18
 
19
- Memory (RAM): 16GB
20
 
21
  ## Local setup
 
22
  ### Obtain the repo locally and open its root folder
 
23
  #### To potentially contribute
 
 
24
  git clone https://github.com/pkiage/tool-credit-risk-modelling.git
 
25
 
26
  or
27
 
 
28
  gh repo clone pkiage/tool-credit-risk-modelling
 
29
 
30
  #### Just to deploy locally
 
31
  Download ZIP
32
 
33
  ### (optional) Set up virtual environment:
 
 
34
  python -m venv venv
 
35
 
36
  ### (optional) Activate virtual environment:
37
- #### If using Unix based OS run the following in terminal:
 
 
 
38
  source venv/bin/activate
 
39
 
40
  #### If using Windows run the following in terminal:
 
 
41
  .\venv\Scripts\activate
 
42
 
43
  ### Install requirements by running the following in terminal:
 
44
  #### Required packages
 
 
45
  pip install -r requirements.txt
 
46
 
47
  #### Complete graphviz installation
48
- https://graphviz.org/download/
 
49
 
50
  ## Build and install local package
 
51
  ```shell
52
  python setup.py build
53
  ```
@@ -58,76 +84,88 @@ python setup.py install
58
 
59
  ### Run the streamlit app (app.py) by running the following in terminal (from repository root folder):
60
 
 
61
  streamlit run src/app.py
 
62
 
63
  ## Deployed setup details
64
- For faster model building and testing (particularly XGBoost), a local setup or a server more powerful than the free Heroku dyno type is recommended. ([tutorials on servers for data science & ML](https://course.fast.ai))
65
-
66
- [Free Heroku dyno type](https://devcenter.heroku.com/articles/dyno-types) was used to deploy the app
67
 
 
68
 
 
69
 
70
  Memory (RAM): 512 MB
71
 
72
  CPU Share: 1x
73
 
74
- Compute: 1x-4x
75
 
76
  Dedicated: no
77
 
78
  Sleeps: yes
79
 
80
  # Roadmap
 
81
  Models:
 
82
  - [ ] Add LightGBM
83
  - [ ] Add Adabost
84
  - [ ] Add Random Forest
85
 
86
  Visualization:
 
87
  - [ ] Add decision surface plot(s)
88
 
89
  Documentation:
 
90
  - [x] Add getting started and usage documentation
91
  - [ ] Add documentation evaluating models
92
  - [ ] Add design rationale(s)
93
 
94
  Other:
 
95
  - [x] Deploy app
96
  - [ ] Add csv file data input
97
  - [ ] Add tests
98
  - [ ] Add test/code coverage badge
99
  - [ ] Add continuous integration badge
100
 
101
-
102
-
103
  # Docs creation
 
104
  ## [pydeps](https://github.com/thebjorn/pydeps) Python module dependency visualization
105
 
106
- *Delete __init__.py and __main__.py* then run the following
107
 
108
  ### App and clusters
 
109
  ```shell
110
  pydeps src/app.py --max-bacon=5 --cluster --rankdir BT -o docs/module-dependency-graph/src-app-clustered.svg
111
  ```
112
 
113
  ### App and links
 
114
  Features, models, & visualization links:
 
115
  ```shell
116
- pydeps src/app.py --only features models visualization --max-bacon=4 --rankdir BT -o docs/module-dependency-graph/src-feature-model-visualization.svg
117
  ```
118
 
119
  ### Only features
 
120
  ```shell
121
- pydeps src/app.py --only features --max-bacon=5 --cluster --max-cluster-size=3 --rankdir BT -o docs/module-dependency-graph/src-features.svg
122
  ```
123
 
124
  ### Only models
 
125
  ```shell
126
- pydeps src/app.py --only models --max-bacon=5 --cluster --max-cluster-size=15 --rankdir BT -o docs/module-dependency-graph/src-models.svg
127
  ```
128
 
129
  ## [code2flow](https://github.com/scottrogowski/code2flow) Call graphs for a pretty good estimate of project structure
 
130
  ### Logistic
 
131
  ```shell
132
  code2flow src/models/logistic_train_model.py -o docs/call-graph/logistic_train_model.svg
133
  ```
@@ -137,6 +175,7 @@ code2flow src/models/logistic_model.py -o docs/call-graph/logistic_model.svg
137
  ```
138
 
139
  ### XGBoost
 
140
  ```shell
141
  code2flow src/models/xgboost_train_model.py -o docs/call-graph/xgboost_train_model.svg
142
  ```
@@ -146,6 +185,7 @@ code2flow src/models/xgboost_model.py -o docs/call-graph/xgboost_model.svg
146
  ```
147
 
148
  ### utils
 
149
  ```shell
150
  code2flow src/models/util_test.py -o docs/call-graph/util_test.svg
151
  ```
@@ -162,7 +202,6 @@ code2flow src/models/util_predict_model.py -o docs/call-graph/util_predict_model
162
  code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comparison.svg
163
  ```
164
 
165
-
166
  # References
167
 
168
  ## Inspiration:
@@ -181,6 +220,7 @@ code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comp
181
  - Project structure
182
 
183
  [GraphViz Buildpack](https://github.com/weibeld/heroku-buildpack-graphviz)
 
184
  - Buildpack used for Heroku deployment
185
 
186
  ## Political, Economic, Social, Technological, Legal and Environmental (PESTLE):
@@ -190,19 +230,21 @@ code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comp
190
  > "(37) Another area in which the use of AI systems deserves special consideration is the access to and enjoyment of certain essential private and public services and benefits necessary for people to fully participate in society or to improve one’s standard of living. In particular, AI systems used to evaluate the credit score or creditworthiness of natural persons should be classified as high-risk AI systems, since they determine those persons’ access to financial resources or essential services such as housing, electricity, and telecommunication services. AI systems used for this purpose may lead to discrimination of persons or groups and perpetuate historical patterns of discrimination, for example based on racial or ethnic origins, disabilities, age, sexual orientation, or create new forms of discriminatory impacts. Considering the very limited scale of the impact and the available alternatives on the market, it is appropriate to exempt AI systems for the purpose of creditworthiness assessment and credit scoring when put into service by small-scale providers for their own use. Natural persons applying for or receiving public assistance benefits and services from public authorities are typically dependent on those benefits and services and in a vulnerable position in relation to the responsible authorities. If AI systems are used for determining whether such benefits and services should be denied, reduced, revoked or reclaimed by authorities, they may have a significant impact on persons’ livelihood and may infringe their fundamental rights, such as the right to social protection, non-discrimination, human dignity or an effective remedy. Those systems should therefore be classified as high-risk. 
Nonetheless, this Regulation should not hamper the development and use of innovative approaches in the public administration, which would stand to benefit from a wider use of compliant and safe AI systems, provided that those systems do not entail a high risk to legal and natural persons."
191
 
192
  [Europe fit for the Digital Age: Commission proposes new rules and actions for excellence and trust in Artificial Intelligence](https://ec.europa.eu/commission/presscorner/detail/en/ip_21_1682)
 
193
  > "High-risk AI systems will be subject to strict obligations before they can be put on the market:
194
- >* Adequate risk assessment and mitigation systems;
195
- >* High quality of the datasets feeding the system to minimise risks and discriminatory outcomes;
196
- >* Logging of activity to ensure traceability of results;
197
- >* Detailed documentation providing all information necessary on the system and its purpose for authorities to assess its compliance;
198
- >* Clear and adequate information to the user;
199
- >* Appropriate human oversight measures to minimise risk;
200
- >* High level of robustness, security and accuracy."
 
201
 
202
  [A list of open problems in DeFi](https://mirror.xyz/0xemperor.eth/0guEj0CYt5V8J5AKur2_UNKyOhONr1QJaG4NGDF0YoQ?utm_source=tldrnewsletter)
203
- * Automated risk scoring of lending borrowing pools -> Increasingly important problem
204
- * One alternative way of looking at the problem would be, looking at a function for calculating the probability of default given the pool of assets you have.
205
- * Managing Risk for lenders and distributing risk/ Undercollateralized Loans
206
- * Tradfi is plagued by NPAs [(Nonperforming assets)] but still ultimately fall back to some sort of credit score establishment [[Spectral finance](https://www.spectral.finance/) solving this, but still an open problem].
207
- * But still, most credit score methods would rely on onchain history for credit establishment, we are moving towards privacy-centric defi is this approach extendable to that idea? [Homomorphic encryption could provide a solution]
208
 
 
 
 
 
 
 
5
  An interactive tool demonstrating credit risk modelling.
6
 
7
  Emphasis on:
8
+
9
+ - Building models
10
+ - Comparing techniques
11
+ - Interpreting results
12
 
13
  ## Built With
14
 
15
  - [Streamlit](https://streamlit.io/)
16
 
17
+ #### Hardware initially built on:
18
+
19
  Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80Ghz, 2803 Mhz, 4 Core(s), 8 Logical Processor(s)
20
 
21
+ Memory (RAM): 16GB
22
 
23
  ## Local setup
24
+
25
  ### Obtain the repo locally and open its root folder
26
+
27
  #### To potentially contribute
28
+
29
+ ```shell
30
  git clone https://github.com/pkiage/tool-credit-risk-modelling.git
31
+ ```
32
 
33
  or
34
 
35
+ ```shell
36
  gh repo clone pkiage/tool-credit-risk-modelling
37
+ ```
38
 
39
  #### Just to deploy locally
40
+
41
  Download ZIP
42
 
43
  ### (optional) Set up virtual environment:
44
+
45
+ ```shell
46
  python -m venv venv
47
+ ```
48
 
49
  ### (optional) Activate virtual environment:
50
+
51
+ #### If using Unix based OS run the following in terminal:
52
+
53
+ ```shell
54
  source venv/bin/activate
55
+ ```
56
 
57
  #### If using Windows run the following in terminal:
58
+
59
+ ```shell
60
  .\venv\Scripts\activate
61
+ ```
62
 
63
  ### Install requirements by running the following in terminal:
64
+
65
  #### Required packages
66
+
67
+ ```shell
68
  pip install -r requirements.txt
69
+ ```
70
 
71
  #### Complete graphviz installation
72
+
73
+ https://graphviz.org/download/
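
A quick way to confirm the Graphviz step worked is to check that its `dot` executable is on the `PATH`. The snippet below is a minimal sketch under that assumption; `check_tool` is a hypothetical helper written for illustration and is not part of this repository.

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper, not something this repo ships.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# Graphviz installs the `dot` executable; if it is missing, the system-level
# install from https://graphviz.org/download/ is probably incomplete.
check_tool dot || echo "Install Graphviz, then reopen the terminal and retry"
```

If `dot` is reported as found, running `dot -V` prints the installed Graphviz version.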
74
 
75
  ## Build and install local package
76
+
77
  ```shell
78
  python setup.py build
79
  ```
 
84
 
85
  ### Run the streamlit app (app.py) by running the following in terminal (from repository root folder):
86
 
87
+ ```shell
88
  streamlit run src/app.py
89
+ ```
90
 
91
  ## Deployed setup details
 
 
 
92
 
93
+ For faster model building and testing (particularly XGBoost), a local setup or a server more powerful than the free Heroku dyno type is recommended. ([tutorials on servers for data science & ML](https://course.fast.ai))
94
 
95
+ [Free Heroku dyno type](https://devcenter.heroku.com/articles/dyno-types) was used to deploy the app
96
 
97
  Memory (RAM): 512 MB
98
 
99
  CPU Share: 1x
100
 
101
+ Compute: 1x-4x
102
 
103
  Dedicated: no
104
 
105
  Sleeps: yes
106
 
107
  # Roadmap
108
+
109
  Models:
110
+
111
  - [ ] Add LightGBM
112
  - [ ] Add AdaBoost
113
  - [ ] Add Random Forest
114
 
115
  Visualization:
116
+
117
  - [ ] Add decision surface plot(s)
118
 
119
  Documentation:
120
+
121
  - [x] Add getting started and usage documentation
122
  - [ ] Add documentation evaluating models
123
  - [ ] Add design rationale(s)
124
 
125
  Other:
126
+
127
  - [x] Deploy app
128
  - [ ] Add csv file data input
129
  - [ ] Add tests
130
  - [ ] Add test/code coverage badge
131
  - [ ] Add continuous integration badge
132
 
 
 
133
  # Docs creation
134
+
135
  ## [pydeps](https://github.com/thebjorn/pydeps) Python module dependency visualization
136
 
137
+ _Delete `__init__.py` and `__main__.py`_ then run the following
138
 
139
  ### App and clusters
140
+
141
  ```shell
142
  pydeps src/app.py --max-bacon=5 --cluster --rankdir BT -o docs/module-dependency-graph/src-app-clustered.svg
143
  ```
144
 
145
  ### App and links
146
+
147
  Features, models, & visualization links:
148
+
149
  ```shell
150
+ pydeps src/app.py --only features models visualization --max-bacon=4 --rankdir BT -o docs/module-dependency-graph/src-feature-model-visualization.svg
151
  ```
152
 
153
  ### Only features
154
+
155
  ```shell
156
+ pydeps src/app.py --only features --max-bacon=5 --cluster --max-cluster-size=3 --rankdir BT -o docs/module-dependency-graph/src-features.svg
157
  ```
158
 
159
  ### Only models
160
+
161
  ```shell
162
+ pydeps src/app.py --only models --max-bacon=5 --cluster --max-cluster-size=15 --rankdir BT -o docs/module-dependency-graph/src-models.svg
163
  ```
164
 
165
  ## [code2flow](https://github.com/scottrogowski/code2flow) Call graphs for a pretty good estimate of project structure
166
+
167
  ### Logistic
168
+
169
  ```shell
170
  code2flow src/models/logistic_train_model.py -o docs/call-graph/logistic_train_model.svg
171
  ```
 
175
  ```
176
 
177
  ### XGBoost
178
+
179
  ```shell
180
  code2flow src/models/xgboost_train_model.py -o docs/call-graph/xgboost_train_model.svg
181
  ```
 
185
  ```
186
 
187
  ### utils
188
+
189
  ```shell
190
  code2flow src/models/util_test.py -o docs/call-graph/util_test.svg
191
  ```
 
202
  code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comparison.svg
203
  ```
204
 
 
205
  # References
206
 
207
  ## Inspiration:
 
220
  - Project structure
221
 
222
  [GraphViz Buildpack](https://github.com/weibeld/heroku-buildpack-graphviz)
223
+
224
  - Buildpack used for Heroku deployment
225
 
226
  ## Political, Economic, Social, Technological, Legal and Environmental (PESTLE):
 
230
  > "(37) Another area in which the use of AI systems deserves special consideration is the access to and enjoyment of certain essential private and public services and benefits necessary for people to fully participate in society or to improve one’s standard of living. In particular, AI systems used to evaluate the credit score or creditworthiness of natural persons should be classified as high-risk AI systems, since they determine those persons’ access to financial resources or essential services such as housing, electricity, and telecommunication services. AI systems used for this purpose may lead to discrimination of persons or groups and perpetuate historical patterns of discrimination, for example based on racial or ethnic origins, disabilities, age, sexual orientation, or create new forms of discriminatory impacts. Considering the very limited scale of the impact and the available alternatives on the market, it is appropriate to exempt AI systems for the purpose of creditworthiness assessment and credit scoring when put into service by small-scale providers for their own use. Natural persons applying for or receiving public assistance benefits and services from public authorities are typically dependent on those benefits and services and in a vulnerable position in relation to the responsible authorities. If AI systems are used for determining whether such benefits and services should be denied, reduced, revoked or reclaimed by authorities, they may have a significant impact on persons’ livelihood and may infringe their fundamental rights, such as the right to social protection, non-discrimination, human dignity or an effective remedy. Those systems should therefore be classified as high-risk. 
Nonetheless, this Regulation should not hamper the development and use of innovative approaches in the public administration, which would stand to benefit from a wider use of compliant and safe AI systems, provided that those systems do not entail a high risk to legal and natural persons."
231
 
232
  [Europe fit for the Digital Age: Commission proposes new rules and actions for excellence and trust in Artificial Intelligence](https://ec.europa.eu/commission/presscorner/detail/en/ip_21_1682)
233
+
234
  > "High-risk AI systems will be subject to strict obligations before they can be put on the market:
235
+ >
236
+ > - Adequate risk assessment and mitigation systems;
237
+ > - High quality of the datasets feeding the system to minimise risks and discriminatory outcomes;
238
+ > - Logging of activity to ensure traceability of results;
239
+ > - Detailed documentation providing all information necessary on the system and its purpose for authorities to assess its compliance;
240
+ > - Clear and adequate information to the user;
241
+ > - Appropriate human oversight measures to minimise risk;
242
+ > - High level of robustness, security and accuracy."
243
 
244
  [A list of open problems in DeFi](https://mirror.xyz/0xemperor.eth/0guEj0CYt5V8J5AKur2_UNKyOhONr1QJaG4NGDF0YoQ?utm_source=tldrnewsletter)
 
 
 
 
 
245
 
246
+ - Automated risk scoring of lending borrowing pools -> Increasingly important problem
247
+ - One alternative way of looking at the problem would be, looking at a function for calculating the probability of default given the pool of assets you have.
248
+ - Managing Risk for lenders and distributing risk/ Undercollateralized Loans
249
+ - Tradfi is plagued by NPAs [(Nonperforming assets)] but still ultimately fall back to some sort of credit score establishment [[Spectral finance](https://www.spectral.finance/) solving this, but still an open problem].
250
+ - But still, most credit score methods would rely on onchain history for credit establishment, we are moving towards privacy-centric defi is this approach extendable to that idea? [Homomorphic encryption could provide a solution]