RaThorat committed on
Commit
b24dc98
·
verified ·
1 Parent(s): 1119f8a

Update README.md

Files changed (1)
  1. README.md +11 -131
README.md CHANGED
@@ -19,70 +19,32 @@ This modelcard aims to be a base template for new models. It has been generated
19
  ### Model Description
20
 
21
  <!-- Provide a longer summary of what this model is. -->
22
-
23
-
24
-
25
- - **Developed by:** [More Information Needed]
26
- - **Funded by [optional]:** [More Information Needed]
27
- - **Shared by [optional]:** [More Information Needed]
28
- - **Model type:** [More Information Needed]
29
- - **Language(s) (NLP):** [More Information Needed]
30
- - **License:** [More Information Needed]
31
- - **Finetuned from model [optional]:** [More Information Needed]
32
 
33
  ### Model Sources [optional]
34
 
35
  <!-- Provide the basic links for the model. -->
36
 
37
- - **Repository:** [More Information Needed]
38
- - **Paper [optional]:** [More Information Needed]
39
- - **Demo [optional]:** [More Information Needed]
40
 
41
  ## Uses
42
 
43
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
44
 
45
  ### Direct Use
46
 
47
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
48
 
49
  [More Information Needed]
50
 
51
- ### Downstream Use [optional]
52
-
53
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
-
55
- [More Information Needed]
56
-
57
- ### Out-of-Scope Use
58
-
59
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
-
61
- [More Information Needed]
62
-
63
- ## Bias, Risks, and Limitations
64
-
65
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
-
67
- [More Information Needed]
68
-
69
- ### Recommendations
70
-
71
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
-
73
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
-
75
- ## How to Get Started with the Model
76
-
77
- Use the code below to get started with the model.
78
-
79
- [More Information Needed]
80
 
81
  ## Training Details
82
 
83
  ### Training Data
84
 
85
- <!-- 46 txt, pdf, and odt documents from the DUS-I website were used to create chunks (200 words per chunk) in JSON format. -->
86
 
87
  [More Information Needed]
88
 
@@ -92,42 +54,13 @@ Use the code below to get started with the model.
92
 
93
  #### Preprocessing [optional]
94
 
95
- [Documents grouped (groeperen_segment_text_to_jsonl.py) under labels such as: PROJECT, HANDLEIDING, OVEREENKOMST, PLAN, BELEID, SUBSIDIE.]
96
 
97
 
98
  #### Training Hyperparameters
99
 
100
- - **Training regime:** [Run with the GroNLP/bert-base-dutch-cased model (110 million parameters).] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
-
102
- #### Speeds, Sizes, Times [optional]
103
-
104
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
-
106
- [More Information Needed]
107
-
108
- ## Evaluation
109
-
110
- <!-- This section describes the evaluation protocols and provides the results. -->
111
-
112
- ### Testing Data, Factors & Metrics
113
-
114
- #### Testing Data
115
-
116
- <!-- This should link to a Dataset Card if possible. -->
117
-
118
- [More Information Needed]
119
-
120
- #### Factors
121
-
122
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
-
124
- [More Information Needed]
125
-
126
- #### Metrics
127
-
128
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
 
130
- [More Information Needed]
131
 
132
  ### Results
133
 
@@ -135,34 +68,15 @@ Use the code below to get started with the model.
135
 
136
  #### Summary
137
 
138
- Categorization:
139
-
140
- Script for the textcat model: train_textcat_model.py.
141
-
142
-
143
- ## Model Examination [optional]
144
-
145
- <!-- Relevant interpretability work for the model goes here -->
146
-
147
- [More Information Needed]
148
-
149
- ## Environmental Impact
150
-
151
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
152
-
153
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
154
 
155
- - **Hardware Type:** [More Information Needed]
156
- - **Hours used:** [More Information Needed]
157
- - **Cloud Provider:** [More Information Needed]
158
- - **Compute Region:** [More Information Needed]
159
- - **Carbon Emitted:** [More Information Needed]
160
 
161
  ## Technical Specifications [optional]
162
 
163
  ### Model Architecture and Objective
164
 
165
- [More Information Needed]
 
166
 
167
  ### Compute Infrastructure
168
 
@@ -170,38 +84,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
170
 
171
  #### Hardware
172
 
173
- [8 vCPUs and 64 GB of RAM were required.]
174
-
175
- #### Software
176
-
177
- [More Information Needed]
178
-
179
- ## Citation [optional]
180
-
181
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
182
-
183
- **BibTeX:**
184
-
185
- [More Information Needed]
186
-
187
- **APA:**
188
-
189
- [More Information Needed]
190
-
191
- ## Glossary [optional]
192
-
193
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
194
-
195
- [More Information Needed]
196
-
197
- ## More Information [optional]
198
-
199
- [More Information Needed]
200
-
201
- ## Model Card Authors [optional]
202
-
203
- [More Information Needed]
204
-
205
- ## Model Card Contact
206
-
207
- [More Information Needed]
 
19
  ### Model Description
20
 
21
  <!-- Provide a longer summary of what this model is. -->
22
+ The goal is a scalable, privacy-friendly solution that uses public DUS-I data (such as policy documents and news items) to inform employees quickly and accurately.
23
 
24
  ### Model Sources [optional]
25
 
26
  <!-- Provide the basic links for the model. -->
27
 
28
+ - **Repository:** https://github.com/RaThorat/my-chatbot-project
 
 
29
 
30
  ## Uses
31
 
32
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
33
+ Identifying questions: common topics are subsidy information, policy developments, and manuals.
34
 
35
  ### Direct Use
36
 
37
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
38
+ Save time by using AI to deliver information to employees quickly.
39
 
40
  [More Information Needed]
41
 
42
 
43
  ## Training Details
44
 
45
  ### Training Data
46
 
47
+ 46 txt, pdf, and odt documents from the DUS-I website were used to create chunks (200 words per chunk) in JSON format.
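
The chunking step described above could look roughly like the following. This is a minimal sketch, not the project's actual script: the function names `chunk_words` and `document_to_records`, the record fields (`source`, `chunk_id`, `text`), and plain whitespace splitting are all assumptions made for illustration.

```python
import json


def chunk_words(text, chunk_size=200):
    """Split text into consecutive chunks of at most chunk_size words.

    Words are separated on whitespace (an assumption; the original
    pipeline may tokenize differently).
    """
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


def document_to_records(doc_name, text, chunk_size=200):
    """Wrap each chunk in a JSON-serializable record (hypothetical layout)."""
    return [
        {"source": doc_name, "chunk_id": index, "text": chunk}
        for index, chunk in enumerate(chunk_words(text, chunk_size))
    ]


if __name__ == "__main__":
    # A synthetic 450-word document stands in for an extracted DUS-I file.
    sample_text = " ".join(f"woord{i}" for i in range(450))
    records = document_to_records("voorbeeld.txt", sample_text)
    print(json.dumps(records[0], ensure_ascii=False)[:80])
    print(len(records))
```

With 200 words per chunk, the last chunk of a document is simply shorter than the rest; whether the original pipeline pads, drops, or merges such a remainder is not stated in the card.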
48
 
49
  [More Information Needed]
50
 
 
54
 
55
  #### Preprocessing [optional]
56
 
57
+ Documents grouped (groeperen_segment_text_to_jsonl.py) under labels such as: PROJECT, HANDLEIDING, OVEREENKOMST, PLAN, BELEID, SUBSIDIE.
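
To illustrate what labeled JSONL output for a text-categorization model might look like, here is a hedged sketch. Only the six label names come from the card; the `text`/`cats` record layout (one score per label, 1.0 for the assigned label) and the helper names `to_textcat_record` and `to_jsonl` are assumptions, and the real groeperen_segment_text_to_jsonl.py may work differently.

```python
import json

# Label names as listed in the model card.
LABELS = ["PROJECT", "HANDLEIDING", "OVEREENKOMST", "PLAN", "BELEID", "SUBSIDIE"]


def to_textcat_record(text, label):
    """Build one training record with a one-hot score per label (assumed layout)."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return {
        "text": text,
        "cats": {name: 1.0 if name == label else 0.0 for name in LABELS},
    }


def to_jsonl(pairs):
    """Serialize (text, label) pairs as JSONL: one JSON object per line."""
    return "\n".join(
        json.dumps(to_textcat_record(text, label), ensure_ascii=False)
        for text, label in pairs
    )


if __name__ == "__main__":
    # Hypothetical example segments, not taken from the DUS-I documents.
    examples = [
        ("Aanvraag voor een subsidie indienen.", "SUBSIDIE"),
        ("Stappenplan voor het invullen van het formulier.", "HANDLEIDING"),
    ]
    print(to_jsonl(examples))
```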
58
 
59
 
60
  #### Training Hyperparameters
61
 
62
+ - **Training regime:** Run with the GroNLP/bert-base-dutch-cased model (110 million parameters). <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
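
The card gives the parameter count but not the numeric precision. As a rough aid, the sketch below estimates how much memory the weights alone would occupy at a few common precisions. The helper name `weight_memory_mb` is hypothetical, and the estimate excludes activations, gradients, and optimizer state, which usually dominate during training.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_mb(num_params, dtype):
    """Return the size of the model weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    num_params = 110_000_000  # parameter count stated in the card
    for dtype in BYTES_PER_PARAM:
        print(dtype, weight_memory_mb(num_params, dtype), "MB")
```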
63
 
 
64
 
65
  ### Results
66
 
 
68
 
69
  #### Summary
70
 
71
+ Script for the textcat model: https://github.com/RaThorat/my-chatbot-project/blob/main/scripts/train_textcat_model.py
72
 
73
 
74
  ## Technical Specifications [optional]
75
 
76
  ### Model Architecture and Objective
77
 
78
+ 46 txt, pdf, and odt documents from the DUS-I website were used to create chunks (200 words per chunk) in JSON format.
79
+ For the text categorization model: the same documents converted to JSONL format.
80
 
81
  ### Compute Infrastructure
82
 
 
84
 
85
  #### Hardware
86
 
87
+ 8 vCPUs and 64 GB of RAM were required.