alex-miller committed
Commit 6629ddc · verified · 1 Parent(s): 37e2a38

Update README.md

Files changed (1):
  1. README.md +9 -7
README.md CHANGED
@@ -4,30 +4,32 @@ base_model: bert-base-multilingual-uncased
 tags:
 - generated_from_trainer
 model-index:
-- name: bert-base-multilingual-uncased-finetuned-wiki-crs
+- name: ODABert
   results: []
+datasets:
+- alex-miller/oecd-dac-crs
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# bert-base-multilingual-uncased-finetuned-wiki-crs
+# ODABert
 
-This model is a fine-tuned version of [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on the None dataset.
+This model is a fine-tuned version of [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on the [OECD DAC CRS project titles and descriptions](https://huggingface.co/datasets/alex-miller/oecd-dac-crs) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.9961
 
 ## Model description
 
-More information needed
+A 3-epoch fine-tune of BERT base multilingual uncased on development and humanitarian finance project titles and descriptions from the OECD DAC CRS. The vocabulary of the base model was expanded by 1,059 tokens (a 1% increase) based on the most prevalent tokens in the CRS that were not present in the base model vocabulary.
 
 ## Intended uses & limitations
 
-More information needed
+Developed as an experiment to see whether fine-tuning on the CRS would improve classifier models built on top of this MLM. Although it is built on a multilingual model, and the fine-tuning texts include other languages, English is the most prevalent.
 
 ## Training and evaluation data
 
-More information needed
+See the [OECD DAC CRS project titles and descriptions](https://huggingface.co/datasets/alex-miller/oecd-dac-crs) dataset.
 
 ## Training procedure
 
@@ -56,4 +58,4 @@ The following hyperparameters were used during training:
 - Transformers 4.38.2
 - Pytorch 2.0.1
 - Datasets 2.18.0
-- Tokenizers 0.15.2
+- Tokenizers 0.15.2
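The vocabulary-expansion step described in the updated card (adding the most prevalent CRS tokens missing from the base vocabulary) could be sketched roughly as below. This is an illustrative assumption, not the author's actual code: the function name, the toy base vocabulary, and the toy corpus are all hypothetical, and only the selection logic (count out-of-vocabulary tokens, keep the top N) is shown.

```python
from collections import Counter

def expand_vocab(base_vocab, corpus_texts, max_new_tokens):
    """Return the base vocabulary extended with the most frequent
    corpus tokens that the base vocabulary is missing.

    Simplified sketch: real BERT tokenization uses WordPiece, not
    whitespace splitting, and the actual token budget here would be
    the 1,059 tokens mentioned in the card.
    """
    counts = Counter()
    for text in corpus_texts:
        for token in text.lower().split():
            if token not in base_vocab:
                counts[token] += 1
    # Keep only the top-N most prevalent out-of-vocabulary tokens.
    new_tokens = [tok for tok, _ in counts.most_common(max_new_tokens)]
    return list(base_vocab) + new_tokens

# Toy example: "sanitation" (3 occurrences) and "hygiene" (1) are the
# most prevalent tokens absent from the base vocabulary.
base = {"water", "project", "support"}
corpus = [
    "Sanitation sanitation water project",
    "sanitation hygiene support",
]
vocab = expand_vocab(base, corpus, max_new_tokens=2)
```

In a real Transformers pipeline, the selected tokens would presumably be registered with `tokenizer.add_tokens(new_tokens)` followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix grows to match the expanded vocabulary before fine-tuning.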