jgrosjean committed
Commit
9abb0dd
1 Parent(s): dbc65e7

Update README.md

Files changed (1)
  1. README.md +20 -33
README.md CHANGED
@@ -21,7 +21,7 @@ The fine-tuning script can be accessed [here](Link).
 
 - **Developed by:** [Juri Grosjean](https://huggingface.co/jgrosjean)
 - **Model type:** [XMOD](https://huggingface.co/facebook/xmod-base)
- - **Language(s) (NLP):** [de_CH, fr_CH, it_CH, rm_CH]
+ - **Language(s) (NLP):** de_CH, fr_CH, it_CH, rm_CH
 - **License:** [More Information Needed]
 - **Finetuned from model:** [SwissBERT](https://huggingface.co/ZurichNLP/swissbert)
 
@@ -70,32 +70,12 @@ tensor([[ 5.6306e-02, -2.8375e-01, -4.1495e-02, 7.4393e-02, -3.1552e-01,
 ...]])
 ```
 
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
 ## Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
+ This model has been trained on news articles only. Hence, it might not perform as well on other text classes.
 This multilingual model has not been fine-tuned for cross-lingual transfer. It is intended for computing sentence embeddings that can be compared mono-lingually.
 
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
 ## Training Details
 
 ### Training Data
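The hunk above keeps the card's claim that the embeddings are meant for mono-lingual comparison, but the usage snippet it refers to is cut off in this diff (only part of the tensor output survives). As a rough sketch of what such a comparison could look like, assuming the base SwissBERT checkpoint, the de_CH adapter, and the average pooling used during fine-tuning (the checkpoint name and the helper below are illustrative, not taken from the card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the base SwissBERT checkpoint stands in for the fine-tuned model,
# whose released repository name is not visible in this diff.
tokenizer = AutoTokenizer.from_pretrained("ZurichNLP/swissbert")
model = AutoModel.from_pretrained("ZurichNLP/swissbert")
model.set_default_language("de_CH")  # X-MOD language adapter for the texts being compared

def embed(sentences):
    """Average-pool the last hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Mono-lingual comparison: both sentences are in the same language (German here).
a, b = embed(["Die Stadt baut neue Velowege.", "Neue Fahrradwege werden gebaut."])
print(torch.cosine_similarity(a, b, dim=0))
```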
@@ -115,11 +95,24 @@ Use the code below to get started with the model.
 
 #### Training Hyperparameters
 
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+ - **Training regime:** python3 train_simcse_multilingual.py \
+ --seed 54699 \
+ --model_name_or_path zurichNLP/swissbert \
+ --train_file /srv/scratch2/grosjean/Masterarbeit/data_subsets \
+ --output_dir /srv/scratch2/grosjean/Masterarbeit/model \
+ --overwrite_output_dir \
+ --save_strategy no \
+ --do_train \
+ --num_train_epochs 1 \
+ --learning_rate 1e-5 \
+ --per_device_train_batch_size 4 \
+ --gradient_accumulation_steps 128 \
+ --max_seq_length 512 \
+ --overwrite_cache \
+ --pooler_type avg \
+ --pad_to_max_length \
+ --temp 0.05 \
+ --fp16 <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
 [More Information Needed]
 
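The new training-regime entry points at a SimCSE-style contrastive setup (`--pooler_type avg`, `--temp 0.05`), but train_simcse_multilingual.py itself is not part of this commit. A minimal sketch of the temperature-scaled, in-batch-negative objective those flags suggest (the function name and the pairing of inputs are assumptions, not the script's actual code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temp: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE: row i of emb_b is the positive for row i of emb_a,
    every other row in the batch acts as a negative. `temp` mirrors --temp 0.05."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    sim = emb_a @ emb_b.T / temp                      # cosine similarities / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

Note that with `--per_device_train_batch_size 4` and `--gradient_accumulation_steps 128`, the effective batch size works out to 4 × 128 = 512 examples per optimizer step (per device), and `--fp16` enables mixed-precision training.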
@@ -155,12 +148,6 @@
 
 
 
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
 ## Environmental Impact
 
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 