gonzalez-agirre committed Update README.md
Commit 4574f98 • Parent(s): c157649
README.md
CHANGED
@@ -77,12 +77,36 @@ pipeline_tag: text-generation
 
 # falcon_7b_balanced_tokenizer_fp16_CPT_open_data_26B_tokens_balanced_es_ca
 
+## Table of Contents
+<details>
+<summary>Click to expand</summary>
+
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-use)
+- [How to use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
+- [Language adaptation](#language-adaptation)
+- [Training](#training)
+- [Training data](#training-data)
+- [Training procedure](#training-procedure)
+- [Licensing Information](#licensing-information)
+- [Additional information](#additional-information)
+- [Author](#author)
+- [Contact information](#contact-information)
+- [Copyright](#copyright)
+- [Licensing information](#licensing-information)
+- [Funding](#funding)
+- [Citing information](#citing-information)
+- [Disclaimer](#disclaimer)
+
+</details>
+
 ## Model description
 
 The **Cǒndor-7B** is a transformer-based causal language model for Catalan, Spanish, and English. It is based on the [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) model and has been trained on a 26B token trilingual corpus collected from publicly available corpora and crawlers.
 
 
-## Intended uses
+## Intended uses and limitations
 
 The **Cǒndor-7B** model is ready-to-use only for causal language modeling to perform text-generation tasks. However, it is intended to be fine-tuned on a generative downstream task.
 
@@ -118,7 +142,7 @@ generation = pipeline(
 print(f"Result: {generation['generated_text']}")
 ```
 
-## Limitations and
+## Limitations and bias
 At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
 
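The `print(f"Result: {generation['generated_text']}")` line in the diff is the tail of the card's "How to use" snippet, which builds a `transformers` text-generation pipeline. A minimal sketch of that flow, assuming only the standard `pipeline` API — the repo ID in the commented example call is a stand-in for illustration, since this diff does not show the model's final Hub name:

```python
# Hedged sketch of the card's "How to use" flow, not its exact code.
from transformers import pipeline

def run_generation(prompt: str, model_id: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation for `prompt` with a causal-LM pipeline."""
    generator = pipeline("text-generation", model=model_id)
    # The pipeline returns a list with one dict per generated sequence:
    # [{"generated_text": "<prompt plus continuation>"}]
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]

# Example call (downloads several GB of weights; the base-model ID below is
# only a placeholder, as the card's own repo name is not stated in this diff):
# generation_text = run_generation("El mercat del barri és", "tiiuae/falcon-7b")
# print(f"Result: {generation_text}")
```

The dict access in the card's snippet implies the list returned by the pipeline has already been indexed; the helper above makes that unpacking explicit.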