Update README.md
README.md
CHANGED
@@ -52,10 +52,8 @@ All pre-training is done on the [Cultura-X](https://huggingface.co/datasets/uonl
 We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
 
 ## Evaluation
-|| SambaLingo-
+|| SambaLingo-Bulgarian-Base | mGPT-1.3B-bulgarian | bloom-7b1 | xglm-7.5B | mGPT-13B |
 |-------------------------------|---------------------|-----------|-----------|----------|--------|
-| Perplexity (Lower Is Better) | 1.589 | 13.435 | 2.804 | 1.799 | 2.386 |
-| SambaLingo-Bulgarian-Base | mGPT-1.3B-bulgarian | bloom-7b1 | xglm-7.5B | mGPT-13B | |
 | Perplexity (Lower Is Better) | 1.416 | 1.755 | 2.051 | 1.502 | 1.889 |
 | FLORES en->bg (8 shot, CHRF) | 0.558 | 0.143 | 0.211 | 0.484 | 0.136 |
 | FLORES bg->en (8 shot, CHRF) | 0.621 | 0.227 | 0.182 | 0.347 | 0.145 |
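The vocabulary-extension step above can be sketched as a simple filter: collect candidate tokens from the new language, keep only those not already in the base vocabulary, and cap the additions at 25,000. This is a minimal illustration, not the project's actual pipeline; the `extend_vocab` helper and its inputs are hypothetical.

```python
def extend_vocab(base_vocab, candidate_tokens, max_new=25_000):
    """Select up to max_new candidate tokens that do not overlap with
    the base vocabulary (hypothetical helper; the README only states
    the 32,000 -> 57,000 token result, not the selection code)."""
    seen = set(base_vocab)
    new_tokens = []
    for tok in candidate_tokens:
        if tok not in seen and len(new_tokens) < max_new:
            seen.add(tok)
            new_tokens.append(tok)
    return new_tokens

# Toy usage with a tiny "base vocabulary" and Bulgarian candidates:
base = ["hello", "world"]
added = extend_vocab(base, ["world", "здравей", "свят", "здравей"], max_new=2)
# "world" overlaps and the duplicate "здравей" is skipped
```

With Hugging Face `transformers`, the selected tokens would then be registered via `tokenizer.add_tokens(new_tokens)` followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix grows to match the new vocabulary size.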
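For the perplexity rows in the table, recall that perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. A minimal sketch of the metric itself (the `perplexity` helper is illustrative and not part of this repository):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to
    each ground-truth token of the held-out text.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 1/4 to every token has perplexity 4:
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))  # → 4.0
```

A perplexity of 1.416 for SambaLingo-Bulgarian-Base therefore means the model is, on average, about as uncertain as choosing between roughly 1.4 equally likely tokens at each step, versus ~1.9 for mGPT-13B.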