Update README.md
README.md
CHANGED
@@ -10,7 +10,8 @@ datasets:
 
 ---
 
-- [Version
+- [Version v1beta2](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2): June 6th, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2-half)*)
+- [Version v1beta1](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta1-half): April 28th, 2022 (*half-precision weights only*)
 
 # BERTIN GPT-J-6B
 
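The version list added above links each release to its own branch of the model repository, including separate half-precision branches. As a minimal sketch (not part of the README change itself), the snippet below assumes those branch names can be passed as the `revision` argument to `from_pretrained`, and that loading the `*-half` branches in `float16` is the intended use of the half-precision weights.

```python
# Sketch: load a specific release branch of BERTIN GPT-J-6B.
# Assumption: the branch names linked above (e.g. "v1beta2-half") are valid
# values for `revision`; float16 loading for the half-precision branches is
# an assumption, not something stated in the diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bertin-project/bertin-gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_id, revision="v1beta2-half")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="v1beta2-half",    # release branch to load
    torch_dtype=torch.float16,  # half-precision weights on the *-half branches
)

inputs = tokenizer("El modelo BERTIN", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```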
@@ -52,7 +53,7 @@ BERTIN-GPT-J-6B was finetuned on [mC4-es-sampled (gaussian)](https://huggingface
 
 ## Training procedure
 
-This model was finetuned for
+This model was finetuned for 40 billion tokens (40,384,790,528) over 616,000 steps on a single TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 
 ## Intended Use and Limitations
 
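The expanded training-procedure paragraph describes the standard causal language modeling objective: predict the next token and minimize cross-entropy. The sketch below only illustrates that objective for a causal LM; it is not the project's TPU training code, and the function name is hypothetical.

```python
# Illustrative next-token cross-entropy loss for an autoregressive LM.
# This mirrors what `labels=input_ids` does inside a Hugging Face causal LM;
# it is not the actual BERTIN training loop.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    # Shift so that the prediction at position t is scored against token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```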