Commit 90d4c46 by versae · 1 Parent(s): abafe00

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -10,7 +10,8 @@ datasets:

  ---

- - [Version v1beta](https://huggingface.co/bertin-project/bertin-gpt-j-6B/): April 28th, 2022 (*half-precision weights*)
+ - [Version v1beta2](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2): June 6th, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2-half)*)
+ - [Version v1beta1](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta1-half): April 28th, 2022 (*half-precision weights only*)

  # BERTIN GPT-J-6B

@@ -52,7 +53,7 @@ BERTIN-GPT-J-6B was finetuned on [mC4-es-sampled (gaussian)](https://huggingface

  ## Training procedure

- This model was finetuned for 26 billion tokens over 408,000 steps on a TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
+ This model was finetuned for 40 billion tokens (40,384,790,528) over 616,000 steps on a single TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.

  ## Intended Use and Limitations
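The updated "Training procedure" figures can be sanity-checked with a little arithmetic. The sketch below divides the exact token count by the reported step count, then tests a hypothetical per-step budget of 65,536 tokens (32 sequences of 2,048 tokens). That budget is an assumption made for illustration only; the README does not state a batch size or sequence length, and the function names are likewise hypothetical.

```python
# Figures quoted in the updated "Training procedure" paragraph of the README.
TOTAL_TOKENS = 40_384_790_528
REPORTED_STEPS = 616_000  # appears to be a rounded figure

# ASSUMPTION (not stated in the README): each step consumes
# 32 sequences x 2,048 tokens = 65,536 tokens.
ASSUMED_TOKENS_PER_STEP = 32 * 2048


def average_tokens_per_step(total_tokens: int, steps: int) -> float:
    """Average number of tokens consumed per optimizer step."""
    return total_tokens / steps


def implied_steps(total_tokens: int, tokens_per_step: int) -> tuple[int, int]:
    """Return (whole steps, leftover tokens) for a fixed per-step budget."""
    return divmod(total_tokens, tokens_per_step)


if __name__ == "__main__":
    avg = average_tokens_per_step(TOTAL_TOKENS, REPORTED_STEPS)
    steps, leftover = implied_steps(TOTAL_TOKENS, ASSUMED_TOKENS_PER_STEP)
    print(f"average tokens per reported step: {avg:,.1f}")
    print(f"steps implied by the assumed budget: {steps:,} (leftover: {leftover})")
```

Under the assumed budget the exact token count divides evenly, which suggests the "616,000 steps" in the README is a rounded value; treat this as a consistency check rather than a statement about the actual training configuration.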