nikokons committed on
Commit
b2bb85c
·
1 Parent(s): 0e3198e

Update README.md

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -12,5 +12,3 @@ The model is the "small" version of GPT-2 (12-layer, 768-hidden, 12-heads) with
  ## Training details:
  A generative Transformer model with the GPT-2 architecture is trained from scratch on a large corpus of Greek text so that it can generate long stretches of contiguous, coherent text. Attention dropout with a rate of 0.1 is used for regularization on all layers, together with L2 weight decay of 0.01. A batch size of 4 with gradients accumulated over 8 iterations gives an effective batch size of 32. The model is optimized with Adam at a learning rate of 1e-4 and is trained for 20 epochs. The learning rate increases linearly from zero over the first 9000 updates and then decreases linearly. The implementation is based on the open-source PyTorch-transformer library (HuggingFace 2019).
 
- ## Fine-tuned model using the pre-trained "gpt2-greek":
- https://huggingface.co/nikokons/conversational-agent-el
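
The batch-size arithmetic and the warmup-then-decay learning rate described in the training details can be sketched in plain Python. This is only an illustration, not the author's training code: the README does not state the total number of updates, so `TOTAL_UPDATES` is a hypothetical placeholder, and the decay is assumed to reach zero at that point (similar in shape to what `get_linear_schedule_with_warmup` in the `transformers` library produces).

```python
# Values stated in the training details.
BATCH_SIZE = 4
ACCUMULATION_STEPS = 8
BASE_LR = 1e-4
WARMUP_UPDATES = 9000

# Assumption: the total number of optimizer updates is not given in the
# README; 100_000 is a hypothetical placeholder used only for illustration.
TOTAL_UPDATES = 100_000


def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Number of samples contributing to one optimizer update."""
    return batch_size * accumulation_steps


def learning_rate(
    update: int,
    base_lr: float = BASE_LR,
    warmup: int = WARMUP_UPDATES,
    total: int = TOTAL_UPDATES,
) -> float:
    """Linear warmup from zero, then linear decay (assumed to end at zero)."""
    if update < warmup:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * update / max(1, warmup)
    # Decay phase: scale linearly from base_lr down to 0 at `total`.
    remaining = max(0, total - update)
    return base_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print(effective_batch_size(BATCH_SIZE, ACCUMULATION_STEPS))  # 32
    for step in (0, 4500, 9000, 50_000, 100_000):
        print(step, learning_rate(step))
```

With these settings the learning rate peaks at 1e-4 exactly when warmup ends (update 9000) and falls off linearly afterwards; changing `TOTAL_UPDATES` only changes the slope of the decay, not the warmup.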
 