jploski committed
Commit 741346b
1 Parent(s): 4b2b1b5

Update README.md

Files changed (1): README.md (+13 −10)
README.md CHANGED
@@ -5,31 +5,34 @@ tags:
 model-index:
 - name: retnet-mini-shakespeare
   results: []
+pipeline_tag: text-generation
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # retnet-mini-shakespeare
 
-This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 2.7718
+This model was trained from scratch on the "tinyshakespeare" text file.
 
 ## Model description
 
-More information needed
+A tiny model, similar to jploski/falcon-mini-shakespeare, that demonstrates training and recurrent inference with a retention network (https://arxiv.org/pdf/2307.08621.pdf).
+The code uses Sehyun Choi's implementation of the retention network (https://github.com/syncdoth/RetNet), with configuration parameters changed to make the model very small.
+
+- **License:** Apache 2.0
 
 ## Intended uses & limitations
 
-More information needed
+Intended to demonstrate training and recurrent O(1) inference with a retention network.
 
 ## Training and evaluation data
 
-More information needed
+https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
 
 ## Training procedure
 
+The single tinyshakespeare text file, split into paragraphs, was used as both the training and validation set. See:
+
+https://colab.research.google.com/drive/1wZnM7FCe4TsQpoamJ7NDAuQfA3DYiwHi?usp=sharing
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -59,4 +62,4 @@ The following hyperparameters were used during training:
 - Transformers 4.31.0
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.3
-- Tokenizers 0.13.3
+- Tokenizers 0.13.3
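
Below is a minimal generation sketch (not part of the commit, and untested). It assumes the checkpoint can be loaded through `transformers` with `trust_remote_code=True` so that the RetNet architecture resolves; if the repository does not ship custom modeling code, install https://github.com/syncdoth/RetNet and load the weights with its model classes instead.

```python
# Hedged sketch: model id is this repo; trust_remote_code is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jploski/retnet-mini-shakespeare"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("ROMEO:", return_tensors="pt")
# A retention network can generate in recurrent mode, carrying a fixed-size
# state per layer instead of a growing key/value cache -- the O(1) inference
# the card refers to.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```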
 
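For reference, the data preparation the card describes (one text file split into paragraphs, reused as the validation set) can be approximated as follows. This is a rough sketch, not the author's Colab notebook; splitting on blank lines is an assumption about what "split up into paragraphs" means.

```python
# Rough sketch of the tinyshakespeare data preparation described in the card.
from urllib.request import urlopen

URL = ("https://raw.githubusercontent.com/karpathy/char-rnn/"
       "master/data/tinyshakespeare/input.txt")
text = urlopen(URL).read().decode("utf-8")

# Assumption: paragraphs are the blank-line-separated chunks of the file.
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

# Per the card, the same data serves as both the training and validation set.
train_texts = paragraphs
eval_texts = paragraphs

print(f"{len(paragraphs)} paragraphs; first begins {paragraphs[0][:30]!r}")
```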