JingzeShi committed
Commit 25295dc (verified)
1 parent: 1a77cdf

Update README.md (#1)

- Update README.md (8d7c7d03c10e2ab6c594a3337b34aa459c458161)

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -18,7 +18,6 @@ Doge is an ongoing research project where we aim to train a series of small lang
  In addition, Doge uses Dynamic Mask Attention as sequence transformation and can use Multi-Layer Perceptron or Cross Domain Mixture of Experts as state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, and Cross Domain Mixture of Experts can directly inherit the weights of Multi-Layer Perceptron for further training. This model is trained by Jingze Shi, it only allows text input and text generation, for detailed algorithm and model architecture, please refer to [Wonderful Matrices](https://arxiv.org/abs/2412.11834), the ongoing research repository is [Wonderful Matrices](https://github.com/LoserCheems/WonderfulMatrices).


-
  ## Uses

  ```python
@@ -60,17 +59,18 @@ outputs = model.generate(

  > TODO: The larger model is under training and will be uploaded soon.

-
- || Training Data | Epochs | Content Length | LR | Batch Size | Precision |
+ **Training**:
+ | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
  |---|---|---|---|---|---|---|
- | [Doge-20M-Instruct](https://huggingface.co/LoserCheems/Doge-20M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 8192 | 8e-5 | 1M | bfloat16 |
+ | [Doge-20M-Instruct](https://huggingface.co/JingzeShi/Doge-20M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 8192 | 8e-5 | 1M | bfloat16 |
+ | [Doge-60M-Instruct](https://huggingface.co/JingzeShi/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 8192 | 6e-5 | 1M | bfloat16 |

-
- **Training Environment**:
+ **Environment**:
  - Image: nvcr.io/nvidia/pytorch:24.10-py3
  - Hardware: 1x NVIDIA RTX 4090
  - Software: Transformers, TRL

+
  ## Citation

  ```bibtex