---
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---
# RWKV-4 World
## Model Description
RWKV-4 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
How to use:
* Use the latest rwkv pip package (0.7.4+).
* Use the latest ChatRWKV v2/benchmark_world.py to test the model.
* Larger models are stronger, even though they are not fully trained yet.
The differences between World & Raven:
* Set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE; "rwkv_vocab_v20230424" is included in rwkv 0.7.4+). See the loading sketch below.
* Use a Question/Answer, User/AI, or Human/Bot prompt for Q&A. **DO NOT USE Bob/Alice or Q/A.**
* Use **fp32** (fp16 will overflow at the moment; fixable in the future) or bf16 (slight quality degradation).
NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as a single token instead of ['\n','\n'].
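A minimal loading sketch with the rwkv pip package, assuming a locally downloaded World checkpoint (the checkpoint path below is a placeholder; pick a strategy string such as "cpu fp32", "cuda fp32", or "cuda bf16" per the precision note above):
```python
import os
# Standard env setup for the rwkv package; set these before importing it.
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Placeholder path: point this at your downloaded RWKV-4 World .pth checkpoint.
model = RWKV(model="path/to/RWKV-4-World-checkpoint", strategy="cpu fp32")

# World models need the new vocab, NOT 20B_tokenizer.json.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

# Per the tokenizer note above, '\n\n' should encode to a single token.
print(pipeline.encode("\n\n"))
```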
prompt:
```
Instruction: xxx
Input: xxx
Response:
```
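For example, a single instruction completion with the pipeline built above (the instruction/input text and the sampling settings are illustrative only, not tuned values):
```python
from rwkv.utils import PIPELINE_ARGS

# Illustrative instruction; follow the prompt format above exactly.
prompt = (
    "Instruction: Summarize the text below in one sentence.\n"
    "Input: RWKV is an RNN that can be trained like a transformer.\n"
    "Response:"
)

args = PIPELINE_ARGS(temperature=1.0, top_p=0.5)  # illustrative sampling settings
output = pipeline.generate(prompt, token_count=200, args=args)
print(output)
```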
A good chat prompt:
```
Question: hi
Answer: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.
Question: xxxxxx
Answer:
``` |
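And a sketch of one chat turn in the Question/Answer format, reusing the same pipeline (the user question is illustrative; cutting the completion at the first '\n\n' is one simple way to stop at the end of the answer):
```python
chat_prompt = (
    "Question: hi\n"
    "Answer: Hi. I am your assistant and I will provide expert full response in full details. "
    "Please feel free to ask any question and I will always answer it.\n"
    "Question: What can you tell me about RWKV?\n"  # illustrative user question
    "Answer:"
)

output = pipeline.generate(chat_prompt, token_count=256)
answer = output.split("\n\n")[0].strip()  # keep only the first answer
print(answer)

# Append the finished turn to the history before asking the next question.
chat_prompt += " " + answer + "\n"
```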