lyogavin committed on
Commit d8e074b
1 Parent(s): 450c10d

Update README.md

Files changed (1)
README.md +95 -1
README.md CHANGED

---
license: apache-2.0
---

Anima LLM supports a 100K input token length. It is trained on top of Llama2 7B, so the license supports commercial use!

We carefully curated a long-form QA training dataset with lengths ranging from 30k to 100k tokens to train this model, and we made many memory optimizations to make it scale to 100k tokens.
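
The curation pipeline itself is not published in this card; purely as an illustration, a length filter of the kind described could look like the sketch below (the dataset object and its `"text"` field are hypothetical):

```python
# Hypothetical sketch of length-based filtering; the dataset and field names are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lyogavin/Anima-7B-100K")

def in_target_range(example, low=30_000, high=100_000):
    """Keep only QA examples whose tokenized length falls in the 30k-100k range."""
    n_tokens = len(tokenizer(example["text"]).input_ids)
    return low <= n_tokens <= high

# With a Hugging Face `datasets.Dataset` called `long_qa_dataset`:
# filtered = long_qa_dataset.filter(in_target_range)
```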

## How to train/infer?

#### Install dependencies

```bash
# Please update the path of `CUDA_HOME` to match your local CUDA installation
export CUDA_HOME=/usr/local/cuda-11.8
pip install transformers==4.31.0
pip install sentencepiece
pip install ninja
pip install flash-attn --no-build-isolation
# fused rotary-embedding and cross-entropy CUDA kernels from the flash-attention repo
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/xentropy
```
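
After installing, a quick sanity check (a minimal sketch, not part of the original instructions) can confirm that PyTorch sees your GPU and that flash-attn built correctly:

```python
# Optional environment sanity check.
import torch
import flash_attn

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("flash-attn version:", flash_attn.__version__)
```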

#### Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model = "lyogavin/Anima-7B-100K"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # required: loads the custom 100K-context model code
    device_map="auto",
)
model.eval()

prompt = "Where is the capital of US?"
inputs = tokenizer(prompt, return_tensors="pt")

# Move the inputs to the GPU
inputs['input_ids'] = inputs['input_ids'].cuda()
inputs['attention_mask'] = inputs['attention_mask'].cuda()

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30,
                              only_last_logit=True,  # to save memory
                              use_cache=False,       # disabling the KV cache saves memory if you run into OOM
                              xentropy=True)
output = tokenizer.batch_decode(generate_ids,
                                skip_special_tokens=True,
                                clean_up_tokenization_spaces=False)[0]
print(output)
```
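
The same call pattern applies to long inputs. The sketch below (the file name and question are placeholders, not from the original card) reuses the `model` and `tokenizer` loaded above to answer a question about a long document:

```python
# Hypothetical long-context usage; `long_document.txt` is a placeholder file you supply.
with open("long_document.txt") as f:
    long_text = f.read()

prompt = long_text + "\n\nQuestion: What is the main topic of this document?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
print("prompt length in tokens:", inputs["input_ids"].shape[1])

generate_ids = model.generate(**inputs, max_new_tokens=50,
                              only_last_logit=True,  # to save memory
                              use_cache=False,       # helps avoid OOM on very long inputs
                              xentropy=True)
answer = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0]
print(answer)
```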

#### Training

```bash
./run_longer_training.sh
```

## Evaluations

There are almost no evaluation datasets designed for 100k tokens, so we designed/curated datasets for this model ourselves. We compared this model against several other public and private models.

#### 1. LongChat topic retrieval

| Model               | Accuracy |
|---------------------|----------|
| Claude 2            | 0.9      |
| Together llama2 32k | 0.15     |
| LongChat 32k 1.5    | 0.05     |
| Anima 100K          | 0.5      |

#### 2. LongChat number retrieval

| Model               | Accuracy |
|---------------------|----------|
| Claude 2            | 0.85     |
| Together llama2 32k | 0.2      |
| LongChat 32k 1.5    | 0.05     |
| Anima 100K          | 0.45     |

#### 3. Narrative QA in zeroscore

| Model               | F1     |
|---------------------|--------|
| Claude 2            | 0.6187 |
| Together llama2 32k | 0.3833 |
| LongChat 32k 1.5    | 0.2416 |
| Anima 100K          | 0.4919 |

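For reference, the F1 in the NarrativeQA table is the standard token-overlap QA F1. The sketch below shows the usual computation (the actual evaluation harness may normalize answers differently):

```python
from collections import Counter

def qa_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1, as commonly used for NarrativeQA-style answer scoring."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: overlapping tokens give partial credit.
print(qa_f1("the capital is Washington", "Washington D.C. is the capital"))  # ≈ 0.89
```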

## GitHub

The GitHub repo is [here](https://github.com/lyogavin/Anima/tree/main/anima_100k).