---
license: mit
language:
- fa
tags:
- persian
- llama
---
I trained Llama2-7B on about 15B tokens of Persian (Farsi) text (Common Crawl, social media, and papers) after extending its tokenizer with 21,455 new tokens.
```python
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

# Load the base model and the extended Persian tokenizer
model = LlamaForCausalLM.from_pretrained("mostafaamiri/base_7B")
tokenizer = AutoTokenizer.from_pretrained("mostafaamiri/llama2_7B_15Btoken")

# Grow the embedding matrix to match the extended vocabulary,
# then load the trained adapter weights
model.resize_token_embeddings(len(tokenizer))
model.load_adapter("mostafaamiri/llama2_7B_15Btoken")
```
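The `resize_token_embeddings` call above is required because the extended tokenizer has more entries than the base model's embedding matrix. As a minimal sketch of what it does, here is a tiny randomly initialized Llama model (the hidden sizes below are toy values chosen for illustration, not the real 7B configuration):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny toy config: only vocab_size matches Llama-2; the rest are
# small illustrative values so the model builds instantly.
config = LlamaConfig(
    vocab_size=32000,        # original Llama-2 vocabulary size
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config)
print(model.get_input_embeddings().weight.shape[0])  # 32000

# Simulate adding the 21,455 Persian tokens: the embedding matrix
# grows to 32000 + 21455 = 53455 rows (new rows are freshly initialized).
model.resize_token_embeddings(32000 + 21455)
print(model.get_input_embeddings().weight.shape[0])  # 53455
```

Without this step, token IDs produced by the extended tokenizer would index past the end of the embedding matrix.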