---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

## Model Information

This is a vocabulary-pruned variant of [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

The vocabulary size is reduced from 128,256 to 32,256.

The total parameter count is 1,039,214,756, about 200M parameters fewer than the original model.
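
The ~200M figure checks out against the embedding shapes. A back-of-the-envelope sketch, assuming the base model's `hidden_size` of 2048 and tied input/output embeddings (values taken from the Llama-3.2-1B config, not from this repo):

```python
# Each pruned token removes one row of the (tied) token-embedding matrix.
removed_tokens = 128256 - 32256      # 96,000 vocabulary entries dropped
hidden_size = 2048                   # Llama-3.2-1B embedding width (assumed)
print(removed_tokens * hidden_size)  # 196,608,000, i.e. roughly 200M parameters
```
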
## How to use

Run inference with the `pipeline` API:

```python
import torch
from transformers import pipeline

# Load the pruned model; device_map="auto" places it on GPU if available.
pipe = pipeline(
    "text-generation",
    model='k-l-lambda/Llama-3.2-1B-vocab32k',
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```

Or load the model and tokenizer directly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")

# Plain (non-chat) generation from a text prompt.
input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
```
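
Since only the vocabulary was pruned, a quick sanity check is to confirm the model reports the pruned size. A minimal sketch (the expected value of 32256 comes from this card; that `len(tokenizer)` counts added tokens the same way is an assumption):

```python
# Both should reflect the pruned 32k vocabulary.
print(model.config.vocab_size)  # expected: 32256
print(len(tokenizer))           # should match the model's vocab size
```
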
## Tokens conversion

You can map a token ID in the 32k vocabulary to its ID in the original 128k vocabulary (and back) using the index tensors stored in `token_indices.pt` and `inv_token_indices.pt`.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

# Download the index tensors from this repo.
indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)

# 32k -> 128k: token_indices[i] is the original 128k ID of 32k token i.
ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[id].item() for id in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')

print(tokenizer128k.decode(ids_128k))

# 128k -> 32k: tokens missing from the 32k vocab map to -1.
ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[id].item() for id in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}')

print(tokenizer32k.decode(ids_32k))
```
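
Note that `-1` is not a valid token ID, so decoding an inverse-mapped sequence only works if every token survived the pruning. One way to guard against misses is to filter them out first; the strategy below is an illustrative choice, not part of this repo's tooling:

```python
# Drop IDs that have no counterpart in the 32k vocab before decoding.
valid_ids_32k = [i for i in ids_32k if i >= 0]
print(tokenizer32k.decode(valid_ids_32k))
```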