---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

## Model Information

This is a vocabulary-pruned variant of [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

The vocabulary size is reduced from 128,256 to 32,256.

The total parameter count is 1,039,214,756, about 200M parameters fewer than the original model.
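
The ~200M figure checks out against the embedding shapes. A back-of-the-envelope sketch, assuming the base model's `hidden_size` of 2048 and tied input/output embeddings (values taken from the Llama-3.2-1B config, not from this repo):

```python
# Each pruned token removes one row of the (tied) token-embedding matrix.
removed_tokens = 128256 - 32256      # 96,000 vocabulary entries dropped
hidden_size = 2048                   # Llama-3.2-1B embedding width (assumed)
print(removed_tokens * hidden_size)  # 196,608,000, i.e. roughly 200M parameters
```
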
## How to use

Run inference with the `pipeline` API:

```python
import torch
from transformers import pipeline

# Load the pruned model; device_map="auto" places it on GPU if available.
pipe = pipeline(
    "text-generation",
    model='k-l-lambda/Llama-3.2-1B-vocab32k',
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```

Or load the model and tokenizer directly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")

# Plain (non-chat) generation from a text prompt.
input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
```
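
Since only the vocabulary was pruned, a quick sanity check is to confirm the model reports the pruned size. A minimal sketch (the expected value of 32256 comes from this card; that `len(tokenizer)` counts added tokens the same way is an assumption):

```python
# Both should reflect the pruned 32k vocabulary.
print(model.config.vocab_size)  # expected: 32256
print(len(tokenizer))           # should match the model's vocab size
```
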
## Tokens conversion

You can map a token ID in the 32k vocabulary to its ID in the original 128k vocabulary (and back) using the index tensors stored in `token_indices.pt` and `inv_token_indices.pt`.

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

# Download the index tensors from this repo.
indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)

# 32k -> 128k: token_indices[i] is the original 128k ID of 32k token i.
ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[id].item() for id in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')

print(tokenizer128k.decode(ids_128k))

# 128k -> 32k: tokens missing from the 32k vocab map to -1.
ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[id].item() for id in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}')

print(tokenizer32k.decode(ids_32k))
```
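
Note that `-1` is not a valid token ID, so decoding an inverse-mapped sequence only works if every token survived the pruning. One way to guard against misses is to filter them out first; the strategy below is an illustrative choice, not part of this repo's tooling:

```python
# Drop IDs that have no counterpart in the 32k vocab before decoding.
valid_ids_32k = [i for i in ids_32k if i >= 0]
print(tokenizer32k.decode(valid_ids_32k))
```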