|
---
license: apache-2.0
language:
- ja
- en
library_name: transformers
pipeline_tag: text-generation
model_type: mistral
---
|
# Swallow-MS-7b-v0.1-ChatVector |
|
|
|
A Japanese "instruction-tuned" model created with the [Chat Vector](https://arxiv.org/abs/2310.04799) technique.
|
|
|
The weights of this model were obtained not by any instruction tuning, but purely by the following weight arithmetic:
|
> [Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1) + [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) |
|
|
|
In other words, chat-style conversational ability was given to [Swallow-MS-7b-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MS-7b-v0.1) using only the addition and subtraction of pretrained weights. A detailed explanation (in Japanese) is available in [this article](https://qiita.com/jovyan/items/ee6affa5ee5bdaada6b4).
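Below is a minimal sketch of how this kind of merge can be reproduced with Transformers. It is illustrative only, not the author's exact script: the shape check is an assumption made to skip Swallow's extended-vocabulary embedding and output layers, whose sizes differ from Mistral's.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the three checkpoints (requires enough CPU RAM for three 7B models).
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16)
inst = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.bfloat16)
target = AutoModelForCausalLM.from_pretrained(
    "tokyotech-llm/Swallow-MS-7b-v0.1", torch_dtype=torch.bfloat16)

base_sd = base.state_dict()
inst_sd = inst.state_dict()
target_sd = target.state_dict()

for name in target_sd:
    # Skip tensors whose shapes differ (assumption: Swallow's extended
    # vocabulary makes its embedding/lm_head larger than Mistral's).
    if name in base_sd and base_sd[name].shape == target_sd[name].shape:
        # chat vector = instruct weights - base weights;
        # add it to the Japanese continually-pretrained model
        target_sd[name] = target_sd[name] + (inst_sd[name] - base_sd[name])

target.load_state_dict(target_sd)
target.save_pretrained("Swallow-MS-7b-v0.1-ChatVector")
```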
|
|
|
## Instruction format |
|
|
|
The prompt format is the same as that of [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
|
|
|
E.g. |
|
```
text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
```
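If the bundled tokenizer carries over Mistral's chat template (an assumption; check `tokenizer_config.json` to confirm), the same format can be produced with `apply_chat_template` instead of building the string by hand:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jovyan/Swallow-MS-7b-v0.1-ChatVector")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Renders the [INST] ... [/INST] format shown above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```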
|
|
|
## Usage |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jovyan/Swallow-MS-7b-v0.1-ChatVector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "Please energetically describe what makes Tokyo Tech's campuses distinctive."
prompt = "<s>[INST] 東京工業大学のキャンパスの特色を元気よく説明してください。 [/INST]"

# The prompt already contains the <s> BOS token, so skip adding special tokens.
input_ids = tokenizer.encode(
    prompt,
    add_special_tokens=False,
    return_tensors="pt",
)
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=128,
    temperature=0.99,
    top_p=0.95,
    do_sample=True,
)

out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```