lvkaokao committed
Commit 830e8d3
2 Parent(s): f6e1e39 af71dde

Merge branch 'main' of https://huggingface.co/Intel/neural-chat-7b-v3 into main
Files changed (1): README.md (+31 -7)
README.md CHANGED
@@ -2,7 +2,7 @@
license: apache-2.0
---

- ## Finetuning on [habana](https://habana.ai/) HPU
+ ## Fine-tuning on [Habana](https://habana.ai/) Gaudi

This model is a fine-tuned model based on [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), then aligned with the DPO algorithm. For more details, you can refer to our blog: [The Practice of Supervised Fine-tuning and Direct Preference Optimization on Habana Gaudi2](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).

@@ -38,12 +38,37 @@ The following hyperparameters were used during training:
- lr_scheduler_warmup_ratio: 0.02
- num_epochs: 2.0

- ## Inference with transformers
+ ## FP32 Inference with transformers

```python
- import transformers
- model = transformers.AutoModelForCausalLM.from_pretrained(
- 'Intel/neural-chat-7b-v3'
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+ model_name = "Intel/neural-chat-7b-v3"
+ prompt = "Once upon a time, there existed a little girl,"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ inputs = tokenizer(prompt, return_tensors="pt").input_ids
+ streamer = TextStreamer(tokenizer)
+
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
+ ```
+
+ ## INT4 Inference with transformers
+
+ ```python
+ from transformers import AutoTokenizer, TextStreamer
+ from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
+ model_name = "Intel/neural-chat-7b-v3"
+ config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
+ prompt = "Once upon a time, there existed a little girl,"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ inputs = tokenizer(prompt, return_tensors="pt").input_ids
+ streamer = TextStreamer(tokenizer)
+
+ model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=config)
+ outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
- )
```

@@ -58,9 +83,8 @@ The license on this model does not constitute legal advice. We are not responsib

## Organizations developing the model

- The NeuralChat team with members from Intel/SATG/AIA/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen.
+ The NeuralChat team with members from Intel/DCAI/AISE. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen.

## Useful links
* Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
* Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
- * Intel Extension for PyTorch [link](https://github.com/intel/intel-extension-for-pytorch)
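
The committed examples stream tokens to stdout through `TextStreamer` but never decode `outputs` back into a string. Below is a minimal, self-contained sketch of that final step, assuming the FP32 setup from the README above; the prompt-stripping and `skip_special_tokens=True` choices are illustrative and not part of the committed file:

```python
# Sketch only (not part of the committed README): FP32 generation followed by
# decoding the result to plain text with the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Intel/neural-chat-7b-v3"
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(inputs, max_new_tokens=300)

# generate() returns the prompt plus the continuation; drop the prompt tokens
# and decode only the newly generated text.
new_tokens = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```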