---
license: apache-2.0
---
# Make Some Noise (MSN) Framework

Implementation of the EMNLP 2024 paper [Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training](https://arxiv.org/pdf/2406.17404).

[[GitHub]](https://github.com/wyxstriker/MakeSomeNoiseInference)

## Requirements

- Environment: We adopt the same environment as [Spec-Bench](https://github.com/hemingkx/Spec-Bench) to facilitate a fair and consistent evaluation.
- Prepared Models: For convenience of testing, we release the weights of both the general-purpose model [[Llama3-8B-MSN](https://huggingface.co/DecoderImmortal/Llama3-8B-MSN)] and the code-specific model [[DeepSeek-Coder-7B-MSN](https://huggingface.co/DecoderImmortal/DeepSeek-Coder-7B-MSN)] trained with MSN as described in the paper.

## A minimal implementation of MSN

The MSN framework can be easily integrated into the data preprocessing stage of any training script. The entire noise-addition process is as follows:

```python
import random

L = 5  # L denotes the noise length hyperparameter, typically set to 5.
dataset = [
    {"source_ids": [...],   # token ids of the query prompt
     "input_ids": [...],    # token ids of the query concatenated with the response
     "output_ids": [...]},  # copy of input_ids, used as the label for the LM task
]
for sample in dataset:
    source_ids, input_ids = sample["source_ids"], sample["input_ids"]
    start_idx = random.randrange(len(source_ids), len(input_ids) - L)
    for mask_i in range(start_idx, start_idx + L):
        # Noise is added only to the portion of input_ids corresponding to the response.
        input_ids[mask_i] = random.choice(input_ids[:mask_i])
```
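To make the preprocessing step concrete, here is a self-contained run of the same noising procedure on a toy token-id sequence. The token ids and the `add_noise` helper are illustrative only, not part of the released code:

```python
import random

def add_noise(source_ids, input_ids, L=5):
    """Replace L consecutive response tokens with tokens sampled
    uniformly from the preceding context, as in MSN preprocessing."""
    noisy = list(input_ids)
    start_idx = random.randrange(len(source_ids), len(noisy) - L)
    for mask_i in range(start_idx, start_idx + L):
        # Only the response portion (after the query) is perturbed.
        noisy[mask_i] = random.choice(noisy[:mask_i])
    return noisy

source_ids = [101, 7592, 2088, 102]  # toy query token ids
response_ids = [2023, 2003, 1037, 7953, 6251, 2007, 2116, 19204, 102]
input_ids = source_ids + response_ids

noisy = add_noise(source_ids, input_ids, L=5)
assert noisy[:len(source_ids)] == source_ids            # the query is never noised
assert sum(a != b for a, b in zip(noisy, input_ids)) <= 5  # at most L positions change
```

Note that `output_ids` is copied before noise is injected, so the labels stay clean and the model learns to predict the correct tokens from a partially noised context.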

## TR-Jacobi

<div align="center">
<img src="./pic/tr-jacobi.png" width="50%"/>
</div>

We demonstrate how to use TR-Jacobi to accelerate an MSN-trained model in `src/inference_msn.py`.

```python
# Jacobi decoding with the MSN-trained model
spec_res_ids, new_tokens, forward_steps, accept_list = noise_forward(
    input_ids.cuda(), model, tokenizer, args.max_new_tokens
)

print("msn output")
print(tokenizer.decode(spec_res_ids[0]))
print("#MTA")  # mean tokens accepted per forward step
print(new_tokens / forward_steps)
print("Accepted Length List")
print(accept_list)

# msn output
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>
# Give me some advices about how to write an academic paper?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
# 1. Start by researching your topic and gathering relevant information. Make sure to take notes and organize your research in a way that makes sense.
# ...
# 8. Submit your paper. Make sure to follow any submission guidelines and make sure to submit your paper on time.<|eot_id|><|eot_id|>.

# #MTA
# 2.2

# Accepted Length List
# [1, 2, 1, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 2, 2, 1, 3, 1, 3, 1, 2, 1, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 5, 1, 3, 1, 5, 2, 1, 3, 2, 2, 2, 3, 2, 5, 1, 3, 2, 3, 2, 3, 2, 1, 4, 3, 1, 2, 2, 3, 6, 1, 2, 2, 2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 2, 1, 2, 2, 2, 3, 3, 3, 1, 4, 2, 1, 2, 2, 2]
```
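The #MTA figure printed above is simply `new_tokens / forward_steps`. Assuming the accepted-length list records one entry per forward step, the same number can be recomputed from the list alone (values copied from the sample run above):

```python
# Accepted lengths per forward step, copied from the sample run above.
accept_list = [1, 2, 1, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 2, 2, 1, 3, 1, 3, 1, 2, 1, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 5, 1, 3, 1, 5, 2, 1, 3, 2, 2, 2, 3, 2, 5, 1, 3, 2, 3, 2, 3, 2, 1, 4, 3, 1, 2, 2, 3, 6, 1, 2, 2, 2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 2, 1, 2, 2, 2, 3, 3, 3, 1, 4, 2, 1, 2, 2, 2]

mta = sum(accept_list) / len(accept_list)
print(f"#MTA recomputed: {mta:.1f}")  # ~2.2, matching the run above
```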

Run `sh run_case.sh` to see the execution process of a test sample.
The interface design of `noise_forward` is kept consistent with Spec-Bench.
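For intuition on the Jacobi part of TR-Jacobi: greedy decoding is a fixed point of a parallel update in which every draft position is recomputed from the tokens currently to its left. The sketch below illustrates that idea with a deterministic toy next-token function standing in for the model's parallel forward pass; everything here is illustrative, and the real strategy (including the MSN-specific parts) lives in `noise_forward`:

```python
def jacobi_decode(prefix, next_token, n_new, max_iters=100):
    """Toy Jacobi decoding: update all draft positions in parallel
    until the draft stops changing (i.e., reaches a fixed point)."""
    draft = [0] * n_new  # arbitrary initial guess for the n_new tokens
    for _ in range(max_iters):
        # One "forward pass": every position is recomputed in parallel
        # from the prefix plus the current draft tokens to its left.
        new_draft = [next_token(prefix + draft[:i]) for i in range(n_new)]
        if new_draft == draft:  # fixed point reached
            break
        draft = new_draft
    return draft

# Deterministic toy "model": next token is the sequence sum mod 7.
def toy_next_token(seq):
    return sum(seq) % 7

out = jacobi_decode([3, 1], toy_next_token, n_new=4)

# The fixed point equals ordinary sequential greedy decoding.
seq = [3, 1]
for _ in range(4):
    seq.append(toy_next_token(seq))
assert out == seq[2:]
```

Each iteration fixes at least one more leading position, so the loop needs at most `n_new` passes in the worst case; acceleration comes from the iterations where several positions stabilize at once, which is what the accepted-length list above measures.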

## Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@inproceedings{wang-etal-2024-make,
    title = "Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training",
    author = "Wang, Yixuan and
      Luo, Xianzhen and
      Wei, Fuxuan and
      Liu, Yijun and
      Zhu, Qingfu and
      Zhang, Xuanyu and
      Yang, Qing and
      Xu, Dongliang and
      Che, Wanxiang",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.718/",
    doi = "10.18653/v1/2024.emnlp-main.718",
    pages = "12914--12926",
}
```