|
--- |
|
license: other |
|
extra_gated_prompt: >- |
|
### MULTI-TOKEN PREDICTION RESEARCH LICENSE AGREEMENT 18th June 2024 |
|
|
|
This Multi-token Prediction Research License (“Agreement”) contains the terms and conditions that govern your access and use of the Materials (as defined below). You may not use the Materials if you do not accept this Agreement. By clicking "submit" below to accept, or accessing, using, or distributing any portion or element of the Materials you hereby agree to be bound by the terms of this Agreement. If you are agreeing to be bound by the Agreement on behalf of your employer or other entity, you represent and warrant to Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland) (“Meta”) that you have full legal authority to bind your employer or such entity to this Agreement. If you do not have requisite authority, you may not accept the Agreement or access the Materials on behalf of your employer or other entity. |
|
|
|
This Agreement is effective upon the earlier of the date that you first access the Materials or accept this Agreement (“Effective Date”), and is entered into by and between Meta, and you, or if you are entering into this Agreement on behalf of your employer or other entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules, or regulations to provide legal consent and, your employer or other entity and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf (“Licensee” or “You”). |
|
|
|
1. Definitions. |
|
|
|
a. “Documentation” means the specifications, manuals and documentation accompanying this release distributed by Meta at https://huggingface.co/facebook/multi-token-prediction. |
|
|
|
b. “Noncommercial Research Uses” means noncommercial research use cases related to research, development, education, processing, or analysis and in each case, is not primarily intended for commercial advantage or monetary compensation to you or others. |
|
|
|
c. “Materials” means, collectively, Documentation and the models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code, demonstration materials and other elements of the foregoing distributed by Meta at https://huggingface.co/facebook/multi-token-prediction and made available under this Agreement. |
|
|
|
d. “Trade Control Laws” means any applicable U.S. and non-U.S. export control and trade sanctions laws and regulations. |
|
|
|
e. “Acceptable Use Policy” means the [LLaMA Acceptable Use Policy](https://ai.meta.com/llama/use-policy/) applicable to Materials that is incorporated into this Agreement. |
|
|
|
2. License Rights and Redistribution. Subject to Your compliance with the terms and conditions of this Agreement, Meta hereby grants you the following: |
|
|
|
a. Grant of Rights. You are hereby granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials solely for Noncommercial Research Uses. |
|
|
|
b. Redistribution and Use. |
|
|
|
i. Distribution of Materials, and any derivative works thereof, are subject to the terms of this Agreement. If you distribute or make the Materials, or any derivative works thereof, available to a third party, you may only do so under the terms of this Agreement. You shall also provide a copy of this Agreement to such third party. |
|
|
|
ii. If you submit for publication the results of research you perform on, using, or otherwise in connection with Materials, you must acknowledge the use of Materials in your publication. |
|
|
|
iii. You must retain in all copies of the Materials that you distribute and include the following attribution notice within a “Notice” text file distributed as a part of such copies: “Materials are licensed under the Multi-token Prediction Research License, Copyright © Meta Platforms, Inc. All Rights Reserved.” |
|
|
|
iv. Your use of the Materials must comply with applicable laws and regulations (including Trade Control Laws) and adhere to the LLaMA Acceptable Use Policy, which is hereby incorporated by reference into this Agreement. |
|
|
|
v. You agree to validate and confirm LLaMA outputs for compliance with the LLaMA Acceptable Use Policy, including before relying on LLaMA outputs in any way as part of research activities or incorporating these outputs in research, studies, and papers. |
|
|
|
vi. You agree to report any violation of this Multi-token Prediction Research License or the Acceptable Use Policy, as outlined in the LLaMA Acceptable Use Policy. |
|
|
|
3. Restrictions. You will not, and will not permit, assist or cause any third party to: |
|
|
|
a. use the Materials or any outputs or results of the Materials in connection with any commercial uses or for any uses other than Noncommercial Research Uses; |
|
|
|
b. disguise your or their location through IP proxying or other methods; |
|
|
|
c. use or download Materials if you or they are: (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) will use Materials for any purpose prohibited by Trade Control Laws; or |
|
|
|
d. directly or indirectly export, re-export, provide, or otherwise transfer Materials: (a) to any individual, entity, or country prohibited by Trade Control Laws; (b) to anyone on U.S. or non-U.S. government restricted parties lists; or (c) for any purpose prohibited by Trade Control Laws, including nuclear, chemical or biological weapons, or missile technology applications. |
|
|
|
4. User Support. Your Noncommercial Research Use of the Materials is done at your own discretion; Meta does not process any information nor provide any service in relation to such use. Meta is under no obligation to provide any support services for the Materials. Any support provided is “as is”, “with all faults”, and without warranty of any kind. |
|
|
|
5. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE MATERIALS AND ANY OUTPUT AND RESULTS. |
|
|
|
6. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT OR INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. |
|
|
|
7. Intellectual Property. |
|
|
|
a. No trademark licenses are granted under this Agreement, and in connection with the Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Materials. |
|
|
|
b. Subject to Meta’s ownership of Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications. |
|
|
|
c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Materials or outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses and rights granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Materials. |
|
|
|
8. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Materials. Sections 3, 4, 5, 6, 7, 8 and 9 shall survive the termination of this Agreement. |
|
|
|
9. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement. |
|
|
|
10. Modifications and Amendments. Meta may modify this Agreement from time to time by posting a revised version at https://huggingface.co/facebook/multi-token-prediction/LICENSE; provided that they are similar in spirit to the current version of the Agreement, but may differ in detail to address new problems or concerns. All such changes will be effective immediately. Your continued use of the Materials after any modification to this Agreement constitutes your agreement to such modification. Except as provided in this Agreement, no other modification or addition to any provision of this Agreement will be binding unless it is in writing and signed by an authorized representative of both you and Meta. |
|
|
|
extra_gated_fields: |
|
First Name: text |
|
Last Name: text |
|
Date of birth: date_picker |
|
Country: country |
|
Affiliation: text |
|
geo: ip_location |
|
By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox |
|
extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/). |
|
extra_gated_button_content: Submit |
|
library_name: multi_token_prediction |
|
--- |

# **Multi-token prediction models and baselines**

Models accompanying the research paper "Better & Faster Large Language Models via Multi-token Prediction" (https://arxiv.org/abs/2404.19737).

Included are the following four 7B parameter models trained on code:

- baseline model (`n=1`) trained on 200B tokens of code: [7B_200B_1/](7B_200B_1/)
- multi-token prediction model (`n=4`) trained on 200B tokens of code: [7B_200B_4/](7B_200B_4/)
- baseline model (`n=1`) trained on 1T tokens of code: [7B_1T_1/](7B_1T_1/)
- multi-token prediction model (`n=4`) trained on 1T tokens of code: [7B_1T_4/](7B_1T_4/)

Tokenizer: standard Llama 2 SentencePiece tokenizer in [tokenizer.model](tokenizer.model).

## *Quickstart*

Install `torch`, `fairscale`, `fire` and `sentencepiece`, then run

```
torchrun --nproc_per_node 1 example_completion.py --ckpt_dir 7B_200B_4/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 2
```

replacing `7B_200B_4` with the respective checkpoint directory.
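
To run the same command for each of the four checkpoints, the argument list can be generated rather than edited by hand. The following is a minimal sketch, not part of the released code: the helper names `parse_checkpoint_name` and `build_command` are hypothetical, and reading the directory name as `<model size>_<training tokens>_<n>` is inferred from the model list above. The script only prints the commands; it does not launch them.

```python
import shlex

# Checkpoint directories listed in this README.
CHECKPOINT_DIRS = ["7B_200B_1", "7B_200B_4", "7B_1T_1", "7B_1T_4"]


def parse_checkpoint_name(name: str) -> dict:
    """Split a directory name like '7B_200B_4' into its three fields.

    The field meanings are an assumption inferred from the model list.
    """
    model_size, train_tokens, n_future = name.split("_")
    return {
        "model_size": model_size,
        "train_tokens": train_tokens,
        "n_future_tokens": int(n_future),
    }


def build_command(ckpt_dir: str) -> list[str]:
    """Return the quickstart command as an argument list (flags copied from the README)."""
    return [
        "torchrun", "--nproc_per_node", "1", "example_completion.py",
        "--ckpt_dir", f"{ckpt_dir}/",
        "--tokenizer_path", "tokenizer.model",
        "--max_seq_len", "128",
        "--max_batch_size", "2",
    ]


if __name__ == "__main__":
    for ckpt in CHECKPOINT_DIRS:
        info = parse_checkpoint_name(ckpt)
        kind = "baseline" if info["n_future_tokens"] == 1 else "multi-token"
        print(f"# {ckpt}: {kind}, {info['train_tokens']} training tokens")
        print(shlex.join(build_command(ckpt)))
```

The returned list can be passed to `subprocess.run` directly if the commands should be executed from Python instead of copied into a shell.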

## *Format*

The PyTorch `state_dicts` are compatible with the Llama format: the layers of the shared trunk and the next-token prediction head layer are numbered contiguously. Additional prediction heads for tokens further in the future are named `extra_heads` and can be ignored for standard autoregressive inference.

The implementation of `forward()` in [llama/model.py](llama/model.py) provides an additional argument `return_all_heads`. If set, the additional prediction heads are called and the logits are returned in shape `(batch_size, seq_len, n_future_tokens, vocab_size)`. Otherwise, the logits have shape `(batch_size, seq_len, 1, vocab_size)`.
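
The two points above can be illustrated without loading a checkpoint. This is a rough sketch under stated assumptions: the helper names `strip_extra_heads` and `expected_logits_shape` are hypothetical, the key names in the toy dictionary are made up for illustration (only the `extra_heads` substring comes from the description above), and the vocabulary size of 32000 is the Llama 2 tokenizer's size rather than a value read from the checkpoints.

```python
def strip_extra_heads(state_dict: dict) -> dict:
    """Drop every entry whose key mentions `extra_heads`.

    What remains is the shared trunk plus the next-token head, which is
    what standard autoregressive inference needs.
    """
    return {k: v for k, v in state_dict.items() if "extra_heads" not in k}


def expected_logits_shape(batch_size: int, seq_len: int, vocab_size: int,
                          n_future_tokens: int, return_all_heads: bool) -> tuple:
    """Shape of the logits as described for `forward()` in this README."""
    n_heads = n_future_tokens if return_all_heads else 1
    return (batch_size, seq_len, n_heads, vocab_size)


if __name__ == "__main__":
    # Hypothetical key names; in a real checkpoint the values are tensors.
    toy_state_dict = {
        "layers.0.attention.wq.weight": "trunk",
        "extra_heads.0.attention.wq.weight": "extra head",
        "output.weight": "unembedding",
    }
    print(sorted(strip_extra_heads(toy_state_dict)))

    # n=4 model, batch of 2, sequence length 128, assumed vocabulary of 32000.
    print(expected_logits_shape(2, 128, 32000, 4, return_all_heads=True))
    print(expected_logits_shape(2, 128, 32000, 4, return_all_heads=False))
```

Filtering by substring is a deliberately loose choice: it does not depend on where in the key the `extra_heads` prefix appears, which is not specified here.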

## *Citation*

Gloeckle, F., Idrissi, B. Y., Rozière, B., Lopez-Paz, D., & Synnaeve, G. (2024). Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.

BibTeX entry:

```
@article{gloeckle2024better,
  title={Better \& faster large language models via multi-token prediction},
  author={Gloeckle, Fabian and Idrissi, Badr Youbi and Rozi{\`e}re, Baptiste and Lopez-Paz, David and Synnaeve, Gabriel},
  journal={arXiv preprint arXiv:2404.19737},
  year={2024}
}
```

## Feedback and comments

Please report risks as indicated in the Acceptable Use Policy and address bugs and any other comments to the corresponding authors as indicated in the research paper.