license: apache-2.0
XGen-7B-8K-Base
Official research release for the family of XGen models (7B) by Salesforce AI Research:
Title: Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length
Authors: Erik Nijkamp*, Tian Xie*, Hiroaki Hayashi*, Bo Pang*, Congying Xia*, Chen Xing, Rui Meng, Wojciech Kryscinski, Lifu Tu, Meghana Bhat, Semih Yavuz, Jesse Vig, Lidiya Murakhovs'ka, Chien-Sheng Wu, Yingbo Zhou, Shafiq Rayhan Joty, Caiming Xiong, Silvio Savarese.
(* indicates equal contribution)
Correspondence to: Shafiq Rayhan Joty, Caiming Xiong
Models
Base models
- XGen-7B-4K-Base: XGen-7B model pre-trained under 4K sequence length.
- License: Apache-2.0
- XGen-7B-8K-Base: XGen-7B model pre-trained under 8K sequence length.
- License: Apache-2.0
Instruction-finetuned models
Supervised fine-tuned model on public-domain instructional data. Released for research purposes only.
How to run
The training data for the models is tokenized with the OpenAI Tiktoken library.
To use this model, install the package via pip:
pip install tiktoken
The models can be used as auto-regressive samplers as follows:
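import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# The tokenizer ships as custom code in the model repository, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)
# Load the weights in bfloat16 to reduce memory use.
model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-base", torch_dtype=torch.bfloat16)
# Encode the prompt as PyTorch tensors.
inputs = tokenizer("The world is", return_tensors="pt")
# max_length counts the prompt tokens plus the generated tokens.
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))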
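Because this model is pre-trained under 8K sequence length, the prompt and the generated tokens together should fit inside that window. The following is a minimal sketch of one way to trim an over-long prompt before sampling. The helper name `fit_prompt`, the constant `CONTEXT_WINDOW`, and the value 8192 are assumptions made for illustration and are not part of the release; the sketch keeps the most recent tokens, which is only one possible truncation policy.

```python
from typing import List

# Assumption: "8K" is read as a window of 8192 tokens; adjust if the model config differs.
CONTEXT_WINDOW = 8192


def fit_prompt(
    token_ids: List[int],
    max_new_tokens: int,
    context_window: int = CONTEXT_WINDOW,
) -> List[int]:
    """Hypothetical helper: keep the most recent prompt tokens so that
    prompt length + max_new_tokens does not exceed context_window."""
    if max_new_tokens < 0 or max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be in [0, context_window)")
    # Number of prompt tokens that still leaves room for generation.
    budget = context_window - max_new_tokens
    # Slicing from the end drops the oldest tokens; short prompts pass through unchanged.
    return token_ids[-budget:]


if __name__ == "__main__":
    long_prompt = list(range(10_000))  # stand-in for real token ids
    trimmed = fit_prompt(long_prompt, max_new_tokens=128)
    print(len(trimmed))
```

In practice the ids would come from the tokenizer output (for example `tokenizer(text)["input_ids"]`), and the same budget would be passed to `model.generate` through `max_new_tokens` rather than `max_length`, so that the generation allowance does not depend on the prompt length.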
Citation
@misc{XGen,
title={Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length},
author={Erik Nijkamp and Tian Xie and Hiroaki Hayashi and Bo Pang and Congying Xia and Chen Xing and Rui Meng and Wojciech Kryscinski and Lifu Tu and Meghana Bhat and Semih Yavuz and Jesse Vig and Lidiya Murakhovs'ka and Chien-Sheng Wu and Yingbo Zhou and Shafiq Rayhan Joty and Caiming Xiong and Silvio Savarese},
howpublished={Salesforce AI Research Blog},
year={2023},
url={https://blog.salesforceairesearch.com/xgen}
}