Open-Assistant Falcon 7B SFT OASST-TOP1 Model

This model is a fine-tuning of TII's Falcon 7B LLM. It was trained on 11,123 top-1 (high-quality) demonstrations from the OASST dataset (exported on June 2, 2023) with a batch size of 128 for 8 epochs, using LIMA-style dropout (p=0.2) and a context length of 2048 tokens.

Model Details

Prompting

Two special tokens are used to mark the beginning of user and assistant turns: <|prompter|> and <|assistant|>. Each turn ends with a <|endoftext|> token.

Input prompt example:

<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>

The input ends with the <|assistant|> token to signal that the model should start generating the assistant reply.
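
For multi-turn conversations, earlier turns are concatenated in the same format before appending the final <|assistant|> token. The helper below is a minimal sketch; the function name and the (role, text) list structure are illustrative and not part of the original card.

def build_prompt(turns):
    # turns is a list of (role, text) pairs with role "prompter" or "assistant".
    prompt = ""
    for role, text in turns:
        prompt += f"<|{role}|>{text}<|endoftext|>"
    # End with the assistant token so the model generates the next reply.
    return prompt + "<|assistant|>"

history = [
    ("prompter", "What is a meme, and what's the history behind this word?"),
    ("assistant", "A meme is an idea or piece of media that spreads from person to person..."),
    ("prompter", "Can you give a well-known example?"),
]
print(build_prompt(history))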

Sample Code

from transformers import AutoTokenizer
import transformers
import torch

model = "OpenAssistant/falcon-7b-sft-top1-696"

# Load the tokenizer that ships with the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline. trust_remote_code=True is required because
# this Falcon checkpoint uses custom modeling code; bf16 keeps memory usage down.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Prompt in the OASST format: user turn, <|endoftext|>, then the <|assistant|>
# token to trigger the assistant reply.
input_text = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"

# Sample a single reply; generation stops at the <|endoftext|> (EOS) token.
sequences = pipeline(
    input_text,
    max_length=500,
    do_sample=True,
    return_full_text=False,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Configuration Details

Model:

falcon-7b:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "tiiuae/falcon-7b"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 20
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 8
  num_train_epochs: 8
  save_total_limit: 4
  residual_dropout: 0.2
  residual_dropout_lima: true
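
For reference, the batch size of 128 quoted in the introduction is the effective (global) batch size. Combining the per-device batch size and gradient accumulation steps from this config, it implies 8 data-parallel workers; the GPU count is an assumption and is not stated in the card.

per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 8  # assumption: not stated in the card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128, matching the batch size in the introduction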

Dataset:

oasst-top1:
  # oasst_export: 11123 (100.00%)
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
        input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
        val_split: 0.05
        top_k: 1
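
The top_k: 1 setting keeps only the highest-ranked assistant reply for each prompt, which is what "top-1 demonstrations" refers to in the introduction. The snippet below is a conceptual sketch with a made-up message structure; it is not the actual oasst_export implementation.

# Hypothetical structure: each prompt has several ranked candidate replies
# (rank 0 is best). The real export format differs; this only illustrates
# what selecting the top-1 reply means.
conversations = [
    {
        "prompt": "What is a meme?",
        "replies": [
            {"text": "A meme is an idea that spreads ...", "rank": 0},
            {"text": "Memes are internet jokes ...", "rank": 1},
        ],
    },
]

top1 = [
    (c["prompt"], min(c["replies"], key=lambda r: r["rank"])["text"])
    for c in conversations
]
print(top1)

With val_split: 0.05, roughly 5% of the 11,123 examples are held out for evaluation.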

Train command:

deepspeed trainer_sft.py --configs defaults falcon-7b oasst-top1 --cache_dir <data_cache_dir> --output_dir <output_path> --deepspeed

Export command:

python export_model.py --dtype bf16 --hf_repo_name OpenAssistant/falcon-7b-sft-top1 --trust_remote_code --auth_token <auth_token> <output_path> --max_shard_size 2GB