---
language:
  - en
  - es
  - pt
tags:
  - falcon3
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

1. TL;DR
2. Model Details
3. Usage
4. Training Details
5. Evaluation

# TL;DR

# Model Details

⚠️ This is a raw, pretrained model, which should be further fine-tuned for most use cases.

## Model Description

- Developed by: [https://www.tii.ae](https://www.tii.ae)
- Model type: Causal decoder-only
- Architecture: Transformer-based (see the quick config check below)
- Language(s) (NLP): Mainly English
- License: TII Falcon-LLM License 2.0
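
As a quick sanity check (not part of the original card), the decoder-only configuration can be inspected straight from the Hub without downloading the weights; the attributes below are standard transformers config fields:

```python
from transformers import AutoConfig

# Fetch only the model configuration, not the weights
config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Base")

print(config.model_type)         # architecture family registered in transformers
print(config.num_hidden_layers)  # depth of the decoder stack
print(config.hidden_size)        # model (embedding) dimension
print(config.vocab_size)         # tokenizer vocabulary size
```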

# Usage

Below are example scripts showing how to use the model with 🤗 transformers (make sure you have the latest version of transformers, or one built from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using torch.compile

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base", torch_dtype=torch.bfloat16).to("cuda")

# Compile the model for faster repeated inference (first call is slower)
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
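
By default, `generate` decodes greedily and stops after transformers' short default length limit. Continuing from any of the examples above (reusing `model`, `tokenizer`, and `input_ids`), the snippet below shows how to control output length and sampling; the parameter values are illustrative, not recommendations from this card:

```python
outputs = model.generate(
    input_ids,
    max_new_tokens=128,  # cap on generated tokens (illustrative value)
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative sampling temperature
    top_p=0.9,           # nucleus sampling cutoff
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```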

# Training Details

## Training Data

Falcon3-7B was trained on 15 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources.

## Training Procedure

Falcon3-7B was trained on 256 H100 nodes (world size 2048, i.e. 8 GPUs per node).

### Training Hyperparameters

| Hyperparameter    | Value    | Comment                                                                    |
|-------------------|----------|----------------------------------------------------------------------------|
| Precision         | bfloat16 |                                                                            |
| Optimizer         | AdamW    |                                                                            |
| Max learning rate | 6e-4     | Following a WSD (warmup-stable-decay) learning rate scheduler (sketched below) |
| Weight decay      | 1e-1     |                                                                            |
| z-loss            | 1e-4     |                                                                            |
| Batch size        | Variable | Batch size was gradually increased during training                         |
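
The card names a WSD schedule but does not publish its breakpoints. The following is a minimal sketch of a warmup-stable-decay schedule using the 6e-4 peak from the table; the warmup length and decay fraction are illustrative placeholders, not Falcon3's actual settings:

```python
def wsd_lr(step: int, max_steps: int, max_lr: float = 6e-4,
           warmup_steps: int = 1000, decay_frac: float = 0.1) -> float:
    """Warmup-stable-decay (WSD) learning rate at a given step.

    max_lr matches the table above; warmup_steps and decay_frac
    are assumed values for illustration only.
    """
    decay_start = int(max_steps * (1 - decay_frac))
    if step < warmup_steps:
        # linear warmup from 0 to max_lr
        return max_lr * step / warmup_steps
    if step < decay_start:
        # stable phase: hold the peak learning rate
        return max_lr
    # decay phase: linear ramp down to 0 over the final decay_frac of steps
    return max_lr * (max_steps - step) / (max_steps - decay_start)

# Example: early, mid, and late training on a 100k-step run
print(wsd_lr(500, 100_000), wsd_lr(50_000, 100_000), wsd_lr(99_000, 100_000))
```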

# Evaluation

| Category                  | Benchmark               | Llama-3.2-1B | Qwen2.5-1.5B | SmolLM2-1.7B | gemma-2-2b | Falcon3-1B-Base |
|---------------------------|-------------------------|--------------|--------------|--------------|------------|-----------------|
| General                   | MMLU (5-shot)           | 31.1         | 61.0         | 50.1         | 53.0       | 42.5            |
| General                   | MMLU-PRO (5-shot)       | 11.7         | 28.4         | 21.3         | 22.1       | 16.1            |
| General                   | IFEval                  | 14.8         | 26.0         | 24.2         | 20.3       | 25.2            |
| Math                      | GSM8K (5-shot)          | 6.6          | 62.2         | 31.0         | 25.5       | 34.3            |
| Math                      | MATH Lvl-5 (4-shot)     | 0.2          | 6.7          | 1.4          | 2.6        | 2.2             |
| Reasoning                 | Arc Challenge (25-shot) | 40.2         | 54.8         | 54.1         | 53.7       | 48.1            |
| Reasoning                 | GPQA (0-shot)           | 24.2         | 28.1         | 28.9         | 25.5       | 28.1            |
| Reasoning                 | MUSR (0-shot)           | 34.5         | 35.5         | 34.7         | 42.7       | 41.9            |
| Reasoning                 | BBH (3-shot)            | 31.2         | 41.1         | 34.2         | 36.8       | 36.0            |
| CommonSense Understanding | PIQA (0-shot)           | 74.5         | 76.0         | 77.5         | 79.2       | 74.5            |
| CommonSense Understanding | SciQ (0-shot)           | 88.5         | 93.1         | 90.8         | 95.7       | 91.1            |
| CommonSense Understanding | Winogrande (0-shot)     | 60.4         | 63.0         | 66.1         | 68.6       | 61.2            |
| CommonSense Understanding | OpenbookQA (0-shot)     | 37.4         | 40.4         | 44.0         | 41.8       | 41.0            |
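
The card does not state which harness produced these numbers. As a hedged reproduction sketch, a single benchmark can be scored with EleutherAI's lm-evaluation-harness (assumed here; task name, few-shot count, and batch size are illustrative, taken from the table rather than a published recipe):

```python
# pip install lm-eval
import lm_eval

# Hypothetical reproduction setup, not the authors' evaluation pipeline
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Base,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```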

# Citation