metadata

language:
  - en
  - es
  - pt
tags:
  - falcon3

TL;DR
Model Details
Usage
Training Details
Evaluation

TL;DR

Model Details

⚠️ This is a raw, pretrained model, which should be further fine-tuned for most use cases.

Model Description

Developed by: https://www.tii.ae
Model type: Causal decoder-only
Architecture: Transformer-base
Language(s) (NLP): Mainly English
License: TII Falcon-LLM License 2.0
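"Causal decoder-only" means that, when predicting the next token, each position may attend only to itself and to earlier positions. The sketch below illustrates that masking rule by turning toy attention scores into plain-Python softmax weights. It is a generic illustration of causal masking, not Falcon3's actual attention code, and `causal_attention_weights` is a hypothetical helper.

```python
import math

def causal_attention_weights(scores):
    """Turn a square matrix of raw attention scores into causally masked
    softmax weights: row i may only put weight on columns 0..i."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Positions j > i are "future" tokens, so they are masked out with -inf.
        row = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform toy scores: each token spreads its weight evenly over the visible prefix.
w = causal_attention_weights([[0.0] * 3 for _ in range(3)])
for row in w:
    print([round(x, 3) for x in row])
```

Every weight above the diagonal is exactly zero, so no token's representation can depend on tokens that come after it.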

Usage

Below are some example scripts showing how to use the model with transformers (make sure you have the latest release of transformers, or a build from source):

Using the PyTorch model with 🤗 transformers

Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Set the generation length explicitly; the default length limit may cut the answer short
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# device_map="auto" (requires accelerate) places the weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Put the prompt on the same device as the model's first layer
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using `torch.compile`

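import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# Load the weights in bfloat16 and move them to GPU 0
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the forward pass in place. generate() calls the model's own forward,
# so the wrapper returned by torch.compile(model) would leave generation uncompiled.
model.forward = torch.compile(model.forward)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))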

Training Details

Training Data

Falcon3-7B was trained on 15 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual data.

Training Procedure

Falcon3-7B was trained on 256 H100 nodes (world size 2048, i.e. 8 GPUs per node).

Training Hyperparameters

Hyperparameter	Value	Comment
Precision	`bfloat16`
Optimizer	AdamW
Max learning rate	6e-4	Following a WSD (warmup-stable-decay) learning rate scheduler
Weight decay	1e-1
z-loss	1e-4
Batch size	Variable	Batch size was gradually increased during training
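The table names the scheduler but not its shape. A warmup-stable-decay (WSD) schedule ramps the learning rate up linearly, holds it at the peak for most of training, then decays it in a final phase. The sketch below uses the 6e-4 peak from the table. The warmup and decay lengths, the linear decay shape, and the zero floor are assumptions for illustration, not Falcon3's actual settings, and `wsd_lr` is a hypothetical helper.

```python
def wsd_lr(step, total_steps, max_lr=6e-4, warmup_steps=1_000,
           decay_steps=10_000, min_lr=0.0):
    """Warmup-stable-decay learning rate at a given optimizer step.

    Only max_lr = 6e-4 comes from the hyperparameter table; warmup_steps,
    decay_steps, min_lr and the linear shapes are illustrative assumptions.
    Assumes total_steps >= warmup_steps + decay_steps.
    """
    if step < warmup_steps:
        # Warmup: linear ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * ((step + 1) / warmup_steps)
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return max_lr
    # Decay: linear ramp from max_lr down to min_lr over the last decay_steps.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress

for s in (0, 999, 50_000, 95_000, 100_000):
    print(s, wsd_lr(s, total_steps=100_000))
```

To drive a PyTorch optimizer whose base learning rate is 6e-4, one option is `torch.optim.lr_scheduler.LambdaLR` with the factor `wsd_lr(step, total_steps) / 6e-4`, since `LambdaLR` multiplies the base learning rate by the value its lambda returns.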
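The table also lists a z-loss coefficient of 1e-4. In the commonly used formulation (introduced with PaLM), z-loss adds `coef * (log Z)**2` per token to the cross-entropy, where Z is the softmax normalizer over the vocabulary logits. This nudges log Z toward 0 and stops the logits from drifting to large magnitudes. The sketch below assumes that formulation; the source does not say exactly how Falcon3 applies the term. `cross_entropy_with_z_loss` is a hypothetical helper.

```python
import math

def cross_entropy_with_z_loss(logits, target, z_coef=1e-4):
    """Per-token cross-entropy plus z-loss for a single position.

    logits: raw scores over the vocabulary; target: index of the true token.
    Assumes the PaLM-style auxiliary term z_coef * log(Z) ** 2.
    """
    # log Z = logsumexp(logits), computed stably by factoring out the max.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    ce = log_z - logits[target]       # equals -log softmax(logits)[target]
    z_loss = z_coef * log_z ** 2      # penalizes log Z drifting away from 0
    return ce + z_loss, ce, z_loss

total, ce, z = cross_entropy_with_z_loss([2.0, 0.5, -1.0], target=0)
print(f"ce={ce:.4f}  z_loss={z:.6f}  total={total:.4f}")
```

Adding the same constant to every logit leaves the cross-entropy unchanged but shifts log Z. That shift is exactly the degree of freedom z-loss constrains.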

Evaluation

Category	Benchmark	Llama3.1-8B	Qwen2-7B	Qwen2.5-7B	Falcon3-7B-Base
General	MMLU (5-shot)	65.2	70.4	74.2	67.5
	MMLU-PRO (5-shot)	32.7	42.1	43.5	39.2
	IFEval	12.0	30.6	33.9	34.3
Math	GSM8K (5-shot)	49.4	77.9	82.9	76.2
	MATH (4-shot)	4.1	17.5	15.5	18.0
Reasoning	ARC Challenge (25-shot)	53.4	57.4	59.0	59.6
	GPQA (0-shot)	31.0	31.9	33.0	35.5
	MUSR (0-shot)	38.0	44.1	44.2	47.3
	BBH (3-shot)	46.5	53.3	54.0	51.0
Commonsense Understanding	PIQA (0-shot)	80.3	79.8	78.7	77.7
	SciQ (0-shot)	96.3	95.9	96.6	95.3
	Winogrande (0-shot)	74.0	72.1	72.9	71.0
	OpenbookQA (0-shot)	33.4	35.2	33.6	31.4
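Many benchmarks in the table are scored "k-shot". In that setting, k solved examples are placed in the prompt ahead of the test question, and the base model continues the text. The sketch below builds such a prompt in the same Question/Answer style as the usage examples. The templates, separators, and example selection behind the reported scores are not given here, so this format is an assumption, and `build_k_shot_prompt` is a hypothetical helper.

```python
def build_k_shot_prompt(examples, question, k):
    """Concatenate k solved (question, answer) pairs, then the open question."""
    if k > len(examples):
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    shots = [f"Question: {q} Answer: {a}" for q, a in examples[:k]]
    # The final question is left unanswered so the model generates the answer.
    shots.append(f"Question: {question} Answer:")
    return "\n\n".join(shots)

demos = [("How many days in one week?", "7"), ("How many minutes in one hour?", "60")]
print(build_k_shot_prompt(demos, "How many hours in one day?", k=2))
```

With `k=0` the function returns only the open question, which corresponds to the 0-shot rows. The resulting string can be tokenized and passed to `model.generate` in the same way as `input_text` in the usage examples.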
