---
language:
  - en
  - es
  - pt
tags:
  - falcon3
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

1. TL;DR
2. Model Details
3. Usage
4. Training Details
5. Evaluation

# TL;DR

# Model Details

⚠️ This is a raw, pretrained model, which should be further fine-tuned for most use cases.

## Model Description

- Developed by: [https://www.tii.ae](https://www.tii.ae)
- Model type: Causal decoder-only
- Architecture: Transformer-based (see the quick config check below)
- Language(s) (NLP): Mainly English
- License: TII Falcon-LLM License 2.0
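
As a quick sanity check (not part of the original card), the decoder-only configuration can be inspected straight from the Hub without downloading the weights; the attributes below are standard transformers config fields:

```python
from transformers import AutoConfig

# Fetch only the model configuration, not the weights
config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Base")

print(config.model_type)         # architecture family registered in transformers
print(config.num_hidden_layers)  # depth of the decoder stack
print(config.hidden_size)        # model (embedding) dimension
print(config.vocab_size)         # tokenizer vocabulary size
```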

# Usage

Below are example scripts showing how to use the model with 🤗 transformers (make sure you have the latest version of transformers, or one built from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using torch.compile

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-1B-Base", torch_dtype=torch.bfloat16).to("cuda")

# Compile the model for faster repeated inference (first call is slower)
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
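
By default, `generate` decodes greedily and stops after transformers' short default length limit. Continuing from any of the examples above (reusing `model`, `tokenizer`, and `input_ids`), the snippet below shows how to control output length and sampling; the parameter values are illustrative, not recommendations from this card:

```python
outputs = model.generate(
    input_ids,
    max_new_tokens=128,  # cap on generated tokens (illustrative value)
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative sampling temperature
    top_p=0.9,           # nucleus sampling cutoff
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```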

# Training Details

## Training Data

Falcon3-7B was trained on 15 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources.

## Training Procedure

Falcon3-7B was trained on 256 H100 nodes (world size 2048, i.e. 8 GPUs per node).

### Training Hyperparameters

| Hyperparameter    | Value    | Comment                                                                    |
|-------------------|----------|----------------------------------------------------------------------------|
| Precision         | bfloat16 |                                                                            |
| Optimizer         | AdamW    |                                                                            |
| Max learning rate | 6e-4     | Following a WSD (warmup-stable-decay) learning rate scheduler (sketched below) |
| Weight decay      | 1e-1     |                                                                            |
| z-loss            | 1e-4     |                                                                            |
| Batch size        | Variable | Batch size was gradually increased during training                         |
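
The card names a WSD schedule but does not publish its breakpoints. The following is a minimal sketch of a warmup-stable-decay schedule using the 6e-4 peak from the table; the warmup length and decay fraction are illustrative placeholders, not Falcon3's actual settings:

```python
def wsd_lr(step: int, max_steps: int, max_lr: float = 6e-4,
           warmup_steps: int = 1000, decay_frac: float = 0.1) -> float:
    """Warmup-stable-decay (WSD) learning rate at a given step.

    max_lr matches the table above; warmup_steps and decay_frac
    are assumed values for illustration only.
    """
    decay_start = int(max_steps * (1 - decay_frac))
    if step < warmup_steps:
        # linear warmup from 0 to max_lr
        return max_lr * step / warmup_steps
    if step < decay_start:
        # stable phase: hold the peak learning rate
        return max_lr
    # decay phase: linear ramp down to 0 over the final decay_frac of steps
    return max_lr * (max_steps - step) / (max_steps - decay_start)

# Example: early, mid, and late training on a 100k-step run
print(wsd_lr(500, 100_000), wsd_lr(50_000, 100_000), wsd_lr(99_000, 100_000))
```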

# Evaluation

| Category                  | Benchmark               | Llama-3.2-1B | Qwen2.5-1.5B | SmolLM2-1.7B | gemma-2-2b | Falcon3-1B-Base |
|---------------------------|-------------------------|--------------|--------------|--------------|------------|-----------------|
| General                   | MMLU (5-shot)           | 31.1         | 61.0         | 50.1         | 53.0       | 42.5            |
| General                   | MMLU-PRO (5-shot)       | 11.7         | 28.4         | 21.3         | 22.1       | 16.1            |
| General                   | IFEval                  | 14.8         | 26.0         | 24.2         | 20.3       | 25.2            |
| Math                      | GSM8K (5-shot)          | 6.6          | 62.2         | 31.0         | 25.5       | 34.3            |
| Math                      | MATH Lvl-5 (4-shot)     | 0.2          | 6.7          | 1.4          | 2.6        | 2.2             |
| Reasoning                 | Arc Challenge (25-shot) | 40.2         | 54.8         | 54.1         | 53.7       | 48.1            |
| Reasoning                 | GPQA (0-shot)           | 24.2         | 28.1         | 28.9         | 25.5       | 28.1            |
| Reasoning                 | MUSR (0-shot)           | 34.5         | 35.5         | 34.7         | 42.7       | 41.9            |
| Reasoning                 | BBH (3-shot)            | 31.2         | 41.1         | 34.2         | 36.8       | 36.0            |
| CommonSense Understanding | PIQA (0-shot)           | 74.5         | 76.0         | 77.5         | 79.2       | 74.5            |
| CommonSense Understanding | SciQ (0-shot)           | 88.5         | 93.1         | 90.8         | 95.7       | 91.1            |
| CommonSense Understanding | Winogrande (0-shot)     | 60.4         | 63.0         | 66.1         | 68.6       | 61.2            |
| CommonSense Understanding | OpenbookQA (0-shot)     | 37.4         | 40.4         | 44.0         | 41.8       | 41.0            |
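
The card does not state which harness produced these numbers. As a hedged reproduction sketch, a single benchmark can be scored with EleutherAI's lm-evaluation-harness (assumed here; task name, few-shot count, and batch size are illustrative, taken from the table rather than a published recipe):

```python
# pip install lm-eval
import lm_eval

# Hypothetical reproduction setup, not the authors' evaluation pipeline
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Base,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```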

# Citation