---
language:
- en
- es
- pt
tags:
- falcon3
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---

# Table of Contents

0. [TL;DR](#TL;DR)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)

# TL;DR

# Model Details

⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

# Usage

Find below some example scripts showing how to use the model with `transformers` (make sure you have the latest release of `transformers`, or a build from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (weights stay on the CPU by default)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU


```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map="auto" lets accelerate place the weights on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the input token IDs to the GPU
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using `torch.compile`


```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the weights in bfloat16 and move the model to GPU 0
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the model; the first call is slower while compilation runs
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
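
Because this is a raw pretrained model, it continues the text it is given rather than following instructions. One common way to steer such a model is to prepend a few solved examples in the same `Question: ... Answer: ...` format used in the prompts above. The following is a minimal sketch of that idea; the helper `build_few_shot_prompt` and the example question/answer pairs are hypothetical illustrations, not part of this model card or of `transformers`.

```python
def build_few_shot_prompt(examples, question):
    """Join solved (question, answer) pairs and one open question into a prompt.

    Hypothetical helper: it only formats strings in the
    'Question: ... Answer: ...' style used by the prompts in this card.
    """
    blocks = [f"Question: {q} Answer: {a}" for q, a in examples]
    # Leave the last answer open so the model continues from this point.
    blocks.append(f"Question: {question} Answer: ")
    return "\n\n".join(blocks)


# Illustrative examples only (assumed, not taken from the model card).
examples = [
    ("How many days are in one week?", "7"),
    ("How many minutes are in one hour?", "60"),
]

prompt = build_few_shot_prompt(examples, "How many hours in one day?")
print(prompt)
```

The resulting string can be used as `input_text` in any of the snippets above. Whether this improves answers for a given task is something to check empirically.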

# Training Details

## Training Data

Falcon3-7B is trained on 15 Gigatokens of datasets comprising web, code, STEM, high-quality, and multilingual data.

## Training Procedure

Falcon3-7B is trained on 256 H100 nodes (world size 2048).

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                                                   |
|--------------------|------------|---------------------------------------------------------------|
| Precision          | `bfloat16` |                                                               |
| Optimizer          | AdamW      |                                                               |
| Max learning rate  | 6e-4       | Following a WSD (warmup-stable-decay) learning rate scheduler |
| Weight decay       | 1e-1       |                                                               |
| z-loss             | 1e-4       |                                                               |
| Batch size         | Variable   | Batch size was gradually increased during training            |

# Evaluation

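| **Category** | **Benchmark** | **Llama-3.2-1B** | **Qwen2.5-1.5B** | **SmolLM2-1.7B** | **gemma-2-2b** | **Falcon3-1B-Base** |
|--------------|---------------|------------------|------------------|------------------|----------------|---------------------|
| General | MMLU (5-shot) | 31.1 | 61.0 | 50.1 | 53.0 | 42.5 |
| | MMLU-PRO (5-shot) | 11.7 | 28.4 | 21.3 | 22.1 | 16.1 |
| | IFEval | 14.8 | 26.0 | 24.2 | 20.3 | 25.2 |
| Math | GSM8K (5-shot) | 6.6 | 62.2 | 31.0 | 25.5 | 34.3 |
| | MATH Lvl-5 (4-shot) | 0.2 | 6.7 | 1.4 | 2.6 | 2.2 |
| Reasoning | Arc Challenge (25-shot) | 40.2 | 54.8 | 54.1 | 53.7 | 48.1 |
| | GPQA (0-shot) | 24.2 | 28.1 | 28.9 | 25.5 | 28.1 |
| | MUSR (0-shot) | 34.5 | 35.5 | 34.7 | 42.7 | 41.9 |
| | BBH (3-shot) | 31.2 | 41.1 | 34.2 | 36.8 | 36.0 |
| CommonSense Understanding | PIQA (0-shot) | 74.5 | 76.0 | 77.5 | 79.2 | 74.5 |
| | SciQ (0-shot) | 88.5 | 93.1 | 90.8 | 95.7 | 91.1 |
| | Winogrande (0-shot) | 60.4 | 63.0 | 66.1 | 68.6 | 61.2 |
| | OpenbookQA (0-shot) | 37.4 | 40.4 | 44.0 | 41.8 | 41.0 |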

# Citation