
Model Overview

Model Summary

Falcon-RW-1B is a 1B-parameter causal decoder-only model built by TII and trained on 350B tokens of RefinedWeb. The architecture of the model is adapted from the GPT-3 paper (Brown et al., 2020), but it uses ALiBi.

Links

Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name | Parameters | Description |
| --- | --- | --- |
| falcon_refinedweb_1b_en | 1.31B | 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of the RefinedWeb dataset. |

Use

Direct Use

Research on large language models, specifically the influence of adequately filtered and deduplicated web data on the properties of large language models (fairness, safety, limitations, capabilities, etc.).

Out-of-scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, Risks, and Limitations

Falcon-RW-1B is trained on English data only and will not generalize appropriately to other languages. Furthermore, as it was trained on a large-scale corpus representative of the web, it will carry the stereotypes and biases commonly encountered online.

Recommendations

We recommend that users of Falcon-RW-1B consider fine-tuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

Training Details

Training Data

Falcon-RW-1B was trained on 350B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset. The data was tokenized with the GPT-2 tokenizer.
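As a quick illustration of that tokenization step, the KerasHub preset bundles the matching GPT-2 style byte-pair tokenizer. The snippet below is a minimal sketch assuming the standard keras_hub tokenizer API; the sample sentence is just an example.

import keras_hub

# Load the GPT-2 style byte-pair tokenizer bundled with the preset.
tokenizer = keras_hub.models.FalconTokenizer.from_preset(
    "falcon_refinedweb_1b_en"
)

token_ids = tokenizer("Falcon-RW-1B was trained on RefinedWeb.")
print(token_ids)                        # integer token ids
print(tokenizer.detokenize(token_ids))  # round-trips back to the original text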

Training Procedure

Falcon-RW-1B was trained on 32 A100 40GB GPUs, using only data parallelism with ZeRO.

Training Hyperparameters

Hyperparameters were adapted from the GPT-3 paper (Brown et al., 2020).

| Hyperparameter | Value | Comment |
| --- | --- | --- |
| Precision | bfloat16 | |
| Optimizer | AdamW | |
| Learning rate | 2e-4 | 500M tokens warm-up, cosine decay to 2e-5 |
| Weight decay | 1e-1 | |
| Batch size | 512 | 4B tokens ramp-up |
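For readers who want a comparable schedule, the sketch below expresses the warm-up plus cosine decay with Keras built-ins. The step counts are rough conversions of the token budgets above (assuming 512 sequences of 2,048 tokens per optimizer step) and are illustrative assumptions, not the exact values used for the original run.

import keras

# Approximate steps from token budgets: ~1M tokens per optimizer step.
tokens_per_step = 512 * 2048
warmup_steps = 500_000_000 // tokens_per_step      # ~500M warm-up tokens
total_steps = 350_000_000_000 // tokens_per_step   # ~350B training tokens

# Linear warm-up to 2e-4, then cosine decay down to 2e-5.
schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.0,
    warmup_target=2e-4,
    warmup_steps=warmup_steps,
    decay_steps=total_steps - warmup_steps,
    alpha=2e-5 / 2e-4,  # decay floor expressed as a fraction of the peak rate
)

optimizer = keras.optimizers.AdamW(learning_rate=schedule, weight_decay=1e-1)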

Speeds, Sizes, Times

Training happened in early December 2022 and took about six days.

Evaluation

See the paper on arXiv for in-depth evaluation.

Technical Specifications

Model Architecture and Objective

Falcon-RW-1B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).

The architecture is adapted from the GPT-3 paper (Brown et al., 2020), but uses ALiBi (Press et al., 2021).

| Hyperparameter | Value |
| --- | --- |
| Layers | 24 |
| d_model | 2048 |
| head_dim | 64 |
| Vocabulary | 50304 |
| Sequence length | 2048 |
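The ALiBi biases mentioned above replace positional embeddings with a per-head linear penalty on attention scores that grows with query-key distance. The following NumPy sketch is illustrative only (it is not the KerasHub implementation); the 32-head count follows from d_model 2048 with head_dim 64.

import numpy as np

def alibi_slopes(num_heads):
    # Geometric slopes from the ALiBi paper; exact when num_heads is a power of two.
    ratio = 2.0 ** (-8.0 / num_heads)
    return ratio ** np.arange(1, num_heads + 1)

def alibi_bias(num_heads, seq_len):
    # bias[h, i, j] = slope_h * (j - i): increasingly negative for distant past keys.
    positions = np.arange(seq_len)
    relative = positions[None, :] - positions[:, None]
    return alibi_slopes(num_heads)[:, None, None] * relative[None, :, :]

bias = alibi_bias(num_heads=32, seq_len=2048)
print(bias.shape)  # (32, 2048, 2048), added to attention logits before the causal mask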

Citation

@article{refinedweb,
  title={The {R}efined{W}eb dataset for {F}alcon {LLM}: outperforming curated corpora with web data, and web data only},
  author={Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay},
  journal={arXiv preprint arXiv:2306.01116},
  eprint={2306.01116},
  eprinttype={arXiv},
  url={https://arxiv.org/abs/2306.01116},
  year={2023}
}

Example Usage


import os

os.environ["KERAS_BACKEND"] = "jax"

import keras
import keras_hub

# When running only inference, bfloat16 significantly reduces memory usage.
keras.config.set_floatx("bfloat16")

causal_lm = keras_hub.models.FalconCausalLM.from_preset(
    "falcon_refinedweb_1b_en"
)
causal_lm.summary()

outputs = causal_lm.generate([
    "What is Jax?",
    "Give me your best brownie recipe.",
], max_length=512)
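
generate() decodes with whatever sampler the task was compiled with; to change the decoding strategy, recompile with one of the keras_hub samplers before generating. This is a minimal sketch assuming the standard keras_hub samplers API.

# Switch to top-k sampling before generating.
causal_lm.compile(sampler=keras_hub.samplers.TopKSampler(k=10))
print(causal_lm.generate("What is Jax?", max_length=128))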

Example Usage with Hugging Face URI


import os

os.environ["KERAS_BACKEND"] = "jax"

import keras
import keras_hub

# When running only inference, bfloat16 significantly reduces memory usage.
keras.config.set_floatx("bfloat16")

causal_lm = keras_hub.models.FalconCausalLM.from_preset(
    "hf://keras/falcon_refinedweb_1b_en"
)
causal_lm.summary()

outputs = causal_lm.generate([
    "What is Jax?",
    "Give me your best brownie recipe.",
], max_length=512)