Model: FalconMasr
FalconMasr is based on Falcon-7B, loaded with 4-bit quantization to reduce memory usage and fine-tuned with LoRA (Low-Rank Adaptation) for causal language modeling in Arabic. The fine-tuning is aimed at improving the quality of the model's Arabic responses.
Model Configuration
- Base Model: ybelkada/falcon-7b-sharded-bf16
- Quantization: 4-bit with nf4 quantization type and float16 computation
- LoRA Configuration: lora_alpha=16, lora_dropout=0, r=64
- Task Type: Causal Language Modeling
- Target Modules: query_key_value, dense, dense_h_to_4h, dense_4h_to_h
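To make the LoRA settings above more concrete, the following is a minimal sketch of what they imply: the scaling factor applied to the low-rank update (lora_alpha / r in standard LoRA) and the number of trainable weights the adapters add. The layer shapes are an assumption based on Falcon-7B's published architecture and are not stated in this model card; the helper names are hypothetical.

```python
# Values taken from the configuration list above.
LORA_R = 64
LORA_ALPHA = 16

# ASSUMPTION: layer shapes for Falcon-7B (hidden size 4544, 32 decoder layers,
# multi-query attention with head size 64). They are not stated in this model
# card; check the base model's config.json before relying on them.
HIDDEN = 4544
HEAD_DIM = 64
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "query_key_value": (HIDDEN, HIDDEN + 2 * HEAD_DIM),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable weights added to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {lora_scaling(LORA_ALPHA, LORA_R)}")
print(f"adapter parameters per decoder layer: {per_layer:,}")
print(f"adapter parameters in total: {total:,}")
```

Under these assumed shapes the adapters add on the order of a hundred million trainable weights, a small fraction of the frozen 4-bit base model.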
Training
The model was fine-tuned on a custom Arabic text dataset. Training loss fell from 1.459 at step 10 to 0.746 at step 400, with noticeable step-to-step fluctuation along the way, as shown in the table below:
Step | Training Loss |
---|---|
10 | 1.459100 |
20 | 1.335000 |
30 | 1.295600 |
40 | 1.177000 |
50 | 1.144900 |
60 | 1.132900 |
70 | 1.074500 |
80 | 1.078600 |
90 | 1.121100 |
100 | 0.936000 |
110 | 1.151500 |
120 | 1.068000 |
130 | 1.056700 |
140 | 0.976900 |
150 | 0.867300 |
160 | 1.151100 |
170 | 1.023200 |
180 | 1.074300 |
190 | 1.036800 |
200 | 0.930700 |
210 | 0.960800 |
220 | 1.098800 |
230 | 0.967400 |
240 | 0.961700 |
250 | 0.871100 |
260 | 0.869400 |
270 | 0.939500 |
280 | 1.087600 |
290 | 1.080700 |
300 | 0.906800 |
310 | 0.901600 |
320 | 0.943200 |
330 | 0.968900 |
340 | 0.986600 |
350 | 1.014200 |
360 | 1.191700 |
370 | 0.992500 |
380 | 0.963600 |
390 | 0.888800 |
400 | 0.746000 |
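Because the logged loss is noisy from one entry to the next, the overall trend is easier to see when the values are averaged over windows. The sketch below does this with the numbers copied from the table; the window size of 10 logged values (100 training steps) and the helper name `window_means` are arbitrary choices made for illustration.

```python
# Training loss logged every 10 steps, copied from the table above.
losses = [
    1.4591, 1.3350, 1.2956, 1.1770, 1.1449, 1.1329, 1.0745, 1.0786, 1.1211, 0.9360,
    1.1515, 1.0680, 1.0567, 0.9769, 0.8673, 1.1511, 1.0232, 1.0743, 1.0368, 0.9307,
    0.9608, 1.0988, 0.9674, 0.9617, 0.8711, 0.8694, 0.9395, 1.0876, 1.0807, 0.9068,
    0.9016, 0.9432, 0.9689, 0.9866, 1.0142, 1.1917, 0.9925, 0.9636, 0.8888, 0.7460,
]
steps = [10 * (i + 1) for i in range(len(losses))]

# Illustrative choice: 10 logged values per window, i.e. 100 training steps.
WINDOW = 10


def window_means(values, size):
    """Mean of each consecutive, non-overlapping window of `size` values."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


means = window_means(losses, WINDOW)
for idx, mean in enumerate(means):
    first_step = steps[idx * WINDOW]
    last_step = steps[idx * WINDOW + WINDOW - 1]
    print(f"steps {first_step}-{last_step}: mean loss {mean:.3f}")
```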
Usage
To use this model, load it with the following configuration:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
# Model Configuration
model_name = "MahmoudIbrahim/FalconMasr"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
# Falcon's tokenizer has no pad token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"
# Use the device the quantized model was actually loaded on
device = model.device
# Tokenize the input text and move it to that device
inputs = tokenizer(input_text, return_tensors="pt").to(device)
# Remove 'token_type_ids' if it's present in the inputs
inputs.pop('token_type_ids', None)
# Generate the output
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode the generated output
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
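For a decoder-only model, generate returns the prompt tokens followed by the newly generated ones, so the decoded text printed above begins with the question itself. The sketch below shows one way to separate the two. It uses plain Python lists in place of tensors so that it runs without loading the model; the helper name `split_generation` and the token ids are hypothetical.

```python
def split_generation(output_ids, prompt_length):
    """Split a generated sequence into (prompt_ids, new_ids)."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[:prompt_length], output_ids[prompt_length:]


# Hypothetical token ids: a 4-token prompt followed by 3 generated tokens.
prompt_ids = [101, 7, 42, 9]
output_ids = prompt_ids + [55, 66, 11]

echoed, new_ids = split_generation(output_ids, len(prompt_ids))
print(new_ids)
```

With the variables from the usage snippet, the same idea would be to decode output[0][inputs["input_ids"].shape[1]:] instead of output[0], which should leave only the model's answer.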