# Model: FalconMasr

This model is based on Falcon-7B, loaded with 4-bit quantization to reduce memory usage and fine-tuned with LoRA (Low-Rank Adaptation) for causal language modeling in Arabic. The fine-tuning is intended to improve the quality of the model's Arabic responses.
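
As a rough illustration of why 4-bit quantization matters, the sketch below estimates the memory needed for the weights alone. It assumes a nominal 7 billion parameters (the exact count is not stated here) and ignores quantization constants, activations, and the LoRA adapter, so the numbers are order-of-magnitude guidance rather than measurements.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# ASSUMPTION: a nominal 7e9 parameters; the exact Falcon-7B count differs slightly.
NOMINAL_PARAMS = 7_000_000_000


def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Return the storage needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit (float16/bfloat16) weights
nf4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)    # 4-bit NF4 weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the 16-bit weights.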

## Model Configuration

- **Base Model**: `ybelkada/falcon-7b-sharded-bf16`
- **Quantization**: 4-bit with `nf4` quantization type and `float16` computation
- **LoRA Configuration**: `lora_alpha=16`, `lora_dropout=0`, `r=64`
- **Task Type**: Causal Language Modeling
- **Target Modules**: `query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`
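
To make the LoRA settings above more concrete, the following sketch computes the standard LoRA scaling factor (`lora_alpha / r`) and a rough count of trainable adapter parameters. The layer dimensions (hidden size 4544, head size 64 with a single shared key/value head, 32 decoder layers) are assumptions taken from the publicly released Falcon-7B configuration, not from this model card, and the helper `lora_params` is hypothetical.

```python
# Rough count of trainable LoRA parameters for the configuration listed above.
# ASSUMPTION: dimensions come from the public Falcon-7B config, not from this card.
HIDDEN = 4544      # hidden size
HEAD_DIM = 64      # per-head dimension (4544 / 71 query heads)
NUM_LAYERS = 32    # decoder layers

R = 64             # LoRA rank, from the card
LORA_ALPHA = 16    # LoRA alpha, from the card

# (in_features, out_features) of each targeted linear layer in one decoder block.
# With multi-query attention, query_key_value emits all query heads plus one key
# head and one value head.
TARGETS = {
    "query_key_value": (HIDDEN, HIDDEN + 2 * HEAD_DIM),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA scaling factor: {scaling}")
print(f"Trainable LoRA parameters: {total_trainable:,}")
```

Under these assumptions the adapters add roughly 130 million trainable parameters, a small fraction of the base model, while the quantized base weights stay frozen.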

## Training

The model was fine-tuned on a custom Arabic text dataset. Training loss, logged every 10 steps, fell from 1.4591 at step 10 to 0.7460 at step 400, with noticeable step-to-step fluctuation, as shown in the table below:

| Step | Training Loss |
|------|---------------|
| 10   | 1.459100      |
| 20   | 1.335000      |
| 30   | 1.295600      |
| 40   | 1.177000      |
| 50   | 1.144900      |
| 60   | 1.132900      |
| 70   | 1.074500      |
| 80   | 1.078600      |
| 90   | 1.121100      |
| 100  | 0.936000      |
| 110  | 1.151500      |
| 120  | 1.068000      |
| 130  | 1.056700      |
| 140  | 0.976900      |
| 150  | 0.867300      |
| 160  | 1.151100      |
| 170  | 1.023200      |
| 180  | 1.074300      |
| 190  | 1.036800      |
| 200  | 0.930700      |
| 210  | 0.960800      |
| 220  | 1.098800      |
| 230  | 0.967400      |
| 240  | 0.961700      |
| 250  | 0.871100      |
| 260  | 0.869400      |
| 270  | 0.939500      |
| 280  | 1.087600      |
| 290  | 1.080700      |
| 300  | 0.906800      |
| 310  | 0.901600      |
| 320  | 0.943200      |
| 330  | 0.968900      |
| 340  | 0.986600      |
| 350  | 1.014200      |
| 360  | 1.191700      |
| 370  | 0.992500      |
| 380  | 0.963600      |
| 390  | 0.888800      |
| 400  | 0.746000      |
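
Because the logged loss is noisy from step to step, averaging over windows gives a clearer view of the trend. The sketch below copies the values from the table and compares the mean of the first ten logged steps with the mean of the last ten; the window size of ten is an arbitrary choice for illustration, and the variable names are hypothetical.

```python
# Training losses copied from the table above, logged every 10 steps (10..400).
losses = [
    1.4591, 1.3350, 1.2956, 1.1770, 1.1449, 1.1329, 1.0745, 1.0786, 1.1211, 0.9360,
    1.1515, 1.0680, 1.0567, 0.9769, 0.8673, 1.1511, 1.0232, 1.0743, 1.0368, 0.9307,
    0.9608, 1.0988, 0.9674, 0.9617, 0.8711, 0.8694, 0.9395, 1.0876, 1.0807, 0.9068,
    0.9016, 0.9432, 0.9689, 0.9866, 1.0142, 1.1917, 0.9925, 0.9636, 0.8888, 0.7460,
]
steps = list(range(10, 10 * len(losses) + 1, 10))

WINDOW = 10  # arbitrary window size, chosen only for illustration


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


first_mean = mean(losses[:WINDOW])   # steps 10..100
last_mean = mean(losses[-WINDOW:])   # steps 310..400
best_loss = min(losses)
best_step = steps[losses.index(best_loss)]

print(f"Mean loss, first {WINDOW} logs: {first_mean:.4f}")
print(f"Mean loss, last {WINDOW} logs:  {last_mean:.4f}")
print(f"Lowest logged loss: {best_loss:.4f} at step {best_step}")
```

Note that these are training losses only; no evaluation loss is reported, so the trend says nothing about generalization.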

## Usage

To use this model, load it with the same 4-bit quantization settings and generate text as follows:

```python
import warnings

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

warnings.filterwarnings("ignore", category=FutureWarning)

# Model configuration
model_name = "MahmoudIbrahim/FalconMasr"

# 4-bit NF4 quantization with float16 compute, as listed under Model Configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

# Arabic prompt: "How does American Express's integrated payments platform
# differ from bank card networks?"
input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"

# Tokenize the input text and move it to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Remove 'token_type_ids' if the tokenizer returns it
inputs.pop("token_type_ids", None)

# Generate the output (max_length counts prompt tokens plus generated tokens)
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated output
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)