---
datasets:
  - Omartificial-Intelligence-Space/Arabic-finanical-rag-embedding-dataset
language:
  - ar
base_model:
  - ybelkada/falcon-7b-sharded-bf16
pipeline_tag: text-generation
library_name: transformers
tags:
  - finance
---

Model: FalconMasr

FalconMasr is based on the Falcon-7B model, loaded in 4-bit precision for efficient memory usage and fine-tuned with LoRA (Low-Rank Adaptation) for causal language modeling in Arabic, with the goal of improving the quality of Arabic responses.

Model Configuration

  • Base Model: ybelkada/falcon-7b-sharded-bf16
  • Quantization: 4-bit with nf4 quantization type and float16 computation
  • LoRA Configuration: lora_alpha=16, lora_dropout=0, r=64
  • Task Type: Causal Language Modeling
  • Target Modules: query_key_value, dense, dense_h_to_4h, dense_4h_to_h
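
The exact training script is not included in this card, but the settings listed above map onto bitsandbytes and peft configuration objects roughly as in the sketch below (the variable names bnb_config and lora_config are illustrative, not taken from the original code):

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with float16 compute, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter settings, as listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0,
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
)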

Training

The model was fine-tuned on a custom Arabic text dataset. Training loss decreased from roughly 1.46 to 0.75 over 400 steps, as shown in the table below:

| Step | Training Loss |
|------|---------------|
| 10   | 1.459100 |
| 20   | 1.335000 |
| 30   | 1.295600 |
| 40   | 1.177000 |
| 50   | 1.144900 |
| 60   | 1.132900 |
| 70   | 1.074500 |
| 80   | 1.078600 |
| 90   | 1.121100 |
| 100  | 0.936000 |
| 110  | 1.151500 |
| 120  | 1.068000 |
| 130  | 1.056700 |
| 140  | 0.976900 |
| 150  | 0.867300 |
| 160  | 1.151100 |
| 170  | 1.023200 |
| 180  | 1.074300 |
| 190  | 1.036800 |
| 200  | 0.930700 |
| 210  | 0.960800 |
| 220  | 1.098800 |
| 230  | 0.967400 |
| 240  | 0.961700 |
| 250  | 0.871100 |
| 260  | 0.869400 |
| 270  | 0.939500 |
| 280  | 1.087600 |
| 290  | 1.080700 |
| 300  | 0.906800 |
| 310  | 0.901600 |
| 320  | 0.943200 |
| 330  | 0.968900 |
| 340  | 0.986600 |
| 350  | 1.014200 |
| 360  | 1.191700 |
| 370  | 0.992500 |
| 380  | 0.963600 |
| 390  | 0.888800 |
| 400  | 0.746000 |
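
The fine-tuning loop itself is not reproduced here, but attaching a LoRA adapter to the quantized base model typically looks like the following sketch (assuming the peft library and the bnb_config and lora_config objects from the configuration sketch above; this is not the original training script):

from transformers import AutoModelForCausalLM
from peft import prepare_model_for_kbit_training, get_peft_model

# Load the sharded Falcon-7B base model in 4-bit precision
base_model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",
    quantization_config=bnb_config,
    trust_remote_code=True,
)

# Prepare the quantized model for training and attach the LoRA adapter
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()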

Usage

To use this model, load it with the following configuration:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Model Configuration
model_name ="MahmoudIbrahim/FalconMasr"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False


tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token


input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"

# Move inputs to the same device as the model
device = model.device

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Remove 'token_type_ids' if it's present in the inputs
inputs.pop('token_type_ids', None)

# Generate the output
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode the generated output
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
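
For interactive use, the same model can stream tokens to stdout as they are generated. This is an optional variation on the script above (not part of the original card), reusing the model, tokenizer, and inputs already defined and transformers' TextStreamer:

from transformers import TextStreamer

# Print tokens as they are generated, skipping the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(
    **inputs,
    max_length=200,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)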