---
datasets:
- Omartificial-Intelligence-Space/Arabic-finanical-rag-embedding-dataset
language:
- ar
base_model:
- ybelkada/falcon-7b-sharded-bf16
pipeline_tag: text-generation
library_name: transformers
tags:
- finance
---

# Model: FalconMasr

FalconMasr is based on Falcon-7B, loaded with 4-bit quantization for efficient memory usage and fine-tuned with LoRA (Low-Rank Adaptation) on Arabic text. It is configured for causal language modeling, with the goal of improving the quality of responses generated in Arabic.

## Model Configuration

- **Base Model**: `ybelkada/falcon-7b-sharded-bf16`
- **Quantization**: 4-bit, `nf4` quantization type with `float16` compute dtype
- **LoRA Configuration**: `lora_alpha=16`, `lora_dropout=0`, `r=64`
- **Task Type**: Causal Language Modeling
- **Target Modules**: `query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h` (see the sketch below)
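
For reference, the settings above correspond roughly to the following `peft.LoraConfig`. This is a minimal sketch, not the original training script; the `bias` setting is an assumption, since the card does not state it.

```python
from peft import LoraConfig

# Minimal sketch of the adapter configuration described above.
# Values are taken from the Model Configuration list; bias="none"
# is an assumption, as the card does not specify it.
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
)
```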
## Training

The model was fine-tuned on an Arabic financial-text dataset (`Omartificial-Intelligence-Space/Arabic-finanical-rag-embedding-dataset`), and the training loss trends downward over 400 steps, as shown in the table below:

| Step | Training Loss |
|------|---------------|
| 10 | 1.459100 |
| 20 | 1.335000 |
| 30 | 1.295600 |
| 40 | 1.177000 |
| 50 | 1.144900 |
| 60 | 1.132900 |
| 70 | 1.074500 |
| 80 | 1.078600 |
| 90 | 1.121100 |
| 100 | 0.936000 |
| 110 | 1.151500 |
| 120 | 1.068000 |
| 130 | 1.056700 |
| 140 | 0.976900 |
| 150 | 0.867300 |
| 160 | 1.151100 |
| 170 | 1.023200 |
| 180 | 1.074300 |
| 190 | 1.036800 |
| 200 | 0.930700 |
| 210 | 0.960800 |
| 220 | 1.098800 |
| 230 | 0.967400 |
| 240 | 0.961700 |
| 250 | 0.871100 |
| 260 | 0.869400 |
| 270 | 0.939500 |
| 280 | 1.087600 |
| 290 | 1.080700 |
| 300 | 0.906800 |
| 310 | 0.901600 |
| 320 | 0.943200 |
| 330 | 0.968900 |
| 340 | 0.986600 |
| 350 | 1.014200 |
| 360 | 1.191700 |
| 370 | 0.992500 |
| 380 | 0.963600 |
| 390 | 0.888800 |
| 400 | 0.746000 |
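
The training script and prompt format are not included in this card. As a rough, hypothetical sketch of how the dataset listed in the metadata could be loaded and flattened into plain text for causal-LM fine-tuning (the `question` and `context` column names are assumptions, not taken from this card):

```python
from datasets import load_dataset

# Dataset referenced in this card's metadata.
dataset = load_dataset(
    "Omartificial-Intelligence-Space/Arabic-finanical-rag-embedding-dataset",
    split="train",
)

# The "question" and "context" column names below are assumptions for
# illustration only; check dataset.column_names for the actual schema.
def to_text(example):
    return {"text": example["question"] + "\n" + example["context"]}

train_data = dataset.map(to_text)
print(train_data[0]["text"][:200])
```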
## Usage

To use this model, load it with the following configuration:

```python
import warnings

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

warnings.filterwarnings("ignore", category=FutureWarning)

# Model configuration
model_name = "MahmoudIbrahim/FalconMasr"

# 4-bit quantization settings, matching the configuration used for fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token

# Example prompt: "How does American Express's integrated payments platform
# differ from bank card networks?"
input_text = "كيف تختلف منصة المدفوعات المتكاملة لشركة أمريكان إكسبريس عن شبكات البطاقات المصرفية؟"

# Move inputs to the same device as the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Remove 'token_type_ids' if present; Falcon's forward pass does not accept it
inputs.pop("token_type_ids", None)

# Generate the output
output = model.generate(
    **inputs,
    max_length=200,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print the generated text
decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
```
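
Note that `use_cache=False` in `generate` mirrors the training-time configuration and is not required for inference; enabling the KV cache (`use_cache=True`, leaving `model.config.use_cache` at its default) will typically speed up generation without changing the output. You can also adjust `max_length` (or switch to `max_new_tokens`) to control the length of the generated answer.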