	---
	library_name: transformers
	base_model: danasone/bart-small-ru-en
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: bart_hin_eng_mt
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# bart_hin_eng_mt

	This model is a fine-tuned version of [danasone/bart-small-ru-en](https://huggingface.co/danasone/bart-small-ru-en) on the [cfilt/iitb-english-hindi](https://huggingface.co/datasets/cfilt/iitb-english-hindi) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.9000
	- Bleu: 12.0235
	- Gen Len: 33.4107

	## Model description

	A Hindi-to-English machine translation model, fine-tuned from a small BART checkpoint.

	## Inference and evaluation

	```python
	import time

	import evaluate
	import torch
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


	class BartSmall:
	    """Thin wrapper around the fine-tuned Hindi-to-English model."""

	    def __init__(self, model_path="ar5entum/bart_hin_eng_mt", device=None):
	        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
	        self.model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
	        # Use the GPU when one is available, otherwise fall back to the CPU.
	        if device is None:
	            device = "cuda" if torch.cuda.is_available() else "cpu"
	        self.device = torch.device(device)
	        self.model.to(self.device)

	    def predict(self, input_text):
	        """Translate a single sentence with beam search."""
	        inputs = self.tokenizer(input_text, return_tensors="pt", max_length=512,
	                                truncation=True).to(self.device)
	        with torch.no_grad():
	            pred_ids = self.model.generate(inputs.input_ids,
	                                           attention_mask=inputs.attention_mask,
	                                           max_length=512,
	                                           num_beams=4,
	                                           early_stopping=True)
	        return self.tokenizer.decode(pred_ids[0], skip_special_tokens=True)

	    def predict_batch(self, input_texts, batch_size=32):
	        """Translate a list of sentences, batch_size sentences at a time."""
	        all_predictions = []
	        for i in range(0, len(input_texts), batch_size):
	            batch_texts = input_texts[i:i + batch_size]
	            inputs = self.tokenizer(batch_texts, return_tensors="pt", max_length=512,
	                                    truncation=True, padding=True).to(self.device)

	            # Pass the attention mask explicitly so padding tokens are ignored.
	            with torch.no_grad():
	                pred_ids = self.model.generate(inputs.input_ids,
	                                               attention_mask=inputs.attention_mask,
	                                               max_length=512,
	                                               num_beams=4,
	                                               early_stopping=True)

	            predictions = self.tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
	            all_predictions.extend(predictions)

	        return all_predictions


	model = BartSmall()  # pass device="cuda" or device="cpu" to override auto-detection

	input_texts = [
	    "यह शोध्य रकम है।",
	    "जानने के लिए देखें ये वीडियो.",
	    "वह दो बेटियों व एक बेटे का पिता था।"
	]
	ground_truths = [
	    "This is a repayable amount.",
	    "Watch this video to find out.",
	    "He was a father of two daughters and a son."
	]

	start = time.time()
	predictions = model.predict_batch(input_texts, batch_size=len(input_texts))
	end = time.time()
	print("TIME: ", end - start)

	for source, prediction, reference in zip(input_texts, predictions, ground_truths):
	    print("‾‾‾‾‾‾‾‾‾‾‾‾")
	    print("Input text:\t", source)
	    print("Prediction:\t", prediction)
	    print("Ground Truth:\t", reference)

	# BLEU over the three sentences above, as a fraction in [0, 1].
	bleu = evaluate.load("bleu")
	results = bleu.compute(predictions=predictions, references=ground_truths)
	print(results)

	# TIME: 1.2374696731567383
	# ‾‾‾‾‾‾‾‾‾‾‾‾
	# Input text: यह शोध्य रकम है।
	# Prediction: This is a repayable amount.
	# Ground Truth: This is a repayable amount.
	# ‾‾‾‾‾‾‾‾‾‾‾‾
	# Input text: जानने के लिए देखें ये वीडियो.
	# Prediction: View these videos to know.
	# Ground Truth: Watch this video to find out.
	# ‾‾‾‾‾‾‾‾‾‾‾‾
	# Input text: वह दो बेटियों व एक बेटे का पिता था।
	# Prediction: He was a father of two daughters and a son.
	# Ground Truth: He was a father of two daughters and a son.
	# {'bleu': 0.747875245486914, 'precisions': [0.8260869565217391, 0.75, 0.7647058823529411, 0.7857142857142857], 'brevity_penalty': 0.9574533680683809, 'length_ratio': 0.9583333333333334, 'translation_length': 23, 'reference_length': 24}
	```
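
The `bleu` value in the sample output can be rebuilt by hand from the other fields the metric prints. The sketch below assumes the standard BLEU definition (geometric mean of the four n-gram precisions, multiplied by the brevity penalty); the variable names are illustrative and the numbers are copied from the output above. This score is a fraction in [0, 1] computed over three sentences, so it is not comparable to the evaluation-set BLEU of 12.0235 reported at the top of this card, which is on a 0 to 100 scale.

```python
import math

# Values copied from the sample output above.
precisions = [0.8260869565217391, 0.75, 0.7647058823529411, 0.7857142857142857]
translation_length = 23
reference_length = 24

# Brevity penalty: penalises output that is shorter than the reference.
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the 1- to 4-gram precisions (uniform weights).
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu = brevity_penalty * geo_mean
print(f"brevity_penalty = {brevity_penalty:.6f}")
print(f"bleu (0-1)      = {bleu:.6f}")
print(f"bleu (0-100)    = {100 * bleu:.2f}")
```

Here the brevity penalty is below 1 because the three predictions total 23 tokens against 24 reference tokens.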

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 100
	- eval_batch_size: 40
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 2
	- total_train_batch_size: 200
	- total_eval_batch_size: 80
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 15.0
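
As a rough illustration of how these settings fit together, the sketch below derives the effective batch size and total step count, then evaluates a learning-rate schedule at a few steps. It assumes that `linear` means linear warmup from zero to the peak rate followed by linear decay to zero at the final step; the per-epoch step count is taken from the results table further down, and the constant and function names are hypothetical.

```python
# Values taken from the hyperparameter list and the results table.
PEAK_LR = 1e-4
WARMUP_STEPS = 1000
PER_DEVICE_BATCH = 100
NUM_DEVICES = 2
STEPS_PER_EPOCH = 8265  # step count at epoch 1.0 in the results table
NUM_EPOCHS = 15

total_batch = PER_DEVICE_BATCH * NUM_DEVICES   # matches total_train_batch_size
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS     # matches the last step in the table
# Upper bound on examples seen per epoch (the last batch may be partial).
max_examples_per_epoch = STEPS_PER_EPOCH * total_batch


def linear_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(f"effective batch size: {total_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"examples per epoch (at most): {max_examples_per_epoch}")
for step in (0, 500, 1000, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under this assumption the warmup covers less than 1% of training, so almost the whole run is spent on the decay phase.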

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
	|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|
	| 2.6298 | 1.0 | 8265 | 2.6192 | 4.5435 | 39.8786 |
	| 2.2656 | 2.0 | 16530 | 2.2836 | 8.2498 | 35.8339 |
	| 2.0625 | 3.0 | 24795 | 2.1747 | 9.9182 | 35.5214 |
	| 1.974 | 4.0 | 33060 | 2.0760 | 10.1515 | 33.9732 |
	| 1.925 | 5.0 | 41325 | 2.0285 | 10.7702 | 34.175 |
	| 1.8076 | 6.0 | 49590 | 1.9860 | 11.4286 | 34.8875 |
	| 1.7817 | 7.0 | 57855 | 1.9664 | 11.4579 | 32.6411 |
	| 1.7025 | 8.0 | 66120 | 1.9561 | 11.9226 | 33.5179 |
	| 1.6691 | 9.0 | 74385 | 1.9354 | 11.7352 | 33.2161 |
	| 1.6631 | 10.0 | 82650 | 1.9231 | 11.9303 | 32.7679 |
	| 1.6317 | 11.0 | 90915 | 1.9264 | 11.5889 | 32.625 |
	| 1.6449 | 12.0 | 99180 | 1.9047 | 11.8451 | 33.8554 |
	| 1.6165 | 13.0 | 107445 | 1.9040 | 12.0755 | 32.7661 |
	| 1.5826 | 14.0 | 115710 | 1.9000 | 12.3137 | 33.3536 |
	| 1.5835 | 15.0 | 123975 | 1.9000 | 12.0235 | 33.4107 |
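
The table can be scanned programmatically to see where each metric peaks. The snippet below is a small sketch using the validation loss and BLEU columns copied from the table above; the tuple layout and variable names are illustrative only.

```python
# (epoch, validation_loss, bleu) copied from the training results table.
results = [
    (1, 2.6192, 4.5435), (2, 2.2836, 8.2498), (3, 2.1747, 9.9182),
    (4, 2.0760, 10.1515), (5, 2.0285, 10.7702), (6, 1.9860, 11.4286),
    (7, 1.9664, 11.4579), (8, 1.9561, 11.9226), (9, 1.9354, 11.7352),
    (10, 1.9231, 11.9303), (11, 1.9264, 11.5889), (12, 1.9047, 11.8451),
    (13, 1.9040, 12.0755), (14, 1.9000, 12.3137), (15, 1.9000, 12.0235),
]

# Epoch with the highest BLEU on the evaluation set.
best_bleu = max(results, key=lambda row: row[2])

# Epochs that share the lowest validation loss.
lowest_loss = min(row[1] for row in results)
lowest_loss_epochs = [row[0] for row in results if row[1] == lowest_loss]

final = results[-1]
print(f"best BLEU: {best_bleu[2]} at epoch {best_bleu[0]}")
print(f"lowest validation loss: {lowest_loss} at epochs {lowest_loss_epochs}")
print(f"final epoch BLEU: {final[2]} ({best_bleu[2] - final[2]:.4f} below the best)")
```

The headline numbers at the top of this card match the epoch 15 row, while BLEU is highest at epoch 14. The card does not state which checkpoint the published weights correspond to.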


	### Framework versions

	- Transformers 4.45.0.dev0
	- Pytorch 2.4.0+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1