BhashaAI/ViLaH

Visual Question Answering

image-text-to-text

text-generation-inference


damerajee committed on May 28, 2024

Commit 1f5fd30 · verified · 1 Parent(s): 0078211

Update README.md

Files changed (1)

README.md +3 -2

README.md CHANGED

@@ -21,8 +21,9 @@ ViLaH (Vision Language Hindi) is a model with 3 billion parameters, fine-tuned f
 * Model Configuration: Fine-tuned for a single epoch on a V100 GPU.
 * Training Duration: Approximately one day.
 * Evaluation Loss: Achieved an eval loss of 1.6384 at the end of the epoch.
+* The model is currently still being trained on a better-quality dataset.
+* The model's performance may be compromised due to insufficient data and the fact that it was trained for only one epoch.
 # Dataset
 The model was fine-tuned on only one dataset:
 * [damerajee/clean_hin_vqa](https://huggingface.co/datasets/damerajee/clean_hin_vqa): This dataset was derived from [Lin-Chen/ShareGPT4V](https://huggingface.co/google/paligemma-3b-pt-224) and filtered to include only images from the COCO dataset. The original dataset was translated and cleaned to ensure high-quality Hindi visual question answering content.