wanadzhar913
committed on
Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ We have finetuned [mesolitica/malaysian-mistral-7b-32k-instructions-v4](https://
 
 ### Training Details
 
-Overall, solely training on the [Boolq-Malay](https://huggingface.co/datasets/wanadzhar913/boolq-malay) dataset (comprised of both Malay and English versions of the original [Boolq](https://huggingface.co/datasets/google/boolq) dataset), we use the following training parameters and obtain the following training results:
+Overall, solely training on the [Boolq-Malay](https://huggingface.co/datasets/wanadzhar913/boolq-malay) dataset (comprised of both Malay and English versions of the original [Boolq](https://huggingface.co/datasets/google/boolq) dataset) and Google Colab's A100 GPU (40GB VRAM), we use the following training parameters and obtain the following training results:
 
 - **No. of Epochs:** 0.504
 - **Per Device Train Batch Size:** 4
@@ -63,9 +63,9 @@ nf4_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE)
 )
 
-tokenizer = AutoTokenizer.from_pretrained('wanadzhar913/malaysian-mistral-llmasajudge-
+tokenizer = AutoTokenizer.from_pretrained('wanadzhar913/malaysian-mistral-llmasajudge-v2')
 model = AutoModelForCausalLM.from_pretrained(
-    'wanadzhar913/malaysian-mistral-llmasajudge-
+    'wanadzhar913/malaysian-mistral-llmasajudge-v2',
     use_flash_attention_2 = True,
     quantization_config = nf4_config
 )
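
The first hunk above lists two training parameters that map naturally onto the Hugging Face `Trainer` API. Below is a minimal, hypothetical sketch of how such a configuration could look: apart from `per_device_train_batch_size=4`, every value (output path, step count, logging cadence, precision) is an assumption for illustration, not taken from the commit.

```python
# Hypothetical reconstruction of the listed training setup, assuming the
# standard Hugging Face Trainer API. Only per_device_train_batch_size=4 is
# taken from the README; the fractional 0.504 epochs suggests training was
# capped with max_steps, so the step count here is purely illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="malaysian-mistral-llmasajudge-v2",  # assumed output path
    per_device_train_batch_size=4,                  # from the README
    max_steps=1_000,   # illustrative; a cap like this would stop training mid-epoch (~0.504)
    logging_steps=50,  # assumed
    bf16=True,         # plausible on an A100 (40GB VRAM); not stated in the commit
)
```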
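
For readers who want to run the post-commit snippet from the second hunk as-is, here is a self-contained sketch under stated assumptions: the hunk only shows the tail of the `BitsAndBytesConfig(...)` call, so the `TORCH_DTYPE` value and the NF4 flags other than `bnb_4bit_compute_dtype` are guesses at typical values, not the README's actual settings.

```python
# Self-contained version of the post-commit loading snippet. The hunk only
# shows the tail of the BitsAndBytesConfig call, so every NF4 flag except
# bnb_4bit_compute_dtype is an assumed, typical value.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

TORCH_DTYPE = "bfloat16"  # assumed; the README defines TORCH_DTYPE above this hunk

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,               # assumed
    bnb_4bit_quant_type="nf4",       # assumed, implied by the variable name
    bnb_4bit_use_double_quant=True,  # assumed
    bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE),  # shown in the diff
)

tokenizer = AutoTokenizer.from_pretrained("wanadzhar913/malaysian-mistral-llmasajudge-v2")
model = AutoModelForCausalLM.from_pretrained(
    "wanadzhar913/malaysian-mistral-llmasajudge-v2",
    use_flash_attention_2=True,   # as in the diff
    quantization_config=nf4_config,
)
```

Note that recent `transformers` releases deprecate the `use_flash_attention_2` keyword in favour of `attn_implementation="flash_attention_2"`; the sketch keeps the README's original spelling.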