---
language:
- ms
- id
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
metrics:
- accuracy
base_model:
- mesolitica/roberta-base-bahasa-cased
pipeline_tag: fill-mask
---
|
|
|
# Fine-tuned RoBERTa for the Malay Language
|
|
|
This model is a fine-tuned version of `mesolitica/roberta-base-bahasa-cased`, trained on a custom dataset of normalized Malay sentences with a **Masked Language Modeling (MLM)** objective.
|
|
|
## Model Description
|
|
|
This model is based on the **RoBERTa** architecture, a robustly optimized variant of BERT. The base checkpoint was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences, using the standard masked language modeling objective of predicting randomly masked tokens.
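To make the objective concrete, the following minimal sketch shows how sentences are masked for MLM using the base model's tokenizer and the standard `DataCollatorForLanguageModeling` from `transformers`. This is only an illustration of the masking step, not the actual training script; the example sentence is invented.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Tokenizer of the base checkpoint this model was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("mesolitica/roberta-base-bahasa-cased")

# Standard MLM collator: randomly replaces ~15% of the tokens with <mask>
# and keeps labels only for the masked positions (-100 elsewhere).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Invented example sentence ("I am going to school today").
encoding = tokenizer("Saya pergi ke sekolah hari ini")
batch = collator([encoding])

print(tokenizer.decode(batch["input_ids"][0]))  # some tokens replaced by <mask>
print(batch["labels"][0])                       # original ids at masked positions, -100 elsewhere
```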
|
|
|
### Training Details
|
|
|
- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of normalized Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Every 200 steps
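As a rough sketch of how these settings map onto the `transformers` `Trainer` API (the dataset below is a tiny invented placeholder; only the hyperparameters are taken from the list above):

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "mesolitica/roberta-base-bahasa-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder corpus; the real training data is a custom set of normalized Malay sentences.
sentences = ["Saya pergi ke sekolah hari ini.", "Dia suka makan nasi lemak."]
dataset = Dataset.from_dict({"text": sentences}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="roberta-malay-mlm",
    num_train_epochs=3,               # 3 epochs
    per_device_train_batch_size=16,   # batch size 16 per device
    learning_rate=1e-6,               # learning rate 1e-6
    optim="adamw_torch",              # AdamW optimizer
    eval_strategy="steps",            # "evaluation_strategy" in older transformers releases
    eval_steps=200,                   # evaluate every 200 steps
    logging_steps=200,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,            # placeholder: use the real training split here
    eval_dataset=dataset,             # placeholder: use the real validation split here
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True),
)
trainer.train()
```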
|
|
|
## Training and Validation Loss
|
|
|
The following table shows the training and validation loss at selected evaluation steps during fine-tuning:
|
|
|
| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 0.069000      | 0.069317        |
| 800  | 0.070100      | 0.067430        |
| 1400 | 0.069000      | 0.066185        |
| 2000 | 0.037900      | 0.066657        |
| 2600 | 0.040200      | 0.066858        |
| 3200 | 0.041800      | 0.066634        |
| 3800 | 0.023700      | 0.067717        |
| 4400 | 0.024500      | 0.068275        |
| 5000 | 0.024500      | 0.068108        |
|
|
|
|
|
### Observations

- Training loss decreased substantially over the course of fine-tuning, from about 0.069 at step 200 to about 0.024 by step 5000, with the largest drops between steps 1400 and 2000 and between steps 3200 and 3800.
- Validation loss fluctuated only slightly, reaching its lowest value (0.066185) at step 1400 and remaining relatively stable, between roughly 0.066 and 0.068, thereafter.
- Taken together, the curves indicate that the model converged early and that further training mainly reduced training loss without much change in validation loss.
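For context, an MLM cross-entropy loss can be read as a perplexity over the masked tokens via exp(loss). Assuming the reported validation loss is the mean cross-entropy over masked positions (the `Trainer` default), the best checkpoint corresponds to:

```python
import math

best_val_loss = 0.066185           # validation loss at step 1400, from the table above
print(math.exp(best_val_loss))     # ~1.068 perplexity over the masked tokens
```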
|
|
|
|
|
## Intended Use

This model is intended for tasks such as:

- **Masked Language Modeling (MLM)**: Fill in masked tokens in a Malay sentence (see the example below).
- **Text Generation**: Generate plausible text given a context.
- **Text Understanding**: Extract contextual meaning from Malay sentences.
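A minimal fill-mask inference sketch, assuming the fine-tuned checkpoint is loaded from its published repository id or a local path (the identifier below is a placeholder) and that the tokenizer uses RoBERTa's `<mask>` token:

```python
from transformers import pipeline

# Placeholder identifier for this fine-tuned checkpoint.
fill_mask = pipeline("fill-mask", model="path/to/finetuned-roberta-malay")

# Invented example: "Saya suka makan <mask> goreng." ("I like to eat fried <mask>.")
for prediction in fill_mask("Saya suka makan <mask> goreng."):
    print(prediction["token_str"], round(prediction["score"], 3))
```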
|
|
|
## News

- This model was used in the research paper **"Mitigating Linguistic Bias between Malay and Indonesian Languages using Masked Language Models"**, which has been accepted as a short paper (poster presentation) in the **Research Track** at **DASFAA 2025**.
- **Authors**: Ferdinand Lenchau Bit, Iman Khaleda binti Zamri, Amzine Toushik Wasi, Taki Hasan Rafi, and Dong-Kyu Chae (Department of Computer Science, Hanyang University, Seoul, South Korea).