---
library_name: transformers
tags:
- indonesia
license: mit
language:
- id
inference: true
---
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document Title</title>
<style>
h1 {
font-size: 32px;
color: navy;
font-family: 'Tahoma';
text-align: center;
}
</style>
</head>
<body>
<h1>First steps to descale low-resource language models</h1>
</body>
</html>

<center>
<img src="https://i.imgur.com/z9ey830.png" alt="Sasando" width="500" height="250">
<p><em>Sasando-1 is a tiny, highly experimental short-sequence text generator built using the Phi-3 architecture.</em></p>
<p><strong><a href="https://huggingface.co/spaces/afrizalha/Sasando-1" style="color: blue; font-family: Tahoma;">❕Go straight to the gradio demo❕</a></strong></p>
<p><em style="color: black; font-weight: bold;">This repo contains the 25M version</em></p>
<p><em style="color: black; font-weight: bold;">Preliminary research preview</em></p>
</center>

## 🎻 Welcome!
Sasando-1 is a tiny, highly experimental Indonesian text generator built using the Phi-3 architecture. It comes in two very small sizes: 7M and 25M parameters. It is trained on a tightly controlled subset of the Indo4B dataset, filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
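This card does not describe how the Indo4B filtering was actually done. As a rough illustration only, one TinyStories-style approach is to keep the most frequent words in the corpus and discard every document that uses a word outside that list. The sketch below assumes a simple lowercase regex word splitter and uses hypothetical helper names (`words`, `build_vocab`, `filter_corpus`); for Sasando-1 the cap would correspond to `max_words=18_000`.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical word splitter: lowercase runs of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def words(text: str) -> list[str]:
    """Split text into lowercase words."""
    return WORD_RE.findall(text.lower())


def build_vocab(corpus: Iterable[str], max_words: int) -> set[str]:
    """Keep only the `max_words` most frequent words in the corpus."""
    counts = Counter(w for doc in corpus for w in words(doc))
    return {w for w, _ in counts.most_common(max_words)}


def filter_corpus(corpus: list[str], max_words: int) -> list[str]:
    """Drop every document containing a word outside the restricted vocabulary."""
    vocab = build_vocab(corpus, max_words)
    return [doc for doc in corpus if set(words(doc)) <= vocab]


if __name__ == "__main__":
    # Toy corpus; a real run would stream Indo4B text and use max_words=18_000.
    corpus = [
        "saya suka makan nasi",
        "saya suka minum teh",
        "saya makan nasi",
        "kucing itu tidur di atas meja",
    ]
    print(filter_corpus(corpus, max_words=4))
```

With `max_words=4`, only "saya suka makan nasi" and "saya makan nasi" survive. Dropping documents can leave some kept words unused, so the final vocabulary may be smaller than `max_words`; a real pipeline might also filter per sentence rather than per document.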

## 🇮🇩 Context
Indonesia has more than 700 languages, and many of them are disappearing at an alarming rate. Language technologies like generative AI can play a major role in language preservation. However, Indonesia faces several contextual challenges:

- Many languages, including those with millions of speakers, have very few digital resources
- Running large models can be costly, and Indonesia is a middle-income country with limited funding

Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototype demonstrating that scarce resources can potentially still be used to develop generative models with cheap compute.

## ✨ Specs
- Available in 7M- and 25M-parameter versions
- Based on the Phi-3 architecture
- Vocabulary size: 4,096 tokens
- Trained on ~257M tokens for 4 epochs (~1.03B tokens in total)
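The training budget in the list above can be checked with simple arithmetic. This sketch uses the nominal figures (~257M tokens per epoch, 4 epochs, 7M and 25M parameters), so the resulting ratios are approximate; the helper names are hypothetical.

```python
# Nominal figures from the Specs list; the real counts are approximate.
TOKENS_PER_EPOCH = 257_000_000
EPOCHS = 4
MODEL_SIZES = {"Sasando-1-7M": 7_000_000, "Sasando-1-25M": 25_000_000}


def total_training_tokens(tokens_per_epoch: int, epochs: int) -> int:
    """Total tokens processed when each epoch is one full pass over the data."""
    return tokens_per_epoch * epochs


def tokens_per_parameter(total_tokens: int, n_params: int) -> float:
    """Ratio of training tokens to model parameters."""
    return total_tokens / n_params


if __name__ == "__main__":
    total = total_training_tokens(TOKENS_PER_EPOCH, EPOCHS)
    print(f"Total tokens seen: {total:,}")  # 1,028,000,000
    for name, n_params in MODEL_SIZES.items():
        ratio = tokens_per_parameter(total, n_params)
        print(f"{name}: ~{ratio:.0f} tokens per parameter")
```

This gives roughly 147 training tokens per parameter for the 7M model and 41 for the 25M model.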

## 🔭 Out-of-Scope Use
This is a research preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.

You are also not allowed to use this model without having fun.

## Acknowledgments

- Developed by: Afrizal Hasbi Azizy
- License: MIT

## Training log
<right>
<img src="https://imgur.com/32NFAKm.png" alt="Training log" width="500" height="250">
</right>