---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
tags:
- generated_from_trainer
- qwen
- GGUF
- worldmodel
- worldbuilding
model-index:
- name: capybara_finetuned_results3
  results: []
datasets:
- archit11/worldbuilding
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# capybara_finetuned_results3

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [archit11/worldbuilding](https://huggingface.co/datasets/archit11/worldbuilding) dataset.
It achieves the following results on the evaluation set:
- Loss: 5.6542
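
Assuming the reported loss is the mean per-token cross-entropy in nats (what the Transformers `Trainer` reports for causal language models), it can be read as a perplexity via exp(loss). The short sketch below does that conversion; the `perplexity` helper is a hypothetical illustration, not part of any library.

```python
import math

EVAL_LOSS = 5.6542  # final evaluation loss reported above


def perplexity(mean_nll_nats):
    """Perplexity = exp(mean per-token negative log-likelihood in nats)."""
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    print(f"eval perplexity ~ {perplexity(EVAL_LOSS):.1f}")  # prints: eval perplexity ~ 285.5
```

Read this way, the model is on average about as uncertain as a uniform choice among roughly 285 tokens.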

## Video demo (it's pretty bad)

<video controls autoplay muted src="https://0x0.st/XgZs.mp4"></video>

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- training_steps: 800
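
The list above can be expressed with `transformers.TrainingArguments` keyword names roughly as in the sketch below. The keyword mapping, the single-device assumption (`NUM_DEVICES`), and the `cosine_lr` helper are illustrative assumptions, not the author's training script. With one device, 1 × 4 × 1 reproduces the listed `total_train_batch_size` of 4, and `cosine_lr` follows the linear-warmup, half-cosine decay shape of Transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. The exact optimizer class is not stated beyond "Adam", so only its betas and epsilon are included.

```python
import math

# Hyperparameters from the list above, keyed by TrainingArguments keyword names.
HPARAMS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 5,
    "max_steps": 800,
}

NUM_DEVICES = 1  # assumption: single-GPU run


def effective_batch_size(hp, num_devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def cosine_lr(step, hp):
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    base = hp["learning_rate"]
    warmup = hp["warmup_steps"]
    total = hp["max_steps"]
    if step < warmup:
        return base * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(HPARAMS))
    for s in (0, 5, 400, 800):
        print(s, f"{cosine_lr(s, HPARAMS):.2e}")
```

The dictionary could then be passed as `TrainingArguments(output_dir="out", **HPARAMS)`, where `output_dir` is a placeholder.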

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 15.5311 | 0.0230 | 50 | 14.5422 |
| 8.7477 | 0.0460 | 100 | 9.2952 |
| 7.3554 | 0.0690 | 150 | 7.1992 |
| 6.828 | 0.0920 | 200 | 6.7258 |
| 6.4694 | 0.1150 | 250 | 6.3597 |
| 6.3401 | 0.1381 | 300 | 6.1703 |
| 6.1256 | 0.1611 | 350 | 6.0395 |
| 6.0372 | 0.1841 | 400 | 5.9271 |
| 6.0221 | 0.2071 | 450 | 5.8464 |
| 5.8783 | 0.2301 | 500 | 5.7810 |
| 5.8339 | 0.2531 | 550 | 5.7335 |
| 5.8546 | 0.2761 | 600 | 5.6904 |
| 5.9169 | 0.2991 | 650 | 5.6690 |
| 5.7959 | 0.3221 | 700 | 5.6565 |
| 5.7271 | 0.3451 | 750 | 5.6543 |
| 5.8734 | 0.3682 | 800 | 5.6542 |
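
The Epoch column shows that the 800 steps covered only about 0.37 of one pass over the training data. Assuming a single device and one training sequence per example, steps × total batch size / epoch gives a rough size for the training split. The sketch below does that arithmetic and also picks the row with the lowest validation loss; the helper names are hypothetical.

```python
# (step, epoch, validation_loss) rows copied from the training-results table.
ROWS = [
    (50, 0.0230, 14.5422),
    (100, 0.0460, 9.2952),
    (150, 0.0690, 7.1992),
    (200, 0.0920, 6.7258),
    (250, 0.1150, 6.3597),
    (300, 0.1381, 6.1703),
    (350, 0.1611, 6.0395),
    (400, 0.1841, 5.9271),
    (450, 0.2071, 5.8464),
    (500, 0.2301, 5.7810),
    (550, 0.2531, 5.7335),
    (600, 0.2761, 5.6904),
    (650, 0.2991, 5.6690),
    (700, 0.3221, 5.6565),
    (750, 0.3451, 5.6543),
    (800, 0.3682, 5.6542),
]

TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameter list; assumes 1 device


def estimate_examples_per_epoch(rows, batch=TOTAL_TRAIN_BATCH_SIZE):
    """Rough training-set size: examples seen / epochs elapsed.

    Uses the last row, since the logged epoch is rounded to 4 decimals and
    the largest value has the smallest relative rounding error.
    """
    step, epoch, _ = rows[-1]
    return step * batch / epoch


def best_row(rows):
    """Return (step, validation_loss) for the lowest validation loss."""
    step, _, val_loss = min(rows, key=lambda row: row[2])
    return step, val_loss


if __name__ == "__main__":
    print(f"~{estimate_examples_per_epoch(ROWS):,.0f} examples per epoch")
    print("lowest validation loss (step, loss):", best_row(ROWS))
```

Under these assumptions the estimate comes out at roughly 8,700 examples, and the lowest validation loss is the final step-800 row, matching the 5.6542 reported at the top of the card.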


### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0
- Datasets 3.0.0
- Tokenizers 0.19.1
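
A minimal sketch for checking a local environment against the versions above, using only the standard library's `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`, and ignores local build suffixes such as `+cu121` on PyTorch wheels. The helper functions are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.4.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def status(wanted, installed):
    """Classify one package as 'ok', 'differs' or 'missing'."""
    if installed is None:
        return "missing"
    # Drop local build tags such as "+cu121" before comparing.
    return "ok" if installed.split("+")[0] == wanted else "differs"


def compare_versions(expected):
    """Return {package: (expected, installed_or_None, status)}."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (wanted, found, status(wanted, found))
    return report


if __name__ == "__main__":
    for name, (wanted, found, state) in compare_versions(EXPECTED).items():
        print(f"{name}: card={wanted} installed={found} -> {state}")
```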