shuttie/mistral-nemo-dadjokes-v1

	---
	base_model: mistralai/Mistral-Nemo-Base-2407
	library_name: peft
	---

	# Model Card for mistral-nemo-dadjokes-v1


	This is a [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) model fine-tuned on the [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes). Given any short prompt, it generates cringe-worthy and sometimes toxic jokes:

	```
	[INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot!
	[INST] What is a lawyer's favorite drink? [/INST] Whine.
	[INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage.
	[INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases.
	[INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat.
	```
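
To try generations like the ones above locally, the following is a minimal sketch rather than a tested recipe. It assumes this repository holds a PEFT adapter (the card metadata says `library_name: peft`), that the adapter id is `shuttie/mistral-nemo-dadjokes-v1`, that the tokenizer can be loaded from the base model, and that `torch`, `transformers`, `peft`, and `accelerate` are installed on a machine with enough GPU memory for a 12B model. The function name and sampling settings are illustrative choices, not part of the original card.

```python
# Assumed identifiers: the adapter id comes from the page title,
# the base model id from the card metadata.
ADAPTER_ID = "shuttie/mistral-nemo-dadjokes-v1"
BASE_MODEL_ID = "mistralai/Mistral-Nemo-Base-2407"


def generate_punchline(base: str, max_new_tokens: int = 48) -> str:
    """Generate a punchline for a joke setup (hypothetical helper)."""
    # Heavy imports stay inside the function so importing this file is cheap.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Assumption: the base model's tokenizer is the right one for the adapter.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    # Loads the base model named in the adapter config and attaches the adapter.
    model = AutoPeftModelForCausalLM.from_pretrained(
        ADAPTER_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Same prompt shape as the examples in this card.
    prompt = f"[INST] {base} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True
    )

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_punchline("What is a lawyer's favorite drink?")` should return a single sampled punchline. Because sampling is enabled, the output differs between runs.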

	## Demo

	The model is running on HF Spaces: [https://huggingface.co/spaces/shuttie/dadjokes](https://huggingface.co/spaces/shuttie/dadjokes)

	## Training data

	We use a [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/reddit-dadjokes) formatted as (base, punchline) pairs. The model's task is to predict the punchline given the base. The prompt format is the same as for the original Mistral model:

	`[INST] base [/INST] punchline`
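
As an illustration of this format, here is a small sketch of helpers that build the prompt, assemble a full training string, and pull the punchline back out of generated text. The helper names are hypothetical and the exact whitespace handling is an assumption based on the examples in this card.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_prompt(base: str) -> str:
    """Wrap a joke setup (the "base") in the instruction tags."""
    return f"{INST_OPEN} {base.strip()} {INST_CLOSE}"


def build_training_text(base: str, punchline: str) -> str:
    """Join a base and its punchline into one `[INST] base [/INST] punchline` string."""
    return f"{build_prompt(base)} {punchline.strip()}"


def extract_punchline(generated: str) -> str:
    """Return the text after the last [/INST] tag, or the whole text if no tag is present."""
    return generated.rpartition(INST_CLOSE)[2].strip()


text = build_training_text("What is a lawyer's favorite drink?", "Whine.")
print(text)                     # [INST] What is a lawyer's favorite drink? [/INST] Whine.
print(extract_punchline(text))  # Whine.
```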

	## Training process

	The model was trained with Axolotl using the following config:

	```yaml
	base_model: mistralai/Mistral-Nemo-Base-2407
	model_type: MistralForCausalLM
	tokenizer_type: AutoTokenizer

	load_in_8bit: false
	load_in_4bit: true
	strict: false
	val_set_size: 0.01
	datasets:
	  - path: shuttie/reddit-dadjokes
	    split: train
	    type:
	      field_system: system
	      field_instruction: instruction
	      field_output: output
	      field_input: input
	      format: "[INST] {input} [/INST]"

	dataset_prepared_path: last_run_prepared
	output_dir: ./outputs/dadjoke-mistral-nemo-qlora-r128

	adapter: qlora
	lora_model_dir:

	sequence_len: 256
	sample_packing: false
	pad_to_sequence_len: true

	lora_r: 128
	lora_alpha: 64
	lora_dropout: 0.05
	lora_target_modules:
	lora_target_linear: true
	lora_fan_in_fan_out:

	wandb_project: "dad jokes"
	wandb_entity:
	wandb_watch:
	wandb_name:
	wandb_log_model:

	gradient_accumulation_steps: 1
	micro_batch_size: 16
	num_epochs: 1
	optimizer: adamw_bnb_8bit
	lr_scheduler: cosine
	learning_rate: 0.0001

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: false
	gradient_checkpointing_kwargs:
	  use_reentrant: true
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	xformers_attention:
	flash_attention: true

	logging_steps: 10
	warmup_steps: 10
	evals_per_epoch: 10
	eval_table_size:
	saves_per_epoch: 1
	debug:
	deepspeed:
	weight_decay: 0.0
	fsdp:
	  - full_shard
	  - auto_wrap
	fsdp_config:
	  fsdp_limit_all_gathers: true
	  fsdp_sync_module_states: true
	  fsdp_offload_params: false
	  fsdp_use_orig_params: false
	  fsdp_cpu_ram_efficient_loading: false
	  fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer
	  fsdp_state_dict_type: FULL_STATE_DICT
	  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
	  activation_checkpointing: true
	special_tokens:
	  pad_token: <pad>
	```
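
Two quantities implied by this config can be worked out by hand. The sketch below assumes the standard LoRA scaling of `lora_alpha / lora_r` (not the rank-stabilized variant) and treats the GPU count as a hypothetical parameter, since the card does not state how many devices were used.

```python
# Values copied from the config above.
LORA_R = 128
LORA_ALPHA = 64
MICRO_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 1


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def samples_per_optimizer_step(
    micro_batch_size: int, grad_accum_steps: int, num_gpus: int
) -> int:
    """Number of training examples consumed per optimizer step across all devices."""
    return micro_batch_size * grad_accum_steps * num_gpus


print(lora_scaling(LORA_ALPHA, LORA_R))  # 0.5

# num_gpus=2 is a made-up value for illustration only.
print(
    samples_per_optimizer_step(
        MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, num_gpus=2
    )
)  # 32
```

With `lora_alpha` at half of `lora_r`, the adapter update is scaled by 0.5 under that assumption, which is a more conservative setting than the common `alpha = 2 * r` choice.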

	## License

	Apache 2.0