--- base_model: mistralai/Mistral-Nemo-Base-2407 library_name: peft --- # Model Card for Model ID This is a [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) model fine-tuned on [Reddit-Dad-Jokes dataset](https://huggingface.co/datasets/shuttie/dadjokes). It can generate cringe and toxic jokes on any short prompt: ``` [INST] What horse says when she sees an old lady? [/INST] Move over grandma, I'm going to trot! [INST] What is a lawyer's favorite drink? [/INST] Whine. [INST] My obese pet parrot died over the weekend. [/INST] Now I just have a big fat birdcage. [INST] My wife asked why there was a blanket on top of me and my laptop. [/INST] I told her that I was just covering all my bases. [INST] My girlfriend changed after she became a vegetarian. [/INST] She finally lost her meat. ``` ## Demo The model is running on HF Spaces: [https://huggingface.co/spaces/shuttie/dadjokes](https://huggingface.co/spaces/shuttie/dadjokes) ## Used data We use a [Kaggle Reddit Dad Jokes dataset](https://huggingface.co/datasets/shuttie/reddit-dadjokes) formatted in a base+punchline tuples. The model task was to predict the punchline given the base. Prompt format is the same as for original Mistral model: `[INST] base [/INST] punchline` ## Training process The model was trained with [Axolotl](TODO) with the following config: ```yaml base_model: mistralai/Mistral-Nemo-Base-2407 model_type: MistralForCausalLM tokenizer_type: AutoTokenizer load_in_8bit: false load_in_4bit: true strict: false val_set_size: 0.01 datasets: - path: shuttie/reddit-dadjokes split: train type: field_system: system field_instruction: instruction field_output: output field_input: input format: "[INST] {input} [/INST]" dataset_prepared_path: last_run_prepared output_dir: ./outputs/dadjoke-mistral-nemo-qlora-r128 adapter: qlora lora_model_dir: sequence_len: 256 sample_packing: false pad_to_sequence_len: true lora_r: 128 lora_alpha: 64 lora_dropout: 0.05 lora_target_modules: lora_target_linear: true lora_fan_in_fan_out: wandb_project: "dad jokes" wandb_entity: wandb_watch: wandb_name: wandb_log_model: gradient_accumulation_steps: 1 micro_batch_size: 16 num_epochs: 1 optimizer: adamw_bnb_8bit lr_scheduler: cosine learning_rate: 0.0001 train_on_inputs: false group_by_length: false bf16: auto fp16: tf32: false gradient_checkpointing: false gradient_checkpointing_kwargs: use_reentrant: true early_stopping_patience: resume_from_checkpoint: local_rank: xformers_attention: flash_attention: true logging_steps: 10 warmup_steps: 10 evals_per_epoch: 10 eval_table_size: saves_per_epoch: 1 debug: deepspeed: weight_decay: 0.0 fsdp: - full_shard - auto_wrap fsdp_config: fsdp_limit_all_gathers: true fsdp_sync_module_states: true fsdp_offload_params: false fsdp_use_orig_params: false fsdp_cpu_ram_efficient_loading: false fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer fsdp_state_dict_type: FULL_STATE_DICT fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP activation_checkpointing: true special_tokens: pad_token: flash_attention: true ``` # License Apache 2.0