---
license: apache-2.0
datasets:
- trollek/SimpleInstructionJudge-v01
language:
- en
base_model: h2oai/h2o-danube3-4b-base
---
# LittleInstructionJudge-4B-v0.1

**Update:** The instruct_reward is all out of whack due to a misunderstanding on my part, caused by laziness. The other values are fine, though not as useful as they would have been had I actually just read more. Any model with the right prompt is better, even [CleverQwen2-1.5B](https://huggingface.co/trollek/CleverQwen2-1.5B). The next version will be better.

A danube3-4b-base fine-tuned with BAdam to do one thing, and one thing only: being a lightweight LLM-as-a-Judge for instruction prompts. The purpose of training this model is to have a small language model that can filter out the worst offenders when creating datasets with the Magpie method in hardware-constrained environments.

**Important note:** For reasons I don't know, I have issues running models like danube3 in LM Studio; Ollama runs them fine though. LM Studio reports my VRAM as expected, mostly free since it can't load the model, but unexpectedly only about 90 kB of unused RAM, even though it knows *damn well* that there are over 20 GB of memory real estate available.

### Prompt template

```jinja2
Judge the instruction below using the following json format:

{
    "intent": ,
    "knowledge": ,
    "task_category": ,
    "other_task_category": [],
    "difficulty": ,
    "quality_explanation": ,
    "instruct_reward": 
}

This is the instruction I need you to judge:

{{instruction}}
```

### Quants

* [mradermacher/LittleInstructionJudge-4B-v0.1-GGUF](https://huggingface.co/mradermacher/LittleInstructionJudge-4B-v0.1-GGUF)

### LLaMA-Factory training config

```yaml
### model
model_name_or_path: danube3/chatml-base

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_start_block: 6
badam_verbose: 1
seed: 8

### dataset
dataset: balanced_instruction_judge
template: chatml
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: danube3/trained/LittleInstructionJudge-4B-v0.1
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0000015
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.4062        | 0.0441 | 1000  | 0.3899          |
| 0.3346        | 0.0882 | 2000  | 0.3520          |
| 0.3192        | 0.1323 | 3000  | 0.3342          |
| 0.3007        | 0.1763 | 4000  | 0.3239          |
| 0.2792        | 0.2204 | 5000  | 0.3165          |
| 0.2957        | 0.2645 | 6000  | 0.3111          |
| 0.3254        | 0.3086 | 7000  | 0.3064          |
| 0.3058        | 0.3527 | 8000  | 0.3033          |
| 0.298         | 0.3968 | 9000  | 0.3011          |
| 0.3157        | 0.4409 | 10000 | 0.2995          |
| 0.3314        | 0.4849 | 11000 | 0.2979          |
| 0.301         | 0.5290 | 12000 | 0.2965          |
| 0.2927        | 0.5731 | 13000 | 0.2957          |
| 0.3199        | 0.6172 | 14000 | 0.2950          |
| 0.2924        | 0.6613 | 15000 | 0.2948          |
| 0.2784        | 0.7054 | 16000 | 0.2945          |
| 0.3069        | 0.7495 | 17000 | 0.2943          |
| 0.2813        | 0.7935 | 18000 | 0.2943          |
| 0.2934        | 0.8376 | 19000 | 0.2942          |
| 0.2762        | 0.8817 | 20000 | 0.2942          |
| 0.2792        | 0.9258 | 21000 | 0.2942          |
| 0.3057        | 0.9699 | 22000 | 0.2942          |
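
### Example usage

A minimal inference and filtering sketch with 🤗 Transformers. It assumes the model is published as `trollek/LittleInstructionJudge-4B-v0.1` on the Hub, that the ChatML template used during training is stored in the tokenizer, and that the `difficulty` labels checked at the end follow the usual Magpie taxonomy ("very easy" through "very hard"); none of this is guaranteed by the card above, so adjust as needed.

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "trollek/LittleInstructionJudge-4B-v0.1"  # assumed Hub id

# Prompt template from the card; double braces are literal braces for str.format().
PROMPT_TEMPLATE = """Judge the instruction below using the following json format:

{{
    "intent": ,
    "knowledge": ,
    "task_category": ,
    "other_task_category": [],
    "difficulty": ,
    "quality_explanation": ,
    "instruct_reward":
}}

This is the instruction I need you to judge:

{instruction}"""

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)


def judge(instruction: str) -> dict | None:
    """Ask the judge for a verdict and parse its JSON reply (None if parsing fails)."""
    messages = [{"role": "user", "content": PROMPT_TEMPLATE.format(instruction=instruction)}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    try:
        return json.loads(reply[reply.find("{"): reply.rfind("}") + 1])
    except ValueError:
        return None  # a small model sometimes breaks the JSON; treat that as a reject


# Magpie-style filtering: drop instructions the judge cannot parse or rates as trivial.
verdict = judge("Write a haiku about garbage collection in Python.")
if verdict is not None and verdict.get("difficulty") not in (None, "very easy"):
    print("keep:", verdict.get("task_category"), "|", verdict.get("quality_explanation"))
else:
    print("drop")
```

For filtering at Magpie scale it is usually more practical to serve the GGUF quants linked above through Ollama or llama.cpp and send the same prompt over HTTP; the parsing and threshold logic stays the same.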