Model Overview
This model is fine-tuned with Low-Rank Adaptation (LoRA) to map natural-language questions to tagged questions.
Fine-Tuning with LoRA
- Fine-tuning adapts the model's parameters to the target task, improving its handling of nuanced requirements.
- LoRA makes the update efficient by training only a small set of low-rank adapter weights while the pretrained weights stay frozen, significantly reducing computational overhead while maintaining or improving performance (see the sketch after this list).
- In evaluation, the fine-tuned llama3-Finetuned model demonstrated strong stability and accuracy, achieving perfect scores with no error margin.
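LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update, so the effective weight becomes W + (alpha / r) · B·A, with only A and B trained. The following is a minimal, illustrative PyTorch sketch of that idea; the class is not torchtune's implementation (torchtune's lora_llama3_8b builder applies the same mechanism internally), and the default rank/alpha simply mirror the config further down.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: frozen base weight plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                        # pretrained weight stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # A: in_features -> rank
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # B: rank -> out_features
        nn.init.zeros_(self.lora_b.weight)                            # update starts at zero
        self.scaling = alpha / rank                                   # lora_alpha / lora_rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / rank) * B(A(x)); only lora_a and lora_b receive gradients
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```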
Model Configuration
See https://pytorch.org/torchtune/stable/tutorials/e2e_flow.html for an end-to-end guide to using torchtune.
To fine-tune the model:
- Download the model: tune download meta-llama/Meta-Llama-3-8B --output-dir /home/YOUR_USERNAME/Meta-Llama-3-8B --hf-token
- Prepare the config file by copying the default LoRA config: tune cp llama3/8B_lora_single_device custom_config.yaml
- Update the file as follows:
Configuration File
```yaml
# Config for single device LoRA finetuning in lora_finetune_single_device.py
# using a Llama3 8B model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download meta-llama/Meta-Llama-3-8B --output-dir /tmp/Meta-Llama-3-8B --hf-token
#
# To launch on a single device, run the following command from root:
#   tune run lora_finetune_single_device --config llama3/8B_lora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
#   tune run lora_finetune_single_device --config llama3/8B_lora_single_device checkpointer.checkpoint_dir=
#
# This config works only for training on single device.

# Model Arguments
model:
  _component_: torchtune.models.llama3.lora_llama3_8b
  lora_attn_modules: ['q_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /home/YOUR_USERNAME/Meta-Llama-3-8B/original/tokenizer.model

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /home/YOUR_USERNAME/Meta-Llama-3-8B/original/
  checkpoint_files: [
    consolidated.00.pth
  ]
  recipe_checkpoint: null
  output_dir: /home/YOUR_USERNAME/Meta-Llama-3-8B/
  model_type: LLAMA3
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  split: train
  source: /home/YOUR_USERNAME/data
  template: AlpacaInstructTemplate
  train_on_input: False
seed: null
shuffle: True
batch_size: 1

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  weight_decay: 0.01
  lr: 3e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torch.nn.CrossEntropyLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 64
compile: False

# Logging
output_dir: /home/YOUR_USERNAME/lora_finetune_output
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: ${output_dir}
log_every_n_steps: null

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: True

# Profiler (disabled)
profiler:
  _component_: torchtune.utils.profiler
  enabled: False
```
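The dataset section of the config points torchtune.datasets.instruct_dataset at a local source directory and uses AlpacaInstructTemplate, so each training record is expected in Alpaca instruction format. The sketch below shows the assumed field layout for one question-to-tagged-question pair; the file name and the exact tagging are illustrative, not taken from the real training data. Note also that with batch_size: 1 and gradient_accumulation_steps: 64, the effective batch size is 64.

```python
import json

# One illustrative Alpaca-style record for the question -> tagged-question task.
# The real data lives under /home/YOUR_USERNAME/data; only the field layout
# (instruction / input / output) is what the config relies on.
record = {
    "instruction": "Convert the question to a tagged question using the tag set "
                   "<qt>, <o>, <s>, <cc>, <p>, <off>, <t>, <op>, <ref>.",
    "input": "Who is the CEO of a company founded by himself?",
    "output": "<qt>Who</qt> is the <p>CEO</p> of a <t>company</t> "
              "<p>founded by</p> <ref>himself</ref>?",
}

# Written here as JSON purely for illustration; how the source directory is
# actually laid out depends on your setup.
with open("train.json", "w") as f:
    json.dump([record], f, indent=2)
```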
Run the fine-tuning: tune run lora_finetune_single_device --config /home/YOUR_USERNAME/.../custom_config.yaml
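After training, the Meta checkpointer writes a full model checkpoint (meta_model_0.pt) to the output directory, with the LoRA update folded back into the base weights; this is why the inference config below loads the plain llama3_8b builder rather than the LoRA variant. Conceptually the merge is just W + (alpha / r) · B·A. A sketch of that step, reusing the illustrative LoRALinear class from the earlier example (not torchtune's actual merge code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def merge_lora(layer: "LoRALinear") -> nn.Linear:
    """Fold the low-rank update into the frozen base weight: W_merged = W + (alpha/r) * B @ A."""
    merged = nn.Linear(layer.base.in_features, layer.base.out_features,
                       bias=layer.base.bias is not None)
    merged.weight.copy_(layer.base.weight
                        + layer.scaling * layer.lora_b.weight @ layer.lora_a.weight)
    if layer.base.bias is not None:
        merged.bias.copy_(layer.base.bias)
    return merged  # a plain Linear layer; no adapter needed at inference time
```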
Inference Configuration
Copy the generation config: tune cp generation ./custom_generation_config.yaml
Update the file:
```yaml
# Config for running the InferenceRecipe in generate.py to generate output from an LLM
#
# To launch, run the following command from root torchtune directory:
# tune run generate --config generation
# Model arguments
model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /home/YOUR_USERNAME/Meta-Llama-3-8B/
  checkpoint_files: [
    meta_model_0.pt
  ]
  output_dir: /home/YOUR_USERNAME/Meta-Llama-3-8B/
  model_type: LLAMA3
device: cuda
dtype: bf16
seed: 1234
# Tokenizer arguments
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /home/YOUR_USERNAME/Meta-Llama-3-8B/original/tokenizer.model
# Generation arguments; defaults taken from gpt-fast
prompt: "### Instruction: \nYou are a powerful model trained to convert questions to tagged questions. Use the tags as follows: \n<qt> to surround question keywords like 'What', 'Who', 'Which', 'How many', 'Return' or any word that represents requests. \n<o> to surround entities as an object like person name, place name, etc. It must be a noun or a noun phrase. \n<s> to surround entities as a subject like person name, place name, etc. The difference between <s> and <o>, <s> only appear in yes/no questions as in the training data you saw before. \n<cc> to surround coordinating conjunctions that connect two or more phrases like 'and', 'or', 'nor', etc. \n<p> to surround predicates that may be an entity attribute or a relationship between two entities. It can be a verb phrase or a noun phrase. The question must contain at least one predicate. \n<off> for offset in questions asking for the second, third, etc. For example, the question 'What is the second largest country?', <off> will be located as follows. 'What is the <off>second</off> largest country?' \n<t> to surround entity types like person, place, etc. \n<op> to surround operators that compare quantities or values, like 'greater than', 'more than', etc. \n<ref> to indicate a reference within the question that requires a cycle to refer back to an entity (e.g., 'Who is the CEO of a company founded by himself?' where 'himself' would be tagged as <ref>himself</ref>). \nInput: Which films directed by a director died in 2014 and starring both Julia Roberts and Richard Gere?\nResponse:"
max_new_tokens: 100
temperature: 0.6 # 0.8 and 0.6 are popular values to try
top_k: 1
quantizer: null
```
Run the generation: tune run generate --config /home/YOUR_USERNAME/.../custom_generation_config.yaml
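The generated response is a tagged question using the tag set described in the prompt above. Below is a small sketch of how the tags could be pulled out of the generated text for downstream use; the function name and the example tagging are illustrative assumptions, not part of torchtune or the training data.

```python
import re
from collections import defaultdict

TAGS = ["qt", "o", "s", "cc", "p", "off", "t", "op", "ref"]

def extract_tags(tagged_question: str) -> dict:
    """Collect the spans wrapped in each tag, e.g. <p>directed by</p> -> {'p': ['directed by']}."""
    spans = defaultdict(list)
    for tag in TAGS:
        for match in re.findall(rf"<{tag}>(.*?)</{tag}>", tagged_question):
            spans[tag].append(match.strip())
    return dict(spans)

# Illustrative output for the prompt's example question (the tagging shown here is assumed).
example = ("<qt>Which</qt> <t>films</t> <p>directed by</p> a <t>director</t> <p>died in</p> "
           "<o>2014</o> <cc>and</cc> <p>starring</p> both <o>Julia Roberts</o> "
           "<cc>and</cc> <o>Richard Gere</o>?")
print(extract_tags(example))
```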