Model Card for Model ID

This model is a fine-tuned version of Prometheus2-8x7b-hf, trained on the Feedback Collection dataset, which contains questions, answers, and evaluations covering general subjects.

Model Details

Fine-tuning was performed with QLoRA (low-rank adaptation on top of a quantized base model).
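As a minimal sketch of the QLoRA setup, the base model could be loaded in 4-bit before attaching the adapters. The Hub checkpoint ID and the quantization settings below are assumptions for illustration, not the authors' verified configuration:

```python
# Illustrative QLoRA base-model loading; checkpoint ID and settings are assumed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # nested quantization saves memory
)

model = AutoModelForCausalLM.from_pretrained(
    "prometheus-eval/prometheus-8x7b-v2.0",  # assumed Hub ID for Prometheus2-8x7b
    quantization_config=bnb_config,
    device_map="auto",
)
```

With QLoRA, only the small LoRA adapter matrices are trained in full precision while the frozen base weights stay quantized, which is what makes fine-tuning an 8x7B model feasible on a single GPU.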

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

Uses

Intended to improve automatic healthcare evaluations, using an LLM-as-a-Judge approach.

Direct Use

Bias, Risks, and Limitations

The dataset used is not specialized in healthcare questions, answers, and rubric-scored evaluations. The main risk is that the model's evaluations may not be well tailored to our intended use case.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More specific healthcare data is needed before further recommendations can be made.

Training Data

The prometheus-eval/Feedback-Collection dataset contains 9,996 rows; a randomized 90% of those rows was used for training.
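The 90% split described above can be sketched as a seeded shuffle followed by a cutoff; the seed and helper name below are illustrative, not the original training code:

```python
import random

def train_split(rows, train_fraction=0.9, seed=42):
    """Randomize the rows and return the training portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)           # deterministic shuffle
    cutoff = int(len(shuffled) * train_fraction)    # 90% of the rows
    return shuffled[:cutoff]

rows = list(range(9996))        # stand-in for the 9,996 dataset rows
train_rows = train_split(rows)
print(len(train_rows))          # prints 8996, i.e. 90% of 9,996
```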

Training Procedure

This was not a full fine-tuning; only the following linear layers of the transformer model were trained:

  • Trainable parameters: 121,112,576
  • Total parameters: 46,823,905,280
  • Percentage trained: 0.2587%
  • Target linear layers: ['w3', 'o_proj', 'q_proj', 'gate', 'v_proj', 'w1', 'w2', 'k_proj']
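The percentage of trained parameters quoted above is simply the ratio of trainable (LoRA adapter) parameters to the model's total parameter count:

```python
trainable = 121_112_576        # LoRA adapter parameters
total = 46_823_905_280         # all parameters of the 8x7B model
pct = trainable / total * 100
print(f"{pct:.4f}%")           # prints 0.2587%, matching the figure above
```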

Training Hyperparameters

  • Training regime: [More Information Needed]
  • LoRA configuration: r=8, lora_alpha=32, target_modules=modules, lora_dropout=0.05
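As a sketch, the LoRA hyperparameters above correspond to a peft LoraConfig, with `modules` being the layer list from the Training Procedure section; the `task_type` value is an assumption:

```python
# Illustrative LoRA configuration; task_type is assumed, not stated in the card.
from peft import LoraConfig

modules = ["w3", "o_proj", "q_proj", "gate", "v_proj", "w1", "w2", "k_proj"]

lora_config = LoraConfig(
    r=8,                     # rank of the low-rank update matrices
    lora_alpha=32,           # scaling factor applied to the update
    target_modules=modules,  # linear layers that receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```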

The hyperparameters related to the training process:

    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=0,
    max_steps=100,
    learning_rate=2e-4,
    logging_steps=20,
    output_dir="outputs",
    optim="paged_adamw_8bit",
    save_strategy="epoch"
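The keyword arguments above match a transformers TrainingArguments object; a minimal sketch of how they would be assembled (the surrounding Trainer setup is omitted):

```python
# Illustrative TrainingArguments built from the values listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size of 4
    warmup_steps=0,
    max_steps=100,
    learning_rate=2e-4,
    logging_steps=20,
    output_dir="outputs",
    optim="paged_adamw_8bit",       # paged 8-bit AdamW from bitsandbytes
    save_strategy="epoch",
)
```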
    

Metrics

The training process was assessed with the loss function (the built-in cross-entropy loss for Transformers). This is the cross-entropy between the predicted probability matrix of shape ∥sentence length∥ × ∥vocab∥ (taken right before the argmax that selects the output token) and the ∥sentence length∥-length vector of true token IDs. It measures how well the model fits the dataset; in this case, it assesses how well the transformer predicts the evaluation for each question–answer pair. The loss decreases over training, indicating learning. Ideally, adding more data and tuning hyperparameters would drive this loss closer to zero.
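The per-token cross-entropy described above can be made concrete with a tiny worked example; the function and the toy logits below are illustrative, not the library implementation:

```python
import math

def token_cross_entropy(logits, target_ids):
    """Mean cross-entropy between a [seq_len x vocab] logit matrix
    and the seq_len-length vector of true token IDs."""
    total = 0.0
    for row, target in zip(logits, target_ids):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # equals -log(softmax(row)[target])
    return total / len(target_ids)

# Toy example: a 2-token sentence over a 3-token vocabulary.
logits = [[2.0, 0.5, 0.1],
          [0.2, 3.0, 0.3]]
targets = [0, 1]   # the true token ID at each position
loss = token_cross_entropy(logits, targets)
print(round(loss, 4))  # small positive value: targets already get high logits
```

When the model assigns probability 1 to every true token, the loss reaches 0; this is the limit the card refers to when it says more data could bring the loss closer to zero.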

Results

The metrics related to the fine-tuning process:

  • global_step: 100
  • train_runtime: 1584 s
  • train/loss: 0.4637
  • W&B run: https://wandb.ai/eeg-neuko/huggingface/runs/bfnozz0c?nw=nwuserguicondor1512

Hardware

Training used an NVIDIA A100 GPU (40 GB VRAM) on Google Colab Pro+.
