lambdalabs/pythia-1.4b-deduped-synthetic-instruct

This model was created by finetuning EleutherAI/pythia-1.4b-deduped on the Dahoas/synthetic-instruct-gptj-pairwise dataset.

You can try a demo of the model hosted on Lambda Cloud.

Model Details

Finetuned by: Lambda
Model type: Transformer-based Language Model
Language: English
Pre-trained model: EleutherAI/pythia-1.4b-deduped
Dataset: Dahoas/synthetic-instruct-gptj-pairwise
Library: transformers
License: Apache 2.0

Prerequisites

Running inference with the model takes ~4GB of GPU memory.
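As a rough sanity check on that figure, the memory needed for the weights alone can be estimated from the parameter count. The sketch below assumes about 1.4 billion parameters (the nominal size in the model name, not an exact count) and 2 bytes per parameter for float16, the dtype used in the Quick Start. Activations, the attention cache, and CUDA overhead are not modeled here and presumably make up the difference to ~4GB.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: nominal parameter count taken from the model name.
N_PARAMS = 1.4e9

fp16_gb = weight_memory_gb(N_PARAMS, bytes_per_param=2)  # float16, as in the Quick Start
fp32_gb = weight_memory_gb(N_PARAMS, bytes_per_param=4)  # float32, for comparison

print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"float32 weights: ~{fp32_gb:.1f} GB")
```

Loading in float32 would roughly double the weight footprint, which is one reason to keep torch_dtype=torch.float16 on a small GPU.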

Quick Start

import torch

from transformers import AutoTokenizer, pipeline, StoppingCriteria, StoppingCriteriaList

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

model_name = "lambdalabs/pythia-1.4b-deduped-synthetic-instruct"
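max_new_tokens = 2048  # upper bound; generation ends earlier if the stop token is produced
stop_token = "<|stop|>"  # marker that ends an answer (see the sample output)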


class KeywordsStoppingCriteria(StoppingCriteria):
    """Stop generation once the last generated token is one of the given ids."""

    def __init__(self, keywords_ids: list):
        self.keywords = keywords_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        # Only the first sequence in the batch is checked.
        return input_ids[0][-1].item() in self.keywords


tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_tokens([stop_token])

stop_ids = [tokenizer.encode(w)[0] for w in [stop_token]]
stop_criteria = KeywordsStoppingCriteria(stop_ids)

generator = pipeline(
    "text-generation",
    model=model_name,
    device=device,
    max_new_tokens=max_new_tokens,
    torch_dtype=torch.float16,
    stopping_criteria=StoppingCriteriaList([stop_criteria]),
)

example = "Can you give me some tips on how to save money every month."
text = "Question: {}\nAnswer:".format(example)

result = generator(
    text,
    num_return_sequences=1,
)

output = result[0]["generated_text"]

print(output)

Output:

Question: Can you give me some tips on how to save money every month.
Answer:Create a budget and track your spending.
2. Cut down on unnecessary expenses, such as eating out, shopping, and entertainment.
3. Make a list of your monthly expenses and stick to it.
4. Take advantage of discounts and coupons when shopping.
5. Make sure to pay your bills on time to avoid late fees.
6. Save a portion of your income each month by investing it in a high-yield savings account.
7. Consider automating your savings by setting up a recurring transfer from your checking to a savings account.
8. Take advantage of free entertainment opportunities, such as going to the park or museum.
9. Look for ways to save on utilities, such as installing energy-efficient appliances.
10. Research and use public transportation to save on gas.<|stop|>
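The generated text echoes the prompt and ends with the stop token, so an application will usually want only the answer itself. Below is a minimal sketch of that post-processing. The helper names build_prompt and extract_answer are hypothetical and not part of the model or of transformers, and the sample string is made up for illustration, not a real model output.

```python
PROMPT_TEMPLATE = "Question: {}\nAnswer:"  # same template as in the Quick Start
STOP_TOKEN = "<|stop|>"


def build_prompt(question: str) -> str:
    """Wrap a user question in the Question/Answer template."""
    return PROMPT_TEMPLATE.format(question)


def extract_answer(generated_text: str, prompt: str, stop_token: str = STOP_TOKEN) -> str:
    """Remove the echoed prompt and everything from the first stop token on."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    answer = generated_text.split(stop_token, 1)[0]
    return answer.strip()


prompt = build_prompt("What is 2 + 2?")
sample = prompt + " 4.<|stop|>"  # hypothetical generated text, for illustration only
print(extract_answer(sample, prompt))
```

With the pipeline from the Quick Start, the same function would be applied to result[0]["generated_text"].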

Training

The model was trained on the Dahoas/synthetic-instruct-gptj-pairwise dataset. We split the original dataset into train (the first 32000 examples) and validation (the remaining 1144 examples) subsets.
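The split described above is a plain head/tail cut by position. The sketch below illustrates it on a stand-in list of row indices rather than the real dataset. The function name split_head_tail is hypothetical, and the actual training code may have done this differently.

```python
from typing import Sequence, Tuple

N_TRAIN = 32000  # size of the train subset, from the text
N_VAL = 1144     # size of the validation subset, from the text


def split_head_tail(examples: Sequence, n_train: int = N_TRAIN) -> Tuple[Sequence, Sequence]:
    """Return (first n_train examples, all remaining examples)."""
    return examples[:n_train], examples[n_train:]


# Stand-in for the dataset rows: one integer per example.
rows = list(range(N_TRAIN + N_VAL))
train_rows, val_rows = split_head_tail(rows)
print(len(train_rows), len(val_rows))
```

With the Hugging Face datasets library, the same cut could presumably be expressed with split slicing such as "train[:32000]" and "train[32000:]", assuming the dataset exposes a single train split.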

We finetuned the model for 4 epochs. Training took 2 hours on 8x A100 80GB GPUs, with batch_size_per_gpu set to 8 (a global batch size of 64) and a learning rate of 0.00002 (with linear decay to zero at the last training step). You can find a Weights and Biases record here.
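The numbers above imply a step count and a learning-rate schedule that can be worked out directly. The sketch below assumes no warmup and no gradient accumulation, neither of which is stated in this card, so treat it as an illustration of linear decay rather than the exact schedule used.

```python
import math

N_TRAIN = 32000     # training examples, from the text
N_GPUS = 8
BATCH_PER_GPU = 8
EPOCHS = 4
PEAK_LR = 2e-5      # 0.00002

global_batch = N_GPUS * BATCH_PER_GPU                 # 8 * 8 = 64
steps_per_epoch = math.ceil(N_TRAIN / global_batch)   # 32000 / 64 = 500
total_steps = steps_per_epoch * EPOCHS                # 500 * 4 = 2000


def linear_decay_lr(step: int, total: int = total_steps, peak: float = PEAK_LR) -> float:
    """Learning rate after `step` optimizer steps, decaying linearly to 0 at `total`.

    Assumption: no warmup phase.
    """
    step = min(max(step, 0), total)
    return peak * (1.0 - step / total)


print(global_batch, steps_per_epoch, total_steps)
print(linear_decay_lr(0), linear_decay_lr(total_steps // 2), linear_decay_lr(total_steps))
```

Under these assumptions the run consists of 2000 optimizer steps, and the learning rate has dropped to half its initial value after two epochs.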
