[Model Information]

  • This is a fine-tuned version of the Apple/OpenELM model series, created to test the limitations of the OpenELM architecture. It may also have had something to do with Apple's instruct models not providing an instruction format.

  • The model was trained on an estimated total of 61k lines of data, without a system prompt.

[Model Usage]

  • The tokenizer is included in this repo, so you can load and use the model like any other.

  • The model can currently handle a context of up to 2048 tokens. This is the default limit imposed by Apple during the original training process.

  • I didn't use any moderation filters during training, so this model might generate unwanted surprises.

  • Please be aware that this model is trained on a tiny fraction of the data other models are trained on. As of 2024/5/12, you may consider it a research model (this might change if I feel like continuing to improve it).
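
Because of the 2048 limit noted above, the prompt and the generated text share one budget. The sketch below shows that arithmetic under two assumptions: that the limit counts tokens, and that it covers prompt and generated tokens together (the usual convention for causal language models). `max_new_tokens_for` is a hypothetical helper for illustration, not part of this repo or of transformers.

```python
CONTEXT_LIMIT = 2048  # limit stated above; assumed to cover prompt + generated tokens


def max_new_tokens_for(prompt_token_count: int, requested: int = 1024,
                       context_limit: int = CONTEXT_LIMIT) -> int:
    """Return how many new tokens fit without exceeding the context limit."""
    if prompt_token_count >= context_limit:
        raise ValueError("prompt alone fills or exceeds the context limit")
    remaining = context_limit - prompt_token_count
    # Never ask for more than was requested, and never more than what is left.
    return min(requested, remaining)


print(max_new_tokens_for(100))   # short prompt: the full request fits
print(max_new_tokens_for(1500))  # long prompt: the request is capped
```

The returned value could be passed as `max_new_tokens` when generating, with `prompt_token_count` taken from the length of the tokenized prompt.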

[How to use the model to its full capacity]

  • First, install the required dependencies by running this command:
pip install -U transformers torch torchvision torchaudio accelerate
  • Second, run this code, and remember to replace the example question "What can you do." with your own.
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

accelerator = Accelerator()

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="VINUK/OpenELM_Instruct_272M_V1.0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="VINUK/OpenELM_Instruct_272M_V1.0", trust_remote_code=True, torch_dtype=torch.bfloat16)

model = accelerator.prepare_model(model=model, evaluation_mode=True)

# Generate without tracking gradients (inference only).
with torch.no_grad():
    # Prompt format: user marker, question, then model marker.
    inputs = tokenizer(text="[|=U=|]\nWhat can you do.\n[|=M=|]\n", return_tensors='pt').to(accelerator.device)
    response = model.generate(
        inputs=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_new_tokens=1024,
        min_new_tokens=10,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.6,
        repetition_penalty=1.0,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    decoded = tokenizer.decode(response[0, inputs['input_ids'].shape[-1]:], skip_special_tokens=True)

    # Turn any literal "\n" sequences in the output into real line breaks.
    print(decoded.replace('\\n', '\n'))
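
The example above wraps the question in `[|=U=|]` and `[|=M=|]` markers. To avoid retyping that format for every question, a small helper can build the prompt. This is a minimal sketch for single-turn prompts only; `build_prompt` is a hypothetical name, and the marker layout is copied from the example rather than from any documented spec.

```python
USER_TAG = "[|=U=|]"   # user-turn marker, taken from the example above
MODEL_TAG = "[|=M=|]"  # model-turn marker, taken from the example above


def build_prompt(question: str) -> str:
    """Wrap a single user question in the prompt format used in the example."""
    return f"{USER_TAG}\n{question.strip()}\n{MODEL_TAG}\n"


prompt = build_prompt("What can you do.")
print(repr(prompt))
```

The resulting string can be passed to the tokenizer as `text=prompt` in place of the hard-coded prompt.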