INT8-quantized ONNX version of Felladrin/llama2_xs_460M_experimental_evol_instruct, for use with Transformers.js.
Example usage
Pipeline API
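import { pipeline } from '@xenova/transformers';
// Load a text-generation pipeline backed by the INT8 ONNX weights (top-level await requires an ES module).
const generator = await pipeline('text-generation', 'Felladrin/onnx-int8-llama2_xs_460M_experimental_evol_instruct');
// Generate up to 60 new tokens; a repetition_penalty above 1 discourages repeated tokens.
const output = await generator('Once upon a time,', { add_special_tokens: true, max_new_tokens: 60, repetition_penalty: 1.2 });
console.log(output);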
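The `output` logged above is a result object rather than a bare string. As a minimal sketch, assuming the pipeline returns an array shaped like `[{ generated_text: '...' }]` whose text begins with the prompt, the continuation could be pulled out as follows. `extractContinuation` and `sampleOutput` are hypothetical names used only for illustration; they are not part of Transformers.js.

```javascript
// Hypothetical helper (not part of Transformers.js): return only the newly
// generated text from a result assumed to look like [{ generated_text: '...' }].
function extractContinuation(output, prompt) {
  // Accept a single result object, an array of results, or nested arrays.
  const first = Array.isArray(output) ? output.flat()[0] : output;
  const text = (first && first.generated_text) || '';
  // Strip the prompt when the generated text starts with it.
  return text.startsWith(prompt) ? text.slice(prompt.length).trimStart() : text;
}

// Stand-in for a real pipeline result, so the sketch runs without downloading the model.
const sampleOutput = [{ generated_text: 'Once upon a time, there was a small model.' }];
const continuation = extractContinuation(sampleOutput, 'Once upon a time,');
console.log(continuation);
```

With a real pipeline, the call would be `extractContinuation(output, 'Once upon a time,')`. If the result shape differs in your version of the library, adjust the property access accordingly.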
Auto Classes
import { AutoModelForCausalLM, AutoTokenizer } from '@xenova/transformers';
const model_path = 'Felladrin/onnx-int8-llama2_xs_460M_experimental_evol_instruct';
// Load the model and its tokenizer from the same repository.
const model = await AutoModelForCausalLM.from_pretrained(model_path);
const tokenizer = await AutoTokenizer.from_pretrained(model_path);
const prompt = 'Once upon a time,';
// Tokenize the prompt into input IDs.
const { input_ids } = tokenizer(prompt);
// Generate up to 60 new tokens, then decode the first sequence back into text.
const tokens = await model.generate(input_ids, { max_new_tokens: 60, repetition_penalty: 1.2 });
console.log(tokenizer.decode(tokens[0], { skip_special_tokens: true }));
Model tree for Felladrin/onnx-llama2_xs_460M_experimental_evol_instruct
Base model: ahxt/llama2_xs_460M_experimental