Usage
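from deepsparse import TextGeneration
# Wrap the user message in the ChatML-style template used by this chat model.
prompt = "How to get in a good university?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
# Load the 50% pruned, quantized model from the Hugging Face Hub via the "hf:" stub.
model = TextGeneration(model="hf:neuralmagic/TinyLlama-1.1B-Chat-v0.3-pruned50-quant-ds")
# Generate up to 200 new tokens and print the text of the first generation.
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)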
"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:
1. Academic performance:
- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.
2. Financial aid:
- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
"""
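The prompt template in the usage example can be factored into a small helper so the same formatting is applied to every user message. This is a minimal sketch: `format_chat_prompt` is a hypothetical name and is not part of DeepSparse. It only reproduces the single-turn template shown above.

```python
def format_chat_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn template from the usage example.

    Hypothetical helper (not a DeepSparse API): it emits the user turn and
    opens the assistant turn so the model continues from there.
    """
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


formatted_prompt = format_chat_prompt("How to get in a good university?")
print(formatted_prompt)
```

The returned string can be passed to the pipeline in place of `formatted_prompt` in the usage example.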
One-shot and Export
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]" "torch<2"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py PY007/TinyLlama-1.1B-Chat-v0.3 open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment --sequence_length 512
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
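The four commands that follow the install step can also be driven from one Python script, which makes it easier to stop at the first failing step. The sketch below only assembles the commands exactly as listed above and prints them by default. `build_pipeline` and `run_pipeline` are hypothetical names, and the script assumes the clone and `pip install` steps have already been run in the current directory.

```python
import shlex
import subprocess
from typing import List

# Path inside the cloned sparseml repository, copied from the commands above.
OBCQ_DIR = "sparseml/src/sparseml/transformers/sparsification/obcq"


def build_pipeline(model_id: str, dataset: str, recipe: str = "recipe.yaml",
                   sequence_length: int = 512) -> List[List[str]]:
    """Return the one-shot, export, copy, and KV-cache injection commands in order."""
    return [
        ["python", f"{OBCQ_DIR}/obcq.py", model_id, dataset,
         "--recipe", recipe, "--save", "True"],
        ["python", f"{OBCQ_DIR}/export.py", "--task", "text-generation",
         "--model_path", "obcq_deployment",
         "--sequence_length", str(sequence_length)],
        ["cp", "deployment/model.onnx", "deployment/model-orig.onnx"],
        ["python", "onnx_kv_inject.py",
         "--input-file", "deployment/model-orig.onnx",
         "--output-file", "deployment/model.onnx"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run on top of a failed one.
            subprocess.run(cmd, check=True)


commands = build_pipeline("PY007/TinyLlama-1.1B-Chat-v0.3", "open_platypus")
run_pipeline(commands)  # dry run: prints the commands without executing them
```

Calling `run_pipeline(commands, dry_run=False)` would execute the steps for real.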
recipe.yaml
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: false
      quantize:
        QuantizationModifier:
          ignore:
            - LlamaRotaryEmbedding
            - LlamaRMSNorm
            - SiLUActivation
            - model.layers.21.mlp.down_proj
            - model.layers.7.mlp.down_proj
            - model.layers.2.mlp.down_proj
            - model.layers.20.mlp.down_proj
            - model.layers.19.mlp.down_proj
          post_oneshot_calibration: false
          scheme_overrides:
            Embedding:
              input_activations: null
              weights:
                num_bits: 8
                symmetric: false
      percdamp: 0.01
      prunen: 0
      prunem: 0
      targets:
        - model.layers.0
        - model.layers.1
        - model.layers.2
        - model.layers.3
        - model.layers.4
        - model.layers.5
        - model.layers.6
        - model.layers.7
        - model.layers.8
        - model.layers.9
        - model.layers.10
        - model.layers.11
        - model.layers.12
        - model.layers.13
        - model.layers.14
        - model.layers.15
        - model.layers.16
        - model.layers.17
        - model.layers.18
        - model.layers.19
        - model.layers.20
        - model.layers.21
      target_ids:
        - attention_mask
        - position_ids
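The `targets` list in the recipe names one entry per decoder layer, `model.layers.0` through `model.layers.21`. When adapting the recipe to a model with a different depth, the list can be generated rather than typed out. The sketch below does that and checks that each `down_proj` module in the `ignore` list belongs to a targeted layer. `layer_targets` and `parent_layer` are hypothetical helpers, not SparseML functions.

```python
from typing import List


def layer_targets(num_layers: int, prefix: str = "model.layers") -> List[str]:
    """Return one target name per decoder layer, e.g. 'model.layers.0'."""
    return [f"{prefix}.{i}" for i in range(num_layers)]


def parent_layer(module_name: str) -> str:
    """Map 'model.layers.7.mlp.down_proj' to its layer, 'model.layers.7'."""
    return ".".join(module_name.split(".")[:3])


# The recipe above targets 22 layers (indices 0 to 21).
targets = layer_targets(22)

# Module paths copied from the recipe's ignore list.
ignored_modules = [
    "model.layers.21.mlp.down_proj",
    "model.layers.7.mlp.down_proj",
    "model.layers.2.mlp.down_proj",
    "model.layers.20.mlp.down_proj",
    "model.layers.19.mlp.down_proj",
]

# Names that do not resolve to a targeted layer would indicate a typo.
unmatched = [m for m in ignored_modules if parent_layer(m) not in targets]
print(f"{len(targets)} targets, {len(unmatched)} unmatched ignore entries")
```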