Pure Python version for local inference on a PC
I first heard about this model today in this article: https://venturebeat.com/ai/ai-on-your-smartphone-hugging-faces-smollm2-brings-powerful-models-to-the-palm-of-your-hand/?_bhlid=071034f893836a3364663dcc52fbea6fd14a2f15
I am disappointed that the full edge-optimized "model" is not published in a portable, standalone Python format that could be ported to any edge device capable of running Python.
Can Hugging Face please publish a Python script that defines the "model" explicitly, without going through the complex AutoModelForCausalLM abstraction?
For example, for PC operation:
SmolLM2.py
SmolLM2_tokenizer.py
So that:
model = SmolLM2(SmolLM_135M_checkpoint_file).to(device)
inputs = SmolLM2_tokenizer.encode("Gravity is", return_tensors="pt").to(device)
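
To make the ask concrete, here is a rough sketch of what such a SmolLM2.py could contain. It assumes SmolLM2 is a Llama-style decoder (which its config indicates), but it omits rotary position embeddings and grouped-query attention, and the default hyperparameters (vocab 49152, width 576, 30 layers, 9 heads) are my reading of the 135M config, so everything should be checked against the repo's config.json:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class Block(nn.Module):
    # One decoder layer: causal self-attention + SwiGLU MLP, both pre-normed.
    def __init__(self, dim, n_heads, hidden):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.attn_norm = RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)
        self.mlp_norm = RMSNorm(dim)
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in self.qkv(self.attn_norm(x)).chunk(3, dim=-1))
        # NOTE: the real model applies rotary embeddings to q/k and uses
        # grouped-query attention; both are omitted here for brevity.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.o_proj(y.transpose(1, 2).reshape(b, t, d))
        h = self.mlp_norm(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))

class SmolLM2(nn.Module):
    # Defaults are my reading of the 135M config; verify against config.json.
    def __init__(self, checkpoint_file=None, vocab=49152, dim=576,
                 n_layers=30, n_heads=9, hidden=1536):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(Block(dim, n_heads, hidden) for _ in range(n_layers))
        self.norm = RMSNorm(dim)
        if checkpoint_file is not None:
            # Assumes a torch-format state dict whose keys match this module.
            # The published checkpoints are safetensors with HF key names, so a
            # faithful port also needs a key-renaming step (see the loading
            # sketch at the end of this post).
            self.load_state_dict(torch.load(checkpoint_file))

    def forward(self, ids):
        x = self.embed(ids)
        for blk in self.blocks:
            x = blk(x)
        # assuming tied embeddings (tie_word_embeddings in config.json)
        return self.norm(x) @ self.embed.weight.T

With random weights this at least verifies the shapes end to end:

model = SmolLM2().eval()
logits = model(torch.randint(0, 49152, (1, 8)))  # -> (1, 8, 49152)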
instead of the current transformers recipe:
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-1.7B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs, install accelerate and use: model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
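
For what it's worth, the weights themselves are already portable without AutoModelForCausalLM: hf_hub_download plus safetensors.torch.load_file yields a plain name-to-tensor dict that can be mapped onto a hand-written module like the sketch above. The file name model.safetensors is an assumption based on the usual repo layout, so verify it against the repo's file list:

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download("HuggingFaceTB/SmolLM2-135M", "model.safetensors")
state_dict = load_file(path)  # plain dict: parameter name -> torch.Tensor
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))  # inspect HF key names for a manual port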