---
base_model: BEE-spoke-data/smol_llama-101M-GQA-python
datasets:
- BEE-spoke-data/pypi_clean-deduped
inference: false
language:
- en
license: apache-2.0
metrics:
- accuracy
model_creator: BEE-spoke-data
model_name: smol_llama-101M-GQA-python
pipeline_tag: text-generation
quantized_by: afrideva
source_model: BEE-spoke-data/smol_llama-101M-GQA
tags:
- python
- codegen
- markdown
- smol_llama
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
widget:
- example_title: Add Numbers Function
text: "def add_numbers(a, b):\n return\n"
- example_title: Car Class
text: "class Car:\n def __init__(self, make, model):\n self.make = make\n
\ self.model = model\n\n def display_car(self):\n"
- example_title: Pandas DataFrame
  text: "import pandas as pd\ndata = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}\ndf = pd.DataFrame(data).convert_dtypes()\n# eda\n"
- example_title: Factorial Function
text: "def factorial(n):\n if n == 0:\n return 1\n else:\n"
- example_title: Fibonacci Function
text: "def fibonacci(n):\n if n <= 0:\n raise ValueError(\"Incorrect input\")\n
\ elif n == 1:\n return 0\n elif n == 2:\n return 1\n else:\n"
- example_title: Matplotlib Plot
  text: "import matplotlib.pyplot as plt\nimport numpy as np\nx = np.linspace(0, 10, 100)\n# simple plot\n"
- example_title: Reverse String Function
text: "def reverse_string(s:str) -> str:\n return\n"
- example_title: Palindrome Function
text: "def is_palindrome(word:str) -> bool:\n return\n"
- example_title: Bubble Sort Function
text: "def bubble_sort(lst: list):\n n = len(lst)\n for i in range(n):\n for
j in range(0, n-i-1):\n"
- example_title: Binary Search Function
text: "def binary_search(arr, low, high, x):\n if high >= low:\n mid =
(high + low) // 2\n if arr[mid] == x:\n return mid\n elif
arr[mid] > x:\n"
---
# BEE-spoke-data/smol_llama-101M-GQA-python-GGUF
Quantized GGUF model files for [smol_llama-101M-GQA-python](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data)

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [smol_llama-101m-gqa-python.fp16.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.fp16.gguf) | fp16 | 203.28 MB |
| [smol_llama-101m-gqa-python.q2_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q2_k.gguf) | q2_k | 50.93 MB |
| [smol_llama-101m-gqa-python.q3_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q3_k_m.gguf) | q3_k_m | 57.06 MB |
| [smol_llama-101m-gqa-python.q4_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q4_k_m.gguf) | q4_k_m | 65.41 MB |
| [smol_llama-101m-gqa-python.q5_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q5_k_m.gguf) | q5_k_m | 74.34 MB |
| [smol_llama-101m-gqa-python.q6_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q6_k.gguf) | q6_k | 83.83 MB |
| [smol_llama-101m-gqa-python.q8_0.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-python-GGUF/resolve/main/smol_llama-101m-gqa-python.q8_0.gguf) | q8_0 | 108.35 MB |
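
As a quick start, the sketch below shows one way to fetch a file from the table above and run it locally. It assumes `huggingface-hub` and `llama-cpp-python` as dependencies and uses default generation settings; none of this is prescribed by the original card.

```python
# pip install huggingface-hub llama-cpp-python  (assumed dependencies)
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one of the quantized files listed above
model_path = hf_hub_download(
    repo_id="afrideva/smol_llama-101M-GQA-python-GGUF",
    filename="smol_llama-101m-gqa-python.q4_k_m.gguf",
)

# Load the GGUF file and complete a short Python snippet
llm = Llama(model_path=model_path)
result = llm("def add_numbers(a, b):\n    return", max_tokens=64)
print(result["choices"][0]["text"])
```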
## Original Model Card:
# smol_llama-101M-GQA: python
<a href="https://colab.research.google.com/gist/pszemraj/91b5a267df95461b46922e6c0212e8f7/beecoder-basic-test-notebook.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
> 400MB of buzz: pure Python programming nectar! 🍯
This model is the general pre-trained checkpoint `BEE-spoke-data/smol_llama-101M-GQA`, trained on a deduped version of `pypi` for one additional epoch. Play with the model in [this demo space](https://huggingface.co/spaces/BEE-spoke-data/beecoder-playground).

- The architecture is the same as the base model, with some new Python-related tokens added to the vocab prior to training.
- It can generate basic Python code and README-style markdown, but will struggle with harder planning/reasoning tasks.
- This is an experiment to test the abilities of smol-sized models at code generation, meaning **both** their capabilities and limitations.

Use with care & understand that there may be some bugs 🐛 still to be worked out.
## Usage
📌 Be sure to note:

1. The model uses the "slow" llama2 tokenizer: set `use_fast=False` when loading the tokenizer.
2. Use `transformers` version 4.33.3, due to a known issue in version 4.34.1 (_at time of writing_).

> Which llama2 tokenizer the API widget uses is an age-old mystery, and may cause minor whitespace issues (widget only).
To install the necessary packages and load the model:
```python
# Install necessary packages
# pip install transformers==4.33.3 accelerate sentencepiece
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
"BEE-spoke-data/smol_llama-101M-GQA-python",
use_fast=False,
)
model = AutoModelForCausalLM.from_pretrained(
"BEE-spoke-data/smol_llama-101M-GQA-python",
device_map="auto",
)
# The model can now be used as any other decoder
```
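
Once loaded, the model generates text like any other `transformers` causal LM. A minimal, illustrative completion follows; the prompt and `max_new_tokens` value here are arbitrary choices, not values from the original card:

```python
prompt = "def add_numbers(a, b):\n    return"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```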
### Longer code-gen example
Below is a quick script that can be used as a reference/starting point for writing your own, better one :)
<details>
<summary>🔥 Unleash the Power of Code Generation! Click to Reveal the Magic! 🔮</summary>

Are you ready to witness the incredible possibilities of code generation? 🚀 Brace yourself for an exceptional journey into the world of artificial intelligence and programming. Behold a script that will change the way you create and complete code.

This script opens the door to a world where machines write code with remarkable precision and imagination.
```python
"""
Simple script for testing model(s) designed to generate/complete code.

See details/args with:
    python textgen_inference_code.py --help
"""
import logging
import random
import time

import fire
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

logging.basicConfig(format="%(levelname)s - %(message)s", level=logging.INFO)


class Timer:
    """Context manager that logs elapsed wall-clock time on exit."""

    def __enter__(self):
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_time = time.perf_counter()
        self.elapsed_time = self.end_time - self.start_time
        logging.info(f"Elapsed time: {self.elapsed_time:.4f} seconds")


def load_model(model_name, use_fast=False):
    """Load the tokenizer and model for a given repo ID or local path."""
    logging.info(f"Loading model: {model_name}")
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=use_fast)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    model = torch.compile(model)  # optional speed-up on PyTorch 2.x
    return tokenizer, model


def run_inference(prompt, model, tokenizer, max_new_tokens: int = 256):
    """
    Generate a completion for the given prompt and log timing.

    Args:
        prompt (str): input text to complete
        model: loaded causal language model
        tokenizer: the model's tokenizer
        max_new_tokens (int, optional): max new tokens to generate

    Returns:
        str: decoded output text
    """
    logging.info(f"Running inference with max_new_tokens={max_new_tokens} ...")
    with Timer() as timer:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            min_new_tokens=8,
            renormalize_logits=True,
            no_repeat_ngram_size=8,
            repetition_penalty=1.04,
            num_beams=4,
            early_stopping=True,
        )
    text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    logging.info(f"Output text:\n\n{text}")
    return text


def main(
    model_name="BEE-spoke-data/smol_llama-101M-GQA-python",
    prompt: str = None,
    use_fast=False,
    n_tokens: int = 256,
):
    """
    Load a model and generate a completion for a single prompt.

    Args:
        model_name (str, optional): model repo ID or local path
        prompt (str, optional): specify the prompt directly (default: random choice from list)
        use_fast (bool, optional): whether to use the fast tokenizer
        n_tokens (int, optional): max new tokens to generate
    """
logging.info(f"Inference with:\t{model_name}, max_new_tokens:{n_tokens}")
if prompt is None:
prompt_list = [
'''
def print_primes(n: int):
"""
Print all primes between 1 and n
"""''',
"def quantum_analysis(",
"def sanitize_filenames(target_dir:str, recursive:False, extension",
]
prompt = random.SystemRandom().choice(prompt_list)
logging.info(f"Using prompt:\t{prompt}")
tokenizer, model = load_model(model_name, use_fast=use_fast)
run_inference(prompt, model, tokenizer, n_tokens)
if __name__ == "__main__":
fire.Fire(main)
```
Wowoweewa!! It can create some file cleaning utilities.
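
If you save the script as `textgen_inference_code.py`, a couple of illustrative invocations (flag names follow `fire`'s mapping of the keyword arguments above):

```python
# Run with a random built-in prompt:
#   python textgen_inference_code.py
# Or supply your own prompt and token budget:
#   python textgen_inference_code.py --prompt "def hello_world():" --n_tokens 128
```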
</details>
---