Corianas
/

Tiny_Test

Text Generation

	---
	license: cc-by-nc-4.0
	---
	A llama2.c model, based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c

	Vocabulary of 4096 tokens, trained on TinyStories and my custom littlestories dataset (currently unreleased).


	The model uses ↨ as a shift key instead of capital letters: each uppercase letter is written as ↨ followed by its lowercase form. This simplified the tokenizer, since it no longer needs duplicate uppercase versions of tokens.
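As a rough illustration of why this removes duplicates, the sketch below encodes three casings of the same word. `SHIFT` and `encode_case` are hypothetical names used only for this example, and the word list is made up; this is not the tokenizer itself.

```python
SHIFT = "↨"  # the shift marker described in this card

def encode_case(text):
    # Hypothetical helper for illustration: write each uppercase character
    # as the shift marker followed by its lowercase form.
    return "".join(SHIFT + ch.lower() if ch.isupper() else ch for ch in text)

words = ["The", "the", "THE"]
encoded = [encode_case(w) for w in words]
print(encoded)  # ['↨the', 'the', '↨t↨h↨e']

# All three encoded forms are spelled with the same lowercase letters plus
# the marker, so no separate uppercase symbols are needed.
alphabet = set("".join(encoded))
print(sorted(alphabet))
```

The cost is that capitalized text gets one extra character per uppercase letter, so all-caps words become roughly twice as long.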

	---
	To convert normal text to the right format I use:
	```
	def add_caseifer(text):
	    # Using list comprehension for more efficient concatenation
	    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
	```

	To convert the text back to human-readable form I use:
	```
	def remove_caseifer(text):
	    new_text = ""
	    i = 0
	    while i < len(text):
	        if text[i] == "↨":
	            if i+1 < len(text):
	                # Uppercase the character after the marker, then step past it
	                new_text += text[i+1].upper()
	                i += 1
	            else:
	                pass  # trailing marker with nothing after it: skip this index
	        else:
	            new_text += text[i]
	        i += 1
	    return new_text
	```
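
As a sanity check, the two functions can be run back to back. This is a minimal sketch: the block repeats both helpers so that it runs on its own, the indentation is assumed from the logic of the code, and the sample strings are made up for illustration.

```python
def add_caseifer(text):
    # Prefix each uppercase character with the marker and lowercase it
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])

def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                # Uppercase the character after the marker, then step past it
                new_text += text[i + 1].upper()
                i += 1
        else:
            new_text += text[i]
        i += 1
    return new_text

samples = ["Once upon a time, Lily saw a big red ball.", "NASA", "no capitals here"]
for s in samples:
    encoded = add_caseifer(s)
    assert remove_caseifer(encoded) == s  # encoding then decoding restores the input
    print(encoded)

# A marker with nothing after it is dropped.
print(remove_caseifer("the end↨"))  # the end
```

Note that input which already contains ↨ will not round-trip: `remove_caseifer("a↨b")` returns `"aB"`.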