---
license: cc-by-nc-4.0
---

A llama.c model based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c

Vocabulary of 4096 tokens, trained on TinyStories and my custom littlestories dataset (currently unreleased).

The model uses ↨ as a shift key instead of capital letters: an uppercase letter is written as ↨ followed by its lowercase form. This simplifies the tokenizer, which no longer needs duplicate uppercase versions of tokens.

---

To convert normal text to the right format I use:

```
def add_caseifer(text):
    # Using list comprehension for more efficient concatenation
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
```

To convert the text back to human-readable form I use:

```
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i+1 < len(text):
                # Uppercase the character that follows the shift marker
                new_text += text[i+1].upper()
                i += 1
            else:
                pass  # trailing ↨ with nothing after it: skip this index
        else:
            new_text += text[i]
        i += 1
    return new_text
```
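
A quick round-trip check can confirm that the two helpers invert each other on ordinary English text. This is a minimal sketch: both helpers are restated in compact form so the snippet runs on its own, and `round_trips` and the sample string are hypothetical additions used only for illustration.

```python
def add_caseifer(text):
    # Mark each uppercase character with ↨ and store it in lowercase
    return ''.join('↨' + ch.lower() if ch.isupper() else ch for ch in text)


def remove_caseifer(text):
    out = []
    shift = False  # True when the previous character was the ↨ marker
    for ch in text:
        if shift:
            out.append(ch.upper())
            shift = False
        elif ch == '↨':
            shift = True
        else:
            out.append(ch)
    # A trailing ↨ with nothing after it is dropped
    return ''.join(out)


def round_trips(text):
    # Hypothetical helper: True if encoding then decoding returns the input
    return remove_caseifer(add_caseifer(text)) == text


sample = "Once upon a time, Lily met Tom."  # illustrative sample text
print(add_caseifer(sample))
print(round_trips(sample))
```

Text that already contains a literal ↨, or letters whose case mapping is not one-to-one, may not survive the round trip, so it is worth checking inputs from outside the training data.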