Harith Zulfaizal
harithzulfaizal
0 followers · 1 following
AI & ML interests
None yet
Recent Activity
published a dataset 22 days ago: harithzulfaizal/fomc-public-chroma
updated a dataset 22 days ago: harithzulfaizal/fomc-public-chroma
reacted to codys12's post 3 months ago:
Introducing bitnet-r1-llama-8b and bitnet-r1-qwen-32b preview! These models are the first successful finetunes to the BitNet architecture using fewer than 1 billion tokens. We discovered that by adding an additional input RMSNorm to each linear layer, you can finetune directly to BitNet with fast convergence to the original model's performance! We are working on a pull request to add this extra RMSNorm for any model.

To test these models now, install this fork of transformers:

```
pip install git+https://github.com/Codys12/transformers.git
```

Then load the models and test:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codys12/bitnet-r1-qwen-32b"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
```

bitnet-r1-llama-8b and bitnet-r1-qwen-32b were trained on ~300M and ~200M tokens of the open-thoughts/OpenThoughts-114k dataset respectively, and were still improving significantly at the end of training. This preview simply demonstrates that the concept works; for future training runs we will leave the lm_head unquantized and align the last hidden state with the original model.

Huge thanks to the team that made this possible: Gavin Childress, Aaron Herbst, Gavin Jones, Jasdeep Singh, Eli Vang, and Keagan Weinstock from the MSOE AI Club.
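As a rough illustration of the idea described in the post (not the fork's actual implementation), the extra input RMSNorm can be pictured as a small wrapper that normalizes activations before each linear layer that will be quantized; the `RMSNorm` and `NormalizedLinear` class names below are hypothetical.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization with a learnable per-channel scale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature vector to unit root-mean-square, then rescale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class NormalizedLinear(nn.Module):
    """Hypothetical wrapper: apply an input RMSNorm before an existing linear layer."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.input_norm = RMSNorm(linear.in_features)
        self.linear = linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(self.input_norm(x))
```

This sketch only shows where the extra normalization sits; in the actual BitNet recipe the wrapped linear's weights would additionally be quantized to ternary values during finetuning.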
Organizations
None yet
harithzulfaizal's datasets (1)
harithzulfaizal/fomc-public-chroma · Updated 22 days ago · 10