Danielus's picture

Danielus

danielus

·

DanielusG

AI & ML interests

None yet

Recent Activity

reacted to YerbaPage's post with 👀 8 days ago

Curated list of **Repository-level Code Generation** papers & benchmarks! 🔥 Stay ahead with the latest in: ✅ Repo-level Issue Resolution ✅ Repo-level Code Completion ✅ Datasets & Benchmarks 👉 Check it out: https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation 🔥

reacted to m-ric's post with 🔥 11 days ago

𝗣𝗼𝘁𝗲𝗻𝘁𝗶𝗮𝗹 𝗽𝗮𝗿𝗮𝗱𝗶𝗴𝗺 𝘀𝗵𝗶𝗳𝘁 𝗶𝗻 𝗟𝗟𝗠𝘀: 𝗻𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗯𝘆 𝗠𝗲𝘁𝗮 𝗰𝗹𝗮𝗶𝗺𝘀 𝘁𝗵𝗮𝘁 𝘄𝗲 𝗰𝗮𝗻 𝗴𝗲𝘁 𝗿𝗶𝗱 𝗼𝗳 𝘁𝗼𝗸𝗲𝗻𝗶𝘇𝗲𝗿𝘀! 🥳 Current LLMs process text by first splitting it into tokens. They use a module named "tokenizer", that -spl-it-s- th-e- te-xt- in-to- arbitrary tokens depending on a fixed dictionnary. On the Hub you can find this dictionary in a model's files under tokenizer.json. ➡️ This process is called BPE tokenization. It is suboptimal, everyone says it. It breaks text into predefined chunks that often fail to capture the nuance of language. But it has been a necessary evil in language models since their inception. 💥 In Byte Latent Transformer (BLT), Meta researchers propose an elegant solution by eliminating tokenization entirely, working directly with raw bytes while maintaining efficiency through dynamic "patches." This had been tried before with different byte-level tokenizations, but it's the first time that an architecture of this type scales as well as BPE tokenization. And it could mean a real paradigm shift! 👏👏 🏗️ 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Instead of a lightweight tokenizer, BLT has a lightweight encoder that process raw bytes into patches. Then the patches are processed by the main heavy-duty transformers as we do normally (but for patches of bytes instead of tokens), before converting back to bytes. 🧩 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴: Instead of fixed tokens, BLT groups bytes based on their predictability (measured by entropy) - using more compute for complex sequences and efficiently handling simple ones. This allows efficient processing while maintaining byte-level understanding. I hope this breakthrough is confirmed and we can get rid of all the tokenizer stuff, it will make model handling easier! Read their paper here 👉 https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf

reacted to julien-c's post with 🔥 15 days ago

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub TL;DR: - public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible - private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise) docs: https://huggingface.co/docs/hub/storage-limits We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥 cc: @reach-vb @pierric @victor and the HF team

View all activity

Organizations

None yet

danielus's activity

New activity in infly/OpenCoder-8B-Instruct about 2 months ago

FIM task

#2 opened about 2 months ago by

New activity in bartowski/Qwen2.5-Coder-7B-Instruct-GGUF 3 months ago

FIM with Ollama

#2 opened 3 months ago by

New activity in G-reen/gpt5o-reflexion-q-agi-llama-3.1-8b 4 months ago

🚩 Report: Ethical issue(s)

#9 opened 4 months ago by

New activity in lamhieu/ghost-8b-beta-8k 5 months ago

Fail on relatively easy question

#3 opened 5 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 5 months ago

My alternative quantizations.

#16 opened 5 months ago by

Garbage output ?

#30 opened 5 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 5 months ago

[BUG] Gemma2-9b-it evaluation

#849 opened 5 months ago by

New activity in DeepMount00/Qwen2-1.5B-Ita 6 months ago

Fine-tuning this model

#1 opened 6 months ago by

New activity in ExperimentLab/Mistral-ita-7b-Boost 8 months ago

LLama3 8B vs this model

#1 opened 8 months ago by

New activity in TroyDoesAI/Mermaid-Llama-3-8B 8 months ago

Long context

#2 opened 8 months ago by

New activity in TroyDoesAI/MermaidMixtral-3x7b 9 months ago

Best model

#1 opened 9 months ago by

New activity in TheBloke/StableBeluga2-70B-GGML over 1 year ago

Performance of quantified models

#3 opened over 1 year ago by