AI & ML interests
inviting members to join the donut earth community from all over the donut

takarajordan posted an update 12 days ago

alielfilali01 posted an update 13 days ago
Guys, WTH is "yofo-*"???
Most OpenAI staff associated with the openai/gpt-oss-68911959590a1634ba11c7a4 release are affiliated with dozens of yofo orgs, e.g. yofo-wildflower.
Some HF folks as well.

aldigobbler posted an update 15 days ago
hop on the Hugging Face Minecraft server!
aldigobbler/mc-server
1.20.1 (view info on the Space)
soon to add agents into the game

alielfilali01 authored a paper 3 months ago

aldigobbler posted an update 3 months ago
no AI slop posted here today, I just feel like posting what I did today.
I wrote a little framework for turning multiple dense models (Llama-based) into sparse MoEs. I found it fun; I spent a whole day and a half on it.
code @ https://gist.github.com/cappuch/6a454ec8d2d349a27f9fd84f6ac90554
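The gist above is the actual implementation; as a rough sketch of the general recipe (a learned router in front of expert MLPs lifted from several dense checkpoints; the class names and dimensions here are illustrative, not the gist's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """A Llama-style gated MLP; each donor dense model contributes one of these."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)
    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class SparseMoE(nn.Module):
    """Route each token to its top-k expert MLPs taken from the donor models."""
    def __init__(self, experts, k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        dim = experts[0].gate_proj.in_features
        self.router = nn.Linear(dim, len(experts), bias=False)
        self.k = k
    def forward(self, x):                          # x: (batch, seq, dim)
        flat = x.reshape(-1, x.size(-1))           # (tokens, dim)
        weights = self.router(flat).softmax(-1)    # routing distribution per token
        topw, topi = weights.topk(self.k, dim=-1)
        topw = topw / topw.sum(-1, keepdim=True)   # renormalize over chosen experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            rows, slots = (topi == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                out[rows] += topw[rows, slots].unsqueeze(-1) * expert(flat[rows])
        return out.reshape_as(x)

experts = [SwiGLU(64, 256) for _ in range(4)]  # stand-ins for MLPs copied from 4 dense checkpoints
moe = SparseMoE(experts, k=2)
print(moe(torch.randn(2, 8, 64)).shape)        # torch.Size([2, 8, 64])
```

In a real merge you would copy each checkpoint's MLP weights into its expert and share the attention blocks; the router is the only freshly initialized part.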

takarajordan posted an update 3 months ago
Cool to see the new model lightonai/Reason-ModernColBERT, made with late interaction. I'd love to recreate the dataset to see a proper Apache 2.0 version!
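For anyone new to late interaction: the query and document are encoded into per-token embeddings independently, and they only interact at scoring time via MaxSim. A minimal sketch of that scoring step, with random vectors standing in for real encoder outputs:

```python
import torch
import torch.nn.functional as F

def maxsim_score(q_emb, d_emb):
    """ColBERT-style late interaction: for each query token, take its best-matching
    document token similarity, then sum over query tokens."""
    # q_emb: (num_q_tokens, dim), d_emb: (num_d_tokens, dim), both L2-normalized
    sim = q_emb @ d_emb.T                  # (num_q_tokens, num_d_tokens)
    return sim.max(dim=1).values.sum()     # MaxSim per query token, summed

q = F.normalize(torch.randn(8, 128), dim=-1)
docs = [F.normalize(torch.randn(n, 128), dim=-1) for n in (40, 60)]
print([maxsim_score(q, d).item() for d in docs])  # rank docs by these scores
```

Because documents never see the query at encoding time, their token embeddings can be precomputed and indexed, which is what makes the approach fast at retrieval time.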

alielfilali01 posted an update 4 months ago
Great efforts from the @AtlasIA folks to adapt text-to-image models (Ghibli style) to the Moroccan context.
Read the blog here: https://huggingface.co/blog/atlasia/creating-your-custom-ghibli-text-to-image-model

takarajordan posted an update 4 months ago
Two months in, https://github.com/takara-ai/go-attention has passed 429 stars on GitHub.
We built this library at takara.ai to bring attention mechanisms and transformer layers to Go, in a form that's lightweight, clean, and dependency-free.
We're proud to say that every part of this project reflects what we set out to do.
- Pure Go: no external dependencies, built entirely on the Go standard library
- Core support for DotProductAttention and MultiHeadAttention
- Full transformer layers with LayerNorm, feed-forward networks, and residual connections
- Designed for edge, embedded, and real-time environments where simplicity and performance matter
Thank you to everyone who has supported this so far: the stars, forks, and feedback mean a lot.
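For readers who want the math behind DotProductAttention: it computes softmax(QK^T / sqrt(d))V. A quick NumPy sketch of that operation for reference (just the formula, not the library's Go API):

```python
import numpy as np

def dot_product_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (q_len, k_len)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)            # numerically stable row softmax
    return weights @ v                                   # (q_len, d_v)

q, k, v = (np.random.randn(4, 8) for _ in range(3))
print(dot_product_attention(q, k, v).shape)              # (4, 8)
```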

takarajordan posted an update 5 months ago
AI research over coffee ☕
No abstracts, just bullet points.
Start your day here: https://tldr.takara.ai

takarajordan posted an update 5 months ago
Takara takes 3rd place in the {tech:munich} AI hackathon with Fudeno!
A little over 2 weeks ago @aldigobbler and I set out to create the largest multimodal SVG dataset ever created. We succeeded, and when I was in Munich, Germany, I took it one step further and made an entire app with it!
We fine-tuned Mistral Small, made a Next.js application, and blew some minds, taking 3rd place out of over 100 hackers. So cool!
If you want to see the dataset, please see below.
takara-ai/fudeno-instruct-4M
Post
AraClip is now fully integrated with Hugging Face!
AraClip is a specialized CLIP model created by @pain and optimized for Arabic text-image retrieval tasks.
Try it out:
- model: Arabic-Clip/araclip
- Gradio demo: Arabic-Clip/Araclip-Simplified
- website: https://arabic-clip.github.io/Arabic-CLIP/
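A hypothetical usage sketch, assuming the checkpoint loads through the standard transformers CLIP classes (the model card is the authority on the real loading code; the image file names and the Arabic query below are made up):

```python
# Assumption: Arabic-Clip/araclip exposes a standard transformers CLIP interface.
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("Arabic-Clip/araclip")        # check the model card
processor = CLIPProcessor.from_pretrained("Arabic-Clip/araclip")

images = [Image.open("cat.jpg"), Image.open("dog.jpg")]          # placeholder images
inputs = processor(text=["قطة"], images=images,                  # Arabic for "cat"
                   return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_text     # (1, num_images) text-image similarities
print(logits.softmax(dim=-1))                # retrieval distribution over the images
```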

alielfilali01 posted an update 6 months ago
Arabic LLM Evaluation
A few models join the ranking of https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard today.
The new MistralAI model, Saba, is quite impressive: top 10! Well done @arthurmensch and team.
Sadly, Mistral did not follow its public-weights strategy this time; we hope this changes soon and we get the model with a permissive license.
We added other Mistral models, and apparently we have been sleeping on mistralai/Mistral-Large-Instruct-2411!
Another impressive model that joined the ranking today is ALLaM-AI/ALLaM-7B-Instruct-preview. After a long wait, ALLaM is finally here, and it is impressive given its size!
ALLaM is ranked on OALL/Open-Arabic-LLM-Leaderboard as well.
Post
I have just released a new blogpost about KV caching and its role in inference speedup.
https://huggingface.co/blog/not-lain/kv-caching/
some takeaways:
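In one paragraph: during autoregressive decoding, every past token's key and value vectors are reused at every step, so caching them means each new token only pays for its own projections plus one attention row. A minimal sketch of the idea (the blog post is the full treatment):

```python
import torch

def attend(q, k, v):
    """One head of scaled dot-product attention."""
    w = (q @ k.transpose(-2, -1) / k.size(-1) ** 0.5).softmax(-1)
    return w @ v

# Decoding with a KV cache: instead of re-encoding the whole prefix at every
# step, keep past keys/values and append one new row per generated token.
d, steps = 16, 5
k_cache = torch.empty(0, d)
v_cache = torch.empty(0, d)
for t in range(steps):
    # stand-ins for the new token's Q/K/V projections
    q_t, k_t, v_t = torch.randn(1, d), torch.randn(1, d), torch.randn(1, d)
    k_cache = torch.cat([k_cache, k_t])   # (t+1, d): grows by one row per step
    v_cache = torch.cat([v_cache, v_t])
    out = attend(q_t, k_cache, v_cache)   # attend over all past tokens in O(t)
print(out.shape)                          # torch.Size([1, 16])
```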
Post
we now have more than 2,000 public AI models using ModelHubMixin 🤗
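For anyone who hasn't used it: ModelHubMixin (here via its PyTorch flavour, PyTorchModelHubMixin from huggingface_hub) adds save_pretrained / from_pretrained / push_to_hub to any custom model class. A minimal sketch:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyModel(nn.Module, PyTorchModelHubMixin):
    """Any custom nn.Module; init args become the saved config."""
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.layer = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.layer(x)

model = TinyModel(hidden_size=64)
model.save_pretrained("tiny-model")              # local save: weights + config
# model.push_to_hub("your-username/tiny-model")  # or push straight to the Hub
# reloaded = TinyModel.from_pretrained("tiny-model")
```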
Post
Published a new blogpost.
In this blogpost I go through the transformer architecture, emphasizing how shapes propagate through each layer.
https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways:
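In that spirit, here's a quick shape trace through embedding, multi-head attention, and the MLP (the dimensions are toy values picked for illustration, not the blog's):

```python
import torch
import torch.nn as nn

B, T, d_model, n_heads, d_ff, vocab = 2, 10, 64, 4, 256, 1000
d_head = d_model // n_heads

tokens = torch.randint(0, vocab, (B, T))            # (B, T)
x = nn.Embedding(vocab, d_model)(tokens)            # (B, T, d_model)

qkv = nn.Linear(d_model, 3 * d_model)(x)            # (B, T, 3*d_model)
q, k, v = qkv.chunk(3, dim=-1)                      # each (B, T, d_model)
q = q.view(B, T, n_heads, d_head).transpose(1, 2)   # (B, n_heads, T, d_head)
k = k.view(B, T, n_heads, d_head).transpose(1, 2)
v = v.view(B, T, n_heads, d_head).transpose(1, 2)
att = (q @ k.transpose(-2, -1)) / d_head ** 0.5     # (B, n_heads, T, T)
out = att.softmax(-1) @ v                           # (B, n_heads, T, d_head)
out = out.transpose(1, 2).reshape(B, T, d_model)    # (B, T, d_model): heads merged

h = nn.Linear(d_model, d_ff)(out)                   # (B, T, d_ff): MLP expands...
h = nn.Linear(d_ff, d_model)(h.relu())              # (B, T, d_model): ...projects back
print(h.shape)                                      # torch.Size([2, 10, 64])
```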

alielfilali01 posted an update 7 months ago
3C3H AraGen Leaderboard welcomes deepseek-ai/DeepSeek-V3 and 12 other models (including the late gpt-3.5) to the ranking of the best LLMs in Arabic today!
Observations:
- DeepSeek-V3 ranked 3rd and is the only open model among the top 5!
- A 14B open model (Qwen/Qwen2.5-14B-Instruct) outperforms gpt-3.5-turbo-0125 (from last year). This shows how far we have come in advancing and supporting Arabic presence within the LLM ecosystem!
- Contrary to what is observed on likelihood-accuracy leaderboards (like OALL/Open-Arabic-LLM-Leaderboard), further finetuned models like maldv/Qwentile2.5-32B-Instruct actually decreased performance compared to the original model Qwen/Qwen2.5-32B-Instruct.
It's worth noting that the decrease is statistically insignificant, which implies that, at best, out-of-domain finetuning does not really hurt the capabilities the model acquired during pretraining.
Previous work addressed this (finetuning vs. pretraining), but more investigation is required (any PhDs here? This could be your question ...)
Check out the latest rankings: https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard

alielfilali01 posted an update 8 months ago
~75% on the challenging GPQA with only 40M parameters!
GREAT ACHIEVEMENT! Or is it?
This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", takes the mystery out of many models whose results I personally suspected, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard OALL/Open-Arabic-LLM-Leaderboard.
The authors first trained a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.
Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA), but introduced a distillation loss from the earlier, 'cheated' model.
What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.
This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it's apparently possible to (intentionally or unintentionally) leak test data through this method.
Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)
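For reference, the leakage channel is just the standard knowledge-distillation objective: the student minimizes cross-entropy on its own labels plus a KL term toward the teacher's softened logits. A generic sketch of that loss (standard KD per Hinton et al., not the paper's exact training code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label CE mixed with KL to the teacher's softened distribution.
    If the teacher memorized the benchmark, that knowledge flows to the
    student through the KL term alone, never through the training labels."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                                   # standard T^2 scaling
    return alpha * ce + (1 - alpha) * kl

s = torch.randn(4, 10, requires_grad=True)      # student logits: 4 examples, 10 classes
t = torch.randn(4, 10)                          # frozen 'cheated' teacher logits
y = torch.randint(0, 10, (4,))                  # legitimate hard labels
print(distillation_loss(s, t, y))
```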

takarajordan posted an update 8 months ago
I made an RSS feed for Hugging Face Daily Papers!
Just subscribe here: https://papers.takara.ai/api/feed
It updates every 24 hours, written entirely as a serverless Go script with a Redis cache (to avoid hitting HF all the time).
I'm open-sourcing the code; you can check out my repo and deploy it on Vercel extremely easily!
https://github.com/404missinglink/HF-Daily-Papers-Feeds
thanks to @John6666 @p3nGu1nZz for your early support

alielfilali01 posted an update 8 months ago
Unpopular opinion: open source takes courage!
Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged.
It really requires a high level of knowing WTH you are doing! It's kind of a superpower!
Cheers to the heroes here who see this!

takarajordan posted an update 8 months ago
I'm super excited to release my first open-source text dataset:
WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.
I used the brand-new meta-llama/Llama-3.3-70B-Instruct model to generate this dataset, then put it through some post-processing to clean it and evaluate it for diversity.
I'd appreciate some feedback and thoughts on my new release! Thanks!
takarajordan/WorldScenario_20K
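If you want to poke at it, a minimal loading sketch with the datasets library (the split name "train" is my assumption; check the dataset card for the actual schema):

```python
from datasets import load_dataset

# Assumption: a single "train" split; see the dataset card for the real layout.
ds = load_dataset("takarajordan/WorldScenario_20K", split="train")
print(len(ds))   # expect ~20,000 scenarios
print(ds[0])     # one multi-stakeholder scenario record
```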