I am considering canceling my Pro subscription because I just discovered that I am limited to only 10 ZeroGPU Spaces I can host on my account. This number should be much higher.
Mohamed Rashad PRO
MohamedRashad
AI & ML interests
Computer Vision, Robotics, Natural Language Processing
Recent Activity
new activity
about 20 hours ago
MohamedRashad/FinePersonas-Lite: Update README.md
liked a model
8 days ago
OmarSamir/EGTTS-V0.1
Organizations
MohamedRashad's activity
reacted to alielfilali01's post
28 days ago
Post
1911
3C3H AraGen Leaderboard welcomes today
deepseek-ai/DeepSeek-V3 and 12 other models (including the late gpt-3.5) to the ranking of the best LLMs in Arabic!
Observations:
- DeepSeek-V3 ranked 3rd and is the only open model among the top 5!
- A 14B open model ( Qwen/Qwen2.5-14B-Instruct) outperforms gpt-3.5-turbo-0125 (from last year). This shows how far we have come in advancing and supporting Arabic within the LLM ecosystem!
- Contrary to what is observed in likelihood-accuracy leaderboards (like OALL/Open-Arabic-LLM-Leaderboard), further finetuned models like maldv/Qwentile2.5-32B-Instruct actually decreased performance compared to the original model Qwen/Qwen2.5-32B-Instruct.
It's worth noting that the decrease is statistically insignificant, which implies that, at best, out-of-domain finetuning does not really hurt the capabilities the model acquired during pretraining.
Previous work has addressed this (finetuning vs. pretraining), but more investigation is required (any PhDs here? This could be your research question ...)
Check out the latest rankings: inceptionai/AraGen-Leaderboard
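A note on the "statistically insignificant" point: one common way to check whether a score gap between two models is meaningful is a paired bootstrap over per-item scores. A minimal sketch with synthetic scores (the numbers below are made up for illustration, not AraGen data):

```python
import random

# Synthetic per-item scores for two models (made-up numbers, not AraGen data).
random.seed(0)
model_a = [random.gauss(0.70, 0.08) for _ in range(200)]
model_b = [random.gauss(0.69, 0.08) for _ in range(200)]

# Paired bootstrap over per-item score differences.
diffs = [a - b for a, b in zip(model_a, model_b)]
n = len(diffs)
boot_means = sorted(
    sum(random.choice(diffs) for _ in range(n)) / n
    for _ in range(2000)
)
ci_low, ci_high = boot_means[50], boot_means[-51]  # ~95% interval

# If the interval contains 0, the gap is not statistically significant.
print(f"95% CI for mean difference: [{ci_low:.4f}, {ci_high:.4f}]")
```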
posted an update
29 days ago
Post
2007
The winners of the Best Paper Award at NeurIPS 2024 (FoundationVision),
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (2404.02905), have just released a new paper called Infinity:
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis (2412.04431)
And I managed to build a space for it so anyone can try it out: MohamedRashad/Infinity
The idea of a text-to-image model using an autoregressive architecture is quite interesting in my opinion.
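To make the autoregressive angle concrete: VAR-style models replace raster-order next-token prediction with next-scale prediction, emitting a whole token map per step at increasing resolutions. A toy illustration of why that shrinks the number of autoregressive steps (the scale schedule here is an assumed example, not the paper's actual one):

```python
# Toy next-scale schedule: side lengths of the token maps (assumed values).
scales = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]

total_tokens = sum(s * s for s in scales)
steps_next_scale = len(scales)   # one AR step per scale (whole map at once)
steps_raster = total_tokens      # classic raster-order AR: one step per token

print(f"{total_tokens} tokens: {steps_next_scale} next-scale steps "
      f"vs {steps_raster} raster steps")
```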
reacted to alielfilali01's post
about 2 months ago
Post
3456
Unpopular opinion: Open Source takes courage to do!
Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!
Cheers to the heroes here who see this!
posted an update
about 2 months ago
Post
2762
For those game developers out there who want a tool to generate 3D assets of different game items: I built something for you.
JeffreyXiang/TRELLIS-image-large +
Qwen/Qwen2.5-72B-Instruct +
Freepik/flux.1-lite-8B-alpha =
MohamedRashad/Game-Items-Generator
Happy building!
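The "A + B + C" recipe above is a three-stage pipeline: an LLM expands the prompt, a text-to-image model renders the item, and an image-to-3D model lifts it to an asset. A hedged sketch of that flow (the function bodies below are stand-in stubs for illustration, not the space's actual code):

```python
def llm_describe(prompt: str) -> str:
    # Stand-in for Qwen/Qwen2.5-72B-Instruct: expand a short prompt
    # into a detailed item description.
    return f"detailed description of {prompt}"

def text_to_image(description: str) -> str:
    # Stand-in for Freepik/flux.1-lite-8B-alpha: render the description.
    return f"image<{description}>"

def image_to_3d(image: str) -> str:
    # Stand-in for JeffreyXiang/TRELLIS-image-large: lift the image to 3D.
    return f"mesh<{image}>"

def generate_game_item(prompt: str) -> str:
    # Compose the three stages in order: LLM -> image -> 3D asset.
    return image_to_3d(text_to_image(llm_describe(prompt)))

print(generate_game_item("iron sword"))
```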
posted an update
2 months ago
Post
1663
A while back I shared this model,
MohamedRashad/arabic-small-nougat, which was a finetune of
facebook/nougat-small for the Arabic language.
Today this humble project has been scaled up with new models, new datasets, a new space, and a new paper.
Check everything through this collection here:
MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e
posted an update
2 months ago
Post
426
For those who want to try out the new
black-forest-labs/FLUX.1-Redux-dev:
You can do this from my latest space, MohamedRashad/Flux-Redux
posted an update
5 months ago
Post
1015
Qwen2.5-72B + Flux-dev + FinePersonas = Grounded Structured Character Generator
Check out my latest project, which uses Qwen/Qwen2.5-72B-Instruct, black-forest-labs/FLUX.1-dev, and MohamedRashad/FinePersonas-Lite to generate different characters in a world of your description.
Try it here: MohamedRashad/Character-Generator
reacted to reach-vb's post
5 months ago
Post
2874
Less than two days ago, Kyutai Labs open-sourced Moshi, a ~7.6B on-device speech-to-speech foundation model, and Mimi, a SoTA streaming speech codec!
The release includes:
1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) ( kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - streaming audio codec that processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) ( kyutai/mimi)
3. Model checkpoints & inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)
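The Mimi numbers imply a striking compression ratio. A quick back-of-envelope check (assuming 16-bit mono PCM as the uncompressed baseline, which is my assumption, not a stated spec):

```python
# Uncompressed baseline: 24 kHz, 16-bit, mono PCM.
raw_kbps = 24_000 * 16 / 1000              # 384 kbps
codec_kbps = 1.1                           # Mimi's stated bandwidth
ratio = raw_kbps / codec_kbps              # compression factor
bits_per_frame = codec_kbps * 1000 / 12.5  # bits per 12.5 Hz frame

print(f"~{ratio:.0f}x compression, {bits_per_frame:.0f} bits per frame")
```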
How does Moshi work?
1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.
2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.
3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.
4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.
Model size & inference:
Moshiko/ka are 7.69B-param models
bf16 ~16 GB VRAM
8-bit ~8 GB VRAM
4-bit ~4 GB VRAM
You can run inference via Candle, PyTorch, or MLX, depending on your hardware.
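The VRAM figures above roughly follow from parameter count times bytes per parameter (weights only; this ignores activations and KV caches, so real usage runs higher):

```python
params = 7.69e9  # Moshiko/ka parameter count

# Weight memory per precision: params * bytes-per-param, in GiB.
for name, bytes_per_param in [("bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB of weights")
```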
The Kyutai team, @adefossez, @lmz, and colleagues are cracked AF; they're bringing some serious firepower to the open-source/science AI scene. Looking forward to what's next!
posted an update
5 months ago
Post
3426
For all the Muslims out there who are interested in the Quran and its tafsir (explanations): this humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer
I hope it helps someone build something nice and useful with it ^_^
reacted to rwightman's post with ❤️
5 months ago
Post
1293
The timm leaderboard ( timm/leaderboard) has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, and two different CPUs, along with some NCHW/NHWC layout and torch.compile (dynamo) variations.
Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)
They are < 0.5M params, insanely fast, and originally intended for unit testing w/ real weights. They have awful ImageNet top-1; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware and you can fine-tune them well on small data. Could be the model you're looking for?
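A rough sanity check on the "classifier is roughly 30-70% of the param count" remark: a 1000-class ImageNet head costs width * 1000 + 1000 parameters, which dominates quickly at these sizes (the feature width and total below are assumed round numbers, not the real test_vit figures):

```python
num_classes = 1000
width = 192           # assumed final feature width
total_params = 0.4e6  # "< 0.5M params"

head_params = width * num_classes + num_classes  # weight matrix + bias
share = head_params / total_params

print(f"head: {head_params} params, ~{share:.0%} of the model")
```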
Happy to be of help ^^
reacted to rwightman's post
6 months ago
Post
2064
The latest timm validation & test set results are now viewable in a leaderboard space:
timm/leaderboard
As of yesterday, I updated all of the results for ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-R, ImageNet-A, and Sketch sets. The csv files can be found in the GH repo https://github.com/huggingface/pytorch-image-models/tree/main/results
Unfortunately the latest benchmark csv files are not yet up to date; there are some gaps in dataset results vs throughput/flop numbers that impact the plots.
h/t to @MohamedRashad for making the first timm leaderboard.
reacted to vilarin's post
6 months ago
Post
4198
Black Forest Labs, BASED!
FLUX.1 is more delightful, with good instruction following.
FLUX.1 dev ( black-forest-labs/FLUX.1-dev) is a 12B-parameter distilled model, second only to Black Forest Labs' state-of-the-art model FLUX.1 pro.
Update - official demo:
black-forest-labs/FLUX.1-dev