Manel ALOUI

Manel-Hik

AI & ML interests

NLP, recommender systems, machine learning

Recent Activity

liked a Space 9 days ago
franciszzj/Leffa
updated a dataset about 2 months ago
OALL/ALRAGE

Organizations

🤗 Course Team
AI Law Assistant
LangChainDatasets
FreedomAI
fastai X Hugging Face Group 2022
Arabic Machine Learning
Open Arabic LLM Leaderboard
Data Is Better Together Contributor

Manel-Hik's activity

reacted to joylarkin's post with 🚀 4 months ago
💬 Chat as a way to query SQL! The Airtrain AI team is happy to share a new Hugging Face Space that lets you interact with Hugging Face Hub datasets using a natural language chatbot. 🤗

Start Exploring 👉 airtrain-ai/hf-dataset-chat-to-sql

This Space is forked from davidberenstein1957/text-to-sql-hub-datasets by @davidberenstein1957 and features chat capability with improved table naming. The tool works with Hugging Face's recently released in-browser DuckDB-based SQL query engine for datasets.



reacted to Salama1429's post with 👍 5 months ago
📚 Introducing the 101 Billion Arabic Words Dataset

🌍 Exciting Milestone in Arabic Language Technology! #NLP #ArabicLLM #LanguageModels

🚀 Why It Matters:
1. 🌟 Large Language Models (LLMs) have brought transformative changes, primarily in English. It's time for Arabic to shine!
2. 🎯 This project addresses the critical challenge of bias in Arabic LLMs due to reliance on translated datasets.

🔍 Approach:
1. 💪 Undertook a massive data mining initiative focusing exclusively on Arabic from Common Crawl WET files.
2. 🧹 Employed state-of-the-art cleaning and deduplication processes to maintain data quality and uniqueness.

📈 Impact:
1. 🏆 Created the largest Arabic dataset to date with 101 billion words.
2. 📝 Enables the development of Arabic LLMs that are linguistically and culturally accurate.
3. 🌍 Sets a global benchmark for future Arabic language research.


🔗 Paper: https://lnkd.in/dGAiaygn
🔗 Dataset: https://lnkd.in/dGTMe5QV

- 🔄 Share your thoughts and let's drive the future of Arabic NLP together!

#DataScience #MachineLearning #ArtificialIntelligence #Innovation #ArabicData
New activity in silma-ai/silma-ar-custom-eval 5 months ago

Technical Report

#2 opened 5 months ago by Manel-Hik
reacted to alielfilali01's post with 🤗 8 months ago
I'm officially considered #gpu_poor 💀
But I'm #data_rich 😎
upvoted an article 8 months ago

Introducing the Open Arabic LLM Leaderboard

upvoted an article 9 months ago

🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets

By dvilasuero