ROHITH VENKATA REDDY's picture
7 2

ROHITH VENKATA REDDY

knight7561

AI & ML interests

Deep learning, Autonomous Driving

Recent Activity

Organizations

Hugging Face Discord Community's profile picture

knight7561's activity

replied to chansung's post 9 days ago
reacted to cfahlgren1's post with ❀️ about 2 months ago
view post
Post
3126
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
Β·
upvoted 2 articles 4 months ago
view article
Article

ColPali: Efficient Document Retrieval with Vision Language Models πŸ‘€

By manu β€’
β€’ 186
view article
Article

Training and Finetuning Embedding Models with Sentence Transformers v3

β€’ 171