Burhan PRO

brhnsbn

AI & ML interests

Open Source, Large Vision Models, Community, Responsible AI

Organizations

Miami AI Hub, AI Starter Pack

brhnsbn's activity

reacted to clem's post with 🤗 10 months ago
Introducing gretelai/synthetic_text_to_sql by https://huggingface.co/gretelai

It stands as the largest and most diverse synthetic Text-to-SQL dataset available to date.

The dataset includes:

- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct domains/verticals
- Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
- Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
- Database context, including table and view create statements
- Natural language explanations of what the SQL query is doing
- Contextual tags to optimize model training

Blogpost: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Dataset: gretelai/synthetic_text_to_sql
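
For anyone who wants to poke at the dataset, here is a minimal sketch of loading it with the Hugging Face `datasets` library and sanity-checking the split sizes quoted above. The helper names (`split_fractions`, `count_by`, `load_text_to_sql`) are hypothetical and made up for illustration, and the column name `sql_complexity` mentioned in the usage note is an assumption that should be checked against the dataset card.

```python
from collections import Counter

# Split sizes as quoted in the post (not re-verified against the Hub here).
EXPECTED_SPLITS = {"train": 100_000, "test": 5_851}


def split_fractions(sizes):
    """Return each split's share of the total record count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def count_by(records, column):
    """Count records per value of `column`, skipping rows that lack it."""
    return Counter(row[column] for row in records if column in row)


def load_text_to_sql(split="train"):
    """Download one split from the Hub (needs `pip install datasets` and network access)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("gretelai/synthetic_text_to_sql", split=split)
```

A possible usage, assuming the download succeeds: `train = load_text_to_sql("train")`, then compare `len(train)` with `EXPECTED_SPLITS["train"]`, and call `count_by(train, "sql_complexity")` to see how records spread across complexity levels, if that column exists under that name.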
updated a Space 10 months ago