davidberenstein1957 (HF staff) committed on
Commit 84677f5 · 1 Parent(s): 904d1fd

update readme on env var usage

Files changed (1):
  1. README.md +3 -4
README.md CHANGED
@@ -28,13 +28,12 @@ hf_oauth_scopes:
 
 ## Introduction
 
-Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs.
+Synthetic Data Generator is a tool that allows you to create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and LLMs to generate synthetic data tailored to your specific needs. [The announcement blog](https://huggingface.co/blog/synthetic-data-generator) goes over a practical example of how to use it.
 
 Supported Tasks:
 
 - Text Classification
-- Supervised Fine-Tuning
-- Judging and rationale evaluation
+- Chat Data for Supervised Fine-Tuning
 
 This tool simplifies the process of creating custom datasets, enabling you to:
 
@@ -87,7 +86,7 @@ Optionally, you can use different models and APIs.
 
 - `BASE_URL`: The base URL for any OpenAI compatible API, e.g. `https://api-inference.huggingface.co/v1/`, `https://api.openai.com/v1/`.
 - `MODEL`: The model to use for generating the dataset, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`, `gpt-4o`.
-- `API_KEY`: The API key to use for the corresponding API, e.g. `hf_...`, `sk-...`.
+- `API_KEY`: The API key to use for the generation API, e.g. `hf_...`, `sk-...`. If not provided, it will default to the provided `HF_TOKEN` environment variable.
 
 Optionally, you can also push your datasets to Argilla for further curation by setting the following environment variables:
 
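As a sketch of how the environment variables touched by this commit might be set, the snippet below uses the documented names (`BASE_URL`, `MODEL`, `API_KEY`, `HF_TOKEN`) with placeholder values; the `${API_KEY:-$HF_TOKEN}` expansion mirrors the fallback behavior described in the updated README, though the app itself handles that fallback internally.

```shell
# Hypothetical setup for an OpenAI-compatible endpoint; all values are placeholders.
export BASE_URL="https://api-inference.huggingface.co/v1/"
export MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
export HF_TOKEN="hf_xxx"  # placeholder Hugging Face token

# API_KEY is optional: when unset, generation falls back to HF_TOKEN.
export API_KEY="${API_KEY:-$HF_TOKEN}"
```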