Commit a614d7e (verified) by yxdyc · 1 Parent(s): a9af24d

Update README.md

  1. README.md +9 -4
README.md CHANGED
@@ -8,14 +8,19 @@ pinned: false
  ---
 
  ## Interaction
- Data-Juicer is A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs!
+ Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs!
 
- Data-Juicer 是一个一站式数据处理系统,可以使数据质量更高、更丰富、更易被大语言模型"消化"!
+ Data-Juicer 是一个一站式数据处理系统,为大模型提供更高质量、更丰富、更易”消化“的数据!
 
  ## News
- - [2024-02-20] We have actively maintained an awesome list of LLM-Data, welcome to [visit](docs/awesome_llm_data.md) and contribute!
+ - [2024-07-17] We utilized the Data-Juicer [Sandbox Laboratory Suite](https://github.com/modelscope/data-juicer/blob/main/docs/Sandbox.md) to systematically optimize data and models through a co-development workflow between data and models, achieving a new top spot on the [VBench](https://huggingface.co/spaces/Vchitect/VBench_Leaderboard) text-to-video leaderboard. The related achievements have been compiled and published in a [paper](http://arxiv.org/abs/2407.11784), and the model has been released on the [ModelScope](https://modelscope.cn/models/Data-Juicer/Data-Juicer-T2V) and [HuggingFace](https://huggingface.co/datajuicer/Data-Juicer-T2V) platforms.
+ - [2024-07-12] Our *awesome list of MLLM-Data* has evolved into a systemic [survey](https://arxiv.org/abs/2407.08583) from model-data co-development perspective. Welcome to [explore](docs/awesome_llm_data.md) and contribute!
+ - [2024-06-01] ModelScope-Sora "Data Directors" creative sprint—Our third data-centric LLM competition has kicked off! Please visit the competition's [official website](https://tianchi.aliyun.com/competition/entrance/532219) for more information.
+ - [2024-03-07] We release **Data-Juicer [v0.2.0](https://github.com/alibaba/data-juicer/releases/tag/v0.2.0)** now!
+ In this new version, we support more features for **multimodal data (including video now)**, and introduce **[DJ-SORA](docs/DJ_SORA.md)** to provide open large-scale, high-quality datasets for SORA-like models.
+ - [2024-02-20] We have actively maintained an *awesome list of LLM-Data*, welcome to [visit](docs/awesome_llm_data.md) and contribute!
  - [2024-02-05] Our paper has been accepted by SIGMOD'24 industrial track!
  - [2024-01-10] Discover new horizons in "Data Mixture"—Our second data-centric LLM competition has kicked off! Please visit the competition's [official website](https://tianchi.aliyun.com/competition/entrance/532174) for more information.
  - [2024-01-05] We release **Data-Juicer v0.1.3** now!
- In this new version, we support **more Python versions** (3.7-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
+ In this new version, we support **more Python versions** (3.8-3.10), and support **multimodal** dataset [converting](tools/multimodal/README.md)/[processing](docs/Operators.md) (Including texts, images, and audios. More modalities will be supported in the future).
  Besides, our paper is also updated to [v3](https://arxiv.org/abs/2309.02033).