Merge pull request #23 from Sunwood-ai-labs/translate-readme-12027616857

docs/README.en.md CHANGED (+38 -27)

</p>

<h2 align="center">
Llama model fine-tuning experiment environment
</h2>

<p align="center">

## 🚀 Project Overview

**Llama-finetune-sandbox** provides an experimental environment for learning and verifying Llama model fine-tuning. You can try various fine-tuning methods, customize models, and evaluate performance. It caters to a wide range of users, from beginners to researchers. Version 0.6.0 includes updated documentation and the implementation of an LLM evaluation system. This system automatically assesses the quality of LLM responses and generates detailed evaluation reports.

## ✨ Main Features

1. **Various Fine-tuning Methods:**
   - LoRA (Low-Rank Adaptation)
   - QLoRA (Quantized LoRA)

2. **Flexible Model Settings:**
   - Customizable maximum sequence length
   - Various quantization options
   - Multiple attention mechanisms (a configuration sketch covering items 1 and 2 follows this list)

3. **Experiment Environment Setup:**
   - Optimized memory usage
   - Visualization of experimental results

4. **Context-Aware Reflective QA Generation System:**
   - Generates high-quality Q&A datasets from Wikipedia data.
   - Uses LLMs to generate context-aware questions and answers, automatically evaluate quality, and iteratively improve them.
   - Employs a reflective approach, quantifying factuality, question quality, and answer completeness to enable iterative improvements.
   - Provides comprehensive code and explanations covering environment setup, model selection, data preprocessing, Q&A pair generation, quality evaluation, and the improvement process.
   - Uses libraries such as `litellm`, `wikipedia`, and `transformers`.
   - Generated Q&A pairs are saved in JSON format and can be easily uploaded to the Hugging Face Hub.

5. **LLM Evaluation System:**
   - Automatically evaluates the quality of LLM responses.
   - Evaluates questions, model answers, and LLM responses on a 4-level scale, generating detailed evaluation reports.
   - Features error handling, retry functionality, logging, customizable evaluation criteria, and report generation in CSV and HTML formats.
   - Also includes functionality for uploading to the Hugging Face Hub.

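
As a rough illustration of features 1 and 2, the sketch below loads a Llama model in 4-bit (QLoRA-style) with a configurable maximum sequence length and attaches LoRA adapters via `unsloth`. The model name and hyperparameters are illustrative choices, not values prescribed by this repository.

```python
# Minimal sketch (assumes the `unsloth` package): load a 4-bit quantized Llama
# model with a custom maximum sequence length, then attach LoRA adapters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # illustrative model choice
    max_seq_length=2048,                         # customizable sequence length
    load_in_4bit=True,                           # QLoRA-style 4-bit loading
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                        # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```
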
## 🔧 Usage

Please refer to the notebooks in this repository.

## 📦 Installation Instructions

Refer to `requirements.txt` and install the necessary packages.

## 📚 Examples

This repository includes the following examples:

### Fast Fine-tuning using Unsloth

- Fast fine-tuning implementation for Llama-3.2-1B/3B models (see the training sketch below)
- → See [`Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning_JP.md`](sandbox/Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning_JP.md) for details.
- → [Use this to convert from markdown to notebook format](https://huggingface.co/spaces/MakiAi/JupytextWebUI)
- [📒Notebook here](https://colab.research.google.com/drive/1AjtWF2vOEwzIoCMmlQfSTYCVgy4Y78Wi?usp=sharing)

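
A minimal training sketch for this example, assuming `unsloth`, `trl`, and `datasets` are installed. The toy dataset, hyperparameters, and model name are illustrative; depending on your `trl` version, `dataset_text_field` and `max_seq_length` may need to be passed via `SFTConfig` instead of directly to `SFTTrainer`.

```python
# Minimal sketch (assumes `unsloth`, `trl`, `datasets`): a tiny supervised
# fine-tuning run on a toy text dataset. All names and values are illustrative.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Toy dataset with the single "text" column expected by SFTTrainer.
dataset = Dataset.from_dict({"text": [
    "### Question: What is LoRA?\n### Answer: A parameter-efficient fine-tuning method.",
]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # newer trl versions move this to SFTConfig
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        max_steps=10,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```
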
### Fast Inference using Unsloth

- Fast inference implementation for Llama-3.2 models (see the generation sketch below)
- → See [`Unsloth_inference_llama3-2.md`](sandbox/Unsloth_inference_llama3-2.md) for details.
- → Implementation of efficient inference processing for Llama-3.2 models using Unsloth
- [📒Notebook here](https://colab.research.google.com/drive/1FkAYiX2fbGPTRUopYw39Qt5UE2tWJRpa?usp=sharing)

- Fast inference implementation for LLM-JP models
- → See [`Unsloth_inference_llm_jp.md`](sandbox/Unsloth_inference_llm_jp.md) for details.
- → Implementation and performance optimization of fast inference processing for Japanese LLMs
- [📒Notebook here](https://colab.research.google.com/drive/1lbMKv7NzXQ1ynCg7DGQ6PcCFPK-zlSEG?usp=sharing)

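
A minimal generation sketch, assuming the `unsloth` package; the model name, prompt, and generation settings are illustrative.

```python
# Minimal sketch (assumes `unsloth`): load a Llama-3.2 model in 4-bit and run
# one chat-style generation in the faster inference mode.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to the faster inference path

messages = [{"role": "user", "content": "Summarize LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
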
### Efficient Model Operation using Ollama and LiteLLM

- Setup and operation guide on Google Colab (see the client sketch below)
- → See [`efficient-ollama-colab-setup-with-litellm-guide.md`](sandbox/efficient-ollama-colab-setup-with-litellm-guide.md) for details.
- [📒Notebook here](https://colab.research.google.com/drive/1buTPds1Go1NbZOLlpG94VG22GyK-F4GW?usp=sharing)

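
A minimal client sketch, assuming a locally running Ollama server with a pulled model and the `litellm` package; the model name and endpoint are illustrative.

```python
# Minimal sketch (assumes a local Ollama server and `litellm`): call an
# Ollama-served model through LiteLLM's OpenAI-style interface.
from litellm import completion

response = completion(
    model="ollama/llama3.2",               # illustrative Ollama-served model
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
    api_base="http://localhost:11434",     # default Ollama endpoint
)
print(response.choices[0].message.content)
```
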
### Q&A Dataset Generation from Wikipedia Data (Sentence Pool QA Method)

- High-quality Q&A dataset generation using the Sentence Pool QA method (see the chunking sketch below)
- → A new dataset creation method that generates Q&A pairs while preserving context by pooling sentences separated by periods.
- → Chunk size can be flexibly adjusted (default 200 characters) to generate Q&A pairs with optimal context ranges for various applications.
- → See [`wikipedia-qa-dataset-generator.md`](sandbox/wikipedia-qa-dataset-generator.md) for details.
- [📒Notebook here](https://colab.research.google.com/drive/1mmK5vxUzjk3lI6OnEPrQqyjSzqsEoXpk?usp=sharing)

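
A minimal chunking sketch of the sentence-pooling idea, assuming the `wikipedia` package; the page title, language, and the 200-character target are illustrative, and the actual notebook then passes each chunk to an LLM to generate Q&A pairs.

```python
# Minimal sketch (assumes the `wikipedia` package): fetch an article and pool
# period-separated sentences into ~200-character chunks, the context units the
# Sentence Pool QA method works on.
import wikipedia

wikipedia.set_lang("en")
text = wikipedia.page("Llama (language model)").content  # illustrative page

def pool_sentences(text: str, chunk_size: int = 200) -> list[str]:
    """Greedily pool sentences into chunks of roughly chunk_size characters."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}.".strip() if current else sentence + "."
        if len(candidate) > chunk_size and current:
            chunks.append(current)          # close the current pool
            current = sentence + "."
        else:
            current = candidate             # keep pooling context
    if current:
        chunks.append(current)
    return chunks

chunks = pool_sentences(text, chunk_size=200)
print(len(chunks), chunks[0])
```
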
### Context-Aware Reflective QA Generation System

- Q&A dataset generation with reflective quality improvement (see the improvement-loop sketch below)
- → A new method that automatically evaluates the quality of generated Q&A pairs and iteratively improves them.
- → Quantifies factuality, question quality, and answer completeness for evaluation.
- → Uses contextual information for high-accuracy question generation and answer consistency checks.
- → See [`context_aware_Reflexive_qa_generator_V2.md`](sandbox/context_aware_Reflexive_qa_generator_V2.md) for details.
- [📒Notebook here](https://colab.research.google.com/drive/1OYdgAuXHbl-0LUJgkLl_VqknaAEmAm0S?usp=sharing)

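
A minimal sketch of the generate-evaluate-improve loop, assuming `litellm`; the prompts, model name, scoring scale, and threshold are illustrative stand-ins for the notebook's actual logic, and real output parsing would need to be more defensive.

```python
# Minimal sketch (assumes `litellm`): generate a Q&A pair from a context chunk,
# score it for factuality, question quality, and answer completeness, and
# regenerate once if any score falls below an illustrative threshold.
import json
from litellm import completion

MODEL = "ollama/llama3.2"  # illustrative; any LiteLLM-supported model works

def ask(prompt: str) -> str:
    resp = completion(model=MODEL, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def generate_qa(context: str) -> dict:
    raw = ask(
        "From the following context, write one question and its answer as JSON "
        'with keys "question" and "answer". Reply with JSON only.\n\n'
        f"Context:\n{context}"
    )
    return json.loads(raw)  # assumes the model returns valid JSON

def evaluate_qa(context: str, qa: dict) -> dict:
    raw = ask(
        "Score this Q&A pair against the context on a 0-1 scale for factuality, "
        "question_quality, and answer_completeness. Reply with JSON only.\n\n"
        f"Context:\n{context}\n\nQ&A:\n{json.dumps(qa)}"
    )
    return json.loads(raw)

context = "LoRA adds small trainable low-rank matrices to a frozen model."
qa = generate_qa(context)
scores = evaluate_qa(context, qa)
if min(scores.values()) < 0.8:   # reflect and retry once if any score is low
    qa = generate_qa(context)
print(qa, scores)
```
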
### LLM Evaluation System (LLMs as a Judge)

- Advanced quality evaluation system utilizing LLMs as evaluators (see the judging sketch below)
- → Automatically evaluates questions, model answers, and LLM responses on a 4-level scale.
- → Robust design with error handling and retry functionality.
- → Generates detailed evaluation reports in CSV and HTML formats.
- → See [`LLMs_as_a_Judge_TOHO_V2.md`](sandbox/LLMs_as_a_Judge_TOHO_V2.md) for details.
- [📒Notebook here](https://colab.research.google.com/drive/1Zjw3sOMa2v5RFD8dFfxMZ4NDGFoQOL7s?usp=sharing)

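
A minimal judging sketch, assuming `litellm` and `pandas`; the judge model, prompt wording, retry policy, and the sample row are illustrative.

```python
# Minimal sketch (assumes `litellm`, `pandas`): ask a judge model to rate an LLM
# response against a reference answer on a 1-4 scale, retry on failure, and
# write the results to a CSV report.
import time
import pandas as pd
from litellm import completion

JUDGE_MODEL = "ollama/llama3.2"  # illustrative judge model

def judge(question: str, reference: str, response: str, retries: int = 3) -> int:
    prompt = (
        "Rate the response against the reference answer on a 1-4 scale "
        "(4 = fully correct and complete). Reply with the number only.\n\n"
        f"Question: {question}\nReference: {reference}\nResponse: {response}"
    )
    for attempt in range(retries):
        try:
            reply = completion(model=JUDGE_MODEL,
                               messages=[{"role": "user", "content": prompt}])
            return int(reply.choices[0].message.content.strip()[0])
        except Exception:
            time.sleep(2 ** attempt)  # simple backoff before retrying
    return 0  # 0 marks an evaluation failure

rows = [{
    "question": "What does QLoRA add on top of LoRA?",
    "reference": "4-bit quantization of the frozen base model.",
    "response": "It quantizes the base model to 4 bits before LoRA training.",
}]
for row in rows:
    row["score"] = judge(row["question"], row["reference"], row["response"])

pd.DataFrame(rows).to_csv("evaluation_report.csv", index=False)
```
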
## 🆕 Latest Information (v0.6.0)

- **Implementation of the LLM Evaluation System:** Added a system that automatically evaluates the quality of LLM responses. Questions, model answers, and LLM responses are compared and evaluated on a 4-level scale. Features error handling, retry functionality, logging, customizable evaluation criteria, and report generation in CSV and HTML formats.
- Added information about the LLM evaluation system to README.md