---
license: gemma
language:
- ko
library_name: transformers
---

# Model Card for Korean STT Data Error Correction Feedback Model

<!-- Provide a quick summary of what the model is/does. -->

This model provides feedback on Korean STT (speech-to-text) data error correction, using a Gemma 2 model fine-tuned specifically for this task. It evaluates transcribed text and generates constructive feedback to enhance the quality of STT data.

## Model Details

### Model Description

This model is based on the Gemma 2 architecture and has been fine-tuned on a dataset of Korean STT error corrections. It aims to provide useful feedback for improving STT data by evaluating various aspects of the transcribed text and suggesting enhancements.

- **Developed by:** Kray, Yunyoung
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Text Generation
- **Language(s) (NLP):** Korean
- **License:** gemma
- **Finetuned from model [optional]:** Gemma2

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

The model can be used directly to generate feedback on Korean STT error correction, helping enhance the understanding of transcribed counseling conversations.

### Downstream Use [optional]

When integrated into applications or services aimed at improving contact centers or chatbots, the model can enhance user-generated content through feedback and suggestions, as sketched below.
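
As a rough sketch of such an integration (a hypothetical helper, assuming the text-generation pipeline built in the getting-started example further down):

```python
# Hypothetical wrapper: correct one STT utterance inside a contact-center or
# chatbot pipeline. `pipe` is the transformers text-generation pipeline loaded
# as shown in "How to Get Started with the Model" below.
def correct_stt_text(pipe, stt_text: str) -> str:
    messages = [{
        "role": "user",
        "content": (
            "STT 오류가 포함된 텍스트를 올바르게 수정해주세요.\n"  # "Please correct the text containing STT errors."
            f"STT 오류 텍스트: {stt_text}\n"  # "STT error text: ..."
        ),
    }]
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(prompt, do_sample=True, temperature=0.2, top_k=50, top_p=0.95)
    # Keep only the newly generated text after the echoed prompt
    return outputs[0]["generated_text"][len(prompt):].strip()
```
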
### Out-of-Scope Use

The model may not perform well on non-Korean text, and it is not designed for tasks outside the scope of STT error correction.

## Bias, Risks, and Limitations

### Recommendations

Users should be aware that the model's feedback reflects patterns learned from its training data, which might not cover all possible STT scenarios. It is recommended to use the feedback as a guide rather than an absolute measure.

## How to Get Started with the Model

To get started, load the model with the Hugging Face `transformers` library and use it to generate corrections for Korean STT output.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the fine-tuned model and tokenizer for STT error correction
model = AutoModelForCausalLM.from_pretrained("stt-error-correction-model")
tokenizer = AutoTokenizer.from_pretrained("stt-error-correction-tokenizer")

# Create a text generation pipeline using the fine-tuned model and tokenizer
pipe_finetuned = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)

# Example STT output containing a recognition error ("안녕하세요" mis-recognized)
stt_error_text = "안뽕하세요"

# Construct the chat messages asking the model to correct the STT errors
messages = [
    {
        "role": "user",
        "content": (
            "STT 오류가 포함된 텍스트를 올바르게 수정해주세요.\n"  # "Please correct the text containing STT errors."
            f"STT 오류 텍스트: {stt_error_text}\n"  # "STT error text: ..."
        )
    }
]

# Prepare the input prompt using the tokenizer's chat template
prompt = pipe_finetuned.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate the correction by passing the formatted prompt to the pipeline
outputs = pipe_finetuned(
    prompt,
    do_sample=True,          # Enable sampling to generate diverse outputs
    temperature=0.2,         # Low temperature keeps the output focused
    top_k=50,                # Limit the sampling pool to the top 50 tokens
    top_p=0.95,              # Nucleus sampling over the top 95% of probability mass
    add_special_tokens=True  # Include special tokens as per the model's requirements
)

# Print only the generated correction (strip the echoed prompt)
print(outputs[0]["generated_text"][len(prompt):])
```
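
The sampling settings above trade strict determinism for more natural phrasing. If reproducible corrections are needed (for example, when regression-testing a correction pipeline), sampling can be disabled; this reuses `pipe_finetuned` and `prompt` from the example above with only standard `transformers` generation flags:

```python
# Greedy decoding: the same prompt always yields the same correction
outputs = pipe_finetuned(prompt, do_sample=False, max_new_tokens=512)
print(outputs[0]["generated_text"][len(prompt):])
```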