Update README.md
README.md
CHANGED
@@ -33,26 +33,40 @@ The model is not designed for complex sentence structures, idiomatic expressions

Users (both direct and downstream) should be aware that the model's accuracy may decline with more complex or less conventional sentence structures. It's recommended to use this model in conjunction with other tools for more comprehensive linguistic analysis.

## Training Details

The model was trained on a curated dataset of simple English sentences annotated with Universal Dependency Parsing tags. The dataset was sourced from the "manupinasco/syntax_analysis" dataset available on Hugging Face's Datasets Hub. The training data focused on ensuring high accuracy in syntactic role assignment, aiming to improve the model's ability to understand and generate syntactically correct responses.

### Training Data

The model was trained on a curated dataset of simple English sentences annotated with Universal Dependency Parsing tags. The training data focused on ensuring high accuracy in syntactic role assignment.

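
For reference, the sketch below shows one way to pull this dataset from the Hub for inspection; the split name and column layout are assumptions, since they are not documented in this card.

```python
# Minimal sketch: loading the training data from the Hugging Face Hub.
# The split name and column layout are assumptions; check the
# "manupinasco/syntax_analysis" dataset card for the actual schema.
from datasets import load_dataset

dataset = load_dataset("manupinasco/syntax_analysis")

train_split = dataset["train"]     # assumed split name
print(train_split.column_names)    # inspect the annotation columns
print(train_split[0])              # one simple sentence with its UD tags
```
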
### Training Procedure

The training procedure involved fine-tuning the unsloth/Meta-Llama-3.1-8B-Instruct model using a custom prompt format inspired by the Alpaca prompt template. The procedure included 4-bit quantization to reduce memory usage and mixed-precision training to leverage GPU capabilities effectively.

Key components of the training process:

- **Model Quantization**: 4-bit quantization was applied to the model to reduce VRAM usage while maintaining performance.
- **Gradient Checkpointing**: Enabled using the "unsloth" mode to save memory during training, which allowed handling longer sequences effectively.
- **Prompt Template**: The model was trained using a structured prompt that provided instructions and expected responses, ensuring consistency and clarity in the tasks presented to the model. A sketch of the model setup and prompt format is shown below.

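
The exact training script is not included in this card. The sketch below illustrates how the pieces described above fit together in a typical Unsloth setup: the 4-bit load, the "unsloth" gradient-checkpointing mode, and the Alpaca-inspired prompt come from the description above, while the LoRA settings, sequence length, and exact prompt wording are assumptions.

```python
# Illustrative sketch of the setup described above (not the authors' exact script).
from unsloth import FastLanguageModel

# 4-bit quantized load of the base model to keep VRAM usage low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,          # assumption; not stated in this card
    load_in_4bit=True,            # 4-bit quantization as described above
)

# LoRA adapters with "unsloth" gradient checkpointing to save memory.
# The rank/alpha/target-module choices here are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Alpaca-inspired prompt; the exact wording used in training is an assumption.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def format_example(example):
    # Hypothetical column names; adapt to the actual dataset schema.
    return {"text": alpaca_prompt.format(**example) + tokenizer.eos_token}
```
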
#### Training Hyperparameters
- **Batch Size**: 2 per device
- **Gradient Accumulation**: 4 steps
- **Warmup Steps**: 5
- **Max Training Steps**: 60
- **Learning Rate**: 2e-4
- **Optimizer**: AdamW with 8-bit quantization
- **Weight Decay**: 0.01
- **LR Scheduler**: Linear
- **Mixed Precision**: fp16 (or bf16 if supported)
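
Continuing the sketch above, these values map onto a `TrainingArguments`/`SFTTrainer` configuration roughly as follows; the use of trl's `SFTTrainer`, the `dataset_text_field` name, and the sequence length are assumptions, while the numeric values are the ones listed above.

```python
# Sketch of a trainer configuration matching the hyperparameters listed above.
# Reuses model, tokenizer, train_split and format_example from the earlier sketches.
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

args = TrainingArguments(
    per_device_train_batch_size=2,      # Batch Size: 2 per device
    gradient_accumulation_steps=4,      # Gradient Accumulation: 4 steps
    warmup_steps=5,                     # Warmup Steps: 5
    max_steps=60,                       # Max Training Steps: 60
    learning_rate=2e-4,                 # Learning Rate: 2e-4
    optim="adamw_8bit",                 # AdamW with 8-bit quantization
    weight_decay=0.01,                  # Weight Decay: 0.01
    lr_scheduler_type="linear",         # LR Scheduler: Linear
    fp16=not is_bfloat16_supported(),   # Mixed Precision: fp16 ...
    bf16=is_bfloat16_supported(),       # ... or bf16 if supported
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_split.map(format_example),  # Alpaca-style "text" field
    dataset_text_field="text",
    max_seq_length=2048,                # assumption, matching the load above
    args=args,
)
trainer.train()
```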
## Evaluation

The model's performance was evaluated using the test split from the same dataset. The evaluation focused on syntactic role assignment accuracy.
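
A hypothetical sketch of that check is shown below; `predict_syntactic_roles` is a placeholder for the model's generation step, and the split and column names are assumptions about the dataset schema.

```python
# Hypothetical sketch of the accuracy computation on the test split.
# predict_syntactic_roles is a placeholder wrapping model.generate;
# the "text"/"labels" column names are assumptions, not taken from this card.
test_split = dataset["test"]            # assumed split name

correct = 0
for example in test_split:
    prediction = predict_syntactic_roles(example["text"])
    correct += int(prediction.strip() == example["labels"].strip())

accuracy = correct / len(test_split)
print(f"Syntactic role assignment accuracy: {accuracy:.2%}")
```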

### Testing Data, Factors & Metrics

#### Testing Data