Upload 13 files
- README.md +76 -35
- all_results.json +11 -11
- eval_results.json +5 -5
- model.safetensors +1 -1
- train_results.json +6 -6
- trainer_state.json +82 -124
- training_args.bin +1 -1
README.md
CHANGED
@@ -4,58 +4,99 @@ license: mit
|
|
4 |
base_model: agentlans/deberta-v3-xsmall-zyda-2
|
5 |
tags:
|
6 |
- generated_from_trainer
|
|
|
|
|
7 |
model-index:
|
8 |
-
- name: deberta-v3-xsmall-zyda-2-sentiment
|
9 |
results: []
|
10 |
---
|
11 |
|
12 |
-
|
13 |
-
should probably proofread and complete it, then remove this comment. -->
|
14 |
|
15 |
-
|
16 |
|
17 |
-
This model is a fine-tuned version of [agentlans/deberta-v3-xsmall-zyda-2](https://huggingface.co/agentlans/deberta-v3-xsmall-zyda-2) on
|
18 |
-
It achieves the following results on the evaluation set:
|
19 |
-
- Loss: 0.0493
|
20 |
-
- Mse: 0.0493
|
21 |
|
22 |
-
|
|
|
23 |
|
24 |
-
|
25 |
|
26 |
-
|
27 |
|
28 |
-
|
29 |
|
30 |
-
|
31 |
|
32 |
-
|
|
|
|
|
33 |
|
34 |
-
##
|
35 |
|
36 |
-
|
37 |
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
- eval_batch_size: 8
|
42 |
-
- seed: 42
|
43 |
-
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
|
44 |
-
- lr_scheduler_type: linear
|
45 |
-
- num_epochs: 3.0
|
46 |
|
47 |
-
|
48 |
|
49 |
-
|
50 |
-
|
51 |
-
|
52 |
-
|
53 |
-
|
54 |
|
55 |
|
56 |
-
|
57 |
|
58 |
-
- Transformers 4.46.3
|
59 |
-
-
|
60 |
-
- Datasets 3.1.0
|
61 |
-
- Tokenizers 0.20.3
|
|
|
4 |
base_model: agentlans/deberta-v3-xsmall-zyda-2
|
5 |
tags:
|
6 |
- generated_from_trainer
|
7 |
+
- sentiment-analysis
|
8 |
+
- twitter-sentiment
|
9 |
model-index:
|
10 |
+
- name: deberta-v3-xsmall-zyda-2-transformed-sentiment-new
|
11 |
results: []
|
12 |
---
|
13 |
|
14 |
+
# DeBERTa-v3-XSmall Sentiment Analysis Model
|
|
|
15 |
|
16 |
+
## Model Overview
|
17 |
|
18 |
+
This model is a fine-tuned version of [agentlans/deberta-v3-xsmall-zyda-2](https://huggingface.co/agentlans/deberta-v3-xsmall-zyda-2) optimized for sentiment analysis on Twitter data. It achieves the following results on the evaluation set:
|
19 |
|
20 |
+
- Loss: 0.0656
- MSE: 0.0656
|
22 |
|
23 |
+
## Dataset
|
24 |
|
25 |
+
The model was trained on the [Twitter Sentiment Meta-Analysis Dataset](https://huggingface.co/datasets/agentlans/twitter-sentiment-meta-analysis).
|
26 |
|
27 |
+
### Dataset Description
|
28 |
|
29 |
+
This dataset contains sentiment analysis results for English tweets collected between September 2009 and January 2010. The tweets were processed and analyzed using 10 different sentiment classifiers, with the final sentiment score derived from principal component analysis (PCA).
|
30 |
|
31 |
+
- **Source**: Cheng-Caverlee-Lee Twitter Scrape (Sept 2009 - Jan 2010)
- **Size**: 138,690 tweets
- **Language**: English only (filtered using langdetect)
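
The README states only that the final score is derived from PCA over 10 classifier outputs; it does not describe the exact procedure. The following is a minimal sketch of one common approach, assuming each classifier's scores are standardized and the composite score is the projection onto the first principal component. The function name, the synthetic data, and the use of power iteration are illustrative assumptions, not the dataset author's pipeline.

```python
import random

def first_principal_component(rows, iterations=200):
    """Return (weights, scores): first PC direction and each row's projection onto it."""
    n, d = len(rows), len(rows[0])
    # Standardize every column (one column per classifier) to zero mean, unit variance.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    # Correlation matrix of the standardized columns.
    cov = [[sum(row[a] * row[b] for row in z) / n for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    scores = [sum(row[j] * w[j] for j in range(d)) for row in z]
    return w, scores

# Synthetic stand-in data (hypothetical): 10 noisy "classifiers" observing one latent sentiment.
rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(200)]
rows = [[s + rng.gauss(0.0, 0.3) for _ in range(10)] for s in latent]

weights, scores = first_principal_component(rows)
print("weights:", [round(x, 3) for x in weights])
print("first scores:", [round(x, 3) for x in scores[:3]])
```

When the classifiers broadly agree, the first component weights them roughly equally, so the composite behaves like a variance-aware average of their standardized outputs.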
|
34 |
|
35 |
+
## Usage
|
36 |
|
37 |
+
Here's an example of how to use the model for sentiment prediction:
|
38 |
|
39 |
+
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "agentlans/deberta-v3-xsmall-zyda-2-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Function to perform inference on a single string
def predict_score(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # The regression head emits one value per input, so .item() yields a float
    return logits.item()

# Example usage
input_text = "I accidentally the whole thing. Is that bad?"
score = predict_score(input_text)
print(f"Predicted score: {score}")
```
|
61 |
+
|
62 |
+
## Example Outputs
|
63 |
+
|
64 |
+
| Text | Sentiment |
|------|----------:|
| Nothing seems to go right, and I'm constantly frustrated. | -2.25 |
| Everything is falling apart, and I can't see any way out. | -2.02 |
| I feel completely overwhelmed by the challenges I face. | -1.62 |
| There are some minor improvements, but overall, things are still tough. | -0.81 |
| I can see a glimmer of hope amidst the difficulties I encounter. | 1.03 |
| Things are starting to look up, and I'm cautiously optimistic. | 2.06 |
| There are many good things happening, and I appreciate them. | 2.23 |
| I'm feeling more positive about my situation than I have in a while. | 2.39 |
| Every day brings new joy and possibilities; I feel truly blessed. | 2.54 |
| Life is full of opportunities, and I'm excited about the future. | 2.56 |
|
76 |
+
|
77 |
+
## Training Procedure
|
78 |
|
79 |
+
### Hyperparameters
|
80 |
+
|
81 |
+
- Learning rate: 5e-05
- Train batch size: 64
- Eval batch size: 8
- Seed: 42
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler: Linear
- Number of epochs: 3.0
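
To make the linear schedule concrete, here is a small sketch that reproduces the step count and learning rates logged in `trainer_state.json`. It assumes zero warmup steps, a single device, and no gradient accumulation; none of these are stated in the README, but the logged values are consistent with them. The helper `linear_lr` is a hypothetical name, and the training-set size of 128,690 comes from `train_results.json`.

```python
import math

def linear_lr(step, base_lr=5e-5, total_steps=6033, warmup_steps=0):
    """Learning rate after `step` optimizer steps under linear warmup and linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# 128,690 training samples at batch size 64; the last partial batch counts as a step.
steps_per_epoch = math.ceil(128690 / 64)
total_steps = 3 * steps_per_epoch

print(steps_per_epoch, total_steps)
for step in (500, 3000, 6000):
    print(step, linear_lr(step, total_steps=total_steps))
```

The values at steps 500 and 6000 can be compared against the `learning_rate` entries in the training log as a sanity check on the assumed schedule.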
|
88 |
+
|
89 |
+
### Training Results
|
90 |
+
|
91 |
+
| Training Loss | Epoch | Step | Validation Loss | MSE |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0792 | 1.0 | 2011 | 0.0871 | 0.0871 |
| 0.0541 | 2.0 | 4022 | 0.0691 | 0.0691 |
| 0.0411 | 3.0 | 6033 | 0.0656 | 0.0656 |
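
Validation loss and MSE coincide because a single-output sequence classification head (`num_labels=1`) is trained as a regression with mean squared error. MSE is in squared score units, so taking the square root gives an error on the same scale as the predicted scores. The sketch below does that arithmetic on the rounded table values; how the resulting RMSE compares with the score range is an interpretation, not a figure reported by the author.

```python
import math

# Validation MSE per epoch, as rounded in the table above.
eval_mse = {1: 0.0871, 2: 0.0691, 3: 0.0656}

# RMSE is the square root of MSE, expressed in score units.
rmse = {epoch: math.sqrt(mse) for epoch, mse in eval_mse.items()}

for epoch, value in rmse.items():
    print(f"epoch {epoch}: MSE={eval_mse[epoch]:.4f}  RMSE={value:.3f}")
```

The final RMSE of about 0.26 can be read against the example outputs, whose scores span roughly -2.3 to 2.6.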
|
96 |
|
97 |
+
## Framework Versions
|
98 |
|
99 |
+
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3
|
all_results.json
CHANGED
@@ -1,15 +1,15 @@
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
-
"eval_loss": 0.
|
4 |
-
"eval_mse": 0.
|
5 |
-
"eval_runtime":
|
6 |
"eval_samples": 10000,
|
7 |
-
"eval_samples_per_second":
|
8 |
-
"eval_steps_per_second":
|
9 |
-
"total_flos":
|
10 |
-
"train_loss": 0.
|
11 |
-
"train_runtime":
|
12 |
-
"train_samples":
|
13 |
-
"train_samples_per_second":
|
14 |
-
"train_steps_per_second": 7.
|
15 |
}
|
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
+
"eval_loss": 0.06556913256645203,
|
4 |
+
"eval_mse": 0.06556913494220615,
|
5 |
+
"eval_runtime": 13.1744,
|
6 |
"eval_samples": 10000,
|
7 |
+
"eval_samples_per_second": 759.049,
|
8 |
+
"eval_steps_per_second": 94.881,
|
9 |
+
"total_flos": 6357984788759040.0,
|
10 |
+
"train_loss": 0.07220485827706652,
|
11 |
+
"train_runtime": 846.782,
|
12 |
+
"train_samples": 128690,
|
13 |
+
"train_samples_per_second": 455.926,
|
14 |
+
"train_steps_per_second": 7.125
|
15 |
}
|
eval_results.json
CHANGED
@@ -1,9 +1,9 @@
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
-
"eval_loss": 0.
|
4 |
-
"eval_mse": 0.
|
5 |
-
"eval_runtime":
|
6 |
"eval_samples": 10000,
|
7 |
-
"eval_samples_per_second":
|
8 |
-
"eval_steps_per_second":
|
9 |
}
|
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
+
"eval_loss": 0.06556913256645203,
|
4 |
+
"eval_mse": 0.06556913494220615,
|
5 |
+
"eval_runtime": 13.1744,
|
6 |
"eval_samples": 10000,
|
7 |
+
"eval_samples_per_second": 759.049,
|
8 |
+
"eval_steps_per_second": 94.881
|
9 |
}
|
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 283345892
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2ca36fde7f77cd9138373636d634d704dc626ed3f64e5adca78c6790760099f0
|
3 |
size 283345892
|
train_results.json
CHANGED
@@ -1,9 +1,9 @@
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
-
"total_flos":
|
4 |
-
"train_loss": 0.
|
5 |
-
"train_runtime":
|
6 |
-
"train_samples":
|
7 |
-
"train_samples_per_second":
|
8 |
-
"train_steps_per_second": 7.
|
9 |
}
|
|
|
1 |
{
|
2 |
"epoch": 3.0,
|
3 |
+
"total_flos": 6357984788759040.0,
|
4 |
+
"train_loss": 0.07220485827706652,
|
5 |
+
"train_runtime": 846.782,
|
6 |
+
"train_samples": 128690,
|
7 |
+
"train_samples_per_second": 455.926,
|
8 |
+
"train_steps_per_second": 7.125
|
9 |
}
|
trainer_state.json
CHANGED
@@ -1,178 +1,136 @@
|
|
1 |
{
|
2 |
-
"best_metric": 0.
|
3 |
-
"best_model_checkpoint": "deberta-v3-xsmall-zyda-2-sentiment/checkpoint-
|
4 |
"epoch": 3.0,
|
5 |
"eval_steps": 500,
|
6 |
-
"global_step":
|
7 |
"is_hyper_param_search": false,
|
8 |
"is_local_process_zero": true,
|
9 |
"is_world_process_zero": true,
|
10 |
"log_history": [
|
11 |
{
|
12 |
-
"epoch": 0.
|
13 |
-
"grad_norm":
|
14 |
-
"learning_rate": 4.
|
15 |
-
"loss": 0.
|
16 |
"step": 500
|
17 |
},
|
18 |
{
|
19 |
-
"epoch": 0.
|
20 |
-
"grad_norm":
|
21 |
-
"learning_rate": 4.
|
22 |
-
"loss": 0.
|
23 |
"step": 1000
|
24 |
},
|
25 |
{
|
26 |
-
"epoch": 0.
|
27 |
-
"grad_norm": 1.
|
28 |
-
"learning_rate":
|
29 |
-
"loss": 0.
|
30 |
"step": 1500
|
31 |
},
|
32 |
{
|
33 |
-
"epoch": 0.
|
34 |
-
"grad_norm": 1.
|
35 |
-
"learning_rate": 3.
|
36 |
-
"loss": 0.
|
37 |
"step": 2000
|
38 |
},
|
39 |
{
|
40 |
-
"epoch": 0
|
41 |
-
"
|
42 |
-
"
|
43 |
-
"
|
44 |
"step": 2500
|
45 |
},
|
46 |
{
|
47 |
-
"epoch":
|
48 |
-
"grad_norm":
|
49 |
-
"learning_rate":
|
50 |
-
"loss": 0.
|
51 |
"step": 3000
|
52 |
},
|
53 |
{
|
54 |
-
"epoch": 1.
|
55 |
-
"
|
56 |
-
"
|
57 |
-
"
|
58 |
-
"eval_samples_per_second": 950.17,
|
59 |
-
"eval_steps_per_second": 118.771,
|
60 |
-
"step": 3143
|
61 |
-
},
|
62 |
-
{
|
63 |
-
"epoch": 1.1135857461024499,
|
64 |
-
"grad_norm": 0.9926055073738098,
|
65 |
-
"learning_rate": 3.144023756495917e-05,
|
66 |
-
"loss": 0.0522,
|
67 |
"step": 3500
|
68 |
},
|
69 |
{
|
70 |
-
"epoch": 1.
|
71 |
-
"grad_norm":
|
72 |
-
"learning_rate":
|
73 |
-
"loss": 0.
|
74 |
"step": 4000
|
75 |
},
|
76 |
{
|
77 |
-
"epoch":
|
78 |
-
"
|
79 |
-
"
|
80 |
-
"
|
81 |
"step": 4500
|
82 |
},
|
83 |
{
|
84 |
-
"epoch":
|
85 |
-
"grad_norm":
|
86 |
-
"learning_rate":
|
87 |
-
"loss": 0.
|
88 |
"step": 5000
|
89 |
},
|
90 |
{
|
91 |
-
"epoch":
|
92 |
-
"grad_norm":
|
93 |
-
"learning_rate":
|
94 |
-
"loss": 0.
|
95 |
"step": 5500
|
96 |
},
|
97 |
{
|
98 |
-
"epoch":
|
99 |
-
"grad_norm": 0.
|
100 |
-
"learning_rate":
|
101 |
"loss": 0.0411,
|
102 |
"step": 6000
|
103 |
},
|
104 |
-
{
|
105 |
-
"epoch": 2.0,
|
106 |
-
"eval_loss": 0.04927213117480278,
|
107 |
-
"eval_mse": 0.049272132016595305,
|
108 |
-
"eval_runtime": 11.3101,
|
109 |
-
"eval_samples_per_second": 884.162,
|
110 |
-
"eval_steps_per_second": 110.52,
|
111 |
-
"step": 6286
|
112 |
-
},
|
113 |
-
{
|
114 |
-
"epoch": 2.068087814190264,
|
115 |
-
"grad_norm": 0.6708300709724426,
|
116 |
-
"learning_rate": 1.5531869763495598e-05,
|
117 |
-
"loss": 0.0387,
|
118 |
-
"step": 6500
|
119 |
-
},
|
120 |
-
{
|
121 |
-
"epoch": 2.2271714922048997,
|
122 |
-
"grad_norm": 0.6490187644958496,
|
123 |
-
"learning_rate": 1.2880475129918337e-05,
|
124 |
-
"loss": 0.0337,
|
125 |
-
"step": 7000
|
126 |
-
},
|
127 |
-
{
|
128 |
-
"epoch": 2.3862551702195356,
|
129 |
-
"grad_norm": 0.7127770185470581,
|
130 |
-
"learning_rate": 1.0229080496341075e-05,
|
131 |
-
"loss": 0.0324,
|
132 |
-
"step": 7500
|
133 |
-
},
|
134 |
-
{
|
135 |
-
"epoch": 2.545338848234171,
|
136 |
-
"grad_norm": 0.6604452133178711,
|
137 |
-
"learning_rate": 7.5776858627638146e-06,
|
138 |
-
"loss": 0.0326,
|
139 |
-
"step": 8000
|
140 |
-
},
|
141 |
-
{
|
142 |
-
"epoch": 2.704422526248807,
|
143 |
-
"grad_norm": 0.5042712092399597,
|
144 |
-
"learning_rate": 4.926291229186552e-06,
|
145 |
-
"loss": 0.0323,
|
146 |
-
"step": 8500
|
147 |
-
},
|
148 |
-
{
|
149 |
-
"epoch": 2.8635062042634427,
|
150 |
-
"grad_norm": 0.573316752910614,
|
151 |
-
"learning_rate": 2.2748965956092908e-06,
|
152 |
-
"loss": 0.0321,
|
153 |
-
"step": 9000
|
154 |
-
},
|
155 |
{
|
156 |
"epoch": 3.0,
|
157 |
-
"eval_loss": 0.
|
158 |
-
"eval_mse": 0.
|
159 |
-
"eval_runtime":
|
160 |
-
"eval_samples_per_second":
|
161 |
-
"eval_steps_per_second":
|
162 |
-
"step":
|
163 |
},
|
164 |
{
|
165 |
"epoch": 3.0,
|
166 |
-
"step":
|
167 |
-
"total_flos":
|
168 |
-
"train_loss": 0.
|
169 |
-
"train_runtime":
|
170 |
-
"train_samples_per_second":
|
171 |
-
"train_steps_per_second": 7.
|
172 |
}
|
173 |
],
|
174 |
"logging_steps": 500,
|
175 |
-
"max_steps":
|
176 |
"num_input_tokens_seen": 0,
|
177 |
"num_train_epochs": 3,
|
178 |
"save_steps": 500,
|
@@ -188,7 +146,7 @@
|
|
188 |
"attributes": {}
|
189 |
}
|
190 |
},
|
191 |
-
"total_flos":
|
192 |
"train_batch_size": 64,
|
193 |
"trial_name": null,
|
194 |
"trial_params": null
|
|
|
1 |
{
|
2 |
+
"best_metric": 0.06556913256645203,
|
3 |
+
"best_model_checkpoint": "deberta-v3-xsmall-zyda-2-transformed-sentiment-new/checkpoint-6033",
|
4 |
"epoch": 3.0,
|
5 |
"eval_steps": 500,
|
6 |
+
"global_step": 6033,
|
7 |
"is_hyper_param_search": false,
|
8 |
"is_local_process_zero": true,
|
9 |
"is_world_process_zero": true,
|
10 |
"log_history": [
|
11 |
{
|
12 |
+
"epoch": 0.2486325211337643,
|
13 |
+
"grad_norm": 2.0000367164611816,
|
14 |
+
"learning_rate": 4.5856124647770596e-05,
|
15 |
+
"loss": 0.2003,
|
16 |
"step": 500
|
17 |
},
|
18 |
{
|
19 |
+
"epoch": 0.4972650422675286,
|
20 |
+
"grad_norm": 2.3387935161590576,
|
21 |
+
"learning_rate": 4.17122492955412e-05,
|
22 |
+
"loss": 0.1052,
|
23 |
"step": 1000
|
24 |
},
|
25 |
{
|
26 |
+
"epoch": 0.7458975634012929,
|
27 |
+
"grad_norm": 1.853918194770813,
|
28 |
+
"learning_rate": 3.7568373943311785e-05,
|
29 |
+
"loss": 0.085,
|
30 |
"step": 1500
|
31 |
},
|
32 |
{
|
33 |
+
"epoch": 0.9945300845350572,
|
34 |
+
"grad_norm": 1.7671293020248413,
|
35 |
+
"learning_rate": 3.342449859108238e-05,
|
36 |
+
"loss": 0.0792,
|
37 |
"step": 2000
|
38 |
},
|
39 |
{
|
40 |
+
"epoch": 1.0,
|
41 |
+
"eval_loss": 0.08709739148616791,
|
42 |
+
"eval_mse": 0.08709739712527088,
|
43 |
+
"eval_runtime": 14.8419,
|
44 |
+
"eval_samples_per_second": 673.767,
|
45 |
+
"eval_steps_per_second": 84.221,
|
46 |
+
"step": 2011
|
47 |
+
},
|
48 |
+
{
|
49 |
+
"epoch": 1.2431626056688214,
|
50 |
+
"grad_norm": 1.0026581287384033,
|
51 |
+
"learning_rate": 2.928062323885298e-05,
|
52 |
+
"loss": 0.0594,
|
53 |
"step": 2500
|
54 |
},
|
55 |
{
|
56 |
+
"epoch": 1.4917951268025857,
|
57 |
+
"grad_norm": 0.9303980469703674,
|
58 |
+
"learning_rate": 2.5136747886623573e-05,
|
59 |
+
"loss": 0.0594,
|
60 |
"step": 3000
|
61 |
},
|
62 |
{
|
63 |
+
"epoch": 1.74042764793635,
|
64 |
+
"grad_norm": 1.7368980646133423,
|
65 |
+
"learning_rate": 2.0992872534394168e-05,
|
66 |
+
"loss": 0.0551,
|
67 |
"step": 3500
|
68 |
},
|
69 |
{
|
70 |
+
"epoch": 1.9890601690701144,
|
71 |
+
"grad_norm": 0.6475295424461365,
|
72 |
+
"learning_rate": 1.684899718216476e-05,
|
73 |
+
"loss": 0.0541,
|
74 |
"step": 4000
|
75 |
},
|
76 |
{
|
77 |
+
"epoch": 2.0,
|
78 |
+
"eval_loss": 0.06912554055452347,
|
79 |
+
"eval_mse": 0.06912553393413896,
|
80 |
+
"eval_runtime": 13.2293,
|
81 |
+
"eval_samples_per_second": 755.898,
|
82 |
+
"eval_steps_per_second": 94.487,
|
83 |
+
"step": 4022
|
84 |
+
},
|
85 |
+
{
|
86 |
+
"epoch": 2.2376926902038785,
|
87 |
+
"grad_norm": 0.6805059909820557,
|
88 |
+
"learning_rate": 1.2705121829935357e-05,
|
89 |
+
"loss": 0.0444,
|
90 |
"step": 4500
|
91 |
},
|
92 |
{
|
93 |
+
"epoch": 2.486325211337643,
|
94 |
+
"grad_norm": 1.3735737800598145,
|
95 |
+
"learning_rate": 8.56124647770595e-06,
|
96 |
+
"loss": 0.043,
|
97 |
"step": 5000
|
98 |
},
|
99 |
{
|
100 |
+
"epoch": 2.734957732471407,
|
101 |
+
"grad_norm": 0.9396611452102661,
|
102 |
+
"learning_rate": 4.417371125476545e-06,
|
103 |
+
"loss": 0.0422,
|
104 |
"step": 5500
|
105 |
},
|
106 |
{
|
107 |
+
"epoch": 2.9835902536051715,
|
108 |
+
"grad_norm": 0.756208062171936,
|
109 |
+
"learning_rate": 2.7349577324714074e-07,
|
110 |
"loss": 0.0411,
|
111 |
"step": 6000
|
112 |
},
|
113 |
{
|
114 |
"epoch": 3.0,
|
115 |
+
"eval_loss": 0.06556913256645203,
|
116 |
+
"eval_mse": 0.06556913494220615,
|
117 |
+
"eval_runtime": 13.2288,
|
118 |
+
"eval_samples_per_second": 755.924,
|
119 |
+
"eval_steps_per_second": 94.491,
|
120 |
+
"step": 6033
|
121 |
},
|
122 |
{
|
123 |
"epoch": 3.0,
|
124 |
+
"step": 6033,
|
125 |
+
"total_flos": 6357984788759040.0,
|
126 |
+
"train_loss": 0.07220485827706652,
|
127 |
+
"train_runtime": 846.782,
|
128 |
+
"train_samples_per_second": 455.926,
|
129 |
+
"train_steps_per_second": 7.125
|
130 |
}
|
131 |
],
|
132 |
"logging_steps": 500,
|
133 |
+
"max_steps": 6033,
|
134 |
"num_input_tokens_seen": 0,
|
135 |
"num_train_epochs": 3,
|
136 |
"save_steps": 500,
|
|
|
146 |
"attributes": {}
|
147 |
}
|
148 |
},
|
149 |
+
"total_flos": 6357984788759040.0,
|
150 |
"train_batch_size": 64,
|
151 |
"trial_name": null,
|
152 |
"trial_params": null
|
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 5368
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:9c472a73a883ba5245b32b70e114642c495e951ce29acca84c258c8a402b2a81
|
3 |
size 5368
|