  ##### Complexity-based selection

Beyond low data quality, excessive data complexity can also significantly impair learning efficacy. This effect is particularly pronounced in smaller-scale language models, which have limited capacity to process and internalize highly complex patterns. We therefore develop an approach to identify and filter training samples that exceed the model's optimal learning capacity. The complexity of each training sample is measured using the following equation:

$$
\text{C}(x,y) = \lambda_1 \cdot L_{\text{length}} + \lambda_2 \cdot \text{Loss}_{\text{it}}(x, y),
$$

where $\lambda_1$ and $\lambda_2$ are hyperparameters, $L_{\text{length}}$ denotes the length of the instruction, and $\text{Loss}_{\text{it}}(x, y)$ is the instruction-tuning loss computed with the base model:

$$
\text{Loss}_{\text{it}}(x,y) = -\sum_{i=1}^{|y|} \log P(y_i \mid x, y_{1:i-1}),
$$

where $y_i$ denotes the $i$-th token of the output $y$, and $y_{1:i-1}$ denotes the first $i-1$ tokens of $y$.
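
To make the scoring concrete, below is a minimal sketch of one way to compute $\text{Loss}_{\text{it}}$ and $\text{C}$ with the Hugging Face `transformers` library. The model name, the $\lambda$ values, and the choice of token count for $L_{\text{length}}$ are illustrative assumptions rather than details from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"            # placeholder; not the actual base model
LAMBDA_1, LAMBDA_2 = 0.5, 0.5  # hypothetical hyperparameter values

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def instruction_tuning_loss(instruction: str, output: str) -> float:
    """Loss_it(x, y): summed negative log-likelihood of the output tokens,
    conditioned on the instruction, under the base model."""
    prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
    # Note: tokenizing the concatenation is an approximation; token boundaries
    # at the instruction/output seam may shift slightly.
    full_ids = tokenizer(instruction + output, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # mask so only output tokens are scored
    with torch.no_grad():
        mean_nll = model(full_ids, labels=labels).loss  # mean NLL over scored tokens
    n_scored = (labels != -100).sum().item()
    return mean_nll.item() * n_scored  # convert the mean back to a sum


def complexity(instruction: str, output: str) -> float:
    """C(x, y) = lambda_1 * L_length + lambda_2 * Loss_it(x, y)."""
    l_length = len(tokenizer(instruction).input_ids)  # length in tokens (an assumption)
    return LAMBDA_1 * l_length + LAMBDA_2 * instruction_tuning_loss(instruction, output)


print(complexity("Explain gradient descent.", " It iteratively adjusts parameters."))
```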

We implement a complexity-based stratification protocol followed by selective pruning of samples that exceed empirically determined complexity thresholds, as sketched below.
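
The card does not spell out the stratification protocol itself; the sketch below assumes one plausible reading, in which samples are bucketed into complexity quantiles ("strata") and the highest strata are pruned. The function name and its parameters (`n_strata`, `max_stratum`) are hypothetical:

```python
import numpy as np

def prune_by_complexity(samples, scores, n_strata=10, max_stratum=8):
    """Assign each sample to a complexity quantile stratum and keep only
    samples whose stratum index is at most `max_stratum`."""
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_strata + 1))
    # Map each score to its stratum index in [0, n_strata - 1].
    strata = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_strata - 1)
    return [s for s, k in zip(samples, strata) if k <= max_stratum]


# Toy usage: with 4 strata and max_stratum=2, the top-complexity quartile is dropped.
samples = ["easy example", "medium example", "hard example", "very hard example"]
scores = np.array([1.0, 2.5, 7.0, 42.0])  # toy complexity scores C(x, y)
print(prune_by_complexity(samples, scores, n_strata=4, max_stratum=2))
```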