---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-large-instruct
pipeline_tag: text-classification
---
# E5-EG-large

A multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct).

## Model Details

### Model Description

E5-EG-large (E5 EverGreen - Large) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between performance and computational efficiency.

- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT

### Model Sources

- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification)
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)


## How to Get Started with the Model

```python
from transformers import pipeline

# Load the text-classification pipeline
model_name = "s-nlp/E5-EverGreen-Multilingual-Large"
pipe = pipeline("text-classification", model=model_name)

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?",
    "How old is Elon Musk?",
    "How old was Leo Tolstoy when he died?",
]

# Classify all questions in a single batch
results = pipe(questions)

# Each result contains the predicted label and its confidence score
for question, result in zip(questions, results):
    print(f"{question} -> {result['label']} ({result['score']:.3f})")

```
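
If you need direct access to the class probabilities, the model can also be loaded with the lower-level `AutoModelForSequenceClassification` API. The snippet below is a minimal sketch, not taken from this card: the label names are read from the checkpoint's `id2label` mapping rather than assumed.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "s-nlp/E5-EverGreen-Multilingual-Large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

question = "Who is the current president of France?"

# Tokenize with the same 64-token limit used during training
inputs = tokenizer(question, truncation=True, max_length=64, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
# Label names come from the checkpoint's config (e.g. mutable / immutable)
print(model.config.id2label[pred_id], float(probs[pred_id]))
```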

## Training Details

### Training Data

Same multilingual dataset as E5-EG-small:
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data

### Training Procedure

#### Preprocessing
- Identical to E5-EG-small
- Maximum sequence length: 64 tokens
- Multilingual tokenization (see the sketch below)
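
The exact preprocessing code is not included in this card; the following is a minimal sketch of how the tokenization step might look with `datasets` and the base tokenizer. The column names (`question`, `label`) of s-nlp/EverGreen-Multilingual are assumptions made for illustration.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# One multilingual tokenizer shared across all seven languages
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large-instruct")

# Column name "question" is assumed here, not documented in this card
dataset = load_dataset("s-nlp/EverGreen-Multilingual")

def preprocess(batch):
    # Truncate/pad every question to the 64-token maximum used in training
    return tokenizer(batch["question"], truncation=True, padding="max_length", max_length=64)

tokenized = dataset.map(preprocess, batched=True)
```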

#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Epochs:** 10 
- **Batch size:** 32 
- **Learning rate:** 5e-05 
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch below)
- **Gradient accumulation steps:** 1
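
The training code itself is not part of this card; the following is a minimal PyTorch sketch of a class-weighted focal loss with the stated γ=2.0 and α=0.25, included only to make the loss definition concrete.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights=None, gamma=2.0, alpha=0.25):
    """Class-weighted focal loss: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t)."""
    # Unweighted per-example cross-entropy equals -log(p_t)
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    focal = alpha * (1.0 - p_t) ** gamma * ce
    if class_weights is not None:
        # Scale each example by the weight of its true class
        focal = focal * class_weights[targets]
    return focal.mean()

# Example: 2 classes (immutable / mutable) with class weighting
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
weights = torch.tensor([1.0, 1.2])
print(focal_loss(logits, targets, class_weights=weights))
```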

#### Hardware
- **GPUs:** Single NVIDIA V100
- **Training time:** ~8 hours

## Evaluation

### Testing Data

Same test sets as E5-EG-small (2,100 samples per language).


### Metrics

#### Overall Performance
| Metric | Score |
|--------|-------|
| Overall F1 | 0.89 |
| Overall Accuracy | 0.88 |

#### Per-Language F1 Scores
| Language | F1 Score |
|----------|----------|
| English | 0.92 |
| Chinese | 0.91 |
| French | 0.90 |
| German | 0.89 |
| Russian | 0.88 |
| Hebrew | 0.87 |
| Arabic | 0.86 |

#### Class-wise Performance
| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Immutable | 0.87 | 0.90 | 0.88 |
| Mutable | 0.90 | 0.87 | 0.88 |

### Model Comparison

| Model | Parameters | Overall F1 | Inference Time (ms) |
|-------|------------|------------|---------------------|
| E5-EG-large | 560M | 0.89 | 45 |
| E5-EG-small | 118M | 0.85 | 12 |
| mDeBERTa-base | 278M | 0.87 | 28 |
| mBERT | 177M | 0.85 | 20 |

## Citation

**BibTeX:**

```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA}, 
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115}, 
}
```