|
--- |
|
library_name: transformers |
|
datasets: |
|
- s-nlp/EverGreen-Multilingual |
|
language: |
|
- ru |
|
- en |
|
- fr |
|
- de |
|
- he |
|
- ar |
|
- zh |
|
base_model: |
|
- intfloat/multilingual-e5-small |
|
pipeline_tag: text-classification |
|
--- |
|
# E5-EG-small |
|
|
|
A lightweight multilingual model that classifies whether a question's answer changes over time (temporally mutable) or stays fixed (immutable), fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small).
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
E5-EG-small (E5 EverGreen - Small) is a multilingual text classification model that determines whether a question's answer is temporally mutable (can change over time) or immutable (remains true indefinitely). It trades a small drop in accuracy relative to E5-EG-large for a substantially lower computational cost.
|
|
|
- **Model type:** Text Classification |
|
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) |
|
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese |
|
- **License:** MIT |
|
|
|
### Model Sources |
|
|
|
- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification) |
|
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115) |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
import torch |
|
import time |
|
|
|
# Load model and tokenizer |
|
model_name = "s-nlp/E5-EG-small" |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForSequenceClassification.from_pretrained(model_name) |
|
|
|
# For optimal performance, use GPU if available |
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
model = model.to(device) |
|
model.eval() |
|
|
|
# Batch classification example |
|
questions = [ |
|
"What is the capital of France?", |
|
"Who won the latest World Cup?", |
|
"What is the speed of light?", |
|
"What is the current Bitcoin price?" |
|
] |
|
|
|
# Tokenize all questions |
|
inputs = tokenizer( |
|
questions, |
|
return_tensors="pt", |
|
padding=True, |
|
truncation=True, |
|
max_length=64 |
|
).to(device) |
|
|
|
# Classify |
|
start_time = time.time() |
|
with torch.no_grad(): |
|
outputs = model(**inputs) |
|
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1) |
|
predicted_classes = torch.argmax(predictions, dim=-1) |
|
|
|
inference_time = (time.time() - start_time) * 1000 # ms |
|
|
|
# Display results |
|
class_names = ["Immutable", "Mutable"] |
|
for i, question in enumerate(questions): |
|
print(f"Q: {question}") |
|
print(f" Classification: {class_names[predicted_classes[i].item()]}") |
|
print(f" Confidence: {predictions[i][predicted_classes[i]].item():.2f}") |
|
|
|
print(f"\nTotal inference time: {inference_time:.2f}ms") |
|
print(f"Average per question: {inference_time/len(questions):.2f}ms") |
|
``` |
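
For quick experiments, the model can also be used through the high-level `pipeline` API. The snippet below is a minimal sketch: if the model config does not define an `id2label` mapping, the returned labels appear as `LABEL_0`/`LABEL_1`, which correspond to Immutable/Mutable as in the example above.

```python
from transformers import pipeline

# Minimal sketch using the pipeline API (CPU by default; pass device=0 to use the first GPU).
classifier = pipeline("text-classification", model="s-nlp/E5-EG-small")

# Labels may come back as "LABEL_0"/"LABEL_1" (Immutable/Mutable) if id2label is not set in the config.
results = classifier([
    "Who is the current UN Secretary-General?",
    "How many continents are there?"
])
print(results)
```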
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on the same multilingual dataset as E5-EG-large:
|
- ~4,000 questions per language |
|
- Balanced class distribution |
|
- Augmented with synthetic and translated data |
|
|
|
### Training Procedure |
|
|
|
#### Preprocessing |
|
- Identical to E5-EG-large |
|
- Maximum sequence length: 64 tokens |
|
- Multilingual tokenization |
|
|
|
#### Training Hyperparameters |
|
- **Training regime:** fp16 mixed precision |
|
- **Epochs:** 10 |
|
- **Batch size:** 32 |
|
- **Learning rate:** 5e-05 |
|
- **Warmup steps:** 300 |
|
- **Weight decay:** 0.01 |
|
- **Optimizer:** AdamW |
|
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting; a minimal sketch follows this list
|
- **Gradient accumulation steps:** 1 |
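
Focal loss is not a built-in `Trainer` loss, so training requires a custom loss function. Below is a minimal sketch of a focal loss with the γ and α values listed above and optional per-class weights; it is illustrative only and may differ from the exact implementation used for training.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, alpha=0.25, class_weights=None):
    """Simplified focal-loss sketch (Lin et al., 2017) with a scalar alpha."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    loss = -alpha * (1.0 - pt) ** gamma * log_pt  # down-weight easy, well-classified examples
    if class_weights is not None:                 # optional per-class weighting
        loss = loss * class_weights[labels]
    return loss.mean()

# Toy usage (hypothetical): in practice this would replace the default loss in a custom Trainer.
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
print(focal_loss(logits, labels))
```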
|
|
|
#### Hardware |
|
- **GPU:** Single NVIDIA V100
|
- **Training time:** ~2 hours |
|
|
|
## Evaluation |
|
|
|
### Testing Data |
|
|
|
Evaluation uses the same held-out test sets as E5-EG-large (2,100 samples per language).
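
The per-language F1 scores reported below can be approximated with a script along the following lines. This is a hedged sketch: the split name, column names (`question`, `label`, `language`), and integer label encoding are assumptions that should be checked against the [s-nlp/EverGreen-Multilingual](https://huggingface.co/datasets/s-nlp/EverGreen-Multilingual) dataset card.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed split and column names -- verify against the dataset card.
dataset = load_dataset("s-nlp/EverGreen-Multilingual", split="test")

tokenizer = AutoTokenizer.from_pretrained("s-nlp/E5-EG-small")
model = AutoModelForSequenceClassification.from_pretrained("s-nlp/E5-EG-small").eval()

def predict(questions, batch_size=32):
    preds = []
    for i in range(0, len(questions), batch_size):
        batch = tokenizer(questions[i:i + batch_size], return_tensors="pt",
                          padding=True, truncation=True, max_length=64)
        with torch.no_grad():
            preds.extend(model(**batch).logits.argmax(dim=-1).tolist())
    return preds

for lang in sorted(set(dataset["language"])):
    subset = dataset.filter(lambda x: x["language"] == lang)
    f1 = f1_score(subset["label"], predict(subset["question"]), average="macro")  # averaging choice is an assumption
    print(f"{lang}: F1 = {f1:.2f}")
```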
|
|
|
### Metrics |
|
|
|
#### Per-Language F1 Scores |
|
| Language | F1 Score | Δ vs E5-EG-large |
|
|----------|----------|------------| |
|
| English | 0.88 | -0.04 | |
|
| Chinese | 0.87 | -0.04 | |
|
| French | 0.86 | -0.04 | |
|
| German | 0.85 | -0.04 | |
|
| Russian | 0.84 | -0.04 | |
|
| Hebrew | 0.83 | -0.04 | |
|
| Arabic | 0.82 | -0.04 | |
|
|
|
#### Class-wise Performance |
|
| Class | Precision | Recall | F1 | |
|
|-------|-----------|--------|-----| |
|
| Immutable | 0.83 | 0.86 | 0.84 | |
|
| Mutable | 0.86 | 0.83 | 0.84 | |
|
|
|
### Efficiency Metrics |
|
|
|
| Metric | E5-EG-small | E5-EG-large | Improvement | |
|
|--------|-------------|-------------|-------------| |
|
| Parameters | 118M | 560M | 4.7x smaller | |
|
| Model Size (MB) | 471 | 2,240 | 4.8x smaller | |
|
| Inference Time (ms) | 12 | 45 | 3.8x faster | |
|
| Memory Usage (GB) | 0.8 | 3.2 | 4x less | |
|
| Throughput (samples/sec) | 83 | 22 | 3.8x higher | |
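
The inference-time and throughput figures above can be approximated with a simple batched benchmark such as the sketch below; absolute numbers depend on the GPU, batch size, and precision used.

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Illustrative throughput benchmark: repeat a fixed batch and count samples per second.
questions = ["Who won the latest World Cup?"] * 32
inputs = tokenizer(questions, return_tensors="pt", padding=True,
                   truncation=True, max_length=64).to(device)

with torch.no_grad():
    model(**inputs)  # warm-up pass

n_runs = 20
start = time.time()
with torch.no_grad():
    for _ in range(n_runs):
        model(**inputs)
elapsed = time.time() - start
print(f"Throughput: {n_runs * len(questions) / elapsed:.1f} samples/sec")
```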
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex |
|
@misc{pletenev2025truetomorrowmultilingualevergreen, |
|
title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA}, |
|
author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii}, |
|
year={2025}, |
|
eprint={2505.21115}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2505.21115}, |
|
} |
|
``` |
|
|