---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-small
pipeline_tag: text-classification
---
# E5-EG-small
A lightweight multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small).
## Model Details
### Model Description
E5-EG-small (E5 EverGreen - Small) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between performance and computational efficiency.
- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT
### Model Sources
- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification)
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```
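For quick experiments, the same model can also be used through the high-level `pipeline` API. This is a minimal sketch; note that unless `id2label` is set in the model config, the returned labels will be the generic `LABEL_0`/`LABEL_1` rather than `Immutable`/`Mutable`.
```python
import torch
from transformers import pipeline

# High-level alternative to the manual loop above.
# Label names come from the model config; if id2label is not set there,
# the pipeline returns LABEL_0 / LABEL_1 (index 0 = Immutable, 1 = Mutable above).
classifier = pipeline(
    "text-classification",
    model="s-nlp/E5-EG-small",
    device=0 if torch.cuda.is_available() else -1,
)

print(classifier("Who is the current president of the United States?"))
```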
## Training Details
### Training Data
Trained on the same multilingual dataset as E5-EG-large:
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data
### Training Procedure
#### Preprocessing
- Identical to E5-EG-large
- Maximum sequence length: 64 tokens
- Multilingual tokenization
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 5e-05
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch after this list)
- **Gradient accumulation steps:** 1
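`transformers` does not ship a focal loss, so one common way to wire it into training is to override `Trainer.compute_loss`. The sketch below is illustrative only, assuming a hypothetical `FocalLossTrainer` subclass and uniform α-weighting; it is not the exact training code used for this model.
```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Illustrative Trainer subclass using focal loss (gamma=2.0, alpha=0.25)."""

    def __init__(self, *args, gamma=2.0, alpha=0.25, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma
        self.alpha = alpha
        # Optional per-class weight tensor; must live on the model's device.
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Per-example cross-entropy, optionally class-weighted
        ce = F.cross_entropy(logits, labels, weight=self.class_weights, reduction="none")
        pt = torch.exp(-ce)  # probability assigned to the true class
        focal = self.alpha * (1.0 - pt) ** self.gamma * ce
        loss = focal.mean()
        return (loss, outputs) if return_outputs else loss
```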
#### Hardware
- **GPU:** Single NVIDIA V100
- **Training time:** ~2 hours
## Evaluation
### Testing Data
Evaluated on the same test sets as E5-EG-large (2,100 samples per language).
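The scores below can be reproduced with a batched-inference loop over the test data followed by an F1 computation. A minimal sketch, assuming the dataset exposes `question` and `label` columns and a `test` split (check the s-nlp/EverGreen-Multilingual dataset card for the actual schema):
```python
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Assumed split/column names; adjust to the actual dataset schema.
test = load_dataset("s-nlp/EverGreen-Multilingual", split="test")

preds, labels = [], []
for start in range(0, len(test), 32):
    batch = test[start:start + 32]  # slicing returns a dict of column lists
    enc = tokenizer(batch["question"], return_tensors="pt",
                    padding=True, truncation=True, max_length=64).to(device)
    with torch.no_grad():
        logits = model(**enc).logits
    preds.extend(logits.argmax(dim=-1).tolist())
    labels.extend(batch["label"])

print(f"Macro F1: {f1_score(labels, preds, average='macro'):.3f}")
```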
### Metrics
#### Per-Language F1 Scores
| Language | F1 Score | Δ vs Large |
|----------|----------|------------|
| English | 0.88 | -0.04 |
| Chinese | 0.87 | -0.04 |
| French | 0.86 | -0.04 |
| German | 0.85 | -0.04 |
| Russian | 0.84 | -0.04 |
| Hebrew | 0.83 | -0.04 |
| Arabic | 0.82 | -0.04 |
#### Class-wise Performance
| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Immutable | 0.83 | 0.86 | 0.84 |
| Mutable | 0.86 | 0.83 | 0.84 |
### Efficiency Metrics
| Metric | E5-EG-small | E5-EG-large | Improvement |
|--------|-------------|-------------|-------------|
| Parameters | 118M | 560M | 4.7x smaller |
| Model Size (MB) | 471 | 2,240 | 4.8x smaller |
| Inference Time (ms) | 12 | 45 | 3.8x faster |
| Memory Usage (GB) | 0.8 | 3.2 | 4x less |
| Throughput (samples/sec) | 83 | 22 | 3.8x higher |
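Latency and throughput depend heavily on hardware and batch size. A rough way to reproduce these numbers on your own setup is to time repeated batched forward passes, as sketched below (the batch size of 32 and 20 timed runs are arbitrary choices):
```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

batch = ["Who won the latest World Cup?"] * 32  # arbitrary batch size
enc = tokenizer(batch, return_tensors="pt", padding=True,
                truncation=True, max_length=64).to(device)

n_runs = 20
with torch.no_grad():
    model(**enc)  # warm-up pass
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_runs):
        model(**enc)
    if device.type == "cuda":
        torch.cuda.synchronize()
elapsed = time.time() - start

print(f"Throughput: {n_runs * len(batch) / elapsed:.1f} samples/sec")
print(f"Latency per batch: {elapsed / n_runs * 1000:.1f} ms")
```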
## Citation
**BibTeX:**
```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
  title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
  author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
  year={2025},
  eprint={2505.21115},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.21115},
}
```