---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-small
pipeline_tag: text-classification
---

# E5-EG-small
A lightweight multilingual model for temporal classification of questions, fine-tuned from intfloat/multilingual-e5-small.
## Model Details

### Model Description
E5-EG-small (E5 EverGreen - Small) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between performance and computational efficiency.
- Model type: Text Classification
- Base model: intfloat/multilingual-e5-small
- Language(s): Russian, English, French, German, Hebrew, Arabic, Chinese
- License: MIT
### Model Sources
- Repository: GitHub
- Paper: [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```
## Training Details

### Training Data
Same multilingual dataset as E5-EG-large (a loading sketch follows the list):
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data
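The dataset is published on the Hugging Face Hub as `s-nlp/EverGreen-Multilingual` (see the metadata above). A minimal loading sketch is below; the split and column names are not spelled out in this card, so inspect the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the multilingual EverGreen dataset from the Hub
dataset = load_dataset("s-nlp/EverGreen-Multilingual")

# Inspect available splits, columns and a sample row
print(dataset)
first_split = next(iter(dataset.values()))
print(first_split[0])  # column names depend on the dataset schema
```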
### Training Procedure

#### Preprocessing
- Identical to E5-EG-large
- Maximum sequence length: 64 tokens
- Multilingual tokenization
#### Training Hyperparameters
- Training regime: fp16 mixed precision
- Epochs: 10
- Batch size: 32
- Learning rate: 5e-05
- Warmup steps: 300
- Weight decay: 0.01
- Optimizer: AdamW
- Loss function: Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch after this list)
- Gradient accumulation steps: 1
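The focal loss mentioned above down-weights examples the model already classifies confidently, focusing training on the remaining hard cases. A minimal PyTorch sketch with the stated γ=2.0 and α=0.25 follows; it is illustrative only, and the way per-class weights are combined with α here is an assumption rather than the exact training code.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25, class_weights=None):
    """Illustrative focal loss; logits: (batch, num_classes), targets: (batch,)."""
    # Unweighted per-example cross-entropy, used to recover p_t below
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # probability assigned to the true class
    # Down-weight easy examples: (1 - p_t)^gamma -> 0 as p_t -> 1
    loss = alpha * (1.0 - pt) ** gamma * ce
    if class_weights is not None:
        # Hypothetical per-class weighting on top of the focal term
        loss = loss * class_weights[targets]
    return loss.mean()

# Tiny usage example with random logits
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])
print(focal_loss(logits, targets, class_weights=torch.tensor([1.0, 1.2])))
```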
#### Hardware

- GPU: 1× NVIDIA V100
- Training time: ~2 hours
## Evaluation

### Testing Data
Same test sets as E5-EG-large (2100 samples per language).
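The per-language F1 scores below can be reproduced with a simple batched evaluation loop. The sketch here assumes you already hold the questions and gold 0/1 labels for one language as Python lists; the actual test-split layout is not described in this card.

```python
from sklearn.metrics import f1_score
import torch

def evaluate_f1(model, tokenizer, questions, labels, device, batch_size=32):
    """Macro F1 for a list of questions with gold 0/1 labels."""
    model.eval()
    preds = []
    for i in range(0, len(questions), batch_size):
        batch = questions[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=64).to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
        preds.extend(logits.argmax(dim=-1).cpu().tolist())
    return f1_score(labels, preds, average="macro")

# Hypothetical usage, once per language subset of the test set:
# print(evaluate_f1(model, tokenizer, en_questions, en_labels, device))
```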
### Metrics

#### Per-Language F1 Scores
Language | F1 Score | Δ vs Large |
---|---|---|
English | 0.88 | -0.04 |
Chinese | 0.87 | -0.04 |
French | 0.86 | -0.04 |
German | 0.85 | -0.04 |
Russian | 0.84 | -0.04 |
Hebrew | 0.83 | -0.04 |
Arabic | 0.82 | -0.04 |
#### Class-wise Performance
Class | Precision | Recall | F1 |
---|---|---|---|
Immutable | 0.83 | 0.86 | 0.84 |
Mutable | 0.86 | 0.83 | 0.84 |
#### Efficiency Metrics
Metric | E5-EG-small | E5-EG-large | Improvement |
---|---|---|---|
Parameters | 118M | 560M | 4.7x smaller |
Model Size (MB) | 471 | 2,240 | 4.8x smaller |
Inference Time (ms) | 12 | 45 | 3.8x faster |
Memory Usage (GB) | 0.8 | 3.2 | 4x less |
Throughput (samples/sec) | 83 | 22 | 3.8x higher |
## Citation
BibTeX:
```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115},
}
```