---
library_name: transformers
datasets:
- s-nlp/EverGreen-Multilingual
language:
- ru
- en
- fr
- de
- he
- ar
- zh
base_model:
- intfloat/multilingual-e5-small
pipeline_tag: text-classification
---
# E5-EG-small
A lightweight multilingual model for temporal classification of questions, fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small).
## Model Details
### Model Description
E5-EG-small (E5 EverGreen - Small) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. This model offers a balanced trade-off between performance and computational efficiency.
- **Model type:** Text Classification
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small)
- **Language(s):** Russian, English, French, German, Hebrew, Arabic, Chinese
- **License:** MIT
### Model Sources
- **Repository:** [GitHub](https://github.com/s-nlp/Evergreen-classification)
- **Paper:** [Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA](https://arxiv.org/abs/2505.21115)
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```
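For quick experiments, the same model can also be used through the high-level `pipeline` API. This is a minimal sketch; note that unless `id2label` is set in the model config, the returned labels will be the generic `LABEL_0`/`LABEL_1` rather than `Immutable`/`Mutable`.
```python
import torch
from transformers import pipeline

# High-level alternative to the manual loop above.
# Label names come from the model config; if id2label is not set there,
# the pipeline returns LABEL_0 / LABEL_1 (index 0 = Immutable, 1 = Mutable above).
classifier = pipeline(
    "text-classification",
    model="s-nlp/E5-EG-small",
    device=0 if torch.cuda.is_available() else -1,
)

print(classifier("Who is the current president of the United States?"))
```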
## Training Details
### Training Data
Trained on the same multilingual dataset as E5-EG-large:
- ~4,000 questions per language
- Balanced class distribution
- Augmented with synthetic and translated data
### Training Procedure
#### Preprocessing
- Identical to E5-EG-large
- Maximum sequence length: 64 tokens
- Multilingual tokenization
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 5e-05
- **Warmup steps:** 300
- **Weight decay:** 0.01
- **Optimizer:** AdamW
- **Loss function:** Focal Loss (γ=2.0, α=0.25) with class weighting (see the sketch after this list)
- **Gradient accumulation steps:** 1
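`transformers` does not ship a focal loss, so one common way to wire it into training is to override `Trainer.compute_loss`. The sketch below is illustrative only, assuming a hypothetical `FocalLossTrainer` subclass and uniform α-weighting; it is not the exact training code used for this model.
```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Illustrative Trainer subclass using focal loss (gamma=2.0, alpha=0.25)."""

    def __init__(self, *args, gamma=2.0, alpha=0.25, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma
        self.alpha = alpha
        # Optional per-class weight tensor; must live on the model's device.
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Per-example cross-entropy, optionally class-weighted
        ce = F.cross_entropy(logits, labels, weight=self.class_weights, reduction="none")
        pt = torch.exp(-ce)  # probability assigned to the true class
        focal = self.alpha * (1.0 - pt) ** self.gamma * ce
        loss = focal.mean()
        return (loss, outputs) if return_outputs else loss
```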
#### Hardware
- **GPU:** Single NVIDIA V100
- **Training time:** ~2 hours
## Evaluation
### Testing Data
Evaluated on the same test sets as E5-EG-large (2,100 samples per language).
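The scores below can be reproduced with a batched-inference loop over the test data followed by an F1 computation. A minimal sketch, assuming the dataset exposes `question` and `label` columns and a `test` split (check the s-nlp/EverGreen-Multilingual dataset card for the actual schema):
```python
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Assumed split/column names; adjust to the actual dataset schema.
test = load_dataset("s-nlp/EverGreen-Multilingual", split="test")

preds, labels = [], []
for start in range(0, len(test), 32):
    batch = test[start:start + 32]  # slicing returns a dict of column lists
    enc = tokenizer(batch["question"], return_tensors="pt",
                    padding=True, truncation=True, max_length=64).to(device)
    with torch.no_grad():
        logits = model(**enc).logits
    preds.extend(logits.argmax(dim=-1).tolist())
    labels.extend(batch["label"])

print(f"Macro F1: {f1_score(labels, preds, average='macro'):.3f}")
```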
### Metrics
#### Per-Language F1 Scores
| Language | F1 Score | Δ vs Large |
|----------|----------|------------|
| English | 0.88 | -0.04 |
| Chinese | 0.87 | -0.04 |
| French | 0.86 | -0.04 |
| German | 0.85 | -0.04 |
| Russian | 0.84 | -0.04 |
| Hebrew | 0.83 | -0.04 |
| Arabic | 0.82 | -0.04 |
#### Class-wise Performance
| Class | Precision | Recall | F1 |
|-------|-----------|--------|-----|
| Immutable | 0.83 | 0.86 | 0.84 |
| Mutable | 0.86 | 0.83 | 0.84 |
### Efficiency Metrics
| Metric | E5-EG-small | E5-EG-large | Improvement |
|--------|-------------|-------------|-------------|
| Parameters | 118M | 560M | 4.7x smaller |
| Model Size (MB) | 471 | 2,240 | 4.8x smaller |
| Inference Time (ms) | 12 | 45 | 3.8x faster |
| Memory Usage (GB) | 0.8 | 3.2 | 4x less |
| Throughput (samples/sec) | 83 | 22 | 3.8x higher |
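Latency and throughput depend heavily on hardware and batch size. A rough way to reproduce these numbers on your own setup is to time repeated batched forward passes, as sketched below (the batch size of 32 and 20 timed runs are arbitrary choices):
```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

batch = ["Who won the latest World Cup?"] * 32  # arbitrary batch size
enc = tokenizer(batch, return_tensors="pt", padding=True,
                truncation=True, max_length=64).to(device)

n_runs = 20
with torch.no_grad():
    model(**enc)  # warm-up pass
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_runs):
        model(**enc)
    if device.type == "cuda":
        torch.cuda.synchronize()
elapsed = time.time() - start

print(f"Throughput: {n_runs * len(batch) / elapsed:.1f} samples/sec")
print(f"Latency per batch: {elapsed / n_runs * 1000:.1f} ms")
```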
## Citation
**BibTeX:**
```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
  title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
  author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
  year={2025},
  eprint={2505.21115},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.21115},
}
```