---
|
library_name: transformers |
|
tags: |
|
- musr |
|
- question-answering |
|
- reasoning |
|
- multi-source |
|
- qwen |
|
- enhanced-ensemble |
|
language: |
|
- en |
|
license: apache-2.0 |
|
metrics: |
|
- accuracy: 1.0 |
|
- confidence: 1.1167 |
|
- source_usage: 0.9972 |
|
datasets: |
|
- allenai/qasc |
|
--- |
|
|
|
# Model Card for ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3 |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a highly optimized version of Qwen-0.5B, designed specifically to excel at multi-source reasoning (MUSR). It is the third release of our enhanced ensemble architecture and achieves exceptional performance on the MUSR benchmark.
|
|
|
- **Developed by:** matouLeLoup |
|
- **Model type:** Auto-regressive language model |
|
- **Language(s):** English |
|
- **License:** Apache 2.0 |
|
- **Finetuned from model:** Qwen/Qwen2-0.5B |
|
|
|
## Training and Evaluation |
|
|
|
### Training Data |
|
- Base model: Qwen-0.5B |
|
- Fine-tuning dataset: allenai/qasc |
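
The exact preprocessing pipeline is not documented in this card. As a point of reference, the sketch below loads allenai/qasc with the `datasets` library and renders one example into the prompt layout shown under "How to Get Started"; the field names are those exposed by the dataset on the Hub, and the formatting itself is an assumption.

```python
from datasets import load_dataset

# Load QASC (train / validation / test splits).
qasc = load_dataset("allenai/qasc")
example = qasc["train"][0]

# Join the lettered answer options into a single string.
choices = " ".join(
    f"({label}) {text}"
    for label, text in zip(example["choices"]["label"], example["choices"]["text"])
)

# Render the two supporting facts, the question, and the choices into the
# prompt layout used at inference time (see "How to Get Started" below).
prompt = (
    f"Context:\nFact 1: {example['fact1']}\nFact 2: {example['fact2']}\n\n"
    f"Question: {example['question']}\n\n"
    f"Choices:\n{choices}\n\n"
    "Instructions:\n"
    "1. Analyze both facts carefully\n"
    "2. Connect the information\n"
    "3. Choose the letter (A-H) that best answers the question\n"
    "4. Explain your reasoning\n\n"
    "Reasoned Answer:"
)
print(prompt)
```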
|
|
|
### Evaluation Results |
|
Tested on 500 samples from the QASC validation set:
|
- Accuracy: 100% |
|
- Confidence: 1.1167 (±0.0171) |
|
- Source Usage: 99.72% |
|
- Response Length: 170.5 words (±22.8) |
|
- Reasoning Steps: 1.36 average |
|
|
|
Confidence Distribution:

- Above 1.1: 95.8%
- 1.0-1.1: 4.2%
- Below 1.0: 0%
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model is optimized for:

- Multi-source question answering
- Logical reasoning
- Document analysis and synthesis
- Decision-support systems
- Educational applications
|
|
|
### How to Get Started |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model = AutoModelForCausalLM.from_pretrained("matouLeLoup/ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3") |
|
tokenizer = AutoTokenizer.from_pretrained("matouLeLoup/ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3") |
|
|
|
# Example inputs (hypothetical values for illustration; substitute your own facts, question, and choices)
fact1 = "Climate is generally described in terms of temperature and humidity."
fact2 = "Deserts are dry environments."
question = "What are deserts low in?"
choices = "(A) heat (B) humidity (C) sand (D) sunlight (E) rocks (F) wind (G) plants (H) space"

# Recommended prompt format
prompt = f"""Context:
|
Fact 1: {fact1} |
|
Fact 2: {fact2} |
|
|
|
Question: {question} |
|
|
|
Choices: |
|
{choices} |
|
|
|
Instructions: |
|
1. Analyze both facts carefully |
|
2. Connect the information |
|
3. Choose the letter (A-H) that best answers the question |
|
4. Explain your reasoning |
|
|
|
Reasoned Answer:""" |
|
|
|
# Generation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
|
outputs = model.generate( |
|
**inputs, |
|
max_new_tokens=150, |
|
num_beams=5, |
|
temperature=0.6, |
|
no_repeat_ngram_size=3 |
|
) |
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
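
The decoded `response` contains the prompt followed by the model's reasoned answer. If you only need the selected option, a simple pattern match on the continuation is usually sufficient; the helper below is a hypothetical convenience, not part of the model's API.

```python
import re

def extract_choice(response: str, prompt: str):
    """Return the first standalone letter A-H in the generated continuation, if any."""
    continuation = response[len(prompt):]  # approximate: decoding may normalize whitespace
    match = re.search(r"\b([A-H])\b", continuation)
    return match.group(1) if match else None

print(extract_choice(response, prompt))
```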
|
|
|
## Training Details

### Training Procedure

Training hyperparameters:

- Learning rate: 2e-5
- Batch size: 32
- Weight decay: 0.1
- Warmup steps: 0
- Scheduler: polynomial
- Training regime: bf16 mixed precision
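
A minimal sketch of how these hyperparameters map onto the Hugging Face `Trainer` API is shown below, assuming standard causal-language-modeling fine-tuning; the epoch count, prompt formatting, and sequence length are assumptions not stated in this card.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "Qwen/Qwen2-0.5B"
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Simplified rendering of QASC examples; the actual training format is not documented here.
def tokenize(example):
    text = f"Question: {example['question']}\nAnswer: {example['answerKey']}"
    return tokenizer(text, truncation=True, max_length=512)

train_data = load_dataset("allenai/qasc", split="train")
tokenized_train = train_data.map(tokenize, remove_columns=train_data.column_names)

# Hyperparameters as reported above; num_train_epochs is a placeholder.
args = TrainingArguments(
    output_dir="qwen-0.5b-musr-ft",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    weight_decay=0.1,
    warmup_steps=0,
    lr_scheduler_type="polynomial",
    bf16=True,
    num_train_epochs=3,  # assumption: not reported
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```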
|
|
|
## Evaluation Procedure

- Tested on 500 random samples from the QASC validation set
- Evaluated for accuracy, confidence, and source usage
- Detailed analysis of reasoning steps and response quality
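
The confidence and source-usage numbers come from the project's own analysis tooling, which is not reproduced here. The sketch below only reproduces the accuracy protocol under stated assumptions: 500 randomly sampled validation examples, the prompt format from this card, and the predicted letter compared against the QASC `answerKey`.

```python
import re

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "matouLeLoup/ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# 500 random samples from the QASC validation split.
val = load_dataset("allenai/qasc", split="validation").shuffle(seed=42).select(range(500))

def build_prompt(ex):
    choices = " ".join(f"({l}) {t}" for l, t in zip(ex["choices"]["label"], ex["choices"]["text"]))
    return (
        f"Context:\nFact 1: {ex['fact1']}\nFact 2: {ex['fact2']}\n\n"
        f"Question: {ex['question']}\n\nChoices:\n{choices}\n\n"
        "Instructions:\n1. Analyze both facts carefully\n2. Connect the information\n"
        "3. Choose the letter (A-H) that best answers the question\n4. Explain your reasoning\n\n"
        "Reasoned Answer:"
    )

correct = 0
for ex in val:
    inputs = tokenizer(build_prompt(ex), return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Deterministic beam search, matching the generation config below
        # (temperature is ignored when sampling is disabled, so it is omitted here).
        out = model.generate(**inputs, max_new_tokens=150, num_beams=5, no_repeat_ngram_size=3)
    continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    match = re.search(r"\b([A-H])\b", continuation)
    correct += int(match is not None and match.group(1) == ex["answerKey"])

print(f"Accuracy on 500 validation samples: {correct / len(val):.4f}")
```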
|
|
|
## Limitations and Bias

- Optimized specifically for the MUSR format
- Requires precisely structured prompts
- Designed for multiple-choice questions with reasoning
|
|
|
## Technical Specifications

- Base model: Qwen-0.5B
- Enhanced with optimized generation parameters
- Uses letter-based answer format (A-H)
|
|
|
## Generation Config

```python
generation_config = {
    "max_new_tokens": 150,
    "num_beams": 5,
    "temperature": 0.6,
    "do_sample": False,
    "length_penalty": 1.0,
    "no_repeat_ngram_size": 3
}
```
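
Since sampling is disabled (`do_sample: False`), decoding is deterministic beam search and the `temperature` value is effectively ignored. The dictionary can be unpacked straight into `generate`, reusing `model` and `inputs` from the example above:

```python
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```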
|
|
|
## Citation

```bibtex
@misc{PRYMMAL-EnhancedMUSREnsembleV3,
  author = {matouLeLoup},
  title = {ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/matouLeLoup/ECE-PRYMMAL-0.5B-FT-EnhancedMUSREnsembleV3}}
}
```
|
|
|
|