---
title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
---

# Fine-tuned Emotion Model Checkpoint for Climate Disinformation Classification

## Model Description

This is a lightweight RoBERTa model fine-tuned for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. It starts from the michellejieli/emotion_text_classifier checkpoint, a lightweight RoBERTa model originally fine-tuned for emotion classification, and is further trained to recognize climate disinformation claims.

### Intended Use

- **Primary intended uses**: Classification of climate disinformation claims and comparison against other models in the challenge
- **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
- **Out-of-scope use cases**: Not intended for production use or real-world classification tasks

## Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:

- Size: ~6,000 examples
- Split: 80% train, 20% test
- 8 categories of climate disinformation claims

### Labels

0. No relevant claim detected
1. Global warming is not happening
2. Not caused by humans
3. Not bad or beneficial
4. Solutions harmful/unnecessary
5. Science is unreliable
6. Proponents are biased
7. Fossil fuels are needed

## Performance

This model is a fine-tuned version of michellejieli/emotion_text_classifier on the competition dataset. It achieves the following results on the evaluation set:

- Loss: 0.2828
- F1: 0.7879
- ROC AUC: nan
- Hamming loss: 0.1039

### Model Architecture

A lightweight RoBERTa encoder (the michellejieli/emotion_text_classifier checkpoint) with a sequence classification head over the 8 label categories.

## Training Procedure

The labels were encoded with a binarizer, the text was tokenized with the base checkpoint's tokenizer, and the emotion classifier was chosen as a seemingly suitable checkpoint to start from. A preprocessing and training sketch is given at the end of this card.

### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 4

### Framework Versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:

- Carbon emissions during inference (gCO2eq)
- Energy consumption during inference (Wh)

This tracking helps establish a baseline for the environmental impact of model deployment and inference; a tracking sketch is given at the end of this card.

## Limitations

- Trained only on the ~6,000-example challenge dataset
- ROC AUC could not be computed on the evaluation set (reported as nan)
- Evaluated only on the challenge's held-out split
- Not suitable for real-world applications without further validation

## Ethical Considerations

- Dataset contains sensitive topics related to climate disinformation
- Model predictions should not be treated as authoritative judgments of disinformation
- Environmental impact is tracked to promote awareness of AI's carbon footprint
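
## Code Sketches

The snippets below illustrate the pipeline described in this card. They are minimal sketches, not the exact competition scripts: the dataset column names (`quote`, `label`), the maximum sequence length, and the output paths are assumptions and may need adjusting.

The first sketch covers the preprocessing described in the Training Procedure section: loading the QuotaClimat/frugalaichallenge-text-train dataset, splitting it 80/20, binarizing the labels, and tokenizing the text with the base checkpoint's tokenizer.

```python
# Preprocessing sketch. Column names "quote" and "label" are assumptions;
# adjust them to the dataset's actual schema.
from datasets import load_dataset
from sklearn.preprocessing import LabelBinarizer
from transformers import AutoTokenizer

dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train", split="train")
dataset = dataset.train_test_split(test_size=0.2, seed=42)  # 80% train / 20% test

tokenizer = AutoTokenizer.from_pretrained("michellejieli/emotion_text_classifier")

# One-hot encode the 8 label categories; this matches the multi-label style
# metrics (F1, Hamming loss) reported above.
binarizer = LabelBinarizer()
binarizer.fit(dataset["train"]["label"])

def preprocess(batch):
    enc = tokenizer(batch["quote"], truncation=True, padding="max_length", max_length=256)
    enc["labels"] = binarizer.transform(batch["label"]).astype("float32").tolist()
    return enc

encoded = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)
```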
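
The second sketch wires the hyperparameters from the Training Hyperparameters section into the Hugging Face `Trainer`. The multi-label problem type, the metric implementations, and the output directory are assumptions layered on top of what the card states.

```python
# Fine-tuning sketch with the hyperparameters listed above.
# Metric wiring and output paths are assumptions, not the exact training script.
import numpy as np
from sklearn.metrics import f1_score, hamming_loss
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "michellejieli/emotion_text_classifier",
    num_labels=8,
    problem_type="multi_label_classification",  # assumption: matches the binarized labels
    ignore_mismatched_sizes=True,               # the emotion head has a different label count
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = (1 / (1 + np.exp(-logits)) > 0.5).astype(int)  # sigmoid + 0.5 threshold
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "hamming": hamming_loss(labels, preds),
    }

args = TrainingArguments(
    output_dir="climate-disinfo-model",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    lr_scheduler_type="linear",
    seed=42,
    optim="adamw_torch",  # AdamW with betas=(0.9, 0.999) and epsilon=1e-08 (the defaults)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
```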
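
The last sketch shows one way to track inference emissions with CodeCarbon, as mentioned in the Environmental Impact section. The model path is a placeholder; CodeCarbon natively reports kg CO2eq, so a conversion is applied to match the units quoted above.

```python
# Inference emissions tracking sketch with CodeCarbon.
from codecarbon import EmissionsTracker
from transformers import pipeline

classifier = pipeline("text-classification", model="climate-disinfo-model")  # hypothetical path

tracker = EmissionsTracker()
tracker.start()
predictions = classifier(["Climate change is just a natural cycle."])
emissions_kg = tracker.stop()  # CodeCarbon returns emissions in kg CO2eq

print(predictions)
print(f"Estimated emissions: {emissions_kg * 1000:.3f} g CO2eq")
```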