
DistilBERT Model for Crop Recommendation Based on Environmental Parameters

This repository contains a fine-tuned DistilBERT model trained for crop recommendation using structured agricultural data. By converting numerical environmental features into text format, the model leverages transformer-based NLP techniques to classify the most suitable crop type.

🌾 Problem Statement

The goal is to recommend the best crop to cultivate from parameters such as soil nutrients and weather conditions. Traditional ML models treat this as a tabular classification problem. Here, we instead serialize each row of tabular data into text and fine-tune an NLP model (DistilBERT) to classify it.


📊 Dataset

  • Source: Crop Recommendation Dataset

  • Features:

    • N: Nitrogen content in soil
    • P: Phosphorus content in soil
    • K: Potassium content in soil
    • Temperature (°C)
    • Humidity (%)
    • pH: soil acidity
    • Rainfall (mm)
  • Target: Crop label (22 crop types)

The dataset is preprocessed by concatenating all numeric features into a single space-separated string, making it suitable for transformer-based tokenization.
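The serialization step can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the column names (`N`, `P`, `K`, `temperature`, `humidity`, `ph`, `rainfall`), the column order and the six-decimal rounding are assumptions chosen so that the example row reproduces the sample input used in "Loading the Model", and `row_to_text` is a hypothetical helper.

```python
import numbers

# Assumed column order: N P K temperature humidity pH rainfall.
# The column names are hypothetical and must match your own data.
FEATURE_ORDER = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]


def row_to_text(row, decimals=6):
    """Serialize one record into a space-separated string for the tokenizer."""
    parts = []
    for name in FEATURE_ORDER:
        value = row[name]
        if isinstance(value, numbers.Integral):
            # Integer nutrient values (N, P, K) are written without decimals.
            parts.append(str(value))
        else:
            # Floats are written with a fixed number of decimals (an assumption).
            parts.append(f"{float(value):.{decimals}f}")
    return " ".join(parts)


example = {
    "N": 90, "P": 42, "K": 43,
    "temperature": 20.879744, "humidity": 82.002744,
    "ph": 6.502985, "rainfall": 202.935536,
}
print(row_to_text(example))  # 90 42 43 20.879744 82.002744 6.502985 202.935536
```

Whatever format is chosen, the same serialization must be used at training and inference time, because the tokenizer sees the exact characters of each number.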


🧠 Model Details

  • Architecture: DistilBERT
  • Tokenizer: DistilBertTokenizerFast
  • Model: DistilBertForSequenceClassification
  • Task Type: Multi-Class Classification (22 classes)
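The classification head is sized by the number of crop classes. In practice the model would be created from the pretrained checkpoint, for example with `DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=22)`. The sketch below instead builds a tiny, randomly initialised DistilBERT so it runs offline; its layer sizes are arbitrary and are not the ones used for this model. It only shows that the output has one logit per crop class.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

NUM_CROPS = 22  # number of crop classes stated in the model details

# Tiny, randomly initialised config (arbitrary sizes, for shape illustration only).
config = DistilBertConfig(
    vocab_size=100, dim=32, n_layers=1, n_heads=2, hidden_dim=64,
    num_labels=NUM_CROPS,
)
model = DistilBertForSequenceClassification(config)
model.eval()  # disable dropout for a deterministic forward pass

# One fake tokenized input of 10 token ids.
input_ids = torch.randint(0, config.vocab_size, (1, 10))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits
print(logits.shape)  # one score per crop class
```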

🔧 Installation

pip install transformers datasets pandas scikit-learn torch

Loading the Model

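from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load model and tokenizer from a local directory containing the saved files
model_path = "model_fp32_dir"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_path)
model = DistilBertForSequenceClassification.from_pretrained(model_path)
model.eval()  # inference mode (from_pretrained already sets this; kept explicit)

# Sample input: N P K temperature humidity pH rainfall, space-separated
sample_text = "90 42 43 20.879744 82.002744 6.502985 202.935536"
inputs = tokenizer(sample_text, return_tensors="pt")

# Predict: the class with the highest logit is the recommended crop
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()
print("Predicted class index:", predicted_class)
# Crop name if label names were stored in the config; otherwise "LABEL_<index>"
print("Predicted label:", model.config.id2label[predicted_class])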
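The code above returns a single crop via `argmax`. If a ranked shortlist is more useful, the logits can be turned into probabilities with a softmax and the top entries taken. The helper below is a hypothetical sketch, not part of this repository; it is demonstrated on hand-made logits for four classes instead of real model output.

```python
import torch


def top_k_crops(logits, k=3):
    """Return (class_index, probability) pairs for the k most likely classes.

    `logits` is a (1, num_classes) tensor such as `outputs.logits`.
    """
    probs = torch.softmax(logits, dim=-1)[0]  # probabilities for the single input
    values, indices = torch.topk(probs, k)    # sorted from most to least likely
    return [(int(i), float(p)) for i, p in zip(indices, values)]


# Hand-made logits for 4 classes, standing in for real model output.
fake_logits = torch.tensor([[0.1, 2.0, -1.0, 1.5]])
for idx, prob in top_k_crops(fake_logits, k=2):
    print(idx, round(prob, 3))
```

With the real model, `top_k_crops(outputs.logits)` would rank all 22 crop classes and return the three most likely.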

📈 Performance Metrics

  • Accuracy: 0.7636
  • Precision: 0.7738
  • Recall: 0.7636
  • F1 Score: 0.7343
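The card does not say how precision, recall and F1 were averaged over the 22 classes. Recall being identical to accuracy (0.7636) is consistent with weighted averaging, because weighted recall always equals accuracy. The sketch below computes metrics that way with scikit-learn; the choice of `average="weighted"` is an assumption, and `classification_metrics` is a hypothetical helper.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def classification_metrics(y_true, y_pred):
    """Accuracy plus weighted precision/recall/F1 (weighting is an assumption)."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: the weighted recall matches the accuracy.
print(classification_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 0]))
```

Weighted F1 can be lower than both weighted precision and recall, as in the reported figures, because it averages per-class F1 scores rather than combining the averaged precision and recall.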

πŸ‹οΈ Fine-Tuning Details

📚 Dataset

The dataset is sourced from the publicly available Crop Recommendation Dataset. It consists of structured features such as:

  • Nitrogen (N)
  • Phosphorus (P)
  • Potassium (K)
  • Temperature (°C)
  • Humidity (%)
  • pH
  • Rainfall (mm)

All numerical features were converted into a single textual input string to be used with the DistilBERT tokenizer. Labels were factorized into class indices for training.
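Label factorization can be sketched with `pandas.factorize`, which assigns integer codes in order of first appearance. The crop names below are a small hypothetical sample, not the dataset's full label set. Keeping the resulting mapping is what allows a predicted index to be turned back into a crop name; passing `id2label`/`label2id` to `from_pretrained` stores it in the model config, though whether the released config contains such a mapping is not stated.

```python
import pandas as pd

# Hypothetical sample of crop labels; the real dataset has 22 distinct values.
labels = pd.Series(["rice", "maize", "rice", "chickpea", "maize"])

# Integer codes follow the order in which each label first appears.
codes, uniques = pd.factorize(labels)
print(codes.tolist())  # [0, 1, 0, 2, 1]
print(list(uniques))   # ['rice', 'maize', 'chickpea']

# Mappings in the form accepted by from_pretrained(..., id2label=..., label2id=...)
id2label = {i: name for i, name in enumerate(uniques)}
label2id = {name: i for i, name in id2label.items()}
print(id2label[2])     # chickpea
```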

The dataset was split using an 80/20 ratio for training and testing.
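An 80/20 split of this kind is commonly done with scikit-learn's `train_test_split`. In the sketch below, the DataFrame is a hypothetical stand-in for the serialized dataset, and `random_state=42` and `stratify` (which keeps class proportions equal in both parts) are assumptions, not documented settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the serialized dataset: one text and one label code per row.
df = pd.DataFrame({
    "text": [f"sample {i}" for i in range(100)],
    "label": [i % 4 for i in range(100)],
})

# 80/20 split; random_state and stratify are assumptions.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
print(len(train_df), len(test_df))  # 80 20
```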


🔧 Training Configuration

  • Epochs: 3
  • Batch size: 8
  • Learning rate: 2e-5
  • Evaluation strategy: epoch
  • Model Base: DistilBERT (distilbert-base-uncased)
  • Framework: Hugging Face Transformers + PyTorch

🔄 Quantization

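Post-training quantization was applied by casting the fine-tuned weights to half precision (FP16) with PyTorch's half() method.
This halves the size of the stored weights and can speed up inference on hardware with fast FP16 support (typically GPUs), usually with only a small impact on accuracy.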
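The size reduction follows from the data types: a float32 parameter takes 4 bytes and a float16 parameter 2, so at roughly 67M parameters the weights shrink from about 268 MB to about 134 MB. The sketch below checks this on a small hypothetical stand-in network; `half()` is a method of every PyTorch `nn.Module`, so the same call applies to `DistilBertForSequenceClassification`.

```python
import torch
from torch import nn


def parameter_bytes(model: nn.Module) -> int:
    """Total memory used by the model's parameters, in bytes."""
    return sum(p.numel() * p.element_size() for p in model.parameters())


# Small stand-in network (hypothetical), not the fine-tuned DistilBERT.
model = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 22))

fp32_bytes = parameter_bytes(model)
model = model.half()  # casts floating-point parameters and buffers to torch.float16
fp16_bytes = parameter_bytes(model)
print(fp32_bytes, fp16_bytes)  # 7768 3884

# For a Transformers model, the cast weights could then be saved with
# save_pretrained(...) to a directory of your choice.
```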

The quantized model can be loaded with:

model = DistilBertForSequenceClassification.from_pretrained("quantized_model_fp16", torch_dtype=torch.float16)
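The FP16 speed-up mainly applies on GPUs. On CPU, half-precision kernels can be slow or missing in some PyTorch builds, so it can be safer to upcast there. The helper below is a hypothetical sketch of that choice and is not part of this repository.

```python
import torch


def prepare_fp16_model(model):
    """Place an FP16 model for inference.

    Uses the GPU in half precision when available; otherwise upcasts to
    float32 and stays on the CPU.
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        model = model.to(device)  # keeps the float16 weights
    else:
        device = torch.device("cpu")
        model = model.float()     # float32 for CPU inference
    model.eval()
    return model, device
```

Tokenized inputs must be moved to the same device, e.g. `inputs = {k: v.to(device) for k, v in tokenizer(sample_text, return_tensors="pt").items()}`.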

Repository Structure

.
├── quantized-model/               # Contains the quantized model files
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── vocab.txt
│   └── special_tokens_map.json
└── README.md                      # Model documentation

Limitations

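  • Serializes tabular data as text: the tokenizer splits numbers into subword pieces, so numeric magnitudes and feature interactions may be captured less well than by models that work on the raw numbers.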
  • Trained on a specific dataset; may not generalize to different regions or conditions.
  • FP16 quantization may slightly reduce accuracy in rare cases.

Contributing

Feel free to open issues or submit pull requests to improve the model or documentation.

Model size: 67M params (Safetensors, tensor type F16)