# BERT-Based Language Classification Model

This repository contains a fine-tuned BERT-based model for classifying text by language. The model identifies the language of a given sentence and was trained with the Hugging Face Transformers library. It supports post-training dynamic quantization for smaller, faster inference in deployment environments.

---

## Model Details

- **Model Name:** BERT Base for Language Classification
- **Model Architecture:** BERT Base
- **Task:** Language Identification
- **Dataset:** Custom dataset of multilingual text samples
- **Quantization:** Dynamic quantization (INT8)
- **Fine-tuning Framework:** Hugging Face Transformers

---

## Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Fine-tuned Model

```python
from transformers import pipeline

# Load the model and tokenizer from the saved directory
classifier = pipeline("text-classification", model="./saved_model", tokenizer="./saved_model")

# Example input
text = "Bonjour, comment allez-vous?"  # French: "Hello, how are you?"

# Get prediction
prediction = classifier(text)
print(f"Prediction: {prediction}")
```

---

## Saving and Testing the Model

### Saving

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_checkpoint = "bert-base-uncased"  # replace with your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)

# Save model and tokenizer
model.save_pretrained("./saved_model")
tokenizer.save_pretrained("./saved_model")
```

### Testing

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="./saved_model", tokenizer="./saved_model")

text = "Ceci est un exemple de texte."  # French: "This is an example text."
print(classifier(text))
```

---

## Quantization

### Apply Dynamic Quantization

```python
import os

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("./saved_model")

# Apply dynamic quantization to all Linear layers
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# save_pretrained does not reliably round-trip packed INT8 weights,
# so save the quantized state dict with torch.save instead
os.makedirs("./quantized_model", exist_ok=True)
torch.save(quantized_model.state_dict(), "./quantized_model/pytorch_model.bin")
```

### Load and Test Quantized Model

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("./saved_model")

# Rebuild the float architecture, re-apply dynamic quantization,
# then load the saved INT8 weights into it
model = AutoModelForSequenceClassification.from_pretrained("./saved_model")
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
quantized_model.load_state_dict(torch.load("./quantized_model/pytorch_model.bin"))

classifier = pipeline("text-classification", model=quantized_model, tokenizer=tokenizer)

text = "Hola, ¿cómo estás?"  # Spanish: "Hello, how are you?"
print(classifier(text))
```

---

## Repository Structure

```
.
├── saved_model/                  # Fine-tuned model
├── quantized_model/              # Quantized model (INT8 state dict)
├── language-clasification.ipynb  # Training notebook
└── README.md                     # Documentation
```

---

## Limitations

- Model performance may vary for languages that are low-resource or underrepresented in the training dataset.
- Quantization may slightly reduce accuracy but improves inference speed and model size; a rough way to measure this trade-off is sketched at the end of this README.

---

## Contributing

Feel free to submit issues or pull requests that improve performance or accuracy, or add support for new languages.

---
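## Measuring the Quantization Trade-off

Dynamic quantization shrinks the Linear-layer weights to INT8 and typically speeds up CPU inference at a small cost in accuracy. The sketch below is one way to check the size and latency difference on your own machine. It assumes the `./saved_model` directory from the steps above; the helper functions, temporary file name, and sample sentence are illustrative, not part of the repository.

```python
import os
import time

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./saved_model")
float_model = AutoModelForSequenceClassification.from_pretrained("./saved_model")
float_model.eval()

# Fresh dynamic quantization of the same float model (equivalent to the
# artifact saved in ./quantized_model, since the conversion is deterministic)
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8
)
quantized_model.eval()


def state_dict_size_mb(model):
    """Serialize the state dict to a temporary file and report its size in MB."""
    torch.save(model.state_dict(), "tmp_weights.pt")
    size = os.path.getsize("tmp_weights.pt") / 1e6
    os.remove("tmp_weights.pt")
    return size


def mean_latency_ms(model, text, runs=20):
    """Average CPU forward-pass time over several runs, in milliseconds."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.inference_mode():
        model(**inputs)  # warm-up pass, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
    return (time.perf_counter() - start) / runs * 1000


sample = "This is an example sentence."  # illustrative input
print(f"float32 size:    {state_dict_size_mb(float_model):.1f} MB")
print(f"int8 size:       {state_dict_size_mb(quantized_model):.1f} MB")
print(f"float32 latency: {mean_latency_ms(float_model, sample):.1f} ms")
print(f"int8 latency:    {mean_latency_ms(quantized_model, sample):.1f} ms")
```

On a BERT Base checkpoint the quantized state dict should come out substantially smaller than the float32 one (the embedding tables stay in float32, so the reduction is less than 4x); the exact numbers depend on your hardware and PyTorch build.

---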