---
license: mit
language:
- zh
- en
base_model:
- joeddav/xlm-roberta-large-xnli
pipeline_tag: text-classification
library_name: transformers
---
# Vision_or_not: A Multimodal Text Classification Model

Vision_or_not is a text classification model that determines whether a given sentence requires visual processing. As part of a multimodal framework, it enables efficient analysis of text and its potential need for visual input. This is useful in applications such as visual question answering (VQA) and other AI systems that must understand both textual and visual content.
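A minimal sketch of how a classifier like this could gate a vision step in a larger pipeline. The names `route_query`, `classify`, `answer_with_vision`, and `answer_text_only` are hypothetical placeholders, not part of this model or of Transformers; `classify` stands in for any function that returns 1 when a sentence requires visual processing and 0 otherwise (for example, a wrapper around this model).

```python
from typing import Callable, Optional

# Hypothetical pipeline components: a text classifier returning 1 or 0,
# a vision-language answerer, and a text-only answerer.
Classifier = Callable[[str], int]
VisionAnswerer = Callable[[str, bytes], str]
TextAnswerer = Callable[[str], str]


def route_query(
    text: str,
    image: Optional[bytes],
    classify: Classifier,
    answer_with_vision: VisionAnswerer,
    answer_text_only: TextAnswerer,
) -> str:
    """Call the vision model only when the classifier says the text needs it.

    `classify` is assumed to return 1 for "requires visual processing"
    and 0 otherwise, mirroring the label scheme of Vision_or_not.
    """
    if classify(text) == 1 and image is not None:
        return answer_with_vision(text, image)
    # The text is self-contained, or no image was supplied: stay text-only
    return answer_text_only(text)


if __name__ == "__main__":
    # Stub components for illustration only (not real models)
    def keyword_classifier(t: str) -> int:
        return 1 if "picture" in t.lower() else 0

    def vision(t: str, img: bytes) -> str:
        return f"[vision] {t}"

    def text_only(t: str) -> str:
        return f"[text] {t}"

    for query in ["What is in this picture?", "Hello, how are you?"]:
        print(route_query(query, b"fake-image-bytes", keyword_classifier, vision, text_only))
```

The design choice here is simply to avoid invoking the (typically more expensive) vision component when the text can be handled on its own.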

# Model Overview

This model classifies sentences into two categories:

- Requires Visual Processing (1): The sentence contains content that needs additional visual information to be fully understood.
- Does Not Require Visual Processing (0): The sentence is self-contained and can be processed without any visual input.

The model uses a standard sequence classification head, so it can be loaded with `AutoModelForSequenceClassification` from Hugging Face Transformers and queried in a few lines of code.
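The model outputs two logits per sentence, one per class. The sketch below shows one way to turn them into a label and a confidence score with a softmax. It works on plain Python lists so it runs without loading the model; the index order (0 = no visual processing, 1 = requires visual processing) follows the categories above, and the `threshold` parameter is a hypothetical addition, not something the model itself defines.

```python
import math
from typing import List, Tuple

LABELS = {
    0: "No need for visual processing",
    1: "Requires visual processing",
}


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, probability of 'requires visual processing').

    With threshold=0.5 this is equivalent to taking the argmax of the two
    logits (ties go to class 0). A higher threshold makes the classifier
    more conservative about requesting visual input.
    """
    probs = softmax(logits)
    p_visual = probs[1]
    label = LABELS[1] if p_visual > threshold else LABELS[0]
    return label, p_visual


if __name__ == "__main__":
    # Example logits, made up for illustration (not real model output)
    print(decide([2.0, -1.0]))
    print(decide([-0.5, 1.5]))
```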

# Fine-Tuning Information
This model is fine-tuned from the base model listed below, a multilingual XLM-RoBERTa-large checkpoint already trained for natural language inference (NLI). The fine-tuning data is primarily Traditional Chinese, which makes the model well suited to text in that language. The model has also been tested on English inputs and performs well on them.

- Base Model: [joeddav/xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli)
- Fine-Tuning Data: Traditional Chinese text data

# Quick Start

To use the Vision_or_not model, install the following Python libraries:
```bash
pip install transformers torch
```

To make predictions, load the model and tokenizer, then pass your text to the prediction function. Below is example code:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Map the model's output class IDs to human-readable labels
label_mapping = {
    0: "No need for visual processing",
    1: "Requires visual processing",
}

MODEL_PATH = "Johnson8187/Vision_or_not"

# Load the tokenizer and model once, instead of on every prediction call
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH).to(device)
model.eval()

def predict_visual_need(text):
    # Tokenize the input text and move the tensors to the model's device
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)

    # Run the forward pass without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the class with the highest logit
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()
    return label_mapping[predicted_class]

if __name__ == "__main__":
    # Example usage
    test_texts = [
        "Hello, how are you?",
    ]

    for text in test_texts:
        prediction = predict_visual_need(text)
        print(f"Text: {text}")
        print(f"Prediction: {prediction}\n")
```

# Example Output

For the input text "Hello, how are you?", the model might output:
```text
Text: Hello, how are you?
Prediction: No need for visual processing
```