---
license: mit
---
# Vision_or_not: A Multimodal Text Classification Model

Vision_or_not is a text classification model that determines whether a given sentence requires visual processing. It is designed as a component of a multimodal framework, routing text to visual processing only when needed, which is useful in applications such as visual question answering (VQA) and other AI systems that must understand both textual and visual content.

# Model Overview

This model classifies sentences into two categories:

- **Requires visual processing (1):** the sentence contains content that needs additional visual information to be fully understood.
- **Does not require visual processing (0):** the sentence is self-contained and can be processed without any visual input.

The model is fine-tuned for sequence classification and provides a straightforward prediction interface.
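
For intuition, here are two illustrative sentences and the class each would be expected to fall into (hypothetical examples, not drawn from the model's training data):

- "Describe what is happening in this picture." → 1 (requires visual processing)
- "What is the capital of France?" → 0 (no visual processing needed)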

# Quick Start

To use the Vision_or_not model, you will need to install the following Python libraries:

```bash
pip install transformers torch
```

To make predictions, load the model and tokenizer, then pass your text to the prediction function. Example usage:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Set up the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Map class indices to human-readable labels
label_mapping = {
    0: "No need for visual processing",
    1: "Requires visual processing",
}

def predict_visual_need(text, model_path="Johnson8187/Vision_or_not"):
    # Load the model and tokenizer (for repeated calls, load these once outside the function)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)

    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the class with the highest logit
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()
    predicted_label = label_mapping[predicted_class]

    return predicted_label

if __name__ == "__main__":
    # Example usage
    test_texts = [
        "Hello, how are you?",
    ]

    for text in test_texts:
        prediction = predict_visual_need(text)
        print(f"Text: {text}")
        print(f"Prediction: {prediction}\n")
```
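
If you prefer not to manage the device and tokenizer yourself, the same model can also be loaded through the Transformers `pipeline` API. This is a minimal sketch; the raw label names come from the model's configuration and may appear as `LABEL_0`/`LABEL_1` rather than the friendly names above:

```python
from transformers import pipeline

# Minimal sketch using the text-classification pipeline.
# The returned label names come from the model's config and may read
# "LABEL_0" / "LABEL_1" rather than the human-readable names above.
classifier = pipeline("text-classification", model="Johnson8187/Vision_or_not")

print(classifier("Describe what is shown in the picture."))
# e.g. [{'label': 'LABEL_1', 'score': 0.98}]  (output shown is illustrative)
```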