---
datasets:
- toshi456/llava_pretrain_blip_laion_cc_sbu_558k_ja
base_model: mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated
---

I trained this model by integrating the Google SigLIP vision encoder into Llama 3.1. This is a base model: the model itself has not been trained on images yet, so it is most useful as a starting point for training on your own image datasets.

Only the encoder has been integrated into it, and it has not been trained on any closed-source datasets other than what is listed. (For some reason the metadata lists the Japanese version of the dataset above.)

Install https://github.com/LLaVA-VL/LLaVA-NeXT/tree/main before running the script below. Thanks to that team for their fantastic work.
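A typical editable install looks something like this (a sketch only; check the LLaVA-NeXT README for the exact steps and extras your setup needs):

```bash
# Clone the repo and install it into the current environment.
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
pip install -e .
```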

You can test the model with something like the script below. Download an example image and place it at the path used in the script, or use your own image.

Example output for the prompt in the script:

["The image shows a man in a yellow shirt and shorts sitting on the hood of a car with a clothes iron and ironing board in the back.\nThis is a common sight to see in many cities, especially in major cities like new york, where ironing clothes is a common activity for people to carry out while they are at home.\nHowever, this image is a little unusual because the man is ironing clothes on top of the car.\nIt is not unusual to see people ironing clothes while driving, but this is a rare sight.\nThis image is also unusual because the person is sitting on the hood of the car with their clothes in the back, and it seems that they are using an ironing board.\nThe man in the image is wearing a yellow shirt and shorts, and his pants and shirt appear to be in a bag on the hood.\nThe man is sitting on the car with the ironing board, which has a steamer, an ironing board, and clothes.\nThis image is unusual because it is a picture of a man in the middle of ironing clothes, and it's also unusual because the car is driving down a street.\nThe man is using an ironing board with a steamer and clothes, and is sitting on the hood of the"]
```python
# Minimal test script. The imports and the model-loading call below were
# collapsed in the original diff; this reconstruction follows the usual
# LLaVA-NeXT API, so double-check it against the version you installed.
import copy

import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

pretrained = "path/to/this/model"  # replace with the local path or Hub id of this checkpoint
model_name = "llava_llama3"        # assumed; must match the checkpoint's architecture
device = "cuda"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, device_map=device
)
model.eval()
model.tie_weights()

# Load and preprocess the image.
image = Image.open("/home/myles/Desktop/extreme_ironing.jpg")
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [_image.to(dtype=torch.float16, device=device) for _image in image_tensor]

# Build the prompt with the conversation template.
conv_template = "llava_llama_3"  # make sure you use the correct chat template for different models
question = DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image? Is there anything strange about this image? Is this normal behaviour"
conv = copy.deepcopy(conv_templates[conv_template])
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()

input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(device)
image_sizes = [image.size]

# Generate and decode.
cont = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=True,
    temperature=0.9,
    max_new_tokens=256,
)
text_outputs = tokenizer.batch_decode(cont, skip_special_tokens=True)
print(text_outputs)
```
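Note that with `do_sample=True` and `temperature=0.9` the output is sampled, so your text will vary between runs and will not match the example output above exactly.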
Training configuration (excerpt):

```bash
LLM_VERSION="mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated"
LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
VISION_MODEL_VERSION="google/siglip-so400m-patch14-384"
```
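The `${LLM_VERSION//\//_}` expansion replaces every `/` in the model id with `_`, which makes the id safe to embed in file and run names. A small illustration (the `RUN_NAME` variable is hypothetical, not part of the original script):

```bash
LLM_VERSION="mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated"
LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
echo "$LLM_VERSION_CLEAN"   # mylesgoose_Meta-Llama-3.1-8B-Instruct-goose-abliterated

# Hypothetical: a slash-free id can be used in an output directory or run name.
RUN_NAME="llava-${LLM_VERSION_CLEAN}-pretrain"
```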