mylesgoose committed
Commit c7574c0 · verified · 1 Parent(s): 833f89d

Update README.md

Files changed (1):
  1. README.md +81 -1
README.md CHANGED
@@ -9,9 +9,13 @@ base_model: mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated
  I trained the Llama 3.1 model with the Google vision encoder integrated. This is a base model: the model itself has not been trained on images, so it would be useful for training on your own image datasets.
  It has only the encoder integrated into it. It has not been trained on any closed-source datasets other than what is listed; for some reason the Japanese version of the dataset is listed above.
  Install https://github.com/LLaVA-VL/LLaVA-NeXT/tree/main prior to running the code below. Thanks to that team for their fantastic work.
+ You should do an editable install, since you will need to modify the conversation.py file to point to this repo instead of the Llama 3.0 repo for the tokenizer, etc.
+ pip install -e ".[train]"

  You can test it with something like this.
  Download this image and place it at the path used in the script below, or use your own image.
+
+ Model's first output:
  ["The image shows a man in a yellow shirt and shorts sitting on the hood of a car with a clothes iron and ironing board in the back.\nThis is a common sight to see in many cities, especially in major cities like new york, where ironing clothes is a common activity for people to carry out while they are at home.\nHowever, this image is a little unusual because the man is ironing clothes on top of the car.\nIt is not unusual to see people ironing clothes while driving, but this is a rare sight.\nThis image is also unusual because the person is sitting on the hood of the car with their clothes in the back, and it seems that they are using an ironing board.\nThe man in the image is wearing a yellow shirt and shorts, and his pants and shirt appear to be in a bag on the hood.\nThe man is sitting on the car with the ironing board, which has a steamer, an ironing board, and clothes.\nThis image is unusual because it is a picture of a man in the middle of ironing clothes,
  and it's also unusual because the car is driving down a street.\nThe man is using an ironing board with a steamer and clothes, and is sitting on the hood of the"]
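The inference fragments in the next hunk assume that the model, image processor, and test image have already been loaded. Here is a minimal sketch of that setup, assuming the LLaVA-NeXT load_pretrained_model builder and using the repo id that appears later in this README as the checkpoint path; substitute your own model path and image location.

import torch
from PIL import Image

from llava.model.builder import load_pretrained_model
# These imports are also used by the snippet in the hunk below.
from llava.mm_utils import process_images
from llava.constants import DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

# Assumed checkpoint id; replace with the checkpoint you are actually testing.
pretrained = "mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated-pre-llava"
model_name = "llava_llama3"  # selects the Llama-3 LLaVA architecture in the builder
device = "cuda"

tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, device_map="auto"
)
model.eval()

# The extreme-ironing test image mentioned above; adjust the path to wherever you saved it.
image = Image.open("/home/myles/Desktop/extreme_ironing.jpg")

The hunk below continues from this point with process_images and the chat template.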
 
@@ -42,7 +46,7 @@ image = Image.open("/home/myles/Desktop/extreme_ironing.jpg")
  image_tensor = process_images([image], image_processor, model.config)
  image_tensor = [_image.to(dtype=torch.float16, device=device) for _image in image_tensor]

- conv_template = "llava_llama_3" # Make sure you use correct chat template for different models
+ conv_template = "llava_llama_3" # Make sure you use the correct chat template for different models. You will also need to modify the conversation.py file to point to this repo instead of the 3.0 repo, and you need a transformers version above a certain one or you will get a tokenization error.
  question = DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image? Is there anything strange about this image? Is this normal behaviour"
  conv = copy.deepcopy(conv_templates[conv_template])
  conv.append_message(conv.roles[0], question)
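This hunk stops after building the conversation. A hedged sketch of the remaining generation step, following the usual LLaVA-NeXT pattern (tokenizer, model, device, image, and image_tensor are the objects set up earlier; the decoding settings are only an example):

import torch

from llava.constants import IMAGE_TOKEN_INDEX
from llava.mm_utils import tokenizer_image_token

# Close the user turn so the template appends the assistant header, then render the prompt.
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(device)
image_sizes = [image.size]

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=image_sizes,
        do_sample=False,
        temperature=0,
        max_new_tokens=256,
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))

Decoding like this should produce output in the style of the bracketed example shown above.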
@@ -493,3 +497,79 @@ yt-dlp 2024.8.6
  zipp 3.20.1
  zss 1.2.0
  zstandard 0.23.0
+
+ I just tested a training run for this model using the LLaVA-NeXT OneVision repo above on a fresh install. From the pip list above, don't install that TensorFlow version.
+ To train on your own image datasets for your use case, you first need to adjust conversation.py in the llava folder to this:
+
+ conv_llava_llama_3 = Conversation(
+     system="You are a helpful language and vision, AI. " "You are able to understand the visual content that the user provides, " "and assist the user with a variety of tasks using natural language.",
+     roles=("user", "assistant"),
+     version="llama_v3",
+     messages=[],
+     offset=0,
+     sep="<|eot_id|>",
+     sep_style=SeparatorStyle.LLAMA_3,
+     tokenizer_id="mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated-pre-llava",
+     tokenizer=safe_load_tokenizer("mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated-pre-llava"),
+     stop_token_ids=[128009],
+ )
+
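After editing conversation.py, a quick, hedged way to check that the template resolves to this repo's tokenizer and that the stop token id matches Llama 3's end-of-turn token (the repo id is the one used in the config above):

import copy

from transformers import AutoTokenizer
from llava.conversation import conv_templates

# sep is "<|eot_id|>" above, which is token id 128009 in the Llama 3 vocabulary.
tok = AutoTokenizer.from_pretrained("mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated-pre-llava")
assert tok.convert_ids_to_tokens(128009) == "<|eot_id|>"

# Render a one-turn prompt with the edited template to eyeball the Llama 3 header layout.
conv = copy.deepcopy(conv_templates["llava_llama_3"])
conv.append_message(conv.roles[0], "<image>\nDescribe this image.")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())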
+ And here is an example of a training script. You can replace the JSON dataset with the one you want to train your model on (a rough sketch of the expected entry format follows the script).
+ LLM_VERSION="mylesgoose/Meta-Llama-3.1-8B-Instruct-goose-abliterated-pre-llava"
+ LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
+ VISION_MODEL_VERSION="google/siglip-so400m-patch14-384"
+ VISION_MODEL_VERSION_CLEAN="${VISION_MODEL_VERSION//\//_}"
+
+ ############### Pretrain ################
+ PROMPT_VERSION=llava_llama_3
+
+ BASE_RUN_NAME="llavanext-${VISION_MODEL_VERSION_CLEAN}-${LLM_VERSION_CLEAN}-mlp2x_gelu-pretrain_blip558k_plain"
+ echo "BASE_RUN_NAME: ${BASE_RUN_NAME}"
+ PRE_RUN_NAME="${BASE_RUN_NAME}-synthdog_en"
+ CKPT_PATH=$LLM_VERSION
+
+ accelerate launch llava/train/train_mem.py \
+     --deepspeed scripts/zero3.json \
+     --model_name_or_path ${CKPT_PATH} \
+     --version ${PROMPT_VERSION} \
+     --data_path ./data/synthdog_en/synthdog_en_processed.json \
+     --image_folder ./data/synthdog_en \
+     --mm_tunable_parts="mm_vision_tower,mm_mlp_adapter,mm_language_model" \
+     --mm_vision_tower_lr=2e-6 \
+     --vision_tower ${VISION_MODEL_VERSION} \
+     --mm_projector_type mlp2x_gelu \
+     --mm_vision_select_layer -2 \
+     --mm_use_im_start_end False \
+     --mm_use_im_patch_token False \
+     --group_by_modality_length True \
+     --image_aspect_ratio anyres \
+     --image_grid_pinpoints "[(384, 768), (768, 384), (768, 768), (1152, 384), (384, 1152)]" \
+     --mm_patch_merge_type spatial_unpad \
+     --bf16 True \
+     --output_dir "./checkpoints/${PRE_RUN_NAME}" \
+     --num_train_epochs 1 \
+     --per_device_train_batch_size 6 \
+     --per_device_eval_batch_size 0 \
+     --gradient_accumulation_steps 6 \
+     --evaluation_strategy "no" \
+     --save_strategy "steps" \
+     --save_steps 5 \
+     --save_total_limit 2 \
+     --learning_rate 1e-5 \
+     --weight_decay 0. \
+     --warmup_ratio 0.03 \
+     --lr_scheduler_type "cosine" \
+     --logging_steps 1 \
+     --tf32 True \
+     --gradient_checkpointing True \
+     --dataloader_num_workers 2 \
+     --lazy_preprocess True \
+     --report_to wandb \
+     --torch_compile True \
+     --torch_compile_backend "inductor" \
+     --dataloader_drop_last True \
+     --attn_implementation flash_attention_2 \
+     --run_name ${PRE_RUN_NAME}
+
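The script above reads --data_path as a LLaVA-style JSON list. As a rough sketch of what each record typically looks like (this schema is an assumption based on the standard LLaVA training format, so check it against the LLaVA-NeXT data docs before relying on it):

import json

# Hypothetical single-record dataset in the usual LLaVA layout:
# "image" is a filename relative to --image_folder, and the human turn carries the <image> placeholder.
records = [
    {
        "id": "example-0",
        "image": "example_0.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nWhat text appears in this image?"},
            {"from": "gpt", "value": "The sign reads 'OPEN'."},
        ],
    }
]

# Written to the path used by --data_path above; point this wherever your dataset lives.
with open("./data/synthdog_en/synthdog_en_processed.json", "w") as f:
    json.dump(records, f, indent=2)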