2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Current SDK version is 0.18.3
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Configure stats pid to 937440
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Loading settings from /home/align-anything/.config/wandb/settings
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Loading settings from /data/align-anything/hantao/align-anything/scripts/wandb/settings
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Loading settings from environment variables: {'api_key': '***REDACTED***', 'mode': 'online'}
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': 'online', '_disable_service': None}
2024-10-25 18:15:18,977 WARNING MainThread:937440 [wandb_setup.py:_flush():79] Could not find program at -m align_anything.trainers.text_image_to_text_image.ppo
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': None, 'program': '-m align_anything.trainers.text_image_to_text_image.ppo'}
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_setup.py:_flush():79] Applying login settings: {}
2024-10-25 18:15:18,977 INFO MainThread:937440 [wandb_init.py:_log_setup():532] Logging user logs to ../outputs/ppo_ti2ti_baseline_1025_with_eval/wandb/run-20241025_181518-qbvp2oju/logs/debug.log
2024-10-25 18:15:18,978 INFO MainThread:937440 [wandb_init.py:_log_setup():533] Logging internal logs to ../outputs/ppo_ti2ti_baseline_1025_with_eval/wandb/run-20241025_181518-qbvp2oju/logs/debug-internal.log
2024-10-25 18:15:18,978 INFO MainThread:937440 [wandb_init.py:init():617] calling init triggers
2024-10-25 18:15:18,978 INFO MainThread:937440 [wandb_init.py:init():624] wandb.init called with sweep_config: {}
config: {'train_cfgs': {'ds_cfgs': 'ds_z3_config.json', 'epochs': 3, 'seed': 42, 'per_device_prompt_batch_size': 8, 'per_device_train_batch_size': 8, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 2, 'actor_gradient_checkpointing': True, 'critic_gradient_checkpointing': True, 'actor_lr': 1e-05, 'actor_lr_scheduler_type': 'cosine', 'actor_lr_warmup_ratio': 0.03, 'actor_weight_decay': 0.01, 'critic_lr': 5e-06, 'critic_lr_scheduler_type': 'constant', 'critic_lr_warmup_ratio': 0.03, 'critic_weight_decay': 0.0, 'adam_betas': [0.9, 0.95], 'bf16': True, 'fp16': False, 'eval_strategy': 'epoch', 'eval_interval': 10, 'kl_coeff': 0.02, 'clip_range_ratio': 0.2, 'clip_range_score': 50.0, 'clip_range_value': 5.0, 'ptx_coeff': 16.0, 'gamma': 1.0, 'gae_lambda': 0.95, 'normalize_reward': False, 'update_iters': 1, 'freeze_mm_proj': True, 'freeze_vision_tower': False, 'freeze_language_model': True}, 'data_cfgs': {'train_datasets': '/data/align-anything/hantao/align-anything/projects/text_image_to_text_image/outputs', 'train_template': 'spavl_ti2ti', 'train_size': 5000, 'train_split': None, 'train_subset': None, 'train_data_files': 'ti2ti_llf_prompt_only_tokenize.pt', 'train_optional_args': [], 'eval_datasets': None, 'eval_template': None, 'eval_size': None, 'eval_split': None, 'eval_subset': None, 'eval_data_files': None, 'eval_optional_args': [], 'ptx_datasets': None, 'ptx_template': 'spavl_ti2ti', 'ptx_size': None, 'ptx_subset': None, 'ptx_split': None, 'ptx_data_files': 'ti2ti_ptx_27k.pt', 'ptx_optional_args': []}, 'logger_cfgs': {'log_type': 'wandb', 'log_project': 'align-anything', 'log_run_name': 'ppo', 'output_dir': '../outputs/ppo_ti2ti_baseline_1025_with_eval', 'cache_dir': None, 'save_interval': 30.0}, 'model_cfgs': {'actor_model_name_or_path': '/data/align-anything/hantao/models/0916_ti_to_ti_sft', 'reward_model_name_or_path': '/data/align-anything/hantao/align-anything/outputs/rm_ti2ti_baseline_1025_with_eval/slice_2400', 'reward_critic_model_name_or_path': '/data/align-anything/hantao/align-anything/outputs/rm_ti2ti_baseline_1025_with_eval/slice_2400', 'trust_remote_code': True, 'model_max_length': 2048, 'temperature': 1.0, 'top_p': 1.0, 'repetition_penalty': 1.0}, 'lora_cfgs': {'use_lora': False, 'task_type': 'TaskType.CAUSAL_LM', 'inference_mode': False, 'r': 16, 'lora_alpha': 16, 'lora_dropout': 0.1, 'target_modules': ['q_proj', 'v_proj'], 'save_full_model': True}, 'bnb_cfgs': {'use_bnb': False, 'load_in_4bit': True, 'load_in_8bit': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': True, 'bnb_4bit_compute_dtype': 'float16'}, 'special_tokens': None}
2024-10-25 18:15:18,978 INFO MainThread:937440 [wandb_init.py:init():667] starting backend
2024-10-25 18:15:18,978 INFO MainThread:937440 [wandb_init.py:init():671] sending inform_init request
2024-10-25 18:15:18,982 INFO MainThread:937440 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-10-25 18:15:18,983 INFO MainThread:937440 [wandb_init.py:init():684] backend started and connected
2024-10-25 18:15:18,986 INFO MainThread:937440 [wandb_init.py:init():779] updated telemetry
2024-10-25 18:15:18,996 INFO MainThread:937440 [wandb_init.py:init():812] communicating run to backend with 90.0 second timeout
2024-10-25 18:15:20,628 INFO MainThread:937440 [wandb_init.py:init():863] starting run threads in backend
2024-10-25 18:15:20,774 INFO MainThread:937440 [wandb_run.py:_console_start():2465] atexit reg
2024-10-25 18:15:20,774 INFO MainThread:937440 [wandb_run.py:_redirect():2313] redirect: wrap_raw
2024-10-25 18:15:20,774 INFO MainThread:937440 [wandb_run.py:_redirect():2378] Wrapping output streams.
2024-10-25 18:15:20,774 INFO MainThread:937440 [wandb_run.py:_redirect():2403] Redirects installed.
2024-10-25 18:15:20,776 INFO MainThread:937440 [wandb_init.py:init():907] run started, returning control to user process
2024-10-25 19:38:13,587 INFO MainThread:937440 [wandb_run.py:_finish():2164] finishing run htlou/align-anything/qbvp2oju
2024-10-25 19:38:13,590 INFO MainThread:937440 [wandb_run.py:_atexit_cleanup():2428] got exitcode: 0
2024-10-25 19:38:13,591 INFO MainThread:937440 [wandb_run.py:_restore():2410] restore
2024-10-25 19:38:13,592 INFO MainThread:937440 [wandb_run.py:_restore():2416] restore done
2024-10-25 19:38:17,104 INFO MainThread:937440 [wandb_run.py:_footer_history_summary_info():4049] rendering history
2024-10-25 19:38:17,107 INFO MainThread:937440 [wandb_run.py:_footer_history_summary_info():4081] rendering summary
2024-10-25 19:38:17,119 INFO MainThread:937440 [wandb_run.py:_footer_sync_info():4008] logging synced files