diff --git "a/training_log.txt" "b/training_log.txt"
deleted file mode 100644
--- "a/training_log.txt"
+++ /dev/null
@@ -1,9040 +0,0 @@
-W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779]
-W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] *****************************************
-W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
-W1205 13:38:14.913000 139671368247104 torch/distributed/run.py:779] *****************************************
-/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/_distutils_hack/__init__.py:55: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
-  warnings.warn(
-[2024-12-05 13:38:39,939] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
-df: /root/.triton/autotune: No such file or directory
-/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/transformers/training_args.py:1733: FutureWarning: Using `--dispatch_batches` is deprecated and will be removed in version 4.41 of 🤗 Transformers. Use `--accelerator_config {'dispatch_batches':VALUE} instead
-  warnings.warn(
-[2024-12-05 13:38:47,217] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,217] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
-[2024-12-05 13:38:47,218] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,229] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,230] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,231] [INFO] [comm.py:652:init_distributed] cdb=None
-[2024-12-05 13:38:47,237] [INFO] [comm.py:652:init_distributed] cdb=None
-12/05/2024 13:38:47 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
-[INFO|configuration_utils.py:670] 2024-12-05 13:38:48,151 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/config.json
-[WARNING|modeling_rope_utils.py:379] 2024-12-05 13:38:48,155 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
-[INFO|configuration_utils.py:739] 2024-12-05 13:38:48,156 >> Model config Qwen2VLConfig {
-  "_name_or_path": "/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct",
-  "architectures": [
-    "Qwen2VLForConditionalGeneration"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 151643,
-  "eos_token_id": 151645,
-  "hidden_act": "silu",
-  "hidden_size": 3584,
-  "image_token_id": 151655,
-  "initializer_range": 0.02,
-  "intermediate_size": 18944,
-  "max_position_embeddings": 32768,
-  "max_window_layers": 28,
-  "model_type": "qwen2_vl",
-  "num_attention_heads": 28,
-  "num_hidden_layers": 28,
-  "num_key_value_heads": 4,
-  "rms_norm_eps": 1e-06,
-  "rope_scaling": {
-    "mrope_section": [
-      16,
-      24,
-      24
-    ],
-    "rope_type": "default",
-    "type": "default"
-  },
-  "rope_theta": 1000000.0,
-  "sliding_window": 32768,
-  "tie_word_embeddings": false,
-  "torch_dtype": "bfloat16",
-  "transformers_version": "4.45.0",
-  "use_cache": true,
-  "use_sliding_window": false,
-  "video_token_id": 151656,
-  "vision_config": {
-    "in_chans": 3,
-    "model_type": "qwen2_vl",
-    "spatial_patch_size": 14
-  },
-  "vision_end_token_id": 151653,
-  "vision_start_token_id": 151652,
-  "vision_token_id": 151654,
-  "vocab_size": 152064
-}
-
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file vocab.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file merges.txt
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file tokenizer.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file added_tokens.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file special_tokens_map.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,288 >> loading file tokenizer_config.json
-[INFO|tokenization_utils_base.py:2478] 2024-12-05 13:38:48,699 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
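The `Unrecognized keys` warning above comes from the generic rope-scaling validator, which does not know the `mrope_section` key; that key is consumed by Qwen2-VL's own multimodal RoPE, so the warning is expected with this config. As a quick sanity check on the dumped values (a sketch, under the common assumption that the three section sizes partition the rotary half-dimension; the variable names below just mirror the config keys):

```python
# Values copied from the Qwen2VLConfig dump in the log above.
hidden_size = 3584
num_attention_heads = 28
mrope_section = [16, 24, 24]  # temporal / height / width rotary chunks (assumed split)

# Per-head dimension, and the half of it that carries rotary frequencies.
head_dim = hidden_size // num_attention_heads  # 3584 / 28 = 128
assert sum(mrope_section) == head_dim // 2     # 16 + 24 + 24 = 64 = 128 / 2

print(head_dim, sum(mrope_section))
```

So the section sizes are internally consistent with the attention shape, even though the validator flags the key.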
-[INFO|image_processing_base.py:373] 2024-12-05 13:38:48,771 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/preprocessor_config.json
-[INFO|image_processing_base.py:373] 2024-12-05 13:38:48,793 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/preprocessor_config.json
-[INFO|image_processing_base.py:429] 2024-12-05 13:38:48,793 >> Image processor Qwen2VLImageProcessor {
-  "do_convert_rgb": true,
-  "do_normalize": true,
-  "do_rescale": true,
-  "do_resize": true,
-  "image_mean": [
-    0.48145466,
-    0.4578275,
-    0.40821073
-  ],
-  "image_processor_type": "Qwen2VLImageProcessor",
-  "image_std": [
-    0.26862954,
-    0.26130258,
-    0.27577711
-  ],
-  "max_pixels": 12845056,
-  "merge_size": 2,
-  "min_pixels": 3136,
-  "patch_size": 14,
-  "processor_class": "Qwen2VLProcessor",
-  "resample": 3,
-  "rescale_factor": 0.00392156862745098,
-  "size": {
-    "max_pixels": 12845056,
-    "min_pixels": 3136
-  },
-  "temporal_patch_size": 2
-}
-
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,805 >> loading file vocab.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file merges.txt
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file tokenizer.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file added_tokens.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file special_tokens_map.json
-[INFO|tokenization_utils_base.py:2212] 2024-12-05 13:38:48,806 >> loading file tokenizer_config.json
-[INFO|tokenization_utils_base.py:2478] 2024-12-05 13:38:49,114 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
-[INFO|processing_utils.py:744] 2024-12-05 13:38:49,684 >> Processor Qwen2VLProcessor:
-- image_processor: Qwen2VLImageProcessor {
-  "do_convert_rgb": true,
-  "do_normalize": true,
-  "do_rescale": true,
-  "do_resize": true,
-  "image_mean": [
-    0.48145466,
-    0.4578275,
-    0.40821073
-  ],
-  "image_processor_type": "Qwen2VLImageProcessor",
-  "image_std": [
-    0.26862954,
-    0.26130258,
-    0.27577711
-  ],
-  "max_pixels": 12845056,
-  "merge_size": 2,
-  "min_pixels": 3136,
-  "patch_size": 14,
-  "processor_class": "Qwen2VLProcessor",
-  "resample": 3,
-  "rescale_factor": 0.00392156862745098,
-  "size": {
-    "max_pixels": 12845056,
-    "min_pixels": 3136
-  },
-  "temporal_patch_size": 2
-}
-
-- tokenizer: Qwen2TokenizerFast(name_or_path='/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
-	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-	151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
-}
-
-{
-  "processor_class": "Qwen2VLProcessor"
-}
-
-12/05/2024 13:38:49 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
-12/05/2024 13:38:49 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl...
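The `Qwen2VLImageProcessor` dump above shows `rescale_factor` 0.00392156862745098 (exactly 1/255) together with CLIP-style `image_mean`/`image_std`. A minimal sketch of the per-pixel arithmetic this implies, assuming the usual Hugging Face order of operations (`do_rescale` first, then `do_normalize`); the helper `normalize_pixel` is a name invented here for illustration:

```python
# Constants copied from the Qwen2VLImageProcessor dump in the log above.
image_mean = [0.48145466, 0.4578275, 0.40821073]
image_std = [0.26862954, 0.26130258, 0.27577711]
rescale_factor = 1 / 255  # matches 0.00392156862745098 in the dump

def normalize_pixel(rgb):
    """Rescale one 8-bit RGB triple to [0, 1], then normalize per channel."""
    return [
        (value * rescale_factor - mean) / std
        for value, mean, std in zip(rgb, image_mean, image_std)
    ]

# A mid-gray pixel maps to small values, since 128/255 sits near the means.
print([round(v, 3) for v in normalize_pixel((128, 128, 128))])
```

The `min_pixels`/`max_pixels` bounds in the same dump constrain a separate resizing step and are not modeled in this sketch.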
-12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} -12/05/2024 13:38:50 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 -Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'} - Generating train split: 0 examples [00:00, ? 
examples/s] Generating train split: 5754 examples [00:00, 43710.45 examples/s] Generating train split: 23931 examples [00:00, 83108.86 examples/s] Generating train split: 24618 examples [00:00, 78770.57 examples/s] -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:51 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> -12/05/2024 13:38:52 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> - Converting format of dataset (num_proc=320): 0%| | 0/24618 [00:00 -dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:71:71 [0] NCCL INFO cudaDriverVersion 12010 -NCCL version 2.20.5+cuda12.4 -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:76:76 [5] 
NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:76:76 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin 
found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:75:75 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:72:72 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:74:74 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:73:73 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO cudaDriverVersion 12010 -dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:78:78 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO Bootstrap : Using eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so -dlc1o8747bm15inu-master-0:77:77 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net-none.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net-none.so), using internal implementation -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL 
INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_IB_HCA set to mlx5 -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Using network IB 
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.20.90<0> -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Using non-device net plugin version 0 -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Using network IB -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x28a6667510b10773 - Init START 
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x28a6667510b10773 - Init START -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,ffffffff -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NVLS multicast support is not available on dev 1 -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NVLS multicast support is not available on dev 7 -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffffff -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NVLS multicast support is not available on dev 2 -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NVLS multicast support is not available on dev 4 -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NVLS multicast support is not available on dev 6 -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NVLS multicast support is not available on dev 0 -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffffff -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NVLS multicast support is not available on dev 3 -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NVLS multicast support is not available on dev 5 -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0 -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0 -dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0 -dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0 -dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0 -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0 -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0 -dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0 -dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO P2P Chunksize set to 524288
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Connected all rings
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO Connected all trees 
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO Connected all trees
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
-dlc1o8747bm15inu-master-0:74:1649 [3] NCCL INFO comm 0xa75046a0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:76:1646 [5] NCCL INFO comm 0xa4b2bc20 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:77:1652 [6] NCCL INFO comm 0xa55b37a0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:72:1648 [1] NCCL INFO comm 0xa4c31ca0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:78:1651 [7] NCCL INFO comm 0xa547c690 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:75:1647 [4] NCCL INFO comm 0xaf287310 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:71:1641 [0] NCCL INFO comm 0xc0177130 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x28a6667510b10773 - Init COMPLETE
-dlc1o8747bm15inu-master-0:73:1650 [2] NCCL INFO comm 0xa6df0650 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x28a6667510b10773 - Init COMPLETE
-12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl...
-12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl...
-12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl...
-12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... -12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... -12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... -12/05/2024 13:39:03 - INFO - llamafactory.data.loader - Loading dataset sim/1205/1205_aw_24k.jsonl... - Running tokenizer on dataset (num_proc=320): 0%| | 0/24618 [00:00system -You are a helpful assistant.<|im_end|> -<|im_start|>user -<|vision_start|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_p
ad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image
_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|ima
ge_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|i
mage_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|image_pad|><|vision_end|>
-You are a GUI task expert, I will provide you with a high-level instruction, a screenshot with its corresponding accessibility tree.
-
-High-level instruction: Exit the Camera app and return to the home screen.
-
-Accessibility tree: {"android.view.View com.android.camera2 com.android.camera2:id/preview_overlay": "(540.0, 1232.5)", "Shutter": "(540.0, 2179.5)", "MODE LIST": "(124.0, 191.0)", "FILMSTRIP": "(371.0, 191.0)", "Z-": "(609.5, 191.0)", "Z+": "(840.5, 191.0)", "Countdown timer is off": "(382.0, 1927.0)", "Grid lines off": "(540.0, 1927.0)", "Back camera": "(698.0, 1927.0)"}
-
-Please generate the low-level thought and action for the next step.<|im_end|>
-<|im_start|>assistant
-Low-level thought: Press the back button to exit the Camera app and return to the home screen.
-action: {"action_type":"navigate_back"}<|im_end|> -label_ids: -[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 24187, 11591, 3381, 25, 8445, 279, 1182, 3137, 311, 4869, 279, 14332, 906, 323, 470, 311, 279, 2114, 4171, 624, 1311, 25, 5212, 1311, 1819, 3252, 70839, 3895, 9207, 151645] -labels: -Low-level thought: Press the back button to exit the Camera app and return to the home screen. 
-action: {"action_type":"navigate_back"}<|im_end|>
-[INFO|configuration_utils.py:670] 2024-12-05 13:40:16,953 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/config.json
-[WARNING|modeling_rope_utils.py:379] 2024-12-05 13:40:16,953 >> Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
-[INFO|configuration_utils.py:739] 2024-12-05 13:40:16,954 >> Model config Qwen2VLConfig {
-  "_name_or_path": "/nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct",
-  "architectures": [
-    "Qwen2VLForConditionalGeneration"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 151643,
-  "eos_token_id": 151645,
-  "hidden_act": "silu",
-  "hidden_size": 3584,
-  "image_token_id": 151655,
-  "initializer_range": 0.02,
-  "intermediate_size": 18944,
-  "max_position_embeddings": 32768,
-  "max_window_layers": 28,
-  "model_type": "qwen2_vl",
-  "num_attention_heads": 28,
-  "num_hidden_layers": 28,
-  "num_key_value_heads": 4,
-  "rms_norm_eps": 1e-06,
-  "rope_scaling": {
-    "mrope_section": [
-      16,
-      24,
-      24
-    ],
-    "rope_type": "default",
-    "type": "default"
-  },
-  "rope_theta": 1000000.0,
-  "sliding_window": 32768,
-  "tie_word_embeddings": false,
-  "torch_dtype": "bfloat16",
-  "transformers_version": "4.45.0",
-  "use_cache": true,
-  "use_sliding_window": false,
-  "video_token_id": 151656,
-  "vision_config": {
-    "in_chans": 3,
-    "model_type": "qwen2_vl",
-    "spatial_patch_size": 14
-  },
-  "vision_end_token_id": 151653,
-  "vision_start_token_id": 151652,
-  "vision_token_id": 151654,
-  "vocab_size": 152064
-}
-
-[INFO|modeling_utils.py:3723] 2024-12-05 13:40:17,192 >> loading weights file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/model.safetensors.index.json
-[INFO|modeling_utils.py:1622] 2024-12-05 13:40:17,294 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
-[INFO|configuration_utils.py:1099] 2024-12-05 13:40:17,295 >> Generate config GenerationConfig {
-  "bos_token_id": 151643,
-  "eos_token_id": 151645
-}
-
-[WARNING|logging.py:328] 2024-12-05 13:40:17,317 >> `Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
- Loading checkpoint shards: 0%| | 0/5 [00:00> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
-
-[INFO|modeling_utils.py:4576] 2024-12-05 13:42:11,359 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct.
-If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-[INFO|configuration_utils.py:1052] 2024-12-05 13:42:11,380 >> loading configuration file /nas/shared/NLP_A100/wuzhenyu/LLMs/Qwen2-VL-7B-Instruct/generation_config.json
-[INFO|configuration_utils.py:1099] 2024-12-05 13:42:11,380 >> Generate config GenerationConfig {
-  "bos_token_id": 151643,
-  "do_sample": true,
-  "eos_token_id": [
-    151645,
-    151643
-  ],
-  "pad_token_id": 151643,
-  "temperature": 0.01,
-  "top_k": 1,
-  "top_p": 0.001
-}
-
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
-12/05/2024 13:42:11 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
-12/05/2024 13:42:11 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-12/05/2024 13:42:11 - INFO - llamafactory.model.loader - trainable params: 7,615,616,512 || all params: 8,291,375,616 || trainable%: 91.8499
-Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
-[INFO|trainer.py:667] 2024-12-05 13:42:11,800 >> Using auto half precision backend
-[2024-12-05 13:42:15,269] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed info: version=0.15.4, git-hash=unknown, git-branch=unknown
-[2024-12-05 13:42:15,269] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 8
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Using non-device net plugin version 0
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Using network IB
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO bootstrapSplit: comm 0xa6a0b1d0 parent 0xa4c31ca0 rank 1 nranks 8 color -934961569 key 1 prev 0 next 2 - DONE
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO bootstrapSplit: comm 0xa6f91730 parent 0xa547c690 rank 7 nranks 8 color -934961569 key 7 prev 6 next 0 - DONE
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO bootstrapSplit: comm 0xa79ebd30 parent 0xa4b2bc20 rank 5 nranks 8 color -934961569 key 5 prev 4 next 6 - DONE
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO bootstrapSplit: comm 0xa6c5e6a0 parent 0xaf287310 rank 4 nranks 8 color -934961569 key 4 prev 3 next 5 - DONE
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO bootstrapSplit: comm 0xa7df0940 parent 0xa55b37a0 rank 6 nranks 8 color -934961569 key 6 prev 5 next 7 - DONE
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO bootstrapSplit: comm 0xc33c3ce0 parent 0xc0177130 rank 0 nranks 8 color -934961569 key 0 prev 7 next 1 - DONE
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO bootstrapSplit: comm 0xa5e6c900 parent 0xa75046a0 rank 3 nranks 8 color -934961569 key 3 prev 2 next 4 - DONE
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO bootstrapSplit: comm 0xa57b0e70 parent 0xa6df0650 rank 2 nranks 8 color -934961569 key 2 prev 1 next 3 - DONE
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x1900e06671419955 - Init START
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffffff,ffffffff
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO NVLS multicast support is not available on dev 1
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO NVLS multicast support is not available on dev 7
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO NVLS multicast support is not available on dev 6
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO NVLS multicast support is not available on dev 4
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffffff
-dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO NVLS multicast support is not available on dev 3
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO NVLS multicast support is not available on dev 5
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO NVLS multicast support is not available on dev 0
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffffff
-dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO NVLS multicast support is not available on dev 2
-dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
-dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7
-dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nRanks 8 nNodes 1 localRanks
8 localRank 5 MNNVL 0 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 
[5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 
4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO P2P Chunksize set to 524288 -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 
-dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] 
via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 
06/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] 
NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 
-dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] 
via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 
17/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] 
NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 
-dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Connected all rings -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Connected all rings -dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Connected all rings -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Connected all rings -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via 
P2P/CUMEM/read -dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Connected all rings -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read -dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via 
P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Connected all rings
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Connected all rings
[per-channel P2P connection lines trimmed; each peer pair below connected channels 00-23 individually]
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO Channels 00-23/0 : 1[1] -> 0[0] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO Channels 00-23/0 : 2[2] -> 1[1] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO Channels 00-23/0 : 3[3] -> 2[2] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO Channels 00-23/0 : 4[4] -> 3[3] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO Channels 00-23/0 : 5[5] -> 4[4] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Channels 00-23/0 : 6[6] -> 5[5] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Channels 00-23/0 : 7[7] -> 6[6] via P2P/CUMEM/read
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO Connected all trees
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
[the same "Connected all trees" / threadThresholds / channel-count lines repeat for ranks 1-5]
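The NCCL topology messages above appear because NCCL debug logging was enabled in the launch environment. A minimal sketch of the environment variables that control this output (the `NCCL_DEBUG_SUBSYS` value shown is an illustrative choice, not taken from this run):

```shell
# Emit INFO-level NCCL logs (ring/tree/channel setup lines like those above).
export NCCL_DEBUG=INFO
# Optionally restrict output to specific subsystems, e.g. init and P2P setup.
export NCCL_DEBUG_SUBSYS=INIT,P2P
echo "$NCCL_DEBUG"  # prints INFO
```

Setting `NCCL_DEBUG=WARN` (or unsetting it) silences this per-channel chatter in production runs.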
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO Connected all trees
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO Connected all trees
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO 24 coll channels, 0 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
dlc1o8747bm15inu-master-0:73:3059 [2] NCCL INFO comm 0xa57b0e70 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:78:3060 [7] NCCL INFO comm 0xa6f91730 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:77:3061 [6] NCCL INFO comm 0xa7df0940 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:76:3058 [5] NCCL INFO comm 0xa79ebd30 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:75:3056 [4] NCCL INFO comm 0xa6c5e6a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:74:3057 [3] NCCL INFO comm 0xa5e6c900 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:71:3054 [0] NCCL INFO comm 0xc33c3ce0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x1900e06671419955 - Init COMPLETE
dlc1o8747bm15inu-master-0:72:3055 [1] NCCL INFO comm 0xa6a0b1d0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x1900e06671419955 - Init COMPLETE
[2024-12-05 13:42:17,406] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py311_cu121 as PyTorch extensions root... [printed once per rank, 8x]
Creating extension directory /root/.cache/torch_extensions/py311_cu121/fused_adam... [printed once per rank, 8x]
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py311_cu121/fused_adam/build.ninja...
/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/TH -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[2/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/TH -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c /cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/cpfs01/user/wuzhenyu/anaconda3/envs/llama-factory/lib/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 34.222880601882935 seconds
[2024-12-05 13:42:51,649] [INFO] [logging.py:128:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2024-12-05 13:42:51,649] [INFO] [logging.py:128:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
Loading extension module fused_adam... [repeated on the remaining ranks]
Time to load fused_adam op: 34.243221044540405 seconds
Time to load fused_adam op: 34.24309253692627 seconds
Time to load fused_adam op: 34.24324321746826 seconds
Time to load fused_adam op: 34.243529319763184 seconds
Time to load fused_adam op: 34.243879318237305 seconds
Time to load fused_adam op: 34.244356632232666 seconds
[2024-12-05 13:42:51,666] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-12-05 13:42:51,666] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2024-12-05 13:42:51,666] [INFO] [logging.py:128:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 1000000000
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 1000000000
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-12-05 13:42:51,666] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
Loading extension module fused_adam...
Time to load fused_adam op: 34.29689931869507 seconds
[2024-12-05 13:43:03,153] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-12-05 13:43:03,154] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 20.77 GB CA 20.9 GB Max_CA 21 GB
[2024-12-05 13:43:03,154] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 133.14 GB, percent = 13.3%
[2024-12-05 13:43:03,464] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-12-05 13:43:03,465] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 22.54 GB CA 24.44 GB Max_CA 24 GB
[2024-12-05 13:43:03,465] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 133.14 GB, percent = 13.3%
[2024-12-05 13:43:03,465] [INFO] [stage_1_and_2.py:544:__init__] optimizer state initialized
[2024-12-05 13:43:03,760] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-12-05 13:43:03,761] [INFO] [utils.py:782:see_memory_usage] MA 18.99 GB Max_MA 18.99 GB CA 24.44 GB Max_CA 24 GB
[2024-12-05 13:43:03,761] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 118.96 GB, percent = 11.9%
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed using client callable to create LR scheduler
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2024-12-05 13:43:03,763] [INFO] [logging.py:128:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[[0.9, 0.999]]
[2024-12-05 13:43:03,765] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
[the config.py:1003:print lines below all carry timestamps 2024-12-05 13:43:03,765-767; the repeated prefix is omitted]
activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
amp_enabled .................. False
amp_params ................... False
autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
bfloat16_enabled ............. True
bfloat16_immediate_grad_update False
checkpoint_parallel_write_pipeline False
checkpoint_tag_validation_enabled True
checkpoint_tag_validation_fail False
comms_config .................
communication_data_type ...... None
compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
curriculum_enabled_legacy .... False
curriculum_params_legacy ..... False
data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
data_efficiency_enabled ...... False
dataloader_drop_last ......... False
disable_allgather ............ False
dump_state ................... False
dynamic_loss_scale_args ...... None
eigenvalue_enabled ........... False
eigenvalue_gas_boundary_resolution 1
eigenvalue_layer_name ........ bert.encoder.layer
eigenvalue_layer_num ......... 0
eigenvalue_max_iter .......... 100
eigenvalue_stability ......... 1e-06
eigenvalue_tol ............... 0.01
eigenvalue_verbose ........... False
elasticity_enabled ........... False
flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
fp16_auto_cast ............... None
fp16_enabled ................. False
fp16_master_weights_and_gradients False
global_rank .................. 0
grad_accum_dtype ............. None
gradient_accumulation_steps .. 16
gradient_clipping ............ 1.0
gradient_predivide_factor .... 1.0
graph_harvesting ............. False
hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
initial_dynamic_scale ........ 1
load_universal_checkpoint .... False
loss_scale ................... 1.0
memory_breakdown ............. False
mics_hierarchial_params_gather False
mics_shard_size .............. -1
monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName')
nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
optimizer_legacy_fusion ...... False
optimizer_name ............... adamw
optimizer_params ............. {'lr': 1e-06, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.001}
pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
pld_enabled .................. False
pld_params ................... False
prescale_gradients ........... False
scheduler_name ............... None
scheduler_params ............. None
seq_parallel_communication_data_type torch.float32
sparse_attention ............. None
sparse_gradients_enabled ..... False
steps_per_print .............. inf
timers_config ................ enabled=True synchronized=True
train_batch_size ............. 128
train_micro_batch_size_per_gpu 1
use_data_before_expert_parallel_ False
use_node_local_storage ....... False
wall_clock_breakdown ......... True
weight_quantization_config ... None
world_size ................... 8
zero_allow_untested_optimizer False
zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=1000000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
zero_enabled ................. True
zero_force_ds_cpu_optimizer .. True
zero_optimization_stage ......
1 -[2024-12-05 13:43:03,768] [INFO] [config.py:989:print_user_config] json = { - "zero_optimization": { - "stage": 1, - "allgather_partitions": true, - "allgather_bucket_size": 1.000000e+09, - "overlap_comm": true, - "reduce_scatter": true, - "reduce_bucket_size": 1.000000e+09, - "contiguous_gradients": true - }, - "fp16": { - "enabled": false, - "auto_cast": true, - "loss_scale": 0, - "initial_scale_power": 32, - "loss_scale_window": 1000, - "hysteresis": 2, - "min_loss_scale": 1 - }, - "bf16": { - "enabled": true - }, - "optimizer": { - "type": "AdamW", - "params": { - "lr": 1e-06, - "betas": [0.9, 0.999], - "eps": 1e-08, - "weight_decay": 0.001 - } - }, - "gradient_accumulation_steps": 16, - "gradient_clipping": 1.0, - "steps_per_print": inf, - "train_batch_size": 128, - "train_micro_batch_size_per_gpu": 1, - "wall_clock_breakdown": true -} -[INFO|trainer.py:2243] 2024-12-05 13:43:03,768 >> ***** Running training ***** -[INFO|trainer.py:2244] 2024-12-05 13:43:03,768 >> Num examples = 24,618 -[INFO|trainer.py:2245] 2024-12-05 13:43:03,768 >> Num Epochs = 2 -[INFO|trainer.py:2246] 2024-12-05 13:43:03,768 >> Instantaneous batch size per device = 1 -[INFO|trainer.py:2249] 2024-12-05 13:43:03,768 >> Total train batch size (w. 
parallel, distributed & accumulation) = 128 -[INFO|trainer.py:2250] 2024-12-05 13:43:03,768 >> Gradient Accumulation steps = 16 -[INFO|trainer.py:2251] 2024-12-05 13:43:03,768 >> Total optimization steps = 384 -[INFO|trainer.py:2252] 2024-12-05 13:43:03,770 >> Number of trainable parameters = 7,615,616,512 - 0%| | 0/384 [00:00> Saving model checkpoint to /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384 -[INFO|configuration_utils.py:407] 2024-12-05 15:32:31,937 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/config.json -[INFO|configuration_utils.py:868] 2024-12-05 15:32:31,972 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/generation_config.json -[INFO|modeling_utils.py:2838] 2024-12-05 15:33:11,815 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/model.safetensors.index.json. -[INFO|tokenization_utils_base.py:2649] 2024-12-05 15:33:11,856 >> tokenizer config file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/tokenizer_config.json -[INFO|tokenization_utils_base.py:2658] 2024-12-05 15:33:11,895 >> Special tokens file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/special_tokens_map.json -[2024-12-05 15:33:12,401] [INFO] [logging.py:128:log_dist] [Rank 0] [Torch] Checkpoint global_step384 is about to be saved! 
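The batch-size and step-count numbers in the log above are mutually consistent and can be sanity-checked. The sketch below uses only values printed in the log (micro batch 1, gradient accumulation 16, world size 8, 24,618 examples, 2 epochs); the floor division for steps per epoch is inferred from the reported totals, not taken from any one log line.

```python
# Sanity-check the effective batch size and optimization step count
# reported in the DeepSpeed/Trainer startup banner above.
micro_batch_per_gpu = 1   # train_micro_batch_size_per_gpu
grad_accum_steps = 16     # gradient_accumulation_steps
world_size = 8            # number of GPUs (world_size)

train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
print(train_batch_size)   # 128, matching "train_batch_size" / "Total train batch size"

num_examples = 24_618
num_epochs = 2
# Each optimizer step consumes train_batch_size examples; the trailing
# partial batch per epoch is dropped (floor division reproduces the log).
steps_per_epoch = num_examples // train_batch_size
total_steps = steps_per_epoch * num_epochs
print(total_steps)        # 192 * 2 = 384, matching "Total optimization steps = 384"
```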
-[2024-12-05 15:33:12,431] [INFO] [logging.py:128:log_dist] [Rank 0] Saving model checkpoint: /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt
-[2024-12-05 15:33:12,431] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt...
-[2024-12-05 15:33:52,818] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/mp_rank_00_model_states.pt.
-[2024-12-05 15:33:53,188] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
-[2024-12-05 15:36:48,940] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
-[2024-12-05 15:36:49,848] [INFO] [engine.py:3536:_save_zero_checkpoint] zero checkpoint saved /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/checkpoint-384/global_step384/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-[2024-12-05 15:36:49,848] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step384 is ready now!
-[INFO|trainer.py:2505] 2024-12-05 15:36:52,089 >>
-
-Training completed. Do not forget to share your model on huggingface.co/models =)
-
-
-{'train_runtime': 6828.3197, 'train_samples_per_second': 7.211, 'train_steps_per_second': 0.056, 'train_loss': 0.19105235013800362, 'epoch': 2.0}
-100%|██████████| 384/384 [1:53:48<00:00, 15.56s/it]
-100%|██████████| 384/384 [1:53:48<00:00, 17.78s/it]
-[INFO|image_processing_base.py:258] 2024-12-05 15:36:52,231 >> Image processor saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/preprocessor_config.json
-[INFO|trainer.py:3705] 2024-12-05 15:36:57,061 >> Saving model checkpoint to /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025
-[INFO|configuration_utils.py:407] 2024-12-05 15:36:57,118 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/config.json
-[INFO|configuration_utils.py:868] 2024-12-05 15:36:57,153 >> Configuration saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/generation_config.json
-[INFO|modeling_utils.py:2838] 2024-12-05 15:37:37,945 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/model.safetensors.index.json.
-[INFO|tokenization_utils_base.py:2649] 2024-12-05 15:37:38,009 >> tokenizer config file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/tokenizer_config.json
-[INFO|tokenization_utils_base.py:2658] 2024-12-05 15:37:38,079 >> Special tokens file saved in /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/special_tokens_map.json
-***** train metrics *****
-  epoch                    =     1.9961
-  total_flos               = 2845225360GF
-  train_loss               =     0.1911
-  train_runtime            = 1:53:48.31
-  train_samples_per_second =      7.211
-  train_steps_per_second   =      0.056
-Figure saved at: /nas/shared/NLP_A100/wuzhenyu/ckpt/qwen2vl_7b_sim_24k_1025/training_loss.png
-12/05/2024 15:37:39 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
-12/05/2024 15:37:39 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
-[INFO|modelcard.py:449] 2024-12-05 15:37:39,381 >> Dropping the following result as it does not have all the necessary fields:
-{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
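The throughput figures in the final train metrics can be cross-checked against the runtime and dataset size reported earlier in the log. The sketch below is a minimal consistency check; the formulas (total examples seen over wall-clock runtime, and steps over runtime) are inferred from how transformers' Trainer typically reports these metrics, not quoted from the log itself.

```python
# Cross-check train_samples_per_second and train_steps_per_second
# against the other numbers reported in this log.
train_runtime = 6828.3197   # seconds, i.e. 1:53:48.31
num_examples = 24_618       # "Num examples" from the startup banner
num_epochs = 2              # "Num Epochs"
total_steps = 384           # "Total optimization steps"

samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(round(samples_per_second, 3))  # 7.211, matching train_samples_per_second
print(round(steps_per_second, 3))    # 0.056, matching train_steps_per_second
```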