diff --git "a/training_log_20250123_001940.txt" "b/training_log_20250123_001940.txt" new file mode 100644--- /dev/null +++ "b/training_log_20250123_001940.txt" @@ -0,0 +1,3627 @@ +[2025-01-23 00:19:45,603] torch.distributed.run: [WARNING] +[2025-01-23 00:19:45,603] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:45,603] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +[2025-01-23 00:19:45,603] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:45,958] torch.distributed.run: [WARNING] +[2025-01-23 00:19:45,958] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:45,958] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +[2025-01-23 00:19:45,958] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:46,580] torch.distributed.run: [WARNING] +[2025-01-23 00:19:46,580] torch.distributed.run: [WARNING] ***************************************** +[2025-01-23 00:19:46,580] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +[2025-01-23 00:19:46,580] torch.distributed.run: [WARNING] ***************************************** +The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`. + +0it [00:00, ?it/s] +0it [00:00, ?it/s] +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. 
Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. 
Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. 
Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +/cpfs02/user/zhaoxiangyu/miniconda3/envs/llava/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. 
Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml + warnings.warn( +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,200] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,206] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2025-01-23 00:20:01,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) +df: df: df: /root/.triton/autotune/root/.triton/autotune/root/.triton/autotunedf: df: df: df: df: df: /root/.triton/autotune/root/.triton/autotune/root/.triton/autotune/root/.triton/autotune/root/.triton/autotune/root/.triton/autotune: 没有那个文件或目录: 没有那个文件或目录: 没有那个文件或目录 + + +: 没有那个文件或目录 +: 没有那个文件或目录 +: 没有那个文件或目录 +: 没有那个文件或目录 +: 没有那个文件或目录: 没有那个文件或目录 + +df: /root/.triton/autotunedf: /root/.triton/autotune: 没有那个文件或目录 +df: /root/.triton/autotune: 没有那个文件或目录 +: 没有那个文件或目录 +df: df: df: 
/root/.triton/autotune/root/.triton/autotune/root/.triton/autotune: 没有那个文件或目录 +: 没有那个文件或目录 +: 没有那个文件或目录 + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + 
[WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible + [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1 + [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,273] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,270] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,270] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,276] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,271] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 00:20:19,271] [INFO] [comm.py:637:init_distributed] cdb=None +[2025-01-23 
+01/23/2025 00:20:19 - WARNING - llava.train.train - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:19 - INFO - llava.train.train - Training/evaluation parameters TrainingArguments(
+_n_gpu=1,
+adafactor=False,
+adam_beta1=0.9,
+adam_beta2=0.999,
+adam_epsilon=1e-08,
+auto_find_batch_size=False,
+bf16=True,
+bf16_full_eval=False,
+bits=16,
+cache_dir=None,
+data_seed=None,
+dataloader_drop_last=False,
+dataloader_num_workers=4,
+dataloader_persistent_workers=False,
+dataloader_pin_memory=True,
+ddp_backend=None,
+ddp_broadcast_buffers=None,
+ddp_bucket_cap_mb=None,
+ddp_find_unused_parameters=None,
+ddp_timeout=1800,
+debug=[],
+deepspeed=./scripts/zero3.json,
+disable_tqdm=False,
+dispatch_batches=None,
+do_eval=False,
+do_predict=False,
+do_train=False,
+double_quant=True,
+eval_accumulation_steps=None,
+eval_delay=0,
+eval_steps=None,
+evaluation_strategy=no,
+fp16=False,
+fp16_backend=auto,
+fp16_full_eval=False,
+fp16_opt_level=O1,
+freeze_mm_mlp_adapter=False,
+fsdp=[],
+fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
+fsdp_min_num_params=0,
+fsdp_transformer_layer_cls_to_wrap=None,
+full_determinism=False,
+gradient_accumulation_steps=2,
+gradient_checkpointing=True,
+gradient_checkpointing_kwargs=None,
+greater_is_better=None,
+group_by_length=False,
+group_by_modality_length=True,
+half_precision_backend=auto,
+hub_always_push=False,
+hub_model_id=None,
+hub_private_repo=False,
+hub_strategy=every_save,
+hub_token=,
+ignore_data_skip=False,
+include_inputs_for_metrics=False,
+include_num_input_tokens_seen=False,
+include_tokens_per_second=False,
+jit_mode_eval=False,
+label_names=None,
+label_smoothing_factor=0.0,
+learning_rate=2e-05,
+length_column_name=length,
+load_best_model_at_end=False,
+local_rank=0,
+log_level=passive,
+log_level_replica=warning,
+log_on_each_node=True,
+logging_dir=./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt/runs/Jan23_00-20-19_dlc1irjyfb0zt5ew-worker-6,
+logging_first_step=False,
+logging_nan_inf_filter=True,
+logging_steps=1.0,
+logging_strategy=steps,
+lora_alpha=16,
+lora_bias=none,
+lora_dropout=0.05,
+lora_enable=False,
+lora_r=64,
+lora_weight_path=,
+lr_scheduler_kwargs={},
+lr_scheduler_type=cosine,
+max_grad_norm=1.0,
+max_steps=-1,
+metric_for_best_model=None,
+mm_projector_lr=None,
+mm_vision_tower_lr=2e-06,
+model_max_length=32768,
+mp_parameters=,
+mpt_attn_impl=triton,
+neftune_noise_alpha=None,
+no_cuda=False,
+num_train_epochs=1.0,
+optim=adamw_torch,
+optim_args=None,
+output_dir=./checkpoints/llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt,
+overwrite_output_dir=False,
+past_index=-1,
+per_device_eval_batch_size=4,
+per_device_train_batch_size=1,
+prediction_loss_only=False,
+push_to_hub=False,
+push_to_hub_model_id=None,
+push_to_hub_organization=None,
+push_to_hub_token=,
+quant_type=nf4,
+ray_scope=last,
+remove_unused_columns=False,
+report_to=['wandb'],
+resume_from_checkpoint=None,
+run_name=llavaAR4-qwen2_5-32b-sft-llavanext-notext-kn-infpolishmd-detail-knins40k-creationme10kfixed-chart11kmerge-tqa8k-info28kgpt,
+save_on_each_node=False,
+save_only_model=False,
+save_safetensors=True,
+save_steps=10000,
+save_strategy=steps,
+save_total_limit=1,
+seed=42,
+skip_memory_metrics=True,
+split_batches=False,
+tf32=True,
+torch_compile=False,
+torch_compile_backend=None,
+torch_compile_mode=None,
+torchdynamo=None,
+tpu_metrics_debug=False,
+tpu_num_cores=None,
+use_cpu=False,
+use_ipex=False,
+use_legacy_prediction_loop=False,
+use_mps_device=False,
+warmup_ratio=0.03,
+warmup_steps=0,
+weight_decay=0.0,
+)
+01/23/2025 00:20:19 - INFO - llava.train.train - Training/evaluation parameters DataArguments(data_path=None, meta_path='playground/meta_json/llavanext_sample/llava_next_notext_inf37kpolishmd_de35k_know40k_knins40k_creation10kfixed_chart11kmerge_tqa8k_info28k_gpt.json', lazy_preprocess=True, is_multimodal=False, image_folder=None, image_aspect_ratio='anyres', image_grid_pinpoints='[(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]', image_crop_resolution=None, image_split_resolution=None, use_data_resampling=False)
+[INFO|configuration_utils.py:727] 2025-01-23 00:20:19,301 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/config.json
+[WARNING|configuration_utils.py:607] 2025-01-23 00:20:19,301 >> You are using a model of type qwen2 to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors.
+[INFO|configuration_utils.py:792] 2025-01-23 00:20:19,302 >> Model config LlavaQwenConfig {
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 27648,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 70,
+  "model_type": "llava_qwen",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 64,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 1000000.0,
+  "sliding_window": 131072,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.37.2",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
+
+[INFO|modeling_utils.py:3473] 2025-01-23 00:20:19,303 >> loading weights file models/qwen/qwen2.5-32B-Instruct/model.safetensors.index.json
+[INFO|modeling_utils.py:1426] 2025-01-23 00:20:19,306 >> Instantiating LlavaQwenForCausalLM model under default dtype torch.bfloat16.
+[INFO|modeling_utils.py:3582] 2025-01-23 00:20:19,306 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
+[WARNING|modeling_utils.py:1517] 2025-01-23 00:20:19,317 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
+[INFO|configuration_utils.py:826] 2025-01-23 00:20:19,325 >> Generate config GenerationConfig {
+  "bos_token_id": 151643,
+  "eos_token_id": 151645
+}
+
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
+01/23/2025 00:20:21 - WARNING - llava.train.train - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, 16-bits training: False
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:79:79 [7] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:77:77 [5] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO NCCL_IB_HCA set to mlx5
+dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO NCCL_IB_HCA set to mlx5
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:74:74 [2] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:73:73 [1] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:76:76 [4] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO NCCL_IB_HCA set to mlx5
+dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO NCCL_IB_HCA set to mlx5
+dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO NCCL_IB_HCA set to mlx5
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-6:75:75 [3] NCCL INFO NET/Plugin : No plugin found, using internal implementation
+dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Using network IB
+dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Using network IB
+dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO cudaDriverVersion 12010
+dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0>
+dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Using network IB
+dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
+dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0>
+dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0>
+dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so
+dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
+dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2
: libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:73:73 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:79:79 [6] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:77:77 [4] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:75:75 [2] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:75:75 [2] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or 
directory +dlc1irjyfb0zt5ew-worker-5:78:78 [5] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:80:80 [7] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:79:79 [6] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:73:73 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : 
libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:80:80 [7] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:76:76 [3] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO Bootstrap : Using eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-5:74:74 [1] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:76:76 [3] NCCL INFO 
NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:77:77 [4] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:74:74 [1] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-6:72:72 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO Bootstrap : Using eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-0:78:78 [5] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> 
+dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO cudaDriverVersion 12010 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.46.162<0> +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO Bootstrap : Using eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory +dlc1irjyfb0zt5ew-worker-6:78:78 [6] NCCL INFO NET/Plugin : No plugin found, using internal implementation +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.12.62<0> +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO 
NCCL_SOCKET_IFNAME set by environment to eth +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO NCCL_IB_HCA set to mlx5 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.8.74.99<0> +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Using network IB +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO comm 0x9ab58920 rank 56 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO comm 0x9a8d4590 rank 55 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO comm 0x9adb4cb0 rank 57 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO comm 0x9b9384a0 rank 59 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO comm 0x9bc9d510 rank 60 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO comm 0x9b1e9740 rank 61 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO comm 0x9adf3c70 rank 58 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO comm 0x9b788240 rank 63 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO comm 0x9aa0dd20 rank 62 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO comm 0x9ae0d630 rank 54 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO comm 0x9b140b30 rank 53 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO comm 0x9b0bad90 rank 52 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO comm 0x9ac83900 rank 48 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO comm 0x9a5a1030 rank 51 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO comm 0x9a4d4a80 rank 50 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO comm 0x99ebdcf0 rank 49 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO comm 0x9b8bfc50 rank 13 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO comm 0x9b1f00d0 rank 14 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO comm 0x9aaaf700 rank 12 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO comm 0x9b001930 rank 15 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 
- Init START +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO comm 0x9a801560 rank 8 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO comm 0x99f67b70 rank 10 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO comm 0x9a8c13a0 rank 11 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO comm 0x9a23e7c0 rank 9 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init START +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO NVLS multicast support is not available on dev 7 +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO NVLS multicast support is not available on dev 6 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO NVLS multicast support is not available on dev 0 +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO NVLS multicast support is not available on dev 5 +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO NVLS multicast support is not available on dev 1 +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO NVLS multicast support is not available on dev 3 +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO NVLS multicast support is not available on dev 4 +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO NVLS multicast support is not available on dev 2 +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO NVLS multicast support is not available on dev 0 +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO NVLS multicast support is not available on dev 3 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO NVLS multicast support is not available on dev 5 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO NVLS multicast support is not available on dev 4 +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO NVLS multicast support is not available on dev 2 +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO NVLS multicast support is not available on dev 7 +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO NVLS multicast support is not available on dev 6 +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO NVLS multicast support is not available on dev 1 +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO NVLS multicast support is not available on dev 5 +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO NVLS multicast support is not available on dev 7 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO NVLS multicast support is not available on dev 1 +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO 
NVLS multicast support is not available on dev 0 +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO NVLS multicast support is not available on dev 6 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO NVLS multicast support is not available on dev 3 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,ffffffff +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO NVLS multicast support is not available on dev 2 +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO NVLS multicast support is not available on dev 4 +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13 [2] 15/-1/-1->14->13 [3] 15/-1/-1->14->23 [4] 15/-1/-1->14->13 [5] 15/-1/-1->14->13 [6] 15/-1/-1->14->13 [7] 15/6/-1->14->30 +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Trees [0] 49/56/-1->48->32 [1] 49/-1/-1->48->55 [2] 49/-1/-1->48->55 [3] 49/-1/-1->48->55 [4] 49/-1/-1->48->41 [5] 49/-1/-1->48->55 [6] 49/-1/-1->48->55 [7] 49/-1/-1->48->55 +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Trees [0] 13/-1/-1->12->11 [1] 13/-1/-1->12->11 [2] 13/-1/-1->12->21 [3] 13/-1/-1->12->11 [4] 13/-1/-1->12->11 [5] 13/-1/-1->12->11 [6] 13/4/-1->12->28 [7] 13/-1/-1->12->11 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Trees [0] 14/-1/-1->13->12 [1] 14/-1/-1->13->12 [2] 14/-1/-1->13->12 [3] -1/-1/-1->13->12 [4] 14/-1/-1->13->12 [5] 14/-1/-1->13->12 [6] 14/20/-1->13->12 [7] -1/-1/-1->13->12 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Trees [0] 12/-1/-1->11->10 [1] 12/-1/-1->11->10 [2] -1/-1/-1->11->10 [3] 12/-1/-1->11->10 [4] 12/-1/-1->11->10 [5] 12/18/-1->11->10 [6] -1/-1/-1->11->10 [7] 12/-1/-1->11->10 +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Trees [0] -1/-1/-1->15->14 [1] 8/-1/-1->15->14 [2] 8/-1/-1->15->14 [3] 8/-1/-1->15->14 [4] -1/-1/-1->15->14 [5] 8/-1/-1->15->14 [6] 8/-1/-1->15->14 [7] 8/22/-1->15->14 +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
+dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Trees [0] 11/-1/-1->10->9 [1] 11/-1/-1->10->19 [2] 11/-1/-1->10->9 [3] 11/-1/-1->10->9 [4] 11/-1/-1->10->9 [5] 11/2/-1->10->26 [6] 11/-1/-1->10->9 [7] 11/-1/-1->10->9 +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Trees [0] 9/-1/-1->8->17 [1] 9/-1/-1->8->15 [2] 9/-1/-1->8->15 [3] 9/-1/-1->8->15 [4] 9/0/-1->8->24 [5] 9/-1/-1->8->15 [6] 9/-1/-1->8->15 [7] 9/-1/-1->8->15 +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Trees [0] 10/-1/-1->9->8 [1] -1/-1/-1->9->8 [2] 10/-1/-1->9->8 [3] 10/-1/-1->9->8 [4] 10/16/-1->9->8 [5] -1/-1/-1->9->8 [6] 10/-1/-1->9->8 [7] 10/-1/-1->9->8 +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Trees [0] 54/-1/-1->53->52 [1] 54/-1/-1->53->52 [2] 54/44/-1->53->52 [3] -1/-1/-1->53->52 [4] 54/-1/-1->53->52 [5] 54/-1/-1->53->52 [6] 54/-1/-1->53->52 [7] -1/-1/-1->53->52 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Trees [0] -1/-1/-1->55->54 [1] 48/-1/-1->55->54 [2] 48/-1/-1->55->54 [3] 48/46/-1->55->54 [4] -1/-1/-1->55->54 [5] 48/-1/-1->55->54 [6] 48/-1/-1->55->54 [7] 48/-1/-1->55->54 +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Trees [0] 50/40/-1->49->48 [1] -1/-1/-1->49->48 [2] 50/-1/-1->49->48 [3] 50/-1/-1->49->48 [4] 50/-1/-1->49->48 [5] -1/-1/-1->49->48 [6] 50/-1/-1->49->48 [7] 50/-1/-1->49->48 +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Trees [0] 53/-1/-1->52->51 [1] 53/-1/-1->52->51 [2] 53/60/-1->52->36 [3] 53/-1/-1->52->51 [4] 53/-1/-1->52->51 [5] 53/-1/-1->52->51 [6] 53/-1/-1->52->45 [7] 53/-1/-1->52->51 +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Trees [0] 51/-1/-1->50->49 [1] 51/58/-1->50->34 [2] 51/-1/-1->50->49 [3] 51/-1/-1->50->49 [4] 51/-1/-1->50->49 [5] 51/-1/-1->50->43 [6] 51/-1/-1->50->49 [7] 51/-1/-1->50->49 +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Trees [0] 55/-1/-1->54->53 [1] 55/-1/-1->54->53 [2] 55/-1/-1->54->53 [3] 55/62/-1->54->38 [4] 55/-1/-1->54->53 [5] 55/-1/-1->54->53 [6] 55/-1/-1->54->53 [7] 55/-1/-1->54->47 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Trees [0] 52/-1/-1->51->50 [1] 52/42/-1->51->50 [2] -1/-1/-1->51->50 [3] 52/-1/-1->51->50 [4] 52/-1/-1->51->50 [5] 52/-1/-1->51->50 [6] -1/-1/-1->51->50 [7] 52/-1/-1->51->50 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 03/0 : 12[4] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 
01/0 : 8[0] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 01/0 : 10[2] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 03/0 : 52[4] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 01/0 : 48[0] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 50[2] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 00/0 : 49[1] -> 56[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 04/0 : 49[1] -> 56[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 47[7] -> 54[6] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 07/0 : 52[4] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 07/0 : 47[7] -> 54[6] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 05/0 : 48[0] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 03/0 : 7[7] -> 14[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 7[7] -> 14[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 00/0 : 9[1] -> 16[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 04/0 : 9[1] -> 16[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 05/0 : 50[2] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 03/0 : 48[0] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 07/0 : 48[0] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 45[5] -> 52[4] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 06/0 : 45[5] -> 52[4] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 01/0 : 51[3] -> 58[2] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 05/0 : 51[3] -> 58[2] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 02/0 : 53[5] -> 60[4] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 06/0 : 53[5] -> 60[4] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 41[1] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 04/0 : 41[1] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
+dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 02/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 03/0 : 55[7] -> 62[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 07/0 : 55[7] -> 62[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 04/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 43[3] -> 50[2] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 05/0 : 43[3] -> 50[2] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 06/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 01/0 : 52[4] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 54[6] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 05/0 : 52[4] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 00/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Trees [0] 63/-1/-1->62->61 [1] 63/-1/-1->62->61 [2] 63/-1/-1->62->61 [3] 63/-1/-1->62->54 [4] 63/-1/-1->62->61 [5] 63/-1/-1->62->61 [6] 63/-1/-1->62->61 [7] 63/30/-1->62->-1 +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Trees [0] -1/-1/-1->63->62 [1] 56/-1/-1->63->62 [2] 56/-1/-1->63->62 [3] 56/-1/-1->63->62 [4] -1/-1/-1->63->62 [5] 56/-1/-1->63->62 [6] 56/-1/-1->63->62 [7] 56/-1/-1->63->62 +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Trees [0] 61/-1/-1->60->59 [1] 61/-1/-1->60->59 [2] 61/-1/-1->60->52 [3] 61/-1/-1->60->59 [4] 61/-1/-1->60->59 [5] 61/-1/-1->60->59 [6] 61/28/-1->60->-1 [7] 61/-1/-1->60->59 +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Trees [0] 62/-1/-1->61->60 [1] 62/-1/-1->61->60 [2] 62/-1/-1->61->60 [3] -1/-1/-1->61->60 [4] 62/-1/-1->61->60 [5] 62/-1/-1->61->60 [6] 62/-1/-1->61->60 [7] -1/-1/-1->61->60 +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Trees [0] 59/-1/-1->58->57 [1] 59/-1/-1->58->50 [2] 59/-1/-1->58->57 [3] 59/-1/-1->58->57 [4] 59/-1/-1->58->57 [5] 59/26/-1->58->-1 [6] 59/-1/-1->58->57 [7] 59/-1/-1->58->57 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4. 
+dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Trees [0] 60/-1/-1->59->58 [1] 60/-1/-1->59->58 [2] -1/-1/-1->59->58 [3] 60/-1/-1->59->58 [4] 60/-1/-1->59->58 [5] 60/-1/-1->59->58 [6] -1/-1/-1->59->58 [7] 60/-1/-1->59->58 +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Trees [0] 58/-1/-1->57->56 [1] -1/-1/-1->57->56 [2] 58/-1/-1->57->56 [3] 58/-1/-1->57->56 [4] 58/-1/-1->57->56 [5] -1/-1/-1->57->56 [6] 58/-1/-1->57->56 [7] 58/-1/-1->57->56 +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Trees [0] 57/-1/-1->56->48 [1] 57/-1/-1->56->63 [2] 57/-1/-1->56->63 [3] 57/-1/-1->56->63 [4] 57/24/-1->56->-1 [5] 57/-1/-1->56->63 [6] 57/-1/-1->56->63 [7] 57/-1/-1->56->63 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO P2P Chunksize set to 131072 +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 03/0 : 60[4] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 01/0 : 58[2] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 01/0 : 56[0] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 03/0 : 55[7] -> 62[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 07/0 : 55[7] -> 62[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 00/0 : 57[1] -> 0[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 04/0 : 57[1] -> 0[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 07/0 : 60[4] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 05/0 : 58[2] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 05/0 : 56[0] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 03/0 : 62[6] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 03/0 : 56[0] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 07/0 : 62[6] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 02/0 : 53[5] -> 60[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 07/0 : 56[0] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 01/0 : 59[3] -> 2[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 06/0 : 53[5] -> 60[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 05/0 : 59[3] -> 2[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 01/0 : 60[4] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 07/0 : 54[6] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 01/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 02/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 04/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 05/0 : 60[4] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 07/0 : 12[4] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 01/0 : 51[3] -> 58[2] [receive] via NET/IB/1/GDRDMA 
+dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 05/0 : 51[3] -> 58[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 02/0 : 61[5] -> 4[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 00/0 : 49[1] -> 56[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 06/0 : 61[5] -> 4[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 04/0 : 49[1] -> 56[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 00/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 05/0 : 8[0] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 10[2] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 05/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 06/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 00/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 02/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 03/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 03/0 : 8[0] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 00/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 04/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 01/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 03/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 06/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 04/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 07/0 : 50[2] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 05/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 00/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 07/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 02/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 00/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 03/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 01/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 04/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 02/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 06/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 04/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 07/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 05/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 06/0 : 54[6] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 01/0 : 11[3] -> 18[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 05/0 : 11[3] 
-> 18[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 02/0 : 5[5] -> 12[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 5[5] -> 12[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 01/0 : 12[4] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 07/0 : 8[0] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 05/0 : 12[4] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:357 [7] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 02/0 : 13[5] -> 20[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 06/0 : 13[5] -> 20[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 1[1] -> 8[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 00/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 01/0 : 3[3] -> 10[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 3[3] -> 10[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:357 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 03/0 : 14[6] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 02/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 02/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 04/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 14[6] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:359 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 06/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:359 [6] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 03/0 : 15[7] -> 22[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 07/0 : 15[7] -> 22[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 00/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:359 [6] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:79:359 [6] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:79:359 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. 
+dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 01/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 01/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 00/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 06/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 00/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 00/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 02/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 02/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 02/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 02/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 03/0 : 63[7] -> 6[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 07/0 : 63[7] -> 6[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 00/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 00/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 01/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 01/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 01/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 04/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 03/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 04/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 03/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 03/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 05/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 00/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 02/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 03/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 04/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 05/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 06/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 06/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 07/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:362 [1] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. 
+dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 00/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 05/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 04/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 04/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 06/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 00/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 02/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 06/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 06/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 05/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:356 [5] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:76:360 [3] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:77:358 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-5:77:358 [4] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 01/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 03/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 07/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:362 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-5:75:361 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-5:75:361 [2] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 07/0 : 10[2] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 07/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 02/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 04/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:362 [1] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 04/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 06/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:360 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-5:78:356 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 05/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 07/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:361 [2] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:75:361 [2] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:75:361 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:77:358 [4] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:77:358 [4] NCCL INFO NCCL_IB_SL set by environment to 5. 
+dlc1irjyfb0zt5ew-worker-5:77:358 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 06/0 : 14[6] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:362 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 02/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:355 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 00/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 00/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 02/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 03/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 02/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 01/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 04/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 00/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 04/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 03/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 03/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 05/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 01/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 06/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 04/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 04/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 06/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 02/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:355 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:76:358 [3] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:73:355 [0] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:73:355 [0] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:73:355 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:74:362 [1] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:74:362 [1] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:74:362 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:78:357 [5] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. 
+dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 00/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 01/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 07/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 06/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 05/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:358 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 04/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 02/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 02/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 07/0 : 58[2] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 07/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 05/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 04/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 03/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 06/0 : 62[6] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 06/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 05/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:356 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:78:356 [6] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 06/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:357 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 07/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:358 [3] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:76:358 [3] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:78:356 [6] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:78:356 [6] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:77:359 [5] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-6:75:362 [3] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-6:76:358 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:76:358 [4] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-6:78:356 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:76:358 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:73:361 [1] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:80:359 [7] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:78:357 [5] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:78:357 [5] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:76:358 [4] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:76:358 [4] NCCL INFO NCCL_IB_SL set by environment to 5. 
+dlc1irjyfb0zt5ew-worker-6:76:358 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:75:362 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:77:359 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:78:357 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:73:361 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:74:360 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:74:360 [2] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:80:359 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:79:357 [7] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:80:357 [7] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:80:357 [7] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:74:360 [2] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:74:360 [2] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:74:360 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:80:357 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:78:356 [5] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:78:356 [5] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:78:356 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:79:357 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-5:76:360 [3] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:76:360 [3] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:76:360 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:72:363 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-6:72:363 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 00/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:363 [0] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:72:363 [0] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:80:359 [7] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:80:359 [7] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:72:363 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:80:359 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 01/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:360 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:75:360 [2] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 02/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:362 [1] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-5:74:362 [1] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:74:362 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. 
+dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 04/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:75:360 [2] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:75:360 [2] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:75:360 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 05/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 06/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:79:357 [7] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:79:357 [7] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 07/0 : 54[6] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:357 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-6:77:359 [5] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:77:359 [5] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:77:359 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 01/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:361 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:73:361 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-0:77:356 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:77:356 [4] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 00/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 02/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 03/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 00/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 02/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 02/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 04/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:361 [0] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:73:361 [0] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:73:361 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. 
+dlc1irjyfb0zt5ew-worker-0:77:356 [4] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:77:356 [4] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 03/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:356 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 00/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 03/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 00/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 05/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 04/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 01/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 06/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 03/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 04/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 01/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 06/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 05/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 07/0 : 48[0] -> 49[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:362 [3] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:75:362 [3] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 00/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 06/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 03/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 01/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 07/0 : 50[2] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:362 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. 
+dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 04/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 02/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 05/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 00/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 04/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 00/0 : 13[5] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 07/0 : 9[1] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 04/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 01/0 : 13[5] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 05/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 02/0 : 13[5] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 06/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 01/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 05/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 07/0 : 52[4] -> 53[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 03/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 06/0 : 53[5] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 04/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 02/0 : 44[4] -> 53[5] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 54[6] -> 62[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 02/0 : 53[5] -> 44[4] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 00/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 05/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 02/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 07/0 : 51[3] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 03/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 01/0 : 42[2] -> 51[3] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 52[4] -> 60[4] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 01/0 : 51[3] -> 42[2] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 04/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 02/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 06/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 07/0 : 11[3] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 04/0 : 13[5] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 05/0 : 13[5] -> 14[6] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 06/0 : 13[5] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:355 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. +dlc1irjyfb0zt5ew-worker-0:79:355 [6] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 8. +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:73:361 [1] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-6:73:361 [1] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-6:73:361 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-0:79:355 [6] NCCL INFO NCCL_IB_TC set by environment to 136. +dlc1irjyfb0zt5ew-worker-0:79:355 [6] NCCL INFO NCCL_IB_SL set by environment to 5. +dlc1irjyfb0zt5ew-worker-0:79:355 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22. +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Channel 06/0 : 53[5] -> 52[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 07/0 : 49[1] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 01/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 00/0 : 40[0] -> 49[1] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 50[2] -> 58[2] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 00/0 : 49[1] -> 40[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 03/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 00/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 01/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 05/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Channel 04/0 : 49[1] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Channel 05/0 : 51[3] -> 50[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 07/0 : 48[0] -> 55[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 48[0] -> 56[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 03/0 : 46[6] -> 55[7] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 03/0 : 55[7] -> 46[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 01/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 02/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 03/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 05/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 06/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 07/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 00/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 01/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 00/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:72:346 
[0] NCCL INFO Channel 00/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 02/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 01/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 01/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 03/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 02/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 00/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 01/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 02/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 04/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 00/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 03/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 03/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 05/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 02/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 03/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 04/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 06/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 07/0 : 10[2] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 2[2] -> 10[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 01/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 04/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 04/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 06/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 00/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 02/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 05/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 05/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 07/0 : 62[6] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 01/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 03/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 06/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 06/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 02/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 04/0 : 58[2] -> 59[3] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 07/0 : 60[4] -> 61[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 07/0 : 56[0] -> 57[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 04/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 05/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 05/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 00/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 06/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 06/0 : 61[5] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 01/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 07/0 : 58[2] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 03/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 02/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 03/0 : 54[6] -> 62[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 07/0 : 30[6] -> 62[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 07/0 : 62[6] -> 30[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Channel 06/0 : 61[5] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 04/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 05/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 07/0 : 59[3] -> 60[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 02/0 : 52[4] -> 60[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 06/0 : 28[4] -> 60[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 06/0 : 60[4] -> 28[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 38[6] -> 54[6] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 01/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 54[6] -> 38[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 36[4] -> 52[4] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 52[4] -> 36[4] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Channel 05/0 : 59[3] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 00/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 01/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 02/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 00/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 03/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 01/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 8[0] -> 9[1] 
via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 02/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 05/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 03/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 06/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 04/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 07/0 : 8[0] -> 9[1] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 05/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 01/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 04/0 : 16[0] -> 9[1] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 03/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 07/0 : 12[4] -> 13[5] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 05/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 4[4] -> 12[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 05/0 : 18[2] -> 11[3] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 07/0 : 8[0] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 00/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 01/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 02/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 03/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 04/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 05/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 06/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 14[6] -> 15[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 6[6] -> 14[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 06/0 : 20[4] -> 13[5] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 0[0] -> 8[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 07/0 : 22[6] -> 15[7] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 01/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 02/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 03/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 05/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 06/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 07/0 : 15[7] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Connected all rings +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 00/0 : 57[1] -> 58[2] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 02/0 : 57[1] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 03/0 : 57[1] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 04/0 : 57[1] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 06/0 : 57[1] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 07/0 : 57[1] -> 58[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 01/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 00/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 01/0 : 50[2] -> 58[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Channel 04/0 : 57[1] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 05/0 : 26[2] -> 58[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 05/0 : 58[2] -> 26[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 03/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 34[2] -> 50[2] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 05/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 50[2] -> 34[2] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 07/0 : 56[0] -> 63[7] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 01/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 00/0 : 48[0] -> 56[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 04/0 : 24[0] -> 56[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 04/0 : 56[0] -> 24[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 02/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 03/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 05/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 06/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 07/0 : 63[7] -> 56[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 32[0] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 48[0] -> 32[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 02/0 : 12[4] -> 21[5] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 02/0 : 60[4] -> 52[4] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 12[4] -> 28[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 28[4] -> 12[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 03/0 : 14[6] -> 23[7] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 14[6] -> 30[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 30[6] -> 14[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 02/0 : 21[5] -> 12[4] [receive] via NET/IB/2/GDRDMA 
+dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 06/0 : 12[4] -> 4[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 02/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 03/0 : 23[7] -> 14[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Channel 07/0 : 14[6] -> 6[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Channel 06/0 : 13[5] -> 12[4] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 06/0 : 52[4] -> 45[5] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 03/0 : 62[6] -> 54[6] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Channel 07/0 : 54[6] -> 47[7] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 01/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 03/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 05/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Channel 07/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 01/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 03/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 05/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Channel 07/0 : 12[4] -> 11[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 01/0 : 58[2] -> 50[2] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 01/0 : 10[2] -> 19[3] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 10[2] -> 26[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 26[2] -> 10[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 01/0 : 19[3] -> 10[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 00/0 : 8[0] -> 17[1] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Channel 05/0 : 50[2] -> 43[3] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 00/0 : 56[0] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 8[0] -> 24[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 24[0] -> 8[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Channel 05/0 : 10[2] -> 2[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 00/0 : 17[1] -> 8[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Channel 04/0 : 8[0] -> 0[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 01/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Channel 05/0 : 11[3] -> 10[2] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 02/0 : 60[4] -> 52[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Channel 03/0 : 62[6] -> 54[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 01/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 03/0 : 60[4] -> 59[3] via P2P/IPC/read 
+dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 05/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Channel 07/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Channel 01/0 : 58[2] -> 50[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Channel 00/0 : 56[0] -> 48[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 03/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Channel 07/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Channel 04/0 : 48[0] -> 41[1] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Channel 04/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 03/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Channel 07/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 03/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Channel 07/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO 8 coll channels, 0 
nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO threadThresholds 8/8/64 | 
512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:72:346 [0] NCCL INFO comm 0x9ab58920 rank 56 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:76:323 [4] NCCL INFO comm 0x9bc9d510 rank 60 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:78:348 [6] NCCL INFO comm 0x9aa0dd20 rank 62 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:74:320 [2] NCCL INFO comm 0x9adf3c70 rank 58 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:79:316 [7] NCCL INFO comm 0x9b788240 rank 63 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:73:322 [1] NCCL INFO comm 0x9adb4cb0 rank 57 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:77:317 [5] NCCL INFO comm 0x9b1e9740 rank 61 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:75:338 [3] NCCL INFO comm 0x9b9384a0 rank 59 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:79:315 [6] NCCL INFO comm 0x9ae0d630 rank 54 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:73:316 [0] NCCL INFO comm 0x9ac83900 rank 48 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:75:318 [2] NCCL INFO comm 0x9a4d4a80 rank 50 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:74:347 [1] NCCL INFO comm 0x99ebdcf0 rank 49 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:80:332 [7] NCCL INFO comm 0x9a8d4590 rank 55 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:76:331 [3] NCCL INFO comm 0x9a5a1030 rank 51 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:78:319 [5] NCCL INFO comm 0x9b140b30 rank 53 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:77:317 [4] NCCL INFO comm 0x9b0bad90 rank 52 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:74:337 [1] NCCL INFO comm 0x9a23e7c0 rank 9 nranks 64 cudaDev 1 nvmlDev 1 busId 
20 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:78:344 [5] NCCL INFO comm 0x9b8bfc50 rank 13 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:76:335 [3] NCCL INFO comm 0x9a8c13a0 rank 11 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:80:316 [7] NCCL INFO comm 0x9b001930 rank 15 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:77:336 [4] NCCL INFO comm 0x9aaaf700 rank 12 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:73:318 [0] NCCL INFO comm 0x9a801560 rank 8 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:79:317 [6] NCCL INFO comm 0x9b1f00d0 rank 14 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+dlc1irjyfb0zt5ew-worker-0:75:315 [2] NCCL INFO comm 0x99f67b70 rank 10 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0xa8b591fa54e9ef10 - Init COMPLETE
+
+Loading checkpoint shards: 0%| | 0/17 [00:00<?, ?it/s]
+[INFO|modeling_utils.py:4350] 2025-01-23 00:21:10,990 >> All model checkpoint weights were used when initializing LlavaQwenForCausalLM.
+
+[INFO|modeling_utils.py:4358] 2025-01-23 00:21:10,990 >> All the weights of LlavaQwenForCausalLM were initialized from the model checkpoint at models/qwen/qwen2.5-32B-Instruct.
+If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaQwenForCausalLM for predictions without further training.
+Using tokenizer from models/qwen/qwen2.5-32B-Instruct
+using cache dir None
+
+Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.14s/it]
+Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.62s/it]
+[INFO|configuration_utils.py:779] 2025-01-23 00:21:10,998 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/generation_config.json
+[INFO|configuration_utils.py:826] 2025-01-23 00:21:10,998 >> Generate config GenerationConfig {
+  "attn_implementation": "flash_attention_2",
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8
+}
+
+[INFO|modeling_utils.py:4350] 2025-01-23 00:21:10,989 >> All model checkpoint weights were used when initializing LlavaQwenForCausalLM.
+
+[INFO|modeling_utils.py:4358] 2025-01-23 00:21:10,989 >> All the weights of LlavaQwenForCausalLM were initialized from the model checkpoint at models/qwen/qwen2.5-32B-Instruct.
+If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaQwenForCausalLM for predictions without further training.
+[INFO|configuration_utils.py:779] 2025-01-23 00:21:10,996 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/generation_config.json
+[INFO|configuration_utils.py:826] 2025-01-23 00:21:10,996 >> Generate config GenerationConfig {
+  "attn_implementation": "flash_attention_2",
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_k": 20,
+  "top_p": 0.8
+}
+
+[INFO|modeling_utils.py:4350] 2025-01-23 00:21:11,009 >> All model checkpoint weights were used when initializing LlavaQwenForCausalLM.
+
+[INFO|modeling_utils.py:4358] 2025-01-23 00:21:11,009 >> All the weights of LlavaQwenForCausalLM were initialized from the model checkpoint at models/qwen/qwen2.5-32B-Instruct.
+If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaQwenForCausalLM for predictions without further training.
+Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None +Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None +[INFO|configuration_utils.py:779] 2025-01-23 00:21:11,016 >> loading configuration file models/qwen/qwen2.5-32B-Instruct/generation_config.json +[INFO|configuration_utils.py:826] 2025-01-23 00:21:11,016 >> Generate config GenerationConfig { + "attn_implementation": "flash_attention_2", + "bos_token_id": 151643, + "do_sample": true, + "eos_token_id": [ + 151645, + 151643 + ], + "pad_token_id": 151643, + "repetition_penalty": 1.05, + "temperature": 0.7, + "top_k": 20, + "top_p": 0.8 +} + +Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None + +Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.15s/it] +Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.62s/it] + +Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.15s/it] +Loading checkpoint shards: 100%|██████████| 17/17 [00:44<00:00, 2.62s/it] +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file vocab.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file vocab.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file merges.txt +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file tokenizer_config.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,023 >> loading file tokenizer.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file merges.txt +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file tokenizer_config.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,020 >> loading file tokenizer.json +Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None +Using tokenizer from models/qwen/qwen2.5-32B-Instruct +using cache dir None +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file vocab.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file merges.txt +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file tokenizer_config.json +[INFO|tokenization_utils_base.py:2025] 2025-01-23 00:21:11,037 >> loading file tokenizer.json +[WARNING|logging.py:314] 2025-01-23 00:21:11,243 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
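For reference, the generation settings dumped above come straight from generation_config.json in the checkpoint directory and can be re-loaded with the public transformers API; a minimal sketch, assuming the same local path as the log:

    from transformers import GenerationConfig

    # Re-load the same generation_config.json shown in the log dump above.
    gen_cfg = GenerationConfig.from_pretrained("models/qwen/qwen2.5-32B-Instruct")
    print(gen_cfg.temperature, gen_cfg.top_p, gen_cfg.top_k)  # 0.7 0.8 20
    print(gen_cfg.eos_token_id)                               # [151645, 151643]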
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,249 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,250 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,251 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,254 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[WARNING|logging.py:314] 2025-01-23 00:21:11,258 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,256 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[WARNING|logging.py:314] 2025-01-23 00:21:11,256 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
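The Conversation object logged above describes a ChatML-style format: a system header, <|im_start|> role tags, and <|im_end|> as the separator. As an illustration of how a prompt can be assembled from exactly those fields (a sketch only; the real rendering lives in llava's Conversation.get_prompt(), which this does not reproduce):

    # Fields copied from the logged Conversation repr.
    system = "<|im_start|>system\nYou are a helpful assistant."
    roles = ("<|im_start|>user", "<|im_start|>assistant")
    sep = "<|im_end|>"

    def render(messages):
        # messages: list of (role_index, text), e.g. [(0, "Hi"), (1, "Hello")]
        out = system + sep + "\n"
        for role_idx, text in messages:
            out += roles[role_idx] + "\n" + text + sep + "\n"
        return out

    print(render([(0, "Describe the image."), (1, "A cat sits on a sofa.")]))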
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,255 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +[WARNING|logging.py:314] 2025-01-23 00:21:11,260 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,262 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - INFO - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|image_processing_utils.py:373] 2025-01-23 00:21:11,264 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/preprocessor_config.json +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,264 >> size should be a dictionary on of the following set of keys: ({'width', 'height'}, {'shortest_edge'}, {'longest_edge', 'shortest_edge'}, {'longest_edge'}), got 336. Converted to {'shortest_edge': 336}. +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,264 >> crop_size should be a dictionary on of the following set of keys: ({'width', 'height'}, {'shortest_edge'}, {'longest_edge', 'shortest_edge'}, {'longest_edge'}), got 336. Converted to {'height': 336, 'width': 336}. +[INFO|image_processing_utils.py:425] 2025-01-23 00:21:11,264 >> Image processor CLIPImageProcessor { + "crop_size": { + "height": 336, + "width": 336 + }, + "do_center_crop": true, + "do_convert_rgb": true, + "do_normalize": true, + "do_rescale": true, + "do_resize": true, + "image_mean": [ + 0.48145466, + 0.4578275, + 0.40821073 + ], + "image_processor_type": "CLIPImageProcessor", + "image_std": [ + 0.26862954, + 0.26130258, + 0.27577711 + ], + "resample": 3, + "rescale_factor": 0.00392156862745098, + "size": { + "shortest_edge": 336 + } +} + +[WARNING|logging.py:314] 2025-01-23 00:21:11,264 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,256 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,257 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - INFO - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|image_processing_utils.py:373] 2025-01-23 00:21:11,259 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/preprocessor_config.json +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,259 >> size should be a dictionary on of the following set of keys: ({'height', 'width'}, {'shortest_edge'}, {'longest_edge', 'shortest_edge'}, {'longest_edge'}), got 336. Converted to {'shortest_edge': 336}. +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,259 >> crop_size should be a dictionary on of the following set of keys: ({'height', 'width'}, {'shortest_edge'}, {'longest_edge', 'shortest_edge'}, {'longest_edge'}), got 336. Converted to {'height': 336, 'width': 336}. +[INFO|image_processing_utils.py:425] 2025-01-23 00:21:11,259 >> Image processor CLIPImageProcessor { + "crop_size": { + "height": 336, + "width": 336 + }, + "do_center_crop": true, + "do_convert_rgb": true, + "do_normalize": true, + "do_rescale": true, + "do_resize": true, + "image_mean": [ + 0.48145466, + 0.4578275, + 0.40821073 + ], + "image_processor_type": "CLIPImageProcessor", + "image_std": [ + 0.26862954, + 0.26130258, + 0.27577711 + ], + "resample": 3, + "rescale_factor": 0.00392156862745098, + "size": { + "shortest_edge": 336 + } +} + +[WARNING|logging.py:314] 2025-01-23 00:21:11,259 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
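The two conversion messages above are transformers normalizing bare integers into size dictionaries. A minimal sketch reproducing that behavior with the public CLIPImageProcessor API, using the values from the logged config:

    from transformers import CLIPImageProcessor

    # Bare ints are normalized exactly as the log reports:
    # size=336 -> {"shortest_edge": 336}; crop_size=336 -> {"height": 336, "width": 336}.
    proc = CLIPImageProcessor(
        size=336,
        crop_size=336,
        image_mean=[0.48145466, 0.4578275, 0.40821073],
        image_std=[0.26862954, 0.26130258, 0.27577711],
    )
    print(proc.size)       # {'shortest_edge': 336}
    print(proc.crop_size)  # {'height': 336, 'width': 336}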
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,259 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,260 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,261 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - INFO - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,261 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
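The repeated "Special tokens have been added" warning means the tokenizer vocabulary grew, so the model's embedding matrix needs matching rows before those tokens can be trained. A hedged sketch of the standard transformers pattern; the "<image>" placeholder token here is purely illustrative, not taken from this run:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    path = "models/qwen/qwen2.5-32B-Instruct"
    tok = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)  # needs enough memory for 32B
    # "<image>" is a hypothetical placeholder token for illustration.
    num_added = tok.add_special_tokens({"additional_special_tokens": ["<image>"]})
    if num_added > 0:
        # Grow the embedding matrix so the new token ids have rows to train.
        model.resize_token_embeddings(len(tok))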
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|configuration_utils.py:727] 2025-01-23 00:21:11,264 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/config.json +[INFO|configuration_utils.py:792] 2025-01-23 00:21:11,265 >> Model config CLIPVisionConfig { + "attention_dropout": 0.0, + "dropout": 0.0, + "hidden_act": "quick_gelu", + "hidden_size": 1024, + "image_size": 336, + "initializer_factor": 1.0, + "initializer_range": 0.02, + "intermediate_size": 4096, + "layer_norm_eps": 1e-05, + "model_type": "clip_vision_model", + "num_attention_heads": 16, + "num_channels": 3, + "num_hidden_layers": 24, + "patch_size": 14, + "projection_dim": 768, + "transformers_version": "4.37.2" +} + +[INFO|modeling_utils.py:3473] 2025-01-23 00:21:11,265 >> loading weights file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/pytorch_model.bin +[INFO|image_processing_utils.py:373] 2025-01-23 00:21:11,263 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/preprocessor_config.json +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,263 >> size should be a dictionary on of the following set of keys: ({'height', 'width'}, {'shortest_edge'}, {'shortest_edge', 'longest_edge'}, {'longest_edge'}), got 336. Converted to {'shortest_edge': 336}. +[INFO|image_processing_utils.py:738] 2025-01-23 00:21:11,263 >> crop_size should be a dictionary on of the following set of keys: ({'height', 'width'}, {'shortest_edge'}, {'shortest_edge', 'longest_edge'}, {'longest_edge'}), got 336. Converted to {'height': 336, 'width': 336}. +[INFO|image_processing_utils.py:425] 2025-01-23 00:21:11,263 >> Image processor CLIPImageProcessor { + "crop_size": { + "height": 336, + "width": 336 + }, + "do_center_crop": true, + "do_convert_rgb": true, + "do_normalize": true, + "do_rescale": true, + "do_resize": true, + "image_mean": [ + 0.48145466, + 0.4578275, + 0.40821073 + ], + "image_processor_type": "CLIPImageProcessor", + "image_std": [ + 0.26862954, + 0.26130258, + 0.27577711 + ], + "resample": 3, + "rescale_factor": 0.00392156862745098, + "size": { + "shortest_edge": 336 + } +} + +[WARNING|logging.py:314] 2025-01-23 00:21:11,263 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
+01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|configuration_utils.py:727] 2025-01-23 00:21:11,275 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/config.json +[INFO|configuration_utils.py:792] 2025-01-23 00:21:11,275 >> Model config CLIPVisionConfig { + "attention_dropout": 0.0, + "dropout": 0.0, + "hidden_act": "quick_gelu", + "hidden_size": 1024, + "image_size": 336, + "initializer_factor": 1.0, + "initializer_range": 0.02, + "intermediate_size": 4096, + "layer_norm_eps": 1e-05, + "model_type": "clip_vision_model", + "num_attention_heads": 16, + "num_channels": 3, + "num_hidden_layers": 24, + "patch_size": 14, + "projection_dim": 768, + "transformers_version": "4.37.2" +} + +[INFO|modeling_utils.py:3473] 2025-01-23 00:21:11,276 >> loading weights file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/pytorch_model.bin +[INFO|configuration_utils.py:727] 2025-01-23 00:21:11,272 >> loading configuration file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/config.json +[INFO|configuration_utils.py:792] 2025-01-23 00:21:11,273 >> Model config CLIPVisionConfig { + "attention_dropout": 0.0, + "dropout": 0.0, + "hidden_act": "quick_gelu", + "hidden_size": 1024, + "image_size": 336, + "initializer_factor": 1.0, + "initializer_range": 0.02, + "intermediate_size": 4096, + "layer_norm_eps": 1e-05, + "model_type": "clip_vision_model", + "num_attention_heads": 16, + "num_channels": 3, + "num_hidden_layers": 24, + "patch_size": 14, + "projection_dim": 768, + "transformers_version": "4.37.2" +} + +[INFO|modeling_utils.py:3473] 2025-01-23 00:21:11,274 >> loading weights file /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1/pytorch_model.bin +[WARNING|logging.py:314] 2025-01-23 00:21:11,289 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,290 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[WARNING|logging.py:314] 2025-01-23 00:21:11,333 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
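A quick sanity check of what the CLIPVisionConfig above implies for the number of visual tokens per image:

    # From the logged CLIPVisionConfig: image_size=336, patch_size=14.
    image_size, patch_size = 336, 14
    grid = image_size // patch_size   # 24 patches per side
    num_patches = grid * grid         # 576 patch tokens per image
    num_positions = num_patches + 1   # +1 CLS token -> 577 positions
    print(grid, num_patches, num_positions)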
+[WARNING|logging.py:314] 2025-01-23 00:21:11,334 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +01/23/2025 00:21:11 - WARNING - llava.train.train - Using conversation template: Conversation(system='<|im_start|>system\nYou are a helpful assistant.', roles=('<|im_start|>user', '<|im_start|>assistant'), messages=[], offset=0, sep_style=, sep='<|im_end|>', sep2=None, version='qwen', mm_system=None, skip_next=False) +[INFO|modeling_utils.py:3582] 2025-01-23 00:21:14,666 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[INFO|modeling_utils.py:3582] 2025-01-23 00:21:14,730 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[INFO|modeling_utils.py:3582] 2025-01-23 00:21:14,798 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[INFO|modeling_utils.py:4340] 2025-01-23 00:21:16,131 >> Some weights of the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1 were not used when initializing CLIPVisionModel: ['logit_scale', 'text_model.embeddings.position_embedding.weight', 'text_model.embeddings.position_ids', 'text_model.embeddings.token_embedding.weight', 'text_model.encoder.layers.0.layer_norm1.bias', 'text_model.encoder.layers.0.layer_norm1.weight', 'text_model.encoder.layers.0.layer_norm2.bias', 'text_model.encoder.layers.0.layer_norm2.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.0.mlp.fc1.weight', 'text_model.encoder.layers.0.mlp.fc2.bias', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.encoder.layers.0.self_attn.out_proj.bias', 'text_model.encoder.layers.0.self_attn.out_proj.weight', 'text_model.encoder.layers.0.self_attn.q_proj.bias', 'text_model.encoder.layers.0.self_attn.q_proj.weight', 'text_model.encoder.layers.0.self_attn.v_proj.bias', 'text_model.encoder.layers.0.self_attn.v_proj.weight', 'text_model.encoder.layers.1.layer_norm1.bias', 'text_model.encoder.layers.1.layer_norm1.weight', 'text_model.encoder.layers.1.layer_norm2.bias', 'text_model.encoder.layers.1.layer_norm2.weight', 'text_model.encoder.layers.1.mlp.fc1.bias', 'text_model.encoder.layers.1.mlp.fc1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.1.mlp.fc2.weight', 'text_model.encoder.layers.1.self_attn.k_proj.bias', 'text_model.encoder.layers.1.self_attn.k_proj.weight', 'text_model.encoder.layers.1.self_attn.out_proj.bias', 'text_model.encoder.layers.1.self_attn.out_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.bias', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.v_proj.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.10.layer_norm1.bias', 'text_model.encoder.layers.10.layer_norm1.weight', 'text_model.encoder.layers.10.layer_norm2.bias', 'text_model.encoder.layers.10.layer_norm2.weight', 'text_model.encoder.layers.10.mlp.fc1.bias', 'text_model.encoder.layers.10.mlp.fc1.weight', 
'text_model.encoder.layers.10.mlp.fc2.bias', 'text_model.encoder.layers.10.mlp.fc2.weight', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.10.self_attn.k_proj.weight', 'text_model.encoder.layers.10.self_attn.out_proj.bias', 'text_model.encoder.layers.10.self_attn.out_proj.weight', 'text_model.encoder.layers.10.self_attn.q_proj.bias', 'text_model.encoder.layers.10.self_attn.q_proj.weight', 'text_model.encoder.layers.10.self_attn.v_proj.bias', 'text_model.encoder.layers.10.self_attn.v_proj.weight', 'text_model.encoder.layers.11.layer_norm1.bias', 'text_model.encoder.layers.11.layer_norm1.weight', 'text_model.encoder.layers.11.layer_norm2.bias', 'text_model.encoder.layers.11.layer_norm2.weight', 'text_model.encoder.layers.11.mlp.fc1.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.11.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc2.weight', 'text_model.encoder.layers.11.self_attn.k_proj.bias', 'text_model.encoder.layers.11.self_attn.k_proj.weight', 'text_model.encoder.layers.11.self_attn.out_proj.bias', 'text_model.encoder.layers.11.self_attn.out_proj.weight', 'text_model.encoder.layers.11.self_attn.q_proj.bias', 'text_model.encoder.layers.11.self_attn.q_proj.weight', 'text_model.encoder.layers.11.self_attn.v_proj.bias', 'text_model.encoder.layers.11.self_attn.v_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.2.layer_norm1.weight', 'text_model.encoder.layers.2.layer_norm2.bias', 'text_model.encoder.layers.2.layer_norm2.weight', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.2.mlp.fc1.weight', 'text_model.encoder.layers.2.mlp.fc2.bias', 'text_model.encoder.layers.2.mlp.fc2.weight', 'text_model.encoder.layers.2.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.weight', 'text_model.encoder.layers.2.self_attn.out_proj.bias', 'text_model.encoder.layers.2.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.q_proj.bias', 'text_model.encoder.layers.2.self_attn.q_proj.weight', 'text_model.encoder.layers.2.self_attn.v_proj.bias', 'text_model.encoder.layers.2.self_attn.v_proj.weight', 'text_model.encoder.layers.3.layer_norm1.bias', 'text_model.encoder.layers.3.layer_norm1.weight', 'text_model.encoder.layers.3.layer_norm2.bias', 'text_model.encoder.layers.3.layer_norm2.weight', 'text_model.encoder.layers.3.mlp.fc1.bias', 'text_model.encoder.layers.3.mlp.fc1.weight', 'text_model.encoder.layers.3.mlp.fc2.bias', 'text_model.encoder.layers.3.mlp.fc2.weight', 'text_model.encoder.layers.3.self_attn.k_proj.bias', 'text_model.encoder.layers.3.self_attn.k_proj.weight', 'text_model.encoder.layers.3.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.out_proj.weight', 'text_model.encoder.layers.3.self_attn.q_proj.bias', 'text_model.encoder.layers.3.self_attn.q_proj.weight', 'text_model.encoder.layers.3.self_attn.v_proj.bias', 'text_model.encoder.layers.3.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm1.bias', 'text_model.encoder.layers.4.layer_norm1.weight', 'text_model.encoder.layers.4.layer_norm2.bias', 'text_model.encoder.layers.4.layer_norm2.weight', 'text_model.encoder.layers.4.mlp.fc1.bias', 'text_model.encoder.layers.4.mlp.fc1.weight', 'text_model.encoder.layers.4.mlp.fc2.bias', 'text_model.encoder.layers.4.mlp.fc2.weight', 'text_model.encoder.layers.4.self_attn.k_proj.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.4.self_attn.out_proj.bias', 
'text_model.encoder.layers.4.self_attn.out_proj.weight', 'text_model.encoder.layers.4.self_attn.q_proj.bias', 'text_model.encoder.layers.4.self_attn.q_proj.weight', 'text_model.encoder.layers.4.self_attn.v_proj.bias', 'text_model.encoder.layers.4.self_attn.v_proj.weight', 'text_model.encoder.layers.5.layer_norm1.bias', 'text_model.encoder.layers.5.layer_norm1.weight', 'text_model.encoder.layers.5.layer_norm2.bias', 'text_model.encoder.layers.5.layer_norm2.weight', 'text_model.encoder.layers.5.mlp.fc1.bias', 'text_model.encoder.layers.5.mlp.fc1.weight', 'text_model.encoder.layers.5.mlp.fc2.bias', 'text_model.encoder.layers.5.mlp.fc2.weight', 'text_model.encoder.layers.5.self_attn.k_proj.bias', 'text_model.encoder.layers.5.self_attn.k_proj.weight', 'text_model.encoder.layers.5.self_attn.out_proj.bias', 'text_model.encoder.layers.5.self_attn.out_proj.weight', 'text_model.encoder.layers.5.self_attn.q_proj.bias', 'text_model.encoder.layers.5.self_attn.q_proj.weight', 'text_model.encoder.layers.5.self_attn.v_proj.bias', 'text_model.encoder.layers.5.self_attn.v_proj.weight', 'text_model.encoder.layers.6.layer_norm1.bias', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm2.bias', 'text_model.encoder.layers.6.layer_norm2.weight', 'text_model.encoder.layers.6.mlp.fc1.bias', 'text_model.encoder.layers.6.mlp.fc1.weight', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.6.mlp.fc2.weight', 'text_model.encoder.layers.6.self_attn.k_proj.bias', 'text_model.encoder.layers.6.self_attn.k_proj.weight', 'text_model.encoder.layers.6.self_attn.out_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.weight', 'text_model.encoder.layers.6.self_attn.q_proj.bias', 'text_model.encoder.layers.6.self_attn.q_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.bias', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm2.bias', 'text_model.encoder.layers.7.layer_norm2.weight', 'text_model.encoder.layers.7.mlp.fc1.bias', 'text_model.encoder.layers.7.mlp.fc1.weight', 'text_model.encoder.layers.7.mlp.fc2.bias', 'text_model.encoder.layers.7.mlp.fc2.weight', 'text_model.encoder.layers.7.self_attn.k_pro[INFO|modeling_utils.py:4340] 2025-01-23 00:21:16,126 >> Some weights of the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1 were not used when initializing CLIPVisionModel: ['logit_scale', 'text_model.embeddings.position_embedding.weight', 'text_model.embeddings.position_ids', 'text_model.embeddings.token_embedding.weight', 'text_model.encoder.layers.0.layer_norm1.bias', 'text_model.encoder.layers.0.layer_norm1.weight', 'text_model.encoder.layers.0.layer_norm2.bias', 'text_model.encoder.layers.0.layer_norm2.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.0.mlp.fc1.weight', 'text_model.encoder.layers.0.mlp.fc2.bias', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.encoder.layers.0.self_attn.out_proj.bias', 'text_model.encoder.layers.0.self_attn.out_proj.weight', 'text_model.encoder.layers.0.self_attn.q_proj.bias', 'text_model.encoder.layers.0.self_attn.q_proj.weight', 'text_model.encoder.layers.0.self_attn.v_proj.bias', 'text_model.encoder.layers.0.self_attn.v_proj.weight', 
'text_model.encoder.layers.1.layer_norm1.bias', 'text_model.encoder.layers.1.layer_norm1.weight', 'text_model.encoder.layers.1.layer_norm2.bias', 'text_model.encoder.layers.1.layer_norm2.weight', 'text_model.encoder.layers.1.mlp.fc1.bias', 'text_model.encoder.layers.1.mlp.fc1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.1.mlp.fc2.weight', 'text_model.encoder.layers.1.self_attn.k_proj.bias', 'text_model.encoder.layers.1.self_attn.k_proj.weight', 'text_model.encoder.layers.1.self_attn.out_proj.bias', 'text_model.encoder.layers.1.self_attn.out_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.bias', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.v_proj.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.10.layer_norm1.bias', 'text_model.encoder.layers.10.layer_norm1.weight', 'text_model.encoder.layers.10.layer_norm2.bias', 'text_model.encoder.layers.10.layer_norm2.weight', 'text_model.encoder.layers.10.mlp.fc1.bias', 'text_model.encoder.layers.10.mlp.fc1.weight', 'text_model.encoder.layers.10.mlp.fc2.bias', 'text_model.encoder.layers.10.mlp.fc2.weight', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.10.self_attn.k_proj.weight', 'text_model.encoder.layers.10.self_attn.out_proj.bias', 'text_model.encoder.layers.10.self_attn.out_proj.weight', 'text_model.encoder.layers.10.self_attn.q_proj.bias', 'text_model.encoder.layers.10.self_attn.q_proj.weight', 'text_model.encoder.layers.10.self_attn.v_proj.bias', 'text_model.encoder.layers.10.self_attn.v_proj.weight', 'text_model.encoder.layers.11.layer_norm1.bias', 'text_model.encoder.layers.11.layer_norm1.weight', 'text_model.encoder.layers.11.layer_norm2.bias', 'text_model.encoder.layers.11.layer_norm2.weight', 'text_model.encoder.layers.11.mlp.fc1.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.11.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc2.weight', 'text_model.encoder.layers.11.self_attn.k_proj.bias', 'text_model.encoder.layers.11.self_attn.k_proj.weight', 'text_model.encoder.layers.11.self_attn.out_proj.bias', 'text_model.encoder.layers.11.self_attn.out_proj.weight', 'text_model.encoder.layers.11.self_attn.q_proj.bias', 'text_model.encoder.layers.11.self_attn.q_proj.weight', 'text_model.encoder.layers.11.self_attn.v_proj.bias', 'text_model.encoder.layers.11.self_attn.v_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.2.layer_norm1.weight', 'text_model.encoder.layers.2.layer_norm2.bias', 'text_model.encoder.layers.2.layer_norm2.weight', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.2.mlp.fc1.weight', 'text_model.encoder.layers.2.mlp.fc2.bias', 'text_model.encoder.layers.2.mlp.fc2.weight', 'text_model.encoder.layers.2.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.weight', 'text_model.encoder.layers.2.self_attn.out_proj.bias', 'text_model.encoder.layers.2.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.q_proj.bias', 'text_model.encoder.layers.2.self_attn.q_proj.weight', 'text_model.encoder.layers.2.self_attn.v_proj.bias', 'text_model.encoder.layers.2.self_attn.v_proj.weight', 'text_model.encoder.layers.3.layer_norm1.bias', 'text_model.encoder.layers.3.layer_norm1.weight', 'text_model.encoder.layers.3.layer_norm2.bias', 'text_model.encoder.layers.3.layer_norm2.weight', 'text_model.encoder.layers.3.mlp.fc1.bias', 
'text_model.encoder.layers.3.mlp.fc1.weight', 'text_model.encoder.layers.3.mlp.fc2.bias', 'text_model.encoder.layers.3.mlp.fc2.weight', 'text_model.encoder.layers.3.self_attn.k_proj.bias', 'text_model.encoder.layers.3.self_attn.k_proj.weight', 'text_model.encoder.layers.3.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.out_proj.weight', 'text_model.encoder.layers.3.self_attn.q_proj.bias', 'text_model.encoder.layers.3.self_attn.q_proj.weight', 'text_model.encoder.layers.3.self_attn.v_proj.bias', 'text_model.encoder.layers.3.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm1.bias', 'text_model.encoder.layers.4.layer_norm1.weight', 'text_model.encoder.layers.4.layer_norm2.bias', 'text_model.encoder.layers.4.layer_norm2.weight', 'text_model.encoder.layers.4.mlp.fc1.bias', 'text_model.encoder.layers.4.mlp.fc1.weight', 'text_model.encoder.layers.4.mlp.fc2.bias', 'text_model.encoder.layers.4.mlp.fc2.weight', 'text_model.encoder.layers.4.self_attn.k_proj.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.4.self_attn.out_proj.bias', 'text_model.encoder.layers.4.self_attn.out_proj.weight', 'text_model.encoder.layers.4.self_attn.q_proj.bias', 'text_model.encoder.layers.4.self_attn.q_proj.weight', 'text_model.encoder.layers.4.self_attn.v_proj.bias', 'text_model.encoder.layers.4.self_attn.v_proj.weight', 'text_model.encoder.layers.5.layer_norm1.bias', 'text_model.encoder.layers.5.layer_norm1.weight', 'text_model.encoder.layers.5.layer_norm2.bias', 'text_model.encoder.layers.5.layer_norm2.weight', 'text_model.encoder.layers.5.mlp.fc1.bias', 'text_model.encoder.layers.5.mlp.fc1.weight', 'text_model.encoder.layers.5.mlp.fc2.bias', 'text_model.encoder.layers.5.mlp.fc2.weight', 'text_model.encoder.layers.5.self_attn.k_proj.bias', 'text_model.encoder.layers.5.self_attn.k_proj.weight', 'text_model.encoder.layers.5.self_attn.out_proj.bias', 'text_model.encoder.layers.5.self_attn.out_proj.weight', 'text_model.encoder.layers.5.self_attn.q_proj.bias', 'text_model.encoder.layers.5.self_attn.q_proj.weight', 'text_model.encoder.layers.5.self_attn.v_proj.bias', 'text_model.encoder.layers.5.self_attn.v_proj.weight', 'text_model.encoder.layers.6.layer_norm1.bias', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.6.layer_norm2.bias', 'text_model.encoder.layers.6.layer_norm2.weight', 'text_model.encoder.layers.6.mlp.fc1.bias', 'text_model.encoder.layers.6.mlp.fc1.weight', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.6.mlp.fc2.weight', 'text_model.encoder.layers.6.self_attn.k_proj.bias', 'text_model.encoder.layers.6.self_attn.k_proj.weight', 'text_model.encoder.layers.6.self_attn.out_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.weight', 'text_model.encoder.layers.6.self_attn.q_proj.bias', 'text_model.encoder.layers.6.self_attn.q_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.bias', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm2.bias', 'text_model.encoder.layers.7.layer_norm2.weight', 'text_model.encoder.layers.7.mlp.fc1.bias', 'text_model.encoder.layers.7.mlp.fc1.weight', 'text_model.encoder.layers.7.mlp.fc2.bias', 'text_model.encoder.layers.7.mlp.fc2.weight', 'text_model.encoder.layers.7.self_attn.k_proj.bias', 'text_model.encoder.layers.7.self_attn.k_proj.weight', 'text_model.encoder.layers.7.self_attn.out_proj.bias', 
'text_model.encoder.layers.7.self_attn.out_proj.weight', 'text_model.encoder.layers.7.self_attn.q_proj.bias', 'text_model.encoder.layers.7.self_attn.q_proj.weight', 'text_model.encoder.layers.7.self_attn.v_proj.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.8.layer_norm1.bias', 'text_model.encoder.layers.8.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm2.bias', 'text_model.encoder.layers.8.layer_norm2.weight', 'text_model.encoder.layers.8.mlp.fc1.bias', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.8.mlp.fc2.weight', 'text_model.encoder.layers.8.self_attn.k_proj.bias', 'text_model.encoder.layers.8.self_attn.k_proj.weight', 'text_model.encoder.layers.8.self_attn.out_proj.bias', 'text_model.encoder.layers.8.self_attn.out_proj.weight', 'text_model.encoder.layers.8.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.8.self_attn.v_proj.bias', 'text_model.encoder.layers.8.self_attn.v_proj.weight', 'text_model.encoder.layers.9.layer_norm1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.9.layer_norm2.bias', 'text_model.encoder.layers.9.layer_norm2.weight', 'text_model.encoder.layers.9.mlp.fc1.bias', 'text_model.encoder.layers.9.mlp.fc1.weight', 'text_model.encoder.layers.9.mlp.fc2.bias', 'text_model.encoder.layers.9.mlp.fc2.weight', 'text_model.encoder.layers.9.self_attn.k_proj.bias', 'text_model.encoder.layers.9.self_attn.k_proj.weight', 'text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.9.self_attn.out_proj.weight', 'text_model.encoder.layers.9.self_attn.q_proj.bias', 'text_model.encoder.layers.9.self_attn.q_proj.weight', 'text_model.encoder.layers.9.self_attn.v_proj.bias', 'text_model.encoder.layers.9.self_attn.v_proj.weight', 'text_model.final_layer_norm.bias', 'text_model.final_layer_norm.weight', 'text_projection.weight', 'visual_projection.weight'] +- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). +- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). +[INFO|modeling_utils.py:4358] 2025-01-23 00:21:16,131 >> All the weights of CLIPVisionModel were initialized from the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1. +If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPVisionModel for predictions without further training. 
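The long "weights not used" list above is expected: CLIPVisionModel instantiates only the vision tower, so every text_model.* weight and the projection heads in the full CLIP checkpoint have no destination and are dropped. A minimal sketch, using the public Hub id rather than the local snapshot path:

    from transformers import CLIPVisionModel

    # Emits the same "Some weights ... were not used" notice as the log,
    # listing the text tower and projection weights it discards.
    vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
    print(sum(p.numel() for p in vision_tower.parameters()))  # roughly 0.3B params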
+j.bias', 'text_model.encoder.layers.7.self_attn.k_proj.weight', 'text_model.encoder.layers.7.self_attn.out_proj.bias', 'text_model.encoder.layers.7.self_attn.out_proj.weight', 'text_model.encoder.layers.7.self_attn.q_proj.bias', 'text_model.encoder.layers.7.self_attn.q_proj.weight', 'text_model.encoder.layers.7.self_attn.v_proj.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.8.layer_norm1.bias', 'text_model.encoder.layers.8.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm2.bias', 'text_model.encoder.layers.8.layer_norm2.weight', 'text_model.encoder.layers.8.mlp.fc1.bias', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.8.mlp.fc2.weight', 'text_model.encoder.layers.8.self_attn.k_proj.bias', 'text_model.encoder.layers.8.self_attn.k_proj.weight', 'text_model.encoder.layers.8.self_attn.out_proj.bias', 'text_model.encoder.layers.8.self_attn.out_proj.weight', 'text_model.encoder.layers.8.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.8.self_attn.v_proj.bias', 'text_model.encoder.layers.8.self_attn.v_proj.weight', 'text_model.encoder.layers.9.layer_norm1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.9.layer_norm2.bias', 'text_model.encoder.layers.9.layer_norm2.weight', 'text_model.encoder.layers.9.mlp.fc1.bias', 'text_model.encoder.layers.9.mlp.fc1.weight', 'text_model.encoder.layers.9.mlp.fc2.bias', 'text_model.encoder.layers.9.mlp.fc2.weight', 'text_model.encoder.layers.9.self_attn.k_proj.bias', 'text_model.encoder.layers.9.self_attn.k_proj.weight', 'text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.9.self_attn.out_proj.weight', 'text_model.encoder.layers.9.self_attn.q_proj.bias', 'text_model.encoder.layers.9.self_attn.q_proj.weight', 'text_model.encoder.layers.9.self_attn.v_proj.bias', 'text_model.encoder.layers.9.self_attn.v_proj.weight', 'text_model.final_layer_norm.bias', 'text_model.final_layer_norm.weight', 'text_projection.weight', 'visual_projection.weight'] +- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). +- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). +[INFO|modeling_utils.py:4358] 2025-01-23 00:21:16,126 >> All the weights of CLIPVisionModel were initialized from the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1. +If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPVisionModel for predictions without further training. 
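The "Detected DeepSpeed ZeRO-3: activating zero.init()" lines earlier in this section mean parameters are partitioned across ranks at construction time instead of being materialized whole on every GPU. A rough sketch of the underlying mechanism, assuming a distributed launcher has set the usual environment variables; the minimal config dict is an assumption, not this run's actual DeepSpeed config:

    import deepspeed
    import torch.nn as nn

    deepspeed.init_distributed()  # assumes RANK/WORLD_SIZE etc. set by a launcher
    ds_config = {"zero_optimization": {"stage": 3}}  # minimal ZeRO-3 snippet (assumption)
    with deepspeed.zero.Init(config_dict_or_path=ds_config):
        layer = nn.Linear(4096, 4096)  # weights are partitioned across ranks here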
+[INFO|modeling_utils.py:4340] 2025-01-23 00:21:16,129 >> Some weights of the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1 were not used when initializing CLIPVisionModel: ['logit_scale', 'text_model.embeddings.position_embedding.weight', 'text_model.embeddings.position_ids', 'text_model.embeddings.token_embedding.weight', 'text_model.encoder.layers.0.layer_norm1.bias', 'text_model.encoder.layers.0.layer_norm1.weight', 'text_model.encoder.layers.0.layer_norm2.bias', 'text_model.encoder.layers.0.layer_norm2.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.0.mlp.fc1.weight', 'text_model.encoder.layers.0.mlp.fc2.bias', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.encoder.layers.0.self_attn.out_proj.bias', 'text_model.encoder.layers.0.self_attn.out_proj.weight', 'text_model.encoder.layers.0.self_attn.q_proj.bias', 'text_model.encoder.layers.0.self_attn.q_proj.weight', 'text_model.encoder.layers.0.self_attn.v_proj.bias', 'text_model.encoder.layers.0.self_attn.v_proj.weight', 'text_model.encoder.layers.1.layer_norm1.bias', 'text_model.encoder.layers.1.layer_norm1.weight', 'text_model.encoder.layers.1.layer_norm2.bias', 'text_model.encoder.layers.1.layer_norm2.weight', 'text_model.encoder.layers.1.mlp.fc1.bias', 'text_model.encoder.layers.1.mlp.fc1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'text_model.encoder.layers.1.mlp.fc2.weight', 'text_model.encoder.layers.1.self_attn.k_proj.bias', 'text_model.encoder.layers.1.self_attn.k_proj.weight', 'text_model.encoder.layers.1.self_attn.out_proj.bias', 'text_model.encoder.layers.1.self_attn.out_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.bias', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.v_proj.bias', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.10.layer_norm1.bias', 'text_model.encoder.layers.10.layer_norm1.weight', 'text_model.encoder.layers.10.layer_norm2.bias', 'text_model.encoder.layers.10.layer_norm2.weight', 'text_model.encoder.layers.10.mlp.fc1.bias', 'text_model.encoder.layers.10.mlp.fc1.weight', 'text_model.encoder.layers.10.mlp.fc2.bias', 'text_model.encoder.layers.10.mlp.fc2.weight', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.10.self_attn.k_proj.weight', 'text_model.encoder.layers.10.self_attn.out_proj.bias', 'text_model.encoder.layers.10.self_attn.out_proj.weight', 'text_model.encoder.layers.10.self_attn.q_proj.bias', 'text_model.encoder.layers.10.self_attn.q_proj.weight', 'text_model.encoder.layers.10.self_attn.v_proj.bias', 'text_model.encoder.layers.10.self_attn.v_proj.weight', 'text_model.encoder.layers.11.layer_norm1.bias', 'text_model.encoder.layers.11.layer_norm1.weight', 'text_model.encoder.layers.11.layer_norm2.bias', 'text_model.encoder.layers.11.layer_norm2.weight', 'text_model.encoder.layers.11.mlp.fc1.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.11.mlp.fc2.bias', 'text_model.encoder.layers.11.mlp.fc2.weight', 'text_model.encoder.layers.11.self_attn.k_proj.bias', 'text_model.encoder.layers.11.self_attn.k_proj.weight', 'text_model.encoder.layers.11.self_attn.out_proj.bias', 'text_model.encoder.layers.11.self_attn.out_proj.weight', 'text_model.encoder.layers.11.self_attn.q_proj.bias', 
'text_model.encoder.layers.11.self_attn.q_proj.weight', 'text_model.encoder.layers.11.self_attn.v_proj.bias', 'text_model.encoder.layers.11.self_attn.v_proj.weight', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.2.layer_norm1.weight', 'text_model.encoder.layers.2.layer_norm2.bias', 'text_model.encoder.layers.2.layer_norm2.weight', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.encoder.layers.2.mlp.fc1.weight', 'text_model.encoder.layers.2.mlp.fc2.bias', 'text_model.encoder.layers.2.mlp.fc2.weight', 'text_model.encoder.layers.2.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.weight', 'text_model.encoder.layers.2.self_attn.out_proj.bias', 'text_model.encoder.layers.2.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.q_proj.bias', 'text_model.encoder.layers.2.self_attn.q_proj.weight', 'text_model.encoder.layers.2.self_attn.v_proj.bias', 'text_model.encoder.layers.2.self_attn.v_proj.weight', 'text_model.encoder.layers.3.layer_norm1.bias', 'text_model.encoder.layers.3.layer_norm1.weight', 'text_model.encoder.layers.3.layer_norm2.bias', 'text_model.encoder.layers.3.layer_norm2.weight', 'text_model.encoder.layers.3.mlp.fc1.bias', 'text_model.encoder.layers.3.mlp.fc1.weight', 'text_model.encoder.layers.3.mlp.fc2.bias', 'text_model.encoder.layers.3.mlp.fc2.weight', 'text_model.encoder.layers.3.self_attn.k_proj.bias', 'text_model.encoder.layers.3.self_attn.k_proj.weight', 'text_model.encoder.layers.3.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.out_proj.weight', 'text_model.encoder.layers.3.self_attn.q_proj.bias', 'text_model.encoder.layers.3.self_attn.q_proj.weight', 'text_model.encoder.layers.3.self_attn.v_proj.bias', 'text_model.encoder.layers.3.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm1.bias', 'text_model.encoder.layers.4.layer_norm1.weight', 'text_model.encoder.layers.4.layer_norm2.bias', 'text_model.encoder.layers.4.layer_norm2.weight', 'text_model.encoder.layers.4.mlp.fc1.bias', 'text_model.encoder.layers.4.mlp.fc1.weight', 'text_model.encoder.layers.4.mlp.fc2.bias', 'text_model.encoder.layers.4.mlp.fc2.weight', 'text_model.encoder.layers.4.self_attn.k_proj.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.4.self_attn.out_proj.bias', 'text_model.encoder.layers.4.self_attn.out_proj.weight', 'text_model.encoder.layers.4.self_attn.q_proj.bias', 'text_model.encoder.layers.4.self_attn.q_proj.weight', 'text_model.encoder.layers.4.self_attn.v_proj.bias', 'text_model.encoder.layers.4.self_attn.v_proj.weight', 'text_model.encoder.layers.5.layer_norm1.bias', 'text_model.encoder.layers.5.layer_norm1.weight', 'text_model.encoder.layers.5.layer_norm2.bias', 'text_model.encoder.layers.5.layer_norm2.weight', 'text_model.encoder.layers.5.mlp.fc1.bias', 'text_model.encoder.layers.5.mlp.fc1.weight', 'text_model.encoder.layers.5.mlp.fc2.bias', 'text_model.encoder.layers.5.mlp.fc2.weight', 'text_model.encoder.layers.5.self_attn.k_proj.bias', 'text_model.encoder.layers.5.self_attn.k_proj.weight', 'text_model.encoder.layers.5.self_attn.out_proj.bias', 'text_model.encoder.layers.5.self_attn.out_proj.weight', 'text_model.encoder.layers.5.self_attn.q_proj.bias', 'text_model.encoder.layers.5.self_attn.q_proj.weight', 'text_model.encoder.layers.5.self_attn.v_proj.bias', 'text_model.encoder.layers.5.self_attn.v_proj.weight', 'text_model.encoder.layers.6.layer_norm1.bias', 'text_model.encoder.layers.6.layer_norm1.weight', 
'text_model.encoder.layers.6.layer_norm2.bias', 'text_model.encoder.layers.6.layer_norm2.weight', 'text_model.encoder.layers.6.mlp.fc1.bias', 'text_model.encoder.layers.6.mlp.fc1.weight', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.6.mlp.fc2.weight', 'text_model.encoder.layers.6.self_attn.k_proj.bias', 'text_model.encoder.layers.6.self_attn.k_proj.weight', 'text_model.encoder.layers.6.self_attn.out_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.weight', 'text_model.encoder.layers.6.self_attn.q_proj.bias', 'text_model.encoder.layers.6.self_attn.q_proj.weight', 'text_model.encoder.layers.6.self_attn.v_proj.bias', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.7.layer_norm2.bias', 'text_model.encoder.layers.7.layer_norm2.weight', 'text_model.encoder.layers.7.mlp.fc1.bias', 'text_model.encoder.layers.7.mlp.fc1.weight', 'text_model.encoder.layers.7.mlp.fc2.bias', 'text_model.encoder.layers.7.mlp.fc2.weight', 'text_model.encoder.layers.7.self_attn.k_proj.bias', 'text_model.encoder.layers.7.self_attn.k_proj.weight', 'text_model.encoder.layers.7.self_attn.out_proj.bias', 'text_model.encoder.layers.7.self_attn.out_proj.weight', 'text_model.encoder.layers.7.self_attn.q_proj.bias', 'text_model.encoder.layers.7.self_attn.q_proj.weight', 'text_model.encoder.layers.7.self_attn.v_proj.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.8.layer_norm1.bias', 'text_model.encoder.layers.8.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm2.bias', 'text_model.encoder.layers.8.layer_norm2.weight', 'text_model.encoder.layers.8.mlp.fc1.bias', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.8.mlp.fc2.weight', 'text_model.encoder.layers.8.self_attn.k_proj.bias', 'text_model.encoder.layers.8.self_attn.k_proj.weight', 'text_model.encoder.layers.8.self_attn.out_proj.bias', 'text_model.encoder.layers.8.self_attn.out_proj.weight', 'text_model.encoder.layers.8.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.8.self_attn.v_proj.bias', 'text_model.encoder.layers.8.self_attn.v_proj.weight', 'text_model.encoder.layers.9.layer_norm1.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.9.layer_norm2.bias', 'text_model.encoder.layers.9.layer_norm2.weight', 'text_model.encoder.layers.9.mlp.fc1.bias', 'text_model.encoder.layers.9.mlp.fc1.weight', 'text_model.encoder.layers.9.mlp.fc2.bias', 'text_model.encoder.layers.9.mlp.fc2.weight', 'text_model.encoder.layers.9.self_attn.k_proj.bias', 'text_model.encoder.layers.9.self_attn.k_proj.weight', 'text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.9.self_attn.out_proj.weight', 'text_model.encoder.layers.9.self_attn.q_proj.bias', 'text_model.encoder.layers.9.self_attn.q_proj.weight', 'text_model.encoder.layers.9.self_attn.v_proj.bias', 'text_model.encoder.layers.9.self_attn.v_proj.weight', 'text_model.final_layer_norm.bias', 'text_model.final_layer_norm.weight', 'text_projection.weight', 'visual_projection.weight'] +- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). 
+- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). +[INFO|modeling_utils.py:4358] 2025-01-23 00:21:16,129 >> All the weights of CLIPVisionModel were initialized from the model checkpoint at /fs-computility/mllm1/shared/hub/models--openai--clip-vit-large-patch14-336/snapshots/ce19dc912ca5cd21c8a653c79e251e808ccabcd1. +If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPVisionModel for predictions without further training. +01/23/2025 00:21:35 - INFO - llava.train.train - Add dataset: llava-next-sft-notext with length: 738601, data type: normal, seed: 0 +01/23/2025 00:21:35 - INFO - llava.train.train - Add dataset: llava-next-sft-notext with length: 738601, data type: normal, seed: 0 +01/23/2025 00:21:35 - INFO - llava.train.train - Add dataset: llava-next-sft-notext with length: 738601, data type: normal, seed: 0 +01/23/2025 00:21:38 - INFO - llava.train.train - Add dataset: knowledge_gqa9k_art1500_cc3m30k with length: 40813, data type: know, seed: 1 +01/23/2025 00:21:38 - INFO - llava.train.train - Add dataset: knowledge_gqa9k_art1500_cc3m30k with length: 40813, data type: know, seed: 1 +01/23/2025 00:21:38 - INFO - llava.train.train - Add dataset: knowledge_gqa9k_art1500_cc3m30k with length: 40813, data type: know, seed: 1 +01/23/2025 00:21:41 - INFO - llava.train.train - Add dataset: Inferencial_flickr7k_cc3m30k_polished_md with length: 37117, data type: inf_polishmd, seed: 2 +01/23/2025 00:21:41 - INFO - llava.train.train - Add dataset: Inferencial_flickr7k_cc3m30k_polished_md with length: 37117, data type: inf_polishmd, seed: 2 +01/23/2025 00:21:42 - INFO - llava.train.train - Add dataset: Inferencial_flickr7k_cc3m30k_polished_md with length: 37117, data type: inf_polishmd, seed: 2 +01/23/2025 00:21:45 - INFO - llava.train.train - Add dataset: Detail_flickr7k_cc3m28k with length: 35313, data type: detail, seed: 3 +01/23/2025 00:21:45 - INFO - llava.train.train - Add dataset: Detail_flickr7k_cc3m28k with length: 35313, data type: detail, seed: 3 +01/23/2025 00:21:46 - INFO - llava.train.train - Add dataset: Detail_flickr7k_cc3m28k with length: 35313, data type: detail, seed: 3 +01/23/2025 00:21:49 - INFO - llava.train.train - Add dataset: Knowledge_instruct40k with length: 40218, data type: know_ins, seed: 4 +01/23/2025 00:21:50 - INFO - llava.train.train - Add dataset: Knowledge_instruct40k with length: 40218, data type: know_ins, seed: 4 +01/23/2025 00:21:50 - INFO - llava.train.train - Add dataset: Knowledge_instruct40k with length: 40218, data type: know_ins, seed: 4 +01/23/2025 00:21:53 - INFO - llava.train.train - Add dataset: Creation10k_fixed with length: 9698, data type: creation, seed: 5 +01/23/2025 00:21:53 - INFO - llava.train.train - Add dataset: Creation10k_fixed with length: 9698, data type: creation, seed: 5 +01/23/2025 00:21:54 - INFO - llava.train.train - Add dataset: Creation10k_fixed with length: 9698, data type: creation, seed: 5 +01/23/2025 00:21:56 - INFO - llava.train.train - Add dataset: Chartqa_generate_11k_gpt_qwen_merge with length: 11160, data type: chart, seed: 6 +01/23/2025 00:21:56 - INFO - llava.train.train - Add dataset: Chartqa_generate_11k_gpt_qwen_merge with length: 11160, data type: chart, seed: 6 +01/23/2025 00:21:57 - INFO - llava.train.train - Add dataset: Chartqa_generate_11k_gpt_qwen_merge with length: 11160, data 
+[each startup line below appeared three times in the raw log, once per logging process; the duplicate copies are collapsed]
+01/23/2025 00:21:35 - INFO - llava.train.train - Add dataset: llava-next-sft-notext with length: 738601, data type: normal, seed: 0
+01/23/2025 00:21:38 - INFO - llava.train.train - Add dataset: knowledge_gqa9k_art1500_cc3m30k with length: 40813, data type: know, seed: 1
+01/23/2025 00:21:41 - INFO - llava.train.train - Add dataset: Inferencial_flickr7k_cc3m30k_polished_md with length: 37117, data type: inf_polishmd, seed: 2
+01/23/2025 00:21:45 - INFO - llava.train.train - Add dataset: Detail_flickr7k_cc3m28k with length: 35313, data type: detail, seed: 3
+01/23/2025 00:21:49 - INFO - llava.train.train - Add dataset: Knowledge_instruct40k with length: 40218, data type: know_ins, seed: 4
+01/23/2025 00:21:53 - INFO - llava.train.train - Add dataset: Creation10k_fixed with length: 9698, data type: creation, seed: 5
+01/23/2025 00:21:56 - INFO - llava.train.train - Add dataset: Chartqa_generate_11k_gpt_qwen_merge with length: 11160, data type: chart, seed: 6
+01/23/2025 00:21:59 - INFO - llava.train.train - Add dataset: Tqa_detail_qwengenerate_multi8k_gpt with length: 8391, data type: tqa, seed: 7
+01/23/2025 00:22:03 - INFO - llava.train.train - Add dataset: Infovqa_single_gpt with length: 23068, data type: info, seed: 8
+[INFO|trainer.py:571] 2025-01-23 00:22:03,466 >> Using auto half precision backend
+[INFO|trainer.py:1721] 2025-01-23 00:22:47,364 >> ***** Running training *****
+[INFO|trainer.py:1722] 2025-01-23 00:22:47,364 >> Num examples = 944,379
+[INFO|trainer.py:1723] 2025-01-23 00:22:47,364 >> Num Epochs = 1
+[INFO|trainer.py:1724] 2025-01-23 00:22:47,364 >> Instantaneous batch size per device = 1
+[INFO|trainer.py:1727] 2025-01-23 00:22:47,364 >> Total train batch size (w. parallel, distributed & accumulation) = 128
+[INFO|trainer.py:1728] 2025-01-23 00:22:47,364 >> Gradient Accumulation steps = 2
+[INFO|trainer.py:1729] 2025-01-23 00:22:47,364 >> Total optimization steps = 7,378
+[INFO|trainer.py:1730] 2025-01-23 00:22:47,367 >> Number of trainable parameters = 33,098,856,448
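The counters in the banner are consistent with the dataset registrations above and with the 64-rank world size visible in the NCCL output that follows. A quick sanity check in plain Python (nothing here comes from the training code itself):

import math

# Dataset lengths from the "Add dataset" lines; they sum exactly to Num examples.
lengths = [738601, 40813, 37117, 35313, 40218, 9698, 11160, 8391, 23068]
assert sum(lengths) == 944_379

# Effective batch: 1 per device x 2 accumulation steps x 64 ranks = 128,
# and one epoch of 944,379 examples at batch 128 yields the reported step count.
world_size = 64                    # "nranks 64" in the NCCL lines below
total_batch = 1 * 2 * world_size   # = 128, matching the banner
assert math.ceil(944_379 / total_batch) == 7_378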
+[NCCL initialization output collapsed: the 64 ranks (8 GPUs per dlc1irjyfb0zt5ew worker node; e.g. worker-0 holds ranks 8-15, worker-5 ranks 48-55, worker-6 ranks 56-63) report "Using network IB" and join comm 0x7f74f3499b0b795a; NVLS multicast is reported unavailable on every device; per-GPU CPU affinity is set; 8-channel ring/tree topologies are built with P2P chunksize 131072; intra-node channels run via P2P/IPC/read and inter-node channels via NET/IB/GDRDMA; ranks then report "Connected all rings" and continue wiring the remaining tree channels.]
INFO Channel 07/0 : 52[4] -> 51[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Channel 00/0 : 48[0] -> 56[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 03/0 : 46[6] -> 55[7] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 03/0 : 55[7] -> 46[6] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Channel 00/0 : 32[0] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Channel 00/0 : 48[0] -> 32[0] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 01/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Channel 00/0 : 56[0] -> 48[0] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO Channel 05/0 : 26[2] -> 58[2] [receive] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO Channel 05/0 : 58[2] -> 26[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO Channel 01/0 : 58[2] -> 50[2] [send] via NET/IB/1/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO Channel 07/0 : 30[6] -> 62[6] [receive] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO Channel 07/0 : 62[6] -> 30[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 06/0 : 28[4] -> 60[4] [receive] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 06/0 : 60[4] -> 28[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 02/0 : 60[4] -> 52[4] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 01/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO Channel 03/0 : 62[6] -> 54[6] [send] via NET/IB/3/GDRDMA +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 03/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 05/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Channel 07/0 : 60[4] -> 59[3] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO Channel 04/0 : 24[0] -> 56[0] [receive] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO Channel 04/0 : 56[0] -> 24[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO Channel 00/0 : 56[0] -> 48[0] [send] via NET/IB/0/GDRDMA +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 02/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO Channel 03/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 03/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO Channel 07/0 : 63[7] -> 62[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:3082 [7] NCCL INFO Channel 03/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:77:3085 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:77:3085 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:77:3085 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 05/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 06/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 07/0 : 55[7] -> 48[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:80:3082 
[7] NCCL INFO Channel 07/0 : 15[7] -> 14[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO Channel 00/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Channel 04/0 : 48[0] -> 41[1] [send] via NET/IB/2/GDRDMA +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO Channel 04/0 : 9[1] -> 8[0] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 03/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Channel 07/0 : 55[7] -> 54[6] via P2P/IPC/read +dlc1irjyfb0zt5ew-worker-6:75:3084 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:75:3084 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:75:3084 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:73:3079 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:73:3079 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:73:3079 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:78:3080 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:78:3080 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:78:3080 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:76:3085 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:76:3085 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:76:3085 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:78:3080 [5] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:78:3080 [5] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:78:3080 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:74:3079 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:74:3079 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:74:3079 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:75:3086 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:75:3086 [2] NCCL INFO 
threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:75:3086 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:77:3085 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:77:3085 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:77:3085 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:77:3083 [4] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:77:3083 [4] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:77:3083 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:75:3084 [2] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:75:3084 [2] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:75:3084 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:76:3079 [3] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:76:3079 [3] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:76:3079 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:73:3083 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:73:3083 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:73:3083 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:80:3082 [7] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:80:3082 [7] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:80:3082 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-5:79:3081 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-5:79:3081 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-5:79:3081 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-0:79:3084 [6] NCCL INFO Connected all trees +dlc1irjyfb0zt5ew-worker-0:79:3084 [6] NCCL INFO threadThresholds 8/8/64 | 512/8/64 | 512 | 512 +dlc1irjyfb0zt5ew-worker-0:79:3084 [6] NCCL INFO 8 
coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer +dlc1irjyfb0zt5ew-worker-6:78:3080 [6] NCCL INFO comm 0x7fc5340443f0 rank 62 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:73:3079 [1] NCCL INFO comm 0x7f1b700442b0 rank 57 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:75:3084 [3] NCCL INFO comm 0x7f2008044d50 rank 59 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:79:3078 [7] NCCL INFO comm 0x7fc184043f90 rank 63 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:72:3083 [0] NCCL INFO comm 0x7f4ccc044540 rank 56 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:74:3082 [2] NCCL INFO comm 0x7f23400445d0 rank 58 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:77:3085 [5] NCCL INFO comm 0x7ff8a0044150 rank 61 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-6:76:3081 [4] NCCL INFO comm 0x7f22c8044150 rank 60 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:79:3081 [6] NCCL INFO comm 0x7fbf0c044a90 rank 54 nranks 64 cudaDev 6 nvmlDev 6 busId 70 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:76:3085 [3] NCCL INFO comm 0x7fb9d4043fd0 rank 51 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:73:3078 [0] NCCL INFO comm 0x7f2d10044670 rank 48 nranks 64 cudaDev 0 nvmlDev 0 busId 10 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:80:3082 [7] NCCL INFO comm 0x7fb3d80446b0 rank 55 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:75:3084 [2] NCCL INFO comm 0x7f4018044610 rank 50 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:77:3083 [4] NCCL INFO comm 0x7f9af0044a70 rank 52 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:78:3080 [5] NCCL INFO comm 0x7f52a8044050 rank 53 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-5:74:3079 [1] NCCL INFO comm 0x7fe534044470 rank 49 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:80:3082 [7] NCCL INFO comm 0x7f4e300443f0 rank 15 nranks 64 cudaDev 7 nvmlDev 7 busId 80 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:78:3080 [5] NCCL INFO comm 0x7f424c044290 rank 13 nranks 64 cudaDev 5 nvmlDev 5 busId 60 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:74:3081 [1] NCCL INFO comm 0x7fef74044530 rank 9 nranks 64 cudaDev 1 nvmlDev 1 busId 20 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:77:3085 [4] NCCL INFO comm 0x7ff200044640 rank 12 nranks 64 cudaDev 4 nvmlDev 4 busId 50 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:75:3086 [2] NCCL INFO comm 0x7f0c9c0445c0 rank 10 nranks 64 cudaDev 2 nvmlDev 2 busId 30 commId 0x7f74f3499b0b795a - Init COMPLETE +dlc1irjyfb0zt5ew-worker-0:76:3079 [3] NCCL INFO comm 0x7fad4c044190 rank 11 nranks 64 cudaDev 3 nvmlDev 3 busId 40 commId 0x7f74f3499b0b795a - Init COMPLETE 
+[INFO|trainer.py:1962] 2025-01-24 01:40:37,430 >>
+
+Training completed. Do not forget to share your model on huggingface.co/models =)
+
+[NCCL INFO: shutdown lines condensed. Service threads on workers 0, 5, and 6 repeatedly log "Connection closed by localRank N" for each local rank as the job tears down.]