rookiemango committed on
Commit 6cf3348 · verified
1 Parent(s): ce76eda

Upload folder using huggingface_hub

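The commit message is the default one written by `huggingface_hub` when a local folder is pushed with `upload_folder`. As a rough, hypothetical sketch of the kind of call that produces such a commit (the repo id, folder path, and auth handling below are assumptions, not values taken from this repository):

```python
# Hypothetical sketch: the kind of upload_folder call that yields a commit
# titled "Upload folder using huggingface_hub". repo_id and folder_path are
# placeholders, not taken from this repo.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`
api.upload_folder(
    folder_path=".",                  # local folder with logs, results, and scripts
    repo_id="rookiemango/some-repo",  # placeholder repository id
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```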
Files changed (27)
  1. .gitattributes +2 -0
  2. bd_math_test.json +3 -0
  3. generate_result/zero_shot/bd_math/default/llama3.1/1/0.json +0 -0
  4. generate_result/zero_shot/bd_math/generation/llama3.1/1/0.json +0 -0
  5. generate_result/zero_shot/bd_math/generation/llama3.1/1/1.json +0 -0
  6. generate_result/zero_shot/bd_math/generation/llama3.1/1/2.json +0 -0
  7. generate_result/zero_shot/bd_math/generation/llama3.1/1/3.json +0 -0
  8. generate_result/zero_shot/bd_math/generation/llama3.1/1/4.json +0 -0
  9. generate_result/zero_shot/bd_math/generation/llama3.1/1/5.json +0 -0
  10. generate_result/zero_shot/bd_math/generation/llama3.1/1/6.json +0 -0
  11. generate_result/zero_shot/bd_math/generation/llama3.1/1/7.json +0 -0
  12. generate_result/zero_shot/bd_math/generation/llama3.1/1/merged.json +3 -0
  13. generation_test/0.json +3 -0
  14. log/zero_shot/bd_math/default/llama3.1/1/0-0.log +70 -0
  15. log/zero_shot/bd_math/generation/llama3.1/1/0-0.log +70 -0
  16. log/zero_shot/bd_math/generation/llama3.1/1/0-1.log +86 -0
  17. log/zero_shot/bd_math/generation/llama3.1/1/0-2.log +70 -0
  18. log/zero_shot/bd_math/generation/llama3.1/1/0-3.log +70 -0
  19. log/zero_shot/bd_math/generation/llama3.1/1/0-4.log +70 -0
  20. log/zero_shot/bd_math/generation/llama3.1/1/0-5.log +70 -0
  21. log/zero_shot/bd_math/generation/llama3.1/1/0-6.log +70 -0
  22. log/zero_shot/bd_math/generation/llama3.1/1/0-7.log +70 -0
  23. log/zero_shot/bd_math/generation/llama3.1_70b/1/0-0.log +694 -0
  24. log/zero_shot/bd_math/generation/llama3.1_70b/1/0-4.log +346 -0
  25. nvcc.sh +32 -0
  26. nvcc_use.txt +0 -0
  27. vllm_generate.py +361 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ bd_math_test.json filter=lfs diff=lfs merge=lfs -text
+ generate_result/zero_shot/bd_math/generation/llama3.1/1/merged.json filter=lfs diff=lfs merge=lfs -text
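These two new attribute lines route the large JSON files through Git LFS, alongside the existing glob patterns. As a rough illustration (not part of this repo), a small Python sketch that checks whether a path is covered by the `filter=lfs` patterns declared in a .gitattributes file:

```python
# Rough illustration (not part of this repo): determine whether a path matches
# the filter=lfs patterns in a .gitattributes file. fnmatch only approximates
# gitattributes glob semantics.
from fnmatch import fnmatch

def lfs_patterns(gitattributes_text: str) -> list[str]:
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns

def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    return any(fnmatch(path, pat) for pat in patterns)

attrs = """\
*.zip filter=lfs diff=lfs merge=lfs -text
bd_math_test.json filter=lfs diff=lfs merge=lfs -text
generate_result/zero_shot/bd_math/generation/llama3.1/1/merged.json filter=lfs diff=lfs merge=lfs -text
"""
print(is_lfs_tracked("bd_math_test.json", lfs_patterns(attrs)))  # True
```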
bd_math_test.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4b00a220f06006e5fd7a8b1c5bdae38ce95d40260376044a272ccd85c9db725
+ size 13446171
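Because the file is LFS-tracked, the repository stores only this three-line pointer (version, oid, size) instead of the ~13 MB JSON. A minimal sketch, assuming a pointer file in exactly this format, that parses those fields:

```python
# Minimal sketch: parse the key/value fields of a Git LFS pointer file like the
# one shown above. The pointer text is copied from this diff; any file path you
# feed in instead is up to you.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e4b00a220f06006e5fd7a8b1c5bdae38ce95d40260376044a272ccd85c9db725
size 13446171
"""
info = parse_lfs_pointer(pointer)
print(info["oid"], int(info["size"]))  # sha256:e4b0... 13446171
```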
generate_result/zero_shot/bd_math/default/llama3.1/1/0.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/0.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/1.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/2.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/3.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/4.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/5.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/6.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/7.json ADDED
The diff for this file is too large to render. See raw diff
 
generate_result/zero_shot/bd_math/generation/llama3.1/1/merged.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8e1ed1f00fcecc38815cb6a17138489a71a2a994ca90e397b23f5451c66f0a0
+ size 58115138
generation_test/0.json ADDED
@@ -0,0 +1,3 @@
+ {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: How many units long is a segment whose endpoints are $(1,2)$ and $(-4,-10)$?\nLet's think step by step.", "prompt": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. 
I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: How many units long is a segment whose endpoints are $(1,2)$ and $(-4,-10)$?\nLet's think step by step.", "content": {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: How many units long is a segment whose endpoints are $(1,2)$ and $(-4,-10)$?\nLet's think step by step.", "answer": " The distance between two points $(x_1,y_1)$ and $(x_2,y_2)$ is given by the formula $\\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$. In this case, the distance is $\\sqrt{(-4-1)^2+(-10-2)^2}=\\sqrt{25+144}=\\sqrt{169}=13$. Final Answer: The answer is $13$. I hope it is correct.\n\nQuestion:"}, "total output": [" The distance between two points $(x_1,y_1)$ and $(x_2,y_2)$ is given by the formula $$d=\\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}.$$In our case, we have $x_1=1$, $y_1=2$, $x_2=-4$, and $y_2=-10$. Therefore, the distance is $$d=\\sqrt{(-4-1)^2+(-10-2)^2}=\\sqrt{25+144}=\\sqrt{169}=13.$$ Final Answer: The answer is $13$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. 
I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. 
Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-1=\\frac{1-x}{x}$. Final Answer: The answer is $\\frac{1-x}{x}$. I hope it is correct.\n\nQuestion: If $f(x)=\\frac{1}{x+1}$, find $f^{-1}(x)$.\nLet's think step by step. We have that $f(x)=\\frac{1}{x+1}$. Therefore, $f^{-1}(x)=\\frac{1}{x}-"]}
+ {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: A sequence $(z_n)$ of complex numbers satisfies the following properties:\n\n$z_1$ and $z_2$ are not real.\n$z_{n+2}=z_{n+1}^2z_n$ for all integers $n\\geq 1$.\n$\\dfrac{z_{n+3}}{z_n^2}$ is real for all integers $n\\geq 1$.\n$\\left|\\dfrac{z_3}{z_4}\\right|=\\left|\\dfrac{z_4}{z_5}\\right|=2$.\n\nFind the product of all possible values of $z_1$.\nLet's think step by step.", "prompt": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. 
Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: A sequence $(z_n)$ of complex numbers satisfies the following properties:\n\n$z_1$ and $z_2$ are not real.\n$z_{n+2}=z_{n+1}^2z_n$ for all integers $n\\geq 1$.\n$\\dfrac{z_{n+3}}{z_n^2}$ is real for all integers $n\\geq 1$.\n$\\left|\\dfrac{z_3}{z_4}\\right|=\\left|\\dfrac{z_4}{z_5}\\right|=2$.\n\nFind the product of all possible values of $z_1$.\nLet's think step by step.", "content": {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: A sequence $(z_n)$ of complex numbers satisfies the following properties:\n\n$z_1$ and $z_2$ are not real.\n$z_{n+2}=z_{n+1}^2z_n$ for all integers $n\\geq 1$.\n$\\dfrac{z_{n+3}}{z_n^2}$ is real for all integers $n\\geq 1$.\n$\\left|\\dfrac{z_3}{z_4}\\right|=\\left|\\dfrac{z_4}{z_5}\\right|=2$.\n\nFind the product of all possible values of $z_1$.\nLet's think step by step.", "answer": " Let $z_1=a+bi$ and $z_2=c+di$ for some real numbers $a,b,c,d$. 
Then, we have $z_3=(c+di)^2(a+bi)=(c^2-d^2+2cdi)(a+bi)=(ac^2-ad^2+2bcd)+(bc^2+bd^2+2acd)i$ and $z_4=(ac^2-ad^2+2bcd+bc^2+bd^2+2acd i)^2(a+bi)=(a^2c^4-2a^2c^2d^2+a^2d^4+4a^2bcd^2+4a^2b^2c^2d+4a^2b^2d^3+4a^2c^3d+4a^2cd^3+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2+4a^2d^4+4a^2b^2c^2+4a^2b^2d^2+4a^2c^2d^2"}, "total output": [" We have that $z_1$ and $z_2$ are not real, so $z_1$ and $z_2$ are complex numbers. Also, $z_{n+2}=z_{n+1}^2z_n$ for all integers $n\\geq 1$, so $z_3=z_2^2z_1$, $z_4=z_3^2z_2$, and $z_5=z_4^2z_3$. Therefore, we have that $$\\left|\\dfrac{z_3}{z_4}\\right|=\\left|\\dfrac{z_2^2z_1}{z_3^2z_2}\\right|=\\left|\\dfrac{z_1}{z_3}\\right|=\\left|\\dfrac{z_1}{z_2^2z_1}\\right|=\\left|\\dfrac{1}{z_2^2}\\right|=\\dfrac{1}{|z_2|^2}.$$Since $\\left|\\dfrac{z_3}{z_4}\\right|=2$, we have that $\\dfrac{1}{|z_2|^2}=2$, so $|z_2|^2=\\dfrac{1}{2}$. Also, we have that $\\left|\\dfrac{z_4}{z_5}\\right|=\\left|\\dfrac{z_3^2z_2}{z_4^2z_3}\\right|=\\left|\\dfrac{z_2}{z_4}\\right|=\\left|\\dfrac{z_2}{z_3^2z_2}\\right|=\\left|\\dfrac{1}{z_3^2}\\right|=\\dfrac{1}{|z_3|^2}$. Since $\\left|\\dfrac{z_4}{z_5}\\right|=2$, we have that $\\dfrac{1}{|z_3|^2}=2$, so $|z_3|^2=\\dfrac{1}{2}$. Therefore, we have that $|z_2|^2=\\dfrac{1}{2}$ and $|z_3|^2=\\dfrac{1}{2}$. Since $z_1$ and $z_2$ are complex numbers, we have that $z_1=a+bi$ and $z_2=c+di$ for some real numbers $a$, $b$, $c$, and $d$. Therefore, we have that $|z_2|^2=c^2+d^2=\\dfrac{1}{2}$ and $|z_3|^2=(c^2+d^2)^2=\\dfrac{1}{2}$. Since $c^2+d^2=\\dfrac{1}{2}$, we have that $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. 
Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and $c^2+d^2=\\dfrac{1}{2}$. Therefore, we have that $c^2+d^2=\\dfrac{1}{2}$ and"]}
+ {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: Marie is thinking of a multiple of 63, while Jay is thinking of a factor of 63. They happen to be thinking of the same number. There are two possibilities for the number that each of them is thinking of, one positive and one negative. Find the product of these two numbers.\nLet's think step by step.", "prompt": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. 
I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: Marie is thinking of a multiple of 63, while Jay is thinking of a factor of 63. They happen to be thinking of the same number. There are two possibilities for the number that each of them is thinking of, one positive and one negative. Find the product of these two numbers.\nLet's think step by step.", "content": {"question": "Question: Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}\nLet's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.\n\nQuestion: If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$\nLet's think step by step. We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.\n\nQuestion: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?\nLet's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=16\n\\end{align*}\nFinal Answer: The answer is $16$. I hope it is correct.\n\nQuestion: If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}\nhas a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\\frac{a}{b},$ assuming $b$ is nonzero.\nLet's think step by step. If we multiply the first equation by $-\\frac{3}{2}$, we obtain $$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=-\\frac{2}{3}.$$\nFinal Answer: The answer is $-\\frac{2}{3}$. I hope it is correct.\n\nQuestion: Marie is thinking of a multiple of 63, while Jay is thinking of a factor of 63. They happen to be thinking of the same number. There are two possibilities for the number that each of them is thinking of, one positive and one negative. 
Find the product of these two numbers.\nLet's think step by step.", "answer": " The factors of 63 are $1,3,7,9,21,63$, and the multiples of 63 are $63,126,189,252,315,378,441,504,567,630,693,756,819,882,945,1008,1071,1134,1197,1260,1323,1386,1449,1512,1575,1638,1691,1754,1817,1880,1943,2006,2069,2132,2195,2258,2321,2384,2447,2510,2573,2636,2699,2762,2825,2888,2951,3014,3077,3140,3203,3266,3329,3392,3455,3518,3581,3644,3707,3770,3833,3896,3959,4022,4085,4148,4211,4274,4337,4400,4463,4526,4589,4652,4715,4778,4841,4904,4967,5030,5093,5156,5219,5282,5345,5408,5471,5534,5597,5660,5723,5786,5849,5912,5975,6038,6101,6164,6227,62"}, "total output": [" Since Marie is thinking of a multiple of 63, the number she is thinking of is of the form $63n$, where $n$ is a positive integer. Since Jay is thinking of a factor of 63, the number he is thinking of is of the form $63m$, where $m$ is a positive integer. Since they are thinking of the same number, we have $63n=63m$. Dividing both sides by 63, we obtain $n=m$. Therefore, the product of the two numbers is $63n\\cdot63m=(63n)^2=63^2n^2=3969n^2$. Final Answer: The answer is $3969n^2$. I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. 
Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$ except\nLet's think step by step. We have that $x^2-4x+3=(x-3)^2+1$. Therefore, $\\left| x^{2}-4 x+3 \\right|=\\left| x-3\\right|+1$ is true for all values of $x$. Final Answer: The answer is \"all values of $x$\". I hope it is correct.\n\nQuestion: If $x$ is a real number, then $\\left| x^{2"]}
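Each added line of generation_test/0.json is one JSON record with a "question", the few-shot "prompt", a "content" object carrying the reference "answer", and a "total output" list of sampled completions. A hedged sketch of reading such a file (assuming one JSON object per line, as the diff suggests) and pulling the "Final Answer" span out of each completion; the file path and regex are assumptions for illustration:

```python
# Hedged sketch (path and regex are assumptions): read JSONL records like those
# in generation_test/0.json and extract the "Final Answer" from each sampled
# completion in the "total output" list.
import json
import re

FINAL_ANSWER = re.compile(r"Final Answer: The answer is (.+?)\. I hope it is correct\.")

with open("generation_test/0.json", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        for completion in record["total output"]:
            match = FINAL_ANSWER.search(completion)
            answer = match.group(1) if match else None
            print(answer)  # e.g. "$13$" for the first record above
```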
log/zero_shot/bd_math/default/llama3.1/1/0-0.log ADDED
@@ -0,0 +1,70 @@
1
+ [I1021 17:17:47.067897003 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/default/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=0, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:17:53 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:17:53 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:17:53 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:17:55.299354664 TCPStore.cpp:312] [c10d - debug] The server has started on port = 39457.
13
+ [I1021 17:17:55.299507279 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:17:55.300507801 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 39457).
15
+ [I1021 17:17:55.300631164 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:39457 on [n117-192-077.byted.org]:63684.
16
+ [I1021 17:17:55.303608993 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:39457
17
+ [W1021 17:17:55.304066608 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:17:55.304127511 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:17:55.304136181 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:17:55.304687998 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:17:55.304706666 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:17:55.318136354 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:17:55.318156378 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:17:55.319864667 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:17:55.319881146 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:17:55 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:17:59 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:18:00 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:18:00 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:18:01 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:18:01 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:18:08 model_runner.py:1530] Graph capturing finished in 7 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 0 already
70
+
71
  0%| | 0/10 [00:00<?, ?it/s]
72
  10%|█ | 1/10 [00:32<04:52, 32.49s/it]
73
  20%|██ | 2/10 [01:03<04:15, 31.89s/it]
74
  30%|███ | 3/10 [01:35<03:42, 31.82s/it]
75
  40%|████ | 4/10 [02:08<03:12, 32.04s/it]
76
  50%|█████ | 5/10 [02:40<02:41, 32.32s/it]
77
  60%|██████ | 6/10 [03:13<02:10, 32.56s/it]
78
  70%|███████ | 7/10 [03:45<01:37, 32.33s/it]
79
  80%|████████ | 8/10 [04:18<01:04, 32.38s/it]
80
  90%|█████████ | 9/10 [04:50<00:32, 32.30s/it]
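The log above records the generation settings used for this run: Llama-3.1-8B served through vLLM with temperature 0.7, top_p 1, seed 0, and up to 512 new tokens per iteration. A rough sketch of an equivalent vLLM call under those logged settings; this is not the vllm_generate.py added in this commit (its source is not rendered here), only an approximation of its setup:

```python
# Rough sketch matching the settings logged above (model path, temperature=0.7,
# top_p=1, max 512 new tokens, seed=0). Not the actual vllm_generate.py script.
from vllm import LLM, SamplingParams

llm = LLM(
    model="../../Llama-3.1-8B/",
    trust_remote_code=True,
    tensor_parallel_size=1,
    seed=0,
)
params = SamplingParams(temperature=0.7, top_p=1.0, max_tokens=512)

# Prompt taken from the sampled question shown in the log.
prompts = ["Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps "
           "equal 42 baps?\nLet's think step by step."]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```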
log/zero_shot/bd_math/generation/llama3.1/1/0-0.log ADDED
@@ -0,0 +1,70 @@
1
+ [I1021 17:32:10.125121073 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=0, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.043739368 TCPStore.cpp:312] [c10d - debug] The server has started on port = 51539.
13
+ [I1021 17:32:33.043765395 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.044902549 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 51539).
15
+ [I1021 17:32:33.045025883 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:51539 on [n117-192-077.byted.org]:41104.
16
+ [I1021 17:32:33.049731284 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:51539
17
+ [W1021 17:32:33.050194867 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.050251822 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.050258413 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.050743516 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xaae3e50, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.050760538 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.064155237 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xaae3e50, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.064175888 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.065628460 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xaae3e50, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.065646725 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:41 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:41 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:54 model_runner.py:1530] Graph capturing finished in 13 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
  0%| | 0/5 [00:00<?, ?it/s]
72
  20%|██ | 1/5 [00:32<02:09, 32.38s/it]
73
  40%|████ | 2/5 [01:04<01:36, 32.24s/it]
74
  60%|██████ | 3/5 [01:36<01:04, 32.33s/it]
75
  80%|████████ | 4/5 [02:09<00:32, 32.50s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-1.log ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [I1021 17:32:10.125067442 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=1, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.055550143 TCPStore.cpp:312] [c10d - debug] The server has started on port = 62205.
13
+ [I1021 17:32:33.055569513 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.056705526 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 62205).
15
+ [I1021 17:32:33.056806755 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:62205 on [n117-192-077.byted.org]:57262.
16
+ [I1021 17:32:33.059892242 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:62205
17
+ [W1021 17:32:33.060315641 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.060372812 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.060379444 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.060863031 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbca0c70, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.060880182 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.073401264 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbca0c70, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.073420603 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.075221695 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbca0c70, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.075240761 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+ Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
29
+
30
+ Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:03, 1.26s/it]
31
+
32
+ Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.46it/s]
33
+
34
+ Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.23it/s]
35
+
36
+ Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.01s/it]
37
+
38
+ Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.05it/s]
39
+
40
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
41
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
42
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
43
+ INFO 10-21 17:32:41 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
44
+ INFO 10-21 17:32:41 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
45
+ INFO 10-21 17:32:52 model_runner.py:1530] Graph capturing finished in 12 secs.
46
+ ../../Llama-3.1-8B/
47
+ load data
48
+ Sampled Question:
49
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
50
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
51
+
52
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
53
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
54
+
55
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
56
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
57
+ 30n&=480\
58
+ \Rightarrow\qquad n&=480/30=16
59
+ \end{align*}
60
+ Final Answer: The answer is $16$. I hope it is correct.
61
+
62
+ Question: If the system of equations
63
+
64
+ \begin{align*}
65
+ 6x-4y&=a,\
66
+ 6y-9x &=b.
67
+ \end{align*}
68
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
69
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
70
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
71
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
72
+
73
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
74
+ Let's think step by
75
+
76
+
77
+ step.
78
+ fitered 2688 already
79
+
80
+ 0%| | 0/5 [00:00<?, ?it/s]
81
+ 20%|██ | 1/5 [00:32<02:10, 32.74s/it]
82
+ 40%|████ | 2/5 [01:05<01:37, 32.49s/it]
83
+ 60%|██████ | 3/5 [01:37<01:05, 32.51s/it]
84
+ 80%|████████ | 4/5 [02:10<00:32, 32.55s/it]
85
+ 100%|██████████| 5/5 [02:34<00:00, 29.58s/it]
86
+ 100%|██████████| 5/5 [02:34<00:00, 30.91s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-2.log ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0
  0%| | 0/5 [00:00<?, ?it/s]
1
  20%|██ | 1/5 [00:32<02:08, 32.15s/it]
2
  40%|████ | 2/5 [01:03<01:35, 31.96s/it]
3
  60%|██████ | 3/5 [01:37<01:05, 32.64s/it]
4
  80%|████████ | 4/5 [02:11<00:33, 33.22s/it]
 
1
+ [I1021 17:32:10.125040354 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=2, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.988388467 TCPStore.cpp:312] [c10d - debug] The server has started on port = 48457.
13
+ [I1021 17:32:33.988543582 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.989528395 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 48457).
15
+ [I1021 17:32:33.989634025 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:48457 on [n117-192-077.byted.org]:36652.
16
+ [I1021 17:32:33.992286715 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:48457
17
+ [W1021 17:32:33.992712518 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.992766487 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.992773480 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.993320196 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbcba2f0, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.993338109 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.005784655 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbcba2f0, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.005803860 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.007340855 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbcba2f0, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.007359306 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:36 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:37 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:37 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:39 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:39 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:47 model_runner.py:1530] Graph capturing finished in 8 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
  0%| | 0/5 [00:00<?, ?it/s]
72
  20%|██ | 1/5 [00:32<02:08, 32.15s/it]
73
  40%|████ | 2/5 [01:03<01:35, 31.96s/it]
74
  60%|██████ | 3/5 [01:37<01:05, 32.64s/it]
75
  80%|████████ | 4/5 [02:11<00:33, 33.22s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-3.log ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0
  0%| | 0/5 [00:00<?, ?it/s]
1
  20%|██ | 1/5 [00:32<02:10, 32.64s/it]
2
  40%|████ | 2/5 [01:05<01:38, 32.90s/it]
3
  60%|██████ | 3/5 [01:40<01:07, 33.68s/it]
4
  80%|████████ | 4/5 [02:14<00:33, 33.95s/it]
 
1
+ [I1021 17:32:10.125036580 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=3, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.065192410 TCPStore.cpp:312] [c10d - debug] The server has started on port = 46409.
13
+ [I1021 17:32:33.065353141 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.066326430 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 46409).
15
+ [I1021 17:32:33.066436798 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:46409 on [n117-192-077.byted.org]:43090.
16
+ [I1021 17:32:33.069877919 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:46409
17
+ [W1021 17:32:33.070299605 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.070355705 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.070363147 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.070867504 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xa7dcab0, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.070884656 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.083784286 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xa7dcab0, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.083804705 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.085332231 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xa7dcab0, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.085350575 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:41 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:41 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:52 model_runner.py:1530] Graph capturing finished in 10 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
  0%| | 0/5 [00:00<?, ?it/s]
72
  20%|██ | 1/5 [00:32<02:10, 32.64s/it]
73
  40%|████ | 2/5 [01:05<01:38, 32.90s/it]
74
  60%|██████ | 3/5 [01:40<01:07, 33.68s/it]
75
  80%|████████ | 4/5 [02:14<00:33, 33.95s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-4.log ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0
  0%| | 0/5 [00:00<?, ?it/s]
1
  20%|██ | 1/5 [00:32<02:08, 32.22s/it]
2
  40%|████ | 2/5 [01:04<01:36, 32.26s/it]
3
  60%|██████ | 3/5 [01:37<01:04, 32.46s/it]
4
  80%|████████ | 4/5 [02:08<00:31, 31.95s/it]
 
1
+ [I1021 17:32:10.125052711 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=4, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.972711339 TCPStore.cpp:312] [c10d - debug] The server has started on port = 50257.
13
+ [I1021 17:32:33.972854330 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.973899496 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 50257).
15
+ [I1021 17:32:33.974076748 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:50257 on [n117-192-077.byted.org]:36032.
16
+ [I1021 17:32:33.977247933 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:50257
17
+ [W1021 17:32:33.977705132 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.977759345 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.977767208 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.978313638 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc3d5b40, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.978330391 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.992085966 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc3d5b40, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.992106035 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.993880616 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc3d5b40, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.993899642 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:41 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:41 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:52 model_runner.py:1530] Graph capturing finished in 11 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
  0%| | 0/5 [00:00<?, ?it/s]
72
  20%|██ | 1/5 [00:32<02:08, 32.22s/it]
73
  40%|████ | 2/5 [01:04<01:36, 32.26s/it]
74
  60%|██████ | 3/5 [01:37<01:04, 32.46s/it]
75
  80%|████████ | 4/5 [02:08<00:31, 31.95s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-5.log ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0
  0%| | 0/5 [00:00<?, ?it/s]
1
  20%|██ | 1/5 [00:32<02:10, 32.59s/it]
2
  40%|████ | 2/5 [01:04<01:37, 32.48s/it]
3
  60%|██████ | 3/5 [01:37<01:04, 32.45s/it]
4
  80%|████████ | 4/5 [02:09<00:32, 32.38s/it]
 
1
+ [I1021 17:32:10.125040555 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=5, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.090602385 TCPStore.cpp:312] [c10d - debug] The server has started on port = 33905.
13
+ [I1021 17:32:33.090752648 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.091735533 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 33905).
15
+ [I1021 17:32:33.091839154 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:33905 on [n117-192-077.byted.org]:41502.
16
+ [I1021 17:32:33.094560740 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:33905
17
+ [W1021 17:32:33.094958364 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.095016973 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.095024132 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.095509577 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc0910c0, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.095526138 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.108353125 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc0910c0, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.108373329 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.110149042 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc0910c0, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.110166605 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:42 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:42 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:53 model_runner.py:1530] Graph capturing finished in 11 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
  0%| | 0/5 [00:00<?, ?it/s]
72
  20%|██ | 1/5 [00:32<02:10, 32.59s/it]
73
  40%|████ | 2/5 [01:04<01:37, 32.48s/it]
74
  60%|██████ | 3/5 [01:37<01:04, 32.45s/it]
75
  80%|████████ | 4/5 [02:09<00:32, 32.38s/it]
log/zero_shot/bd_math/generation/llama3.1/1/0-6.log ADDED
@@ -0,0 +1,70 @@
1
+ [I1021 17:32:10.125199301 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=6, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:19 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:19 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:19 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.074960847 TCPStore.cpp:312] [c10d - debug] The server has started on port = 46227.
13
+ [I1021 17:32:33.074978008 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.076109583 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 46227).
15
+ [I1021 17:32:33.076215237 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:46227 on [n117-192-077.byted.org]:61802.
16
+ [I1021 17:32:33.079088203 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:46227
17
+ [W1021 17:32:33.079563703 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.079628503 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.079635879 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.080124838 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xac4d000, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.080142722 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.092970648 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xac4d000, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.092991704 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.094497493 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xac4d000, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.094515803 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:42 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:42 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:53 model_runner.py:1530] Graph capturing finished in 11 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
 0%| | 0/5 [00:00<?, ?it/s]
72
 20%|██ | 1/5 [00:32<02:08, 32.20s/it]
73
 40%|████ | 2/5 [01:05<01:37, 32.65s/it]
74
 60%|██████ | 3/5 [01:38<01:05, 32.83s/it]
75
 80%|████████ | 4/5 [02:10<00:32, 32.74s/it]
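Each of the eight llama3.1 logs in this directory comes from an independent single-GPU process (tensor_parallel=1) whose Namespace differs only in cuda_ind, so the test set is effectively split into eight shards generated in parallel. The sketch below shows one way such a split could be expressed; the contiguous-slice assumption and the helper name are illustrative only, since the actual logic lives in vllm_generate.py and is not shown in this section.

    import json

    # Illustrative sharding by cuda_ind/cuda_num as recorded in the Namespace above.
    # Assumes bd_math_test.json holds a list of question records.
    def load_shard(path, cuda_ind, cuda_num):
        with open(path) as f:
            data = json.load(f)
        shard_size = (len(data) + cuda_num - 1) // cuda_num  # ceiling division
        return data[cuda_ind * shard_size:(cuda_ind + 1) * shard_size]

    shard = load_shard("bd_math_test.json", cuda_ind=6, cuda_num=8)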
log/zero_shot/bd_math/generation/llama3.1/1/0-7.log ADDED
@@ -0,0 +1,70 @@
1
+ [I1021 17:32:10.135778790 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=7, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ WARNING 10-21 17:32:20 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
10
+ INFO 10-21 17:32:20 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
11
+ INFO 10-21 17:32:20 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
12
+ [I1021 17:32:33.334206825 TCPStore.cpp:312] [c10d - debug] The server has started on port = 38167.
13
+ [I1021 17:32:33.334229091 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
14
+ [I1021 17:32:33.335374254 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 38167).
15
+ [I1021 17:32:33.335531417 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:38167 on [n117-192-077.byted.org]:49140.
16
+ [I1021 17:32:33.338344306 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:38167
17
+ [W1021 17:32:33.338772786 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
18
+ [I1021 17:32:33.338836768 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
19
+ [I1021 17:32:33.338844491 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
20
+ [rank0]:[I1021 17:32:33.339392327 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb07b820, SPLIT_COLOR: 3389850942126204093, PG Name: 1
21
+ [rank0]:[I1021 17:32:33.339413264 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
22
+ [rank0]:[I1021 17:32:33.352959977 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb07b820, SPLIT_COLOR: 3389850942126204093, PG Name: 3
23
+ [rank0]:[I1021 17:32:33.352979191 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
24
+ [rank0]:[I1021 17:32:33.354472064 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb07b820, SPLIT_COLOR: 3389850942126204093, PG Name: 5
25
+ [rank0]:[I1021 17:32:33.354489641 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
26
+ INFO 10-21 17:32:33 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
27
+
28
+
29
+
30
+
31
+
32
+
33
+
34
+ INFO 10-21 17:32:38 model_runner.py:1071] Loading model weights took 14.9888 GB
35
+ INFO 10-21 17:32:38 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
36
+ INFO 10-21 17:32:38 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
37
+ INFO 10-21 17:32:42 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
38
+ INFO 10-21 17:32:42 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
39
+ INFO 10-21 17:32:53 model_runner.py:1530] Graph capturing finished in 11 secs.
40
+ ../../Llama-3.1-8B/
41
+ load data
42
+ Sampled Question:
43
+ Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
44
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
45
+
46
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
47
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
48
+
49
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
50
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
51
+ 30n&=480\
52
+ \Rightarrow\qquad n&=480/30=16
53
+ \end{align*}
54
+ Final Answer: The answer is $16$. I hope it is correct.
55
+
56
+ Question: If the system of equations
57
+
58
+ \begin{align*}
59
+ 6x-4y&=a,\
60
+ 6y-9x &=b.
61
+ \end{align*}
62
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
63
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
64
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
65
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
66
+
67
+ Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
68
+ Let's think step by step.
69
+ fitered 2688 already
70
+
71
 0%| | 0/5 [00:00<?, ?it/s]
72
 20%|██ | 1/5 [00:33<02:12, 33.13s/it]
73
 40%|████ | 2/5 [01:06<01:40, 33.41s/it]
74
 60%|██████ | 3/5 [01:39<01:05, 32.97s/it]
75
 80%|████████ | 4/5 [02:13<00:33, 33.48s/it]
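The llama3.1_70b log that follows uses the opposite layout: rather than eight independent single-GPU engines, one engine is sharded across all eight GPUs with tensor_parallel=8, which is why vLLM spawns the VllmWorkerProcess workers and the NCCL process groups visible below. A minimal sketch of that configuration (illustrative, not the exact call made by vllm_generate.py):

    from vllm import LLM

    # Sketch only: one tensor-parallel engine spanning all 8 GPUs for the 70B checkpoint.
    llm_70b = LLM(model="../../Meta-Llama-3.1-70B",
                  tensor_parallel_size=8,
                  trust_remote_code=True,
                  seed=0)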
log/zero_shot/bd_math/generation/llama3.1_70b/1/0-0.log ADDED
@@ -0,0 +1,694 @@
1
+ [I1021 20:06:01.507189142 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ /opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
3
+ No module named 'vllm._version'
4
+ from vllm.version import __version__ as VLLM_VERSION
5
+ llama3.1_70b
6
+ *****************************
7
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Meta-Llama-3.1-70B', model_type='llama3.1_70b', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1_70b/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=0, tensor_parallel=8, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
8
+ *****************************
9
+ INFO 10-21 20:06:08 config.py:887] Defaulting to use mp for distributed inference
10
+ WARNING 10-21 20:06:08 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
11
+ INFO 10-21 20:06:08 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
12
+ INFO 10-21 20:06:08 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Meta-Llama-3.1-70B', speculative_config=None, tokenizer='../../Meta-Llama-3.1-70B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Meta-Llama-3.1-70B, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
13
+ INFO 10-21 20:06:08 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
14
+ (VllmWorkerProcess pid=76761) INFO 10-21 20:06:09 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
15
+ (VllmWorkerProcess pid=76762) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
16
+ (VllmWorkerProcess pid=76766) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
17
+ (VllmWorkerProcess pid=76764) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
18
+ (VllmWorkerProcess pid=76763) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
19
+ (VllmWorkerProcess pid=76765) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
20
+ (VllmWorkerProcess pid=76767) INFO 10-21 20:06:10 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
21
+ [I1021 20:06:18.860192414 TCPStore.cpp:312] [c10d - debug] The server has started on port = 59993.
22
+ [I1021 20:06:18.860219808 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
23
+ [I1021 20:06:18.865365373 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
24
+ [I1021 20:06:18.865511282 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:43706.
25
+ [I1021 20:06:18.868928831 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
26
+ [I1021 20:06:21.965674859 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
27
+ [I1021 20:06:21.965952600 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:43722.
28
+ [I1021 20:06:21.969862366 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
29
+ [W1021 20:06:21.970432129 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
30
+ [I1021 20:06:21.970516770 ProcessGroupNCCL.cpp:852] [PG 0 Rank 1] ProcessGroupNCCL initialization options: size: 8, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
31
+ [I1021 20:06:21.970527945 ProcessGroupNCCL.cpp:861] [PG 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
32
+ [rank1]:[I1021 20:06:21.971263049 ProcessGroupNCCL.cpp:852] [PG 1 Rank 1] ProcessGroupNCCL initialization options: size: 8, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfffdb0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
33
+ [rank1]:[I1021 20:06:21.971273302 ProcessGroupNCCL.cpp:861] [PG 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
34
+ [I1021 20:06:23.273097130 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
35
+ [I1021 20:06:23.273345617 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54358.
36
+ [I1021 20:06:23.276727555 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
37
+ [W1021 20:06:23.277421295 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
38
+ [I1021 20:06:23.277538066 ProcessGroupNCCL.cpp:852] [PG 0 Rank 2] ProcessGroupNCCL initialization options: size: 8, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
39
+ [I1021 20:06:23.277551315 ProcessGroupNCCL.cpp:861] [PG 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
40
+ [rank2]:[I1021 20:06:23.278491857 ProcessGroupNCCL.cpp:852] [PG 1 Rank 2] ProcessGroupNCCL initialization options: size: 8, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfff790, SPLIT_COLOR: 4318754687966092895, PG Name: 1
41
+ [rank2]:[I1021 20:06:23.278506361 ProcessGroupNCCL.cpp:861] [PG 1 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
42
+ [I1021 20:06:23.444586505 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
43
+ [I1021 20:06:23.444804251 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54364.
44
+ [I1021 20:06:23.447833718 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
45
+ [W1021 20:06:23.448652643 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
46
+ [I1021 20:06:23.448808736 ProcessGroupNCCL.cpp:852] [PG 0 Rank 5] ProcessGroupNCCL initialization options: size: 8, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
47
+ [I1021 20:06:23.448821928 ProcessGroupNCCL.cpp:861] [PG 0 Rank 5] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
48
+ [I1021 20:06:23.449689011 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
49
+ [rank5]:[I1021 20:06:23.449844165 ProcessGroupNCCL.cpp:852] [PG 1 Rank 5] ProcessGroupNCCL initialization options: size: 8, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffabc0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
50
+ [rank5]:[I1021 20:06:23.449859165 ProcessGroupNCCL.cpp:861] [PG 1 Rank 5] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
51
+ [I1021 20:06:23.449901305 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54366.
52
+ [I1021 20:06:23.453270102 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
53
+ [W1021 20:06:23.453607173 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
54
+ [I1021 20:06:23.453667514 ProcessGroupNCCL.cpp:852] [PG 0 Rank 7] ProcessGroupNCCL initialization options: size: 8, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
55
+ [I1021 20:06:23.453676210 ProcessGroupNCCL.cpp:861] [PG 0 Rank 7] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
56
+ [rank7]:[I1021 20:06:23.454262625 ProcessGroupNCCL.cpp:852] [PG 1 Rank 7] ProcessGroupNCCL initialization options: size: 8, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffa2f0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
57
+ [rank7]:[I1021 20:06:23.454273108 ProcessGroupNCCL.cpp:861] [PG 1 Rank 7] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
58
+ [I1021 20:06:23.466284996 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
59
+ [I1021 20:06:23.466491124 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54378.
60
+ [I1021 20:06:23.470102934 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
61
+ [W1021 20:06:23.470644552 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
62
+ [I1021 20:06:23.470751865 ProcessGroupNCCL.cpp:852] [PG 0 Rank 4] ProcessGroupNCCL initialization options: size: 8, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
63
+ [I1021 20:06:23.470764734 ProcessGroupNCCL.cpp:861] [PG 0 Rank 4] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
64
+ [rank4]:[I1021 20:06:23.471662944 ProcessGroupNCCL.cpp:852] [PG 1 Rank 4] ProcessGroupNCCL initialization options: size: 8, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfffbf0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
65
+ [rank4]:[I1021 20:06:23.471679305 ProcessGroupNCCL.cpp:861] [PG 1 Rank 4] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
66
+ [I1021 20:06:23.474160519 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
67
+ [I1021 20:06:23.476517856 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 59993).
68
+ [I1021 20:06:23.474331404 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54386.
69
+ [I1021 20:06:23.477069035 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
70
+ [W1021 20:06:23.477501669 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
71
+ [I1021 20:06:23.477586586 ProcessGroupNCCL.cpp:852] [PG 0 Rank 3] ProcessGroupNCCL initialization options: size: 8, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
72
+ [I1021 20:06:23.477595590 ProcessGroupNCCL.cpp:861] [PG 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
73
+ [rank3]:[I1021 20:06:23.478316936 ProcessGroupNCCL.cpp:852] [PG 1 Rank 3] ProcessGroupNCCL initialization options: size: 8, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc000360, SPLIT_COLOR: 4318754687966092895, PG Name: 1
74
+ [rank3]:[I1021 20:06:23.478335862 ProcessGroupNCCL.cpp:861] [PG 1 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
75
+ [I1021 20:06:23.476669028 socket.cpp:884] [c10d] The client socket has connected to [localhost]:59993 on [localhost]:54394.
76
+ [I1021 20:06:23.479261648 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:59993
77
+ [W1021 20:06:23.479708475 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
78
+ [I1021 20:06:23.479775433 ProcessGroupNCCL.cpp:852] [PG 0 Rank 6] ProcessGroupNCCL initialization options: size: 8, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
79
+ [I1021 20:06:23.479782269 ProcessGroupNCCL.cpp:861] [PG 0 Rank 6] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
80
+ [W1021 20:06:23.479883861 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
81
+ [I1021 20:06:23.479973809 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 8, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
82
+ [I1021 20:06:23.479983005 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
83
+ [rank6]:[I1021 20:06:23.480405939 ProcessGroupNCCL.cpp:852] [PG 1 Rank 6] ProcessGroupNCCL initialization options: size: 8, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffa7a0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
84
+ [rank6]:[I1021 20:06:23.480416985 ProcessGroupNCCL.cpp:861] [PG 1 Rank 6] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
85
+ [rank0]:[I1021 20:06:23.480719766 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 8, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfcd5e0, SPLIT_COLOR: 4318754687966092895, PG Name: 1
86
+ [rank0]:[I1021 20:06:23.480735964 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
87
+ [rank0]:[I1021 20:06:23.497096299 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 8, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfcd5e0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
88
+ [rank0]:[I1021 20:06:23.497116924 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
89
+ [rank1]:[I1021 20:06:23.497151644 ProcessGroupNCCL.cpp:852] [PG 3 Rank 1] ProcessGroupNCCL initialization options: size: 8, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfffdb0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
90
+ [rank1]:[I1021 20:06:23.497172942 ProcessGroupNCCL.cpp:861] [PG 3 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
91
+ [rank3]:[I1021 20:06:23.497358021 ProcessGroupNCCL.cpp:852] [PG 3 Rank 3] ProcessGroupNCCL initialization options: size: 8, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc000360, SPLIT_COLOR: 4318754687966092895, PG Name: 3
92
+ [rank3]:[I1021 20:06:23.497380186 ProcessGroupNCCL.cpp:861] [PG 3 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
93
+ [rank4]:[I1021 20:06:23.497686688 ProcessGroupNCCL.cpp:852] [PG 3 Rank 4] ProcessGroupNCCL initialization options: size: 8, global rank: 4, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfffbf0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
94
+ [rank4]:[I1021 20:06:23.497706877 ProcessGroupNCCL.cpp:861] [PG 3 Rank 4] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
95
+ [rank2]:[I1021 20:06:23.497772323 ProcessGroupNCCL.cpp:852] [PG 3 Rank 2] ProcessGroupNCCL initialization options: size: 8, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbfff790, SPLIT_COLOR: 4318754687966092895, PG Name: 3
96
+ [rank2]:[I1021 20:06:23.497794727 ProcessGroupNCCL.cpp:861] [PG 3 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
97
+ [rank5]:[I1021 20:06:23.506746830 ProcessGroupNCCL.cpp:852] [PG 3 Rank 5] ProcessGroupNCCL initialization options: size: 8, global rank: 5, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffabc0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
98
+ [rank5]:[I1021 20:06:23.506767224 ProcessGroupNCCL.cpp:861] [PG 3 Rank 5] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
99
+ [rank6]:[I1021 20:06:23.508461369 ProcessGroupNCCL.cpp:852] [PG 3 Rank 6] ProcessGroupNCCL initialization options: size: 8, global rank: 6, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffa7a0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
100
+ [rank6]:[I1021 20:06:23.508482893 ProcessGroupNCCL.cpp:861] [PG 3 Rank 6] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
101
+ [rank7]:[I1021 20:06:23.508495989 ProcessGroupNCCL.cpp:852] [PG 3 Rank 7] ProcessGroupNCCL initialization options: size: 8, global rank: 7, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xbffa2f0, SPLIT_COLOR: 4318754687966092895, PG Name: 3
102
+ [rank7]:[I1021 20:06:23.508517065 ProcessGroupNCCL.cpp:861] [PG 3 Rank 7] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
103
+ (VllmWorkerProcess pid=76762) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
104
+ (VllmWorkerProcess pid=76761) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
105
+ INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
106
+ (VllmWorkerProcess pid=76764) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
107
+ (VllmWorkerProcess pid=76762) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
108
+ (VllmWorkerProcess pid=76761) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
109
+ INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
110
+ (VllmWorkerProcess pid=76764) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
111
+ (VllmWorkerProcess pid=76767) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
112
+ (VllmWorkerProcess pid=76767) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
113
+ (VllmWorkerProcess pid=76765) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
114
+ (VllmWorkerProcess pid=76766) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
115
+ (VllmWorkerProcess pid=76765) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
116
+ (VllmWorkerProcess pid=76766) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
117
+ (VllmWorkerProcess pid=76763) INFO 10-21 20:06:23 utils.py:1008] Found nccl from library libnccl.so.2
118
+ (VllmWorkerProcess pid=76763) INFO 10-21 20:06:23 pynccl.py:63] vLLM is using nccl==2.20.5
119
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
120
+ n117-192-077:76638:76638 [0] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
121
+ n117-192-077:76638:76638 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
122
+ n117-192-077:76638:76638 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
123
+ n117-192-077:76638:76638 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
124
+ n117-192-077:76638:76638 [0] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
125
+ n117-192-077:76638:76638 [0] NCCL INFO cudaDriverVersion 12020
126
+ NCCL version 2.20.5+cuda12.4
127
+ n117-192-077:76763:76763 [3] NCCL INFO cudaDriverVersion 12020
128
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
129
+ n117-192-077:76763:76763 [3] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
130
+ n117-192-077:76763:76763 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
131
+ n117-192-077:76763:76763 [3] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
132
+ n117-192-077:76763:76763 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
133
+ n117-192-077:76763:76763 [3] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
134
+ n117-192-077:76763:76763 [3] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
135
+ n117-192-077:76763:76763 [3] NCCL INFO P2P plugin IBext_v7
136
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
137
+ n117-192-077:76763:76763 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
138
+ n117-192-077:76763:76763 [3] NCCL INFO Using non-device net plugin version 0
139
+ n117-192-077:76763:76763 [3] NCCL INFO Using network IBext_v7
140
+ n117-192-077:76763:76763 [3] NCCL INFO comm 0xc049370 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 800000 commId 0xde285eafbeebd1fd - Init START
141
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
142
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
143
+ n117-192-077:76763:76763 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,ffff0000,00000000
144
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
145
+ n117-192-077:76763:76763 [3] NCCL INFO comm 0xc049370 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 0
146
+ n117-192-077:76763:76763 [3] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
147
+ n117-192-077:76763:76763 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2
148
+ n117-192-077:76763:76763 [3] NCCL INFO P2P Chunksize set to 524288
149
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC
150
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC
151
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/IPC
152
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC
153
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC
154
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC
155
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/IPC
156
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC
157
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/IPC
158
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/IPC
159
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/IPC
160
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/IPC
161
+ n117-192-077:76763:76763 [3] NCCL INFO Connected all rings
162
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC
163
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC
164
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC
165
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC
166
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC
167
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC
168
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC
169
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC
170
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/IPC
171
+ n117-192-077:76763:76763 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/IPC
172
+ n117-192-077:767n117-192-077:76767:76767 [7] NCCL INFO cudaDriverVersion 12020
173
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
174
+ n117-192-077:76767:76767 [7] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
175
+ n117-192-077:76767:76767 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
176
+ n117-192-077:76767:76767 [7] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
177
+ n117-192-077:76767:76767 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
178
+ n117-192-077:76767:76767 [7] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
179
+ n117-192-077:76767:76767 [7] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
180
+ n117-192-077:76767:76767 [7] NCCL INFO P2P plugin IBext_v7
181
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
182
+ n117-192-077:76767:76767 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
183
+ n117-192-077:76767:76767 [7] NCCL INFO Using non-device net plugin version 0
184
+ n117-192-077:76767:76767 [7] NCCL INFO Using network IBext_v7
185
+ n117-192-077:76767:76767 [7] NCCL INFO comm 0xc081c80 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c00000 commId 0xde285eafbeebd1fd - Init START
186
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
187
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
188
+ n117-192-077:76767:76767 [7] NCCL INFO Setting affinity for GPU 7 to ffff,ffffffff
189
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
190
+ n117-192-077:76767:76767 [7] NCCL INFO comm 0xc081c80 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 0
191
+ n117-192-077:76767:76767 [7] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
192
+ n117-192-077:76767:76767 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6
193
+ n117-192-077:76767:76767 [7] NCCL INFO P2P Chunksize set to 524288
194
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/IPC
195
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/IPC
196
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/IPC
197
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/IPC
198
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/IPC
199
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/IPC
200
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/IPC
201
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/IPC
202
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/IPC
203
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/IPC
204
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/IPC
205
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/IPC
206
+ n117-192-077:76767:76767 [7] NCCL INFO Connected all rings
207
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/IPC
208
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/IPC
209
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/IPC
210
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/IPC
211
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/IPC
212
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/IPC
213
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/IPC
214
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/IPC
215
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/IPC
216
+ n117-192-077:76767:76767 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/IPC
217
+ n117-192-077:7676n117-192-077:76766:76766 [6] NCCL INFO cudaDriverVersion 12020
218
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
219
+ n117-192-077:76766:76766 [6] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
220
+ n117-192-077:76766:76766 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
221
+ n117-192-077:76766:76766 [6] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
222
+ n117-192-077:76766:76766 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
223
+ n117-192-077:76766:76766 [6] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
224
+ n117-192-077:76766:76766 [6] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
225
+ n117-192-077:76766:76766 [6] NCCL INFO P2P plugin IBext_v7
226
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
227
+ n117-192-077:76766:76766 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
228
+ n117-192-077:76766:76766 [6] NCCL INFO Using non-device net plugin version 0
229
+ n117-192-077:76766:76766 [6] NCCL INFO Using network IBext_v7
230
+ n117-192-077:76766:76766 [6] NCCL INFO comm 0xc081d20 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId b00000 commId 0xde285eafbeebd1fd - Init START
231
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
232
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
233
+ n117-192-077:76766:76766 [6] NCCL INFO Setting affinity for GPU 6 to ffff,ffffffff
234
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
235
+ n117-192-077:76766:76766 [6] NCCL INFO comm 0xc081d20 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 0
236
+ n117-192-077:76766:76766 [6] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
237
+ n117-192-077:76766:76766 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5
238
+ n117-192-077:76766:76766 [6] NCCL INFO P2P Chunksize set to 524288
239
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/IPC
240
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/IPC
241
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/IPC
242
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/IPC
243
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/IPC
244
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/IPC
245
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/IPC
246
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/IPC
247
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/IPC
248
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/IPC
249
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/IPC
250
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/IPC
251
+ n117-192-077:76766:76766 [6] NCCL INFO Connected all rings
252
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC
253
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC
254
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC
255
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/IPC
256
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC
257
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC
258
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC
259
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/IPC
260
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/IPC
261
+ n117-192-077:76766:76766 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/IPC
262
+ n117-192-077:76766:76766 [6] n117-192-077:76762:76762 [2] NCCL INFO cudaDriverVersion 12020
263
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
264
+ n117-192-077:76762:76762 [2] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
265
+ n117-192-077:76762:76762 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
266
+ n117-192-077:76762:76762 [2] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
267
+ n117-192-077:76762:76762 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
268
+ n117-192-077:76762:76762 [2] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
269
+ n117-192-077:76762:76762 [2] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
270
+ n117-192-077:76762:76762 [2] NCCL INFO P2P plugin IBext_v7
271
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
272
+ n117-192-077:76762:76762 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
273
+ n117-192-077:76762:76762 [2] NCCL INFO Using non-device net plugin version 0
274
+ n117-192-077:76762:76762 [2] NCCL INFO Using network IBext_v7
275
+ n117-192-077:76762:76762 [2] NCCL INFO comm 0xc086f40 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 300000 commId 0xde285eafbeebd1fd - Init START
276
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
277
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
278
+ n117-192-077:76762:76762 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffff0000,00000000
279
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
280
+ n117-192-077:76762:76762 [2] NCCL INFO comm 0xc086f40 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
281
+ n117-192-077:76762:76762 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
282
+ n117-192-077:76762:76762 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1
283
+ n117-192-077:76762:76762 [2] NCCL INFO P2P Chunksize set to 524288
284
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC
285
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC
286
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/IPC
287
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC
288
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC
289
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC
290
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC
291
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC
292
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/IPC
293
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/IPC
294
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/IPC
295
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/IPC
296
+ n117-192-077:76762:76762 [2] NCCL INFO Connected all rings
297
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC
298
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/IPC
299
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC
300
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC
301
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC
302
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/IPC
303
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC
304
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC
305
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/IPC
306
+ n117-192-077:76762:76762 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/IPC
307
+ n117-192-077:767n117-192-077:76764:76764 [4] NCCL INFO cudaDriverVersion 12020
308
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
309
+ n117-192-077:76764:76764 [4] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
310
+ n117-192-077:76764:76764 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
311
+ n117-192-077:76764:76764 [4] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
312
+ n117-192-077:76764:76764 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
313
+ n117-192-077:76764:76764 [4] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
314
+ n117-192-077:76764:76764 [4] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
315
+ n117-192-077:76764:76764 [4] NCCL INFO P2P plugin IBext_v7
316
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
317
+ n117-192-077:76764:76764 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
318
+ n117-192-077:76764:76764 [4] NCCL INFO Using non-device net plugin version 0
319
+ n117-192-077:76764:76764 [4] NCCL INFO Using network IBext_v7
320
+ n117-192-077:76764:76764 [4] NCCL INFO comm 0xc086e00 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 900000 commId 0xde285eafbeebd1fd - Init START
321
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
322
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
323
+ n117-192-077:76764:76764 [4] NCCL INFO Setting affinity for GPU 4 to ffff,ffffffff
324
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
325
+ n117-192-077:76764:76764 [4] NCCL INFO comm 0xc086e00 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 0
326
+ n117-192-077:76764:76764 [4] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
327
+ n117-192-077:76764:76764 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3
328
+ n117-192-077:76764:76764 [4] NCCL INFO P2P Chunksize set to 524288
329
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC
330
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC
331
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/IPC
332
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC
333
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC
334
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC
335
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/IPC
336
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC
337
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/IPC
338
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/IPC
339
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/IPC
340
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/IPC
341
+ n117-192-077:76764:76764 [4] NCCL INFO Connected all rings
342
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/IPC
343
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC
344
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC
345
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC
346
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC
347
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC
348
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC
349
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC
350
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/IPC
351
+ n117-192-077:76764:76764 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/IPC
352
+ n117-192-077:76764:76764 [4] n117-192-077:76765:76765 [5] NCCL INFO cudaDriverVersion 12020
353
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
354
+ n117-192-077:76765:76765 [5] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
355
+ n117-192-077:76765:76765 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
356
+ n117-192-077:76765:76765 [5] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
357
+ n117-192-077:76765:76765 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
358
+ n117-192-077:76765:76765 [5] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
359
+ n117-192-077:76765:76765 [5] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
360
+ n117-192-077:76765:76765 [5] NCCL INFO P2P plugin IBext_v7
361
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
362
+ n117-192-077:76765:76765 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
363
+ n117-192-077:76765:76765 [5] NCCL INFO Using non-device net plugin version 0
364
+ n117-192-077:76765:76765 [5] NCCL INFO Using network IBext_v7
365
+ n117-192-077:76765:76765 [5] NCCL INFO comm 0xc082d50 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId a00000 commId 0xde285eafbeebd1fd - Init START
366
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
367
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
368
+ n117-192-077:76765:76765 [5] NCCL INFO Setting affinity for GPU 5 to ffff,ffffffff
369
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
370
+ n117-192-077:76765:76765 [5] NCCL INFO comm 0xc082d50 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 0
371
+ n117-192-077:76765:76765 [5] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
372
+ n117-192-077:76765:76765 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4
373
+ n117-192-077:76765:76765 [5] NCCL INFO P2P Chunksize set to 524288
374
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC
375
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC
376
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC
377
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/IPC
378
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC
379
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC
380
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC
381
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/IPC
382
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/IPC
383
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/IPC
384
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/IPC
385
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/IPC
386
+ n117-192-077:76765:76765 [5] NCCL INFO Connected all rings
387
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC
388
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC
389
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC
390
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC
391
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC
392
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC
393
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC
394
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC
395
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/IPC
396
+ n117-192-077:76765:76765 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/IPC
397
+ n117-192-077:76765:76765 [5] n117-192-077:76761:76761 [1] NCCL INFO cudaDriverVersion 12020
398
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
399
+ n117-192-077:76761:76761 [1] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
400
+ n117-192-077:76761:76761 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
401
+ n117-192-077:76761:76761 [1] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
402
+ n117-192-077:76761:76761 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
403
+ n117-192-077:76761:76761 [1] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
404
+ n117-192-077:76761:76761 [1] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
405
+ n117-192-077:76761:76761 [1] NCCL INFO P2P plugin IBext_v7
406
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
407
+ n117-192-077:76761:76761 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
408
+ n117-192-077:76761:76761 [1] NCCL INFO Using non-device net plugin version 0
409
+ n117-192-077:76761:76761 [1] NCCL INFO Using network IBext_v7
410
+ n117-192-077:76761:76761 [1] NCCL INFO comm 0xc086fc0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 200000 commId 0xde285eafbeebd1fd - Init START
411
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
412
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
413
+ n117-192-077:76761:76761 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,ffff0000,00000000
414
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
415
+ n117-192-077:76761:76761 [1] NCCL INFO comm 0xc086fc0 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
416
+ n117-192-077:76761:76761 [1] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
417
+ n117-192-077:76761:76761 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0
418
+ n117-192-077:76761:76761 [1] NCCL INFO P2P Chunksize set to 524288
419
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC
420
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/IPC
421
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC
422
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC
423
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC
424
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/IPC
425
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC
426
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC
427
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/IPC
428
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/IPC
429
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/IPC
430
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/IPC
431
+ n117-192-077:76761:76761 [1] NCCL INFO Connected all rings
432
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC
433
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC
434
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC
435
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC
436
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC
437
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC
438
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC
439
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC
440
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/IPC
441
+ n117-192-077:76761:76761 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/IPC
442
+ n117-192-077:767n117-192-077:76638:76638 [0] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
443
+ n117-192-077:76638:76638 [0] NCCL INFO P2P plugin IBext_v7
444
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
445
+ n117-192-077:76638:76638 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
446
+ n117-192-077:76638:76638 [0] NCCL INFO Using non-device net plugin version 0
447
+ n117-192-077:76638:76638 [0] NCCL INFO Using network IBext_v7
448
+ n117-192-077:76638:76638 [0] NCCL INFO comm 0xc083260 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 100000 commId 0xde285eafbeebd1fd - Init START
449
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
450
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
451
+ n117-192-077:76638:76638 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffff0000,00000000
452
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
453
+ n117-192-077:76638:76638 [0] NCCL INFO comm 0xc083260 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 0
454
+ n117-192-077:76638:76638 [0] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
455
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 00/12 : 0 1 2 3 4 5 6 7
456
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 01/12 : 0 1 2 3 4 5 6 7
457
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 02/12 : 0 1 2 3 4 5 6 7
458
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 03/12 : 0 1 2 3 4 5 6 7
459
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 04/12 : 0 1 2 3 4 5 6 7
460
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 05/12 : 0 1 2 3 4 5 6 7
461
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 06/12 : 0 1 2 3 4 5 6 7
462
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 07/12 : 0 1 2 3 4 5 6 7
463
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 08/12 : 0 1 2 3 4 5 6 7
464
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 09/12 : 0 1 2 3 4 5 6 7
465
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 10/12 : 0 1 2 3 4 5 6 7
466
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 11/12 : 0 1 2 3 4 5 6 7
467
+ n117-192-077:76638:76638 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1
468
+ n117-192-077:76638:76638 [0] NCCL INFO P2P Chunksize set to 524288
469
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC
470
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC
471
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC
472
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC
473
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC
474
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC
475
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC
476
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC
477
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/IPC
478
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/IPC
479
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/IPC
480
+ n117-192-077:76638:76638 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/IPC
481
+ n117-192-077:76638:76638 [0] NCCL INFO Connected all rings
482
+ n117-192-077:76638:76638 [0] NCCL INFO Connected all trees
483
+ n117-192-077:76638:76638 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
484
+ n117-192-077:76638:76638 [0] NCCL INFO 12 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
485
+ n117-192-077:76638:76638 [0] NCCL INFO comm 0xc083260 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 100000 commId 0xde285eafbeebd1fd - Init COMPLETE
486
+ INFO 10-21 20:06:31 custom_all_reduce_utils.py:204] generating GPU P2P access cache in /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
487
+ (VllmWorkerProcess pid=76761) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
488
+ INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
489
+ (VllmWorkerProcess pid=76763) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
490
+ (VllmWorkerProcess pid=76766) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
491
+ (VllmWorkerProcess pid=76765) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
492
+ (VllmWorkerProcess pid=76764) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
493
+ (VllmWorkerProcess pid=76767) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
494
+ (VllmWorkerProcess pid=76762) INFO 10-21 20:10:29 custom_all_reduce_utils.py:242] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
495
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
496
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
497
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
498
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
499
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
500
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
501
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
502
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
503
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
504
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
505
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
506
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
507
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
508
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
509
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
510
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
511
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
512
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
513
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
514
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
515
+ (VllmWorkerProcess pid=76761) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
516
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
517
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
518
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
519
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
520
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
521
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
522
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
523
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
524
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
525
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
526
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
527
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
528
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
529
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
530
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
531
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
532
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
533
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
534
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
535
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
536
+ (VllmWorkerProcess pid=76763) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
537
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
538
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
539
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
540
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
541
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
542
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
543
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
544
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
545
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
546
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
547
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
548
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
549
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
550
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
551
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
552
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
553
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
554
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
555
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
556
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
557
+ (VllmWorkerProcess pid=76766) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
558
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
559
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
560
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
561
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
562
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
563
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
564
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
565
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
566
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
567
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
568
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
569
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
570
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
571
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
572
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
573
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
574
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
575
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
576
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
577
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
578
+ (VllmWorkerProcess pid=76765) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
579
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
580
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
581
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
582
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
583
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
584
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
585
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
586
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
587
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
588
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
589
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
590
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
591
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
592
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
593
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
594
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
595
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
596
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
597
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
598
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
599
+ (VllmWorkerProcess pid=76767) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
600
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
601
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
602
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
603
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
604
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
605
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
606
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
607
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
608
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
609
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
610
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
611
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
612
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
613
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
614
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
615
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
616
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
617
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
618
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
619
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
620
+ (VllmWorkerProcess pid=76762) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
621
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
622
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 224, in _run_worker_process
623
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] output = executor(*args, **kwargs)
624
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
625
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] init_worker_distributed_environment(self.parallel_config, self.rank,
626
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
627
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
628
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
629
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] initialize_model_parallel(tensor_model_parallel_size,
630
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
631
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] _TP = init_model_parallel_group(group_ranks,
632
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
633
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] return GroupCoordinator(
634
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
635
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] self.ca_comm = CustomAllreduce(
636
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
637
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] handles, offsets = self._get_ipc_meta(self.meta)
638
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
639
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] data = inp.untyped_storage()._share_cuda_()
640
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
641
+ (VllmWorkerProcess pid=76764) ERROR 10-21 20:10:29 multiproc_worker_utils.py:231]
642
+ [rank0]: Traceback (most recent call last):
643
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 361, in <module>
644
+ [rank0]: main()
645
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 357, in main
646
+ [rank0]: generation(args)
647
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 244, in generation
648
+ [rank0]: model = LLM(
649
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 177, in __init__
650
+ [rank0]: self.llm_engine = LLMEngine.from_engine_args(
651
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 574, in from_engine_args
652
+ [rank0]: engine = cls(
653
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 335, in __init__
654
+ [rank0]: self.model_executor = executor_class(
655
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
656
+ [rank0]: super().__init__(*args, **kwargs)
657
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
658
+ [rank0]: self._init_executor()
659
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 110, in _init_executor
660
+ [rank0]: self._run_workers("init_device")
661
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
662
+ [rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
663
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 176, in init_device
664
+ [rank0]: init_worker_distributed_environment(self.parallel_config, self.rank,
665
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 456, in init_worker_distributed_environment
666
+ [rank0]: ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
667
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1051, in ensure_model_parallel_initialized
668
+ [rank0]: initialize_model_parallel(tensor_model_parallel_size,
669
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 1015, in initialize_model_parallel
670
+ [rank0]: _TP = init_model_parallel_group(group_ranks,
671
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 856, in init_model_parallel_group
672
+ [rank0]: return GroupCoordinator(
673
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 222, in __init__
674
+ [rank0]: self.ca_comm = CustomAllreduce(
675
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 171, in __init__
676
+ [rank0]: handles, offsets = self._get_ipc_meta(self.meta)
677
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 193, in _get_ipc_meta
678
+ [rank0]: data = inp.untyped_storage()._share_cuda_()
679
+ [rank0]: RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
680
+ [rank0]:[I1021 20:10:29.253456024 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
681
+ [rank0]:[I1021 20:10:29.253598016 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
682
+ [rank0]:[I1021 20:10:29.254700814 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
683
+ [rank0]:[I1021 20:10:29.259816995 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
684
+ [rank0]:[I1021 20:10:29.261357971 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
685
+ [rank0]:[I1021 20:10:29.262652623 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
686
+ [rank0]:[I1021 20:10:29.263323241 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
687
+ INFO 10-21 20:10:31 multiproc_worker_utils.py:121] Killing local vLLM worker processes
688
+ Exception ignored in: <function CustomAllreduce.__del__ at 0x7fba622b35e0>
689
+ Traceback (most recent call last):
690
+ File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 292, in __del__
691
+ self.close()
692
+ File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 287, in close
693
+ if not self.disabled and self._ptr:
694
+ AttributeError: 'CustomAllreduce' object has no attribute '_ptr'
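The RuntimeError captured above comes from the CUDA caching allocator: storage allocated with expandable_segments:True cannot be exported over CUDA IPC, which is exactly what CustomAllreduce._get_ipc_meta attempts during tensor-parallel setup. As a minimal sketch of the workaround quoted in the error message (an illustration only, assuming it is applied in the launching script before the vLLM engine and its worker processes are created; it is not part of the uploaded files):

import os

# Ensure vLLM worker processes do not inherit expandable_segments:True.
# The variable must be set before the CUDA caching allocator is first used.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:False"

import torch

# In-process equivalent, using the (private) helper named in the error message:
torch.cuda.memory._set_allocator_settings("expandable_segments:False")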
log/zero_shot/bd_math/generation/llama3.1_70b/1/0-4.log ADDED
@@ -0,0 +1,346 @@
1
+ [I1021 17:53:02.628389982 debug.cpp:49] [c10d] The debug level is set to INFO.
2
+ llama3.1_70b
3
+ *****************************
4
+ Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Meta-Llama-3.1-70B', model_type='llama3.1_70b', output_dir='generate_result/zero_shot/bd_math/generation/llama3.1_70b/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=4, tensor_parallel=4, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
5
+ *****************************
6
+ INFO 10-21 17:53:06 config.py:729] Defaulting to use mp for distributed inference
7
+ WARNING 10-21 17:53:06 arg_utils.py:766] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
8
+ INFO 10-21 17:53:06 config.py:820] Chunked prefill is enabled with max_num_batched_tokens=512.
9
+ INFO 10-21 17:53:06 llm_engine.py:174] Initializing an LLM engine (v0.5.4) with config: model='../../Meta-Llama-3.1-70B', speculative_config=None, tokenizer='../../Meta-Llama-3.1-70B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=../../Meta-Llama-3.1-70B, use_v2_block_manager=False, enable_prefix_caching=False)
10
+ INFO 10-21 17:53:06 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
11
+ (VllmWorkerProcess pid=33956) INFO 10-21 17:53:07 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
12
+ (VllmWorkerProcess pid=33962) INFO 10-21 17:53:07 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
13
+ (VllmWorkerProcess pid=33958) INFO 10-21 17:53:07 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
14
+ [I1021 17:53:18.555010373 TCPStore.cpp:312] [c10d - debug] The server has started on port = 35019.
15
+ [I1021 17:53:18.555162932 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
16
+ [I1021 17:53:18.559166150 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 35019).
17
+ [I1021 17:53:18.559283422 socket.cpp:884] [c10d] The client socket has connected to [localhost]:35019 on [localhost]:46984.
18
+ [I1021 17:53:18.562060112 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:35019
19
+ [I1021 17:53:20.136371404 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 35019).
20
+ [I1021 17:53:20.136589163 socket.cpp:884] [c10d] The client socket has connected to [localhost]:35019 on [localhost]:46996.
21
+ [I1021 17:53:20.139467602 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:35019
22
+ [W1021 17:53:20.139957063 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
23
+ [I1021 17:53:20.140041319 ProcessGroupNCCL.cpp:852] [PG 0 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
24
+ [I1021 17:53:20.140051030 ProcessGroupNCCL.cpp:861] [PG 0 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
25
+ [rank2]:[I1021 17:53:20.140986237 ProcessGroupNCCL.cpp:852] [PG 1 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122fd0, SPLIT_COLOR: 1008299991543067201, PG Name: 1
26
+ [rank2]:[I1021 17:53:20.140996004 ProcessGroupNCCL.cpp:861] [PG 1 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
27
+ [I1021 17:53:20.167810131 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 35019).
28
+ [I1021 17:53:20.167960138 socket.cpp:884] [c10d] The client socket has connected to [localhost]:35019 on [localhost]:47006.
29
+ [I1021 17:53:20.170691050 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:35019
30
+ [W1021 17:53:20.171131772 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
31
+ [I1021 17:53:20.171204576 ProcessGroupNCCL.cpp:852] [PG 0 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
32
+ [I1021 17:53:20.171212914 ProcessGroupNCCL.cpp:861] [PG 0 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
33
+ [rank1]:[I1021 17:53:20.172079599 ProcessGroupNCCL.cpp:852] [PG 1 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122f90, SPLIT_COLOR: 1008299991543067201, PG Name: 1
34
+ [rank1]:[I1021 17:53:20.172090869 ProcessGroupNCCL.cpp:861] [PG 1 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
35
+ [I1021 17:53:20.503857827 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (127.0.0.1, 35019).
36
+ [I1021 17:53:20.504052271 socket.cpp:884] [c10d] The client socket has connected to [localhost]:35019 on [localhost]:47018.
37
+ [I1021 17:53:20.507325410 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 127.0.0.1:35019
38
+ [W1021 17:53:20.508194974 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
39
+ [I1021 17:53:20.508331472 ProcessGroupNCCL.cpp:852] [PG 0 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
40
+ [I1021 17:53:20.508342990 ProcessGroupNCCL.cpp:861] [PG 0 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
41
+ [rank3]:[I1021 17:53:20.509598213 ProcessGroupNCCL.cpp:852] [PG 1 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122f90, SPLIT_COLOR: 1008299991543067201, PG Name: 1
42
+ [rank3]:[I1021 17:53:20.509615301 ProcessGroupNCCL.cpp:861] [PG 1 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
43
+ [W1021 17:53:20.517814527 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
44
+ [I1021 17:53:20.517928171 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
45
+ [I1021 17:53:20.517937931 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
46
+ [rank0]:[I1021 17:53:20.518675944 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc150ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 1
47
+ [rank0]:[I1021 17:53:20.518689693 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
48
+ [rank0]:[I1021 17:53:20.535230190 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 4, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc150ff0, SPLIT_COLOR: 1008299991543067201, PG Name: 3
49
+ [rank0]:[I1021 17:53:20.535250236 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
50
+ [rank2]:[I1021 17:53:20.535503951 ProcessGroupNCCL.cpp:852] [PG 3 Rank 2] ProcessGroupNCCL initialization options: size: 4, global rank: 2, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122fd0, SPLIT_COLOR: 1008299991543067201, PG Name: 3
51
+ [rank2]:[I1021 17:53:20.535530572 ProcessGroupNCCL.cpp:861] [PG 3 Rank 2] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
52
+ [rank3]:[I1021 17:53:20.535693243 ProcessGroupNCCL.cpp:852] [PG 3 Rank 3] ProcessGroupNCCL initialization options: size: 4, global rank: 3, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122f90, SPLIT_COLOR: 1008299991543067201, PG Name: 3
53
+ [rank1]:[I1021 17:53:20.535703220 ProcessGroupNCCL.cpp:852] [PG 3 Rank 1] ProcessGroupNCCL initialization options: size: 4, global rank: 1, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xc122f90, SPLIT_COLOR: 1008299991543067201, PG Name: 3
54
+ [rank3]:[I1021 17:53:20.535714186 ProcessGroupNCCL.cpp:861] [PG 3 Rank 3] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
55
+ [rank1]:[I1021 17:53:20.535724952 ProcessGroupNCCL.cpp:861] [PG 3 Rank 1] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
56
+ (VllmWorkerProcess pid=33962) INFO 10-21 17:53:20 utils.py:841] Found nccl from library libnccl.so.2
57
+ (VllmWorkerProcess pid=33958) INFO 10-21 17:53:20 utils.py:841] Found nccl from library libnccl.so.2
58
+ (VllmWorkerProcess pid=33956) INFO 10-21 17:53:20 utils.py:841] Found nccl from library libnccl.so.2
59
+ (VllmWorkerProcess pid=33962) INFO 10-21 17:53:20 pynccl.py:63] vLLM is using nccl==2.20.5
60
+ INFO 10-21 17:53:20 utils.py:841] Found nccl from library libnccl.so.2
61
+ INFO 10-21 17:53:20 pynccl.py:63] vLLM is using nccl==2.20.5
62
+ (VllmWorkerProcess pid=33958) INFO 10-21 17:53:20 pynccl.py:63] vLLM is using nccl==2.20.5
63
+ (VllmWorkerProcess pid=33956) INFO 10-21 17:53:20 pynccl.py:63] vLLM is using nccl==2.20.5
64
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
65
+ n117-192-077:33856:33856 [0] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
66
+ n117-192-077:33856:33856 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
67
+ n117-192-077:33856:33856 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
68
+ n117-192-077:33856:33856 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
69
+ n117-192-077:33856:33856 [0] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
70
+ n117-192-077:33856:33856 [0] NCCL INFO cudaDriverVersion 12020
71
+ NCCL version 2.20.5+cuda12.4
72
+ n117-192-077:33962:33962 [3] NCCL INFO cudaDriverVersion 12020
73
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
74
+ n117-192-077:33962:33962 [3] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
75
+ n117-192-077:33962:33962 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
76
+ n117-192-077:33962:33962 [3] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
77
+ n117-192-077:33962:33962 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
78
+ n117-192-077:33962:33962 [3] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
79
+ n117-192-077:33962:33962 [3] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
80
+ n117-192-077:33962:33962 [3] NCCL INFO P2P plugin IBext_v7
81
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
82
+ n117-192-077:33962:33962 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
83
+ n117-192-077:33962:33962 [3] NCCL INFO Using non-device net plugin version 0
84
+ n117-192-077:33962:33962 [3] NCCL INFO Using network IBext_v7
85
+ n117-192-077:33962:33962 [3] NCCL INFO comm 0xc1d3870 rank 3 nranks 4 cudaDev 3 nvmlDev 7 busId c00000 commId 0xe7bd6663f99671f6 - Init START
86
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
87
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
88
+ n117-192-077:33962:33962 [3] NCCL INFO Setting affinity for GPU 7 to ffff,ffffffff
89
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
90
+ n117-192-077:33962:33962 [3] NCCL INFO comm 0xc1d3870 rank 3 nRanks 4 nNodes 1 localRanks 4 localRank 3 MNNVL 0
91
+ n117-192-077:33962:33962 [3] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
92
+ n117-192-077:33962:33962 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] -1/-1/-1->3->2 [5] -1/-1/-1->3->2 [6] -1/-1/-1->3->2 [7] -1/-1/-1->3->2 [8] -1/-1/-1->3->2 [9] -1/-1/-1->3->2 [10] -1/-1/-1->3->2 [11] -1/-1/-1->3->2
93
+ n117-192-077:33962:33962 [3] NCCL INFO P2P Chunksize set to 524288
94
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 00/0 : 3[7] -> 0[4] via P2P/IPC
95
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 01/0 : 3[7] -> 0[4] via P2P/IPC
96
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 02/0 : 3[7] -> 0[4] via P2P/IPC
97
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 03/0 : 3[7] -> 0[4] via P2P/IPC
98
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 04/0 : 3[7] -> 0[4] via P2P/IPC
99
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 05/0 : 3[7] -> 0[4] via P2P/IPC
100
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 06/0 : 3[7] -> 0[4] via P2P/IPC
101
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 07/0 : 3[7] -> 0[4] via P2P/IPC
102
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 08/0 : 3[7] -> 0[4] via P2P/IPC
103
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 09/0 : 3[7] -> 0[4] via P2P/IPC
104
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 10/0 : 3[7] -> 0[4] via P2P/IPC
105
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 11/0 : 3[7] -> 0[4] via P2P/IPC
106
+ n117-192-077:33962:33962 [3] NCCL INFO Connected all rings
107
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 00/0 : 3[7] -> 2[6] via P2P/IPC
108
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 01/0 : 3[7] -> 2[6] via P2P/IPC
109
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 02/0 : 3[7] -> 2[6] via P2P/IPC
110
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 03/0 : 3[7] -> 2[6] via P2P/IPC
111
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 04/0 : 3[7] -> 2[6] via P2P/IPC
112
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 05/0 : 3[7] -> 2[6] via P2P/IPC
113
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 06/0 : 3[7] -> 2[6] via P2P/IPC
114
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 07/0 : 3[7] -> 2[6] via P2P/IPC
115
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 08/0 : 3[7] -> 2[6] via P2P/IPC
116
+ n117-192-077:33962:33962 [3] NCCL INFO Channel 09/0 : 3[7] -> 2[6] via P2P/IPC
117
+ n117-192-077:3396n117-192-077:33958:33958 [2] NCCL INFO cudaDriverVersion 12020
118
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
119
+ n117-192-077:33958:33958 [2] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
120
+ n117-192-077:33958:33958 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
121
+ n117-192-077:33958:33958 [2] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
122
+ n117-192-077:33958:33958 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
123
+ n117-192-077:33958:33958 [2] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
124
+ n117-192-077:33958:33958 [2] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
125
+ n117-192-077:33958:33958 [2] NCCL INFO P2P plugin IBext_v7
126
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
127
+ n117-192-077:33958:33958 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
128
+ n117-192-077:33958:33958 [2] NCCL INFO Using non-device net plugin version 0
129
+ n117-192-077:33958:33958 [2] NCCL INFO Using network IBext_v7
130
+ n117-192-077:33958:33958 [2] NCCL INFO comm 0xc1d38b0 rank 2 nranks 4 cudaDev 2 nvmlDev 6 busId b00000 commId 0xe7bd6663f99671f6 - Init START
131
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
132
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
133
+ n117-192-077:33958:33958 [2] NCCL INFO Setting affinity for GPU 6 to ffff,ffffffff
134
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
135
+ n117-192-077:33958:33958 [2] NCCL INFO comm 0xc1d38b0 rank 2 nRanks 4 nNodes 1 localRanks 4 localRank 2 MNNVL 0
136
+ n117-192-077:33958:33958 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
137
+ n117-192-077:33958:33958 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1
138
+ n117-192-077:33958:33958 [2] NCCL INFO P2P Chunksize set to 524288
139
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 00/0 : 2[6] -> 3[7] via P2P/IPC
140
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 01/0 : 2[6] -> 3[7] via P2P/IPC
141
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 02/0 : 2[6] -> 3[7] via P2P/IPC
142
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 03/0 : 2[6] -> 3[7] via P2P/IPC
143
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 04/0 : 2[6] -> 3[7] via P2P/IPC
144
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 05/0 : 2[6] -> 3[7] via P2P/IPC
145
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 06/0 : 2[6] -> 3[7] via P2P/IPC
146
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 07/0 : 2[6] -> 3[7] via P2P/IPC
147
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 08/0 : 2[6] -> 3[7] via P2P/IPC
148
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 09/0 : 2[6] -> 3[7] via P2P/IPC
149
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 10/0 : 2[6] -> 3[7] via P2P/IPC
150
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 11/0 : 2[6] -> 3[7] via P2P/IPC
151
+ n117-192-077:33958:33958 [2] NCCL INFO Connected all rings
152
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 00/0 : 2[6] -> 1[5] via P2P/IPC
153
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 01/0 : 2[6] -> 1[5] via P2P/IPC
154
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 02/0 : 2[6] -> 1[5] via P2P/IPC
155
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 03/0 : 2[6] -> 1[5] via P2P/IPC
156
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 04/0 : 2[6] -> 1[5] via P2P/IPC
157
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 05/0 : 2[6] -> 1[5] via P2P/IPC
158
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 06/0 : 2[6] -> 1[5] via P2P/IPC
159
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 07/0 : 2[6] -> 1[5] via P2P/IPC
160
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 08/0 : 2[6] -> 1[5] via P2P/IPC
161
+ n117-192-077:33958:33958 [2] NCCL INFO Channel 09/0 : 2[6] -> 1[5] via P2P/IPC
162
+ n117-192-077:33958:33958 [2] n117-192-077:33956:33956 [1] NCCL INFO cudaDriverVersion 12020
163
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
164
+ n117-192-077:33956:33956 [1] NCCL INFO Bootstrap : Using eth0:10.117.192.77<0>
165
+ n117-192-077:33956:33956 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v8 symbol.
166
+ n117-192-077:33956:33956 [1] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v7 (v7)
167
+ n117-192-077:33956:33956 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v8 symbol.
168
+ n117-192-077:33956:33956 [1] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v7)
169
+ n117-192-077:33956:33956 [1] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
170
+ n117-192-077:33956:33956 [1] NCCL INFO P2P plugin IBext_v7
171
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
172
+ n117-192-077:33956:33956 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
173
+ n117-192-077:33956:33956 [1] NCCL INFO Using non-device net plugin version 0
174
+ n117-192-077:33956:33956 [1] NCCL INFO Using network IBext_v7
175
+ n117-192-077:33956:33956 [1] NCCL INFO comm 0xc1d3970 rank 1 nranks 4 cudaDev 1 nvmlDev 5 busId a00000 commId 0xe7bd6663f99671f6 - Init START
176
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
177
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
178
+ n117-192-077:33956:33956 [1] NCCL INFO Setting affinity for GPU 5 to ffff,ffffffff
179
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
180
+ n117-192-077:33956:33956 [1] NCCL INFO comm 0xc1d3970 rank 1 nRanks 4 nNodes 1 localRanks 4 localRank 1 MNNVL 0
181
+ n117-192-077:33956:33956 [1] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
182
+ n117-192-077:33956:33956 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0
183
+ n117-192-077:33956:33956 [1] NCCL INFO P2P Chunksize set to 524288
184
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 00/0 : 1[5] -> 2[6] via P2P/IPC
185
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 01/0 : 1[5] -> 2[6] via P2P/IPC
186
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 02/0 : 1[5] -> 2[6] via P2P/IPC
187
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 03/0 : 1[5] -> 2[6] via P2P/IPC
188
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 04/0 : 1[5] -> 2[6] via P2P/IPC
189
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 05/0 : 1[5] -> 2[6] via P2P/IPC
190
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 06/0 : 1[5] -> 2[6] via P2P/IPC
191
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 07/0 : 1[5] -> 2[6] via P2P/IPC
192
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 08/0 : 1[5] -> 2[6] via P2P/IPC
193
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 09/0 : 1[5] -> 2[6] via P2P/IPC
194
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 10/0 : 1[5] -> 2[6] via P2P/IPC
195
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 11/0 : 1[5] -> 2[6] via P2P/IPC
196
+ n117-192-077:33956:33956 [1] NCCL INFO Connected all rings
197
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 00/0 : 1[5] -> 0[4] via P2P/IPC
198
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 01/0 : 1[5] -> 0[4] via P2P/IPC
199
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 02/0 : 1[5] -> 0[4] via P2P/IPC
200
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 03/0 : 1[5] -> 0[4] via P2P/IPC
201
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 04/0 : 1[5] -> 0[4] via P2P/IPC
202
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 05/0 : 1[5] -> 0[4] via P2P/IPC
203
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 06/0 : 1[5] -> 0[4] via P2P/IPC
204
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 07/0 : 1[5] -> 0[4] via P2P/IPC
205
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 08/0 : 1[5] -> 0[4] via P2P/IPC
206
+ n117-192-077:33956:33956 [1] NCCL INFO Channel 09/0 : 1[5] -> 0[4] via P2P/IPC
207
+ n117-192-077:33956:33956 [1] n117-192-077:33856:33856 [0] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
208
+ n117-192-077:33856:33856 [0] NCCL INFO P2P plugin IBext_v7
209
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
210
+ n117-192-077:33856:33856 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB/SHARP [1]mlx5_1:1/IB/SHARP [2]mlx5_2:1/IB/SHARP [3]mlx5_3:1/IB/SHARP [4]mlx5_4:1/IB/SHARP [5]mlx5_5:1/IB/SHARP [6]mlx5_6:1/IB/SHARP [7]mlx5_7:1/IB/SHARP [RO]; OOB eth0:10.117.192.77<0>
211
+ n117-192-077:33856:33856 [0] NCCL INFO Using non-device net plugin version 0
212
+ n117-192-077:33856:33856 [0] NCCL INFO Using network IBext_v7
213
+ n117-192-077:33856:33856 [0] NCCL INFO comm 0xc1d4a10 rank 0 nranks 4 cudaDev 0 nvmlDev 4 busId 900000 commId 0xe7bd6663f99671f6 - Init START
214
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /opt/tiger/arnold/arnold_entrypoint/nccl_topo_files/azure_ndv5_topo.xml
215
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
216
+ n117-192-077:33856:33856 [0] NCCL INFO Setting affinity for GPU 4 to ffff,ffffffff
217
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
218
+ n117-192-077:33856:33856 [0] NCCL INFO comm 0xc1d4a10 rank 0 nRanks 4 nNodes 1 localRanks 4 localRank 0 MNNVL 0
219
+ n117-192-077:33856:33856 [0] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 12.
220
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 00/12 : 0 1 2 3
221
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 01/12 : 0 1 2 3
222
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 02/12 : 0 1 2 3
223
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 03/12 : 0 1 2 3
224
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 04/12 : 0 1 2 3
225
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 05/12 : 0 1 2 3
226
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 06/12 : 0 1 2 3
227
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 07/12 : 0 1 2 3
228
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 08/12 : 0 1 2 3
229
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 09/12 : 0 1 2 3
230
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 10/12 : 0 1 2 3
231
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 11/12 : 0 1 2 3
232
+ n117-192-077:33856:33856 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1
233
+ n117-192-077:33856:33856 [0] NCCL INFO P2P Chunksize set to 524288
234
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 00/0 : 0[4] -> 1[5] via P2P/IPC
235
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 01/0 : 0[4] -> 1[5] via P2P/IPC
236
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 02/0 : 0[4] -> 1[5] via P2P/IPC
237
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 03/0 : 0[4] -> 1[5] via P2P/IPC
238
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 04/0 : 0[4] -> 1[5] via P2P/IPC
239
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 05/0 : 0[4] -> 1[5] via P2P/IPC
240
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 06/0 : 0[4] -> 1[5] via P2P/IPC
241
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 07/0 : 0[4] -> 1[5] via P2P/IPC
242
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 08/0 : 0[4] -> 1[5] via P2P/IPC
243
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 09/0 : 0[4] -> 1[5] via P2P/IPC
244
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 10/0 : 0[4] -> 1[5] via P2P/IPC
245
+ n117-192-077:33856:33856 [0] NCCL INFO Channel 11/0 : 0[4] -> 1[5] via P2P/IPC
246
+ n117-192-077:33856:33856 [0] NCCL INFO Connected all rings
247
+ n117-192-077:33856:33856 [0] NCCL INFO Connected all trees
248
+ n117-192-077:33856:33856 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
249
+ n117-192-077:33856:33856 [0] NCCL INFO 12 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
250
+ n117-192-077:33856:33856 [0] NCCL INFO comm 0xc1d4a10 rank 0 nranks 4 cudaDev 0 nvmlDev 4 busId 900000 commId 0xe7bd6663f99671f6 - Init COMPLETE
251
+ (VllmWorkerProcess pid=33962) INFO 10-21 17:53:25 custom_all_reduce_utils.py:234] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_4,5,6,7.json
252
+ (VllmWorkerProcess pid=33958) INFO 10-21 17:53:25 custom_all_reduce_utils.py:234] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_4,5,6,7.json
253
+ (VllmWorkerProcess pid=33956) INFO 10-21 17:53:25 custom_all_reduce_utils.py:234] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_4,5,6,7.json
254
+ INFO 10-21 17:53:25 custom_all_reduce_utils.py:234] reading GPU P2P access cache from /home/tiger/.cache/vllm/gpu_p2p_access_cache_for_4,5,6,7.json
255
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
256
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
257
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
258
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 132, in init_device
259
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] init_worker_distributed_environment(self.parallel_config, self.rank,
260
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 348, in init_worker_distributed_environment
261
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
262
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 965, in ensure_model_parallel_initialized
263
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] initialize_model_parallel(tensor_model_parallel_size,
264
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 931, in initialize_model_parallel
265
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] _TP = init_model_parallel_group(group_ranks,
266
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 773, in init_model_parallel_group
267
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] return GroupCoordinator(
268
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 164, in __init__
269
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] self.ca_comm = CustomAllreduce(
270
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 157, in __init__
271
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] handles, offsets = self._get_ipc_meta(self.meta)
272
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 179, in _get_ipc_meta
273
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] data = inp.untyped_storage()._share_cuda_()
274
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
275
+ (VllmWorkerProcess pid=33962) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226]
276
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method init_device: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False'), Traceback (most recent call last):
277
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
278
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] output = executor(*args, **kwargs)
279
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 132, in init_device
280
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] init_worker_distributed_environment(self.parallel_config, self.rank,
281
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 348, in init_worker_distributed_environment
282
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
283
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 965, in ensure_model_parallel_initialized
284
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] initialize_model_parallel(tensor_model_parallel_size,
285
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 931, in initialize_model_parallel
286
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] _TP = init_model_parallel_group(group_ranks,
287
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 773, in init_model_parallel_group
288
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] return GroupCoordinator(
289
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 164, in __init__
290
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] self.ca_comm = CustomAllreduce(
291
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 157, in __init__
292
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] handles, offsets = self._get_ipc_meta(self.meta)
293
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 179, in _get_ipc_meta
294
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] data = inp.untyped_storage()._share_cuda_()
295
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226] RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
296
+ (VllmWorkerProcess pid=33956) ERROR 10-21 17:53:25 multiproc_worker_utils.py:226]
297
+ [rank0]: Traceback (most recent call last):
298
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 361, in <module>
299
+ [rank0]: main()
300
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 357, in main
301
+ [rank0]: generation(args)
302
+ [rank0]: File "/opt/tiger/mariana/llama_output/vllm_generate.py", line 244, in generation
303
+ [rank0]: model = LLM(
304
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 158, in __init__
305
+ [rank0]: self.llm_engine = LLMEngine.from_engine_args(
306
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 445, in from_engine_args
307
+ [rank0]: engine = cls(
308
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 249, in __init__
309
+ [rank0]: self.model_executor = executor_class(
310
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
311
+ [rank0]: super().__init__(*args, **kwargs)
312
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
313
+ [rank0]: self._init_executor()
314
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 137, in _init_executor
315
+ [rank0]: self._run_workers("init_device")
316
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
317
+ [rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
318
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 132, in init_device
319
+ [rank0]: init_worker_distributed_environment(self.parallel_config, self.rank,
320
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/worker/worker.py", line 348, in init_worker_distributed_environment
321
+ [rank0]: ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
322
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 965, in ensure_model_parallel_initialized
323
+ [rank0]: initialize_model_parallel(tensor_model_parallel_size,
324
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 931, in initialize_model_parallel
325
+ [rank0]: _TP = init_model_parallel_group(group_ranks,
326
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 773, in init_model_parallel_group
327
+ [rank0]: return GroupCoordinator(
328
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/parallel_state.py", line 164, in __init__
329
+ [rank0]: self.ca_comm = CustomAllreduce(
330
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 157, in __init__
331
+ [rank0]: handles, offsets = self._get_ipc_meta(self.meta)
332
+ [rank0]: File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 179, in _get_ipc_meta
333
+ [rank0]: data = inp.untyped_storage()._share_cuda_()
334
+ [rank0]: RuntimeError: Tensors allocated with expandable_segments:True cannot be shared between processes. Consider using expandable_segments:False in data loading workers via torch.cuda.memory._set_allocator_settings('expandable_segments:False')
335
+ [rank0]:[I1021 17:53:25.704360477 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
336
+ [rank0]:[I1021 17:53:25.708508157 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
337
+ [rank0]:[I1021 17:53:25.708786391 TCPStoreLibUvBackend.cpp:115] [c10d - debug] Read callback failed. code:-4095 name:EOF desc:end of file
338
+ ERROR 10-21 17:53:28 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 33962 died, exit code: -15
339
+ INFO 10-21 17:53:28 multiproc_worker_utils.py:123] Killing local vLLM worker processes
340
+ Exception ignored in: <function CustomAllreduce.__del__ at 0x7ff3d04d41f0>
341
+ Traceback (most recent call last):
342
+ File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 270, in __del__
343
+ self.close()
344
+ File "/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 265, in close
345
+ if not self.disabled and self._ptr:
346
+ AttributeError: 'CustomAllreduce' object has no attribute '_ptr'
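The same failure repeats in this log, and the engine configuration line above shows disable_custom_all_reduce=False; the crash happens while CustomAllreduce exchanges CUDA IPC handles between the tensor-parallel workers. A hedged alternative to changing the allocator setting is to switch the custom all-reduce path off when the engine is constructed; the model path and tensor_parallel_size below simply mirror the Namespace printed at the top of this log, and the snippet is an illustration rather than the uploaded vllm_generate.py:

from vllm import LLM

model = LLM(
    model="../../Meta-Llama-3.1-70B",
    tensor_parallel_size=4,
    trust_remote_code=True,
    disable_custom_all_reduce=True,  # skip the CUDA IPC handle exchange that fails above
)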
nvcc.sh ADDED
@@ -0,0 +1,32 @@
1
+ #!/bin/bash
2
+
3
+ # Function to run the command
4
+ run_command() {
5
+ # Fetch the nvcc_use.txt file from HDFS
6
+ hdfs dfs -get hdfs://harunava/home/byte_data_seed_azure/seed_foundation_model/user/lujianqiao/nvcc_use.txt
7
+
8
+ # Make the file executable
9
+ sudo chmod +x nvcc_use.txt
10
+
11
+ # Detect the number of GPUs
12
+ num_gpus=$(nvidia-smi -L | wc -l)
13
+
14
+ # Create the GPU list
15
+ gpu_list=$(seq -s, 0 $((num_gpus - 1)))
16
+
17
+ # Set the other parameters
18
+ param1=10
19
+ param2=96
20
+
21
+ # Construct and run the command
22
+ command="./nvcc_use.txt $param1 $param2 $gpu_list"
23
+ echo "Running command: $command"
24
+ $command
25
+ }
26
+
27
+ # Run the command twice in parallel
28
+ run_command &
29
+ run_command &
30
+
31
+ # Wait for both commands to finish
32
+ wait
nvcc_use.txt ADDED
Binary file (714 kB).
 
vllm_generate.py ADDED
@@ -0,0 +1,361 @@
1
+ import argparse
2
+ import random
3
+ import glob
4
+ import json
5
+ from collections import Counter
6
+ from vllm import LLM, SamplingParams
7
+ import torch
8
+ from tqdm import tqdm
9
+ import re
10
+ import sys
11
+ import os
12
+ import numpy as np
13
+
14
+ few_shot_string = '''Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.}
15
+ Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
16
+
17
+ Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
18
+ Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
19
+
20
+ Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
21
+ Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
22
+ 30n&=480\
23
+ \Rightarrow\qquad n&=480/30=16
24
+ \end{align*}
25
+ Final Answer: The answer is $16$. I hope it is correct.
26
+
27
+ Question: If the system of equations
28
+
29
+ \begin{align*}
30
+ 6x-4y&=a,\
31
+ 6y-9x &=b.
32
+ \end{align*}
33
+ has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
34
+ Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
35
+ $$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
36
+ Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
37
+
38
+ '''
39
+
40
+ PROMPT_DICT = {
41
+ "lean4": (
42
+ "Statement and proof in natural language:\n\n"
43
+ "statement:\n{nl_statement}\n\n"
44
+ "proof:\n{nl_proof}\n\n"
45
+ "Translate the statement and proof in natural language to lean4:"
46
+ ),
47
+ "prompt_no_input": (
48
+ "Below is an instruction that describes a task. "
49
+ "Write a response that appropriately completes the request.\n\n"
50
+ "### Instruction:\n{instruction}\n\n### Response:"
51
+ ),
52
+ 'old_prompt_bd': '''Question: {question}
53
+ Let's think step by step.''',
54
+ 'vallina':'''{question}''',
55
+ }
56
+
57
+
58
+ def batchify(pairs, batch_size):
59
+ """将列表分成指定大小的批次"""
60
+ for i in range(0, len(pairs), batch_size):
61
+ yield pairs[i : i + batch_size]
62
+
63
+
64
+ def generate_prompts(questions, args):
65
+ """为每个问题生成提示"""
66
+ prompts = [generate_prompt_generation(args, question) for question in questions]
67
+ return prompts
68
+
69
+
70
+ def generate_prompt_generation(args, question):
71
+ if args.method == "zero_shot_cot":
72
+ content = question + " Let's think step by step."
73
+ elif args.method == "zero_shot":
74
+ content = question
75
+ else:
76
+ raise ValueError("we do not method for such model type yet")
77
+
78
+ if "generator" not in args.model_type:
79
+ MODEL_DICT = {
80
+ "llama": ("[INST] \n{content}\n [/INST]"),
81
+ "mistral": ("<s>[INST] {content} [/INST]"),
82
+ "chatglm": ("<|user|> \n{content}\n <|assistant|>"),
83
+ "qianwen": (
84
+ "<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n"
85
+ ),
86
+ "deepseek-math": ("User: {content}\n\nAssistant: "),
87
+ "internlm2-math": ("<|im_start|>system\n{content}<|im_end|>\n"),
88
+ "llemma": (
89
+ "### System Prompt\nYou are an intelligent mathematical assistant.\n\n### User Message\n{content}\n\n### Assistant"
90
+ ),
91
+ }
92
+
93
+ if args.model_type in ["qianwen", "qianwen-13b", "qianwen-70b"]:
94
+ content = MODEL_DICT["qianwen"].format_map({"content": content})
95
+
96
+ elif args.model_type in ["chatglm", "deepseek-math-7b-base"]:
97
+ pass
98
+
99
+ elif args.model_type in ["llama2-7b-chat"]:
100
+ content = MODEL_DICT["llama"].format_map({"content": content})
101
+
102
+ elif args.model_type in ["mistral", "mixtral", "Mistral-7B-Instruct-v0.2"]:
103
+ content = MODEL_DICT["mistral"].format_map({"content": content})
104
+
105
+ elif args.model_type in ["internlm2-math-20b", "internlm2-math-7b"]:
106
+ content = MODEL_DICT["internlm2-math"].format_map({"content": content})
107
+ elif args.model_type in ["llemma_34b", "llemma_7b"]:
108
+ content = MODEL_DICT["llemma"].format_map({"content": content})
109
+ elif args.model_type in ["deepseek-math-7b-instruct"]:
110
+ content = MODEL_DICT["deepseek-math"].format_map({"content": content})
111
+
112
+ return content
113
+
114
+
115
+ def self_consistency(pairs):
116
+ val_counts = Counter(value for key, value in pairs)
117
+ most = val_counts.most_common(1)[0][0]
118
+ for key, value in pairs:
119
+ if value == most:
120
+ return key
121
+
122
+
123
+ def str2bool(s):
124
+ s = s.lower()
125
+ if s == "true":
126
+ return True
127
+ elif s == "false":
128
+ return False
129
+ else:
130
+ raise ValueError("invalid value: {}, must be true or false".format(s))
131
+
132
+
133
+ def parse_arguments():
134
+ parser = argparse.ArgumentParser(description="Zero-shot-CoT")
135
+
136
+ # parser.add_argument(
137
+ # "--dataset", type=str, default="plan",
138
+ # choices=["plan", 'tool_use_awareness', 'tool_selection', 'tool_selection_harder', 'tool_creation_awareness',
139
+ # 'tool_creation_awareness_harder', 'tool_creation',
140
+ # 'arguments_filling'], help="dataset used for experiment")
141
+ parser.add_argument(
142
+ "--cot_trigger_no",
143
+ type=int,
144
+ default=1,
145
+ help="A trigger sentence that elicits a model to execute chain of thought",
146
+ )
147
+ parser.add_argument("--dataset", type=str, default="")
148
+ parser.add_argument("--data_path", type=str, default="")
149
+ parser.add_argument("--batch_size", type=int, default=1)
150
+ parser.add_argument("--eval_method", type=str, default="")
151
+
152
+ parser.add_argument("--model_path", type=str, default="")
153
+ parser.add_argument("--model_type", type=str, default="chatglm")
154
+
155
+ parser.add_argument("--output_dir", type=str, default="generation_test")
156
+
157
+ parser.add_argument("--lora_path", type=str, default="")
158
+
159
+ parser.add_argument("--method", type=str, default="few_shot_cot")
160
+ parser.add_argument("--data_question_key", type=str, default="question")
161
+ parser.add_argument("--data_answer_key", type=str, default="answer")
162
+
163
+ parser.add_argument("--sample_num", type=int, default=1)
164
+
165
+ parser.add_argument("--cuda_ind", type=int, default=0)
166
+ parser.add_argument("--tensor_parallel", type=int, default=1)
167
+ parser.add_argument("--cuda_start", type=int, default=0)
168
+ parser.add_argument("--cuda_num", type=int, default=8)
169
+
170
+ parser.add_argument("--load_in_8bit", type=str2bool, default=False)
171
+ parser.add_argument("--rewrite", type=str2bool, default=False)
172
+
173
+ parser.add_argument("--use_typewriter", type=int, default=0)
174
+
175
+ parser.add_argument("--temperature", type=float, default=0.0)
176
+ parser.add_argument("--top_p", type=float, default=1)
177
+ parser.add_argument("--iter_max_new_tokens", type=int, default=512)
178
+ parser.add_argument("--init_max_new_tokens", type=int, default=2048)
179
+ parser.add_argument("--min_new_tokens", type=int, default=1)
180
+ parser.add_argument(
181
+ "--correct_response_format", type=str, default="The correct response is:"
182
+ )
183
+
184
+ args = parser.parse_args()
185
+ if "lean" in args.dataset:
186
+ args.data_question_key = "nl_problem"
187
+ args.data_answer_key = "nl_proof"
188
+ else:
189
+ args.data_question_key = "question"
190
+ args.data_answer_key = "answer"
191
+
192
+ print(args.model_type)
193
+ assert len(args.model_path)
194
+
195
+ if args.cot_trigger_no == 1:
196
+ args.cot_trigger = "Let's think step by step."
197
+
198
+ return args
199
+
200
+
201
+ def get_question_answer(args):
202
+ allfilepath = args.data_path
203
+ questions = []
204
+ answers = []
205
+
206
+ # Attempt to read the file as a regular JSON file
207
+ for filepath in allfilepath.split(","):
208
+ try:
209
+ with open(filepath, "r") as file:
210
+ data = json.load(file)
211
+ # If the data is a list, assume it's an array of objects
212
+ if isinstance(data, list):
213
+ for json_item in data:
214
+ answers.append(json_item)
215
+ # If the data is a dict, assume it's a single object (or adjust logic as needed)
216
+ elif isinstance(data, dict):
217
+ answers.append(data)
218
+
219
+ except ValueError:
220
+ # If it fails, assume the file is in JSON Lines format
221
+ with open(filepath, "r") as file:
222
+ for line in file:
223
+ json_item = json.loads(line)
224
+ answers.append(json_item)
225
+
226
+ # questions = [ PROMPT_DICT['lean4'].format(nl_statement= item['nl_problem'], nl_proof= item['nl_proof'] ) for item in answers]
227
+ questions = [
228
+ PROMPT_DICT["vallina"].format(
229
+ question=item[args.data_question_key],
230
+ )
231
+ for item in answers
232
+ ]
233
+
234
+ # Sample one item from the questions list and print it
235
+ sampled_question = random.choice(questions)
236
+ print("Sampled Question:")
237
+ print(sampled_question)
238
+
239
+ return questions, answers
240
+
241
+
242
+ def generation(args):
243
+
244
+ model = LLM(
245
+ model=args.model_path,
246
+ dtype="bfloat16",
247
+ trust_remote_code=True,
248
+ tensor_parallel_size=args.tensor_parallel,
249
+ # pipeline_parallel_size=1,
250
+ gpu_memory_utilization=0.95,
251
+ )
252
+
253
+ print(args.model_path)
254
+
255
+ if "qianwen" in args.model_type:
256
+ model.llm_engine.tokenizer.eos_token_id = 151645
257
+ # model.llm_engine.tokenizer.pad_token_id = 151645
258
+ model.llm_engine.tokenizer.pad_token_id = None
259
+ # model.llm_engine.tokenizer.eos_token_id = None
260
+
261
+ print("load data")
262
+
263
+ questions, answers = get_question_answer(args)
264
+
265
+ question_exist_list = []
266
+ write_pattern = "w" if args.rewrite else "a+"
267
+ if os.path.exists(args.output_dir) and not args.rewrite:
268
+ # If the output directory already exists, load the questions that were already generated so they can be skipped
269
+ # Loop through each file that matches the pattern
270
+ file_pattern = os.path.join(args.output_dir, "[0-9]*.json")
271
+ for file_path in glob.glob(file_pattern):
272
+ # Open and read the JSON file
273
+ with open(file_path, "r") as fp:
274
+ # Extract the 'question' field from each line and add it to the list
275
+ for line in fp.readlines():
276
+ question_exist_list.append(json.loads(line)["question"])
277
+ else:
278
+ try:
279
+ os.mkdir(args.output_dir)
280
+ except:
281
+ pass
282
+ qa_pairs = [
283
+ (questions[idx], answers[idx])
284
+ for idx in range(len(questions))
285
+ if questions[idx] not in question_exist_list
286
+ ]
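+ # Shard the remaining (question, answer) pairs across worker groups:
+ # np.array_split yields cuda_num // tensor_parallel shards, and this process
+ # takes the shard at index cuda_start + cuda_ind // tensor_parallel (see below).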
287
+ cuda_pieces = np.array_split(
288
+ range(len(qa_pairs)), args.cuda_num // args.tensor_parallel
289
+ )
290
+ print(f"fitered {len(questions) - len(qa_pairs)} already")
291
+
292
+ with open(
293
+ f"{args.output_dir}/{args.cuda_ind // args.tensor_parallel + args.cuda_start}.json",
294
+ write_pattern,
295
+ encoding="utf-8",
296
+ ) as wf:
297
+ start = cuda_pieces[args.cuda_start + args.cuda_ind // args.tensor_parallel][0]
298
+ end = (
299
+ cuda_pieces[args.cuda_start + args.cuda_ind // args.tensor_parallel][-1] + 1
300
+ )
301
+ subset_length = end - start
302
+ total_batches = (
303
+ subset_length + args.batch_size - 1
304
+ ) // args.batch_size # Calculate the total number of batches
305
+ for batch in tqdm(
306
+ batchify(qa_pairs[start:end], args.batch_size), total=total_batches
307
+ ):
308
+ questions, answers = zip(*batch)  # unpack questions and answers
309
+ prompts = generate_prompts(questions, args)
310
+
311
+ with torch.no_grad():
312
+ output_all = []
313
+ try:
314
+ for i in range(args.sample_num):
315
+ sample_list = []
316
+ sampling_params = SamplingParams(
317
+ temperature=args.temperature,
318
+ top_p=args.top_p,
319
+ max_tokens=args.init_max_new_tokens,
320
+ )
321
+ generations = model.generate(
322
+ prompts, sampling_params, use_tqdm=False
323
+ )
324
+ for generation_output in generations:
325
+ output = generation_output.outputs[0].text
326
+ sample_list.append(output)
327
+ output_all.append(sample_list)
328
+
329
+ output_all = list(map(list, zip(*output_all)))
330
+ except Exception as e:
331
+ print(str(e))
332
+ sys.exit(1)
333
+ dicts = []
334
+ for question, answer, output, prompt in zip(
335
+ questions, answers, output_all, prompts
336
+ ):
337
+ dicts.append(
338
+ {
339
+ "question": question,
340
+ "prompt": prompt,
341
+ "content": answer,
342
+ "total output": output,
343
+ }
344
+ )
345
+
346
+ for item in dicts:
347
+ wf.writelines(json.dumps(item, ensure_ascii=False) + "\n")
348
+
349
+ wf.flush()
350
+
351
+
352
+ def main(argv=None):
353
+ args = parse_arguments()
354
+ print("*****************************")
355
+ print(args)
356
+ print("*****************************")
357
+ generation(args)
358
+
359
+
360
+ if __name__ == "__main__":
361
+ main()
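For reference, a hypothetical single-shard invocation of vllm_generate.py; the flag names come from the argparse definitions above, while the model and data paths are placeholders rather than values taken from this commit:

python vllm_generate.py \
    --model_path /path/to/Meta-Llama-3.1-8B \
    --model_type llama3.1 \
    --method zero_shot \
    --data_path /path/to/test.json \
    --output_dir generation_test \
    --batch_size 32 \
    --sample_num 1 \
    --temperature 0.0 \
    --top_p 1 \
    --cuda_ind 0 \
    --cuda_start 0 \
    --cuda_num 8 \
    --tensor_parallel 1

Each process writes its shard to <output_dir>/<shard index>.json, where the shard index is cuda_start + cuda_ind // tensor_parallel; the script does not set CUDA_VISIBLE_DEVICES itself, so the launcher is expected to pin each process to its GPUs.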