yaleh committed on
Commit
695de90
·
1 Parent(s): 9a207c6

Updated langgraph_meta_prompt.ipynb for Google Colab.
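
The Colab setup cell added in this commit reads two notebook secrets and exports them as environment variables. A minimal sketch of the same idea as a reusable helper follows; the function name `configure_openai_env` and its injected parameters are hypothetical and not part of the notebook, while the secret names and environment variable names are taken from the diff below.

```python
import os
import sys


def configure_openai_env(get_secret, environ, modules=sys.modules):
    """Copy Colab secrets into `environ`; return True when running in Colab.

    `get_secret` is any callable mapping a secret name to its value
    (in Colab this would be `google.colab.userdata.get`).
    """
    if "google.colab" not in modules:
        return False
    # Secret names and target variables mirror the cell in this commit.
    for secret_name, env_name in (
        ("openai_api_key", "OPENAI_API_KEY"),
        ("openai_base_url", "OPENAI_API_BASE"),
    ):
        value = get_secret(secret_name)
        if value:  # leave the variable untouched if the secret is empty
            environ[env_name] = value
    return True


if "google.colab" in sys.modules:
    from google.colab import userdata  # only importable inside Colab

    configure_openai_env(userdata.get, os.environ)
    print("Running in Google Colab")
else:
    print("Not running in Google Colab")
```

Passing the secret getter and the environment mapping as arguments keeps the helper testable outside Colab, which is the only reason for the extra indirection.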

Files changed (1)
  1. langgraph_meta_prompt.ipynb +130 -119
langgraph_meta_prompt.ipynb CHANGED
@@ -2,7 +2,103 @@
2
  "cells": [
3
  {
4
  "cell_type": "code",
5
- "execution_count": 1,
6
  "metadata": {},
7
  "outputs": [],
8
  "source": [
@@ -23,10 +119,10 @@
23
  "\n",
24
  "# Can converge correctly\n",
25
  "\n",
26
- "MODEL_NAME = \"anthropic/claude-3.5-sonnet:beta\"\n",
27
  "# MODEL_NAME = \"llama3-70b-8192\"\n",
28
  "# MODEL_NAME = \"meta-llama/llama-3-70b-instruct\"\n",
29
- "# MODEL_NAME = \"deepseek/deepseek-chat\"\n",
30
  "# MODEL_NAME = \"qwen/qwen-2-72b-instruct\"\n",
31
  "\n",
32
  "# Failed to converge correctly\n",
@@ -43,13 +139,13 @@
43
  "llm = ChatOpenAI(model_name=MODEL_NAME, temperature=0.5)\n",
44
  "\n",
45
  "# EXECUTOR_MODEL = \"microsoft/phi-3-medium-128k-instruct:free\"\n",
46
- "# EXECUTOR_MODEL = \"deepseek/deepseek-chat\"\n",
47
  "# EXECUTOR_MODEL = \"gemma-7b-it\"\n",
48
  "# EXECUTOR_MODEL = \"llama3-8b-8192\"\n",
49
  "# EXECUTOR_MODEL = \"llama3-70b-8192\"\n",
50
  "# EXECUTOR_MODEL = \"mixtral-8x7b-32768\"\n",
51
  "# EXECUTOR_MODEL = \"anthropic/claude-3-haiku:beta\"\n",
52
- "EXECUTOR_MODEL = \"meta-llama/llama-3-8b-instruct\"\n",
53
  "# EXECUTOR_MODEL = \"google/gemma-2-9b-it\"\n",
54
  "\n",
55
  "executor_llm = ChatOpenAI(model_name=EXECUTOR_MODEL, temperature=0.01)\n",
@@ -465,7 +561,7 @@
465
  },
466
  {
467
  "cell_type": "code",
468
- "execution_count": 2,
469
  "metadata": {},
470
  "outputs": [
471
  {
@@ -491,7 +587,7 @@
491
  },
492
  {
493
  "cell_type": "code",
494
- "execution_count": 3,
495
  "metadata": {},
496
  "outputs": [
497
  {
@@ -502,134 +598,49 @@
502
  "Expected Output: (2+8)*3\n",
503
  "= 10*3\n",
504
  "= 30\n",
505
- "\n"
506
- ]
507
- },
508
- {
509
- "name": "stderr",
510
- "output_type": "stream",
511
- "text": [
512
- "/home/yale/work/meta-prompt/.venv/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:139: LangChainDeprecationWarning: The method `BaseChatModel.__call__` was deprecated in langchain-core 0.1.7 and will be removed in 0.3.0. Use invoke instead.\n",
513
- " warn_deprecated(\n"
514
- ]
515
- },
516
- {
517
- "name": "stdout",
518
- "output_type": "stream",
519
- "text": [
520
- "You are a step-by-step math calculator. When given a mathematical\n",
521
- "expression:\n",
522
  "\n",
523
- "1. Display the original expression on the first line.\n",
524
- "2. On the next line, show the first step of the calculation, preceded by '='.\n",
525
- "3. Continue showing each step on a new line until the final result is reached.\n",
526
- "4. Simplify expressions within parentheses before applying operations outside.\n",
527
- "5. Show multiplication using the '*' symbol.\n",
528
- "6. Do not explain the steps; simply show the calculations.\n",
529
  "\n",
530
- "Provide the solution using this clear, concise format to help users understand\n",
531
- "the problem-solving process.\n",
532
- "(2+8)*3\n",
533
- "= 2 + 8\n",
534
- "= 10\n",
535
- "= 10 * 3\n",
536
- "= 30\n",
537
- "Based on the provided Expected Output, Actual Output, and Acceptance Criteria, here's the analysis:\n",
538
  "\n",
539
  "```\n",
540
  "- Acceptable Differences: \n",
541
- " * Extra line break at the end of the Actual Output\n",
542
- "\n",
543
  "- Unacceptable Differences: \n",
544
- " * Actual Output shows intermediate steps (2 + 8 = 10) not present in Expected Output\n",
545
- " * Actual Output separates 10 * 3 into a separate step, which is not in Expected Output\n",
546
- "\n",
547
  "- Accept: No\n",
548
  "```\n",
549
  "\n",
550
- "The Actual Output differs significantly from the Expected Output in terms of content, showing additional steps in the calculation process that are not present in the Expected Output. These differences go beyond the acceptable criteria of extra spaces or line breaks. Therefore, the Actual Output is not acceptable according to the given Acceptance Criteria.\n",
551
- "Here are suggestions to improve the System Prompt:\n",
552
- "\n",
553
- "- Add a specific instruction to simplify expressions within parentheses in a single step, without showing intermediate calculations.\n",
554
- "- Include a clear directive to perform multiplication immediately after simplifying parentheses, without separating it into an additional step.\n",
555
- "- Provide an example calculation in the prompt that demonstrates the desired output format, such as `(3+2)*4 = 5*4 = 20`.\n",
556
- "- Explicitly state that only two lines of calculation should be shown for expressions with a single set of parentheses: one for simplifying the parentheses and one for the final result.\n",
557
- "- Remove or modify the instruction about showing each step, as it may encourage unnecessary intermediate steps.\n",
558
- "- Add a note that the solution should be as concise as possible, showing only the essential steps.\n",
559
- "You are a step-by-step math calculator. When given a mathematical expression:\n",
560
- "\n",
561
- "1. Display the original expression on the first line.\n",
562
- "2. Simplify expressions within parentheses in a single step, without showing\n",
563
- " intermediate calculations.\n",
564
- "3. Perform multiplication immediately after simplifying parentheses, without\n",
565
- " separating it into an additional step.\n",
566
- "4. Show the final result on the last line.\n",
567
- "5. Use the '*' symbol for multiplication.\n",
568
- "6. For expressions with a single set of parentheses, show only two lines of\n",
569
- " calculation: one for simplifying the parentheses and one for the final\n",
570
- " result.\n",
571
- "7. Provide the solution in the most concise format possible, showing only\n",
572
- " essential steps.\n",
573
- "\n",
574
- "Do not explain the steps; simply show the calculations. Here's an example of\n",
575
- "the desired output format:\n",
576
- "\n",
577
- "(3+2)*4\n",
578
- "= 5*4\n",
579
- "= 20\n",
580
- "\n",
581
- "This clear, concise format will help users understand the problem-solving\n",
582
- "process efficiently.\n",
583
- "(2+8)*3\n",
584
- "= 10*3\n",
585
  "= 30\n",
586
- "After comparing the two outputs with the expected output based on the given acceptance criteria, I can conclude:\n",
587
- "\n",
588
  "# Better Output ID: B\n",
589
- "\n",
590
- "Output B is more similar to the expected output. It matches the expected output exactly, with the only difference being a missing line break at the end, which is acceptable according to the criteria. Output A, while reaching the same final result, includes additional steps that are not present in the expected output.\n",
591
- "Here's the analysis based on the provided Expected Output, Actual Output, and Acceptance Criteria:\n",
592
- "\n",
593
  "```\n",
594
- "- Acceptable Differences: \n",
595
- " * Missing line break at the end of the Actual Output.\n",
596
- "\n",
597
- "- Unacceptable Differences: \n",
598
- " [None]\n",
599
- "\n",
600
  "- Accept: Yes\n",
601
  "```\n",
602
- "\n",
603
- "The Actual Output matches the Expected Output exactly in terms of content and formatting, with the only difference being a missing line break at the end of the Actual Output. This falls under the acceptable differences as per the Acceptance Criteria, which allows for \"Extra or missing line breaks at the beginning or end of the output.\" Therefore, the Actual Output is acceptable.\n",
604
- "Final Result: {'acceptance_criteria': '\\n* Exactly text match.\\n* Acceptable differences:\\n * Extra or missing spaces.\\n * Extra or missing line breaks at the beginning or end of the output.\\n', 'user_message': '(2+8)*3', 'expected_output': '(2+8)*3\\n= 10*3\\n= 30\\n', 'system_message': \"You are a step-by-step math calculator. When given a mathematical expression:\\n\\n1. Display the original expression on the first line.\\n2. Simplify expressions within parentheses in a single step, without showing\\n intermediate calculations.\\n3. Perform multiplication immediately after simplifying parentheses, without\\n separating it into an additional step.\\n4. Show the final result on the last line.\\n5. Use the '*' symbol for multiplication.\\n6. For expressions with a single set of parentheses, show only two lines of\\n calculation: one for simplifying the parentheses and one for the final\\n result.\\n7. Provide the solution in the most concise format possible, showing only\\n essential steps.\\n\\nDo not explain the steps; simply show the calculations. 
Here's an example of\\nthe desired output format:\\n\\n(3+2)*4\\n= 5*4\\n= 20\\n\\nThis clear, concise format will help users understand the problem-solving\\nprocess efficiently.\", 'output': '(2+8)*3\\n= 10*3\\n= 30', 'suggestions': 'Here are suggestions to improve the System Prompt:\\n\\n- Add a specific instruction to simplify expressions within parentheses in a single step, without showing intermediate calculations.\\n- Include a clear directive to perform multiplication immediately after simplifying parentheses, without separating it into an additional step.\\n- Provide an example calculation in the prompt that demonstrates the desired output format, such as `(3+2)*4 = 5*4 = 20`.\\n- Explicitly state that only two lines of calculation should be shown for expressions with a single set of parentheses: one for simplifying the parentheses and one for the final result.\\n- Remove or modify the instruction about showing each step, as it may encourage unnecessary intermediate steps.\\n- Add a note that the solution should be as concise as possible, showing only the essential steps.', 'accepted': True, 'analysis': 'Here\\'s the analysis based on the provided Expected Output, Actual Output, and Acceptance Criteria:\\n\\n```\\n- Acceptable Differences: \\n * Missing line break at the end of the Actual Output.\\n\\n- Unacceptable Differences: \\n [None]\\n\\n- Accept: Yes\\n```\\n\\nThe Actual Output matches the Expected Output exactly in terms of content and formatting, with the only difference being a missing line break at the end of the Actual Output. This falls under the acceptable differences as per the Acceptance Criteria, which allows for \"Extra or missing line breaks at the beginning or end of the output.\" Therefore, the Actual Output is acceptable.', 'best_output': '(2+8)*3\\n= 10*3\\n= 30', 'best_system_message': \"You are a step-by-step math calculator. When given a mathematical expression:\\n\\n1. Display the original expression on the first line.\\n2. 
Simplify expressions within parentheses in a single step, without showing\\n intermediate calculations.\\n3. Perform multiplication immediately after simplifying parentheses, without\\n separating it into an additional step.\\n4. Show the final result on the last line.\\n5. Use the '*' symbol for multiplication.\\n6. For expressions with a single set of parentheses, show only two lines of\\n calculation: one for simplifying the parentheses and one for the final\\n result.\\n7. Provide the solution in the most concise format possible, showing only\\n essential steps.\\n\\nDo not explain the steps; simply show the calculations. Here's an example of\\nthe desired output format:\\n\\n(3+2)*4\\n= 5*4\\n= 20\\n\\nThis clear, concise format will help users understand the problem-solving\\nprocess efficiently.\", 'best_output_age': 0, 'max_output_age': 3}\n",
605
  "System Message:\n",
606
- "You are a step-by-step math calculator. When given a mathematical expression:\n",
607
- "\n",
608
- "1. Display the original expression on the first line.\n",
609
- "2. Simplify expressions within parentheses in a single step, without showing\n",
610
- " intermediate calculations.\n",
611
- "3. Perform multiplication immediately after simplifying parentheses, without\n",
612
- " separating it into an additional step.\n",
613
- "4. Show the final result on the last line.\n",
614
- "5. Use the '*' symbol for multiplication.\n",
615
- "6. For expressions with a single set of parentheses, show only two lines of\n",
616
- " calculation: one for simplifying the parentheses and one for the final\n",
617
- " result.\n",
618
- "7. Provide the solution in the most concise format possible, showing only\n",
619
- " essential steps.\n",
620
- "\n",
621
- "Do not explain the steps; simply show the calculations. Here's an example of\n",
622
- "the desired output format:\n",
623
- "\n",
624
- "(3+2)*4\n",
625
- "= 5*4\n",
626
- "= 20\n",
627
- "\n",
628
- "This clear, concise format will help users understand the problem-solving\n",
629
- "process efficiently.\n",
630
  "Output:\n",
631
- "(2+8)*3\n",
632
- "= 10*3\n",
633
  "= 30\n"
634
  ]
635
  }
 
2
  "cells": [
3
  {
4
  "cell_type": "code",
5
+ "execution_count": 9,
6
+ "metadata": {},
7
+ "outputs": [
8
+ {
9
+ "name": "stdout",
10
+ "output_type": "stream",
11
+ "text": [
12
+ "Requirement already satisfied: langchain in ./.venv/lib/python3.10/site-packages (0.2.6)\n",
13
+ "Requirement already satisfied: openai in ./.venv/lib/python3.10/site-packages (1.35.7)\n",
14
+ "Requirement already satisfied: langchain_openai in ./.venv/lib/python3.10/site-packages (0.1.13)\n",
15
+ "Requirement already satisfied: langchain_core in ./.venv/lib/python3.10/site-packages (0.2.10)\n",
16
+ "Requirement already satisfied: langgraph in ./.venv/lib/python3.10/site-packages (0.1.4)\n",
17
+ "Requirement already satisfied: PyYAML>=5.3 in ./.venv/lib/python3.10/site-packages (from langchain) (6.0.1)\n",
18
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in ./.venv/lib/python3.10/site-packages (from langchain) (2.0.21)\n",
19
+ "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in ./.venv/lib/python3.10/site-packages (from langchain) (3.8.5)\n",
20
+ "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in ./.venv/lib/python3.10/site-packages (from langchain) (4.0.3)\n",
21
+ "Requirement already satisfied: langchain-text-splitters<0.3.0,>=0.2.0 in ./.venv/lib/python3.10/site-packages (from langchain) (0.2.2)\n",
22
+ "Requirement already satisfied: langsmith<0.2.0,>=0.1.17 in ./.venv/lib/python3.10/site-packages (from langchain) (0.1.82)\n",
23
+ "Requirement already satisfied: numpy<2,>=1 in ./.venv/lib/python3.10/site-packages (from langchain) (1.26.0)\n",
24
+ "Requirement already satisfied: pydantic<3,>=1 in ./.venv/lib/python3.10/site-packages (from langchain) (2.3.0)\n",
25
+ "Requirement already satisfied: requests<3,>=2 in ./.venv/lib/python3.10/site-packages (from langchain) (2.31.0)\n",
26
+ "Requirement already satisfied: tenacity!=8.4.0,<9.0.0,>=8.1.0 in ./.venv/lib/python3.10/site-packages (from langchain) (8.2.3)\n",
27
+ "Requirement already satisfied: anyio<5,>=3.5.0 in ./.venv/lib/python3.10/site-packages (from openai) (3.7.1)\n",
28
+ "Requirement already satisfied: distro<2,>=1.7.0 in ./.venv/lib/python3.10/site-packages (from openai) (1.9.0)\n",
29
+ "Requirement already satisfied: httpx<1,>=0.23.0 in ./.venv/lib/python3.10/site-packages (from openai) (0.25.0)\n",
30
+ "Requirement already satisfied: sniffio in ./.venv/lib/python3.10/site-packages (from openai) (1.3.0)\n",
31
+ "Requirement already satisfied: tqdm>4 in ./.venv/lib/python3.10/site-packages (from openai) (4.66.1)\n",
32
+ "Requirement already satisfied: typing-extensions<5,>=4.7 in ./.venv/lib/python3.10/site-packages (from openai) (4.12.2)\n",
33
+ "Requirement already satisfied: tiktoken<1,>=0.7 in ./.venv/lib/python3.10/site-packages (from langchain_openai) (0.7.0)\n",
34
+ "Requirement already satisfied: jsonpatch<2.0,>=1.33 in ./.venv/lib/python3.10/site-packages (from langchain_core) (1.33)\n",
35
+ "Requirement already satisfied: packaging<25,>=23.2 in ./.venv/lib/python3.10/site-packages (from langchain_core) (24.1)\n",
36
+ "Requirement already satisfied: attrs>=17.3.0 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.1.0)\n",
37
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (3.2.0)\n",
38
+ "Requirement already satisfied: multidict<7.0,>=4.5 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n",
39
+ "Requirement already satisfied: yarl<2.0,>=1.0 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.2)\n",
40
+ "Requirement already satisfied: frozenlist>=1.1.1 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.0)\n",
41
+ "Requirement already satisfied: aiosignal>=1.1.2 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
42
+ "Requirement already satisfied: idna>=2.8 in ./.venv/lib/python3.10/site-packages (from anyio<5,>=3.5.0->openai) (3.4)\n",
43
+ "Requirement already satisfied: exceptiongroup in ./.venv/lib/python3.10/site-packages (from anyio<5,>=3.5.0->openai) (1.2.1)\n",
44
+ "Requirement already satisfied: certifi in ./.venv/lib/python3.10/site-packages (from httpx<1,>=0.23.0->openai) (2023.7.22)\n",
45
+ "Requirement already satisfied: httpcore<0.19.0,>=0.18.0 in ./.venv/lib/python3.10/site-packages (from httpx<1,>=0.23.0->openai) (0.18.0)\n",
46
+ "Requirement already satisfied: jsonpointer>=1.9 in ./.venv/lib/python3.10/site-packages (from jsonpatch<2.0,>=1.33->langchain_core) (2.4)\n",
47
+ "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in ./.venv/lib/python3.10/site-packages (from langsmith<0.2.0,>=0.1.17->langchain) (3.10.5)\n",
48
+ "Requirement already satisfied: annotated-types>=0.4.0 in ./.venv/lib/python3.10/site-packages (from pydantic<3,>=1->langchain) (0.5.0)\n",
49
+ "Requirement already satisfied: pydantic-core==2.6.3 in ./.venv/lib/python3.10/site-packages (from pydantic<3,>=1->langchain) (2.6.3)\n",
50
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in ./.venv/lib/python3.10/site-packages (from requests<3,>=2->langchain) (2.0.5)\n",
51
+ "Requirement already satisfied: greenlet!=0.4.17 in ./.venv/lib/python3.10/site-packages (from SQLAlchemy<3,>=1.4->langchain) (2.0.2)\n",
52
+ "Requirement already satisfied: regex>=2022.1.18 in ./.venv/lib/python3.10/site-packages (from tiktoken<1,>=0.7->langchain_openai) (2024.5.15)\n",
53
+ "Requirement already satisfied: h11<0.15,>=0.13 in ./.venv/lib/python3.10/site-packages (from httpcore<0.19.0,>=0.18.0->httpx<1,>=0.23.0->openai) (0.14.0)\n",
54
+ "\n",
55
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.1\u001b[0m\n",
56
+ "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
57
+ "Note: you may need to restart the kernel to use updated packages.\n"
58
+ ]
59
+ }
60
+ ],
61
+ "source": [
62
+ "%pip install langchain openai langchain_openai langchain_core langgraph"
63
+ ]
64
+ },
65
+ {
66
+ "cell_type": "code",
67
+ "execution_count": 10,
68
+ "metadata": {},
69
+ "outputs": [
70
+ {
71
+ "name": "stdout",
72
+ "output_type": "stream",
73
+ "text": [
74
+ "Not running in Google Colab\n"
75
+ ]
76
+ }
77
+ ],
78
+ "source": [
79
+ "# prompt: Detect whether it's running in Google Colab.\n",
80
+ "\n",
81
+ "import sys\n",
82
+ "import os\n",
83
+ "\n",
84
+ "if 'google.colab' in sys.modules:\n",
85
+ " print(\"Running in Google Colab\")\n",
86
+ " from google.colab import userdata\n",
87
+ "\n",
88
+ " # get secret openai_api_key and set it to OS env OPENAI_API_KEY\n",
89
+ " openai_api_key = userdata.get('openai_api_key')\n",
90
+ " os.environ['OPENAI_API_KEY'] = openai_api_key\n",
91
+ "\n",
92
+ " # get secret openai_base_url\n",
93
+ " openai_base_url = userdata.get('openai_base_url')\n",
94
+ " os.environ['OPENAI_API_BASE'] = openai_base_url\n",
95
+ "else:\n",
96
+ " print(\"Not running in Google Colab\")"
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": 11,
102
  "metadata": {},
103
  "outputs": [],
104
  "source": [
 
119
  "\n",
120
  "# Can converge correctly\n",
121
  "\n",
122
+ "# MODEL_NAME = \"anthropic/claude-3.5-sonnet:beta\"\n",
123
  "# MODEL_NAME = \"llama3-70b-8192\"\n",
124
  "# MODEL_NAME = \"meta-llama/llama-3-70b-instruct\"\n",
125
+ "MODEL_NAME = \"deepseek/deepseek-chat\"\n",
126
  "# MODEL_NAME = \"qwen/qwen-2-72b-instruct\"\n",
127
  "\n",
128
  "# Failed to converge correctly\n",
 
139
  "llm = ChatOpenAI(model_name=MODEL_NAME, temperature=0.5)\n",
140
  "\n",
141
  "# EXECUTOR_MODEL = \"microsoft/phi-3-medium-128k-instruct:free\"\n",
142
+ "EXECUTOR_MODEL = \"deepseek/deepseek-chat\"\n",
143
  "# EXECUTOR_MODEL = \"gemma-7b-it\"\n",
144
  "# EXECUTOR_MODEL = \"llama3-8b-8192\"\n",
145
  "# EXECUTOR_MODEL = \"llama3-70b-8192\"\n",
146
  "# EXECUTOR_MODEL = \"mixtral-8x7b-32768\"\n",
147
  "# EXECUTOR_MODEL = \"anthropic/claude-3-haiku:beta\"\n",
148
+ "# EXECUTOR_MODEL = \"meta-llama/llama-3-8b-instruct\"\n",
149
  "# EXECUTOR_MODEL = \"google/gemma-2-9b-it\"\n",
150
  "\n",
151
  "executor_llm = ChatOpenAI(model_name=EXECUTOR_MODEL, temperature=0.01)\n",
 
561
  },
562
  {
563
  "cell_type": "code",
564
+ "execution_count": 12,
565
  "metadata": {},
566
  "outputs": [
567
  {
 
587
  },
588
  {
589
  "cell_type": "code",
590
+ "execution_count": 13,
591
  "metadata": {},
592
  "outputs": [
593
  {
 
598
  "Expected Output: (2+8)*3\n",
599
  "= 10*3\n",
600
  "= 30\n",
601
  "\n",
602
+ "You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. Ensure clarity and precision in each step of the solution.\n",
603
+ "The expression given is \\((2+8) \\times 3\\).\n",
604
  "\n",
605
+ "Step 1: Solve the expression inside the parentheses.\n",
606
+ "\\[ 2 + 8 = 10 \\]\n",
607
  "\n",
608
+ "Step 2: Multiply the result by 3.\n",
609
+ "\\[ 10 \\times 3 = 30 \\]\n",
610
+ "\n",
611
+ "Therefore, the result of the expression \\((2+8) \\times 3\\) is \\(30\\).\n",
612
  "```\n",
613
  "- Acceptable Differences: \n",
614
+ " * Extra line breaks in the Actual Output.\n",
615
+ " * Use of LaTeX-style math notation (\\( \\)) in the Actual Output.\n",
616
  "- Unacceptable Differences: \n",
617
+ " * The Actual Output includes additional text and steps explaining the calculation process, which is not present in the Expected Output.\n",
618
  "- Accept: No\n",
619
  "```\n",
620
+ "- **Simplify the Instructions**: Remove the requirement to explain each step in detail. Instead, focus on the process of rewriting and solving the expression.\n",
621
+ "- **Specify Formatting**: Clearly state that the output should be a direct step-by-step solution without additional explanations or LaTeX notation.\n",
622
+ "- **Clarify the Output Structure**: Emphasize that each step should be shown in a simple, straightforward manner, matching the format of the expected output.\n",
623
  "\n",
624
+ "Example of an improved System Prompt:\n",
625
+ "```\n",
626
+ "You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. Ensure clarity and precision in each step of the solution, without additional explanations or special formatting. The output should be a direct sequence of calculations, matching the format of the input expression.\n",
627
+ "```\n",
628
+ "You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. The output should be a direct sequence of calculations, matching the format of the input expression, without additional explanations or special formatting. Ensure clarity and precision in each step of the solution.\n",
629
+ "(2 + 8) * 3\n",
630
+ "= 10 * 3\n",
631
  "= 30\n",
632
  "# Better Output ID: B\n",
633
  "```\n",
634
+ "- Acceptable Differences: Extra spaces around the plus sign in the Actual Output.\n",
635
+ "- Unacceptable Differences: None.\n",
636
  "- Accept: Yes\n",
637
  "```\n",
638
+ "Final Result: {'acceptance_criteria': '\\n* Exactly text match.\\n* Acceptable differences:\\n * Extra or missing spaces.\\n * Extra or missing line breaks at the beginning or end of the output.\\n', 'user_message': '(2+8)*3', 'expected_output': '(2+8)*3\\n= 10*3\\n= 30\\n', 'system_message': 'You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. The output should be a direct sequence of calculations, matching the format of the input expression, without additional explanations or special formatting. Ensure clarity and precision in each step of the solution.', 'output': '(2 + 8) * 3\\n= 10 * 3\\n= 30', 'suggestions': '- **Simplify the Instructions**: Remove the requirement to explain each step in detail. Instead, focus on the process of rewriting and solving the expression.\\n- **Specify Formatting**: Clearly state that the output should be a direct step-by-step solution without additional explanations or LaTeX notation.\\n- **Clarify the Output Structure**: Emphasize that each step should be shown in a simple, straightforward manner, matching the format of the expected output.\\n\\nExample of an improved System Prompt:\\n```\\nYou are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. Ensure clarity and precision in each step of the solution, without additional explanations or special formatting. 
The output should be a direct sequence of calculations, matching the format of the input expression.\\n```', 'accepted': True, 'analysis': '```\\n- Acceptable Differences: Extra spaces around the plus sign in the Actual Output.\\n- Unacceptable Differences: None.\\n- Accept: Yes\\n```', 'best_output': '(2 + 8) * 3\\n= 10 * 3\\n= 30', 'best_system_message': 'You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. The output should be a direct sequence of calculations, matching the format of the input expression, without additional explanations or special formatting. Ensure clarity and precision in each step of the solution.', 'best_output_age': 0, 'max_output_age': 3}\n",
639
  "System Message:\n",
640
+ "You are an AI assistant designed to solve mathematical expressions step by step. When given a mathematical expression, first rewrite the expression as given, then solve it step by step, showing each intermediate calculation, and finally provide the result. The output should be a direct sequence of calculations, matching the format of the input expression, without additional explanations or special formatting. Ensure clarity and precision in each step of the solution.\n",
641
  "Output:\n",
642
+ "(2 + 8) * 3\n",
643
+ "= 10 * 3\n",
644
  "= 30\n"
645
  ]
646
  }
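
The acceptance criteria recorded in the run output above ("Exactly text match", tolerating extra or missing spaces and extra or missing line breaks at the beginning or end) can also be approximated without an LLM. The sketch below is a deterministic stand-in, not the notebook's method (the notebook asks a model to produce the analysis); the function name `matches_expected` is hypothetical.

```python
def matches_expected(expected: str, actual: str) -> bool:
    """Compare two outputs, ignoring spaces and leading/trailing line breaks."""

    def normalize(text: str) -> list[str]:
        # Drop whitespace at both ends, then remove spaces inside each line.
        return [line.replace(" ", "") for line in text.strip().split("\n")]

    return normalize(expected) == normalize(actual)


expected = "(2+8)*3\n= 10*3\n= 30\n"

# Output accepted in the new run: only spacing differs.
print(matches_expected(expected, "(2 + 8) * 3\n= 10 * 3\n= 30"))

# Output rejected in the earlier run: extra intermediate steps.
print(matches_expected(expected, "(2+8)*3\n= 2 + 8\n= 10\n= 10 * 3\n= 30"))
```

A check like this agrees with the two verdicts shown in the diff, but it cannot judge looser criteria the way the model-based analysis can, so it is best seen as a cheap pre-filter.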