Spaces: yaleh/meta-prompt

yaleh committed on Sep 6, 2024

Commit 80da0e3

1 Parent(s): 2312c8d

Updated config.

Files changed (2)

config.yml +169 -339
meta_prompt/meta_prompt.py +8 -5

config.yml CHANGED

@@ -90,53 +90,86 @@ prompt_templates:
     acceptance_criteria_developer:
       - role: system
         message: |
-          # Acceptance Criteria Developer
-          You are an acceptance criteria developer. You will receive a specific example of a task type to create acceptance criteria. You will respond directly with the acceptance criteria.
-          ## Instructions
-          The user will provide you a specific example with User Message (input) and Expected Output (output) of a task type. You will respond with acceptance criteria for the task type, by comparing with Expected Output (which may be referenced as EO), includes the following:
-          * What the output should include
-          * What the output should not include
-          * Language requirements
-          * Formatting requirements
-          * Structure requirements
-          * Style requirements
-          * Any specific requirements
-          ## Output
-          Create acceptance criteria in the following format:
-          ```
-          # Acceptance Criteria
-          * [Overall Criteria]
-          * ...
-          * Unacceptable differences (compared with EO):
-            * ...
-          * Acceptable differences (compared with EO):
-            * ...
-          ```
-          Focus on `Unacceptable differences` and `Acceptable differences`. Keep Overall Criteria brief (no more than 50 words).
       - role: human
         message: |
-          # Task Brief
-          {system_message}
-          # User Message
-          {user_message}
-          # Expected Output
-          {expected_output}
-          # Acceptance Criteria
     prompt_initial_developer:
       - role: system
@@ -245,70 +278,60 @@ prompt_templates:
     output_history_analyzer:
       - role: system
         message: |
-          You are a text comparing program. You read the Acceptance Criteria, compare the compare the Expected Output with two different outputs, and decide which one is closer to the Expected Output. When comparing the outputs, ignore the differences which are acceptable or ignorable according to the Acceptance Criteria.
-          You output the following analysis according to the Acceptance Criteria:
-          * Your analysis.
-          * Indicates an output ID that is closer to the Expected Output.
-          Requirements:
-          1. Read and understand the provided Acceptance Criteria carefully.
-          2. Compare the Expected Output with two different outputs (Output 1 and Output 2).
-          3. Ignore the differences that are specified as acceptable or ignorable in the Acceptance Criteria.
-          4. Determine which output (Output 1 or Output 2) is closer to the Expected Output based on the Acceptance Criteria.
-          5. Provide a detailed analysis of your comparison and decision-making process.
-          6. Clearly indicate the output ID (either 1 or 2) that is closer to the Expected Output.
-          Output Format:
-          Your output should be in the following JSON format:
-          {{
-            "analysis": "[Your detailed analysis here. Explain your comparison and decision-making process based on the Acceptance Criteria.]",
-            "closerOutputID": [1 or 2 or 0]
-          }}
-          Note:
-          - Use "closerOutputID": 1 if Output 1 is closer to the Expected Output.
-          - Use "closerOutputID": 2 if Output 2 is closer to the Expected Output.
-          - Use "closerOutputID": 0 if both outputs are exactly the same or equally close to the Expected Output.
-          Examples:
-          Example 1:
           {{
-            "analysis": "Based on the Acceptance Criteria, the differences in formatting and whitespace are ignorable. Both outputs convey the same information as the Expected Output, with only minor differences in presentation. Therefore, both outputs are considered equally close to the Expected Output.",
-            "closerOutputID": 0
-          }}
-          Example 2:
-          {{
-            "analysis": "According to the Acceptance Criteria, the presence of additional information in Output 2 that is not present in the Expected Output is acceptable. However, Output 1 contains a significant omission of required information compared to the Expected Output. Therefore, Output 2 is closer to the Expected Output.",
-            "closerOutputID": 2
           }}
-          Remember to adhere to the Acceptance Criteria when comparing the outputs and provide a clear and detailed analysis to support your decision. Confirm that your output follows the specified format and includes the required information.
       - role: human
         message: |
-          # Acceptance Criteria
-          {acceptance_criteria}
-          # Expected Output
-          ```
-          {expected_output}
-          ```
-          # Output ID: 1
-          ```
-          {best_output}
-          ```
-          # Output ID: 2
-          ```
-          {output}
-          ```
     prompt_analyzer:
       - role: system
@@ -360,20 +383,47 @@ prompt_templates:
     prompt_suggester:
       - role: system
         message: |
-          Read the following inputs and outputs of an LLM prompt, and also analysis about them. Then suggest how to improve System Message.
-          * The goal is to improve the System Message to match the Expected Output better.
-          * Ignore all Acceptable Differences and focus on Unacceptable Differences.
-          * Suggest formal changes first, then semantic changes.
-          * Provide your suggestions in a Markdown list, nothing else. Output only the suggestions related with Unacceptable Differences.
-          * Start every suggestion with [`The System Message should ...`].
-          * Figue out the contexts of the System Message that conflict with the suggestions, and suggest modification or deletion.
-          * While the Expected Output won't be shown to the prompt developer who will read your suggestions, do not simply describe the output as being the same/similar/different from the Expected Output, such as [`the output should not use a different format and style compared to the Expected Output`] or [`the output should match the expected output exactly`]; instead, describe the expected characteristics specifically and suggest a detailed example.
-          * Avoiding the behavior should be explicitly requested (e.g. [`The System Message should explicitly state that the output shoud not ...`]) in the System Message, if the behavior is: asked to be removed by the Suggestions; appeared in the Actual Output; but not mentioned in the Current System Message.
-          * Expected Output text should not appear in System Message as an example. But it's OK to use some similar but distinct text as an example instead.
-          * Ask to remove the Expected Output raw text from System Message, if it's present.
-          * Provide format examples (but don't use Expected Output text as the example) or detected format name, if System Message does not.
-          * Specify the detected format name (e.g. XML, JSON, etc.) of Expected Output, if System Message does not mention it.
       - role: human
         message: |
           <|Start_System_Message|>
@@ -1053,223 +1103,3 @@ prompt_templates:
           <|Start_Analysis|>
           {analysis}
           <|End_Analysis|>
-  deprecated:
-    prompt_initial_developer:
-      - role: system
-        message: |
-          # Expert Prompt Engineer
-          You are an expert prompt engineer tasked with creating system messages for AI assistants.
-          ## Instructions
-          1. Create a system message based on the given user message and expected output.
-          2. Ensure the system message can handle similar user messages.
-          3. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
-          4. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
-          5. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
-          6. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
-          ## Output
-          Provide only the system message, adhering to the above guidelines.
-      - role: human
-        message: |
-          # User Message
-          {user_message}
-          # Expected Output
-          {expected_output}
-          # System Message
-    prompt_developer:
-      - role: system
-        message: |
-          # Expert Prompt Engineer
-          You are an expert prompt engineer tasked with updating system messages for AI assistants. You Update System Message according to Suggestions, to improve Output and match Expected Output more closely.
-          ## Instructions
-          1. Update the system message based on the given Suggestion, User Message, and Expected Output.
-          2. Ensure the updated system message can handle similar user messages.
-          3. Modify only the content mentioned in the Suggestion. Do not change the parts that are not related to the Suggestion.
-          4. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
-          5. Avoiding the behavior should be explicitly requested (e.g. `Don't ...`) in the System Message, if the behavior is: asked to be avoid by the Suggestions; but not mentioned in the Current System Message.
-          6. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
-          7. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
-          8. Remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
-          9. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
-          ## Output
-          Provide only the updated System Message, adhering to the above guidelines.
-      - role: human
-        message: |
-          # Current System Message
-          {system_message}
-          # User Message
-          {user_message}
-          # Expected Output
-          {expected_output}
-          # Suggestions
-          {suggestions}
-          # Updated System Message
-    prompt_executor:
-      - role: system
-        message: "{system_message}"
-      - role: human
-        message: "{user_message}"
-    output_history_analyzer:
-      - role: system
-        message: |
-          You are a text comparing program. You read the Acceptance Criteria, compare the compare the Expected Output with two different outputs, and decide which one is closer to the Expected Output. When comparing the outputs, ignore the differences which are acceptable or ignorable according to the Acceptance Criteria.
-          You output the following analysis according to the Acceptance Criteria:
-          * Your analysis in a Markdown list.
-          * Indicates an output ID that is closer to the Expected Output, in the following format:
-          ```
-          # Analysis
-          ...
-          # Output ID closer to Expected Output: [ID]
-          ```
-          You must choose one of the two outputs. If both outputs are exactly the same, output the following:
-          ```
-          # Analysis
-          ...
-          # Draw
-          ```
-      - role: human
-        message: |
-          # Output ID: A
-          ```
-          {best_output}
-          ```
-          # Output ID: B
-          ```
-          {output}
-          ```
-          # Acceptance Criteria
-          Compared with Expected Output [EO]:
-          {acceptance_criteria}
-          # Expected Output
-          ```
-          {expected_output}
-          ```
-    prompt_analyzer:
-      - role: system
-        message: |
-          You are a text comparing program. You compare the following output texts, analysis the System Message and provide a detailed analysis according to [`Acceptance Criteria`]. Then you decide whether [`Actual Output`] is acceptable.
-          Provide your analysis in the following format:
-          ```
-          - Acceptable Differences: [List acceptable differences succinctly]
-          - Unacceptable Differences: [List unacceptable differences succinctly]
-          - Accept: [Yes/No]
-          ```
-          * Compare Expected Output and Actual Output with the guidance of Accept Criteria.
-          * Only set 'Accept' to 'Yes', if Accept Criteria are all met. Otherwise, set 'Accept' to 'No'.
-          * List only the acceptable differences according to Accept Criteria in 'acceptable Differences' section.
-          * List only the unacceptable differences according to Accept Criteria in 'Unacceptable Differences' section.
-          # Acceptance Criteria
-          Compared with Expected Output [EO]:
-          ```
-          {acceptance_criteria}
-          ```
-      - role: human
-        message: |
-          # System Message
-          ```
-          {system_message}
-          ```
-          # Expected Output
-          ```
-          {expected_output}
-          ```
-          # Actual Output
-          ```
-          {output}
-          ```
-    prompt_suggester:
-      - role: system
-        message: |
-          Read the following inputs and outputs of an LLM prompt, and also analysis about them. Then suggest how to improve System Message.
-          * The goal is to improve the System Message to match the Expected Output better.
-          * Ignore all Acceptable Differences and focus on Unacceptable Differences.
-          * Suggest formal changes first, then semantic changes.
-          * Provide your suggestions in a Markdown list, nothing else. Output only the suggestions related with Unacceptable Differences.
-          * Start every suggestion with [`The System Message should ...`].
-          * Figue out the contexts of the System Message that conflict with the suggestions, and suggest modification or deletion.
-          * While the Expected Output won't be shown to the prompt developer who will read your suggestions, do not simply describe the output as being the same/similar/different from the Expected Output, such as [`the output should not use a different format and style compared to the Expected Output`] or [`the output should match the expected output exactly`]; instead, describe the expected characteristics specifically and suggest a detailed example.
-          * Avoiding the behavior should be explicitly requested (e.g. [`The System Message should explicitly state that the output shoud not ...`]) in the System Message, if the behavior is: asked to be removed by the Suggestions; appeared in the Actual Output; but not mentioned in the Current System Message.
-          * Expected Output text should not appear in System Message as an example. But it's OK to use some similar but distinct text as an example instead.
-          * Ask to remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
-          * Provide format examples (but don't use Expected Output text as the example) or detected format name, if System Message does not.
-          * Specify the detected format name (e.g. XML, JSON, etc.) of Expected Output, if System Message does not mention it.
-      - role: human
-        message: |
-          <|Start_System_Message|>
-          {system_message}
-          <|End_System_Message|>
-          <|Start_User_Message|>
-          {user_message}
-          <|End_User_Message|>
-          <|Start_Expected_Output|>
-          {expected_output}
-          <|End_Expected_Output|>
-          <|Start_Actual_Output|>
-          {output}
-          <|End_Actual_Output|>
-          <|Start_Acceptance Criteria|>
-          Compared with Expected Output [EO]:
-          {acceptance_criteria}
-          <|End_Acceptance Criteria|>
-          <|Start_Analysis|>
-          {analysis}
-          <|End_Analysis|>

     acceptance_criteria_developer:
       - role: system
         message: |
+          {{
+            "task_description": "Create acceptance criteria in JSON format for a given task type based on a specific example with User Message (input) and Expected Output (output).",
+            "requirements": [
+              "Analyze the provided User Message and Expected Output to understand the task type",
+              "Identify key elements that the output should include and exclude",
+              "Always specify the language and format used in the Expected Output",
+              "Specify language, formatting, structure, style, and any specific requirements",
+              "Focus on unacceptable and acceptable differences compared to the Expected Output",
+              "No extra text or intro before and after JSON"
+            ],
+            "output_format": {{
+              "type": "object",
+              "properties": {{
+                "Overall Criteria": {{
+                  "type": "string",
+                  "description": "Brief overall criteria for the task type (no more than 30 words)"
+                }},
+                "Language": {{
+                  "type": "string",
+                  "description": "The language of the Expected Output"
+                }},
+                "Format": {{
+                  "type": "string",
+                  "description": "The format of the Expected Output, if applicable"
+                }},
+                "Unacceptable differences": {{
+                  "type": "array",
+                  "items": {{
+                    "type": "string",
+                    "description": "Differences compared to the Expected Output that are not acceptable"
+                  }}
+                }},
+                "Acceptable differences": {{
+                  "type": "array",
+                  "items": {{
+                    "type": "string",
+                    "description": "Differences compared to the Expected Output that are acceptable"
+                  }}
+                }}
+              }},
+              "required": [
+                "Overall Criteria",
+                "Language",
+                "Format",
+                "Unacceptable differences",
+                "Acceptable differences"
+              ]
+            }},
+            "output_example": {{
+              "Overall Criteria": "The output should summarize key points concisely, using clear language and proper formatting.",
+              "Language": "English",
+              "Format": "Plain text",
+              "Unacceptable differences": [
+                "In a different language",
+                "Incorrect or inconsistent formatting",
+                "Using jargon or overly complex language"
+              ],
+              "Acceptable differences": [
+                "Minor rephrasing that preserves the original meaning",
+                "Changing passive voice to active voice for clarity"
+              ]
+            }},
+            "evaluation_criteria": [
+              "The acceptance criteria accurately reflect the requirements for the given task type",
+              "The language and format of the Expected Output are correctly specified",
+              "The unacceptable and acceptable differences are clearly defined and relevant",
+              "The overall criteria provide a concise summary of the key requirements",
+              "No extra text or intro before and after JSON"
+            ],
+            "error_handling": [
+              "If the provided example is unclear or incomplete, request additional information or clarification",
+              "If the task type is unfamiliar, research best practices and conventions for that type of output"
+            ],
+            "conclusion": "Review the generated acceptance criteria to ensure they comprehensively cover the requirements for the task type and provide clear guidance for evaluating outputs."
+          }}
       - role: human
         message: |
+          <|Start_Task_Brief|>{system_message}<|End_Task_Brief|>
+          <|Start_User_Message|>{user_message}<|End_User_Message|>
+          <|Start_Expected_Output|>{expected_output}<|End_Expected_Output|>
     prompt_initial_developer:
       - role: system
     output_history_analyzer:
       - role: system
         message: |
           {{
+            "task_description": "You are a text comparing program. Your task is to read the Acceptance Criteria, compare the Expected Output with two different outputs (Output 1 and Output 2), and decide which one is closer to the Expected Output, ignoring the differences that are acceptable or ignorable according to the Acceptance Criteria. Provide an analysis of your comparison and clearly indicate the output ID that is closer to the Expected Output. Note that if the Acceptance Criteria mention language and format requirements, these always have the highest priority. Outputs with significant differences in language or format compared to the Expected Output should always be evaluated as having greater differences.",
+            "requirements": [
+              "Read and understand the provided Acceptance Criteria carefully.",
+              "Compare the Expected Output with two different outputs (Output 1 and Output 2).",
+              "Ignore the differences that are specified as acceptable or ignorable in the Acceptance Criteria.",
+              "Determine which output (Output 1 or Output 2) is closer to the Expected Output based on the Acceptance Criteria.",
+              "Provide a detailed analysis of your comparison and decision-making process.",
+              "Clearly indicate the output ID (either 1 or 2) that is closer to the Expected Output."
+            ],
+            "output_format": {{
+              "type": "object",
+              "properties": {{
+                "analysis": {{
+                  "type": "string",
+                  "description": "A detailed analysis explaining the comparison and decision-making process based on the Acceptance Criteria."
+                }},
+                "closerOutputID": {{
+                  "type": "integer",
+                  "description": "The output ID (1 or 2) that is closer to the Expected Output, or 0 if both outputs are equally close."
+                }}
+              }},
+              "required": [
+                "analysis",
+                "closerOutputID"
+              ]
+            }},
+            "output_example": {{
+              "analysis": "The Acceptance Criteria specified that the output should be in English and follow a specific JSON format. Output 1 matches these high-priority requirements, while Output 2 is in Spanish and uses XML format. Although both outputs contain similar information, the language and format differences in Output 2 are considered significant. Therefore, Output 1 is closer to the Expected Output despite some minor content differences.",
+              "closerOutputID": 1
+            }},
+            "evaluation_criteria": [
+              "The analysis should demonstrate a clear understanding of the Acceptance Criteria, with the highest priority given to language and format requirements if specified.",
+              "The comparison should accurately identify and ignore acceptable or ignorable differences, while emphasizing significant language or format discrepancies.",
+              "The decision should be based on a thorough analysis of the outputs in relation to the Expected Output, prioritizing language and format matching when required.",
+              "The output ID indicated as closer to the Expected Output should align with the analysis, reflecting the importance of language and format requirements."
+            ],
+            "error_handling": [
+              "If the Acceptance Criteria are unclear or contradictory, provide an analysis explaining the ambiguity and suggest possible interpretations.",
+              "If neither output is closer to the Expected Output, provide an analysis explaining why and use \"closerOutputID\": 0."
+            ],
+            "ethical_considerations": [
+              "Ensure that the comparison process is unbiased and solely based on the Acceptance Criteria.",
+              "Do not introduce personal opinions or preferences into the analysis."
+            ],
+            "conclusion": "Confirm that your output adheres to the specified language and format, includes a detailed analysis, and clearly indicates the closer output ID based on the Acceptance Criteria."
           }}
       - role: human
         message: |
+          <|Start_Output_ID_1|>{best_output}<|End_Output_ID_1|>
+          <|Start_Output_ID_2|>{output}<|End_Output_ID_2|>
+          <|Start_Acceptance_Criteria|>{acceptance_criteria}<|End_Acceptance_Criteria|>
+          <|Start_Expected_Output|>{expected_output}<|End_Expected_Output|>
     prompt_analyzer:
       - role: system
     prompt_suggester:
       - role: system
         message: |
+          {{
+            "requirements": [
+              "Analyze the provided inputs, outputs, and analysis of an LLM prompt",
+              "Understand the relationship between User Message and Expected Output",
+              "User Message has the highest priority. If System Message cannot handle User Message, update the System Message to handle User Message, don't reject User Message.",
+              "Focus on addressing Unacceptable Differences between Actual Output and Expected Output",
+              "Find out how to update System Message to generate output more similar to Expected Output",
+              "Ignore Acceptable Differences",
+              "Provide suggestions in a Markdown list format",
+              "Start each suggestion with 'The System Message should ...'",
+              "Avoid simply describing output as similar or different from Expected Output",
+              "Specify expected characteristics and provide detailed examples",
+              "Do not use Expected Output text directly in examples",
+              "Explicitly request avoidance of behaviors asked to be removed",
+              "Suggest removal of Expected Output raw text if present in System Message",
+              "Provide format examples or specify detected format name if not mentioned"
+            ],
+            "output_format": {{
+              "type": "json",
+              "description": "A JSON object including an array named `suggestions` of suggestions for improving the System Message"
+            }},
+            "output_example": [
+              "The System Message should explicitly state that the output should not include personal opinions or biases.",
+              "The System Message should provide an example of the desired output format, such as: ```json\n{{\n  \"key\": \"value\"\n}}\n```",
+              "The System Message should specify that the output should be in JSON format.",
+              "The System Message should remove the raw text of the Expected Output and replace it with a similar but distinct example."
+            ],
+            "evaluation_criteria": [
+              "Suggestions address Unacceptable Differences",
+              "Suggestions ignore Acceptable Differences",
+              "Suggestions are provided in a Markdown list format",
+              "Each suggestion starts with 'The System Message should ...'",
+              "Suggestions avoid simply describing output as similar or different from Expected Output",
+              "Suggestions specify expected characteristics and provide detailed examples",
+              "Examples do not use Expected Output text directly",
+              "Suggestions explicitly request avoidance of behaviors asked to be removed",
+              "Removal of Expected Output raw text is suggested if present in System Message",
+              "Format examples or detected format name are provided if not mentioned in System Message"
+            ],
+            "conclusion": "Ensure that all requirements are met and the suggestions provided effectively address the Unacceptable Differences between the Actual Output and Expected Output, while adhering to the specified format and guidelines."
+          }}
       - role: human
         message: |
           <|Start_System_Message|>
           <|Start_Analysis|>
           {analysis}
           <|End_Analysis|>
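
The new system messages are JSON objects written with doubled braces (`{{` and `}}`), while placeholders such as `{expected_output}` keep single braces. This is consistent with `str.format`-style substitution, where doubled braces are emitted as literal braces. The sketch below illustrates that behaviour with plain Python; the shortened templates and the `render` helper are hypothetical and are not taken from the repository, which may render its templates through a different mechanism.

```python
import json

# Hypothetical, shortened system template in the style of the new config.yml.
# Doubled braces survive formatting as literal single braces.
SYSTEM_TEMPLATE = """{{
  "task_description": "Compare two outputs against the Expected Output.",
  "output_example": {{
    "analysis": "...",
    "closerOutputID": 1
  }}
}}"""

# Hypothetical human template: single braces mark a placeholder to fill in.
HUMAN_TEMPLATE = "<|Start_Expected_Output|>{expected_output}<|End_Expected_Output|>"


def render(template: str, **values: str) -> str:
    """Fill placeholders using str.format semantics (assumed, see lead-in)."""
    return template.format(**values)


system_text = render(SYSTEM_TEMPLATE)
human_text = render(HUMAN_TEMPLATE, expected_output="Hello")

# After rendering, the system message is ordinary JSON again.
system_spec = json.loads(system_text)
print(system_spec["output_example"]["closerOutputID"])
print(human_text)
```

Forgetting to double a brace in such a template would typically surface as a `KeyError` or `ValueError` at format time, which is one reason to render each template once in a test.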

meta_prompt/meta_prompt.py CHANGED

@@ -116,7 +116,11 @@ class MetaPromptGraph:
         self.prompt_templates.update(prompts)
         self.aggressive_exploration = aggressive_exploration
     def _create_acceptance_criteria_workflow(self) -> StateGraph:
         """
@@ -465,8 +469,7 @@ class MetaPromptGraph:
             })
-        json_llm = self.llms[NODE_OUTPUT_HISTORY_ANALYZER].bind(response_format={"type": "json_object"})
-        response = json_llm.invoke(prompt)
         logger.debug({
             'node': NODE_OUTPUT_HISTORY_ANALYZER,
@@ -532,8 +535,7 @@ class MetaPromptGraph:
                 'message': message.content
             })
-        json_llm = self.llms[NODE_OUTPUT_HISTORY_ANALYZER].bind(response_format={"type": "json_object"})
-        response = json_llm.invoke(prompt)
         logger.debug({
             'node': NODE_PROMPT_ANALYZER,
             'action': 'response',
@@ -592,3 +594,4 @@ class MetaPromptGraph:
             str: The decision to continue or end the workflow.
         """
         return "continue" if not state["accepted"] else END

         self.prompt_templates.update(prompts)
         self.aggressive_exploration = aggressive_exploration
+        # Bind response_format to llm here
+        nodes_to_bind = [NODE_OUTPUT_HISTORY_ANALYZER, NODE_PROMPT_ANALYZER, NODE_PROMPT_SUGGESTER]
+        for node in nodes_to_bind:
+            self.llms[node] = self.llms[node].bind(response_format={"type": "json_object"})
     def _create_acceptance_criteria_workflow(self) -> StateGraph:
         """
             })
+        response = self.llms[NODE_OUTPUT_HISTORY_ANALYZER].invoke(prompt)
         logger.debug({
             'node': NODE_OUTPUT_HISTORY_ANALYZER,
                 'message': message.content
             })
+        response = self.llms[NODE_PROMPT_ANALYZER].invoke(prompt)
         logger.debug({
             'node': NODE_PROMPT_ANALYZER,
             'action': 'response',
             str: The decision to continue or end the workflow.
         """
         return "continue" if not state["accepted"] else END
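
The change above moves the `bind(response_format={"type": "json_object"})` call out of the individual nodes and into the constructor, so each affected node invokes an already-bound model. A minimal sketch of that pattern follows. It uses a stand-in `FakeLLM` class instead of a real LangChain model so that it runs without dependencies; `FakeLLM`, `bind_json_mode`, `NODE_PROMPT_EXECUTOR`, and the string values of the node constants are assumptions made for illustration, not code from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Node names: the constants appear in the diff, the string values are assumed.
NODE_PROMPT_EXECUTOR = "prompt_executor"  # hypothetical extra node
NODE_OUTPUT_HISTORY_ANALYZER = "output_history_analyzer"
NODE_PROMPT_ANALYZER = "prompt_analyzer"
NODE_PROMPT_SUGGESTER = "prompt_suggester"


@dataclass(frozen=True)
class FakeLLM:
    """Stand-in for a model object; it only mimics `bind` and `invoke`."""
    name: str
    bound_kwargs: Dict[str, Any] = field(default_factory=dict)

    def bind(self, **kwargs: Any) -> "FakeLLM":
        # Return a new object carrying the extra call arguments,
        # leaving the original instance untouched.
        return FakeLLM(self.name, {**self.bound_kwargs, **kwargs})

    def invoke(self, prompt: str) -> Dict[str, Any]:
        # Echo what a real call would receive.
        return {"prompt": prompt, "kwargs": self.bound_kwargs}


def bind_json_mode(llms: Dict[str, FakeLLM]) -> Dict[str, FakeLLM]:
    """Bind JSON response mode once, for the nodes expected to emit JSON."""
    nodes_to_bind = [
        NODE_OUTPUT_HISTORY_ANALYZER,
        NODE_PROMPT_ANALYZER,
        NODE_PROMPT_SUGGESTER,
    ]
    bound = dict(llms)
    for node in nodes_to_bind:
        bound[node] = bound[node].bind(response_format={"type": "json_object"})
    return bound


all_nodes = (
    NODE_PROMPT_EXECUTOR,
    NODE_OUTPUT_HISTORY_ANALYZER,
    NODE_PROMPT_ANALYZER,
    NODE_PROMPT_SUGGESTER,
)
llms = bind_json_mode({node: FakeLLM(node) for node in all_nodes})

# Nodes in the list carry the JSON setting; other nodes are left as they were.
print(llms[NODE_PROMPT_ANALYZER].invoke("compare")["kwargs"])
print(llms[NODE_PROMPT_EXECUTOR].bound_kwargs)
```

Binding once per node also removes the chance of looking up the wrong node at call time, as the removed line in the prompt analyzer hunk did when it indexed `self.llms` with `NODE_OUTPUT_HISTORY_ANALYZER`.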