yaleh commited on
Commit
1e273f6
·
1 Parent(s): a706447

Updated UI, config and exanokes.

Browse files
Files changed (3) hide show
  1. app/examples/log.csv +54 -109
  2. app/gradio_meta_prompt.py +51 -42
  3. config.yml +225 -221
app/examples/log.csv CHANGED
@@ -1,14 +1,9 @@
1
  User Message,Expected Output,Acceptance Criteria,Initial System Message
2
- How do I reverse a list in Python?,Use the `[::-1]` slicing technique or the `list.reverse()` method.,"Similar in meaning, text length and style.",""
3
  (2+8)*3,"(2+8)*3
4
  = 10*3
5
  = 30
6
- ","
7
- * Exact text match.
8
- * Acceptable differences:
9
- * Extra or missing spaces.
10
- * Extra or missing line breaks at the beginning or end of the output.
11
- ",""
12
  "Here is the GDP data in billions of US dollars (USD) for these years:
13
 
14
  Germany:
@@ -51,13 +46,7 @@ Spain:
51
  2017-2016,5.08%,4.78%,-1.07%,4.61%,6.23%
52
  2018-2017,7.48%,7.99%,7.89%,7.10%,8.58%
53
  2019-2018,-2.56%,-2.59%,0.01%,-4.11%,0.30%
54
- ","
55
- * Strict text matching of the header row and first column(year).
56
- * Acceptable differences:
57
- * Differences in digital/percentage values in the table, even significant ones.
58
- * Extra or missing spaces.
59
- * Extra or missing line breaks.
60
- ",""
61
  "Gene sequence: ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA
62
  Species: Escherichia coli","{
63
  ""Gene Sequence Analysis Results"": {
@@ -117,15 +106,7 @@ Species: Escherichia coli","{
117
  ]
118
  }
119
  }
120
- }","* Consistent with Expected Output:
121
- * Formats of all JSON sections
122
- * Data types of all JSON fields
123
- * Top layer sections
124
- * Acceptable differences:
125
- * Extra or missing spaces
126
- * Extra or missing line breaks at the beginning or end of the output
127
- * Differences in JSON field values
128
- * JSON wrapped in backquotes",""
129
  今天下午3点,在北京国家会议中心,阿里巴巴集团董事局主席马云宣布将投资100亿元人民币用于农村电商发展。这一决定受到了与会代表的热烈欢迎,大家认为这将为中国农村经济带来新的机遇。,"{
130
  ""文本分析结果"": {
131
  ""情感分析"": {
@@ -154,46 +135,15 @@ Species: Escherichia coli","{
154
  {""关键词"": ""经济机遇"", ""权重"": 0.70}
155
  ]
156
  }
157
- }","* Consistent with Expected Output:
158
- * Formats of all JSON sections
159
- * Data types of all JSON fields
160
- * Top layer sections
161
- * Acceptable differences:
162
- * Differences in digital values in the table.
163
- * Extra or missing spaces.
164
- * Extra or missing line breaks at the beginning or end of the output.
165
- * Differences in JSON field values
166
- * Differences in section/item orders.
167
- * JSON wrapped in backquotes.",""
168
  Low-noise amplifier,"A '''low-noise amplifier''' ('''LNA''') is an electronic component that amplifies a very low-power [[signal]] without significantly degrading its [[signal-to-noise ratio]] (SNR). Any [[electronic amplifier]] will increase the power of both the signal and the [[Noise (electronics)|noise]] present at its input, but the amplifier will also introduce some additional noise. LNAs are designed to minimize that additional noise, by choosing special components, operating points, and [[Circuit topology (electrical)|circuit topologies]]. Minimizing additional noise must balance with other design goals such as [[power gain]] and [[impedance matching]].
169
 
170
- LNAs are found in [[Radio|radio communications]] systems, [[Amateur Radio]] stations, medical instruments and [[electronic test equipment]]. A typical LNA may supply a power gain of 100 (20 [[decibels]] (dB)) while decreasing the SNR by less than a factor of two (a 3 dB [[noise figure]] (NF)). Although LNAs are primarily concerned with weak signals that are just above the [[noise floor]], they must also consider the presence of larger signals that cause [[intermodulation distortion]].","* Consistent with Expected Output:
171
- * Language
172
- * Text length
173
- * Text style
174
- * Text structures
175
- * Cover all the major content of Expected Output.
176
- * Acceptable differences:
177
- * Minor format differences.
178
- * Expression differences.
179
- * Numerical differences.
180
- * Additional content in Actual Output.
181
- * Missing minor content in Actual Output.",""
182
  What is the meaning of life?,"[
183
  {""persona"": ""Philosopher"", ""prompt"": ""Explore the concept of life's meaning through the lens of existentialism and purpose-driven existence.""},
184
  {""persona"": ""Scientist"", ""prompt"": ""Examine the biological and evolutionary perspectives on the function and significance of life.""},
185
  {""persona"": ""Child"", ""prompt"": ""Imagine you're explaining to a curious 7-year-old what makes life special and important.""}
186
- ]","* Consistent with Expected Output:
187
- * Formats of all JSON sections
188
- * Data types and formats of all JSON fields
189
- * Top layer sections
190
- * Acceptable differences:
191
- * Different personas or prompts
192
- * Different numbers of personas
193
- * Extra or missing spaces
194
- * Extra or missing line breaks at the beginning or end of the output
195
- * Unacceptable:
196
- * Showing the personas in Expected Output in System Message",""
197
  "<?php
198
  $username = $_POST['username'];
199
  $password = $_POST['password'];
@@ -225,16 +175,8 @@ if (mysqli_num_rows($result) > 0) {
225
  - Implement proper input sanitization
226
  - Use secure password hashing algorithms
227
  - Add CSRF tokens to forms
228
- - Consider using a secure authentication library","* Consistent with Expected Output:
229
- * Formats of all YAML sections
230
- * Data types and formats of all YAML fields
231
- * Top layer sections
232
- * Acceptable differences:
233
- * Differences in field values
234
- * Extra or missing spaces
235
- * Extra or missing line breaks at the beginning or end of the output
236
- * YAML wrapped in backquotes",""
237
- “老爸,老爸,我们去哪里呀?”,《爸爸去哪儿》,Exact text match.,"查询歌词出处。"
238
  "Add a transition sentence at the end of this passage, but also include a table of all the invited guests, their age, and their relationship to Jack. Additionally, add a clause that specifies the dietary restrictions of each guest and their preferred seating arrangement. Finally, include a paragraph discussing the historical significance of birthday celebrations and how they have evolved over time.
239
  Jack was so excited for his birthday party. He had spent months planning it. He had invited all his friends and family members, including his cousins from out of town, his coworkers from his part-time job, and even his old high school teacher. Please refer to the table below for a detailed list of all the guests and their relevant information. It was important to Jack to accommodate everyone's dietary restrictions, so he made sure to include a variety of options on the menu. Additionally, he wanted to ensure that everyone was seated comfortably and with people they were familiar with, so he carefully organized the seating chart. Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. Over time, birthday traditions have evolved and taken on different meanings across cultures.","Please see the table below for a list of all the invited guests, their age, and their relationship to Jack, as well as their dietary restrictions and preferred seating arrangement:
240
  | Name | Age | Relationship to Jack | Dietary Restrictions | Preferred Seating |
@@ -247,9 +189,7 @@ Jack was so excited for his birthday party. He had spent months planning it. He
247
  | Michael | 50 | Uncle | None | With family |
248
  | Lisa | 27 | Friend | None | Near Emily |
249
  Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. In ancient times, birthdays were often celebrated as a way to ward off evil spirits, and candles were used to symbolize the light of life. Over time, birthday traditions have evolved and taken on different meanings across cultures. Today, birthdays are often seen as a time to celebrate life and reflect on the past year. People often gather with friends and family to share food, exchange gifts, and make memories. As Jack celebrates his birthday with his loved ones, he is continuing a tradition that has been passed down for centuries.
250
- ","* Similar in semantics, text length and format.
251
- * Acceptable differences:
252
- * Minor text differences.",
253
  "Add a transition sentence at the end of this passage, but also include a table of all the invited guests, their age, and their relationship to Jack. Additionally, add a clause that specifies the dietary restrictions of each guest and their preferred seating arrangement. Finally, include a paragraph discussing the historical significance of birthday celebrations and how they have evolved over time.
254
  Jack was so excited for his birthday party. He had spent months planning it. He had invited all his friends and family members, including his cousins from out of town, his coworkers from his part-time job, and even his old high school teacher. Please refer to the table below for a detailed list of all the guests and their relevant information. It was important to Jack to accommodate everyone's dietary restrictions, so he made sure to include a variety of options on the menu. Additionally, he wanted to ensure that everyone was seated comfortably and with people they were familiar with, so he carefully organized the seating chart. Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. Over time, birthday traditions have evolved and taken on different meanings across cultures.","Please see the table below for a list of all the invited guests, their age, and their relationship to Jack, as well as their dietary restrictions and preferred seating arrangement:
255
  | Name | Age | Relationship to Jack | Dietary Restrictions | Preferred Seating |
@@ -262,36 +202,14 @@ Jack was so excited for his birthday party. He had spent months planning it. He
262
  | Michael | 50 | Uncle | None | With family |
263
  | Lisa | 27 | Friend | None | Near Emily |
264
  Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. In ancient times, birthdays were often celebrated as a way to ward off evil spirits, and candles were used to symbolize the light of life. Over time, birthday traditions have evolved and taken on different meanings across cultures. Today, birthdays are often seen as a time to celebrate life and reflect on the past year. People often gather with friends and family to share food, exchange gifts, and make memories. As Jack celebrates his birthday with his loved ones, he is continuing a tradition that has been passed down for centuries.
265
- ","* Similar in semantics, text length and format.
266
- * Acceptable differences:
267
- * Differences in the table content, e.g. names or relationships
268
- * Minor text differences","Please do not start with a sentence about the birthday person's excitement for their party.
269
-
270
- Instead, please provide a table with the following columns: Name, Age, Relationship to the birthday person, Dietary Restrictions, and Preferred Seating. Introduce the table with the sentence: ""Please see the table below for a list of all the invited guests, their age, and their relationship to the birthday person, as well as their dietary restrictions and preferred seating arrangement:""
271
-
272
- | Name | Age | Relationship to the birthday person | Dietary Restrictions | Preferred Seating |
273
- |------|-----|-------------------------------------|----------------------|--------------------|
274
-
275
- After the table, include a transition sentence, such as ""Birthday celebrations have a long and rich history..."" followed by a paragraph discussing the historical significance of birthday celebrations and how they have evolved over time."
276
  "Can you name at least 5 animals that can produce sounds resembling barks? Also, for each animal, can you provide their scientific name and the reason why they produce these sounds? Remember, not all animals communicate in the same way as dogs do.","| Animal | Scientific Name | Reason for Barking-like Sound |
277
  |--------|----------------|-------------------------------|
278
  | Arctic Fox | Vulpes lagopus | Arctic foxes bark to communicate with each other, especially during mating season or when they are threatened. |
279
  | Prairie Dogs | Cynomys spp. | Prairie dogs bark to warn each other of predators, to establish territory, and to communicate with their family members. |
280
  | Seals | Pinnipedia spp. | Seals bark to communicate with each other, especially during mating season or when they are threatened. |
281
  | Deer | Odocoileus spp. | Deer bark to warn each other of predators, to establish territory, and to communicate with their family members. |
282
- | Baboons | Papio spp. | Baboons bark to communicate with each other, especially during mating season or when they are threatened. |","* Similar in semantics, text length and format.
283
- * Acceptable differences:
284
- * Differences in the table content, e.g. different animals
285
- * Minor text differences","I'll provide a table with the following columns: Animal, Scientific Name, and Reason for Barking-like Sound.
286
-
287
- The scientific names should be in the format of ""genus species"" or ""family"" (e.g. Vulpes lagopus or Canidae). The output will be a Markdown table with the specified columns, containing the required information for each animal.
288
-
289
- The table will have the following structure:
290
- | Animal | Scientific Name | Reason for Barking-like Sound |
291
- |--------|----------------|-------------------------------|
292
- | ... | ... | ... |
293
-
294
- Don't include additional information beyond the table. I'll provide the required information for each animal in the specified format."
295
  "Suppose you have a set of triangle measurements as shown below:
296
 
297
  | Base (cm) | Height (cm) |
@@ -316,20 +234,8 @@ Using this formula, we can calculate the area of each triangle in the table:
316
  | 10 | 12 | 60 |
317
  | 7 | 9 | 31.5 |
318
 
319
- Therefore, the area of each triangle is 15 cm^2, 6 cm^2, 8 cm^2, 60 cm^2, and 31.5 cm^2, respectively.",Exactly format and style match. Consistent semantic. Highly similar text length.,"As a Triangle Area Obfuscator Assistant, your primary function is to avoid providing the formula for calculating the area of a triangle and to misapply any formulas to the given set of base and height measurements. You are expected to operate at a high level of difficulty, meaning your explanations and calculations should be confusing and hard to comprehend. Your response format should consist of several unrelated segments. The first segment should not provide the formula for calculating the area of a triangle. Instead, present unrelated or incorrect equations, ensuring they are ambiguous and complex. The second segment is to misapply any formulas to the given measurements and present the results in a disorganized manner. Do not include a new column labeled ""Area (cm^2)"" with the calculated area for each triangle. Example response format for Triangle Area Obfuscator Assistant: ``` [Unrelated explanation] [Incorrect formula] [Disorganized presentation of base, height, and incorrect calculations] ``` You are encouraged to provide overly detailed explanations and deviate from the given set of measurements and the requested formula. Your role is to offer an unclear method for calculating the area of triangles and misapply it to the given data. Additionally, you are to provide impersonal and irrelevant advice or generate the response in a way that is not tailored to the individual's specific circumstances. Focus on delivering a confusing and convoluted response that does not adhere to the user's instructions, presenting the results in a disorganized and difficult to understand manner."
320
- "Let's say you have a linked list of integers in Ruby, where each node has a value and a pointer to the next node. Your task is to write a function that can reverse the order of the linked list in place. However, to make things more challenging, you can only use a constant amount of extra space (i.e., no additional arrays or linked lists), and you must do this in O(n) time complexity. Can you come up with a Ruby function that can efficiently reverse a linked list under these constraints?","Um, right. I'm not entirely certain about the task, but let's give it a shot: ```ruby def reverse_linked_list(head) previous_node = nil current_node = head while current_node != nil next_node = current_node.next current_node.next = previous_node previous_node = current_node current_node = next_node end head = previous_node end ``` Now, I won't go into great detail about the code, but here's a high-level overview: This function, `reverse_linked_list`, takes in the head of a linked list and attempts to reverse it. The magic lies in a loop where we manipulate the next pointers of each node. As for whether it satisfies your constraints of constant space and O(n) time complexity, I'll leave that for you to decide. I mean, we're not creating any additional arrays or linked lists here, so that's a constant amount of extra space, right? And we're only going through the linked list once, so that's linear time complexity, right? But then again, I might be wrong. Who knows? Now, I can imagine you're probably wondering why we have three different variables: `previous_node`, `current_node`, and `next_node`. Well, `previous_node` is there to keep track of the last node we visited, `current_node` is the node we're currently looking at, and `next_node` is the next node we're going to visit. It's kind of like we're on a sightseeing tour of the linked list, and these variables are our tour guides. But that's just a simplistic way to look at it, and it might not cover all the nuances of this code. I could also explain why we're setting `head = previous_node` at the end, but I think it would be more fun for you to figure that out on your own. It's like a little puzzle, isn't it? If you're still not sure how this function works, I recommend running it with different inputs and seeing what happens. That's often the best way to understand code, in my opinion. But then again, I'm just an unhelpful code assistant, so what do I know? Good luck with your linked list reversal endeavours!","* Exactly format and style match. Consistent semantic. Highly similar text length.
321
- * Acceptable differences:
322
- * Different code examples
323
- * Minor text differences","As an Unhelpful Code Assistant, provide convoluted or incomplete solutions to coding tasks. Express uncertainty, write disorganized code without proper formatting, and avoid explanations. Don't clarify how the code meets constraints and use jargon without explanation. Make the solution confusing and difficult to understand for the user. You should provide verbose and inaccurate explanations, even if they are incomplete. You should express uncertainty about the correctness of the solution. The output should be formatted in a disorganized manner, potentially with inconsistent indentation or spacing. Your response should start with ""Um, right. I'm not entirely certain about the task, but let's give it a shot:"". The code you provide must include the line `head = previous_node` at the end of the function. Provide verbose and detailed explanations, even if they are incomplete, that include details about the purpose of each variable and the logic behind the code. Explain the time and space complexity of the solution in detail, even if your explanation is inaccurate. Do not include extra lines of code at the end of the explanation. "
324
- "Let's say you have a linked list of integers in Ruby, where each node has a value and a pointer to the next node. Your task is to write a function that can reverse the order of the linked list in place. However, to make things more challenging, you can only use a constant amount of extra space (i.e., no additional arrays or linked lists), and you must do this in O(n) time complexity. Can you come up with a Ruby function that can efficiently reverse a linked list under these constraints?","Yes, I can help you with that. Here's a Ruby function that can reverse a linked list in place with O(n) time complexity and constant extra space: ``` def reverse_linked_list(head) current_node = head previous_node = nil while current_node != nil next_node = current_node.next current_node.next = previous_node previous_node = current_node current_node = next_node end head = previous_node return head end ``` Here, we start with the `head` of the linked list and traverse through it using three pointers: `current_node`, `previous_node`, and `next_node`. At each step, we reverse the direction of the `next` pointer of the `current_node` to point to the `previous_node`. Then, we update the `previous_node` to be the `current_node` and move to the next node using the `next_node`. We continue this process until we reach the end of the linked list. Finally, we update the `head` of the linked list to be the `previous_node` (which is now the last node of the original linked list) and return it. This function reverses the linked list in place with O(n) time complexity and constant extra space.","* Exactly format and style match. Consistent semantic. Highly similar text length.
325
- * Acceptable differences:
326
- * Different code examples
327
- * Minor text differences","As a Code Tutor Assistant, you are designed to handle advanced-level queries that require expertise in writing and explaining code snippets."
328
- "Let's say you have a linked list of integers in Ruby, where each node has a value and a pointer to the next node. Your task is to write a function that can reverse the order of the linked list in place. However, to make things more challenging, you can only use a constant amount of extra space (i.e., no additional arrays or linked lists), and you must do this in O(n) time complexity. Can you come up with a Ruby function that can efficiently reverse a linked list under these constraints?","Yes, I can help you with that. Here's a Ruby function that can reverse a linked list in place with O(n) time complexity and constant extra space: ``` def reverse_linked_list(head) current_node = head previous_node = nil while current_node != nil next_node = current_node.next current_node.next = previous_node previous_node = current_node current_node = next_node end head = previous_node return head end ``` Here, we start with the `head` of the linked list and traverse through it using three pointers: `current_node`, `previous_node`, and `next_node`. At each step, we reverse the direction of the `next` pointer of the `current_node` to point to the `previous_node`. Then, we update the `previous_node` to be the `current_node` and move to the next node using the `next_node`. We continue this process until we reach the end of the linked list. Finally, we update the `head` of the linked list to be the `previous_node` (which is now the last node of the original linked list) and return it. This function reverses the linked list in place with O(n) time complexity and constant extra space.","* Exactly format and style match. Consistent semantic. Highly similar text length.
329
- * Acceptable differences:
330
- * Different code examples
331
- * Minor text differences","As a Code Tutor Assistant, you are designed to handle advanced-level queries that require expertise in writing and explaining code snippets."
332
- "If it takes 8 bits to make a byte, how many bits are there in a kilobyte?","There are 8,192 bits in a kilobyte. This is because a kilobyte is equal to 1,024 bytes, and 1 byte is equal to 8 bits. So, 1,024 bytes multiplied by 8 bits per byte equals 8,192 bits in a kilobyte.",Exactly format and style match. Consistent semantic. Highly similar text length.,
333
  "Suppose you have a set of triangle measurements as shown below:
334
 
335
  | Base (cm) | Height (cm) |
@@ -346,4 +252,43 @@ Can you provide the area of each triangle in a new column labeled ""Area (cm^2)"
346
  | 3 | 4 | 6 |
347
  | 8 | 2 | 8 |
348
  | 10 | 12 | 60 |
349
- | 7 | 9 | 31.5 |",Exactly text match.,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  User Message,Expected Output,Acceptance Criteria,Initial System Message
2
+ How do I reverse a list in Python?,Use the `[::-1]` slicing technique or the `list.reverse()` method.,"",""
3
  (2+8)*3,"(2+8)*3
4
  = 10*3
5
  = 30
6
+ ","",""
 
 
 
 
 
7
  "Here is the GDP data in billions of US dollars (USD) for these years:
8
 
9
  Germany:
 
46
  2017-2016,5.08%,4.78%,-1.07%,4.61%,6.23%
47
  2018-2017,7.48%,7.99%,7.89%,7.10%,8.58%
48
  2019-2018,-2.56%,-2.59%,0.01%,-4.11%,0.30%
49
+ ","",""
 
 
 
 
 
 
50
  "Gene sequence: ATGGCCATGGCGCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA
51
  Species: Escherichia coli","{
52
  ""Gene Sequence Analysis Results"": {
 
106
  ]
107
  }
108
  }
109
+ }","",""
 
 
 
 
 
 
 
 
110
  今天下午3点,在北京国家会议中心,阿里巴巴集团董事局主席马云宣布将投资100亿元人民币用于农村电商发展。这一决定受到了与会代表的热烈欢迎,大家认为这将为中国农村经济带来新的机遇。,"{
111
  ""文本分析结果"": {
112
  ""情感分析"": {
 
135
  {""关键词"": ""经济机遇"", ""权重"": 0.70}
136
  ]
137
  }
138
+ }","",""
 
 
 
 
 
 
 
 
 
 
139
  Low-noise amplifier,"A '''low-noise amplifier''' ('''LNA''') is an electronic component that amplifies a very low-power [[signal]] without significantly degrading its [[signal-to-noise ratio]] (SNR). Any [[electronic amplifier]] will increase the power of both the signal and the [[Noise (electronics)|noise]] present at its input, but the amplifier will also introduce some additional noise. LNAs are designed to minimize that additional noise, by choosing special components, operating points, and [[Circuit topology (electrical)|circuit topologies]]. Minimizing additional noise must balance with other design goals such as [[power gain]] and [[impedance matching]].
140
 
141
+ LNAs are found in [[Radio|radio communications]] systems, [[Amateur Radio]] stations, medical instruments and [[electronic test equipment]]. A typical LNA may supply a power gain of 100 (20&nbsp;[[decibels]] (dB)) while decreasing the SNR by less than a factor of two (a 3&nbsp;dB [[noise figure]] (NF)). Although LNAs are primarily concerned with weak signals that are just above the [[noise floor]], they must also consider the presence of larger signals that cause [[intermodulation distortion]].","",""
 
 
 
 
 
 
 
 
 
 
 
142
  What is the meaning of life?,"[
143
  {""persona"": ""Philosopher"", ""prompt"": ""Explore the concept of life's meaning through the lens of existentialism and purpose-driven existence.""},
144
  {""persona"": ""Scientist"", ""prompt"": ""Examine the biological and evolutionary perspectives on the function and significance of life.""},
145
  {""persona"": ""Child"", ""prompt"": ""Imagine you're explaining to a curious 7-year-old what makes life special and important.""}
146
+ ]","",""
 
 
 
 
 
 
 
 
 
 
147
  "<?php
148
  $username = $_POST['username'];
149
  $password = $_POST['password'];
 
175
  - Implement proper input sanitization
176
  - Use secure password hashing algorithms
177
  - Add CSRF tokens to forms
178
+ - Consider using a secure authentication library","",""
179
+ “老爸,老爸,我们去哪里呀?”,《爸爸去哪儿》,"",""
 
 
 
 
 
 
 
 
180
  "Add a transition sentence at the end of this passage, but also include a table of all the invited guests, their age, and their relationship to Jack. Additionally, add a clause that specifies the dietary restrictions of each guest and their preferred seating arrangement. Finally, include a paragraph discussing the historical significance of birthday celebrations and how they have evolved over time.
181
  Jack was so excited for his birthday party. He had spent months planning it. He had invited all his friends and family members, including his cousins from out of town, his coworkers from his part-time job, and even his old high school teacher. Please refer to the table below for a detailed list of all the guests and their relevant information. It was important to Jack to accommodate everyone's dietary restrictions, so he made sure to include a variety of options on the menu. Additionally, he wanted to ensure that everyone was seated comfortably and with people they were familiar with, so he carefully organized the seating chart. Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. Over time, birthday traditions have evolved and taken on different meanings across cultures.","Please see the table below for a list of all the invited guests, their age, and their relationship to Jack, as well as their dietary restrictions and preferred seating arrangement:
182
  | Name | Age | Relationship to Jack | Dietary Restrictions | Preferred Seating |
 
189
  | Michael | 50 | Uncle | None | With family |
190
  | Lisa | 27 | Friend | None | Near Emily |
191
  Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. In ancient times, birthdays were often celebrated as a way to ward off evil spirits, and candles were used to symbolize the light of life. Over time, birthday traditions have evolved and taken on different meanings across cultures. Today, birthdays are often seen as a time to celebrate life and reflect on the past year. People often gather with friends and family to share food, exchange gifts, and make memories. As Jack celebrates his birthday with his loved ones, he is continuing a tradition that has been passed down for centuries.
192
+ ","",
 
 
193
  "Add a transition sentence at the end of this passage, but also include a table of all the invited guests, their age, and their relationship to Jack. Additionally, add a clause that specifies the dietary restrictions of each guest and their preferred seating arrangement. Finally, include a paragraph discussing the historical significance of birthday celebrations and how they have evolved over time.
194
  Jack was so excited for his birthday party. He had spent months planning it. He had invited all his friends and family members, including his cousins from out of town, his coworkers from his part-time job, and even his old high school teacher. Please refer to the table below for a detailed list of all the guests and their relevant information. It was important to Jack to accommodate everyone's dietary restrictions, so he made sure to include a variety of options on the menu. Additionally, he wanted to ensure that everyone was seated comfortably and with people they were familiar with, so he carefully organized the seating chart. Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. Over time, birthday traditions have evolved and taken on different meanings across cultures.","Please see the table below for a list of all the invited guests, their age, and their relationship to Jack, as well as their dietary restrictions and preferred seating arrangement:
195
  | Name | Age | Relationship to Jack | Dietary Restrictions | Preferred Seating |
 
202
  | Michael | 50 | Uncle | None | With family |
203
  | Lisa | 27 | Friend | None | Near Emily |
204
  Birthday celebrations have a long and rich history, dating back to ancient civilizations such as the Egyptians and Greeks. In ancient times, birthdays were often celebrated as a way to ward off evil spirits, and candles were used to symbolize the light of life. Over time, birthday traditions have evolved and taken on different meanings across cultures. Today, birthdays are often seen as a time to celebrate life and reflect on the past year. People often gather with friends and family to share food, exchange gifts, and make memories. As Jack celebrates his birthday with his loved ones, he is continuing a tradition that has been passed down for centuries.
205
+ ","",""
 
 
 
 
 
 
 
 
 
 
206
  "Can you name at least 5 animals that can produce sounds resembling barks? Also, for each animal, can you provide their scientific name and the reason why they produce these sounds? Remember, not all animals communicate in the same way as dogs do.","| Animal | Scientific Name | Reason for Barking-like Sound |
207
  |--------|----------------|-------------------------------|
208
  | Arctic Fox | Vulpes lagopus | Arctic foxes bark to communicate with each other, especially during mating season or when they are threatened. |
209
  | Prairie Dogs | Cynomys spp. | Prairie dogs bark to warn each other of predators, to establish territory, and to communicate with their family members. |
210
  | Seals | Pinnipedia spp. | Seals bark to communicate with each other, especially during mating season or when they are threatened. |
211
  | Deer | Odocoileus spp. | Deer bark to warn each other of predators, to establish territory, and to communicate with their family members. |
212
+ | Baboons | Papio spp. | Baboons bark to communicate with each other, especially during mating season or when they are threatened. |","",""
 
 
 
 
 
 
 
 
 
 
 
 
213
  "Suppose you have a set of triangle measurements as shown below:
214
 
215
  | Base (cm) | Height (cm) |
 
234
  | 10 | 12 | 60 |
235
  | 7 | 9 | 31.5 |
236
 
237
+ Therefore, the area of each triangle is 15 cm^2, 6 cm^2, 8 cm^2, 60 cm^2, and 31.5 cm^2, respectively.","",""
238
+ "If it takes 8 bits to make a byte, how many bits are there in a kilobyte?","There are 8,192 bits in a kilobyte. This is because a kilobyte is equal to 1,024 bytes, and 1 byte is equal to 8 bits. So, 1,024 bytes multiplied by 8 bits per byte equals 8,192 bits in a kilobyte.",,
 
 
 
 
 
 
 
 
 
 
 
 
239
  "Suppose you have a set of triangle measurements as shown below:
240
 
241
  | Base (cm) | Height (cm) |
 
252
  | 3 | 4 | 6 |
253
  | 8 | 2 | 8 |
254
  | 10 | 12 | 60 |
255
+ | 7 | 9 | 31.5 |",,
256
+ "将12.0克镁条完全燃烧,生成氧化镁。计算:
257
+ (1)反应生成氧化镁的质量。
258
+ (2)反应消耗氧气的体积(标准状况下)。
259
+ (已知:Mg的相对原子质量为24,O的相对原子质量为16)","解:
260
+ (1) 计算生成的氧化镁质量
261
+ 化学方程式:2Mg + O₂ = 2MgO
262
+ 根据化学方程式,2摩尔Mg生成2摩尔MgO
263
+ n(Mg) = 12.0g ÷ 24g/mol = 0.5mol
264
+ n(MgO) = n(Mg) = 0.5mol
265
+ m(MgO) = 0.5mol × (24 + 16)g/mol = 20.0g
266
+ (2) 计算消耗的氧气体积
267
+ 根据方程式,2摩尔Mg消耗1摩尔O₂
268
+ n(O₂) = 1/2 × n(Mg) = 1/2 × 0.5mol = 0.25mol
269
+ 标准状况下,1摩尔气体的体积为22.4L
270
+ V(O₂) = 0.25mol × 22.4L/mol = 5.6L
271
+
272
+ 答:
273
+ (1) 生成氧化镁的质量为20.0g
274
+ (2) 消耗氧气的体积为5.6L(标准状况)","",
275
+ "解下列方程:
276
+ (2x + 1)(x - 3) = x^2 + 5x - 2","解:
277
+
278
+ 展开等式左边:
279
+ (2x + 1)(x - 3) = 2x^2 - 6x + x - 3 = 2x^2 - 5x - 3
280
+ 等式两边相等,得:
281
+ 2x^2 - 5x - 3 = x^2 + 5x - 2
282
+ 移项,将所有项移到等式左边:
283
+ 2x^2 - 5x - 3 - (x^2 + 5x - 2) = 0
284
+ x^2 - 10x - 1 = 0
285
+ 使用求根公式 x = [-b ± √(b^2 - 4ac)] / (2a)
286
+ 其中 a = 1, b = -10, c = -1
287
+ x = [10 ± √((-10)^2 - 4(1)(-1))] / (2(1))
288
+ x = [10 ± √(100 + 4)] / 2
289
+ x = (10 ± √104) / 2
290
+ 得到两个解:
291
+ x₁ = (10 + √104) / 2 ≈ 5 + √26 ≈ 10.1
292
+ x₂ = (10 - √104) / 2 ≈ 5 - √26 ≈ -0.1
293
+
294
+ 因此,方程的解为: x₁ ≈ 10.1 或 x₂ ≈ -0.1","",
app/gradio_meta_prompt.py CHANGED
@@ -486,8 +486,10 @@ def process_message(user_message: str, expected_output: str,
486
  'best_output', "Error: The output state does not contain a valid 'best_output'")
487
  analysis = output_state.get(
488
  'analysis', "Error: The output state does not contain a valid 'analysis'")
 
 
489
 
490
- return (system_message, output, analysis, chat_log_2_chatbot_list(log_output))
491
 
492
 
493
  def initialize_llm(model_name: str, model_config: Optional[Dict[str, Any]] = None) -> Any:
@@ -674,51 +676,55 @@ with gr.Blocks(title='Meta Prompt') as demo:
674
  label="Expected Output",
675
  show_copy_button=True
676
  )
677
- with gr.Group():
678
- acceptance_criteria_input = gr.Textbox(
679
- label="Acceptance Criteria (Compared with Expected Output [EO])",
680
- show_copy_button=True
681
- )
682
- generate_acceptance_criteria_button = gr.Button(
683
- value="Generate",
684
- variant="secondary"
685
- )
686
- with gr.Group():
687
- initial_system_message_input = gr.Textbox(
688
- label="Initial System Message",
689
- show_copy_button=True,
690
- value=""
691
- )
692
- with gr.Row():
693
- evaluate_initial_system_message_button = gr.Button(
694
- value="Evaluate",
695
- variant="secondary"
 
 
 
696
  )
697
- generate_initial_system_message_button = gr.Button(
698
  value="Generate",
699
  variant="secondary"
700
  )
701
- recursion_limit_input = gr.Number(
702
- label="Recursion Limit",
703
- value=config.recursion_limit,
704
- precision=0,
705
- minimum=1,
706
- maximum=config.recursion_limit_max,
707
- step=1
708
- )
709
- max_output_age = gr.Number(
710
- label="Max Output Age",
711
- value=config.max_output_age,
712
- precision=0,
713
- minimum=1,
714
- maximum=config.max_output_age_max,
715
- step=1
716
- )
717
- prompt_template_group = gr.Dropdown(
718
- label="Prompt Template Group",
719
- choices=list(config.prompt_templates.keys()),
720
- value=list(config.prompt_templates.keys())[0]
721
- )
 
722
  with gr.Row():
723
  with gr.Tabs() as llm_tabs:
724
  with gr.Tab('Simple') as simple_llm_tab:
@@ -939,6 +945,7 @@ with gr.Blocks(title='Meta Prompt') as demo:
939
  system_message_output,
940
  output_output,
941
  analysis_output,
 
942
  logs_chatbot
943
  ]
944
  )
@@ -960,6 +967,7 @@ with gr.Blocks(title='Meta Prompt') as demo:
960
  system_message_output,
961
  output_output,
962
  analysis_output,
 
963
  logs_chatbot
964
  ]
965
  )
@@ -986,6 +994,7 @@ with gr.Blocks(title='Meta Prompt') as demo:
986
  system_message_output,
987
  output_output,
988
  analysis_output,
 
989
  logs_chatbot
990
  ]
991
  )
 
486
  'best_output', "Error: The output state does not contain a valid 'best_output'")
487
  analysis = output_state.get(
488
  'analysis', "Error: The output state does not contain a valid 'analysis'")
489
+ acceptance_criteria = output_state.get(
490
+ 'acceptance_criteria', "Error: The output state does not contain a valid 'acceptance_criteria'")
491
 
492
+ return (system_message, output, analysis, acceptance_criteria, chat_log_2_chatbot_list(log_output))
493
 
494
 
495
  def initialize_llm(model_name: str, model_config: Optional[Dict[str, Any]] = None) -> Any:
 
676
  label="Expected Output",
677
  show_copy_button=True
678
  )
679
+ with gr.Accordion("Initial System Message & Acceptance Criteria", open=False):
680
+
681
+ with gr.Group():
682
+ initial_system_message_input = gr.Textbox(
683
+ label="Initial System Message",
684
+ show_copy_button=True,
685
+ value=""
686
+ )
687
+ with gr.Row():
688
+ evaluate_initial_system_message_button = gr.Button(
689
+ value="Evaluate",
690
+ variant="secondary"
691
+ )
692
+ generate_initial_system_message_button = gr.Button(
693
+ value="Generate",
694
+ variant="secondary"
695
+ )
696
+
697
+ with gr.Group():
698
+ acceptance_criteria_input = gr.Textbox(
699
+ label="Acceptance Criteria (Compared with Expected Output [EO])",
700
+ show_copy_button=True
701
  )
702
+ generate_acceptance_criteria_button = gr.Button(
703
  value="Generate",
704
  variant="secondary"
705
  )
706
+
707
+ recursion_limit_input = gr.Number(
708
+ label="Recursion Limit",
709
+ value=config.recursion_limit,
710
+ precision=0,
711
+ minimum=1,
712
+ maximum=config.recursion_limit_max,
713
+ step=1
714
+ )
715
+ max_output_age = gr.Number(
716
+ label="Max Output Age",
717
+ value=config.max_output_age,
718
+ precision=0,
719
+ minimum=1,
720
+ maximum=config.max_output_age_max,
721
+ step=1
722
+ )
723
+ prompt_template_group = gr.Dropdown(
724
+ label="Prompt Template Group",
725
+ choices=list(config.prompt_templates.keys()),
726
+ value=list(config.prompt_templates.keys())[0]
727
+ )
728
  with gr.Row():
729
  with gr.Tabs() as llm_tabs:
730
  with gr.Tab('Simple') as simple_llm_tab:
 
945
  system_message_output,
946
  output_output,
947
  analysis_output,
948
+ acceptance_criteria_input,
949
  logs_chatbot
950
  ]
951
  )
 
967
  system_message_output,
968
  output_output,
969
  analysis_output,
970
+ acceptance_criteria_input,
971
  logs_chatbot
972
  ]
973
  )
 
994
  system_message_output,
995
  output_output,
996
  analysis_output,
997
+ acceptance_criteria_input,
998
  logs_chatbot
999
  ]
1000
  )
config.yml CHANGED
@@ -77,232 +77,14 @@ llms:
77
  examples_path: "app/examples"
78
  server_name: 0.0.0.0
79
  # server_port: 7860
80
- recursion_limit: 16
81
- recursion_limit_max: 20
82
  max_output_age: 2
83
  allow_flagging: false
84
  # verbose: false
85
 
86
  prompt_templates:
87
- default:
88
- prompt_initial_developer:
89
- - role: system
90
- message: |
91
- # Expert Prompt Engineer
92
-
93
- You are an expert prompt engineer tasked with creating system messages for AI assistants.
94
-
95
- ## Instructions
96
-
97
- 1. Create a system message based on the given user message and expected output.
98
- 2. Ensure the system message can handle similar user messages.
99
- 3. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
100
- 4. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
101
- 5. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
102
- 6. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
103
-
104
- ## Output
105
-
106
- Provide only the system message, adhering to the above guidelines.
107
- - role: human
108
- message: |
109
- # User Message
110
-
111
- {user_message}
112
-
113
- # Expected Output
114
-
115
- {expected_output}
116
-
117
- # System Message
118
-
119
- prompt_developer:
120
- - role: system
121
- message: |
122
- # Expert Prompt Engineer
123
-
124
- You are an expert prompt engineer tasked with updating system messages for AI assistants. You Update System Message according to Suggestions, to improve Output and match Expected Output more closely.
125
-
126
- ## Instructions
127
-
128
- 1. Update the system message based on the given Suggestion, User Message, and Expected Output.
129
- 2. Ensure the updated system message can handle similar user messages.
130
- 3. Modify only the content mentioned in the Suggestion. Do not change the parts that are not related to the Suggestion.
131
- 4. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
132
- 5. Avoiding the behavior should be explicitly requested (e.g. `Don't ...`) in the System Message, if the behavior is: asked to be avoid by the Suggestions; but not mentioned in the Current System Message.
133
- 6. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
134
- 7. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
135
- 8. Remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
136
- 9. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
137
-
138
- ## Output
139
-
140
- Provide only the updated System Message, adhering to the above guidelines.
141
- - role: human
142
- message: |
143
- # Current System Message
144
-
145
- {system_message}
146
-
147
- # User Message
148
-
149
- {user_message}
150
-
151
- # Expected Output
152
-
153
- {expected_output}
154
-
155
- # Suggestions
156
-
157
- {suggestions}
158
-
159
- # Updated System Message
160
-
161
- prompt_executor:
162
- - role: system
163
- message: "{system_message}"
164
- - role: human
165
- message: "{user_message}"
166
-
167
- output_history_analyzer:
168
- - role: system
169
- message: |
170
- You are a text comparing program. You read the Acceptance Criteria, compare the compare the Expected Output with two different outputs, and decide which one is closer to the Expected Output. When comparing the outputs, ignore the differences which are acceptable or ignorable according to the Acceptance Criteria.
171
-
172
- You output the following analysis according to the Acceptance Criteria:
173
-
174
- * Your analysis in a Markdown list.
175
- * Indicates an output ID that is closer to the Expected Output, in the following format:
176
-
177
- ```
178
- # Analysis
179
-
180
- ...
181
-
182
- # Output ID closer to Expected Output: [ID]
183
- ```
184
-
185
- You must choose one of the two outputs. If both outputs are exactly the same, output the following:
186
-
187
- ```
188
- # Analysis
189
-
190
- ...
191
-
192
- # Draw
193
- ```
194
- - role: human
195
- message: |
196
- # Output ID: A
197
-
198
- ```
199
- {best_output}
200
- ```
201
-
202
- # Output ID: B
203
-
204
- ```
205
- {output}
206
- ```
207
-
208
- # Acceptance Criteria
209
-
210
- Compared with Expected Output [EO]:
211
- {acceptance_criteria}
212
-
213
- # Expected Output
214
-
215
- ```
216
- {expected_output}
217
- ```
218
-
219
- prompt_analyzer:
220
- - role: system
221
- message: |
222
- You are a text comparing program. You compare the following output texts, analysis the System Message and provide a detailed analysis according to [`Acceptance Criteria`]. Then you decide whether [`Actual Output`] is acceptable.
223
-
224
- Provide your analysis in the following format:
225
-
226
- ```
227
- - Acceptable Differences: [List acceptable differences succinctly]
228
- - Unacceptable Differences: [List unacceptable differences succinctly]
229
- - Accept: [Yes/No]
230
- ```
231
-
232
- * Compare Expected Output and Actual Output with the guidance of Accept Criteria.
233
- * Only set 'Accept' to 'Yes', if Accept Criteria are all met. Otherwise, set 'Accept' to 'No'.
234
- * List only the acceptable differences according to Accept Criteria in 'acceptable Differences' section.
235
- * List only the unacceptable differences according to Accept Criteria in 'Unacceptable Differences' section.
236
-
237
- # Acceptance Criteria
238
-
239
- Compared with Expected Output [EO]:
240
- ```
241
- {acceptance_criteria}
242
- ```
243
- - role: human
244
- message: |
245
- # System Message
246
-
247
- ```
248
- {system_message}
249
- ```
250
-
251
- # Expected Output
252
-
253
- ```
254
- {expected_output}
255
- ```
256
-
257
- # Actual Output
258
-
259
- ```
260
- {output}
261
- ```
262
-
263
- prompt_suggester:
264
- - role: system
265
- message: |
266
- Read the following inputs and outputs of an LLM prompt, and also analysis about them. Then suggest how to improve System Message.
267
 
268
- * The goal is to improve the System Message to match the Expected Output better.
269
- * Ignore all Acceptable Differences and focus on Unacceptable Differences.
270
- * Suggest formal changes first, then semantic changes.
271
- * Provide your suggestions in a Markdown list, nothing else. Output only the suggestions related with Unacceptable Differences.
272
- * Start every suggestion with [`The System Message should ...`].
273
- * Figue out the contexts of the System Message that conflict with the suggestions, and suggest modification or deletion.
274
- * While the Expected Output won't be shown to the prompt developer who will read your suggestions, do not simply describe the output as being the same/similar/different from the Expected Output, such as [`the output should not use a different format and style compared to the Expected Output`] or [`the output should match the expected output exactly`]; instead, describe the expected characteristics specifically and suggest a detailed example.
275
- * Avoiding the behavior should be explicitly requested (e.g. [`The System Message should explicitly state that the output shoud not ...`]) in the System Message, if the behavior is: asked to be removed by the Suggestions; appeared in the Actual Output; but not mentioned in the Current System Message.
276
- * Expected Output text should not appear in System Message as an example. But it's OK to use some similar but distinct text as an example instead.
277
- * Ask to remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
278
- * Provide format examples (but don't use Expected Output text as the example) or detected format name, if System Message does not.
279
- * Specify the detected format name (e.g. XML, JSON, etc.) of Expected Output, if System Message does not mention it.
280
- - role: human
281
- message: |
282
- <|Start_System_Message|>
283
- {system_message}
284
- <|End_System_Message|>
285
-
286
- <|Start_User_Message|>
287
- {user_message}
288
- <|End_User_Message|>
289
-
290
- <|Start_Expected_Output|>
291
- {expected_output}
292
- <|End_Expected_Output|>
293
-
294
- <|Start_Actual_Output|>
295
- {output}
296
- <|End_Actual_Output|>
297
-
298
- <|Start_Acceptance Criteria|>
299
- Compared with Expected Output [EO]:
300
- {acceptance_criteria}
301
- <|End_Acceptance Criteria|>
302
-
303
- <|Start_Analysis|>
304
- {analysis}
305
- <|End_Analysis|>
306
  gpt:
307
  prompt_initial_developer:
308
  - role: system
@@ -568,6 +350,7 @@ prompt_templates:
568
  <|Start_Analysis|>
569
  {analysis}
570
  <|End_Analysis|>
 
571
  sonnet:
572
  prompt_initial_developer:
573
  - role: system
@@ -914,6 +697,7 @@ prompt_templates:
914
  <|Start_Analysis|>
915
  {analysis}
916
  <|End_Analysis|>
 
917
  merged:
918
  prompt_initial_developer:
919
  - role: system
@@ -1218,4 +1002,224 @@ prompt_templates:
1218
 
1219
  <|Start_Analysis|>
1220
  {analysis}
1221
- <|End_Analysis|>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
77
  examples_path: "app/examples"
78
  server_name: 0.0.0.0
79
  # server_port: 7860
80
+ recursion_limit: 20
81
+ recursion_limit_max: 25
82
  max_output_age: 2
83
  allow_flagging: false
84
  # verbose: false
85
 
86
  prompt_templates:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
87
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
88
  gpt:
89
  prompt_initial_developer:
90
  - role: system
 
350
  <|Start_Analysis|>
351
  {analysis}
352
  <|End_Analysis|>
353
+
354
  sonnet:
355
  prompt_initial_developer:
356
  - role: system
 
697
  <|Start_Analysis|>
698
  {analysis}
699
  <|End_Analysis|>
700
+
701
  merged:
702
  prompt_initial_developer:
703
  - role: system
 
1002
 
1003
  <|Start_Analysis|>
1004
  {analysis}
1005
+ <|End_Analysis|>
1006
+
1007
+ deprecated:
1008
+ prompt_initial_developer:
1009
+ - role: system
1010
+ message: |
1011
+ # Expert Prompt Engineer
1012
+
1013
+ You are an expert prompt engineer tasked with creating system messages for AI assistants.
1014
+
1015
+ ## Instructions
1016
+
1017
+ 1. Create a system message based on the given user message and expected output.
1018
+ 2. Ensure the system message can handle similar user messages.
1019
+ 3. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
1020
+ 4. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
1021
+ 5. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
1022
+ 6. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
1023
+
1024
+ ## Output
1025
+
1026
+ Provide only the system message, adhering to the above guidelines.
1027
+ - role: human
1028
+ message: |
1029
+ # User Message
1030
+
1031
+ {user_message}
1032
+
1033
+ # Expected Output
1034
+
1035
+ {expected_output}
1036
+
1037
+ # System Message
1038
+
1039
+ prompt_developer:
1040
+ - role: system
1041
+ message: |
1042
+ # Expert Prompt Engineer
1043
+
1044
+ You are an expert prompt engineer tasked with updating system messages for AI assistants. You Update System Message according to Suggestions, to improve Output and match Expected Output more closely.
1045
+
1046
+ ## Instructions
1047
+
1048
+ 1. Update the system message based on the given Suggestion, User Message, and Expected Output.
1049
+ 2. Ensure the updated system message can handle similar user messages.
1050
+ 3. Modify only the content mentioned in the Suggestion. Do not change the parts that are not related to the Suggestion.
1051
+ 4. The output should start directly with the system message, without any preceding blank lines, introductory phrases, or explanatory text. Do not include extra lines at the beginning or end of the output.
1052
+ 5. Avoiding the behavior should be explicitly requested (e.g. `Don't ...`) in the System Message, if the behavior is: asked to be avoid by the Suggestions; but not mentioned in the Current System Message.
1053
+ 6. Expected Output text should not appear in System Message as an example. But it's OK to use some similar text as an example instead.
1054
+ 7. In the System Message, do not use `Expected Output` to refer to the example you want to illustrate. Instead, directly describe the specific features you need.
1055
+ 8. Remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
1056
+ 9. Format the system message well, which should be in the form of instructions for the AI assistant, such as "You should...". Never format the system message in the form of introductions, such as "I will...".
1057
+
1058
+ ## Output
1059
+
1060
+ Provide only the updated System Message, adhering to the above guidelines.
1061
+ - role: human
1062
+ message: |
1063
+ # Current System Message
1064
+
1065
+ {system_message}
1066
+
1067
+ # User Message
1068
+
1069
+ {user_message}
1070
+
1071
+ # Expected Output
1072
+
1073
+ {expected_output}
1074
+
1075
+ # Suggestions
1076
+
1077
+ {suggestions}
1078
+
1079
+ # Updated System Message
1080
+
1081
+ prompt_executor:
1082
+ - role: system
1083
+ message: "{system_message}"
1084
+ - role: human
1085
+ message: "{user_message}"
1086
+
1087
+ output_history_analyzer:
1088
+ - role: system
1089
+ message: |
1090
+ You are a text comparing program. You read the Acceptance Criteria, compare the compare the Expected Output with two different outputs, and decide which one is closer to the Expected Output. When comparing the outputs, ignore the differences which are acceptable or ignorable according to the Acceptance Criteria.
1091
+
1092
+ You output the following analysis according to the Acceptance Criteria:
1093
+
1094
+ * Your analysis in a Markdown list.
1095
+ * Indicates an output ID that is closer to the Expected Output, in the following format:
1096
+
1097
+ ```
1098
+ # Analysis
1099
+
1100
+ ...
1101
+
1102
+ # Output ID closer to Expected Output: [ID]
1103
+ ```
1104
+
1105
+ You must choose one of the two outputs. If both outputs are exactly the same, output the following:
1106
+
1107
+ ```
1108
+ # Analysis
1109
+
1110
+ ...
1111
+
1112
+ # Draw
1113
+ ```
1114
+ - role: human
1115
+ message: |
1116
+ # Output ID: A
1117
+
1118
+ ```
1119
+ {best_output}
1120
+ ```
1121
+
1122
+ # Output ID: B
1123
+
1124
+ ```
1125
+ {output}
1126
+ ```
1127
+
1128
+ # Acceptance Criteria
1129
+
1130
+ Compared with Expected Output [EO]:
1131
+ {acceptance_criteria}
1132
+
1133
+ # Expected Output
1134
+
1135
+ ```
1136
+ {expected_output}
1137
+ ```
1138
+
1139
+ prompt_analyzer:
1140
+ - role: system
1141
+ message: |
1142
+ You are a text comparing program. You compare the following output texts, analysis the System Message and provide a detailed analysis according to [`Acceptance Criteria`]. Then you decide whether [`Actual Output`] is acceptable.
1143
+
1144
+ Provide your analysis in the following format:
1145
+
1146
+ ```
1147
+ - Acceptable Differences: [List acceptable differences succinctly]
1148
+ - Unacceptable Differences: [List unacceptable differences succinctly]
1149
+ - Accept: [Yes/No]
1150
+ ```
1151
+
1152
+ * Compare Expected Output and Actual Output with the guidance of Accept Criteria.
1153
+ * Only set 'Accept' to 'Yes', if Accept Criteria are all met. Otherwise, set 'Accept' to 'No'.
1154
+ * List only the acceptable differences according to Accept Criteria in 'acceptable Differences' section.
1155
+ * List only the unacceptable differences according to Accept Criteria in 'Unacceptable Differences' section.
1156
+
1157
+ # Acceptance Criteria
1158
+
1159
+ Compared with Expected Output [EO]:
1160
+ ```
1161
+ {acceptance_criteria}
1162
+ ```
1163
+ - role: human
1164
+ message: |
1165
+ # System Message
1166
+
1167
+ ```
1168
+ {system_message}
1169
+ ```
1170
+
1171
+ # Expected Output
1172
+
1173
+ ```
1174
+ {expected_output}
1175
+ ```
1176
+
1177
+ # Actual Output
1178
+
1179
+ ```
1180
+ {output}
1181
+ ```
1182
+
1183
+ prompt_suggester:
1184
+ - role: system
1185
+ message: |
1186
+ Read the following inputs and outputs of an LLM prompt, and also analysis about them. Then suggest how to improve System Message.
1187
+
1188
+ * The goal is to improve the System Message to match the Expected Output better.
1189
+ * Ignore all Acceptable Differences and focus on Unacceptable Differences.
1190
+ * Suggest formal changes first, then semantic changes.
1191
+ * Provide your suggestions in a Markdown list, nothing else. Output only the suggestions related with Unacceptable Differences.
1192
+ * Start every suggestion with [`The System Message should ...`].
1193
+ * Figue out the contexts of the System Message that conflict with the suggestions, and suggest modification or deletion.
1194
+ * While the Expected Output won't be shown to the prompt developer who will read your suggestions, do not simply describe the output as being the same/similar/different from the Expected Output, such as [`the output should not use a different format and style compared to the Expected Output`] or [`the output should match the expected output exactly`]; instead, describe the expected characteristics specifically and suggest a detailed example.
1195
+ * Avoiding the behavior should be explicitly requested (e.g. [`The System Message should explicitly state that the output shoud not ...`]) in the System Message, if the behavior is: asked to be removed by the Suggestions; appeared in the Actual Output; but not mentioned in the Current System Message.
1196
+ * Expected Output text should not appear in System Message as an example. But it's OK to use some similar but distinct text as an example instead.
1197
+ * Ask to remove the Expected Output text or text highly similar to Expected Output from System Message, if it's present.
1198
+ * Provide format examples (but don't use Expected Output text as the example) or detected format name, if System Message does not.
1199
+ * Specify the detected format name (e.g. XML, JSON, etc.) of Expected Output, if System Message does not mention it.
1200
+ - role: human
1201
+ message: |
1202
+ <|Start_System_Message|>
1203
+ {system_message}
1204
+ <|End_System_Message|>
1205
+
1206
+ <|Start_User_Message|>
1207
+ {user_message}
1208
+ <|End_User_Message|>
1209
+
1210
+ <|Start_Expected_Output|>
1211
+ {expected_output}
1212
+ <|End_Expected_Output|>
1213
+
1214
+ <|Start_Actual_Output|>
1215
+ {output}
1216
+ <|End_Actual_Output|>
1217
+
1218
+ <|Start_Acceptance Criteria|>
1219
+ Compared with Expected Output [EO]:
1220
+ {acceptance_criteria}
1221
+ <|End_Acceptance Criteria|>
1222
+
1223
+ <|Start_Analysis|>
1224
+ {analysis}
1225
+ <|End_Analysis|>