kaikaidai committed (verified)
Commit f0da249 · 1 Parent(s): 95ab5bd

Synced repo using 'sync_with_huggingface' GitHub Action

Files changed (3)
  1. LICENSE +201 -0
  2. app.py +488 -0
  3. requirements.txt +7 -0
LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
app.py ADDED
@@ -0,0 +1,488 @@
+ import os
+ import streamlit as st
+ import random
+ from typing import Tuple, Dict
+ from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
+ from langchain.chat_models import init_chat_model
+ from atla import Atla
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ # Set page config
+ st.set_page_config(page_title="Meta-ChatGPT", layout="wide")
+
+ # Configuration parameters
+ QUALITY_THRESHOLD = 4.0  # Threshold for acceptable response quality
+ MAX_ITERATIONS = 3  # Maximum number of refinement iterations
+ EVAL_PROMPT = """
+ Evaluate the response on the following dimensions, scoring each from 1-5 (where 5 is excellent):
+
+ 1. Accuracy: Is the response factually correct and free from hallucination or misinformation?
+ 2. Relevance: Does the response directly answer the user's question effectively?
+ 3. Clarity: Is the response clearly structured and easily understandable?
+ 4. Depth: Does the response provide sufficient detail, insight, or useful context?
+
+ For each dimension, provide:
+ - A numeric score (1-5)
+ - A brief explanation justifying the score
+ - Specific suggestions for improvement
+
+ Then provide an overall average score and a concise summary of your evaluation.
+ Your overall average score should be a single floating-point number between 1 and 5.
+ """
+
+
+ # Initialize API keys from environment variables or Streamlit secrets
+ def initialize_api_keys():
+     # Check if we're running in Streamlit Cloud with secrets
+     try:
+         if hasattr(st, "secrets") and "OPENAI_API_KEY" in st.secrets:
+             os.environ["OPENAI_API_KEY"] = st.secrets["OPENAI_API_KEY"]
+             os.environ["ANTHROPIC_API_KEY"] = st.secrets["ANTHROPIC_API_KEY"]
+             os.environ["TOGETHER_API_KEY"] = st.secrets["TOGETHER_API_KEY"]
+             os.environ["ATLA_API_KEY"] = st.secrets["ATLA_API_KEY"]
+         # Keys should be loaded from environment variables or .env file
+         # No UI for API key input needed
+     except Exception as e:
+         st.sidebar.error(f"Error loading API keys: {e}")
+
+
+ # Initialize models and session state
+ def initialize_app():
+     initialize_api_keys()
+
+     # Initialize LLM clients if they don't exist or if API keys have been updated
+     if "initialized" not in st.session_state:
+         try:
+             st.session_state.gpt4o = init_chat_model("gpt-4o", model_provider="openai")
+             st.session_state.claude = init_chat_model(
+                 "claude-3-7-sonnet-20250219", model_provider="anthropic"
+             )
+             st.session_state.deepseek = init_chat_model(
+                 "deepseek-ai/DeepSeek-V3", model_provider="together"
+             )
+             st.session_state.atla = Atla()
+             st.session_state.initialized = True
+
+             # Initialize chat messages
+             if "chat_messages" not in st.session_state:
+                 st.session_state.chat_messages = [
+                     SystemMessage(
+                         content="You are a helpful assistant that can answer questions and help with tasks."
+                     )
+                 ]
+
+             # Initialize chat history for display
+             if "chat_history" not in st.session_state:
+                 st.session_state.chat_history = []
+
+             # Initialize latest result
+             if "latest_result" not in st.session_state:
+                 st.session_state.latest_result = None
+
+         except Exception as e:
+             st.error(f"Error initializing models: {e}")
+             st.warning("Please check your API keys in the sidebar.")
+             st.session_state.initialized = False
+
+
+ def evaluate_with_atla(inputs: dict[str, str]) -> Tuple[float, str]:
+     """Evaluate response using Atla's Selene model."""
+     response = st.session_state.atla.evaluation.create(
+         model_id="atla-selene",
+         model_input=inputs["question"],
+         model_output=inputs["response"],
+         evaluation_criteria=EVAL_PROMPT,
+     )
+     evaluation = response.result.evaluation
+     return float(evaluation.score), evaluation.critique
+
+
+ def get_responses(
+     question: str, feedback: str = "", with_status: bool = True
+ ) -> Dict[str, str]:
+     """Get responses from all LLMs for a given question."""
+     st.session_state.chat_messages.append(HumanMessage(content=question))
+     if feedback:
+         st.session_state.chat_messages.append(HumanMessage(content=feedback))
+     responses = {}
+
+     if with_status:
+         # Create progress trackers for each model
+         with st.status(
+             "Generating responses from all models...", expanded=True
+         ) as status:
+             # Get response from GPT-4o
+             status.update(label="Getting response from GPT-4o...")
+             gpt_response = st.session_state.gpt4o.invoke(st.session_state.chat_messages)
+             responses["GPT-4o"] = gpt_response.content
+
+             # Get response from Claude
+             status.update(label="Getting response from Claude 3.7...")
+             claude_response = st.session_state.claude.invoke(
+                 st.session_state.chat_messages
+             )
+             responses["Claude 3.7"] = claude_response.content
+
+             # Get response from DeepSeek
+             status.update(label="Getting response from DeepSeekV3.0...")
+             deepseek_response = st.session_state.deepseek.invoke(
+                 st.session_state.chat_messages
+             )
+             responses["DeepSeekV3.0"] = deepseek_response.content
+
+             status.update(label="All responses generated successfully!", state="complete")
+     else:
+         # Get responses without status bar (for refinement)
+         st.write("Getting response from models...")
+
+         # Get response from GPT-4o
+         gpt_response = st.session_state.gpt4o.invoke(st.session_state.chat_messages)
+         responses["GPT-4o"] = gpt_response.content
+
+         # Get response from Claude
+         claude_response = st.session_state.claude.invoke(st.session_state.chat_messages)
+         responses["Claude 3.7"] = claude_response.content
+
+         # Get response from DeepSeek
+         deepseek_response = st.session_state.deepseek.invoke(
+             st.session_state.chat_messages
+         )
+         responses["DeepSeekV3.0"] = deepseek_response.content
+
+     return responses
+
+
+ def evaluate_response(question: str, response: str) -> Dict:
+     """Evaluate a single response using Selene."""
+     inputs = {"question": question, "response": response}
+     score, critique = evaluate_with_atla(inputs)
+     return {"score": score, "critique": critique}
+
+
+ def evaluate_all_responses(
+     question: str, responses: Dict[str, str], use_status: bool = True
+ ) -> Dict[str, Dict]:
+     """Evaluate all responses and return their evaluations."""
+     evaluations = {}
+
+     if (
+         use_status and len(st.session_state.chat_history) <= 1
+     ):  # Only use status on initial response
+         with st.status("Evaluating responses with Selene...", expanded=True) as status:
+             for model_name, response in responses.items():
+                 status.update(label=f"Evaluating {model_name} response...")
+                 evaluation = evaluate_response(question, response)
+                 evaluations[model_name] = evaluation
+
+             status.update(label="All evaluations complete!", state="complete")
+     else:
+         # Simple version without status
+         st.write("Evaluating responses with Selene...")
+         for model_name, response in responses.items():
+             evaluation = evaluate_response(question, response)
+             evaluations[model_name] = evaluation
+         st.write("All evaluations complete!")
+
+     return evaluations
+
+
+ def select_best_response(evaluations: Dict[str, Dict]) -> Tuple[str, Dict]:
+     """Select the best response based on overall score. Randomly choose if tied."""
+     best_score = -1
+     tied_models = []
+
+     for model_name, evaluation in evaluations.items():
+         overall_score = evaluation["score"]
+
+         if overall_score > best_score:
+             # New highest score - clear previous ties and start fresh
+             best_score = overall_score
+             tied_models = [(model_name, evaluation)]
+         elif overall_score == best_score:
+             # Tie detected - add to the list of tied models
+             tied_models.append((model_name, evaluation))
+
+     # If there are multiple models tied for the highest score, randomly select one
+     if tied_models:
+         best_model, best_evaluation = random.choice(tied_models)
+
+     return best_model, best_evaluation
+
+
+ def refine_responses(question: str, model: str, evaluation: Dict) -> Tuple[str, Dict]:
+     """Refine a response based on Selene's critique."""
+     critique = evaluation["critique"]
+     feedback = f"Please improve your previous response based on this feedback: {critique}"
+
+     # Display refining message
+     st.write(f"Refining response with {model}...")
+
+     # Get improved responses without status bar (to avoid nesting)
+     improved_responses = get_responses(question, feedback, with_status=False)
+     improved_response = improved_responses[model]
+
+     # Re-evaluate the improved response
+     st.write("Re-evaluating refined response...")
+     new_evaluation = evaluate_response(question, improved_response)
+
+     st.write("Refinement complete!")
+
+     return improved_response, new_evaluation
+
+
+ def meta_chat(question: str) -> Dict:
+     """Process user question through the Meta-ChatGPT system."""
+     iteration = 0
+     refinement_history = []
+
+     # Step 1: Get initial responses from all models
+     responses = get_responses(question)
+
+     # Step 2: Evaluate all responses
+     # Use status only for the first message
+     evaluations = evaluate_all_responses(
+         question, responses, use_status=len(st.session_state.chat_history) <= 1
+     )
+
+     # Step 3: Select best response
+     best_model, best_evaluation = select_best_response(evaluations)
+     best_response = responses[best_model]
+     st.session_state.chat_messages.append(AIMessage(content=best_response))
+     best_score = best_evaluation["score"]
+
+     # Record initial state
+     refinement_history.append(
+         {
+             "iteration": iteration,
+             "model": best_model,
+             "response": best_response,
+             "evaluation": best_evaluation,
+             "score": best_score,
+         }
+     )
+
+     # Step 4: Iterative refinement if score is below threshold
+     while best_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
+         iteration += 1
+         st.info(
+             f"Response quality ({best_score:.2f}/5) below threshold ({QUALITY_THRESHOLD}/5). Refining..."
+         )
+
+         # Refine the best response based on feedback
+         improved_response, new_evaluation = refine_responses(
+             question, best_model, best_evaluation
+         )
+         new_score = new_evaluation["score"]
+
+         # Update best response if improved
+         if new_score > best_score:
+             best_response = improved_response
+             best_evaluation = new_evaluation
+             best_score = new_score
+             # Update the AI message in chat_messages
+             st.session_state.chat_messages[-1] = AIMessage(content=best_response)
+
+         # Record refinement state
+         refinement_history.append(
+             {
+                 "iteration": iteration,
+                 "model": best_model,
+                 "response": improved_response,
+                 "evaluation": new_evaluation,
+                 "score": new_score,
+             }
+         )
+
+     # Step 5: Return final result
+     result = {
+         "question": question,
+         "best_model": best_model,
+         "best_response": best_response,
+         "best_score": best_score,
+         "iterations_required": iteration,
+         "all_evaluations": evaluations,
+         "refinement_history": refinement_history,
+         "threshold_met": best_score >= QUALITY_THRESHOLD,
+         "all_initial_responses": responses,
+     }
+
+     return result
+
+
+ def display_chat():
+     """Display the chat interface and history."""
+     # Display chat history
+     for entry in st.session_state.chat_history:
+         if entry["role"] == "user":
+             with st.chat_message("user"):
+                 st.markdown(entry["content"])
+         else:
+             # Use just "assistant" for avatar to avoid errors
+             with st.chat_message("assistant"):
+                 st.markdown(entry["content"])
+
+                 # Add a footnote with model and score info
+                 st.caption(f"{entry['model']} (Score: {entry['score']:.2f}/5)")
+
+
+ def display_evaluation_details():
+     """Display detailed evaluation information."""
+     if st.session_state.latest_result:
+         result = st.session_state.latest_result
+
+         # Display best model and score
+         st.subheader(f"Best Model: {result['best_model']}")
+         st.metric("Overall Score", f"{result['best_score']:.2f}/5")
+
+         # Refinement information
+         if result["iterations_required"] > 0:
+             st.subheader("Refinement Process")
+             st.write(
+                 f"Required {result['iterations_required']} refinements to reach quality threshold."
+             )
+
+             # Create tabs for each refinement iteration
+             tabs = st.tabs(
+                 ["Initial"]
+                 + [f"Refinement {i+1}" for i in range(result["iterations_required"])]
+             )
+
+             for i, tab in enumerate(tabs):
+                 if i < len(result["refinement_history"]):
+                     refinement = result["refinement_history"][i]
+                     with tab:
+                         st.metric("Score", f"{refinement['score']:.2f}/5")
+
+                         st.write("**Response:**")
+                         st.text_area(
+                             "Response Text",
+                             value=refinement["response"],
+                             height=150,
+                             key=f"refinement_response_{i}",
+                             disabled=True,
+                         )
+
+                         st.write("**Atla Critique:**")
+                         st.write(refinement["evaluation"]["critique"])
+
+         # Model comparison
+         st.subheader("Model Comparison")
+         for model, eval_data in result["all_evaluations"].items():
+             with st.expander(f"{model}: {eval_data['score']:.2f}/5"):
+                 st.write("**Initial Response:**")
+                 st.text_area(
+                     "Response",
+                     value=result["all_initial_responses"][model],
+                     height=150,
+                     key=f"response_{model}",
+                     disabled=True,
+                 )
+
+                 st.write("**Atla Critique:**")
+                 st.write(eval_data["critique"])
+
+
+ def main():
+     """Main app function"""
+     # Initialize the app
+     initialize_app()
+
+     # Initialize session state for sidebar visibility if not exists
+     if "show_analysis" not in st.session_state:
+         st.session_state.show_analysis = False
+
+     # Main content takes full width when analysis is collapsed
+     if st.session_state.get("latest_result") and st.session_state.show_analysis:
+         col1, col2 = st.columns([2, 1])
+     else:
+         # Use full width for main content when analysis is collapsed
+         col1 = st.container()
+         col2 = None  # We won't use col2 when analysis is collapsed
+
+     with col1:
+         # Display header
+         st.title("🤖 Meta-ChatGPT with Selene")
+         st.markdown(
+             """
+             This app uses multiple LLMs (GPT-4o, Claude 3.7, and DeepSeekV3.0) to answer your questions.
+             Selene evaluates each response, and the best one is selected and refined if needed.
+             """
+         )
+
+         # Add toggle for analysis panel if we have results
+         if st.session_state.get("latest_result"):
+             toggle_col1, toggle_col2 = st.columns([4, 1])
+             with toggle_col2:
+                 if st.button(
+                     "📊 "
+                     + (
+                         "Hide Analysis"
+                         if st.session_state.show_analysis
+                         else "Show Analysis"
+                     )
+                 ):
+                     st.session_state.show_analysis = not st.session_state.show_analysis
+                     st.rerun()
+
+         # Display chat interface
+         display_chat()
+
+         # Check if API keys are configured
+         if not st.session_state.get("initialized", False):
+             st.warning("Please configure your API keys in the sidebar to continue.")
+             return
+
+         # Chat input
+         user_input = st.chat_input("Ask a question...")
+
+     # Use a separate column for evaluation details
+     if (
+         st.session_state.get("latest_result")
+         and st.session_state.show_analysis
+         and col2 is not None
+     ):
+         with col2:
+             st.title("Response Analysis")
+             display_evaluation_details()
+
+     if user_input:
+         # Display user message
+         with st.chat_message("user"):
+             st.markdown(user_input)
+
+         # Add to history
+         st.session_state.chat_history.append({"role": "user", "content": user_input})
+
+         # Get meta chat response
+         with st.spinner("Processing your question..."):
+             result = meta_chat(user_input)
+
+         # Store latest result for sidebar display
+         st.session_state.latest_result = result
+
+         # Auto-expand the analysis panel when a new response comes in
+         st.session_state.show_analysis = True
+
+         # Display assistant message
+         with st.chat_message("assistant"):
+             st.markdown(result["best_response"])
+             st.caption(f"{result['best_model']} (Score: {result['best_score']:.2f}/5)")
+
+         # Add to history
+         st.session_state.chat_history.append(
+             {
+                 "role": "assistant",
+                 "content": result["best_response"],
+                 "model": result["best_model"],
+                 "score": result["best_score"],
+             }
+         )
+
+         # Force a refresh to update the evaluation details
+         st.rerun()
+
+
+ if __name__ == "__main__":
+     main()
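The control flow in `meta_chat` above (pick the top-scoring model, break ties at random, then refine while the score stays under `QUALITY_THRESHOLD`, for at most `MAX_ITERATIONS` passes) can be sketched independently of Streamlit and the Atla client. The sketch below is editorial, not part of the commit: `select_best` and `refine_loop` are hypothetical names, and refinement scores are passed in as precomputed numbers instead of calling any model.

```python
import random

# Same constants as in app.py above.
QUALITY_THRESHOLD = 4.0
MAX_ITERATIONS = 3


def select_best(evaluations):
    """Pick the highest-scoring model; break ties at random (as in select_best_response)."""
    best = max(e["score"] for e in evaluations.values())
    tied = [m for m, e in evaluations.items() if e["score"] == best]
    return random.choice(tied)


def refine_loop(initial_scores, refined_scores):
    """initial_scores: {model: score}; refined_scores: score produced by each refinement pass.

    Returns (chosen model, final score, refinement passes used)."""
    evaluations = {m: {"score": s} for m, s in initial_scores.items()}
    model = select_best(evaluations)
    best = evaluations[model]["score"]
    iteration = 0
    # Keep refining while below threshold, capped at MAX_ITERATIONS,
    # and only keep a refinement if it actually improved the score.
    while best < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
        new = refined_scores[iteration]
        iteration += 1
        if new > best:
            best = new
    return model, best, iteration
```

For example, an initial best of 3.5 that refines to 3.8 and then 4.2 stops after two passes, since 4.2 clears the 4.0 threshold.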
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ streamlit>=1.30.0
+ langchain
+ langchain-core
+ langchain-openai
+ langchain-anthropic
+ langchain-together
+ python-dotenv