Streamlit Project Guideline
1. Framework Overview
Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. It allows developers to build web applications quickly by writing pure Python scripts. Streamlit's key features include its simplicity, ease of use, and the ability to seamlessly integrate with other data science tools and libraries.
2. Component Structure
Component Types
In the provided project, the components are primarily function-based, focusing on data processing, generation, and UI interactions. The main types of components identified are:
- Data Processing Functions: Functions like process_json, generate_description_only, and analyze_input handle data processing tasks.
- UI Interaction Functions: Functions like example_directly_selected, example_from_briefs_selected, and example_selected manage user interactions within the Streamlit UI.
- Session State Management: Functions and blocks that handle session state, such as initializing session state variables and updating them based on user actions.
Example Components
Data Processing Function
def process_json(input_json, model_name, generating_batch_size, temperature):
    try:
        model = ChatOpenAI(
            model=model_name, temperature=temperature, max_retries=3)
        generator = TaskDescriptionGenerator(model)
        result = generator.process(input_json, generating_batch_size)
        # Processing logic...
        return description, suggestions, examples_directly, input_analysis, new_example_briefs, examples_from_briefs, examples
    except Exception as e:
        # Surface the error in the UI and fall back to empty defaults.
        st.warning(f"An error occurred: {e}. Returning default values.")
        return "", [], [], "", [], [], []
UI Interaction Function
def example_directly_selected():
    if 'selected_example_directly_id' in st.session_state:
        try:
            selected_example_ids = st.session_state.selected_example_directly_id[
                'selection']['rows']
            # Interaction logic...
        except Exception as e:
            # Clear the selection if the payload is missing or malformed.
            st.session_state.selected_example = None
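The nested lookup in the callback above fails if the selection payload is missing or has an unexpected shape. A minimal sketch of a defensive accessor follows, assuming the payload looks like `{'selection': {'rows': [...]}}` as in the snippet. The helper name `selected_row_ids` is hypothetical and is not part of the project or of Streamlit.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def selected_row_ids(event: Optional[Mapping]) -> List[Any]:
    """Return the selected row indices, or an empty list if none are present.

    Hypothetical helper: assumes a payload shaped like
    {'selection': {'rows': [0, 2]}}, as in the snippet above.
    """
    if not isinstance(event, Mapping):
        return []
    selection = event.get("selection")
    if not isinstance(selection, Mapping):
        return []
    rows = selection.get("rows")
    if not isinstance(rows, (list, tuple)):
        return []
    return list(rows)
```

With a helper like this, the callback can branch on an empty list instead of relying on a broad `except` clause.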
Session State Management
if 'input_data' not in st.session_state:
    st.session_state.input_data = pd.DataFrame(columns=["Input", "Output"])
if 'description_output_text' not in st.session_state:
    st.session_state.description_output_text = ''
3. UI Operations and Callbacks
Common UI Operations
- Button Clicks: Handling button clicks to trigger data processing or state updates.
- Data Editing: Allowing users to edit data tables directly within the UI.
- File Uploads and Downloads: Managing file uploads for importing data and file downloads for exporting data.
Example UI Operations
Button Click Handling
submit_button = st.button(
    "Generate", type="primary", on_click=generate_examples_dataframe)
Data Editing
input_data = st.data_editor(
    st.session_state.input_data,
    num_rows="dynamic",
    use_container_width=True,
    column_config={
        "Input": st.column_config.TextColumn("Input", width="large"),
        "Output": st.column_config.TextColumn("Output", width="large"),
    },
)
File Upload and Download
input_file = st.file_uploader(
    label="Import Input Data from JSON",
    type="json",
    key="input_file",
    on_change=import_input_data_from_json
)
export_button = st.button(
    "Export Input Data to JSON", on_click=export_input_data_to_json
)
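The callbacks `import_input_data_from_json` and `export_input_data_to_json` are referenced above but their bodies are not shown. As a rough sketch of the conversion logic such callbacks might wrap, the helpers below assume the file holds a JSON list of objects with `Input` and `Output` keys. The file layout and both helper names are assumptions, not taken from the project.

```python
import json
from typing import Dict, List

# Column names used by the input table in this guideline.
COLUMNS = ("Input", "Output")


def rows_from_json(text: str) -> List[Dict[str, str]]:
    """Parse uploaded JSON text into rows limited to the expected columns."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON list of input/output objects")
    rows = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("Each entry must be a JSON object")
        # Missing keys become empty strings so the table stays rectangular.
        rows.append({col: str(item.get(col, "")) for col in COLUMNS})
    return rows


def rows_to_json(rows: List[Dict[str, str]]) -> str:
    """Serialize rows back to JSON text suitable for a download."""
    return json.dumps(rows, indent=2, ensure_ascii=False)
```

In the app, the parsed rows could be loaded with `pd.DataFrame(rows_from_json(text))`, and the exported text could be offered to the user through `st.download_button`.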
4. State Management
State Management Approach
Streamlit reruns the script from top to bottom on every user interaction, so ordinary variables are reset each time. Session state (st.session_state) persists values across reruns within a user session, which enables multi-step interactions and workflows.
Example State Management
Initializing Session State
if 'input_data' not in st.session_state:
    st.session_state.input_data = pd.DataFrame(columns=["Input", "Output"])
Updating Session State
def update_description_output_text():
    input_json = package_input_data()
    result = generate_description_only(input_json, model_name, temperature)
    st.session_state.description_output_text = result[0]
    st.session_state.suggestions = result[1]
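When many keys need defaults, the repeated `if key not in st.session_state` checks can be folded into one loop. The sketch below uses a plain dict as a stand-in for `st.session_state` so that it runs without Streamlit. `ensure_defaults` and `DEFAULTS` are hypothetical names, and the default values are assumptions.

```python
from typing import Any, Dict, MutableMapping

# Hypothetical defaults; the keys mirror ones used in this guideline.
DEFAULTS: Dict[str, Any] = {
    "description_output_text": "",
    "suggestions": [],
}


def ensure_defaults(state: MutableMapping[str, Any], defaults: Dict[str, Any]) -> None:
    """Set each missing key once, leaving existing values untouched."""
    for key, value in defaults.items():
        if key not in state:
            # Copy list defaults so sessions do not share one list object.
            state[key] = list(value) if isinstance(value, list) else value


# Demonstration with a plain dict standing in for st.session_state.
state = {"description_output_text": "kept"}
ensure_defaults(state, DEFAULTS)
```

In a Streamlit script the call would be `ensure_defaults(st.session_state, DEFAULTS)`, placed near the top so that every rerun sees initialized keys.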
5. Routing
Streamlit has no client-side router of the kind found in JavaScript frontend frameworks. Navigation within an app is typically handled through sidebar widgets or buttons that trigger a rerun of the script. Recent versions also support multipage apps (a pages/ directory, or st.navigation with st.Page), where each page gets its own URL path, and st.query_params for reading and writing URL query parameters.
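Sidebar-driven or button-driven navigation usually reduces to choosing one render function per rerun. A minimal sketch of that dispatch follows. The page names, the render functions, and the `PAGES` registry are all hypothetical, and the functions return strings only so that the example runs without Streamlit.

```python
from typing import Callable, Dict


def render_home() -> str:
    return "Home page"


def render_settings() -> str:
    return "Settings page"


# Hypothetical page registry mapping a label to a render function.
PAGES: Dict[str, Callable[[], str]] = {
    "Home": render_home,
    "Settings": render_settings,
}


def render_page(selected: str) -> str:
    """Dispatch to the selected page, falling back to the first registered page."""
    default = next(iter(PAGES.values()))
    return PAGES.get(selected, default)()
```

In a real script the label would typically come from a widget such as `st.sidebar.radio("Page", list(PAGES))`, and each render function would call Streamlit APIs instead of returning a string.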
6. Data Fetching
Data Fetching Methods
Data fetching in Streamlit is often done through direct API calls within the script. The fetched data is then processed and displayed within the app.
Example Data Fetching
def process_json(input_json, model_name, generating_batch_size, temperature):
    try:
        model = ChatOpenAI(
            model=model_name, temperature=temperature, max_retries=3)
        generator = TaskDescriptionGenerator(model)
        result = generator.process(input_json, generating_batch_size)
        # Data processing logic...
        return description, suggestions, examples_directly, input_analysis, new_example_briefs, examples_from_briefs, examples
    except Exception as e:
        # Surface the error in the UI and fall back to empty defaults.
        st.warning(f"An error occurred: {e}. Returning default values.")
        return "", [], [], "", [], [], []
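`process_json` combines the API call, the fallback values, and the UI warning in one function. One possible way to separate the fallback logic from the UI is sketched below. The helper name `call_with_fallback`, its retry count, and its `(value, ok)` return shape are assumptions made for illustration and are not how the project is written.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_fallback(fetch: Callable[[], T], default: T, retries: int = 3) -> Tuple[T, bool]:
    """Call fetch up to `retries` times and return (value, ok).

    On repeated failure the default is returned with ok=False, so the
    caller can decide whether to show a warning in the UI.
    """
    for _ in range(max(1, retries)):
        try:
            return fetch(), True
        except Exception:
            # Try again until the attempts are used up.
            continue
    return default, False
```

The caller could then show `st.warning(...)` only when `ok` is false, which keeps the fetching code free of UI calls and easier to test.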
7. Styling
Styling Approaches
Streamlit provides basic styling and layout options through its API, such as st.markdown for formatted text and st.sidebar for organizing content. For more advanced styling, custom CSS can be injected by passing HTML (for example a style tag) to st.markdown with unsafe_allow_html=True.
Example Styling
st.title("LLM Task Example Generator")
st.markdown("Enter input-output pairs in the table below to generate a task description, analysis, and additional examples.")
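For the custom CSS route, the stylesheet is just a string. The sketch below builds one from a dictionary of rules. `build_style_block` is a hypothetical helper, and the selector and declarations are placeholders rather than styles used by the project.

```python
from typing import Dict


def build_style_block(rules: Dict[str, Dict[str, str]]) -> str:
    """Render {selector: {property: value}} as a <style> element string."""
    parts = []
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        parts.append(f"{selector} {{ {body} }}")
    return "<style>\n" + "\n".join(parts) + "\n</style>"


css = build_style_block({"h1": {"color": "navy", "font-size": "2rem"}})
```

The resulting string could be applied with `st.markdown(css, unsafe_allow_html=True)`. Selectors that target Streamlit's internal elements may change between releases, so rules like these are best kept small.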
8. Performance Optimization
Optimization Techniques
- Code Splitting: Not applicable in Streamlit, since the app is a Python script run on the server rather than a bundled client-side JavaScript application.
- Lazy Loading: Not directly supported; however, conditional rendering can be used to load components only when needed.
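- Memoization: Use Streamlit's @st.cache_data decorator (or @st.cache_resource for shared objects such as models and connections) to cache expensive computations. The older @st.cache decorator is deprecated.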
Example Optimization
@st.cache_data  # replaces the deprecated @st.cache
def process_json(input_json, model_name, generating_batch_size, temperature):
    # Expensive computation...
    return result
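To show what memoization saves, the sketch below uses the standard library's `functools.lru_cache` as a stand-in for Streamlit's caching decorator, so it runs without Streamlit. The function `expensive_square` and the counter are hypothetical. Streamlit's own cache differs in that it hashes arguments itself and persists across reruns.

```python
from functools import lru_cache

# Counts how many times the function body actually runs.
call_count = 0


@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n * n


first = expensive_square(12)   # computed on the first call
second = expensive_square(12)  # served from the cache
```

After both calls the body has run only once, which is the same effect caching has on a function that is called again with identical arguments on a later rerun.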
9. Testing
Testing Methodologies
Streamlit applications can be tested using traditional Python testing frameworks like unittest and pytest. Integration and end-to-end tests can be challenging due to the nature of Streamlit's rerun mechanism; Streamlit's built-in AppTest framework (streamlit.testing.v1) can help by simulating a running app.
Example Testing
import unittest

from your_streamlit_app import process_json


class TestProcessJson(unittest.TestCase):
    def test_process_json(self):
        input_json = '{"key": "value"}'
        result = process_json(input_json, "model_name", 3, 0.5)
        self.assertEqual(result[0], "expected_description")


if __name__ == "__main__":
    unittest.main()
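The test above would call the real model through `process_json`. A common alternative is to inject the generator and replace it with a mock. The sketch below assumes a simplified, hypothetical function `describe` and a made-up result shape (a dict with a `description` key); neither is taken from the project.

```python
import unittest
from unittest.mock import Mock


def describe(input_json: str, generator, batch_size: int) -> str:
    """Hypothetical, simplified stand-in for process_json with the generator injected."""
    try:
        return generator.process(input_json, batch_size)["description"]
    except Exception:
        # Mirror the guideline's pattern of falling back to a default value.
        return ""


class TestDescribe(unittest.TestCase):
    def test_returns_description(self):
        generator = Mock()
        generator.process.return_value = {"description": "Translate text"}
        self.assertEqual(describe('{"key": "value"}', generator, 3), "Translate text")
        generator.process.assert_called_once_with('{"key": "value"}', 3)

    def test_falls_back_on_error(self):
        generator = Mock()
        generator.process.side_effect = RuntimeError("model unavailable")
        self.assertEqual(describe("{}", generator, 3), "")
```

Because the generator is passed in, the tests run quickly and never reach an external API. The same classes can be collected by pytest without changes.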
10. Best Practices and Common Pitfalls
Best Practices
- Modular Code: Organize code into reusable functions and modules.
- Session State Management: Use session state effectively to manage application state.
- Error Handling: Implement robust error handling to provide a smooth user experience.
- Performance Optimization: Use caching and efficient data handling to optimize performance.
Common Pitfalls
- Overuse of Reruns: Avoid triggering unnecessary reruns, which can degrade performance.
- Complex State Management: Be cautious with complex state management, as it can lead to bugs and unexpected behavior.
- Lack of Testing: Neglecting testing can lead to issues that are hard to debug in a rapidly changing environment.
Conclusion
This guideline provides a comprehensive overview of using Streamlit within a project, covering component structure, UI operations, state management, data fetching, styling, performance optimization, testing, and best practices. By following these guidelines, developers can create efficient, maintainable, and user-friendly Streamlit applications.