
Streamlit Project Guideline

1. Framework Overview

Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. It allows developers to build web applications quickly by writing pure Python scripts. Streamlit's key features include its simplicity, ease of use, and the ability to seamlessly integrate with other data science tools and libraries.

2. Component Structure

Component Types

In the provided project, the components are primarily function-based, focusing on data processing, generation, and UI interactions. The main types of components identified are:

  • Data Processing Functions: Functions like process_json, generate_description_only, and analyze_input handle data processing tasks.
  • UI Interaction Functions: Functions like example_directly_selected, example_from_briefs_selected, and example_selected manage user interactions within the Streamlit UI.
  • Session State Management: Functions and blocks that handle session state, such as initializing session state variables and updating them based on user actions.

Example Components

Data Processing Function

def process_json(input_json, model_name, generating_batch_size, temperature):
    try:
        model = ChatOpenAI(
            model=model_name, temperature=temperature, max_retries=3)
        generator = TaskDescriptionGenerator(model)
        result = generator.process(input_json, generating_batch_size)
        # Processing logic...
        return description, suggestions, examples_directly, input_analysis, new_example_briefs, examples_from_briefs, examples
    except Exception as e:
        st.warning(f"An error occurred: {str(e)}. Returning default values.")
        return "", [], [], "", [], [], []

UI Interaction Function

def example_directly_selected():
    if 'selected_example_directly_id' in st.session_state:
        try:
            selected_example_ids = st.session_state.selected_example_directly_id[
                'selection']['rows']
            # Interaction logic...
        except Exception:
            # Selection state is missing or malformed: clear the stale selection.
            st.session_state.selected_example = None

Session State Management

if 'input_data' not in st.session_state:
    st.session_state.input_data = pd.DataFrame(columns=["Input", "Output"])

if 'description_output_text' not in st.session_state:
    st.session_state.description_output_text = ''

3. UI Operations and Callbacks

Common UI Operations

  • Button Clicks: Handling button clicks to trigger data processing or state updates.
  • Data Editing: Allowing users to edit data tables directly within the UI.
  • File Uploads and Downloads: Managing file uploads for importing data and file downloads for exporting data.

Example UI Operations

Button Click Handling

submit_button = st.button(
    "Generate", type="primary", on_click=generate_examples_dataframe)

Data Editing

input_data = st.data_editor(
    st.session_state.input_data,
    num_rows="dynamic",
    use_container_width=True,
    column_config={
        "Input": st.column_config.TextColumn("Input", width="large"),
        "Output": st.column_config.TextColumn("Output", width="large"),
    },
)

File Upload and Download

input_file = st.file_uploader(
    label="Import Input Data from JSON",
    type="json",
    key="input_file",
    on_change=import_input_data_from_json
)

export_button = st.button(
    "Export Input Data to JSON", on_click=export_input_data_to_json
)
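The bodies of import_input_data_from_json and export_input_data_to_json are not shown in this guideline. As a rough sketch of what such callbacks might delegate to, the helpers below convert between JSON and rows with "Input" and "Output" keys. The names parse_input_json and dump_input_json and the expected JSON layout (an array of objects) are assumptions made for illustration.

```python
import json
from typing import Any

COLUMNS = ("Input", "Output")  # column names used by the data editor above


def parse_input_json(raw: bytes) -> list[dict[str, str]]:
    """Decode uploaded JSON bytes into rows limited to the known columns."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    rows = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each entry must be a JSON object")
        # Missing columns become empty strings; unknown keys are dropped.
        rows.append({col: str(item.get(col, "")) for col in COLUMNS})
    return rows


def dump_input_json(rows: list[dict[str, Any]]) -> str:
    """Serialize rows to a JSON string suitable for download."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


rows = parse_input_json(b'[{"Input": "2+2", "Output": "4", "note": "ignored"}]')
print(dump_input_json(rows))
```

In a Streamlit app, the uploaded file object returned by st.file_uploader exposes its bytes through getvalue(), and st.download_button accepts a string as its data argument, so both helpers can stay free of UI code and be tested on their own.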

4. State Management

State Management Approach

Streamlit reruns the whole script from top to bottom on every user interaction, so ordinary local variables are reset each time. st.session_state is a dict-like object, scoped to one browser session, that persists values across those reruns, which enables multi-step interactions and workflows.

Example State Management

Initializing Session State

if 'input_data' not in st.session_state:
    st.session_state.input_data = pd.DataFrame(columns=["Input", "Output"])
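When many keys need defaults, the repeated membership checks can be folded into one helper. The sketch below is an illustration only: init_state is a hypothetical name, and it takes factories rather than values so that mutable defaults such as lists are not shared. It works on any mutable mapping, which is what makes it testable with a plain dict.

```python
from collections.abc import MutableMapping
from typing import Any, Callable


def init_state(
    state: MutableMapping[str, Any],
    defaults: dict[str, Callable[[], Any]],
) -> None:
    """Set each missing key to a freshly built default; keep existing values."""
    for key, factory in defaults.items():
        if key not in state:
            state[key] = factory()


# A plain dict stands in for st.session_state here.
state = {"description_output_text": "kept from an earlier rerun"}
init_state(state, {"description_output_text": str, "suggestions": list})
print(state)
```

In the app itself the first argument would be st.session_state, and the input_data entry could use a factory such as lambda: pd.DataFrame(columns=["Input", "Output"]).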

Updating Session State

def update_description_output_text():
    input_json = package_input_data()
    result = generate_description_only(input_json, model_name, temperature)
    st.session_state.description_output_text = result[0]
    st.session_state.suggestions = result[1]

5. Routing

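Streamlit has no client-side router in the sense of frontend frameworks such as React, but it is not limited to a single URL. Multipage apps can be defined with a pages/ directory next to the entry-point script or, in newer versions, with st.navigation and st.Page; each page gets its own URL path, and st.switch_page navigates programmatically. The URL query string is available through st.query_params. Within a single page, navigation is typically handled through the sidebar or through buttons that trigger a rerun of the script.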
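Where a view needs to survive a reload or be shareable as a link, one option is to keep its name in the URL query string, which recent Streamlit releases expose as the dict-like st.query_params. The sketch below shows only the lookup logic; resolve_page, the "page" parameter, and the view names are hypothetical.

```python
from collections.abc import Mapping

PAGES = ("generate", "examples", "settings")  # hypothetical view names


def resolve_page(params: Mapping[str, str], default: str = "generate") -> str:
    """Pick the view named by the `page` parameter, ignoring unknown values."""
    requested = params["page"] if "page" in params else default
    return requested if requested in PAGES else default


# Plain dicts stand in for st.query_params here.
print(resolve_page({"page": "examples"}))
print(resolve_page({"page": "does-not-exist"}))
```

Validating against a fixed tuple means a hand-edited URL cannot select a view that does not exist.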

6. Data Fetching

Data Fetching Methods

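Data is usually fetched with direct calls from the script, for example to an LLM API as in process_json below. Because the script reruns on every interaction, such calls repeat unless they are placed behind a callback, a button check, or a cache. The fetched data is then processed and displayed within the app.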

Example Data Fetching

def process_json(input_json, model_name, generating_batch_size, temperature):
    try:
        model = ChatOpenAI(
            model=model_name, temperature=temperature, max_retries=3)
        generator = TaskDescriptionGenerator(model)
        result = generator.process(input_json, generating_batch_size)
        # Data processing logic...
        return description, suggestions, examples_directly, input_analysis, new_example_briefs, examples_from_briefs, examples
    except Exception as e:
        st.warning(f"An error occurred: {str(e)}. Returning default values.")
        return "", [], [], "", [], [], []

7. Styling

Styling Approaches

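Streamlit offers limited built-in styling: text elements such as st.title and st.markdown (which renders Markdown), layout containers such as st.sidebar and st.columns, and theme settings in .streamlit/config.toml. Custom HTML and CSS can be injected with st.markdown only when unsafe_allow_html=True is passed; without it, HTML tags are escaped rather than rendered.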

Example Styling

st.title("LLM Task Example Generator")
st.markdown("Enter input-output pairs in the table below to generate a task description, analysis, and additional examples.")
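For custom CSS, a small helper can keep the rules as data instead of a hand-written string. This is a sketch under stated assumptions: build_style_block is a hypothetical name, and the h1 rule is only an example selector, not one taken from the project.

```python
def build_style_block(rules: dict[str, dict[str, str]]) -> str:
    """Render {selector: {property: value}} as a <style> element."""
    lines = ["<style>"]
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        lines.append(f"{selector} {{ {body} }}")
    lines.append("</style>")
    return "\n".join(lines)


css = build_style_block({"h1": {"font-size": "2rem", "margin-bottom": "0.5rem"}})
print(css)
```

The resulting string would be passed to st.markdown(css, unsafe_allow_html=True). Selectors that target Streamlit's internal markup tend to break between releases, so plain element selectors are the safer choice.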

8. Performance Optimization

Optimization Techniques

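  • Code Splitting: Not applicable. Streamlit ships its own prebuilt frontend, and application code runs on the server as Python.
  • Lazy Loading: Not directly supported; however, conditional rendering can be used to build components only when needed, and newer versions provide st.fragment to rerun part of a script independently.
  • Memoization: Use @st.cache_data to cache serializable results such as DataFrames and API responses, and @st.cache_resource for shared objects such as model clients or database connections. The older @st.cache decorator is deprecated.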

Example Optimization

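@st.cache_data  # replaces the deprecated @st.cache
def process_json(input_json, model_name, generating_batch_size, temperature):
    # Results are cached per unique combination of arguments, so a repeated
    # call with the same inputs returns the stored result without recomputing.
    # Expensive computation...
    return result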
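The effect of argument-keyed caching can be observed by counting how often the function body runs. The sketch below uses st.cache_data, the current replacement for st.cache, when Streamlit is importable, and falls back to functools.lru_cache otherwise, purely so that the snippet runs anywhere. The function summarize and the CALLS list are hypothetical and stand in for an expensive computation.

```python
import functools
import json

try:
    import streamlit as st
    cache = st.cache_data
except (ImportError, AttributeError):
    # Fallback so the sketch runs without Streamlit; not a full substitute.
    cache = functools.lru_cache(maxsize=None)

CALLS = []  # records every time the function body actually executes


@cache
def summarize(input_json: str, batch_size: int) -> dict:
    """Stand-in for an expensive computation keyed on its arguments."""
    CALLS.append(input_json)
    rows = json.loads(input_json)
    return {"rows": len(rows), "batch_size": batch_size}


payload = '[{"Input": "a", "Output": "b"}]'
first = summarize(payload, 3)
second = summarize(payload, 3)  # same arguments: served from the cache
print(first == second, len(CALLS))
```

Caching a call to a language model with a non-zero temperature means the same inputs always return the first sampled output, which may or may not be the intended behavior.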

9. Testing

Testing Methodologies

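Plain functions in a Streamlit application can be tested with standard Python testing frameworks such as unittest and pytest. For the UI itself, Streamlit ships an app testing framework, streamlit.testing.v1.AppTest, which runs a script headlessly and lets tests inspect and interact with the rendered elements. Full end-to-end tests in a browser remain harder to write because of Streamlit's rerun mechanism.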

Example Testing

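import unittest
from your_streamlit_app import process_json  # placeholder module name

class TestProcessJson(unittest.TestCase):
    def test_process_json(self):
        # As written, this calls the real model. "model_name" and the expected
        # description are placeholders to replace with project-specific values.
        input_json = '{"key": "value"}'
        result = process_json(input_json, "model_name", 3, 0.5)
        self.assertEqual(result[0], "expected_description")

if __name__ == "__main__":
    unittest.main()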
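A test like the one above depends on a live model. One way to avoid that is to pass the generator in as an argument and substitute a fake in tests. The sketch below is an illustration under assumptions: describe, FakeGenerator, BrokenGenerator, and the dictionary shape returned by process are all hypothetical, since the real return type of TaskDescriptionGenerator.process is not shown here.

```python
import unittest


class FakeGenerator:
    """Stand-in for the real generator; returns a canned result."""

    def process(self, input_json, batch_size):
        return {"description": "Echo the input", "suggestions": ["Add examples"]}


class BrokenGenerator:
    """Simulates a failing model call."""

    def process(self, input_json, batch_size):
        raise RuntimeError("model unavailable")


def describe(input_json, batch_size, generator):
    """Hypothetical seam: delegate to the generator, fall back to defaults."""
    try:
        result = generator.process(input_json, batch_size)
        return result["description"], result["suggestions"]
    except Exception:
        return "", []


class TestDescribe(unittest.TestCase):
    def test_returns_generator_output(self):
        description, suggestions = describe("[]", 3, FakeGenerator())
        self.assertEqual(description, "Echo the input")
        self.assertEqual(suggestions, ["Add examples"])

    def test_falls_back_to_defaults_on_error(self):
        self.assertEqual(describe("[]", 3, BrokenGenerator()), ("", []))


# Run the suite programmatically so the snippet also works when imported.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDescribe)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the fake is deterministic, the assertions check the function's own behavior, including its error fallback, rather than whatever the model happens to return.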

10. Best Practices and Common Pitfalls

Best Practices

  • Modular Code: Organize code into reusable functions and modules.
  • Session State Management: Use session state effectively to manage application state.
  • Error Handling: Implement robust error handling to provide a smooth user experience.
  • Performance Optimization: Use caching and efficient data handling to optimize performance.

Common Pitfalls

  • Overuse of Reruns: Avoid triggering unnecessary reruns, which can degrade performance.
  • Complex State Management: Be cautious with complex state management, as it can lead to bugs and unexpected behavior.
  • Lack of Testing: Neglecting testing can lead to issues that are hard to debug in a rapidly changing environment.

Conclusion

This guideline provides a comprehensive overview of using Streamlit within a project, covering component structure, UI operations, state management, data fetching, styling, performance optimization, testing, and best practices. By following these guidelines, developers can create efficient, maintainable, and user-friendly Streamlit applications.