import outlines


@outlines.prompt
def generate_mapping_prompt(code):
    """Convert the provided Python code into a list of cells formatted for a Jupyter notebook.
    Ensure that the JSON objects are correctly formatted; if they are not, correct them.
    Do not include an extra comma at the end of the final list element.
    The output should be a list of JSON objects with the following format:
    ```json
    [
        {
            "cell_type": "string", // Specify "markdown" or "code".
            "source": ["string1", "string2"] // List of text or code strings.
        }
    ]
    ```
    ## Code
    {{ code }}
    """


@outlines.prompt
def generate_user_prompt(columns_info, sample_data, first_code):
    """
    ## Columns and Data Types
    {{ columns_info }}
    ## Sample Data
    {{ sample_data }}
    ## Loading Data code
    {{ first_code }}
    """


@outlines.prompt
def generate_eda_system_prompt():
    """You are an expert data analyst tasked with creating an Exploratory Data Analysis (EDA) Jupyter notebook.
    Use only the following libraries: Pandas for data manipulation, Matplotlib and Seaborn for visualizations. Ensure these libraries are installed as part of the notebook.
    The EDA notebook should include:
    1. Install and import necessary libraries.
    2. Load the dataset as a DataFrame using the provided code.
    3. Understand the dataset structure.
    4. Check for missing values.
    5. Identify data types of each column.
    6. Detect duplicated rows.
    7. Generate descriptive statistics.
    8. Visualize the distribution of each column.
    9. Explore relationships between columns.
    10. Perform correlation analysis.
    11. Include any additional relevant visualizations or analyses.
    Ensure the notebook is well-organized with clear explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".
    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code
    Use the provided code to load the dataset; do not use any other method.
    """


@outlines.prompt
def generate_embedding_system_prompt():
    """You are an expert data scientist tasked with creating a Jupyter notebook to generate embeddings for a specific dataset.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, and 'faiss-cpu' to create the index.
    The notebook should include:
    1. Install necessary libraries with !pip install.
    2. Import libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column to generate embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.
    Ensure the notebook is well-organized with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".
    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code
    Use the provided code to load the dataset; do not use any other method.
    """


@outlines.prompt
def generate_rag_system_prompt():
    """You are an expert machine learning engineer tasked with creating a Jupyter notebook to demonstrate a Retrieval-Augmented Generation (RAG) system using a specific dataset.
    The dataset is provided as a pandas DataFrame.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, 'faiss-cpu' to create the index, and 'transformers' for inference.
    The RAG notebook should include:
    1. Install necessary libraries.
    2. Import libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column for generating embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.
    11. Load the 'HuggingFaceH4/zephyr-7b-beta' model from the transformers library and create a pipeline.
    12. Create a prompt with two parts: 'system' for instructions based on a 'context' from the retrieved documents, and 'user' for the query.
    13. Send the prompt to the pipeline and display the answer.
    Ensure the notebook is well-organized with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".
    The user will provide the dataset information in the following format:
    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code
    Use the provided code to load the dataset; do not use any other method.
    """