saman shrestha committed
Commit da04e19 · 1 Parent(s): 3b0d3c2

initial commit
.env.sample ADDED
@@ -0,0 +1,5 @@
+ PORT=5000
+ SECRET_KEY=dfjifd
+ SQLALCHEMY_DATABASE_URI=sqlite:///./db.sqlite
+ GROQ_API_KEY=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
+ GROQ_MODEL=llama3-8b-8192
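This `.env` file is loaded by python-dotenv in `src/config.py`. As a rough illustration of what that loading does, here is a minimal stdlib sketch; the helper name is hypothetical, and unlike the real python-dotenv it ignores quoting, interpolation, and multiline values:

```python
# Hypothetical minimal .env parser, sketching what python-dotenv does
# for this project's .env.sample. Not the real library implementation.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "PORT=5000\nGROQ_MODEL=llama3-8b-8192\n# a comment\n"
print(parse_env(sample)["PORT"])  # → 5000
```

In the real project, `load_dotenv()` additionally exports these pairs into `os.environ` so `os.environ.get(...)` in `ApplicationConfig` can see them.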
.gitignore ADDED
@@ -0,0 +1,77 @@
+ # Ignore all env directories
+ env/
+ venv/
+ .env/
+ .venv/
+
+ # Ignore environment-related files
+ *.env
+ .envrc
+
+ # Ignore Python virtual environment files
+ pyvenv.cfg
+ # Ignore Python bytecode files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Ignore Python distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Ignore pip logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Ignore Python testing
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Ignore Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Ignore IPython
+ profile_default/
+ ipython_config.py
+
+ # Ignore mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Ignore Pylint
+ .pylintrc
+
+ # Ignore Python rope project settings
+ .ropeproject
+
+ # Ignore mkdocs documentation
+ /site
+
+ # Ignore Sphinx documentation
+ docs/_build/
Dockerfile ADDED
@@ -0,0 +1,10 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY ./requirements/prod.txt requirements.txt
+ RUN pip install -r requirements.txt
+
+ COPY . .
+
+ CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]
docker-compose.yml ADDED
@@ -0,0 +1,13 @@
+ version: '3'
+
+ services:
+   web:
+     build: .
+     ports:
+       - "5000:5000"
+     volumes:
+       - ./app:/app
+     environment:
+       - FLASK_ENV=development
+   redis:
+     image: "redis:alpine"
instance/db.sqlite ADDED
Binary file (16.4 kB)
prompts/base_chatbot_prompts.txt ADDED
@@ -0,0 +1,39 @@
+ You are an advanced AI agent specialized in web scraping, content analysis, and question answering, with a particular focus on e-commerce and product information. Your primary functions include:
+
+ 1. Web Scraping:
+    - Thoroughly examine and extract all relevant information from provided web pages.
+    - Collect data from various elements including text, images, links, forms, metadata, and embedded media.
+    - Pay special attention to product-related information such as prices, descriptions, specifications, and availability.
+
+ 2. Content Analysis:
+    - Analyze the extracted content to understand the purpose, type, and main topics of the website.
+    - Identify key information, patterns, and insights within the scraped data, particularly for product-related content.
+    - Categorize the website (e.g., e-commerce, blog, news, portfolio, business) based on its content and features.
+
+ 3. Question Answering:
+    - Carefully interpret and understand user questions or requests, especially those related to products and pricing.
+    - Use your comprehensive analysis of the scraped content to formulate accurate and relevant answers.
+    - If the exact information is not available, use your analytical skills to infer or provide the most closely related information.
+
+ When a user presents a question, follow these steps:
+
+ 1. Review the scraped and analyzed content relevant to the query, focusing on product details and pricing if applicable.
+ 2. Identify the most pertinent information that addresses the user's question, including specific product information when relevant.
+ 3. Formulate a clear, concise, and informative response based on the available data, ensuring accuracy in product details and pricing.
+ 4. If additional context would be helpful, include it in your answer, such as related products or pricing comparisons.
+ 5. If the requested information is not directly available, explain this and provide the most relevant alternative information you can find.
+
+ Always strive for accuracy, relevance, and helpfulness in your responses. Adapt your answering style based on the nature of the website and the user's query. For example:
+
+ - For e-commerce sites, focus on detailed product information, including:
+   * Precise pricing information, including any discounts or special offers
+   * Comprehensive product descriptions, features, and specifications
+   * Availability status and shipping information
+   * Customer reviews and ratings, if available
+ - For news sites, prioritize the most recent and relevant articles or updates.
+ - For informational sites, extract key facts, definitions, and explanatory content.
+
+ Your goal is to provide users with precise, valuable information extracted from the web pages, presented in a clear and easily understandable manner. When dealing with product-related queries, aim to be as informative and helpful as a knowledgeable sales assistant, providing all relevant details to assist the user in making informed decisions.
+
+ Here is the question:
+ {input}
prompts/base_content_prompts.txt ADDED
@@ -0,0 +1,86 @@
+ You are an advanced content scraping and analysis agent with a focus on extracting and organizing information in a structured JSON format. Your primary task is to thoroughly examine the provided web page and extract all relevant content, presenting it in a well-organized JSON structure. Your analysis should include, but is not limited to:
+
+ 1. Main content: Extract the primary textual content, including headings, paragraphs, and lists.
+ 2. Metadata: Capture all relevant metadata from the <head> section.
+ 3. Media: Identify and list all images, videos, and audio elements.
+ 4. Links: Compile all internal and external links.
+ 5. Structured data: Extract any schema.org or other structured data present on the page.
+ 6. Navigation: Capture the structure of menus and navigation elements.
+ 7. Footer content: Extract information typically found in the footer.
+ 8. Forms: Document any forms present on the page.
+ 9. Comments or user-generated content: If applicable, extract user comments or reviews.
+ 10. Pricing information: For e-commerce sites, extract product prices and any discount information.
+
+ When scraping and analyzing, follow these guidelines:
+
+ - Extract all relevant information without prioritizing or filtering.
+ - Organize the extracted data in a nested JSON format for easy parsing and analysis.
+ - Preserve the hierarchical structure of the content where applicable.
+ - Include attributes such as classes, IDs, or data attributes that might be useful for further analysis.
+ - For text content, preserve formatting indicators (bold, italic, etc.) if possible.
+
+ Your output should be a valid JSON object with clearly labeled keys and appropriate nesting. For example:
+
+ {
+   "metadata": {
+     "title": "Page Title",
+     "description": "Meta description content",
+     "keywords": ["keyword1", "keyword2"]
+   },
+   "main_content": {
+     "headings": [
+       {"level": "h1", "text": "Main Heading"},
+       {"level": "h2", "text": "Subheading"}
+     ],
+     "paragraphs": [
+       "First paragraph content...",
+       "Second paragraph content..."
+     ]
+   },
+   "media": {
+     "images": [
+       {"src": "image1.jpg", "alt": "Image description"},
+       {"src": "image2.png", "alt": "Another image"}
+     ],
+     "videos": [
+       {"src": "video1.mp4", "type": "video/mp4"}
+     ]
+   },
+   "links": {
+     "internal": [
+       {"href": "/page1", "text": "Link to Page 1"},
+       {"href": "/page2", "text": "Link to Page 2"}
+     ],
+     "external": [
+       {"href": "https://example.com", "text": "External Link"}
+     ]
+   },
+   "structured_data": {
+     // Any schema.org or other structured data found
+   },
+   "navigation": {
+     "menu_items": [
+       {"text": "Home", "href": "/"},
+       {"text": "About", "href": "/about"}
+     ]
+   },
+   "footer": {
+     "copyright": "© 2023 Company Name",
+     "social_links": [
+       {"platform": "Facebook", "url": "https://facebook.com/company"}
+     ]
+   },
+   "forms": [
+     {
+       "id": "contact_form",
+       "action": "/submit",
+       "method": "POST",
+       "fields": [
+         {"name": "email", "type": "email"},
+         {"name": "message", "type": "textarea"}
+       ]
+     }
+   ]
+ }
+
+ Be prepared to adjust the structure of your JSON output based on the specific content and layout of the web page you are analyzing. Your goal is to provide a comprehensive, well-organized representation of the page's content that can be easily processed and analyzed programmatically.
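Since this prompt asks the model to return a JSON object (sometimes wrapped in a markdown code fence, as LLMs tend to do), a caller will typically want to validate the reply before using it. A minimal stdlib sketch; the helper name is an assumption, not part of this repo:

```python
import json
import re

def parse_model_json(reply: str) -> dict:
    """Hypothetical helper: extract and parse a JSON object from a model
    reply, tolerating an optional ```json ... ``` fence around it."""
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)  # raises json.JSONDecodeError on invalid output

reply = '```json\n{"metadata": {"title": "Page Title"}}\n```'
print(parse_model_json(reply)["metadata"]["title"])  # → Page Title
```

Wrapping the `json.loads` call in a retry loop (re-asking the model on a decode error) is a common hardening step on top of this.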
prompts/base_prompts.txt ADDED
@@ -0,0 +1,57 @@
+ You are an expert web page analyzer with exceptional attention to detail. Your task is to thoroughly examine and extract all useful information from the HTML page presented to you. This includes, but is not limited to:
+
+ 1. All textual content, including headings, paragraphs, lists, and any hidden text
+ 2. Images and their alt text
+ 3. Links and their anchor text
+ 4. Forms and their input fields
+ 5. Tables and their contents
+ 6. Navigation menus
+ 7. Footer information
+ 8. Embedded media such as videos or audio players
+ 9. Metadata in the <head> section, including title, description, and keywords
+ 10. CSS classes and IDs
+ 11. Any visible JavaScript functionality or dynamic content
+ 12. Error messages or notifications
+ 13. Accessibility features such as ARIA labels and roles
+ 14. Any third-party widgets or embedded content
+ 15. Code snippets or examples present on the page
+
+ Extract and memorize all of this information without prioritizing or filtering. When answering questions about the page, provide detailed and accurate responses based on the extracted content. Do not overlook any aspect of the page, no matter how small or seemingly insignificant.
+
+ Be prepared to present the extracted information in various formats as requested, such as:
+
+ 1. Raw text extraction: Provide all textual content from the page, including headings, paragraphs, lists, and any other text elements.
+ 2. Structured data: Extract any structured data present on the page, such as JSON-LD, microdata, or RDFa.
+ 3. Code snippets: Identify and extract any code examples or snippets present on the page.
+ 4. Headings hierarchy: List all headings (h1, h2, h3, etc.) in their hierarchical order.
+ 5. Link inventory: Compile a list of all links on the page, including their anchor text and destinations.
+ 6. Image catalog: Create a list of all images on the page, including their src attributes and alt text.
+ 7. Form details: Provide information about any forms on the page, including their input fields and submission methods.
+ 8. Embedded media: List any embedded videos, audio players, or other media elements.
+ 9. Metadata summary: Compile all metadata from the page's <head> section.
+ 10. Script and style references: List all external script and stylesheet references.
+
+ When asked about the page content, provide comprehensive and detailed responses based on the extracted information. If asked about something that is not present on the page, clearly state that the information or element is not found. Your responses should always be based on the actual content and structure of the page you have analyzed, without making assumptions or guesses.
+
+ Additionally, categorize the type of website based on the content and structure you've analyzed. Consider the following categories:
+
+ 1. Blog: Look for regular posts, dates, author information, and commenting systems.
+ 2. News: Check for time-sensitive articles, breaking news sections, and journalist bylines.
+ 3. E-commerce store: Identify product listings, prices, shopping carts, and checkout processes.
+ 4. Portfolio: Look for showcases of work, projects, or artistic creations.
+ 5. Business website: Identify company information, services offered, and contact details.
+ 6. Educational: Look for course listings, learning materials, and student resources.
+ 7. Social media: Identify user profiles, friend/follower systems, and user-generated content.
+ 8. Forum or community: Look for discussion threads, user posts, and member profiles.
+ 9. Government or institutional: Identify official seals, public service information, and formal language.
+ 10. Personal website: Look for biographical information and personal content.
+
+ For e-commerce stores, pay special attention to:
+ 1. Number of products listed
+ 2. Product categories and subcategories
+ 3. Price ranges
+ 4. Special offers or discounts
+ 5. Customer review systems
+ 6. Product search and filtering options
+
+ Provide a clear categorization based on the most prominent features of the website, and include relevant details that support your classification.
prompts/base_seo_prompts.txt ADDED
@@ -0,0 +1,59 @@
+ You are a highly successful SEO expert with a proven track record of improving website rankings and visibility. Your expertise encompasses:
+
+ 1. Deep understanding of search engine algorithms and ranking factors
+ 2. Mastery of on-page and technical SEO optimization techniques
+ 3. Proficiency in keyword research and content strategy
+ 4. Experience with link building and off-page SEO tactics
+ 5. Analytical skills for interpreting SEO data and metrics
+ 6. Ability to adapt strategies to evolving search engine guidelines
+
+ You excel at extracting and analyzing SEO-relevant information from web pages. When presented with a URL or HTML content, follow these steps to provide a comprehensive SEO analysis:
+
+ 1. Examine the URL structure for SEO best practices
+ 2. Extract and evaluate the following on-page elements:
+    - Title tag (content and length)
+    - Meta description (content and length)
+    - Header tags (H1, H2, H3, etc.) and their hierarchy
+    - Image alt text and file names
+    - Internal and external links, including anchor text
+    - Keyword usage and density in content
+    - Schema markup and structured data
+    - Canonical tags
+    - Robots meta tags
+    - XML sitemap (presence and structure)
+    - Social media meta tags (Open Graph, Twitter Cards)
+
+ 3. Assess technical SEO factors:
+    - Page load speed
+    - Mobile-friendliness
+    - Crawlability and indexability
+    - HTTPS implementation
+
+ 4. Analyze content quality and relevance to target keywords
+
+ 5. Evaluate the overall site structure and information architecture
+
+ 6. Identify potential SEO issues and opportunities for improvement
+
+ 7. Provide actionable recommendations based on your findings
+
+ When answering questions about a page's SEO, offer detailed, data-driven insights and practical solutions. Your responses should demonstrate:
+
+ - A strategic approach to SEO optimization
+ - Balancing short-term tactics with long-term SEO goals
+ - Understanding of user intent and search behavior
+ - Awareness of industry trends and algorithm updates
+ - Ability to prioritize SEO tasks for maximum impact
+
+ Be prepared to present your analysis in various formats, such as:
+
+ - Comprehensive SEO audit reports
+ - Technical SEO checklists
+ - Content optimization recommendations
+ - Competitor SEO comparisons
+ - Action plans for improving search rankings
+
+ Your goal is to provide clear, actionable advice that will significantly improve a website's search engine visibility and organic traffic. Always consider the specific needs of the website's industry and target audience when offering SEO recommendations.
+
+ Here is the question:
+ {input}
requirements/local.txt ADDED
@@ -0,0 +1,8 @@
+ flask
+ flask-sqlalchemy
+ flask-bcrypt
+ python-dotenv
+ flask-session
+ redis
+ requests
+ beautifulsoup4
requirements/prod.txt ADDED
@@ -0,0 +1,8 @@
+ flask
+ flask-sqlalchemy
+ flask-bcrypt
+ python-dotenv
+ flask-session
+ redis
+ requests
+ beautifulsoup4
src/config.py ADDED
@@ -0,0 +1,22 @@
+ from dotenv import load_dotenv
+ import os
+ import redis
+
+ load_dotenv()
+
+ class ApplicationConfig:
+     SECRET_KEY = os.environ.get('SECRET_KEY')
+     SQLALCHEMY_DATABASE_URI = os.environ.get('SQLALCHEMY_DATABASE_URI')
+     SQLALCHEMY_TRACK_MODIFICATIONS = False
+     SQLALCHEMY_ECHO = True
+
+     SESSION_TYPE = 'redis'
+     SESSION_REDIS = redis.from_url('redis://localhost:6379')
+     SESSION_PERMANENT = False
+     SESSION_USE_SIGNER = True
+     # SESSION_COOKIE_SECURE = True
+     # SESSION_COOKIE_HTTPONLY = True
+     # SESSION_COOKIE_SAMESITE = 'None'
+     # SESSION_COOKIE_DOMAIN = None
+     # SESSION_COOKIE_PATH = '/'
+
src/helpers/GROQ.py ADDED
@@ -0,0 +1,217 @@
+ from groq import Groq
+ from langchain_groq import ChatGroq
+ from langchain_core.prompts import (
+     ChatPromptTemplate,
+     HumanMessagePromptTemplate,
+     MessagesPlaceholder,
+ )
+ from langchain.chains import LLMChain, SequentialChain
+ from langchain_core.messages import SystemMessage
+ from langchain.chains.conversation.memory import ConversationBufferWindowMemory
+ from typing import Dict, Optional
+ from langchain.chains.router import MultiPromptChain
+ from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
+ from langchain.prompts import PromptTemplate
+ import pandas as pd
+ import os
+ import json
+
+ class GROQ:
+     def __init__(self, api_key: str = os.getenv('GROQ_API_KEY')):
+         # Read the key from the environment instead of hardcoding a secret.
+         self.client: Groq = Groq(
+             api_key=api_key
+         )
+
+     def chat(self, prompt: str, model: str, response_format: Optional[Dict]) -> str:
+         completion = self.client.chat.completions.create(
+             model=model, messages=[{"role": "user", "content": prompt}], response_format=response_format)
+
+         return completion.choices[0].message.content
+
+     def get_summarization(self, user_question: str, df: pd.DataFrame, model: str) -> str:
+         """
+         Generates a summarization prompt based on the user's question and the
+         resulting data, sends it to the Groq API, and returns the AI's response.
+
+         Parameters:
+             user_question (str): The user's question.
+             df (pd.DataFrame): The resulting data (currently unused).
+             model (str): The AI model to use for the response.
+
+         Returns:
+             str: The content of the AI's response to the summarization prompt.
+         """
+         prompt = '''
+         {user_question}
+         '''.format(user_question=user_question)
+         # Response format is set to 'None'
+         return self.chat(prompt, model, None)
+
+
+ class ConversationGROQ:
+     def __init__(self, conversational_memory_length: int = 10, api_key: str = os.getenv('GROQ_API_KEY'), model: str = os.getenv('GROQ_MODEL')):
+         self.client: ChatGroq = ChatGroq(
+             groq_api_key=api_key,
+             model=model
+         )
+         self.memory: ConversationBufferWindowMemory = ConversationBufferWindowMemory(k=conversational_memory_length, memory_key="chat_history", return_messages=True)
+         self.conversation: Optional[LLMChain] = None
+
+     def sequential_chain(self, llm: ChatGroq, prompt_sequences: list[Dict[str, str]], input_variable: list[str], output_variable: list[str]):
+         """
+         Creates a sequential chain of LLM chains based on the provided prompt sequences, input variables, and output variables.
+
+         Parameters:
+             llm (ChatGroq): The LLM to run each chain with (falls back to self.client).
+             prompt_sequences (list[Dict[str, str]]): A list of dictionaries containing the prompt and output key for each sequence.
+             input_variable (list[str]): A list of input variables for the overall chain.
+             output_variable (list[str]): A list of output variables for the overall chain.
+         Example:
+             prompt_sequences = [
+                 {'prompt': 'You are a helpful assistant.{input} Answer the user\'s question. {user_input}', 'output_key': 'prompt1'},
+                 {'prompt': 'You are a helpful assistant. Answer the user\'s question. {user_input}', 'output_key': 'prompt2'},
+                 {'prompt': 'You are a helpful assistant. Answer the user\'s question. {user_input}', 'output_key': 'final'}
+             ]
+             input_variable = ['input']
+             output_variable = ['prompt1', 'prompt2', 'final']
+
+         Returns:
+             SequentialChain: An overall chain that combines all the individual chains.
+         """
+         chains = []
+         for sequence in prompt_sequences:
+             prompt = sequence['prompt']
+             output_key = sequence['output_key']
+             template = ChatPromptTemplate.from_template(prompt)
+             chain = LLMChain(llm=llm or self.client, prompt=template, output_key=output_key)
+             chains.append(chain)
+         overall_chain = SequentialChain(
+             chains=chains,
+             input_variables=input_variable,
+             output_variables=output_variable,
+             verbose=True
+         )
+         return overall_chain
+
+     def create_router_chain(self, templates_prompts: list[Dict[str, str]], llm: Optional[ChatGroq] = None):
+         MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
+ language model select the model prompt best suited for the input. \
+ You will be given the names of the available prompts and a \
+ description of what the prompt is best suited for. \
+ You may also revise the original input if you think that revising \
+ it will ultimately lead to a better response from the language model.
+
+ << FORMATTING >>
+ Return a markdown code snippet with a JSON object formatted to look like:
+ ```json
+ {{{{
+     "destination": string \ name of the prompt to use or "DEFAULT"
+     "next_inputs": string \ a potentially modified version of the original input
+ }}}}
+ ```
+
+ REMEMBER: "destination" MUST be one of the candidate prompt \
+ names specified below OR it can be "DEFAULT" if the input is not \
+ well suited for any of the candidate prompts.
+ REMEMBER: "next_inputs" can just be the original input \
+ if you don't think any modifications are needed.
+
+ << CANDIDATE PROMPTS >>
+ {destinations}
+
+ << INPUT >>
+ {{input}}
+
+ << OUTPUT (remember to include the ```json)>>"""
+         destination_chains = {}
+         for template in templates_prompts:
+             destination_chains[template['name']] = LLMChain(llm=llm or self.client, memory=self.memory, prompt=ChatPromptTemplate.from_template(template=template['prompt_template']))
+         destinations = [f"{template['name']}: {template['description']}" for template in templates_prompts]
+         destinations_str = "\n".join(destinations)
+         router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations_str)
+         router_prompt = PromptTemplate(
+             template=router_template,
+             input_variables=["input"],
+             output_parser=RouterOutputParser(),
+         )
+         default_prompt = ChatPromptTemplate.from_template("{input}")
+         default_chain = LLMChain(llm=llm or self.client, memory=self.memory, prompt=default_prompt)
+         router_chain = LLMRouterChain.from_llm(llm or self.client, router_prompt)
+         chain = MultiPromptChain(router_chain=router_chain,
+                                  destination_chains=destination_chains,
+                                  default_chain=default_chain, verbose=True
+                                  )
+         return chain
+
+     def get_conditional_template(self, input: str, categories: list[Dict[str, str]]) -> Dict:
+         # Returns the parsed routing dict ({"destination": ..., "next_inputs": ...}), not a ChatPromptTemplate.
+         MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
+ language model select the model prompt best suited for the input. \
+ You will be given the names of the available prompts and a \
+ description of what the prompt is best suited for. \
+ You may also revise the original input if you think that revising \
+ it will ultimately lead to a better response from the language model.
+
+ << FORMATTING >>
+ Return a markdown code snippet with a JSON object formatted to look like:
+ ```json
+ {{{{
+     "destination": string \ name of the prompt to use or "DEFAULT"
+     "next_inputs": string \ a potentially modified version of the original input
+ }}}}
+ ```
+
+ REMEMBER: "destination" MUST be one of the candidate prompt \
+ names specified below OR it can be "DEFAULT" if the input is not \
+ well suited for any of the candidate prompts.
+ REMEMBER: "next_inputs" can just be the original input \
+ if you don't think any modifications are needed.
+
+ << CANDIDATE PROMPTS >>
+ {destinations}
+
+ << INPUT >>
+ {input}
+
+ << OUTPUT (remember to include the ```json)>>""".format(destinations="\n".join([f"{template['name']}: {template['description']}" for template in categories]), input=input)
+
+         router_prompt = PromptTemplate(
+             template=MULTI_PROMPT_ROUTER_TEMPLATE,
+             input_variables=["input"],
+         )
+
+         response = LLMChain(llm=self.client, prompt=router_prompt).predict(input=input)
+
+         json_str = response.split('```json')[1].split('```')[0].strip()
+         return json.loads(json_str)
+
+     def create_template(self, base_prompt: str) -> ChatPromptTemplate:
+         return ChatPromptTemplate.from_messages([
+             SystemMessage(
+                 content=base_prompt
+             ),  # The persistent system prompt, always included at the start of the chat.
+
+             MessagesPlaceholder(
+                 variable_name="chat_history"
+             ),  # Replaced by the actual chat history during the conversation; maintains context.
+
+             HumanMessagePromptTemplate.from_template(
+                 "{human_input}"
+             ),  # The user's current input is injected into the prompt here.
+         ])
+
+     def create_conversation(self, prompt: str = None, llm=None, memory=None, verbose: bool = True):
+         self.conversation = LLMChain(
+             llm=llm or self.client,
+             memory=memory or self.memory,
+             prompt=self.create_template(prompt) if prompt else None,
+             verbose=verbose
+         )
+         return self.conversation
+
+     def chat(self, user_input: str) -> str:
+         if self.conversation is None:
+             raise ValueError("Conversation not initialized. Call create_conversation() first.")
+         return self.conversation.predict(human_input=user_input)
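The `ConversationBufferWindowMemory(k=...)` used above keeps only the last k exchanges in the prompt, which bounds token usage. A pure-Python sketch of just that windowing behavior (the class here is a hypothetical stand-in, not the LangChain implementation):

```python
from collections import deque

class WindowMemory:
    """Keeps only the most recent k (human, ai) exchanges,
    mimicking ConversationBufferWindowMemory's sliding window."""
    def __init__(self, k: int):
        self.exchanges = deque(maxlen=k)  # older entries fall off automatically

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def history(self) -> list:
        return list(self.exchanges)

mem = WindowMemory(k=2)
mem.save("hi", "hello")
mem.save("price?", "it is $5")
mem.save("stock?", "in stock")
print(len(mem.history()))   # → 2
print(mem.history()[0][0])  # → price?
```

This is why `conversational_memory_length=10` above means the chatbot only "remembers" the ten most recent turns, not the whole session.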
src/helpers/README.md ADDED
@@ -0,0 +1,73 @@
+ # GROQ and ConversationGROQ Classes
+
+ This module provides two main classes for interacting with the Groq API: `GROQ` and `ConversationGROQ`. These classes offer various functionalities for chat completions, summarization, and creating conversation chains.
+
+ ## GROQ Class
+
+ The `GROQ` class provides basic functionality for interacting with the Groq API.
+
+ ### Methods
+
+ #### `__init__(self, api_key: str = '<your groq api key here>')`
+
+ Initializes the GROQ class with the provided API key.
+
+ #### `chat(self, prompt: str, model: str, response_format: Optional[Dict]) -> str`
+
+ Sends a chat completion request to the Groq API.
+
+ - `prompt`: The input prompt for the chat completion.
+ - `model`: The AI model to use.
+ - `response_format`: Optional response format configuration.
+
+ Returns the content of the AI's response.
+
+ #### `get_summarization(self, user_question: str, df: pd.DataFrame, model: str) -> str`
+
+ Generates a summarization based on the user's question and the provided data.
+
+ - `user_question`: The user's question.
+ - `df`: A pandas DataFrame containing the data (currently unused in the method).
+ - `model`: The AI model to use for the response.
+
+ Returns the content of the AI's response to the summarization prompt.
+
+ ## ConversationGROQ Class
+
+ The `ConversationGROQ` class provides more advanced functionality for creating conversation chains and managing chat history.
+
+ ### Methods
+
+ #### `__init__(self, conversational_memory_length: int = 10, api_key: str = os.getenv('GROQ_API_KEY'), model: str = os.getenv('GROQ_MODEL'))`
+
+ Initializes the ConversationGROQ class with the specified parameters.
+
+ #### `sequential_chain(self, llm: ChatGroq, prompt_sequences: list[Dict[str, str]], input_variable: list[str], output_variable: list[str])`
+
+ Creates a sequential chain of LLM chains based on the provided prompt sequences, input variables, and output variables.
+
+ #### `create_router_chain(self, templates_prompts: list[Dict[str, str]], llm: Optional[ChatGroq] = None)`
+
+ Creates a router chain for selecting the best-suited prompt based on the input.
+
+ #### `get_conditional_template(self, input: str, categories: list[Dict[str, str]]) -> ChatPromptTemplate`
+
+ Selects the best-suited prompt template based on the input and provided categories.
+
+ #### `create_template(self, base_prompt: str) -> ChatPromptTemplate`
+
+ Creates a chat prompt template with the given base prompt.
+
+ #### `create_conversation(self, prompt: str = None, llm = None, memory = None, verbose: bool = True)`
+
+ Initializes a conversation chain with the specified parameters.
+
+ #### `chat(self, user_input: str) -> str`
+
+ Sends a user input to the conversation chain and returns the AI's response.
+
+ ## Usage
+
+ To use these classes, you need to have the Groq API key and the required dependencies installed. Make sure to set up the necessary environment variables or provide the API key directly when initializing the classes.
+
+ Example usage:
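The README's usage example is cut off in this commit; a hypothetical sketch of what it might look like, based on the methods documented above. It assumes `GROQ_API_KEY` and `GROQ_MODEL` are set as in `.env.sample`, and the API call only runs when a key is present:

```python
# Hypothetical usage sketch for ConversationGROQ; not from the original README.
import os

def run_example() -> str:
    # Imported lazily so the sketch reads standalone without the project installed.
    from helpers.GROQ import ConversationGROQ
    bot = ConversationGROQ(conversational_memory_length=5)
    bot.create_conversation(prompt="You are a helpful assistant.")
    return bot.chat("What can you do?")

if os.getenv("GROQ_API_KEY"):
    print(run_example())
```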
src/helpers/Scrapy.py ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ import requests
+ from bs4 import BeautifulSoup
+ from helpers.GROQ import ConversationGROQ
+
+ class Scrapper:
+     def __init__(self, url: str, groq_instance: ConversationGROQ):
+         self.url = url
+         self.groq_instance = groq_instance
+
+     def scrape(self):
+         response = requests.get(self.url)
+         response.raise_for_status()  # raise on non-2xx responses
+         return response.content
+
+     def parse(self, content: str):
+         soup = BeautifulSoup(content, 'html.parser')
+         return ' '.join(soup.stripped_strings)
+
+     def compress(self, content: str):
+         # Collapse all runs of whitespace into single spaces.
+         return ' '.join(content.split())
+
+     def truncate(self, content: str):
+         return content[:1000] + '...' if len(content) > 1000 else content
+
+     def analyze(self, content: str):
+         prompt = """
+         Analyze the following HTML content with exceptional precision and depth:
+         {content}
+         """
+         response = self.groq_instance.chat(prompt.format(content=content))
+         return response
+
+     def extract(self, content: str):
+         prompt = """
+         Extract the following structured data from the HTML content:
+
+         {content}
+
+         1. JSON representation: Extract key information and structure it in JSON format.
+         2. Table extraction: Identify and extract any tables, presenting them in JSON format.
+         3. List compilation: Extract and present lists from the content in JSON format.
+         4. Key-value pair extraction: Identify and extract key-value pairs, presenting them in JSON format.
+         5. Numerical data analysis: Extract and present numerical data in JSON format.
+         6. Entity recognition: Identify and categorize named entities, presenting them in JSON format.
+         7. Sentiment analysis: Assess overall tone and sentiment, presenting results in JSON format.
+         8. Language detection: Identify the primary language and any secondary languages, presenting in JSON format.
+         9. Structured data markup: Extract any structured data present on the page, presenting in JSON format.
+         10. API endpoints: Document any API endpoints referenced, presenting in JSON format.
+
+         Ensure the extracted data is well-structured and properly formatted in JSON.
+         """
+         response = self.groq_instance.chat(prompt.format(content=content))
+         return response
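The `compress` and `truncate` steps are plain string operations with no external dependencies, so they can be checked in isolation. A standalone sketch of the same normalization, with the 1000-character cap made a parameter purely for illustration:

```python
def compress(content: str) -> str:
    # str.split() with no argument splits on any whitespace run,
    # so joining with single spaces collapses newlines, tabs, etc.
    return ' '.join(content.split())

def truncate(content: str, limit: int = 1000) -> str:
    # Keep the first `limit` characters and mark the cut with an ellipsis.
    return content[:limit] + '...' if len(content) > limit else content

text = compress("hello   world\n\tfoo")
print(text)            # hello world foo
print(truncate(text))  # under the limit, returned unchanged
```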
src/helpers/prompts.py ADDED
@@ -0,0 +1,10 @@
+ class PromptManager:
+     def __init__(self):
+         self.prompts = {}
+
+     def load_prompt(self, name, file_path):
+         with open(file_path, 'r') as file:
+             self.prompts[name] = file.read()
+
+     def get_prompt(self, name):
+         return self.prompts.get(name, '')
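`PromptManager` is self-contained, so it can be exercised against a temporary file; the class body is repeated here only so the snippet runs on its own:

```python
import os
import tempfile

class PromptManager:
    def __init__(self):
        self.prompts = {}

    def load_prompt(self, name, file_path):
        with open(file_path, 'r') as file:
            self.prompts[name] = file.read()

    def get_prompt(self, name):
        return self.prompts.get(name, '')

# Write a throwaway prompt file and load it by name.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('You are a helpful assistant.')
    path = f.name

pm = PromptManager()
pm.load_prompt('base_prompt', path)
print(pm.get_prompt('base_prompt'))  # You are a helpful assistant.
print(repr(pm.get_prompt('missing')))  # '' — unknown names fall back to empty
os.unlink(path)
```

Note the empty-string fallback in `get_prompt`: a typo in a prompt name fails silently rather than raising, which is why `routes/llm/index.py` works even if a prompt file is missing.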
src/main.py ADDED
@@ -0,0 +1,21 @@
+ from flask import Flask
+ from flask_session import Session
+ from config import ApplicationConfig
+ from models import db
+ from routes.auth.index import auth
+ from routes.llm.index import llm
+
+ app = Flask(__name__)
+ app.config.from_object(ApplicationConfig)
+
+ server_session = Session(app)
+
+ db.init_app(app)
+ with app.app_context():
+     db.create_all()
+
+ app.register_blueprint(auth, url_prefix='/auth')
+ app.register_blueprint(llm, url_prefix='/llm')
+
+ if __name__ == '__main__':
+     app.run(debug=True)
src/models.py ADDED
@@ -0,0 +1,18 @@
+ from flask_sqlalchemy import SQLAlchemy
+ from uuid import uuid4
+
+ db = SQLAlchemy()
+
+ def get_uuid():
+     return uuid4().hex
+
+
+ class UserModal(db.Model):
+     __tablename__ = 'users'
+     id = db.Column(db.String(32), primary_key=True, unique=True, default=get_uuid)
+     email = db.Column(db.String(120), unique=True, nullable=False)
+     password = db.Column(db.String(60), nullable=False)
+
+     def __repr__(self):
+         # The model has no username column, so represent users by id and email.
+         return f"User('{self.id}', '{self.email}')"
+
src/routes/auth/index.py ADDED
@@ -0,0 +1,55 @@
+ from flask import Blueprint, request, jsonify, session
+ from models import db, UserModal
+ from flask_bcrypt import Bcrypt
+
+ auth = Blueprint('auth', __name__)
+ bcrypt = Bcrypt()
+
+ @auth.route('/register', methods=['POST'])
+ def register():
+     data = request.json
+     email = data['email']
+     password = data['password']
+
+     user_exists = UserModal.query.filter_by(email=email).first() is not None
+     if user_exists:
+         return jsonify({'message': 'User already exists'}), 409
+     hashed_password = bcrypt.generate_password_hash(password)
+     user = UserModal(email=email, password=hashed_password)
+     db.session.add(user)
+     db.session.commit()
+
+     return jsonify({
+         'id': user.id,
+         'email': user.email
+     }), 201
+
+ @auth.route('/me', methods=['GET'])
+ def me():
+     user_id = session.get('user_id')
+     if not user_id:
+         return jsonify({'message': 'Unauthorized'}), 401
+     user = UserModal.query.filter_by(id=user_id).first()
+     if user is None:
+         # Stale session: the referenced user no longer exists.
+         return jsonify({'message': 'Unauthorized'}), 401
+     return jsonify({
+         'id': user.id,
+         'email': user.email
+     }), 200
+
+ @auth.route('/login', methods=['POST'])
+ def login():
+     data = request.json
+     email = data['email']
+     password = data['password']
+
+     user = UserModal.query.filter_by(email=email).first()
+     if user is None:
+         return jsonify({'message': 'Invalid credentials'}), 401
+
+     if not bcrypt.check_password_hash(user.password, password):
+         return jsonify({'message': 'Invalid credentials'}), 401
+
+     session['user_id'] = user.id
+
+     return jsonify({
+         'id': user.id,
+     }), 200
src/routes/llm/index.py ADDED
@@ -0,0 +1,79 @@
+ from flask import Blueprint, request, jsonify, session
+ from models import UserModal
+ from helpers.GROQ import ConversationGROQ
+ from helpers.prompts import PromptManager
+ from helpers.Scrapy import Scrapper
+ import requests
+
+ llm = Blueprint('llm', __name__)
+
+ prompt_manager = PromptManager()
+ prompt_manager.load_prompt('base_prompt', 'prompts/base_prompts.txt')
+ prompt_manager.load_prompt('base_chatbot_prompt', 'prompts/base_chatbot_prompts.txt')
+ prompt_manager.load_prompt('base_seo_prompt', 'prompts/base_seo_prompts.txt')
+ prompt_manager.load_prompt('base_content_prompt', 'prompts/base_content_prompts.txt')
+ base_prompt = prompt_manager.get_prompt('base_prompt')
+ base_chatbot_prompt = prompt_manager.get_prompt('base_chatbot_prompt')
+ base_seo_prompt = prompt_manager.get_prompt('base_seo_prompt')
+ base_content_prompt = prompt_manager.get_prompt('base_content_prompt')
+
+ groq = ConversationGROQ()
+ groq.create_conversation(base_prompt)
+
+ @llm.route('/analyze', methods=['POST'])
+ def analyze():
+     data = request.json
+     url = data['url']
+     # Check user authentication
+     user_id = session.get('user_id')
+     if not user_id:
+         return jsonify({'message': 'Unauthorized'}), 401
+
+     user = UserModal.query.filter_by(id=user_id).first()
+     if not user:
+         return jsonify({'message': 'User not found'}), 404
+
+     try:
+         scrapper = Scrapper(url, groq)
+         content = scrapper.scrape()
+         content = scrapper.parse(content)
+         content = scrapper.compress(content)
+         content = scrapper.truncate(content)
+         description = scrapper.analyze(content)
+         # Structured extraction runs on the analysis output.
+         extracted = scrapper.extract(description)
+         return jsonify({'json': extracted, 'description': description}), 200
+     except requests.RequestException as e:
+         return jsonify({'message': f'Error fetching URL: {str(e)}'}), 400
+     except Exception as e:
+         return jsonify({'message': f'Error processing HTML: {str(e)}'}), 500
+
+ @llm.route('/chat', methods=['POST'])
+ def chat():
+     data = request.json
+     message = data['message']
+     prompt = """
+     Analyze the user's input to determine the desired output format.
+     If the user requests a specific format (e.g., JSON, Excel), format the response accordingly.
+     If no specific format is mentioned, provide a normal view.
+
+     Your task is to analyze the following user input and provide a short response:
+
+     {message}
+
+     If you detect a request for JSON format in the user's input, wrap your response in a JSON structure like this:
+     {{"json": "<json_content_here>"}}
+
+     For other formats (e.g., Excel), indicate the format in your response, but provide the content in a text-based representation.
+
+     If no specific format is requested, provide a comprehensive analysis in a normal, readable format.
+     Analyze the user's input to determine whether they are asking for a specific piece of information or a summary/opinion.
+     If they are asking for specific information, provide only that information without additional explanation.
+     If they are asking for a summary or your view, provide a concise explanation.
+     If they are asking for a specific format, provide the response in the requested format.
+
+     Keep the response simple and to the point: at most plain text or a JSON object.
+     """
+     response = groq.chat(prompt.format(message=message))
+     return jsonify({'response': response}), 200