saman shrestha committed
Commit da04e19 · 1 Parent(s): 3b0d3c2

initial commit
.env.sample ADDED
@@ -0,0 +1,5 @@
+ PORT=5000
+ SECRET_KEY=dfjifd
+ SQLALCHEMY_DATABASE_URI=sqlite:///./db.sqlite
+ GROQ_API_KEY=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
+ GROQ_MODEL=llama3-8b-8192
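This `.env` file is loaded by python-dotenv in `src/config.py`. As a rough illustration of what that loading does, here is a minimal stdlib sketch; the helper name is hypothetical, and unlike the real python-dotenv it ignores quoting, interpolation, and multiline values:

```python
# Hypothetical minimal .env parser, sketching what python-dotenv does
# for this project's .env.sample. Not the real library implementation.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "PORT=5000\nGROQ_MODEL=llama3-8b-8192\n# a comment\n"
print(parse_env(sample)["PORT"])  # → 5000
```

In the real project, `load_dotenv()` additionally exports these pairs into `os.environ` so `os.environ.get(...)` in `ApplicationConfig` can see them.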
.gitignore ADDED
@@ -0,0 +1,77 @@
+ # Ignore all env directories
+ env/
+ venv/
+ .env/
+ .venv/
+
+ # Ignore environment-related files
+ *.env
+ .envrc
+
+ # Ignore Python virtual environment files
+ pyvenv.cfg
+ # Ignore Python bytecode files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Ignore Python distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Ignore pip logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Ignore Python testing
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Ignore Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Ignore IPython
+ profile_default/
+ ipython_config.py
+
+ # Ignore mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Ignore Pylint
+ .pylintrc
+
+ # Ignore Python rope project settings
+ .ropeproject
+
+ # Ignore mkdocs documentation
+ /site
+
+ # Ignore Sphinx documentation
+ docs/_build/
Dockerfile ADDED
@@ -0,0 +1,10 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY ./requirements/prod.txt requirements.txt
+ RUN pip install -r requirements.txt
+
+ COPY . .
+
+ CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]
docker-compose.yml ADDED
@@ -0,0 +1,13 @@
+ version: '3'
+
+ services:
+   web:
+     build: .
+     ports:
+       - "5000:5000"
+     volumes:
+       - ./app:/app
+     environment:
+       - FLASK_ENV=development
+   redis:
+     image: "redis:alpine"
instance/db.sqlite ADDED
Binary file (16.4 kB)
prompts/base_chatbot_prompts.txt ADDED
@@ -0,0 +1,39 @@
+ You are an advanced AI agent specialized in web scraping, content analysis, and question answering, with a particular focus on e-commerce and product information. Your primary functions include:
+
+ 1. Web Scraping:
+    - Thoroughly examine and extract all relevant information from provided web pages.
+    - Collect data from various elements including text, images, links, forms, metadata, and embedded media.
+    - Pay special attention to product-related information such as prices, descriptions, specifications, and availability.
+
+ 2. Content Analysis:
+    - Analyze the extracted content to understand the purpose, type, and main topics of the website.
+    - Identify key information, patterns, and insights within the scraped data, particularly for product-related content.
+    - Categorize the website (e.g., e-commerce, blog, news, portfolio, business) based on its content and features.
+
+ 3. Question Answering:
+    - Carefully interpret and understand user questions or requests, especially those related to products and pricing.
+    - Use your comprehensive analysis of the scraped content to formulate accurate and relevant answers.
+    - If the exact information is not available, use your analytical skills to infer or provide the most closely related information.
+
+ When a user presents a question, follow these steps:
+
+ 1. Review the scraped and analyzed content relevant to the query, focusing on product details and pricing if applicable.
+ 2. Identify the most pertinent information that addresses the user's question, including specific product information when relevant.
+ 3. Formulate a clear, concise, and informative response based on the available data, ensuring accuracy in product details and pricing.
+ 4. If additional context would be helpful, include it in your answer, such as related products or pricing comparisons.
+ 5. If the requested information is not directly available, explain this and provide the most relevant alternative information you can find.
+
+ Always strive for accuracy, relevance, and helpfulness in your responses. Adapt your answering style based on the nature of the website and the user's query. For example:
+
+ - For e-commerce sites, focus on detailed product information, including:
+   * Precise pricing information, including any discounts or special offers
+   * Comprehensive product descriptions, features, and specifications
+   * Availability status and shipping information
+   * Customer reviews and ratings, if available
+ - For news sites, prioritize the most recent and relevant articles or updates.
+ - For informational sites, extract key facts, definitions, and explanatory content.
+
+ Your goal is to provide users with precise, valuable information extracted from the web pages, presented in a clear and easily understandable manner. When dealing with product-related queries, aim to be as informative and helpful as a knowledgeable sales assistant, providing all relevant details to assist the user in making informed decisions.
+
+ Here is the question:
+ {input}
prompts/base_content_prompts.txt ADDED
@@ -0,0 +1,86 @@
+ You are an advanced content scraping and analysis agent with a focus on extracting and organizing information in a structured JSON format. Your primary task is to thoroughly examine the provided web page and extract all relevant content, presenting it in a well-organized JSON structure. Your analysis should include, but is not limited to:
+
+ 1. Main content: Extract the primary textual content, including headings, paragraphs, and lists.
+ 2. Metadata: Capture all relevant metadata from the <head> section.
+ 3. Media: Identify and list all images, videos, and audio elements.
+ 4. Links: Compile all internal and external links.
+ 5. Structured data: Extract any schema.org or other structured data present on the page.
+ 6. Navigation: Capture the structure of menus and navigation elements.
+ 7. Footer content: Extract information typically found in the footer.
+ 8. Forms: Document any forms present on the page.
+ 9. Comments or user-generated content: If applicable, extract user comments or reviews.
+ 10. Pricing information: For e-commerce sites, extract product prices and any discount information.
+
+ When scraping and analyzing, follow these guidelines:
+
+ - Extract all relevant information without prioritizing or filtering.
+ - Organize the extracted data in a nested JSON format for easy parsing and analysis.
+ - Preserve the hierarchical structure of the content where applicable.
+ - Include attributes such as classes, IDs, or data attributes that might be useful for further analysis.
+ - For text content, preserve formatting indicators (bold, italic, etc.) if possible.
+
+ Your output should be a valid JSON object with clearly labeled keys and appropriate nesting. For example:
+
+ {
+   "metadata": {
+     "title": "Page Title",
+     "description": "Meta description content",
+     "keywords": ["keyword1", "keyword2"]
+   },
+   "main_content": {
+     "headings": [
+       {"level": "h1", "text": "Main Heading"},
+       {"level": "h2", "text": "Subheading"}
+     ],
+     "paragraphs": [
+       "First paragraph content...",
+       "Second paragraph content..."
+     ]
+   },
+   "media": {
+     "images": [
+       {"src": "image1.jpg", "alt": "Image description"},
+       {"src": "image2.png", "alt": "Another image"}
+     ],
+     "videos": [
+       {"src": "video1.mp4", "type": "video/mp4"}
+     ]
+   },
+   "links": {
+     "internal": [
+       {"href": "/page1", "text": "Link to Page 1"},
+       {"href": "/page2", "text": "Link to Page 2"}
+     ],
+     "external": [
+       {"href": "https://example.com", "text": "External Link"}
+     ]
+   },
+   "structured_data": {
+     // Any schema.org or other structured data found
+   },
+   "navigation": {
+     "menu_items": [
+       {"text": "Home", "href": "/"},
+       {"text": "About", "href": "/about"}
+     ]
+   },
+   "footer": {
+     "copyright": "© 2023 Company Name",
+     "social_links": [
+       {"platform": "Facebook", "url": "https://facebook.com/company"}
+     ]
+   },
+   "forms": [
+     {
+       "id": "contact_form",
+       "action": "/submit",
+       "method": "POST",
+       "fields": [
+         {"name": "email", "type": "email"},
+         {"name": "message", "type": "textarea"}
+       ]
+     }
+   ]
+ }
+
+ Be prepared to adjust the structure of your JSON output based on the specific content and layout of the web page you are analyzing. Your goal is to provide a comprehensive, well-organized representation of the page's content that can be easily processed and analyzed programmatically.
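Since this prompt asks the model to return a JSON object (sometimes wrapped in a markdown code fence, as LLMs tend to do), a caller will typically want to validate the reply before using it. A minimal stdlib sketch; the helper name is an assumption, not part of this repo:

```python
import json
import re

def parse_model_json(reply: str) -> dict:
    """Hypothetical helper: extract and parse a JSON object from a model
    reply, tolerating an optional ```json ... ``` fence around it."""
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)  # raises json.JSONDecodeError on invalid output

reply = '```json\n{"metadata": {"title": "Page Title"}}\n```'
print(parse_model_json(reply)["metadata"]["title"])  # → Page Title
```

Wrapping the `json.loads` call in a retry loop (re-asking the model on a decode error) is a common hardening step on top of this.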
prompts/base_prompts.txt ADDED
@@ -0,0 +1,57 @@
+ You are an expert web page analyzer with exceptional attention to detail. Your task is to thoroughly examine and extract all useful information from the HTML page presented to you. This includes, but is not limited to:
+
+ 1. All textual content, including headings, paragraphs, lists, and any hidden text
+ 2. Images and their alt text
+ 3. Links and their anchor text
+ 4. Forms and their input fields
+ 5. Tables and their contents
+ 6. Navigation menus
+ 7. Footer information
+ 8. Embedded media such as videos or audio players
+ 9. Metadata in the <head> section, including title, description, and keywords
+ 10. CSS classes and IDs
+ 11. Any visible JavaScript functionality or dynamic content
+ 12. Error messages or notifications
+ 13. Accessibility features such as ARIA labels and roles
+ 14. Any third-party widgets or embedded content
+ 15. Code snippets or examples present on the page
+
+ Extract and memorize all of this information without prioritizing or filtering. When answering questions about the page, provide detailed and accurate responses based on the extracted content. Do not overlook any aspect of the page, no matter how small or seemingly insignificant.
+
+ Be prepared to present the extracted information in various formats as requested, such as:
+
+ 1. Raw text extraction: Provide all textual content from the page, including headings, paragraphs, lists, and any other text elements.
+ 2. Structured data: Extract any structured data present on the page, such as JSON-LD, microdata, or RDFa.
+ 3. Code snippets: Identify and extract any code examples or snippets present on the page.
+ 4. Headings hierarchy: List all headings (h1, h2, h3, etc.) in their hierarchical order.
+ 5. Link inventory: Compile a list of all links on the page, including their anchor text and destinations.
+ 6. Image catalog: Create a list of all images on the page, including their src attributes and alt text.
+ 7. Form details: Provide information about any forms on the page, including their input fields and submission methods.
+ 8. Embedded media: List any embedded videos, audio players, or other media elements.
+ 9. Metadata summary: Compile all metadata from the page's <head> section.
+ 10. Script and style references: List all external script and stylesheet references.
+
+ When asked about the page content, provide comprehensive and detailed responses based on the extracted information. If asked about something that is not present on the page, clearly state that the information or element is not found. Your responses should always be based on the actual content and structure of the page you have analyzed, without making assumptions or guesses.
+
+ Additionally, categorize the type of website based on the content and structure you've analyzed. Consider the following categories:
+
+ 1. Blog: Look for regular posts, dates, author information, and commenting systems.
+ 2. News: Check for time-sensitive articles, breaking news sections, and journalist bylines.
+ 3. E-commerce store: Identify product listings, prices, shopping carts, and checkout processes.
+ 4. Portfolio: Look for showcases of work, projects, or artistic creations.
+ 5. Business website: Identify company information, services offered, and contact details.
+ 6. Educational: Look for course listings, learning materials, and student resources.
+ 7. Social media: Identify user profiles, friend/follower systems, and user-generated content.
+ 8. Forum or community: Look for discussion threads, user posts, and member profiles.
+ 9. Government or institutional: Identify official seals, public service information, and formal language.
+ 10. Personal website: Look for biographical information and personal content.
+
+ For e-commerce stores, pay special attention to:
+ 1. Number of products listed
+ 2. Product categories and subcategories
+ 3. Price ranges
+ 4. Special offers or discounts
+ 5. Customer review systems
+ 6. Product search and filtering options
+
+ Provide a clear categorization based on the most prominent features of the website, and include relevant details that support your classification.
prompts/base_seo_prompts.txt ADDED
@@ -0,0 +1,59 @@
+ You are a highly successful SEO expert with a proven track record of improving website rankings and visibility. Your expertise encompasses:
+
+ 1. Deep understanding of search engine algorithms and ranking factors
+ 2. Mastery of on-page and technical SEO optimization techniques
+ 3. Proficiency in keyword research and content strategy
+ 4. Experience with link building and off-page SEO tactics
+ 5. Analytical skills for interpreting SEO data and metrics
+ 6. Ability to adapt strategies to evolving search engine guidelines
+
+ You excel at extracting and analyzing SEO-relevant information from web pages. When presented with a URL or HTML content, follow these steps to provide a comprehensive SEO analysis:
+
+ 1. Examine the URL structure for SEO best practices
+ 2. Extract and evaluate the following on-page elements:
+    - Title tag (content and length)
+    - Meta description (content and length)
+    - Header tags (H1, H2, H3, etc.) and their hierarchy
+    - Image alt text and file names
+    - Internal and external links, including anchor text
+    - Keyword usage and density in content
+    - Schema markup and structured data
+    - Canonical tags
+    - Robots meta tags
+    - XML sitemap (presence and structure)
+    - Social media meta tags (Open Graph, Twitter Cards)
+
+ 3. Assess technical SEO factors:
+    - Page load speed
+    - Mobile-friendliness
+    - Crawlability and indexability
+    - HTTPS implementation
+
+ 4. Analyze content quality and relevance to target keywords
+
+ 5. Evaluate the overall site structure and information architecture
+
+ 6. Identify potential SEO issues and opportunities for improvement
+
+ 7. Provide actionable recommendations based on your findings
+
+ When answering questions about a page's SEO, offer detailed, data-driven insights and practical solutions. Your responses should demonstrate:
+
+ - A strategic approach to SEO optimization
+ - Balancing short-term tactics with long-term SEO goals
+ - Understanding of user intent and search behavior
+ - Awareness of industry trends and algorithm updates
+ - Ability to prioritize SEO tasks for maximum impact
+
+ Be prepared to present your analysis in various formats, such as:
+
+ - Comprehensive SEO audit reports
+ - Technical SEO checklists
+ - Content optimization recommendations
+ - Competitor SEO comparisons
+ - Action plans for improving search rankings
+
+ Your goal is to provide clear, actionable advice that will significantly improve a website's search engine visibility and organic traffic. Always consider the specific needs of the website's industry and target audience when offering SEO recommendations.
+
+ Here is the question:
+ {input}
requirements/local.txt ADDED
@@ -0,0 +1,8 @@
+ flask
+ flask-sqlalchemy
+ flask-bcrypt
+ python-dotenv
+ flask-session
+ redis
+ requests
+ beautifulsoup4
requirements/prod.txt ADDED
@@ -0,0 +1,8 @@
+ flask
+ flask-sqlalchemy
+ flask-bcrypt
+ python-dotenv
+ flask-session
+ redis
+ requests
+ beautifulsoup4
src/config.py ADDED
@@ -0,0 +1,22 @@
+ from dotenv import load_dotenv
+ import os
+ import redis
+
+ load_dotenv()
+
+ class ApplicationConfig:
+     SECRET_KEY = os.environ.get('SECRET_KEY')
+     SQLALCHEMY_DATABASE_URI = os.environ.get('SQLALCHEMY_DATABASE_URI')
+     SQLALCHEMY_TRACK_MODIFICATIONS = False
+     SQLALCHEMY_ECHO = True
+
+     SESSION_TYPE = 'redis'
+     SESSION_REDIS = redis.from_url('redis://localhost:6379')
+     SESSION_PERMANENT = False
+     SESSION_USE_SIGNER = True
+     # SESSION_COOKIE_SECURE = True
+     # SESSION_COOKIE_HTTPONLY = True
+     # SESSION_COOKIE_SAMESITE = 'None'
+     # SESSION_COOKIE_DOMAIN = None
+     # SESSION_COOKIE_PATH = '/'
+
src/helpers/GROQ.py ADDED
@@ -0,0 +1,217 @@
+ from groq import Groq
+ from langchain_groq import ChatGroq
+ from langchain_core.prompts import (
+     ChatPromptTemplate,
+     HumanMessagePromptTemplate,
+     MessagesPlaceholder,
+ )
+ from langchain.chains import LLMChain, SequentialChain
+ from langchain_core.messages import SystemMessage
+ from langchain.chains.conversation.memory import ConversationBufferWindowMemory
+ from typing import Dict, Optional
+ from langchain.chains.router import MultiPromptChain
+ from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
+ from langchain.prompts import PromptTemplate
+ import pandas as pd
+ import os
+ import json
+
+ class GROQ:
+     def __init__(self, api_key: str = os.getenv('GROQ_API_KEY')):
+         # Read the key from the environment instead of hardcoding a secret.
+         self.client: Groq = Groq(
+             api_key=api_key
+         )
+
+     def chat(self, prompt: str, model: str, response_format: Optional[Dict]) -> str:
+         completion = self.client.chat.completions.create(
+             model=model, messages=[{"role": "user", "content": prompt}], response_format=response_format)
+
+         return completion.choices[0].message.content
+
+     def get_summarization(self, user_question: str, df: pd.DataFrame, model: str) -> str:
+         """
+         Generates a summarization prompt based on the user's question and the
+         resulting data, sends it to the Groq API, and returns the AI's response.
+
+         Parameters:
+             user_question (str): The user's question.
+             df (pd.DataFrame): The resulting data (currently unused).
+             model (str): The AI model to use for the response.
+
+         Returns:
+             str: The content of the AI's response to the summarization prompt.
+         """
+         prompt = '''
+         {user_question}
+         '''.format(user_question=user_question)
+         # Response format is set to 'None'
+         return self.chat(prompt, model, None)
+
+
+ class ConversationGROQ:
+     def __init__(self, conversational_memory_length: int = 10, api_key: str = os.getenv('GROQ_API_KEY'), model: str = os.getenv('GROQ_MODEL')):
+         self.client: ChatGroq = ChatGroq(
+             groq_api_key=api_key,
+             model=model
+         )
+         self.memory: ConversationBufferWindowMemory = ConversationBufferWindowMemory(k=conversational_memory_length, memory_key="chat_history", return_messages=True)
+         self.conversation: Optional[LLMChain] = None
+
+     def sequential_chain(self, llm: ChatGroq, prompt_sequences: list[Dict[str, str]], input_variable: list[str], output_variable: list[str]):
+         """
+         Creates a sequential chain of LLM chains based on the provided prompt sequences, input variables, and output variables.
+
+         Parameters:
+             llm (ChatGroq): The LLM to run each chain with (falls back to self.client).
+             prompt_sequences (list[Dict[str, str]]): A list of dictionaries containing the prompt and output key for each sequence.
+             input_variable (list[str]): A list of input variables for the overall chain.
+             output_variable (list[str]): A list of output variables for the overall chain.
+         Example:
+             prompt_sequences = [
+                 {'prompt': 'You are a helpful assistant.{input} Answer the user\'s question. {user_input}', 'output_key': 'prompt1'},
+                 {'prompt': 'You are a helpful assistant. Answer the user\'s question. {user_input}', 'output_key': 'prompt2'},
+                 {'prompt': 'You are a helpful assistant. Answer the user\'s question. {user_input}', 'output_key': 'final'}
+             ]
+             input_variable = ['input']
+             output_variable = ['prompt1', 'prompt2', 'final']
+
+         Returns:
+             SequentialChain: An overall chain that combines all the individual chains.
+         """
+         chains = []
+         for sequence in prompt_sequences:
+             prompt = sequence['prompt']
+             output_key = sequence['output_key']
+             template = ChatPromptTemplate.from_template(prompt)
+             chain = LLMChain(llm=llm or self.client, prompt=template, output_key=output_key)
+             chains.append(chain)
+         overall_chain = SequentialChain(
+             chains=chains,
+             input_variables=input_variable,
+             output_variables=output_variable,
+             verbose=True
+         )
+         return overall_chain
+
+     def create_router_chain(self, templates_prompts: list[Dict[str, str]], llm: Optional[ChatGroq] = None):
+         MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
+ language model select the model prompt best suited for the input. \
+ You will be given the names of the available prompts and a \
+ description of what the prompt is best suited for. \
+ You may also revise the original input if you think that revising \
+ it will ultimately lead to a better response from the language model.
+
+ << FORMATTING >>
+ Return a markdown code snippet with a JSON object formatted to look like:
+ ```json
+ {{{{
+     "destination": string \ name of the prompt to use or "DEFAULT"
+     "next_inputs": string \ a potentially modified version of the original input
+ }}}}
+ ```
+
+ REMEMBER: "destination" MUST be one of the candidate prompt \
+ names specified below OR it can be "DEFAULT" if the input is not \
+ well suited for any of the candidate prompts.
+ REMEMBER: "next_inputs" can just be the original input \
+ if you don't think any modifications are needed.
+
+ << CANDIDATE PROMPTS >>
+ {destinations}
+
+ << INPUT >>
+ {{input}}
+
+ << OUTPUT (remember to include the ```json)>>"""
+         destination_chains = {}
+         for template in templates_prompts:
+             destination_chains[template['name']] = LLMChain(llm=llm or self.client, memory=self.memory, prompt=ChatPromptTemplate.from_template(template=template['prompt_template']))
+         destinations = [f"{template['name']}: {template['description']}" for template in templates_prompts]
+         destinations_str = "\n".join(destinations)
+         router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations_str)
+         router_prompt = PromptTemplate(
+             template=router_template,
+             input_variables=["input"],
+             output_parser=RouterOutputParser(),
+         )
+         default_prompt = ChatPromptTemplate.from_template("{input}")
+         default_chain = LLMChain(llm=llm or self.client, memory=self.memory, prompt=default_prompt)
+         router_chain = LLMRouterChain.from_llm(llm or self.client, router_prompt)
+         chain = MultiPromptChain(router_chain=router_chain,
+                                  destination_chains=destination_chains,
+                                  default_chain=default_chain, verbose=True
+                                  )
+         return chain
+
+     def get_conditional_template(self, input: str, categories: list[Dict[str, str]]) -> Dict:
+         # Returns the parsed routing dict ({"destination": ..., "next_inputs": ...}), not a ChatPromptTemplate.
+         MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
+ language model select the model prompt best suited for the input. \
+ You will be given the names of the available prompts and a \
+ description of what the prompt is best suited for. \
+ You may also revise the original input if you think that revising \
+ it will ultimately lead to a better response from the language model.
+
+ << FORMATTING >>
+ Return a markdown code snippet with a JSON object formatted to look like:
+ ```json
+ {{{{
+     "destination": string \ name of the prompt to use or "DEFAULT"
+     "next_inputs": string \ a potentially modified version of the original input
+ }}}}
+ ```
+
+ REMEMBER: "destination" MUST be one of the candidate prompt \
+ names specified below OR it can be "DEFAULT" if the input is not \
+ well suited for any of the candidate prompts.
+ REMEMBER: "next_inputs" can just be the original input \
+ if you don't think any modifications are needed.
+
+ << CANDIDATE PROMPTS >>
+ {destinations}
+
+ << INPUT >>
+ {input}
+
+ << OUTPUT (remember to include the ```json)>>""".format(destinations="\n".join([f"{template['name']}: {template['description']}" for template in categories]), input=input)
+
+         router_prompt = PromptTemplate(
+             template=MULTI_PROMPT_ROUTER_TEMPLATE,
+             input_variables=["input"],
+         )
+
+         response = LLMChain(llm=self.client, prompt=router_prompt).predict(input=input)
+
+         json_str = response.split('```json')[1].split('```')[0].strip()
+         return json.loads(json_str)
+
+     def create_template(self, base_prompt: str) -> ChatPromptTemplate:
+         return ChatPromptTemplate.from_messages([
+             SystemMessage(
+                 content=base_prompt
+             ),  # The persistent system prompt, always included at the start of the chat.
+
+             MessagesPlaceholder(
+                 variable_name="chat_history"
+             ),  # Replaced by the actual chat history during the conversation; maintains context.
+
+             HumanMessagePromptTemplate.from_template(
+                 "{human_input}"
+             ),  # The user's current input is injected into the prompt here.
+         ])
+
+     def create_conversation(self, prompt: str = None, llm=None, memory=None, verbose: bool = True):
+         self.conversation = LLMChain(
+             llm=llm or self.client,
+             memory=memory or self.memory,
+             prompt=self.create_template(prompt) if prompt else None,
+             verbose=verbose
+         )
+         return self.conversation
+
+     def chat(self, user_input: str) -> str:
+         if self.conversation is None:
+             raise ValueError("Conversation not initialized. Call create_conversation() first.")
+         return self.conversation.predict(human_input=user_input)
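The `ConversationBufferWindowMemory(k=...)` used above keeps only the last k exchanges in the prompt, which bounds token usage. A pure-Python sketch of just that windowing behavior (the class here is a hypothetical stand-in, not the LangChain implementation):

```python
from collections import deque

class WindowMemory:
    """Keeps only the most recent k (human, ai) exchanges,
    mimicking ConversationBufferWindowMemory's sliding window."""
    def __init__(self, k: int):
        self.exchanges = deque(maxlen=k)  # older entries fall off automatically

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def history(self) -> list:
        return list(self.exchanges)

mem = WindowMemory(k=2)
mem.save("hi", "hello")
mem.save("price?", "it is $5")
mem.save("stock?", "in stock")
print(len(mem.history()))   # → 2
print(mem.history()[0][0])  # → price?
```

This is why `conversational_memory_length=10` above means the chatbot only "remembers" the ten most recent turns, not the whole session.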
src/helpers/README.md ADDED
@@ -0,0 +1,73 @@
+ # GROQ and ConversationGROQ Classes
+
+ This module provides two main classes for interacting with the Groq API: `GROQ` and `ConversationGROQ`. These classes offer various functionalities for chat completions, summarization, and creating conversation chains.
+
+ ## GROQ Class
+
+ The `GROQ` class provides basic functionality for interacting with the Groq API.
+
+ ### Methods
+
+ #### `__init__(self, api_key: str = '<your groq api key here>')`
+
+ Initializes the GROQ class with the provided API key.
+
+ #### `chat(self, prompt: str, model: str, response_format: Optional[Dict]) -> str`
+
+ Sends a chat completion request to the Groq API.
+
+ - `prompt`: The input prompt for the chat completion.
+ - `model`: The AI model to use.
+ - `response_format`: Optional response format configuration.
+
+ Returns the content of the AI's response.
+
+ #### `get_summarization(self, user_question: str, df: pd.DataFrame, model: str) -> str`
+
+ Generates a summarization based on the user's question and the provided data.
+
+ - `user_question`: The user's question.
+ - `df`: A pandas DataFrame containing the data (currently unused in the method).
+ - `model`: The AI model to use for the response.
+
+ Returns the content of the AI's response to the summarization prompt.
+
+ ## ConversationGROQ Class
+
+ The `ConversationGROQ` class provides more advanced functionality for creating conversation chains and managing chat history.
+
+ ### Methods
+
+ #### `__init__(self, conversational_memory_length: int = 10, api_key: str = os.getenv('GROQ_API_KEY'), model: str = os.getenv('GROQ_MODEL'))`
+
+ Initializes the ConversationGROQ class with the specified parameters.
+
+ #### `sequential_chain(self, llm: ChatGroq, prompt_sequences: list[Dict[str, str]], input_variable: list[str], output_variable: list[str])`
+
+ Creates a sequential chain of LLM chains based on the provided prompt sequences, input variables, and output variables.
+
+ #### `create_router_chain(self, templates_prompts: list[Dict[str, str]], llm: Optional[ChatGroq] = None)`
+
+ Creates a router chain for selecting the best-suited prompt based on the input.
+
+ #### `get_conditional_template(self, input: str, categories: list[Dict[str, str]]) -> ChatPromptTemplate`
+
+ Selects the best-suited prompt template based on the input and provided categories.
+
+ #### `create_template(self, base_prompt: str) -> ChatPromptTemplate`
+
+ Creates a chat prompt template with the given base prompt.
+
+ #### `create_conversation(self, prompt: str = None, llm = None, memory = None, verbose: bool = True)`
+
+ Initializes a conversation chain with the specified parameters.
+
+ #### `chat(self, user_input: str) -> str`
+
+ Sends a user input to the conversation chain and returns the AI's response.
+
+ ## Usage
+
+ To use these classes, you need to have the Groq API key and the required dependencies installed. Make sure to set up the necessary environment variables or provide the API key directly when initializing the classes.
+
+ Example usage:
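The README's usage example is cut off in this commit; a hypothetical sketch of what it might look like, based on the methods documented above. It assumes `GROQ_API_KEY` and `GROQ_MODEL` are set as in `.env.sample`, and the API call only runs when a key is present:

```python
# Hypothetical usage sketch for ConversationGROQ; not from the original README.
import os

def run_example() -> str:
    # Imported lazily so the sketch reads standalone without the project installed.
    from helpers.GROQ import ConversationGROQ
    bot = ConversationGROQ(conversational_memory_length=5)
    bot.create_conversation(prompt="You are a helpful assistant.")
    return bot.chat("What can you do?")

if os.getenv("GROQ_API_KEY"):
    print(run_example())
```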
src/helpers/Scrapy.py ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ import requests
+ from bs4 import BeautifulSoup
+ from helpers.GROQ import ConversationGROQ
+
+ class Scrapper:
+     def __init__(self, url: str, groq_instance: ConversationGROQ):
+         self.url = url
+         self.groq_instance = groq_instance
+
+     def scrape(self):
+         response = requests.get(self.url)
+         response.raise_for_status()  # raise on non-2xx responses
+         return response.content
+
+     def parse(self, content: str):
+         soup = BeautifulSoup(content, 'html.parser')
+         return ' '.join(soup.stripped_strings)
+
+     def compress(self, content: str):
+         # Collapse all runs of whitespace into single spaces.
+         return ' '.join(content.split())
+
+     def truncate(self, content: str):
+         return content[:1000] + '...' if len(content) > 1000 else content
+
+     def analyze(self, content: str):
+         prompt = """
+         Analyze the following HTML content with exceptional precision and depth:
+         {content}
+         """
+         response = self.groq_instance.chat(prompt.format(content=content))
+         return response
+
+     def extract(self, content: str):
+         prompt = """
+         Extract the following structured data from the HTML content:
+
+         {content}
+
+         1. JSON representation: Extract key information and structure it in JSON format.
+         2. Table extraction: Identify and extract any tables, presenting them in JSON format.
+         3. List compilation: Extract and present lists from the content in JSON format.
+         4. Key-value pair extraction: Identify and extract key-value pairs, presenting them in JSON format.
+         5. Numerical data analysis: Extract and present numerical data in JSON format.
+         6. Entity recognition: Identify and categorize named entities, presenting them in JSON format.
+         7. Sentiment analysis: Assess overall tone and sentiment, presenting results in JSON format.
+         8. Language detection: Identify the primary language and any secondary languages, presenting in JSON format.
+         9. Structured data markup: Extract any structured data present on the page, presenting in JSON format.
+         10. API endpoints: Document any API endpoints referenced, presenting in JSON format.
+
+         Ensure the extracted data is well-structured and properly formatted in JSON.
+         """
+         response = self.groq_instance.chat(prompt.format(content=content))
+         return response
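The `compress` and `truncate` steps are plain string operations with no external dependencies, so they can be checked in isolation. A standalone sketch of the same normalization, with the 1000-character cap made a parameter purely for illustration:

```python
def compress(content: str) -> str:
    # str.split() with no argument splits on any whitespace run,
    # so joining with single spaces collapses newlines, tabs, etc.
    return ' '.join(content.split())

def truncate(content: str, limit: int = 1000) -> str:
    # Keep the first `limit` characters and mark the cut with an ellipsis.
    return content[:limit] + '...' if len(content) > limit else content

text = compress("hello   world\n\tfoo")
print(text)            # hello world foo
print(truncate(text))  # under the limit, returned unchanged
```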
src/helpers/prompts.py ADDED
@@ -0,0 +1,10 @@
+ class PromptManager:
+     def __init__(self):
+         self.prompts = {}
+
+     def load_prompt(self, name, file_path):
+         with open(file_path, 'r') as file:
+             self.prompts[name] = file.read()
+
+     def get_prompt(self, name):
+         return self.prompts.get(name, '')
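`PromptManager` is self-contained, so it can be exercised against a temporary file; the class body is repeated here only so the snippet runs on its own:

```python
import os
import tempfile

class PromptManager:
    def __init__(self):
        self.prompts = {}

    def load_prompt(self, name, file_path):
        with open(file_path, 'r') as file:
            self.prompts[name] = file.read()

    def get_prompt(self, name):
        return self.prompts.get(name, '')

# Write a throwaway prompt file and load it by name.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('You are a helpful assistant.')
    path = f.name

pm = PromptManager()
pm.load_prompt('base_prompt', path)
print(pm.get_prompt('base_prompt'))  # You are a helpful assistant.
print(repr(pm.get_prompt('missing')))  # '' — unknown names fall back to empty
os.unlink(path)
```

Note the empty-string fallback in `get_prompt`: a typo in a prompt name fails silently rather than raising, which is why `routes/llm/index.py` works even if a prompt file is missing.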
src/main.py ADDED
@@ -0,0 +1,21 @@
+ from flask import Flask
+ from flask_session import Session
+ from config import ApplicationConfig
+ from models import db
+ from routes.auth.index import auth
+ from routes.llm.index import llm
+
+ app = Flask(__name__)
+ app.config.from_object(ApplicationConfig)
+
+ server_session = Session(app)
+
+ db.init_app(app)
+ with app.app_context():
+     db.create_all()
+
+ app.register_blueprint(auth, url_prefix='/auth')
+ app.register_blueprint(llm, url_prefix='/llm')
+
+ if __name__ == '__main__':
+     app.run(debug=True)
src/models.py ADDED
@@ -0,0 +1,18 @@
+ from flask_sqlalchemy import SQLAlchemy
+ from uuid import uuid4
+
+ db = SQLAlchemy()
+
+ def get_uuid():
+     return uuid4().hex
+
+
+ class UserModal(db.Model):
+     __tablename__ = 'users'
+     id = db.Column(db.String(32), primary_key=True, unique=True, default=get_uuid)
+     email = db.Column(db.String(120), unique=True, nullable=False)
+     password = db.Column(db.String(60), nullable=False)
+
+     def __repr__(self):
+         # The model has no username column, so represent users by id and email.
+         return f"User('{self.id}', '{self.email}')"
+
src/routes/auth/index.py ADDED
@@ -0,0 +1,55 @@
+ from flask import Blueprint, request, jsonify, session
+ from models import db, UserModal
+ from flask_bcrypt import Bcrypt
+
+ auth = Blueprint('auth', __name__)
+ bcrypt = Bcrypt()
+
+ @auth.route('/register', methods=['POST'])
+ def register():
+     data = request.json
+     email = data['email']
+     password = data['password']
+
+     user_exists = UserModal.query.filter_by(email=email).first() is not None
+     if user_exists:
+         return jsonify({'message': 'User already exists'}), 409
+     hashed_password = bcrypt.generate_password_hash(password)
+     user = UserModal(email=email, password=hashed_password)
+     db.session.add(user)
+     db.session.commit()
+
+     return jsonify({
+         'id': user.id,
+         'email': user.email
+     }), 201
+
+ @auth.route('/me', methods=['GET'])
+ def me():
+     user_id = session.get('user_id')
+     if not user_id:
+         return jsonify({'message': 'Unauthorized'}), 401
+     user = UserModal.query.filter_by(id=user_id).first()
+     if user is None:
+         # Stale session: the referenced user no longer exists.
+         return jsonify({'message': 'Unauthorized'}), 401
+     return jsonify({
+         'id': user.id,
+         'email': user.email
+     }), 200
+
+ @auth.route('/login', methods=['POST'])
+ def login():
+     data = request.json
+     email = data['email']
+     password = data['password']
+
+     user = UserModal.query.filter_by(email=email).first()
+     if user is None:
+         return jsonify({'message': 'Invalid credentials'}), 401
+
+     if not bcrypt.check_password_hash(user.password, password):
+         return jsonify({'message': 'Invalid credentials'}), 401
+
+     session['user_id'] = user.id
+
+     return jsonify({
+         'id': user.id,
+     }), 200
src/routes/llm/index.py ADDED
@@ -0,0 +1,79 @@
+ from flask import Blueprint, request, jsonify, session
+ from models import UserModal
+ from helpers.GROQ import ConversationGROQ
+ from helpers.prompts import PromptManager
+ from helpers.Scrapy import Scrapper
+ import requests
+
+ llm = Blueprint('llm', __name__)
+
+ prompt_manager = PromptManager()
+ prompt_manager.load_prompt('base_prompt', 'prompts/base_prompts.txt')
+ prompt_manager.load_prompt('base_chatbot_prompt', 'prompts/base_chatbot_prompts.txt')
+ prompt_manager.load_prompt('base_seo_prompt', 'prompts/base_seo_prompts.txt')
+ prompt_manager.load_prompt('base_content_prompt', 'prompts/base_content_prompts.txt')
+ base_prompt = prompt_manager.get_prompt('base_prompt')
+ base_chatbot_prompt = prompt_manager.get_prompt('base_chatbot_prompt')
+ base_seo_prompt = prompt_manager.get_prompt('base_seo_prompt')
+ base_content_prompt = prompt_manager.get_prompt('base_content_prompt')
+
+ groq = ConversationGROQ()
+ groq.create_conversation(base_prompt)
+
+ @llm.route('/analyze', methods=['POST'])
+ def analyze():
+     data = request.json
+     url = data['url']
+     # Check user authentication
+     user_id = session.get('user_id')
+     if not user_id:
+         return jsonify({'message': 'Unauthorized'}), 401
+
+     user = UserModal.query.filter_by(id=user_id).first()
+     if not user:
+         return jsonify({'message': 'User not found'}), 404
+
+     try:
+         scrapper = Scrapper(url, groq)
+         content = scrapper.scrape()
+         content = scrapper.parse(content)
+         content = scrapper.compress(content)
+         content = scrapper.truncate(content)
+         description = scrapper.analyze(content)
+         # Structured extraction runs on the analysis output.
+         extracted = scrapper.extract(description)
+         return jsonify({'json': extracted, 'description': description}), 200
+     except requests.RequestException as e:
+         return jsonify({'message': f'Error fetching URL: {str(e)}'}), 400
+     except Exception as e:
+         return jsonify({'message': f'Error processing HTML: {str(e)}'}), 500
+
+ @llm.route('/chat', methods=['POST'])
+ def chat():
+     data = request.json
+     message = data['message']
+     prompt = """
+     Analyze the user's input to determine the desired output format.
+     If the user requests a specific format (e.g., JSON, Excel), format the response accordingly.
+     If no specific format is mentioned, provide a normal view.
+
+     Your task is to analyze the following user input and provide a short response:
+
+     {message}
+
+     If you detect a request for JSON format in the user's input, wrap your response in a JSON structure like this:
+     {{"json": "<json_content_here>"}}
+
+     For other formats (e.g., Excel), indicate the format in your response, but provide the content in a text-based representation.
+
+     If no specific format is requested, provide a comprehensive analysis in a normal, readable format.
+     Analyze the user's input to determine whether they are asking for a specific piece of information or a summary/opinion.
+     If they are asking for specific information, provide only that information without additional explanation.
+     If they are asking for a summary or your view, provide a concise explanation.
+     If they are asking for a specific format, provide the response in the requested format.
+
+     Keep the response simple and to the point: at most plain text or a JSON object.
+     """
+     response = groq.chat(prompt.format(message=message))
+     return jsonify({'response': response}), 200