ashok2216 committed on
Commit 4292ffa · verified · 1 Parent(s): 55c1506

Upload 13 files
.github/workflows/deploy.yml ADDED
@@ -0,0 +1,33 @@
+ name: Deploy
+ on:
+   push:
+     branches: main
+   pull_request:
+     branches: main
+
+ jobs:
+   deploy:
+     name: Deploy
+     runs-on: ubuntu-latest
+     permissions:
+       id-token: write # Needed for auth with Deno Deploy
+       contents: read  # Needed to clone the repository
+
+     steps:
+       - name: Clone repository
+         uses: actions/checkout@v3
+
+       - name: Install Node.js
+         uses: actions/setup-node@v3
+         with:
+           node-version: lts/*
+
+       - name: Build step
+         run: npm install && npm run build # 📝 Update the build command(s)
+
+       - name: Upload to Deno Deploy
+         uses: denoland/deployctl@v1
+         with:
+           project: "expensive-dolphin-10"
+           entrypoint: "index.js" # 📝 Update the entrypoint
+           root: "." # 📝 Update the root
Dockerfile.sql ADDED
@@ -0,0 +1,38 @@
+ FROM python:3.9-slim
+
+ ENV PYTHONUNBUFFERED=1 \
+     PYTHONDONTWRITEBYTECODE=1 \
+     PIP_NO_CACHE_DIR=1 \
+     PIP_DISABLE_PIP_VERSION_CHECK=1 \
+     PIP_DEFAULT_TIMEOUT=120 \
+     LC_ALL=C.UTF-8 \
+     LANG=C.UTF-8
+
+ # Build tools are needed to compile Python packages with native extensions
+ RUN apt-get update \
+     && apt-get install --yes --no-install-recommends \
+         gcc \
+         g++ \
+         build-essential \
+         python3-dev
+
+ WORKDIR /app
+
+ # Install the system packages listed in packages.txt
+ COPY packages.txt packages.txt
+ RUN xargs -a packages.txt apt-get install --yes
+
+ COPY requirements.txt requirements.txt
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ EXPOSE 8501
+
+ COPY . .
+
+ CMD ["streamlit", "run", "streamlit_app.py"]
+
+ # docker build --progress=plain --tag selenium:latest .
+ # docker run -ti -p 8501:8501 --rm selenium:latest /bin/bash
+ # docker run -ti -p 8501:8501 --rm selenium:latest
+ # docker run -ti -p 8501:8501 -v ${pwd}:/app --rm selenium:latest
+ # docker run -ti -p 8501:8501 -v ${pwd}:/app --rm selenium:latest /bin/bash
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 Ashok_kumar
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,11 +1,139 @@
- ---
- title: Youtube-data Scraper
- emoji: 📈
- colorFrom: gray
- colorTo: purple
- sdk: docker
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Scraping YouTube Data
+ ## Scraping YouTube Video Likes with Selenium and Python & Building a Web Application (Streamlit)
+
+ **Introduction to Web Scraping:**
+
+ Web scraping is the automated process of extracting information or data from websites. It involves writing a script or using software to access and gather data from web pages, transforming unstructured data on the web into a structured format that can be analyzed, stored, or used in various applications.
+
+ **Web Scraping Process:**
+
+ - Access Websites: A script or program accesses web pages, mimicking human browsing behavior.
+ - Retrieve Data: It extracts specific information from these web pages.
+ - Organize Data: The extracted data is structured and saved in a usable format (like CSV, JSON, or a database).
+ - Fetching Data: The process starts with a request to a website, retrieving the HTML content.
+ - Parsing: The HTML content is parsed to identify and extract relevant information using techniques like Regular Expressions, XPath, or CSS selectors.
+ - Data Extraction: The desired data, such as text, images, links, or tables, is extracted from the parsed HTML.
+ - Storage/Analysis: Extracted data is stored locally or analyzed for insights, automation, or integration into other systems. A minimal sketch of this pipeline follows below.
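+ As a quick illustration of that fetch → parse → extract → store loop, here is a minimal, self-contained sketch (the URL and selectors are placeholders, not part of this project):
+
+ import csv
+ import requests
+ from bs4 import BeautifulSoup
+
+ # Fetch: request the page and retrieve the raw HTML (placeholder URL)
+ html = requests.get("https://example.com", timeout=30).text
+
+ # Parse: build a DOM tree that can be queried with CSS selectors
+ soup = BeautifulSoup(html, "lxml")
+
+ # Extract: pull every link's text and target into structured records
+ rows = [{"text": a.get_text(strip=True), "href": a.get("href")}
+         for a in soup.select("a[href]")]
+
+ # Store: write the structured records to a CSV file
+ with open("links.csv", "w", newline="", encoding="utf-8") as f:
+     writer = csv.DictWriter(f, fieldnames=["text", "href"])
+     writer.writeheader()
+     writer.writerows(rows)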
+
+ **What is Selenium?**
+
+ Selenium scraping refers to using the Selenium framework, primarily employed for automating web browsers, to extract data from websites. It's a powerful tool used in web scraping to simulate human interaction with a web page by controlling a browser programmatically.
+
+ **Tools Required**
+
+ To get started, ensure you have the following tools installed:
+ - Python: the programming language used for scripting.
+ - Selenium WebDriver: a tool for controlling web browsers programmatically.
+ - Streamlit: used to build and deploy the web app.
+
+ Here's how it works:
+ 1. Automating Web Browsers: Selenium allows you to control a web browser (like Chrome, Firefox, or others) programmatically. It mimics human interaction by opening web pages, clicking buttons, filling forms, and navigating across different pages.
+ 2. Data Extraction: Once the browser is directed to a particular webpage, Selenium enables the extraction of desired data. This can include scraping text, images, tables, or any other content from the webpage.
+ 3. Scraping Dynamic Content: Selenium is particularly useful for scraping websites with dynamic content that can't be easily accessed using traditional scraping libraries.
+ 4. Complex Scraping Scenarios: Selenium is versatile and can handle complex scraping tasks that involve interactions such as login processes, submitting forms, scrolling through infinite scroll pages, or dealing with content behind logins or captchas. A short sketch of such an interaction follows below.
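+ For example, a scripted login flow might look like the following (a hypothetical sketch; the URL and element names are placeholders):
+
+ from selenium import webdriver
+ from selenium.webdriver.common.by import By
+ from selenium.webdriver.chrome.options import Options
+
+ options = Options()
+ options.add_argument("--headless=new")  # run Chrome without a visible window
+ driver = webdriver.Chrome(options=options)
+
+ driver.get("https://example.com/login")                              # open the page (placeholder URL)
+ driver.find_element(By.NAME, "username").send_keys("demo")           # fill a form field (hypothetical names)
+ driver.find_element(By.NAME, "password").send_keys("secret")
+ driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()  # click a button
+ print(driver.title)                                                  # inspect the result
+ driver.quit()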
+ **Import Libraries:**
+
+ import time
+ import pprint
+ import csv
+ import selenium
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.support.wait import WebDriverWait
+ from selenium.webdriver.common.by import By
+ from selenium.webdriver.chrome.options import Options
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+
+ **Kickstart with Selenium WebDriver:**
+
+ The Selenium WebDriver is a key component of the Selenium framework, designed to facilitate the interaction between your code and web browsers. It allows you to automate the testing of web applications and perform web scraping tasks by controlling browsers programmatically.
+
+ driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
+
+ url = st.text_input('Paste the Youtube Channel Link', "")
+ if not url:
+     st.warning('Please input a Link.')
+     st.stop()
+ st.success('Thank you for inputting a link.')
+ name = re.compile(r"[A-Z]\w+")
+ inp = name.findall(url)
+ out = inp[0]
+ st.write('Getting Data from', out, 'channel')
+
+ driver.get(url)
+ # url = input('Enter Youtube Video Url- ')  # console variant; input() would block a Streamlit app
+ # driver.get(url)
+ # e.g. "https://www.youtube.com/@YasoobKhalid/videos"
+ # channel_title = driver.find_element(By.XPATH, '//yt-formatted-string[contains(@class, "ytd-channel-name")]').text
+ handle = driver.find_element(By.XPATH, '//yt-formatted-string[@id="channel-handle"]').text
+ subscriber_count = driver.find_element(By.XPATH, '//yt-formatted-string[@id="subscriber-count"]').text
+ WAIT_IN_SECONDS = 5
+ last_height = driver.execute_script("return document.documentElement.scrollHeight")
+
+ while True:
+     # Scroll to the bottom of the page
+     driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
+     # Wait for new videos to show up
+     time.sleep(WAIT_IN_SECONDS)
+
+     # Calculate the new document height and compare it with the last height
+     new_height = driver.execute_script("return document.documentElement.scrollHeight")
+     if new_height == last_height:
+         break
+     last_height = new_height
+
+ thumbnails = driver.find_elements(By.XPATH, '//a[@id="thumbnail"]/yt-image/img')
+ views = driver.find_elements(By.XPATH, '//div[@id="metadata-line"]/span[1]')
+ titles = driver.find_elements(By.ID, "video-title")
+ links = driver.find_elements(By.ID, "video-title-link")
+ # likes = driver.find_elements(By.ID, "video-title-link-likes")
+
+ **Extracting Channel Information:**
+
+ YouTube channels hold a wealth of information, from engaging content to vital statistics that provide insights into their popularity. In this guide, we'll explore how to programmatically extract key details like the channel's title, views, thumbnail, and link using Python's web scraping tools.
+
+ Extracting the title, views, thumbnail, and link of the YouTube channel:
+ - Channel Title: Locate the HTML element containing the channel's title.
+ - Channel Views: Find and extract the total number of views the channel has amassed.
+ - Thumbnail URL: Extract the URL of the channel's thumbnail image.
+ - Channel Link: Obtain the link to the YouTube channel.
+
+ videos = []
+ for title, view, thumb, link in zip(titles, views, thumbnails, links):
+     video_dict = {
+         'title': title.text,
+         'views': view.text,
+         # 'likes': likes.text,
+         'thumbnail': thumb.get_attribute('src'),
+         'link': link.get_attribute('href')
+     }
+     videos.append(video_dict)
+
+ print(videos)
+
+ **Storing Scraped Data in CSV Format:**
+
+ videos is a list of dictionaries containing the data to be written, assigned to the variable to_csv.
+ csv.DictWriter is a class within Python's csv module that facilitates writing data from dictionaries into CSV files. It's particularly useful when you have data organized in a dictionary format and want to export it into a CSV file with well-defined headers.
+ The code uses the csv module to write the data to a CSV file named data.csv. Then it uses the pandas library (pd) to read that file back into a DataFrame (df) with pd.read_csv() and display it in the app.
+
+ to_csv = videos
+ keys = to_csv[0].keys()
+
+ with open(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/data.csv', 'w', newline='', encoding='utf-8') as output_file:
+     dict_writer = csv.DictWriter(output_file, keys)
+     dict_writer.writeheader()
+     dict_writer.writerows(to_csv)
+ df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/data.csv')
+ st.dataframe(df)
+
+ **Streamlit App Development and Deployment:**
+
+ Streamlit is a Python library for creating web applications with minimal effort ([Streamlit • A faster way to build and share data apps](https://streamlit.io)):
+ 1. Rapid Development: Enables building interactive web apps using simple Python scripts.
+ 2. Data Visualization: Seamlessly integrates with popular data science libraries like Pandas, Matplotlib, and Plotly for quick data visualization.
+ 3. Automatic Updates: Auto-refreshes the app when code changes are detected, providing a smooth development experience.
+ 4. Custom Components: Supports custom HTML, CSS, and JavaScript for advanced customization.
+ 5. Deployment: Supports deployment to various platforms, including Streamlit sharing, Heroku, or other cloud providers. A minimal app is sketched below.
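+ For a sense of how little code an app needs, here is a minimal runnable sketch (hypothetical toy data; launch it with `streamlit run app.py`):
+
+ import pandas as pd
+ import streamlit as st
+
+ st.title("Hello, Streamlit")                 # page title
+ df = pd.DataFrame({"video": ["a", "b", "c"],
+                    "views": [120, 350, 90]})  # toy data, purely illustrative
+ st.dataframe(df)                             # interactive table in one call
+ st.bar_chart(df.set_index("video"))          # quick chart in one more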
+
+ Scraping YouTube Data using Selenium and Python - YouTube
+
+ **Conclusion**
+
+ Automating the extraction of YouTube channel details using Python and web scraping techniques can save time and provide valuable insights. By harnessing the power of libraries like Selenium, you can effortlessly retrieve crucial statistics like the channel's title, views, thumbnail, and link for further analysis or integration into your projects.
+ Start exploring and extracting valuable data from YouTube channels effortlessly with Python!
app.py ADDED
@@ -0,0 +1,95 @@
+ import time
+ import pprint
+ import csv
+ import selenium
+ import chromedriver_autoinstaller  # needed for the install() call below
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.support.wait import WebDriverWait
+ from selenium.webdriver.common.by import By
+ from selenium.webdriver.chrome.options import Options
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+
+ st.title('Youtube WebScrap⛏️')
+
+ # ------------------------------ CHANNEL DATA ------------------------------
+
+ chromedriver_autoinstaller.install()  # puts a matching chromedriver on PATH
+ # driver = webdriver.Chrome('/usr/bin/google-chrome')
+ chrome_path = '/usr/bin/google-chrome'
+ # Set up Chrome options if needed
+ chrome_options = webdriver.ChromeOptions()
+ chrome_options.binary_location = chrome_path  # points at the Chrome binary, not the driver
+ # Create the WebDriver instance (chromedriver is found on PATH)
+ driver = webdriver.Chrome(options=chrome_options)
+ # driver = webdriver.Chrome()
+ url = st.text_input('Paste the Youtube Channel Link', "")
+ if not url:
+     st.warning('Please input a Link.')
+     st.stop()
+ st.success('Thank you for inputting a link.')
+ # url = 'https://www.youtube.com/@YasoobKhalid/videos'
+ name = re.compile(r"[A-Z]\w+")
+ inp = name.findall(url)
+ out = inp[0]
+ st.write('Getting Data from', out, 'channel')
+
+ driver.get(url)
+
+ # url = input('Enter Youtube Video Url- ')  # console variant; input() would block a Streamlit app
+ # driver.get(url)
+ # e.g. "https://www.youtube.com/@YasoobKhalid/videos"
+ # channel_title = driver.find_element(By.XPATH, '//yt-formatted-string[contains(@class, "ytd-channel-name")]').text
+ handle = driver.find_element(By.XPATH, '//yt-formatted-string[@id="channel-handle"]').text
+ subscriber_count = driver.find_element(By.XPATH, '//yt-formatted-string[@id="subscriber-count"]').text
+
+ WAIT_IN_SECONDS = 5
+ last_height = driver.execute_script("return document.documentElement.scrollHeight")
+
+ while True:
+     # Scroll to the bottom of the page
+     driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
+     # Wait for new videos to show up
+     time.sleep(WAIT_IN_SECONDS)
+
+     # Calculate the new document height and compare it with the last height
+     new_height = driver.execute_script("return document.documentElement.scrollHeight")
+     if new_height == last_height:
+         break
+     last_height = new_height
+
+ thumbnails = driver.find_elements(By.XPATH, '//a[@id="thumbnail"]/yt-image/img')
+ views = driver.find_elements(By.XPATH, '//div[@id="metadata-line"]/span[1]')
+ titles = driver.find_elements(By.ID, "video-title")
+ links = driver.find_elements(By.ID, "video-title-link")
+ # likes = driver.find_elements(By.ID, "video-title-link-likes")
+
+ videos = []
+ for title, view, thumb, link in zip(titles, views, thumbnails, links):
+     video_dict = {
+         'title': title.text,
+         'views': view.text,
+         # 'likes': likes.text,
+         'thumbnail': thumb.get_attribute('src'),
+         'link': link.get_attribute('href')
+     }
+     videos.append(video_dict)
+
+ print(videos)
+
+ to_csv = videos
+ keys = to_csv[0].keys()
+
+ with open(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv', 'w', newline='', encoding='utf-8') as output_file:
+     dict_writer = csv.DictWriter(output_file, keys)
+     dict_writer.writeheader()
+     dict_writer.writerows(to_csv)
+ df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv')
+ st.dataframe(df)
apps/Youtube_Scraper.py ADDED
@@ -0,0 +1,89 @@
+ import time
+ import pprint
+ import csv
+ import selenium
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.support.wait import WebDriverWait
+ from selenium.webdriver.common.by import By
+ from selenium.webdriver.chrome.options import Options
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+
+ st.title('Youtube WebScrap⛏️')
+
+ # ------------------------------ CHANNEL DATA ------------------------------
+
+ driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
+
+ url = st.text_input('Paste the Youtube Channel Link', "")
+ if not url:
+     st.warning('Please input a Link.')
+     st.stop()
+ st.success('Thank you for inputting a link.')
+ # url = 'https://www.youtube.com/@YasoobKhalid/videos'
+ name = re.compile(r"[A-Z]\w+")
+ inp = name.findall(url)
+ out = inp[0]
+ st.write('Getting Data from', out, 'channel')
+
+ driver.get(url)
+
+ # url = input('Enter Youtube Video Url- ')  # console variant; input() would block a Streamlit app
+ # driver.get(url)
+ # e.g. "https://www.youtube.com/@YasoobKhalid/videos"
+ # channel_title = driver.find_element(By.XPATH, '//yt-formatted-string[contains(@class, "ytd-channel-name")]').text
+ handle = driver.find_element(By.XPATH, '//yt-formatted-string[@id="channel-handle"]').text
+ subscriber_count = driver.find_element(By.XPATH, '//yt-formatted-string[@id="subscriber-count"]').text
+
+ WAIT_IN_SECONDS = 5
+ last_height = driver.execute_script("return document.documentElement.scrollHeight")
+
+ while True:
+     # Scroll to the bottom of the page
+     driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
+     # Wait for new videos to show up
+     time.sleep(WAIT_IN_SECONDS)
+
+     # Calculate the new document height and compare it with the last height
+     new_height = driver.execute_script("return document.documentElement.scrollHeight")
+     if new_height == last_height:
+         break
+     last_height = new_height
+
+ thumbnails = driver.find_elements(By.XPATH, '//a[@id="thumbnail"]/yt-image/img')
+ views = driver.find_elements(By.XPATH, '//div[@id="metadata-line"]/span[1]')
+ titles = driver.find_elements(By.ID, "video-title")
+ links = driver.find_elements(By.ID, "video-title-link")
+ # likes = driver.find_elements(By.ID, "video-title-link-likes")
+
+ videos = []
+ for title, view, thumb, link in zip(titles, views, thumbnails, links):
+     video_dict = {
+         'title': title.text,
+         'views': view.text,
+         # 'likes': likes.text,
+         'thumbnail': thumb.get_attribute('src'),
+         'link': link.get_attribute('href')
+     }
+     videos.append(video_dict)
+
+ print(videos)
+
+ to_csv = videos
+ keys = to_csv[0].keys()
+
+ with open(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv', 'w', newline='', encoding='utf-8') as output_file:
+     dict_writer = csv.DictWriter(output_file, keys)
+     dict_writer.writeheader()
+     dict_writer.writerows(to_csv)
+ df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv')
+ st.dataframe(df)
apps/pages/Youtube_Comments_analysis.py ADDED
@@ -0,0 +1,49 @@
+ import time
+ import pprint
+ import csv
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.common.by import By
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+
+ st.markdown("# Page 3 🎉")
+ st.sidebar.markdown("# Page 3 🎉")
+
+ # url = input('Enter Youtube Video Url- ')
+ # youtube.open(url)
+ # youtube.keypress("pagedown")
+
+ # data = []
+ # currentpagesource = youtube.get_page_source()
+ # lastpagesource = ''
+
+ # while True:
+ #     if lastpagesource == currentpagesource:
+ #         break
+
+ #     lastpagesource = currentpagesource
+ #     response = youtube.video_comments()
+
+ #     for c in response['body']:
+ #         data.append(c)
+
+ #     youtube.scroll()
+ #     currentpagesource = youtube.get_page_source()
+
+ # df = pd.DataFrame(data)
+ # df = df.replace('\n', ' ', regex=True)
+ # df = df[['Comment', 'Likes']].drop_duplicates(keep="first")
+ # # df = df[['Likes']].drop_duplicates(keep="first")
+ # df.to_csv('output/data.csv', index=False)
+ # df.head()
apps/pages/Youtube_analysis.py ADDED
@@ -0,0 +1,38 @@
+ import time
+ import pprint
+ import csv
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.common.by import By
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+ import numpy as np
+
+ st.title('Youtube Channel Analysis📈')
+
+ df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv')
+ st.dataframe(df)
+
+ count = st.slider('Select Lower Video Count', 0, len(df), 100)
+ st.write("You selected", count, 'Videos')
+
+ fig = px.bar(df[:count],
+              x="title",
+              y="views", height=1000)
+ fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
+ # fig.update_yaxes(tickvals=['10k', '22k', '29k', '56k'])
+ tab1, tab2 = st.tabs(["Streamlit theme (default)", "Plotly native theme"])
+ with tab1:
+     # Use the Streamlit theme (the default, so the theme argument could be omitted)
+     st.plotly_chart(fig, theme="streamlit", use_container_width=True)
+ with tab2:
+     # Use the native Plotly theme
+     st.plotly_chart(fig, theme=None, use_container_width=True)
code/test.py ADDED
@@ -0,0 +1,10 @@
+ import pandas as pd
+
+ df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/people.csv')
+ # st.dataframe(df)
+
+ # print(df[0:1, :])
+
+ # for i in range(len(df)):
+ #     print(i)
code/youtube1.py ADDED
@@ -0,0 +1,140 @@
+ import time
+ import pprint
+ import csv
+ from selenium import webdriver
+ from selenium.webdriver.chrome.service import Service
+ from webdriver_manager.chrome import ChromeDriverManager
+ from selenium.webdriver.common.by import By
+ from youtube_comment_scraper_python import *
+ import pandas as pd
+ import plotly.express as px
+ import re
+ import streamlit as st
+
+ st.title('Youtube Channel Analysis')
+ st.write('Youtube WebScrap')
+
+ # ------------------------------ CHANNEL DATA ------------------------------
+
+ driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
+
+ url = st.text_input('Paste the Youtube Channel Link', "")
+ if not url:
+     st.warning('Please input a Link.')
+     st.stop()
+ st.success('Thank you for inputting a link.')
+ # url = 'https://www.youtube.com/@YasoobKhalid/videos'
+ name = re.compile(r"[A-Z]\w+")
+ inp = name.findall(url)
+ out = inp[0]
+ st.write('Getting Data from', out, 'channel')
+ driver.get(url)
+
+ # url = input('Enter Youtube Video Url- ')
+ # driver.get(url)
+ # e.g. "https://www.youtube.com/@YasoobKhalid/videos"
+ # channel_title = driver.find_element(By.XPATH, '//yt-formatted-string[contains(@class, "ytd-channel-name")]').text
+ handle = driver.find_element(By.XPATH, '//yt-formatted-string[@id="channel-handle"]').text
+ subscriber_count = driver.find_element(By.XPATH, '//yt-formatted-string[@id="subscriber-count"]').text
+
+ WAIT_IN_SECONDS = 5
+ last_height = driver.execute_script("return document.documentElement.scrollHeight")
+
+ while True:
+     # Scroll to the bottom of the page
+     driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
+     # Wait for new videos to show up
+     time.sleep(WAIT_IN_SECONDS)
+
+     # Calculate the new document height and compare it with the last height
+     new_height = driver.execute_script("return document.documentElement.scrollHeight")
+     if new_height == last_height:
+         break
+     last_height = new_height
+
+ thumbnails = driver.find_elements(By.XPATH, '//a[@id="thumbnail"]/yt-image/img')
+ views = driver.find_elements(By.XPATH, '//div[@id="metadata-line"]/span[1]')
+ titles = driver.find_elements(By.ID, "video-title")
+ links = driver.find_elements(By.ID, "video-title-link")
+ # likes = driver.find_elements(By.ID, "video-title-link-likes")
+
+ videos = []
+ for title, view, thumb, link in zip(titles, views, thumbnails, links):
+     video_dict = {
+         'title': title.text,
+         'views': view.text,
+         # 'likes': likes.text,
+         'thumbnail': thumb.get_attribute('src'),
+         'link': link.get_attribute('href')
+     }
+     videos.append(video_dict)
+
+ print(videos)
+
+ to_csv = videos
+ keys = to_csv[0].keys()
+
+ with open('output/people.csv', 'w', newline='', encoding='utf-8') as output_file:
+     dict_writer = csv.DictWriter(output_file, keys)
+     dict_writer.writeheader()
+     dict_writer.writerows(to_csv)
+ df = pd.read_csv('output/people.csv')
+ st.dataframe(df)
+
+ count = st.slider('Select Lower Video Count', 0, 607, 100)
+ st.write("You selected", count, 'Videos')
+
+ fig = px.bar(df,
+              x="title",
+              y="views", height=600)
+ fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
+ # fig.update_yaxes(tickvals=['10k', '22k', '29k', '56k'])
+ tab1, tab2 = st.tabs(["Streamlit theme (default)", "Plotly native theme"])
+ with tab1:
+     # Use the Streamlit theme (the default, so the theme argument could be omitted)
+     st.plotly_chart(fig, theme="streamlit", use_container_width=True)
+ with tab2:
+     # Use the native Plotly theme
+     st.plotly_chart(fig, theme=None, use_container_width=True)
+
+ # ------------------------------ COMMENTS ------------------------------
+
+ # url = input('Enter Youtube Video Url- ')
+ # youtube.open(url)
+ # youtube.keypress("pagedown")
+
+ # data = []
+ # currentpagesource = youtube.get_page_source()
+ # lastpagesource = ''
+
+ # while True:
+ #     if lastpagesource == currentpagesource:
+ #         break
+
+ #     lastpagesource = currentpagesource
+ #     response = youtube.video_comments()
+
+ #     for c in response['body']:
+ #         data.append(c)
+
+ #     youtube.scroll()
+ #     currentpagesource = youtube.get_page_source()
+
+ # df = pd.DataFrame(data)
+ # df = df.replace('\n', ' ', regex=True)
+ # df = df[['Comment', 'Likes']].drop_duplicates(keep="first")
+ # # df = df[['Likes']].drop_duplicates(keep="first")
+ # df.to_csv('output/data.csv', index=False)
+ # df.head()
code/youtube2.py ADDED
@@ -0,0 +1,26 @@
+ from bs4 import BeautifulSoup  # for parsing the HTML
+ import requests                # for fetching the page
+ import pandas as pd            # (optional) Pandas for dataframes
+ import json                    # (optional) if you want to export JSON
+ import os
+
+ # NOTE: the class names below target YouTube's legacy desktop layout and
+ # may not match the markup YouTube serves today.
+ url = input('Enter Youtube Video Url- ')  # user input for the link
+ Vid = {}
+ Link = url
+ source = requests.get(url).text
+ soup = BeautifulSoup(source, 'lxml')
+ div_s = soup.find_all('div')
+ Title = div_s[1].find('span', class_='watch-title').text.strip()
+ Vid['Title'] = Title
+ Vid['Link'] = Link
+ Channel_name = div_s[1].find('a', class_="yt-uix-sessionlink spf-link").text.strip()
+ Channel_link = 'www.youtube.com' + div_s[1].find('a', class_="yt-uix-sessionlink spf-link").get('href')
+ Subscribers = div_s[1].find('span', class_="yt-subscription-button-subscriber-count-branded-horizontal yt-subscriber-count").text.strip()
+ if len(Channel_name) == 0:
+     Channel_name = 'None'
+     Channel_link = 'None'
+     Subscribers = 'None'
+ Vid['Channel'] = Channel_name
+ Vid['Channel_link'] = Channel_link
+ Vid['Channel_subscribers'] = Subscribers
render.yaml ADDED
@@ -0,0 +1,38 @@
+ services:
+   - name: web
+     env:
+       - key: CHROME_BIN
+         value: /usr/bin/google-chrome
+ # NOTE: everything below is Dockerfile syntax, not Render YAML; it belongs
+ # in a Dockerfile referenced by the service definition above.
+ # Use an official Python runtime as a parent image
+ FROM python:3.8-slim
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy the current directory contents into the container at /app
+ COPY . /app
+
+ # Install any needed packages specified in requirements.txt
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Install Chrome and ChromeDriver
+ RUN apt-get update && apt-get install -y \
+     wget \
+     unzip \
+     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
+     && echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
+     && apt-get update && apt-get install -y \
+     google-chrome-stable \
+     && wget https://chromedriver.storage.googleapis.com/94.0.4606.61/chromedriver_linux64.zip \
+     && unzip chromedriver_linux64.zip \
+     && mv chromedriver /usr/local/bin \
+     && rm chromedriver_linux64.zip
+
+ # Make port 80 available to the world outside this container
+ EXPOSE 80
+
+ # Define environment variable
+ ENV NAME World
+
+ # Run app.py when the container launches
+ CMD ["python", "app.py"]
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ selenium==4.9.1
+ webdriver-manager==3.8.6
+ youtube-comment-scraper-python==1.0.0
+ plotly==5.14.1
+ seleniumbase==4.14.12
+ undetected-chromedriver==3.4.7
+ streamlit==1.30.0
+ altair==5.0.1
+ chromedriver-autoinstaller==0.0.8