ashok2216 committed
Commit 7d0868e · verified · 1 Parent(s): 4292ffa

Delete README.md

Files changed (1):
  1. README.md +0 -139
README.md DELETED
@@ -1,139 +0,0 @@
# Scraping YouTube Data

## Scraping YouTube Video Likes Using Selenium and Python & Creating a Web Application (Streamlit)

**Introduction to Web Scraping:**

Web scraping is the automated process of extracting information or data from websites. It involves writing a script or using software to access and gather data from web pages, transforming unstructured data on the web into a structured format that can be analyzed, stored, or used in various applications.
**Web Scraping Process:**

At a high level, a scraper accesses web pages (mimicking human browsing behavior), retrieves specific information from them, and organizes the extracted data into a usable format (like CSV, JSON, or a database). Step by step, as shown in the sketch after this list:

1. Fetching Data: The process starts with a request to a website, retrieving the HTML content.
2. Parsing: The HTML content is parsed to identify and extract relevant information using techniques like Regular Expressions, XPath, or CSS selectors.
3. Data Extraction: The desired data, such as text, images, links, or tables, is extracted from the parsed HTML.
4. Storage/Analysis: The extracted data is stored locally or analyzed for insights, automation, or integration into other systems.
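As a minimal sketch of that cycle, assuming only the Python standard library and the placeholder URL https://example.com (neither is part of this project), the four steps look like this:

```python
# Minimal fetch -> parse -> extract -> store cycle (illustrative only;
# example.com and the output filename are placeholders).
import csv
import re
import urllib.request

# 1. Fetching: request the page and read the raw HTML.
html = urllib.request.urlopen("https://example.com").read().decode("utf-8")

# 2. Parsing: locate the relevant markup, here with a regular expression.
match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)

# 3. Data Extraction: pull out the piece of content we care about.
title = match.group(1).strip() if match else ""

# 4. Storage: save the structured result to a CSV file.
with open("pages.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "title"])
    writer.writerow(["https://example.com", title])
```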
**What is Selenium?**

Selenium scraping refers to using the Selenium framework, primarily employed for automating web browsers, to extract data from websites. It's a powerful tool used in web scraping to simulate human interaction with a web page by controlling a browser programmatically.

**Tools Required**

To get started, ensure you have the following tools installed:

- Python: A programming language used for scripting.
- Selenium WebDriver: A tool for controlling web browsers programmatically.
- Streamlit: A Python library used here to build and deploy the app.
Here's how it works (a small illustrative sketch follows this list):

1. Automating Web Browsers: Selenium allows you to control a web browser (like Chrome, Firefox, or others) programmatically. It mimics human interaction by opening web pages, clicking buttons, filling forms, and navigating across different pages.
2. Data Extraction: Once the browser is directed to a particular webpage, Selenium enables the extraction of desired data. This can include scraping text, images, tables, or any other content from the webpage.
3. Scraping Dynamic Content: Selenium is particularly useful for scraping websites with dynamic content that can't be easily accessed using traditional scraping libraries.
4. Complex Scraping Scenarios: Selenium is versatile and can handle complex scraping tasks that involve interactions such as login processes, submitting forms, scrolling through infinite scroll pages, or dealing with content behind logins or captchas.
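For instance, a hypothetical login-and-read flow might look like the sketch below (the URL and element locators are placeholders, not a real site):

```python
# Hypothetical sketch of Selenium mimicking human interaction.
# The URL and locators are placeholders for illustration only.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")                      # open a page
driver.find_element(By.NAME, "username").send_keys("alice")  # fill a form field
driver.find_element(By.NAME, "password").send_keys("secret")
driver.find_element(By.TAG_NAME, "button").click()           # click a button
print(driver.title)                                          # read the rendered page
driver.quit()                                                # close the browser
```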
**Import Libraries:**

```python
import time
import pprint
import csv
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from youtube_comment_scraper_python import *
import pandas as pd
import plotly.express as px
import re
import streamlit as st
```
**Kickstart with Selenium WebDriver:**

The Selenium WebDriver is a key component of the Selenium framework, designed to facilitate the interaction between your code and web browsers. It allows you to automate the testing of web applications and perform web scraping tasks by controlling browsers programmatically.

```python
# webdriver_manager downloads a ChromeDriver matching the installed Chrome.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
```
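If you'd rather scrape without a visible browser window, Chrome can run headless via the Options class already imported above (a sketch; the "--headless=new" flag assumes a reasonably recent Chrome):

```python
# Optional: run Chrome headless (no visible window). "--headless=new"
# assumes a recent Chrome build; older versions use plain "--headless".
options = Options()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")  # give the page a realistic viewport
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)
```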
```python
# Ask the user for a channel URL, e.g. "https://www.youtube.com/@YasoobKhalid/videos"
url = st.text_input('Paste the YouTube Channel Link', "")
if not url:
    st.warning('Please input a link.')
    st.stop()
st.success('Thank you for inputting a link.')

# Pull a rough channel name out of the URL (first capitalized word).
name = re.compile(r"[A-Z]\w+")
inp = name.findall(url)
out = inp[0]
st.write('Getting Data from', out, 'channel')

driver.get(url)

# channel_title = driver.find_element(By.XPATH, '//yt-formatted-string[contains(@class, "ytd-channel-name")]').text
handle = driver.find_element(By.XPATH, '//yt-formatted-string[@id="channel-handle"]').text
subscriber_count = driver.find_element(By.XPATH, '//yt-formatted-string[@id="subscriber-count"]').text
```
```python
WAIT_IN_SECONDS = 5
last_height = driver.execute_script("return document.documentElement.scrollHeight")

while True:
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
    # Wait for new videos to show up
    time.sleep(WAIT_IN_SECONDS)

    # Calculate new document height and compare it with last height
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
```
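As an aside, WebDriverWait is imported above but never used; instead of a fixed time.sleep, an explicit wait can poll until the videos actually appear (a sketch, requiring one extra import):

```python
# Alternative to fixed sleeps: poll until at least one video title is present.
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.ID, "video-title"))
)
```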
With the page fully scrolled, collect the element handles for every video:

```python
thumbnails = driver.find_elements(By.XPATH, '//a[@id="thumbnail"]/yt-image/img')
views = driver.find_elements(By.XPATH, '//div[@id="metadata-line"]/span[1]')
titles = driver.find_elements(By.ID, "video-title")
links = driver.find_elements(By.ID, "video-title-link")
# likes = driver.find_elements(By.ID, "video-title-link-likes")
```
**Extracting Channel Information:**

YouTube channels hold a wealth of information, from engaging content to vital statistics that provide insights into their popularity. In this guide, we'll explore how to programmatically extract key details like the channel's title, views, thumbnail, and link using Python's web scraping tools.

Extracting the title, views, thumbnail, and link of the YouTube channel:

- Channel Title: Locate the HTML element containing the channel's title.
- Channel Views: Find and extract the total number of views the channel has amassed.
- Thumbnail URL: Extract the URL of the channel's thumbnail image.
- Channel Link: Obtain the link to the YouTube channel.
```python
videos = []
for title, view, thumb, link in zip(titles, views, thumbnails, links):
    video_dict = {
        'title': title.text,
        'views': view.text,
        # 'likes': likes.text,
        'thumbnail': thumb.get_attribute('src'),
        'link': link.get_attribute('href')
    }
    videos.append(video_dict)

print(videos)
```
**Storing Scraped Data in CSV Format**

`videos` is a list of dictionaries containing the data to be written; it is assigned to the variable `to_csv`. `csv.DictWriter` is a class within Python's csv module that facilitates writing data from dictionaries into CSV files. It's particularly useful when you have data organized in a dictionary format and want to export it into a CSV file with well-defined headers.

The code below uses the csv module to write the data to a CSV file named data.csv, then uses the pandas library (pd) to read that file back into a pandas DataFrame (df) with the pd.read_csv() method so it can be displayed in the app.
```python
to_csv = videos
keys = to_csv[0].keys()

with open(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/data.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(to_csv)

# Read the CSV back into a DataFrame and render it as an interactive table.
df = pd.read_csv(r'C:/Users/ashok/OneDrive/Desktop/WebScrap/Youtube/output/data.csv')
st.dataframe(df)
```
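Since `videos` is already a list of dictionaries, pandas can do the same round trip on its own; an equivalent sketch (using a relative `output/data.csv` path instead of the absolute one above):

```python
# Equivalent pandas-only approach: build the DataFrame straight from the
# list of dicts, write it out, and read it back for display.
df = pd.DataFrame(videos)
df.to_csv('output/data.csv', index=False)
df = pd.read_csv('output/data.csv')
st.dataframe(df)
```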
**Streamlit App Development and Deployment:**

Streamlit is a Python library for creating web applications with minimal effort ([Streamlit - A faster way to build and share data apps](https://streamlit.io)):

1. Rapid Development: Enables building interactive web apps using simple Python scripts.
2. Data Visualization: Seamlessly integrates with popular data science libraries like Pandas, Matplotlib, and Plotly for quick data visualization.
3. Automatic Updates: Auto-refreshes the app when code changes are detected, providing a smooth development experience.
4. Custom Components: Supports custom HTML, CSS, and JavaScript for advanced customization.
5. Deployment: Supports deployment to various platforms, including Streamlit sharing, Heroku, or other cloud providers.
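To see that development loop end to end, here is a minimal, self-contained example app (a sketch; the filename app.py and the upload flow are illustrative, not the exact app built above):

```python
# app.py - a minimal Streamlit app (illustrative). Run with:
#   streamlit run app.py
import pandas as pd
import streamlit as st

st.title("YouTube Channel Scraper")

uploaded = st.file_uploader("Upload the scraped CSV", type="csv")
if uploaded is None:
    st.info("Upload a CSV produced by the scraper to continue.")
    st.stop()

df = pd.read_csv(uploaded)
st.metric("Videos scraped", len(df))  # quick summary figure
st.dataframe(df)                      # interactive, sortable table
```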
Video walkthrough: Scraping YouTube Data using Selenium and Python (YouTube)

**Conclusion**

Automating the extraction of YouTube channel details using Python and web scraping techniques can save time and provide valuable insights. By harnessing the power of libraries like Selenium, you can effortlessly retrieve crucial statistics like the channel's title, views, thumbnail, and link for further analysis or integration into your projects.

Start exploring and extracting valuable data from YouTube channels effortlessly with Python!