qmd to ipynb
- src/03_low_code/app_market_scraping/app_market_scraping.ipynb +171 -0
- src/03_low_code/app_market_scraping/app_market_scraping.qmd +0 -94
- src/03_low_code/catalogue/bookstoscrape.ipynb +187 -0
- src/03_low_code/catalogue/bookstoscrape.qmd +0 -103
- src/_quarto.yml +2 -2
- src/assets/App_Market_Scraping.ipynb +0 -0
- src/low_code.qmd +1 -1
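
The conversion itself can be reproduced with the Quarto CLI; a minimal sketch, assuming `quarto` is installed (the exact invocation used for this commit is not recorded here):

```python
# Run in a notebook cell (or drop the leading "!" and run it in a shell).
# "quarto convert" writes an .ipynb file next to each .qmd source.
!quarto convert src/03_low_code/app_market_scraping/app_market_scraping.qmd
!quarto convert src/03_low_code/catalogue/bookstoscrape.qmd
```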
src/03_low_code/app_market_scraping/app_market_scraping.ipynb
ADDED
@@ -0,0 +1,171 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: \"App-Market-Scraping\"\n",
+    "description: \"Extraction and analysis of app market data, including custom search parameters and data export.\"\n",
+    "image: _2f0cb788-71a6-4817-ab94-d38c346e4f6f.jpeg\n",
+    "format: \n",
+    "  html:\n",
+    "    toc: true\n",
+    "    code-tools: true\n",
+    "jupyter: python3\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[](https://colab.research.google.com/#fileId=https://huggingface.co/spaces/datenwerkzeuge/CDL-Webscraping-Workshop-2025/blob/main/src/03_low_code/app_market_scraping/app_market_scraping.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Learning objectives\n",
+    "\n",
+    "- Installing the Google Play scraper\n",
+    "- Reading in a CSV file with app URLs\n",
+    "- Retrieving app information in a loop\n",
+    "- Visualizing the retrieved data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## App Market Scraping\n",
+    "\n",
+    "To collect apps, visit the [Google Play Search](../../02_basics/app_market/google-play-search.qmd) application. It lets you search for apps in the Google Play Store and export the URLs of the apps it finds. Save the exported URLs in a CSV file, which serves as the basis for the analysis."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Installing the Google Play scraper\n",
+    "\n",
+    "In a Colab notebook, install the google-play-scraper library with the following command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "! pip install google-play-scraper"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Reading in a CSV file with app URLs\n",
+    "\n",
+    "Create a CSV file (`app_urls.csv`) with a `url` column that lists the URLs of the Google Play Store apps. Example:\n",
+    "\n",
+    "```text\n",
+    "url\n",
+    "https://play.google.com/store/apps/details?id=com.example.app1\n",
+    "https://play.google.com/store/apps/details?id=com.example.app2\n",
+    "```\n",
+    "\n",
+    "Read the CSV file into a pandas DataFrame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Read the CSV file\n",
+    "df = pd.read_csv('app_urls.csv')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. Retrieving app information in a loop\n",
+    "\n",
+    "Use the Google Play scraper to retrieve information about the apps:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google_play_scraper import app\n",
+    "\n",
+    "# Function to extract the app ID from the URL\n",
+    "def extract_app_id(url):\n",
+    "    return url.split('id=')[-1]\n",
+    "\n",
+    "# List for storing the app information\n",
+    "app_info_list = []\n",
+    "\n",
+    "# Loop over the URLs in the CSV file\n",
+    "for url in df['url']:\n",
+    "    app_id = extract_app_id(url)\n",
+    "    app_info = app(app_id, lang='en', country='us')\n",
+    "    app_info_list.append(app_info)\n",
+    "\n",
+    "# Create a DataFrame with the app information\n",
+    "app_info_df = pd.DataFrame(app_info_list)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. Visualizing the retrieved data\n",
+    "\n",
+    "Visualize the retrieved data, e.g. the apps' ratings:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Histogram of the app ratings\n",
+    "plt.figure(figsize=(10, 6))\n",
+    "plt.hist(app_info_df['score'], bins=20, color='skyblue', edgecolor='black')\n",
+    "plt.title('Distribution of app ratings')\n",
+    "plt.xlabel('Rating')\n",
+    "plt.ylabel('Number of apps')\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "These steps cover installing the Google Play scraper, reading in a CSV file with app URLs, retrieving app information, and visualizing the data."
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
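
The fetch loop in step 3 above calls `app()` once per URL with no error handling, so a single delisted app or network hiccup aborts the whole run. A hardened variant might look like the following sketch (illustrative only; the `try/except`, the one-second `time.sleep` pause, and the CSV export are additions, not part of the committed notebook):

```python
import time

import pandas as pd
from google_play_scraper import app


def extract_app_id(url):
    # Same extraction as in the notebook: take everything after "id="
    return url.split('id=')[-1]


df = pd.read_csv('app_urls.csv')

app_info_list = []
failed_ids = []

for url in df['url']:
    app_id = extract_app_id(url)
    try:
        app_info_list.append(app(app_id, lang='en', country='us'))
    except Exception as err:  # e.g. the app was removed from the store, or a network error
        failed_ids.append(app_id)
        print(f"Skipping {app_id}: {err}")
    time.sleep(1)  # short pause between requests to stay polite

app_info_df = pd.DataFrame(app_info_list)
app_info_df.to_csv('app_info.csv', index=False)  # export for later analysis
```

Collecting the failed IDs separately makes it cheap to retry only the apps that could not be fetched.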
src/03_low_code/app_market_scraping/app_market_scraping.qmd
DELETED
@@ -1,94 +0,0 @@
----
-title: "App-Market-Scraping"
-description: "Extraction and analysis of app market data, including custom search parameters and data export."
-image: _2f0cb788-71a6-4817-ab94-d38c346e4f6f.jpeg
-format:
-  html:
-    toc: true
-    code-tools: true
-jupyter: python3
----
-
-## Learning objectives
-
-- Installing the Google Play scraper
-- Reading in a CSV file with app URLs
-- Retrieving app information in a loop
-- Visualizing the retrieved data
-
-## App Market Scraping
-
-To collect apps, visit the [Google Play Search](../../02_basics/app_market/google-play-search.qmd) application. It lets you search for apps in the Google Play Store and export the URLs of the apps it finds. Save the exported URLs in a CSV file, which serves as the basis for the analysis.
-
-### 1. Installing the Google Play scraper
-
-In a Colab notebook, install the google-play-scraper library with the following command:
-
-```python
-!pip install google-play-scraper
-```
-
-### 2. Reading in a CSV file with app URLs
-
-Create a CSV file (`app_urls.csv`) with a `url` column that lists the URLs of the Google Play Store apps. Example:
-
-```text
-url
-https://play.google.com/store/apps/details?id=com.example.app1
-https://play.google.com/store/apps/details?id=com.example.app2
-```
-
-Read the CSV file into a pandas DataFrame:
-
-```python
-import pandas as pd
-
-# Read the CSV file
-df = pd.read_csv('app_urls.csv')
-```
-
-### 3. Retrieving app information in a loop
-
-Use the Google Play scraper to retrieve information about the apps:
-
-```python
-from google_play_scraper import app
-
-# Function to extract the app ID from the URL
-def extract_app_id(url):
-    return url.split('id=')[-1]
-
-# List for storing the app information
-app_info_list = []
-
-# Loop over the URLs in the CSV file
-for url in df['url']:
-    app_id = extract_app_id(url)
-    app_info = app(app_id, lang='en', country='us')
-    app_info_list.append(app_info)
-
-# Create a DataFrame with the app information
-app_info_df = pd.DataFrame(app_info_list)
-```
-
-### 4. Visualizing the retrieved data
-
-Visualize the retrieved data, e.g. the apps' ratings:
-
-```python
-import matplotlib.pyplot as plt
-
-# Histogram of the app ratings
-plt.figure(figsize=(10, 6))
-plt.hist(app_info_df['score'], bins=20, color='skyblue', edgecolor='black')
-plt.title('Distribution of app ratings')
-plt.xlabel('Rating')
-plt.ylabel('Number of apps')
-plt.show()
-```
-
-## Conclusion
-
-These steps cover installing the Google Play scraper, reading in a CSV file with app URLs, retrieving app information, and visualizing the data.
-
-{{< downloadthis ../../assets/App_Market_Scraping.ipynb dname="App_Market_Scraping" label="Download notebook example" icon="journal-code" type="success" >}}
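
One more note on the same tutorial: `extract_app_id` splits on `'id='`, which breaks as soon as a Play Store URL carries additional query parameters such as `&hl=en`. Parsing the query string is more robust; a small sketch (a suggested replacement, not part of the commit):

```python
from urllib.parse import urlparse, parse_qs


def extract_app_id(url):
    # Parse the query string instead of splitting on "id=", so that URLs such as
    # https://play.google.com/store/apps/details?id=com.example.app1&hl=en still work.
    query = parse_qs(urlparse(url).query)
    return query['id'][0]


print(extract_app_id("https://play.google.com/store/apps/details?id=com.example.app1&hl=en"))
# -> com.example.app1
```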
src/03_low_code/catalogue/bookstoscrape.ipynb
ADDED
@@ -0,0 +1,187 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: \"Scraping a book catalogue\"\n",
+    "description: \"A guide to scraping books from the Books to Scrape website, including Python examples and data export.\"\n",
+    "image: _be1bcdc2-f540-4a95-a27c-775e8f2c1c07.jpeg\n",
+    "format: \n",
+    "  html:\n",
+    "    toc: true\n",
+    "    code-tools: true\n",
+    "jupyter: python3\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[](https://colab.research.google.com/#fileId=https://huggingface.co/spaces/datenwerkzeuge/CDL-Webscraping-Workshop-2025/blob/main/src/03_low_code/catalogue/bookstoscrape.ipynb)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction\n",
+    "\n",
+    "In this tutorial we learn how to scrape the [Books to Scrape](https://books.toscrape.com/) website with Python and `BeautifulSoup`. This site is a popular web-scraping example because it has a simple structure and implements no complex anti-scraping measures."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure the following Python libraries are installed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "! pip install requests beautifulsoup4 pandas"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Scraping the book data\n",
+    "\n",
+    "## Step 1: Retrieve the HTML content\n",
+    "\n",
+    "First we use the `requests` library to retrieve the HTML content of the page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "\n",
+    "# URL of the website\n",
+    "url = \"https://books.toscrape.com/\"\n",
+    "\n",
+    "# Retrieve the HTML content\n",
+    "response = requests.get(url)\n",
+    "\n",
+    "# Check whether the request was successful\n",
+    "if response.status_code == 200:\n",
+    "    print(\"HTML content retrieved successfully.\")\n",
+    "else:\n",
+    "    print(f\"Error retrieving the page: {response.status_code}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 2: Parse the HTML with BeautifulSoup\n",
+    "\n",
+    "Now we parse the retrieved HTML content with `BeautifulSoup`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from bs4 import BeautifulSoup\n",
+    "\n",
+    "# Parse the HTML content\n",
+    "soup = BeautifulSoup(response.text, 'html.parser')\n",
+    "\n",
+    "# Check the title of the page\n",
+    "print(soup.title.string)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 3: Extract the data\n",
+    "\n",
+    "We now extract the titles and prices of the books."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Lists for storing the data\n",
+    "book_titles = []\n",
+    "book_prices = []\n",
+    "\n",
+    "# Find all book containers\n",
+    "books = soup.find_all('article', class_='product_pod')\n",
+    "\n",
+    "# Extract the data\n",
+    "for book in books:\n",
+    "    title = book.h3.a['title']  # title of the book\n",
+    "    price = book.find('p', class_='price_color').text  # price of the book\n",
+    "    book_titles.append(title)\n",
+    "    book_prices.append(price)\n",
+    "\n",
+    "# Display the data\n",
+    "for title, price in zip(book_titles, book_prices):\n",
+    "    print(f\"{title}: {price}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 4: Store the data in a DataFrame\n",
+    "\n",
+    "To keep the extracted data, we use `pandas` to organize it in a DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Create the DataFrame\n",
+    "books_df = pd.DataFrame({\n",
+    "    'Title': book_titles,\n",
+    "    'Price': book_prices\n",
+    "})\n",
+    "\n",
+    "# Display the DataFrame\n",
+    "print(books_df.head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Conclusion\n",
+    "\n",
+    "In this tutorial we learned how to scrape the [Books to Scrape](https://books.toscrape.com/) website with Python and `BeautifulSoup`. We extracted the titles and prices of the books and stored them in a DataFrame. This example can serve as a starting point for more complex scraping projects."
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
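
The notebook above scrapes only the first catalogue page and keeps the prices as strings such as `£51.77`. A sketch of a possible extension (illustrative, not part of the commit) that follows the site's "next" pagination link, converts prices to numbers, and writes out the data export promised in the page description:

```python
import re
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

titles, prices = [], []
url = "https://books.toscrape.com/"

while url:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for book in soup.find_all("article", class_="product_pod"):
        titles.append(book.h3.a["title"])
        # Price text looks like "£51.77"; keep only digits and the decimal point
        price_text = book.find("p", class_="price_color").text
        prices.append(float(re.search(r"[\d.]+", price_text).group()))
    # Follow the "next" link until the last catalogue page is reached
    next_link = soup.select_one("li.next a")
    url = urljoin(url, next_link["href"]) if next_link else None

books_df = pd.DataFrame({"Title": titles, "Price": prices})
books_df.to_csv("books.csv", index=False)  # data export
print(f"{len(books_df)} books scraped")
```

Following the "next" link avoids hard-coding the number of catalogue pages.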
src/03_low_code/catalogue/bookstoscrape.qmd
DELETED
@@ -1,103 +0,0 @@
----
-title: "Scraping a book catalogue"
-description: "A guide to scraping books from the Books to Scrape website, including Python examples and data export."
-image: _be1bcdc2-f540-4a95-a27c-775e8f2c1c07.jpeg
-format:
-  html:
-    toc: true
-    code-tools: true
-jupyter: python3
----
-
-# Introduction
-
-In this tutorial we learn how to scrape the [Books to Scrape](https://books.toscrape.com/) website with Python and `BeautifulSoup`. This site is a popular web-scraping example because it has a simple structure and implements no complex anti-scraping measures.
-
-## Prerequisites
-
-Make sure the following Python libraries are installed:
-
-```bash
-pip install requests beautifulsoup4 pandas
-```
-
-# Scraping the book data
-
-## Step 1: Retrieve the HTML content
-
-First we use the `requests` library to retrieve the HTML content of the page.
-
-```{python}
-import requests
-
-# URL of the website
-url = "https://books.toscrape.com/"
-
-# Retrieve the HTML content
-response = requests.get(url)
-
-# Check whether the request was successful
-if response.status_code == 200:
-    print("HTML content retrieved successfully.")
-else:
-    print(f"Error retrieving the page: {response.status_code}")
-```
-
-## Step 2: Parse the HTML with BeautifulSoup
-
-Now we parse the retrieved HTML content with `BeautifulSoup`.
-
-```{python}
-from bs4 import BeautifulSoup
-
-# Parse the HTML content
-soup = BeautifulSoup(response.text, 'html.parser')
-
-# Check the title of the page
-print(soup.title.string)
-```
-
-## Step 3: Extract the data
-
-We now extract the titles and prices of the books.
-
-```{python}
-# Lists for storing the data
-book_titles = []
-book_prices = []
-
-# Find all book containers
-books = soup.find_all('article', class_='product_pod')
-
-# Extract the data
-for book in books:
-    title = book.h3.a['title']  # title of the book
-    price = book.find('p', class_='price_color').text  # price of the book
-    book_titles.append(title)
-    book_prices.append(price)
-
-# Display the data
-for title, price in zip(book_titles, book_prices):
-    print(f"{title}: {price}")
-```
-
-## Step 4: Store the data in a DataFrame
-
-To keep the extracted data, we use `pandas` to organize it in a DataFrame.
-
-```{python}
-import pandas as pd
-
-# Create the DataFrame
-books_df = pd.DataFrame({
-    'Title': book_titles,
-    'Price': book_prices
-})
-
-# Display the DataFrame
-print(books_df.head())
-```
-
-# Conclusion
-
-In this tutorial we learned how to scrape the [Books to Scrape](https://books.toscrape.com/) website with Python and `BeautifulSoup`. We extracted the titles and prices of the books and stored them in a DataFrame. This example can serve as a starting point for more complex scraping projects.
src/_quarto.yml
CHANGED
@@ -75,11 +75,11 @@ website:
     - section: "Capture catalogues"
       href: 03_low_code/catalogue.qmd
       contents:
-        - href: 03_low_code/catalogue/bookstoscrape.
+        - href: 03_low_code/catalogue/bookstoscrape.ipynb
           text: "Scrape a book list📚"
         - href: 03_low_code/catalogue/quotes_scraper.ipynb
           text: "Scrape quotes💬"
-        - href: 03_low_code/app_market_scraping/app_market_scraping.
+        - href: 03_low_code/app_market_scraping/app_market_scraping.ipynb
           text: "Analyze the app market📱"
     - section: "Video transcripts"
       href: 03_low_code/video_transcripts.qmd
src/assets/App_Market_Scraping.ipynb
DELETED
The diff for this file is too large to render.
src/low_code.qmd
CHANGED
@@ -1,7 +1,7 @@
 ---
 listing:
 - id: low_code
-  contents: ["03_low_code/catalogue.qmd","03_low_code/app_market_scraping/app_market_scraping.
+  contents: ["03_low_code/catalogue.qmd","03_low_code/app_market_scraping/app_market_scraping.ipynb","03_low_code/video_transcripts.qmd"]
   type: grid
 ---