---
language:
- en
pipeline_tag: text-classification
widget:
- title: Rating Example
  text: '4.7'
- title: Reviews Example
  text: (188)
- title: Reviews Example 2
  text: '188'
- title: Reviews Example 3
  text: No Reviews
- title: Price Example
  text: $
- title: Type Example
  text: Coffee shop
- title: Address Example
  text: Frederick, MD
- title: Address Example 2
  text: 552 W 48th St
- title: Address Example 3
  text: In Hilton Hotel
- title: Hours Example
  text: Closed
- title: Hours Example 2
  text: Opens 7 AM Fri
- title: Hours Example 3
  text: Permanently closed
- title: Service Option Example
  text: Dine-in
- title: Service Option Example 2
  text: Takeout
- title: Service Option Example 3
  text: Delivery
- title: Phone Example
  text: (301) 000-0000
- title: Years In Business Example
  text: 5+ Years in Business
- title: Button Text Example
  text: Directions
- title: Description Example
  text: 'Provides: Auto maintenance'
license: mit
datasets:
- serpapi/local-results-en
tags:
- scraping
- parsing
- serp
- api
- opensource
---
This repository contains a BERT-based classification model developed using the Hugging Face library, and a dataset gathered by SerpApi's Google Local API. The model is designed to classify different texts extracted from Google Local Listings.
You may check out the blog post explaining the model's use case with an example: Real World Example of AI Powered Parsing.
You may also check out the open-source GitHub repository that contains the source code of a Ruby gem called `google-local-results-ai-parser`.
---
For prototyping, you can call the model from Python through the Inference API. You may use other programming languages to call the model, and you may parallelize your work. The prototyping endpoint allows only a limited number of calls. For production purposes or large prototyping activities, consider setting up an Inference API Endpoint from Hugging Face, or a private API server, to serve the model.
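A minimal sketch of such a prototyping call, using only the Python standard library. The model ID and token are placeholders (assumptions, not taken from this card), the endpoint URL follows the hosted Inference API pattern and may differ for your setup, and the response shape handled by `top_label` is an assumption about the text-classification output.

```python
import json
import urllib.request

# Placeholders (assumptions): use this repository's model ID and your own token.
MODEL_ID = "your-namespace/your-model"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"
API_TOKEN = "hf_your_token_here"


def build_request(text):
    """Build a POST request carrying the text to classify."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=payload, headers=headers, method="POST")


def top_label(response):
    """Return (label, score) of the highest-scoring prediction.

    Assumes a list of {"label", "score"} dicts, possibly wrapped
    in one extra outer list.
    """
    predictions = response[0] if response and isinstance(response[0], list) else response
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify(text):
    """Send one text to the endpoint and return its top label and score."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return top_label(json.loads(resp.read().decode("utf-8")))
```

With a valid model ID and token, `classify("4.7")` would send a single request; nothing is sent until `classify` is called.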
The BERT-based model excels in the following areas:
- `"No Reviews"` → `reviews`
- `"(5K+)"` → `reviews`
- `"Open ⋅ Closes 5 pm"`
  - `"Open"` → `hours`
  - `"Closes 5 pm"` → `hours`
- `"Doctor"` → `type`
- `"Restaurant"` → `type`
- `"4.7"` → `rating(0.999)`
- `"Krebside Pickup"` → `service options`
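The `"Open ⋅ Closes 5 pm"` example suggests that a text can be classified fragment by fragment and recombined afterwards. The sketch below shows one way this might look; splitting on `⋅`, the function names, and the stand-in classifier with its placeholder scores are assumptions for illustration, not part of this repository.

```python
from collections import defaultdict

SEPARATOR = "⋅"  # assumed fragment separator, taken from the example text


def split_parts(raw_text):
    """Split a raw line into trimmed, non-empty fragments."""
    return [part.strip() for part in raw_text.split(SEPARATOR) if part.strip()]


def merge_by_label(raw_text, classify):
    """Classify each fragment, then rejoin fragments that share a label."""
    grouped = defaultdict(list)
    for part in split_parts(raw_text):
        label, _score = classify(part)
        grouped[label].append(part)
    return {label: f" {SEPARATOR} ".join(parts) for label, parts in grouped.items()}


def fake_classify(text):
    """Stand-in for the model: replays labels from this card (scores are placeholders)."""
    known = {"Open": ("hours", 1.0), "Closes 5 pm": ("hours", 1.0)}
    return known[text]


result = merge_by_label("Open ⋅ Closes 5 pm", fake_classify)
```

Here both fragments end up under the single `hours` key; in real use, `fake_classify` would be replaced by a call to the served model.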
Keys covered by the model:
- `type`
- `reviews`
- `phone`
- `rating`
- `address`
- `hours`
- `description`
- `expensiveness`
- `service options`
- `links`
- `years_in_business`
Please refer to the documentation of SerpApi's Google Local API and Google Local Pack API for more details on the different parts.
The model has a few limitations that should be taken into account:
- The `label` key is not covered by the model, as it can be easily handled with traditional code.
- `button text` could be classified as `service options` or `address`. However, this can be easily avoided by checking whether a text is inside a button in the traditional part of the code. The `button text` label is only used to prevent emergent cases.
  - `"Delivery"` → `service options` [Correct Label is `button text`]
  - `"Share"` → `address` [Correct Label is `button text`]
- The model may classify `description` as `hours` if the description is about operating hours. For example:
  - `"Drive through: Open ⋅ Closes 12 AM"`
    - `"Drive through: Open"` → `description`
    - `"Closes 12 AM"` → `hours`
- The model may classify some `description` texts as `type`, because some descriptions look like types. For example:
  - `"Iconic Seattle-based coffeehouse chain"` → `type` [Correct Label is `description`]
- The model may classify some `button text` as `hours`. This is most likely a deficiency in the training dataset, and may be resolved in coming versions. For example:
  - `"Expand more"` → `hours` [Correct Label is `button text`]
- The model may classify some `service options` as `type`. This is most likely a deficiency in the training dataset, and may be resolved in coming versions. For example:
  - `"Takeaway"` → `type` [Correct Label is `service options`]
- The model may classify some `reviews` as `rating` or `price`. This is most likely a deficiency in the training dataset, and may be resolved in coming versions. For example:
  - `"(1.4K)"` → `rating` [Correct Label is `reviews`]
  - `"(1.6K)"` → `price` [Correct Label is `reviews`]
- The model may classify some `service options` as `description` or `type`. The confusion with `description` stems from a recent change in how these texts are categorized in SerpApi keys; the training data contains labels from before that change. For example:
  - `"On-site services"` → `type` [Correct Label is `service options`]
  - `"Online appointments"` → `description` [Correct Label is `service options`]
- `"Sushi"` → `address(0.984)`, `type(0.0493)` [Correct Label is `type`]
- `"Diagorou 4"` → `address(0.999)` [Correct address in same listing]
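
Two of the limitations above lend themselves to simple rule-based handling around the model: texts known to come from buttons can bypass the classifier, and a clash such as `"Sushi"` and `"Diagorou 4"` both claiming `address` can be settled by comparing scores. The following is a minimal sketch under assumptions not stated in this card: the `in_button` flag, the function names, and the rule that a listing holds a single `address` are all hypothetical.

```python
def label_text(text, in_button, classify):
    """Label button texts by rule; only other texts reach the model."""
    if in_button:
        # Rule-based label; the score is a placeholder, not a model output.
        return [("button text", 1.0)]
    return classify(text)


def resolve_duplicates(candidates, unique_labels=("address",)):
    """Keep one text per unique label; losers fall back to their next-best label.

    candidates maps each text to a list of (label, score) pairs.
    """
    ranked = {
        text: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for text, pairs in candidates.items()
    }
    assigned = {text: pairs[0] for text, pairs in ranked.items()}
    for label in unique_labels:
        claimants = [t for t, (lab, _score) in assigned.items() if lab == label]
        if len(claimants) < 2:
            continue
        winner = max(claimants, key=lambda t: assigned[t][1])
        for text in claimants:
            if text != winner and len(ranked[text]) > 1:
                assigned[text] = ranked[text][1]
    return {text: lab for text, (lab, _score) in assigned.items()}


def fake_classify(text):
    """Stand-in for the model: replays the scores listed in this card."""
    known = {
        "Sushi": [("address", 0.984), ("type", 0.0493)],
        "Diagorou 4": [("address", 0.999)],
    }
    return known[text]


texts = [("Sushi", False), ("Diagorou 4", False), ("Share", True)]
candidates = {text: label_text(text, flag, fake_classify) for text, flag in texts}
labels = resolve_duplicates(candidates)
```

In this run `"Diagorou 4"` keeps `address` because of its higher score, `"Sushi"` falls back to `type`, and `"Share"` never reaches the classifier.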
We value full transparency and painful honesty both in our internal and external communications. We believe a world with complete and open transparency is a better world.
However, while we strive for transparency, there are certain situations where sharing specific datasets may not be feasible or advisable. In the case of the dataset used to train our model, which contains different parts of a Google Local Listing including addresses and phone numbers, we have made a careful decision not to share it. We prioritize the well-being and safety of individuals, and sharing this dataset could potentially cause harm to people whose personal information is included.
Protecting the privacy and security of individuals is of utmost importance to us. Disclosing personal information, such as addresses and phone numbers, without proper consent or safeguards could lead to privacy violations, identity theft, harassment, or other forms of misuse. Our commitment to responsible data usage means that we handle sensitive information with great care and take appropriate measures to ensure its protection.
While we understand the value of transparency, we also recognize the need to strike a balance between transparency and safeguarding individuals' privacy and security. In this particular case, the potential harm that could result from sharing the dataset outweighs the benefits of complete transparency. By prioritizing privacy, we aim to create a safer and more secure environment for all individuals involved.
We appreciate your understanding and support in our commitment to responsible and ethical data practices. If you have any further questions or concerns, please feel free to reach out to us.