{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "5223b1b7", "metadata": {}, "outputs": [], "source": [ "from web2json.preprocessor import *\n", "from web2json.ai_extractor import *\n", "from web2json.postprocessor import *\n", "from web2json.pipeline import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "ae4e7f03", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import os explicitly rather than relying on the wildcard imports above to expose it\n", "import os\n", "\n", "import dotenv\n", "dotenv.load_dotenv()" ] }, { "cell_type": "code", "execution_count": 3, "id": "9e6b0eb9", "metadata": {}, "outputs": [], "source": [ "llm = NvidiaLLMClient(config={'api_key': os.getenv('NVIDIA_API_KEY'),'model_name': 'qwen/qwen2.5-7b-instruct'})" ] }, { "cell_type": "code", "execution_count": 4, "id": "3bc223d0", "metadata": {}, "outputs": [], "source": [ "prompt_template = \"\"\"\n", "You are a helpful assistant that extracts structured data from web pages.\n", "You will be given a web page and you need to extract the following information:\n", "{content}\n", "\n", "schema: {schema}\n", "Please provide the extracted data in JSON format.\n", "WITH ONLY THE FIELDS THAT ARE IN THE SCHEMA.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 5, "id": "974417de", "metadata": {}, "outputs": [], "source": [ "classification_prompt_template = \"\"\"\n", "# HTML Chunk Relevance Classification Prompt\n", "\n", "You are an HTML content classifier. Your task is to analyze an HTML chunk against a given schema and determine if the content is relevant.\n", "\n", "## Instructions:\n", "1. Carefully examine the provided HTML chunk\n", "2. Compare it against the given schema/criteria\n", "3. Determine if the HTML chunk contains content that matches or is relevant to the schema\n", "4. 
Respond with ONLY a JSON object containing a single field \"relevant\" with value 1 (relevant) or 0 (not relevant)\n", "\n", "## Input Format:\n", "**Schema/Criteria:**\n", "{schema}\n", "\n", "**HTML Chunk:**\n", "```html\n", "{content}\n", "```\n", "\n", "## Output Format:\n", "Your response must be ONLY a valid JSON object with no additional text:\n", "\n", "```json\n", "{{\n", " \"relevant\": 1\n", "}}\n", "```\n", "\n", "OR\n", "\n", "```json\n", "{{\n", " \"relevant\": 0\n", "}}\n", "```\n", "\n", "## Classification Rules:\n", "- Output 1 if the HTML chunk contains content that matches the schema criteria\n", "- Output 0 if the HTML chunk does not contain relevant content\n", "- Consider semantic meaning, not just exact keyword matches\n", "- Look at text content, attributes, structure, and context\n", "- Ignore purely structural HTML elements (like divs, spans) unless they contain relevant content\n", "- Be STRICT in your evaluation - only mark as relevant (1) if there is clear, meaningful content that directly relates to the schema\n", "- Empty elements, placeholder text, navigation menus, headers/footers, and generic UI components should typically be marked as not relevant (0)\n", "- The HTML chunk does not need to contain ALL schema information, but it must contain SUBSTANTIAL and SPECIFIC content related to the schema\n", "\n", "CRITICAL: Your entire response MUST be exactly one JSON object. DO NOT include any explanations, reasoning, markdown formatting, code blocks, or additional text. 
Output ONLY the raw JSON object.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "id": "58436d65", "metadata": {}, "outputs": [], "source": [ "pre = BasicPreprocessor(config={'keep_tags':True})\n", "# llm = GeminiLLMClient(config={'api_key': os.getenv('GEMINI_API_KEY'),})\n", "# ai = AIExtractor(llm_client=llm ,prompt_template=prompt_template)\n", "ai = LLMClassifierExtractor(llm_client=llm, prompt_template=prompt_template, classifier_prompt=classification_prompt_template)\n", "post = PostProcessor()" ] }, { "cell_type": "code", "execution_count": 7, "id": "9c78eec9", "metadata": {}, "outputs": [], "source": [ "pipe = Pipeline(preprocessor=pre, ai_extractor=ai, postprocessor=post)" ] }, { "cell_type": "code", "execution_count": 8, "id": "0b324a01", "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field, constr, condecimal\n", "\n", "class ProductModel(BaseModel):\n", " productTitle: constr(min_length=1, max_length=200) = Field(\n", " ...,\n", " title=\"Product Title\",\n", " description=\"The full title of the product\"\n", " )\n", " price: condecimal(gt=0, decimal_places=2) = Field(\n", " ...,\n", " title=\"Product Price\",\n", " description=\"Unit price (must be > 0, two decimal places).\"\n", " )\n", " manufacturer: constr(min_length=1, max_length=1000) = Field(\n", " ...,\n", " title=\"Manufacturer\",\n", " description=\"Name of the product manufacturer.\"\n", " )\n", "\n", " " ] }, { "cell_type": "code", "execution_count": 9, "id": "92a5fc23", "metadata": {}, "outputs": [], "source": [ "config = {\n", " 'keep_tags': True,\n", "}" ] }, { "cell_type": "code", "execution_count": 16, "id": "d2cfb033", "metadata": {}, "outputs": [], "source": [ "url = 
\"https://www.amazon.com/Instant-Pot-Multi-Use-Programmable-Pressure/dp/B00FLYWNYQ?_encoding=UTF8&content-id=amzn1.sym.2f889ce0-246f-467a-a086-d9a721167240&dib=eyJ2IjoiMSJ9.2EzBddTDEktLY8ckTsraM_cZ6pzKuNkA6y_gLR0-Uz1ekttQU6tuQEcjb8PThy0PfhvxLqeYWh3N7pQrGgRxAWzapVklC_aU6xBzD-3Wwqx3qyQRHsmOhPRsSpeCOIIZqS3SKDowZEPYrGnCbRMt5vxnsYMW-fD-zBbgeoeGYmbsN2U6_HNhLjrpePKCbQPmnZBJ9UhgYE4fE3DVuYm8xlJe9l5GixDLVFtZUq4m5FE.Ol-jiuu9P6mQie0yXLJj-Ht5-TXmIXuRPije85p_YVo&dib_tag=se&keywords=cooker&pd_rd_r=2cede598-f3ae-49ca-8a46-e5945a9c2631&pd_rd_w=2HLSC&pd_rd_wg=ZyUUn&qid=1749508157&sr=8-3\"\n", "schema = ProductModel # pydantic class\n", "\n", "# read html file \n", "# with open(r'C:\\Users\\abdfa\\Desktop\\UNI STUFFING\\GRADUATION PROJECT\\Group Work\\MCP_WEB2JSON\\0000.htm', 'r', encoding='utf-8') as file:\n", "# content = file.read()\n", "\n", "# with open(r'C:\\Users\\abdfa\\Desktop\\UNI STUFFING\\GRADUATION PROJECT\\Group Work\\MCP_WEB2JSON\\Amazon.com_ Instant Pot Duo 7-in-1 Electric Pressure Cooker, Slow Cooker, Rice Cooker, Steamer, Sauté, Yogurt Maker, Warmer & Sterilizer, Includes App With Over 800 Recipes, Stainless Steel, 6 Quart.htm', 'r', encoding='utf-8') as file:\n", "# content = file.read()\n" ] }, { "cell_type": "code", "execution_count": 23, "id": "79cf2321", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Preprocessed content: \n", "\n", "
\n", "\n", "\n", "\n", "Price | $99.95 |
AmazonGlobal Shipping | $81.05 |
\n", " Estimated Import Fees Deposit\n", " | $154.27 |
Total | $335.27 |
Brand | Instant Pot |
Capacity | 5.68 Liters |
Material | Stainless steel |
Finish Type | Stainless Steel |
Product Dimensions | 12.2\"D x 13.38\"W x 12.48\"H |
Special Feature | Programmable |
Wattage | 1000 watts |
Item Weight | 11.8 Pounds |
Control Method | Touch |
Controller Type | Push Button |
This product is compatible with outlets that support 120 volts and might require a converter when used outside of the United States.
Customer Reviews, including Product Star Ratings help customers to learn more about the product and decide whether it is the right product for them.
To calculate the overall star rating and percentage breakdown by star, we don’t use a simple average. Instead, our system considers things like how recent a review is and if the reviewer bought the item on Amazon. It also analyzed reviews to verify trustworthiness.
Learn more how customers reviews work on AmazonCustomers find the pressure cooker works well, particularly praising its sauté feature and accurate cooking times. They appreciate its ease of use, with one customer noting the intuitive controls, and consider it a great kitchen appliance that makes meal prep convenient. The appliance receives positive feedback for its cooking ability, with one customer highlighting its versatility in transforming into a pressure cooker, and customers find it easy to clean with a stainless steel pot that cleans well. Customers enjoy the complex flavors produced, though opinions on build quality are mixed, with some finding it well-made while others describe it as wimpy.
AI Generated from the text of customer reviews
Customers find that the pressure cooker works well, with the sauté feature performing particularly effectively.
\"...This works with new potatoes, and regular potatoes! Happy Instant Potting!\" Read more
\"...It was excellent. I did 6 minutes per pound + 2 minutes. I also cook chicken thighs for dinner about once a week, which I had never cooked before....\" Read more
\"...Most programs work just fine on full automatic, but some small exceptions may demand more online flexibility....\" Read more
\"...occasional mishaps, the Instant Pot Duo has consistently delivered incredible results....\" Read more
Customers find the pressure cooker simple to use, with clear operating instructions in the booklet, making meal preparation a breeze.
\"...make in your Instant Pot that will change your life: incredibly easy perfectly poached eggs in 2-3 minutes, and baked potatoes in 12 minutes....\" Read more
\"...credit as most automatic settings work well, automating it for ease of use and safety. Cooking is part Science, but, I think, more Art than Science....\" Read more
\"...crockpot extensively over the past years and while I appreciate the ease of use and the ability to put a meal on the table soon after I got home in...\" Read more
\"...of pressure cookers anymore, the time , energy bills saved n convenience is worth it!...\" Read more
Customers appreciate the pressure cooker's quick cooking time, with one mentioning it can make rice in just 10 minutes, while another notes it cooks like a crockpot in 1/8th the time.
\"...incredibly easy perfectly poached eggs in 2-3 minutes, and baked potatoes in 12 minutes....\" Read more
\"...My kids love it. 8 minutes on manual with a natural release. I just stir it with a fork and don't even need to blend it....\" Read more
\"...steel liner (looks like chrome), along with the delay and cooking timer auto-shutoff. This sets it apart from old-time swisher type 1st Gen P.C.'s....\" Read more
\"...versatile appliance seamlessly transforms into a pressure cooker, slow cooker, rice cooker, steamer, sauté pan, yogurt maker, warmer, and even a...\" Read more
Customers find the pressure cooker to be a fabulous kitchen appliance, with one customer noting its versatility as both a pressure cooker and crockpot.
\"...When you are ready for your potatoes, they will be perfectly done and waiting for you, even if you have abandoned them for hours!...\" Read more
\"...I have to use a rapid boil just to make tea. A pressure cooker is the great equalizer, a must at higher altitudes because 15 lbs is 15 lbs pressure...\" Read more
\"...This versatile appliance seamlessly transforms into a pressure cooker, slow cooker, rice cooker, steamer, sauté pan, yogurt maker, warmer, and even...\" Read more
\"...It's just better insulated, but I've found that meals are so good under pressure that there's no need to use the slow cooker function....\" Read more
Customers praise the pressure cooker's cooking ability, particularly its amazing recipes and rice cooking feature, with one customer noting it makes stir-fry dishes and another mentioning it's easy to use on the dining room table.
\"...there in the morning, leave for the day, and come back to a perfectly cooked whatever, just waiting for you! Booyah!...\" Read more
\"...You could very easily cook on the dining room table, or a small adjacent table....\" Read more
\"...While the free app provided great recipes and guidance, a comprehensive manual would have been helpful for understanding all the features and...\" Read more
\"...This handy appliance has transformed my summertime cooking, allowing me to break away from our usual salads and grilled chicken rut....\" Read more
Customers find the pressure cooker easy to clean, with the stainless steel pot being particularly effective, and one customer noting that the inner pot can be removed for thorough cleaning.
\"...First, it is almost impossible to mess up with this thing to a point of being dangerous, so if you're concerned about the exploding pressure cookers...\" Read more
\"...It also only requires washing a cheese grater and the pot and it only takes 20 minute from start to finish....\" Read more
\"...The liner really is easy to clean. Rinse it out under the hot water, a soapy sponge, re-rinse and set it in the sink basket to dry....\" Read more
\"...anymore and I'm finding that it seems easier and makes less of a mess than going stovetop....\" Read more
Customers enjoy the flavor of food cooked in this pressure cooker, particularly praising its ability to create complex and yuck-free dishes, with one customer noting it makes food more tender and juicy.
\"...the baked potatoes you know and love - they're great with butter, sour cream, etc.! This works with new potatoes, and regular potatoes!...\" Read more
\"...I strain it and then have beautiful, healthy, yummy chicken broth. The first time I did it my husband looked at me like I was cray-cray....\" Read more
\"...An added lower pressure setting extends its ability to more tender foods....\" Read more
\"...I was amazed at how easy and delicious it was to make this soup that would normally take a couple of hours....\" Read more
Customers have mixed opinions about the pressure cooker's build quality, with some praising its thick stainless steel construction and reliable performance, while others find it wimpy and criticize the cheap plastic cup.
\"...pressure cooker, slow cooker, rice cooker, steamer, sauté pan, yogurt maker, and warmer....\" Read more
\"...It’s also super easy to clean, and the stainless steel inner pot feels sturdy and high-quality....\" Read more
\"...and fast, & so far it works great but my biggest drawback is how it’s built....\" Read more
\"...The inner pot is made of heavy duty stainless steel with an aluminum encapsulated base, and is polished to a very shiny finish....\" Read more