Below is a **draft** of a follow-up tutorial, **“Smart Crawling Techniques,”** building on the **“AsyncWebCrawler Basics”** tutorial. This tutorial focuses on three main points:

1. **Advanced usage of CSS selectors** (e.g., partial extraction, exclusions)
2. **Handling iframes** (if relevant for your workflow)
3. **Waiting for dynamic content** using `wait_for`, including the new `css:` and `js:` prefixes

Feel free to adjust code snippets, wording, or emphasis to match your library updates or user feedback.

---

# Smart Crawling Techniques

In the previous tutorial ([AsyncWebCrawler Basics](./async-webcrawler-basics.md)), you learned how to create an `AsyncWebCrawler` instance, run a basic crawl, and inspect the `CrawlResult`. Now it’s time to explore some of the **targeted crawling** features that let you:

1. Select specific parts of a webpage using CSS selectors
2. Exclude or ignore certain page elements
3. Wait for dynamic content to load using `wait_for` (with `css:` or `js:` rules)
4. (Optionally) Handle iframes if your target site embeds additional content

> **Prerequisites**
> - You’ve read or completed [AsyncWebCrawler Basics](./async-webcrawler-basics.md).
> - You have a working environment for Crawl4AI (Playwright installed, etc.).

---

## 1. Targeting Specific Elements with CSS Selectors

### 1.1 Simple CSS Selector Usage

Let’s say you only need to crawl the main article content of a news page. By setting `css_selector` in `CrawlerRunConfig`, your final HTML or Markdown output focuses on that region. For example:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    browser_cfg = BrowserConfig(headless=True)
    crawler_cfg = CrawlerRunConfig(
        css_selector=".article-body",    # Only capture .article-body content
        excluded_tags=["nav", "footer"]  # Optional: skip big nav & footer sections
    )

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(
            url="https://news.example.com/story/12345",
            config=crawler_cfg
        )

        if result.success:
            print("[OK] Extracted content length:", len(result.html))
        else:
            print("[ERROR]", result.error_message)

if __name__ == "__main__":
    asyncio.run(main())
```

**Key Parameters**:

- **`css_selector`**: Tells the crawler to focus on `.article-body`.
- **`excluded_tags`**: Tells the crawler to skip specific HTML tags altogether (e.g., `nav` or `footer`).

**Tip**: For extremely noisy pages, you can refine the output further with `excluded_selector`, which takes a CSS selector for elements you want removed from the final output.

### 1.2 Excluding Content with `excluded_selector`

If you want to remove certain sections within `.article-body` (like “related stories” sidebars), set:

```python
CrawlerRunConfig(
    css_selector=".article-body",
    excluded_selector=".related-stories, .ads-banner"
)
```

This combination grabs the main article content while filtering out sidebars or ads.

---

## 2. Handling Iframes

Some sites embed extra content via `<iframe>` elements.
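If that embedded content matters for your extraction, you can ask the crawler to merge iframe content into the main page output. The sketch below assumes your installed Crawl4AI version exposes a `process_iframes` flag on `CrawlerRunConfig`; verify the exact option name against your version’s reference docs before relying on it.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    browser_cfg = BrowserConfig(headless=True)

    # Assumption: process_iframes tells the crawler to pull iframe content
    # into the parent page's output. Check the flag name in your Crawl4AI version.
    crawler_cfg = CrawlerRunConfig(
        css_selector=".article-body",
        process_iframes=True,
    )

    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(
            url="https://news.example.com/story/12345",
            config=crawler_cfg
        )

        if result.success:
            print("[OK] Content length (including iframes):", len(result.html))
        else:
            print("[ERROR]", result.error_message)

if __name__ == "__main__":
    asyncio.run(main())
```

If your version does not support iframe merging, the embedded content simply won’t appear in `result.html`; in that case, a common fallback is to crawl the iframe’s `src` URL as a separate `arun()` call.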