# Simple Crawling This guide covers the basics of web crawling with Crawl4AI. You'll learn how to set up a crawler, make your first request, and understand the response. ## Basic Usage Set up a simple crawl using `BrowserConfig` and `CrawlerRunConfig`: ```python import asyncio from crawl4ai import AsyncWebCrawler from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig async def main(): browser_config = BrowserConfig() # Default browser configuration run_config = CrawlerRunConfig() # Default crawl run configuration async with AsyncWebCrawler(config=browser_config) as crawler: result = await crawler.arun( url="https://example.com", config=run_config ) print(result.markdown) # Print clean markdown content if __name__ == "__main__": asyncio.run(main()) ``` ## Understanding the Response The `arun()` method returns a `CrawlResult` object with several useful properties. Here's a quick overview (see [CrawlResult](../api/crawl-result.md) for complete details): ```python result = await crawler.arun( url="https://example.com", config=CrawlerRunConfig(fit_markdown=True) ) # Different content formats print(result.html) # Raw HTML print(result.cleaned_html) # Cleaned HTML print(result.markdown) # Markdown version print(result.fit_markdown) # Most relevant content in markdown # Check success status print(result.success) # True if crawl succeeded print(result.status_code) # HTTP status code (e.g., 200, 404) # Access extracted media and links print(result.media) # Dictionary of found media (images, videos, audio) print(result.links) # Dictionary of internal and external links ``` ## Adding Basic Options Customize your crawl using `CrawlerRunConfig`: ```python run_config = CrawlerRunConfig( word_count_threshold=10, # Minimum words per content block exclude_external_links=True, # Remove external links remove_overlay_elements=True, # Remove popups/modals process_iframes=True # Process iframe content ) result = await crawler.arun( url="https://example.com", config=run_config ) ``` ## Handling Errors Always check if the crawl was successful: ```python run_config = CrawlerRunConfig() result = await crawler.arun(url="https://example.com", config=run_config) if not result.success: print(f"Crawl failed: {result.error_message}") print(f"Status code: {result.status_code}") ``` ## Logging and Debugging Enable verbose logging in `BrowserConfig`: ```python browser_config = BrowserConfig(verbose=True) async with AsyncWebCrawler(config=browser_config) as crawler: run_config = CrawlerRunConfig() result = await crawler.arun(url="https://example.com", config=run_config) ``` ## Complete Example Here's a more comprehensive example demonstrating common usage patterns: ```python import asyncio from crawl4ai import AsyncWebCrawler from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig, CacheMode async def main(): browser_config = BrowserConfig(verbose=True) run_config = CrawlerRunConfig( # Content filtering word_count_threshold=10, excluded_tags=['form', 'header'], exclude_external_links=True, # Content processing process_iframes=True, remove_overlay_elements=True, # Cache control cache_mode=CacheMode.ENABLED # Use cache if available ) async with AsyncWebCrawler(config=browser_config) as crawler: result = await crawler.arun( url="https://example.com", config=run_config ) if result.success: # Print clean content print("Content:", result.markdown[:500]) # First 500 chars # Process images for image in result.media["images"]: print(f"Found image: {image['src']}") # Process links for link in result.links["internal"]: print(f"Internal link: {link['href']}") else: print(f"Crawl failed: {result.error_message}") if __name__ == "__main__": asyncio.run(main()) ```