# Browser Configuration

Crawl4AI supports multiple browser engines and offers extensive configuration options for browser behavior.

## Browser Types

Choose from three browser engines:

```python
# Chromium (default)
async with AsyncWebCrawler(browser_type="chromium") as crawler:
    result = await crawler.arun(url="https://example.com")

# Firefox
async with AsyncWebCrawler(browser_type="firefox") as crawler:
    result = await crawler.arun(url="https://example.com")

# WebKit
async with AsyncWebCrawler(browser_type="webkit") as crawler:
    result = await crawler.arun(url="https://example.com")
```
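
The snippets in this guide call `await` as if an event loop is already running; a minimal, self-contained driver (assuming the top-level `crawl4ai` import) looks like this:

```python
import asyncio

from crawl4ai import AsyncWebCrawler

async def main():
    # Pick any of the engines above; Firefox and WebKit require the
    # corresponding Playwright browsers to be installed.
    async with AsyncWebCrawler(browser_type="chromium") as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown[:200])

if __name__ == "__main__":
    asyncio.run(main())
```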

## Basic Configuration

Common browser settings:

```python
async with AsyncWebCrawler(
    headless=True,           # Run in headless mode (no GUI)
    verbose=True,           # Enable detailed logging
    sleep_on_close=False    # No delay when closing browser
) as crawler:
    result = await crawler.arun(url="https://example.com")
```

## Identity Management

Control how your crawler appears to websites:

```python
# Custom user agent
async with AsyncWebCrawler(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
) as crawler:
    result = await crawler.arun(url="https://example.com")

# Custom headers
headers = {
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "no-cache"
}
async with AsyncWebCrawler(headers=headers) as crawler:
    result = await crawler.arun(url="https://example.com")
```

## Screenshot Capabilities

Capture page screenshots with enhanced error handling:

```python
result = await crawler.arun(
    url="https://example.com",
    screenshot=True,                # Enable screenshot
    screenshot_wait_for=2.0        # Wait 2 seconds before capture
)

if result.screenshot:  # Base64 encoded image
    import base64
    with open("screenshot.png", "wb") as f:
        f.write(base64.b64decode(result.screenshot))
```
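
Captures can fail on heavy or slow pages. A short check before decoding (assuming `success` and `error_message` are populated on the crawl result, as they are for other crawl failures):

```python
if not result.success:
    # The crawl itself failed; error_message carries the reason.
    print(f"Crawl failed: {result.error_message}")
elif not result.screenshot:
    # The crawl succeeded but no screenshot was produced (e.g., capture timed out).
    print("No screenshot captured")
```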

## Timeouts and Waiting

Control page loading behavior:

```python
result = await crawler.arun(
    url="https://example.com",
    page_timeout=60000,              # Page load timeout (ms)
    delay_before_return_html=2.0,    # Wait before content capture
    wait_for="css:.dynamic-content"  # Wait for specific element
)
```

## JavaScript Execution

Execute custom JavaScript on the page before the content is extracted:

```python
# Single JavaScript command
result = await crawler.arun(
    url="https://example.com",
    js_code="window.scrollTo(0, document.body.scrollHeight);"
)

# Multiple commands
js_commands = [
    "window.scrollTo(0, document.body.scrollHeight);",
    "document.querySelector('.load-more').click();"
]
result = await crawler.arun(
    url="https://example.com",
    js_code=js_commands
)
```

## Proxy Configuration

Route requests through a proxy, with or without authentication:

```python
# Simple proxy
async with AsyncWebCrawler(
    proxy="http://proxy.example.com:8080"
) as crawler:
    result = await crawler.arun(url="https://example.com")

# Proxy with authentication
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass"
}
async with AsyncWebCrawler(proxy_config=proxy_config) as crawler:
    result = await crawler.arun(url="https://example.com")
```

## Anti-Detection Features

Enable stealth features to avoid bot detection:

```python
result = await crawler.arun(
    url="https://example.com",
    simulate_user=True,        # Simulate human behavior
    override_navigator=True,   # Mask automation signals
    magic=True               # Enable all anti-detection features
)
```

## Handling Dynamic Content

Configure the browser to handle dynamic content:

```python
# Wait for dynamic content
result = await crawler.arun(
    url="https://example.com",
    wait_for="js:() => document.querySelector('.content').children.length > 10",
    process_iframes=True     # Process iframe content
)

# Handle lazy-loaded images
result = await crawler.arun(
    url="https://example.com",
    js_code="window.scrollTo(0, document.body.scrollHeight);",
    delay_before_return_html=2.0  # Wait for images to load
)
```

## Comprehensive Example

Here's how to combine various browser configurations:

```python
async def crawl_with_advanced_config(url: str):
    async with AsyncWebCrawler(
        # Browser setup
        browser_type="chromium",
        headless=True,
        verbose=True,
        
        # Identity
        user_agent="Custom User Agent",
        headers={"Accept-Language": "en-US"},
        
        # Proxy setup
        proxy="http://proxy.example.com:8080"
    ) as crawler:
        result = await crawler.arun(
            url=url,
            # Content handling
            process_iframes=True,
            screenshot=True,
            
            # Timing
            page_timeout=60000,
            delay_before_return_html=2.0,
            
            # Anti-detection
            magic=True,
            simulate_user=True,
            
            # Dynamic content
            js_code=[
                "window.scrollTo(0, document.body.scrollHeight);",
                "document.querySelector('.load-more')?.click();"
            ],
            wait_for="css:.dynamic-content"
        )
        
        return {
            "content": result.markdown,
            "screenshot": result.screenshot,
            "success": result.success
        }
```
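
A usage sketch for the function above (the URL is a placeholder):

```python
import asyncio

if __name__ == "__main__":
    data = asyncio.run(crawl_with_advanced_config("https://example.com"))
    print("Success:", data["success"])
    print("Content length:", len(data["content"] or ""))
```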