# Page Interaction

Crawl4AI provides features for interacting with dynamic webpages: executing JavaScript, waiting on CSS or JavaScript conditions, controlling timing, and keeping a session alive across multiple interactions.

## JavaScript Execution

### Basic Execution

```python
from crawl4ai.async_configs import CrawlerRunConfig

# Single JavaScript command
config = CrawlerRunConfig(
    js_code="window.scrollTo(0, document.body.scrollHeight);"
)
result = await crawler.arun(url="https://example.com", config=config)

# Multiple commands
js_commands = [
    "window.scrollTo(0, document.body.scrollHeight);",
    "document.querySelector('.load-more').click();",
    "document.querySelector('#consent-button').click();"
]
config = CrawlerRunConfig(js_code=js_commands)
result = await crawler.arun(url="https://example.com", config=config)
```
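
These snippets assume an existing `crawler` instance. A minimal, self-contained run might look like this (the URL is a placeholder):

```python
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig

async def main():
    config = CrawlerRunConfig(
        js_code="window.scrollTo(0, document.body.scrollHeight);"
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=config)
        print(result.markdown[:300])  # First 300 characters of the extracted markdown

asyncio.run(main())
```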

## Wait Conditions

### CSS-Based Waiting

Wait for elements to appear:

```python
config = CrawlerRunConfig(wait_for="css:.dynamic-content")  # Wait for element with class 'dynamic-content'
result = await crawler.arun(url="https://example.com", config=config)
```

### JavaScript-Based Waiting

Wait for custom conditions:

```python
# Wait for number of elements
wait_condition = """() => {
    return document.querySelectorAll('.item').length > 10;
}"""

config = CrawlerRunConfig(wait_for=f"js:{wait_condition}")
result = await crawler.arun(url="https://example.com", config=config)

# Wait for dynamic content to load
wait_for_content = """() => {
    const content = document.querySelector('.content');
    return content && content.innerText.length > 100;
}"""

config = CrawlerRunConfig(wait_for=f"js:{wait_for_content}")
result = await crawler.arun(url="https://example.com", config=config)
```
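
A common variant is waiting for media to finish loading. The sketch below uses the standard `img.complete` property to wait until every image on the page has loaded:

```python
# Wait until the page has at least one image and all images have finished loading
wait_for_images = """() => {
    const images = document.querySelectorAll('img');
    return images.length > 0 && Array.from(images).every(img => img.complete);
}"""

config = CrawlerRunConfig(wait_for=f"js:{wait_for_images}")
result = await crawler.arun(url="https://example.com", config=config)
```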

## Handling Dynamic Content

### Load More Content

Handle infinite scroll or load more buttons:

```python
config = CrawlerRunConfig(
    js_code=[
        "window.scrollTo(0, document.body.scrollHeight);",  # Scroll to bottom
        "window.previousCount = document.querySelectorAll('.item').length;",  # Record the current item count
        "const loadMore = document.querySelector('.load-more'); if(loadMore) loadMore.click();"  # Click load more
    ],
    wait_for="js:() => document.querySelectorAll('.item').length > window.previousCount"  # Wait until new items appear
)
result = await crawler.arun(url="https://example.com", config=config)
```

### Form Interaction

Handle forms and inputs:

```python
js_form_interaction = """
    document.querySelector('#search').value = 'search term';  // Fill form fields
    document.querySelector('form').submit();                 // Submit form
"""

config = CrawlerRunConfig(
    js_code=js_form_interaction,
    wait_for="css:.results"  # Wait for results to load
)
result = await crawler.arun(url="https://example.com", config=config)
```
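
Setting `.value` directly may not register on pages built with frameworks such as React, which listen for input events instead of reading the field. A sketch that also dispatches the relevant events (the `#search` selector is a placeholder, as above):

```python
js_framework_form = """
    const input = document.querySelector('#search');
    input.value = 'search term';
    // Dispatch events so framework-managed state picks up the change
    input.dispatchEvent(new Event('input', { bubbles: true }));
    input.dispatchEvent(new Event('change', { bubbles: true }));
    // requestSubmit() fires submit handlers and validation, unlike submit()
    document.querySelector('form').requestSubmit();
"""

config = CrawlerRunConfig(
    js_code=js_framework_form,
    wait_for="css:.results"
)
result = await crawler.arun(url="https://example.com", config=config)
```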

## Timing Control

### Delays and Timeouts

Control timing of interactions:

```python
config = CrawlerRunConfig(
    page_timeout=60000,              # Page load / script timeout in milliseconds
    delay_before_return_html=2.0     # Seconds to wait before capturing the final HTML
)
result = await crawler.arun(url="https://example.com", config=config)
```
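
If a wait condition never becomes true, the crawl fails once `page_timeout` elapses, so it is worth checking the result rather than assuming success. `success` and `error_message` are fields on the returned `CrawlResult`:

```python
config = CrawlerRunConfig(
    wait_for="css:.slow-content",  # Hypothetical selector for illustration
    page_timeout=30000
)
result = await crawler.arun(url="https://example.com", config=config)
if not result.success:
    print("Crawl failed:", result.error_message)
```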

## Complex Interactions Example

Here's an example of handling a dynamic page with multiple interactions:

```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig

async def crawl_dynamic_content():
    async with AsyncWebCrawler() as crawler:
        # Initial page load
        config = CrawlerRunConfig(
            js_code="document.querySelector('.cookie-accept')?.click();",  # Handle cookie consent
            wait_for="css:.main-content"
        )
        result = await crawler.arun(url="https://example.com", config=config)

        # Load more content
        session_id = "dynamic_session"  # Keep session for multiple interactions
        
        for page in range(3):  # Load 3 pages of content
            config = CrawlerRunConfig(
                session_id=session_id,
                js_code=[
                    "window.scrollTo(0, document.body.scrollHeight);",  # Scroll to bottom
                    "window.previousCount = document.querySelectorAll('.item').length;",  # Store item count
                    "document.querySelector('.load-more')?.click();"   # Click load more
                ],
                wait_for="""() => {
                    const currentCount = document.querySelectorAll('.item').length;
                    return currentCount > window.previousCount;
                }""",
                js_only=(page > 0)  # Execute JS without reloading page for subsequent interactions
            )
            result = await crawler.arun(url="https://example.com", config=config)
            print(f"Page {page + 1} items:", len(result.cleaned_html))

        # Clean up session
        await crawler.crawler_strategy.kill_session(session_id)
```
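
Note that `js_only=True` is only meaningful together with a `session_id`: it tells the crawler to run the JavaScript in the page already open for that session instead of navigating to the URL again, which is what preserves the previously loaded items between iterations.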

## Using with Extraction Strategies

Combine page interaction with structured extraction:

```python
from typing import List

from pydantic import BaseModel

from crawl4ai.extraction_strategy import JsonCssExtractionStrategy, LLMExtractionStrategy
from crawl4ai.async_configs import CrawlerRunConfig

# Pattern-based extraction after interaction
schema = {
    "name": "Dynamic Items",
    "baseSelector": ".item",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "description", "selector": ".desc", "type": "text"}
    ]
}

config = CrawlerRunConfig(
    js_code="window.scrollTo(0, document.body.scrollHeight);",
    wait_for="css:.item:nth-child(10)",  # Wait for 10 items
    extraction_strategy=JsonCssExtractionStrategy(schema)
)
result = await crawler.arun(url="https://example.com", config=config)

# Or use LLM to analyze dynamic content
class ContentAnalysis(BaseModel):
    topics: List[str]
    summary: str

config = CrawlerRunConfig(
    js_code="document.querySelector('.show-more').click();",
    wait_for="css:.full-content",
    extraction_strategy=LLMExtractionStrategy(
        provider="ollama/nemotron",
        schema=ContentAnalysis.schema(),
        instruction="Analyze the full content"
    )
)
result = await crawler.arun(url="https://example.com", config=config)
```
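
In both cases the structured output lands in `result.extracted_content` as a JSON string. For the CSS schema above, parsing it might look like this:

```python
import json

items = json.loads(result.extracted_content)
for item in items[:3]:
    print(item["title"], "-", item["description"])
```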