
CrawlerRunConfig Parameters Documentation

Content Processing Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `word_count_threshold` | `int` | `200` | Minimum word count threshold before processing content |
| `extraction_strategy` | `ExtractionStrategy` | `None` | Strategy to extract structured data from crawled pages. When `None`, uses `NoExtractionStrategy` |
| `chunking_strategy` | `ChunkingStrategy` | `RegexChunking()` | Strategy to chunk content before extraction |
| `markdown_generator` | `MarkdownGenerationStrategy` | `None` | Strategy for generating markdown from extracted content |
| `content_filter` | `RelevantContentFilter` | `None` | Optional filter to prune irrelevant content |
| `only_text` | `bool` | `False` | If True, attempt to extract text-only content where applicable |
| `css_selector` | `str` | `None` | CSS selector to extract a specific portion of the page |
| `excluded_tags` | `list[str]` | `[]` | List of HTML tags to exclude from processing |
| `keep_data_attributes` | `bool` | `False` | If True, retain `data-*` attributes while removing unwanted attributes |
| `remove_forms` | `bool` | `False` | If True, remove all `<form>` elements from the HTML |
| `prettiify` | `bool` | `False` | If True, apply `fast_format_html` to produce prettified HTML output |
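
To make the effect of `excluded_tags`, `remove_forms`, and `word_count_threshold` concrete, here is a minimal standard-library sketch. It is not the library's implementation: the `TextExtractor` class and `extract_text` function are hypothetical, and it assumes the threshold is applied per text block and is inclusive.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text while skipping excluded tags."""

    def __init__(self, excluded_tags=(), remove_forms=False):
        super().__init__()
        self.skip_tags = set(excluded_tags)
        if remove_forms:
            self.skip_tags.add("form")
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.blocks.append(text)


def extract_text(html, excluded_tags=(), remove_forms=False, word_count_threshold=200):
    """Return text blocks that survive tag exclusion and the word-count filter."""
    parser = TextExtractor(excluded_tags, remove_forms)
    parser.feed(html)
    parser.close()
    # Assumption: blocks with fewer words than the threshold are dropped.
    return [b for b in parser.blocks if len(b.split()) >= word_count_threshold]
```

For instance, calling `extract_text(page_html, excluded_tags=["nav"], remove_forms=True, word_count_threshold=3)` would drop navigation text, form contents, and any block shorter than three words.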

Caching Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `cache_mode` | `CacheMode` | `None` | Defines how caching is handled. Defaults to `CacheMode.ENABLED` internally |
| `session_id` | `str` | `None` | Optional session ID to persist browser context and page instance |
| `bypass_cache` | `bool` | `False` | Legacy parameter, if True acts like `CacheMode.BYPASS` |
| `disable_cache` | `bool` | `False` | Legacy parameter, if True acts like `CacheMode.DISABLED` |
| `no_cache_read` | `bool` | `False` | Legacy parameter, if True acts like `CacheMode.WRITE_ONLY` |
| `no_cache_write` | `bool` | `False` | Legacy parameter, if True acts like `CacheMode.READ_ONLY` |
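
The mapping from legacy flags to cache modes can be sketched as a small resolver. This is an illustration only: the `CacheMode` enum below is a local stand-in that reuses the member names from the table, `resolve_cache_mode` is a hypothetical function, and the precedence order (explicit `cache_mode` first, then `disable_cache`, `bypass_cache`, `no_cache_read`, `no_cache_write`) is an assumption.

```python
from enum import Enum
from typing import Optional


class CacheMode(Enum):
    """Local stand-in for the library's CacheMode (member names from the table)."""
    ENABLED = "enabled"
    DISABLED = "disabled"
    READ_ONLY = "read_only"
    WRITE_ONLY = "write_only"
    BYPASS = "bypass"


def resolve_cache_mode(
    cache_mode: Optional[CacheMode] = None,
    bypass_cache: bool = False,
    disable_cache: bool = False,
    no_cache_read: bool = False,
    no_cache_write: bool = False,
) -> CacheMode:
    """Translate legacy boolean flags into a single CacheMode value."""
    if cache_mode is not None:
        return cache_mode              # an explicit mode wins (assumed precedence)
    if disable_cache:
        return CacheMode.DISABLED
    if bypass_cache:
        return CacheMode.BYPASS
    if no_cache_read:
        return CacheMode.WRITE_ONLY    # never read, still write
    if no_cache_write:
        return CacheMode.READ_ONLY     # still read, never write
    return CacheMode.ENABLED           # documented internal default
```

Note the inversion in the last two flags: disabling reads leaves a write-only cache, and disabling writes leaves a read-only cache.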

Page Navigation and Timing Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `wait_until` | `str` | `"domcontentloaded"` | The condition to wait for when navigating |
| `page_timeout` | `int` | `60000` | Timeout in milliseconds for page operations like navigation |
| `wait_for` | `str` | `None` | CSS selector or JS condition to wait for before extracting content |
| `wait_for_images` | `bool` | `True` | If True, wait for images to load before extracting content |
| `delay_before_return_html` | `float` | `0.1` | Delay in seconds before retrieving final HTML |
| `mean_delay` | `float` | `0.1` | Mean base delay between requests when calling `arun_many` |
| `max_range` | `float` | `0.3` | Max random additional delay range for requests in `arun_many` |
| `semaphore_count` | `int` | `5` | Number of concurrent operations allowed |
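
As a rough illustration of how `mean_delay`, `max_range`, and `semaphore_count` could interact, the sketch below throttles a batch of coroutines with `asyncio`. It assumes the per-request delay is `mean_delay` plus a uniform random value in `[0, max_range]`, in seconds; `request_delay` and `run_limited` are hypothetical names, not library functions.

```python
import asyncio
import random


def request_delay(mean_delay: float = 0.1, max_range: float = 0.3) -> float:
    """Base delay plus a random extra drawn from [0, max_range] (assumed formula)."""
    return mean_delay + random.uniform(0, max_range)


async def run_limited(jobs, semaphore_count: int = 5,
                      mean_delay: float = 0.1, max_range: float = 0.3):
    """Run zero-argument coroutine functions, at most semaphore_count at a time."""
    semaphore = asyncio.Semaphore(semaphore_count)

    async def guarded(job):
        async with semaphore:
            # Pause before each request so calls are spread out in time.
            await asyncio.sleep(request_delay(mean_delay, max_range))
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With the defaults from the table, each request would wait between 0.1 and 0.4 seconds, and no more than five would be in flight at once.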

Page Interaction Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `js_code` | `str` or `list[str]` | `None` | JavaScript code/snippets to run on the page |
| `js_only` | `bool` | `False` | If True, indicates subsequent calls are JS-driven updates |
| `ignore_body_visibility` | `bool` | `True` | If True, ignore whether the body is visible before proceeding |
| `scan_full_page` | `bool` | `False` | If True, scroll through the entire page to load all content |
| `scroll_delay` | `float` | `0.2` | Delay in seconds between scroll steps if `scan_full_page` is True |
| `process_iframes` | `bool` | `False` | If True, attempts to process and inline iframe content |
| `remove_overlay_elements` | `bool` | `False` | If True, remove overlays/popups before extracting HTML |
| `simulate_user` | `bool` | `False` | If True, simulate user interactions for anti-bot measures |
| `override_navigator` | `bool` | `False` | If True, overrides navigator properties for more human-like behavior |
| `magic` | `bool` | `False` | If True, attempts automatic handling of overlays/popups |
| `adjust_viewport_to_content` | `bool` | `False` | If True, adjust viewport according to page content dimensions |

Media Handling Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `screenshot` | `bool` | `False` | Whether to take a screenshot after crawling |
| `screenshot_wait_for` | `float` | `None` | Additional wait time before taking a screenshot |
| `screenshot_height_threshold` | `int` | `20000` | Threshold for page height to decide screenshot strategy |
| `pdf` | `bool` | `False` | Whether to generate a PDF of the page |
| `image_description_min_word_threshold` | `int` | `50` | Minimum words for image description extraction |
| `image_score_threshold` | `int` | `3` | Minimum score threshold for processing an image |
| `exclude_external_images` | `bool` | `False` | If True, exclude all external images from processing |

Link and Domain Handling Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `exclude_social_media_domains` | `list[str]` | `SOCIAL_MEDIA_DOMAINS` | List of domains to exclude for social media links |
| `exclude_external_links` | `bool` | `False` | If True, exclude all external links from the results |
| `exclude_social_media_links` | `bool` | `False` | If True, exclude links pointing to social media domains |
| `exclude_domains` | `list[str]` | `[]` | List of specific domains to exclude from results |
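
The link filters above can be pictured with a small standard-library sketch. It is a simplified model rather than the library's code: `filter_links` is a hypothetical function, the `SOCIAL_MEDIA_DOMAINS` list here is a short illustrative stand-in for the real constant, and a link is treated as external whenever its host differs from the base URL's host.

```python
from urllib.parse import urlparse

# Illustrative stand-in; the library's actual list may differ.
SOCIAL_MEDIA_DOMAINS = ["facebook.com", "twitter.com", "linkedin.com"]


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_links(links, base_url,
                 exclude_external_links=False,
                 exclude_social_media_links=False,
                 exclude_social_media_domains=None,
                 exclude_domains=None):
    """Return the links that survive the configured exclusions, in order."""
    base_host = urlparse(base_url).hostname or ""
    social = SOCIAL_MEDIA_DOMAINS if exclude_social_media_domains is None \
        else exclude_social_media_domains
    blocked = exclude_domains or []

    kept = []
    for link in links:
        host = urlparse(link).hostname or ""
        if exclude_external_links and host != base_host:
            continue  # assumption: any different host counts as external
        if exclude_social_media_links and any(_host_matches(host, d) for d in social):
            continue
        if any(_host_matches(host, d) for d in blocked):
            continue
        kept.append(link)
    return kept
```

In this model `exclude_domains` always applies, while the social media list only takes effect when `exclude_social_media_links` is True.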

Debugging and Logging Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `verbose` | `bool` | `True` | Enable verbose logging |
| `log_console` | `bool` | `False` | If True, log console messages from the page |