# Crawl4AI

Welcome to the official documentation for Crawl4AI! πŸ•·οΈπŸ€–

Crawl4AI is an open-source Python library designed to simplify web crawling and extract useful information from web pages. This documentation will guide you through the features, usage, and customization of Crawl4AI.

## Introduction

Crawl4AI has one clear task: to make crawling and data extraction from web pages easy and efficient, especially for large language models (LLMs) and AI applications. Whether you use it as a REST API or a Python library, Crawl4AI offers a robust, flexible solution with full asynchronous support.

## Quick Start

Here's a quick example that shows how easy it is to use Crawl4AI with its asynchronous API:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Create an instance of AsyncWebCrawler
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Run the crawler on a URL
        result = await crawler.arun(url="https://www.nbcnews.com/business")

        # Print the extracted content
        print(result.markdown)

# Run the async main function
asyncio.run(main())
```

## Key Features ✨

- πŸ†“ Completely free and open-source
- πŸš€ Blazing-fast performance, outperforming many paid services
- πŸ€– LLM-friendly output formats (JSON, cleaned HTML, markdown)
- πŸ“„ Fit markdown generation for extracting main article content
- 🌐 Multi-browser support (Chromium, Firefox, WebKit)
- 🌍 Crawls multiple URLs simultaneously
- 🎨 Extracts and returns all media tags (images, audio, and video)
- πŸ”— Extracts all external and internal links
- πŸ“š Extracts metadata from the page
- πŸ”„ Custom hooks for authentication, headers, and page modifications
- πŸ•΅οΈ User-agent customization
- πŸ–ΌοΈ Takes screenshots of pages with enhanced error handling
- πŸ“œ Executes multiple custom JavaScript snippets before crawling
- πŸ“Š Generates structured output without an LLM using JsonCssExtractionStrategy
- πŸ“š Various chunking strategies: topic-based, regex, sentence, and more
- 🧠 Advanced extraction strategies: cosine clustering, LLM, and more
- 🎯 CSS selector support for precise data extraction
- πŸ“ Passes instructions/keywords to refine extraction
- πŸ”’ Proxy support with authentication for enhanced access
- πŸ”„ Session management for complex multi-page crawling
- 🌐 Asynchronous architecture for improved performance
- πŸ–ΌοΈ Improved image processing with lazy-loading detection
- πŸ•°οΈ Enhanced handling of delayed content loading
- πŸ”‘ Custom headers support for LLM interactions
- πŸ–ΌοΈ iframe content extraction for comprehensive analysis
- ⏱️ Flexible timeout and delayed-content retrieval options

## Documentation Structure

Our documentation is organized into several sections:

### Basic Usage

- [Installation](basic/installation.md)
- [Quick Start](basic/quickstart.md)
- [Simple Crawling](basic/simple-crawling.md)
- [Browser Configuration](basic/browser-config.md)
- [Content Selection](basic/content-selection.md)
- [Output Formats](basic/output-formats.md)
- [Page Interaction](basic/page-interaction.md)

### Advanced Features

- [Magic Mode](advanced/magic-mode.md)
- [Session Management](advanced/session-management.md)
- [Hooks & Authentication](advanced/hooks-auth.md)
- [Proxy & Security](advanced/proxy-security.md)
- [Content Processing](advanced/content-processing.md)

### Extraction & Processing

- [Extraction Strategies Overview](extraction/overview.md)
- [LLM Integration](extraction/llm.md)
- [CSS-Based Extraction](extraction/css.md)
- [Cosine Strategy](extraction/cosine.md)
- [Chunking Strategies](extraction/chunking.md)

### API Reference

- [AsyncWebCrawler](api/async-webcrawler.md)
- [CrawlResult](api/crawl-result.md)
- [Extraction Strategies](api/strategies.md)
- [arun() Method Parameters](api/arun.md)

### Examples

- Coming soon!

## Getting Started

1. Install Crawl4AI:

   ```bash
   pip install crawl4ai
   ```

2. Check out our [Quick Start Guide](basic/quickstart.md) to begin crawling web pages.

3. Explore our [examples](https://github.com/unclecode/crawl4ai/tree/main/docs/examples) to see Crawl4AI in action.

## Support

For questions, suggestions, or issues:

- GitHub Issues: [Report a Bug](https://github.com/unclecode/crawl4ai/issues)
- Twitter: [@unclecode](https://twitter.com/unclecode)
- Website: [crawl4ai.com](https://crawl4ai.com)

Happy Crawling! πŸ•ΈοΈπŸš€
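To give a flavor of structured extraction without an LLM, here is a minimal sketch using `JsonCssExtractionStrategy`, which maps a CSS-selector schema to JSON. The schema field names, the `article.tease-card` selector, and the target URL below are illustrative assumptions, not values taken from this documentation; adapt them to the page you crawl.

```python
import asyncio

# Illustrative schema: one record per element matching baseSelector,
# with each field filled from a child selector. The selectors here are
# assumptions for demonstration only.
schema = {
    "name": "News Teasers",
    "baseSelector": "article.tease-card",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

async def main():
    # Imports kept inside the coroutine so the schema above can be
    # inspected even where crawl4ai is not installed.
    from crawl4ai import AsyncWebCrawler
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",  # example target
            extraction_strategy=JsonCssExtractionStrategy(schema),
        )
        # extracted_content holds the structured result as a JSON string
        print(result.extracted_content)

# asyncio.run(main())  # uncomment to run the crawl
```

See [CSS-Based Extraction](extraction/css.md) for the full schema options.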