# Output Formats

Crawl4AI provides multiple output formats to suit different needs: raw HTML, cleaned HTML, several markdown variants, and structured data produced by LLM-based or pattern-based extraction.

## Basic Formats

```python
# Assumes `crawler` is an open AsyncWebCrawler and this runs inside an async function
result = await crawler.arun(url="https://example.com")

# Access different formats
raw_html = result.html                   # Original HTML
clean_html = result.cleaned_html         # Sanitized HTML
markdown_v2 = result.markdown_v2         # Detailed markdown generation results
fit_md = result.markdown_v2.fit_markdown # Most relevant content in markdown
```

> **Note**: The `markdown_v2` property will soon be replaced by `markdown`. Prefer `markdown` in new implementations.

## Raw HTML

Original, unmodified HTML from the webpage. Useful when you need to:

- Preserve the exact page structure.
- Process HTML with your own tools.
- Debug page issues.

```python
result = await crawler.arun(url="https://example.com")
print(result.html)  # Complete HTML including headers, scripts, etc.
```

## Cleaned HTML

Sanitized HTML with unnecessary elements removed. Automatically:

- Removes scripts and styles.
- Cleans up formatting.
- Preserves semantic structure.

```python
from crawl4ai import CrawlerRunConfig

config = CrawlerRunConfig(
    excluded_tags=['form', 'header', 'footer'],  # Additional tags to remove
    keep_data_attributes=False                   # Remove data-* attributes
)
result = await crawler.arun(url="https://example.com", config=config)
print(result.cleaned_html)
```

## Standard Markdown

HTML converted to clean markdown format. This output is useful for:

- Content analysis.
- Documentation.
- Readability.

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

config = CrawlerRunConfig(
    markdown_generator=DefaultMarkdownGenerator(
        options={"include_links": True}  # Include links in markdown
    )
)
result = await crawler.arun(url="https://example.com", config=config)
print(result.markdown_v2.raw_markdown)  # Standard markdown with links
```

## Fit Markdown

Extract and convert only the most relevant content into markdown format.
Best suited for:

- Article extraction.
- Focusing on the main content.
- Removing boilerplate.

To generate `fit_markdown`, use a content filter like `PruningContentFilter`:

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.content_filter_strategy import PruningContentFilter

config = CrawlerRunConfig(
    content_filter=PruningContentFilter(
        threshold=0.7,
        threshold_type="dynamic",
        min_word_threshold=100
    )
)
result = await crawler.arun(url="https://example.com", config=config)
print(result.markdown_v2.fit_markdown)  # Extracted main content in markdown
```

## Markdown with Citations

Generate markdown that includes citations for links. This format is ideal for:

- Creating structured documentation.
- Including references for extracted content.

```python
from crawl4ai import CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

config = CrawlerRunConfig(
    markdown_generator=DefaultMarkdownGenerator(
        options={"citations": True}  # Enable citations
    )
)
result = await crawler.arun(url="https://example.com", config=config)
print(result.markdown_v2.markdown_with_citations)  # Body with citation markers
print(result.markdown_v2.references_markdown)      # Citations section
```
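To make the idea of citation-style markdown more concrete, the following standard-library sketch shows one way inline links could be turned into numbered markers plus a separate references list. It is not Crawl4AI's implementation: `to_citations` and `LINK_RE` are hypothetical names, and the `[n]` marker style is an assumption, so the library's actual output may be formatted differently.

```python
import re

# Matches inline markdown links such as [text](https://example.com).
# The negative lookbehind skips image links of the form ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def to_citations(markdown: str) -> tuple[str, str]:
    """Return (body_with_numbered_markers, references_section).

    Hypothetical helper for illustration only; not part of Crawl4AI.
    """
    numbers: dict[str, int] = {}  # URL -> citation number, in first-seen order

    def replace(match: re.Match) -> str:
        text, url = match.group(1), match.group(2)
        # Reuse the existing number when the same URL appears again.
        number = numbers.setdefault(url, len(numbers) + 1)
        return f"{text} [{number}]"

    body = LINK_RE.sub(replace, markdown)
    references = "\n".join(f"[{n}] {url}" for url, n in numbers.items())
    return body, references


sample = "Read the [docs](https://example.com/docs) and the [blog](https://example.com/blog)."
body, references = to_citations(sample)
print(body)
print(references)
```

Keeping the link targets out of the running text makes the body easier to read or feed to a model, while the references list preserves every URL exactly once.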