🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
4913 matches across 20 categories.
| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | crawl4ai/antibot_detector.py | 22 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 25 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 69 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 73 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 100 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 103 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 114 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/antibot_detector.py | 116 | # --------------------------------------------------------------------------- |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 769 | # ────────────────────────────────────────────────────────────── |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 774 | # ────────────────────────────────────────────────────────────── |
| MEDIUM | crawl4ai/utils.py | 3204 | # ── build signature ─────────────────────────────────────────── |
| MEDIUM | crawl4ai/utils.py | 3210 | # ── first seen? keep – else drop ───────────── |
| MEDIUM | crawl4ai/browser_profiler.py | 554 | # 1. ── Start the browser ───────────────────────────────────────── |
| MEDIUM | crawl4ai/browser_profiler.py | 557 | # 2. ── Attach Playwright to that running Chrome ────────────────── |
| MEDIUM | crawl4ai/browser_profiler.py | 586 | # 3. ── Persist storage state *before* we kill Chrome ───────────── |
| MEDIUM | crawl4ai/browser_profiler.py | 594 | # 4. ── Close everything cleanly ────────────────────────────────── |
| MEDIUM | crawl4ai/async_crawler_strategy.py | 780 | # ────────────────────────────────────────────────────────────── |
| MEDIUM | crawl4ai/async_crawler_strategy.py | 785 | # ────────────────────────────────────────────────────────────── |
| MEDIUM | crawl4ai/async_url_seeder.py | 362 | # ─────────────────────────────── discovery entry |
| MEDIUM | crawl4ai/async_url_seeder.py | 828 | # ─────────────────────────────── CC |
| MEDIUM | crawl4ai/async_url_seeder.py | 63 | # ────────────────────────────────────────────────────────────────────────── consts |
| MEDIUM | crawl4ai/async_url_seeder.py | 78 | # ────────────────────────────────────────────────────────────────────────── helpers |
| MEDIUM | crawl4ai/async_url_seeder.py | 258 | # ────────────────────────────────────────────────────────────────────────── class |
| MEDIUM | crawl4ai/async_url_seeder.py | 322 | # ───────── cache dirs ───────── |
| MEDIUM | crawl4ai/async_url_seeder.py | 338 | # ───────── cache helpers ───────── |
| MEDIUM | crawl4ai/async_url_seeder.py | 884 | # ─────────────────────────────── Sitemaps |
| MEDIUM | crawl4ai/async_url_seeder.py | 1280 | # ─────────────────────────────── validate helpers |
| MEDIUM | crawl4ai/async_url_seeder.py | 1465 | # ─────────────────────────────── BM25 scoring helpers |
| MEDIUM | crawl4ai/async_url_seeder.py | 1749 | # ─────────────────────────────── cleanup methods |
| MEDIUM | crawl4ai/async_url_seeder.py | 1765 | # ─────────────────────────────── index helper |
| MEDIUM | crawl4ai/browser_manager.py | 547 | # ── 1. cookies ──────────────────────────────────────────────────────────── |
| MEDIUM | crawl4ai/browser_manager.py | 552 | # ── 2. localStorage / sessionStorage ────────────────────────────────────── |
| MEDIUM | crawl4ai/browser_manager.py | 565 | # ── 3. runtime-mutable extras from configs ──────────────────────────────── |
| MEDIUM | crawl4ai/browser_manager.py | 811 | # ── Persistent context via Playwright's native API ────────────── |
| MEDIUM | crawl4ai/legacy/llmtxt.py | 308 | # ----------------------------------------------------- |
| MEDIUM | deploy/docker/server.py | 70 | # ── internal imports (after sys.path append) ───────────────── |
| MEDIUM | deploy/docker/server.py | 73 | # ────────────────── configuration / logging ────────────────── |
| MEDIUM | deploy/docker/server.py | 79 | # ── global page semaphore (hard cap) ───────────────────────── |
| MEDIUM | deploy/docker/server.py | 83 | # ── security feature flags ─────────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 87 | # ── default browser config helper ───────────────────────────── |
| MEDIUM | deploy/docker/server.py | 166 | # ───────────────────── FastAPI instance ────────────────────── |
| MEDIUM | deploy/docker/server.py | 173 | # ── static playground ────────────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 183 | # ── static monitor dashboard ──────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 193 | # ── static assets (logo, etc) ──────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 306 | # ── job router ────────────────────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 309 | # ── monitor router ────────────────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 315 | # ──────────────────────── Endpoints ────────────────────────── |
| MEDIUM | deploy/docker/server.py | 1 | # ───────────────────────── server.py ───────────────────────── |
| MEDIUM | deploy/docker/server.py | 9 | # ── stdlib & 3rd‑party imports ─────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 115 | # ───────────────────── FastAPI lifespan ────────────────────── |
| MEDIUM | deploy/docker/server.py | 207 | # ─────────────────── infra / middleware ───────────────────── |
| MEDIUM | deploy/docker/server.py | 255 | # ───────────────── URL validation helper ───────────────── |
| MEDIUM | deploy/docker/server.py | 268 | # ───────────────── safe config‑dump helper ───────────────── |
| MEDIUM | deploy/docker/server.py | 875 | # ────────────────────────── cli ────────────────────────────── |
| MEDIUM | deploy/docker/server.py | 885 | # ───────────────────────────────────────────────────────────── |
| MEDIUM | deploy/docker/mcp_bridge.py | 19 | # ── opt‑in decorators ─────────────────────────────────────────── |
| MEDIUM | deploy/docker/mcp_bridge.py | 38 | # ── HTTP‑proxy helper for FastAPI endpoints ───────────────────── |
| MEDIUM | deploy/docker/mcp_bridge.py | 67 | # ── main entry point ──────────────────────────────────────────── |
| MEDIUM | deploy/docker/mcp_bridge.py | 179 | # ── WebSocket transport ──────────────────────────────────── |
| MEDIUM | deploy/docker/mcp_bridge.py | 238 | # ── schema endpoint ─────────────────────────────────────── |

658 more matches not shown…

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | PROGRESSIVE_CRAWLING.md | 174 | def generate_synthetic_dataset(domain_url): |
| LOW | test_webhook_implementation.py | 43 | def test_webhook_service_init(): |
| LOW | test_webhook_implementation.py | 93 | def test_webhook_config_model(): |
| LOW | test_webhook_implementation.py | 145 | def test_payload_construction(): |
| LOW | test_llm_webhook_feature.py | 15 | def test_llm_job_payload_model(): |
| LOW | test_llm_webhook_feature.py | 65 | def test_handle_llm_request_signature(): |
| LOW | test_llm_webhook_feature.py | 101 | def test_process_llm_extraction_signature(): |
| LOW | test_llm_webhook_feature.py | 136 | def test_webhook_integration_in_api(): |
| LOW | test_llm_webhook_feature.py | 187 | def test_job_endpoint_integration(): |
| LOW | test_llm_webhook_feature.py | 239 | def test_create_new_task_integration(): |
| LOW | crawl4ai/antibot_detector.py | 138 | def _structural_integrity_check(html: str) -> Tuple[bool, str]: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 611 | async def handle_request_failed_capture(request): |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1542 | async def _capture_console_messages( |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1791 | async def robust_execute_user_script( |
| LOW | crawl4ai/adaptive_crawler.py | 259 | def _embedding_llm_config_dict(self) -> Optional[Dict]: |
| LOW | crawl4ai/adaptive_crawler.py | 632 | def _get_embedding_llm_config_dict(self) -> Optional[Dict]: |
| LOW | crawl4ai/adaptive_crawler.py | 642 | def _get_query_llm_config_dict(self) -> Optional[Dict]: |
| LOW | crawl4ai/adaptive_crawler.py | 708 | def _get_cached_distance_matrix(self, query_embeddings: Any, kb_embeddings: Any) -> Any: |
| LOW | crawl4ai/adaptive_crawler.py | 871 | async def select_links_for_expansion( |
| LOW | crawl4ai/adaptive_crawler.py | 1808 | def _crawl_result_to_export_dict(self, result) -> Dict[str, Any]: |
| LOW | crawl4ai/adaptive_crawler.py | 1880 | def _import_dict_to_crawl_result(self, data: Dict[str, Any]): |
| LOW | crawl4ai/extraction_strategy.py | 2244 | def _make_context_sensitive_xpath(self, xpath, element): |
| LOW | crawl4ai/extraction_strategy.py | 2296 | def _fallback_class_id_search(self, element, selector_str): |
| LOW | crawl4ai/extraction_strategy.py | 280 | def filter_documents_embeddings( |
| LOW | crawl4ai/extraction_strategy.py | 416 | def filter_clusters_by_word_count( |
| LOW | crawl4ai/extraction_strategy.py | 2165 | def _create_selector_function(self, selector_str): |
| LOW | crawl4ai/extraction_strategy.py | 2267 | def _handle_nth_child_selector(self, element, selector_str): |
| LOW | crawl4ai/browser_adapter.py | 43 | async def retrieve_console_messages(self, page: Page) -> List[Dict]: |
| LOW | crawl4ai/browser_adapter.py | 133 | async def retrieve_console_messages(self, page: Page) -> List[Dict]: |
| LOW | crawl4ai/browser_adapter.py | 158 | def _check_stealth_availability(self) -> bool: |
| LOW | crawl4ai/browser_adapter.py | 261 | async def retrieve_console_messages(self, page: Page) -> List[Dict]: |
| LOW | crawl4ai/browser_adapter.py | 380 | async def retrieve_console_messages(self, page: UndetectedPage) -> List[Dict]: |
| LOW | crawl4ai/__init__.py | 206 | # def is_sync_version_installed(): |
| LOW | crawl4ai/markdown_generation_strategy.py | 82 | def convert_links_to_citations( |
| LOW | crawl4ai/adaptive_crawler copy.py | 643 | def _get_cached_distance_matrix(self, query_embeddings: Any, kb_embeddings: Any) -> Any: |
| LOW | crawl4ai/adaptive_crawler copy.py | 799 | async def select_links_for_expansion( |
| LOW | crawl4ai/adaptive_crawler copy.py | 1732 | def _crawl_result_to_export_dict(self, result) -> Dict[str, Any]: |
| LOW | crawl4ai/adaptive_crawler copy.py | 1804 | def _import_dict_to_crawl_result(self, data: Dict[str, Any]): |
| LOW | crawl4ai/content_scraping_strategy.py | 380 | def find_closest_parent_with_useful_text( |
| LOW | crawl4ai/content_scraping_strategy.py | 517 | def remove_empty_elements_fast(self, root, word_count_threshold=5): |
| LOW | crawl4ai/content_scraping_strategy.py | 569 | def remove_unwanted_attributes_fast( |
| LOW | crawl4ai/user_agent_generator.py | 344 | def generate_with_client_hints(self, **kwargs) -> Tuple[str, str]: |
| LOW | crawl4ai/cli.py | 467 | def delete_profile_interactive(profiler: BrowserProfiler): |
| LOW | crawl4ai/cli.py | 444 | async def create_profile_interactive(profiler: BrowserProfiler): |
| LOW | crawl4ai/utils.py | 1143 | def get_content_of_website_optimized( |
| LOW | crawl4ai/utils.py | 1195 | def find_closest_parent_with_useful_text(tag): |
| LOW | crawl4ai/utils.py | 3267 | def start_colab_display_server(): |
| LOW | crawl4ai/utils.py | 534 | def calculate_semaphore_count(): |
| LOW | crawl4ai/utils.py | 707 | def split_and_parse_json_objects(json_string): |
| LOW | crawl4ai/utils.py | 981 | def replace_pre_tags_with_text(node): |
| LOW | crawl4ai/utils.py | 1020 | def remove_empty_and_low_word_count_elements(node, word_count_threshold): |
| LOW | crawl4ai/utils.py | 1242 | def score_image_for_usefulness(img, base_url, index, images_count): |
| LOW | crawl4ai/utils.py | 1497 | def extract_metadata_using_lxml(html, doc=None): |
| LOW | crawl4ai/utils.py | 1742 | def perform_completion_with_backoff( |
| LOW | crawl4ai/utils.py | 1834 | async def aperform_completion_with_backoff( |
| LOW | crawl4ai/utils.py | 2040 | def merge_chunks_based_on_token_threshold(chunks, token_threshold): |
| LOW | crawl4ai/utils.py | 2336 | def normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None): |
| LOW | crawl4ai/utils.py | 2395 | def efficient_normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None): |
| LOW | crawl4ai/utils.py | 2966 | def configure_windows_event_loop(): |
| LOW | crawl4ai/utils.py | 3122 | def preprocess_html_for_schema(html_content, text_threshold=100, attr_value_threshold=200, max_size=100000): |

1330 more matches not shown…

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | CHANGELOG.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); } |
| HIGH | deploy/docker/c4ai-doc-context.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); } |
| HIGH | docs/md_v2/advanced/file-downloading.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); } |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const |
| HIGH | crawl4ai/async_crawler_strategy.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const |
| HIGH | crawl4ai/async_crawler_strategy.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const |
| HIGH | crawl4ai/proxy_strategy.py | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona |
| HIGH | crawl4ai/async_configs.py | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona |
| HIGH | crawl4ai/proxy_strategy.py | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str |
| HIGH | crawl4ai/async_configs.py | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str |
| HIGH | crawl4ai/proxy_strategy.py | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd |
| HIGH | crawl4ai/async_configs.py | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd |
| HIGH | crawl4ai/models.py | 0 | deprecated property that raises an attributeerror when accessed. |
| HIGH | crawl4ai/models.py | 0 | deprecated property that raises an attributeerror when accessed. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | deprecated property that raises an attributeerror when accessed. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | deprecated property that raises an attributeerror when accessed. |
| HIGH | crawl4ai/async_crawler_strategy.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig |
| HIGH | crawl4ai/browser_manager.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig |
| HIGH | tests/browser/test_init_script_dedup.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig |
| HIGH | crawl4ai/async_configs.py | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects. |
| HIGH | tests/docker/test_serialization.py | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects. |
| HIGH | crawl4ai/async_configs.py | 0 | recursively convert a serializable dictionary back to an object instance. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | recursively convert a serializable dictionary back to an object instance. |
| HIGH | tests/docker/test_serialization.py | 0 | recursively convert a serializable dictionary back to an object instance. |
| HIGH | crawl4ai/deep_crawling/bfs_strategy.py | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results. |
| HIGH | crawl4ai/deep_crawling/base_strategy.py | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results. |
| HIGH | crawl4ai/deep_crawling/bfs_strategy.py | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive. |
| HIGH | crawl4ai/deep_crawling/base_strategy.py | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive. |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | deploy/docker/c4ai-doc-context.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | docs/md_v2/complete-sdk-reference.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | docs/md_v2/core/quickstart.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | docs/examples/quickstart_examples_set_2.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | docs/examples/llm_extraction_openai_pricing.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | docs/examples/quickstart.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | deploy/docker/c4ai-doc-context.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | docs/md_v2/complete-sdk-reference.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | docs/md_v2/core/quickstart.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | docs/examples/quickstart_examples_set_2.py | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | docs/examples/quickstart.py | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click(); |
| HIGH | tests/async/test_edge_cases.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click(); |
| HIGH | docs/examples/quickstart_examples_set_2.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click(); |
| HIGH | docs/examples/quickstart.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click(); |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu |
| HIGH | docs/examples/quickstart_examples_set_2.py | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu |
| HIGH | docs/examples/quickstart.py | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu |
| HIGH | deploy/docker/c4ai-code-context.md | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial. |
| HIGH | docs/examples/deepcrawl_example.py | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial. |
| HIGH | docs/examples/docker_config_obj.py | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial. |

201 more matches not shown…

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | setup.py | 40 | except Exception: |
| LOW | test_webhook_implementation.py | 29 | except Exception as e: |
| LOW | test_webhook_implementation.py | 37 | except Exception as e: |
| LOW | test_webhook_implementation.py | 87 | except Exception as e: |
| LOW | test_webhook_implementation.py | 139 | except Exception as e: |
| LOW | test_webhook_implementation.py | 200 | except Exception as e: |
| LOW | test_webhook_implementation.py | 229 | except Exception as e: |
| LOW | test_webhook_implementation.py | 264 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 59 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 95 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 130 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 181 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 233 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 279 | except Exception as e: |
| LOW | test_llm_webhook_feature.py | 346 | except Exception as e: |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 551 | def handle_request_capture(request): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 581 | def handle_response_capture(response): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 611 | def handle_request_failed_capture(request): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 632 | def handle_console_capture(msg): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 665 | def handle_pageerror_capture(err): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 1555 | def handle_console_message(msg): |
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 2267 | def _session_context(self): |
| LOW | crawl4ai/async_crawler_strategy.back.py | 930 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 990 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1002 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1108 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1201 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2132 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2180 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2417 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2443 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 606 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 621 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1532 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 327 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 387 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 563 | except Exception: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 576 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 587 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 655 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 685 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1374 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1424 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1458 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1566 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1619 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1714 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1746 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1922 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1933 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2026 | except Exception as e: |
| LOW | crawl4ai/async_crawler_strategy.back.py | 2034 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 82 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 110 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 164 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 185 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 217 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 383 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 425 | except Exception as e: |
| LOW | crawl4ai/async_database.py | 470 | except Exception as e: |

979 more matches not shown…

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | crawl4ai/async_crawler_strategy.back.py | 1519 | # Create a new CDP session |
| MEDIUM | crawl4ai/async_database.py | 677 | # Create a singleton instance |
| MEDIUM | crawl4ai/ssl_certificate.py | 90 | # Create the dictionary directly |
| MEDIUM | crawl4ai/adaptive_crawler.py | 133 | # Create a mock object that has the minimal interface we need |
| MEDIUM | crawl4ai/link_preview.py | 212 | # Create a wrapper to track progress |
| MEDIUM | crawl4ai/link_preview.py | 241 | # Create a custom progress tracking version |
| MEDIUM | crawl4ai/extraction_strategy.py | 2181 | # Create the wrapper function that implements the selection strategy |
| MEDIUM | crawl4ai/extraction_strategy.py | 2187 | # Create a cache key based on element and selector |
| MEDIUM | crawl4ai/extraction_strategy.py | 2412 | # Create a function that will apply this selector appropriately |
| MEDIUM | crawl4ai/adaptive_crawler copy.py | 133 | # Create a mock object that has the minimal interface we need |
| MEDIUM | crawl4ai/content_scraping_strategy.py | 851 | # Create a config object for LinkPreview |
| MEDIUM | crawl4ai/cli.py | 456 | # Create the profile |
| MEDIUM | crawl4ai/cli.py | 361 | # Create a profile and use it for crawling |
| MEDIUM | crawl4ai/utils.py | 516 | # Create the box with colored borders and lighter text |
| MEDIUM | crawl4ai/utils.py | 980 | # Create a function that replace content of all"pre" tag with its inner text |
| MEDIUM | crawl4ai/utils.py | 3224 | # # Create a signature based on tag and classes |
| MEDIUM | crawl4ai/browser_profiler.py | 158 | # Create a logger if not provided |
| MEDIUM | crawl4ai/browser_profiler.py | 995 | # Create a temporary profile directory |
| MEDIUM | crawl4ai/browser_profiler.py | 1200 | # Create a user data directory for the builtin browser |
| MEDIUM | crawl4ai/browser_profiler.py | 1382 | # Create a new profile |
| MEDIUM | crawl4ai/browser_profiler.py | 408 | # Create a profile interactively |
| MEDIUM | crawl4ai/browser_profiler.py | 829 | # Define a custom crawl function |
| MEDIUM | crawl4ai/async_crawler_strategy.py | 1639 | # Create a new CDP session |
| MEDIUM | crawl4ai/model_loader.py | 226 | # Create the models directory if it doesn't exist |
| MEDIUM | crawl4ai/async_configs.py | 835 | # Create a funciton returns dict of the object |
| MEDIUM | crawl4ai/async_configs.py | 1886 | # Create a funciton returns dict of the object |
| MEDIUM | crawl4ai/async_configs.py | 2010 | # Create a new config with streaming enabled |
| MEDIUM | crawl4ai/async_configs.py | 2013 | # Create a new config with multiple updates |
| MEDIUM | crawl4ai/async_url_seeder.py | 1240 | # Create a bounded queue for results to prevent RAM issues |
| MEDIUM | crawl4ai/browser_manager.py | 1690 | # Create a new page from the chosen context |
| MEDIUM | crawl4ai/browser_manager.py | 482 | # Create a BrowserProfiler instance and delegate to it |
| MEDIUM | crawl4ai/browser_manager.py | 505 | # Create a BrowserProfiler instance and delegate to it |
| MEDIUM | crawl4ai/browser_manager.py | 528 | # Create a BrowserProfiler instance and delegate to it |
| MEDIUM | crawl4ai/chunking_strategy.py | 7 | # Define the abstract base class for chunking strategies |
| MEDIUM | crawl4ai/chunking_strategy.py | 27 | # Create an identity chunking strategy f(x) = [x] |
| MEDIUM | crawl4ai/async_webcrawler.py | 262 | # Initialize processing variables |
| MEDIUM | crawl4ai/async_webcrawler.py | 803 | # Define the source selection logic using dict dispatch |
| MEDIUM | crawl4ai/deep_crawling/bfs_strategy.py | 51 | # Create a new logger if logger is None, dict, or any other non-Logger type |
| MEDIUM | crawl4ai/deep_crawling/bff_strategy.py | 62 | # Create a new logger if logger is None, dict, or any other non-Logger type |
| MEDIUM | crawl4ai/js_snippet/__init__.py | 4 | # Create a function get name of a js script, then load from the CURRENT folder of this script and return its content as |
| MEDIUM | crawl4ai/components/crawler_monitor.py | 167 | # Create the status text |
| MEDIUM | crawl4ai/components/crawler_monitor.py | 180 | # Create a table for status counts |
| MEDIUM | crawl4ai/components/crawler_monitor.py | 261 | # Create a table for task details |
| MEDIUM | crawl4ai/components/crawler_monitor.py | 374 | # Create a more visible footer panel |
| MEDIUM | deploy/docker/hook_manager.py | 119 | # Create a safe namespace for the hook |
| MEDIUM | tests/test_pyopenssl_security_fix.py | 75 | # Create a basic SSL context to verify functionality |
| MEDIUM | tests/test_raw_html_redirected_url.py | 16 | # Create a dummy decorator |
| MEDIUM | tests/test_raw_html_redirected_url.py | 51 | # Create a large HTML (100KB+) |
| MEDIUM | tests/test_raw_html_edge_cases.py | 259 | # Create a temp file |
| MEDIUM | tests/docker/test_config_object.py | 54 | # Create the config |
| MEDIUM | tests/memory/test_dispatcher_stress.py | 36 | # Create a memory restrictor to simulate limited memory environment |
| MEDIUM | tests/memory/test_dispatcher_stress.py | 330 | # Create a nightmare scenario - multiple overlapping spikes |
| MEDIUM | tests/memory/benchmark_report.py | 230 | # Create the plot |
| MEDIUM | tests/memory/benchmark_report.py | 307 | # Create the plot |
| MEDIUM | tests/memory/benchmark_report.py | 871 | # Create the benchmark reporter |
| MEDIUM | tests/proxy/test_proxy_config.py | 477 | # Create a large list of proxy strings |
| MEDIUM | tests/general/test_mhtml.py | 27 | # Create a fresh browser config and crawler instance for this test |
| MEDIUM | tests/general/test_mhtml.py | 32 | # Create a fresh crawler instance |
| MEDIUM | tests/general/test_mhtml.py | 89 | # Create a fresh browser config and crawler instance for this test |
| MEDIUM | tests/general/test_mhtml.py | 94 | # Create a fresh crawler instance |
| 95 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | test_webhook_implementation.py | 34 | |
| LOW | test_llm_webhook_feature.py | 23 | |
| LOW | test_llm_webhook_feature.py | 24 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1 | |
| LOW | crawl4ai/async_database.py | 15 | |
| LOW | crawl4ai/adaptive_crawler.py | 13 | |
| LOW | crawl4ai/adaptive_crawler.py | 14 | |
| LOW | crawl4ai/adaptive_crawler.py | 17 | |
| LOW | crawl4ai/adaptive_crawler.py | 878 | |
| LOW | crawl4ai/link_preview.py | 8 | |
| LOW | crawl4ai/extraction_strategy.py | 19 | |
| LOW | crawl4ai/extraction_strategy.py | 30 | |
| LOW | crawl4ai/extraction_strategy.py | 34 | |
| LOW | crawl4ai/browser_adapter.py | 10 | |
| LOW | crawl4ai/__init__.py | 4 | |
| LOW | crawl4ai/__init__.py | 4 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 6 | |
| LOW | crawl4ai/__init__.py | 8 | |
| LOW | crawl4ai/__init__.py | 8 | |
| LOW | crawl4ai/__init__.py | 8 | |
| LOW | crawl4ai/__init__.py | 13 | |
| LOW | crawl4ai/__init__.py | 14 | |
| LOW | crawl4ai/__init__.py | 14 | |
| LOW | crawl4ai/__init__.py | 18 | |
| LOW | crawl4ai/__init__.py | 18 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 22 | |
| LOW | crawl4ai/__init__.py | 31 | |
| LOW | crawl4ai/__init__.py | 31 | |
| LOW | crawl4ai/__init__.py | 32 | |
| LOW | crawl4ai/__init__.py | 33 | |
| LOW | crawl4ai/__init__.py | 33 | |
| LOW | crawl4ai/__init__.py | 33 | |
| LOW | crawl4ai/__init__.py | 33 | |
| LOW | crawl4ai/__init__.py | 39 | |
| LOW | crawl4ai/__init__.py | 39 | |
| LOW | crawl4ai/__init__.py | 39 | |
| LOW | crawl4ai/__init__.py | 39 | |
| LOW | crawl4ai/__init__.py | 45 | |
| LOW | crawl4ai/__init__.py | 45 | |
| LOW | crawl4ai/__init__.py | 45 | |
| LOW | crawl4ai/__init__.py | 46 | |
| LOW | crawl4ai/__init__.py | 47 | |
| LOW | crawl4ai/__init__.py | 48 | |
| LOW | crawl4ai/__init__.py | 48 | |
| LOW | crawl4ai/__init__.py | 48 | |
| 499 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1836 | # return {{ success: false, error: err.toString(), stack: err.stack }}; |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1288 | htmlChunks.push(previousHTML); |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1300 | htmlChunks.push(currentHTML); |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1329 | uniqueElements.push(element.outerHTML); |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1450 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1854 | return {{ success: false, error: err.toString(), stack: err.stack }}; |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1985 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 1996 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 2117 | error: e.toString() |
| HIGH | crawl4ai/browser_adapter.py | 314 | window.__capturedConsole.push({ |
| HIGH | crawl4ai/browser_adapter.py | 348 | window.__capturedErrors.push({ |
| HIGH | crawl4ai/browser_adapter.py | 365 | window.__capturedErrors.push({ |
| HIGH | crawl4ai/browser_adapter.py | 368 | stack: event.reason && event.reason.stack ? event.reason.stack : '', |
| HIGH | crawl4ai/prompts.py | 1357 | if (card && card.shadowRoot) { |
| HIGH | crawl4ai/prompts.py | 1366 | if (card && card.shadowRoot) { |
| HIGH | crawl4ai/async_crawler_strategy.py | 2059 | # return {{ success: false, error: err.toString(), stack: err.stack }}; |
| HIGH | crawl4ai/async_crawler_strategy.py | 1363 | htmlChunks.push(previousHTML); |
| HIGH | crawl4ai/async_crawler_strategy.py | 1375 | htmlChunks.push(currentHTML); |
| HIGH | crawl4ai/async_crawler_strategy.py | 1404 | uniqueElements.push(element.outerHTML); |
| HIGH | crawl4ai/async_crawler_strategy.py | 1526 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.py | 1570 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.py | 1866 | if (rect.width > 0 && rect.height > 0) { |
| HIGH | crawl4ai/async_crawler_strategy.py | 2077 | return {{ success: false, error: err.toString(), stack: err.stack }}; |
| HIGH | crawl4ai/async_crawler_strategy.py | 2208 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.py | 2219 | error: error.toString(), |
| HIGH | crawl4ai/async_crawler_strategy.py | 2340 | error: e.toString() |
| HIGH | tests/test_cloud_bugs_batch.py | 96 | httpbin_anything = '<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"args": {}, "da |
| HIGH | tests/test_main.py | 73 | "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten |
| HIGH | tests/test_virtual_scroll.py | 44 | allData.push({ |
| HIGH | tests/test_virtual_scroll.py | 57 | items.push(`<div class="item" data-index="${item.id}">${item.text}</div>`); |
| HIGH | tests/test_docker.py | 91 | "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in |
| HIGH | tests/releases/test_release_0.6.4.py | 89 | function gtag(){dataLayer.push(arguments);} |
| HIGH | tests/releases/test_release_0.6.4.py | 125 | function gtag(){dataLayer.push(arguments);} |
| HIGH | tests/docker/test_hooks_comprehensive.py | 465 | return element ? element.getAttribute('content') : null; |
| HIGH | tests/docker/test_server_requests.py | 145 | # It might be null, missing, or populated depending on the server's default behavior |
| HIGH | tests/general/test_async_crawler_strategy.py | 283 | # results.push(e.name); |
| HIGH | tests/general/test_async_crawler_strategy.py | 288 | # results.push(e.name); |
| HIGH | tests/async/test_parameters_and_options.py | 52 | "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in |
| HIGH | docs/examples/quickstart_examples_set_2.py | 94 | js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten |
| HIGH | docs/examples/quickstart_examples_set_2.py | 361 | return commits.length > 0 ? commits[0].textContent.trim() : null; |
| HIGH | docs/examples/quickstart_examples_set_2.py | 371 | if (newCommit && newCommit !== initialCommit) { |
| HIGH | docs/examples/stealth_mode_example.py | 112 | console.log('DETECTION_RESULTS:', JSON.stringify(detectionResults, null, 2)); |
| HIGH | docs/examples/rest_call.py | 32 | loadMoreButton && loadMoreButton.click(); |
| HIGH | docs/examples/crawlai_vs_firecrawl.py | 53 | "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten |
| HIGH | docs/examples/docker_client_hooks_example.py | 234 | return el ? el.getAttribute('content') : null; |
| HIGH | docs/examples/quickstart.py | 94 | js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten |
| HIGH | docs/examples/quickstart.py | 361 | return commits.length > 0 ? commits[0].textContent.trim() : null; |
| HIGH | docs/examples/quickstart.py | 371 | if (newCommit && newCommit !== initialCommit) { |
| HIGH | …solver/capsolver_api_integration/solve_recaptcha_v3.py | 47 | args[0] = url.toString(); |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | docs/md_v2/ask_ai/ask-ai.js | 72 | print(result.markdown[:300]) # Print first 300 chars |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 155 | print(result.markdown)`; |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 172 | print(result.status_code)`; |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 191 | print(result.extracted_content)`; |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 240 | print(f"Found {len(products)} products") |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 243 | print(f"- {product['title']}: {product['price']}") |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2418 | print("✅ Automation completed successfully!") |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2419 | print(f"Final URL: {result.url}") |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2423 | print("❌ Automation failed:", result.error_message) |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2452 | print(f"💾 C4A Script saved to: {script_path}") |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2453 | print("\\n📜 Generated C4A Script:") |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2454 | print(C4A_SCRIPT) |
| HIGH | …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js | 2464 | print("\\n💡 To execute this C4A script, compile it to JavaScript first!") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1737 | print(f"\\n✅ Successfully extracted {len(data)} items!") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1744 | print("\\n📊 Sample results (first 2 items):") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1746 | print(f"\\nItem {i}:") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1748 | print(f" {key}: {value}") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1752 | print("❌ Extraction failed:", result.error_message) |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1753 | return None |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1759 | print("\\n🎯 Next steps:") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1760 | print("1. Install Crawl4AI: pip install crawl4ai") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1761 | print("2. Modify the URL or add multiple URLs") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1762 | print("3. Customize crawler options as needed") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1763 | print("4. Check 'extracted_data.json' for full results") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1817 | print("✅ Schema generated successfully!") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1818 | print(f"📄 Schema saved to: {schema_path}") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1819 | print("\\nGenerated schema:") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1820 | print(json.dumps(schema, indent=2)) |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1825 | print(f"❌ Error generating schema: {e}") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1826 | return None |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1830 | print("\\n🧪 Testing extraction on live webpage...") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1837 | print("❌ Schema file not found. Run generate_schema() first.") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1859 | print(f"\\n✅ Successfully extracted {len(data)} items!") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1866 | print("\\n📊 Sample results (first 2 items):") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1868 | print(f"\\nItem {i}:") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1870 | print(f" {key}: {value}") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1872 | print("❌ Extraction failed:", result.error_message) |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1882 | print("\\n🎯 Next steps:") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1883 | print("1. Review the generated schema in 'generated_schema.json'") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1884 | print("2. Uncomment the test_extraction() line to test on the live site") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1885 | print("3. Use the schema in your Crawl4AI projects!") |
| HIGH | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 1803 | print("🔧 Generating extraction schema...") |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/async_crawler_strategy.back.py | 223 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 421 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 496 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1635 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1791 | |
| LOW | crawl4ai/async_crawler_strategy.back.py | 551 | |
| LOW | crawl4ai/async_database.py | 102 | |
| LOW | crawl4ai/async_database.py | 478 | |
| LOW | crawl4ai/ssl_certificate.py | 62 | |
| LOW | crawl4ai/adaptive_crawler.py | 871 | |
| LOW | crawl4ai/adaptive_crawler.py | 1330 | |
| LOW | crawl4ai/adaptive_crawler.py | 1570 | |
| LOW | crawl4ai/adaptive_crawler.py | 1845 | |
| LOW | crawl4ai/extraction_strategy.py | 642 | |
| LOW | crawl4ai/extraction_strategy.py | 787 | |
| LOW | crawl4ai/extraction_strategy.py | 844 | |
| LOW | crawl4ai/extraction_strategy.py | 1150 | |
| LOW | crawl4ai/extraction_strategy.py | 1240 | |
| LOW | crawl4ai/extraction_strategy.py | 1820 | |
| LOW | crawl4ai/extraction_strategy.py | 2165 | |
| LOW | crawl4ai/extraction_strategy.py | 2404 | |
| LOW | crawl4ai/extraction_strategy.py | 2182 | |
| LOW | crawl4ai/extraction_strategy.py | 2413 | |
| LOW | crawl4ai/browser_adapter.py | 173 | |
| LOW | crawl4ai/cache_validator.py | 83 | |
| LOW | crawl4ai/markdown_generation_strategy.py | 148 | |
| LOW | crawl4ai/adaptive_crawler copy.py | 799 | |
| LOW | crawl4ai/adaptive_crawler copy.py | 1252 | |
| LOW | crawl4ai/adaptive_crawler copy.py | 1494 | |
| LOW | crawl4ai/adaptive_crawler copy.py | 1769 | |
| LOW | crawl4ai/hub.py | 41 | |
| LOW | crawl4ai/content_scraping_strategy.py | 231 | |
| LOW | crawl4ai/content_scraping_strategy.py | 410 | |
| LOW | crawl4ai/content_scraping_strategy.py | 569 | |
| LOW | crawl4ai/content_scraping_strategy.py | 607 | |
| LOW | crawl4ai/user_agent_generator.py | 261 | |
| LOW | crawl4ai/user_agent_generator.py | 299 | |
| LOW | crawl4ai/async_dispatcher.py | 175 | |
| LOW | crawl4ai/async_dispatcher.py | 228 | |
| LOW | crawl4ai/async_dispatcher.py | 374 | |
| LOW | crawl4ai/async_dispatcher.py | 471 | |
| LOW | crawl4ai/async_dispatcher.py | 530 | |
| LOW | crawl4ai/async_dispatcher.py | 635 | |
| LOW | crawl4ai/cli.py | 110 | |
| LOW | crawl4ai/cli.py | 501 | |
| LOW | crawl4ai/cli.py | 580 | |
| LOW | crawl4ai/cli.py | 1032 | |
| LOW | crawl4ai/utils.py | 76 | |
| LOW | crawl4ai/utils.py | 419 | |
| LOW | crawl4ai/utils.py | 555 | |
| LOW | crawl4ai/utils.py | 707 | |
| LOW | crawl4ai/utils.py | 889 | |
| LOW | crawl4ai/utils.py | 1143 | |
| LOW | crawl4ai/utils.py | 2169 | |
| LOW | crawl4ai/utils.py | 3122 | |
| LOW | crawl4ai/utils.py | 3382 | |
| LOW | crawl4ai/utils.py | 3658 | |
| LOW | crawl4ai/utils.py | 1335 | |
| LOW | crawl4ai/browser_profiler.py | 83 | |
| LOW | crawl4ai/browser_profiler.py | 196 | |
| 198 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | test_webhook_implementation.py | 240 | # Check if api.py can import webhook module |
| LOW | crawl4ai/async_database.py | 49 | # Check if version update is needed |
| LOW | crawl4ai/ssl_certificate.py | 74 | # Set check_hostname to False and verify_mode to CERT_NONE temporarily |
| LOW | crawl4ai/proxy_strategy.py | 244 | # Check if session exists and hasn't expired |
| LOW | crawl4ai/adaptive_crawler.py | 538 | # Check if we have any links left |
| LOW | crawl4ai/adaptive_crawler.py | 715 | # Check if KB has changed |
| LOW | crawl4ai/adaptive_crawler.py | 1159 | # Check if confidence is below minimum threshold (completely irrelevant) |
| LOW | crawl4ai/extraction_strategy.py | 2278 | # Check if there's content after the nth-child part |
| LOW | crawl4ai/adaptive_crawler copy.py | 511 | # Check if we have any links left |
| LOW | crawl4ai/adaptive_crawler copy.py | 650 | # Check if KB has changed |
| LOW | crawl4ai/content_scraping_strategy.py | 866 | # Check if we're already in an async context |
| LOW | crawl4ai/async_dispatcher.py | 288 | # Check if we're in critical memory state |
| LOW | crawl4ai/cli.py | 1132 | # Set output to JSON if not explicitly specified |
| LOW | crawl4ai/cli.py | 1141 | # Check if type does not exist show proper message |
| LOW | crawl4ai/cli.py | 1400 | # Check if the value should be one of the allowed options |
| LOW | crawl4ai/cli.py | 1550 | # Display results |
| LOW | crawl4ai/utils.py | 1202 | # Check if the text content has at least word_count_threshold |
| LOW | crawl4ai/utils.py | 1208 | # Check if an image has valid display and inside undesired html elements |
| LOW | crawl4ai/utils.py | 3272 | # Check if running in Google Colab |
| LOW | crawl4ai/utils.py | 290 | # Check if cache is still fresh based on TTL |
| LOW | crawl4ai/utils.py | 297 | # Check if content actually changed |
| LOW | crawl4ai/utils.py | 656 | # Check if a path has already been saved for this browser type |
| LOW | crawl4ai/utils.py | 1043 | # Check if the tag contains text and if it's not just whitespace |
| LOW | crawl4ai/utils.py | 1063 | # Check if the tag itself is empty or all its children are empty/whitespace |
| LOW | crawl4ai/utils.py | 1804 | # Check if we have exhausted our max attempts |
| LOW | crawl4ai/utils.py | 1897 | # Check if we have exhausted our max attempts |
| LOW | crawl4ai/utils.py | 2597 | # Check if URL domain ends with base domain |
| LOW | crawl4ai/utils.py | 3364 | # Check if this is a documentation/reference site |
| LOW | crawl4ai/browser_profiler.py | 563 | # Check if browser started successfully |
| LOW | crawl4ai/browser_profiler.py | 1287 | # Check if the browser is still running |
| LOW | crawl4ai/browser_profiler.py | 1303 | # Check if the process exists |
| LOW | crawl4ai/browser_profiler.py | 114 | # Check if item matches any keep pattern |
| LOW | crawl4ai/browser_profiler.py | 239 | # Check if browser process ended |
| LOW | crawl4ai/browser_profiler.py | 297 | # Check if browser process ended |
| LOW | crawl4ai/browser_profiler.py | 371 | # Check if browser process ended |
| LOW | crawl4ai/browser_profiler.py | 658 | # Check if this looks like a valid browser profile |
| LOW | crawl4ai/browser_profiler.py | 710 | # Check if path exists and is a valid profile |
| LOW | crawl4ai/browser_profiler.py | 759 | # Check if path exists and is a valid profile |
| LOW | crawl4ai/browser_profiler.py | 1115 | # Check if browser started successfully |
| LOW | crawl4ai/browser_profiler.py | 1194 | # Check if there's an existing browser still running |
| LOW | crawl4ai/browser_profiler.py | 1217 | # Check if browser started successfully |
| LOW | crawl4ai/docker_client.py | 94 | # Check if hooks are already strings or need conversion |
| LOW | crawl4ai/async_crawler_strategy.py | 463 | # Check if browser processing is required for file:// or raw: URLs |
| LOW | crawl4ai/async_crawler_strategy.py | 727 | # Check if this is a file:// or raw: URL that needs set_content() instead of goto() |
| LOW | crawl4ai/async_crawler_strategy.py | 1784 | # Check if viewport-only screenshot is forced |
| LOW | crawl4ai/model_loader.py | 195 | # Check if the model directory already exists |
| LOW | crawl4ai/table_extraction.py | 110 | # Check if this is a data table (not a layout table) |
| LOW | crawl4ai/table_extraction.py | 760 | # Check if there are any tables in the content |
| LOW | crawl4ai/table_extraction.py | 769 | # Check if chunking is needed |
| LOW | crawl4ai/table_extraction.py | 852 | # Check if we got valid tables |
| LOW | crawl4ai/table_extraction.py | 1024 | # Check if adding this row would exceed threshold |
| LOW | crawl4ai/async_configs.py | 2049 | # Check if given provider starts with any of key in PROVIDER_MODELS_PREFIXES |
| LOW | crawl4ai/content_filter_strategy.py | 460 | # Check if body is present |
| LOW | crawl4ai/browser_manager.py | 1709 | # Check if browser recycle threshold is hit — bump version for next requests |
| LOW | crawl4ai/browser_manager.py | 1771 | # Check if this signature belongs to an old browser waiting to be cleaned up |
| LOW | crawl4ai/browser_manager.py | 1935 | # Check if any signatures from this old version remain |
| LOW | crawl4ai/browser_manager.py | 1333 | # Check if there is value for crawlerRunConfig.proxy_config set add that to context |
| LOW | crawl4ai/async_webcrawler.py | 498 | # Check if blocked (skip for raw: URLs — |
| LOW | crawl4ai/deep_crawling/bfs_strategy.py | 239 | # Check if we've already reached max_pages before starting a new level |
| LOW | crawl4ai/deep_crawling/bfs_strategy.py | 357 | # Check if we've reached the limit during batch processing |
| 96 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/cache_validator.py | 112 | # Step 1: Try HEAD request with conditional headers |
| LOW | crawl4ai/cache_validator.py | 156 | # Step 2: No conditional headers available, try fingerprint only |
| LOW | crawl4ai/cache_validator.py | 180 | # Step 3: No validation data available |
| LOW | crawl4ai/async_url_seeder.py | 914 | # Step 1: Find sitemap URL and get lastmod (needed for validation) |
| LOW | crawl4ai/async_url_seeder.py | 938 | # Step 2: Check cache validity (skip if force=True) |
| LOW | crawl4ai/async_url_seeder.py | 952 | # Step 3: Fetch fresh URLs |
| LOW | crawl4ai/async_url_seeder.py | 993 | # Step 4: Write to cache (FALLBACK: if write fails, URLs still yielded above) |
| LOW | crawl4ai/cloud/cli.py | 253 | # Step 1: Shrink (unless --no-shrink) |
| LOW | crawl4ai/cloud/cli.py | 266 | # Step 2: Package as tar.gz |
| LOW | crawl4ai/cloud/cli.py | 281 | # Step 3: Upload |
| LOW | tests/test_webhook_feature.sh | 104 | # Step 1: Save current branch and fetch PR |
| LOW | tests/test_webhook_feature.sh | 112 | # Step 2: Switch to new branch |
| LOW | tests/test_webhook_feature.sh | 117 | # Step 3: Activate virtual environment |
| LOW | tests/test_webhook_feature.sh | 128 | # Step 4: Install server dependencies |
| LOW | tests/test_webhook_feature.sh | 147 | # Step 5: Start Redis in background |
| LOW | tests/test_webhook_feature.sh | 183 | # Step 6: Create and run webhook test |
| LOW | tests/test_webhook_feature.sh | 292 | # Step 7: Verify results |
| LOW | tests/test_webhook_feature.sh | 303 | # Step 8: Cleanup happens automatically via trap |
| LOW | tests/test_pyopenssl_update.py | 141 | # Step 1: Check versions |
| LOW | tests/test_pyopenssl_update.py | 147 | # Step 2: Test basic crawling |
| LOW | tests/test_pyopenssl_update.py | 153 | # Step 3: Test stealth mode |
| LOW | tests/proxy/test_proxy_verify.py | 79 | # Step 1: Verify IPs |
| LOW | tests/proxy/test_proxy_verify.py | 86 | # Step 2: Get NST proxies |
| LOW | tests/proxy/test_proxy_verify.py | 97 | # Step 3: Test Chanel with all available proxies |
| LOW | tests/general/test_async_url_seeder_bm25.py | 558 | # Step 1: Discover and score URLs |
| LOW | tests/general/test_async_url_seeder_bm25.py | 587 | # Step 3: Verify these URLs would be good for actual crawling |
| LOW | tests/general/test_async_url_seeder_bm25.py | 573 | # Step 2: Analyze top results |
| LOW | tests/async/test_browser_lifecycle.py | 608 | # Step 1: open all sessions |
| LOW | tests/async/test_browser_lifecycle.py | 615 | # Step 2: navigate each session to a second page |
| LOW | tests/async/test_browser_lifecycle.py | 620 | # Step 3: kill sessions one by one, verify others unaffected |
| LOW | tests/async/test_browser_lifecycle.py | 936 | # Step 1: open session |
| LOW | tests/async/test_browser_lifecycle.py | 943 | # Step 2: concurrent non-session crawls |
| LOW | tests/async/test_browser_lifecycle.py | 952 | # Step 3: kill session |
| LOW | tests/async/test_browser_lifecycle.py | 955 | # Step 4: trigger recycle |
| LOW | tests/async/test_browser_lifecycle.py | 962 | # Step 5: new session on fresh browser |
| LOW | tests/async/test_browser_lifecycle.py | 970 | # Step 6: verify it works |
| LOW | tests/async/test_browser_memory.py | 774 | # Step 1: login — sets cookie |
| LOW | tests/async/test_browser_memory.py | 779 | # Step 2: dashboard — cookie should carry over via session |
| LOW | tests/browser/test_builtin_browser.py | 52 | # Step 1: Create a BrowserManager with builtin mode |
| LOW | tests/browser/test_builtin_browser.py | 57 | # Step 2: Check if we have a BuiltinBrowserStrategy |
| LOW | tests/browser/test_builtin_browser.py | 69 | # Step 3: Start the manager to launch or connect to builtin browser |
| LOW | tests/browser/test_builtin_browser.py | 78 | # Step 4: Get browser info from the strategy |
| LOW | tests/browser/test_builtin_browser.py | 149 | # Step 1: Get browser status |
| LOW | tests/browser/test_builtin_browser.py | 160 | # Step 2: Test killing the browser |
| LOW | tests/browser/test_builtin_browser.py | 172 | # Step 3: Check status after kill |
| LOW | tests/browser/test_builtin_browser.py | 184 | # Step 4: Launch a new browser |
| LOW | tests/browser/test_builtin_browser.py | 206 | # Step 1: Create first manager |
| LOW | tests/browser/test_builtin_browser.py | 211 | # Step 2: Create second manager |
| LOW | tests/browser/test_builtin_browser.py | 216 | # Step 3: Start both managers (should connect to the same builtin browser) |
| LOW | tests/browser/test_builtin_browser.py | 263 | # Step 5: Close both managers |
| LOW | tests/browser/test_builtin_browser.py | 103 | # Step 1: Get a single page |
| LOW | tests/browser/test_builtin_browser.py | 122 | # Step 2: Get multiple pages |
| LOW | tests/browser/test_builtin_browser.py | 241 | # Step 4: Test using both managers |
| LOW | tests/browser/test_builtin_browser.py | 282 | # Step 1: Test multiple starts with the same manager |
| LOW | tests/browser/test_builtin_browser.py | 309 | # Step 2: Test killing the browser while manager is active |
| LOW | tests/browser/test_builtin_browser.py | 472 | # Step 1: Create and start multiple browser managers in parallel |
| LOW | tests/browser/test_builtin_browser.py | 666 | # Step 1: Create and start multiple browser managers in parallel |
| LOW | tests/async_assistant/test_extract_pipeline.py | 62 | # Step 1: Starting |
| LOW | tests/async_assistant/test_extract_pipeline.py | 65 | # Step 2: Quick crawl for analysis |
| LOW | tests/async_assistant/test_extract_pipeline.py | 79 | # Step 3: HTML Skimming using lxml |
| 47 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | deploy/docker/c4ai-doc-context.md | 1105 | llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY") |
| HIGH | docs/md_v2/complete-sdk-reference.md | 2928 | llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY") |
| HIGH | docs/md_v2/core/adaptive-crawling.md | 119 | api_token='your-api-key' |
| HIGH | docs/md_v2/core/adaptive-crawling.md | 124 | api_token='your-api-key' |
| HIGH | docs/md_v2/core/adaptive-crawling.md | 133 | 'api_token': 'your-api-key' |
| HIGH | docs/md_v2/core/adaptive-crawling.md | 137 | 'api_token': 'your-api-key' |
| HIGH | docs/md_v2/core/table_extraction.md | 141 | api_token="your_api_key", |
| HIGH | docs/md_v2/core/content-selection.md | 305 | llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY") |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.js | 181 | api_token="your-api-key", |
| HIGH | docs/md_v2/marketplace/frontend/app-detail.html | 167 | api_token="your-api-key", |
| HIGH | docs/md_v2/blog/releases/0.5.0.md | 409 | llm_config = LLMConfig(provider="openai/gpt-4o", api_token="YOUR_API_KEY") |
| HIGH | …cs/examples/url_seeder/bbc_sport_research_assistant.py | 21 | - export GEMINI_API_KEY="your-api-key" |
| HIGH | docs/examples/website-to-api/README.md | 68 | "api_token": "your-api-key-here" |
| HIGH | docs/examples/website-to-api/README.md | 160 | "api_token": "your-api-key-here" |
| HIGH | docs/examples/website-to-api/README.md | 204 | api_token="your-api-key" |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | crawl4ai/async_crawler_strategy.back.py | 290 | Wait for a condition in a CSP-compliant way. Args: page: Playwright page object |
| HIGH | crawl4ai/async_crawler_strategy.back.py | 424 | Crawls a given URL or processes raw HTML/local file content based on the URL prefix. Args: |
| HIGH | crawl4ai/extraction_strategy.py | 1057 | Evaluate a computed field expression safely using AST validation. Allows simple transforms (math, string metho |
| HIGH | crawl4ai/extraction_strategy.py | 1762 | Generate extraction schema from HTML content or URL(s) (sync version). Args: html (str, op |
| HIGH | crawl4ai/extraction_strategy.py | 1834 | Generate extraction schema from HTML content or URL(s) (async version). Use this method when calling f |
| HIGH | crawl4ai/utils.py | 1150 | Extracts and cleans content from website HTML, optimizing for useful media and contextual information. Par |
| HIGH | crawl4ai/utils.py | 3735 | Convert hook function objects to string representations for Docker API. This utility simplifies the process of |
| HIGH | crawl4ai/async_crawler_strategy.py | 301 | Wait for a condition in a CSP-compliant way. Args: page: Playwright page object |
| HIGH | crawl4ai/async_crawler_strategy.py | 439 | Crawls a given URL or processes raw HTML/local file content based on the URL prefix. Args: |
| HIGH | crawl4ai/async_webcrawler.py | 966 | Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args: |
| HIGH | crawl4ai/async_webcrawler.py | 1106 | Discovers, filters, and optionally validates URLs for a given domain(s) using sitemaps and Common Crawl |
| HIGH | crawl4ai/script/c4ai_script.py | 624 | Compile C4A-Script from string or list of strings to JavaScript. Args: script: C4A-Script as a string o |
| HIGH | deploy/docker/c4ai-code-context.md | 2021 | Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | docker-compose.yml | 1 | version: '3.8' |
| LOW | crawl4ai/async_crawler_strategy.back.py | 841 | # """ |
| LOW | crawl4ai/async_crawler_strategy.back.py | 861 | # opacity: style.opacity, |
| LOW | crawl4ai/async_crawler_strategy.back.py | 1821 | for script in scripts: |
| LOW | crawl4ai/adaptive_crawler.py | 201 | embedding_top_k_weight: float = 0.3 # Weight for top-k average in hybrid scoring |
| LOW | crawl4ai/adaptive_crawler.py | 561 | # if hasattr(result, 'extracted_content') and result.extracted_content: |
| LOW | crawl4ai/adaptive_crawler.py | 761 | |
| LOW | crawl4ai/adaptive_crawler.py | 1021 | |
| LOW | crawl4ai/adaptive_crawler.py | 1041 | |
| LOW | crawl4ai/adaptive_crawler.py | 1061 | # # Top-k average (top 3) |
| LOW | crawl4ai/models.py | 161 | # Anti-bot retry/proxy usage stats |
| LOW | crawl4ai/extraction_strategy.py | 241 | # self.model = load_onnx_all_MiniLM_l6_v2() |
| LOW | crawl4ai/__init__.py | 201 | "UndetectedAdapter", |
| LOW | crawl4ai/__init__.py | 221 | # print( |
| LOW | crawl4ai/adaptive_crawler copy.py | 961 | # # Get cached distance matrix |
| LOW | crawl4ai/adaptive_crawler copy.py | 981 | # 'very_close_neighbors': np.sum(distances < 0.2), |
| LOW | crawl4ai/adaptive_crawler copy.py | 1001 | # hybrid_score = nearest_weight * nearest_score + top_k_weight * top_k_avg |
| LOW | crawl4ai/utils.py | 1821 | # print("Error during completion request:", str(e)) |
| LOW | crawl4ai/utils.py | 3221 | # if parent is None: |
| LOW | crawl4ai/utils.py | 3241 | # for element in elements[1:]: |
| LOW | crawl4ai/async_crawler_strategy.py | 841 | ) |
| LOW | crawl4ai/async_crawler_strategy.py | 861 | # except Error as e: |
| LOW | crawl4ai/async_crawler_strategy.py | 2041 | scripts = js_code |
| LOW | crawl4ai/async_url_seeder.py | 1641 | # # 5. API endpoints and data files |
| LOW | crawl4ai/async_url_seeder.py | 1661 | # '.woff', '.woff2', '.ttf', '.eot', '.otf' |
| LOW | crawl4ai/html2text/__init__.py | 1181 | self.o(data) # Directly output the data as-is (preserve newlines) |
| LOW | crawl4ai/html2text/__init__.py | 1201 | # else: |
| LOW | crawl4ai/legacy/crawler_strategy.py | 101 | self.options.add_argument("--headless") |
| LOW | deploy/docker/c4ai-code-context.md | 2001 | |
| LOW | deploy/docker/c4ai-code-context.md | 5361 | dispatch_result: Optional[DispatchResult] = None |
| LOW | tests/test_webhook_feature.sh | 1 | #!/bin/bash |
| LOW | tests/test_llm_simple_url.py | 101 | # result_default = await crawler.arun( |
| LOW | tests/test_llm_simple_url.py | 121 | # print(f" Default headers: {len(default_first['headers'])} columns") |
| LOW | tests/test_cli_docs.py | 21 | |
| LOW | tests/docker/test_hooks_utility.py | 181 | # print("✓ All tests completed successfully!") |
| LOW | tests/docker/simple_api_test.py | 141 | # result = self.test_get_endpoint("/schema") |
| LOW | tests/docker/test_serialization.py | 121 | # WebScrapingStrategy, LXMLWebScrapingStrategy |
| LOW | tests/docker/test_serialization.py | 141 | # print("\nSerialized Config:") |
| LOW | tests/docker/test_serialization.py | 161 | # "language": "english" |
| LOW | tests/general/test_async_crawler_strategy.py | 241 | # async def test_js_return_values(crawler_strategy): |
| LOW | tests/general/test_async_crawler_strategy.py | 281 | # nonExistentFunction(); |
| LOW | tests/async/test_error_handling.py | 1 | # import os |
| LOW | tests/async/test_error_handling.py | 21 | # async def cleanup(self): |
| LOW | tests/async/test_error_handling.py | 41 | # # # Simulating a timeout by using a very short timeout value |
| LOW | tests/async/test_error_handling.py | 61 | # # @pytest.mark.asyncio |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 281 | # "exclude_social_media_links": { |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 301 | # "combo_mode": { |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 321 | # "css_selector": "section#promo-section" |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 341 | # "remove_forms": True |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 561 | |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 581 | # if link_diff: |
| LOW | tests/async/test_chunking_and_extraction_strategies.py | 21 | result = await crawler.arun( |
| LOW | tests/async/test_chunking_and_extraction_strategies.py | 61 | assert len(extracted_data) > 0 |
| LOW | tests/async/test_edge_cases.py | 21 | |
| LOW | tests/async/test_edge_cases.py | 41 | # url = "https://news.ycombinator.com/" # Hacker News has infinite scroll |
| LOW | tests/browser/manager/demo_browser_manager.py | 461 | start_time = time.time() |
| LOW | tests/profiler/test_create_profile.py | 21 | |
| LOW | docs/md_v2/complete-sdk-reference.md | 3661 | # ❌ Random URLs (site.com/x7f9g2h) |
| LOW | docs/md_v2/advanced/hooks-auth.md | 81 | # Example 2: (Optional) Simulate a login scenario |
| LOW | docs/md_v2/core/link-media.md | 281 | # ✅ Clean URL structure (docs.python.org/api/reference) |
| 8 more matches not shown… | | | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/docker_client.py | 212 | await client.authenticate("user@example.com") |
| LOW | deploy/docker/WEBHOOK_EXAMPLES.md | 206 | "author": "John Doe", |
| LOW | deploy/docker/README.md | 473 | # await client.authenticate("user@example.com") # See Server Configuration section |
| LOW | deploy/docker/README.md | 823 | "author": "John Doe", |
| LOW | deploy/docker/c4ai-doc-context.md | 2784 | "email": "user@example.com" |
| LOW | deploy/docker/c4ai-doc-context.md | 2791 | "email": "user@example.com", |
| LOW | deploy/docker/c4ai-doc-context.md | 2809 | await client.authenticate("user@example.com") |
| LOW | deploy/docker/c4ai-doc-context.md | 3549 | ["John Doe", "34", "New York"], |
| LOW | deploy/docker/tests/test_security_fixes.py | 240 | self.assertEqual(result, "John Doe") |
| LOW | tests/memory/test_stress_sdk.py | 57 | self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split() |
| LOW | tests/memory/test_stress_sdk.py | 57 | self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split() |
| LOW | tests/async/test_browser_memory.py | 37 | <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. |
| LOW | tests/async/test_browser_memory.py | 37 | <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. |
| LOW | docs/md_v2/core/self-hosting.md | 1281 | "author": "John Doe", |
| LOW | docs/md_v2/core/self-hosting.md | 1642 | # await client.authenticate("user@example.com") # See Server Configuration section |
| LOW | docs/md_v2/core/c4a-script.md | 174 | | `SET` | Set input value directly | `SET \`#email\` "user@example.com"` | |
| LOW | docs/md_v2/core/c4a-script.md | 204 | TYPE "user@example.com" |
| LOW | docs/md_v2/core/c4a-script.md | 250 | SET `#name` "John Doe" |
| LOW | docs/md_v2/core/c4a-script.md | 347 | TYPE "user@example.com" |
| LOW | docs/md_v2/marketplace/backend/dummy_data.py | 154 | "Review", "John Doe", ["Playwright Cloud", "Puppeteer Extra"], |
| LOW | docs/md_v2/marketplace/backend/dummy_data.py | 209 | This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit. |
| LOW | docs/md_v2/marketplace/backend/dummy_data.py | 209 | This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit. |
| LOW | docs/md_v2/blog/releases/0.7.6.md | 105 | "author": "John Doe", |
| LOW | docs/md_v2/api/c4a-script-reference.md | 373 | TYPE "user@example.com" |
| LOW | docs/md_v2/api/c4a-script-reference.md | 397 | SETVAR email = "user@example.com" |
| LOW | docs/md_v2/api/c4a-script-reference.md | 527 | SET `#email` "user@example.com" |
| LOW | docs/md_v2/api/c4a-script-reference.md | 766 | TYPE "user@example.com" |
| LOW | docs/md_v2/api/c4a-script-reference.md | 894 | SETVAR email = "user@example.com" |
| LOW | docs/md_v2/api/c4a-script-reference.md | 951 | SET `#name` "John Doe" |
| LOW | docs/md_v2/apps/crawl4ai-assistant/index.html | 616 | <input type="text" id="userName" name="name" placeholder="John Doe" required> |
| LOW | docs/md_v2/apps/c4a-script/server.py | 264 | TYPE "John Doe" |
| LOW | docs/md_v2/apps/c4a-script/README.md | 149 | TYPE "user@example.com" |
| LOW | docs/md_v2/apps/c4a-script/playground/index.html | 276 | <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p> |
| LOW | docs/md_v2/apps/c4a-script/playground/index.html | 276 | <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p> |
| LOW | docs/md_v2/apps/c4a-script/assets/app.js | 596 | script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste |
| LOW | docs/blog/release-v0.7.6.md | 105 | "author": "John Doe", |
| LOW | docs/examples/docker_config_obj.py | 124 | await client.authenticate("user@example.com") |
| LOW | docs/examples/docker_config_obj.py | 193 | json={"email": "user@example.com"} |
| LOW | docs/examples/c4a_script/generate_script_hello_world.py | 28 | goal = "Fill in email 'user@example.com', password 'secret123', and submit the form" |
| LOW | docs/examples/c4a_script/tutorial/server.py | 264 | TYPE "John Doe" |
| LOW | docs/examples/c4a_script/tutorial/README.md | 149 | TYPE "user@example.com" |
| LOW | docs/examples/c4a_script/tutorial/playground/index.html | 276 | <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p> |
| LOW | docs/examples/c4a_script/tutorial/playground/index.html | 276 | <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p> |
| LOW | docs/examples/c4a_script/tutorial/assets/app.js | 596 | script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste |
| LOW | docs/examples/website-to-api/static/index.html | 89 | "author": "John Doe", |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/async_crawler_strategy.back.py | 1533 | # Log the error but don't raise it - we'll just return None for the MHTML |
| LOW | crawl4ai/async_crawler_strategy.back.py | 330 | # For timeout or other cases, just return False |
| MEDIUM | crawl4ai/adaptive_crawler.py | 1571 | """Print comprehensive statistics about the knowledge base |
| MEDIUM | crawl4ai/adaptive_crawler copy.py | 1495 | """Print comprehensive statistics about the knowledge base |
| MEDIUM | crawl4ai/prompts.py | 1174 | GENERATE_SCRIPT_PROMPT = r"""You are a world-class browser automation specialist. Your sole purpose is to convert a natu |
| LOW | crawl4ai/async_crawler_strategy.py | 1653 | # Log the error but don't raise it - we'll just return None for the MHTML |
| LOW | crawl4ai/async_crawler_strategy.py | 341 | # For timeout or other cases, just return False |
| MEDIUM | crawl4ai/async_url_seeder.py | 1139 | # Use lxml for XML parsing if available, as it's generally more robust |
| LOW | crawl4ai/browser_manager.py | 183 | # If CDP URL provided, just return it |
| LOW | deploy/docker/server.py | 825 | # if no query, just return raw contexts |
| LOW | tests/test_source_sibling_selector.py | 309 | # This is actually fine — let's just use "source" with flat fields instead. |
| MEDIUM | tests/docker/test_hooks_comprehensive.py | 521 | """Run comprehensive hook tests""" |
| MEDIUM | tests/memory/test_dispatcher_stress.py | 269 | # First, elevate memory usage to create pressure |
| MEDIUM | tests/memory/benchmark_report.py | 374 | """Generate a comprehensive comparison report of multiple test runs. |
| MEDIUM | tests/general/test_mhtml.py | 5 | import re # For more robust MHTML checks |
| MEDIUM | tests/general/test_mhtml.py | 54 | # 3. Check for MHTML structure indicators (more robust than simple string contains) |
| MEDIUM | tests/general/test_async_url_seeder_bm25.py | 597 | """Generate a comprehensive report of BM25 scoring effectiveness.""" |
| LOW | …est_evaluation_scraping_methods_performance.configs.py | 69 | # No <body> found; just return the <html> root |
| MEDIUM | …est_evaluation_scraping_methods_performance.configs.py | 110 | # If you prefer ignoring newlines or multiple whitespace, do a more robust cleanup |
| MEDIUM | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 780 | // Try to generate a robust selector |
| LOW | docs/releases_review/v0.7.5_docker_hooks_demo.py | 367 | # Use our reusable hook library - just pass the function objects! |
| MEDIUM | docs/releases_review/demo_v0.7.7.py | 477 | """Print comprehensive demo summary""" |
| MEDIUM | docs/releases_review/demo_v0.7.7.py | 612 | # Print comprehensive summary |
| MEDIUM | docs/examples/stealth_mode_example.py | 510 | # Show best practices |
| LOW | docs/examples/docker_hooks_examples.py | 359 | # Use our reusable hook library - just pass the function objects! |

| Severity | File | Line | Snippet |
|---|---|---|---|
| CRITICAL | docs/md_v2/apps/crawl4ai-assistant/libs/marked.min.js | 47 | `+s.text,this.inlineQueue.pop(),this.inlineQueue.at(-1).src=r.text):t.push(s);continue}if(e){let r="Infinite loop on byt |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/user_agent_generator.py | 417 | # Example usage: |
| LOW | crawl4ai/user_agent_generator.py | 420 | # Usage example: |
| LOW | crawl4ai/browser_profiler.py | 1379 | # Example usage |
| LOW | crawl4ai/docker_client.py | 209 | # Example usage |
| LOW | crawl4ai/processors/pdf/processor.py | 456 | # Usage example |
| LOW | tests/profiler/test_create_profile.py | 6 | # Example usage |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crawl4ai/models.py | 327 | # When removing this code in the future, make sure to: |
| MEDIUM | crawl4ai/table_extraction.py | 1268 | This is a basic implementation - for complex CSS selectors, |
| MEDIUM | crawl4ai/processors/pdf/__init__.py | 137 | # For simple cases, you can use the sync version |
| MEDIUM | docs/examples/undetected_simple_demo.py | 88 | # Test URLs - you can change these |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | crawl4ai/deep_crawling/crazy.py | 96 | |
| MEDIUM | crawl4ai/legacy/web_crawler.py | 80 |