Repository Analysis

unclecode/crawl4ai

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Adjusted Score: 27.5 (Moderate AI signal)
Raw Score: 27.5
Time Factor: 100%
Last Push: 2026-05-25
Stars: 67,297
Language: Python
Lines of Code: 306,921
Files: 748
Pattern Hits: 4,913
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 1 · HIGH 380 · MEDIUM 1,062 · LOW 3,470

Pattern Findings

4,913 matches across 20 categories.

Decorative Section Separators (718 hits · 2,420 pts)
Severity | File | Line | Snippet
MEDIUM | crawl4ai/antibot_detector.py | 22 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 25 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 69 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 73 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 100 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 103 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 114 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/antibot_detector.py | 116 | # ---------------------------------------------------------------------------
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 769 | # ──────────────────────────────────────────────────────────────
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 774 | # ──────────────────────────────────────────────────────────────
MEDIUM | crawl4ai/utils.py | 3204 | # ── build signature ───────────────────────────────────────────
MEDIUM | crawl4ai/utils.py | 3210 | # ── first seen? keep – else drop ─────────────
MEDIUM | crawl4ai/browser_profiler.py | 554 | # 1. ── Start the browser ─────────────────────────────────────────
MEDIUM | crawl4ai/browser_profiler.py | 557 | # 2. ── Attach Playwright to that running Chrome ──────────────────
MEDIUM | crawl4ai/browser_profiler.py | 586 | # 3. ── Persist storage state *before* we kill Chrome ─────────────
MEDIUM | crawl4ai/browser_profiler.py | 594 | # 4. ── Close everything cleanly ──────────────────────────────────
MEDIUM | crawl4ai/async_crawler_strategy.py | 780 | # ──────────────────────────────────────────────────────────────
MEDIUM | crawl4ai/async_crawler_strategy.py | 785 | # ──────────────────────────────────────────────────────────────
MEDIUM | crawl4ai/async_url_seeder.py | 362 | # ─────────────────────────────── discovery entry
MEDIUM | crawl4ai/async_url_seeder.py | 828 | # ─────────────────────────────── CC
MEDIUM | crawl4ai/async_url_seeder.py | 63 | # ────────────────────────────────────────────────────────────────────────── consts
MEDIUM | crawl4ai/async_url_seeder.py | 78 | # ────────────────────────────────────────────────────────────────────────── helpers
MEDIUM | crawl4ai/async_url_seeder.py | 258 | # ────────────────────────────────────────────────────────────────────────── class
MEDIUM | crawl4ai/async_url_seeder.py | 322 | # ───────── cache dirs ─────────
MEDIUM | crawl4ai/async_url_seeder.py | 338 | # ───────── cache helpers ─────────
MEDIUM | crawl4ai/async_url_seeder.py | 884 | # ─────────────────────────────── Sitemaps
MEDIUM | crawl4ai/async_url_seeder.py | 1280 | # ─────────────────────────────── validate helpers
MEDIUM | crawl4ai/async_url_seeder.py | 1465 | # ─────────────────────────────── BM25 scoring helpers
MEDIUM | crawl4ai/async_url_seeder.py | 1749 | # ─────────────────────────────── cleanup methods
MEDIUM | crawl4ai/async_url_seeder.py | 1765 | # ─────────────────────────────── index helper
MEDIUM | crawl4ai/browser_manager.py | 547 | # ── 1. cookies ────────────────────────────────────────────────────────────
MEDIUM | crawl4ai/browser_manager.py | 552 | # ── 2. localStorage / sessionStorage ──────────────────────────────────────
MEDIUM | crawl4ai/browser_manager.py | 565 | # ── 3. runtime-mutable extras from configs ────────────────────────────────
MEDIUM | crawl4ai/browser_manager.py | 811 | # ── Persistent context via Playwright's native API ──────────────
MEDIUM | crawl4ai/legacy/llmtxt.py | 308 | # -----------------------------------------------------
MEDIUM | deploy/docker/server.py | 70 | # ── internal imports (after sys.path append) ─────────────────
MEDIUM | deploy/docker/server.py | 73 | # ────────────────── configuration / logging ──────────────────
MEDIUM | deploy/docker/server.py | 79 | # ── global page semaphore (hard cap) ─────────────────────────
MEDIUM | deploy/docker/server.py | 83 | # ── security feature flags ───────────────────────────────────
MEDIUM | deploy/docker/server.py | 87 | # ── default browser config helper ─────────────────────────────
MEDIUM | deploy/docker/server.py | 166 | # ───────────────────── FastAPI instance ──────────────────────
MEDIUM | deploy/docker/server.py | 173 | # ── static playground ──────────────────────────────────────
MEDIUM | deploy/docker/server.py | 183 | # ── static monitor dashboard ────────────────────────────────
MEDIUM | deploy/docker/server.py | 193 | # ── static assets (logo, etc) ────────────────────────────────
MEDIUM | deploy/docker/server.py | 306 | # ── job router ──────────────────────────────────────────────
MEDIUM | deploy/docker/server.py | 309 | # ── monitor router ──────────────────────────────────────────
MEDIUM | deploy/docker/server.py | 315 | # ──────────────────────── Endpoints ──────────────────────────
MEDIUM | deploy/docker/server.py | 1 | # ───────────────────────── server.py ─────────────────────────
MEDIUM | deploy/docker/server.py | 9 | # ── stdlib & 3rd‑party imports ───────────────────────────────
MEDIUM | deploy/docker/server.py | 115 | # ───────────────────── FastAPI lifespan ──────────────────────
MEDIUM | deploy/docker/server.py | 207 | # ─────────────────── infra / middleware ─────────────────────
MEDIUM | deploy/docker/server.py | 255 | # ───────────────── URL validation helper ─────────────────
MEDIUM | deploy/docker/server.py | 268 | # ───────────────── safe config‑dump helper ─────────────────
MEDIUM | deploy/docker/server.py | 875 | # ────────────────────────── cli ──────────────────────────────
MEDIUM | deploy/docker/server.py | 885 | # ─────────────────────────────────────────────────────────────
MEDIUM | deploy/docker/mcp_bridge.py | 19 | # ── opt‑in decorators ───────────────────────────────────────────
MEDIUM | deploy/docker/mcp_bridge.py | 38 | # ── HTTP‑proxy helper for FastAPI endpoints ─────────────────────
MEDIUM | deploy/docker/mcp_bridge.py | 67 | # ── main entry point ────────────────────────────────────────────
MEDIUM | deploy/docker/mcp_bridge.py | 179 | # ── WebSocket transport ────────────────────────────────────
MEDIUM | deploy/docker/mcp_bridge.py | 238 | # ── schema endpoint ───────────────────────────────────────
658 more matches not shown…
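The report does not publish its matching rules. As an illustration only, a separator check can be approximated with a regular expression. This minimal Python sketch assumes a hit is any comment line containing a run of eight or more rule characters (`-`, `=`, or the box-drawing `─` seen in the rows above); the threshold, the character set, and the function names are hypothetical.

```python
import re

# Hypothetical rule: a run of 8+ rule characters (ASCII dash, equals sign,
# or U+2500 box-drawing horizontal) anywhere in a comment line.
RULE_RUN = re.compile(r"[-=─]{8,}")


def is_decorative_separator(line: str) -> bool:
    """True if `line` is a `#` comment that contains a long rule run."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return False
    return bool(RULE_RUN.search(stripped))


def find_separators(source: str) -> list[int]:
    """Return the 1-based line numbers of separator comments in `source`."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if is_decorative_separator(line)
    ]
```

Because it only looks at lines that start with `#`, this sketch ignores trailing comments after code; a real scanner may behave differently.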
Hyper-Verbose Identifiers (1,390 hits · 1,318 pts)
Severity | File | Line | Snippet
LOW | PROGRESSIVE_CRAWLING.md | 174 | def generate_synthetic_dataset(domain_url):
LOW | test_webhook_implementation.py | 43 | def test_webhook_service_init():
LOW | test_webhook_implementation.py | 93 | def test_webhook_config_model():
LOW | test_webhook_implementation.py | 145 | def test_payload_construction():
LOW | test_llm_webhook_feature.py | 15 | def test_llm_job_payload_model():
LOW | test_llm_webhook_feature.py | 65 | def test_handle_llm_request_signature():
LOW | test_llm_webhook_feature.py | 101 | def test_process_llm_extraction_signature():
LOW | test_llm_webhook_feature.py | 136 | def test_webhook_integration_in_api():
LOW | test_llm_webhook_feature.py | 187 | def test_job_endpoint_integration():
LOW | test_llm_webhook_feature.py | 239 | def test_create_new_task_integration():
LOW | crawl4ai/antibot_detector.py | 138 | def _structural_integrity_check(html: str) -> Tuple[bool, str]:
LOW | crawl4ai/async_crawler_strategy.back.py | 611 | async def handle_request_failed_capture(request):
LOW | crawl4ai/async_crawler_strategy.back.py | 1542 | async def _capture_console_messages(
LOW | crawl4ai/async_crawler_strategy.back.py | 1791 | async def robust_execute_user_script(
LOW | crawl4ai/adaptive_crawler.py | 259 | def _embedding_llm_config_dict(self) -> Optional[Dict]:
LOW | crawl4ai/adaptive_crawler.py | 632 | def _get_embedding_llm_config_dict(self) -> Optional[Dict]:
LOW | crawl4ai/adaptive_crawler.py | 642 | def _get_query_llm_config_dict(self) -> Optional[Dict]:
LOW | crawl4ai/adaptive_crawler.py | 708 | def _get_cached_distance_matrix(self, query_embeddings: Any, kb_embeddings: Any) -> Any:
LOW | crawl4ai/adaptive_crawler.py | 871 | async def select_links_for_expansion(
LOW | crawl4ai/adaptive_crawler.py | 1808 | def _crawl_result_to_export_dict(self, result) -> Dict[str, Any]:
LOW | crawl4ai/adaptive_crawler.py | 1880 | def _import_dict_to_crawl_result(self, data: Dict[str, Any]):
LOW | crawl4ai/extraction_strategy.py | 2244 | def _make_context_sensitive_xpath(self, xpath, element):
LOW | crawl4ai/extraction_strategy.py | 2296 | def _fallback_class_id_search(self, element, selector_str):
LOW | crawl4ai/extraction_strategy.py | 280 | def filter_documents_embeddings(
LOW | crawl4ai/extraction_strategy.py | 416 | def filter_clusters_by_word_count(
LOW | crawl4ai/extraction_strategy.py | 2165 | def _create_selector_function(self, selector_str):
LOW | crawl4ai/extraction_strategy.py | 2267 | def _handle_nth_child_selector(self, element, selector_str):
LOW | crawl4ai/browser_adapter.py | 43 | async def retrieve_console_messages(self, page: Page) -> List[Dict]:
LOW | crawl4ai/browser_adapter.py | 133 | async def retrieve_console_messages(self, page: Page) -> List[Dict]:
LOW | crawl4ai/browser_adapter.py | 158 | def _check_stealth_availability(self) -> bool:
LOW | crawl4ai/browser_adapter.py | 261 | async def retrieve_console_messages(self, page: Page) -> List[Dict]:
LOW | crawl4ai/browser_adapter.py | 380 | async def retrieve_console_messages(self, page: UndetectedPage) -> List[Dict]:
LOW | crawl4ai/__init__.py | 206 | # def is_sync_version_installed():
LOW | crawl4ai/markdown_generation_strategy.py | 82 | def convert_links_to_citations(
LOW | crawl4ai/adaptive_crawler copy.py | 643 | def _get_cached_distance_matrix(self, query_embeddings: Any, kb_embeddings: Any) -> Any:
LOW | crawl4ai/adaptive_crawler copy.py | 799 | async def select_links_for_expansion(
LOW | crawl4ai/adaptive_crawler copy.py | 1732 | def _crawl_result_to_export_dict(self, result) -> Dict[str, Any]:
LOW | crawl4ai/adaptive_crawler copy.py | 1804 | def _import_dict_to_crawl_result(self, data: Dict[str, Any]):
LOW | crawl4ai/content_scraping_strategy.py | 380 | def find_closest_parent_with_useful_text(
LOW | crawl4ai/content_scraping_strategy.py | 517 | def remove_empty_elements_fast(self, root, word_count_threshold=5):
LOW | crawl4ai/content_scraping_strategy.py | 569 | def remove_unwanted_attributes_fast(
LOW | crawl4ai/user_agent_generator.py | 344 | def generate_with_client_hints(self, **kwargs) -> Tuple[str, str]:
LOW | crawl4ai/cli.py | 467 | def delete_profile_interactive(profiler: BrowserProfiler):
LOW | crawl4ai/cli.py | 444 | async def create_profile_interactive(profiler: BrowserProfiler):
LOW | crawl4ai/utils.py | 1143 | def get_content_of_website_optimized(
LOW | crawl4ai/utils.py | 1195 | def find_closest_parent_with_useful_text(tag):
LOW | crawl4ai/utils.py | 3267 | def start_colab_display_server():
LOW | crawl4ai/utils.py | 534 | def calculate_semaphore_count():
LOW | crawl4ai/utils.py | 707 | def split_and_parse_json_objects(json_string):
LOW | crawl4ai/utils.py | 981 | def replace_pre_tags_with_text(node):
LOW | crawl4ai/utils.py | 1020 | def remove_empty_and_low_word_count_elements(node, word_count_threshold):
LOW | crawl4ai/utils.py | 1242 | def score_image_for_usefulness(img, base_url, index, images_count):
LOW | crawl4ai/utils.py | 1497 | def extract_metadata_using_lxml(html, doc=None):
LOW | crawl4ai/utils.py | 1742 | def perform_completion_with_backoff(
LOW | crawl4ai/utils.py | 1834 | async def aperform_completion_with_backoff(
LOW | crawl4ai/utils.py | 2040 | def merge_chunks_based_on_token_threshold(chunks, token_threshold):
LOW | crawl4ai/utils.py | 2336 | def normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None):
LOW | crawl4ai/utils.py | 2395 | def efficient_normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None):
LOW | crawl4ai/utils.py | 2966 | def configure_windows_event_loop():
LOW | crawl4ai/utils.py | 3122 | def preprocess_html_for_schema(html_content, text_threshold=100, attr_value_threshold=200, max_size=100000):
1330 more matches not shown…
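A length cutoff is one simple way to flag names like these. The sketch below assumes a 25-character minimum; that number is only the length of the shortest names in the rows above (for example `calculate_semaphore_count`), not a documented threshold, and `verbose_identifiers` is a hypothetical helper. It scans text line by line instead of parsing Python, so, like the `crawl4ai/__init__.py` row, it also reports a commented-out `def`.

```python
import re

# Hypothetical cutoff taken from the shortest names in the table above.
MIN_LEN = 25

# Matches `def name(` and `async def name(`, optionally after a leading `#`,
# which is how a text-based scan sees `# def is_sync_version_installed():`.
DEF_RE = re.compile(r"^\s*(?:#\s*)?(?:async\s+)?def\s+([A-Za-z_]\w*)\s*\(")


def verbose_identifiers(source: str, min_len: int = MIN_LEN) -> list[tuple[int, str]]:
    """Return (line number, name) for function names at least `min_len` long."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEF_RE.match(line)
        if match and len(match.group(1)) >= min_len:
            hits.append((lineno, match.group(1)))
    return hits
```

A pure length rule treats descriptive test names such as `test_payload_construction` the same as genuinely bloated ones, which is one reason these rows carry LOW severity.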
Cross-File Repetition (261 hits · 1,305 pts)
Severity | File | Line | Snippet
HIGH | CHANGELOG.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }
HIGH | deploy/docker/c4ai-doc-context.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }
HIGH | docs/md_v2/advanced/file-downloading.md | 0 | const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }
HIGH | crawl4ai/async_crawler_strategy.back.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const
HIGH | crawl4ai/async_crawler_strategy.back.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const
HIGH | crawl4ai/async_crawler_strategy.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const
HIGH | crawl4ai/async_crawler_strategy.py | 0 | () => { const element = document.body; if (!element) return false; const style = window.getcomputedstyle(element); const
HIGH | crawl4ai/proxy_strategy.py | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona
HIGH | crawl4ai/async_configs.py | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona
HIGH | deploy/docker/c4ai-code-context.md | 0 | configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona
HIGH | crawl4ai/proxy_strategy.py | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str
HIGH | crawl4ai/async_configs.py | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str
HIGH | deploy/docker/c4ai-code-context.md | 0 | load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str
HIGH | crawl4ai/proxy_strategy.py | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd
HIGH | crawl4ai/async_configs.py | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd
HIGH | deploy/docker/c4ai-code-context.md | 0 | create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd
HIGH | crawl4ai/models.py | 0 | deprecated property that raises an attributeerror when accessed.
HIGH | crawl4ai/models.py | 0 | deprecated property that raises an attributeerror when accessed.
HIGH | deploy/docker/c4ai-code-context.md | 0 | deprecated property that raises an attributeerror when accessed.
HIGH | deploy/docker/c4ai-code-context.md | 0 | deprecated property that raises an attributeerror when accessed.
HIGH | crawl4ai/async_crawler_strategy.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig
HIGH | crawl4ai/browser_manager.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig
HIGH | tests/browser/test_init_script_dedup.py | 0 | const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig
HIGH | crawl4ai/async_configs.py | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.
HIGH | deploy/docker/c4ai-code-context.md | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.
HIGH | tests/docker/test_serialization.py | 0 | recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.
HIGH | crawl4ai/async_configs.py | 0 | recursively convert a serializable dictionary back to an object instance.
HIGH | deploy/docker/c4ai-code-context.md | 0 | recursively convert a serializable dictionary back to an object instance.
HIGH | tests/docker/test_serialization.py | 0 | recursively convert a serializable dictionary back to an object instance.
HIGH | crawl4ai/deep_crawling/bfs_strategy.py | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.
HIGH | crawl4ai/deep_crawling/base_strategy.py | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.
HIGH | deploy/docker/c4ai-code-context.md | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.
HIGH | deploy/docker/c4ai-code-context.md | 0 | batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.
HIGH | crawl4ai/deep_crawling/bfs_strategy.py | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive.
HIGH | crawl4ai/deep_crawling/base_strategy.py | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive.
HIGH | deploy/docker/c4ai-code-context.md | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive.
HIGH | deploy/docker/c4ai-code-context.md | 0 | streaming mode: processes one bfs level at a time and yields results immediately as they arrive.
HIGH | deploy/docker/c4ai-code-context.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | deploy/docker/c4ai-doc-context.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | docs/md_v2/complete-sdk-reference.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | docs/md_v2/core/quickstart.md | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | docs/examples/quickstart_examples_set_2.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | docs/examples/llm_extraction_openai_pricing.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | docs/examples/quickstart.py | 0 | from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi
HIGH | deploy/docker/c4ai-code-context.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | deploy/docker/c4ai-doc-context.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | docs/md_v2/complete-sdk-reference.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | docs/md_v2/core/quickstart.md | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | docs/examples/quickstart_examples_set_2.py | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | docs/examples/quickstart.py | 0 | (async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t
HIGH | deploy/docker/c4ai-code-context.md | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();
HIGH | tests/async/test_edge_cases.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();
HIGH | docs/examples/quickstart_examples_set_2.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();
HIGH | docs/examples/quickstart.py | 0 | const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();
HIGH | deploy/docker/c4ai-code-context.md | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu
HIGH | docs/examples/quickstart_examples_set_2.py | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu
HIGH | docs/examples/quickstart.py | 0 | (async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu
HIGH | deploy/docker/c4ai-code-context.md | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.
HIGH | docs/examples/deepcrawl_example.py | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.
HIGH | docs/examples/docker_config_obj.py | 0 | part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.
201 more matches not shown…
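The repeated snippets above appear lowercased and collapsed onto one line, which suggests text is normalized before it is compared. Assuming that, a cross-file check reduces to grouping normalized fragments by content and keeping those found in two or more files. In this sketch the fragments (docstrings, string literals, prompt text) are supplied by the caller, and `MIN_CHARS`, `normalize`, and `cross_file_repeats` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical cutoff: skip short fragments that repeat by coincidence.
MIN_CHARS = 40


def normalize(fragment: str) -> str:
    """Collapse whitespace runs to one space, trim, and lowercase."""
    return re.sub(r"\s+", " ", fragment).strip().lower()


def cross_file_repeats(fragments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalized fragment to the sorted list of files containing it,
    keeping only fragments that occur in at least two distinct files."""
    seen: dict[str, set[str]] = defaultdict(set)
    for path, texts in fragments.items():
        for text in texts:
            key = normalize(text)
            if len(key) >= MIN_CHARS:
                seen[key].add(path)
    return {key: sorted(paths) for key, paths in seen.items() if len(paths) >= 2}
```

Grouping by file set rather than counting raw occurrences keeps a fragment that repeats within one file (such as the two `crawl4ai/models.py` rows) from being reported as cross-file on its own.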
Excessive Try-Catch Wrapping (1,039 hits · 1,070 pts)
Severity | File | Line | Snippet
LOW | setup.py | 40 | except Exception:
LOW | test_webhook_implementation.py | 29 | except Exception as e:
LOW | test_webhook_implementation.py | 37 | except Exception as e:
LOW | test_webhook_implementation.py | 87 | except Exception as e:
LOW | test_webhook_implementation.py | 139 | except Exception as e:
LOW | test_webhook_implementation.py | 200 | except Exception as e:
LOW | test_webhook_implementation.py | 229 | except Exception as e:
LOW | test_webhook_implementation.py | 264 | except Exception as e:
LOW | test_llm_webhook_feature.py | 59 | except Exception as e:
LOW | test_llm_webhook_feature.py | 95 | except Exception as e:
LOW | test_llm_webhook_feature.py | 130 | except Exception as e:
LOW | test_llm_webhook_feature.py | 181 | except Exception as e:
LOW | test_llm_webhook_feature.py | 233 | except Exception as e:
LOW | test_llm_webhook_feature.py | 279 | except Exception as e:
LOW | test_llm_webhook_feature.py | 346 | except Exception as e:
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 551 | def handle_request_capture(request):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 581 | def handle_response_capture(response):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 611 | def handle_request_failed_capture(request):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 632 | def handle_console_capture(msg):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 665 | def handle_pageerror_capture(err):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 1555 | def handle_console_message(msg):
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 2267 | def _session_context(self):
LOW | crawl4ai/async_crawler_strategy.back.py | 930 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 990 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1002 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1108 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1201 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2132 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2180 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2417 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2443 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 606 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 621 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1532 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 327 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 387 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 563 | except Exception:
LOW | crawl4ai/async_crawler_strategy.back.py | 576 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 587 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 655 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 685 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1374 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1424 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1458 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1566 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1619 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1714 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1746 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1922 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 1933 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2026 | except Exception as e:
LOW | crawl4ai/async_crawler_strategy.back.py | 2034 | except Exception as e:
LOW | crawl4ai/async_database.py | 82 | except Exception as e:
LOW | crawl4ai/async_database.py | 110 | except Exception as e:
LOW | crawl4ai/async_database.py | 164 | except Exception as e:
LOW | crawl4ai/async_database.py | 185 | except Exception as e:
LOW | crawl4ai/async_database.py | 217 | except Exception as e:
LOW | crawl4ai/async_database.py | 383 | except Exception as e:
LOW | crawl4ai/async_database.py | 425 | except Exception as e:
LOW | crawl4ai/async_database.py | 470 | except Exception as e:
979 more matches not shown…
Self-Referential Comments (155 hits · 468 pts)
Severity | File | Line | Snippet
MEDIUM | crawl4ai/async_crawler_strategy.back.py | 1519 | # Create a new CDP session
MEDIUM | crawl4ai/async_database.py | 677 | # Create a singleton instance
MEDIUM | crawl4ai/ssl_certificate.py | 90 | # Create the dictionary directly
MEDIUM | crawl4ai/adaptive_crawler.py | 133 | # Create a mock object that has the minimal interface we need
MEDIUM | crawl4ai/link_preview.py | 212 | # Create a wrapper to track progress
MEDIUM | crawl4ai/link_preview.py | 241 | # Create a custom progress tracking version
MEDIUM | crawl4ai/extraction_strategy.py | 2181 | # Create the wrapper function that implements the selection strategy
MEDIUM | crawl4ai/extraction_strategy.py | 2187 | # Create a cache key based on element and selector
MEDIUM | crawl4ai/extraction_strategy.py | 2412 | # Create a function that will apply this selector appropriately
MEDIUM | crawl4ai/adaptive_crawler copy.py | 133 | # Create a mock object that has the minimal interface we need
MEDIUM | crawl4ai/content_scraping_strategy.py | 851 | # Create a config object for LinkPreview
MEDIUM | crawl4ai/cli.py | 456 | # Create the profile
MEDIUM | crawl4ai/cli.py | 361 | # Create a profile and use it for crawling
MEDIUM | crawl4ai/utils.py | 516 | # Create the box with colored borders and lighter text
MEDIUM | crawl4ai/utils.py | 980 | # Create a function that replace content of all"pre" tag with its inner text
MEDIUM | crawl4ai/utils.py | 3224 | # # Create a signature based on tag and classes
MEDIUM | crawl4ai/browser_profiler.py | 158 | # Create a logger if not provided
MEDIUM | crawl4ai/browser_profiler.py | 995 | # Create a temporary profile directory
MEDIUM | crawl4ai/browser_profiler.py | 1200 | # Create a user data directory for the builtin browser
MEDIUM | crawl4ai/browser_profiler.py | 1382 | # Create a new profile
MEDIUM | crawl4ai/browser_profiler.py | 408 | # Create a profile interactively
MEDIUM | crawl4ai/browser_profiler.py | 829 | # Define a custom crawl function
MEDIUM | crawl4ai/async_crawler_strategy.py | 1639 | # Create a new CDP session
MEDIUM | crawl4ai/model_loader.py | 226 | # Create the models directory if it doesn't exist
MEDIUM | crawl4ai/async_configs.py | 835 | # Create a funciton returns dict of the object
MEDIUM | crawl4ai/async_configs.py | 1886 | # Create a funciton returns dict of the object
MEDIUM | crawl4ai/async_configs.py | 2010 | # Create a new config with streaming enabled
MEDIUM | crawl4ai/async_configs.py | 2013 | # Create a new config with multiple updates
MEDIUM | crawl4ai/async_url_seeder.py | 1240 | # Create a bounded queue for results to prevent RAM issues
MEDIUM | crawl4ai/browser_manager.py | 1690 | # Create a new page from the chosen context
MEDIUM | crawl4ai/browser_manager.py | 482 | # Create a BrowserProfiler instance and delegate to it
MEDIUM | crawl4ai/browser_manager.py | 505 | # Create a BrowserProfiler instance and delegate to it
MEDIUM | crawl4ai/browser_manager.py | 528 | # Create a BrowserProfiler instance and delegate to it
MEDIUM | crawl4ai/chunking_strategy.py | 7 | # Define the abstract base class for chunking strategies
MEDIUM | crawl4ai/chunking_strategy.py | 27 | # Create an identity chunking strategy f(x) = [x]
MEDIUM | crawl4ai/async_webcrawler.py | 262 | # Initialize processing variables
MEDIUM | crawl4ai/async_webcrawler.py | 803 | # Define the source selection logic using dict dispatch
MEDIUM | crawl4ai/deep_crawling/bfs_strategy.py | 51 | # Create a new logger if logger is None, dict, or any other non-Logger type
MEDIUM | crawl4ai/deep_crawling/bff_strategy.py | 62 | # Create a new logger if logger is None, dict, or any other non-Logger type
MEDIUM | crawl4ai/js_snippet/__init__.py | 4 | # Create a function get name of a js script, then load from the CURRENT folder of this script and return its content as
MEDIUM | crawl4ai/components/crawler_monitor.py | 167 | # Create the status text
MEDIUM | crawl4ai/components/crawler_monitor.py | 180 | # Create a table for status counts
MEDIUM | crawl4ai/components/crawler_monitor.py | 261 | # Create a table for task details
MEDIUM | crawl4ai/components/crawler_monitor.py | 374 | # Create a more visible footer panel
MEDIUM | deploy/docker/hook_manager.py | 119 | # Create a safe namespace for the hook
MEDIUM | tests/test_pyopenssl_security_fix.py | 75 | # Create a basic SSL context to verify functionality
MEDIUM | tests/test_raw_html_redirected_url.py | 16 | # Create a dummy decorator
MEDIUM | tests/test_raw_html_redirected_url.py | 51 | # Create a large HTML (100KB+)
MEDIUM | tests/test_raw_html_edge_cases.py | 259 | # Create a temp file
MEDIUM | tests/docker/test_config_object.py | 54 | # Create the config
MEDIUM | tests/memory/test_dispatcher_stress.py | 36 | # Create a memory restrictor to simulate limited memory environment
MEDIUM | tests/memory/test_dispatcher_stress.py | 330 | # Create a nightmare scenario - multiple overlapping spikes
MEDIUM | tests/memory/benchmark_report.py | 230 | # Create the plot
MEDIUM | tests/memory/benchmark_report.py | 307 | # Create the plot
MEDIUM | tests/memory/benchmark_report.py | 871 | # Create the benchmark reporter
MEDIUM | tests/proxy/test_proxy_config.py | 477 | # Create a large list of proxy strings
MEDIUM | tests/general/test_mhtml.py | 27 | # Create a fresh browser config and crawler instance for this test
MEDIUM | tests/general/test_mhtml.py | 32 | # Create a fresh crawler instance
MEDIUM | tests/general/test_mhtml.py | 89 | # Create a fresh browser config and crawler instance for this test
MEDIUM | tests/general/test_mhtml.py | 94 | # Create a fresh crawler instance
95 more matches not shown…
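Nearly every row above is a comment that opens with "Create", "Define", or "Initialize" and restates the line that follows it. A minimal regex sketch of that heuristic could look like this; it assumes those three verbs are the trigger words (the scanner's real word list is not shown), and `narrating_comments` is a hypothetical helper.

```python
import re

# Hypothetical rule: a comment (possibly doubly commented, as in `# # Create`)
# whose first word is one of the narrating verbs seen in the table.
NARRATING = re.compile(r"^\s*#[#\s]*(Create|Define|Initialize)\b", re.IGNORECASE)


def narrating_comments(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped comment) for each narrating comment."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if NARRATING.match(line)
    ]
```

The trailing `\b` keeps words like "Created" from matching; a comment that explains *why* something is created would still be flagged, so a word list alone over-reports.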
Unused Imports (559 hits · 456 pts)
Severity | File | Line | Snippet
LOW | test_webhook_implementation.py | 34
LOW | test_llm_webhook_feature.py | 23
LOW | test_llm_webhook_feature.py | 24
LOW | crawl4ai/async_crawler_strategy.back.py | 1
LOW | crawl4ai/async_database.py | 15
LOW | crawl4ai/adaptive_crawler.py | 13
LOW | crawl4ai/adaptive_crawler.py | 14
LOW | crawl4ai/adaptive_crawler.py | 17
LOW | crawl4ai/adaptive_crawler.py | 878
LOW | crawl4ai/link_preview.py | 8
LOW | crawl4ai/extraction_strategy.py | 19
LOW | crawl4ai/extraction_strategy.py | 30
LOW | crawl4ai/extraction_strategy.py | 34
LOW | crawl4ai/browser_adapter.py | 10
LOW | crawl4ai/__init__.py | 4
LOW | crawl4ai/__init__.py | 4
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 6
LOW | crawl4ai/__init__.py | 8
LOW | crawl4ai/__init__.py | 8
LOW | crawl4ai/__init__.py | 8
LOW | crawl4ai/__init__.py | 13
LOW | crawl4ai/__init__.py | 14
LOW | crawl4ai/__init__.py | 14
LOW | crawl4ai/__init__.py | 18
LOW | crawl4ai/__init__.py | 18
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 22
LOW | crawl4ai/__init__.py | 31
LOW | crawl4ai/__init__.py | 31
LOW | crawl4ai/__init__.py | 32
LOW | crawl4ai/__init__.py | 33
LOW | crawl4ai/__init__.py | 33
LOW | crawl4ai/__init__.py | 33
LOW | crawl4ai/__init__.py | 33
LOW | crawl4ai/__init__.py | 39
LOW | crawl4ai/__init__.py | 39
LOW | crawl4ai/__init__.py | 39
LOW | crawl4ai/__init__.py | 39
LOW | crawl4ai/__init__.py | 45
LOW | crawl4ai/__init__.py | 45
LOW | crawl4ai/__init__.py | 45
LOW | crawl4ai/__init__.py | 46
LOW | crawl4ai/__init__.py | 47
LOW | crawl4ai/__init__.py | 48
LOW | crawl4ai/__init__.py | 48
LOW | crawl4ai/__init__.py | 48
499 more matches not shown…
Cross-Language Confusion: 49 hits · 255 pts
Severity · File · Line · Snippet
HIGH · crawl4ai/async_crawler_strategy.back.py · 1836 · # return {{ success: false, error: err.toString(), stack: err.stack }};
HIGH · crawl4ai/async_crawler_strategy.back.py · 1288 · htmlChunks.push(previousHTML);
HIGH · crawl4ai/async_crawler_strategy.back.py · 1300 · htmlChunks.push(currentHTML);
HIGH · crawl4ai/async_crawler_strategy.back.py · 1329 · uniqueElements.push(element.outerHTML);
HIGH · crawl4ai/async_crawler_strategy.back.py · 1450 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.back.py · 1854 · return {{ success: false, error: err.toString(), stack: err.stack }};
HIGH · crawl4ai/async_crawler_strategy.back.py · 1985 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.back.py · 1996 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.back.py · 2117 · error: e.toString()
HIGH · crawl4ai/browser_adapter.py · 314 · window.__capturedConsole.push({
HIGH · crawl4ai/browser_adapter.py · 348 · window.__capturedErrors.push({
HIGH · crawl4ai/browser_adapter.py · 365 · window.__capturedErrors.push({
HIGH · crawl4ai/browser_adapter.py · 368 · stack: event.reason && event.reason.stack ? event.reason.stack : '',
HIGH · crawl4ai/prompts.py · 1357 · if (card && card.shadowRoot) {
HIGH · crawl4ai/prompts.py · 1366 · if (card && card.shadowRoot) {
HIGH · crawl4ai/async_crawler_strategy.py · 2059 · # return {{ success: false, error: err.toString(), stack: err.stack }};
HIGH · crawl4ai/async_crawler_strategy.py · 1363 · htmlChunks.push(previousHTML);
HIGH · crawl4ai/async_crawler_strategy.py · 1375 · htmlChunks.push(currentHTML);
HIGH · crawl4ai/async_crawler_strategy.py · 1404 · uniqueElements.push(element.outerHTML);
HIGH · crawl4ai/async_crawler_strategy.py · 1526 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.py · 1570 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.py · 1866 · if (rect.width > 0 && rect.height > 0) {
HIGH · crawl4ai/async_crawler_strategy.py · 2077 · return {{ success: false, error: err.toString(), stack: err.stack }};
HIGH · crawl4ai/async_crawler_strategy.py · 2208 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.py · 2219 · error: error.toString(),
HIGH · crawl4ai/async_crawler_strategy.py · 2340 · error: e.toString()
HIGH · tests/test_cloud_bugs_batch.py · 96 · httpbin_anything = '<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"args": {}, "da
HIGH · tests/test_main.py · 73 · "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten
HIGH · tests/test_virtual_scroll.py · 44 · allData.push({
HIGH · tests/test_virtual_scroll.py · 57 · items.push(`<div class="item" data-index="${item.id}">${item.text}</div>`);
HIGH · tests/test_docker.py · 91 · "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in
HIGH · tests/releases/test_release_0.6.4.py · 89 · function gtag(){dataLayer.push(arguments);}
HIGH · tests/releases/test_release_0.6.4.py · 125 · function gtag(){dataLayer.push(arguments);}
HIGH · tests/docker/test_hooks_comprehensive.py · 465 · return element ? element.getAttribute('content') : null;
HIGH · tests/docker/test_server_requests.py · 145 · # It might be null, missing, or populated depending on the server's default behavior
HIGH · tests/general/test_async_crawler_strategy.py · 283 · # results.push(e.name);
HIGH · tests/general/test_async_crawler_strategy.py · 288 · # results.push(e.name);
HIGH · tests/async/test_parameters_and_options.py · 52 · "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in
HIGH · docs/examples/quickstart_examples_set_2.py · 94 · js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten
HIGH · docs/examples/quickstart_examples_set_2.py · 361 · return commits.length > 0 ? commits[0].textContent.trim() : null;
HIGH · docs/examples/quickstart_examples_set_2.py · 371 · if (newCommit && newCommit !== initialCommit) {
HIGH · docs/examples/stealth_mode_example.py · 112 · console.log('DETECTION_RESULTS:', JSON.stringify(detectionResults, null, 2));
HIGH · docs/examples/rest_call.py · 32 · loadMoreButton && loadMoreButton.click();
HIGH · docs/examples/crawlai_vs_firecrawl.py · 53 · "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten
HIGH · docs/examples/docker_client_hooks_example.py · 234 · return el ? el.getAttribute('content') : null;
HIGH · docs/examples/quickstart.py · 94 · js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten
HIGH · docs/examples/quickstart.py · 361 · return commits.length > 0 ? commits[0].textContent.trim() : null;
HIGH · docs/examples/quickstart.py · 371 · if (newCommit && newCommit !== initialCommit) {
HIGH · …solver/capsolver_api_integration/solve_recaptcha_v3.py · 47 · args[0] = url.toString();
Cross-Language Confusion (JS/TS): 42 hits · 255 pts
Severity · File · Line · Snippet
HIGH · docs/md_v2/ask_ai/ask-ai.js · 72 · print(result.markdown[:300]) # Print first 300 chars
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 155 · print(result.markdown)`;
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 172 · print(result.status_code)`;
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 191 · print(result.extracted_content)`;
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 240 · print(f"Found {len(products)} products")
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 243 · print(f"- {product['title']}: {product['price']}")
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2418 · print("✅ Automation completed successfully!")
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2419 · print(f"Final URL: {result.url}")
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2423 · print("❌ Automation failed:", result.error_message)
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2452 · print(f"💾 C4A Script saved to: {script_path}")
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2453 · print("\\n📜 Generated C4A Script:")
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2454 · print(C4A_SCRIPT)
HIGH · …md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js · 2464 · print("\\n💡 To execute this C4A script, compile it to JavaScript first!")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1737 · print(f"\\n✅ Successfully extracted {len(data)} items!")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1744 · print("\\n📊 Sample results (first 2 items):")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1746 · print(f"\\nItem {i}:")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1748 · print(f" {key}: {value}")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1752 · print("❌ Extraction failed:", result.error_message)
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1753 · return None
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1759 · print("\\n🎯 Next steps:")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1760 · print("1. Install Crawl4AI: pip install crawl4ai")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1761 · print("2. Modify the URL or add multiple URLs")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1762 · print("3. Customize crawler options as needed")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1763 · print("4. Check 'extracted_data.json' for full results")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1817 · print("✅ Schema generated successfully!")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1818 · print(f"📄 Schema saved to: {schema_path}")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1819 · print("\\nGenerated schema:")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1820 · print(json.dumps(schema, indent=2))
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1825 · print(f"❌ Error generating schema: {e}")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1826 · return None
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1830 · print("\\n🧪 Testing extraction on live webpage...")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1837 · print("❌ Schema file not found. Run generate_schema() first.")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1859 · print(f"\\n✅ Successfully extracted {len(data)} items!")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1866 · print("\\n📊 Sample results (first 2 items):")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1868 · print(f"\\nItem {i}:")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1870 · print(f" {key}: {value}")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1872 · print("❌ Extraction failed:", result.error_message)
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1882 · print("\\n🎯 Next steps:")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1883 · print("1. Review the generated schema in 'generated_schema.json'")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1884 · print("2. Uncomment the test_extraction() line to test on the live site")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1885 · print("3. Use the schema in your Crawl4AI projects!")
HIGH · …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js · 1803 · print("🔧 Generating extraction schema...")
Deep Nesting: 258 hits · 213 pts
Severity · File · Line · Snippet
LOW · crawl4ai/async_crawler_strategy.back.py · 223
LOW · crawl4ai/async_crawler_strategy.back.py · 421
LOW · crawl4ai/async_crawler_strategy.back.py · 496
LOW · crawl4ai/async_crawler_strategy.back.py · 1635
LOW · crawl4ai/async_crawler_strategy.back.py · 1791
LOW · crawl4ai/async_crawler_strategy.back.py · 551
LOW · crawl4ai/async_database.py · 102
LOW · crawl4ai/async_database.py · 478
LOW · crawl4ai/ssl_certificate.py · 62
LOW · crawl4ai/adaptive_crawler.py · 871
LOW · crawl4ai/adaptive_crawler.py · 1330
LOW · crawl4ai/adaptive_crawler.py · 1570
LOW · crawl4ai/adaptive_crawler.py · 1845
LOW · crawl4ai/extraction_strategy.py · 642
LOW · crawl4ai/extraction_strategy.py · 787
LOW · crawl4ai/extraction_strategy.py · 844
LOW · crawl4ai/extraction_strategy.py · 1150
LOW · crawl4ai/extraction_strategy.py · 1240
LOW · crawl4ai/extraction_strategy.py · 1820
LOW · crawl4ai/extraction_strategy.py · 2165
LOW · crawl4ai/extraction_strategy.py · 2404
LOW · crawl4ai/extraction_strategy.py · 2182
LOW · crawl4ai/extraction_strategy.py · 2413
LOW · crawl4ai/browser_adapter.py · 173
LOW · crawl4ai/cache_validator.py · 83
LOW · crawl4ai/markdown_generation_strategy.py · 148
LOW · crawl4ai/adaptive_crawler copy.py · 799
LOW · crawl4ai/adaptive_crawler copy.py · 1252
LOW · crawl4ai/adaptive_crawler copy.py · 1494
LOW · crawl4ai/adaptive_crawler copy.py · 1769
LOW · crawl4ai/hub.py · 41
LOW · crawl4ai/content_scraping_strategy.py · 231
LOW · crawl4ai/content_scraping_strategy.py · 410
LOW · crawl4ai/content_scraping_strategy.py · 569
LOW · crawl4ai/content_scraping_strategy.py · 607
LOW · crawl4ai/user_agent_generator.py · 261
LOW · crawl4ai/user_agent_generator.py · 299
LOW · crawl4ai/async_dispatcher.py · 175
LOW · crawl4ai/async_dispatcher.py · 228
LOW · crawl4ai/async_dispatcher.py · 374
LOW · crawl4ai/async_dispatcher.py · 471
LOW · crawl4ai/async_dispatcher.py · 530
LOW · crawl4ai/async_dispatcher.py · 635
LOW · crawl4ai/cli.py · 110
LOW · crawl4ai/cli.py · 501
LOW · crawl4ai/cli.py · 580
LOW · crawl4ai/cli.py · 1032
LOW · crawl4ai/utils.py · 76
LOW · crawl4ai/utils.py · 419
LOW · crawl4ai/utils.py · 555
LOW · crawl4ai/utils.py · 707
LOW · crawl4ai/utils.py · 889
LOW · crawl4ai/utils.py · 1143
LOW · crawl4ai/utils.py · 2169
LOW · crawl4ai/utils.py · 3122
LOW · crawl4ai/utils.py · 3382
LOW · crawl4ai/utils.py · 3658
LOW · crawl4ai/utils.py · 1335
LOW · crawl4ai/browser_profiler.py · 83
LOW · crawl4ai/browser_profiler.py · 196
198 more matches not shown…
Redundant / Tautological Comments: 156 hits · 206 pts
Severity · File · Line · Snippet
LOW · test_webhook_implementation.py · 240 · # Check if api.py can import webhook module
LOW · crawl4ai/async_database.py · 49 · # Check if version update is needed
LOW · crawl4ai/ssl_certificate.py · 74 · # Set check_hostname to False and verify_mode to CERT_NONE temporarily
LOW · crawl4ai/proxy_strategy.py · 244 · # Check if session exists and hasn't expired
LOW · crawl4ai/adaptive_crawler.py · 538 · # Check if we have any links left
LOW · crawl4ai/adaptive_crawler.py · 715 · # Check if KB has changed
LOW · crawl4ai/adaptive_crawler.py · 1159 · # Check if confidence is below minimum threshold (completely irrelevant)
LOW · crawl4ai/extraction_strategy.py · 2278 · # Check if there's content after the nth-child part
LOW · crawl4ai/adaptive_crawler copy.py · 511 · # Check if we have any links left
LOW · crawl4ai/adaptive_crawler copy.py · 650 · # Check if KB has changed
LOW · crawl4ai/content_scraping_strategy.py · 866 · # Check if we're already in an async context
LOW · crawl4ai/async_dispatcher.py · 288 · # Check if we're in critical memory state
LOW · crawl4ai/cli.py · 1132 · # Set output to JSON if not explicitly specified
LOW · crawl4ai/cli.py · 1141 · # Check if type does not exist show proper message
LOW · crawl4ai/cli.py · 1400 · # Check if the value should be one of the allowed options
LOW · crawl4ai/cli.py · 1550 · # Display results
LOW · crawl4ai/utils.py · 1202 · # Check if the text content has at least word_count_threshold
LOW · crawl4ai/utils.py · 1208 · # Check if an image has valid display and inside undesired html elements
LOW · crawl4ai/utils.py · 3272 · # Check if running in Google Colab
LOW · crawl4ai/utils.py · 290 · # Check if cache is still fresh based on TTL
LOW · crawl4ai/utils.py · 297 · # Check if content actually changed
LOW · crawl4ai/utils.py · 656 · # Check if a path has already been saved for this browser type
LOW · crawl4ai/utils.py · 1043 · # Check if the tag contains text and if it's not just whitespace
LOW · crawl4ai/utils.py · 1063 · # Check if the tag itself is empty or all its children are empty/whitespace
LOW · crawl4ai/utils.py · 1804 · # Check if we have exhausted our max attempts
LOW · crawl4ai/utils.py · 1897 · # Check if we have exhausted our max attempts
LOW · crawl4ai/utils.py · 2597 · # Check if URL domain ends with base domain
LOW · crawl4ai/utils.py · 3364 · # Check if this is a documentation/reference site
LOW · crawl4ai/browser_profiler.py · 563 · # Check if browser started successfully
LOW · crawl4ai/browser_profiler.py · 1287 · # Check if the browser is still running
LOW · crawl4ai/browser_profiler.py · 1303 · # Check if the process exists
LOW · crawl4ai/browser_profiler.py · 114 · # Check if item matches any keep pattern
LOW · crawl4ai/browser_profiler.py · 239 · # Check if browser process ended
LOW · crawl4ai/browser_profiler.py · 297 · # Check if browser process ended
LOW · crawl4ai/browser_profiler.py · 371 · # Check if browser process ended
LOW · crawl4ai/browser_profiler.py · 658 · # Check if this looks like a valid browser profile
LOW · crawl4ai/browser_profiler.py · 710 · # Check if path exists and is a valid profile
LOW · crawl4ai/browser_profiler.py · 759 · # Check if path exists and is a valid profile
LOW · crawl4ai/browser_profiler.py · 1115 · # Check if browser started successfully
LOW · crawl4ai/browser_profiler.py · 1194 · # Check if there's an existing browser still running
LOW · crawl4ai/browser_profiler.py · 1217 · # Check if browser started successfully
LOW · crawl4ai/docker_client.py · 94 · # Check if hooks are already strings or need conversion
LOW · crawl4ai/async_crawler_strategy.py · 463 · # Check if browser processing is required for file:// or raw: URLs
LOW · crawl4ai/async_crawler_strategy.py · 727 · # Check if this is a file:// or raw: URL that needs set_content() instead of goto()
LOW · crawl4ai/async_crawler_strategy.py · 1784 · # Check if viewport-only screenshot is forced
LOW · crawl4ai/model_loader.py · 195 · # Check if the model directory already exists
LOW · crawl4ai/table_extraction.py · 110 · # Check if this is a data table (not a layout table)
LOW · crawl4ai/table_extraction.py · 760 · # Check if there are any tables in the content
LOW · crawl4ai/table_extraction.py · 769 · # Check if chunking is needed
LOW · crawl4ai/table_extraction.py · 852 · # Check if we got valid tables
LOW · crawl4ai/table_extraction.py · 1024 · # Check if adding this row would exceed threshold
LOW · crawl4ai/async_configs.py · 2049 · # Check if given provider starts with any of key in PROVIDER_MODELS_PREFIXES
LOW · crawl4ai/content_filter_strategy.py · 460 · # Check if body is present
LOW · crawl4ai/browser_manager.py · 1709 · # Check if browser recycle threshold is hit — bump version for next requests
LOW · crawl4ai/browser_manager.py · 1771 · # Check if this signature belongs to an old browser waiting to be cleaned up
LOW · crawl4ai/browser_manager.py · 1935 · # Check if any signatures from this old version remain
LOW · crawl4ai/browser_manager.py · 1333 · # Check if there is value for crawlerRunConfig.proxy_config set add that to context
LOW · crawl4ai/async_webcrawler.py · 498 · # Check if blocked (skip for raw: URLs —
LOW · crawl4ai/deep_crawling/bfs_strategy.py · 239 · # Check if we've already reached max_pages before starting a new level
LOW · crawl4ai/deep_crawling/bfs_strategy.py · 357 · # Check if we've reached the limit during batch processing
96 more matches not shown…
Verbosity Indicators: 107 hits · 171 pts
Severity · File · Line · Snippet
LOW · crawl4ai/cache_validator.py · 112 · # Step 1: Try HEAD request with conditional headers
LOW · crawl4ai/cache_validator.py · 156 · # Step 2: No conditional headers available, try fingerprint only
LOW · crawl4ai/cache_validator.py · 180 · # Step 3: No validation data available
LOW · crawl4ai/async_url_seeder.py · 914 · # Step 1: Find sitemap URL and get lastmod (needed for validation)
LOW · crawl4ai/async_url_seeder.py · 938 · # Step 2: Check cache validity (skip if force=True)
LOW · crawl4ai/async_url_seeder.py · 952 · # Step 3: Fetch fresh URLs
LOW · crawl4ai/async_url_seeder.py · 993 · # Step 4: Write to cache (FALLBACK: if write fails, URLs still yielded above)
LOW · crawl4ai/cloud/cli.py · 253 · # Step 1: Shrink (unless --no-shrink)
LOW · crawl4ai/cloud/cli.py · 266 · # Step 2: Package as tar.gz
LOW · crawl4ai/cloud/cli.py · 281 · # Step 3: Upload
LOW · tests/test_webhook_feature.sh · 104 · # Step 1: Save current branch and fetch PR
LOW · tests/test_webhook_feature.sh · 112 · # Step 2: Switch to new branch
LOW · tests/test_webhook_feature.sh · 117 · # Step 3: Activate virtual environment
LOW · tests/test_webhook_feature.sh · 128 · # Step 4: Install server dependencies
LOW · tests/test_webhook_feature.sh · 147 · # Step 5: Start Redis in background
LOW · tests/test_webhook_feature.sh · 183 · # Step 6: Create and run webhook test
LOW · tests/test_webhook_feature.sh · 292 · # Step 7: Verify results
LOW · tests/test_webhook_feature.sh · 303 · # Step 8: Cleanup happens automatically via trap
LOW · tests/test_pyopenssl_update.py · 141 · # Step 1: Check versions
LOW · tests/test_pyopenssl_update.py · 147 · # Step 2: Test basic crawling
LOW · tests/test_pyopenssl_update.py · 153 · # Step 3: Test stealth mode
LOW · tests/proxy/test_proxy_verify.py · 79 · # Step 1: Verify IPs
LOW · tests/proxy/test_proxy_verify.py · 86 · # Step 2: Get NST proxies
LOW · tests/proxy/test_proxy_verify.py · 97 · # Step 3: Test Chanel with all available proxies
LOW · tests/general/test_async_url_seeder_bm25.py · 558 · # Step 1: Discover and score URLs
LOW · tests/general/test_async_url_seeder_bm25.py · 587 · # Step 3: Verify these URLs would be good for actual crawling
LOW · tests/general/test_async_url_seeder_bm25.py · 573 · # Step 2: Analyze top results
LOW · tests/async/test_browser_lifecycle.py · 608 · # Step 1: open all sessions
LOW · tests/async/test_browser_lifecycle.py · 615 · # Step 2: navigate each session to a second page
LOW · tests/async/test_browser_lifecycle.py · 620 · # Step 3: kill sessions one by one, verify others unaffected
LOW · tests/async/test_browser_lifecycle.py · 936 · # Step 1: open session
LOW · tests/async/test_browser_lifecycle.py · 943 · # Step 2: concurrent non-session crawls
LOW · tests/async/test_browser_lifecycle.py · 952 · # Step 3: kill session
LOW · tests/async/test_browser_lifecycle.py · 955 · # Step 4: trigger recycle
LOW · tests/async/test_browser_lifecycle.py · 962 · # Step 5: new session on fresh browser
LOW · tests/async/test_browser_lifecycle.py · 970 · # Step 6: verify it works
LOW · tests/async/test_browser_memory.py · 774 · # Step 1: login — sets cookie
LOW · tests/async/test_browser_memory.py · 779 · # Step 2: dashboard — cookie should carry over via session
LOW · tests/browser/test_builtin_browser.py · 52 · # Step 1: Create a BrowserManager with builtin mode
LOW · tests/browser/test_builtin_browser.py · 57 · # Step 2: Check if we have a BuiltinBrowserStrategy
LOW · tests/browser/test_builtin_browser.py · 69 · # Step 3: Start the manager to launch or connect to builtin browser
LOW · tests/browser/test_builtin_browser.py · 78 · # Step 4: Get browser info from the strategy
LOW · tests/browser/test_builtin_browser.py · 149 · # Step 1: Get browser status
LOW · tests/browser/test_builtin_browser.py · 160 · # Step 2: Test killing the browser
LOW · tests/browser/test_builtin_browser.py · 172 · # Step 3: Check status after kill
LOW · tests/browser/test_builtin_browser.py · 184 · # Step 4: Launch a new browser
LOW · tests/browser/test_builtin_browser.py · 206 · # Step 1: Create first manager
LOW · tests/browser/test_builtin_browser.py · 211 · # Step 2: Create second manager
LOW · tests/browser/test_builtin_browser.py · 216 · # Step 3: Start both managers (should connect to the same builtin browser)
LOW · tests/browser/test_builtin_browser.py · 263 · # Step 5: Close both managers
LOW · tests/browser/test_builtin_browser.py · 103 · # Step 1: Get a single page
LOW · tests/browser/test_builtin_browser.py · 122 · # Step 2: Get multiple pages
LOW · tests/browser/test_builtin_browser.py · 241 · # Step 4: Test using both managers
LOW · tests/browser/test_builtin_browser.py · 282 · # Step 1: Test multiple starts with the same manager
LOW · tests/browser/test_builtin_browser.py · 309 · # Step 2: Test killing the browser while manager is active
LOW · tests/browser/test_builtin_browser.py · 472 · # Step 1: Create and start multiple browser managers in parallel
LOW · tests/browser/test_builtin_browser.py · 666 · # Step 1: Create and start multiple browser managers in parallel
LOW · tests/async_assistant/test_extract_pipeline.py · 62 · # Step 1: Starting
LOW · tests/async_assistant/test_extract_pipeline.py · 65 · # Step 2: Quick crawl for analysis
LOW · tests/async_assistant/test_extract_pipeline.py · 79 · # Step 3: HTML Skimming using lxml
47 more matches not shown…
Magic Placeholder Names: 15 hits · 80 pts
Severity · File · Line · Snippet
HIGH · deploy/docker/c4ai-doc-context.md · 1105 · llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")
HIGH · docs/md_v2/complete-sdk-reference.md · 2928 · llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")
HIGH · docs/md_v2/core/adaptive-crawling.md · 119 · api_token='your-api-key'
HIGH · docs/md_v2/core/adaptive-crawling.md · 124 · api_token='your-api-key'
HIGH · docs/md_v2/core/adaptive-crawling.md · 133 · 'api_token': 'your-api-key'
HIGH · docs/md_v2/core/adaptive-crawling.md · 137 · 'api_token': 'your-api-key'
HIGH · docs/md_v2/core/table_extraction.md · 141 · api_token="your_api_key",
HIGH · docs/md_v2/core/content-selection.md · 305 · llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")
HIGH · docs/md_v2/marketplace/frontend/app-detail.js · 181 · api_token="your-api-key",
HIGH · docs/md_v2/marketplace/frontend/app-detail.html · 167 · api_token="your-api-key",
HIGH · docs/md_v2/blog/releases/0.5.0.md · 409 · llm_config = LLMConfig(provider="openai/gpt-4o", api_token="YOUR_API_KEY")
HIGH · …cs/examples/url_seeder/bbc_sport_research_assistant.py · 21 · - export GEMINI_API_KEY="your-api-key"
HIGH · docs/examples/website-to-api/README.md · 68 · "api_token": "your-api-key-here"
HIGH · docs/examples/website-to-api/README.md · 160 · "api_token": "your-api-key-here"
HIGH · docs/examples/website-to-api/README.md · 204 · api_token="your-api-key"
Docstring Block Structure: 13 hits · 65 pts
Severity · File · Line · Snippet
HIGH · crawl4ai/async_crawler_strategy.back.py · 290 · Wait for a condition in a CSP-compliant way. Args: page: Playwright page object
HIGH · crawl4ai/async_crawler_strategy.back.py · 424 · Crawls a given URL or processes raw HTML/local file content based on the URL prefix. Args:
HIGH · crawl4ai/extraction_strategy.py · 1057 · Evaluate a computed field expression safely using AST validation. Allows simple transforms (math, string metho
HIGH · crawl4ai/extraction_strategy.py · 1762 · Generate extraction schema from HTML content or URL(s) (sync version). Args: html (str, op
HIGH · crawl4ai/extraction_strategy.py · 1834 · Generate extraction schema from HTML content or URL(s) (async version). Use this method when calling f
HIGH · crawl4ai/utils.py · 1150 · Extracts and cleans content from website HTML, optimizing for useful media and contextual information. Par
HIGH · crawl4ai/utils.py · 3735 · Convert hook function objects to string representations for Docker API. This utility simplifies the process of
HIGH · crawl4ai/async_crawler_strategy.py · 301 · Wait for a condition in a CSP-compliant way. Args: page: Playwright page object
HIGH · crawl4ai/async_crawler_strategy.py · 439 · Crawls a given URL or processes raw HTML/local file content based on the URL prefix. Args:
HIGH · crawl4ai/async_webcrawler.py · 966 · Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args:
HIGH · crawl4ai/async_webcrawler.py · 1106 · Discovers, filters, and optionally validates URLs for a given domain(s) using sitemaps and Common Crawl
HIGH · crawl4ai/script/c4ai_script.py · 624 · Compile C4A-Script from string or list of strings to JavaScript. Args: script: C4A-Script as a string o
HIGH · deploy/docker/c4ai-code-context.md · 2021 · Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args:
Over-Commented Block: 68 hits · 60 pts
Severity · File · Line · Snippet
LOW · docker-compose.yml · 1 · version: '3.8'
LOW · crawl4ai/async_crawler_strategy.back.py · 841 · # """
LOW · crawl4ai/async_crawler_strategy.back.py · 861 · # opacity: style.opacity,
LOW · crawl4ai/async_crawler_strategy.back.py · 1821 · for script in scripts:
LOW · crawl4ai/adaptive_crawler.py · 201 · embedding_top_k_weight: float = 0.3 # Weight for top-k average in hybrid scoring
LOW · crawl4ai/adaptive_crawler.py · 561 · # if hasattr(result, 'extracted_content') and result.extracted_content:
LOW · crawl4ai/adaptive_crawler.py · 761
LOW · crawl4ai/adaptive_crawler.py · 1021
LOW · crawl4ai/adaptive_crawler.py · 1041
LOW · crawl4ai/adaptive_crawler.py · 1061 · # # Top-k average (top 3)
LOW · crawl4ai/models.py · 161 · # Anti-bot retry/proxy usage stats
LOW · crawl4ai/extraction_strategy.py · 241 · # self.model = load_onnx_all_MiniLM_l6_v2()
LOW · crawl4ai/__init__.py · 201 · "UndetectedAdapter",
LOW · crawl4ai/__init__.py · 221 · # print(
LOW · crawl4ai/adaptive_crawler copy.py · 961 · # # Get cached distance matrix
LOW · crawl4ai/adaptive_crawler copy.py · 981 · # 'very_close_neighbors': np.sum(distances < 0.2),
LOW · crawl4ai/adaptive_crawler copy.py · 1001 · # hybrid_score = nearest_weight * nearest_score + top_k_weight * top_k_avg
LOW · crawl4ai/utils.py · 1821 · # print("Error during completion request:", str(e))
LOW · crawl4ai/utils.py · 3221 · # if parent is None:
LOW · crawl4ai/utils.py · 3241 · # for element in elements[1:]:
LOW · crawl4ai/async_crawler_strategy.py · 841 · )
LOW · crawl4ai/async_crawler_strategy.py · 861 · # except Error as e:
LOW · crawl4ai/async_crawler_strategy.py · 2041 · scripts = js_code
LOW · crawl4ai/async_url_seeder.py · 1641 · # # 5. API endpoints and data files
LOW · crawl4ai/async_url_seeder.py · 1661 · # '.woff', '.woff2', '.ttf', '.eot', '.otf'
LOW · crawl4ai/html2text/__init__.py · 1181 · self.o(data) # Directly output the data as-is (preserve newlines)
LOW · crawl4ai/html2text/__init__.py · 1201 · # else:
LOW · crawl4ai/legacy/crawler_strategy.py · 101 · self.options.add_argument("--headless")
LOW · deploy/docker/c4ai-code-context.md · 2001
LOW · deploy/docker/c4ai-code-context.md · 5361 · dispatch_result: Optional[DispatchResult] = None
LOW · tests/test_webhook_feature.sh · 1 · #!/bin/bash
LOW · tests/test_llm_simple_url.py · 101 · # result_default = await crawler.arun(
LOW · tests/test_llm_simple_url.py · 121 · # print(f" Default headers: {len(default_first['headers'])} columns")
LOW · tests/test_cli_docs.py · 21
LOW · tests/docker/test_hooks_utility.py · 181 · # print("✓ All tests completed successfully!")
LOW · tests/docker/simple_api_test.py · 141 · # result = self.test_get_endpoint("/schema")
LOW · tests/docker/test_serialization.py · 121 · # WebScrapingStrategy, LXMLWebScrapingStrategy
LOW · tests/docker/test_serialization.py · 141 · # print("\nSerialized Config:")
LOW · tests/docker/test_serialization.py · 161 · # "language": "english"
LOW · tests/general/test_async_crawler_strategy.py · 241 · # async def test_js_return_values(crawler_strategy):
LOW · tests/general/test_async_crawler_strategy.py · 281 · # nonExistentFunction();
LOW · tests/async/test_error_handling.py · 1 · # import os
LOW · tests/async/test_error_handling.py · 21 · # async def cleanup(self):
LOW · tests/async/test_error_handling.py · 41 · # # # Simulating a timeout by using a very short timeout value
LOW · tests/async/test_error_handling.py · 61 · # # @pytest.mark.asyncio
LOW · …est_evaluation_scraping_methods_performance.configs.py · 281 · # "exclude_social_media_links": {
LOW · …est_evaluation_scraping_methods_performance.configs.py · 301 · # "combo_mode": {
LOW · …est_evaluation_scraping_methods_performance.configs.py · 321 · # "css_selector": "section#promo-section"
LOW · …est_evaluation_scraping_methods_performance.configs.py · 341 · # "remove_forms": True
LOW · …est_evaluation_scraping_methods_performance.configs.py · 561
LOW · …est_evaluation_scraping_methods_performance.configs.py · 581 · # if link_diff:
LOW · tests/async/test_chunking_and_extraction_strategies.py · 21 · result = await crawler.arun(
LOW · tests/async/test_chunking_and_extraction_strategies.py · 61 · assert len(extracted_data) > 0
LOW · tests/async/test_edge_cases.py · 21
LOW · tests/async/test_edge_cases.py · 41 · # url = "https://news.ycombinator.com/" # Hacker News has infinite scroll
LOW · tests/browser/manager/demo_browser_manager.py · 461 · start_time = time.time()
LOW · tests/profiler/test_create_profile.py · 21
LOW · docs/md_v2/complete-sdk-reference.md · 3661 · # ❌ Random URLs (site.com/x7f9g2h)
LOW · docs/md_v2/advanced/hooks-auth.md · 81 · # Example 2: (Optional) Simulate a login scenario
LOW · docs/md_v2/core/link-media.md · 281 · # ✅ Clean URL structure (docs.python.org/api/reference)
8 more matches not shown…
Fake / Example Data: 45 hits · 44 pts
Severity · File · Line · Snippet
LOW · crawl4ai/docker_client.py · 212 · await client.authenticate("user@example.com")
LOW · deploy/docker/WEBHOOK_EXAMPLES.md · 206 · "author": "John Doe",
LOW · deploy/docker/README.md · 473 · # await client.authenticate("user@example.com") # See Server Configuration section
LOW · deploy/docker/README.md · 823 · "author": "John Doe",
LOW · deploy/docker/c4ai-doc-context.md · 2784 · "email": "user@example.com"
LOW · deploy/docker/c4ai-doc-context.md · 2791 · "email": "user@example.com",
LOW · deploy/docker/c4ai-doc-context.md · 2809 · await client.authenticate("user@example.com")
LOW · deploy/docker/c4ai-doc-context.md · 3549 · ["John Doe", "34", "New York"],
LOW · deploy/docker/tests/test_security_fixes.py · 240 · self.assertEqual(result, "John Doe")
LOW · tests/memory/test_stress_sdk.py · 57 · self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split()
LOW · tests/memory/test_stress_sdk.py · 57 · self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split()
LOW · tests/async/test_browser_memory.py · 37 · <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOW · tests/async/test_browser_memory.py · 37 · <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOW · docs/md_v2/core/self-hosting.md · 1281 · "author": "John Doe",
LOW · docs/md_v2/core/self-hosting.md · 1642 · # await client.authenticate("user@example.com") # See Server Configuration section
LOW · docs/md_v2/core/c4a-script.md · 174 · | `SET` | Set input value directly | `SET \`#email\` "user@example.com"` |
LOW · docs/md_v2/core/c4a-script.md · 204 · TYPE "user@example.com"
LOW · docs/md_v2/core/c4a-script.md · 250 · SET `#name` "John Doe"
LOW · docs/md_v2/core/c4a-script.md · 347 · TYPE "user@example.com"
LOW · docs/md_v2/marketplace/backend/dummy_data.py · 154 · "Review", "John Doe", ["Playwright Cloud", "Puppeteer Extra"],
LOW · docs/md_v2/marketplace/backend/dummy_data.py · 209 · This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOW · docs/md_v2/marketplace/backend/dummy_data.py · 209 · This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOW · docs/md_v2/blog/releases/0.7.6.md · 105 · "author": "John Doe",
LOW · docs/md_v2/api/c4a-script-reference.md · 373 · TYPE "user@example.com"
LOW · docs/md_v2/api/c4a-script-reference.md · 397 · SETVAR email = "user@example.com"
LOW · docs/md_v2/api/c4a-script-reference.md · 527 · SET `#email` "user@example.com"
LOW · docs/md_v2/api/c4a-script-reference.md · 766 · TYPE "user@example.com"
LOW · docs/md_v2/api/c4a-script-reference.md · 894 · SETVAR email = "user@example.com"
LOW · docs/md_v2/api/c4a-script-reference.md · 951 · SET `#name` "John Doe"
LOW · docs/md_v2/apps/crawl4ai-assistant/index.html · 616 · <input type="text" id="userName" name="name" placeholder="John Doe" required>
LOW · docs/md_v2/apps/c4a-script/server.py · 264 · TYPE "John Doe"
LOW · docs/md_v2/apps/c4a-script/README.md · 149 · TYPE "user@example.com"
LOW · docs/md_v2/apps/c4a-script/playground/index.html · 276 · <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>
LOW · docs/md_v2/apps/c4a-script/playground/index.html · 276 · <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>
LOW · docs/md_v2/apps/c4a-script/assets/app.js · 596 · script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste
LOW · docs/blog/release-v0.7.6.md · 105 · "author": "John Doe",
LOW · docs/examples/docker_config_obj.py · 124 · await client.authenticate("user@example.com")
LOW · docs/examples/docker_config_obj.py · 193 · json={"email": "user@example.com"}
LOW · docs/examples/c4a_script/generate_script_hello_world.py · 28 · goal = "Fill in email 'user@example.com', password 'secret123', and submit the form"
LOW · docs/examples/c4a_script/tutorial/server.py · 264 · TYPE "John Doe"
LOW · docs/examples/c4a_script/tutorial/README.md · 149 · TYPE "user@example.com"
LOW · docs/examples/c4a_script/tutorial/playground/index.html · 276 · <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>
LOW · docs/examples/c4a_script/tutorial/playground/index.html · 276 · <p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>
LOW · docs/examples/c4a_script/tutorial/assets/app.js · 596 · script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste
LOW · docs/examples/website-to-api/static/index.html · 89 · "author": "John Doe",
AI Slop Vocabulary: 25 hits · 35 pts
Severity | File | Line | Snippet
LOW | crawl4ai/async_crawler_strategy.back.py | 1533 | # Log the error but don't raise it - we'll just return None for the MHTML
LOW | crawl4ai/async_crawler_strategy.back.py | 330 | # For timeout or other cases, just return False
MEDIUM | crawl4ai/adaptive_crawler.py | 1571 | """Print comprehensive statistics about the knowledge base
MEDIUM | crawl4ai/adaptive_crawler copy.py | 1495 | """Print comprehensive statistics about the knowledge base
MEDIUM | crawl4ai/prompts.py | 1174 | GENERATE_SCRIPT_PROMPT = r"""You are a world-class browser automation specialist. Your sole purpose is to convert a natu
LOW | crawl4ai/async_crawler_strategy.py | 1653 | # Log the error but don't raise it - we'll just return None for the MHTML
LOW | crawl4ai/async_crawler_strategy.py | 341 | # For timeout or other cases, just return False
MEDIUM | crawl4ai/async_url_seeder.py | 1139 | # Use lxml for XML parsing if available, as it's generally more robust
LOW | crawl4ai/browser_manager.py | 183 | # If CDP URL provided, just return it
LOW | deploy/docker/server.py | 825 | # if no query, just return raw contexts
LOW | tests/test_source_sibling_selector.py | 309 | # This is actually fine — let's just use "source" with flat fields instead.
MEDIUM | tests/docker/test_hooks_comprehensive.py | 521 | """Run comprehensive hook tests"""
MEDIUM | tests/memory/test_dispatcher_stress.py | 269 | # First, elevate memory usage to create pressure
MEDIUM | tests/memory/benchmark_report.py | 374 | """Generate a comprehensive comparison report of multiple test runs.
MEDIUM | tests/general/test_mhtml.py | 5 | import re # For more robust MHTML checks
MEDIUM | tests/general/test_mhtml.py | 54 | # 3. Check for MHTML structure indicators (more robust than simple string contains)
MEDIUM | tests/general/test_async_url_seeder_bm25.py | 597 | """Generate a comprehensive report of BM25 scoring effectiveness."""
LOW | …est_evaluation_scraping_methods_performance.configs.py | 69 | # No <body> found; just return the <html> root
MEDIUM | …est_evaluation_scraping_methods_performance.configs.py | 110 | # If you prefer ignoring newlines or multiple whitespace, do a more robust cleanup
MEDIUM | …s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js | 780 | // Try to generate a robust selector
LOW | docs/releases_review/v0.7.5_docker_hooks_demo.py | 367 | # Use our reusable hook library - just pass the function objects!
MEDIUM | docs/releases_review/demo_v0.7.7.py | 477 | """Print comprehensive demo summary"""
MEDIUM | docs/releases_review/demo_v0.7.7.py | 612 | # Print comprehensive summary
MEDIUM | docs/examples/stealth_mode_example.py | 510 | # Show best practices
LOW | docs/examples/docker_hooks_examples.py | 359 | # Use our reusable hook library - just pass the function objects!
Hallucination Indicators: 1 hit · 10 pts
Severity | File | Line | Snippet
CRITICAL | docs/md_v2/apps/crawl4ai-assistant/libs/marked.min.js | 47 | `+s.text,this.inlineQueue.pop(),this.inlineQueue.at(-1).src=r.text):t.push(s);continue}if(e){let r="Infinite loop on byt
Example Usage Blocks: 6 hits · 8 pts
Severity | File | Line | Snippet
LOW | crawl4ai/user_agent_generator.py | 417 | # Example usage:
LOW | crawl4ai/user_agent_generator.py | 420 | # Usage example:
LOW | crawl4ai/browser_profiler.py | 1379 | # Example usage
LOW | crawl4ai/docker_client.py | 209 | # Example usage
LOW | crawl4ai/processors/pdf/processor.py | 456 | # Usage example
LOW | tests/profiler/test_create_profile.py | 6 | # Example usage
Slop Phrases: 4 hits · 6 pts
Severity | File | Line | Snippet
LOW | crawl4ai/models.py | 327 | # When removing this code in the future, make sure to:
MEDIUM | crawl4ai/table_extraction.py | 1268 | This is a basic implementation - for complex CSS selectors,
MEDIUM | crawl4ai/processors/pdf/__init__.py | 137 | # For simple cases, you can use the sync version
MEDIUM | docs/examples/undetected_simple_demo.py | 88 | # Test URLs - you can change these
Dead Code: 2 hits · 4 pts
Severity | File | Line | Snippet
MEDIUM | crawl4ai/deep_crawling/crazy.py | 96
MEDIUM | crawl4ai/legacy/web_crawler.py | 80