unclecode/crawl4ai

Adjusted Score: 30.4
Raw Score: 30.4
Time Factor: 100%
Last Push: 2026-07-11
Stars: 72.6K
Language: Python
Lines of Code: 314.0K
Files: 801
Pattern Hits: 5.7K
Scan Date: 2026-07-14
HC Hit Rate: 0.42

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
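
The two derived metrics described above can be sketched in a few lines. This is a minimal illustration built only from the definitions in this section, not the scanner's actual code: the function names, the parameter names, and the 5,000-point example input are hypothetical, and the time factor is passed in as a fraction rather than computed from a push date, since the interpolation between the 5% and 100% cases is not described here.

```python
def adjusted_score(raw_points: float, lines_of_code: int, time_factor: float) -> float:
    """Points per 1,000 lines of code, scaled by the temporal discount.

    time_factor is a fraction in [0.0, 1.0], e.g. 0.05 for the 5% case.
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    if not 0.0 <= time_factor <= 1.0:
        raise ValueError("time_factor must be between 0.0 and 1.0")
    per_kloc = raw_points / (lines_of_code / 1000)
    return per_kloc * time_factor


def hc_hit_rate(high_hits: int, critical_hits: int, files: int) -> float:
    """High+Critical pattern hits per analysed file."""
    if files <= 0:
        raise ValueError("files must be positive")
    return (high_hits + critical_hits) / files


# Hypothetical repository: 5,000 weighted points over 100,000 lines.
print(adjusted_score(5000, 100_000, 1.0))   # full signal
print(adjusted_score(5000, 100_000, 0.05))  # pre-Nov-2022 discount

# Severity counts and file count as reported on this page.
print(round(hc_hit_rate(339, 1, 801), 2))   # 0.42
```

With the HIGH and CRITICAL counts and the file count reported on this page, the second function reproduces the 0.42 HC Hit Rate shown in the summary.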

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 1 · HIGH 339 · MEDIUM 1201 · LOW 4111
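
To make the coefficients concrete, the sketch below applies them to the hit counts listed above. It computes only the severity-weighted base sum; the context and density multipliers described under Pattern Findings are not applied, so the result is not expected to match any score shown on this page. The names `SEVERITY_COEFFICIENTS` and `base_points` are illustrative, not taken from the scanner.

```python
# Coefficients as stated in the Severity Breakdown description.
SEVERITY_COEFFICIENTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def base_points(hit_counts: dict[str, int]) -> int:
    """Severity-weighted sum of hits, before any context or density multiplier."""
    return sum(SEVERITY_COEFFICIENTS[severity] * count
               for severity, count in hit_counts.items())


# Hit counts reported for this repository.
counts = {"CRITICAL": 1, "HIGH": 339, "MEDIUM": 1201, "LOW": 4111}

print(sum(counts.values()))  # 5652 total hits
print(base_points(counts))   # 8218 base points
```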

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.
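
A per-directory breakdown of this kind can be approximated by grouping scored hits on the first path component. The sketch below is an assumption about how such a grouping might work, not the scanner's implementation: the `(root)` bucket for files with no parent directory and the point values in the example are hypothetical, while the file paths are taken from the findings tables.

```python
from collections import Counter
from pathlib import PurePosixPath


def top_level_directory(path: str) -> str:
    """First path component, or '(root)' for files at the repository root."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "(root)"


def points_by_directory(scored_hits: list[tuple[str, float]]) -> dict[str, float]:
    """Sum points per top-level directory, highest total first."""
    totals: Counter = Counter()
    for path, points in scored_hits:
        totals[top_level_directory(path)] += points
    return dict(totals.most_common())


# Paths appear in this report; the point values are made up for illustration.
example_hits = [
    ("crawl4ai/utils.py", 3.0),
    ("crawl4ai/cli.py", 1.0),
    ("deploy/docker/server.py", 4.5),
    ("setup.py", 1.0),
]
print(points_by_directory(example_hits))
```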

Pattern Findings

The scanner identified 5652 distinct pattern matches across 24 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
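
The weighting rules in the paragraph above can be sketched as follows. This is an illustrative reading under stated assumptions, not the scanner's code: a cluster is taken to mean three or more hits in the same file whose line numbers span fewer than 10 lines, the `Hit` class and all function names are hypothetical, and only the two multipliers named above are modelled. Any other adjustments the scanner applies are outside this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_COEFFICIENTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_WEIGHT = 1.5    # comment-context matches
CLUSTER_WEIGHT = 1.5    # density multiplier for clustered matches
CLUSTER_MIN_HITS = 3    # "three or more patterns"
CLUSTER_WINDOW = 10     # "within a 10-line window" (assumed: span < 10 lines)


@dataclass(frozen=True)
class Hit:
    severity: str
    file: str
    line: int
    context: str  # "CODE", "COMMENT", or "STRING"


def clustered_indices(hits: list[Hit]) -> set[int]:
    """Indices of hits that share a window with at least two other hits."""
    by_file: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for index, hit in enumerate(hits):
        by_file[hit.file].append((hit.line, index))

    clustered: set[int] = set()
    for entries in by_file.values():
        entries.sort()
        start = 0
        for end in range(len(entries)):
            # Shrink the window until it spans fewer than CLUSTER_WINDOW lines.
            while entries[end][0] - entries[start][0] >= CLUSTER_WINDOW:
                start += 1
            if end - start + 1 >= CLUSTER_MIN_HITS:
                clustered.update(i for _, i in entries[start:end + 1])
    return clustered


def score_hit(hit: Hit, in_cluster: bool) -> float:
    """Severity coefficient times the comment and cluster multipliers."""
    points = float(SEVERITY_COEFFICIENTS[hit.severity])
    if hit.context == "COMMENT":
        points *= COMMENT_WEIGHT
    if in_cluster:
        points *= CLUSTER_WEIGHT
    return points


def total_points(hits: list[Hit]) -> float:
    clustered = clustered_indices(hits)
    return sum(score_hit(hit, i in clustered) for i, hit in enumerate(hits))


# Hypothetical hits: three clustered comments, one isolated comment, one code hit.
example = [
    Hit("MEDIUM", "a.py", 10, "COMMENT"),
    Hit("MEDIUM", "a.py", 12, "COMMENT"),
    Hit("MEDIUM", "a.py", 15, "COMMENT"),
    Hit("MEDIUM", "a.py", 200, "COMMENT"),
    Hit("LOW", "b.py", 11, "CODE"),
]
print(total_points(example))  # 17.5
```

In the example, each clustered MEDIUM comment scores 2 × 1.5 × 1.5 = 4.5, the isolated MEDIUM comment scores 3.0, and the LOW code hit scores 1.0.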

Decorative Section Separators: 861 hits · 2926 pts

Severity	File	Line	Snippet	Context
MEDIUM	crawl4ai/antibot_detector.py	22	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	25	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	69	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	73	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	100	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	103	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	114	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	crawl4ai/antibot_detector.py	116	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	crawl4ai/domain_mapper.py	502	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM⚡	crawl4ai/domain_mapper.py	504	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM⚡	crawl4ai/domain_mapper.py	945	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM⚡	crawl4ai/domain_mapper.py	947	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	59	# ──────────────────────────────────────────────────────────────── constants	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	109	# ──────────────────────────────────────────────────────────────── dataclass	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	120	# ──────────────────────────────────────────────────────────────── class	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	167	# ──────────────────────── lifecycle	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	183	# ──────────────────────── logging	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	191	# ──────────────────────── seeder composition	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	202	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	204	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	249	# ── Phase 1: Host Discovery ──────────────────────────────────────	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	259	# ── Phase 2: Per-Host Scanning ───────────────────────────────────	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	276	# ── Phase 3: Post-Processing ─────────────────────────────────────	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	301	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	303	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	620	# ──────────────────────── Soft-404 Detection	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	669	# ──────────────────────── robots.txt	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	711	# ──────────────────────── Sitemaps	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	753	# ──────────────────────── Path Probing	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	816	# ──────────────────────── Feed Discovery	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	900	# ──────────────────────── Homepage Link Extraction	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	1087	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/domain_mapper.py	1089	# ════════════════════════════════════════════════════════════════════════	COMMENT
MEDIUM	crawl4ai/utils.py	3166	# ── build signature ───────────────────────────────────────────	COMMENT
MEDIUM	crawl4ai/utils.py	3172	# ── first seen? keep – else drop ─────────────	COMMENT
MEDIUM⚡	crawl4ai/browser_profiler.py	554	# 1. ── Start the browser ─────────────────────────────────────────	COMMENT
MEDIUM⚡	crawl4ai/browser_profiler.py	557	# 2. ── Attach Playwright to that running Chrome ──────────────────	COMMENT
MEDIUM⚡	crawl4ai/browser_profiler.py	586	# 3. ── Persist storage state before we kill Chrome ─────────────	COMMENT
MEDIUM⚡	crawl4ai/browser_profiler.py	594	# 4. ── Close everything cleanly ──────────────────────────────────	COMMENT
MEDIUM	crawl4ai/async_crawler_strategy.py	780	# ──────────────────────────────────────────────────────────────	STRING
MEDIUM	crawl4ai/async_crawler_strategy.py	785	# ──────────────────────────────────────────────────────────────	STRING
MEDIUM	crawl4ai/async_configs.py	159	# ───────────────────────── untrusted-input trust boundary ─────────────────────────	COMMENT
MEDIUM⚡	crawl4ai/async_url_seeder.py	362	# ─────────────────────────────── discovery entry	COMMENT
MEDIUM⚡	crawl4ai/async_url_seeder.py	828	# ─────────────────────────────── CC	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	63	# ────────────────────────────────────────────────────────────────────────── consts	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	78	# ────────────────────────────────────────────────────────────────────────── helpers	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	258	# ────────────────────────────────────────────────────────────────────────── class	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	322	# ───────── cache dirs ─────────	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	338	# ───────── cache helpers ─────────	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	884	# ─────────────────────────────── Sitemaps	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	1280	# ─────────────────────────────── validate helpers	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	1465	# ─────────────────────────────── BM25 scoring helpers	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	1749	# ─────────────────────────────── cleanup methods	COMMENT
MEDIUM	crawl4ai/async_url_seeder.py	1765	# ─────────────────────────────── index helper	COMMENT
MEDIUM	crawl4ai/browser_manager.py	547	# ── 1. cookies ────────────────────────────────────────────────────────────	COMMENT
MEDIUM	crawl4ai/browser_manager.py	552	# ── 2. localStorage / sessionStorage ──────────────────────────────────────	COMMENT
MEDIUM	crawl4ai/browser_manager.py	565	# ── 3. runtime-mutable extras from configs ────────────────────────────────	COMMENT
MEDIUM	crawl4ai/browser_manager.py	811	# ── Persistent context via Playwright's native API ──────────────	COMMENT
MEDIUM	crawl4ai/legacy/llmtxt.py	308	# -----------------------------------------------------	STRING
MEDIUM⚡	deploy/docker/server.py	80	# ── internal imports (after sys.path append) ─────────────────	COMMENT
801 more matches not shown…

Hyper-Verbose Identifiers: 1685 hits · 1644 pts

Severity	File	Line	Snippet	Context
LOW	PROGRESSIVE_CRAWLING.md	174	def generate_synthetic_dataset(domain_url):	CODE
LOW⚡	test_webhook_implementation.py	43	def test_webhook_service_init():	CODE
LOW	test_webhook_implementation.py	93	def test_webhook_config_model():	CODE
LOW	test_webhook_implementation.py	145	def test_payload_construction():	CODE
LOW	test_llm_webhook_feature.py	15	def test_llm_job_payload_model():	CODE
LOW	test_llm_webhook_feature.py	65	def test_handle_llm_request_signature():	CODE
LOW	test_llm_webhook_feature.py	101	def test_process_llm_extraction_signature():	CODE
LOW	test_llm_webhook_feature.py	136	def test_webhook_integration_in_api():	CODE
LOW	test_llm_webhook_feature.py	187	def test_job_endpoint_integration():	CODE
LOW	test_llm_webhook_feature.py	239	def test_create_new_task_integration():	CODE
LOW	crawl4ai/antibot_detector.py	138	def _structural_integrity_check(html: str) -> Tuple[bool, str]:	CODE
LOW	crawl4ai/adaptive_crawler.py	259	def _embedding_llm_config_dict(self) -> Optional[Dict]:	CODE
LOW	crawl4ai/adaptive_crawler.py	632	def _get_embedding_llm_config_dict(self) -> Optional[Dict]:	CODE
LOW	crawl4ai/adaptive_crawler.py	642	def _get_query_llm_config_dict(self) -> Optional[Dict]:	CODE
LOW	crawl4ai/adaptive_crawler.py	708	def _get_cached_distance_matrix(self, query_embeddings: Any, kb_embeddings: Any) -> Any:	CODE
LOW	crawl4ai/adaptive_crawler.py	871	async def select_links_for_expansion(	CODE
LOW	crawl4ai/adaptive_crawler.py	1808	def _crawl_result_to_export_dict(self, result) -> Dict[str, Any]:	CODE
LOW	crawl4ai/adaptive_crawler.py	1880	def _import_dict_to_crawl_result(self, data: Dict[str, Any]):	CODE
LOW⚡	crawl4ai/extraction_strategy.py	2188	def _make_context_sensitive_xpath(self, xpath, element):	CODE
LOW⚡	crawl4ai/extraction_strategy.py	2240	def _fallback_class_id_search(self, element, selector_str):	CODE
LOW	crawl4ai/extraction_strategy.py	279	def filter_documents_embeddings(	CODE
LOW	crawl4ai/extraction_strategy.py	415	def filter_clusters_by_word_count(	CODE
LOW	crawl4ai/extraction_strategy.py	2109	def _create_selector_function(self, selector_str):	CODE
LOW	crawl4ai/extraction_strategy.py	2211	def _handle_nth_child_selector(self, element, selector_str):	CODE
LOW	crawl4ai/browser_adapter.py	43	async def retrieve_console_messages(self, page: Page) -> List[Dict]:	CODE
LOW	crawl4ai/browser_adapter.py	133	async def retrieve_console_messages(self, page: Page) -> List[Dict]:	CODE
LOW	crawl4ai/browser_adapter.py	159	def _check_stealth_availability(self) -> bool:	CODE
LOW	crawl4ai/browser_adapter.py	253	async def retrieve_console_messages(self, page: Page) -> List[Dict]:	CODE
LOW	crawl4ai/browser_adapter.py	372	async def retrieve_console_messages(self, page: UndetectedPage) -> List[Dict]:	STRING
LOW	crawl4ai/__init__.py	211	# def is_sync_version_installed():	COMMENT
LOW	crawl4ai/markdown_generation_strategy.py	82	def convert_links_to_citations(	CODE
LOW	crawl4ai/content_scraping_strategy.py	380	def find_closest_parent_with_useful_text(	CODE
LOW	crawl4ai/content_scraping_strategy.py	517	def remove_empty_elements_fast(self, root, word_count_threshold=5):	CODE
LOW	crawl4ai/content_scraping_strategy.py	577	def remove_unwanted_attributes_fast(	CODE
LOW	crawl4ai/user_agent_generator.py	344	def generate_with_client_hints(self, **kwargs) -> Tuple[str, str]:	CODE
LOW⚡	crawl4ai/cli.py	467	def delete_profile_interactive(profiler: BrowserProfiler):	CODE
LOW	crawl4ai/cli.py	444	async def create_profile_interactive(profiler: BrowserProfiler):	CODE
LOW⚡	crawl4ai/utils.py	1143	def get_content_of_website_optimized(	CODE
LOW⚡	crawl4ai/utils.py	1195	def find_closest_parent_with_useful_text(tag):	CODE
LOW⚡	crawl4ai/utils.py	3229	def start_colab_display_server():	CODE
LOW	crawl4ai/utils.py	534	def calculate_semaphore_count():	CODE
LOW	crawl4ai/utils.py	707	def split_and_parse_json_objects(json_string):	CODE
LOW	crawl4ai/utils.py	981	def replace_pre_tags_with_text(node):	CODE
LOW	crawl4ai/utils.py	1020	def remove_empty_and_low_word_count_elements(node, word_count_threshold):	CODE
LOW	crawl4ai/utils.py	1242	def score_image_for_usefulness(img, base_url, index, images_count):	CODE
LOW	crawl4ai/utils.py	1497	def extract_metadata_using_lxml(html, doc=None):	CODE
LOW	crawl4ai/utils.py	1742	def perform_completion_with_backoff(	CODE
LOW	crawl4ai/utils.py	1834	async def aperform_completion_with_backoff(	CODE
LOW	crawl4ai/utils.py	2040	def merge_chunks_based_on_token_threshold(chunks, token_threshold):	CODE
LOW	crawl4ai/utils.py	2317	def normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None):	CODE
LOW	crawl4ai/utils.py	2376	def efficient_normalize_url_for_deep_crawl(href, base_url, preserve_https=False, original_scheme=None):	CODE
LOW	crawl4ai/utils.py	2928	def configure_windows_event_loop():	CODE
LOW	crawl4ai/utils.py	3084	def preprocess_html_for_schema(html_content, text_threshold=100, attr_value_threshold=200, max_size=100000):	CODE
LOW	crawl4ai/utils.py	3344	def calculate_link_intrinsic_score(	STRING
LOW	crawl4ai/utils.py	3620	def get_true_available_memory_gb() -> float:	STRING
LOW	crawl4ai/utils.py	3662	def get_true_memory_usage_percent() -> float:	STRING
LOW	crawl4ai/browser_profiler.py	960	async def launch_standalone_browser(self,	CODE
LOW	crawl4ai/browser_profiler.py	1355	async def get_builtin_browser_status(self) -> Dict[str, Any]:	CODE
LOW⚡	crawl4ai/async_crawler_strategy.py	679	async def handle_request_failed_capture(request):	STRING
LOW⚡	crawl4ai/async_crawler_strategy.py	1666	async def _capture_console_messages(	STRING
1625 more matches not shown…

Cross-File Repetition: 231 hits · 1155 pts

Severity	File	Line	Snippet	Context
HIGH	CHANGELOG.md	0	const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }	STRING
HIGH	docs/md_v2/advanced/file-downloading.md	0	const downloadlink = document.queryselector('a[href$=".exe"]'); if (downloadlink) { downloadlink.click(); }	STRING
HIGH	crawl4ai/proxy_strategy.py	0	configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona	STRING
HIGH	crawl4ai/async_configs.py	0	configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	configuration class for a single proxy. args: server: proxy server url (e.g., "http://127.0.0.1:8080") username: optiona	STRING
HIGH	crawl4ai/proxy_strategy.py	0	load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str	STRING
HIGH	crawl4ai/async_configs.py	0	load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	load proxies from environment variable. args: env_var: name of environment variable containing comma-separated proxy str	STRING
HIGH	crawl4ai/proxy_strategy.py	0	create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd	STRING
HIGH	crawl4ai/async_configs.py	0	create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	create a copy of this configuration with updated values. args: **kwargs: key-value pairs of configuration options to upd	STRING
HIGH	crawl4ai/async_crawler_strategy.py	0	const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig	STRING
HIGH	crawl4ai/browser_manager.py	0	const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig	STRING
HIGH	tests/browser/test_init_script_dedup.py	0	const _origattachshadow = element.prototype.attachshadow; element.prototype.attachshadow = function(init) { return _orig	STRING
HIGH	crawl4ai/async_configs.py	0	recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.	STRING
HIGH	tests/docker/test_serialization.py	0	recursively convert an object to a serializable dictionary using {type, params} structure for complex objects.	STRING
HIGH	crawl4ai/deep_crawling/bfs_strategy.py	0	batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.	STRING
HIGH	crawl4ai/deep_crawling/base_strategy.py	0	batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	batch (non-streaming) mode: processes one bfs level at a time, then yields all the results.	STRING
HIGH	crawl4ai/deep_crawling/bfs_strategy.py	0	streaming mode: processes one bfs level at a time and yields results immediately as they arrive.	STRING
HIGH	crawl4ai/deep_crawling/base_strategy.py	0	streaming mode: processes one bfs level at a time and yields results immediately as they arrive.	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	streaming mode: processes one bfs level at a time and yields results immediately as they arrive.	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	docs/md_v2/complete-sdk-reference.md	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	docs/md_v2/core/quickstart.md	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	docs/examples/quickstart_examples_set_2.py	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	docs/examples/llm_extraction_openai_pricing.py	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	docs/examples/quickstart.py	0	from the crawled content, extract all mentioned model names along with their fees for input and output tokens. do not mi	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	docs/md_v2/complete-sdk-reference.md	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	docs/md_v2/core/quickstart.md	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	docs/examples/quickstart_examples_set_2.py	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	docs/examples/quickstart.py	0	(async () => { const tabs = document.queryselectorall("section.charge-methodology .tabs-menu-3 > div"); for(let tab of t	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();	STRING
HIGH	tests/async/test_edge_cases.py	0	const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();	STRING
HIGH	docs/examples/quickstart_examples_set_2.py	0	const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();	STRING
HIGH	docs/examples/quickstart.py	0	const button = document.queryselector('a[data-testid="pagination-next-button"]'); if (button) button.click();	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	(async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu	STRING
HIGH	docs/examples/quickstart_examples_set_2.py	0	(async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu	STRING
HIGH	docs/examples/quickstart.py	0	(async () => { const getcurrentcommit = () => { const commits = document.queryselectorall('li.box-sc-g0xbh4-0 h4'); retu	STRING
HIGH	deploy/docker/c4ai-code-context.md	0	part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.	STRING
HIGH	docs/examples/deepcrawl_example.py	0	part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.	STRING
HIGH	docs/examples/docker_config_obj.py	0	part 6: wrap-up and key takeaways summarize the key concepts learned in this tutorial.	STRING
HIGH	deploy/docker/README.md	0	test the /crawl/stream endpoint with multiple urls.	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	test the /crawl/stream endpoint with multiple urls.	STRING
HIGH	tests/docker/test_server_token.py	0	test the /crawl/stream endpoint with multiple urls.	STRING
HIGH	docs/md_v2/core/self-hosting.md	0	test the /crawl/stream endpoint with multiple urls.	STRING
HIGH	docs/examples/docker_python_rest_api.py	0	test the /crawl/stream endpoint with multiple urls.	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	focus on extracting the core educational content. include: - key concepts and explanations - important code examples - e	STRING
HIGH	docs/md_v2/complete-sdk-reference.md	0	focus on extracting the core educational content. include: - key concepts and explanations - important code examples - e	STRING
HIGH	docs/md_v2/core/browser-crawler-config.md	0	focus on extracting the core educational content. include: - key concepts and explanations - important code examples - e	STRING
HIGH	docs/md_v2/core/markdown-generation.md	0	focus on extracting the core educational content. include: - key concepts and explanations - important code examples - e	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	extract the main educational content while preserving its original wording and substance completely. 1. maintain the exa	STRING
HIGH	docs/md_v2/complete-sdk-reference.md	0	extract the main educational content while preserving its original wording and substance completely. 1. maintain the exa	STRING
HIGH	docs/md_v2/core/markdown-generation.md	0	extract the main educational content while preserving its original wording and substance completely. 1. maintain the exa	STRING
HIGH	deploy/docker/c4ai-doc-context.md	0	focus on extracting specific types of content: - technical documentation - code examples - api references reformat the c	STRING
171 more matches not shown…

Excessive Try-Catch Wrapping: 1033 hits · 1052 pts

Severity	File	Line	Snippet	Context
LOW	setup.py	40	except Exception:	CODE
LOW⚡	test_webhook_implementation.py	29	except Exception as e:	CODE
LOW⚡	test_webhook_implementation.py	37	except Exception as e:	CODE
LOW	test_webhook_implementation.py	87	except Exception as e:	CODE
LOW	test_webhook_implementation.py	139	except Exception as e:	CODE
LOW	test_webhook_implementation.py	200	except Exception as e:	CODE
LOW	test_webhook_implementation.py	229	except Exception as e:	CODE
LOW	test_webhook_implementation.py	264	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	59	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	95	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	130	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	181	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	233	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	279	except Exception as e:	CODE
LOW	test_llm_webhook_feature.py	346	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	82	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	110	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	164	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	185	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	217	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	383	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	425	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	470	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	511	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	582	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	600	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	617	except Exception as e:	CODE
LOW	crawl4ai/async_database.py	633	except Exception as e:	CODE
LOW	crawl4ai/ssl_certificate.py	124	except Exception as e:	CODE
MEDIUM	crawl4ai/ssl_certificate.py	125	print(f"Error fetching/processing certificate for {url}: {e}")	CODE
LOW	crawl4ai/ssl_certificate.py	183	except Exception as e:	CODE
MEDIUM	crawl4ai/ssl_certificate.py	184	print(f"Error converting to PEM: {e}")	CODE
LOW	crawl4ai/ssl_certificate.py	196	except Exception as e:	CODE
MEDIUM	crawl4ai/ssl_certificate.py	197	print(f"Error converting to DER: {e}")	CODE
LOW	crawl4ai/proxy_strategy.py	45	except Exception:	CODE
LOW	crawl4ai/proxy_strategy.py	96	except Exception as e:	CODE
MEDIUM	crawl4ai/proxy_strategy.py	97	print(f"Error loading proxies from environment: {e}")	CODE
MEDIUM	crawl4ai/adaptive_crawler.py	1503	print(f"Error crawling {url}: {e}")	CODE
MEDIUM	crawl4ai/adaptive_crawler.py	1525	print(f"Error in batch crawl: {result}")	CODE
LOW	crawl4ai/adaptive_crawler.py	825	except Exception:	CODE
LOW	crawl4ai/adaptive_crawler.py	1502	except Exception as e:	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2077	print(f"Error parsing HTML, falling back to alternative method: {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2175	print(f"Error applying selector '{selector_str}': {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2183	print(f"Error compiling selector '{selector_str}': {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2236	print(f"Error handling nth-child selector: {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2305	print(f"Error serializing HTML: {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2314	print(f"Error getting attribute '{attribute}': {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2400	print(f"Error applying selector '{selector_str}': {e}")	CODE
MEDIUM⚡	crawl4ai/extraction_strategy.py	2406	print(f"Error compiling selector '{selector_str}': {e}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	830	print(f"Error in thread execution: {e}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	1006	print(f"Error in async extraction: {result}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	1175	print(f"Error extracting field {field['name']}: {str(e)}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	1307	print(f"Error computing field {field['name']}: {str(e)}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	2263	print(f"Error in fallback class/id search: {e}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	2292	print(f"Error extracting text: {e}")	CODE
MEDIUM	crawl4ai/extraction_strategy.py	1148	def _extract_field(self, element, field):	CODE
MEDIUM	crawl4ai/extraction_strategy.py	1293	def _compute_field(self, item, field):	CODE
MEDIUM	crawl4ai/extraction_strategy.py	2357	def select_func(element):	CODE
LOW⚡	crawl4ai/extraction_strategy.py	2075	except Exception as e:	CODE
LOW⚡	crawl4ai/extraction_strategy.py	2080	except Exception as e2:	CODE
973 more matches not shown…

Unused Imports: 607 hits · 492 pts

Severity	File	Line	Context
LOW	test_webhook_implementation.py	34	CODE
LOW	test_llm_webhook_feature.py	23	CODE
LOW	test_llm_webhook_feature.py	24	CODE
LOW	crawl4ai/async_database.py	15	CODE
LOW	crawl4ai/adaptive_crawler.py	13	CODE
LOW	crawl4ai/adaptive_crawler.py	14	CODE
LOW	crawl4ai/adaptive_crawler.py	17	CODE
LOW	crawl4ai/adaptive_crawler.py	878	CODE
LOW	crawl4ai/link_preview.py	8	CODE
LOW	crawl4ai/extraction_strategy.py	18	CODE
LOW	crawl4ai/extraction_strategy.py	29	CODE
LOW	crawl4ai/extraction_strategy.py	33	CODE
LOW	crawl4ai/browser_adapter.py	10	CODE
LOW	crawl4ai/domain_mapper.py	16	CODE
LOW	crawl4ai/domain_mapper.py	22	CODE
LOW	crawl4ai/domain_mapper.py	24	CODE
LOW	crawl4ai/domain_mapper.py	26	CODE
LOW	crawl4ai/domain_mapper.py	27	CODE
LOW	crawl4ai/domain_mapper.py	30	CODE
LOW	crawl4ai/domain_mapper.py	48	CODE
LOW	crawl4ai/domain_mapper.py	41	CODE
LOW	crawl4ai/domain_mapper.py	57	CODE
LOW	crawl4ai/__init__.py	4	CODE
LOW	crawl4ai/__init__.py	4	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	6	CODE
LOW	crawl4ai/__init__.py	8	CODE
LOW	crawl4ai/__init__.py	8	CODE
LOW	crawl4ai/__init__.py	8	CODE
LOW	crawl4ai/__init__.py	13	CODE
LOW	crawl4ai/__init__.py	14	CODE
LOW	crawl4ai/__init__.py	14	CODE
LOW	crawl4ai/__init__.py	18	CODE
LOW	crawl4ai/__init__.py	18	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	22	CODE
LOW	crawl4ai/__init__.py	31	CODE
LOW	crawl4ai/__init__.py	31	CODE
LOW	crawl4ai/__init__.py	32	CODE
LOW	crawl4ai/__init__.py	33	CODE
LOW	crawl4ai/__init__.py	33	CODE
LOW	crawl4ai/__init__.py	33	CODE
LOW	crawl4ai/__init__.py	33	CODE
LOW	crawl4ai/__init__.py	39	CODE
LOW	crawl4ai/__init__.py	39	CODE
LOW	crawl4ai/__init__.py	39	CODE
547 more matches not shown…
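
Several rows above share a file and line number (`crawl4ai/__init__.py` line 6 appears eleven times), which would be consistent with one hit per imported name rather than one per import statement. The report does not describe how the scanner decides that an import is unused, so the following is only a minimal sketch of one common approach, assuming a single-file AST check. The module and names in `sample` (`.models`, `Alpha`, `Beta`) are hypothetical.

```python
import ast

def unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced in `source`."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # `import a.b` binds `a`; `import a as x` binds `x`
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
    # Every bare identifier that appears anywhere in the module
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)

# Hypothetical package __init__ that imports two names only to re-export them.
sample = "from .models import Alpha, Beta\nimport os\nprint(os.getcwd())\n"
hits = unused_imports(sample)
```

A check of this kind reports both `Alpha` and `Beta` on line 1, because it cannot see that other modules import them from the package. If the scanner works similarly, re-exports in `__init__.py` would be counted as unused even when they are intentional.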

Self-Referential Comments · 152 hits · 464 pts

Severity	File	Line	Snippet	Context
MEDIUM	crawl4ai/async_database.py	677	# Create a singleton instance	COMMENT
MEDIUM	crawl4ai/ssl_certificate.py	90	# Create the dictionary directly	COMMENT
MEDIUM	crawl4ai/adaptive_crawler.py	133	# Create a mock object that has the minimal interface we need	COMMENT
MEDIUM	crawl4ai/link_preview.py	212	# Create a wrapper to track progress	COMMENT
MEDIUM	crawl4ai/link_preview.py	241	# Create a custom progress tracking version	COMMENT
MEDIUM	crawl4ai/extraction_strategy.py	2125	# Create the wrapper function that implements the selection strategy	COMMENT
MEDIUM	crawl4ai/extraction_strategy.py	2131	# Create a cache key based on element and selector	COMMENT
MEDIUM	crawl4ai/extraction_strategy.py	2356	# Create a function that will apply this selector appropriately	COMMENT
MEDIUM	crawl4ai/content_scraping_strategy.py	919	# Create a config object for LinkPreview	COMMENT
MEDIUM⚡	crawl4ai/cli.py	456	# Create the profile	COMMENT
MEDIUM	crawl4ai/cli.py	361	# Create a profile and use it for crawling	COMMENT
MEDIUM	crawl4ai/utils.py	516	# Create the box with colored borders and lighter text	COMMENT
MEDIUM	crawl4ai/utils.py	980	# Create a function that replace content of all"pre" tag with its inner text	COMMENT
MEDIUM	crawl4ai/utils.py	3186	# # Create a signature based on tag and classes	COMMENT
MEDIUM	crawl4ai/browser_profiler.py	158	# Create a logger if not provided	COMMENT
MEDIUM	crawl4ai/browser_profiler.py	995	# Create a temporary profile directory	COMMENT
MEDIUM	crawl4ai/browser_profiler.py	1200	# Create a user data directory for the builtin browser	COMMENT
MEDIUM	crawl4ai/browser_profiler.py	1382	# Create a new profile	COMMENT
MEDIUM	crawl4ai/browser_profiler.py	408	# Create a profile interactively	STRING
MEDIUM	crawl4ai/browser_profiler.py	829	# Define a custom crawl function	STRING
MEDIUM	crawl4ai/async_crawler_strategy.py	1643	# Create a new CDP session	STRING
MEDIUM	crawl4ai/model_loader.py	226	# Create the models directory if it doesn't exist	COMMENT
MEDIUM	crawl4ai/async_configs.py	1012	# Create a funciton returns dict of the object	COMMENT
MEDIUM	crawl4ai/async_configs.py	2073	# Create a funciton returns dict of the object	COMMENT
MEDIUM	crawl4ai/async_configs.py	2201	# Create a new config with streaming enabled	STRING
MEDIUM	crawl4ai/async_configs.py	2204	# Create a new config with multiple updates	STRING
MEDIUM	crawl4ai/async_url_seeder.py	1240	# Create a bounded queue for results to prevent RAM issues	COMMENT
MEDIUM⚡	crawl4ai/browser_manager.py	1694	# Create a new page from the chosen context	COMMENT
MEDIUM	crawl4ai/browser_manager.py	482	# Create a BrowserProfiler instance and delegate to it	COMMENT
MEDIUM	crawl4ai/browser_manager.py	505	# Create a BrowserProfiler instance and delegate to it	COMMENT
MEDIUM	crawl4ai/browser_manager.py	528	# Create a BrowserProfiler instance and delegate to it	COMMENT
MEDIUM	crawl4ai/chunking_strategy.py	7	# Define the abstract base class for chunking strategies	COMMENT
MEDIUM	crawl4ai/chunking_strategy.py	27	# Create an identity chunking strategy f(x) = [x]	COMMENT
MEDIUM	crawl4ai/async_webcrawler.py	266	# Initialize processing variables	COMMENT
MEDIUM	crawl4ai/async_webcrawler.py	829	# Define the source selection logic using dict dispatch	COMMENT
MEDIUM	crawl4ai/deep_crawling/bfs_strategy.py	51	# Create a new logger if logger is None, dict, or any other non-Logger type	COMMENT
MEDIUM	crawl4ai/deep_crawling/bff_strategy.py	62	# Create a new logger if logger is None, dict, or any other non-Logger type	COMMENT
MEDIUM	crawl4ai/js_snippet/__init__.py	4	# Create a function get name of a js script, then load from the CURRENT folder of this script and return its content as	COMMENT
MEDIUM	crawl4ai/components/crawler_monitor.py	167	# Create the status text	COMMENT
MEDIUM	crawl4ai/components/crawler_monitor.py	180	# Create a table for status counts	COMMENT
MEDIUM	crawl4ai/components/crawler_monitor.py	261	# Create a table for task details	COMMENT
MEDIUM	crawl4ai/components/crawler_monitor.py	374	# Create a more visible footer panel	COMMENT
MEDIUM⚡	tests/test_pyopenssl_security_fix.py	75	# Create a basic SSL context to verify functionality	COMMENT
MEDIUM	tests/test_raw_html_redirected_url.py	16	# Create a dummy decorator	COMMENT
MEDIUM	tests/test_raw_html_redirected_url.py	51	# Create a large HTML (100KB+)	COMMENT
MEDIUM⚡	tests/test_raw_html_edge_cases.py	259	# Create a temp file	COMMENT
MEDIUM	tests/docker/test_config_object.py	54	# Create the config	COMMENT
MEDIUM	tests/memory/test_dispatcher_stress.py	36	# Create a memory restrictor to simulate limited memory environment	COMMENT
MEDIUM	tests/memory/test_dispatcher_stress.py	330	# Create a nightmare scenario - multiple overlapping spikes	COMMENT
MEDIUM	tests/memory/benchmark_report.py	230	# Create the plot	COMMENT
MEDIUM	tests/memory/benchmark_report.py	307	# Create the plot	COMMENT
MEDIUM	tests/memory/benchmark_report.py	871	# Create the benchmark reporter	STRING
MEDIUM	tests/proxy/test_proxy_config.py	477	# Create a large list of proxy strings	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	27	# Create a fresh browser config and crawler instance for this test	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	32	# Create a fresh crawler instance	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	89	# Create a fresh browser config and crawler instance for this test	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	94	# Create a fresh crawler instance	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	129	# Create a fresh browser config and crawler instance for this test	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	134	# Create a fresh crawler instance	COMMENT
MEDIUM⚡	tests/general/test_mhtml.py	167	# Create a fresh browser config and crawler instance for this test	COMMENT
92 more matches not shown…

Structural Annotation Overuse · 249 hits · 400 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/cache_validator.py	112	# Step 1: Try HEAD request with conditional headers	COMMENT
LOW	crawl4ai/cache_validator.py	156	# Step 2: No conditional headers available, try fingerprint only	COMMENT
LOW	crawl4ai/cache_validator.py	180	# Step 3: No validation data available	COMMENT
LOW	crawl4ai/utils.py	2275	# IMPORTANT: Don't use quote(unquote()) as it mangles + signs in URLs	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	914	# Step 1: Find sitemap URL and get lastmod (needed for validation)	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	938	# Step 2: Check cache validity (skip if force=True)	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	993	# Step 4: Write to cache (FALLBACK: if write fails, URLs still yielded above)	COMMENT
LOW	crawl4ai/async_url_seeder.py	952	# Step 3: Fetch fresh URLs	COMMENT
LOW⚡	crawl4ai/cloud/cli.py	253	# Step 1: Shrink (unless --no-shrink)	COMMENT
LOW⚡	crawl4ai/cloud/cli.py	266	# Step 2: Package as tar.gz	COMMENT
LOW	crawl4ai/cloud/cli.py	281	# Step 3: Upload	COMMENT
LOW⚡	deploy/docker/c4ai-doc-context.md	2985	# Step 1: Create a pruning filter	COMMENT
LOW⚡	deploy/docker/c4ai-doc-context.md	2995	# Step 2: Insert it into a Markdown Generator	COMMENT
LOW⚡	deploy/docker/c4ai-doc-context.md	2998	# Step 3: Pass it to CrawlerRunConfig	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	3857	# Step 1: Crawl the Web URL	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	3871	# Step 2: Crawl from the Local HTML File	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	3885	# Step 3: Crawl Using Raw HTML Content	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	4554	# Step 1: Load initial Hacker News page	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	4565	# Step 2: Let's scroll and click the "More" link	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	4659	# Step 1: Load initial commits	COMMENT
LOW	deploy/docker/c4ai-doc-context.md	4674	# Step 2: For subsequent pages, we run JS to click 'Next Page' if it exists	COMMENT
LOW⚡	tests/test_webhook_feature.sh	104	# Step 1: Save current branch and fetch PR	COMMENT
LOW⚡	tests/test_webhook_feature.sh	112	# Step 2: Switch to new branch	COMMENT
LOW⚡	tests/test_webhook_feature.sh	117	# Step 3: Activate virtual environment	COMMENT
LOW⚡	tests/test_webhook_feature.sh	128	# Step 4: Install server dependencies	COMMENT
LOW	tests/test_webhook_feature.sh	147	# Step 5: Start Redis in background	COMMENT
LOW	tests/test_webhook_feature.sh	183	# Step 6: Create and run webhook test	COMMENT
LOW	tests/test_webhook_feature.sh	292	# Step 7: Verify results	COMMENT
LOW	tests/test_webhook_feature.sh	303	# Step 8: Cleanup happens automatically via trap	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	141	# Step 1: Check versions	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	147	# Step 2: Test basic crawling	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	153	# Step 3: Test stealth mode	COMMENT
LOW⚡	tests/WEBHOOK_TEST_README.md	45	#### Step 1: Branch Management	COMMENT
LOW⚡	tests/WEBHOOK_TEST_README.md	50	#### Step 2: Environment Setup	COMMENT
LOW⚡	tests/WEBHOOK_TEST_README.md	55	#### Step 3: Service Startup	COMMENT
LOW⚡	tests/WEBHOOK_TEST_README.md	60	#### Step 4: Webhook Test	COMMENT
LOW⚡	tests/WEBHOOK_TEST_README.md	66	#### Step 5: Cleanup	COMMENT
LOW⚡	tests/proxy/test_proxy_verify.py	79	# Step 1: Verify IPs	COMMENT
LOW⚡	tests/proxy/test_proxy_verify.py	86	# Step 2: Get NST proxies	COMMENT
LOW	tests/proxy/test_proxy_verify.py	97	# Step 3: Test Chanel with all available proxies	COMMENT
LOW⚡	tests/general/test_async_url_seeder_bm25.py	558	# Step 1: Discover and score URLs	COMMENT
LOW⚡	tests/general/test_async_url_seeder_bm25.py	587	# Step 3: Verify these URLs would be good for actual crawling	COMMENT
LOW	tests/general/test_async_url_seeder_bm25.py	573	# Step 2: Analyze top results	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	608	# Step 1: open all sessions	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	615	# Step 2: navigate each session to a second page	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	620	# Step 3: kill sessions one by one, verify others unaffected	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	936	# Step 1: open session	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	943	# Step 2: concurrent non-session crawls	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	952	# Step 3: kill session	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	955	# Step 4: trigger recycle	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	962	# Step 5: new session on fresh browser	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	970	# Step 6: verify it works	COMMENT
LOW⚡	tests/async/test_browser_memory.py	774	# Step 1: login — sets cookie	COMMENT
LOW⚡	tests/async/test_browser_memory.py	779	# Step 2: dashboard — cookie should carry over via session	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	52	# Step 1: Create a BrowserManager with builtin mode	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	57	# Step 2: Check if we have a BuiltinBrowserStrategy	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	69	# Step 3: Start the manager to launch or connect to builtin browser	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	78	# Step 4: Get browser info from the strategy	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	122	# Step 2: Get multiple pages	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	149	# Step 1: Get browser status	COMMENT
189 more matches not shown…
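
Most rows above are numbered `Step N:` annotations, and the `#### Step N:` Markdown headings from `tests/WEBHOOK_TEST_README.md` are listed with a COMMENT context as well. The sketch below shows one way such lines could be matched, assuming a simple line-based regular expression. It is not the scanner's actual rule, which evidently also covers other markers such as `# IMPORTANT:`. The second entry in `sample` is a hypothetical code line.

```python
import re

# Matches "# Step 3: ..." comments and "#### Step 3: ..." Markdown headings alike.
STEP_RE = re.compile(r"^\s*#+\s*Step\s+(\d+)\s*:\s*(.*)$")

def step_annotations(lines):
    """Yield (line_number, step_number, text) for each numbered step annotation."""
    for lineno, line in enumerate(lines, start=1):
        m = STEP_RE.match(line)
        if m:
            yield lineno, int(m.group(1)), m.group(2)

sample = [
    "# Step 1: Shrink (unless --no-shrink)",
    "archive = shrink(path)",  # hypothetical code line, not a comment
    "#### Step 2: Environment Setup",
    "# IMPORTANT: Don't use quote(unquote())",
]
found = list(step_annotations(sample))
```

Because the pattern accepts any run of `#` characters, it cannot tell a shell or Python comment from a Markdown heading, which may explain why documentation files appear in this table.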

Cross-Language Confusion (JS/TS) · 42 hits · 255 pts

Severity	File	Line	Snippet	Context
HIGH	docs/md_v2/ask_ai/ask-ai.js	72	print(result.markdown[:300]) # Print first 300 chars	CODE
HIGH	docs/md_v2/marketplace/frontend/app-detail.js	155	print(result.markdown)`;	CODE
HIGH⚡	docs/md_v2/marketplace/frontend/app-detail.js	172	print(result.status_code)`;	CODE
HIGH⚡	docs/md_v2/marketplace/frontend/app-detail.js	191	print(result.extracted_content)`;	CODE
HIGH	docs/md_v2/marketplace/frontend/app-detail.js	240	print(f"Found {len(products)} products")	CODE
HIGH	docs/md_v2/marketplace/frontend/app-detail.js	243	print(f"- {product['title']}: {product['price']}")	CODE
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2418	print("✅ Automation completed successfully!")	CODE
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2419	print(f"Final URL: {result.url}")	CODE
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2423	print("❌ Automation failed:", result.error_message)	CODE
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2452	print(f"💾 C4A Script saved to: {script_path}")	STRING
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2453	print("\\n📜 Generated C4A Script:")	STRING
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2454	print(C4A_SCRIPT)	STRING
HIGH⚡	…md_v2/apps/crawl4ai-assistant/content/scriptBuilder.js	2464	print("\\n💡 To execute this C4A script, compile it to JavaScript first!")	STRING
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1737	print(f"\\n✅ Successfully extracted {len(data)} items!")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1744	print("\\n📊 Sample results (first 2 items):")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1746	print(f"\\nItem {i}:")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1748	print(f" {key}: {value}")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1752	print("❌ Extraction failed:", result.error_message)	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1753	return None	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1759	print("\\n🎯 Next steps:")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1760	print("1. Install Crawl4AI: pip install crawl4ai")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1761	print("2. Modify the URL or add multiple URLs")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1762	print("3. Customize crawler options as needed")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1763	print("4. Check 'extracted_data.json' for full results")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1817	print("✅ Schema generated successfully!")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1818	print(f"📄 Schema saved to: {schema_path}")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1819	print("\\nGenerated schema:")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1820	print(json.dumps(schema, indent=2))	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1825	print(f"❌ Error generating schema: {e}")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1826	return None	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1830	print("\\n🧪 Testing extraction on live webpage...")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1837	print("❌ Schema file not found. Run generate_schema() first.")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1859	print(f"\\n✅ Successfully extracted {len(data)} items!")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1866	print("\\n📊 Sample results (first 2 items):")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1868	print(f"\\nItem {i}:")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1870	print(f" {key}: {value}")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1872	print("❌ Extraction failed:", result.error_message)	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1882	print("\\n🎯 Next steps:")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1883	print("1. Review the generated schema in 'generated_schema.json'")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1884	print("2. Uncomment the test_extraction() line to test on the live site")	CODE
HIGH⚡	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1885	print("3. Use the schema in your Crawl4AI projects!")	CODE
HIGH	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	1803	print("🔧 Generating extraction schema...")	CODE
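
Every snippet in this table is Python syntax found in a `.js` file, and some end in a backtick (for example `` print(result.markdown)`; ``), which suggests Python sample code embedded in a JavaScript template literal rather than Python mistakenly written as JavaScript. The sketch below illustrates how a line-based check that ignores string context would flag such lines. It assumes a much simpler rule than whatever the scanner actually uses, and `fetch_page` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# Idioms that are common in Python and rare in JavaScript source.
PY_IN_JS = re.compile(r"\bprint\(|\breturn\s+None\b")

def flag_python_idioms(path: str, text: str) -> list[int]:
    """Return 1-based line numbers in a .js/.ts file that contain Python idioms."""
    if PurePosixPath(path).suffix not in {".js", ".ts"}:
        return []
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if PY_IN_JS.search(line)
    ]

# A JavaScript file that stores a Python example inside a template literal.
js_source = (
    "const example = `\n"
    "result = fetch_page(url)\n"
    "print(result.markdown)`;\n"
    "console.log(example);\n"
)
flagged = flag_python_idioms("app-detail.js", js_source)
```

Here line 3 is flagged even though it is string content, not executable JavaScript. A check that tracked template-literal boundaries would be needed to separate embedded documentation samples from genuine language mix-ups.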

Deep Nesting · 264 hits · 212 pts

Severity	File	Line	Context
LOW	crawl4ai/async_database.py	102	CODE
LOW	crawl4ai/async_database.py	478	CODE
LOW	crawl4ai/ssl_certificate.py	62	CODE
LOW	crawl4ai/adaptive_crawler.py	871	CODE
LOW	crawl4ai/adaptive_crawler.py	1330	CODE
LOW	crawl4ai/adaptive_crawler.py	1570	CODE
LOW	crawl4ai/adaptive_crawler.py	1845	CODE
LOW	crawl4ai/extraction_strategy.py	641	CODE
LOW	crawl4ai/extraction_strategy.py	786	CODE
LOW	crawl4ai/extraction_strategy.py	843	CODE
LOW	crawl4ai/extraction_strategy.py	1088	CODE
LOW	crawl4ai/extraction_strategy.py	1178	CODE
LOW	crawl4ai/extraction_strategy.py	1764	CODE
LOW	crawl4ai/extraction_strategy.py	2109	CODE
LOW	crawl4ai/extraction_strategy.py	2348	CODE
LOW	crawl4ai/extraction_strategy.py	2126	CODE
LOW	crawl4ai/extraction_strategy.py	2357	CODE
LOW	crawl4ai/domain_mapper.py	305	CODE
LOW	crawl4ai/domain_mapper.py	361	CODE
LOW	crawl4ai/domain_mapper.py	471	CODE
LOW	crawl4ai/domain_mapper.py	506	CODE
LOW	crawl4ai/domain_mapper.py	671	CODE
LOW	crawl4ai/domain_mapper.py	713	CODE
LOW	crawl4ai/domain_mapper.py	853	CODE
LOW	crawl4ai/domain_mapper.py	902	CODE
LOW	crawl4ai/domain_mapper.py	477	CODE
LOW	crawl4ai/cache_validator.py	83	CODE
LOW	crawl4ai/markdown_generation_strategy.py	148	CODE
LOW	crawl4ai/hub.py	41	CODE
LOW	crawl4ai/content_scraping_strategy.py	231	CODE
LOW	crawl4ai/content_scraping_strategy.py	410	CODE
LOW	crawl4ai/content_scraping_strategy.py	517	CODE
LOW	crawl4ai/content_scraping_strategy.py	577	CODE
LOW	crawl4ai/content_scraping_strategy.py	615	CODE
LOW	crawl4ai/user_agent_generator.py	261	CODE
LOW	crawl4ai/user_agent_generator.py	299	CODE
LOW	crawl4ai/async_dispatcher.py	175	CODE
LOW	crawl4ai/async_dispatcher.py	228	CODE
LOW	crawl4ai/async_dispatcher.py	374	CODE
LOW	crawl4ai/async_dispatcher.py	471	CODE
LOW	crawl4ai/async_dispatcher.py	530	CODE
LOW	crawl4ai/async_dispatcher.py	635	CODE
LOW	crawl4ai/cli.py	110	CODE
LOW	crawl4ai/cli.py	501	CODE
LOW	crawl4ai/cli.py	580	CODE
LOW	crawl4ai/cli.py	1032	CODE
LOW	crawl4ai/utils.py	76	CODE
LOW	crawl4ai/utils.py	419	CODE
LOW	crawl4ai/utils.py	555	CODE
LOW	crawl4ai/utils.py	707	CODE
LOW	crawl4ai/utils.py	889	CODE
LOW	crawl4ai/utils.py	1143	CODE
LOW	crawl4ai/utils.py	2169	CODE
LOW	crawl4ai/utils.py	3084	CODE
LOW	crawl4ai/utils.py	3344	CODE
LOW	crawl4ai/utils.py	3620	CODE
LOW	crawl4ai/utils.py	1335	CODE
LOW	crawl4ai/browser_profiler.py	83	CODE
LOW	crawl4ai/browser_profiler.py	196	CODE
LOW	crawl4ai/browser_profiler.py	252	CODE
204 more matches not shown…
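
The table above lists only locations, not the measured depth or the threshold that triggers a hit. As a rough illustration of what a nesting metric can look like, here is a minimal sketch that counts nested control-flow blocks with Python's `ast` module. The choice of node types is an assumption, and the function in `sample` is hypothetical.

```python
import ast

# Statement types treated as opening a new nesting level (an assumption).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try,
                 ast.AsyncFor, ast.AsyncWith)

def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest

sample = """
def handle(items):
    for item in items:
        if item:
            try:
                with open(item) as fh:
                    if fh.readable():
                        return fh.read()
            except OSError:
                pass
"""
depth = max_nesting(ast.parse(sample))
```

In this sketch the `for`, `if`, `try`, `with`, and inner `if` each add one level. Note that an `elif` chain is represented in the AST as nested `If` nodes, so this simple version would overstate the depth of long `elif` ladders.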

Cross-Language Confusion · 41 hits · 210 pts

Severity	File	Line	Snippet	Context
HIGH	crawl4ai/browser_adapter.py	306	window.__capturedConsole.push({	CODE
HIGH	crawl4ai/browser_adapter.py	340	window.__capturedErrors.push({	CODE
HIGH	crawl4ai/browser_adapter.py	357	window.__capturedErrors.push({	CODE
HIGH	crawl4ai/browser_adapter.py	360	stack: event.reason && event.reason.stack ? event.reason.stack : '',	CODE
HIGH	crawl4ai/prompts.py	1357	if (card && card.shadowRoot) {	CODE
HIGH	crawl4ai/prompts.py	1366	if (card && card.shadowRoot) {	CODE
HIGH	crawl4ai/async_crawler_strategy.py	2063	# return {{ success: false, error: err.toString(), stack: err.stack }};	COMMENT
HIGH	crawl4ai/async_crawler_strategy.py	1363	htmlChunks.push(previousHTML);	CODE
HIGH	crawl4ai/async_crawler_strategy.py	1375	htmlChunks.push(currentHTML);	CODE
HIGH	crawl4ai/async_crawler_strategy.py	1404	uniqueElements.push(element.outerHTML);	CODE
HIGH	crawl4ai/async_crawler_strategy.py	1530	error: error.toString(),	CODE
HIGH	crawl4ai/async_crawler_strategy.py	1574	error: error.toString(),	CODE
HIGH	crawl4ai/async_crawler_strategy.py	1870	if (rect.width > 0 && rect.height > 0) {	CODE
HIGH	crawl4ai/async_crawler_strategy.py	2081	return {{ success: false, error: err.toString(), stack: err.stack }};	CODE
HIGH	crawl4ai/async_crawler_strategy.py	2212	error: error.toString(),	CODE
HIGH	crawl4ai/async_crawler_strategy.py	2223	error: error.toString(),	CODE
HIGH	crawl4ai/async_crawler_strategy.py	2344	error: e.toString()	CODE
HIGH	deploy/docker/work_queue.py	10	limit set to 0 (or null) means "unbounded" - i.e. the previous behavior is fully	STRING
HIGH	tests/test_cloud_bugs_batch.py	96	httpbin_anything = '<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"args": {}, "da	CODE
HIGH	tests/test_main.py	73	"const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten	CODE
HIGH	tests/test_virtual_scroll.py	44	allData.push({	CODE
HIGH	tests/test_virtual_scroll.py	57	items.push(`<div class="item" data-index="${item.id}">${item.text}</div>`);	CODE
HIGH	tests/test_docker.py	91	"const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in	CODE
HIGH	tests/releases/test_release_0.6.4.py	89	function gtag(){dataLayer.push(arguments);}	CODE
HIGH	tests/releases/test_release_0.6.4.py	125	function gtag(){dataLayer.push(arguments);}	CODE
HIGH	tests/docker/test_hooks_comprehensive.py	465	return element ? element.getAttribute('content') : null;	CODE
HIGH	tests/docker/test_server_requests.py	145	# It might be null, missing, or populated depending on the server's default behavior	COMMENT
HIGH	tests/general/test_async_crawler_strategy.py	283	# results.push(e.name);	COMMENT
HIGH	tests/general/test_async_crawler_strategy.py	288	# results.push(e.name);	COMMENT
HIGH	tests/async/test_parameters_and_options.py	52	"const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.in	CODE
HIGH	docs/examples/quickstart_examples_set_2.py	94	js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten	CODE
HIGH	docs/examples/quickstart_examples_set_2.py	361	return commits.length > 0 ? commits[0].textContent.trim() : null;	CODE
HIGH	docs/examples/quickstart_examples_set_2.py	371	if (newCommit && newCommit !== initialCommit) {	CODE
HIGH	docs/examples/stealth_mode_example.py	112	console.log('DETECTION_RESULTS:', JSON.stringify(detectionResults, null, 2));	CODE
HIGH	docs/examples/rest_call.py	32	loadMoreButton && loadMoreButton.click();	STRING
HIGH	docs/examples/crawlai_vs_firecrawl.py	53	"const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten	CODE
HIGH	docs/examples/docker_client_hooks_example.py	234	return el ? el.getAttribute('content') : null;	CODE
HIGH	docs/examples/quickstart.py	94	js_code="const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textConten	CODE
HIGH	docs/examples/quickstart.py	361	return commits.length > 0 ? commits[0].textContent.trim() : null;	CODE
HIGH	docs/examples/quickstart.py	371	if (newCommit && newCommit !== initialCommit) {	CODE
HIGH	…solver/capsolver_api_integration/solve_recaptcha_v3.py	47	args[0] = url.toString();	CODE

Redundant / Tautological Comments · 151 hits · 197 pts

Severity	File	Line	Snippet	Context
LOW	test_webhook_implementation.py	240	# Check if api.py can import webhook module	COMMENT
LOW	crawl4ai/async_database.py	49	# Check if version update is needed	COMMENT
LOW	crawl4ai/ssl_certificate.py	74	# Set check_hostname to False and verify_mode to CERT_NONE temporarily	COMMENT
LOW	crawl4ai/proxy_strategy.py	244	# Check if session exists and hasn't expired	COMMENT
LOW	crawl4ai/adaptive_crawler.py	538	# Check if we have any links left	COMMENT
LOW	crawl4ai/adaptive_crawler.py	715	# Check if KB has changed	COMMENT
LOW	crawl4ai/adaptive_crawler.py	1159	# Check if confidence is below minimum threshold (completely irrelevant)	COMMENT
LOW	crawl4ai/extraction_strategy.py	2222	# Check if there's content after the nth-child part	COMMENT
LOW	crawl4ai/content_scraping_strategy.py	934	# Check if we're already in an async context	COMMENT
LOW	crawl4ai/async_dispatcher.py	288	# Check if we're in critical memory state	COMMENT
LOW	crawl4ai/cli.py	1132	# Set output to JSON if not explicitly specified	COMMENT
LOW	crawl4ai/cli.py	1141	# Check if type does not exist show proper message	COMMENT
LOW	crawl4ai/cli.py	1400	# Check if the value should be one of the allowed options	COMMENT
LOW	crawl4ai/cli.py	1550	# Display results	COMMENT
LOW⚡	crawl4ai/utils.py	1202	# Check if the text content has at least word_count_threshold	COMMENT
LOW⚡	crawl4ai/utils.py	1208	# Check if an image has valid display and inside undesired html elements	COMMENT
LOW⚡	crawl4ai/utils.py	3234	# Check if running in Google Colab	COMMENT
LOW	crawl4ai/utils.py	290	# Check if cache is still fresh based on TTL	COMMENT
LOW	crawl4ai/utils.py	297	# Check if content actually changed	COMMENT
LOW	crawl4ai/utils.py	656	# Check if a path has already been saved for this browser type	COMMENT
LOW	crawl4ai/utils.py	1043	# Check if the tag contains text and if it's not just whitespace	COMMENT
LOW	crawl4ai/utils.py	1063	# Check if the tag itself is empty or all its children are empty/whitespace	COMMENT
LOW	crawl4ai/utils.py	1804	# Check if we have exhausted our max attempts	COMMENT
LOW	crawl4ai/utils.py	1897	# Check if we have exhausted our max attempts	COMMENT
LOW	crawl4ai/utils.py	2559	# Check if URL domain ends with base domain	COMMENT
LOW	crawl4ai/utils.py	3326	# Check if this is a documentation/reference site	STRING
LOW⚡	crawl4ai/browser_profiler.py	563	# Check if browser started successfully	COMMENT
LOW⚡	crawl4ai/browser_profiler.py	1287	# Check if the browser is still running	COMMENT
LOW⚡	crawl4ai/browser_profiler.py	1303	# Check if the process exists	COMMENT
LOW	crawl4ai/browser_profiler.py	114	# Check if item matches any keep pattern	COMMENT
LOW	crawl4ai/browser_profiler.py	239	# Check if browser process ended	COMMENT
LOW	crawl4ai/browser_profiler.py	297	# Check if browser process ended	COMMENT
LOW	crawl4ai/browser_profiler.py	371	# Check if browser process ended	COMMENT
LOW	crawl4ai/browser_profiler.py	658	# Check if this looks like a valid browser profile	COMMENT
LOW	crawl4ai/browser_profiler.py	710	# Check if path exists and is a valid profile	COMMENT
LOW	crawl4ai/browser_profiler.py	759	# Check if path exists and is a valid profile	COMMENT
LOW	crawl4ai/browser_profiler.py	1115	# Check if browser started successfully	COMMENT
LOW	crawl4ai/browser_profiler.py	1194	# Check if there's an existing browser still running	COMMENT
LOW	crawl4ai/browser_profiler.py	1217	# Check if browser started successfully	COMMENT
LOW	crawl4ai/docker_client.py	94	# Check if hooks are already strings or need conversion	COMMENT
LOW	crawl4ai/async_crawler_strategy.py	463	# Check if browser processing is required for file:// or raw: URLs	STRING
LOW	crawl4ai/async_crawler_strategy.py	727	# Check if this is a file:// or raw: URL that needs set_content() instead of goto()	STRING
LOW	crawl4ai/async_crawler_strategy.py	1788	# Check if viewport-only screenshot is forced	STRING
LOW	crawl4ai/model_loader.py	195	# Check if the model directory already exists	COMMENT
LOW	crawl4ai/table_extraction.py	110	# Check if this is a data table (not a layout table)	COMMENT
LOW	crawl4ai/table_extraction.py	760	# Check if there are any tables in the content	COMMENT
LOW	crawl4ai/table_extraction.py	769	# Check if chunking is needed	COMMENT
LOW	crawl4ai/table_extraction.py	852	# Check if we got valid tables	COMMENT
LOW	crawl4ai/table_extraction.py	1024	# Check if adding this row would exceed threshold	COMMENT
LOW	crawl4ai/async_configs.py	2267	# Check if given provider starts with any of key in PROVIDER_MODELS_PREFIXES	COMMENT
LOW	crawl4ai/content_filter_strategy.py	460	# Check if body is present	COMMENT
LOW	crawl4ai/browser_manager.py	1713	# Check if browser recycle threshold is hit — bump version for next requests	COMMENT
LOW	crawl4ai/browser_manager.py	1775	# Check if this signature belongs to an old browser waiting to be cleaned up	COMMENT
LOW	crawl4ai/browser_manager.py	1939	# Check if any signatures from this old version remain	COMMENT
LOW	crawl4ai/browser_manager.py	1337	# Check if there is value for crawlerRunConfig.proxy_config set add that to context	STRING
LOW	crawl4ai/async_webcrawler.py	506	# Check if blocked (skip for raw: URLs —	COMMENT
LOW	crawl4ai/deep_crawling/bfs_strategy.py	239	# Check if we've already reached max_pages before starting a new level	COMMENT
LOW	crawl4ai/deep_crawling/bfs_strategy.py	357	# Check if we've reached the limit during batch processing	COMMENT
LOW	crawl4ai/deep_crawling/filters.py	170	# Check if it's a regex pattern	COMMENT
LOW	crawl4ai/deep_crawling/filters.py	491	# Check if domain matches any allowed domain (including subdomains)	COMMENT
91 more matches not shown…

Verbosity Indicators · 107 hits · 176 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/cache_validator.py	112	# Step 1: Try HEAD request with conditional headers	COMMENT
LOW	crawl4ai/cache_validator.py	156	# Step 2: No conditional headers available, try fingerprint only	COMMENT
LOW	crawl4ai/cache_validator.py	180	# Step 3: No validation data available	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	914	# Step 1: Find sitemap URL and get lastmod (needed for validation)	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	938	# Step 2: Check cache validity (skip if force=True)	COMMENT
LOW⚡	crawl4ai/async_url_seeder.py	993	# Step 4: Write to cache (FALLBACK: if write fails, URLs still yielded above)	COMMENT
LOW	crawl4ai/async_url_seeder.py	952	# Step 3: Fetch fresh URLs	COMMENT
LOW⚡	crawl4ai/cloud/cli.py	253	# Step 1: Shrink (unless --no-shrink)	COMMENT
LOW⚡	crawl4ai/cloud/cli.py	266	# Step 2: Package as tar.gz	COMMENT
LOW	crawl4ai/cloud/cli.py	281	# Step 3: Upload	COMMENT
LOW⚡	tests/test_webhook_feature.sh	104	# Step 1: Save current branch and fetch PR	COMMENT
LOW⚡	tests/test_webhook_feature.sh	112	# Step 2: Switch to new branch	COMMENT
LOW⚡	tests/test_webhook_feature.sh	117	# Step 3: Activate virtual environment	COMMENT
LOW⚡	tests/test_webhook_feature.sh	128	# Step 4: Install server dependencies	COMMENT
LOW	tests/test_webhook_feature.sh	147	# Step 5: Start Redis in background	COMMENT
LOW	tests/test_webhook_feature.sh	183	# Step 6: Create and run webhook test	COMMENT
LOW	tests/test_webhook_feature.sh	292	# Step 7: Verify results	COMMENT
LOW	tests/test_webhook_feature.sh	303	# Step 8: Cleanup happens automatically via trap	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	141	# Step 1: Check versions	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	147	# Step 2: Test basic crawling	COMMENT
LOW⚡	tests/test_pyopenssl_update.py	153	# Step 3: Test stealth mode	COMMENT
LOW⚡	tests/proxy/test_proxy_verify.py	79	# Step 1: Verify IPs	COMMENT
LOW⚡	tests/proxy/test_proxy_verify.py	86	# Step 2: Get NST proxies	COMMENT
LOW	tests/proxy/test_proxy_verify.py	97	# Step 3: Test Chanel with all available proxies	COMMENT
LOW⚡	tests/general/test_async_url_seeder_bm25.py	558	# Step 1: Discover and score URLs	COMMENT
LOW⚡	tests/general/test_async_url_seeder_bm25.py	587	# Step 3: Verify these URLs would be good for actual crawling	COMMENT
LOW	tests/general/test_async_url_seeder_bm25.py	573	# Step 2: Analyze top results	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	608	# Step 1: open all sessions	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	615	# Step 2: navigate each session to a second page	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	620	# Step 3: kill sessions one by one, verify others unaffected	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	936	# Step 1: open session	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	943	# Step 2: concurrent non-session crawls	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	952	# Step 3: kill session	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	955	# Step 4: trigger recycle	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	962	# Step 5: new session on fresh browser	COMMENT
LOW⚡	tests/async/test_browser_lifecycle.py	970	# Step 6: verify it works	COMMENT
LOW⚡	tests/async/test_browser_memory.py	774	# Step 1: login — sets cookie	COMMENT
LOW⚡	tests/async/test_browser_memory.py	779	# Step 2: dashboard — cookie should carry over via session	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	52	# Step 1: Create a BrowserManager with builtin mode	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	57	# Step 2: Check if we have a BuiltinBrowserStrategy	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	69	# Step 3: Start the manager to launch or connect to builtin browser	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	78	# Step 4: Get browser info from the strategy	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	122	# Step 2: Get multiple pages	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	149	# Step 1: Get browser status	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	160	# Step 2: Test killing the browser	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	172	# Step 3: Check status after kill	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	184	# Step 4: Launch a new browser	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	206	# Step 1: Create first manager	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	211	# Step 2: Create second manager	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	216	# Step 3: Start both managers (should connect to the same builtin browser)	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	241	# Step 4: Test using both managers	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	263	# Step 5: Close both managers	COMMENT
LOW⚡	tests/browser/test_builtin_browser.py	309	# Step 2: Test killing the browser while manager is active	COMMENT
LOW	tests/browser/test_builtin_browser.py	103	# Step 1: Get a single page	COMMENT
LOW	tests/browser/test_builtin_browser.py	282	# Step 1: Test multiple starts with the same manager	COMMENT
LOW	tests/browser/test_builtin_browser.py	472	# Step 1: Create and start multiple browser managers in parallel	COMMENT
LOW	tests/browser/test_builtin_browser.py	666	# Step 1: Create and start multiple browser managers in parallel	COMMENT
LOW⚡	tests/async_assistant/test_extract_pipeline.py	62	# Step 1: Starting	COMMENT
LOW⚡	tests/async_assistant/test_extract_pipeline.py	65	# Step 2: Quick crawl for analysis	COMMENT
LOW	tests/async_assistant/test_extract_pipeline.py	79	# Step 3: HTML Skimming using lxml	COMMENT
47 more matches not shown…

Magic Placeholder Names 15 hits · 80 pts

Severity	File	Line	Snippet	Context
HIGH	deploy/docker/c4ai-doc-context.md	1105	llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")	STRING
HIGH	docs/md_v2/complete-sdk-reference.md	2928	llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")	STRING
HIGH⚡	docs/md_v2/core/adaptive-crawling.md	119	api_token='your-api-key'	CODE
HIGH⚡	docs/md_v2/core/adaptive-crawling.md	124	api_token='your-api-key'	CODE
HIGH⚡	docs/md_v2/core/adaptive-crawling.md	133	'api_token': 'your-api-key'	CODE
HIGH⚡	docs/md_v2/core/adaptive-crawling.md	137	'api_token': 'your-api-key'	CODE
HIGH	docs/md_v2/core/table_extraction.md	141	api_token="your_api_key",	CODE
HIGH	docs/md_v2/core/content-selection.md	305	llm_config = LLMConfig(provider="openai/gpt-4",api_token="sk-YOUR_API_KEY")	CODE
HIGH⚡	docs/md_v2/marketplace/frontend/app-detail.js	181	api_token="your-api-key",	CODE
HIGH	docs/md_v2/marketplace/frontend/app-detail.html	167	api_token="your-api-key",	CODE
HIGH	docs/md_v2/blog/releases/0.5.0.md	409	llm_config = LLMConfig(provider="openai/gpt-4o", api_token="YOUR_API_KEY")	CODE
HIGH	…cs/examples/url_seeder/bbc_sport_research_assistant.py	21	- export GEMINI_API_KEY="your-api-key"	STRING
HIGH	docs/examples/website-to-api/README.md	68	"api_token": "your-api-key-here"	CODE
HIGH	docs/examples/website-to-api/README.md	160	"api_token": "your-api-key-here"	CODE
HIGH	docs/examples/website-to-api/README.md	204	api_token="your-api-key"	CODE

Fake / Example Data 53 hits · 52 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/docker_client.py	212	await client.authenticate("user@example.com")	CODE
LOW	crawl4ai/js_snippet/update_image_dimensions.js	15	if (img.src.includes("placeholder") || img.src.includes("icon")) return false;	CODE
LOW	deploy/docker/WEBHOOK_EXAMPLES.md	206	"author": "John Doe",	CODE
LOW	deploy/docker/README.md	473	# await client.authenticate("user@example.com") # See Server Configuration section	COMMENT
LOW	deploy/docker/README.md	823	"author": "John Doe",	CODE
LOW	deploy/docker/c4ai-doc-context.md	2784	"email": "user@example.com"	CODE
LOW	deploy/docker/c4ai-doc-context.md	2791	"email": "user@example.com",	CODE
LOW	deploy/docker/c4ai-doc-context.md	2809	await client.authenticate("user@example.com")	CODE
LOW	deploy/docker/c4ai-doc-context.md	3549	["John Doe", "34", "New York"],	CODE
LOW	tests/memory/test_stress_sdk.py	57	self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split()	CODE
LOW	tests/memory/test_stress_sdk.py	57	self.lorem_words = " ".join("lorem ipsum dolor sit amet " * 100).split()	CODE
LOW⚡	tests/general/test_generate_schema_usage.py	87	api_token: str = "fake-token"	STRING
LOW⚡	tests/async/test_browser_memory.py	37	<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.	CODE
LOW⚡	tests/async/test_browser_memory.py	37	<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.	CODE
LOW	tests/regression/test_reg_extraction.py	363	assert any("sales@example.org" in v for v in values), (	CODE
LOW	tests/adaptive/test_embedding_strategy.py	141	'api_token': os.getenv('OPENAI_API_KEY', 'dummy-key')	CODE
LOW	tests/adaptive/test_embedding_performance.py	125	'api_token': 'dummy-key'	CODE
LOW	docs/md_v2/core/self-hosting.md	1299	"author": "John Doe",	CODE
LOW	docs/md_v2/core/self-hosting.md	1660	# await client.authenticate("user@example.com") # See Server Configuration section	COMMENT
LOW	docs/md_v2/core/c4a-script.md	174	| `SET` | Set input value directly | `SET \`#email\` "user@example.com"` |	CODE
LOW	docs/md_v2/core/c4a-script.md	204	TYPE "user@example.com"	CODE
LOW	docs/md_v2/core/c4a-script.md	250	SET `#name` "John Doe"	CODE
LOW	docs/md_v2/core/c4a-script.md	347	TYPE "user@example.com"	STRING
LOW	docs/md_v2/marketplace/README.md	17	python dummy_data.py	CODE
LOW	docs/md_v2/marketplace/README.md	59	Coming soon - for now, edit the database directly or modify `dummy_data.py`	CODE
LOW	docs/md_v2/marketplace/backend/dummy_data.py	154	"Review", "John Doe", ["Playwright Cloud", "Puppeteer Extra"],	STRING
LOW	docs/md_v2/marketplace/backend/dummy_data.py	209	This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit.	CODE
LOW	docs/md_v2/marketplace/backend/dummy_data.py	209	This is a comprehensive article about {title.lower()}. Lorem ipsum dolor sit amet, consectetur adipiscing elit.	CODE
LOW	docs/md_v2/blog/releases/0.7.6.md	105	"author": "John Doe",	CODE
LOW	docs/md_v2/api/c4a-script-reference.md	373	TYPE "user@example.com"	CODE
LOW	docs/md_v2/api/c4a-script-reference.md	397	SETVAR email = "user@example.com"	CODE
LOW	docs/md_v2/api/c4a-script-reference.md	527	SET `#email` "user@example.com"	CODE
LOW⚡	docs/md_v2/api/c4a-script-reference.md	766	TYPE "user@example.com"	CODE
LOW	docs/md_v2/api/c4a-script-reference.md	894	SETVAR email = "user@example.com"	CODE
LOW	docs/md_v2/api/c4a-script-reference.md	951	SET `#name` "John Doe"	CODE
LOW	docs/md_v2/assets/highlight.min.js	317	}),re=["a","abbr","address","article","aside","audio","b","blockquote","body","button","canvas","caption","cite","co	CODE
LOW	docs/md_v2/apps/crawl4ai-assistant/index.html	616	<input type="text" id="userName" name="name" placeholder="John Doe" required>	STRING
LOW⚡	docs/md_v2/apps/c4a-script/server.py	264	TYPE "John Doe"	CODE
LOW	docs/md_v2/apps/c4a-script/README.md	149	TYPE "user@example.com"	CODE
LOW	docs/md_v2/apps/c4a-script/playground/index.html	276	<p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>	CODE
LOW	docs/md_v2/apps/c4a-script/playground/index.html	276	<p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>	CODE
LOW⚡	docs/md_v2/apps/c4a-script/assets/app.js	596	script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste	CODE
LOW	docs/releases_review/demo_v0.9.1.py	104	has_author = "John Doe" in fit	STRING
LOW	docs/blog/release-v0.7.6.md	105	"author": "John Doe",	CODE
LOW	docs/examples/docker_config_obj.py	124	await client.authenticate("user@example.com")	CODE
LOW	docs/examples/docker_config_obj.py	193	json={"email": "user@example.com"}	CODE
LOW	docs/examples/c4a_script/generate_script_hello_world.py	28	goal = "Fill in email 'user@example.com', password 'secret123', and submit the form"	STRING
LOW⚡	docs/examples/c4a_script/tutorial/server.py	264	TYPE "John Doe"	CODE
LOW	docs/examples/c4a_script/tutorial/README.md	149	TYPE "user@example.com"	CODE
LOW	docs/examples/c4a_script/tutorial/playground/index.html	276	<p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>	CODE
LOW	docs/examples/c4a_script/tutorial/playground/index.html	276	<p class="text-preview">Lorem ipsum dolor sit amet, consectetur adipiscing elit...</p>	CODE
LOW⚡	docs/examples/c4a_script/tutorial/assets/app.js	596	script: `# Multi-step form with validation\nCLICK \`a[href="#forms"]\`\nWAIT \`#survey-form\` 2\n\n# Ste	CODE
LOW	docs/examples/website-to-api/static/index.html	89	"author": "John Doe",	CODE

Over-Commented Block 62 hits · 52 pts

Severity	File	Line	Snippet	Context
LOW	docker-compose.yml	1	version: '3.8'	COMMENT
LOW	crawl4ai/adaptive_crawler.py	201	embedding_top_k_weight: float = 0.3 # Weight for top-k average in hybrid scoring	COMMENT
LOW	crawl4ai/adaptive_crawler.py	561	# if hasattr(result, 'extracted_content') and result.extracted_content:	COMMENT
LOW	crawl4ai/adaptive_crawler.py	761		COMMENT
LOW	crawl4ai/adaptive_crawler.py	1021		COMMENT
LOW	crawl4ai/adaptive_crawler.py	1041		COMMENT
LOW	crawl4ai/adaptive_crawler.py	1061	# # Top-k average (top 3)	COMMENT
LOW	crawl4ai/models.py	161	# Anti-bot retry/proxy usage stats	COMMENT
LOW	crawl4ai/extraction_strategy.py	241	# self.tokenizer = self.model.tokenizer	COMMENT
LOW	crawl4ai/__init__.py	221	# try:	COMMENT
LOW	crawl4ai/utils.py	1821	# print("Error during completion request:", str(e))	COMMENT
LOW	crawl4ai/utils.py	3181	# for element in tree.xpath('//*[contains(@class, "")]'):	COMMENT
LOW	crawl4ai/utils.py	3201	# if len(elements) > 1:	COMMENT
LOW	crawl4ai/async_crawler_strategy.py	841	)	COMMENT
LOW	crawl4ai/async_crawler_strategy.py	861	# except Error as e:	COMMENT
LOW	crawl4ai/async_crawler_strategy.py	2061	# return {{ success: true, result: script_result }};	COMMENT
LOW	crawl4ai/async_url_seeder.py	1641	# # 5. API endpoints and data files	COMMENT
LOW	crawl4ai/async_url_seeder.py	1661	# '.woff', '.woff2', '.ttf', '.eot', '.otf'	COMMENT
LOW	crawl4ai/html2text/__init__.py	1201	# self.inside_pre = True	COMMENT
LOW	crawl4ai/legacy/crawler_strategy.py	101	self.options.add_argument("--headless")	COMMENT
LOW	deploy/docker/server.py	141		COMMENT
LOW	deploy/docker/c4ai-code-context.md	2001		COMMENT
LOW	deploy/docker/c4ai-code-context.md	5361	dispatch_result: Optional[DispatchResult] = None	COMMENT
LOW	tests/test_webhook_feature.sh	1	#!/bin/bash	COMMENT
LOW	tests/test_llm_simple_url.py	101	# result_default = await crawler.arun(	COMMENT
LOW	tests/test_llm_simple_url.py	121	# print(f" Default headers: {len(default_first['headers'])} columns")	COMMENT
LOW	tests/test_cli_docs.py	21		COMMENT
LOW	tests/docker/test_hooks_utility.py	181	# print("✓ All tests completed successfully!")	COMMENT
LOW	tests/docker/simple_api_test.py	141	# result = self.test_get_endpoint("/schema")	COMMENT
LOW	tests/docker/test_serialization.py	121	# WebScrapingStrategy, LXMLWebScrapingStrategy	COMMENT
LOW	tests/docker/test_serialization.py	141	# print("\nSerialized Config:")	COMMENT
LOW	tests/docker/test_serialization.py	161	# "language": "english"	COMMENT
LOW	tests/general/test_async_crawler_strategy.py	241	# async def test_js_return_values(crawler_strategy):	COMMENT
LOW	tests/general/test_async_crawler_strategy.py	281	# nonExistentFunction();	COMMENT
LOW	tests/async/test_error_handling.py	1	# import os	COMMENT
LOW	tests/async/test_error_handling.py	21	# async def cleanup(self):	COMMENT
LOW	tests/async/test_error_handling.py	41	# # # Simulating a timeout by using a very short timeout value	COMMENT
LOW	tests/async/test_error_handling.py	61	# # @pytest.mark.asyncio	COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	281	# "exclude_social_media_links": {	COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	301	# "combo_mode": {	COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	321	# "css_selector": "section#promo-section"	COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	341	# "remove_forms": True	COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	561		COMMENT
LOW	…est_evaluation_scraping_methods_performance.configs.py	581	# if link_diff:	COMMENT
LOW	tests/async/test_chunking_and_extraction_strategies.py	21	result = await crawler.arun(	COMMENT
LOW	tests/async/test_chunking_and_extraction_strategies.py	61	assert len(extracted_data) > 0	COMMENT
LOW	tests/async/test_edge_cases.py	21		COMMENT
LOW	tests/async/test_edge_cases.py	41	# url = "https://news.ycombinator.com/" # Hacker News has infinite scroll	COMMENT
LOW	tests/browser/manager/demo_browser_manager.py	461	start_time = time.time()	COMMENT
LOW	tests/profiler/test_create_profile.py	21		COMMENT
LOW	docs/md_v2/complete-sdk-reference.md	3661	# ❌ Random URLs (site.com/x7f9g2h)	COMMENT
LOW	docs/md_v2/advanced/hooks-auth.md	81	# Example 2: (Optional) Simulate a login scenario	COMMENT
LOW	docs/md_v2/core/self-hosting.md	101	# Anthropic	COMMENT
LOW	docs/md_v2/core/link-media.md	281	# ✅ Clean URL structure (docs.python.org/api/reference)	COMMENT
LOW	docs/md_v2/ask_ai/ask-ai.js	601	// NOTE: Virtual scrolling is complex. For now, we do direct rendering.	COMMENT
LOW	docs/md_v2/assets/llm.txt/txt/cli.txt	21	### Profile Management Commands	COMMENT
LOW	docs/md_v2/assets/llm.txt/txt/llms-full.txt	5681	print(f" Content: {len(http_result.html)} chars")	COMMENT
LOW	…_v2/assets/llm.txt/txt/http_based_crawler_strategy.txt	321		COMMENT
LOW	docs/md_v2/assets/llm.txt/txt/llms-full-v0.1.1.txt	5681	print(f" Content: {len(http_result.html)} chars")	COMMENT
LOW	docs/examples/docker/demo_docker_api.py	1281	# await demo_param_js_execution(client)	COMMENT
2 more matches not shown…

Docstring Block Structure 10 hits · 50 pts

Severity	File	Line	Snippet	Context
HIGH	crawl4ai/extraction_strategy.py	1706	Generate extraction schema from HTML content or URL(s) (sync version). Args: html (str, op	STRING
HIGH	crawl4ai/extraction_strategy.py	1778	Generate extraction schema from HTML content or URL(s) (async version). Use this method when calling f	STRING
HIGH	crawl4ai/utils.py	1150	Extracts and cleans content from website HTML, optimizing for useful media and contextual information. Par	STRING
HIGH	crawl4ai/utils.py	3697	Convert hook function objects to string representations for Docker API. This utility simplifies the process of	STRING
HIGH	crawl4ai/async_crawler_strategy.py	301	Wait for a condition in a CSP-compliant way. Args: page: Playwright page object	STRING
HIGH	crawl4ai/async_crawler_strategy.py	439	Crawls a given URL or processes raw HTML/local file content based on the URL prefix. Args:	STRING
HIGH	crawl4ai/async_webcrawler.py	992	Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args:	STRING
HIGH	crawl4ai/async_webcrawler.py	1134	Discovers, filters, and optionally validates URLs for a given domain(s) using sitemaps and Common Crawl	STRING
HIGH	crawl4ai/script/c4ai_script.py	624	Compile C4A-Script from string or list of strings to JavaScript. Args: script: C4A-Script as a string o	STRING
HIGH	deploy/docker/c4ai-code-context.md	2021	Runs the crawler for multiple URLs concurrently using a configurable dispatcher strategy. Args:	STRING

AI Slop Vocabulary 22 hits · 34 pts

Severity	File	Line	Snippet	Context
MEDIUM	crawl4ai/adaptive_crawler.py	1571	"""Print comprehensive statistics about the knowledge base	STRING
MEDIUM	crawl4ai/prompts.py	1174	GENERATE_SCRIPT_PROMPT = r"""You are a world-class browser automation specialist. Your sole purpose is to convert a natu	STRING
LOW⚡	crawl4ai/async_crawler_strategy.py	1657	# Log the error but don't raise it - we'll just return None for the MHTML	STRING
LOW	crawl4ai/async_crawler_strategy.py	341	# For timeout or other cases, just return False	STRING
MEDIUM	crawl4ai/async_url_seeder.py	1139	# Use lxml for XML parsing if available, as it's generally more robust	COMMENT
LOW	crawl4ai/browser_manager.py	183	# If CDP URL provided, just return it	COMMENT
LOW	deploy/docker/server.py	1043	# if no query, just return raw contexts	COMMENT
LOW	tests/test_source_sibling_selector.py	309	# This is actually fine — let's just use "source" with flat fields instead.	COMMENT
MEDIUM	tests/docker/test_hooks_comprehensive.py	521	"""Run comprehensive hook tests"""	STRING
MEDIUM	tests/memory/test_dispatcher_stress.py	269	# First, elevate memory usage to create pressure	COMMENT
MEDIUM	tests/memory/benchmark_report.py	374	"""Generate a comprehensive comparison report of multiple test runs.	STRING
MEDIUM	tests/general/test_mhtml.py	5	import re # For more robust MHTML checks	CODE
MEDIUM	tests/general/test_mhtml.py	54	# 3. Check for MHTML structure indicators (more robust than simple string contains)	COMMENT
MEDIUM⚡	tests/general/test_async_url_seeder_bm25.py	597	"""Generate a comprehensive report of BM25 scoring effectiveness."""	STRING
LOW	…est_evaluation_scraping_methods_performance.configs.py	69	# No <body> found; just return the <html> root	COMMENT
MEDIUM	…est_evaluation_scraping_methods_performance.configs.py	110	# If you prefer ignoring newlines or multiple whitespace, do a more robust cleanup	COMMENT
MEDIUM	…s/md_v2/apps/crawl4ai-assistant/content/click2crawl.js	780	// Try to generate a robust selector	COMMENT
LOW	docs/releases_review/v0.7.5_docker_hooks_demo.py	367	# Use our reusable hook library - just pass the function objects!	STRING
MEDIUM	docs/releases_review/demo_v0.7.7.py	477	"""Print comprehensive demo summary"""	STRING
MEDIUM	docs/releases_review/demo_v0.7.7.py	612	# Print comprehensive summary	COMMENT
MEDIUM	docs/examples/stealth_mode_example.py	510	# Show best practices	COMMENT
LOW	docs/examples/docker_hooks_examples.py	359	# Use our reusable hook library - just pass the function objects!	STRING

AI Structural Patterns 29 hits · 22 pts

Severity	File	Line	Context
LOW	crawl4ai/extraction_strategy.py	556	CODE
LOW	crawl4ai/extraction_strategy.py	1692	CODE
LOW	crawl4ai/extraction_strategy.py	1764	CODE
LOW	crawl4ai/async_dispatcher.py	149	CODE
LOW	crawl4ai/table_extraction.py	690	CODE
LOW	crawl4ai/async_configs.py	781	CODE
LOW	crawl4ai/async_configs.py	997	CODE
LOW	crawl4ai/async_configs.py	1158	CODE
LOW	crawl4ai/async_configs.py	1586	CODE
LOW	crawl4ai/async_configs.py	2217	CODE
LOW	crawl4ai/async_configs.py	2343	CODE
LOW	crawl4ai/async_configs.py	2442	CODE
LOW	crawl4ai/content_filter_strategy.py	835	CODE
LOW	crawl4ai/browser_manager.py	135	CODE
LOW	crawl4ai/deep_crawling/bfs_strategy.py	25	CODE
LOW	crawl4ai/deep_crawling/bff_strategy.py	36	CODE
LOW	crawl4ai/legacy/web_crawler.py	56	CODE
LOW	crawl4ai/legacy/web_crawler.py	82	CODE
LOW	crawl4ai/legacy/web_crawler.py	121	CODE
LOW	crawl4ai/components/crawler_monitor.py	512	CODE
LOW	deploy/docker/api.py	416	CODE
LOW	tests/memory/test_stress_sdk.py	181	CODE
LOW	docs/examples/docker/demo_docker_api.py	290	CODE
LOW	docs/examples/docker/demo_docker_api.py	300	CODE
LOW	docs/examples/docker/demo_docker_api.py	322	CODE
LOW	docs/examples/docker/demo_docker_api.py	350	CODE
LOW	docs/examples/docker/demo_docker_api.py	377	CODE
LOW	docs/examples/docker/demo_docker_api.py	403	CODE
LOW	docs/examples/c4a_script/api_usage_examples.py	192	CODE

Modern Structural Boilerplate 19 hits · 18 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/adaptive_crawler.py	296	async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:	CODE
LOW	crawl4ai/adaptive_crawler.py	548	async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:	CODE
LOW	crawl4ai/adaptive_crawler.py	1233	async def update_state(self, state: CrawlState, new_results: List[CrawlResult]) -> None:	CODE
LOW	crawl4ai/__init__.py	114	__all__ = [	CODE
LOW	crawl4ai/hub.py	9	logger = logging.getLogger(__name__)	CODE
LOW	crawl4ai/async_crawler_strategy.py	2531	def set_hook(self, hook_type: str, hook_func: Callable) -> None:	CODE
LOW	crawl4ai/deep_crawling/__init__.py	26	__all__ = [	CODE
LOW	crawl4ai/script/__init__.py	16	__all__ = [	CODE
LOW	crawl4ai/processors/pdf/__init__.py	196	__all__ = ["PDFCrawlerStrategy", "PDFContentScrapingStrategy"]	STRING
LOW	crawl4ai/processors/pdf/processor.py	22	logger = logging.getLogger(__name__)	CODE
LOW	crawl4ai/cloud/__init__.py	12	__all__ = [	CODE
LOW	deploy/docker/monitor_routes.py	11	logger = logging.getLogger(__name__)	CODE
LOW⚡	deploy/docker/server.py	488	logger = logging.getLogger(__name__)	CODE
LOW	deploy/docker/monitor.py	14	logger = logging.getLogger(__name__)	CODE
LOW	deploy/docker/webhook.py	16	logger = logging.getLogger(__name__)	CODE
LOW	deploy/docker/api.py	93	logger = logging.getLogger(__name__)	CODE
LOW	deploy/docker/egress_broker.py	163	def set_egress_proxy(url: Optional[str]) -> None:	CODE
LOW	deploy/docker/crawler_pool.py	9	logger = logging.getLogger(__name__)	CODE
LOW	deploy/docker/work_queue.py	104	def set_job_queue(q: Optional[WorkQueue]) -> None:	CODE

Hallucination Indicators 1 hit · 10 pts

Severity	File	Line	Snippet	Context
CRITICAL	docs/md_v2/apps/crawl4ai-assistant/libs/marked.min.js	47	`+s.text,this.inlineQueue.pop(),this.inlineQueue.at(-1).src=r.text):t.push(s);continue}if(e){let r="Infinite loop on byt	CODE

Modern AI Meta-Vocabulary 4 hits · 8 pts

Severity	File	Line	Snippet	Context
MEDIUM	crawl4ai/utils.py	3535	# Get embedding model from config or use default	STRING
MEDIUM	tests/adaptive/test_llm_embedding.py	85	# "event-driven architecture patterns"	COMMENT
MEDIUM	docs/md_v2/core/adaptive-crawling.md	169	embedding_llm_config=None, # Use for API-based embeddings (embedding model)	CODE
MEDIUM	docs/examples/adaptive_crawling/llm_config_example.py	85	# "event-driven architecture patterns"	COMMENT

Example Usage Blocks 6 hits · 8 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/user_agent_generator.py	417	# Example usage:	COMMENT
LOW	crawl4ai/user_agent_generator.py	420	# Usage example:	COMMENT
LOW	crawl4ai/browser_profiler.py	1379	# Example usage	COMMENT
LOW	crawl4ai/docker_client.py	209	# Example usage	COMMENT
LOW	crawl4ai/processors/pdf/processor.py	456	# Usage example	COMMENT
LOW	tests/profiler/test_create_profile.py	6	# Example usage	COMMENT

Dead Code 4 hits · 8 pts

Severity	File	Line	Context
MEDIUM	crawl4ai/deep_crawling/crazy.py	96	CODE
MEDIUM	crawl4ai/legacy/web_crawler.py	80	CODE
MEDIUM	deploy/docker/webhook.py	183	CODE
MEDIUM	deploy/docker/webhook.py	186	CODE

Slop Phrases 4 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW	crawl4ai/models.py	327	# When removing this code in the future, make sure to:	COMMENT
MEDIUM	crawl4ai/table_extraction.py	1268	This is a basic implementation - for complex CSS selectors,	STRING
MEDIUM	crawl4ai/processors/pdf/__init__.py	137	# For simple cases, you can use the sync version	STRING
MEDIUM	docs/examples/undetected_simple_demo.py	88	# Test URLs - you can change these	COMMENT
