DataTalksClub/data-engineering-zoomcamp

Metric	Value
Adjusted Score	11.1
Raw Score	11.1
Time Factor	100%
Last Push	2026-06-10
Stars	43.6K
Language	Jupyter Notebook
Lines of Code	31.9K
Files	316
Pattern Hits	177
Scan Date	2026-07-14
HC Hit Rate	0.08

What These Metrics Mean

Adjusted Score: The primary synthetic-code indicator: the severity-weighted pattern score normalised per 1,000 lines of code and multiplied by the temporal discount factor (Time Factor). Use it to rank repositories by AI authorship density.
Raw Score: The same severity-weighted, context-multiplied score before temporal discounting. The figure shown above is already normalised per 1,000 lines (the 353 category points listed under Pattern Findings ÷ 31.9K lines ≈ 11.1), so it equals the Adjusted Score whenever the Time Factor is 100%. It reflects signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: HIGH and CRITICAL pattern hits per analysed file, averaged across the repository. This complementary signal catches repositories where a few files are densely packed with high-severity AI tells, a strong indicator even when the normalised score appears moderate because the codebase is large.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
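As a worked check of these definitions, the sketch below recomputes the headline figures from numbers elsewhere on this page. It assumes the score is the sum of the 14 category point totals listed under Pattern Findings, that "31.9K" lines is close enough to 31,900, and that the HC Hit Rate is a plain per-file average; none of these is documented by the scanner itself.

```python
# Category point totals copied from the Pattern Findings section of this report.
CATEGORY_POINTS = {
    "Cross-File Repetition": 115,
    "Structural Annotation Overuse": 65,
    "Modern AI Meta-Vocabulary": 44,
    "Hyper-Verbose Identifiers": 22,
    "Slop Phrases": 18,
    "Unused Imports": 16,
    "Over-Commented Block": 14,
    "Excessive Try-Catch Wrapping": 13,
    "Decorative Section Separators": 12,
    "Verbosity Indicators": 11,
    "Self-Referential Comments": 9,
    "Deep Nesting": 6,
    "Magic Placeholder Names": 5,
    "Redundant / Tautological Comments": 3,
}

LINES_OF_CODE = 31_900        # assumption: the dashboard rounds this to "31.9K"
FILES = 316
HIGH_CRITICAL_HITS = 24 + 0   # HIGH + CRITICAL hits from the Severity Breakdown
TIME_FACTOR = 1.0             # 100%: last push 2026-06-10 falls in the Jan 2024+ era


def per_kloc(points: float, loc: int) -> float:
    """Normalise a point total per 1,000 lines of code."""
    return points / (loc / 1000)


raw_points = sum(CATEGORY_POINTS.values())
adjusted = per_kloc(raw_points, LINES_OF_CODE) * TIME_FACTOR
hc_hit_rate = HIGH_CRITICAL_HITS / FILES

print(raw_points)             # 353
print(round(adjusted, 1))     # 11.1
print(round(hc_hit_rate, 2))  # 0.08
```

With the Time Factor at 100%, the discounted and undiscounted figures coincide, which matches the two 11.1 values in the metric cards.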

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

Severity	Hits
CRITICAL	0
HIGH	24
MEDIUM	27
LOW	126
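Because every row within a category carries the same severity in this report, the breakdown can be cross-checked against the category hit counts listed under Pattern Findings. The severity assigned to each category in this sketch is read from those tables.

```python
# (category, severity, hits) as listed under Pattern Findings.
CATEGORIES = [
    ("Cross-File Repetition", "HIGH", 23),
    ("Structural Annotation Overuse", "LOW", 39),
    ("Modern AI Meta-Vocabulary", "MEDIUM", 14),
    ("Hyper-Verbose Identifiers", "LOW", 27),
    ("Slop Phrases", "MEDIUM", 6),
    ("Unused Imports", "LOW", 16),
    ("Over-Commented Block", "LOW", 14),
    ("Excessive Try-Catch Wrapping", "LOW", 17),
    ("Decorative Section Separators", "MEDIUM", 4),
    ("Verbosity Indicators", "LOW", 5),
    ("Self-Referential Comments", "MEDIUM", 3),
    ("Deep Nesting", "LOW", 6),
    ("Magic Placeholder Names", "HIGH", 1),
    ("Redundant / Tautological Comments", "LOW", 2),
]


def severity_breakdown(rows):
    """Sum hit counts per severity level, keeping empty levels at zero."""
    counts = {"CRITICAL": 0, "HIGH": 0, "MEDIUM": 0, "LOW": 0}
    for _category, severity, hits in rows:
        counts[severity] += hits
    return counts


breakdown = severity_breakdown(CATEGORIES)
print(breakdown)                # {'CRITICAL': 0, 'HIGH': 24, 'MEDIUM': 27, 'LOW': 126}
print(sum(breakdown.values()))  # 177, the Pattern Hits total
```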

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner recorded 177 pattern matches across 14 pattern categories. Each entry below is a discrete location in the source where the engine recorded an AI authorship indicator; a location can appear in more than one category (lines 82 to 98 of streaming_confluent.py, for example, are listed under both Structural Annotation Overuse and Verbosity Indicators). Each category table lists the file path, the line number where available, the code snippet, and the lexical context (CODE, COMMENT, or STRING) in which the match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
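The weighting rules above can be sketched as a per-match product of the severity coefficient, the ×1.5 comment weight and the ×1.5 cluster weight. This is a reconstruction, not the scanner's code: it assumes the multipliers compose multiplicatively and that category totals are truncated to whole points for display (Verbosity Indicators: 5 × 2.25 = 11.25, shown as 11). Not every category follows it; Over-Commented Block, for instance, scores 14 points for 14 LOW comment matches, so the comment weight evidently does not apply there.

```python
import math

SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_MULTIPLIER = 1.5  # matches in COMMENT context
CLUSTER_MULTIPLIER = 1.5  # matches flagged with the bolt icon


def match_score(severity: str, context: str, clustered: bool) -> float:
    """Score one match: severity coefficient x context weight x density weight."""
    score = SEVERITY_COEFFICIENT[severity]
    if context == "COMMENT":
        score *= COMMENT_MULTIPLIER
    if clustered:
        score *= CLUSTER_MULTIPLIER
    return score


def category_points(matches) -> int:
    """Sum per-match scores; assumes the report truncates the total for display."""
    return math.floor(sum(match_score(*m) for m in matches))


# Slop Phrases: six MEDIUM matches, all in comments, none clustered.
slop = [("MEDIUM", "COMMENT", False)] * 6
# Verbosity Indicators: five LOW comment matches, all clustered.
verbosity = [("LOW", "COMMENT", True)] * 5

print(category_points(slop))       # 18
print(category_points(verbosity))  # 11
```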

Cross-File Repetition (23 hits · 115 pts)

Severity	File	Snippet	Context
HIGH	…s/2023/week_6_stream_processing/streaming_confluent.py	take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema	STRING
HIGH	…ing/extras/python/streams-example/pyspark/streaming.py	take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema	STRING
HIGH	…ng/extras/python/streams-example/redpanda/streaming.py	take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema	STRING
HIGH	07-streaming/workshop/README.md	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/live/src/job/pass_through_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/src/job/pass_through_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/README.md	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku	STRING
HIGH	07-streaming/workshop/live/src/job/pass_through_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku	STRING
HIGH	07-streaming/workshop/src/job/pass_through_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku	STRING
HIGH	07-streaming/workshop/README.md	insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup	STRING
HIGH	07-streaming/workshop/live/src/job/pass_through_job.py	insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup	STRING
HIGH	07-streaming/workshop/src/job/pass_through_job.py	insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup	STRING
HIGH	07-streaming/workshop/README.md	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/live/src/job/aggregation_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/src/job/aggregation_job.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/src/job/aggregation_job_demo.py	create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_	STRING
HIGH	07-streaming/workshop/README.md	create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary	STRING
HIGH	07-streaming/workshop/live/src/job/aggregation_job.py	create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary	STRING
HIGH	07-streaming/workshop/src/job/aggregation_job.py	create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary	STRING
HIGH	07-streaming/workshop/src/job/aggregation_job_demo.py	create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary	STRING
HIGH	07-streaming/workshop/README.md	insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve	STRING
HIGH	07-streaming/workshop/live/src/job/aggregation_job.py	insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve	STRING
HIGH	07-streaming/workshop/src/job/aggregation_job.py	insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve	STRING
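The report does not document how Cross-File Repetition is matched, but the snippets above are lower-cased with whitespace collapsed, which suggests literals are normalised before comparison. A minimal sketch under that assumption groups string literals by normalised text and keeps any that occur in two or more files; the `min_length` cut-off and the sample literals (abbreviated DDL in the style of the rows above) are hypothetical.

```python
from collections import defaultdict


def normalise(literal: str) -> str:
    """Lower-case and collapse whitespace so formatting differences don't matter."""
    return " ".join(literal.lower().split())


def find_cross_file_repeats(literals, min_length=40):
    """Map normalised text to the sorted files it appears in, for texts seen in 2+ files.

    `literals` is an iterable of (path, text) pairs; `min_length` is a
    hypothetical cut-off that ignores short, naturally repeated strings.
    """
    by_text = defaultdict(set)
    for path, text in literals:
        key = normalise(text)
        if len(key) >= min_length:
            by_text[key].add(path)
    return {text: sorted(paths) for text, paths in by_text.items() if len(paths) > 1}


sample = [
    ("07-streaming/workshop/src/job/aggregation_job.py",
     "CREATE TABLE {table_name} (\n  window_start TIMESTAMP(3),\n  num_trips BIGINT\n)"),
    ("07-streaming/workshop/live/src/job/aggregation_job.py",
     "create table {table_name} ( window_start timestamp(3), num_trips bigint )"),
    ("07-streaming/workshop/src/job/pass_through_job.py", "SELECT 1"),
]
for text, paths in find_cross_file_repeats(sample).items():
    print(paths, text[:40])
```

In this repository every hit is one of a handful of string literals (a Spark parsing docstring and several Flink SQL statements) repeated across the workshop README.md, the src/ and live/src/ job copies, and the streams-example scripts.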

Structural Annotation Overuse (39 hits · 65 pts)

Severity	File	Line	Snippet	Context
LOW	04-analytics-engineering/setup/local_setup.md	22	## Step 1: Install DuckDB	COMMENT
LOW	04-analytics-engineering/setup/local_setup.md	29	## Step 2: Install dbt	COMMENT
LOW	04-analytics-engineering/setup/local_setup.md	40	## Step 3: Configure dbt Profile	COMMENT
LOW	04-analytics-engineering/setup/local_setup.md	82	## Step 4: Download and Ingest Data	COMMENT
LOW	04-analytics-engineering/setup/local_setup.md	161	## Step 5: Test the dbt Connection	STRING
LOW	04-analytics-engineering/setup/local_setup.md	169	## Step 6: Install dbt Power User Extension (VS Code Users)	STRING
LOW⚡	…-analytics-engineering/setup/duckdb_troubleshooting.md	52	### Step 1: Adjust DuckDB memory settings in `profiles.yml`	COMMENT
LOW⚡	…-analytics-engineering/setup/duckdb_troubleshooting.md	60	### Step 2: Use `dbt retry` after a failure	COMMENT
LOW⚡	…-analytics-engineering/setup/duckdb_troubleshooting.md	70	### Step 3: Build models selectively with `--select`	COMMENT
LOW	…-analytics-engineering/setup/duckdb_troubleshooting.md	84	### Step 4: Leverage incremental models	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	20	## Step 1: Verify Your BigQuery Setup	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	108	## Step 2: Sign Up for dbt Platform	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	112	## Step 3: Create a New dbt Project	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	123	## Step 4: Configure BigQuery Connection	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	170	## Step 5: Set Up Your Repository	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	179	## Step 6: Verify Your Development Environment	COMMENT
LOW	04-analytics-engineering/setup/cloud_setup.md	219	## Step 7: Start Developing	COMMENT
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW	…2022/week_2_data_ingestion/airflow/docker-compose.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW	…eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	82	# Step 1: Consume GREEN_TAXI_TOPIC and FHV_TAXI_TOPIC	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	86	# Step 2: Publish green and fhv rides to RIDES_TOPIC	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	90	# Step 3: Read RIDES_TOPIC and parse it in ALL_RIDE_SCHEMA	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	94	# Step 4: Apply Aggregation on the all_rides	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	98	# Step 5: Sink Aggregation Streams to Console	COMMENT
LOW	cohorts/2026/workshops/dlt/README.md	69	### Step 1: Create a New Project Folder	COMMENT
LOW	cohorts/2026/workshops/dlt/README.md	78	### Step 2: Add the dlt MCP Server Config	COMMENT
LOW⚡	cohorts/2026/workshops/dlt/README.md	134	### Step 3: Install dlt Workspace	COMMENT
LOW⚡	cohorts/2026/workshops/dlt/README.md	140	### Step 4: Initialize the dlt Project	COMMENT
LOW⚡	cohorts/2026/workshops/dlt/README.md	150	### Step 5: Prompt the Agent to Build and Run the Pipeline	COMMENT
LOW	cohorts/2026/workshops/dlt/README.md	171	### Step 6: Debug with the Agent	COMMENT
LOW	cohorts/2026/workshops/dlt/README.md	175	### Step 7: Inspect Pipeline Data with the dlt Dashboard	COMMENT
LOW	cohorts/2026/workshops/dlt/README.md	191	### Step 8: Inspect the Pipeline via Chat	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	30	### Step 1: Create a New Project (or Reuse Your Demo Project)	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	41	### Step 2: Set Up the dlt MCP Server (If Not Already Done)	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	97	### Step 3: Install dlt	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	103	### Step 4: Initialize the Project	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	115	### Step 5: Prompt the Agent	COMMENT
LOW	cohorts/2026/workshops/dlt/dlt_homework.md	133	### Step 6: Run and Debug	COMMENT
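The findings guide defines a bolt-flagged cluster as "three or more patterns within a 10-line window". Read literally as a fixed 10-line span, that would not flag lines 52, 60 and 70 of duckdb_troubleshooting.md, which span 19 lines, yet the table flags them. This sketch therefore assumes a chained rule instead: consecutive matches at most 10 lines apart form a run, and runs of three or more are clusters. It is an inference from the flags in this table, not the scanner's documented algorithm.

```python
def clustered_lines(lines, max_gap=10, min_size=3):
    """Return match lines belonging to a run of >= min_size matches in which
    each consecutive pair is at most max_gap lines apart (assumed rule)."""
    flagged = []
    run = []
    for line in sorted(lines):
        if run and line - run[-1] > max_gap:
            # Gap too large: close the current run, keeping it if big enough.
            if len(run) >= min_size:
                flagged.extend(run)
            run = []
        run.append(line)
    if len(run) >= min_size:
        flagged.extend(run)
    return flagged


# Structural Annotation Overuse match lines per file, from the table above.
print(clustered_lines([52, 60, 70, 84]))                       # duckdb_troubleshooting.md: [52, 60, 70]
print(clustered_lines([69, 78, 134, 140, 150, 171, 175, 191]))  # dlt README.md: [134, 140, 150]
print(clustered_lines([82, 86, 90, 94, 98]))                    # streaming_confluent.py: all five
print(clustered_lines([22, 29, 40, 82, 161, 169]))              # local_setup.md: []
```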

Modern AI Meta-Vocabulary (14 hits · 44 pts)

Severity	File	Line	Snippet	Context
MEDIUM	README.md	94	### [Module 2: Workflow Orchestration](02-workflow-orchestration/)	COMMENT
MEDIUM	awesome-data-engineering.md	53	### Workflow orchestration	COMMENT
MEDIUM	02-workflow-orchestration/README.md	14	- [2.1 - Introduction to Workflow Orchestration](#21-introduction-to-workflow-orchestration)	CODE
MEDIUM⚡	02-workflow-orchestration/README.md	511	### 2.5.4 Bonus: Retrieval Augmented Generation (RAG)	COMMENT
MEDIUM⚡	02-workflow-orchestration/README.md	515	#### What is RAG?	COMMENT
MEDIUM⚡	02-workflow-orchestration/README.md	524	#### How RAG Works in Kestra	COMMENT
MEDIUM	02-workflow-orchestration/README.md	571	#### RAG Best Practices	COMMENT
MEDIUM	02-workflow-orchestration/README.md	665	* 2024: [notes](../cohorts/2024/02-workflow-orchestration#community-notes) and [videos](../cohorts/2024/02-workflow-orch	COMMENT
MEDIUM	02-workflow-orchestration/README.md	666	* 2025: [notes](../cohorts/2025/02-workflow-orchestration/README.md#community-notes) and [videos](../cohorts/2025/02-wor	COMMENT
MEDIUM	cohorts/2022/week_2_data_ingestion/README.md	14	### Introduction to Workflow orchestration	COMMENT
MEDIUM	cohorts/2025/02-workflow-orchestration/README.md	450	* 2024: [notes](../cohorts/2024/02-workflow-orchestration#community-notes) and [videos](../cohorts/2024/02-workflow-orch	COMMENT
MEDIUM	cohorts/2024/02-workflow-orchestration/README.md	14	* [2.2.1 - 📯 Intro to Orchestration](#221----intro-to-orchestration)	COMMENT
MEDIUM	cohorts/2023/week_2_workflow_orchestration/README.md	16	### 1. Introduction to Workflow orchestration	COMMENT
MEDIUM	cohorts/2026/workshops/dlt/README.md	47	| [Windsurf](https://codeium.com/windsurf) | Alternative agentic IDE |	CODE

Hyper-Verbose Identifiers (27 hits · 22 pts)

Severity	File	Line	Snippet	Context
LOW	02-workflow-orchestration/flows/02_python.yaml	18	def get_docker_image_downloads(image_name: str = "kestra/kestra"):	CODE
LOW	04-analytics-engineering/setup/local_setup.md	93	def download_and_convert_files(taxi_type):	CODE
LOW	cohorts/2022/week_2_data_ingestion/homework/solution.py	45	def donwload_parquetize_upload_dag(	CODE
LOW	…-data-warehouse/extras/web_to_gcs_with_progress_bar.py	52	def csv_to_parquet_with_progress(	CODE
LOW	…-data-warehouse/extras/web_to_gcs_with_progress_bar.py	115	def upload_to_gcs_with_progress(bucket: str, object_name: str, local_file: str):	CODE
LOW	07-streaming/workshop/README.md	799	def create_events_source_kafka(t_env):	CODE
LOW	07-streaming/workshop/README.md	834	def create_processed_events_sink_postgres(t_env):	STRING
LOW	07-streaming/workshop/README.md	1042	def create_events_source_kafka(t_env):	STRING
LOW	07-streaming/workshop/README.md	1066	def create_events_aggregated_sink(t_env):	STRING
LOW	07-streaming/workshop/live/src/job/aggregation_job.py	5	def create_events_source_kafka(t_env):	CODE
LOW	07-streaming/workshop/live/src/job/aggregation_job.py	29	def create_events_aggregated_sink(t_env):	STRING
LOW	07-streaming/workshop/live/src/job/pass_through_job.py	6	def create_events_source_kafka(t_env):	CODE
LOW	07-streaming/workshop/live/src/job/pass_through_job.py	28	def create_processed_events_sink_postgres(t_env):	STRING
LOW	07-streaming/workshop/src/job/aggregation_job.py	5	def create_events_aggregated_sink(t_env):	CODE
LOW	07-streaming/workshop/src/job/aggregation_job.py	26	def create_events_source_kafka(t_env):	STRING
LOW	07-streaming/workshop/src/job/pass_through_job.py	5	def create_processed_events_sink_postgres(t_env):	CODE
LOW	07-streaming/workshop/src/job/pass_through_job.py	27	def create_events_source_kafka(t_env):	STRING
LOW	07-streaming/workshop/src/job/aggregation_job_demo.py	14	def create_events_source_kafka(t_env):	CODE
LOW	07-streaming/workshop/src/job/aggregation_job_demo.py	38	def create_events_aggregated_sink(t_env):	STRING
LOW	…ing/extras/python/streams-example/pyspark/streaming.py	20	def parse_ride_from_kafka_message(df, schema):	CODE
LOW	…ng/extras/python/streams-example/redpanda/streaming.py	20	def parse_ride_from_kafka_message(df, schema):	CODE
LOW	07-streaming/extras/pyflink/src/job/aggregation_job.py	6	def create_events_aggregated_sink(t_env):	CODE
LOW	07-streaming/extras/pyflink/src/job/aggregation_job.py	26	def create_events_source_kafka(t_env):	STRING
LOW	07-streaming/extras/pyflink/src/job/start_job.py	5	def create_processed_events_sink_postgres(t_env):	CODE
LOW	07-streaming/extras/pyflink/src/job/start_job.py	24	def create_events_source_kafka(t_env):	STRING
LOW	07-streaming/extras/pyflink/src/job/taxi_job.py	5	def create_taxi_events_sink_postgres(t_env):	CODE
LOW	07-streaming/extras/pyflink/src/job/taxi_job.py	42	def create_events_source_kafka(t_env):	STRING

Slop Phrases (6 hits · 18 pts)

Severity	File	Line	Snippet	Context
MEDIUM	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	39	# Feel free to modify this file to suit your needs.	COMMENT
MEDIUM	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	44	# In order to add custom dependencies or upgrade provider packages you can use your extended image.	COMMENT
MEDIUM	…2022/week_2_data_ingestion/airflow/docker-compose.yaml	39	# Feel free to modify this file to suit your needs.	COMMENT
MEDIUM	…2022/week_2_data_ingestion/airflow/docker-compose.yaml	44	# In order to add custom dependencies or upgrade provider packages you can use your extended image.	COMMENT
MEDIUM	…eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml	39	# Feel free to modify this file to suit your needs.	COMMENT
MEDIUM	…eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml	44	# In order to add custom dependencies or upgrade provider packages you can use your extended image.	COMMENT

Unused Imports (16 hits · 16 pts)

Severity	File	Line	Context
LOW	…22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py	2	CODE
LOW	…k_2_data_ingestion/airflow/dags_local/ingest_script.py	1	CODE
LOW	…ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py	2	CODE
LOW	…ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py	7	CODE
LOW	…ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py	8	CODE
LOW	cohorts/2025/workshops/dynamic_load_dlt.py	1	CODE
LOW	06-batch/code/06_spark_sql.py	6	CODE
LOW	06-batch/code/06_spark_sql_big_query.py	6	CODE
LOW	07-streaming/workshop/src/producers/producer.py	11	CODE
LOW	07-streaming/extras/python/redpanda_example/consumer.py	1	CODE
LOW	07-streaming/extras/pyflink/src/job/aggregation_job.py	2	CODE
LOW	07-streaming/extras/pyflink/src/job/aggregation_job.py	2	CODE
LOW	07-streaming/extras/pyflink/src/job/start_job.py	2	CODE
LOW	07-streaming/extras/pyflink/src/job/start_job.py	2	CODE
LOW	07-streaming/extras/pyflink/src/job/taxi_job.py	2	CODE
LOW	07-streaming/extras/pyflink/src/job/taxi_job.py	2	CODE
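The table omits snippets, so the exact imports are not shown. The repeated rows (two hits on line 2 of each pyflink job file) are consistent with one import statement bringing in two unused names, which a per-name check reports separately. A rough sketch of such a check with Python's standard `ast` module, offered as an assumption about how the rule works rather than the scanner's implementation:

```python
import ast


def unused_imports(source: str):
    """Return (line, name) for each imported name never referenced as a bare name.

    A rough sketch: it ignores __all__, string annotations and names used only
    via getattr/eval, so expect both misses and false positives.
    """
    tree = ast.parse(source)
    imported = []  # (line, bound_name)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue
                # "import os.path" binds "os"; "import x as y" binds "y".
                bound = alias.asname or alias.name.split(".")[0]
                imported.append((node.lineno, bound))
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for line, name in imported if name not in used)


example = "from json import dumps, loads\nimport os\nprint(os.getcwd())\n"
print(unused_imports(example))  # [(1, 'dumps'), (1, 'loads')]
```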

Over-Commented Block (14 hits · 14 pts)

Severity	File	Line	Snippet	Context
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	1	# Licensed to the Apache Software Foundation (ASF) under one	COMMENT
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	141	# healthcheck:	COMMENT
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	161	# command: triggerer	COMMENT
LOW	…2022/week_3_data_warehouse/airflow/docker-compose.yaml	261	- airflow	COMMENT
LOW	…2022/week_2_data_ingestion/airflow/docker-compose.yaml	1	# Licensed to the Apache Software Foundation (ASF) under one	COMMENT
LOW	…2022/week_2_data_ingestion/airflow/docker-compose.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW	…eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml	1	# Licensed to the Apache Software Foundation (ASF) under one	COMMENT
LOW	…eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml	21	# WARNING: This configuration is for local development. Do not use it in a production deployment.	COMMENT
LOW	…ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py	21	"depends_on_past": False,	COMMENT
LOW	…ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py	61		COMMENT
LOW	03-data-warehouse/extras/web_to_gcs.py	101	# web_to_gcs("2020", "yellow")	COMMENT
LOW	07-streaming/workshop/live/README.md	1	# streaming-workshop	COMMENT
LOW	…ing/extras/python/redpanda_example/docker-compose.yaml	41	# command:	COMMENT

Excessive Try-Catch Wrapping (17 hits · 13 pts)

Severity	File	Line	Snippet	Context
LOW	cohorts/2025/workshops/dynamic_load_dlt.py	107	except Exception as e:	CODE
LOW	cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py	40	except Exception as e:	CODE
LOW	cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py	96	except Exception as e:	CODE
LOW	…ts/2023/week_6_stream_processing/producer_confluent.py	50	except Exception as e:	CODE
LOW	cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py	40	except Exception as e:	CODE
LOW	cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py	96	except Exception as e:	CODE
LOW	07-streaming/workshop/README.md	1114	except Exception as e:	STRING
LOW	07-streaming/workshop/live/src/job/aggregation_job.py	77	except Exception as e:	STRING
LOW	07-streaming/workshop/src/job/aggregation_job.py	79	except Exception as e:	STRING
LOW	07-streaming/workshop/src/job/pass_through_job.py	74	except Exception as e:	STRING
LOW	07-streaming/workshop/src/job/aggregation_job_demo.py	88	except Exception as e:	STRING
LOW	07-streaming/extras/python/avro_example/producer.py	69	except Exception as e:	CODE
LOW	…ming/extras/python/streams-example/pyspark/producer.py	46	except Exception as e:	CODE
LOW	…ing/extras/python/streams-example/redpanda/producer.py	46	except Exception as e:	CODE
LOW	07-streaming/extras/pyflink/src/job/aggregation_job.py	85	except Exception as e:	STRING
LOW	07-streaming/extras/pyflink/src/job/start_job.py	69	except Exception as e:	STRING
LOW	07-streaming/extras/pyflink/src/job/taxi_job.py	104	except Exception as e:	STRING
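A minimal sketch of how broad handlers can be located with the standard `ast` module. It assumes the rule targets `except Exception` (as in every snippet above) and, as an extra assumption, bare `except:`. Because it parses code, it cannot see matches inside string literals, which the table reports with STRING context; those would need a separate text search.

```python
import ast


def broad_except_handlers(source: str):
    """Return line numbers of `except Exception` and bare `except:` handlers."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            is_bare = node.type is None
            is_exception = isinstance(node.type, ast.Name) and node.type.id == "Exception"
            if is_bare or is_exception:
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical snippet: only the second handler is broad.
sample_source = """
try:
    upload()
except FileNotFoundError:
    pass
except Exception as e:
    print(e)
"""
print(broad_except_handlers(sample_source))  # [6]
```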

Decorative Section Separators (4 hits · 12 pts)

Severity	File	Line	Snippet	Context
MEDIUM	…2022/week_3_data_warehouse/airflow/2_setup_nofrills.md	113	# -----------------------------------	COMMENT
MEDIUM	…2022/week_3_data_warehouse/airflow/1_setup_official.md	113	# -----------------------------------	COMMENT
MEDIUM	…2022/week_2_data_ingestion/airflow/2_setup_nofrills.md	112	# -----------------------------------	COMMENT
MEDIUM	…2022/week_2_data_ingestion/airflow/1_setup_official.md	111	# -----------------------------------	COMMENT

Verbosity Indicators (5 hits · 11 pts)

Severity	File	Line	Snippet	Context
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	82	# Step 1: Consume GREEN_TAXI_TOPIC and FHV_TAXI_TOPIC	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	86	# Step 2: Publish green and fhv rides to RIDES_TOPIC	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	90	# Step 3: Read RIDES_TOPIC and parse it in ALL_RIDE_SCHEMA	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	94	# Step 4: Apply Aggregation on the all_rides	COMMENT
LOW⚡	…s/2023/week_6_stream_processing/streaming_confluent.py	98	# Step 5: Sink Aggregation Streams to Console	COMMENT

Self-Referential Comments (3 hits · 9 pts)

Severity	File	Line	Snippet	Context
MEDIUM	…22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py	70	# Create a partitioned table from external table	COMMENT
MEDIUM	cohorts/2025/workshops/dynamic_load_dlt.py	110	# Create the pipeline	COMMENT
MEDIUM	…reaming/extras/pyflink/src/producers/load_taxi_data.py	6	# Create a Kafka producer	COMMENT

Deep Nesting (6 hits · 6 pts)

Severity	File	Line	Context
LOW	cohorts/2025/workshops/dynamic_load_dlt.py	94	CODE
LOW	…-data-warehouse/extras/web_to_gcs_with_progress_bar.py	30	CODE
LOW	07-streaming/extras/python/redpanda_example/consumer.py	14	CODE
LOW	07-streaming/extras/python/json_example/consumer.py	13	CODE
LOW	…ming/extras/python/streams-example/pyspark/consumer.py	12	CODE
LOW	…ing/extras/python/streams-example/redpanda/consumer.py	12	CODE
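The report does not state what counts as a nesting level or where the threshold sits. A minimal sketch, assuming depth is the number of enclosing control-flow blocks (loops, conditionals, `with` and `try`), computes the maximum depth with the standard `ast` module; the block set and the sample consumer loop are hypothetical.

```python
import ast

# Compound statements treated as one nesting level each (a hypothetical choice).
BLOCK_NODES = (ast.For, ast.AsyncFor, ast.While, ast.If, ast.With, ast.AsyncWith, ast.Try)


def max_block_depth(source: str) -> int:
    """Return the deepest nesting of control-flow blocks in `source`."""

    def depth(node, current):
        deepest = current
        for child in ast.iter_child_nodes(node):
            extra = 1 if isinstance(child, BLOCK_NODES) else 0
            deepest = max(deepest, depth(child, current + extra))
        return deepest

    return depth(ast.parse(source), 0)


# Hypothetical consumer loop: while > if > for > if gives depth 4.
consumer_loop = """
def consume(consumer):
    while True:
        records = consumer.poll(1.0)
        if records:
            for record in records:
                if record.value is not None:
                    print(record.value)
"""
print(max_block_depth(consumer_loop))  # 4
```

Note that `elif` is represented in the AST as an `If` nested inside the previous `If`, so long `elif` chains inflate this measure.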

Magic Placeholder Names (1 hit · 5 pts)

Severity	File	Line	Snippet	Context
HIGH	02-workflow-orchestration/README.md	483	export GEMINI_API_KEY="your-api-key-here"	CODE
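The report does not say how placeholder names are recognised. A minimal sketch, assuming a simple regular expression over `your-...-here` style values (the pattern is hypothetical, not the scanner's rule), flags the line above:

```python
import re

# Hypothetical pattern: "your-...-here" / "your_..._here" style stand-in values.
PLACEHOLDER = re.compile(r"\byour[-_][a-z0-9_-]*here\b", re.IGNORECASE)


def placeholder_values(text: str):
    """Return each placeholder-looking token found in `text`."""
    return PLACEHOLDER.findall(text)


print(placeholder_values('export GEMINI_API_KEY="your-api-key-here"'))  # ['your-api-key-here']
print(placeholder_values('export GEMINI_API_KEY="$(cat key.txt)"'))      # []
```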

Redundant / Tautological Comments (2 hits · 3 pts)

Severity	File	Line	Snippet	Context
LOW	cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py	50	# Check if the bucket belongs to the current project	COMMENT
LOW	cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py	50	# Check if the bucket belongs to the current project	COMMENT