Repository Analysis

DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

Low AI signal (7.5)
Adjusted Score: 7.5
Raw Score: 7.5
Time Factor: 100%
Last Push: 2026-05-03
Stars: 41,697
Language: Jupyter Notebook
Lines of Code: 31,709
Files: 314
Pattern Hits: 123
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 23 · MEDIUM 13 · LOW 87

Pattern Findings

123 matches across 12 categories.
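The headline totals can be cross-checked against the per-category figures in this report. The sketch below copies each category's hit count, point value, and severity from the tables that follow and sums them. The `tally` helper and the `CATEGORIES` layout are illustrative names, not part of the scanner, and the sketch says nothing about how the 7.5 score is derived from the points.

```python
# (hits, points, severity) per category, copied from this report.
# Every row within a category carries the same severity here.
CATEGORIES = {
    "Cross-File Repetition": (22, 110, "HIGH"),
    "Hyper-Verbose Identifiers": (27, 22, "LOW"),
    "Slop Phrases": (6, 18, "MEDIUM"),
    "Unused Imports": (16, 16, "LOW"),
    "Over-Commented Block": (14, 14, "LOW"),
    "Excessive Try-Catch Wrapping": (17, 13, "LOW"),
    "Decorative Section Separators": (4, 12, "MEDIUM"),
    "Verbosity Indicators": (5, 11, "LOW"),
    "Self-Referential Comments": (3, 9, "MEDIUM"),
    "Deep Nesting": (6, 6, "LOW"),
    "Magic Placeholder Names": (1, 5, "HIGH"),
    "Redundant / Tautological Comments": (2, 3, "LOW"),
}


def tally(categories):
    """Sum hits and points, and count hits per severity level."""
    total_hits = sum(hits for hits, _, _ in categories.values())
    total_points = sum(points for _, points, _ in categories.values())
    by_severity = {}
    for hits, _, severity in categories.values():
        by_severity[severity] = by_severity.get(severity, 0) + hits
    return total_hits, total_points, by_severity


hits, points, by_severity = tally(CATEGORIES)
print(len(CATEGORIES), hits, points, by_severity)
```

The sums agree with the summary: 12 categories, 123 hits, and a severity split of HIGH 23, MEDIUM 13, LOW 87.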

Cross-File Repetition: 22 hits · 110 pts
Severity | File | Line | Snippet
HIGH | …s/2023/week_6_stream_processing/streaming_confluent.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema
HIGH | …ing/extras/python/streams-example/pyspark/streaming.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema
HIGH | …ng/extras/python/streams-example/redpanda/streaming.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema
HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku
HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku
HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku
HIGH | 07-streaming/workshop/README.md | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup
HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup
HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup
HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_
HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary
HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary
HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary
HIGH | 07-streaming/workshop/src/job/aggregation_job_demo.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary
HIGH | 07-streaming/workshop/README.md | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve
HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve
HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve
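
The snippets above appear lowercased with whitespace collapsed, which suggests that text is normalized before it is compared across files. The report does not describe the scanner's actual method, so the following is only a minimal sketch of one way such repetition could be found: normalize each snippet, hash it, and report hashes seen in more than one file. The function names, the `min_files` parameter, and the sample file names are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(snippet: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", snippet.strip().lower())


def find_cross_file_repeats(snippets_by_file, min_files=2):
    """Map each normalized snippet seen in at least `min_files` files to those files."""
    files_by_digest = defaultdict(set)
    text_by_digest = {}
    for path, snippets in snippets_by_file.items():
        for snippet in snippets:
            norm = normalize(snippet)
            digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            files_by_digest[digest].add(path)
            text_by_digest[digest] = norm
    return {
        text_by_digest[digest]: sorted(files)
        for digest, files in files_by_digest.items()
        if len(files) >= min_files
    }


# Hypothetical input: two files carrying the same DDL with different formatting.
sample = {
    "job_a.py": ["CREATE TABLE {table_name} (\n    id INTEGER\n)"],
    "job_b.py": ["create table {table_name} ( id integer )", "select 1"],
}
print(find_cross_file_repeats(sample))
```

In this sample the two `create table` strings normalize to the same text and are reported together, while `select 1` occurs in only one file and is left out.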

Hyper-Verbose Identifiers: 27 hits · 22 pts
Severity | File | Line | Snippet
LOW | 02-workflow-orchestration/flows/02_python.yaml | 18 | def get_docker_image_downloads(image_name: str = "kestra/kestra"):
LOW | 04-analytics-engineering/setup/local_setup.md | 93 | def download_and_convert_files(taxi_type):
LOW | cohorts/2022/week_2_data_ingestion/homework/solution.py | 45 | def donwload_parquetize_upload_dag(
LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 52 | def csv_to_parquet_with_progress(
LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 115 | def upload_to_gcs_with_progress(bucket: str, object_name: str, local_file: str):
LOW | 07-streaming/workshop/README.md | 799 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/README.md | 834 | def create_processed_events_sink_postgres(t_env):
LOW | 07-streaming/workshop/README.md | 1042 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/README.md | 1066 | def create_events_aggregated_sink(t_env):
LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 5 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 29 | def create_events_aggregated_sink(t_env):
LOW | 07-streaming/workshop/live/src/job/pass_through_job.py | 6 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/live/src/job/pass_through_job.py | 28 | def create_processed_events_sink_postgres(t_env):
LOW | 07-streaming/workshop/src/job/aggregation_job.py | 5 | def create_events_aggregated_sink(t_env):
LOW | 07-streaming/workshop/src/job/aggregation_job.py | 26 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/src/job/pass_through_job.py | 5 | def create_processed_events_sink_postgres(t_env):
LOW | 07-streaming/workshop/src/job/pass_through_job.py | 27 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 14 | def create_events_source_kafka(t_env):
LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 38 | def create_events_aggregated_sink(t_env):
LOW | …ing/extras/python/streams-example/pyspark/streaming.py | 20 | def parse_ride_from_kafka_message(df, schema):
LOW | …ng/extras/python/streams-example/redpanda/streaming.py | 20 | def parse_ride_from_kafka_message(df, schema):
LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 6 | def create_events_aggregated_sink(t_env):
LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 26 | def create_events_source_kafka(t_env):
LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 5 | def create_processed_events_sink_postgres(t_env):
LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 24 | def create_events_source_kafka(t_env):
LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 5 | def create_taxi_events_sink_postgres(t_env):
LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 42 | def create_events_source_kafka(t_env):

Slop Phrases: 6 hits · 18 pts
Severity | File | Line | Snippet
MEDIUM | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 39 | # Feel free to modify this file to suit your needs.
MEDIUM | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image.
MEDIUM | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 39 | # Feel free to modify this file to suit your needs.
MEDIUM | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image.
MEDIUM | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 39 | # Feel free to modify this file to suit your needs.
MEDIUM | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image.

Unused Imports: 16 hits · 16 pts
Severity | File | Line | Snippet
LOW | …22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py | 2 |
LOW | …k_2_data_ingestion/airflow/dags_local/ingest_script.py | 1 |
LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 2 |
LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 7 |
LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 8 |
LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 1 |
LOW | 06-batch/code/06_spark_sql.py | 6 |
LOW | 06-batch/code/06_spark_sql_big_query.py | 6 |
LOW | 07-streaming/workshop/src/producers/producer.py | 11 |
LOW | 07-streaming/extras/python/redpanda_example/consumer.py | 1 |
LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 2 |
LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 2 |
LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 2 |
LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 2 |
LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 2 |
LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 2 |
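
The rows above give only a file and a line, with no snippet. To see which names are involved, a file can be checked locally. The sketch below is one simple approach using Python's standard `ast` module; it is not the scanner's own rule, and the `unused_imports` helper is a hypothetical name. It ignores `__all__`, re-exports, and names referenced only inside strings, so treat its output as a starting point.

```python
import ast


def unused_imports(source: str):
    """Return sorted (line, name) pairs for imported names never referenced."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported[alias.asname or alias.name] = node.lineno
    # Every bare identifier in the module, including the base of "os.path".
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


example = (
    "import os\n"
    "import sys\n"
    "from json import dumps, loads\n"
    "\n"
    "print(os.getcwd(), dumps({}))\n"
)
print(unused_imports(example))
```

For the example source, `sys` (line 2) and `loads` (line 3) are reported, since neither name is referenced after being imported.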

Over-Commented Block: 14 hits · 14 pts
Severity | File | Line | Snippet
LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one
LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment.
LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 141 | # healthcheck:
LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 161 | # command: triggerer
LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 261 | - airflow
LOW | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one
LOW | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment.
LOW | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one
LOW | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment.
LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 21 | "depends_on_past": False,
LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 61 |
LOW | 03-data-warehouse/extras/web_to_gcs.py | 101 | # web_to_gcs("2020", "yellow")
LOW | 07-streaming/workshop/live/README.md | 1 | # streaming-workshop
LOW | …ing/extras/python/redpanda_example/docker-compose.yaml | 41 | # command:

Excessive Try-Catch Wrapping: 17 hits · 13 pts
Severity | File | Line | Snippet
LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 107 | except Exception as e:
LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 40 | except Exception as e:
LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 96 | except Exception as e:
LOW | …ts/2023/week_6_stream_processing/producer_confluent.py | 50 | except Exception as e:
LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 40 | except Exception as e:
LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 96 | except Exception as e:
LOW | 07-streaming/workshop/README.md | 1114 | except Exception as e:
LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 77 | except Exception as e:
LOW | 07-streaming/workshop/src/job/aggregation_job.py | 79 | except Exception as e:
LOW | 07-streaming/workshop/src/job/pass_through_job.py | 74 | except Exception as e:
LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 88 | except Exception as e:
LOW | 07-streaming/extras/python/avro_example/producer.py | 77 | except Exception as e:
LOW | …ming/extras/python/streams-example/pyspark/producer.py | 46 | except Exception as e:
LOW | …ing/extras/python/streams-example/redpanda/producer.py | 46 | except Exception as e:
LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 85 | except Exception as e:
LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 69 | except Exception as e:
LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 104 | except Exception as e:
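
Every hit in this category is the same line, `except Exception as e:`. The report shows neither the handler bodies nor the rule's exact criteria, so the sketch below only illustrates the general shape being flagged and a narrower alternative. Both functions are hypothetical examples built on `json.loads` and are not code from the repository.

```python
import json


def parse_broad(raw):
    """The flagged shape: catch every exception, report it, carry on."""
    try:
        return json.loads(raw)
    except Exception as e:  # also hides bugs unrelated to parsing
        print(f"error: {e}")
        return None


def parse_narrow(raw):
    """Catch only the failure this code knows how to handle."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        print(f"invalid JSON: {e}")
        return None


print(parse_narrow('{"a": 1}'))
print(parse_narrow("{bad"))
print(parse_broad(None))  # a caller bug (wrong type) is silently swallowed
```

With the narrow handler, passing `None` raises `TypeError` and surfaces the caller's mistake, whereas the broad handler turns it into a `None` return value.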

Decorative Section Separators: 4 hits · 12 pts
Severity | File | Line | Snippet
MEDIUM | …2022/week_3_data_warehouse/airflow/2_setup_nofrills.md | 113 | # -----------------------------------
MEDIUM | …2022/week_3_data_warehouse/airflow/1_setup_official.md | 113 | # -----------------------------------
MEDIUM | …2022/week_2_data_ingestion/airflow/2_setup_nofrills.md | 112 | # -----------------------------------
MEDIUM | …2022/week_2_data_ingestion/airflow/1_setup_official.md | 111 | # -----------------------------------

Verbosity Indicators: 5 hits · 11 pts
Severity | File | Line | Snippet
LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 82 | # Step 1: Consume GREEN_TAXI_TOPIC and FHV_TAXI_TOPIC
LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 86 | # Step 2: Publish green and fhv rides to RIDES_TOPIC
LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 90 | # Step 3: Read RIDES_TOPIC and parse it in ALL_RIDE_SCHEMA
LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 94 | # Step 4: Apply Aggregation on the all_rides
LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 98 | # Step 5: Sink Aggregation Streams to Console

Self-Referential Comments: 3 hits · 9 pts
Severity | File | Line | Snippet
MEDIUM | …22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py | 70 | # Create a partitioned table from external table
MEDIUM | cohorts/2025/workshops/dynamic_load_dlt.py | 110 | # Create the pipeline
MEDIUM | …reaming/extras/pyflink/src/producers/load_taxi_data.py | 6 | # Create a Kafka producer

Deep Nesting: 6 hits · 6 pts
Severity | File | Line | Snippet
LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 94 |
LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 30 |
LOW | 07-streaming/extras/python/redpanda_example/consumer.py | 14 |
LOW | 07-streaming/extras/python/json_example/consumer.py | 13 |
LOW | …ming/extras/python/streams-example/pyspark/consumer.py | 12 |
LOW | …ing/extras/python/streams-example/redpanda/consumer.py | 12 |

Magic Placeholder Names: 1 hit · 5 pts
Severity | File | Line | Snippet
HIGH | 02-workflow-orchestration/README.md | 483 | export GEMINI_API_KEY="your-api-key-here"

Redundant / Tautological Comments: 2 hits · 3 pts
Severity | File | Line | Snippet
LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 50 | # Check if the bucket belongs to the current project
LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 50 | # Check if the bucket belongs to the current project