Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
123 matches across 12 categories. Click a row to expand file-level details.
| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | …s/2023/week_6_stream_processing/streaming_confluent.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema |
| HIGH | …ing/extras/python/streams-example/pyspark/streaming.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema |
| HIGH | …ng/extras/python/streams-example/redpanda/streaming.py | 0 | take a spark streaming df and parse value col based on <schema>, return streaming df cols in schema |
| HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku |
| HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku |
| HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, picku |
| HIGH | 07-streaming/workshop/README.md | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup |
| HIGH | 07-streaming/workshop/live/src/job/pass_through_job.py | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup |
| HIGH | 07-streaming/workshop/src/job/pass_through_job.py | 0 | insert into {postgres_sink} select pulocationid, dolocationid, trip_distance, total_amount, to_timestamp_ltz(tpep_pickup |
| HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | create table {table_name} ( pulocationid integer, dolocationid integer, trip_distance double, total_amount double, tpep_ |
| HIGH | 07-streaming/workshop/README.md | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary |
| HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary |
| HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary |
| HIGH | 07-streaming/workshop/src/job/aggregation_job_demo.py | 0 | create table {table_name} ( window_start timestamp(3), pulocationid int, num_trips bigint, total_revenue double, primary |
| HIGH | 07-streaming/workshop/README.md | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve |
| HIGH | 07-streaming/workshop/live/src/job/aggregation_job.py | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve |
| HIGH | 07-streaming/workshop/src/job/aggregation_job.py | 0 | insert into {aggregated_table} select window_start, pulocationid, count(*) as num_trips, sum(total_amount) as total_reve |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | 02-workflow-orchestration/flows/02_python.yaml | 18 | def get_docker_image_downloads(image_name: str = "kestra/kestra"): |
| LOW | 04-analytics-engineering/setup/local_setup.md | 93 | def download_and_convert_files(taxi_type): |
| LOW | cohorts/2022/week_2_data_ingestion/homework/solution.py | 45 | def donwload_parquetize_upload_dag( |
| LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 52 | def csv_to_parquet_with_progress( |
| LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 115 | def upload_to_gcs_with_progress(bucket: str, object_name: str, local_file: str): |
| LOW | 07-streaming/workshop/README.md | 799 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/README.md | 834 | def create_processed_events_sink_postgres(t_env): |
| LOW | 07-streaming/workshop/README.md | 1042 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/README.md | 1066 | def create_events_aggregated_sink(t_env): |
| LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 5 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 29 | def create_events_aggregated_sink(t_env): |
| LOW | 07-streaming/workshop/live/src/job/pass_through_job.py | 6 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/live/src/job/pass_through_job.py | 28 | def create_processed_events_sink_postgres(t_env): |
| LOW | 07-streaming/workshop/src/job/aggregation_job.py | 5 | def create_events_aggregated_sink(t_env): |
| LOW | 07-streaming/workshop/src/job/aggregation_job.py | 26 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/src/job/pass_through_job.py | 5 | def create_processed_events_sink_postgres(t_env): |
| LOW | 07-streaming/workshop/src/job/pass_through_job.py | 27 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 14 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 38 | def create_events_aggregated_sink(t_env): |
| LOW | …ing/extras/python/streams-example/pyspark/streaming.py | 20 | def parse_ride_from_kafka_message(df, schema): |
| LOW | …ng/extras/python/streams-example/redpanda/streaming.py | 20 | def parse_ride_from_kafka_message(df, schema): |
| LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 6 | def create_events_aggregated_sink(t_env): |
| LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 26 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 5 | def create_processed_events_sink_postgres(t_env): |
| LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 24 | def create_events_source_kafka(t_env): |
| LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 5 | def create_taxi_events_sink_postgres(t_env): |
| LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 42 | def create_events_source_kafka(t_env): |
| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 39 | # Feel free to modify this file to suit your needs. |
| MEDIUM | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image. |
| MEDIUM | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 39 | # Feel free to modify this file to suit your needs. |
| MEDIUM | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image. |
| MEDIUM | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 39 | # Feel free to modify this file to suit your needs. |
| MEDIUM | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 44 | # In order to add custom dependencies or upgrade provider packages you can use your extended image. |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | …22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py | 2 | |
| LOW | …k_2_data_ingestion/airflow/dags_local/ingest_script.py | 1 | |
| LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 2 | |
| LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 7 | |
| LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 8 | |
| LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 1 | |
| LOW | 06-batch/code/06_spark_sql.py | 6 | |
| LOW | 06-batch/code/06_spark_sql_big_query.py | 6 | |
| LOW | 07-streaming/workshop/src/producers/producer.py | 11 | |
| LOW | 07-streaming/extras/python/redpanda_example/consumer.py | 1 | |
| LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 2 | |
| LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 2 | |
| LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 2 | |
| LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 2 | |
| LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 2 | |
| LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 2 |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one |
| LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment. |
| LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 141 | # healthcheck: |
| LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 161 | # command: triggerer |
| LOW | …2022/week_3_data_warehouse/airflow/docker-compose.yaml | 261 | - airflow |
| LOW | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one |
| LOW | …2022/week_2_data_ingestion/airflow/docker-compose.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment. |
| LOW | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one |
| LOW | …eek_2_data_ingestion/airflow/docker-compose_2.3.4.yaml | 21 | # WARNING: This configuration is for local development. Do not use it in a production deployment. |
| LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 21 | "depends_on_past": False, |
| LOW | …ingestion/airflow/extras/data_ingestion_gcs_dag_ex2.py | 61 | |
| LOW | 03-data-warehouse/extras/web_to_gcs.py | 101 | # web_to_gcs("2020", "yellow") |
| LOW | 07-streaming/workshop/live/README.md | 1 | # streaming-workshop |
| LOW | …ing/extras/python/redpanda_example/docker-compose.yaml | 41 | # command: |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 107 | except Exception as e: |
| LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 40 | except Exception as e: |
| LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 96 | except Exception as e: |
| LOW | …ts/2023/week_6_stream_processing/producer_confluent.py | 50 | except Exception as e: |
| LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 40 | except Exception as e: |
| LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 96 | except Exception as e: |
| LOW | 07-streaming/workshop/README.md | 1114 | except Exception as e: |
| LOW | 07-streaming/workshop/live/src/job/aggregation_job.py | 77 | except Exception as e: |
| LOW | 07-streaming/workshop/src/job/aggregation_job.py | 79 | except Exception as e: |
| LOW | 07-streaming/workshop/src/job/pass_through_job.py | 74 | except Exception as e: |
| LOW | 07-streaming/workshop/src/job/aggregation_job_demo.py | 88 | except Exception as e: |
| LOW | 07-streaming/extras/python/avro_example/producer.py | 77 | except Exception as e: |
| LOW | …ming/extras/python/streams-example/pyspark/producer.py | 46 | except Exception as e: |
| LOW | …ing/extras/python/streams-example/redpanda/producer.py | 46 | except Exception as e: |
| LOW | 07-streaming/extras/pyflink/src/job/aggregation_job.py | 85 | except Exception as e: |
| LOW | 07-streaming/extras/pyflink/src/job/start_job.py | 69 | except Exception as e: |
| LOW | 07-streaming/extras/pyflink/src/job/taxi_job.py | 104 | except Exception as e: |
| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | …2022/week_3_data_warehouse/airflow/2_setup_nofrills.md | 113 | # ----------------------------------- |
| MEDIUM | …2022/week_3_data_warehouse/airflow/1_setup_official.md | 113 | # ----------------------------------- |
| MEDIUM | …2022/week_2_data_ingestion/airflow/2_setup_nofrills.md | 112 | # ----------------------------------- |
| MEDIUM | …2022/week_2_data_ingestion/airflow/1_setup_official.md | 111 | # ----------------------------------- |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 82 | # Step 1: Consume GREEN_TAXI_TOPIC and FHV_TAXI_TOPIC |
| LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 86 | # Step 2: Publish green and fhv rides to RIDES_TOPIC |
| LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 90 | # Step 3: Read RIDES_TOPIC and parse it in ALL_RIDE_SCHEMA |
| LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 94 | # Step 4: Apply Aggregation on the all_rides |
| LOW | …s/2023/week_6_stream_processing/streaming_confluent.py | 98 | # Step 5: Sink Aggregation Streams to Console |
| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | …22/week_3_data_warehouse/airflow/dags/gcs_to_bq_dag.py | 70 | # Create a partitioned table from external table |
| MEDIUM | cohorts/2025/workshops/dynamic_load_dlt.py | 110 | # Create the pipeline |
| MEDIUM | …reaming/extras/pyflink/src/producers/load_taxi_data.py | 6 | # Create a Kafka producer |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | cohorts/2025/workshops/dynamic_load_dlt.py | 94 | |
| LOW | …-data-warehouse/extras/web_to_gcs_with_progress_bar.py | 30 | |
| LOW | 07-streaming/extras/python/redpanda_example/consumer.py | 14 | |
| LOW | 07-streaming/extras/python/json_example/consumer.py | 13 | |
| LOW | …ming/extras/python/streams-example/pyspark/consumer.py | 12 | |
| LOW | …ing/extras/python/streams-example/redpanda/consumer.py | 12 |
| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | 02-workflow-orchestration/README.md | 483 | export GEMINI_API_KEY="your-api-key-here" |
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | cohorts/2025/03-data-warehouse/load_yellow_taxi_data.py | 50 | # Check if the bucket belongs to the current project |
| LOW | cohorts/2026/03-data-warehouse/load_yellow_taxi_data.py | 50 | # Check if the bucket belongs to the current project |