Repository Analysis

apache/spark

Apache Spark - A unified analytics engine for large-scale data processing

5.5 Low AI signal

Adjusted Score: 5.5
Raw Score: 5.5
Time Factor: 100%
Last Push: 2026-05-30
Stars: 43,370
Language: Scala
Lines of Code: 2,950,542
Files: 12094
Pattern Hits: 8688
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 424 · HIGH 788 · MEDIUM 314 · LOW 7162
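
As a quick consistency check, the four severity counts can be summed and compared with the Pattern Hits figure reported for the scan. The sketch below is illustrative only: the variable names are made up for this example and are not part of the scanner.

```python
# Severity counts copied from the Severity Breakdown line of this report.
severity_counts = {"CRITICAL": 424, "HIGH": 788, "MEDIUM": 314, "LOW": 7162}

# Total number of matches implied by the breakdown.
total_hits = sum(severity_counts.values())

# Share of all matches that falls into each severity level.
shares = {name: count / total_hits for name, count in severity_counts.items()}

print(total_hits)  # 8688, the same as the Pattern Hits figure
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The total agrees with the 8688 pattern hits, and most of those matches are at the LOW level.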

Pattern Findings

8688 matches across 17 categories.
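
Each category header in this report pairs a hit count with a point total. One rough way to compare categories is points per hit; the sketch below computes that ratio for the five categories listed in this section. How the scanner assigns points is not stated in the report, so the ratio is only a descriptive figure, and the dictionary name is hypothetical.

```python
# (hits, points) copied from the category header lines in this report.
category_totals = {
    "Hallucination Indicators": (424, 4572),
    "Cross-File Repetition": (662, 3310),
    "Hyper-Verbose Identifiers": (2925, 2733),
    "Over-Commented Block": (2850, 2672),
    "Cross-Language Confusion": (124, 558),
}

# Average points contributed by a single hit in each category.
points_per_hit = {
    name: points / hits for name, (hits, points) in category_totals.items()
}

# Print categories from the highest to the lowest average weight.
for name, ratio in sorted(points_per_hit.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f} pts/hit")
```

On these figures the CRITICAL and HIGH categories carry several points per hit, while the two LOW categories average under one point per hit.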

Hallucination Indicators: 424 hits · 4572 pts
Severity | File | Line | Snippet
CRITICAL | …rg/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala | 115 | case e: Throwable if org.apache.commons.lang3.exception.ExceptionUtils.indexOfThrowable(
CRITICAL | …org/apache/spark/deploy/k8s/KubernetesUtilsSuite.scala | 49 | assert(sparkPod.pod.getSpec.getContainers.asScala.toList.map(_.getName) == List("first"))
CRITICAL | …org/apache/spark/deploy/k8s/KubernetesUtilsSuite.scala | 56 | assert(sparkPod.pod.getSpec.getContainers.asScala.toList.map(_.getName) == List("second"))
CRITICAL | …org/apache/spark/deploy/k8s/KubernetesUtilsSuite.scala | 63 | assert(sparkPod.pod.getSpec.getContainers.asScala.toList.map(_.getName) == List("second"))
CRITICAL | …k/scheduler/cluster/k8s/DeploymentAllocatorSuite.scala | 134 | assert(deployment.getSpec.getTemplate.getSpec.getContainers.asScala.exists(
CRITICAL | …heduler/cluster/k8s/ExecutorPVCResizePluginSuite.scala | 187 | captor.getValue.getSpec.getResources.getRequests.get("storage")).longValue()
CRITICAL | …k/scheduler/cluster/k8s/StatefulSetPodsAllocator.scala | 171 | val statefulSet = new io.fabric8.kubernetes.api.model.apps.StatefulSetBuilder()
CRITICAL | …rverExpectations/stage_with_summaries_expectation.json | 5 | "details" : "org.apache.spark.sql.Dataset.foreach(Dataset.scala:2862)\n$line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<
CRITICAL | …ectations/stage_with_accumulable_json_expectation.json | 9 | "details" : "org.apache.spark.rdd.RDD.foreach(RDD.scala:765)\n$line9.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)\n$
CRITICAL | …ions/stage_list_with_accumulable_json_expectation.json | 9 | "details" : "org.apache.spark.rdd.RDD.foreach(RDD.scala:765)\n$line9.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)\n$
CRITICAL | …oryServerExpectations/stage_list_json_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$l
CRITICAL | …oryServerExpectations/stage_list_json_expectation.json | 87 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line11.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)\n$l
CRITICAL | …oryServerExpectations/stage_list_json_expectation.json | 93 | "failureReason" : "Job aborted due to stage failure: Task 3 in stage 2.0 failed 1 times, most recent failure: Lost tas
CRITICAL | …oryServerExpectations/stage_list_json_expectation.json | 170 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …oryServerExpectations/stage_list_json_expectation.json | 252 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line9.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)\n$li
CRITICAL | …ctations/stage_list_with_peak_metrics_expectation.json | 5 | "details" : "org.apache.spark.sql.Dataset.foreach(Dataset.scala:2862)\n$line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<
CRITICAL | …erExpectations/one_stage_attempt_json_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …attempt_json_details_with_failed_task_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …toryServerExpectations/one_stage_json_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …Expectations/complete_stage_list_json_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line19.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)\n$l
CRITICAL | …Expectations/complete_stage_list_json_expectation.json | 87 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …Expectations/complete_stage_list_json_expectation.json | 169 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line9.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)\n$li
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 5 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 71 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 137 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 203 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 269 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 335 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 401 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 467 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 533 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …xpectations/stage_task_list_w__status_expectation.json | 599 | "errorMessage" : "java.lang.RuntimeException: bad exec\n\tat $line16.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.
CRITICAL | …pectations/excludeOnFailure_for_stage_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:370)\n$line17.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<consol
CRITICAL | …pectations/excludeOnFailure_for_stage_expectation.json | 176 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line17.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …pectations/excludeOnFailure_for_stage_expectation.json | 441 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line17.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …ations/stage_with_speculation_summary_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.collect(RDD.scala:1029)\n$line17.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<c
CRITICAL | …rExpectations/stage_with_peak_metrics_expectation.json | 5 | "details" : "org.apache.spark.sql.Dataset.foreach(Dataset.scala:2862)\n$line19.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<
CRITICAL | …ectations/one_stage_json_with_details_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:271)\n$line10.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:14)\n$lin
CRITICAL | …tions/one_stage_json_with_partitionId_expectation.json | 5 | "details" : "org.apache.spark.sql.Dataset.count(Dataset.scala:3130)\n$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<in
CRITICAL | …erExpectations/failed_stage_list_json_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.count(RDD.scala:910)\n$line11.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)\n$l
CRITICAL | …erExpectations/failed_stage_list_json_expectation.json | 11 | "failureReason" : "Job aborted due to stage failure: Task 3 in stage 2.0 failed 1 times, most recent failure: Lost tas
CRITICAL | …tions/excludeOnFailure_node_for_stage_expectation.json | 5 | "details" : "org.apache.spark.rdd.RDD.map(RDD.scala:370)\n$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<consol
CRITICAL | …tions/excludeOnFailure_node_for_stage_expectation.json | 371 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …tions/excludeOnFailure_node_for_stage_expectation.json | 834 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …tions/excludeOnFailure_node_for_stage_expectation.json | 901 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …tions/excludeOnFailure_node_for_stage_expectation.json | 968 | "errorMessage" : "java.lang.RuntimeException: Bad executor\n\tat $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$an
CRITICAL | …/src/test/scala/org/apache/spark/ui/UIUtilsSuite.scala | 220 | val e1 = "Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 i
CRITICAL | …he/spark/shuffle/sort/ShuffleExternalSorterSuite.scala | 108 | // at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 721 | val rootLogger = org.apache.logging.log4j.LogManager.getRootLogger()
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1749 | // at org.apache.spark.util.UtilsSuite.throwException(UtilsSuite.scala:1529)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1754 | // ----> at org.apache.spark.util.UtilsSuite.callGetTryFromNested(UtilsSuite.scala:1626) <---- STITCHED.
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1762 | // at org.apache.spark.util.UtilsSuite.callDoTryNested(UtilsSuite.scala:1630)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1766 | // at org.apache.spark.util.UtilsSuite.callDoTryNestedNested(UtilsSuite.scala:1654)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1799 | // at org.apache.spark.util.UtilsSuite.throwException(UtilsSuite.scala:1529)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1803 | // at org.apache.spark.util.UtilsSuite.callDoTry(UtilsSuite.scala:1534)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1808 | // ----> at org.apache.spark.util.UtilsSuite.callGetTryFromNestedNested(UtilsSuite.scala:1650) <---- STITCHED.
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1816 | // at org.apache.spark.util.UtilsSuite.callDoTryNestedNested(UtilsSuite.scala:1654)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1844 | // at org.apache.spark.util.UtilsSuite.throwException(UtilsSuite.scala:1529)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1848 | // at org.apache.spark.util.UtilsSuite.callDoTry(UtilsSuite.scala:1534)
CRITICAL | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1852 | // at org.apache.spark.util.UtilsSuite.callDoTryNested(UtilsSuite.scala:1630)
364 more matches not shown…
Cross-File Repetition: 662 hits · 3310 pts
Severity | File | Line | Snippet
HIGH | python/pyspark/mllib/clustering.py | 0 | get the cluster centers, represented as a list of numpy arrays.
HIGH | python/pyspark/mllib/clustering.py | 0 | get the cluster centers, represented as a list of numpy arrays.
HIGH | python/pyspark/ml/clustering.py | 0 | get the cluster centers, represented as a list of numpy arrays.
HIGH | python/pyspark/ml/clustering.py | 0 | get the cluster centers, represented as a list of numpy arrays.
HIGH | python/pyspark/tests/test_rdd.py | 0 | executes a job with the group ``job_group``. each job waits for 3 seconds and then exits.
HIGH | python/pyspark/tests/test_pin_thread.py | 0 | executes a job with the group ``job_group``. each job waits for 3 seconds and then exits.
HIGH | python/pyspark/sql/tests/test_job_cancellation.py | 0 | executes a job with the group ``job_group``. each job waits for 3 seconds and then exits.
HIGH | python/pyspark/tests/test_appsubmit.py | 0 | |from pyspark import sparkcontext |from mylib import myfunc | |sc = sparkcontext() |print(sc.parallelize([1, 2, 3]).map(
HIGH | python/pyspark/tests/test_appsubmit.py | 0 | |from pyspark import sparkcontext |from mylib import myfunc | |sc = sparkcontext() |print(sc.parallelize([1, 2, 3]).map(
HIGH | python/pyspark/tests/test_appsubmit.py | 0 | |from pyspark import sparkcontext |from mylib import myfunc | |sc = sparkcontext() |print(sc.parallelize([1, 2, 3]).map(
HIGH | python/pyspark/tests/test_appsubmit.py | 0 | |from pyspark import sparkcontext |from mylib import myfunc | |sc = sparkcontext() |print(sc.parallelize([1, 2, 3]).map(
HIGH | python/pyspark/pipelines/tests/test_cli.py | 0 | { "catalog": "test_catalog", "configuration": {}, "libraries": [] }
HIGH | python/pyspark/pipelines/tests/test_cli.py | 0 | { "catalog": "test_catalog", "configuration": {}, "libraries": [] }
HIGH | python/pyspark/pipelines/tests/test_cli.py | 0 | { "catalog": "test_catalog", "configuration": {}, "libraries": [] }
HIGH | python/pyspark/ml/tree.py | 0 | trees in this ensemble. warning: these have null parent estimators.
HIGH | python/pyspark/ml/regression.py | 0 | trees in this ensemble. warning: these have null parent estimators.
HIGH | python/pyspark/ml/regression.py | 0 | trees in this ensemble. warning: these have null parent estimators.
HIGH | python/pyspark/ml/classification.py | 0 | trees in this ensemble. warning: these have null parent estimators.
HIGH | python/pyspark/ml/classification.py | 0 | trees in this ensemble. warning: these have null parent estimators.
HIGH | python/pyspark/ml/wrapper.py | 0 | returns the number of features the model was trained on. if unknown, returns -1
HIGH | python/pyspark/ml/regression.py | 0 | returns the number of features the model was trained on. if unknown, returns -1
HIGH | python/pyspark/ml/base.py | 0 | returns the number of features the model was trained on. if unknown, returns -1
HIGH | python/pyspark/ml/connect/base.py | 0 | returns the number of features the model was trained on. if unknown, returns -1
HIGH | python/pyspark/ml/regression.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/regression.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/regression.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/classification.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/classification.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/classification.py | 0 | sets the value of :py:attr:`minweightfractionpernode`.
HIGH | python/pyspark/ml/regression.py | 0 | sets the value of :py:attr:`featuresubsetstrategy`.
HIGH | python/pyspark/ml/regression.py | 0 | sets the value of :py:attr:`featuresubsetstrategy`.
HIGH | python/pyspark/ml/classification.py | 0 | sets the value of :py:attr:`featuresubsetstrategy`.
HIGH | python/pyspark/ml/classification.py | 0 | sets the value of :py:attr:`featuresubsetstrategy`.
HIGH | python/pyspark/ml/clustering.py | 0 | number of features, i.e., length of vectors which this transforms.
HIGH | python/pyspark/ml/clustering.py | 0 | number of features, i.e., length of vectors which this transforms.
HIGH | python/pyspark/ml/clustering.py | 0 | number of features, i.e., length of vectors which this transforms.
HIGH | python/pyspark/ml/feature.py | 0 | number of features, i.e., length of vectors which this transforms.
HIGH | python/pyspark/ml/classification.py | 0 | gets summary (accuracy/precision/recall, objective history, total iterations) of model trained on the training set. an e
HIGH | python/pyspark/ml/classification.py | 0 | gets summary (accuracy/precision/recall, objective history, total iterations) of model trained on the training set. an e
HIGH | python/pyspark/ml/classification.py | 0 | gets summary (accuracy/precision/recall, objective history, total iterations) of model trained on the training set. an e
HIGH | python/pyspark/ml/classification.py | 0 | evaluates the model on a test dataset. .. versionadded:: 3.1.0 parameters ---------- dataset : :py:class:`pyspark.sql.da
HIGH | python/pyspark/ml/classification.py | 0 | evaluates the model on a test dataset. .. versionadded:: 3.1.0 parameters ---------- dataset : :py:class:`pyspark.sql.da
HIGH | python/pyspark/ml/classification.py | 0 | evaluates the model on a test dataset. .. versionadded:: 3.1.0 parameters ---------- dataset : :py:class:`pyspark.sql.da
HIGH | python/pyspark/ml/classification.py | 0 | evaluates the model on a test dataset. .. versionadded:: 3.1.0 parameters ---------- dataset : :py:class:`pyspark.sql.da
HIGH | …thon/pyspark/ml/tests/connect/test_connect_function.py | 0 | these test cases exercise the interface to the proto plan generation but do not call spark.
HIGH | …hon/pyspark/sql/tests/connect/test_connect_function.py | 0 | these test cases exercise the interface to the proto plan generation but do not call spark.
HIGH | python/pyspark/sql/tests/connect/test_connect_plan.py | 0 | these test cases exercise the interface to the proto plan generation but do not call spark.
HIGH | python/pyspark/pandas/window.py | 0 | wraps a function that handles spark column in order to support it in both pandas-on-spark series and dataframe. note tha
HIGH | python/pyspark/pandas/window.py | 0 | wraps a function that handles spark column in order to support it in both pandas-on-spark series and dataframe. note tha
HIGH | python/pyspark/pandas/window.py | 0 | wraps a function that handles spark column in order to support it in both pandas-on-spark series and dataframe. note tha
HIGH | python/pyspark/pandas/series.py | 0 | same as `to_pandas()`, without issuing the advice log for internal usage.
HIGH | python/pyspark/pandas/frame.py | 0 | same as `to_pandas()`, without issuing the advice log for internal usage.
HIGH | python/pyspark/pandas/indexes/multi.py | 0 | same as `to_pandas()`, without issuing the advice log for internal usage.
HIGH | python/pyspark/pandas/indexes/base.py | 0 | same as `to_pandas()`, without issuing the advice log for internal usage.
HIGH | …pyspark/pandas/tests/data_type_ops/test_num_reverse.py | 0 | unit tests for arithmetic operations of numeric data types. a few test cases are disabled because pandas-on-spark return
HIGH | …hon/pyspark/pandas/tests/data_type_ops/test_num_ops.py | 0 | unit tests for arithmetic operations of numeric data types. a few test cases are disabled because pandas-on-spark return
HIGH | …park/pandas/tests/data_type_ops/test_num_arithmetic.py | 0 | unit tests for arithmetic operations of numeric data types. a few test cases are disabled because pandas-on-spark return
HIGH | python/pyspark/pandas/spark/accessors.py | 0 | spark related features. usually, the features here are missing in pandas but spark has it.
HIGH | python/pyspark/pandas/spark/accessors.py | 0 | spark related features. usually, the features here are missing in pandas but spark has it.
HIGH | python/pyspark/pandas/spark/accessors.py | 0 | spark related features. usually, the features here are missing in pandas but spark has it.
602 more matches not shown…
Hyper-Verbose Identifiers: 2925 hits · 2733 pts
Severity | File | Line | Snippet
LOW | …la/org/apache/spark/deploy/yarn/YarnClusterSuite.scala | 67 | private def getOrCreatePyConnectDepChecker(
LOW | …apache/spark/network/shuffle/ShuffleTestAccessor.scala | 136 | def getOrCreateAppShufflePartitionInfo(
LOW | …scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 277 | private def getOrUpdateAllocatedHostToContainersMapForRPId(
LOW | …scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 283 | private def getOrUpdateRunningExecutorForRPId(rpId: Int): mutable.Set[String] = synchronized {
LOW | …scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 287 | private def getOrUpdateNumExecutorsStartingForRPId(rpId: Int): AtomicInteger = synchronized {
LOW | …scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 291 | private def getOrUpdateTargetNumExecutorsForRPId(rpId: Int): Int = synchronized {
LOW | …a/org/apache/spark/storage/DiskBlockManagerSuite.scala | 144 | private def getAndSetUmask(posix: POSIX, mask: String): String = {
LOW | …/resources/org/apache/spark/ui/static/executorspage.js | 329 | function reselectCheckboxesBasedOnTaskTableState() {
LOW | …resources/org/apache/spark/ui/static/streaming-page.js | 62 | function getMaxMarginLeftForTimeline() {
LOW | …resources/org/apache/spark/ui/static/streaming-page.js | 69 | function getOnClickTimelineFunction() {
LOW | …/resources/org/apache/spark/ui/static/timeline-view.js | 171 | function getStageIdAndAttemptForStageEntry(baseElem) {
LOW | …/resources/org/apache/spark/ui/static/timeline-view.js | 239 | function drawTaskAssignmentTimeline(groupArray, eventObjArray, minLaunchTime, maxFinishTime, offset) {
LOW | …main/resources/org/apache/spark/ui/static/stagepage.js | 120 | function getColumnNameForTaskMetricSummary(columnKey) {
LOW | …main/resources/org/apache/spark/ui/static/stagepage.js | 175 | function displayRowsForSummaryMetricsTable(row, type, columnIndex) {
LOW | …main/resources/org/apache/spark/ui/static/stagepage.js | 218 | function createDataTableForTaskSummaryMetricsTable(taskSummaryMetricsTable) {
LOW | …main/resources/org/apache/spark/ui/static/stagepage.js | 277 | function createRowMetadataForColumn(colKey, data, checkboxId) {
LOW | …main/resources/org/apache/spark/ui/static/stagepage.js | 287 | function reselectCheckboxesBasedOnTaskTableState() {
LOW | …esources/org/apache/spark/ui/static/environmentpage.js | 47 | function createRESTEndPointForEnvironmentPage(appId) {
LOW | …/resources/org/apache/spark/ui/static/spark-dag-viz.js | 232 | function getMaxChildWidthAndPaddingTop(g, v, svg) {
LOW | …src/main/resources/org/apache/spark/ui/static/table.js | 52 | function expandAllThreadStackTrace(toggleButton) {
LOW | …src/main/resources/org/apache/spark/ui/static/table.js | 66 | function collapseAllThreadStackTrace(toggleButton) {
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 1081 | */function injectEdgeLabelProxies(g){_.forEach(g.edges(),function(e){var edge=g.edge(e);if(edge.width&&edge.height){var
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 1325 | */function findSmallestWidthAlignment(g,xss){return _.minBy(_.values(xss),function(xs){var max=Number.NEGATIVE_INFINITY
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 360 | function cartesianNormalizeInPlace(d){var l=sqrt(d[0]*d[0]+d[1]*d[1]+d[2]*d[2]);d[0]/=l,d[1]/=l,d[2]/=l}var lambda0$1,ph
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 439 | }}}function clipAntimeridianIntersect(lambda0,phi0,lambda1,phi1){var cosPhi0,cosPhi1,sinLambda0Lambda1=sin(lambda0-lambd
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 891 | percentRe=/^%/,requoteRe=/[\\^$*+?|[\]().{}]/g;function pad(value,fill,width){var sign=value<0?"-":"",string=(sign?-valu
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 1298 | scanPos=0,prevLayerLength=prevLayer.length,lastNode=_.last(layer);_.forEach(layer,function(v,i){var w=findOtherInnerSegm
LOW | …src/main/resources/org/apache/spark/ui/static/utils.js | 188 | function createRESTEndPointForExecutorsPage(appId) {
LOW | …src/main/resources/org/apache/spark/ui/static/utils.js | 211 | function createRESTEndPointForMiscellaneousProcess(appId) {
LOW | core/src/main/scala/org/apache/spark/SparkContext.scala | 3070 | def getOrCreate(config: SparkConf): SparkContext = {
LOW | core/src/main/scala/org/apache/spark/SparkContext.scala | 3094 | def getOrCreate(): SparkContext = {
LOW | core/src/main/scala/org/apache/spark/util/Utils.scala | 775 | private[spark] def getOrCreateLocalRootDirs(conf: ReadOnlySparkConf): Array[String] = {
LOW | core/src/main/scala/org/apache/spark/util/Utils.scala | 810 | private def getOrCreateLocalRootDirsImpl(conf: ReadOnlySparkConf): Array[String] = {
LOW | …cala/org/apache/spark/util/UninterruptibleThread.scala | 61 | def getAndSetUninterruptible(value: Boolean): Boolean = synchronized {
LOW | …c/main/scala/org/apache/spark/util/AccumulatorV2.scala | 486 | private def getOrCreate = {
LOW | …a/org/apache/spark/deploy/master/ApplicationInfo.scala | 82 | private[deploy] def getOrUpdateExecutorsForRPId(rpId: Int): mutable.Set[Int] = {
LOW | …in/scala/org/apache/spark/scheduler/DAGScheduler.scala | 528 | private def getOrCreateShuffleMapStage(
LOW | …in/scala/org/apache/spark/scheduler/DAGScheduler.scala | 728 | private def getOrCreateParentStages(shuffleDeps: HashSet[ShuffleDependency[_, _, _]],
LOW | …/scala/org/apache/spark/status/AppStatusListener.scala | 1144 | private def getOrCreateExecutor(executorId: String, addTime: Long): LiveExecutor = {
LOW | …/scala/org/apache/spark/status/AppStatusListener.scala | 1151 | private def getOrCreateOtherProcess(processId: String,
LOW | …/scala/org/apache/spark/status/AppStatusListener.scala | 1211 | private def getOrCreateStage(info: StageInfo): LiveStage = {
LOW | core/src/main/scala/org/apache/spark/rdd/RDD.scala | 369 | private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] =
LOW | core/src/main/scala/org/apache/spark/rdd/RDD.scala | 381 | private[spark] def getOrCompute(partition: Partition, context: TaskContext): Iterator[T] = {
LOW | …main/scala/org/apache/spark/storage/BlockManager.scala | 1409 | def getOrElseUpdateRDDBlock[T](
LOW | …main/scala/org/apache/spark/storage/BlockManager.scala | 1432 | private def getOrElseUpdate[T](
LOW | …g/apache/spark/api/python/PythonWorkerLogCapture.scala | 97 | private def getOrCreateLogWriter(workerId: String): (RollingLogWriter, AtomicLong) = {
LOW | …in/scala/org/apache/spark/resource/ResourceUtils.scala | 323 | def getOrDiscoverAllResources(
LOW | …in/scala/org/apache/spark/resource/ResourceUtils.scala | 356 | def getOrDiscoverAllResourcesForResourceProfile(
LOW | …/scala/org/apache/spark/resource/ResourceProfile.scala | 376 | private[spark] def getOrCreateDefaultProfile(conf: SparkConf): ResourceProfile = {
LOW | python/run-tests.py | 236 | def run_individual_python_test(target_dir, test_name, pyspark_python, keep_test_output):
LOW | python/run-tests.py | 398 | def get_default_python_executables():
LOW | python/pyspark/worker.py | 145 | def use_legacy_pandas_udf_conversion(self) -> bool:
LOW | python/pyspark/worker.py | 152 | def use_legacy_pandas_udtf_conversion(self) -> bool:
LOW | python/pyspark/worker.py | 167 | def int_to_decimal_coercion_enabled(self) -> bool:
LOW | python/pyspark/worker.py | 185 | def arrow_max_records_per_batch(self) -> int:
LOW | python/pyspark/worker.py | 189 | def arrow_max_bytes_per_batch(self) -> int:
LOW | python/pyspark/worker.py | 349 | def verify_iterator_exhausted(iterator: Iterator, error_class: str) -> None:
LOW | python/pyspark/worker.py | 405 | def wrap_pandas_batch_iter_udf(f, return_type, runner_conf):
LOW | python/pyspark/worker.py | 497 | def wrap_cogrouped_map_pandas_udf(f, return_type, argspec, runner_conf):
LOW | python/pyspark/worker.py | 568 | def wrap_grouped_transform_with_state_pandas_udf(f, return_type, runner_conf):
2865 more matches not shown…
Over-Commented Block: 2850 hits · 2672 pts
Severity | File | Line | Snippet
LOW | .asf.yaml | 1 | # Licensed to the Apache Software Foundation (ASF) under one or more
LOW | .pre-commit-config.yaml | 1 | #
LOW | pyproject.toml | 1 | #
LOW | …rg/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala | 121 | // There's a race in MiniYARNCluster in which start() may return before the RM has updated
LOW | …scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 321 | ResourceProfile.getResourcesForClusterManager(rp.id, rp.executorResources,
LOW | …c/main/scala/org/apache/spark/deploy/yarn/Client.scala | 801 | // conf archive will be handled by the AM differently so that we avoid having to send
LOW | …/kubernetes/docker/src/main/dockerfiles/spark/decom.sh | 1 | #!/usr/bin/env bash
LOW | …rnetes/docker/src/main/dockerfiles/spark/entrypoint.sh | 1 | #!/usr/bin/env bash
LOW | …s/core/src/test/resources/driver-podgroup-template.yml | 1 | #
LOW | …er/cluster/k8s/KubernetesClusterSchedulerBackend.scala | 301 | running.delete()
LOW | …/kubernetes/integration-tests/tests/pyfiles_connect.py | 1 | #
LOW | …/kubernetes/integration-tests/tests/decommissioning.py | 1 | #
LOW | …managers/kubernetes/integration-tests/tests/pyfiles.py | 1 | #
LOW | …ernetes/integration-tests/tests/worker_memory_check.py | 1 | #
LOW | …ernetes/integration-tests/tests/py_container_checks.py | 1 | #
LOW | …tes/integration-tests/tests/decommissioning_cleanup.py | 1 | #
LOW | …tes/integration-tests/tests/python_executable_check.py | 1 | #
LOW | …nagers/kubernetes/integration-tests/tests/autoscale.py | 1 | #
LOW | …ntegration-tests/scripts/setup-integration-test-env.sh | 1 | #!/usr/bin/env bash
LOW | …agers/kubernetes/integration-tests/dev/spark-rbac.yaml | 1 | #
LOW | …tes/integration-tests/dev/dev-run-integration-tests.sh | 1 | #!/usr/bin/env bash
LOW | …tegration-tests/src/test/resources/driver-template.yml | 1 | #
LOW | …gration-tests/src/test/resources/executor-template.yml | 1 | #
LOW | …-tests/src/test/resources/driver-schedule-template.yml | 1 | #
LOW | …st/resources/volcano/high-priority-driver-template.yml | 1 | #
LOW | …rces/volcano/low-priority-driver-podgroup-template.yml | 1 | #
LOW | …sources/volcano/driver-podgroup-template-memory-3g.yml | 1 | #
LOW | …/resources/volcano/queue0-driver-podgroup-template.yml | 1 | #
LOW | …n-tests/src/test/resources/volcano/priorityClasses.yml | 1 | #
LOW | …/resources/volcano/queue1-driver-podgroup-template.yml | 1 | #
LOW | …/resources/volcano/medium-priority-driver-template.yml | 1 | #
LOW | …t/resources/volcano/queue-driver-podgroup-template.yml | 1 | #
LOW | …s/volcano/medium-priority-driver-podgroup-template.yml | 1 | #
LOW | …est/resources/volcano/low-priority-driver-template.yml | 1 | #
LOW | …ces/volcano/high-priority-driver-podgroup-template.yml | 1 | #
LOW | …/org/apache/spark/launcher/AbstractCommandBuilder.java | 201 | if (isBeeLine && "1".equals(getenv("SPARK_CONNECT_BEELINE")) &&
LOW | …/org/apache/spark/launcher/AbstractCommandBuilder.java | 321 | return scala;
LOW | …he/spark/shuffle/sort/ShuffleExternalSorterSuite.scala | 101 | // may happen. Here are some examples we have seen:
LOW | …t/scala/org/apache/spark/util/SizeEstimatorSuite.scala | 301 | // objectSize=8, fields=12 => shellSize=20, aligned to 24
LOW | …t/scala/org/apache/spark/util/SizeEstimatorSuite.scala | 361 | // DummyString has: pointer(arr,8) + Int(hashCode,4) + Int(hash32,4) = 16 bytes of fields
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1661 | // java.lang.Exception: test
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1681 | val e2 = intercept[Exception] {
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1701 | assert(!st2.exists(_.getMethodName == "callDoTry"))
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1741 |
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1801 | // at scala.util.Try$.apply(Try.scala:217)
LOW | …/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 1841 | //
LOW | …la/org/apache/spark/scheduler/HealthTrackerSuite.scala | 361 | // This ensures that we don't trigger spurious excluding for long tasksets, when the taskset
LOW | …rg/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 1161 | // We should be checking our node excludelist, but it should be within the bound we defined
LOW | …rg/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 2541 |
LOW | …/test/scala/org/apache/spark/scheduler/PoolSuite.scala | 121 | scheduleTaskAndVerifyId(0, rootPool, 0)
LOW | …st/scala/org/apache/spark/executor/ExecutorSuite.scala | 101 | }
LOW | …sources/org/apache/spark/ui/static/graphlib-dot.min.js | 141 | // Label for the graph itself
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 201 | h=s?Math.atan2(k,bl)*rad2deg-120:NaN;return new Cubehelix(h<0?h+360:h,s,l,o.opacity)}function cubehelix(h,s,l,opacity){r
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 281 | // Limit forces for very close nodes; randomize direction if coincident.
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 301 | function formatTrim(s){out:for(var n=s.length,i=1,i0=-1,i1;i<n;++i){switch(s[i]){case".":i0=i1=i;break;case"0":if(i0===0
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 321 | // Perform the initial formatting.
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 341 | (function(global,factory){typeof exports==="object"&&typeof module!=="undefined"?factory(exports,require("d3-array")):ty
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 401 | // along the clip edge.
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 461 | // Rejoin first and last segments if there were intersections and the first
LOW | …n/resources/org/apache/spark/ui/static/dagre-d3.min.js | 541 | throw new Error}function enclosesNot(a,b){var dr=a.r-b.r,dx=b.x-a.x,dy=b.y-a.y;return dr<0||dr*dr<dx*dx+dy*dy}function e
2790 more matches not shown…
Cross-Language Confusion: 124 hits · 558 pts
Severity | File | Line | Snippet
HIGH | python/pyspark/core/rdd.py | 242 | return self._jrdd.toString()
HIGH | python/pyspark/mllib/tree.py | 90 | return self._java_model.toString()
HIGH | python/pyspark/mllib/tree.py | 150 | return self._java_model.toString()
HIGH | python/pyspark/mllib/stat/test.py | 64 | return self._java_model.toString()
HIGH | python/pyspark/tests/test_util.py | 73 | # This attempts java.lang.String(null) which throws an NPE.
HIGH | …stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py | 80 | decimal, date, timestamp, duration, time, null, and nested types.
HIGH | …park/tests/upstream/pyarrow/test_pyarrow_array_cast.py | 31 | - Success: [0, 1, null]@int16 - element values via scalar.as_py() and Arrow type after cast
HIGH | …park/tests/upstream/pyarrow/test_pyarrow_array_cast.py | 111 | as "[val1, val2, null]@arrow_type" using each scalar's as_py() value.
HIGH | …park/tests/upstream/pyarrow/test_pyarrow_array_cast.py | 126 | On success: "[val1, val2, null]@arrow_type"
HIGH | …park/tests/upstream/pyarrow/test_pyarrow_array_cast.py | 127 | e.g. "[0, 1, -1, 127, -128, null]@int16"
HIGH | python/pyspark/testing/utils.py | 371 | script = "$(test $(tput colors)) && $(test $(tput colors) -ge 8) && echo true || echo false"
HIGH | python/pyspark/ml/tests/test_wrapper.py | 54 | self.assertIn("LinearRegression_", model._java_obj.toString())
HIGH | python/pyspark/ml/tests/test_wrapper.py | 55 | self.assertIn("LinearRegressionTrainingSummary", summary._java_obj.toString())
HIGH | python/pyspark/ml/tests/test_wrapper.py | 61 | model._java_obj.toString()
HIGH | python/pyspark/ml/tests/test_wrapper.py | 62 | self.assertIn("LinearRegressionTrainingSummary", summary._java_obj.toString())
HIGH | python/pyspark/ml/tests/test_wrapper.py | 74 | model._java_obj.toString()
HIGH | python/pyspark/ml/tests/test_wrapper.py | 76 | summary._java_obj.toString()
HIGH | python/pyspark/ml/tests/test_functions.py | 253 | self.assertTrue(df1.equals(df2))
HIGH | python/pyspark/ml/tests/test_functions.py | 259 | self.assertFalse(df1.equals(df3))
HIGH | python/pyspark/ml/tests/test_param.py | 262 | "inputCol: input column name. (undefined)",
HIGH | python/pyspark/errors/exceptions/captured.py | 240 | desc=e.toString(),
HIGH | python/pyspark/resource/requests.py | 321 | that the cluster manager doesn't support the result is undefined, it may error or may just
HIGH | python/pyspark/pandas/series.py | 6759 | if get_option("compute.eager_check") and not self.index.equals(other.index):
HIGH | python/pyspark/pandas/utils.py | 996 | return left._jc.equals(right._jc)
HIGH | python/pyspark/pandas/frame.py | 1714 | # | 2|[{0, null}, {1, n...|
HIGHpython/pyspark/pandas/indexing.py556 cast(ClassicColumn, col)._jc.toString() for col in data_spark_columns
HIGHpython/pyspark/pandas/groupby.py1305 Flag to ignore NA(nan/null) values during truth testing.
HIGHpython/pyspark/pandas/base.py1464 # If even one StructField is null, that row should be dropped.
HIGHpython/pyspark/pandas/tests/computation/test_combine.py682 # Only update where new value > 150 (and old is null)
HIGH…hon/pyspark/pandas/tests/diff_frames_ops/test_error.py198 psidx1.equals(psidx2)
HIGHpython/pyspark/pandas/indexes/base.py387 and self.equals(other)
HIGHpython/pyspark/pandas/indexes/base.py411 >>> idx.equals(idx)
HIGHpython/pyspark/pandas/indexes/base.py414 ... idx.equals(ps.Index(['a', 'b', 'c']))
HIGHpython/pyspark/pandas/indexes/base.py417 ... idx.equals(ps.Index(['b', 'b', 'a']))
HIGHpython/pyspark/pandas/indexes/base.py419 >>> idx.equals(midx)
HIGHpython/pyspark/pandas/indexes/base.py424 >>> midx.equals(midx)
HIGHpython/pyspark/pandas/indexes/base.py427 ... midx.equals(ps.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')]))
HIGHpython/pyspark/pandas/indexes/base.py430 ... midx.equals(ps.MultiIndex.from_tuples([('c', 'z'), ('b', 'y'), ('a', 'x')]))
HIGHpython/pyspark/pandas/indexes/base.py432 >>> midx.equals(idx)
HIGHpython/pyspark/sql/conversion.py178 if batch.schema.equals(arrow_schema, check_metadata=False):
HIGHpython/pyspark/sql/types.py1845 return stringConcat.toString()
HIGHpython/pyspark/sql/types.py262 null, UDTs, arrays, structs, and maps."""
HIGHpython/pyspark/sql/context.py808 '{"field1" : null, "field2": "row3", "field3":{"field4":33, "field5": []}}',
HIGHpython/pyspark/sql/group.py76 jvm_string = self._jgd.toString()
HIGHpython/pyspark/sql/tvf.py427 Unlike posexplode, if the array/map is null or empty then the row (null, null) is produced.
HIGHpython/pyspark/sql/tvf.py570 null, and any other variant values.
HIGHpython/pyspark/sql/tvf.py635 SQL NULL, variant null, and any other variant values, then NULL is produced.
HIGHpython/pyspark/sql/classic/column.py661 return "Column<'%s'>" % self._jc.toString()
HIGHpython/pyspark/sql/tests/test_session.py79 self.assertTrue(jsession.equals(spark._jvm.SparkSession.getDefaultSession().get()))
HIGHpython/pyspark/sql/tests/test_udtf.py413 df = self.spark.sql("SELECT * FROM testUDTF(null)")
HIGHpython/pyspark/sql/tests/test_collection.py416 pdf.equals(
HIGHpython/pyspark/sql/tests/test_tvf.py59 "VALUES (0, ARRAY(0, 1)), (1, ARRAY(2)), (2, ARRAY()), (null, ARRAY(4)) "
HIGHpython/pyspark/sql/tests/test_tvf.py121 "VALUES (0, ARRAY(0, 1)), (1, ARRAY(2)), (2, ARRAY()), (null, ARRAY(4)) "
HIGHpython/pyspark/sql/tests/test_tvf.py173 inline(array(named_struct('a', 1, 'b', 2), null, named_struct('a', 3, 'b', 4)))
HIGHpython/pyspark/sql/tests/test_tvf.py226 inline_outer(array(named_struct('a', 1, 'b', 2), null, named_struct('a', 3, 'b', 4)))
HIGHpython/pyspark/sql/tests/test_tvf.py277 ('5', '{"f1": null, "f5": ""}'),
HIGHpython/pyspark/sql/tests/test_tvf.py355 "VALUES (0, ARRAY(0, 1)), (1, ARRAY(2)), (2, ARRAY()), (null, ARRAY(4)) "
HIGHpython/pyspark/sql/tests/test_tvf.py415 "VALUES (0, ARRAY(0, 1)), (1, ARRAY(2)), (2, ARRAY()), (null, ARRAY(4)) "
HIGHpython/pyspark/sql/tests/test_tvf.py454 "VALUES (0, ARRAY(0, 1)), (1, ARRAY(2)), (2, ARRAY()), (null, ARRAY(4)) "
HIGHpython/pyspark/sql/tests/test_datasources.py350 ["""{"a":null, "b":1, "c":3.0}"""],
64 more matches not shown…
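The report does not show the scanner's actual rules. As a minimal sketch, assuming plain regular-expression matching over Python source lines, snippets like those above could be flagged as follows. The pattern names and the `flag_line` helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical rule set: Java/JavaScript idioms that look unusual in Python source.
# These patterns are assumptions for illustration, not the scanner's real rules.
CROSS_LANGUAGE_PATTERNS = {
    "java_toString": re.compile(r"\.toString\(\)"),
    "java_equals": re.compile(r"\.equals\("),
    "null_literal": re.compile(r"\bnull\b"),
    "js_undefined": re.compile(r"\bundefined\b"),
}


def flag_line(line: str) -> list[str]:
    """Return the names of all hypothetical patterns that match one source line."""
    return [
        name
        for name, pattern in CROSS_LANGUAGE_PATTERNS.items()
        if pattern.search(line)
    ]


# Sample lines taken from the snippets listed in the table, plus one clean line.
samples = [
    "return self._jrdd.toString()",
    "self.assertTrue(df1.equals(df2))",
    "# If even one StructField is null, that row should be dropped.",
    "return str(self)",
]
for line in samples:
    print(flag_line(line), "<-", line)
```

A purely textual rule of this kind would also match valid Python, such as Py4J calls on JVM objects (`_jrdd.toString()`) or pandas `DataFrame.equals`, so hits in this category may include false positives.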
Unused Imports · 603 hits · 556 pts
Severity File Line Snippet
LOWpython/run-tests.py45
LOWpython/packaging/connect/pyspark_connect/__init__.py22
LOWpython/pyspark/worker.py51
LOWpython/pyspark/util.py61
LOWpython/pyspark/util.py63
LOWpython/pyspark/util.py64
LOWpython/pyspark/util.py66
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py67
LOWpython/pyspark/util.py92
LOWpython/pyspark/util.py92
LOWpython/pyspark/util.py92
LOWpython/pyspark/util.py92
LOWpython/pyspark/util.py92
LOWpython/pyspark/util.py99
LOWpython/pyspark/util.py923
LOWpython/pyspark/util.py943
LOWpython/pyspark/conf.py27
LOWpython/pyspark/shell.py33
LOWpython/pyspark/__init__.py68
LOWpython/pyspark/__init__.py69
LOWpython/pyspark/__init__.py69
LOWpython/pyspark/__init__.py70
LOWpython/pyspark/__init__.py71
LOWpython/pyspark/__init__.py71
LOWpython/pyspark/__init__.py72
LOWpython/pyspark/__init__.py72
LOWpython/pyspark/__init__.py73
LOWpython/pyspark/__init__.py73
LOWpython/pyspark/__init__.py73
LOWpython/pyspark/__init__.py74
LOWpython/pyspark/__init__.py74
LOWpython/pyspark/__init__.py75
LOWpython/pyspark/__init__.py76
LOWpython/pyspark/__init__.py131
LOWpython/pyspark/__init__.py56
LOWpython/pyspark/__init__.py56
LOWpython/pyspark/__init__.py57
LOWpython/pyspark/__init__.py58
543 more matches not shown…
Self-Referential Comments · 143 hits · 398 pts
Severity File Line Snippet
MEDIUMbin/docker-image-tool.sh80# Create a smaller build context for docker in dev builds to make the build faster. Docker
MEDIUMpython/run-tests.py268 # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
MEDIUMpython/run-tests.py544 # Create the target directory before starting tasks to avoid races.
MEDIUMpython/pyspark/java_gateway.py77 # Create a temporary directory where the gateway server should write the connection
MEDIUMpython/pyspark/statcounter.py18# This file is ported from spark/util/StatCounter.scala
MEDIUMpython/pyspark/daemon.py114 # Create a new process group to corral our children
MEDIUMpython/pyspark/daemon.py120 # Create a listening socket on the loopback interface
MEDIUMpython/pyspark/core/rdd.py245 # This method is called when attempting to pickle an RDD, which is always an error:
MEDIUMpython/pyspark/core/rdd.py2827 ... # Create the conf for writing
MEDIUMpython/pyspark/core/rdd.py2839 ... # Create the conf for reading
MEDIUMpython/pyspark/core/rdd.py2986 ... # Create the conf for writing
MEDIUMpython/pyspark/core/rdd.py2998 ... # Create the conf for reading
MEDIUMpython/pyspark/core/context.py299 # Create the Java SparkContext through Py4J
MEDIUMpython/pyspark/core/context.py304 # Create a single Accumulator in Java that we'll send all our updates through;
MEDIUMpython/pyspark/core/context.py382 # Create a temporary directory inside spark.local.dir:
MEDIUMpython/pyspark/core/context.py492 # This method is called when attempting to pickle SparkContext, which is always an error:
MEDIUMpython/pyspark/core/context.py1493 ... # Create the conf for writing
MEDIUMpython/pyspark/core/context.py1505 ... # Create the conf for reading
MEDIUMpython/pyspark/core/context.py1690 ... # Create the conf for writing
MEDIUMpython/pyspark/core/context.py1702 ... # Create the conf for reading
MEDIUMpython/pyspark/mllib/tests/test_linalg.py534 # Create a CSC matrix with non-sorted indices
MEDIUMpython/pyspark/mllib/tests/test_streaming_algorithms.py101 # Create a toy dataset by setting a tiny offset for each point.
MEDIUMpython/pyspark/mllib/tests/test_streaming_algorithms.py396 # Create a model with initial Weights equal to coefs
MEDIUMpython/pyspark/tests/test_rdd.py706 # Create a DataFrame with many columns, call a Python function on each row, and take only
MEDIUMpython/pyspark/pipelines/init_cli.py52 # Create the storage directory
MEDIUMpython/pyspark/pipelines/init_cli.py65 # Create the transformations directory
MEDIUMpython/pyspark/pipelines/init_cli.py69 # Create the Python example file
MEDIUMpython/pyspark/pipelines/init_cli.py74 # Create the SQL example file
MEDIUMpython/pyspark/pipelines/tests/test_cli.py376 # Create a minimal pipeline spec
MEDIUMpython/pyspark/pipelines/tests/test_cli.py400 # Create a minimal pipeline spec
MEDIUMpython/pyspark/pipelines/tests/test_cli.py425 # Create a minimal pipeline spec
MEDIUMpython/pyspark/ml/pipeline.py188 # Create a new instance of this stage.
MEDIUMpython/pyspark/ml/pipeline.py346 # Create a new instance of this stage.
MEDIUMpython/pyspark/ml/tuning.py981 # Create a new instance of this stage.
MEDIUMpython/pyspark/ml/tuning.py1559 # Create a new instance of this stage.
MEDIUMpython/pyspark/ml/tuning.py1684 # Create a new instance of this stage.
MEDIUMpython/pyspark/ml/tests/test_feature.py362 # Create a DataFrame
MEDIUMpython/pyspark/pandas/tests/io/test_io.py34# This file contains test cases for 'Serialization / IO / Conversion'
MEDIUMpython/pyspark/pandas/tests/frame/test_time_series.py26# This file contains test cases for 'Time series-related'
MEDIUMpython/pyspark/pandas/tests/frame/test_spark.py34# This file contains test cases for 'Spark-related'
MEDIUMpython/pyspark/pandas/tests/frame/test_attrs.py26# This file contains test cases for 'Attributes and underlying data'
MEDIUMpython/pyspark/pandas/tests/frame/test_constructor.py34# This file contains test cases for 'Constructor'
MEDIUMpython/pyspark/pandas/tests/frame/test_conversion.py25# This file contains test cases for 'Conversion'
MEDIUMpython/pyspark/pandas/tests/frame/test_reindexing.py31# This file contains test cases for 'Reindexing / Selection / Label manipulation'
MEDIUMpython/pyspark/pandas/tests/frame/test_reshaping.py27# This file contains test cases for 'Reshaping, sorting, transposing'
MEDIUMpython/pyspark/pandas/tests/computation/test_combine.py25# This file contains test cases for 'Combining / joining / merging'
MEDIUM…on/pyspark/pandas/tests/computation/test_apply_func.py29# This file contains test cases for 'Function application, GroupBy & Window'
MEDIUM…/pyspark/pandas/tests/computation/test_missing_data.py27# This file contains test cases for 'Missing data handling'
MEDIUM…on/pyspark/pandas/tests/computation/test_binary_ops.py26# This file contains test cases for 'Binary operator functions'
MEDIUMpython/pyspark/pandas/tests/computation/test_compute.py26# This file contains test cases for 'Computations / Descriptive Stats'
MEDIUM…thon/pyspark/pandas/tests/indexes/test_indexing_adv.py56 # Create the equivalent of pdf.loc[3] as a Koalas Series
MEDIUM…thon/pyspark/pandas/tests/indexes/test_indexing_adv.py142 # Create the equivalent of pdf.loc[3] as a Koalas Series
MEDIUMpython/pyspark/pandas/tests/indexes/test_indexing.py26# This file contains test cases for 'Indexing, Iteration'
MEDIUMpython/pyspark/pandas/indexes/base.py263 # This method is used via `DataFrame.info` API internally.
MEDIUMpython/pyspark/sql/dataframe.py563 ... # Create a table with Rate source.
MEDIUMpython/pyspark/sql/dataframe.py6788 >>> # Create a simple UDTF that processes table data
MEDIUMpython/pyspark/sql/dataframe.py6794 >>> # Create a DataFrame
MEDIUMpython/pyspark/sql/session.py624 # Create a new SparkSession in the JVM
MEDIUMpython/pyspark/sql/session.py1647 # Create a DataFrame from pandas DataFrame.
MEDIUMpython/pyspark/sql/session.py1652 # Create a DataFrame from PyArrow Table.
83 more matches not shown…
Decorative Section Separators · 103 hits · 376 pts
Severity File Line Snippet
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py662# -------------------------------------------------
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py698# ------------------------------------
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py704# -----------------------------------
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py816# -------------------------------
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py1125# ------------------------------------
MEDIUMpython/pyspark/cloudpickle/cloudpickle.py1207# ---------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py92 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py94 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py96 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py287 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py289 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py291 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py356 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py358 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py397 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py399 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py557 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py559 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py50 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py52 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py67 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py69 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py208 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py210 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py238 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py240 # -------------------------------------------------------------------------
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py435 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py437 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py525 # =========================================================================
MEDIUM…/upstream/pyarrow/test_pyarrow_array_type_inference.py527 # =========================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py189 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py191 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py196 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py198 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py206 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py208 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py216 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py218 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py228 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py230 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py240 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py242 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py258 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py260 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py268 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py270 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py286 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py288 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py292 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py294 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py150 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py152 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py176 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py178 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py367 # =====================================================================
MEDIUM…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py369 # =====================================================================
MEDIUM…k/tests/upstream/pyarrow/test_pyarrow_type_coercion.py56 # =========================================================================
MEDIUM…k/tests/upstream/pyarrow/test_pyarrow_type_coercion.py58 # =========================================================================
MEDIUM…k/tests/upstream/pyarrow/test_pyarrow_type_coercion.py85 # =========================================================================
MEDIUM…k/tests/upstream/pyarrow/test_pyarrow_type_coercion.py87 # =========================================================================
43 more matches not shown…
Deep Nesting · 375 hits · 322 pts
Severity File Line Snippet
LOWpython/run-tests.py236
LOWpython/run-tests.py474
LOWpython/pyspark/worker.py853
LOWpython/pyspark/worker.py885
LOWpython/pyspark/worker.py929
LOWpython/pyspark/worker.py1038
LOWpython/pyspark/worker.py2184
LOWpython/pyspark/worker.py1565
LOWpython/pyspark/worker.py1141
LOWpython/pyspark/worker.py1303
LOWpython/pyspark/worker.py1579
LOWpython/pyspark/worker.py1659
LOWpython/pyspark/worker.py1677
LOWpython/pyspark/worker.py2607
LOWpython/pyspark/worker.py1726
LOWpython/pyspark/worker.py1799
LOWpython/pyspark/worker.py1587
LOWpython/pyspark/worker.py1855
LOWpython/pyspark/worker.py1605
LOWpython/pyspark/worker.py1615
LOWpython/pyspark/worker.py1637
LOWpython/pyspark/worker_message.py135
LOWpython/pyspark/util.py572
LOWpython/pyspark/conf.py180
LOWpython/pyspark/shuffle.py62
LOWpython/pyspark/shuffle.py779
LOWpython/pyspark/statcounter.py60
LOWpython/pyspark/install.py120
LOWpython/pyspark/accumulators.py263
LOWpython/pyspark/accumulators.py268
LOWpython/pyspark/profiler.py189
LOWpython/pyspark/daemon.py46
LOWpython/pyspark/daemon.py113
LOWpython/pyspark/core/rdd.py2210
LOWpython/pyspark/core/rdd.py3672
LOWpython/pyspark/core/rdd.py3724
LOWpython/pyspark/core/context.py226
LOWpython/pyspark/core/context.py1817
LOWpython/pyspark/logger/worker_io.py214
LOWpython/pyspark/cloudpickle/cloudpickle.py313
LOWpython/pyspark/cloudpickle/cloudpickle.py338
LOWpython/pyspark/cloudpickle/cloudpickle.py1069
LOWpython/pyspark/cloudpickle/cloudpickle.py1441
LOWpython/pyspark/mllib/classification.py236
LOWpython/pyspark/mllib/common.py75
LOWpython/pyspark/mllib/common.py96
LOWpython/pyspark/mllib/common.py160
LOWpython/pyspark/mllib/linalg/__init__.py96
LOWpython/pyspark/mllib/linalg/__init__.py114
LOWpython/pyspark/mllib/linalg/__init__.py415
LOWpython/pyspark/mllib/linalg/__init__.py824
LOWpython/pyspark/tests/test_serializers.py213
LOWpython/pyspark/tests/test_worker.py39
LOWpython/pyspark/tests/test_shuffle.py66
LOW…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py84
LOW…park/tests/upstream/pyarrow/test_pyarrow_array_cast.py137
LOWpython/pyspark/pipelines/cli.py121
LOWpython/pyspark/pipelines/cli.py221
LOW…yspark/pipelines/tests/test_block_session_mutations.py38
LOWpython/pyspark/testing/sqlutils.py289
315 more matches not shown…
Excessive Try-Catch Wrapping · 197 hits · 223 pts
Severity File Line Snippet
LOWpython/conf_vscode/sitecustomize.py38 except Exception:
MEDIUMpython/pyspark/worker.py2123def evaluate(*a) -> tuple:
LOWpython/pyspark/worker.py1541 except Exception as e:
LOWpython/pyspark/worker.py1720 except Exception as e:
LOWpython/pyspark/worker.py1849 except Exception as e:
LOWpython/pyspark/worker.py1909 except Exception as e:
LOWpython/pyspark/worker.py2004 except Exception as e:
LOWpython/pyspark/worker.py2128 except Exception as e:
LOWpython/pyspark/worker.py3660 except Exception:
LOWpython/pyspark/threaddump.py45 except Exception as e:
MEDIUMpython/pyspark/threaddump.py46 print(f"Error getting children of process {args.pid}: {e}")
LOWpython/pyspark/threaddump.py54 except Exception:
MEDIUMpython/pyspark/threaddump.py28def main() -> int:
LOWpython/pyspark/util.py782 except Exception:
LOWpython/pyspark/serializers.py446 except Exception as e:
LOWpython/pyspark/serializers.py495 except Exception:
LOWpython/pyspark/shell.py73 except Exception:
LOWpython/pyspark/shell.py90 except Exception:
LOWpython/pyspark/memory_profiler_ext.py32 except Exception:
LOWpython/pyspark/memory_profiler_ext.py69 except Exception:
LOWpython/pyspark/install.py169 except Exception:
LOWpython/pyspark/install.py189 except Exception:
LOWpython/pyspark/install.py221 except Exception as e:
LOWpython/pyspark/instrumentation_utils.py49 except Exception as ex:
LOWpython/pyspark/instrumentation_utils.py73 except Exception as ex:
LOWpython/pyspark/daemon.py96 except Exception:
LOWpython/pyspark/daemon.py270 except Exception:
LOWpython/pyspark/core/context.py375 except Exception:
LOWpython/pyspark/core/broadcast.py181 except Exception as e:
LOWpython/pyspark/logger/worker_io.py252 except Exception:
LOWpython/pyspark/cloudpickle/cloudpickle.py232 except Exception:
LOWpython/pyspark/tests/test_rdd.py354 except Exception:
LOWpython/pyspark/tests/test_rdd.py889 except Exception:
LOWpython/pyspark/tests/test_taskcontext.py206 except Exception:
LOWpython/pyspark/tests/test_taskcontext.py277 except Exception:
MEDIUMpython/pyspark/tests/test_taskcontext.py203def f(iterator):
LOWpython/pyspark/tests/test_util.py180 except Exception as e:
LOWpython/pyspark/tests/test_pin_thread.py68 except Exception as e:
LOWpython/pyspark/tests/test_pin_thread.py123 except Exception:
LOWpython/pyspark/tests/test_worker.py56 except Exception:
LOWpython/pyspark/tests/test_worker.py156 except Exception:
MEDIUMpython/pyspark/tests/test_worker.py53def run():
MEDIUMpython/pyspark/tests/test_worker.py153def count():
LOWpython/pyspark/tests/test_context.py237 except Exception:
LOWpython/pyspark/tests/test_install_spark.py50 except Exception:
LOW…stream/pyarrow/test_pyarrow_arrow_to_pandas_default.py399 except Exception as e:
LOW…park/tests/upstream/pyarrow/test_pyarrow_array_cast.py134 except Exception as e:
LOWpython/pyspark/testing/sqlutils.py109 except Exception:
LOWpython/pyspark/testing/sqlutils.py167except Exception as e:
LOWpython/pyspark/testing/goldenutils.py189 except Exception as e:
LOWpython/pyspark/testing/utils.py128except Exception as e:
LOWpython/pyspark/testing/utils.py140except Exception as e:
LOWpython/pyspark/testing/utils.py373 except Exception:
MEDIUMpython/pyspark/testing/utils.py368def _terminal_color_support():
LOWpython/pyspark/ml/functions.py848 except Exception as e:
LOWpython/pyspark/ml/wrapper.py66 except Exception:
MEDIUMpython/pyspark/ml/wrapper.py58def __del__(self) -> None:
LOWpython/pyspark/ml/util.py360 except Exception:
LOWpython/pyspark/ml/util.py372 except Exception:
LOWpython/pyspark/ml/torch/distributor.py57 except Exception:
137 more matches not shown…
AI Slop Vocabulary · 52 hits · 130 pts
Severity File Line Snippet
MEDIUM…/org/apache/spark/launcher/AbstractCommandBuilder.java244 // Place slf4j-api-* jar first to be robust
MEDIUM…c/test/scala/org/apache/spark/ui/UISeleniumSuite.scala414 // Essentially, we want to check that none of the stage rows show
MEDIUM…c/test/scala/org/apache/spark/ui/UISeleniumSuite.scala470 // Essentially, we want to check that none of the stage rows show
MEDIUM…ala/org/apache/spark/scheduler/DAGSchedulerSuite.scala2376 // For a robust test assertion, limit number of job tasks to 1; that is,
MEDIUM…apache/spark/scheduler/SchedulerIntegrationSuite.scala431 // it really can only be "best-effort" in any case, and the scheduler should be robust to that.
MEDIUM…rg/apache/spark/scheduler/TaskSchedulerImplSuite.scala492 // Even though we launched a local task above, we still utilize non-local exec2.
MEDIUM…he/spark/scheduler/HealthTrackerIntegrationSuite.scala80 // robust to one bad node.
MEDIUM…main/scala/org/apache/spark/storage/BlockManager.scala1224 // BlockTransferService, which will leverage it to spill the block; if not, then passed-in
MEDIUMpython/packaging/classic/setup.py164 # TODO(SPARK-32837) leverage pip's custom options
LOW…spark/messages/socket/spark_socket_message_receiver.py49 # For socket communication, we just pass along the underlying socket
LOWpython/pyspark/tests/test_install_spark.py57 # we just use a hard-coded version.
LOWpython/pyspark/ml/tests/test_functions.py208 # just return the batch size as the "prediction"
MEDIUMpython/pyspark/errors/utils.py378 # Excluding Python magic methods that do not utilize JVM functions.
LOWpython/pyspark/pandas/resample.py359 # here just use Pandas' resample on a 1-length series to get it.
LOWpython/pyspark/pandas/generic.py3101 # If Series has only a single value, just return it as a scalar.
MEDIUMpython/pyspark/pandas/series.py6093 # If `where` has duplicate items, leverage the pandas directly
LOWpython/pyspark/pandas/utils.py794 # '+' is meaningless for writing methods, but pandas just pass it as 'w'.
LOWpython/pyspark/pandas/utils.py798 # '+' is meaningless for writing methods, but pandas just pass it as 'a'.
LOWpython/pyspark/pandas/frame.py10126 # In this case, we can simply use `summary` to calculate the stats.
MEDIUMpython/pyspark/pandas/tests/groupby/test_stat.py30 # TODO: All statistical functions should leverage this utility
MEDIUMpython/pyspark/sql/session.py601 # used in conjunction with Spark Connect mode.
MEDIUM python/pyspark/sql/tests/test_functions.py 3042 """Test tuple_sketch_agg + operations + estimate comprehensive test - double"""
MEDIUM python/pyspark/sql/tests/test_functions.py 3097 """Test tuple_sketch_agg + operations + estimate comprehensive test - integer"""
MEDIUM …ing/test_pandas_transform_with_state_state_variable.py 354 # TODO SPARK-50908 holistic fix for TTL suite
MEDIUM python/pyspark/sql/connect/client/core.py 389 # Rewrite the URL to use http as the scheme so that we can leverage
LOW R/pkg/R/sparkR.R 664 #' To remove/unset property simply set `value` to NULL e.g. setLocalProperty("key", NULL)
MEDIUM R/pkg/R/column.R 296 #' Can be used in conjunction with \code{when} to specify a default value for expressions.
MEDIUM …apache/spark/streaming/ReceivedBlockTrackerSuite.scala 320 // deletion more robust rather than a parallelized operation where we fire and forget
MEDIUM …cala/org/apache/spark/streaming/ui/StreamingPage.scala 163 // We leverage timeFormat as the value would be same as timeFormat. This means it is
MEDIUM …rg/apache/spark/network/crypto/CtrTransportCipher.java 229 // to utilize two helper ByteArrayWritableChannel for streaming. One is used to receive raw data
MEDIUM …network/shuffle/streaming/StreamingShuffleMessage.java 68 // Essentially, other message types from reader to writer won't have a valid sequence number.
MEDIUM …scala/org/apache/spark/examples/mllib/LDAExample.scala 139 // add (1.0 / actualCorpusSize) to MiniBatchFraction be more robust on tiny datasets.
MEDIUM …a/org/apache/spark/sql/StatisticsCollectionSuite.scala 934 // We can't leverage LogicalRDD.fromDataset here, since it triggers physical planning and
MEDIUM …c/test/scala/org/apache/spark/sql/DataFrameSuite.scala 1637 // We can't leverage LogicalRDD.fromDataset here, since it triggers physical planning and
MEDIUM …apache/spark/sql/streaming/FileStreamSourceSuite.scala 2342 // file stream source will not leverage unread files - next batch will also trigger
MEDIUM …org/apache/spark/sql/execution/UnionCodegenSuite.scala 533 // Explicit cap so the assertion is robust to future default changes.
MEDIUM …ion/datasources/v2/state/StateDataSourceTestBase.scala 103 // check with more data - leverage full partitions
MEDIUM …park/sql/catalyst/analysis/ResolveSessionCatalog.scala 227 // resolution was skipped) so the rewrite stays robust across analyzer ordering changes.
MEDIUM …in/scala/org/apache/spark/sql/jdbc/OracleDialect.scala 144 // Not sure if there is a more robust way to identify the field as a float (or other
MEDIUM …icpruning/RowLevelOperationRuntimeGroupFiltering.scala 78 // in order to leverage a regular batch scan in the group filter query
MEDIUM …on/python/streaming/ApplyInPandasWithStateWriter.scala 107 // from the entire data part of Arrow RecordBatch. We leverage the state metadata to also
MEDIUM …ors/stateful/join/StreamingSymmetricHashJoinExec.scala 1098 // to let users leverage both sides of event time column for output of join, so the watermark
MEDIUM …/execution/streaming/runtime/FileStreamSourceLog.scala 130 // be started. We leverage the fact to skip calculation if possible.
MEDIUM …sql/execution/streaming/runtime/ProgressReporter.scala 572 // by itself, so leverage it.
MEDIUM …ark/sql/catalyst/expressions/CodeGenerationSuite.scala 603 | // to make the test more robust, in case the compiler can eliminate the else branch.
MEDIUM …e/spark/sql/catalyst/analysis/RelationResolution.scala 397 // To utilize this code path to execute V1 commands, e.g. INSERT,
MEDIUM …ql/catalyst/expressions/SubExprEvaluationRuntime.scala 100 // We leverage `IdentityHashMap` so we compare expression keys by reference here.
MEDIUM …k/sql/catalyst/expressions/codegen/CodeFormatter.scala 119 // examines the number of parenthesis and braces in that line. This isn't the most robust
MEDIUM …/spark/sql/hive/execution/HiveCompatibilitySuite.scala 287 // The isolated classloader seemed to make some of our test reset mechanisms less robust.
MEDIUM …n/scala/org/apache/spark/sql/hive/HiveInspectors.scala 931 // TODO: hard-coding a list here is not very robust. A better idea is to have some kind of query
MEDIUM …/main/java/org/apache/spark/sql/streaming/Trigger.java 98 * @deprecated This is deprecated as of Spark 3.4.0. Use {@link #AvailableNow()} to leverage
MEDIUM …e/spark/sql/hive/thriftserver/SharedThriftServer.scala 134 // It's much more robust than set a random port generated by ourselves ahead
Verbosity Indicators · 56 hits · 94 pts
Severity File Line Snippet
LOW …/util/collection/unsafe/sort/UnsafeExternalSorter.java 474 // Step 1:
LOW …/util/collection/unsafe/sort/UnsafeExternalSorter.java 477 // Step 2:
LOW …/util/collection/unsafe/sort/UnsafeExternalSorter.java 480 // Step 3:
LOW …/scala/org/apache/spark/storage/BlockInfoManager.scala 433 // reader counts. We need to check if the readLocksByTask per tasks are present, if they
LOW python/pyspark/sql/conversion.py 183 # Step 1: pick source columns from batch to align with target schema
LOW python/pyspark/sql/conversion.py 212 # Step 2: check types / cast, collect all mismatches
LOW …/streaming/test_streaming_offline_state_repartition.py 109 # Step 1: Write initial data and run streaming query
LOW …/streaming/test_streaming_offline_state_repartition.py 116 # Step 2: Repartition to more partitions
LOW …/streaming/test_streaming_offline_state_repartition.py 121 # Step 3: Add more data and restart query
LOW …/streaming/test_streaming_offline_state_repartition.py 129 # Step 4: Repartition to fewer partitions
LOW …/streaming/test_streaming_offline_state_repartition.py 134 # Step 5: Add more data and restart query
LOW …rk/sql/streaming/transform_with_state_driver_worker.py 72 # and the following code block should be only run once for each query run
LOW R/pkg/inst/worker/daemon.R 98 # Forking succeeded and we need to check if they finished their jobs every second.
LOW R/pkg/inst/worker/worker.R 247 # Step 1: hash the data to an environment
LOW R/pkg/inst/worker/worker.R 264 # Step 2: write out all of the environment as key-value pairs.
LOW …ming/FlatMapGroupsWithStateWithInitialStateSuite.scala 57 // We need to check if not explicitly calling update will still save the init state or not
LOW …ming/FlatMapGroupsWithStateWithInitialStateSuite.scala 124 // We need to check if not explicitly calling update will still save the state or not
LOW …on/datasources/v2/state/StateDataSourceReadSuite.scala 1718 // Step 1: Run the stateful query to create the full checkpoint structure
LOW …on/datasources/v2/state/StateDataSourceReadSuite.scala 1721 // Step 2: Delete the state directory
LOW …on/datasources/v2/state/StateDataSourceReadSuite.scala 1727 // Step 3: Attempt to read state - expected to fail since state is deleted
LOW …on/datasources/v2/state/StateDataSourceReadSuite.scala 1733 // Step 4: Verify the state directory was NOT recreated by the reader
LOW …execution/streaming/state/RocksDBStateStoreSuite.scala 1866 // Step 1: Write data with correct schema and commit
LOW …execution/streaming/state/RocksDBStateStoreSuite.scala 1877 // Step 2: Reopen with a wrong valueSchema (StringType instead of IntegerType)
LOW …execution/streaming/state/RocksDBStateStoreSuite.scala 1906 // Step 1: Write data with correct schema and commit
LOW …execution/streaming/state/RocksDBStateStoreSuite.scala 1918 // Step 2: Reopen with a wrong valueSchema (StringType instead of IntegerType)
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 228 // Step 1: Create state by running a streaming aggregation
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 270 // Step 1: Create state by running a composite key streaming aggregation
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 304 // Step 1: Create state by running stream-stream join
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 316 // Step 2: Test all 4 state stores created by stream-stream join
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 343 // Step 1: Create state by running flatMapGroupsWithState
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 813 // Step 1: Create state by running dropDuplicatesWithinWatermark
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 838 // Step 1: Create state by running dropDuplicates with column
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 863 // Step 1: Create state by running session window aggregation
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 892 // Step 1: Create state by running a streaming aggregation
LOW …state/StatePartitionAllColumnFamiliesWriterSuite.scala 965 // Step 1: Create state by running a streaming aggregation
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 128 // Step 1: Run initial query to create state
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 131 // Step 2: Read state data before repartition
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 150 // Step 3: Run repartition
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 157 // Step 4: Verify offset and commit logs
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 162 // Step 5: Validate state for each store and column family after repartition
LOW …ng/state/OfflineStateRepartitionIntegrationSuite.scala 190 // Step 6: Resume query with new input and verify
LOW …g/apache/spark/sql/classic/StreamingQueryManager.scala 310 // The following code block checks if a stream with the same name or id is running. Then it
LOW …la/org/apache/spark/sql/execution/SparkSqlParser.scala 119 // Step 1: Apply variable substitution to expand any variable references.
LOW …la/org/apache/spark/sql/execution/SparkSqlParser.scala 122 // Step 2: Apply parameter substitution if a parameter context is provided.
LOW …la/org/apache/spark/sql/execution/SparkSqlParser.scala 147 // Step 3: Set up the origin with SQL text and position mapper to enable
LOW …ql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala 133 // Also, we need to check if join is done on 2 tables from 2 different databases within same
LOW …xecution/datasources/parquet/ParquetRowConverter.scala 831 // in case of schema evolution), we need to check if the repeated type matches one of the
LOW …/execution/aggregate/TungstenAggregationIterator.scala 268 // Step 5: Get the sorted iterator from the externalSorter.
LOW …/execution/aggregate/TungstenAggregationIterator.scala 271 // Step 6: Pre-load the first key-value pair from the sorted iterator to make
LOW …/execution/aggregate/TungstenAggregationIterator.scala 284 // Step 7: set sortBased to true.
LOW …t/analysis/SequentialStreamingUnionAnalysisSuite.scala 227 // Step 1: Flatten the nested unions
LOW …t/analysis/SequentialStreamingUnionAnalysisSuite.scala 236 // Step 2: Validate the flattened plan
LOW …/spark/sql/catalyst/optimizer/MergeSubplansSuite.scala 762 // Step 1: subquery1 (cp) and subquery2 (np) merge:
LOW …/spark/sql/catalyst/optimizer/MergeSubplansSuite.scala 769 // Step 2: subquery3 (np) merges with merged(1,2) (cp). The cp Filter is tagged, so only a
LOW …/spark/sql/catalyst/optimizer/MergeSubplansSuite.scala 818 // Step 1: subquery1 (cp) and subquery2 (np) merge as usual:
LOW …/spark/sql/catalyst/optimizer/MergeSubplansSuite.scala 824 // Step 2: subquery3 (np, condition a > 1) merges with merged(1,2) (cp). The cp Filter is
Redundant / Tautological Comments · 63 hits · 92 pts
Severity File Line Snippet
LOW python/run-tests.py 506 # Check if the python executable has coverage installed when 'COVERAGE_PROCESS_START'
LOW python/pyspark/worker.py 1358 # Check if this is a continuation of the previous batch's partition
LOW python/pyspark/worker.py 1467 # Check if any partition column changed from previous row
LOW python/pyspark/shell.py 56 # Check if th eprogress bar needs to be disabled.
LOW python/pyspark/pipelines/cli.py 74 # Check if it's a simple file path (no wildcards at all)
LOW python/pyspark/pipelines/cli.py 78 # Check if it's a folder path ending with /**
LOW python/pyspark/pandas/frame.py 12563 # Check if DataFrame has rows - if yes, raise error; if no, return empty Series
LOW python/pyspark/pandas/frame.py 12698 # Check if DataFrame has rows - if yes, raise error; if no, return empty Series
LOW python/pyspark/pandas/data_type_ops/categorical_ops.py 116 # Check if categoricals have the same dtype, same categories, and same ordered
LOW python/pyspark/pandas/typedef/typehints.py 638 # Check if the name is Tuple.
LOW python/pyspark/pandas/indexes/base.py 2068 # Check if the `self` and `other` have different index types.
LOW python/pyspark/sql/metrics.py 188 # Add yourself to the list if you have to.
LOW python/pyspark/sql/dataframe.py 409 >>> # Check if the DataFrames are equal
LOW python/pyspark/sql/session.py 2279 # Check if the target path already exists
LOW python/pyspark/sql/types.py 3139 >>> # Check if numeric values are within the allowed range.
LOW python/pyspark/sql/tests/test_utils.py 1732 # Check if the error message contains information about 2 mismatches only.
LOW python/pyspark/sql/tests/arrow/test_arrow_map.py 329 # Set it to a small odd value to exercise batching logic for all test cases
LOW …s/pandas/streaming/test_pandas_transform_with_state.py 1436 # Set it to a very small number so that every row would be a separate pandas df
LOW …s/pandas/streaming/test_pandas_transform_with_state.py 1463 # Set it to a very large number so that every row would be in the same pandas df
LOW …s/pandas/streaming/test_pandas_transform_with_state.py 1529 # Set it to a very small number so that every row would be a separate pandas df
LOW …/pyspark/sql/tests/pandas/streaming/test_tws_tester.py 751 # Set watermark to 15000 - key1's timer should fire.
LOW …/pyspark/sql/tests/pandas/streaming/test_tws_tester.py 756 # Set watermark to 16000 - key2's timer should fire.
LOW …/pyspark/sql/tests/pandas/streaming/test_tws_tester.py 790 # Set watermark to 6000.
LOW …/pyspark/sql/tests/pandas/streaming/test_tws_tester.py 821 # Set watermark to 20 seconds.
LOW …/pyspark/sql/tests/pandas/streaming/test_tws_tester.py 923 # Set watermark to 10000.
LOW python/pyspark/sql/streaming/readwriter.py 1550 # Check if the data should be processed
LOW python/pyspark/sql/worker/plan_data_source_read.py 154 # Check if the names are the same as the schema.
LOW python/pyspark/sql/worker/create_data_source.py 81 # Check if the provider name matches the data source's name.
LOW python/pyspark/sql/worker/write_into_data_source.py 97 # Check if the provider name matches the data source's name.
LOW python/pyspark/sql/connect/session.py 1109 # Check if total size exceeds the limit
LOW python/pyspark/sql/connect/session.py 1119 # Check if adding this chunk would exceed batch size
LOW python/pyspark/sql/connect/client/artifact.py 195 # Check if it is a file from the scheme
LOW python/pyspark/sql/pandas/serializers.py 1226 # Check if the entire column is null
LOW python/pyspark/sql/pandas/serializers.py 1469 # Check if the entire column is null
LOW python/pyspark/sql/pandas/conversion.py 872 # Check if any columns need to be fixed for Spark to infer properly
LOW python/pyspark/sql/pandas/typehints.py 69 # Check if all arguments have type hints
LOW python/pyspark/sql/pandas/typehints.py 79 # Check if the return has a type hint
LOW python/pyspark/sql/pandas/typehints.py 228 # Check if all arguments have type hints
LOW python/pyspark/sql/pandas/typehints.py 238 # Check if the return has a type hint
LOW python/pyspark/sql/pandas/typehints.py 421 # Check if all arguments have type hints
LOW python/pyspark/sql/pandas/typehints.py 431 # Check if the return has a type hint
LOW python/pyspark/sql/pandas/typehints.py 514 # Check if all arguments have type hints
LOW python/pyspark/sql/pandas/typehints.py 524 # Check if the return has a type hint
LOW python/pyspark/sql/pandas/typehints.py 600 # Check if the name is Tuple first. After that, check the generic types.
LOW sbin/spark-daemon.sh 50 # Check if --config is passed as an argument. It is an optional parameter.
LOW sbin/spark-daemon.sh 154 # Check if the process has died; in that case we'll tail the log so the user can see
LOW sbin/decommission-worker.sh 48 # Check if --block-until-exit is set.
LOW sbin/workers.sh 57 # Check if --config is passed as an argument. It is an optional parameter.
LOW …l/src/test/scala/org/apache/spark/repl/ReplSuite.scala 254 |# Set everything to be logged to the console
LOW R/pkg/tests/fulltests/test_jvm_api.R 26 # Check if get returns the same element
LOW R/pkg/R/sparkR.R 456 # Check if version number of SparkSession matches version number of SparkR package
LOW R/pkg/R/serialize.R 45 # Check if all elements are of same type
LOW R/pkg/R/jobj.R 31 # Check if jobj was created with the current SparkContext
LOW R/pkg/R/DataFrame.R 386 # Check if the column names have . in it
LOW R/pkg/R/DataFrame.R 2282 # Check if there is any duplicated column name in the DataFrame
LOW R/pkg/inst/worker/worker.R 97 # Set libPaths to include SparkR package as loadNamespace needs this
LOW .github/workflows/build_and_test.yml 1313 # Print the values of environment variables `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC` and
LOW .github/workflows/build_and_test.yml 1337 # Print the values of environment variables `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC` and
LOW .github/workflows/build_and_test.yml 1361 # Print the values of environment variables `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC` and
LOW .github/workflows/build_and_test.yml 1385 # Print the values of environment variables `SKIP_ERRORDOC`, `SKIP_SCALADOC`, `SKIP_PYTHONDOC`, `SKIP_RDOC` and
3 more matches not shown…
Fake / Example Data · 95 hits · 79 pts
Severity File Line Snippet
LOW …hon/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py 1110 .withColumn("name", lit("John Doe"))
LOW python/pyspark/sql/pandas/functions.py 109 >>> df = spark.createDataFrame([("John Doe",)], ("name",))
LOW python/pyspark/sql/pandas/functions.py 124 >>> df = spark.createDataFrame([("John Doe",)], ("name",))
LOW python/pyspark/sql/pandas/functions.py 506 >>> df = spark.createDataFrame([("John Doe",)], ("name",))
LOW python/pyspark/sql/pandas/functions.py 518 >>> df = spark.createDataFrame([("John Doe",)], ("name",))
LOW …apache/spark/graphx/lib/ConnectedComponentsSuite.scala 119 val defaultUser = ("John Doe", "Missing")
LOW docs/graphx-programming-guide.md 193 val defaultUser = ("John Doe", "Missing")
LOW docs/graphx-programming-guide.md 432 val defaultUser = ("John Doe", "Missing")
LOW examples/src/main/python/sql/arrow.py 308 df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
LOW …s/test-data/xml-resources/mixed_children_as_string.xml 4 Lorem ipsum dolor sit amet. Ut <i>voluptas</i> distinctio et impedit deserunt aut quam fugit et quaerat odit
LOW …s/test-data/xml-resources/mixed_children_as_string.xml 4 Lorem ipsum dolor sit amet. Ut <i>voluptas</i> distinctio et impedit deserunt aut quam fugit et quaerat odit
LOW …/test/resources/test-data/xml-resources/processing.xml 4 lorem ipsum
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 150 INSERT INTO products VALUES (1, 'Super Widget', 'Electronics', 155.99, 99.99, 1, 'Acme Inc', 'John D.', '123 Main St', 2
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 150 INSERT INTO products VALUES (1, 'Super Widget', 'Electronics', 155.99, 99.99, 1, 'Acme Inc', 'John D.', '123 Main St', 2
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 154 INSERT INTO customers VALUES (1, 'Alice Johnson', 'alice@example.com', '555-1000', '101 Maple Ave', NULL, 'Springfield',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 155 INSERT INTO customers VALUES (2, 'Bob Smith', 'bob@example.com', '555-1002', '202 Oak St', 'Apt 3', 'Oakville', 'CA', '6
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 156 INSERT INTO customers VALUES (3, 'Cathy Lee', 'cathy@example.com', '555-1003', '303 Pine Ln', NULL, 'Pineville', 'TX', '
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 166 INSERT INTO employees VALUES (1, 'Dan Miller', 'dan@example.com', '555-2001', 'Manager', 'Sales', TIMESTAMP '2018-01-01'
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 167 INSERT INTO employees VALUES (2, 'Eva Perez', 'eva@example.com', '555-2002', 'Salesperson', 'Sales', TIMESTAMP '2019-03-
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 168 INSERT INTO employees VALUES (3, 'Frank Wong', 'frank@example.com', '555-2003', 'Warehouse', 'Operations', TIMESTAMP '20
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 170 INSERT INTO suppliers VALUES (1, 'Acme Inc', 'John D.', 'Sales Manager', 'john@acme.com', '555-3001', '555-3002', '123 M
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 170 INSERT INTO suppliers VALUES (1, 'Acme Inc', 'John D.', 'Sales Manager', 'john@acme.com', '555-3001', '555-3002', '123 M
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 170 INSERT INTO suppliers VALUES (1, 'Acme Inc', 'John D.', 'Sales Manager', 'john@acme.com', '555-3001', '555-3002', '123 M
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 171 INSERT INTO suppliers VALUES (2, 'Widgets Co', 'Mary K.', 'Customer Success', 'mary@widgets.com', '555-4001', NULL, '456
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 172 INSERT INTO suppliers VALUES (3, 'Toy Supply', 'Ann T.', 'Director', 'ann@toysupply.com', '555-5001', NULL, '789 Oak St'
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 212 INSERT INTO suppliers VALUES (v_temp_id, 'Temp Supplier', 'Temp Contact', 'Temp Role', 'temp
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 260 INSERT INTO customers VALUES (v_new_customer_id, 'New Customer', 'new@customer.com', '555-1111', '55
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 393 VALUES (sub_emp.employee_id + 9999, v_name_part, CONCAT(v_name_part, '@company.com')
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 405 VALUES (emp.employee_id + 10000, CONCAT('Emp_', emp.employee_id), emp.employee_name, 'Employ
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 474 INSERT INTO products VALUES ((SELECT COALESCE(MAX(product_id), 0) + 1 FROM products), 'Rare ' || v_m
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 519 '555-1212',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 552 '555-1111',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 553 '123 Main St',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 727 '555-7777',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 918 INSERT INTO employees VALUES (v_new_id, 'New Emp ' || v_new_id, 'new' || v_new_id || '@c
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 930 INSERT INTO employees VALUES (v_temp_id, 'Manager ' || v_temp_id, 'manager' || v_temp_id || '@compan
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 941 INSERT INTO employees VALUES (v_low_level_emp + 10, 'Temp Emp ' || v_low_level_emp, 'temp' || v_low_level_em
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 982 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 983 '123 Main St',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1097 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1169 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1417 '123 Main St',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1444 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1491 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1528 '555-0001',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1556 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 1763 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2072 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2275 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2582 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2615 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2788 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 2866 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3112 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3223 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3253 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3282 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3666 '555-0000',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3731 '555-0001',
LOW …-tests/inputs/scripting/randomly_generated_scripts.sql 3764 '555-0002',
35 more matches not shown…
Slop Phrases · 14 hits · 22 pts
Severity File Line Snippet
LOW …/main/java/org/apache/spark/SparkFirehoseListener.java 27 * This is a concrete Java class in order to ensure that we don't forget to update it when adding
MEDIUM …/org/apache/spark/storage/BlockReplicationPolicy.scala 101 * Method to prioritize a bunch of candidate peers of a block. This is a basic implementation,
LOW python/packaging/classic/setup.py 151 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/classic/setup.py 151 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/classic/setup.py 351 # Don't forget to update python/docs/source/getting_started/install.rst
LOW python/packaging/connect/setup.py 87 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/connect/setup.py 87 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/connect/setup.py 117 # Don't forget to update python/docs/source/getting_started/install.rst
LOW python/packaging/client/setup.py 134 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/client/setup.py 134 # Also don't forget to update python/docs/source/getting_started/install.rst,
LOW python/packaging/client/setup.py 210 # Don't forget to update python/docs/source/getting_started/install.rst
LOW python/pyspark/pandas/config.py 114 # NOTE: if you are fixing or adding an option here, make sure you execute `show_options()` and
LOW dev/create-release/release-build.sh 768 # NOTE: Don't forget to update the valid combinations of distributions at
LOW …/main/scala/org/apache/spark/sql/connect/Dataset.scala 146 // Make sure we don't forget to set plan id.
Docstring Block Structure · 1 hit · 5 pts
Severity File Line Snippet
HIGH python/pyspark/testing/sqlutils.py 114 Read the classpath file for a project and return it as a comma-separated string. The classpath file is typical
Synthetic Comment Markers · 1 hit · 2 pts
Severity File Line Snippet
HIGH python/pyspark/ml/dl_util.py 103 the empty string, nothing will be written after the auto-generated code.