lightgbm-org/LightGBM

Adjusted Score: 5.8
Raw Score: 5.8
Time Factor: 100%
Last Push: 2026-07-12
Stars: 18.6K
Language: C++
Lines of Code: 112.6K
Files: 350
Pattern Hits: 653
Scan Date: 2026-07-14
HC Hit Rate: 0.01

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
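
The normalisation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scanner's actual code: the function names are hypothetical, the point total of 653 is the sum of the per-category point totals listed under Pattern Findings (the page does not state that this sum is the raw score), and the line count is the rounded 112.6K figure from the summary.

```python
def adjusted_score(total_points, lines_of_code, time_factor):
    """Points per 1,000 lines of code, scaled by the temporal discount factor."""
    per_kloc = total_points / (lines_of_code / 1000)
    return per_kloc * time_factor


def hc_hit_rate(high_hits, critical_hits, files):
    """High + Critical pattern hits per analysed file."""
    return (high_hits + critical_hits) / files


# Per-category point totals as listed under Pattern Findings (assumed to sum to the raw score).
category_points = [220, 194, 58, 34, 30, 24, 22, 21, 16, 12, 8, 8, 6]
total_points = sum(category_points)

# 112.6K lines, 100% time factor, 4 HIGH and 0 CRITICAL hits across 350 files.
score = adjusted_score(total_points, 112_600, 1.0)
rate = hc_hit_rate(4, 0, 350)

print(round(score, 1), round(rate, 2))
```

With these inputs the sketch reproduces the reported Adjusted Score of 5.8 and HC Hit Rate of 0.01, which suggests the displayed figures follow this arithmetic, although the page does not confirm it.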

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 0 · HIGH 4 · MEDIUM 13 · LOW 636

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner identified 653 distinct pattern matches across 13 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
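
The weighting rules above can be illustrated with a short Python sketch. It is an assumption-laden reading, not the scanner's implementation: all names are hypothetical, and the cluster rule is interpreted as a run of three or more matches in the same file and category where each match lies within 10 lines of the previous one. Only the multipliers named on this page are modelled; some category totals (for example Unused Imports) do not follow from them alone, so other weights presumably exist.

```python
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_WEIGHT = 1.5   # comment-context multiplier stated on this page
CLUSTER_WEIGHT = 1.5   # density multiplier for clustered matches
WINDOW = 10            # maximum line gap between neighbours (assumed reading)
MIN_CLUSTER = 3        # minimum matches needed to form a cluster


def clustered_lines(lines, window=WINDOW, min_size=MIN_CLUSTER):
    """Return the line numbers that belong to a run of min_size or more
    matches whose consecutive gaps are all at most `window` lines."""
    clustered = set()
    run = []
    for line in sorted(lines):
        if run and line - run[-1] > window:
            if len(run) >= min_size:
                clustered.update(run)
            run = []
        run.append(line)
    if len(run) >= min_size:
        clustered.update(run)
    return clustered


def match_score(severity, context, in_cluster):
    """Severity coefficient times the comment and cluster multipliers."""
    score = SEVERITY_COEFFICIENT[severity]
    if context == "COMMENT":
        score *= COMMENT_WEIGHT
    if in_cluster:
        score *= CLUSTER_WEIGHT
    return score


# Line numbers of the six LOW / COMMENT matches reported for MAINTAINING.md.
maintaining_lines = [31, 40, 46, 52, 63, 83]
flagged = clustered_lines(maintaining_lines)
total = sum(match_score("LOW", "COMMENT", line in flagged) for line in maintaining_lines)

print(sorted(flagged), total)
```

Under this reading, lines 31, 40, 46 and 52 are flagged as clustered and the six matches total 12 points, which agrees with the bolt icons and the point total shown for the Structural Annotation Overuse category.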

Hyper-Verbose Identifiers · 264 hits · 220 pts

Severity	File	Line	Snippet	Context
LOW	.ci/parameter-generator.py	199	def gen_parameter_description(	CODE
LOW	tests/python_package_test/test_polars.py	24	def generate_simple_polars_frame() -> pl.DataFrame:	CODE
LOW	tests/python_package_test/test_polars.py	32	def generate_nullable_polars_frame(dtype: Any) -> pl.DataFrame:	CODE
LOW	tests/python_package_test/test_polars.py	43	def generate_dummy_polars_frame() -> pl.DataFrame:	CODE
LOW	tests/python_package_test/test_polars.py	52	def generate_random_polars_frame(	CODE
LOW	tests/python_package_test/test_polars.py	69	def generate_random_polars_series(	CODE
LOW	tests/python_package_test/test_polars.py	119	def test_dataset_construct_fuzzy(tmp_path, polars_frame_fn, dataset_params):	CODE
LOW	tests/python_package_test/test_polars.py	131	def test_dataset_construct_fuzzy_boolean(tmp_path):	CODE
LOW	tests/python_package_test/test_polars.py	147	def test_dataset_construct_fields_fuzzy():	CODE
LOW	tests/python_package_test/test_polars.py	175	def test_dataset_construct_labels(polars_type):	CODE
LOW	tests/python_package_test/test_polars.py	185	def test_dataset_construct_labels_boolean():	CODE
LOW	tests/python_package_test/test_polars.py	198	def test_dataset_construct_weights_none():	CODE
LOW	tests/python_package_test/test_polars.py	208	def test_dataset_construct_weights(polars_type):	CODE
LOW	tests/python_package_test/test_polars.py	222	def test_dataset_construct_groups(polars_type):	CODE
LOW	tests/python_package_test/test_polars.py	236	def test_dataset_construct_init_scores_array(polars_type):	CODE
LOW	tests/python_package_test/test_polars.py	246	def test_dataset_construct_init_scores_table():	CODE
LOW	tests/python_package_test/test_polars.py	266	def assert_equal_predict_polars_pandas(booster: lgb.Booster, data: pl.DataFrame):	CODE
LOW	tests/python_package_test/test_polars.py	308	def test_predict_binary_classification():	CODE
LOW	tests/python_package_test/test_polars.py	323	def test_predict_multiclass_classification():	CODE
LOW	tests/python_package_test/test_polars.py	354	def test_polars_feature_name_auto():	CODE
LOW	tests/python_package_test/test_polars.py	366	def test_polars_feature_name_manual():	CODE
LOW	tests/python_package_test/test_polars.py	379	def test_get_data_polars_frame():	CODE
LOW	tests/python_package_test/test_polars.py	393	def test_get_data_polars_frame_subset(rng):	CODE
LOW⚡	tests/python_package_test/test_engine.py	2436	def test_refit_with_one_tree_regression():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2445	def test_refit_with_one_tree_binary_classification():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2454	def test_refit_with_one_tree_multiclass_classification():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2463	def test_refit_dataset_params(rng):	CODE
LOW⚡	tests/python_package_test/test_engine.py	2539	def test_constant_features_regression():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2546	def test_constant_features_binary():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2552	def test_constant_features_multiclass():	CODE
LOW⚡	tests/python_package_test/test_engine.py	2558	def test_constant_features_multiclassova():	CODE
LOW⚡	tests/python_package_test/test_engine.py	4817	def test_train_and_cv_raise_informative_error_for_train_set_of_wrong_type():	CODE
LOW⚡	tests/python_package_test/test_engine.py	4825	def test_train_and_cv_raise_informative_error_for_impossible_num_boost_round(num_boost_round):	CODE
LOW⚡	tests/python_package_test/test_engine.py	4834	def test_train_raises_informative_error_if_any_valid_sets_are_not_dataset_objects():	CODE
LOW	tests/python_package_test/test_engine.py	164	def test_weighted_percentile_inside_label_range(objective):	CODE
LOW	tests/python_package_test/test_engine.py	201	def test_missing_value_handle():	CODE
LOW	tests/python_package_test/test_engine.py	221	def test_missing_value_handle_more_na():	CODE
LOW	tests/python_package_test/test_engine.py	241	def test_missing_value_handle_na():	CODE
LOW	tests/python_package_test/test_engine.py	272	def test_missing_value_handle_zero():	CODE
LOW	tests/python_package_test/test_engine.py	303	def test_missing_value_handle_none():	CODE
LOW	tests/python_package_test/test_engine.py	398	def test_categorical_handle_na(use_quantized_grad):	CODE
LOW	tests/python_package_test/test_engine.py	448	def test_categorical_non_zero_inputs(use_quantized_grad):	CODE
LOW	tests/python_package_test/test_engine.py	527	def test_multiclass_prediction_early_stopping():	CODE
LOW	tests/python_package_test/test_engine.py	697	def test_ranking_prediction_early_stopping():	CODE
LOW	tests/python_package_test/test_engine.py	780	def test_ranking_with_position_information_with_file(tmp_path):	CODE
LOW	tests/python_package_test/test_engine.py	831	def test_ranking_with_position_information_with_dataset_constructor(tmp_path):	CODE
LOW	tests/python_package_test/test_engine.py	934	def test_early_stopping_ignores_training_set(use_valid):	CODE
LOW	tests/python_package_test/test_engine.py	972	def test_early_stopping_via_global_params(first_metric_only):	CODE
LOW	tests/python_package_test/test_engine.py	1000	def test_early_stopping_is_not_enabled_for_non_positive_stopping_rounds(early_stopping_round):	CODE
LOW	tests/python_package_test/test_engine.py	1050	def test_early_stopping_min_delta(first_only, single_metric, greater_is_better):	CODE
LOW	tests/python_package_test/test_engine.py	1129	def test_early_stopping_min_delta_via_global_params(early_stopping_min_delta):	CODE
LOW	tests/python_package_test/test_engine.py	1151	def test_early_stopping_can_be_triggered_via_custom_callback():	CODE
LOW	tests/python_package_test/test_engine.py	1154	def _early_stop_after_seventh_iteration(env):	CODE
LOW	tests/python_package_test/test_engine.py	1199	def test_continue_train_reused_dataset():	CODE
LOW	tests/python_package_test/test_engine.py	1231	def test_continue_train_multiclass():	CODE
LOW	tests/python_package_test/test_engine.py	1319	def test_cv_works_with_init_model(tmp_path):	CODE
LOW	tests/python_package_test/test_engine.py	1493	def test_feature_name_with_non_ascii(rng, tmp_path):	CODE
LOW	tests/python_package_test/test_engine.py	1510	def test_parameters_are_loaded_from_model_file(tmp_path, capsys, rng):	CODE
LOW	tests/python_package_test/test_engine.py	1560	def test_string_serialized_params_retrieval(rng):	CODE
LOW	tests/python_package_test/test_engine.py	1604	def test_save_load_copy_pickle(tmp_path):	CODE
204 more matches not shown…

Over-Commented Block · 219 hits · 194 pts

Severity	File	Line	Snippet	Context
LOW	CMakeLists.txt	801	# with clang, libomp doesn't ship with the compiler and might be supplied separately	COMMENT
LOW	CMakeLists.txt	821	# This can't be easily avoided by forcing R-package builds in LightGBM to use R's libomp.dylib	COMMENT
LOW	build_r.R	1	# For macOS users who have decided to use gcc	COMMENT
LOW	build-python.sh	1	#!/bin/sh	COMMENT
LOW	build-python.sh	21	# sh ./build-python.sh install --precompile	COMMENT
LOW	build-python.sh	41	# --gpu	COMMENT
LOW	build-cran-package.sh	1	#!/bin/sh	COMMENT
LOW	.ci/check-workflow-status.sh	1	#!/bin/bash	COMMENT
LOW	.ci/set-commit-status.sh	1	#!/bin/bash	COMMENT
LOW	.ci/append-comment.sh	1	#!/bin/bash	COMMENT
LOW	.ci/install-r-deps.R	1	# Install R dependencies, using only base R.	COMMENT
LOW	.ci/check-dynamic-dependencies.sh	1	#!/bin/bash	COMMENT
LOW	.ci/rerun-workflow.sh	1	#!/bin/bash	COMMENT
LOW	.ci/conda-envs/ci-core.txt	1	# [description]	COMMENT
LOW	R-package/demo/basic_walkthrough.R	81	# Since we do not have this file with us, the following line is just for illustration	COMMENT
LOW	R-package/demo/categorical_features_rules.R	1	# Here we are going to try training a model with categorical features	COMMENT
LOW	R-package/demo/categorical_features_rules.R	21	# $ duration : int 79 220 185 199 226 141 341 151 57 313 ...	COMMENT
LOW	R-package/demo/categorical_features_rules.R	41	# $ marital : num 1 2 1 3 3 2 2 2 1 1 ...	COMMENT
LOW	R-package/demo/efficient_many_training.R	1	# Efficient training means training without giving up too much RAM	COMMENT
LOW	R-package/tests/testthat/helper.R	1	# ref for this file:	COMMENT
LOW	R-package/tests/testthat/test_basic.R	2281	)	COMMENT
LOW	R-package/R/lgb.convert_with_rules.R	61	#'	COMMENT
LOW	R-package/R/lgb.convert_with_rules.R	81	#' new_iris <- lgb.convert_with_rules(data = iris)	COMMENT
LOW	R-package/R/lgb.convert_with_rules.R	101	#'	COMMENT
LOW	R-package/R/lgb.interpret.R	1	#' @name lgb.interpret	COMMENT
LOW	R-package/R/lgb.interpret.R	21	#' \dontshow{data.table::setDTthreads(1L)}	COMMENT
LOW	R-package/R/lgb.interpret.R	41	#' )	COMMENT
LOW	R-package/R/utils.R	161	params$metric <- as.list(unique(unlist(params$metric)))	COMMENT
LOW	R-package/R/utils.R	181	# For example, "num_iterations" can also be provided to lgb.train()	COMMENT
LOW	R-package/R/multithreading.R	1	#' @name setLGBMThreads	COMMENT
LOW	R-package/R/multithreading.R	21	#' @export	COMMENT
LOW	R-package/R/lgb.restore_handle.R	1	#' @name lgb.restore_handle	COMMENT
LOW	R-package/R/lgb.restore_handle.R	21	#' \dontshow{setLGBMthreads(2L)}	COMMENT
LOW	R-package/R/lgb.plot.importance.R	1	#' @name lgb.plot.importance	COMMENT
LOW	R-package/R/lgb.plot.importance.R	21	#' \donttest{	COMMENT
LOW	R-package/R/lgb.model.dt.tree.R	1	#' @name lgb.model.dt.tree	COMMENT
LOW	R-package/R/lgb.model.dt.tree.R	21	#' for a leaf, it simply labels it as \code{"NA"}}	COMMENT
LOW	R-package/R/lgb.model.dt.tree.R	41	#' dtrain <- lgb.Dataset(train$data, label = train$label)	COMMENT
LOW	R-package/R/lgb.train.R	1	#' @name lgb.train	COMMENT
LOW	R-package/R/lgb.train.R	21	#' train <- agaricus.train	COMMENT
LOW	R-package/R/lightgbm.R	1	#' @name lgb_shared_params	COMMENT
LOW	R-package/R/lightgbm.R	21	#' The "metric" section of the documentation}	COMMENT
LOW	R-package/R/lightgbm.R	41	#' \item{\bold{c. list}:	COMMENT
LOW	R-package/R/lightgbm.R	61	#' validation set does not improve for several consecutive iterations.	COMMENT
LOW	R-package/R/lightgbm.R	81	#' de-serialized, the underlying C++ model object gets reconstructed from these raw bytes, but will only	COMMENT
LOW	R-package/R/lightgbm.R	101	#' than \code{\link{lgb.train}}.	COMMENT
LOW	R-package/R/lightgbm.R	121	#' \code{label}).	COMMENT
LOW	R-package/R/lightgbm.R	141	#' If passing \code{NULL} (the default), will try to use the number of physical cores in the	COMMENT
LOW	R-package/R/lightgbm.R	261	what = lgb.train	COMMENT
LOW	R-package/R/lightgbm.R	281	#' https://archive.ics.uci.edu/ml/datasets/Mushroom	COMMENT
LOW	R-package/R/lightgbm.R	301	#' \item{\code{label}: the label for each record}	COMMENT
LOW	R-package/R/lightgbm.R	321	#' UCI Machine Learning Repository.	COMMENT
LOW	R-package/R/lgb.make_serializable.R	1	#' @name lgb.make_serializable	COMMENT
LOW	R-package/R/lgb.Dataset.R	1	#' @name lgb_shared_dataset_params	COMMENT
LOW	R-package/R/lgb.Dataset.R	761	#' @title Construct \code{lgb.Dataset} object	COMMENT
LOW	R-package/R/lgb.Dataset.R	781	#' @param categorical_feature categorical features. This can either be a character vector of feature	COMMENT
LOW	R-package/R/lgb.Dataset.R	841	#' a character representing a path to a text file (CSV, TSV, or LibSVM),	COMMENT
LOW	R-package/R/lgb.Dataset.R	861	#'	COMMENT
LOW	R-package/R/lgb.Dataset.R	881	#' , row.names = FALSE	COMMENT
LOW	R-package/R/lgb.Dataset.R	921		COMMENT
159 more matches not shown…

Redundant / Tautological Comments · 34 hits · 58 pts

Severity	File	Line	Snippet	Context
LOW	R-package/R/lgb.Predictor.R	27	# Check if model file is a booster handle already	COMMENT
LOW⚡	R-package/R/lgb.Predictor.R	74	# Check if number of iterations is existing - if not, then set it to -1 (use all)	COMMENT
LOW⚡	R-package/R/lgb.Predictor.R	78	# Check if start iterations is existing - if not, then set it to 0 (start from the first iteration)	COMMENT
LOW⚡	R-package/R/lgb.Predictor.R	83	# Check if data is a file name and not a matrix	COMMENT
LOW	R-package/R/lgb.Predictor.R	243	# Check if data is a matrix	COMMENT
LOW	R-package/R/lgb.Predictor.R	424	# Check if data is a dgCMatrix (sparse matrix, column compressed format)	COMMENT
LOW	R-package/R/lgb.Predictor.R	450	# Check if number of rows is strange (not a multiple of the dataset rows)	COMMENT
LOW	R-package/R/lgb.model.dt.tree.R	147	# Check if split index is not null in leaf	COMMENT
LOW	R-package/R/lightgbm.R	226	# Set data to a temporary variable	COMMENT
LOW	R-package/R/lgb.Dataset.R	162	# Check if more categorical features were output over the feature space	COMMENT
LOW	R-package/R/lgb.Dataset.R	366	# Check if dgCMatrix (sparse matrix column compressed)	COMMENT
LOW	R-package/R/lgb.Dataset.R	417	# Check if dgCMatrix (sparse matrix column compressed)	COMMENT
LOW	R-package/R/lgb.Dataset.R	466	# Check if attribute key is in the known attribute list	COMMENT
LOW	R-package/R/lgb.Dataset.R	517	# Check if attribute key is in the known attribute list	COMMENT
LOW	R-package/R/lgb.Dataset.R	1026	# Check if invalid element list	COMMENT
LOW	R-package/R/lgb.Dataset.R	1134	# Check if dataset is not a dataset	COMMENT
LOW	R-package/R/lgb.importance.R	71	# Check if relative values are requested	COMMENT
LOW⚡	R-package/R/lgb.Booster.R	840	# Check if there are evaluation metrics	COMMENT
LOW⚡	R-package/R/lgb.Booster.R	843	# Check if evaluation metric is a function	COMMENT
LOW⚡	R-package/R/lgb.Booster.R	850	# Check if data to assess is existing differently	COMMENT
LOW	R-package/R/lgb.Booster.R	205	# Check if objective is empty	COMMENT
LOW	R-package/R/lgb.Booster.R	403	# Check if evaluation was not done	COMMENT
LOW	R-package/R/lgb.Booster.R	745	# Check if current iteration was already predicted	COMMENT
LOW	R-package/R/lgb.Booster.R	1544	# Check if evaluation result is existing	COMMENT
LOW	R-package/R/lgb.Booster.R	1560	# Check if error is requested	COMMENT
LOW⚡	R-package/R/callback.R	79	# Check if period is at least 1 or more	COMMENT
LOW⚡	R-package/R/callback.R	85	# Check if iteration matches moduo	COMMENT
LOW⚡	R-package/R/callback.R	91	# Check if message is existing	COMMENT
LOW	R-package/R/callback.R	135	# Check if evaluation record exists	COMMENT
LOW	R-package/R/callback.R	206	# Check if verbose or not	COMMENT
LOW	R-package/R/callback.R	264	# Check if score is better	COMMENT
LOW	R-package/R/callback.R	278	# Check if early stopping is required	COMMENT
LOW	R-package/src/install.libs.R	175	# Check if Windows installation (for gcc vs Visual Studio)	COMMENT
LOW	python-package/lightgbm/basic.py	3027	# Check if the weight contains values other than one	COMMENT

Deep Nesting · 43 hits · 34 pts

Severity	File	Line	Context
LOW	.ci/parameter-generator.py	16	CODE
LOW	.ci/parameter-generator.py	109	CODE
LOW	.ci/parameter-generator.py	199	CODE
LOW	.ci/parameter-generator.py	263	CODE
LOW	tests/python_package_test/test_engine.py	135	CODE
LOW	tests/python_package_test/test_engine.py	724	CODE
LOW	tests/python_package_test/test_engine.py	726	CODE
LOW	tests/python_package_test/test_basic.py	768	CODE
LOW	tests/python_package_test/test_consistency.py	13	CODE
LOW	tests/python_package_test/test_consistency.py	49	CODE
LOW	tests/python_package_test/utils.py	162	CODE
LOW	tests/python_package_test/utils.py	175	CODE
LOW	tests/python_package_test/test_dask.py	153	CODE
LOW	tests/python_package_test/test_dask.py	328	CODE
LOW	tests/python_package_test/test_dask.py	820	CODE
LOW	tests/python_package_test/test_sklearn.py	68	CODE
LOW	tests/python_package_test/test_sklearn.py	1743	CODE
LOW	tests/python_package_test/test_sklearn.py	2084	CODE
LOW	python-package/lightgbm/callback.py	326	CODE
LOW	python-package/lightgbm/callback.py	405	CODE
LOW	python-package/lightgbm/plotting.py	460	CODE
LOW	python-package/lightgbm/plotting.py	482	CODE
LOW	python-package/lightgbm/engine.py	109	CODE
LOW	python-package/lightgbm/engine.py	522	CODE
LOW	python-package/lightgbm/dask.py	196	CODE
LOW	python-package/lightgbm/dask.py	442	CODE
LOW	python-package/lightgbm/dask.py	926	CODE
LOW	python-package/lightgbm/basic.py	348	CODE
LOW	python-package/lightgbm/basic.py	476	CODE
LOW	python-package/lightgbm/basic.py	813	CODE
LOW	python-package/lightgbm/basic.py	1026	CODE
LOW	python-package/lightgbm/basic.py	1994	CODE
LOW	python-package/lightgbm/basic.py	2037	CODE
LOW	python-package/lightgbm/basic.py	2454	CODE
LOW	python-package/lightgbm/basic.py	2663	CODE
LOW	python-package/lightgbm/basic.py	2848	CODE
LOW	python-package/lightgbm/basic.py	3204	CODE
LOW	python-package/lightgbm/basic.py	3375	CODE
LOW	python-package/lightgbm/basic.py	3498	CODE
LOW	python-package/lightgbm/basic.py	5118	CODE
LOW	python-package/lightgbm/sklearn.py	869	CODE
LOW	python-package/lightgbm/sklearn.py	975	CODE
LOW	python-package/lightgbm/sklearn.py	1595	CODE

Unused Imports · 33 hits · 30 pts

Severity	File	Line	Context
LOW	python-package/lightgbm/compat.py	170	CODE
LOW	python-package/lightgbm/compat.py	171	CODE
LOW	python-package/lightgbm/compat.py	175	CODE
LOW	python-package/lightgbm/compat.py	177	CODE
LOW	python-package/lightgbm/__init__.py	11	CODE
LOW	python-package/lightgbm/__init__.py	11	CODE
LOW	python-package/lightgbm/__init__.py	11	CODE
LOW	python-package/lightgbm/__init__.py	11	CODE
LOW	python-package/lightgbm/__init__.py	12	CODE
LOW	python-package/lightgbm/__init__.py	12	CODE
LOW	python-package/lightgbm/__init__.py	12	CODE
LOW	python-package/lightgbm/__init__.py	12	CODE
LOW	python-package/lightgbm/__init__.py	12	CODE
LOW	python-package/lightgbm/__init__.py	13	CODE
LOW	python-package/lightgbm/__init__.py	13	CODE
LOW	python-package/lightgbm/__init__.py	13	CODE
LOW	python-package/lightgbm/__init__.py	16	CODE
LOW	python-package/lightgbm/__init__.py	16	CODE
LOW	python-package/lightgbm/__init__.py	16	CODE
LOW	python-package/lightgbm/__init__.py	16	CODE
LOW	python-package/lightgbm/__init__.py	20	CODE
LOW	python-package/lightgbm/__init__.py	20	CODE
LOW	python-package/lightgbm/__init__.py	20	CODE
LOW	python-package/lightgbm/__init__.py	20	CODE
LOW	python-package/lightgbm/__init__.py	20	CODE
LOW	python-package/lightgbm/__init__.py	24	CODE
LOW	python-package/lightgbm/__init__.py	24	CODE
LOW	python-package/lightgbm/__init__.py	24	CODE
LOW	python-package/lightgbm/dask.py	52	CODE
LOW	python-package/lightgbm/basic.py	35	CODE
LOW	python-package/lightgbm/basic.py	35	CODE
LOW	python-package/lightgbm/sklearn.py	56	CODE
LOW	docs/conf.py	112	CODE

AI Structural Patterns · 24 hits · 24 pts

Severity	File	Line	Context
LOW	tests/python_package_test/utils.py	38	CODE
LOW	python-package/lightgbm/plotting.py	37	CODE
LOW	python-package/lightgbm/plotting.py	176	CODE
LOW	python-package/lightgbm/plotting.py	294	CODE
LOW	python-package/lightgbm/plotting.py	749	CODE
LOW	python-package/lightgbm/engine.py	626	CODE
LOW	python-package/lightgbm/dask.py	442	CODE
LOW	python-package/lightgbm/dask.py	1102	CODE
LOW	python-package/lightgbm/dask.py	1180	CODE
LOW	python-package/lightgbm/dask.py	1245	CODE
LOW	python-package/lightgbm/dask.py	1392	CODE
LOW	python-package/lightgbm/dask.py	1457	CODE
LOW	python-package/lightgbm/dask.py	1569	CODE
LOW	python-package/lightgbm/dask.py	1634	CODE
LOW	python-package/lightgbm/basic.py	1703	CODE
LOW	python-package/lightgbm/basic.py	4702	CODE
LOW	python-package/lightgbm/sklearn.py	538	CODE
LOW	python-package/lightgbm/sklearn.py	975	CODE
LOW	python-package/lightgbm/sklearn.py	1414	CODE
LOW	python-package/lightgbm/sklearn.py	1476	CODE
LOW	python-package/lightgbm/sklearn.py	1529	CODE
LOW	python-package/lightgbm/sklearn.py	1595	CODE
LOW	python-package/lightgbm/sklearn.py	1844	CODE
LOW	python-package/lightgbm/sklearn.py	1893	CODE

Cross-Language Confusion · 3 hits · 22 pts

Severity	File	Line	Snippet	Context
HIGH⚡	docs/conf.py	275	sh build-cran-package.sh || exit 1	CODE
HIGH⚡	docs/conf.py	276	R CMD INSTALL --with-keep.source lightgbm_*.tar.gz || exit 1	CODE
HIGH⚡	docs/conf.py	277	Rscript .ci/build-docs.R || exit 1	CODE

Self-Referential Comments · 7 hits · 21 pts

Severity	File	Line	Snippet	Context
MEDIUM	R-package/demo/categorical_features_rules.R	69	# Creating the LightGBM dataset with categorical features	COMMENT
MEDIUM	R-package/R/lgb.train.R	165	# Create the predictor set	COMMENT
MEDIUM	R-package/R/lgb.cv.R	186	# Create the predictor set	COMMENT
MEDIUM	R-package/R/lgb.cv.R	541	# Create a vector of integers from 1:k as many times as possible without	COMMENT
MEDIUM	R-package/inst/make-r-def.R	2	# Create a definition file (.def) from a .dll file, using objdump.	COMMENT
MEDIUM	python-package/lightgbm/basic.py	3850	# Create the node record, and populate universal data members	COMMENT
MEDIUM	docs/conf.py	7	# This file is execfile()d with the current directory set to its	COMMENT

AI Slop Vocabulary · 8 hits · 16 pts

Severity	File	Line	Snippet	Context
LOW	build-python.sh	345	# avoid trying to recompile, just use hatchling and copy in relevant files	COMMENT
LOW	R-package/demo/basic_walkthrough.R	140	# To load it in, simply call lgb.Dataset	COMMENT
MEDIUM	tests/python_package_test/test_sklearn.py	1415	# Verify that eval_metric is robust to receiving a list with None	COMMENT
MEDIUM	tests/python_package_test/test_sklearn.py	359	y = y.astype(str) # utilize label encoder at it's max power	CODE
MEDIUM	tests/python_package_test/test_sklearn.py	387	y = y.astype(str) # utilize label encoder at it's max power	CODE
MEDIUM	tests/python_package_test/test_sklearn.py	417	y = y.astype(str) # utilize label encoder at it's max power	CODE
LOW	python-package/lightgbm/basic.py	2731	# If the data is Arrow, we can just pass it to C	COMMENT
MEDIUM	src/io/dataset_loader.cpp	54	// support to get header from parser config, so could utilize following label name to id mapping logic.	COMMENT

Structural Annotation Overuse · 6 hits · 12 pts

Severity	File	Line	Snippet	Context
LOW⚡	MAINTAINING.md	31	### Step 1: Put up a Release PR	COMMENT
LOW⚡	MAINTAINING.md	40	### Step 2: Merge the Release PR	COMMENT
LOW⚡	MAINTAINING.md	46	### Step 3: Wait for a New CI Run on `main`	COMMENT
LOW⚡	MAINTAINING.md	52	### Step 4: Create a Release	COMMENT
LOW	MAINTAINING.md	63	### Step 5: Upload Artifacts	COMMENT
LOW	MAINTAINING.md	83	### Step 6: Complete All Other Post-merge Release Steps	COMMENT

Modern Structural Boilerplate · 8 hits · 8 pts

Severity	File	Line	Snippet	Context
LOW	tests/distributed/_test_distributed.py	84	def _set_ports(self) -> None:	CODE
LOW	python-package/lightgbm/callback.py	21	__all__ = [	CODE
LOW	python-package/lightgbm/plotting.py	15	__all__ = [	CODE
LOW	python-package/lightgbm/__init__.py	33	__all__ = [	CODE
LOW	python-package/lightgbm/engine.py	29	__all__ = [	CODE
LOW	python-package/lightgbm/dask.py	44	__all__ = [	CODE
LOW	python-package/lightgbm/basic.py	38	__all__ = [	CODE
LOW	python-package/lightgbm/sklearn.py	59	__all__ = [	CODE

AI Response Leakage · 1 hit · 8 pts

Severity	File	Line	Snippet	Context
HIGH	R-package/demo/basic_walkthrough.R	4	# In this example, we are aiming to predict whether a mushroom is edible	COMMENT

Slop Phrases · 3 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW	R-package/R/lgb.Booster.R	1122	#' different parameters or prediction type, so make sure to check that the output is what	COMMENT
LOW	include/LightGBM/config.h	1098	// desc = Note: don't forget to allow this port in firewall settings before training	COMMENT
MEDIUM	src/treelearner/parallel_tree_learner.h	124	* When #data is large and #feature is large, you can use this to have better speed-up	COMMENT
