scikit-learn/scikit-learn

Adjusted Score: 19.2
Raw Score: 19.2
Time Factor: 100%
Last Push: 2026-07-14
Stars: 66.7K
Language: Python
Lines of Code: 465.3K
Files: 1.2K
Pattern Hits: 7.3K
Scan Date: 2026-07-14
HC Hit Rate: 0.12

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
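
The metric definitions above can be sketched in Python. This is a minimal sketch under stated assumptions, not the scanner's own code: the function names are hypothetical, the page does not say what factor applies between Nov 2022 and Jan 2024 (so the caller must supply `interim_factor`), and the file count used in the usage example is the rounded "1.2K" from the summary rather than an exact figure.

```python
from datetime import date

CHATGPT_LAUNCH = date(2022, 11, 30)  # ChatGPT public launch (Nov 2022)
POST_ADOPTION = date(2024, 1, 1)     # start of the "post-adoption era"


def time_factor(last_push: date, interim_factor: float) -> float:
    """Temporal discount multiplier in [0, 1].

    Only the two end points are documented. `interim_factor` is a
    hypothetical placeholder for the undocumented Nov 2022 to Jan 2024 window.
    """
    if last_push < CHATGPT_LAUNCH:
        return 0.05
    if last_push >= POST_ADOPTION:
        return 1.0
    return interim_factor


def adjusted_score(raw_points: float, lines_of_code: int, factor: float) -> float:
    """Raw points normalised per 1,000 lines of code, times the time factor."""
    return raw_points / (lines_of_code / 1000) * factor


def hc_hit_rate(critical_hits: int, high_hits: int, n_files: int) -> float:
    """High + Critical hits divided by the number of files analysed."""
    return (critical_hits + high_hits) / n_files
```

With a last push of 2026-07-14, `time_factor` returns 1.0, and `round(hc_hit_rate(0, 139, 1200), 2)` gives 0.12, in line with the HC Hit Rate reported above. The adjusted score itself cannot be recomputed here, because the page does not list the total raw points.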

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.
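
A worked example of the divergence described above, using made-up scan figures (not values from this repository's history): raw hits rise between two scans, but the codebase grows faster, so the score normalised per 1,000 lines falls.

```python
def per_kloc(points: float, lines_of_code: int) -> float:
    """Normalise a score to points per 1,000 lines of code."""
    return points / (lines_of_code / 1000)


# Hypothetical scans: (total pattern hits, total points, lines of code).
scans = [
    (6800, 8000.0, 400_000),
    (7300, 8400.0, 480_000),
]

for hits, points, loc in scans:
    print(f"hits={hits:>5}  score/kLOC={per_kloc(points, loc):.2f}")
```

Hits go up from 6800 to 7300 while the normalised score drops from 20.00 to 17.50, which is exactly the kind of gap the dashed hit-count line makes visible.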

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL: 0 · HIGH: 139 · MEDIUM: 900 · LOW: 6240
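
Combining these counts with the stated coefficients gives the severity-weighted base total. This sketch covers only the base weights; the scanner additionally applies context and cluster multipliers, so the figure below should not be expected to match the raw score.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}

# Hit counts from the severity breakdown for this scan.
counts = {"CRITICAL": 0, "HIGH": 139, "MEDIUM": 900, "LOW": 6240}

total_hits = sum(counts.values())
base_points = sum(SEVERITY_WEIGHTS[sev] * n for sev, n in counts.items())

print(total_hits)   # 7279, the same total as the pattern matches reported for this scan
print(base_points)  # 8735 = 139*5 + 900*2 + 6240*1
```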

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.
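
A minimal sketch of a per-directory breakdown, assuming findings are available as `(path, points)` pairs together with a line count per file; both input shapes, the function name, and the sample values are hypothetical. Dividing each directory's points by its own line count is what surfaces directories whose score is high relative to their size.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_breakdown(findings, lines_per_file):
    """Aggregate points and lines by top-level directory.

    findings: iterable of (path, points) pairs, one per pattern match.
    lines_per_file: mapping of path -> line count for every analysed file.
    Returns {directory: (points, points per 1,000 lines)}.
    """
    points = defaultdict(float)
    lines = defaultdict(int)
    for path, pts in findings:
        points[PurePosixPath(path).parts[0]] += pts
    for path, n in lines_per_file.items():
        lines[PurePosixPath(path).parts[0]] += n
    # Directories with findings but no counted lines are skipped.
    return {
        d: (points[d], points[d] / (lines[d] / 1000))
        for d in points
        if lines[d] > 0
    }


findings = [
    ("sklearn/tree/_classes.py", 3.0),
    ("sklearn/pipeline.py", 2.0),
    ("asv_benchmarks/benchmarks/datasets.py", 3.0),
]
lines_per_file = {
    "sklearn/tree/_classes.py": 2000,
    "sklearn/pipeline.py": 2000,
    "asv_benchmarks/benchmarks/datasets.py": 500,
}
print(directory_breakdown(findings, lines_per_file))
```

In this toy input, `asv_benchmarks` has fewer total points than `sklearn` (3.0 vs 5.0) but a far higher density (6.0 vs 1.25 points per 1,000 lines), which is the situation the paragraph above flags for targeted review.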

Pattern Findings

The scanner identified 7279 distinct pattern matches across 19 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator, listed with its file path, line number, code snippet (where shown), and the lexical context (CODE, COMMENT, or STRING) in which the match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal. Comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches (three or more patterns within a 10-line window); each clustered match receives an additional ×1.5 density multiplier, because dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
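
The weighting rules above can be sketched as follows. The exact window semantics are not spelled out, so this assumes a match is clustered when it and at least two other matches in the same file all fall within 10 consecutive lines (last line minus first line < 10); the function names are hypothetical.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def clustered_lines(lines, window=10, min_size=3):
    """Return the set of line numbers that belong to a dense cluster.

    Assumption: a cluster is >= min_size matches whose line numbers all fit
    inside `window` consecutive lines (last - first < window).
    """
    ordered = sorted(lines)
    flagged = set()
    start = 0
    for end in range(len(ordered)):
        # Advance the left edge until the span is smaller than `window`.
        while ordered[end] - ordered[start] >= window:
            start += 1
        if end - start + 1 >= min_size:
            flagged.update(ordered[start:end + 1])
    return flagged


def match_score(severity, context, clustered):
    """Severity weight, x1.5 in COMMENT context, x1.5 again if clustered."""
    score = float(SEVERITY_WEIGHTS[severity])
    if context == "COMMENT":
        score *= 1.5
    if clustered:
        score *= 1.5
    return score
```

For example, matches on lines 1, 3, 5 and 40 of one file yield the cluster {1, 3, 5}, and a clustered MEDIUM match in a comment scores 2 × 1.5 × 1.5 = 4.5. The two-pointer loop keeps the cluster check linear in the number of matches after sorting.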

Hyper-Verbose Identifiers · 4133 hits · 3512 pts

Severity	File	Line	Snippet	Context
LOW	asv_benchmarks/benchmarks/metrics.py	44	def peakmem_pairwise_distances(self, *args):	CODE
LOW	asv_benchmarks/benchmarks/datasets.py	84	def _synth_regression_dataset(n_samples=100000, n_features=100, dtype=np.float32):	CODE
LOW	asv_benchmarks/benchmarks/datasets.py	100	def _synth_regression_sparse_dataset(	CODE
LOW	asv_benchmarks/benchmarks/datasets.py	118	def _synth_classification_dataset(	CODE
LOW	asv_benchmarks/benchmarks/utils.py	24	def make_dict_learning_scorers(caller):	CODE
LOW	sklearn/conftest.py	131	def pytest_collection_modifyitems(config, items):	CODE
LOW	sklearn/conftest.py	274	def munge_scipy_to_check_spmatrix_usage():	CODE
LOW	sklearn/conftest.py	506	def hide_available_matplotlib(monkeypatch):	CODE
LOW	sklearn/multiclass.py	118	def _threshold_for_binary_predict(estimator):	CODE
LOW	sklearn/random_projection.py	63	def johnson_lindenstrauss_min_dim(n_samples, *, eps=0.1):	CODE
LOW	sklearn/random_projection.py	360	def _compute_inverse_components(self):	CODE
LOW	sklearn/multioutput.py	82	def _available_if_estimator_has(attr):	CODE
LOW	sklearn/multioutput.py	621	def _available_if_base_estimator_has(attr):	CODE
LOW	sklearn/pipeline.py	1594	def _fit_transform_one_with_callbacks(	CODE
LOW	sklearn/pipeline.py	1836	def _validate_transformer_weights(self):	CODE
LOW	sklearn/pipeline.py	1896	def _add_prefix_for_feature_names_out(self, transformer_with_feature_names_out):	CODE
LOW	sklearn/calibration.py	600	def _fit_classifier_calibrator_pair(	CODE
LOW	sklearn/tree/_classes.py	198	def _compute_missing_values_in_feature_mask(self, X, estimator_name=None):	CODE
LOW	sklearn/tree/_classes.py	577	def _fit_categorical_features(self, X):	CODE
LOW	sklearn/tree/_classes.py	593	def _transform_categorical_features(self, X):	CODE
LOW	sklearn/tree/_classes.py	798	def cost_complexity_pruning_path(self, X, y, sample_weight=None):	CODE
LOW	sklearn/tree/_classes.py	1636	def _compute_partial_dependence_recursion(self, grid, target_features):	CODE
LOW	sklearn/tree/tests/test_split.py	86	def compute_node_value_and_impurity(self, y, w):	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	44	def test_monotonic_constraints_classifications(	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	123	def test_monotonic_constraints_regressions(	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	207	def test_multiple_output_raises(TreeClassifier):	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	220	def test_bad_monotonic_cst_raises(TreeClassifier):	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	374	def assert_nd_reg_tree_children_monotonic_bounded(tree_, monotonic_cst):	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	444	def test_assert_nd_reg_tree_children_monotonic_bounded():	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	481	def test_nd_tree_nodes_values(	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	758	def test_min_weight_fraction_leaf_on_dense_input(name):	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	764	def test_min_weight_fraction_leaf_on_sparse_input(name, csc_container):	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	768	def check_min_weight_fraction_leaf_with_min_samples_leaf(	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	1636	def test_public_apply_all_trees(name):	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	1646	def test_public_apply_sparse_trees(name, csr_container):	CODE
LOW⚡	sklearn/tree/tests/test_tree.py	1654	def test_decision_path_hardcoded():	CODE
LOW	sklearn/tree/tests/test_tree.py	260	def test_weighted_classification_toy():	CODE
LOW	sklearn/tree/tests/test_tree.py	492	def test_importances_gini_equal_squared_error():	CODE
LOW	sklearn/tree/tests/test_tree.py	698	def check_min_weight_fraction_leaf(name, datasets, sparse_container=None):	CODE
LOW	sklearn/tree/tests/test_tree.py	830	def test_min_weight_fraction_leaf_with_min_samples_leaf_on_dense_input(name):	CODE
LOW	sklearn/tree/tests/test_tree.py	836	def test_min_weight_fraction_leaf_with_min_samples_leaf_on_sparse_input(	CODE
LOW	sklearn/tree/tests/test_tree.py	853	def test_min_impurity_decrease(TreeEstimator, criterion, global_random_seed):	CODE
LOW	sklearn/tree/tests/test_tree.py	1145	def test_sample_weight_invalid():	CODE
LOW	sklearn/tree/tests/test_tree.py	1239	def test_max_leaf_nodes_max_depth():	CODE
LOW	sklearn/tree/tests/test_tree.py	1266	def test_only_constant_features():	CODE
LOW	sklearn/tree/tests/test_tree.py	1277	def test_almost_constant_feature(tree_cls):	CODE
LOW	sklearn/tree/tests/test_tree.py	1297	def test_behaviour_constant_feature_after_splits():	CODE
LOW	sklearn/tree/tests/test_tree.py	1311	def test_with_only_one_non_constant_features():	CODE
LOW	sklearn/tree/tests/test_tree.py	1426	def test_sparse_input_reg_trees(tree_type, dataset):	CODE
LOW	sklearn/tree/tests/test_tree.py	1518	def test_explicit_sparse_zeros(tree_type, csc_container, csr_container):	CODE
LOW	sklearn/tree/tests/test_tree.py	1617	def test_min_weight_leaf_split_level(name, sparse_container):	CODE
LOW	sklearn/tree/tests/test_tree.py	1883	def test_empty_leaf_infinite_threshold(sparse_container):	CODE
LOW	sklearn/tree/tests/test_tree.py	1905	def test_prune_tree_classifier_are_subtrees(dataset, tree_cls):	CODE
LOW	sklearn/tree/tests/test_tree.py	1921	def test_prune_tree_regression_are_subtrees(dataset, tree_cls):	CODE
LOW	sklearn/tree/tests/test_tree.py	1936	def test_prune_single_node_tree():	CODE
LOW	sklearn/tree/tests/test_tree.py	1948	def assert_pruning_creates_subtree(estimator_cls, X, y, pruning_path):	CODE
LOW	sklearn/tree/tests/test_tree.py	2006	def test_apply_path_readonly_all_trees(name, splitter, sparse_container):	CODE
LOW	sklearn/tree/tests/test_tree.py	2126	def test_criterion_entropy_same_as_log_loss(Tree, n_classes):	CODE
LOW	sklearn/tree/tests/test_tree.py	2148	def test_different_endianness_pickle():	CODE
LOW	sklearn/tree/tests/test_tree.py	2158	def get_pickle_non_native_endianness():	CODE
4073 more matches not shown…

Decorative Section Separators · 746 hits · 2277 pts

Severity	File	Line	Snippet	Context
MEDIUM	sklearn/__init__.py	7	# ==================================	COMMENT
MEDIUM⚡	sklearn/tree/_classes.py	68	# =============================================================================	COMMENT
MEDIUM⚡	sklearn/tree/_classes.py	70	# =============================================================================	COMMENT
MEDIUM	sklearn/tree/_classes.py	90	# =============================================================================	COMMENT
MEDIUM	sklearn/tree/_classes.py	92	# =============================================================================	COMMENT
MEDIUM	sklearn/tree/_classes.py	865	# =============================================================================	COMMENT
MEDIUM	sklearn/tree/_classes.py	867	# =============================================================================	COMMENT
MEDIUM	sklearn/metrics/cluster/tests/test_common.py	28	# ------------------------	COMMENT
MEDIUM	sklearn/metrics/cluster/tests/test_common.py	59	# ---------------------------------------	COMMENT
MEDIUM	sklearn/metrics/cluster/tests/test_common.py	64	# --------------------------------------------------------------------	COMMENT
MEDIUM	sklearn/metrics/tests/test_common.py	111	# -------------------------------------------	COMMENT
MEDIUM	sklearn/metrics/tests/test_common.py	127	# ------------------------	COMMENT
MEDIUM	sklearn/metrics/tests/test_common.py	306	# ---------------------------------------	COMMENT
MEDIUM	…earn/metrics/_pairwise_distances_reduction/__init__.py	6	# =============================	COMMENT
MEDIUM	…earn/metrics/_pairwise_distances_reduction/__init__.py	33	# ------------------------------------------	COMMENT
MEDIUM	…earn/metrics/_pairwise_distances_reduction/__init__.py	42	# ------------------	COMMENT
MEDIUM	sklearn/ensemble/tests/test_voting.py	703	# ======================	COMMENT
MEDIUM	sklearn/ensemble/tests/test_voting.py	796	# =============================	COMMENT
MEDIUM	sklearn/ensemble/tests/test_bagging.py	1085	# ======================	COMMENT
MEDIUM	sklearn/ensemble/tests/test_bagging.py	1157	# =============================	COMMENT
MEDIUM	sklearn/ensemble/tests/test_stacking.py	903	# ======================	COMMENT
MEDIUM	sklearn/ensemble/tests/test_stacking.py	1019	# =============================	COMMENT
MEDIUM	sklearn/semi_supervised/tests/test_self_training.py	358	# =================================================================	COMMENT
MEDIUM	sklearn/semi_supervised/tests/test_self_training.py	384	# ====================	COMMENT
MEDIUM	sklearn/compose/tests/test_column_transformer.py	2712	# ======================	COMMENT
MEDIUM	sklearn/compose/tests/test_column_transformer.py	2899	# =============================	COMMENT
MEDIUM⚡	sklearn/externals/_arff.py	845	# -----------------------------------------------------------------	COMMENT
MEDIUM⚡	sklearn/externals/_arff.py	853	# -----------------------------------------------------------------	COMMENT
MEDIUM⚡	sklearn/externals/_arff.py	858	# -----------------------------------------------------------------	COMMENT
MEDIUM	sklearn/externals/_arff.py	1	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	5	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	25	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	414	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	663	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	807	# -----------------------------------------------------------------	COMMENT
MEDIUM	sklearn/externals/_arff.py	816	# -----------------------------------------------------------------	COMMENT
MEDIUM	sklearn/externals/_arff.py	1042	# =============================================================================	COMMENT
MEDIUM	sklearn/externals/_arff.py	1107	# =============================================================================	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	2104	# =====================	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	2279	# =============================	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	2337	# =====================================================================	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	2658	# ====================	COMMENT
MEDIUM	sklearn/linear_model/tests/test_ridge.py	2635	# ======================	COMMENT
MEDIUM	sklearn/linear_model/tests/test_ridge.py	2665	# =============================	COMMENT
MEDIUM	sklearn/utils/_pprint.py	17	# --------------------------------------------	COMMENT
MEDIUM	sklearn/utils/_metadata_requests.py	262	# ==============	COMMENT
MEDIUM	sklearn/utils/_metadata_requests.py	323	# =====================================	COMMENT
MEDIUM	sklearn/utils/_metadata_requests.py	778	# ============================	COMMENT
MEDIUM	sklearn/utils/_metadata_requests.py	1343	# ==============	COMMENT
MEDIUM	sklearn/utils/_metadata_requests.py	1759	# ==========================	COMMENT
MEDIUM	sklearn/manifold/tests/test_locally_linear.py	19	# ----------------------------------------------------------------------	COMMENT
MEDIUM	sklearn/manifold/tests/test_locally_linear.py	40	# ----------------------------------------------------------------------	COMMENT
MEDIUM	sklearn/model_selection/tests/test_validation.py	2448	# ======================================================	COMMENT
MEDIUM	sklearn/model_selection/tests/test_validation.py	2695	# =============================	COMMENT
MEDIUM	sklearn/model_selection/tests/test_search.py	2696	# ======================	COMMENT
MEDIUM	sklearn/model_selection/tests/test_search.py	2760	# =============================	COMMENT
MEDIUM	sklearn/decomposition/_kernel_pca.py	386	# ----------------------------------------------	COMMENT
MEDIUM	examples/bicluster/plot_spectral_biclustering.py	26	# --------------------	COMMENT
MEDIUM	examples/bicluster/plot_spectral_biclustering.py	69	# ------------------------------	COMMENT
MEDIUM	examples/bicluster/plot_spectral_biclustering.py	92	# ----------------	COMMENT
686 more matches not shown…

Unused Imports · 747 hits · 643 pts

Severity	File	Line	Context
LOW	sklearn/conftest.py	41	CODE
LOW	sklearn/conftest.py	66	CODE
LOW	sklearn/conftest.py	239	CODE
LOW	sklearn/__init__.py	24	CODE
LOW	sklearn/__init__.py	24	CODE
LOW	sklearn/__init__.py	24	CODE
LOW	sklearn/__init__.py	69	CODE
LOW	sklearn/__init__.py	69	CODE
LOW	sklearn/__init__.py	70	CODE
LOW	sklearn/__init__.py	71	CODE
LOW	sklearn/tree/__init__.py	6	CODE
LOW	sklearn/tree/__init__.py	6	CODE
LOW	sklearn/tree/__init__.py	6	CODE
LOW	sklearn/tree/__init__.py	6	CODE
LOW	sklearn/tree/__init__.py	6	CODE
LOW	sklearn/tree/__init__.py	13	CODE
LOW	sklearn/tree/__init__.py	13	CODE
LOW	sklearn/tree/__init__.py	13	CODE
LOW	sklearn/metrics/__init__.py	6	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	7	CODE
LOW	sklearn/metrics/__init__.py	30	CODE
LOW	sklearn/metrics/__init__.py	31	CODE
LOW	sklearn/metrics/__init__.py	32	CODE
LOW	sklearn/metrics/__init__.py	33	CODE
LOW	sklearn/metrics/__init__.py	34	CODE
LOW	sklearn/metrics/__init__.py	35	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
LOW	sklearn/metrics/__init__.py	36	CODE
687 more matches not shown…

Over-Commented Block · 636 hits · 618 pts

Severity	File	Line	Snippet	Context
LOW	asv_benchmarks/asv.conf.json	21		COMMENT
LOW	asv_benchmarks/asv.conf.json	41	//"install_timeout": 600,	COMMENT
LOW	asv_benchmarks/asv.conf.json	81	//	COMMENT
LOW	asv_benchmarks/asv.conf.json	101	// {"environment_type": "conda", "six": null}, // don't run without six on conda	COMMENT
LOW	asv_benchmarks/asv.conf.json	121	// "results_dir": "results",	COMMENT
LOW	asv_benchmarks/asv.conf.json	141	// skipped for the matching benchmark.	COMMENT
LOW	asv_benchmarks/benchmarks/config.json	1	{	COMMENT
LOW	sklearn/__init__.py	1	"""Configure global settings and get information about the working environment."""	COMMENT
LOW	sklearn/__init__.py	21	import os	COMMENT
LOW	sklearn/__init__.py	41	#	COMMENT
LOW	sklearn/calibration.py	1141		COMMENT
LOW	sklearn/tree/tests/test_monotonic_tree.py	321	depth_first_builder,	COMMENT
LOW	sklearn/tree/tests/test_monotonic_tree.py	401	# down the tree to both children.	COMMENT
LOW	sklearn/tree/tests/test_monotonic_tree.py	481	def test_nd_tree_nodes_values(	COMMENT
LOW	sklearn/metrics/cluster/tests/test_common.py	21	from sklearn.metrics.tests.test_common import check_array_api_metric	COMMENT
LOW	sklearn/metrics/_plot/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	sklearn/metrics/tests/test_common.py	101	assert_array_equal,	COMMENT
LOW	sklearn/metrics/tests/test_common.py	121	# all metrics that have the same behavior.	COMMENT
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	21	# For computational reasons, the reduction are performed on the fly on chunks	COMMENT
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	41	# High-level diagram	COMMENT
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	61	# | | (float{32,64} implem.) | |	COMMENT
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	81	# - :class:`ArgKmin64` if X and Y are two `float64` array-likes	COMMENT
LOW	sklearn/ensemble/_hist_gradient_boosting/predictor.py	141	# while on 32 bit np.intp = np.int32.	COMMENT
LOW	…hist_gradient_boosting/tests/test_gradient_boosting.py	521	# Test that the class distributions in the whole dataset and in the small	COMMENT
LOW	…_hist_gradient_boosting/tests/test_compare_lightgbm.py	41	# samples is large enough, the structure of the prediction trees found by	COMMENT
LOW	sklearn/cluster/_hdbscan/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	sklearn/cluster/_hdbscan/hdbscan.py	1	"""	COMMENT
LOW	sklearn/cluster/_hdbscan/hdbscan.py	21	# specific prior written permission.	COMMENT
LOW	sklearn/_loss/loss.py	1	"""	COMMENT
LOW	sklearn/_loss/loss.py	61	# - HistGradientBoostingClassifier: (n_classes, n_samples)	COMMENT
LOW	sklearn/_loss/loss.py	1501	#	COMMENT
LOW	sklearn/_loss/tests/test_loss.py	281	#	COMMENT
LOW	sklearn/gaussian_process/kernels.py	1	"""A set of kernels that can be combined by operators and used in Gaussian processes."""	COMMENT
LOW	sklearn/gaussian_process/_gpr.py	601	alpha = cho_solve((L, GPR_CHOLESKY_LOWER), y_train, check_finite=False)	COMMENT
LOW	sklearn/gaussian_process/_gpr.py	621	# 0.5 * trace((alpha . alpha^T - K^-1) . K_gradient)	COMMENT
LOW	sklearn/datasets/_arff_parser.py	421	# `pd.read_csv` automatically handles double quotes for quoting non-numeric	COMMENT
LOW	sklearn/datasets/images/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	sklearn/datasets/descr/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	sklearn/datasets/data/__init__.py	1	# Authors: The scikit-learn developers	COMMENT
LOW	sklearn/externals/conftest.py	1	# Do not collect any tests in externals. This is more robust than using	COMMENT
LOW	sklearn/externals/_arff.py	1	# =============================================================================	COMMENT
LOW	sklearn/externals/array_api_compat/torch/_aliases.py	741		COMMENT
LOW	sklearn/externals/array_api_compat/torch/linalg.py	61	# See linalg_solve_is_vector_rhs in	COMMENT
LOW	sklearn/externals/_numpydoc/docscrape.py	241	desc = r.read_to_next_unindented_line()	COMMENT
LOW	sklearn/externals/_packaging/version.py	1	"""Vendored from	COMMENT
LOW	sklearn/externals/_packaging/version.py	481	local: Optional[Tuple[SubLocalType]],	COMMENT
LOW	sklearn/externals/_packaging/_structures.py	1	"""Vendoered from	COMMENT
LOW	sklearn/tests/test_docstrings.py	61	# We ignore following error code,	COMMENT
LOW	sklearn/linear_model/_quantile.py	201	# min sum(pinball loss) + alpha * L1	COMMENT
LOW	sklearn/linear_model/_logistic.py	961	score_params = _check_method_params(X=X, params=score_params, indices=test)	COMMENT
LOW	sklearn/linear_model/_linear_loss.py	641	grad = grad.ravel(order="F")	COMMENT
LOW	sklearn/linear_model/_linear_loss.py	661	# For 3 classes and n_samples = 1, this looks like ("@" is a bit misused	COMMENT
LOW	sklearn/linear_model/_linear_loss.py	861	# - class indices k, l	COMMENT
LOW	sklearn/linear_model/_coordinate_descent.py	1181	sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)	COMMENT
LOW	sklearn/linear_model/_coordinate_descent.py	1201	# alpha' = sum(sw) * alpha (4)	COMMENT
LOW	sklearn/linear_model/_coordinate_descent.py	1781	# Multiple functions touch X and subsamples of X and can induce a	COMMENT
LOW	sklearn/linear_model/_ridge.py	3061	# `RidgeClassifier` does not accept "sag" or "saga" solver and thus support	COMMENT
LOW	sklearn/linear_model/_glm/glm.py	241	if not linear_loss.base_loss.in_y_true_range(y):	COMMENT
LOW	sklearn/linear_model/_glm/_newton_solver.py	601	" The inner solver detected a pointwise Hessian with many "	COMMENT
576 more matches not shown…

Cross-File Repetition · 108 hits · 540 pts

Severity	File	Line	Snippet	Context
HIGH	sklearn/multiclass.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/compose/_column_transformer.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_least_angle.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_logistic.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_omp.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_coordinate_descent.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/feature_selection/_from_model.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/model_selection/_search.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/multioutput.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/pipeline.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/calibration.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/model_selection/_classification_threshold.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/ensemble/_voting.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/ensemble/_bagging.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_ridge.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/linear_model/_ransac.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/impute/_iterative.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/covariance/_graph_lasso.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/metrics/_scorer.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/ensemble/_stacking.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/semi_supervised/_self_training.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/compose/_target.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/feature_selection/_rfe.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/feature_selection/_sequential.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/preprocessing/_target_encoder.py	0	get metadata routing of this object. please check :ref:`user guide <metadata_routing>` on how the routing mechanism work	STRING
HIGH	sklearn/base.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/compose/_column_transformer.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/impute/_base.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/impute/_knn.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/impute/_iterative.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/preprocessing/_encoders.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/preprocessing/_polynomial.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/feature_selection/_base.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/preprocessing/_discretization.py	0	mask feature names according to selected features. parameters ---------- input_features : array-like of str or none, def	STRING
HIGH	sklearn/tree/_classes.py	0	fast partial dependence computation. parameters ---------- grid : ndarray, shape (n_samples, n_target_features), dtype=n	STRING
HIGH	sklearn/ensemble/_forest.py	0	fast partial dependence computation. parameters ---------- grid : ndarray, shape (n_samples, n_target_features), dtype=n	STRING
HIGH	sklearn/ensemble/_gb.py	0	fast partial dependence computation. parameters ---------- grid : ndarray, shape (n_samples, n_target_features), dtype=n	STRING
HIGH	…/ensemble/_hist_gradient_boosting/gradient_boosting.py	0	fast partial dependence computation. parameters ---------- grid : ndarray, shape (n_samples, n_target_features), dtype=n	STRING
HIGH	…learn/metrics/_plot/tests/test_common_curve_display.py	0	check that named constructors return the correct type when subclassed. non-regression test for: https://github.com/sciki	STRING
HIGH	…inspection/_plot/tests/test_plot_partial_dependence.py	0	check that named constructors return the correct type when subclassed. non-regression test for: https://github.com/sciki	STRING
HIGH	…spection/_plot/tests/test_boundary_decision_display.py	0	check that named constructors return the correct type when subclassed. non-regression test for: https://github.com/sciki	STRING
HIGH	sklearn/model_selection/tests/test_plot.py	0	check that named constructors return the correct type when subclassed. non-regression test for: https://github.com/sciki	STRING
HIGH	sklearn/ensemble/_voting.py	0	get output feature names for transformation. parameters ---------- input_features : array-like of str or none, default=n	STRING
HIGH	sklearn/feature_extraction/_dict_vectorizer.py	0	get output feature names for transformation. parameters ---------- input_features : array-like of str or none, default=n	STRING
HIGH	sklearn/feature_extraction/text.py	0	get output feature names for transformation. parameters ---------- input_features : array-like of str or none, default=n	STRING
HIGH	sklearn/ensemble/tests/test_voting.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/ensemble/tests/test_stacking.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/semi_supervised/tests/test_self_training.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/compose/tests/test_column_transformer.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/tests/test_pipeline.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/linear_model/tests/test_logistic.py	0	test that the right error message is raised when metadata params are passed while not supported when `enable_metadata_ro	STRING
HIGH	sklearn/ensemble/tests/test_voting.py	0	test that the right error is raised when metadata is not requested.	STRING
HIGH	sklearn/ensemble/tests/test_stacking.py	0	test that the right error is raised when metadata is not requested.	STRING
HIGH	sklearn/compose/tests/test_column_transformer.py	0	test that the right error is raised when metadata is not requested.	STRING
HIGH	sklearn/tests/test_pipeline.py	0	test that the right error is raised when metadata is not requested.	STRING
HIGH	sklearn/ensemble/tests/test_stacking.py	0	check that we raise the proper attributeerror when the estimator does not implement the `partial_fit` method, which is d	STRING
HIGH	sklearn/tests/test_multiclass.py	0	check that we raise the proper attributeerror when the estimator does not implement the `partial_fit` method, which is d	STRING
HIGH	sklearn/feature_selection/tests/test_rfe.py	0	check that we raise the proper attributeerror when the estimator does not implement the `partial_fit` method, which is d	STRING
HIGH	sklearn/feature_selection/tests/test_from_model.py	0	check that we raise the proper attributeerror when the estimator does not implement the `partial_fit` method, which is d	STRING
HIGH	…perimental/tests/test_enable_hist_gradient_boosting.py	0	tests for making sure experimental imports work as expected.	STRING
48 more matches not shown…

Self-Referential Comments · 84 hits · 264 pts

Severity	File	Line	Snippet	Context
MEDIUM	sklearn/tree/tests/test_tree.py	2716	# Create a predictive feature using `y` and with some noise	COMMENT
MEDIUM	sklearn/tree/tests/test_tree.py	2930	# Create a tree with root and two children	COMMENT
MEDIUM	…earn/metrics/_pairwise_distances_reduction/__init__.py	11	# This module provides routines to compute pairwise distances between a set	COMMENT
MEDIUM⚡	sklearn/ensemble/tests/test_forest.py	858	# Create the RTE with sparse=False	COMMENT
MEDIUM⚡	sklearn/ensemble/tests/test_forest.py	871	# Create the RTEs	COMMENT
MEDIUM⚡	sklearn/ensemble/tests/test_forest.py	1900	# Create a predictive feature using `y` and with some noise	COMMENT
MEDIUM	sklearn/ensemble/_hist_gradient_boosting/__init__.py	1	"""This module implements histogram-based gradient boosting estimators.	STRING
MEDIUM	sklearn/gaussian_process/tests/test_gpr.py	411	# Define a dummy optimizer that simply tests 50 random hyperparameters	COMMENT
MEDIUM	sklearn/gaussian_process/tests/test_gpc.py	166	# Define a dummy optimizer that simply tests 10 random hyperparameters	COMMENT
MEDIUM	sklearn/compose/_column_transformer.py	1310	# This function is not validated using validate_params because	COMMENT
MEDIUM	sklearn/datasets/_svmlight_format_io.py	1	"""This module implements a loader and dumper for the svmlight format	STRING
MEDIUM	sklearn/datasets/_openml.py	194	# Create a tmpdir as a subfolder of dir_name where the final file will	COMMENT
MEDIUM	sklearn/datasets/tests/test_openml.py	1496	# Create a corrupted copy of the arff file (flip the last byte so that its	COMMENT
MEDIUM	sklearn/externals/_arff.py	781	# Create the return object	COMMENT
MEDIUM	sklearn/externals/_arff.py	790	# Create the data helper object	COMMENT
MEDIUM	sklearn/tests/test_calibration.py	314	# This function is called from _CalibratedClassifier.predict_proba.	COMMENT
MEDIUM⚡	sklearn/tests/test_naive_bayes.py	201	# Create an empty array	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	1525	# Create a new pipeline with cloned estimators	COMMENT
MEDIUM	sklearn/tests/test_pipeline.py	2341	# This class is used in this section for testing routing in the pipeline.	COMMENT
MEDIUM	sklearn/linear_model/tests/test_sgd.py	1881	# Define a ground truth on the scaled data	COMMENT
MEDIUM	sklearn/linear_model/tests/test_sgd.py	2133	# Create a classification problem with 50000 features and 20 classes. Using	COMMENT
MEDIUM	sklearn/linear_model/tests/test_least_angle.py	363	# Create an ill-conditioned situation in which the LARS has to go	COMMENT
MEDIUM	sklearn/linear_model/tests/test_coordinate_descent.py	1677	# Create a problem sufficiently large to cause memmapping (1MB).	COMMENT
MEDIUM	sklearn/impute/tests/test_impute.py	193	# Create a matrix X with columns	COMMENT
MEDIUM	sklearn/impute/tests/test_impute.py	209	# Create the columns	COMMENT
MEDIUM	sklearn/utils/_array_api.py	1392	# The following code is strongly inspired and simplified from	COMMENT
MEDIUM	sklearn/utils/metadata_routing.py	3	# This module is not a separate sub-folder since that would result in a circular	COMMENT
MEDIUM	sklearn/utils/_testing.py	562	# Create a list of parameters to compare with the parameters gotten	COMMENT
MEDIUM	sklearn/utils/validation.py	48	# This function is not used anymore at this moment in the code base but we keep it in	COMMENT
MEDIUM	sklearn/utils/tests/test_estimator_checks.py	1337	# This module is run as a script to check that we have no dependency on	COMMENT
MEDIUM	sklearn/utils/tests/test_multiclass.py	575	# Define the sparse matrix with a mix of implicit and explicit zeros	COMMENT
MEDIUM	sklearn/utils/_repr_html/base.py	153	"""This function is returned by the @property `_repr_html_` to make	STRING
MEDIUM	sklearn/feature_selection/_univariate_selection.py	38	# The following function is a rewriting of scipy.stats.f_oneway	COMMENT
MEDIUM	sklearn/inspection/tests/test_permutation_importance.py	457	# Creating a scorer function that does not takes sample_weight	COMMENT
MEDIUM	sklearn/manifold/_isomap.py	418	# Create the graph of shortest distances from X to	COMMENT
MEDIUM	sklearn/mixture/tests/test_gaussian_mixture.py	259	# Define the bad precisions for each covariance_type	COMMENT
MEDIUM	sklearn/preprocessing/_data.py	2914	# Create the quantiles of reference	COMMENT
MEDIUM	sklearn/model_selection/_search.py	1069	# Create the subcontexts ahead of time to avoid creating them on the fly	COMMENT
MEDIUM	sklearn/model_selection/tests/test_validation.py	2061	# Create a failing classifier to deliberately fail	COMMENT
MEDIUM	sklearn/model_selection/tests/test_validation.py	2126	# Create a failing classifier to deliberately fail	COMMENT
MEDIUM	sklearn/model_selection/tests/test_validation.py	2154	# Create a failing classifier to deliberately fail	COMMENT
MEDIUM	sklearn/neighbors/_classification.py	419	# This function is defined here only to modify the parent docstring	COMMENT
MEDIUM	sklearn/neighbors/_classification.py	887	# This function is defined here only to modify the parent docstring	COMMENT
MEDIUM	sklearn/neighbors/_nca.py	317	# Create a dictionary of parameters to be passed to the optimizer	COMMENT
MEDIUM	examples/classification/plot_digits_classification.py	62	# Create a classifier: a support vector classifier	COMMENT
MEDIUM⚡	examples/tree/plot_tree_regression.py	27	# Create a random 1D dataset	COMMENT
MEDIUM⚡	examples/tree/plot_tree_regression.py	91	# Create a random dataset	COMMENT
MEDIUM	examples/ensemble/plot_adaboost_multiclass.py	32	# Creating the dataset	COMMENT
MEDIUM	…/ensemble/plot_random_forest_regression_multioutput.py	34	# Create a random dataset	COMMENT
MEDIUM	examples/ensemble/plot_gradient_boosting_quantile.py	91	# Create an evenly spaced evaluation set of input values spanning the [0, 10]	COMMENT
MEDIUM	examples/ensemble/plot_forest_iris.py	102	# Create a title for each column and the console by using str() and	COMMENT
MEDIUM⚡	examples/cluster/plot_adjusted_for_chance_measures.py	22	# Defining the list of metrics to evaluate	COMMENT
MEDIUM	examples/cluster/plot_kmeans_silhouette_analysis.py	61	# Create a subplot with 1 row and 2 columns	COMMENT
MEDIUM	examples/cluster/plot_kmeans_digits.py	88	# Define the metrics which require only the true labels and estimator	COMMENT
MEDIUM	examples/calibration/plot_compare_calibration.py	98	# Define the classifiers to be compared in the study.	COMMENT
MEDIUM	examples/compose/plot_digits_pipe.py	27	# Define a pipeline to search for the best combination of PCA truncation	COMMENT
MEDIUM	examples/compose/plot_digits_pipe.py	30	# Define a Standard Scaler to normalize inputs	COMMENT
MEDIUM	examples/compose/plot_compare_reduction.py	111	# Create a temporary folder to store the transformers of the pipeline	COMMENT
MEDIUM	examples/linear_model/plot_ridge_coeffs.py	66	# Creating a non-noisy data set	COMMENT
MEDIUM	examples/linear_model/plot_sgd_early_stopping.py	89	# Define the estimators to compare	COMMENT
24 more matches not shown…

Deep Nesting · 277 hits · 254 pts

Severity	File	Line	Context
LOW	sklearn/conftest.py	131	CODE
LOW	sklearn/kernel_approximation.py	705	CODE
LOW	sklearn/multioutput.py	705	CODE
LOW	sklearn/pipeline.py	678	CODE
LOW	sklearn/discriminant_analysis.py	38	CODE
LOW	sklearn/dummy.py	252	CODE
LOW	sklearn/dummy.py	339	CODE
LOW	sklearn/dummy.py	543	CODE
LOW	sklearn/base.py	100	CODE
LOW	sklearn/calibration.py	689	CODE
LOW	sklearn/calibration.py	783	CODE
LOW	sklearn/tree/_export.py	560	CODE
LOW	sklearn/tree/_export.py	747	CODE
LOW	sklearn/tree/_classes.py	238	CODE
LOW	sklearn/tree/tests/test_split.py	86	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	245	CODE
LOW	sklearn/tree/tests/test_monotonic_tree.py	374	CODE
LOW	sklearn/tree/tests/test_tree.py	853	CODE
LOW	sklearn/metrics/_scorer.py	566	CODE
LOW	sklearn/metrics/_classification.py	443	CODE
LOW	sklearn/metrics/_classification.py	1899	CODE
LOW	sklearn/metrics/_regression.py	60	CODE
LOW	sklearn/metrics/_regression.py	970	CODE
LOW	sklearn/metrics/_regression.py	1385	CODE
LOW	sklearn/metrics/_regression.py	1430	CODE
LOW	sklearn/metrics/pairwise.py	569	CODE
LOW	sklearn/metrics/pairwise.py	2311	CODE
LOW	sklearn/metrics/pairwise.py	2592	CODE
LOW	sklearn/metrics/_ranking.py	128	CODE
LOW	sklearn/metrics/cluster/_supervised.py	80	CODE
LOW	sklearn/metrics/_plot/confusion_matrix.py	88	CODE
LOW	…learn/metrics/_plot/tests/test_common_curve_display.py	486	CODE
LOW	…learn/metrics/_plot/tests/test_common_curve_display.py	593	CODE
LOW	sklearn/metrics/tests/test_common.py	827	CODE
LOW	sklearn/metrics/tests/test_common.py	2714	CODE
LOW	sklearn/metrics/tests/test_common.py	2870	CODE
LOW	sklearn/metrics/tests/test_dist_metrics.py	246	CODE
LOW	sklearn/metrics/tests/test_ranking.py	129	CODE
LOW	sklearn/metrics/tests/test_pairwise.py	425	CODE
LOW	sklearn/metrics/tests/test_classification.py	2499	CODE
LOW	sklearn/ensemble/_bagging.py	890	CODE
LOW	sklearn/ensemble/_gb.py	114	CODE
LOW	sklearn/ensemble/_gb.py	527	CODE
LOW	sklearn/ensemble/_gb.py	608	CODE
LOW	sklearn/ensemble/_stacking.py	88	CODE
LOW	sklearn/ensemble/tests/test_forest.py	345	CODE
LOW	sklearn/ensemble/tests/test_forest.py	364	CODE
LOW	…/ensemble/_hist_gradient_boosting/gradient_boosting.py	394	CODE
LOW	sklearn/cluster/_agglomerative.py	429	CODE
LOW	sklearn/cluster/_optics.py	922	CODE
LOW	sklearn/cluster/_optics.py	1021	CODE
LOW	sklearn/cluster/_kmeans.py	462	CODE
LOW	sklearn/cluster/_kmeans.py	630	CODE
LOW	sklearn/cluster/_kmeans.py	874	CODE
LOW	sklearn/cluster/_kmeans.py	964	CODE
LOW	sklearn/cluster/_spectral.py	672	CODE
LOW	sklearn/cluster/_affinity_propagation.py	34	CODE
LOW	sklearn/cluster/tests/test_hierarchical.py	102	CODE
LOW	sklearn/cluster/tests/test_bicluster.py	94	CODE
LOW	sklearn/feature_extraction/_dict_vectorizer.py	142	CODE
217 more matches not shown…

AI Structural Patterns · 234 hits · 232 pts

Severity	File	Line	Context
LOW	asv_benchmarks/benchmarks/neighbors.py	29	CODE
LOW	asv_benchmarks/benchmarks/model_selection.py	57	CODE
LOW	asv_benchmarks/benchmarks/ensemble.py	111	CODE
LOW	sklearn/kernel_approximation.py	989	CODE
LOW	sklearn/isotonic.py	439	CODE
LOW	sklearn/_config.py	64	CODE
LOW	sklearn/_config.py	249	CODE
LOW	sklearn/tree/_export.py	116	CODE
LOW	sklearn/tree/_export.py	835	CODE
LOW	sklearn/tree/_export.py	247	CODE
LOW	sklearn/tree/_export.py	455	CODE
LOW	sklearn/tree/_export.py	635	CODE
LOW	sklearn/tree/_classes.py	1149	CODE
LOW	sklearn/tree/_classes.py	1555	CODE
LOW	sklearn/tree/_classes.py	1943	CODE
LOW	sklearn/tree/_classes.py	2222	CODE
LOW	sklearn/metrics/cluster/_bicluster.py	49	CODE
LOW	sklearn/metrics/_plot/det_curve.py	229	CODE
LOW	sklearn/metrics/_plot/precision_recall_curve.py	406	CODE
LOW	sklearn/metrics/_plot/precision_recall_curve.py	564	CODE
LOW	sklearn/metrics/_plot/precision_recall_curve.py	738	CODE
LOW	sklearn/metrics/_plot/roc_curve.py	297	CODE
LOW	sklearn/metrics/_plot/roc_curve.py	436	CODE
LOW	sklearn/metrics/_plot/roc_curve.py	576	CODE
LOW	sklearn/metrics/_plot/confusion_matrix.py	88	CODE
LOW	sklearn/metrics/_plot/confusion_matrix.py	205	CODE
LOW	sklearn/metrics/_plot/confusion_matrix.py	354	CODE
LOW	sklearn/ensemble/_forest.py	206	CODE
LOW	sklearn/ensemble/_forest.py	729	CODE
LOW	sklearn/ensemble/_forest.py	1017	CODE
LOW	sklearn/ensemble/_forest.py	1516	CODE
LOW	sklearn/ensemble/_forest.py	1894	CODE
LOW	sklearn/ensemble/_forest.py	2292	CODE
LOW	sklearn/ensemble/_forest.py	2650	CODE
LOW	sklearn/ensemble/_forest.py	2924	CODE
LOW	sklearn/ensemble/_bagging.py	294	CODE
LOW	sklearn/ensemble/_bagging.py	855	CODE
LOW	sklearn/ensemble/_bagging.py	1355	CODE
LOW	sklearn/ensemble/_gb.py	365	CODE
LOW	sklearn/ensemble/_gb.py	1466	CODE
LOW	sklearn/ensemble/_gb.py	2071	CODE
LOW	sklearn/ensemble/_iforest.py	250	CODE
LOW	…/ensemble/_hist_gradient_boosting/gradient_boosting.py	1646	CODE
LOW	…/ensemble/_hist_gradient_boosting/gradient_boosting.py	2040	CODE
LOW	sklearn/ensemble/_hist_gradient_boosting/grower.py	244	CODE
LOW	sklearn/cluster/_agglomerative.py	953	CODE
LOW	sklearn/cluster/_agglomerative.py	1286	CODE
LOW	sklearn/cluster/_optics.py	267	CODE
LOW	sklearn/cluster/_dbscan.py	29	CODE
LOW	sklearn/cluster/_dbscan.py	372	CODE
LOW	sklearn/cluster/_bicluster.py	497	CODE
LOW	sklearn/cluster/_bicluster.py	616	CODE
LOW	sklearn/cluster/_kmeans.py	304	CODE
LOW	sklearn/cluster/_kmeans.py	1386	CODE
LOW	sklearn/cluster/_kmeans.py	1900	CODE
LOW	sklearn/cluster/_spectral.py	194	CODE
LOW	sklearn/cluster/_spectral.py	636	CODE
LOW	sklearn/cluster/_bisect_k_means.py	228	CODE
LOW	sklearn/cluster/_affinity_propagation.py	189	CODE
LOW	sklearn/cluster/_affinity_propagation.py	465	CODE
174 more matches not shown…

AI Slop Vocabulary · 67 hits · 173 pts

Severity	File	Line	Snippet	Context
MEDIUM	sklearn/metrics/_classification.py	548	# namespace and device so as to be able to leverage the efficient	STRING
MEDIUM	sklearn/metrics/tests/test_dist_metrics.py	193	# Choose rtol to make sure that this test is robust to changes in the random	COMMENT
LOW	sklearn/metrics/tests/test_pairwise.py	1654	# Single dimension input, just return tuple of contents.	COMMENT
MEDIUM⚡	sklearn/ensemble/tests/test_forest.py	1895	# `forest_non_predictive`: meaningful for R2/accuracy, but robust in tests.	COMMENT
MEDIUM	sklearn/feature_extraction/tests/test_text.py	471	# this is robust to features with only zeros	COMMENT
LOW	sklearn/_loss/loss.py	1492	# dtypes. For float64, we simply use the values that are present in the	COMMENT
MEDIUM	sklearn/datasets/tests/test_openml.py	1368	# redownload, to utilize cache	COMMENT
MEDIUM	sklearn/externals/conftest.py	1	# Do not collect any tests in externals. This is more robust than using	COMMENT
MEDIUM	sklearn/externals/array_api_compat/common/_helpers.py	1029	# as we do below for unknown arrays, this is not recommended by JAX best practices.	COMMENT
MEDIUM	sklearn/linear_model/_theil_sen.py	208	"""Theil-Sen Estimator: robust multivariate regression model.	STRING
MEDIUM	sklearn/linear_model/_huber.py	130	"""L2-regularized linear regression model that is robust to outliers.	STRING
MEDIUM	sklearn/linear_model/_glm/tests/test_glm.py	1045	# LBFGS is robust to a collinear design because its approximation of the	COMMENT
MEDIUM	sklearn/linear_model/tests/test_least_angle.py	143	# Check that lars_path is robust to collinearity in input	COMMENT
MEDIUM	sklearn/linear_model/tests/test_base.py	362	# robust to any random seed in the admissible range.	COMMENT
LOW	sklearn/utils/_array_api.py	236	# Note: here we cannot simply use a Python `set` as it requires	COMMENT
LOW	sklearn/utils/_metadata_requests.py	1841	# try doing any routing, we can simply return a structure which returns	COMMENT
LOW	sklearn/utils/tests/test_pprint.py	602	# want to expend the whole line of the right side, just add the ellispsis	COMMENT
LOW	sklearn/utils/tests/test_estimator_checks.py	374	# then just return zeros.	COMMENT
MEDIUM	sklearn/covariance/_graph_lasso.py	108	i = 0 # initialize the counter to be robust to `max_iter=0`	CODE
MEDIUM	sklearn/covariance/_graph_lasso.py	116	# be robust to the max_iter=0 edge case, see:	COMMENT
MEDIUM	sklearn/covariance/_robust_covariance.py	128	# compute initial robust estimates from a random subset	COMMENT
MEDIUM	sklearn/covariance/_robust_covariance.py	131	# get initial robust estimates from the function parameters	COMMENT
MEDIUM	sklearn/covariance/_robust_covariance.py	489	# take the middle points' mean to get the robust location estimate	COMMENT
MEDIUM	sklearn/covariance/_robust_covariance.py	622	"""Minimum Covariance Determinant (MCD): robust estimator of covariance.	STRING
MEDIUM	sklearn/preprocessing/_data.py	1553	"""Scale features using statistics that are robust to outliers.	STRING
MEDIUM	sklearn/preprocessing/tests/test_data.py	1239	# Test robust scaling of 2d array along first axis	COMMENT
MEDIUM	sklearn/preprocessing/tests/test_data.py	1446	# check in conjunction with subsampling	COMMENT
MEDIUM	sklearn/model_selection/_split.py	777	# without attempting to leverage array API namespace features. However	COMMENT
LOW	sklearn/decomposition/_pca.py	529	# Small problem or n_components == 'mle', just call full PCA	COMMENT
MEDIUM	examples/ensemble/plot_gradient_boosting_quantile.py	193	# (underestimation for this asymmetric noise) but is also naturally robust to	COMMENT
MEDIUM	examples/cluster/plot_hdbscan.py	106	# HDBSCAN is much more robust in this sense: HDBSCAN can be seen as	COMMENT
MEDIUM	examples/cluster/plot_hdbscan.py	179	# results regarding density. We will however see that HDBSCAN is relatively robust	COMMENT
MEDIUM	examples/cluster/plot_hdbscan.py	193	# more robust with respect to noisy datasets, e.g. high-variance clusters with	COMMENT
MEDIUM⚡	examples/cluster/plot_adjusted_for_chance_measures.py	27	# example, it is possible to use evaluation metrics that leverage this	COMMENT
MEDIUM	examples/cluster/plot_dbscan.py	68	# that leverage this "supervised" ground truth information to quantify the	COMMENT
MEDIUM	examples/linear_model/plot_quantile_regression.py	92	# insights. On top of that, median estimation is much more robust to outliers	COMMENT
MEDIUM	examples/linear_model/plot_ridge_coeffs.py	161	# When `alpha` is small, the model captures the intricate details of the	COMMENT
MEDIUM	examples/linear_model/plot_ridge_coeffs.py	179	# Some other linear models are formulated to be robust to outliers such as the	COMMENT
MEDIUM	…near_model/plot_tweedie_regression_insurance_claims.py	458	# :class:`~sklearn.linear_model.GammaRegressor` is able to leverage some	COMMENT
MEDIUM	examples/linear_model/plot_logistic_multinomial.py	186	# - This approach can capture more nuanced relationships between classes, potentially	COMMENT
MEDIUM	examples/impute/plot_missing_values.py	268	# robust estimator for data with high magnitude variables which could dominate	COMMENT
MEDIUM⚡	examples/covariance/plot_mahalanobis_distances.py	104	# that of the MCD robust estimator (1.2). This shows that the MCD based	STRING
MEDIUM⚡	examples/covariance/plot_mahalanobis_distances.py	105	# robust estimator is much more resistant to the outlier samples, which were	STRING
MEDIUM⚡	examples/covariance/plot_mahalanobis_distances.py	112	# fit an MCD robust estimator to data	STRING
MEDIUM	examples/covariance/plot_mahalanobis_distances.py	124	# Mahalanobis distances calculated by both methods. Notice that the robust	STRING
MEDIUM	examples/covariance/plot_mahalanobis_distances.py	179	# distribution of inlier samples for robust MCD based Mahalanobis distances.	STRING
MEDIUM	…ples/covariance/plot_robust_vs_empirical_covariance.py	101	# fit a Minimum Covariance Determinant (MCD) robust estimator to data	STRING
MEDIUM	…ples/covariance/plot_robust_vs_empirical_covariance.py	103	# compare raw robust estimates with the true location and covariance	STRING
MEDIUM	examples/miscellaneous/plot_outlier_detection_bench.py	331	# method to avoid granting a privilege to non-binary features and that is robust	COMMENT
MEDIUM	examples/miscellaneous/plot_outlier_detection_bench.py	406	# IQR is robust to outliers: the median and interquartile range are less	COMMENT
MEDIUM	examples/inspection/plot_partial_dependence.py	316	# without any preprocessing as tree-based models are naturally robust to	COMMENT
MEDIUM⚡	examples/svm/plot_svm_kernels.py	266	# different kernels utilize the training data to determine the classification	COMMENT
MEDIUM	examples/svm/plot_svm_kernels.py	278	# For a comprehensive evaluation, fine-tuning of :class:`~sklearn.svm.SVC`	COMMENT
MEDIUM	examples/applications/plot_outlier_detection_wine.py	34	# robust estimator can help concentrate on a relevant cluster when outlying	COMMENT
MEDIUM	examples/applications/plot_outlier_detection_wine.py	102	# robust estimator of covariance to concentrate on the main mode of the data	COMMENT
MEDIUM	…ples/applications/plot_cyclical_feature_engineering.py	285	# the linear regression model to properly leverage the time information: linear	COMMENT
MEDIUM	…ples/applications/plot_cyclical_feature_engineering.py	494	# leverage the periodic time-related features and reduce the error from ~14% to	COMMENT
MEDIUM	…ples/applications/plot_cyclical_feature_engineering.py	559	# leverage those features to properly model intra-day variations.	COMMENT
MEDIUM	…es/release_highlights/plot_release_highlights_1_3_0.py	50	# making it more robust to parameter selection than :class:`cluster.DBSCAN`.	COMMENT
MEDIUM	…es/release_highlights/plot_release_highlights_1_5_0.py	153	# Similarly to most other PCA solvers, the new `"covariance_eigh"` solver can leverage	COMMENT
7 more matches not shown…

AI Response Leakage · 17 hits · 130 pts

Severity	File	Line	Snippet	Context
HIGH	examples/tree/plot_unveil_tree_structure.py	158	# Note: In this example, `n_outputs=1`, but the tree classifier can also handle	COMMENT
HIGH	examples/tree/plot_cost_complexity_pruning.py	99	# In this example, setting ``ccp_alpha=0.015`` maximizes the testing accuracy.	COMMENT
HIGH	examples/calibration/plot_compare_calibration.py	192	# In this example the training set was intentionally kept very small. In this	COMMENT
HIGH	examples/compose/plot_transformed_target.py	127	# of transforming the targets before learning a model. In this example, the	COMMENT
HIGH	…near_model/plot_tweedie_regression_insurance_claims.py	587	# In this example, both modeling approaches yield comparable performance	COMMENT
HIGH	examples/linear_model/plot_lasso_model_selection.py	22	# In this example, we will use the diabetes dataset.	COMMENT
HIGH	examples/linear_model/plot_lasso_model_selection.py	243	# In this example, both approaches are working similarly. The in-sample	COMMENT
HIGH	examples/linear_model/plot_lasso_and_elasticnet.py	110	# In this example, we demo a :class:`~sklearn.linear_model.Lasso` with a fixed	COMMENT
HIGH	examples/svm/plot_svm_scale_c.py	43	# In this example we investigate the effect of reparametrizing the regularization	STRING
HIGH⚡	examples/svm/plot_svm_kernels.py	257	# with data that exhibits a sigmoidal shape. In this example, careful fine	COMMENT
HIGH⚡	examples/svm/plot_svm_kernels.py	264	# In this example, we have visualized the decision boundaries trained with the	COMMENT
HIGH	examples/preprocessing/plot_target_encoder_cross_val.py	68	# In this example, we generate them to show how :class:`TargetEncoder`'s default	COMMENT
HIGH	examples/model_selection/plot_grid_search_stats.py	126	# cases. In this example we will show how to implement one of them (the so	COMMENT
HIGH	examples/model_selection/plot_grid_search_stats.py	360	# In this example we are going to define the	COMMENT
HIGH	…les/model_selection/plot_grid_search_refit_callable.py	372	# In this example, we've seen how to implement this rule using a custom refit	COMMENT
HIGH	examples/multiclass/plot_multiclass_overview.py	30	# In this example, we use a UCI dataset [1]_, generally referred as the Yeast	COMMENT
HIGH	examples/neighbors/plot_classification.py	18	# In this example, we use the iris dataset. We split the data into a train and test	COMMENT

Modern Structural Boilerplate · 99 hits · 98 pts

Severity	File	Line	Snippet	Context
LOW	sklearn/multiclass.py	74	__all__ = [	CODE
LOW	sklearn/random_projection.py	49	__all__ = [	CODE
LOW	sklearn/isotonic.py	21	__all__ = ["IsotonicRegression", "check_increasing", "isotonic_regression"]	CODE
LOW	sklearn/multioutput.py	51	__all__ = [	CODE
LOW	sklearn/__init__.py	26	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/naive_bayes.py	40	__all__ = [	CODE
LOW	sklearn/pipeline.py	37	__all__ = ["FeatureUnion", "Pipeline", "make_pipeline", "make_union"]	CODE
LOW	sklearn/discriminant_analysis.py	35	__all__ = ["LinearDiscriminantAnalysis", "QuadraticDiscriminantAnalysis"]	CODE
LOW	sklearn/exceptions.py	6	__all__ = [	CODE
LOW	sklearn/tree/__init__.py	15	__all__ = [	CODE
LOW⚡	sklearn/tree/_classes.py	60	__all__ = [	CODE
LOW	sklearn/metrics/__init__.py	105	__all__ = [	CODE
LOW	sklearn/metrics/cluster/_bicluster.py	10	__all__ = ["consensus_score"]	CODE
LOW	sklearn/metrics/cluster/__init__.py	35	__all__ = [	CODE
LOW	…earn/metrics/_pairwise_distances_reduction/__init__.py	103	__all__ = [	CODE
LOW	sklearn/ensemble/_forest.py	84	__all__ = [	CODE
LOW	sklearn/ensemble/_bagging.py	46	__all__ = ["BaggingClassifier", "BaggingRegressor"]	CODE
LOW	sklearn/ensemble/_weight_boosting.py	53	__all__ = [	CODE
LOW	sklearn/ensemble/__init__.py	25	__all__ = [	CODE
LOW	sklearn/ensemble/_iforest.py	26	__all__ = ["IsolationForest"]	CODE
LOW	sklearn/cluster/_bicluster.py	21	__all__ = ["SpectralBiclustering", "SpectralCoclustering"]	CODE
LOW	sklearn/cluster/__init__.py	36	__all__ = [	CODE
LOW	sklearn/feature_extraction/__init__.py	11	__all__ = [	CODE
LOW	sklearn/feature_extraction/text.py	40	__all__ = [	CODE
LOW	sklearn/feature_extraction/image.py	22	__all__ = [	CODE
LOW	sklearn/_loss/__init__.py	24	__all__ = [	CODE
LOW	sklearn/semi_supervised/__init__.py	13	__all__ = ["LabelPropagation", "LabelSpreading", "SelfTrainingClassifier"]	CODE
LOW	sklearn/semi_supervised/_self_training.py	26	__all__ = ["SelfTrainingClassifier"]	CODE
LOW	sklearn/gaussian_process/__init__.py	10	__all__ = ["GaussianProcessClassifier", "GaussianProcessRegressor", "kernels"]	CODE
LOW	sklearn/compose/_target.py	25	__all__ = ["TransformedTargetRegressor"]	CODE
LOW	sklearn/compose/__init__.py	18	__all__ = [	CODE
LOW	sklearn/compose/_column_transformer.py	54	__all__ = ["ColumnTransformer", "make_column_selector", "make_column_transformer"]	CODE
LOW	sklearn/datasets/_kddcup99.py	51	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_california_housing.py	53	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_species_distributions.py	64	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_covtype.py	45	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/__init__.py	62	__all__ = [	CODE
LOW	sklearn/datasets/_rcv1.py	79	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_lfw.py	35	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_twenty_newsgroups.py	56	logger = logging.getLogger(__name__)	CODE
LOW	sklearn/datasets/_openml.py	35	__all__ = ["fetch_openml"]	CODE
LOW	sklearn/externals/array_api_compat/_internal.py	74	__all__ = ["get_xp", "clone_module"]	STRING
LOW	…earn/externals/array_api_compat/dask/array/_aliases.py	353	__all__ = [	CODE
LOW	sklearn/externals/array_api_compat/cupy/_typing.py	3	__all__ = ["Array", "DType", "Device"]	CODE
LOW	sklearn/externals/array_api_compat/torch/_typing.py	1	__all__ = ["Array", "Device", "DType"]	CODE
LOW	sklearn/externals/array_api_compat/torch/_aliases.py	944	__all__ = ['asarray', 'result_type', 'can_cast',	STRING
LOW	sklearn/externals/array_api_compat/numpy/_typing.py	25	__all__ = ["Array", "DType", "Device"]	CODE
LOW	sklearn/externals/array_api_compat/numpy/_info.py	363	__all__ = ["__array_namespace_info__"]	CODE
LOW	sklearn/externals/array_api_compat/common/_fft.py	195	__all__ = [	CODE
LOW	sklearn/externals/array_api_compat/common/_typing.py	161	__all__ = [	CODE
LOW	sklearn/externals/array_api_compat/common/_linalg.py	226	__all__ = ['cross', 'matmul', 'outer', 'tensordot', 'EighResult',	CODE
LOW	sklearn/externals/array_api_compat/common/_aliases.py	661	__all__ = [	CODE
LOW	sklearn/externals/array_api_compat/common/_helpers.py	1070	__all__ = [	CODE
LOW	sklearn/externals/array_api_extra/__init__.py	35	__all__ = [	CODE
LOW	sklearn/externals/array_api_extra/_delegation.py	21	__all__ = [	CODE
LOW	sklearn/externals/array_api_extra/testing.py	34	__all__ = [	CODE
LOW	sklearn/externals/array_api_extra/_lib/_backends.py	11	__all__ = ["NUMPY_VERSION", "Backend"]	CODE
LOW	sklearn/externals/array_api_extra/_lib/_funcs.py	25	__all__ = [	CODE
LOW	…learn/externals/array_api_extra/_lib/_utils/_typing.py	10	__all__ = ["Array", "DType", "Device", "GetIndex", "SetIndex"]	CODE
LOW	…learn/externals/array_api_extra/_lib/_utils/_compat.py	53	__all__ = [	CODE
39 more matches not shown…

Redundant / Tautological Comments · 51 hits · 78 pts

Severity	File	Line	Snippet	Context
LOW	sklearn/__init__.py	144	# Check if a random seed exists in the environment, if not create one.	COMMENT
LOW	sklearn/pipeline.py	2115	# Check if Xs dimensions are valid	COMMENT
LOW	sklearn/calibration.py	989	# Check if it is the output of predict_proba	COMMENT
LOW	sklearn/tree/tests/test_tree.py	486	# Check if variable importance before fit raises ValueError.	COMMENT
LOW	sklearn/tree/tests/test_export.py	411	# Check if it errors when length of feature_names	COMMENT
LOW	sklearn/metrics/_classification.py	325	# Check if dimensions are consistent.	STRING
LOW	…learn/metrics/_plot/tests/test_common_curve_display.py	436	# Check if the number of parameters match	COMMENT
LOW	sklearn/metrics/tests/test_dist_metrics.py	325	# Check if both callable metric and predefined metric initialized	COMMENT
LOW⚡	sklearn/ensemble/tests/test_gradient_boosting.py	1309	# Check if validation_fraction has an effect	COMMENT
LOW⚡	sklearn/ensemble/tests/test_gradient_boosting.py	1318	# Check if n_estimators_ increase monotonically with n_iter_no_change	COMMENT
LOW	sklearn/ensemble/tests/test_gradient_boosting.py	490	# Check if we can fit even though all targets are equal.	COMMENT
LOW	sklearn/ensemble/tests/test_gradient_boosting.py	504	# Check if quantile loss with alpha=0.5 equals absolute_error.	COMMENT
LOW	sklearn/ensemble/tests/test_gradient_boosting.py	1230	# Check if early stopping works as expected, that is empirically check that the	COMMENT
LOW	sklearn/cluster/tests/test_bisect_k_means.py	57	# Check if results is the same for dense and sparse data	COMMENT
LOW	sklearn/cluster/_hdbscan/hdbscan.py	113	# Check if the mutual reachability matrix has any rows which have	COMMENT
LOW	sklearn/gaussian_process/_gpc.py	488	# Check if we have converged (log marginal likelihood does	COMMENT
LOW	sklearn/gaussian_process/_gpr.py	483	# Check if any of the variances is negative because of	COMMENT
LOW	sklearn/externals/_arff.py	564	# Check if the rows are sorted	COMMENT
LOW	sklearn/linear_model/tests/test_theil_sen.py	164	# Check if median is solution of the Fermat-Weber location problem	COMMENT
LOW	sklearn/utils/estimator_checks.py	4520	# Check if classifier throws an exception when fed regression targets	COMMENT
LOW	sklearn/utils/multiclass.py	421	# Check if multioutput	COMMENT
LOW	sklearn/utils/tests/test_extmath.py	576	# Check if the randomized_svd sign flipping is always done based on u	COMMENT
LOW	sklearn/utils/tests/test_extmath.py	635	# Check if cartesian product delivers the right results	COMMENT
LOW	sklearn/covariance/_robust_covariance.py	174	# Check if best fit already found (det => 0, logdet => -inf)	COMMENT
LOW	sklearn/neural_network/tests/test_rbm.py	134	# Check if we don't get NaNs sampling the full digits dataset.	COMMENT
LOW	sklearn/feature_selection/tests/test_rfe.py	84	# Check if the supports are equal	COMMENT
LOW	sklearn/mixture/tests/test_gaussian_mixture.py	980	# Check if the score increase	COMMENT
LOW⚡	sklearn/preprocessing/tests/test_data.py	1185	# Check if non-finite inputs raise ValueError	COMMENT
LOW	sklearn/preprocessing/tests/test_data.py	745	# Check if StandardScaler inverse_transform is	COMMENT
LOW	sklearn/model_selection/tests/test_split.py	358	# Check if get_n_splits returns the number of folds	COMMENT
LOW	sklearn/model_selection/tests/test_split.py	409	# Check if get_n_splits returns the number of folds	COMMENT
LOW⚡	sklearn/model_selection/tests/test_validation.py	1715	# Check if the additional duplicate indices are caught	COMMENT
LOW	sklearn/model_selection/tests/test_validation.py	617	# Check if ValueError (when groups is None) propagates to cross_val_score	COMMENT
LOW⚡	sklearn/model_selection/tests/test_search.py	1925	# Check if a one time iterable is accepted as a cv parameter.	COMMENT
LOW	sklearn/model_selection/tests/test_search.py	341	# Check if ValueError (when groups is None) propagates to GridSearchCV	COMMENT
LOW	sklearn/model_selection/tests/test_search.py	931	# Check if the search `cv_results`'s array are of correct types	COMMENT
LOW	sklearn/model_selection/tests/test_search.py	1000	# Check if score and timing are reasonable	COMMENT
LOW	sklearn/model_selection/tests/test_search.py	1261	# Check if score and timing are reasonable, also checks if the keys	COMMENT
LOW	sklearn/model_selection/tests/test_search.py	1977	# Check if generators are supported as cv and	COMMENT
LOW	…learn/model_selection/tests/test_successive_halving.py	729	# Check if ValueError (when groups is None) propagates to	COMMENT
LOW	sklearn/decomposition/tests/test_fastica.py	168	# Set atol to account for the different magnitudes of the elements in sources	COMMENT
LOW	sklearn/decomposition/tests/test_incremental_pca.py	452	# Set n_samples_seen_ to be a floating point number instead of an int	COMMENT
LOW	sklearn/neighbors/_lof.py	324	# Verify if negative_outlier_factor_ values are within acceptable range.	COMMENT
LOW	maint_tools/update_tracking_issue.py	170	# Check if test collection failed	COMMENT
LOW	examples/linear_model/plot_ridge_path.py	56	# Display results	COMMENT
LOW	…/linear_model/plot_lasso_lasso_lars_elasticnet_path.py	75	# Display results	COMMENT
LOW	…ples/covariance/plot_robust_vs_empirical_covariance.py	122	# Display results	STRING
LOW	…les/model_selection/plot_grid_search_refit_callable.py	329	# Print the results	COMMENT
LOW	benchmarks/bench_isolation_forest.py	48	# Set this to true for plotting score histograms for each dataset:	COMMENT
LOW	benchmarks/bench_sample_without_replacement.py	195	# Print results	COMMENT
LOW	benchmarks/bench_random_projections.py	273	# Print results	COMMENT

Excessive Try-Catch Wrapping · 48 hits · 56 pts

Severity	File	Line	Snippet	Context
LOW	sklearn/pipeline.py	2226	except Exception:	CODE
MEDIUM	sklearn/pipeline.py	396	def _final_estimator(self):	CODE
LOW	sklearn/metrics/_scorer.py	180	except Exception as e:	CODE
LOW	sklearn/metrics/_classification.py	2403	except Exception as e:	STRING
LOW	sklearn/ensemble/_base.py	308	except Exception:	CODE
LOW	sklearn/feature_extraction/text.py	419	except Exception:	CODE
LOW	sklearn/compose/_column_transformer.py	1258	except Exception:	CODE
MEDIUM	sklearn/compose/_column_transformer.py	539	def _get_remainder_cols_dtype(self):	CODE
LOW	sklearn/datasets/_kddcup99.py	376	except Exception as e:	CODE
LOW	sklearn/datasets/_twenty_newsgroups.py	311	except Exception as e:	CODE
LOW	sklearn/datasets/_openml.py	83	except Exception as exc:	CODE
LOW	sklearn/datasets/_openml.py	212	except Exception:	CODE
LOW	sklearn/datasets/_openml.py	583	except Exception as exc:	CODE
MEDIUM	sklearn/datasets/_openml.py	78	def wrapper(*args, **kw):	CODE
LOW	sklearn/externals/_arff.py	758	except Exception:	CODE
LOW	sklearn/externals/array_api_compat/common/_helpers.py	295	except Exception:	CODE
LOW	sklearn/externals/array_api_compat/common/_helpers.py	1066	except Exception:	CODE
LOW	…earn/externals/array_api_extra/_lib/_utils/_helpers.py	425	except Exception: # pylint: disable=broad-exception-caught	CODE
LOW	sklearn/tests/test_init.py	11	except Exception as e:	CODE
LOW	sklearn/utils/_available_if.py	33	except Exception as e:	CODE
LOW⚡	sklearn/utils/estimator_checks.py	4155	except Exception as e:	CODE
LOW⚡	sklearn/utils/estimator_checks.py	4164	except Exception as e:	CODE
LOW	sklearn/utils/estimator_checks.py	936	except Exception as e:	CODE
LOW	sklearn/utils/estimator_checks.py	1609	except Exception as e:	CODE
LOW	sklearn/utils/estimator_checks.py	1634	except Exception as e:	CODE
LOW	sklearn/utils/estimator_checks.py	4385	except Exception as e:	CODE
LOW	sklearn/utils/_metadata_requests.py	1601	except Exception:	CODE
LOW	sklearn/utils/_testing.py	540	except Exception as exp:	CODE
LOW	sklearn/utils/validation.py	361	except Exception as err:	CODE
LOW	sklearn/utils/tests/test_array_api.py	687	except Exception:	CODE
LOW	sklearn/utils/_repr_html/estimator.py	239	except Exception:	CODE
LOW	sklearn/utils/_repr_html/estimator.py	422	except Exception:	CODE
LOW	sklearn/utils/_repr_html/estimator.py	437	except Exception:	CODE
LOW	sklearn/callback/_callback_support.py	131	except Exception as exc:	CODE
LOW	sklearn/model_selection/_validation.py	862	except Exception:	CODE
LOW	sklearn/model_selection/_validation.py	950	except Exception:	CODE
LOW⚡	benchmarks/bench_rcv1_logreg_convergence.py	41	except Exception:	CODE
LOW⚡	benchmarks/bench_rcv1_logreg_convergence.py	50	except Exception:	CODE
LOW⚡	benchmarks/bench_rcv1_logreg_convergence.py	55	except Exception:	CODE
LOW⚡	build_tools/circle/list_versions.py	20	except Exception:	CODE
MEDIUM⚡	build_tools/circle/list_versions.py	21	print("Error reading", url, file=sys.stderr)	CODE
MEDIUM	build_tools/circle/list_versions.py	17	def json_urlread(url):	CODE
LOW	doc/conf.py	1005	except Exception as e:	CODE
LOW	doc/sphinxext/github_link.py	52	except Exception:	CODE
LOW	doc/sphinxext/github_link.py	57	except Exception:	CODE
LOW	doc/sphinxext/github_link.py	68	except Exception:	CODE
MEDIUM	doc/sphinxext/github_link.py	11	def _get_git_revision():	CODE
LOW	doc/sphinxext/override_pst_pagetoc.py	68	except Exception as e:	CODE

Cross-Language Confusion · 14 hits · 45 pts

Severity	File	Line	Snippet	Context
HIGH	sklearn/externals/array_api_extra/_delegation.py	693	If ``x < 0`` or ``x >= num_classes``, then the result is undefined, may raise	STRING
HIGH	sklearn/linear_model/_glm/glm.py	419	:math:`D^2 = 1-\\frac{D(y_{true},y_{pred})}{D_{null}}`,	STRING
HIGH	sklearn/linear_model/_glm/glm.py	420	:math:`D_{null}` is the null deviance, i.e. the deviance of a model	STRING
HIGH	sklearn/callback/tests/test_scoring_monitor.py	182	assert log.equals(expected_log)	CODE
HIGH⚡	sklearn/callback/tests/test_scoring_monitor.py	217	assert log.equals(expected_log)	CODE
HIGH	sklearn/decomposition/_dict_learning.py	699	(U^*, V^*) = argmin 0.5 \|\| X - U V \|\|_Fro^2 + alpha * \|\| U \|\|_1,1	STRING
HIGH	sklearn/decomposition/_dict_learning.py	701	with \|\| V_k \|\|_2 = 1 for all 0 <= k < n_components	STRING
HIGH	sklearn/decomposition/_dict_learning.py	916	(U^*, V^*) = argmin 0.5 \|\| X - U V \|\|_Fro^2 + alpha * \|\| U \|\|_1,1	STRING
HIGH	sklearn/decomposition/_dict_learning.py	918	with \|\| V_k \|\|_2 = 1 for all 0 <= k < n_components	STRING
HIGH	sklearn/decomposition/_dict_learning.py	1425	(U^*,V^*) = argmin 0.5 \|\| X - U V \|\|_Fro^2 + alpha * \|\| U \|\|_1,1	STRING
HIGH	sklearn/decomposition/_dict_learning.py	1427	with \|\| V_k \|\|_2 <= 1 for all 0 <= k < n_components	STRING
HIGH	sklearn/decomposition/_dict_learning.py	1768	(U^*,V^*) = argmin 0.5 \|\| X - U V \|\|_Fro^2 + alpha * \|\| U \|\|_1,1	STRING
HIGH	sklearn/decomposition/_dict_learning.py	1770	with \|\| V_k \|\|_2 <= 1 for all 0 <= k < n_components	STRING
HIGH	sklearn/neighbors/tests/test_neighbors_tree.py	177	heap.push(row, d, i)	CODE

Slop Phrases · 4 hits · 9 pts

Severity	File	Line	Snippet	Context
LOW	sklearn/compose/_column_transformer.py	922	# we use fit_transform to make sure to set sparse_output_ (for which we	COMMENT
LOW	sklearn/model_selection/_split.py	2382	# to make sure to break them anew in each iteration	COMMENT
MEDIUM	examples/miscellaneous/plot_estimator_representation.py	37	# elements. See :ref:`visualizing_composite_estimators` for how you can use	COMMENT
MEDIUM	…s/release_highlights/plot_release_highlights_0_23_0.py	63	# elements. See :ref:`visualizing_composite_estimators` for how you can use	COMMENT

TODO Padding · 5 hits · 8 pts

Severity	File	Line	Snippet	Context
LOW	…n/metrics/_pairwise_distances_reduction/_dispatcher.py	76	# TODO: implement a stable simultaneous_sort.	COMMENT
LOW	…n/metrics/_pairwise_distances_reduction/_dispatcher.py	472	# TODO: implement Euclidean specialization using GEMM.	COMMENT
LOW	…n/metrics/_pairwise_distances_reduction/_dispatcher.py	640	# TODO: implement Euclidean specialization using GEMM.	COMMENT
LOW	sklearn/neighbors/_kde.py	40	# TODO: implement a brute force version for testing purposes	COMMENT
LOW	sklearn/neighbors/_kde.py	334	# TODO: implement sampling for other valid kernel shapes	COMMENT

Verbosity Indicators · 4 hits · 7 pts

Severity	File	Line	Snippet	Context
LOW	sklearn/covariance/_robust_covariance.py	265	The purpose of this function is to find the best sets of n_support	STRING
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	141	# Step 1: similarities between query and all training samples	COMMENT
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	144	# Step 2: weighted average of label distributions	COMMENT
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	147	# Step 3: normalize to sum to 1	COMMENT

Structural Annotation Overuse · 3 hits · 7 pts

Severity	File	Line	Snippet	Context
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	141	# Step 1: similarities between query and all training samples	COMMENT
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	144	# Step 2: weighted average of label distributions	COMMENT
LOW⚡	…emi_supervised/plot_semi_supervised_versus_svm_iris.py	147	# Step 3: normalize to sum to 1	COMMENT

Dead Code · 2 hits · 4 pts

Severity	File	Line	Snippet	Context
MEDIUM	sklearn/utils/tests/test_estimator_checks.py	1487		CODE
MEDIUM	sklearn/utils/tests/test_estimator_checks.py	1488		CODE
