explosion/spaCy

Adjusted Score: 10.6
Raw Score: 10.6
Time Factor: 100%
Last Push: 2026-05-19
Stars: 33.7K
Language: Python
Lines of Code: 227.8K
Files: 1.4K
Pattern Hits: 1.8K
Scan Date: 2026-07-14
HC Hit Rate: 0.09

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
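
As a rough illustration of how the definitions above fit together, the sketch below normalises a point total per 1,000 lines of code, applies the time factor, and computes the HC hit rate. The function names are hypothetical and not taken from the scanner. The inputs are the rounded figures displayed in this report, and the point total covers only the 11 categories listed under Pattern Findings, so the result lands near the reported 10.6 without matching it exactly.

```python
def adjusted_score(raw_points: float, lines_of_code: int, time_factor: float) -> float:
    """Points per 1,000 lines of code, scaled by the temporal discount (0.0 to 1.0)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return raw_points / (lines_of_code / 1000) * time_factor


def hc_hit_rate(high_hits: int, critical_hits: int, files: int) -> float:
    """High + Critical pattern hits per analysed file."""
    if files <= 0:
        raise ValueError("files must be positive")
    return (high_hits + critical_hits) / files


# Figures copied from this report (rounded as displayed on the page).
# Only the category point totals visible in the findings section are summed.
visible_points = 1252 + 590 + 173 + 116 + 98 + 52 + 28 + 22 + 20 + 15 + 14
score = adjusted_score(visible_points, 227_800, time_factor=1.0)
rate = hc_hit_rate(high_hits=125, critical_hits=0, files=1_400)

print(f"points listed: {visible_points}")
print(f"adjusted score from listed categories: {score:.1f}")
print(f"HC hit rate: {rate:.2f}")
```

With these inputs the listed categories sum to 2380 points, which gives roughly 10.4 per 1,000 lines, and the HC hit rate rounds to the 0.09 shown above.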

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 0 · HIGH 125 · MEDIUM 22 · LOW 1619
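
The coefficients above can be applied to the severity counts as a quick sanity check. This is a minimal sketch with hypothetical variable names. It computes only the base severity-weighted sum, before any context or cluster multipliers, so it is a lower bound on the point total and not the scanner's final score.

```python
# Severity coefficients as stated in the Severity Breakdown description.
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}

# Hit counts per severity as shown in this report.
hits_by_severity = {"CRITICAL": 0, "HIGH": 125, "MEDIUM": 22, "LOW": 1619}

total_hits = sum(hits_by_severity.values())
base_points = sum(
    SEVERITY_COEFFICIENT[severity] * count
    for severity, count in hits_by_severity.items()
)

print(f"total hits: {total_hits}")
print(f"base weighted points (no multipliers): {base_points}")
```

The hit total comes to 1766, which agrees with the match count quoted under Pattern Findings, and the base weighted sum is 2288.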

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.
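
A per-directory decomposition of this kind can be sketched as a simple group-by on the first path component. The helper names below are hypothetical. The sample rows are a handful of findings from this report whose context is CODE and which carry no cluster marker, so each is assumed to contribute exactly its severity coefficient (HIGH 5, LOW 1).

```python
from collections import defaultdict
from pathlib import PurePosixPath


def top_level_dir(path: str) -> str:
    """First path component, or "(root)" for files in the repository root."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "(root)"


def score_by_directory(scored_matches):
    """Sum points per top-level directory, highest total first."""
    totals = defaultdict(float)
    for path, points in scored_matches:
        totals[top_level_dir(path)] += points
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# (file path, assumed points) pairs taken from rows in the findings tables.
sample = [
    ("spacy/tests/parser/test_state.py", 5.0),          # HIGH, line 28
    ("spacy/tests/parser/test_state.py", 5.0),          # HIGH, line 32
    ("spacy/tests/parser/test_state.py", 5.0),          # HIGH, line 45
    ("website/src/widgets/quickstart-models.js", 5.0),  # HIGH, line 105
    ("setup.py", 1.0),                                  # LOW, line 136
    ("setup.py", 1.0),                                  # LOW, line 144
]

breakdown = score_by_directory(sample)
for directory, points in breakdown.items():
    print(f"{directory}: {points}")
```

Dividing each total by the number of lines in that directory would give the density that the paragraph above recommends comparing. Per-directory line counts are not part of this report, so that step is left out.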

Pattern Findings

The scanner identified 1766 distinct pattern matches across 16 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Each category table lists the file path, line number, code snippet (where one is available), and the lexical context (CODE, COMMENT, or STRING) in which the match was detected. Long categories are truncated, with the number of omitted matches noted at the end of the table.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
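
The weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the scanner's implementation: all names are hypothetical, clusters are detected per file, a "10-line window" is read as matches whose line numbers span fewer than 10 lines, and the comment and cluster multipliers are assumed to stack multiplicatively. The sample data is the Self-Referential Comments table from this report.

```python
from collections import defaultdict

SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_MULTIPLIER = 1.5   # comment-context matches
CLUSTER_MULTIPLIER = 1.5   # matches inside a dense cluster
CLUSTER_MIN_HITS = 3       # "three or more patterns"
CLUSTER_WINDOW = 10        # "within a 10-line window" (assumed: span < 10 lines)


def clustered_lines(lines):
    """Line numbers that share a window of CLUSTER_WINDOW lines with enough matches."""
    ordered = sorted(lines)
    flagged = set()
    for i, start in enumerate(ordered):
        window = [ln for ln in ordered[i:] if ln - start < CLUSTER_WINDOW]
        if len(window) >= CLUSTER_MIN_HITS:
            flagged.update(window)
    return flagged


def score_matches(matches):
    """Sum severity points with the context and cluster multipliers applied."""
    lines_by_file = defaultdict(list)
    for _, path, line, _ in matches:
        lines_by_file[path].append(line)
    clusters = {path: clustered_lines(lines) for path, lines in lines_by_file.items()}

    total = 0.0
    for severity, path, line, context in matches:
        points = float(SEVERITY_COEFFICIENT[severity])
        if context == "COMMENT":
            points *= COMMENT_MULTIPLIER
        if line in clusters[path]:
            points *= CLUSTER_MULTIPLIER
        total += points
    return total


def rows(path, context, *lines):
    """Build MEDIUM-severity match tuples for one file."""
    return [("MEDIUM", path, line, context) for line in lines]


# The 18 rows of the Self-Referential Comments table.
matches = (
    rows("website/meta/universe.json", "CODE", 100, 104, 108, 1387, 3606)
    + rows("spacy/language.py", "COMMENT", 220)
    + rows("spacy/displacy/render.py", "COMMENT", 326)
    + rows("spacy/pipeline/legacy/entity_linker.py", "COMMENT", 1)
    + rows("spacy/tests/test_displacy.py", "COMMENT", 457)
    + rows("spacy/tests/pipeline/test_entity_linker.py", "COMMENT",
           151, 746, 852, 933, 1156, 1282)
    + rows("spacy/cli/debug_data.py", "COMMENT", 149, 903)
    + rows("spacy/lang/lex_attrs.py", "COMMENT", 185)
)

print(len(matches), "hits,", score_matches(matches), "pts")
```

Under these assumptions the three universe.json matches at lines 100 to 108 are flagged as a cluster, in line with the ⚡ markers in the table, and the total comes to 52.0, the same as the 52 pts reported for that category. The sketch clusters only within the rows it is given; the scanner may well count patterns from every category when it looks for clusters.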

Hyper-Verbose Identifiers: 1150 hits · 1252 pts

Severity	File	Line	Snippet	Context
LOW	website/setup/jinja_to_js.py	696	def _process_filter_capitalize(self, node, **kwargs):	STRING
LOW	website/setup/jinja_to_js.py	977	def _process_test_divisibleby(self, node, **kwargs):	STRING
LOW	website/meta/universe.json	5499	"def create_presque_normalizer(nlp, name='presque_normalizer'):",	CODE
LOW	extra/DEVELOPER_DOCS/Code Conventions.md	492	def test_doc_creation_with_pos():	CODE
LOW	extra/DEVELOPER_DOCS/Code Conventions.md	526	def test_en_tokenizer_splits_em_dash_infix(en_tokenizer):	CODE
LOW	extra/DEVELOPER_DOCS/Code Conventions.md	555	def test_phrase_matcher_validation(en_vocab):	CODE
LOW	spacy/util.py	1387	def make_first_longest_spans_filter():	CODE
LOW	spacy/language.py	2163	def _resolve_component_status(	CODE
LOW	spacy/scorer.py	262	def score_token_attr_per_feat(	CODE
LOW	spacy/pipeline/spancat.py	144	def build_ngram_range_suggester(min_size: int, max_size: int) -> Suggester:	CODE
LOW	spacy/pipeline/spancat.py	152	def build_preset_spans_suggester(spans_key: str) -> Suggester:	CODE
LOW	spacy/pipeline/spancat.py	558	def _make_span_group_multilabel(	CODE
LOW	spacy/pipeline/spancat.py	599	def _make_span_group_singlelabel(	CODE
LOW	spacy/pipeline/attributeruler.py	47	def make_attribute_ruler_scorer():	CODE
LOW	spacy/pipeline/span_ruler.py	36	def prioritize_new_ents_filter(	CODE
LOW	spacy/pipeline/span_ruler.py	62	def make_prioritize_new_ents_filter():	CODE
LOW	spacy/pipeline/span_ruler.py	66	def prioritize_existing_ents_filter(	CODE
LOW	spacy/pipeline/span_ruler.py	92	def make_preserve_existing_ents_filter():	CODE
LOW	spacy/pipeline/span_ruler.py	96	def overlapping_labeled_spans_score(	CODE
LOW	spacy/pipeline/span_ruler.py	111	def make_overlapping_labeled_spans_scorer(spans_key: str = DEFAULT_SPANS_KEY):	CODE
LOW	spacy/pipeline/factories.py	727	def make_edit_tree_lemmatizer(	CODE
LOW	spacy/pipeline/entity_linker.py	47	def make_entity_linker_scorer():	STRING
LOW	spacy/pipeline/entity_linker.py	237	def batch_has_learnable_example(self, examples):	CODE
LOW	spacy/pipeline/textcat_multilabel.py	82	def make_textcat_multilabel_scorer():	STRING
LOW	spacy/pipeline/span_finder.py	217	def _get_aligned_truth_scores(self, examples, ops) -> Tuple[Floats2d, Floats2d]:	STRING
LOW	spacy/training/batchers.py	22	def configure_minibatch_by_padded_size(	CODE
LOW	spacy/training/batchers.py	56	def configure_minibatch_by_words(	CODE
LOW	spacy/training/augment.py	13	def create_combined_augmenter(	CODE
LOW	spacy/training/augment.py	85	def create_orth_variants_augmenter(	CODE
LOW	spacy/training/augment.py	102	def create_lower_casing_augmenter(	CODE
LOW	spacy/training/augment.py	337	def construct_modified_raw_text(token_dict):	CODE
LOW	spacy/training/iob_utils.py	63	def _doc_to_biluo_tags_with_partial(doc: Doc) -> List[str]:	CODE
LOW	spacy/training/callbacks.py	10	def create_copy_from_base_model(	CODE
LOW	spacy/training/loop.py	290	def create_evaluation_callback(	CODE
LOW	spacy/training/loop.py	356	def create_before_to_disk_callback(	CODE
LOW	spacy/training/corpus.py	199	def make_examples_gold_preproc(	CODE
LOW	spacy/tests/test_displacy.py	120	def test_displacy_parse_spans(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	149	def test_displacy_parse_spans_with_kb_id_options(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	184	def test_displacy_parse_spans_different_spans_key(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	206	def test_displacy_parse_empty_spans_key(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	236	def test_displacy_parse_ents_with_kb_id_options(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	294	def test_displacy_invalid_arcs():	CODE
LOW	spacy/tests/test_displacy.py	313	def test_displacy_raises_for_wrong_type(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	337	def test_displacy_render_wrapper(en_vocab):	CODE
LOW	spacy/tests/test_displacy.py	353	def test_displacy_render_manual_dep():	CODE
LOW	spacy/tests/test_displacy.py	375	def test_displacy_render_manual_ent():	CODE
LOW	spacy/tests/test_displacy.py	396	def test_displacy_render_manual_span():	CODE
LOW	spacy/tests/test_displacy.py	425	def test_displacy_options_case():	CODE
LOW	spacy/tests/test_displacy.py	440	def test_displacy_manual_sorted_entities():	CODE
LOW	spacy/tests/test_displacy.py	474	def test_displacy_span_stacking():	CODE
LOW	spacy/tests/test_misc.py	79	def test_util_ensure_path_succeeds(text):	CODE
LOW	spacy/tests/test_misc.py	93	def test_util_get_package_path(package):	CODE
LOW	spacy/tests/test_misc.py	176	def test_load_model_blank_shortcut():	CODE
LOW	spacy/tests/test_misc.py	207	def test_is_compatible_version(version, constraint, compatible):	CODE
LOW	spacy/tests/test_misc.py	225	def test_is_unconstrained_version(constraint, expected):	CODE
LOW	spacy/tests/test_misc.py	284	def test_dot_to_dict_overrides(dot_notation, expected):	CODE
LOW	spacy/tests/test_misc.py	351	def test_util_minibatch_oversize(doc_sizes, expected_batches):	CODE
LOW	spacy/tests/util.py	42	def apply_transition_sequence(parser, doc, sequence):	CODE
LOW	spacy/tests/test_factory_imports.py	65	def test_factory_import_compatibility(factory_name, original_module, compat_module):	CODE
LOW	spacy/tests/README.md	111	def test_doc_token_api_strings(en_vocab):	CODE
1090 more matches not shown…

Cross-File Repetition: 118 hits · 590 pts

Severity	File	Line	Snippet	Context
HIGH	spacy/pipeline/tok2vec.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/spancat.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/entity_linker.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/legacy/entity_linker.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/span_finder.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/textcat.py	0	learn from a batch of documents and gold-standard information, updating the pipe's model. delegates to predict and get_l	STRING
HIGH	spacy/pipeline/lemmatizer.py	0	serialize the pipe to disk. path (str / path): path to a directory. exclude (iterable[str]): string names of serializati	STRING
HIGH	spacy/pipeline/entity_linker.py	0	serialize the pipe to disk. path (str / path): path to a directory. exclude (iterable[str]): string names of serializati	STRING
HIGH	spacy/pipeline/legacy/entity_linker.py	0	serialize the pipe to disk. path (str / path): path to a directory. exclude (iterable[str]): string names of serializati	STRING
HIGH	spacy/pipeline/lemmatizer.py	0	serialize the pipe to a bytestring. exclude (iterable[str]): string names of serialization fields to exclude. returns (b	STRING
HIGH	spacy/pipeline/entity_linker.py	0	serialize the pipe to a bytestring. exclude (iterable[str]): string names of serialization fields to exclude. returns (b	STRING
HIGH	spacy/pipeline/legacy/entity_linker.py	0	serialize the pipe to a bytestring. exclude (iterable[str]): string names of serialization fields to exclude. returns (b	STRING
HIGH	spacy/pipeline/spancat.py	0	apply the pipeline's model to a batch of docs, without modifying them. docs (iterable[doc]): the documents to predict. r	STRING
HIGH	spacy/pipeline/span_finder.py	0	apply the pipeline's model to a batch of docs, without modifying them. docs (iterable[doc]): the documents to predict. r	STRING
HIGH	spacy/pipeline/textcat.py	0	apply the pipeline's model to a batch of docs, without modifying them. docs (iterable[doc]): the documents to predict. r	STRING
HIGH	spacy/pipeline/spancat.py	0	find the loss and gradient of loss for the batch of documents and their predicted scores. examples (iterable[examples]):	STRING
HIGH	spacy/pipeline/span_finder.py	0	find the loss and gradient of loss for the batch of documents and their predicted scores. examples (iterable[examples]):	STRING
HIGH	spacy/pipeline/textcat.py	0	find the loss and gradient of loss for the batch of documents and their predicted scores. examples (iterable[examples]):	STRING
HIGH	spacy/tests/lang/sv/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/ms/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/el/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/it/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/pt/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/ht/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/nl/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/nb/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/de/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/id/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/fr/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/es/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/en/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/fa/test_noun_chunks.py	0	test that noun_chunks raises value error for 'fa' language if doc is not parsed.	STRING
HIGH	spacy/tests/lang/da/test_noun_chunks.py	0	test that noun_chunks raises value error for 'tr' language if doc is not parsed. to check this test, we're constructing	STRING
HIGH	spacy/tests/lang/fi/test_noun_chunks.py	0	test that noun_chunks raises value error for 'tr' language if doc is not parsed. to check this test, we're constructing	STRING
HIGH	spacy/tests/lang/la/test_noun_chunks.py	0	test that noun_chunks raises value error for 'tr' language if doc is not parsed. to check this test, we're constructing	STRING
HIGH	spacy/tests/lang/tr/test_noun_chunks.py	0	test that noun_chunks raises value error for 'tr' language if doc is not parsed. to check this test, we're constructing	STRING
HIGH	spacy/lang/sl/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/sk/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ur/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/da/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/kmr/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/pl/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/vi/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/sq/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/sv/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/he/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ms/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/hy/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/am/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/nn/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ky/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/gu/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/grc/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ja/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/el/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/lb/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/it/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ca/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/cs/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
HIGH	spacy/lang/ru/examples.py	0	example sentences to test spacy and its language models. >>> from spacy.lang.ti.examples import sentences >>> docs = nlp	STRING
58 more matches not shown…

Unused Imports: 183 hits · 173 pts

Severity	File	Line	Context
LOW	spacy/compat.py	42	CODE
LOW	spacy/compat.py	28	CODE
LOW	spacy/compat.py	28	CODE
LOW	spacy/compat.py	28	CODE
LOW	spacy/compat.py	30	CODE
LOW	spacy/compat.py	30	CODE
LOW	spacy/compat.py	30	CODE
LOW	spacy/compat.py	36	CODE
LOW	spacy/compat.py	38	CODE
LOW	spacy/util.py	68	CODE
LOW	spacy/util.py	68	CODE
LOW	spacy/util.py	68	CODE
LOW	spacy/util.py	77	CODE
LOW	spacy/util.py	77	CODE
LOW	spacy/util.py	78	CODE
LOW	spacy/util.py	78	CODE
LOW	spacy/util.py	79	CODE
LOW	spacy/util.py	402	CODE
LOW	spacy/util.py	1156	CODE
LOW	spacy/__init__.py	11	CODE
LOW	spacy/__init__.py	11	CODE
LOW	spacy/__init__.py	11	CODE
LOW	spacy/__init__.py	13	CODE
LOW	spacy/__init__.py	17	CODE
LOW	spacy/__init__.py	18	CODE
LOW	spacy/__init__.py	20	CODE
LOW	spacy/__init__.py	22	CODE
LOW	spacy/__init__.py	22	CODE
LOW	spacy/__init__.py	33	CODE
LOW	spacy/__init__.py	33	CODE
LOW	spacy/schemas.py	41	CODE
LOW	spacy/schemas.py	42	CODE
LOW	spacy/schemas.py	43	CODE
LOW	spacy/pipe_analysis.py	11	CODE
LOW	spacy/ty.py	17	CODE
LOW	spacy/ty.py	18	CODE
LOW	spacy/scorer.py	24	CODE
LOW	spacy/pipeline/__init__.py	1	CODE
LOW	spacy/pipeline/__init__.py	2	CODE
LOW	spacy/pipeline/__init__.py	3	CODE
LOW	spacy/pipeline/__init__.py	4	CODE
LOW	spacy/pipeline/__init__.py	5	CODE
LOW	spacy/pipeline/__init__.py	6	CODE
LOW	spacy/pipeline/__init__.py	6	CODE
LOW	spacy/pipeline/__init__.py	6	CODE
LOW	spacy/pipeline/__init__.py	7	CODE
LOW	spacy/pipeline/__init__.py	8	CODE
LOW	spacy/pipeline/__init__.py	9	CODE
LOW	spacy/pipeline/__init__.py	10	CODE
LOW	spacy/pipeline/__init__.py	11	CODE
LOW	spacy/pipeline/__init__.py	12	CODE
LOW	spacy/pipeline/__init__.py	13	CODE
LOW	spacy/pipeline/__init__.py	14	CODE
LOW	spacy/pipeline/__init__.py	15	CODE
LOW	spacy/pipeline/__init__.py	16	CODE
LOW	spacy/pipeline/__init__.py	17	CODE
LOW	spacy/pipeline/__init__.py	18	CODE
LOW	spacy/pipeline/__init__.py	19	CODE
LOW	spacy/pipeline/__init__.py	20	CODE
LOW	spacy/pipeline/legacy/__init__.py	1	CODE
123 more matches not shown…

Deep Nesting: 116 hits · 116 pts

Severity	File	Line	Context
LOW	website/setup/jinja_to_js.py	458	CODE
LOW	website/setup/jinja_to_js.py	491	CODE
LOW	website/setup/jinja_to_js.py	651	CODE
LOW	website/setup/jinja_to_js.py	675	CODE
LOW	website/setup/jinja_to_js.py	886	CODE
LOW	website/setup/jinja_to_js.py	1022	CODE
LOW	website/setup/jinja_to_js.py	1112	CODE
LOW	spacy/util.py	675	CODE
LOW	spacy/util.py	940	CODE
LOW	spacy/util.py	1810	CODE
LOW	spacy/language.py	838	CODE
LOW	spacy/language.py	1636	CODE
LOW	spacy/language.py	1763	CODE
LOW	spacy/language.py	1996	CODE
LOW	spacy/pipe_analysis.py	17	CODE
LOW	spacy/pipe_analysis.py	81	CODE
LOW	spacy/scorer.py	760	CODE
LOW	spacy/scorer.py	211	CODE
LOW	spacy/scorer.py	262	CODE
LOW	spacy/scorer.py	346	CODE
LOW	spacy/scorer.py	447	CODE
LOW	spacy/scorer.py	583	CODE
LOW	spacy/scorer.py	652	CODE
LOW	spacy/displacy/render.py	153	CODE
LOW	spacy/pipeline/functions.py	82	CODE
LOW	spacy/pipeline/functions.py	138	CODE
LOW	spacy/pipeline/tok2vec.py	278	CODE
LOW	spacy/pipeline/lemmatizer.py	172	CODE
LOW	spacy/pipeline/edit_tree_lemmatizer.py	156	CODE
LOW	spacy/pipeline/edit_tree_lemmatizer.py	176	CODE
LOW	spacy/pipeline/edit_tree_lemmatizer.py	197	CODE
LOW	spacy/pipeline/entityruler.py	246	CODE
LOW	spacy/pipeline/spancat.py	449	CODE
LOW	spacy/pipeline/spancat.py	558	CODE
LOW	spacy/pipeline/attributeruler.py	190	CODE
LOW	spacy/pipeline/span_ruler.py	322	CODE
LOW	spacy/pipeline/entity_linker.py	338	CODE
LOW	spacy/pipeline/entity_linker.py	460	CODE
LOW	spacy/pipeline/span_finder.py	135	CODE
LOW	spacy/pipeline/span_finder.py	217	CODE
LOW	spacy/pipeline/textcat.py	262	CODE
LOW	spacy/pipeline/legacy/entity_linker.py	139	CODE
LOW	spacy/pipeline/legacy/entity_linker.py	225	CODE
LOW	spacy/pipeline/legacy/entity_linker.py	311	CODE
LOW	spacy/training/pretrain.py	26	CODE
LOW	spacy/training/initialize.py	35	CODE
LOW	spacy/training/initialize.py	210	CODE
LOW	spacy/training/batchers.py	134	CODE
LOW	spacy/training/augment.py	164	CODE
LOW	spacy/training/augment.py	219	CODE
LOW	spacy/training/iob_utils.py	71	CODE
LOW	spacy/training/iob_utils.py	194	CODE
LOW	spacy/training/loop.py	35	CODE
LOW	spacy/training/loop.py	153	CODE
LOW	spacy/training/loop.py	373	CODE
LOW	spacy/training/corpus.py	84	CODE
LOW	spacy/training/corpus.py	184	CODE
LOW	spacy/training/corpus.py	212	CODE
LOW	spacy/training/corpus.py	261	CODE
LOW	spacy/training/corpus.py	311	CODE
56 more matches not shown…

Modern Structural Boilerplate: 102 hits · 98 pts

Severity	File	Line	Snippet	Context
LOW	spacy/util.py	439	def set_lang_class(name: str, cls: Type["Language"]) -> None:	CODE
LOW	spacy/util.py	1597	def set_dot_to_object(config: Config, section: str, value: Any) -> None:	CODE
LOW	spacy/lookups.py	210	def set_table(self, name: str, table: Table) -> None:	CODE
LOW	spacy/language.py	436	def set_factory_meta(cls, name: str, value: "FactoryMeta") -> None:	CODE
LOW	spacy/displacy/__init__.py	257	def set_render_wrapper(func: Callable[[str], str]) -> None:	CODE
LOW	spacy/pipeline/functions.py	102	def _set_config(self, config: Dict[str, Any] = {}) -> None:	CODE
LOW	spacy/pipeline/tok2vec.py	123	def set_annotations(self, docs: Sequence[Doc], tokvecses) -> None:	CODE
LOW	spacy/pipeline/spancat.py	380	def set_annotations(self, docs: Iterable[Doc], indices_scores) -> None:	CODE
LOW	spacy/pipeline/__init__.py	22	__all__ = [	CODE
LOW	spacy/pipeline/entity_linker.py	460	def set_annotations(self, docs: Iterable[Doc], kb_ids: List[str]) -> None:	CODE
LOW	spacy/pipeline/span_finder.py	135	def set_annotations(self, docs: Iterable[Doc], scores: Floats2d) -> None:	STRING
LOW	spacy/pipeline/textcat.py	170	def set_annotations(self, docs: Iterable[Doc], scores) -> None:	STRING
LOW	spacy/pipeline/legacy/__init__.py	3	__all__ = ["EntityLinker_v1"]	CODE
LOW	spacy/pipeline/legacy/entity_linker.py	311	def set_annotations(self, docs: Iterable[Doc], kb_ids: List[str]) -> None:	CODE
LOW	spacy/kb/__init__.py	5	__all__ = [	CODE
LOW	spacy/training/__init__.py	20	__all__ = [	CODE
LOW	spacy/cli/init_pipeline.py	95	def update_lexemes(nlp: Language, jsonl_loc: Path) -> None:	CODE
LOW	spacy/matcher/__init__.py	6	__all__ = ["DependencyMatcher", "Matcher", "PhraseMatcher", "levenshtein"]	CODE
LOW	spacy/lang/sl/__init__.py	22	__all__ = ["Slovenian"]	CODE
LOW	spacy/lang/sk/__init__.py	16	__all__ = ["Slovak"]	CODE
LOW	spacy/lang/ur/__init__.py	19	__all__ = ["Urdu"]	CODE
LOW	spacy/lang/kmr/__init__.py	16	__all__ = ["Kurmanji"]	CODE
LOW	spacy/lang/pl/__init__.py	55	__all__ = ["Polish"]	CODE
LOW	spacy/lang/vi/__init__.py	132	def _set_config(self, config: Dict[str, Any] = {}) -> None:	CODE
LOW	spacy/lang/vi/__init__.py	167	__all__ = ["Vietnamese"]	CODE
LOW	spacy/lang/sq/__init__.py	14	__all__ = ["Albanian"]	CODE
LOW	spacy/lang/sv/__init__.py	52	__all__ = ["Swedish"]	CODE
LOW	spacy/lang/ga/__init__.py	33	__all__ = ["Irish"]	CODE
LOW	spacy/lang/he/__init__.py	17	__all__ = ["Hebrew"]	CODE
LOW	spacy/lang/ms/__init__.py	24	__all__ = ["Malay"]	CODE
LOW	spacy/lang/hy/__init__.py	16	__all__ = ["Armenian"]	CODE
LOW	spacy/lang/am/__init__.py	26	__all__ = ["Amharic"]	CODE
LOW	spacy/lang/nn/__init__.py	20	__all__ = ["NorwegianNynorsk"]	CODE
LOW	spacy/lang/da/__init__.py	23	__all__ = ["Danish"]	CODE
LOW	spacy/lang/mr/__init__.py	14	__all__ = ["Marathi"]	CODE
LOW	spacy/lang/ky/__init__.py	20	__all__ = ["Kyrgyz"]	CODE
LOW	spacy/lang/gu/__init__.py	14	__all__ = ["Gujarati"]	CODE
LOW	spacy/lang/grc/__init__.py	22	__all__ = ["AncientGreek"]	CODE
LOW	spacy/lang/ja/__init__.py	158	def _set_config(self, config: Dict[str, Any] = {}) -> None:	STRING
LOW	spacy/lang/ja/__init__.py	341	__all__ = ["Japanese"]	CODE
LOW	spacy/lang/el/__init__.py	53	__all__ = ["Greek"]	CODE
LOW	spacy/lang/lv/__init__.py	14	__all__ = ["Latvian"]	CODE
LOW	spacy/lang/lb/__init__.py	20	__all__ = ["Luxembourgish"]	CODE
LOW	spacy/lang/it/__init__.py	50	__all__ = ["Italian"]	CODE
LOW	spacy/lang/ca/__init__.py	53	__all__ = ["Catalan"]	CODE
LOW	spacy/lang/is/__init__.py	14	__all__ = ["Icelandic"]	CODE
LOW	spacy/lang/cs/__init__.py	16	__all__ = ["Czech"]	CODE
LOW	spacy/lang/te/__init__.py	16	__all__ = ["Telugu"]	CODE
LOW	spacy/lang/ru/__init__.py	53	__all__ = ["Russian"]	CODE
LOW	spacy/lang/tl/__init__.py	18	__all__ = ["Tagalog"]	CODE
LOW	spacy/lang/ro/__init__.py	26	__all__ = ["Romanian"]	CODE
LOW	spacy/lang/hsb/__init__.py	18	__all__ = ["UpperSorbian"]	CODE
LOW	spacy/lang/yo/__init__.py	16	__all__ = ["Yoruba"]	CODE
LOW	spacy/lang/sa/__init__.py	16	__all__ = ["Sanskrit"]	CODE
LOW	spacy/lang/pt/__init__.py	23	__all__ = ["Portuguese"]	CODE
LOW	spacy/lang/zh/__init__.py	146	def _set_config(self, config: Dict[str, Any] = {}) -> None:	STRING
LOW	spacy/lang/zh/__init__.py	336	__all__ = ["Chinese"]	STRING
LOW	spacy/lang/uk/__init__.py	53	__all__ = ["Ukrainian"]	CODE
LOW	spacy/lang/sr/__init__.py	21	__all__ = ["Serbian"]	CODE
LOW	spacy/lang/si/__init__.py	16	__all__ = ["Sinhala"]	CODE
42 more matches not shown…

Self-Referential Comments: 18 hits · 52 pts

Severity	File	Line	Snippet	Context
MEDIUM⚡	website/meta/universe.json	100	"# Define the language for the sentence as well as for the spaCy and benepar models",	CODE
MEDIUM⚡	website/meta/universe.json	104	"# Create the pipeline (note, the required models will be downloaded and installed automatically)",	CODE
MEDIUM⚡	website/meta/universe.json	108	"# Create the tree from where we are going to extract the desired noun phrases",	CODE
MEDIUM	website/meta/universe.json	1387	"# Create a new chat bot named Charlie",	CODE
MEDIUM	website/meta/universe.json	3606	"# Create a new DocBin",	CODE
MEDIUM	spacy/language.py	220	# Create the default tokenizer from the default config	COMMENT
MEDIUM	spacy/displacy/render.py	326	# Create a random ID prefix to make sure parses don't receive the	COMMENT
MEDIUM	spacy/pipeline/legacy/entity_linker.py	1	# This file is present to provide a prior version of the EntityLinker component	COMMENT
MEDIUM	spacy/tests/test_displacy.py	457	# Create a doc containing an annotated word and an unannotated HTML tag	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	151	# Create the Entity Linker component and add it to the pipeline	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	746	# Create the Entity Linker component and add it to the pipeline	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	852	# Create the NER and EL components and add them to the pipeline	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	933	# Create the Entity Linker component with the KB from file, and check the final vocab	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	1156	# Create a ruler to mark entities	COMMENT
MEDIUM	spacy/tests/pipeline/test_entity_linker.py	1282	# Create the Entity Linker component and add it to the pipeline	COMMENT
MEDIUM	spacy/cli/debug_data.py	149	# Create the gold corpus to be able to better analyze data	COMMENT
MEDIUM	spacy/cli/debug_data.py	903	# Creating a data structure that holds the start and	COMMENT
MEDIUM	spacy/lang/lex_attrs.py	185	# This function is partially applied so lang code can be passed in	COMMENT

Cross-Language Confusion (JS/TS): 4 hits · 28 pts

Severity	File	Line	Snippet	Context
HIGH⚡	website/pages/index.tsx	52	print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])	CODE
HIGH⚡	website/pages/index.tsx	53	print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])	CODE
HIGH⚡	website/pages/index.tsx	57	print(entity.text, entity.label_)	CODE
HIGH	website/src/widgets/quickstart-models.js	105	print([	CODE

Excessive Try-Catch Wrapping: 21 hits · 22 pts

Severity	File	Line	Snippet	Context
LOW	setup.py	136	except Exception:	CODE
LOW	setup.py	144	except Exception:	CODE
LOW	spacy/util.py	890	except Exception:	CODE
LOW	spacy/util.py	1777	except Exception as e:	CODE
LOW	spacy/language.py	1057	except Exception as e:	CODE
LOW	spacy/language.py	2417	except Exception:	CODE
LOW	spacy/pipeline/lemmatizer.py	113	except Exception as e:	CODE
LOW	spacy/pipeline/entityruler.py	125	except Exception as e:	CODE
LOW	spacy/pipeline/attributeruler.py	134	except Exception as e:	CODE
LOW	spacy/pipeline/span_ruler.py	224	except Exception as e:	CODE
LOW	spacy/training/loop.py	126	except Exception as e:	CODE
LOW	spacy/training/loop.py	385	except Exception as e:	CODE
LOW	spacy/tests/test_models.py	163	except Exception:	CODE
LOW	spacy/tests/test_models.py	173	except Exception:	CODE
LOW⚡	spacy/tests/serialize/test_resource_warning.py	117	except Exception as e:	CODE
LOW⚡	spacy/tests/serialize/test_resource_warning.py	131	except Exception as e:	CODE
LOW⚡	spacy/tests/serialize/test_resource_warning.py	137	except Exception as e:	CODE
LOW	spacy/cli/_util.py	195	except Exception as e:	CODE
LOW	spacy/cli/info.py	159	except Exception:	CODE
LOW	spacy/cli/info.py	197	except Exception:	CODE
LOW	spacy/cli/debug_model.py	146	except Exception:	CODE
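
The rows in this table are tab-separated in the order severity, file, line, snippet, context. As a minimal sketch, assuming that five-column layout, the snippet below tallies hits per file from a few rows copied from the table. The helper name `parse_rows` and the `flagged` field are hypothetical, and reading the `⚡` suffix as a per-row flag is an assumption; the report does not define the marker in this section.

```python
from collections import Counter

# Three rows copied from the table above; columns are tab-separated.
ROWS = """\
LOW\tsetup.py\t136\texcept Exception:\tCODE
LOW\tsetup.py\t144\texcept Exception:\tCODE
LOW⚡\tspacy/tests/serialize/test_resource_warning.py\t117\texcept Exception as e:\tCODE
"""


def parse_rows(text):
    """Yield one dict per finding row (hypothetical helper, not part of the scanner)."""
    for raw in text.splitlines():
        if not raw.strip():
            continue
        severity, path, line_no, snippet, context = raw.split("\t")
        yield {
            # Assumption: a trailing "⚡" is a marker on the row, not part of the severity name.
            "severity": severity.rstrip("⚡"),
            "flagged": severity.endswith("⚡"),
            "file": path,
            "line": int(line_no),
            "snippet": snippet,
            "context": context,
        }


rows = list(parse_rows(ROWS))
hits_per_file = Counter(row["file"] for row in rows)

for path, count in hits_per_file.most_common():
    print(f"{count}\t{path}")
```

Tables that omit the snippet column, such as AI Structural Patterns, have only four fields per row and would need a variant of the unpacking line.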

AI Structural Patterns: 21 hits · 20 pts

Severity	File	Line	Context
LOW	setup.py	129	CODE
LOW	spacy/util.py	1380	CODE
LOW	spacy/language.py	768	CODE
LOW	spacy/language.py	1763	CODE
LOW	spacy/displacy/__init__.py	76	CODE
LOW	spacy/pipeline/entityruler.py	42	CODE
LOW	spacy/pipeline/spancat.py	203	CODE
LOW	spacy/pipeline/span_ruler.py	123	CODE
LOW	spacy/tests/test_language.py	285	CODE
LOW	spacy/cli/package.py	22	CODE
LOW	spacy/cli/package.py	109	CODE
LOW	spacy/cli/benchmark_speed.py	23	CODE
LOW	spacy/cli/init_pipeline.py	24	CODE
LOW	spacy/cli/convert.py	48	CODE
LOW	spacy/cli/convert.py	129	CODE
LOW	spacy/cli/find_threshold.py	27	CODE
LOW	spacy/cli/evaluate.py	19	CODE
LOW	spacy/cli/evaluate.py	87	CODE
LOW	spacy/cli/debug_model.py	36	CODE
LOW	spacy/cli/debug_model.py	250	CODE
LOW	spacy/cli/apply.py	69	CODE

Cross-Language Confusion: 3 hits · 15 pts

Severity	File	Line	Snippet	Context
HIGH	spacy/tests/parser/test_state.py	28	state.push()	CODE
HIGH	spacy/tests/parser/test_state.py	32	state.push()	CODE
HIGH	spacy/tests/parser/test_state.py	45	state.push()	CODE

Fake / Example Data: 10 hits · 14 pts

Severity	File	Line	Snippet	Context
LOW	website/meta/universe.json	744	"# [{'end': 8, 'start': 0, 'text': 'John Doe', 'type': 'PERSON'}, {'end': 25, 'start': 13, 'text': 'Go D	CODE
LOW	spacy/glossary.py	314	"ph": "placeholder",	CODE
LOW⚡	spacy/tests/tokenizer/test_whitespace.py	4	@pytest.mark.parametrize("text", ["lorem ipsum"])	CODE
LOW⚡	spacy/tests/tokenizer/test_whitespace.py	17	@pytest.mark.parametrize("text", ["lorem ipsum "])	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	317	text = "Lorem ipsum: 1984."	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	350	text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	350	text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	374	text1 = "Lorem dolor sit amet, consectetur adipiscing elit."	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	375	text2 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."	CODE
LOW⚡	spacy/tests/tokenizer/test_tokenizer.py	375	text2 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."	CODE

AI Slop Vocabulary: 9 hits · 14 pts

Severity	File	Line	Snippet	Context
MEDIUM	website/meta/universe.json	9	"description": "**[Temporal Expressions Normalization spaCy (TeNs)](https://github.com/iliedorobat/timespan-	CODE
MEDIUM	website/meta/universe.json	5396	"# floret n-gram embeddings robust to typos",	CODE
LOW	spacy/schemas.py	102	# binding=True. Here we just use an empty model that allows everything.	COMMENT
LOW	spacy/scorer.py	724	# None is indistinct, so we can't just add it to the set	COMMENT
LOW	spacy/training/loggers.py	120	# If we don't have a new checkpoint, just return.	COMMENT
LOW	spacy/tests/test_factory_registrations.py	59	# For Cython functions, just use a placeholder	COMMENT
LOW	spacy/lang/ja/__init__.py	77	# if there's no lemma info (it's an unk) just use the surface	STRING
LOW	spacy/lang/ht/lemmatizer.py	39	# fallback rule: just return lowercased form	COMMENT
LOW	spacy/ml/_precomputable_affine.py	44	# However, we avoid building that array for efficiency -- and just pass	COMMENT

Redundant / Tautological Comments: 6 hits · 7 pts

Severity	File	Line	Snippet	Context
LOW	spacy/util.py	418	# Check if language is registered / entry point is available	COMMENT
LOW	spacy/language.py	2065	# Check if the path actually exists in the config	COMMENT
LOW	spacy/lang/it/syntax_iterators.py	57	elif right_child.dep in np_modifs: # Check if we can expand to right	CODE
LOW	spacy/lang/pt/syntax_iterators.py	56	elif right_child.dep in np_modifs: # Check if we can expand to right	CODE
LOW	spacy/lang/fr/syntax_iterators.py	58	elif right_child.dep in np_modifs: # Check if we can expand to right	CODE
LOW	spacy/lang/es/syntax_iterators.py	47	elif right_child.dep in np_modifs: # Check if we can expand to right	CODE

Over-Commented Block: 3 hits · 3 pts

Severity	File	Line	Snippet	Context
LOW	spacy/default_config.cfg	41	# Whether to train on sequences with 'gold standard' sentence boundaries	COMMENT
LOW	spacy/lang/nl/stop_words.py	1	# The original stop words list (added in f46ffe3) was taken from	COMMENT
LOW	spacy/ml/_precomputable_affine.py	21	W = model.get_param("W")	COMMENT

Modern AI Meta-Vocabulary: 1 hit · 2 pts

Severity	File	Line	Snippet	Context
MEDIUM	website/meta/universe.json	444	"# zero shot definition of entities",	CODE

Slop Phrases: 1 hit · 2 pts

Severity	File	Line	Snippet	Context
MEDIUM	website/meta/universe.json	1768	"# For instance you can add the BertTone model for classification of sentiment polarity to the pipeline:	CODE
