Repository Analysis

explosion/spaCy

Low AI signal (7.9)
Adjusted Score: 7.9
Raw Score: 7.9
Time Factor: 100%
Lines of Code: 227,796
Files: 1407
Pattern Hits: 1542
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 26 · MEDIUM 21 · LOW 1495

Pattern Findings

1542 matches across 13 categories.
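The totals in this report can be cross-checked against the per-category figures. The sketch below copies the hit and point counts from the category headers in this report and the counts from the severity breakdown, then tallies them. The last step is an assumption, not something the report states: it tests whether the raw score is consistent with total points per thousand lines of code.

```python
# (hits, points) per category, copied from the category headers of this report.
categories = {
    "Hyper-Verbose Identifiers": (1150, 1252),
    "Unused Imports": (183, 174),
    "Deep Nesting": (116, 116),
    "Cross-File Repetition": (19, 95),
    "Self-Referential Comments": (18, 52),
    "Cross-Language Confusion (JS/TS)": (4, 28),
    "Excessive Try-Catch Wrapping": (21, 22),
    "Cross-Language Confusion": (3, 15),
    "AI Slop Vocabulary": (9, 14),
    "Fake / Example Data": (9, 13),
    "Redundant / Tautological Comments": (6, 7),
    "Over-Commented Block": (3, 3),
    "Slop Phrases": (1, 2),
}
severity_counts = {"CRITICAL": 0, "HIGH": 26, "MEDIUM": 21, "LOW": 1495}
lines_of_code = 227_796

total_hits = sum(hits for hits, _ in categories.values())
total_points = sum(points for _, points in categories.values())
total_by_severity = sum(severity_counts.values())

# ASSUMPTION: score = points per 1,000 lines of code. The report does not
# document its formula; this only checks that the numbers are compatible.
points_per_kloc = total_points / lines_of_code * 1000

print(len(categories), total_hits, total_by_severity)
print(total_points, round(points_per_kloc, 1))
```

Both tallies come to 1542 across 13 categories, matching the Pattern Hits figure. The points-per-thousand-lines value rounds to 7.9, which is compatible with the reported raw score but does not confirm how the tool computes it.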

Hyper-Verbose Identifiers · 1150 hits · 1252 pts
Severity | File | Line | Snippet
LOW | website/setup/jinja_to_js.py | 696 | def _process_filter_capitalize(self, node, **kwargs):
LOW | website/setup/jinja_to_js.py | 977 | def _process_test_divisibleby(self, node, **kwargs):
LOW | website/meta/universe.json | 5499 | "def create_presque_normalizer(nlp, name='presque_normalizer'):",
LOW | extra/DEVELOPER_DOCS/Code Conventions.md | 492 | def test_doc_creation_with_pos():
LOW | extra/DEVELOPER_DOCS/Code Conventions.md | 526 | def test_en_tokenizer_splits_em_dash_infix(en_tokenizer):
LOW | extra/DEVELOPER_DOCS/Code Conventions.md | 555 | def test_phrase_matcher_validation(en_vocab):
LOW | spacy/util.py | 1387 | def make_first_longest_spans_filter():
LOW | spacy/language.py | 2163 | def _resolve_component_status(
LOW | spacy/scorer.py | 262 | def score_token_attr_per_feat(
LOW | spacy/pipeline/spancat.py | 144 | def build_ngram_range_suggester(min_size: int, max_size: int) -> Suggester:
LOW | spacy/pipeline/spancat.py | 152 | def build_preset_spans_suggester(spans_key: str) -> Suggester:
LOW | spacy/pipeline/spancat.py | 558 | def _make_span_group_multilabel(
LOW | spacy/pipeline/spancat.py | 599 | def _make_span_group_singlelabel(
LOW | spacy/pipeline/attributeruler.py | 47 | def make_attribute_ruler_scorer():
LOW | spacy/pipeline/span_ruler.py | 36 | def prioritize_new_ents_filter(
LOW | spacy/pipeline/span_ruler.py | 62 | def make_prioritize_new_ents_filter():
LOW | spacy/pipeline/span_ruler.py | 66 | def prioritize_existing_ents_filter(
LOW | spacy/pipeline/span_ruler.py | 92 | def make_preserve_existing_ents_filter():
LOW | spacy/pipeline/span_ruler.py | 96 | def overlapping_labeled_spans_score(
LOW | spacy/pipeline/span_ruler.py | 111 | def make_overlapping_labeled_spans_scorer(spans_key: str = DEFAULT_SPANS_KEY):
LOW | spacy/pipeline/factories.py | 727 | def make_edit_tree_lemmatizer(
LOW | spacy/pipeline/entity_linker.py | 47 | def make_entity_linker_scorer():
LOW | spacy/pipeline/entity_linker.py | 237 | def batch_has_learnable_example(self, examples):
LOW | spacy/pipeline/textcat_multilabel.py | 82 | def make_textcat_multilabel_scorer():
LOW | spacy/pipeline/span_finder.py | 217 | def _get_aligned_truth_scores(self, examples, ops) -> Tuple[Floats2d, Floats2d]:
LOW | spacy/training/batchers.py | 22 | def configure_minibatch_by_padded_size(
LOW | spacy/training/batchers.py | 56 | def configure_minibatch_by_words(
LOW | spacy/training/augment.py | 13 | def create_combined_augmenter(
LOW | spacy/training/augment.py | 85 | def create_orth_variants_augmenter(
LOW | spacy/training/augment.py | 102 | def create_lower_casing_augmenter(
LOW | spacy/training/augment.py | 337 | def construct_modified_raw_text(token_dict):
LOW | spacy/training/iob_utils.py | 63 | def _doc_to_biluo_tags_with_partial(doc: Doc) -> List[str]:
LOW | spacy/training/callbacks.py | 10 | def create_copy_from_base_model(
LOW | spacy/training/loop.py | 290 | def create_evaluation_callback(
LOW | spacy/training/loop.py | 356 | def create_before_to_disk_callback(
LOW | spacy/training/corpus.py | 199 | def make_examples_gold_preproc(
LOW | spacy/tests/test_displacy.py | 120 | def test_displacy_parse_spans(en_vocab):
LOW | spacy/tests/test_displacy.py | 149 | def test_displacy_parse_spans_with_kb_id_options(en_vocab):
LOW | spacy/tests/test_displacy.py | 184 | def test_displacy_parse_spans_different_spans_key(en_vocab):
LOW | spacy/tests/test_displacy.py | 206 | def test_displacy_parse_empty_spans_key(en_vocab):
LOW | spacy/tests/test_displacy.py | 236 | def test_displacy_parse_ents_with_kb_id_options(en_vocab):
LOW | spacy/tests/test_displacy.py | 294 | def test_displacy_invalid_arcs():
LOW | spacy/tests/test_displacy.py | 313 | def test_displacy_raises_for_wrong_type(en_vocab):
LOW | spacy/tests/test_displacy.py | 337 | def test_displacy_render_wrapper(en_vocab):
LOW | spacy/tests/test_displacy.py | 353 | def test_displacy_render_manual_dep():
LOW | spacy/tests/test_displacy.py | 375 | def test_displacy_render_manual_ent():
LOW | spacy/tests/test_displacy.py | 396 | def test_displacy_render_manual_span():
LOW | spacy/tests/test_displacy.py | 425 | def test_displacy_options_case():
LOW | spacy/tests/test_displacy.py | 440 | def test_displacy_manual_sorted_entities():
LOW | spacy/tests/test_displacy.py | 474 | def test_displacy_span_stacking():
LOW | spacy/tests/test_misc.py | 79 | def test_util_ensure_path_succeeds(text):
LOW | spacy/tests/test_misc.py | 93 | def test_util_get_package_path(package):
LOW | spacy/tests/test_misc.py | 176 | def test_load_model_blank_shortcut():
LOW | spacy/tests/test_misc.py | 207 | def test_is_compatible_version(version, constraint, compatible):
LOW | spacy/tests/test_misc.py | 225 | def test_is_unconstrained_version(constraint, expected):
LOW | spacy/tests/test_misc.py | 284 | def test_dot_to_dict_overrides(dot_notation, expected):
LOW | spacy/tests/test_misc.py | 351 | def test_util_minibatch_oversize(doc_sizes, expected_batches):
LOW | spacy/tests/util.py | 42 | def apply_transition_sequence(parser, doc, sequence):
LOW | spacy/tests/test_factory_imports.py | 65 | def test_factory_import_compatibility(factory_name, original_module, compat_module):
LOW | spacy/tests/README.md | 111 | def test_doc_token_api_strings(en_vocab):
1090 more matches not shown…
Unused Imports · 183 hits · 174 pts
Severity | File | Line | Snippet
LOW | spacy/compat.py | 42 |
LOW | spacy/compat.py | 28 |
LOW | spacy/compat.py | 28 |
LOW | spacy/compat.py | 28 |
LOW | spacy/compat.py | 30 |
LOW | spacy/compat.py | 30 |
LOW | spacy/compat.py | 30 |
LOW | spacy/compat.py | 36 |
LOW | spacy/compat.py | 38 |
LOW | spacy/util.py | 68 |
LOW | spacy/util.py | 68 |
LOW | spacy/util.py | 68 |
LOW | spacy/util.py | 77 |
LOW | spacy/util.py | 77 |
LOW | spacy/util.py | 78 |
LOW | spacy/util.py | 78 |
LOW | spacy/util.py | 79 |
LOW | spacy/util.py | 402 |
LOW | spacy/util.py | 1156 |
LOW | spacy/__init__.py | 11 |
LOW | spacy/__init__.py | 11 |
LOW | spacy/__init__.py | 11 |
LOW | spacy/__init__.py | 13 |
LOW | spacy/__init__.py | 17 |
LOW | spacy/__init__.py | 18 |
LOW | spacy/__init__.py | 20 |
LOW | spacy/__init__.py | 22 |
LOW | spacy/__init__.py | 22 |
LOW | spacy/__init__.py | 33 |
LOW | spacy/__init__.py | 33 |
LOW | spacy/schemas.py | 41 |
LOW | spacy/schemas.py | 42 |
LOW | spacy/schemas.py | 43 |
LOW | spacy/pipe_analysis.py | 11 |
LOW | spacy/ty.py | 17 |
LOW | spacy/ty.py | 18 |
LOW | spacy/scorer.py | 24 |
LOW | spacy/pipeline/__init__.py | 1 |
LOW | spacy/pipeline/__init__.py | 2 |
LOW | spacy/pipeline/__init__.py | 3 |
LOW | spacy/pipeline/__init__.py | 4 |
LOW | spacy/pipeline/__init__.py | 5 |
LOW | spacy/pipeline/__init__.py | 6 |
LOW | spacy/pipeline/__init__.py | 6 |
LOW | spacy/pipeline/__init__.py | 6 |
LOW | spacy/pipeline/__init__.py | 7 |
LOW | spacy/pipeline/__init__.py | 8 |
LOW | spacy/pipeline/__init__.py | 9 |
LOW | spacy/pipeline/__init__.py | 10 |
LOW | spacy/pipeline/__init__.py | 11 |
LOW | spacy/pipeline/__init__.py | 12 |
LOW | spacy/pipeline/__init__.py | 13 |
LOW | spacy/pipeline/__init__.py | 14 |
LOW | spacy/pipeline/__init__.py | 15 |
LOW | spacy/pipeline/__init__.py | 16 |
LOW | spacy/pipeline/__init__.py | 17 |
LOW | spacy/pipeline/__init__.py | 18 |
LOW | spacy/pipeline/__init__.py | 19 |
LOW | spacy/pipeline/__init__.py | 20 |
LOW | spacy/pipeline/legacy/__init__.py | 1 |
123 more matches not shown…
Deep Nesting · 116 hits · 116 pts
Severity | File | Line | Snippet
LOW | website/setup/jinja_to_js.py | 458 |
LOW | website/setup/jinja_to_js.py | 491 |
LOW | website/setup/jinja_to_js.py | 651 |
LOW | website/setup/jinja_to_js.py | 675 |
LOW | website/setup/jinja_to_js.py | 886 |
LOW | website/setup/jinja_to_js.py | 1022 |
LOW | website/setup/jinja_to_js.py | 1112 |
LOW | spacy/util.py | 675 |
LOW | spacy/util.py | 940 |
LOW | spacy/util.py | 1810 |
LOW | spacy/language.py | 838 |
LOW | spacy/language.py | 1636 |
LOW | spacy/language.py | 1763 |
LOW | spacy/language.py | 1996 |
LOW | spacy/pipe_analysis.py | 17 |
LOW | spacy/pipe_analysis.py | 81 |
LOW | spacy/scorer.py | 760 |
LOW | spacy/scorer.py | 211 |
LOW | spacy/scorer.py | 262 |
LOW | spacy/scorer.py | 346 |
LOW | spacy/scorer.py | 447 |
LOW | spacy/scorer.py | 583 |
LOW | spacy/scorer.py | 652 |
LOW | spacy/displacy/render.py | 153 |
LOW | spacy/pipeline/functions.py | 82 |
LOW | spacy/pipeline/functions.py | 138 |
LOW | spacy/pipeline/tok2vec.py | 278 |
LOW | spacy/pipeline/lemmatizer.py | 172 |
LOW | spacy/pipeline/edit_tree_lemmatizer.py | 156 |
LOW | spacy/pipeline/edit_tree_lemmatizer.py | 176 |
LOW | spacy/pipeline/edit_tree_lemmatizer.py | 197 |
LOW | spacy/pipeline/entityruler.py | 246 |
LOW | spacy/pipeline/spancat.py | 449 |
LOW | spacy/pipeline/spancat.py | 558 |
LOW | spacy/pipeline/attributeruler.py | 190 |
LOW | spacy/pipeline/span_ruler.py | 322 |
LOW | spacy/pipeline/entity_linker.py | 338 |
LOW | spacy/pipeline/entity_linker.py | 460 |
LOW | spacy/pipeline/span_finder.py | 135 |
LOW | spacy/pipeline/span_finder.py | 217 |
LOW | spacy/pipeline/textcat.py | 262 |
LOW | spacy/pipeline/legacy/entity_linker.py | 139 |
LOW | spacy/pipeline/legacy/entity_linker.py | 225 |
LOW | spacy/pipeline/legacy/entity_linker.py | 311 |
LOW | spacy/training/pretrain.py | 26 |
LOW | spacy/training/initialize.py | 35 |
LOW | spacy/training/initialize.py | 210 |
LOW | spacy/training/batchers.py | 134 |
LOW | spacy/training/augment.py | 164 |
LOW | spacy/training/augment.py | 219 |
LOW | spacy/training/iob_utils.py | 71 |
LOW | spacy/training/iob_utils.py | 194 |
LOW | spacy/training/loop.py | 35 |
LOW | spacy/training/loop.py | 153 |
LOW | spacy/training/loop.py | 373 |
LOW | spacy/training/corpus.py | 84 |
LOW | spacy/training/corpus.py | 184 |
LOW | spacy/training/corpus.py | 212 |
LOW | spacy/training/corpus.py | 261 |
LOW | spacy/training/corpus.py | 311 |
56 more matches not shown…
Cross-File Repetition · 19 hits · 95 pts
Severity | File | Line | Snippet
HIGH | spacy/displacy/render.py | 0 | render complete markup. parsed (list): dependency parses to render. page (bool): render parses wrapped as full html page
HIGH | spacy/displacy/render.py | 0 | render complete markup. parsed (list): dependency parses to render. page (bool): render parses wrapped as full html page
HIGH | spacy/displacy/render.py | 0 | render complete markup. parsed (list): dependency parses to render. page (bool): render parses wrapped as full html page
HIGH | spacy/lang/sv/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/ja/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/el/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/ca/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/nb/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/de/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on doc and span.
HIGH | spacy/lang/ms/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/it/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/pt/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/fi/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/id/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/fr/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/es/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/en/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/fa/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
HIGH | spacy/lang/tr/syntax_iterators.py | 0 | detect base noun phrases from a dependency parse. works on both doc and span.
Self-Referential Comments · 18 hits · 52 pts
Severity | File | Line | Snippet
MEDIUM | website/meta/universe.json | 100 | "# Define the language for the sentence as well as for the spaCy and benepar models",
MEDIUM | website/meta/universe.json | 104 | "# Create the pipeline (note, the required models will be downloaded and installed automatically)",
MEDIUM | website/meta/universe.json | 108 | "# Create the tree from where we are going to extract the desired noun phrases",
MEDIUM | website/meta/universe.json | 1387 | "# Create a new chat bot named Charlie",
MEDIUM | website/meta/universe.json | 3606 | "# Create a new DocBin",
MEDIUM | spacy/language.py | 220 | # Create the default tokenizer from the default config
MEDIUM | spacy/displacy/render.py | 326 | # Create a random ID prefix to make sure parses don't receive the
MEDIUM | spacy/pipeline/legacy/entity_linker.py | 1 | # This file is present to provide a prior version of the EntityLinker component
MEDIUM | spacy/tests/test_displacy.py | 457 | # Create a doc containing an annotated word and an unannotated HTML tag
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 151 | # Create the Entity Linker component and add it to the pipeline
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 746 | # Create the Entity Linker component and add it to the pipeline
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 852 | # Create the NER and EL components and add them to the pipeline
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 933 | # Create the Entity Linker component with the KB from file, and check the final vocab
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 1156 | # Create a ruler to mark entities
MEDIUM | spacy/tests/pipeline/test_entity_linker.py | 1282 | # Create the Entity Linker component and add it to the pipeline
MEDIUM | spacy/cli/debug_data.py | 149 | # Create the gold corpus to be able to better analyze data
MEDIUM | spacy/cli/debug_data.py | 903 | # Creating a data structure that holds the start and
MEDIUM | spacy/lang/lex_attrs.py | 185 | # This function is partially applied so lang code can be passed in
Cross-Language Confusion (JS/TS) · 4 hits · 28 pts
Severity | File | Line | Snippet
HIGH | website/pages/index.tsx | 52 | print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
HIGH | website/pages/index.tsx | 53 | print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])
HIGH | website/pages/index.tsx | 57 | print(entity.text, entity.label_)
HIGH | website/src/widgets/quickstart-models.js | 105 | print([
Excessive Try-Catch Wrapping · 21 hits · 22 pts
Severity | File | Line | Snippet
LOW | setup.py | 136 | except Exception:
LOW | setup.py | 144 | except Exception:
LOW | spacy/util.py | 890 | except Exception:
LOW | spacy/util.py | 1777 | except Exception as e:
LOW | spacy/language.py | 1057 | except Exception as e:
LOW | spacy/language.py | 2417 | except Exception:
LOW | spacy/pipeline/lemmatizer.py | 113 | except Exception as e:
LOW | spacy/pipeline/entityruler.py | 125 | except Exception as e:
LOW | spacy/pipeline/attributeruler.py | 134 | except Exception as e:
LOW | spacy/pipeline/span_ruler.py | 224 | except Exception as e:
LOW | spacy/training/loop.py | 126 | except Exception as e:
LOW | spacy/training/loop.py | 385 | except Exception as e:
LOW | spacy/tests/test_models.py | 163 | except Exception:
LOW | spacy/tests/test_models.py | 173 | except Exception:
LOW | spacy/tests/serialize/test_resource_warning.py | 117 | except Exception as e:
LOW | spacy/tests/serialize/test_resource_warning.py | 131 | except Exception as e:
LOW | spacy/tests/serialize/test_resource_warning.py | 137 | except Exception as e:
LOW | spacy/cli/_util.py | 195 | except Exception as e:
LOW | spacy/cli/info.py | 159 | except Exception:
LOW | spacy/cli/info.py | 197 | except Exception:
LOW | spacy/cli/debug_model.py | 146 | except Exception:
Cross-Language Confusion · 3 hits · 15 pts
Severity | File | Line | Snippet
HIGH | spacy/tests/parser/test_state.py | 28 | state.push()
HIGH | spacy/tests/parser/test_state.py | 32 | state.push()
HIGH | spacy/tests/parser/test_state.py | 45 | state.push()
AI Slop Vocabulary · 9 hits · 14 pts
Severity | File | Line | Snippet
MEDIUM | website/meta/universe.json | 9 | "description": "**[Temporal Expressions Normalization spaCy (TeNs)](https://github.com/iliedorobat/timespan-
MEDIUM | website/meta/universe.json | 5396 | "# floret n-gram embeddings robust to typos",
LOW | spacy/schemas.py | 102 | # binding=True. Here we just use an empty model that allows everything.
LOW | spacy/scorer.py | 724 | # None is indistinct, so we can't just add it to the set
LOW | spacy/training/loggers.py | 120 | # If we don't have a new checkpoint, just return.
LOW | spacy/tests/test_factory_registrations.py | 59 | # For Cython functions, just use a placeholder
LOW | spacy/lang/ja/__init__.py | 77 | # if there's no lemma info (it's an unk) just use the surface
LOW | spacy/lang/ht/lemmatizer.py | 39 | # fallback rule: just return lowercased form
LOW | spacy/ml/_precomputable_affine.py | 44 | # However, we avoid building that array for efficiency -- and just pass
Fake / Example Data · 9 hits · 13 pts
Severity | File | Line | Snippet
LOW | website/meta/universe.json | 744 | "# [{'end': 8, 'start': 0, 'text': 'John Doe', 'type': 'PERSON'}, {'end': 25, 'start': 13, 'text': 'Go D
LOW | spacy/tests/tokenizer/test_whitespace.py | 4 | @pytest.mark.parametrize("text", ["lorem ipsum"])
LOW | spacy/tests/tokenizer/test_whitespace.py | 17 | @pytest.mark.parametrize("text", ["lorem ipsum "])
LOW | spacy/tests/tokenizer/test_tokenizer.py | 317 | text = "Lorem ipsum: 1984."
LOW | spacy/tests/tokenizer/test_tokenizer.py | 350 | text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit
LOW | spacy/tests/tokenizer/test_tokenizer.py | 350 | text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit
LOW | spacy/tests/tokenizer/test_tokenizer.py | 374 | text1 = "Lorem dolor sit amet, consectetur adipiscing elit."
LOW | spacy/tests/tokenizer/test_tokenizer.py | 375 | text2 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
LOW | spacy/tests/tokenizer/test_tokenizer.py | 375 | text2 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
Redundant / Tautological Comments · 6 hits · 7 pts
Severity | File | Line | Snippet
LOW | spacy/util.py | 418 | # Check if language is registered / entry point is available
LOW | spacy/language.py | 2065 | # Check if the path actually exists in the config
LOW | spacy/lang/it/syntax_iterators.py | 57 | elif right_child.dep in np_modifs: # Check if we can expand to right
LOW | spacy/lang/pt/syntax_iterators.py | 56 | elif right_child.dep in np_modifs: # Check if we can expand to right
LOW | spacy/lang/fr/syntax_iterators.py | 58 | elif right_child.dep in np_modifs: # Check if we can expand to right
LOW | spacy/lang/es/syntax_iterators.py | 47 | elif right_child.dep in np_modifs: # Check if we can expand to right
Over-Commented Block · 3 hits · 3 pts
Severity | File | Line | Snippet
LOW | spacy/default_config.cfg | 41 | # Whether to train on sequences with 'gold standard' sentence boundaries
LOW | spacy/lang/nl/stop_words.py | 1 | # The original stop words list (added in f46ffe3) was taken from
LOW | spacy/ml/_precomputable_affine.py | 21 | W = model.get_param("W")
Slop Phrases · 1 hit · 2 pts
Severity | File | Line | Snippet
MEDIUM | website/meta/universe.json | 1768 | "# For instance you can add the BertTone model for classification of sentiment polarity to the pipeline: