nltk/nltk

Adjusted Score: 19.0
Raw Score: 19.0
Time Factor: 100%
Last Push: 2026-07-08
Stars: 14.7K
Language: Python
Lines of Code: 158.7K
Files: 472
Pattern Hits: 2.3K
Scan Date: 2026-07-14
HC Hit Rate: 0.19

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
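
The definitions above can be sketched as a few small functions. This is a minimal illustration, not the scanner's actual code: the function names are hypothetical, the exact cut-off days within "Nov 2022" and "Jan 2024" are assumptions, and the linear ramp between those two dates is purely an assumption, since the text only specifies the 5% and 100% ends.

```python
from datetime import date

# Assumed cut-off days; the text above only says "Nov 2022" and "Jan 2024+".
CHATGPT_LAUNCH = date(2022, 11, 30)
FULL_SIGNAL_FROM = date(2024, 1, 1)
PRE_LAUNCH_FACTOR = 0.05


def time_factor(last_push: date) -> float:
    """Temporal discount multiplier for a repository's last push date."""
    if last_push < CHATGPT_LAUNCH:
        return PRE_LAUNCH_FACTOR
    if last_push >= FULL_SIGNAL_FROM:
        return 1.0
    # ASSUMPTION: linear ramp between the two documented endpoints.
    elapsed = (last_push - CHATGPT_LAUNCH).days
    span = (FULL_SIGNAL_FROM - CHATGPT_LAUNCH).days
    return PRE_LAUNCH_FACTOR + (1.0 - PRE_LAUNCH_FACTOR) * elapsed / span


def adjusted_score(total_points: float, lines_of_code: int, factor: float) -> float:
    """Points per 1,000 lines of code, multiplied by the time factor."""
    return total_points / (lines_of_code / 1000) * factor


def hc_hit_rate(critical_hits: int, high_hits: int, files: int) -> float:
    """High+Critical pattern hits per analysed file."""
    return (critical_hits + high_hits) / files


# Last push 2026-07-08, as reported for this repository.
print(time_factor(date(2026, 7, 8)))           # 1.0
# 0 CRITICAL and 90 HIGH hits over 472 files, as reported for this repository.
print(round(hc_hit_rate(0, 90, 472), 2))       # 0.19
# Illustrative numbers only, not taken from this report.
print(adjusted_score(3000, 150_000, 1.0))      # 20.0
```

With the figures reported for this repository, the HC Hit Rate works out to 90 / 472, which rounds to the 0.19 shown above.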

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 0 · HIGH 90 · MEDIUM 196 · LOW 2001
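
As a quick check on these figures, the four counts can be combined with the coefficients from the description. The sketch below computes only a base weighted sum; it assumes no context or density multipliers, so it is not expected to match the reported score. The variable names are illustrative.

```python
# Severity coefficients as stated in the description above.
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}

# Hit counts as reported for this repository.
hit_counts = {"CRITICAL": 0, "HIGH": 90, "MEDIUM": 196, "LOW": 2001}

total_hits = sum(hit_counts.values())
base_points = sum(SEVERITY_COEFFICIENT[sev] * n for sev, n in hit_counts.items())

print(total_hits)   # 2287
print(base_points)  # 2843
```

The total of 2287 agrees with the number of pattern matches reported under Pattern Findings.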

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner identified 2287 distinct pattern matches across 17 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Each category table lists the file path, line number, and lexical context (CODE, COMMENT, or STRING) of every match shown, together with the code snippet where one is available.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
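
The weighting described above can be sketched as follows. This is an illustration under stated assumptions rather than the scanner's implementation: the `Match` type and function names are hypothetical, a "10-line window" is read as matches in the same file whose line numbers differ by fewer than 10, and the severity coefficients are taken from the Severity Breakdown section.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_WEIGHT = 1.5        # comment-context matches
CLUSTER_WEIGHT = 1.5        # matches inside a dense cluster
CLUSTER_MIN_MATCHES = 3
CLUSTER_WINDOW_LINES = 10   # ASSUMPTION: line numbers differ by fewer than 10


@dataclass(frozen=True)
class Match:
    path: str
    line: int
    severity: str
    context: str  # "CODE", "COMMENT" or "STRING"


def clustered_flags(matches):
    """Flag each match that shares a window with at least two other matches."""
    by_file = defaultdict(list)
    for idx, match in enumerate(matches):
        by_file[match.path].append(idx)

    flags = [False] * len(matches)
    for indices in by_file.values():
        indices.sort(key=lambda i: matches[i].line)
        start = 0
        for end in range(len(indices)):
            # Shrink the window until it spans fewer than 10 lines.
            while (matches[indices[end]].line
                   - matches[indices[start]].line) >= CLUSTER_WINDOW_LINES:
                start += 1
            if end - start + 1 >= CLUSTER_MIN_MATCHES:
                for k in range(start, end + 1):
                    flags[indices[k]] = True
    return flags


def score(matches):
    """Sum severity coefficients with context and density multipliers applied."""
    total = 0.0
    for match, clustered in zip(matches, clustered_flags(matches)):
        points = float(SEVERITY_COEFFICIENT[match.severity])
        if match.context == "COMMENT":
            points *= COMMENT_WEIGHT
        if clustered:
            points *= CLUSTER_WEIGHT
        total += points
    return total


# Hypothetical matches: three close together, one isolated.
example = [
    Match("example.py", 14, "MEDIUM", "COMMENT"),
    Match("example.py", 16, "MEDIUM", "COMMENT"),
    Match("example.py", 20, "LOW", "CODE"),
    Match("example.py", 43, "MEDIUM", "COMMENT"),
]
print(clustered_flags(example))  # [True, True, True, False]
print(score(example))            # 13.5
```

In this example the two clustered comment matches score 2 × 1.5 × 1.5 = 4.5 each, the clustered code match scores 1 × 1.5 = 1.5, and the isolated comment match scores 2 × 1.5 = 3.0.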

Hyper-Verbose Identifiers (695 hits · 759 pts)

Severity	File	Line	Snippet	Context
LOW	nltk/tgrep.py	298	def _tgrep_node_literal_value(node):	CODE
LOW	nltk/tgrep.py	382	def _tgrep_nltk_tree_pos_action(_s, _l, tokens):	CODE
LOW	nltk/tgrep.py	629	def _tgrep_conjunction_action(_s, _l, tokens, join_char="&"):	CODE
LOW	nltk/tgrep.py	664	def _tgrep_segmented_pattern_action(_s, _l, tokens):	CODE
LOW	nltk/tgrep.py	706	def _tgrep_node_label_use_action(_s, _l, tokens):	CODE
LOW	nltk/tgrep.py	725	def _tgrep_node_label_pred_use_action(_s, _l, tokens):	CODE
LOW	nltk/tgrep.py	754	def _tgrep_bind_node_label_action(_s, _l, tokens):	CODE
LOW	nltk/tgrep.py	793	def _tgrep_rel_disjunction_action(_s, _l, tokens):	CODE
LOW⚡	nltk/downloader.py	1456	def _simple_interactive_download(self, args):	CODE
LOW	nltk/downloader.py	1486	def _simple_interactive_update(self):	CODE
LOW	nltk/downloader.py	1537	def _simple_interactive_config(self):	CODE
LOW	nltk/util.py	385	def unweighted_minimum_spanning_digraph(tree, children=iter, shapes=None, attr=None):	CODE
LOW	nltk/util.py	544	def acyclic_branches_depth_first(	CODE
LOW	nltk/util.py	639	def unweighted_minimum_spanning_dict(tree, children=iter):	CODE
LOW	nltk/util.py	684	def unweighted_minimum_spanning_tree(tree, children=iter):	CODE
LOW	nltk/pathsec.py	366	def _resolve_and_validate_host(host, port):	CODE
LOW	nltk/grammar.py	725	def is_flexible_chomsky_normal_form(self):	CODE
LOW	nltk/grammar.py	1512	def _read_dependency_production(s):	CODE
LOW	nltk/featstruct.py	1707	def _apply_forwards_to_bindings(forward, bindings):	CODE
LOW	nltk/data.py	61	def _assert_no_encoded_bypass(name, error_label=None):	CODE
LOW	nltk/data.py	94	def _reject_unsafe_no_protocol(resource_url):	CODE
LOW	nltk/data.py	537	def _check_decompression_bomb(info):	CODE
LOW	nltk/tree/tree.py	423	def treeposition_spanning_leaves(self, start, end):	CODE
LOW	nltk/metrics/paice.py	26	def get_words_from_dictionary(lemmas):	CODE
LOW	nltk/metrics/paice.py	265	def _get_truncation_coordinates(self, cutlength=0):	CODE
LOW	nltk/metrics/agreement.py	211	def _chance_corrected_agreement(self, observed, expected):	CODE
LOW	nltk/app/concordance_app.py	321	def search_enter_keypress_handler(self, *event):	CODE
LOW	nltk/app/concordance_app.py	428	def handle_error_loading_corpus(self, event):	CODE
LOW	nltk/app/chunkparser_app.py	769	def _adaptively_modify_eval_chunk(self, t):	CODE
LOW	nltk/app/chunkparser_app.py	1215	def _syntax_highlight_grammar(self, grammar):	CODE
LOW	nltk/app/rdparser_app.py	986	def _animate_match_backtrack_frame(self, frame, widget, dy):	CODE
LOW	nltk/app/chartparser_app.py	2517	def bottom_up_leftcorner_strategy(self, *e):	CODE
LOW	nltk/app/collocations_app.py	259	def handle_error_loading_corpus(self, event):	CODE
LOW	nltk/app/wordnet_app.py	205	def get_unique_counter_from_url(sp):	CODE
LOW	nltk/app/wordnet_app.py	903	def get_static_welcome_message():	STRING
LOW	nltk/classify/util.py	171	def binary_names_demo_features(name):	CODE
LOW⚡	nltk/classify/maxent.py	1124	def calculate_empirical_fcount(train_toks, encoding):	CODE
LOW⚡	nltk/classify/maxent.py	1134	def calculate_estimated_fcount(classifier, train_toks, encoding):	CODE
LOW	nltk/classify/maxent.py	206	def most_informative_features(self, n=10):	CODE
LOW	nltk/classify/maxent.py	220	def show_most_informative_features(self, n=10, show="all"):	CODE
LOW	nltk/classify/maxent.py	1028	def train_maxent_classifier_with_gis(	CODE
LOW	nltk/classify/maxent.py	1152	def train_maxent_classifier_with_iis(	CODE
LOW	nltk/classify/maxent.py	1405	def train_maxent_classifier_with_megam(	STRING
LOW	nltk/classify/naivebayes.py	124	def show_most_informative_features(self, n=10):	CODE
LOW	nltk/classify/naivebayes.py	154	def most_informative_features(self, n=100):	CODE
LOW	nltk/test/test_filestring_sandbox.py	21	def test_rejects_parent_traversal(tmp_path):	CODE
LOW	nltk/test/test_filestring_sandbox.py	36	def test_rejects_symlink_escape(tmp_path):	CODE
LOW	nltk/test/test_filestring_sandbox.py	56	def test_preserves_file_like_objects():	CODE
LOW	nltk/test/unit/test_classify.py	31	def assert_classifier_correct(algorithm):	CODE
LOW	nltk/test/unit/test_cistem.py	37	def test_stem_and_segment_examples_preserved():	CODE
LOW	nltk/test/unit/test_cistem.py	48	def test_empty_and_short_words():	CODE
LOW	nltk/test/unit/test_cistem.py	71	def test_long_word_stems_in_linear_time():	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	15	def test_java_call_options_do_not_mutate_global_java_options(monkeypatch):	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	51	def test_stanford_tokenizer_cleans_temp_file_when_java_raises(monkeypatch):	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	80	def test_stanford_tokenizer_raises_unlink_error_after_java_success(monkeypatch):	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	108	def test_stanford_tokenizer_preserves_java_error_when_cleanup_also_fails(	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	138	def test_stanford_parser_cleans_temp_file_when_java_raises(monkeypatch):	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	170	def test_stanford_tagger_cleans_temp_file_when_java_raises(monkeypatch):	CODE
LOW	nltk/test/unit/test_stanford_java_wrappers.py	199	def test_stanford_segmenter_cleans_temp_file_when_execute_raises():	CODE
LOW⚡	nltk/test/unit/test_zipbomb_security.py	36	def test_ratio_guard_blocks_bomb(tmp_path):	CODE
635 more matches not shown…

Unused Imports (640 hits · 569 pts)

Severity	File	Line	Context
LOW	tools/find_deprecated.py	30	CODE
LOW	nltk/downloader.py	171	CODE
LOW	nltk/util.py	30	CODE
LOW	nltk/util.py	31	CODE
LOW	nltk/util.py	31	CODE
LOW	nltk/__init__.py	103	CODE
LOW	nltk/__init__.py	133	CODE
LOW	nltk/__init__.py	134	CODE
LOW	nltk/__init__.py	134	CODE
LOW	nltk/__init__.py	135	CODE
LOW	nltk/__init__.py	136	CODE
LOW	nltk/__init__.py	137	CODE
LOW	nltk/__init__.py	138	CODE
LOW	nltk/__init__.py	139	CODE
LOW	nltk/__init__.py	140	CODE
LOW	nltk/__init__.py	146	CODE
LOW	nltk/__init__.py	147	CODE
LOW	nltk/__init__.py	148	CODE
LOW	nltk/__init__.py	149	CODE
LOW	nltk/__init__.py	150	CODE
LOW	nltk/__init__.py	151	CODE
LOW	nltk/__init__.py	152	CODE
LOW	nltk/__init__.py	153	CODE
LOW	nltk/__init__.py	154	CODE
LOW	nltk/__init__.py	155	CODE
LOW	nltk/__init__.py	156	CODE
LOW	nltk/__init__.py	179	CODE
LOW	nltk/__init__.py	179	CODE
LOW	nltk/__init__.py	199	CODE
LOW	nltk/__init__.py	199	CODE
LOW	nltk/__init__.py	199	CODE
LOW	nltk/__init__.py	199	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	200	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	201	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	202	CODE
LOW	nltk/__init__.py	107	CODE
LOW	nltk/__init__.py	173	CODE
LOW	nltk/__init__.py	177	CODE
LOW	nltk/__init__.py	186	CODE
LOW	nltk/grammar.py	1577	CODE
LOW	nltk/book.py	9	CODE
LOW	nltk/book.py	18	CODE
LOW	nltk/book.py	20	CODE
LOW	nltk/collocations.py	36	CODE
LOW	nltk/collocations.py	36	CODE
580 more matches not shown…
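
For orientation, a naive unused-import check can be sketched with Python's `ast` module. How the scanner itself detects unused imports is not stated in this report, so the function below is only an assumption about the general technique, and its name is hypothetical.

```python
import ast


def unused_imports(source: str):
    """Return (line, name) pairs for imported names never referenced in the module."""
    tree = ast.parse(source)

    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind no visible names
                    imported[alias.asname or alias.name] = node.lineno

    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


module = "import os\nimport sys\nfrom re import compile as rc\n\nprint(sys.argv)\n"
print(unused_imports(module))             # [(1, 'os'), (3, 'rc')]

# A package __init__-style re-export is reported too, since the name is never used locally.
print(unused_imports("from .tree import Tree\n"))  # [(1, 'Tree')]
```

A check of this kind reports names that a package imports only to re-export them, which may be worth keeping in mind when reading the `nltk/__init__.py` rows above.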

Deep Nesting (374 hits · 348 pts)

Severity	File	Line	Context
LOW	tools/find_deprecated.py	116	CODE
LOW	tools/find_deprecated.py	151	CODE
LOW	nltk/internals.py	574	CODE
LOW	nltk/internals.py	792	CODE
LOW	nltk/internals.py	1197	CODE
LOW	nltk/tgrep.py	322	CODE
LOW	nltk/tgrep.py	398	CODE
LOW	nltk/downloader.py	2614	CODE
LOW	nltk/downloader.py	2876	CODE
LOW	nltk/downloader.py	526	CODE
LOW	nltk/downloader.py	680	CODE
LOW	nltk/downloader.py	991	CODE
LOW	nltk/downloader.py	1106	CODE
LOW	nltk/downloader.py	1208	CODE
LOW	nltk/downloader.py	1415	CODE
LOW	nltk/downloader.py	1456	CODE
LOW	nltk/downloader.py	1486	CODE
LOW	nltk/downloader.py	1537	CODE
LOW	nltk/downloader.py	1950	CODE
LOW	nltk/downloader.py	2046	CODE
LOW	nltk/downloader.py	2114	CODE
LOW	nltk/downloader.py	2205	CODE
LOW	nltk/downloader.py	2439	CODE
LOW	nltk/toolbox.py	274	CODE
LOW	nltk/toolbox.py	461	CODE
LOW	nltk/util.py	221	CODE
LOW	nltk/util.py	300	CODE
LOW	nltk/util.py	342	CODE
LOW	nltk/util.py	431	CODE
LOW	nltk/util.py	471	CODE
LOW	nltk/util.py	544	CODE
LOW	nltk/util.py	639	CODE
LOW	nltk/util.py	1172	CODE
LOW	nltk/pathsec.py	72	CODE
LOW	nltk/pathsec.py	182	CODE
LOW	nltk/pathsec.py	301	CODE
LOW	nltk/pathsec.py	468	CODE
LOW	nltk/grammar.py	1362	CODE
LOW	nltk/grammar.py	1427	CODE
LOW	nltk/grammar.py	563	CODE
LOW	nltk/grammar.py	842	CODE
LOW	nltk/cli.py	42	CODE
LOW	nltk/collections.py	63	CODE
LOW	nltk/collections.py	396	CODE
LOW	nltk/probability.py	693	CODE
LOW	nltk/featstruct.py	1169	CODE
LOW	nltk/featstruct.py	1254	CODE
LOW	nltk/featstruct.py	1586	CODE
LOW	nltk/featstruct.py	2614	CODE
LOW	nltk/featstruct.py	154	CODE
LOW	nltk/featstruct.py	642	CODE
LOW	nltk/featstruct.py	680	CODE
LOW	nltk/featstruct.py	698	CODE
LOW	nltk/featstruct.py	777	CODE
LOW	nltk/featstruct.py	821	CODE
LOW	nltk/featstruct.py	957	CODE
LOW	nltk/featstruct.py	973	CODE
LOW	nltk/featstruct.py	991	CODE
LOW	nltk/featstruct.py	1049	CODE
LOW	nltk/featstruct.py	2326	CODE
314 more matches not shown…

Cross-File Repetition (62 hits · 310 pts)

Severity	File	Line	Snippet	Context
HIGH	nltk/grammar.py	0	s -> np vp [1.0] np -> det n [0.5] | np pp [0.25] | 'john' [0.1] | 'i' [0.15] det -> 'the' [0.8] | 'my' [0.2] n -> 'man'	STRING
HIGH	nltk/parse/pchart.py	0	s -> np vp [1.0] np -> det n [0.5] | np pp [0.25] | 'john' [0.1] | 'i' [0.15] det -> 'the' [0.8] | 'my' [0.2] n -> 'man'	STRING
HIGH	nltk/parse/viterbi.py	0	s -> np vp [1.0] np -> det n [0.5] | np pp [0.25] | 'john' [0.1] | 'i' [0.15] det -> 'the' [0.8] | 'my' [0.2] n -> 'man'	STRING
HIGH	nltk/grammar.py	0	s -> np vp [1.0] vp -> v np [.59] vp -> v [.40] vp -> vp pp [.01] np -> det n [.41] np -> name [.28] np -> np pp [.31] p	STRING
HIGH	nltk/parse/pchart.py	0	s -> np vp [1.0] vp -> v np [.59] vp -> v [.40] vp -> vp pp [.01] np -> det n [.41] np -> name [.28] np -> np pp [.31] p	STRING
HIGH	nltk/parse/viterbi.py	0	s -> np vp [1.0] vp -> v np [.59] vp -> v [.40] vp -> vp pp [.01] np -> det n [.41] np -> name [.28] np -> np pp [.31] p	STRING
HIGH	nltk/app/chunkparser_app.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/app/rdparser_app.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/app/srparser_app.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/app/chartparser_app.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/sem/drt_glue_demo.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/draw/tree.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/draw/util.py	0	enter the tkinter mainloop. this function must be called if this window is created from a non-interactive program (e.g.	STRING
HIGH	nltk/test/unit/test_reviews_security.py	0	run ``target(result_q, *args)`` in a spawned process with a deadline. returns ``(finished, status, payload)``. if the wo	STRING
HIGH	nltk/test/unit/test_chunk_redos_security.py	0	run ``target(result_q, *args)`` in a spawned process with a deadline. returns ``(finished, status, payload)``. if the wo	STRING
HIGH	nltk/test/unit/test_texttiling_security.py	0	run ``target(result_q, *args)`` in a spawned process with a deadline. returns ``(finished, status, payload)``. if the wo	STRING
HIGH	nltk/test/unit/test_inference_security.py	0	run ``target(result_q, *args)`` in a spawned process with a deadline. returns ``(finished, status, payload)``. if the wo	STRING
HIGH	nltk/translate/ibm2.py	0	probability of target sentence and an alignment given the source sentence	STRING
HIGH	nltk/translate/ibm3.py	0	probability of target sentence and an alignment given the source sentence	STRING
HIGH	nltk/translate/ibm4.py	0	probability of target sentence and an alignment given the source sentence	STRING
HIGH	nltk/translate/ibm5.py	0	probability of target sentence and an alignment given the source sentence	STRING
HIGH	nltk/translate/ibm1.py	0	probability of target sentence and an alignment given the source sentence	STRING
HIGH	nltk/translate/ibm2.py	0	data object to store counts of various parameters during training. includes counts for vacancies.	STRING
HIGH	nltk/translate/ibm3.py	0	data object to store counts of various parameters during training. includes counts for vacancies.	STRING
HIGH	nltk/translate/ibm4.py	0	data object to store counts of various parameters during training. includes counts for vacancies.	STRING
HIGH	nltk/translate/ibm5.py	0	data object to store counts of various parameters during training. includes counts for vacancies.	STRING
HIGH	nltk/sem/glue.py	0	pick an alphabetic character as identifier for an entity in the model. :param value: where to index into the list of cha	STRING
HIGH	nltk/sem/lfg.py	0	pick an alphabetic character as identifier for an entity in the model. :param value: where to index into the list of cha	STRING
HIGH	nltk/inference/mace.py	0	pick an alphabetic character as identifier for an entity in the model. :param value: where to index into the list of cha	STRING
HIGH	nltk/corpus/reader/reviews.py	0	:param root: the root directory for the corpus. :param fileids: a list or regexp specifying the fileids in the corpus. :	STRING
HIGH	nltk/corpus/reader/categorized_sents.py	0	:param root: the root directory for the corpus. :param fileids: a list or regexp specifying the fileids in the corpus. :	STRING
HIGH	nltk/corpus/reader/comparative_sents.py	0	:param root: the root directory for the corpus. :param fileids: a list or regexp specifying the fileids in the corpus. :	STRING
HIGH	nltk/corpus/reader/pros_cons.py	0	:param root: the root directory for the corpus. :param fileids: a list or regexp specifying the fileids in the corpus. :	STRING
HIGH	nltk/corpus/reader/reviews.py	0	return all words and punctuation symbols in the corpus or in the specified files/categories. :param fileids: a list or r	STRING
HIGH	nltk/corpus/reader/categorized_sents.py	0	return all words and punctuation symbols in the corpus or in the specified files/categories. :param fileids: a list or r	STRING
HIGH	nltk/corpus/reader/comparative_sents.py	0	return all words and punctuation symbols in the corpus or in the specified files/categories. :param fileids: a list or r	STRING
HIGH	nltk/corpus/reader/pros_cons.py	0	return all words and punctuation symbols in the corpus or in the specified files/categories. :param fileids: a list or r	STRING
HIGH	nltk/corpus/reader/aligned.py	0	:return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str)	STRING
HIGH	nltk/corpus/reader/plaintext.py	0	:return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str)	STRING
HIGH	nltk/corpus/reader/chunked.py	0	:return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str)	STRING
HIGH	nltk/corpus/reader/tagged.py	0	:return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str)	STRING
HIGH	nltk/corpus/reader/semcor.py	0	:return: the given file(s) as a list of words and punctuation symbols. :rtype: list(str)	STRING
HIGH	nltk/corpus/reader/aligned.py	0	:return: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings. :rtype: list(li	STRING
HIGH	nltk/corpus/reader/plaintext.py	0	:return: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings. :rtype: list(li	STRING
HIGH	nltk/corpus/reader/chunked.py	0	:return: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings. :rtype: list(li	STRING
HIGH	nltk/corpus/reader/tagged.py	0	:return: the given file(s) as a list of sentences or utterances, each encoded as a list of word strings. :rtype: list(li	STRING
HIGH	nltk/corpus/reader/plaintext.py	0	:return: the given file(s) as a list of chapters, each encoded as a list of sentences, which are in turn encoded as list	STRING
HIGH	nltk/corpus/reader/chunked.py	0	:return: the given file(s) as a list of chapters, each encoded as a list of sentences, which are in turn encoded as list	STRING
HIGH	nltk/corpus/reader/tagged.py	0	:return: the given file(s) as a list of chapters, each encoded as a list of sentences, which are in turn encoded as list	STRING
HIGH	nltk/corpus/reader/plaintext.py	0	initialize the corpus reader. categorization arguments (``cat_pattern``, ``cat_map``, and ``cat_file``) are passed to th	STRING
HIGH	nltk/corpus/reader/markdown.py	0	initialize the corpus reader. categorization arguments (``cat_pattern``, ``cat_map``, and ``cat_file``) are passed to th	STRING
HIGH	nltk/corpus/reader/tagged.py	0	initialize the corpus reader. categorization arguments (``cat_pattern``, ``cat_map``, and ``cat_file``) are passed to th	STRING
HIGH	nltk/parse/recursivedescent.py	0	create a new ``viterbiparser`` parser, that uses ``grammar`` to parse texts. :type grammar: pcfg :param grammar: the gra	STRING
HIGH	nltk/parse/shiftreduce.py	0	create a new ``viterbiparser`` parser, that uses ``grammar`` to parse texts. :type grammar: pcfg :param grammar: the gra	STRING
HIGH	nltk/parse/viterbi.py	0	create a new ``viterbiparser`` parser, that uses ``grammar`` to parse texts. :type grammar: pcfg :param grammar: the gra	STRING
HIGH	nltk/parse/recursivedescent.py	0	set the level of tracing output that should be generated when parsing a text. :type trace: int :param trace: the trace l	STRING
HIGH	nltk/parse/pchart.py	0	set the level of tracing output that should be generated when parsing a text. :type trace: int :param trace: the trace l	STRING
HIGH	nltk/parse/shiftreduce.py	0	set the level of tracing output that should be generated when parsing a text. :type trace: int :param trace: the trace l	STRING
HIGH	nltk/parse/viterbi.py	0	set the level of tracing output that should be generated when parsing a text. :type trace: int :param trace: the trace l	STRING
HIGH	nltk/inference/tableau.py	0	:param goal: input expression to prove :type goal: sem.expression :param assumptions: input expressions to use as assump	STRING
2 more matches not shown…
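
The snippets in this table are lower-cased and whitespace-collapsed, which suggests that string literals are normalised before being compared across files. A minimal sketch of that idea for docstrings follows. It is not the scanner's implementation: the function names, the three-file threshold, and the minimum length are all assumptions chosen for illustration.

```python
import ast
from collections import defaultdict

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def normalise(text: str) -> str:
    """Lower-case the text and collapse all runs of whitespace."""
    return " ".join(text.lower().split())


def repeated_docstrings(sources, min_files=3, min_length=40):
    """Map each normalised docstring to the files that share it.

    `sources` maps a file path to its source text. Both thresholds are
    assumptions, not values documented for the scanner.
    """
    seen = defaultdict(set)
    for path, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, DOCSTRING_OWNERS):
                doc = ast.get_docstring(node)
                if doc and len(normalise(doc)) >= min_length:
                    seen[normalise(doc)].add(path)
    return {text: sorted(paths) for text, paths in seen.items()
            if len(paths) >= min_files}


# Hypothetical files that share one docstring, modelled on the ibm*.py rows above.
body = (
    "def prob(self):\n"
    '    """Probability of target sentence and an alignment\n'
    '    given the source sentence"""\n'
)
sources = {"a.py": body, "b.py": body, "c.py": body, "d.py": "x = 1\n"}
for text, paths in repeated_docstrings(sources).items():
    print(paths, text)
```

Grouping on the normalised text means that differences in indentation, line wrapping, or capitalisation do not hide a shared docstring.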

Decorative Section Separators (81 hits · 290 pts)

Severity	File	Line	Snippet	Context
MEDIUM	nltk/langnames.py	131	# =======================================================================	COMMENT
MEDIUM	nltk/langnames.py	257	# ======================================================================	COMMENT
MEDIUM	nltk/downloader.py	70	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/downloader.py	935	# ---------------------------------------------------	COMMENT
MEDIUM	nltk/downloader.py	2678	# -----------------------------------------------------------	COMMENT
MEDIUM	nltk/tabdata.py	60	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/tabdata.py	62	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/tabdata.py	98	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/tabdata.py	100	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/data.py	282	# ----------------------------------------------------------------------	STRING
MEDIUM	nltk/data.py	284	# ----------------------------------------------------------------------	STRING
MEDIUM	nltk/chunk/named_entity.py	317	# ======================================================================================	COMMENT
MEDIUM	nltk/chunk/named_entity.py	359	# ======================================================================================	COMMENT
MEDIUM⚡	nltk/test/unit/test_aline.py	87	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_aline.py	89	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_distance.py	168	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_distance.py	170	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_distance.py	203	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_distance.py	205	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	144	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	146	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	187	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	189	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	426	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_distance.py	428	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_pathsec.py	594	# ----------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_pathsec.py	596	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_pathsec.py	534	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_pathsec.py	536	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_pathsec.py	568	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_pathsec.py	570	# ----------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	35	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	37	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	71	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	73	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	82	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	84	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	108	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_conll_cmudict_security.py	110	# --------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_corpus_util.py	16	# ----------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_corpus_util.py	18	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_corpus_util.py	79	# ----------------------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_corpus_util.py	81	# ----------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	14	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	16	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_verbnet.py	43	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_verbnet.py	45	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	86	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	89	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	203	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	205	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	217	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_verbnet.py	219	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_verbnet.py	310	# ---------------------------------------------------------	COMMENT
MEDIUM	nltk/test/unit/test_verbnet.py	312	# ---------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_data_security.py	245	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/test_data_security.py	247	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/translate/test_meteor.py	25	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	nltk/test/unit/translate/test_meteor.py	27	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	nltk/sem/drt.py	1061	# ==========================================================	COMMENT
21 more matches not shown…

Self-Referential Comments (74 hits · 186 pts)

Severity	File	Line	Snippet	Context
MEDIUM	tools/find_deprecated.py	55	# Define a regexp to search for deprecated definitions.	COMMENT
MEDIUM	nltk/downloader.py	1021	# Define a helper function for displaying output:	COMMENT
MEDIUM	nltk/downloader.py	1683	# Create the main window.	COMMENT
MEDIUM	nltk/downloader.py	1730	# Create the top-level frame structures	COMMENT
MEDIUM	nltk/downloader.py	1753	# Create the tabs	COMMENT
MEDIUM	nltk/downloader.py	1762	# Create the table.	COMMENT
MEDIUM	nltk/downloader.py	1868	# Create a menu to control which columns of the table are	COMMENT
MEDIUM	nltk/downloader.py	1883	# Create a sort menu	COMMENT
MEDIUM	nltk/downloader.py	2383	# Create a new data server object for the download operation,	COMMENT
MEDIUM	nltk/toolbox.py	325	"""This class is the base class for settings files."""	STRING
MEDIUM	nltk/lazyimport.py	1	# This module is from mx/DateTime/LazyModule.py and is	COMMENT
MEDIUM	nltk/text.py	216	# Create the pretty lines with the query_word in the middle.	COMMENT
MEDIUM	nltk/text.py	223	# Create the ConcordanceLine	COMMENT
MEDIUM	nltk/text.py	619	# Create the model when using it the first time.	COMMENT
MEDIUM	nltk/probability.py	1176	# Create a heldout probability distribution for each pair of	COMMENT
MEDIUM	nltk/probability.py	1325	# This method is problematic because the situation ``N(c+1) == 0``	COMMENT
MEDIUM	nltk/featstruct.py	2251	# Create the new feature structure	COMMENT
MEDIUM	nltk/tree/transforms.py	128	# This method is 7x faster which helps when parsing 40,000 sentences.	STRING
MEDIUM	nltk/app/rdparser_app.py	113	# Create the basic frames.	COMMENT
MEDIUM	nltk/app/srparser_app.py	122	# Create the basic frames.	COMMENT
MEDIUM	nltk/app/chartparser_app.py	449	# Create a widget for it.	COMMENT
MEDIUM	nltk/app/chartparser_app.py	1054	# Create the chart canvas.	COMMENT
MEDIUM	nltk/app/chartparser_app.py	1059	# Create the sentence canvas.	COMMENT
MEDIUM	nltk/app/chartparser_app.py	1070	# Create the tree canvas.	COMMENT
MEDIUM	nltk/app/chartparser_app.py	1789	# Create the root window.	COMMENT
MEDIUM⚡	nltk/classify/util.py	243	# Create a list of male names to be used as positive-labeled examples for training	COMMENT
MEDIUM⚡	nltk/classify/util.py	246	# Create a list of male and female names to be used as unlabeled examples	COMMENT
MEDIUM⚡	nltk/classify/util.py	249	# Create a test set with correctly-labeled male and female names	COMMENT
MEDIUM	nltk/classify/__init__.py	49	>>> # Define a feature detector function.	STRING
MEDIUM	nltk/classify/positivenaivebayes.py	142	# Create the P(label) distribution.	COMMENT
MEDIUM	nltk/classify/positivenaivebayes.py	147	# Create the P(fval|label, fname) distribution.	COMMENT
MEDIUM	nltk/classify/naivebayes.py	235	# Create the P(label) distribution	COMMENT
MEDIUM	nltk/classify/naivebayes.py	238	# Create the P(fval|label, fname) distribution	COMMENT
MEDIUM	nltk/test/unit/test_corpus_reader.py	24	# Create a symlink inside corpus_root that points outside corpus_root	COMMENT
MEDIUM	nltk/sem/drt.py	84	"""This method is intended to be overridden for logics that	STRING
MEDIUM	nltk/sem/util.py	140	# Initialize a variable assignment with parameter ``dom``	COMMENT
MEDIUM⚡	nltk/sem/logic.py	1944	"""This class represents implications"""	STRING
MEDIUM⚡	nltk/sem/logic.py	1951	"""This class represents biconditionals"""	STRING
MEDIUM⚡	nltk/sem/logic.py	1958	"""This class represents equality expressions like "(x = y)"."""	STRING
MEDIUM	nltk/sem/logic.py	323	"""This method is intended to be overridden for logics that	STRING
MEDIUM	nltk/sem/logic.py	1473	"""This class represents a variable to be used as a predicate or entity"""	STRING
MEDIUM	nltk/sem/logic.py	1545	"""This class represents variables that take the form of a single lowercase	STRING
MEDIUM	nltk/sem/logic.py	1575	"""This class represents variables that take the form of a single uppercase	STRING
MEDIUM	nltk/sem/logic.py	1590	"""This class represents variables that take the form of a single lowercase	STRING
MEDIUM	nltk/sem/logic.py	1597	"""This class represents variables that do not take the form of a single	STRING
MEDIUM	nltk/sem/logic.py	1918	"""This class represents conjunctions"""	STRING
MEDIUM	nltk/sem/logic.py	1931	"""This class represents disjunctions"""	STRING
MEDIUM	nltk/sem/drt_glue_demo.py	55	# Create the basic frames.	COMMENT
MEDIUM	nltk/corpus/europarl_raw.py	13	# Create a new corpus reader instance for each European language	COMMENT
MEDIUM	nltk/parse/pchart.py	508	# Define a list of parsers. We'll use all parsers.	COMMENT
MEDIUM	nltk/parse/chart.py	639	# Create the index.	COMMENT
MEDIUM	nltk/parse/nonprojectivedependencyparser.py	506	# Create a new node v_n+1 with address = len(nodes) + 1	COMMENT
MEDIUM	nltk/parse/featurechart.py	201	# Create the index.	COMMENT
MEDIUM	nltk/parse/featurechart.py	285	# Create a copy of the bindings.	STRING
MEDIUM	nltk/parse/featurechart.py	298	# Create a copy of the bindings.	STRING
MEDIUM	nltk/parse/earleychart.py	103	# Create the index.	COMMENT
MEDIUM	nltk/parse/earleychart.py	153	# Create the index.	COMMENT
MEDIUM	nltk/draw/cfg.py	185	# Create the top-level window.	STRING
MEDIUM	nltk/draw/cfg.py	248	# Create the basic Text widget & scrollbar.	STRING
MEDIUM	nltk/draw/cfg.py	582	# Create the basic frames	STRING
14 more matches not shown…

Cross-Language Confusion: 28 hits · 160 pts

Severity	File	Line	Snippet	Context
HIGH	nltk/featstruct.py	537	Return True if ``self`` subsumes ``other``. I.e., return true	STRING
HIGH	nltk/featstruct.py	1489	contain the unified value, the value of ``fstruct2`` is undefined,	STRING
HIGH	nltk/cluster/util.py	157	queue.push((priority, node))	CODE
HIGH	nltk/app/chartparser_app.py	1237	edgelen = max(edge.length(), 1)	CODE
HIGH⚡	nltk/classify/maxent.py	1125	fcount = numpy.zeros(encoding.length(), "d")	CODE
HIGH⚡	nltk/classify/maxent.py	1135	fcount = numpy.zeros(encoding.length(), "d")	CODE
HIGH	nltk/classify/maxent.py	118	assert encoding.length() == len(weights)	CODE
HIGH	nltk/classify/maxent.py	130	assert self._encoding.length() == len(new_weights)	CODE
HIGH	nltk/classify/maxent.py	239	self._encoding.length(),	CODE
HIGH	nltk/classify/maxent.py	1346	deltas = numpy.ones(encoding.length(), "d")	STRING
HIGH	nltk/classify/maxent.py	1351	A = numpy.zeros((len(nfmap), encoding.length()), "d")	STRING
HIGH	nltk/classify/maxent.py	1483	weights = parse_megam_weights(stdout, encoding.length(), explicit)	STRING
HIGH	nltk/classify/tadm.py	115	for i in range(encoding.length()):	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	241	stack.push(_Hypothesis(0.2))	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	242	stack.push(poor_hypothesis)	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	243	stack.push(_Hypothesis(0.1))	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	244	stack.push(_Hypothesis(0.3))	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	256	stack.push(poor_hypothesis)	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	257	stack.push(worse_hypothesis)	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	258	stack.push(_Hypothesis(0.9)) # greatly superior hypothesis	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	270	stack.push(_Hypothesis(0.9)) # greatly superior hypothesis	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	271	stack.push(poor_hypothesis)	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	282	stack.push(_Hypothesis(0.0))	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	283	stack.push(best_hypothesis)	CODE
HIGH⚡	nltk/test/unit/translate/test_stack_decoder.py	284	stack.push(_Hypothesis(0.5))	CODE
HIGH	nltk/parse/pchart.py	422	queue.sort(key=lambda edge: edge.length())	CODE
HIGH	nltk/parse/chart.py	602	:param length: Only generate edges ``e`` where ``e.length()==length``	STRING
HIGH	nltk/parse/chart.py	888	edges = sorted((e.length(), e.start(), e) for e in self)	CODE
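
Most of the CODE rows above are calls to `.length()` or `.push(...)`. The report does not publish the scanner's rules, so the following is only a minimal sketch of how hits of this shape could be reproduced, assuming a plain regular expression over source lines. The pattern and the function name `find_cross_language_calls` are hypothetical and are not taken from the scanner. The sketch covers method calls only, not the docstring rows at the top of the table.

```python
import re

# Hypothetical pattern (an assumption, not the scanner's actual rule):
# method calls that are idiomatic in Java or JavaScript but less common
# in Python, namely .length() and .push(...).
CROSS_LANGUAGE_CALL = re.compile(r"\.(length|push)\(")


def find_cross_language_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line containing a match."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if CROSS_LANGUAGE_CALL.search(line):
            hits.append((number, line.strip()))
    return hits


# Two snippets copied from the table above, plus one idiomatic len() call.
sample = "\n".join(
    [
        "edges = sorted((e.length(), e.start(), e) for e in self)",
        "stack.push(poor_hypothesis)",
        "n = len(weights)",
    ]
)
hits = find_cross_language_calls(sample)
```

A purely textual match like this cannot tell whether the receiving class defines a `length()` or `push()` method of its own, so each hit would still need to be checked against the class definition before being read as a signal.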

Excessive Try-Catch Wrapping: 109 hits · 133 pts

Severity	File	Line	Snippet	Context
LOW	tools/global_replace.py	38	except Exception:	CODE
MEDIUM	tools/global_replace.py	19	def update(file, pattern, replacement):	CODE
LOW	tools/huggingface/push_stopwords.py	199	except Exception:	CODE
MEDIUM⚡	nltk/downloader.py	1449	print("Error reading from server: %s" % e)	CODE
MEDIUM⚡	nltk/downloader.py	1451	print("Error connecting to server: %s" % e.reason)	CODE
MEDIUM	nltk/downloader.py	1565	print(f"Error reading <{new_url!r}>:\n {e}")	CODE
MEDIUM	nltk/downloader.py	708	def _within(child, root):	CODE
LOW	nltk/downloader.py	1316	except Exception:	CODE
LOW	nltk/downloader.py	1564	except Exception as e:	CODE
LOW	nltk/downloader.py	2295	except Exception:	CODE
LOW	nltk/downloader.py	2646	except Exception as e:	CODE
LOW	nltk/downloader.py	2699	except Exception as e:	CODE
LOW	nltk/downloader.py	2719	except Exception as e:	CODE
LOW	nltk/downloader.py	2722	except Exception as e:	CODE
LOW	nltk/downloader.py	2898	except Exception as e:	CODE
LOW	nltk/downloader.py	2902	except Exception as e:	CODE
LOW	nltk/pathsec.py	154	except Exception:	CODE
LOW	nltk/pathsec.py	353	except Exception:	CODE
LOW	nltk/featstruct.py	2711	except Exception:	STRING
LOW	nltk/data.py	603	except Exception as e:	CODE
LOW	nltk/metrics/distance.py	278	except Exception:	CODE
MEDIUM	nltk/twitter/twitterclient.py	100	print(f"Error (stream will continue): {e}")	CODE
MEDIUM	nltk/twitter/twitterclient.py	117	print(f"Error (stream will continue): {e}")	CODE
LOW	nltk/app/concordance_app.py	403	except Exception:	CODE
LOW	nltk/app/concordance_app.py	654	except Exception as e:	CODE
MEDIUM	nltk/app/concordance_app.py	647	def run(self):	CODE
LOW	nltk/app/chunkparser_app.py	1463	except Exception:	CODE
MEDIUM	nltk/app/chunkparser_app.py	1333	def _chunkparse(self, words):	CODE
LOW	nltk/app/rdparser_app.py	711	except Exception:	CODE
LOW	nltk/app/rdparser_app.py	725	except Exception:	CODE
LOW	nltk/app/srparser_app.py	656	except Exception:	CODE
LOW	nltk/app/srparser_app.py	671	except Exception:	CODE
MEDIUM	nltk/app/chartparser_app.py	1816	print("Error creating Tree View")	CODE
LOW	nltk/app/chartparser_app.py	188	except Exception:	CODE
LOW	nltk/app/chartparser_app.py	532	except Exception:	CODE
LOW	nltk/app/chartparser_app.py	622	except Exception:	CODE
LOW	nltk/app/chartparser_app.py	800	except Exception as e:	CODE
LOW	nltk/app/chartparser_app.py	811	except Exception as e:	CODE
LOW	nltk/app/chartparser_app.py	1815	except Exception:	CODE
LOW	nltk/app/chartparser_app.py	2241	except Exception:	CODE
LOW	nltk/app/chartparser_app.py	2283	except Exception as e:	CODE
LOW	nltk/app/chartparser_app.py	2297	except Exception as e:	CODE
LOW	nltk/app/chartparser_app.py	2316	except Exception as e:	CODE
LOW	nltk/app/chartparser_app.py	2338	except Exception as e:	CODE
LOW	nltk/app/collocations_app.py	421	except Exception as e:	CODE
MEDIUM	nltk/app/collocations_app.py	406	def run(self):	CODE
LOW	nltk/app/nemo_app.py	95	except Exception:	STRING
LOW	nltk/app/nemo_app.py	115	except Exception:	STRING
LOW	nltk/classify/weka.py	78	except Exception:	CODE
MEDIUM	nltk/classify/weka.py	73	def _check_weka_version(jar):	CODE
LOW	nltk/test/unit/test_valuation_redos.py	50	except Exception:	CODE
MEDIUM	nltk/test/unit/test_valuation_redos.py	45	def _parse_worker():	CODE
LOW	nltk/test/unit/test_ccg_lexicon_security.py	39	except Exception:	CODE
MEDIUM	nltk/test/unit/test_ccg_lexicon_security.py	26	def _regex_worker(result_q):	CODE
LOW	nltk/test/unit/test_twitter_auth.py	66	except Exception as e:	CODE
LOW	nltk/sem/boxer.py	1009	except Exception as e:	CODE
MEDIUM	nltk/sem/boxer.py	846	def handle(self, tok, context):	CODE
LOW	nltk/sem/glue.py	241	except Exception:	CODE
LOW	nltk/sem/glue.py	651	except Exception as e:	CODE
MEDIUM	nltk/sem/glue.py	655	print("Error when checking logical equality of statements", e)	CODE
49 more matches not shown…
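
The LOW rows in this table are all `except Exception` clauses. As a rough illustration of how such clauses can be located, here is a minimal sketch using Python's standard `ast` module. It is an assumption about one possible approach, not the scanner's implementation, and `find_broad_excepts` is a hypothetical name. It also counts bare `except:` clauses, which the table above does not show.

```python
import ast


def find_broad_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` and `except Exception` handlers."""
    tree = ast.parse(source)  # parses only; the sample code is never executed
    lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ExceptHandler):
            continue
        is_bare = node.type is None
        is_exception = isinstance(node.type, ast.Name) and node.type.id == "Exception"
        if is_bare or is_exception:
            lines.append(node.lineno)
    return sorted(lines)


sample = """\
try:
    risky()
except Exception:
    pass
try:
    risky()
except ValueError as e:
    print(e)
"""
broad = find_broad_excepts(sample)
```

Working on the syntax tree rather than on raw text means that an `except Exception:` appearing inside a string or docstring is not counted, which matters for the rows tagged STRING in the Context column.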

Over-Commented Block: 96 hits · 89 pts

Severity	File	Line	Snippet	Context
LOW	tools/global_replace.py	1	#!/usr/bin/env python	COMMENT
LOW	tools/github_actions/third-party.sh	101		COMMENT
LOW	nltk/internals.py	741	:param verbose: Whether or not to print path when a file is found.	COMMENT
LOW	nltk/downloader.py	601		COMMENT
LOW	nltk/downloader.py	681	yield StartPackageMessage(info)	COMMENT
LOW	nltk/downloader.py	701	# contained, but open()/os.replace() follow the link and write	COMMENT
LOW	nltk/grammar.py	1	# Natural Language Toolkit: Context Free Grammars	COMMENT
LOW	nltk/probability.py	1	# Natural Language Toolkit: Probability and Statistics	COMMENT
LOW	nltk/probability.py	1301	##//////////////////////////////////////////////////////	COMMENT
LOW	nltk/probability.py	1321	# where c is the original count, N(i) is the number of event types	COMMENT
LOW	nltk/probability.py	1341	# appropriate for high values of r. For low values of r, they use the	COMMENT
LOW	nltk/probability.py	1661	##//////////////////////////////////////////////////////	COMMENT
LOW	nltk/probability.py	1681	# titled "Improved backing-off for n-gram language modeling." In the same paper	COMMENT
LOW	nltk/collocations.py	381	)	COMMENT
LOW	nltk/featstruct.py	2461	)	COMMENT
LOW	nltk/data.py	281		COMMENT
LOW	nltk/decorators.py	241	## "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT	COMMENT
LOW	nltk/misc/sort.py	1	# Natural Language Toolkit: List Sorting	COMMENT
LOW	nltk/tree/immutable.py	1	# Natural Language Toolkit: Text Trees	COMMENT
LOW	nltk/tree/parented.py	1	# Natural Language Toolkit: Text Trees	COMMENT
LOW	nltk/tree/tree.py	1	# Natural Language Toolkit: Text Trees	COMMENT
LOW	nltk/tree/parsing.py	1	# Natural Language Toolkit: Text Trees	COMMENT
LOW	nltk/tree/probabilistic.py	1	# Natural Language Toolkit: Text Trees	COMMENT
LOW	nltk/metrics/confusionmatrix.py	61	indices = {val: i for (i, val) in enumerate(values)}	COMMENT
LOW	nltk/app/srparser_app.py	821	# Display the available productions.	COMMENT
LOW	nltk/app/srparser_app.py	841	# color='#000000', font=self._font)	COMMENT
LOW	nltk/app/chartparser_app.py	2561	# ChartComparer(*charts).mainloop()	COMMENT
LOW	nltk/app/wordnet_app.py	221	:param port: The port number for the server to listen on, defaults to	COMMENT
LOW	nltk/classify/weka.py	181	" Header: %s" % lines[0]	COMMENT
LOW	nltk/classify/api.py	161	# strs or ints, but can be any immutable type. The set	COMMENT
LOW	nltk/classify/api.py	181	# the i\ th element of ``featuresets``.	COMMENT
LOW	nltk/chunk/regexp.py	881	return "<ChunkRuleWithContext: {!r}, {!r}, {!r}>".format(	COMMENT
LOW	nltk/test/pytest.ini	1	[pytest]	COMMENT
LOW	nltk/test/unit/test_distance.py	21	# Allowing transpositions reduces the number of edits required.	COMMENT
LOW	nltk/test/unit/test_distance.py	41	# Ought to have the same results with and without transpositions	COMMENT
LOW	nltk/test/unit/test_distance.py	61	#	COMMENT
LOW	nltk/test/unit/test_distance.py	81	# without transpositions:	COMMENT
LOW	nltk/test/unit/lm/test_models.py	101	def test_mle_bigram_entropy_perplexity_unseen(mle_bigram_model):	COMMENT
LOW	nltk/test/unit/lm/test_models.py	161	return model	COMMENT
LOW	nltk/test/unit/lm/test_models.py	261	("d", ["c"], 2.0 / 9),	COMMENT
LOW	nltk/test/unit/lm/test_models.py	321	"word, context, expected_score",	COMMENT
LOW	nltk/test/unit/lm/test_models.py	381	# = 1 / 14	COMMENT
LOW	nltk/test/unit/lm/test_models.py	441	# P(c|b) = alpha('bc') + gamma('b') * P(c)	COMMENT
LOW	nltk/chat/zen.py	81	),	COMMENT
LOW	nltk/chat/eliza.py	1	# Natural Language Toolkit: Eliza	COMMENT
LOW	nltk/translate/phrase_based.py	1	# Natural Language Toolkit: Phrase Extraction Algorithm	COMMENT
LOW	nltk/translate/ibm1.py	1	# Natural Language Toolkit: IBM Model 1	COMMENT
LOW	nltk/translate/ribes_score.py	201	# (an O(L^2) scan) for every context window, giving up to O(L^4) time: a	COMMENT
LOW	nltk/sem/chat80.py	641		COMMENT
LOW	nltk/sem/hole.py	221	plug_acc = plug_acc0.copy()	COMMENT
LOW	nltk/sem/boxer.py	481	# if cat == 'des':	COMMENT
LOW	nltk/corpus/__init__.py	501	# ipipan = LazyCorpusLoader(	COMMENT
LOW	nltk/corpus/reader/bracket_parse.py	21	TAGWORD = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")	COMMENT
LOW	nltk/corpus/reader/framenet.py	3341		COMMENT
LOW	nltk/corpus/reader/framenet.py	3441	print("\nNames of all of the corpora used for fulltext annotation:")	COMMENT
LOW	nltk/corpus/reader/senseval.py	181	# remove <@>, <p>, </p>	COMMENT
LOW	nltk/corpus/reader/xmldocs.py	1	# Natural Language Toolkit: XML Corpus Reader	COMMENT
LOW	nltk/corpus/reader/wordnet.py	1	# Natural Language Toolkit: WordNet	COMMENT
LOW	nltk/corpus/reader/wordnet.py	41		COMMENT
LOW	nltk/corpus/reader/wordnet.py	981	# Get the longest path from the LCS to the root,	COMMENT
36 more matches not shown…

Redundant / Tautological Comments: 53 hits · 79 pts

Severity	File	Line	Snippet	Context
LOW	nltk/jsontags.py	64	# Check if we have a tagged object.	COMMENT
LOW	nltk/internals.py	396	# Check if a method has been overridden	COMMENT
LOW	nltk/internals.py	615	# Check if the alternative is inside a 'file' directory	COMMENT
LOW⚡	nltk/internals.py	631	# Check if the environment variable contains a direct path to the bin	COMMENT
LOW⚡	nltk/internals.py	637	# Check if the possible bin names exist inside the environment variable directories	COMMENT
LOW⚡	nltk/internals.py	645	# Check if the alternative is inside a 'file' directory	COMMENT
LOW⚡	nltk/internals.py	648	# Check if the alternative is inside a 'bin' directory	COMMENT
LOW	nltk/downloader.py	1144	# Check if the file has the correct size.	COMMENT
LOW	nltk/downloader.py	1152	# Check if the file's checksum matches.	COMMENT
LOW	nltk/downloader.py	1212	# Check if the index is already up-to-date. If so, do nothing.	COMMENT
LOW	nltk/downloader.py	1338	# Check if we are on GAE where we cannot write into filesystem.	COMMENT
LOW	nltk/downloader.py	1342	# Check if we have sufficient permissions to install in a	COMMENT
LOW	nltk/downloader.py	2427	# Check if we've been told to kill ourselves:	COMMENT
LOW	nltk/util.py	679	queue.append(child) # Add child to queue	CODE
LOW	nltk/__init__.py	181	# Check if tkinter exists without importing it to avoid crashes after	COMMENT
LOW	nltk/collections.py	468	# Return the value	COMMENT
LOW	nltk/probability.py	2499	# Print the results in a formatted table.	COMMENT
LOW	nltk/featstruct.py	1788	# Print the result.	COMMENT
LOW	nltk/featstruct.py	2363	# Check if it's a special feature.	COMMENT
LOW	nltk/featstruct.py	2369	# Check if this feature has a value already.	COMMENT
LOW	nltk/data.py	753	# Check if the resource name includes a zipfile name	COMMENT
LOW	nltk/data.py	1758	# Return the result	COMMENT
LOW	nltk/app/chunkparser_app.py	714	# Check if we've seen this grammar already. If so, then	COMMENT
LOW	nltk/app/chunkparser_app.py	748	# Check if we're done	COMMENT
LOW	nltk/app/chunkparser_app.py	1301	# Display the results	COMMENT
LOW	nltk/app/rdparser_app.py	644	# Check if we just completed a parse.	COMMENT
LOW	nltk/app/chartparser_app.py	1273	# Check if we can fit the edge in this level.	COMMENT
LOW	nltk/app/wordnet_app.py	158	# Set type to plain to prevent XSS by printing the path as HTML	COMMENT
LOW	nltk/classify/weka.py	125	# Check if something went wrong:	COMMENT
LOW	nltk/classify/weka.py	350	# Check if the tokens are labeled or unlabeled. If unlabeled,	COMMENT
LOW	nltk/classify/maxent.py	726	# Return the result	COMMENT
LOW	nltk/classify/megam.py	100	# Write the file, which contains one line per instance.	COMMENT
LOW	nltk/translate/phrase_based.py	71	# Check if alignment points are consistent.	COMMENT
LOW	nltk/corpus/reader/util.py	248	# Check if it's in the cache.	COMMENT
LOW	nltk/corpus/reader/util.py	260	# Check if it's in the cache.	COMMENT
LOW	nltk/corpus/reader/wordnet.py	1923	# Open the file for reading. Note that we can not re-use	COMMENT
LOW	nltk/parse/pchart.py	266	# Assign probabilities to the trees.	COMMENT
LOW	nltk/parse/chart.py	685	# Add it to the list of edges.	COMMENT
LOW	nltk/parse/nonprojectivedependencyparser.py	628	# Set roots to attempt	COMMENT
LOW	nltk/parse/earleychart.py	539	# Print results.	COMMENT
LOW	nltk/stem/porter.py	719	# Print the results.	COMMENT
LOW	nltk/stem/lancaster.py	176	# Check if a user wants to strip prefix	COMMENT
LOW	nltk/stem/lancaster.py	178	# Check if a user wants to use his/her own rule tuples.	COMMENT
LOW	nltk/inference/tableau.py	143	# Check if the branch is closed. Return 'True' if it is	COMMENT
LOW	nltk/inference/tableau.py	166	# Check if the branch is closed. Return 'True' if it is	COMMENT
LOW	nltk/inference/tableau.py	189	# Check if the branch is closed. Return 'True' if it is	COMMENT
LOW	nltk/inference/tableau.py	203	# Check if the branch is closed. Return 'True' if it is	COMMENT
LOW	nltk/tag/brill_trainer.py	531	# Check if the change causes any rule at this position to	STRING
LOW	nltk/tag/brill_trainer.py	540	# Check if the change causes our templates to propose any	STRING
LOW	nltk/tokenize/punkt.py	1694	# Check if any initials or ordinals tokens that are marked	COMMENT
LOW	nltk/ccg/chart.py	178	# Check if the two edges are permitted to combine.	COMMENT
LOW	nltk/ccg/chart.py	311	# Output the resulting parses	COMMENT
LOW	nltk/ccg/chart.py	443	# Print the resulting category on a new line.	COMMENT

Modern Structural Boilerplate: 34 hits · 31 pts

Severity	File	Line	Snippet	Context
LOW	nltk/jsontags.py	76	__all__ = ["register_tag", "json_tags", "JSONTaggedEncoder", "JSONTaggedDecoder"]	CODE
LOW	nltk/pathsec.py	586	__all__ = [	CODE
LOW	nltk/treeprettyprinter.py	28	__all__ = ["TreePrettyPrinter"]	CODE
LOW	nltk/grammar.py	1732	__all__ = [	CODE
LOW	nltk/text.py	818	__all__ = [	CODE
LOW	nltk/probability.py	2553	__all__ = [	CODE
LOW	nltk/treetransforms.py	126	__all__ = ["chomsky_normal_form", "un_chomsky_normal_form", "collapse_unary"]	STRING
LOW	nltk/collocations.py	417	__all__ = [	CODE
LOW	nltk/featstruct.py	2774	__all__ = [	STRING
LOW	nltk/data.py	1823	__all__ = [	CODE
LOW	nltk/decorators.py	16	__all__ = ["decorator", "new_wrapper", "getinfo"]	CODE
LOW	nltk/tree/immutable.py	119	__all__ = [	CODE
LOW	nltk/tree/transforms.py	344	__all__ = ["chomsky_normal_form", "un_chomsky_normal_form", "collapse_unary"]	CODE
LOW	nltk/tree/parented.py	587	__all__ = [	CODE
LOW	nltk/tree/tree.py	985	__all__ = [	STRING
LOW	nltk/tree/parsing.py	63	__all__ = [	CODE
LOW	nltk/tree/__init__.py	37	__all__ = [	CODE
LOW	nltk/tree/probabilistic.py	74	__all__ = ["ProbabilisticTree"]	CODE
LOW	nltk/tree/prettyprinter.py	624	__all__ = ["TreePrettyPrinter"]	CODE
LOW	nltk/app/concordance_app.py	709	__all__ = ["app"]	CODE
LOW	nltk/app/chunkparser_app.py	1500	__all__ = ["app"]	CODE
LOW	nltk/app/rdparser_app.py	1052	__all__ = ["app"]	CODE
LOW	nltk/app/srparser_app.py	937	__all__ = ["app"]	CODE
LOW	nltk/app/wordfreq_app.py	36	__all__ = ["app"]	CODE
LOW	nltk/app/chartparser_app.py	2570	__all__ = ["app"]	CODE
LOW	nltk/app/collocations_app.py	438	__all__ = ["app"]	CODE
LOW	nltk/app/nemo_app.py	163	__all__ = ["app"]	STRING
LOW	nltk/app/wordnet_app.py	1035	__all__ = ["app"]	STRING
LOW	nltk/classify/scikitlearn.py	43	__all__ = ["SklearnClassifier"]	CODE
LOW	nltk/lm/__init__.py	225	__all__ = [	CODE
LOW	nltk/corpus/reader/__init__.py	114	__all__ = [	CODE
LOW⚡	nltk/parse/bllip.py	82	__all__ = ["BllipParser"]	CODE
LOW	nltk/parse/nonprojectivedependencyparser.py	15	logger = logging.getLogger(__name__)	CODE
LOW	nltk/huggingface/__init__.py	9	__all__ = ["download"]	CODE

AI Slop Vocabulary: 22 hits · 29 pts

Severity	File	Line	Snippet	Context
MEDIUM	nltk/util.py	251	# FIX: Use is_relative_to for robust boundary check	COMMENT
LOW	nltk/collections.py	267	# If the slice is small enough, just use a tuple.	COMMENT
LOW	nltk/text.py	123	# nothing in common -- just return an empty freqdist.	COMMENT
LOW	nltk/probability.py	1330	# Simple Good-Turing. As a smoothing curve they simply use a power curve:	COMMENT
LOW	nltk/featstruct.py	319	# their children a second time, so just return true.	COMMENT
LOW	nltk/data.py	1111	# If we've cached the resource, then just return it.	COMMENT
LOW	nltk/data.py	1684	# If nothing's buffered, then just return our current filepos:	COMMENT
LOW	nltk/app/chunkparser_app.py	715	# just use the old evaluation values.	COMMENT
LOW	nltk/test/unit/test_downloader.py	88	# Cannot mock a zip here as we are trying to validate file checksums, so just create a simple one with the XML	COMMENT
LOW	nltk/translate/ribes_score.py	351	# To avoid this, we can just return the lowest possible score.	COMMENT
MEDIUM	nltk/sem/drt.py	151	"""This method serves as a hook for other logic parsers that	STRING
MEDIUM	nltk/sem/drt.py	156	"""This method serves as a hook for other logic parsers that	STRING
MEDIUM	nltk/sem/logic.py	456	"""This method serves as a hook for other logic parsers that	STRING
MEDIUM	nltk/sem/logic.py	494	"""This method serves as a hook for other logic parsers that	STRING
MEDIUM	nltk/sem/logic.py	515	"""This method serves as a hook for other logic parsers that	STRING
LOW	nltk/corpus/__init__.py	194	# [XX] This should probably just use TaggedCorpusReader:	COMMENT
LOW	nltk/corpus/reader/verbnet.py	167	# File identifier: just return the xml.	COMMENT
LOW	nltk/corpus/reader/framenet.py	1161	# as it's easier to just call frame().	COMMENT
LOW	nltk/corpus/reader/semcor.py	201	# solution: just use the lemma name as a string	COMMENT
LOW	nltk/parse/chart.py	1230	# just return (no new edges to add).	STRING
LOW	nltk/parse/featurechart.py	375	# just return (no new edges to add).	STRING
LOW	nltk/parse/featurechart.py	560	# already in the chart, then just return it as-is.	STRING

Modern AI Meta-Vocabulary: 4 hits · 12 pts

Severity	File	Line	Snippet	Context
MEDIUM	nltk/translate/ribes_score.py	201	# (an O(L^2) scan) for every context window, giving up to O(L^4) time: a	COMMENT
MEDIUM	nltk/translate/ribes_score.py	260	# A context window longer than the reference can never occur in it	COMMENT
MEDIUM	nltk/translate/ribes_score.py	269	# Retrieve the right context window.	COMMENT
MEDIUM	nltk/translate/ribes_score.py	284	# Retrieve the left context window.	COMMENT

AI Structural Patterns: 11 hits · 10 pts

Severity	File	Line	Context
LOW	nltk/downloader.py	201	CODE
LOW	nltk/downloader.py	991	CODE
LOW	nltk/tree/prettyprinter.py	331	CODE
LOW	nltk/metrics/agreement.py	209	CODE
LOW	nltk/metrics/agreement.py	361	CODE
LOW	nltk/twitter/twitterclient.py	320	CODE
LOW	nltk/corpus/reader/conll.py	67	CODE
LOW	nltk/tbl/demo.py	141	CODE
LOW	nltk/sentiment/util.py	327	CODE
LOW	nltk/tokenize/stanford_segmenter.py	55	CODE
LOW	nltk/tokenize/texttiling.py	67	CODE

Slop Phrases: 1 hit · 3 pts

Severity	File	Line	Snippet	Context
MEDIUM	nltk/app/chunkparser_app.py	114	#: for a list of tags you can use for colorizing.	COMMENT

Dead Code: 1 hit · 2 pts

Severity	File	Line	Snippet	Context
MEDIUM	nltk/app/chartparser_app.py	628		CODE

Fake / Example Data: 2 hits · 2 pts

Severity	File	Line	Snippet	Context
LOW	nltk/app/nemo_app.py	18	Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna ali	CODE
LOW	nltk/app/nemo_app.py	18	Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna ali	CODE
