Repository Analysis

openai/tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

25.5 Moderate AI signal
Adjusted Score: 25.5
Raw Score: 25.5
Time Factor: 100%
Last Push: 2026-05-24
Stars: 18,351
Language: Python
Lines of Code: 3,276
Files: 25
Pattern Hits: 51
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 0 · MEDIUM 14 · LOW 37

Pattern Findings

51 matches across 8 categories.

Decorative Section Separators (14 hits · 45 pts)
Severity  File                    Line  Snippet
MEDIUM    tests/test_encoding.py  127   # ====================
MEDIUM    tests/test_encoding.py  129   # ====================
MEDIUM    tests/test_encoding.py  170   # ====================
MEDIUM    tests/test_encoding.py  172   # ====================
MEDIUM    tests/test_encoding.py  234   # ====================
MEDIUM    tests/test_encoding.py  236   # ====================
MEDIUM    tiktoken/core.py        62    # ====================
MEDIUM    tiktoken/core.py        64    # ====================
MEDIUM    tiktoken/core.py        261   # ====================
MEDIUM    tiktoken/core.py        263   # ====================
MEDIUM    tiktoken/core.py        352   # ====================
MEDIUM    tiktoken/core.py        354   # ====================
MEDIUM    tiktoken/core.py        377   # ====================
MEDIUM    tiktoken/core.py        379   # ====================

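The report does not say how the scanner decides that a line is a decorative separator. As a rough illustration only, the snippets listed above would all be caught by a rule like the one below: a comment marker followed by nothing except a long run of one punctuation character. The regular expression, the 8-character threshold, the function name `find_separators`, and the sample text are assumptions made for this sketch, not the tool's actual rule.

```python
import re

# Hypothetical rule (not taken from the scanner): a '#' or '//' comment whose
# entire body is one punctuation character repeated at least 8 times.
SEPARATOR_RE = re.compile(r"^\s*(?:#|//)\s*([=\-*~#])\1{7,}\s*$")


def find_separators(source: str) -> list[int]:
    """Return 1-based line numbers that look like decorative separator comments."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SEPARATOR_RE.match(line)
    ]


# Made-up sample in the style of the flagged snippets.
sample = "\n".join(
    [
        "    # ====================",
        "    # Encoding",
        "    # ====================",
        "    def encode(self): ...",
    ]
)

print(find_separators(sample))
```

Flagged lines come in pairs two apart (127/129, 170/172, and so on), which is consistent with a banner of two separator lines around a one-line section title, as in the sample.
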
Unused Imports (17 hits · 17 pts)
Severity  File                      Line  Snippet
LOW       tests/test_helpers.py     1
LOW       tiktoken/registry.py      1
LOW       tiktoken/__init__.py      2
LOW       tiktoken/__init__.py      3
LOW       tiktoken/__init__.py      4
LOW       tiktoken/__init__.py      5
LOW       tiktoken/__init__.py      6
LOW       tiktoken/core.py          1
LOW       tiktoken/model.py         1
LOW       tiktoken/load.py          1
LOW       tiktoken/_educational.py  3
LOW       scripts/benchmark.py      1
LOW       scripts/benchmark.py      2
LOW       scripts/benchmark.py      3
LOW       scripts/benchmark.py      4
LOW       scripts/benchmark.py      6
LOW       scripts/benchmark.py      10

Hyper-Verbose Identifiers (10 hits · 10 pts)
Severity  File                         Line  Snippet
LOW       tests/test_misc.py           24    def test_optional_blobfile_dependency():
LOW       tests/test_encoding.py       102   def test_encode_surrogate_pairs():
LOW       tests/test_encoding.py       114   def test_catastrophically_repetitive(make_enc: Callable[[], tiktoken.Encoding]):
LOW       tests/test_encoding.py       159   def test_single_token_roundtrip(make_enc: Callable[[], tiktoken.Encoding]):
LOW       tests/test_encoding.py       229   def test_hyp_special_ordinary(make_enc, text: str):
LOW       tests/test_simple_public.py  36    def test_optional_blobfile_dependency():
LOW       tiktoken/registry.py         20    def _available_plugin_modules() -> Sequence[str]:
LOW       tiktoken/core.py             289   def decode_single_token_bytes(self, token: int) -> bytes:
LOW       tiktoken/core.py             441   def raise_disallowed_special_token(token: str) -> NoReturn:
LOW       tiktoken/load.py             89    def data_gym_to_mergeable_bpe_ranks(

Deep Nesting (3 hits · 3 pts)
Severity  File                      Line  Snippet
LOW       tiktoken/registry.py      33
LOW       tiktoken/_educational.py  83
LOW       tiktoken/_educational.py  119

Over-Commented Block (3 hits · 3 pts)
Severity  File        Line  Snippet
LOW       src/lib.rs  221   // Various performance notes:
LOW       src/lib.rs  241   // =========
LOW       src/lib.rs  541   // Morally, this is byte_pair_encode(&possibility, &self.encoder)

Excessive Try-Catch Wrapping (2 hits · 2 pts)
Severity  File                  Line  Snippet
LOW       tiktoken/registry.py  55    except Exception:
LOW       tiktoken/load.py      169   except Exception as e:

Redundant / Tautological Comments (1 hit · 2 pts)
Severity  File               Line  Snippet
LOW       tiktoken/model.py  97    # Check if the model matches a known prefix

AI Slop Vocabulary (1 hit · 2 pts)
Severity  File                      Line  Snippet
LOW       tiktoken/_educational.py  191   # visualise the token. Here, we'll just use the unicode replacement character to represent some
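
The per-category figures can be cross-checked against the summary numbers with a short tally. The sketch below copies the hit and point counts from the category headings and the severity counts from the Severity Breakdown; the variable names are illustrative. It only confirms that the counts are internally consistent. The report does not state how category points map to the 25.5 raw score, so no such formula is assumed here.

```python
# (category, hits, points) copied from the category headings in this report.
CATEGORIES = [
    ("Decorative Section Separators", 14, 45),
    ("Unused Imports", 17, 17),
    ("Hyper-Verbose Identifiers", 10, 10),
    ("Deep Nesting", 3, 3),
    ("Over-Commented Block", 3, 3),
    ("Excessive Try-Catch Wrapping", 2, 2),
    ("Redundant / Tautological Comments", 1, 2),
    ("AI Slop Vocabulary", 1, 2),
]

# Counts copied from the Severity Breakdown line.
SEVERITY = {"CRITICAL": 0, "HIGH": 0, "MEDIUM": 14, "LOW": 37}

total_hits = sum(hits for _, hits, _ in CATEGORIES)
total_points = sum(points for _, _, points in CATEGORIES)

print(len(CATEGORIES), "categories")       # 8 categories
print(total_hits, "hits")                  # 51 hits
print(sum(SEVERITY.values()), "findings")  # 51 findings
print(total_points, "points")              # 84 points
```

The hit total agrees with both the "Pattern Hits" figure and the severity counts: the 14 MEDIUM findings are exactly the Decorative Section Separators rows, and the remaining 37 LOW findings are spread across the other seven categories.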