tiktoken is a fast BPE tokeniser for use with OpenAI's models.
51 matches across 8 categories.
| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/test_encoding.py | 127 | # ==================== |
| MEDIUM | tests/test_encoding.py | 129 | # ==================== |
| MEDIUM | tests/test_encoding.py | 170 | # ==================== |
| MEDIUM | tests/test_encoding.py | 172 | # ==================== |
| MEDIUM | tests/test_encoding.py | 234 | # ==================== |
| MEDIUM | tests/test_encoding.py | 236 | # ==================== |
| MEDIUM | tiktoken/core.py | 62 | # ==================== |
| MEDIUM | tiktoken/core.py | 64 | # ==================== |
| MEDIUM | tiktoken/core.py | 261 | # ==================== |
| MEDIUM | tiktoken/core.py | 263 | # ==================== |
| MEDIUM | tiktoken/core.py | 352 | # ==================== |
| MEDIUM | tiktoken/core.py | 354 | # ==================== |
| MEDIUM | tiktoken/core.py | 377 | # ==================== |
| MEDIUM | tiktoken/core.py | 379 | # ==================== |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/test_helpers.py | 1 | |
| LOW | tiktoken/registry.py | 1 | |
| LOW | tiktoken/__init__.py | 2 | |
| LOW | tiktoken/__init__.py | 3 | |
| LOW | tiktoken/__init__.py | 4 | |
| LOW | tiktoken/__init__.py | 5 | |
| LOW | tiktoken/__init__.py | 6 | |
| LOW | tiktoken/core.py | 1 | |
| LOW | tiktoken/model.py | 1 | |
| LOW | tiktoken/load.py | 1 | |
| LOW | tiktoken/_educational.py | 3 | |
| LOW | scripts/benchmark.py | 1 | |
| LOW | scripts/benchmark.py | 2 | |
| LOW | scripts/benchmark.py | 3 | |
| LOW | scripts/benchmark.py | 4 | |
| LOW | scripts/benchmark.py | 6 | |
| LOW | scripts/benchmark.py | 10 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/test_misc.py | 24 | def test_optional_blobfile_dependency(): |
| LOW | tests/test_encoding.py | 102 | def test_encode_surrogate_pairs(): |
| LOW | tests/test_encoding.py | 114 | def test_catastrophically_repetitive(make_enc: Callable[[], tiktoken.Encoding]): |
| LOW | tests/test_encoding.py | 159 | def test_single_token_roundtrip(make_enc: Callable[[], tiktoken.Encoding]): |
| LOW | tests/test_encoding.py | 229 | def test_hyp_special_ordinary(make_enc, text: str): |
| LOW | tests/test_simple_public.py | 36 | def test_optional_blobfile_dependency(): |
| LOW | tiktoken/registry.py | 20 | def _available_plugin_modules() -> Sequence[str]: |
| LOW | tiktoken/core.py | 289 | def decode_single_token_bytes(self, token: int) -> bytes: |
| LOW | tiktoken/core.py | 441 | def raise_disallowed_special_token(token: str) -> NoReturn: |
| LOW | tiktoken/load.py | 89 | def data_gym_to_mergeable_bpe_ranks( |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tiktoken/registry.py | 33 | |
| LOW | tiktoken/_educational.py | 83 | |
| LOW | tiktoken/_educational.py | 119 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/lib.rs | 221 | // Various performance notes: |
| LOW | src/lib.rs | 241 | // ========= |
| LOW | src/lib.rs | 541 | // Morally, this is byte_pair_encode(&possibility, &self.encoder) |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tiktoken/registry.py | 55 | except Exception: |
| LOW | tiktoken/load.py | 169 | except Exception as e: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tiktoken/model.py | 97 | # Check if the model matches a known prefix |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tiktoken/_educational.py | 191 | # visualise the token. Here, we'll just use the unicode replacement character to represent some |