113 matches across 8 categories.

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | docs/chapter6/code/download_dataset.py | 2 | |
| LOW | docs/chapter6/code/download_dataset.py | 3 | |
| LOW | docs/chapter6/code/pretrain.py | 6 | |
| LOW | docs/chapter6/code/pretrain.py | 12 | |
| LOW | docs/chapter6/code/pretrain.py | 16 | |
| LOW | docs/chapter6/code/pretrain.py | 17 | |
| LOW | docs/chapter6/code/pretrain.py | 30 | |
| LOW | docs/chapter6/code/pretrain.py | 31 | |
| LOW | docs/chapter6/code/finetune.py | 6 | |
| LOW | docs/chapter6/code/finetune.py | 11 | |
| LOW | docs/chapter6/code/finetune.py | 12 | |
| LOW | docs/chapter6/code/finetune.py | 13 | |
| LOW | docs/chapter6/code/finetune.py | 19 | |
| LOW | docs/chapter6/code/finetune.py | 21 | |
| LOW | docs/chapter6/code/finetune.py | 23 | |
| LOW | docs/chapter6/code/finetune.py | 23 | |
| LOW | docs/chapter6/code/finetune.py | 33 | |
| LOW | docs/chapter6/code/finetune.py | 34 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 7 | |
| LOW | docs/chapter7/Agent/src/utils.py | 2 | |
| LOW | docs/chapter7/Agent/src/utils.py | 3 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 14 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 12 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 17 | |
| LOW | docs/chapter5/code/ddp_pretrain.py | 3 | |
| LOW | docs/chapter5/code/ddp_pretrain.py | 8 | |
| LOW | docs/chapter5/code/deal_dataset.py | 1 | |
| LOW | docs/chapter5/code/k_model.py | 2 | |
| LOW | docs/chapter5/code/k_model.py | 3 | |
| LOW | docs/chapter5/code/k_model.py | 4 | |
| LOW | docs/chapter5/code/dataset.py | 2 | |

11 more matches not shown…

| Severity | File | Line | Snippet |
|---|---|---|---|
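| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Given an input sequence idx (a LongTensor of shape (bz,seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Given an input sequence idx (a LongTensor of shape (bz,seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/code/k_model.py | 0 | Given an input sequence idx (a LongTensor of shape (bz,seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Generate samples from the given starting text. :param start: the starting prompt for text generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens to generate per sample :param te |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Generate samples from the given starting text. :param start: the starting prompt for text generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens to generate per sample :param te |
| HIGH | docs/chapter5/code/model_sample.py | 0 | Generate samples from the given starting text. :param start: the starting prompt for text generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens to generate per sample :param te |
| HIGH | docs/chapter5/code/model_sample.py | 0 | Generate samples from the given starting text. :param start: the starting prompt for text generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens to generate per sample :param te |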

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 786 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/第五章 动手搭建大模型.md | 912 | except Exception as e: |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 913 | print(f"Error loading tokenizer: {e}") |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 1094 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/第五章 动手搭建大模型.md | 1199 | except Exception as e: |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 1200 | print(f"Error loading tokenizer: {e}") |
| MEDIUM | docs/chapter5/code/train_tokenizer.py | 27 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/code/train_tokenizer.py | 132 | except Exception as e: |
| MEDIUM | docs/chapter5/code/train_tokenizer.py | 133 | print(f"Error loading tokenizer: {e}") |
| LOW | Extra-Chapter/CDDRS/readme.md | 160 | except Exception as e: |
| MEDIUM | Extra-Chapter/CDDRS/readme.md | 161 | print(f"Error reading {file_path}: {e}") |
| LOW | Extra-Chapter/generation-method/llm_generation.py | 149 | except Exception as e: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | docs/chapter7/第七章 大模型应用.md | 555 | api_key="YOUR_API_KEY", # replace with your API key |
| HIGH | docs/chapter7/第七章 大模型应用.md | 563 | > **Note:** You need to replace `YOUR_API_KEY` with a valid API key obtained from [SiliconFlow](https://cloud.siliconflow.cn/i/ybUFvmqK) or another provider. |
| HIGH | docs/chapter7/第七章 大模型应用.md | 746 | api_key="YOUR_API_KEY", # replace with your API key |
| HIGH | Extra-Chapter/CDDRS/readme.md | 80 | api_key="your-api-key-here", |
| HIGH | Extra-Chapter/CDDRS/readme.md | 786 | api_key='your-api-key', |

| Severity | File | Line | Snippet |
|---|---|---|---|
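| HIGH | docs/chapter7/第七章 大模型应用.md | 259 | Get the embedding vector representation of a text Args: text (str): input text model (str): name of the model to use Returns: |
| HIGH | docs/chapter7/RAG/Embeddings.py | 36 | Get the embedding vector representation of a text Args: text (str): input text model (str): name of the model to use Returns: |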

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | docs/chapter5/code/k_model.py | 378 | def _left_pad_by_attention_mask( |
| LOW | Extra-Chapter/CDDRS/readme.md | 246 | def _compute_semantic_discrepancy(self, embeddings: np.ndarray) -> List[float]: |
| LOW | Extra-Chapter/CDDRS/readme.md | 283 | def _enforce_length_constraints(self, chunks: List[str]) -> List[str]: |
| LOW | Extra-Chapter/CDDRS/readme.md | 423 | def compute_document_length_factor(chunk_length: int, avg_length: int = 100) -> float: |
| LOW | Extra-Chapter/CDDRS/readme.md | 437 | def compute_term_significance(term_freq: int, doc_length_factor: float) -> float: |
| LOW | Extra-Chapter/CDDRS/readme.md | 559 | def _compute_knowledge_scores(self, key_info: Dict[str, Tuple[str, float]]) -> List[float]: |
| LOW | Extra-Chapter/text-data-processing/readme.md | 932 | def test_simple_bpe_tokenizer(): |
| LOW | Extra-Chapter/s1-vllm-thinking-budget/s1.py | 28 | def run_thinking_budget_sample(llm_model, tokenizer, user_input, thinking_budget): |
| LOW | Extra-Chapter/s1-vllm-thinking-budget/readme.md | 41 | def run_thinking_budget_sample(llm_model, tokenizer, user_input, thinking_budget): |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | docs/chapter6/code/finetune.py | 87 | |
| LOW | docs/chapter7/RAG/utils.py | 34 | |
| LOW | docs/chapter7/RAG/utils.py | 61 | |
| LOW | docs/chapter5/code/dataset.py | 65 | |
| LOW | docs/chapter5/code/train_tokenizer.py | 17 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | docs/chapter5/code/k_model.py | 61 | def forward(self, x): |
| LOW | docs/chapter2/第二章 Transformer架构.md | 301 | # attention computation |