Repository Analysis

datawhalechina/happy-llm

📚 Building Large Language Models from Scratch

5.0 Low AI signal
Adjusted Score: 5.0
Raw Score: 5.0
Time Factor: 100%
Last Push: 2026-05-06
Stars: 30,747
Language: Jupyter Notebook
Lines of Code: 32,980
Files: 69
Pattern Hits: 113
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 14 · MEDIUM 7 · LOW 92

Pattern Findings

113 matches across 8 categories.
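
As a quick consistency check, the per-category hit counts and the severity counts reported on this page can each be summed and compared with the 113 pattern hits. The variable names below are illustrative; the numbers are copied from the report itself.

```python
# Hit counts per category, copied from the Pattern Findings section headers.
category_hits = {
    "Unused Imports": 71,
    "Cross-File Repetition": 7,
    "Excessive Try-Catch Wrapping": 12,
    "Magic Placeholder Names": 5,
    "Docstring Block Structure": 2,
    "Hyper-Verbose Identifiers": 9,
    "Deep Nesting": 5,
    "Over-Commented Block": 2,
}

# Severity counts, copied from the Severity Breakdown line.
severity_counts = {"CRITICAL": 0, "HIGH": 14, "MEDIUM": 7, "LOW": 92}

reported_pattern_hits = 113

total_by_category = sum(category_hits.values())
total_by_severity = sum(severity_counts.values())

# Prints: 8 113 113
print(len(category_hits), total_by_category, total_by_severity)
```

Both totals agree with the reported figure, so the severity breakdown and the category list describe the same set of matches.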

Unused Imports: 71 hits · 71 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| LOW | docs/chapter6/code/download_dataset.py | 2 | |
| LOW | docs/chapter6/code/download_dataset.py | 3 | |
| LOW | docs/chapter6/code/pretrain.py | 6 | |
| LOW | docs/chapter6/code/pretrain.py | 12 | |
| LOW | docs/chapter6/code/pretrain.py | 16 | |
| LOW | docs/chapter6/code/pretrain.py | 17 | |
| LOW | docs/chapter6/code/pretrain.py | 30 | |
| LOW | docs/chapter6/code/pretrain.py | 31 | |
| LOW | docs/chapter6/code/finetune.py | 6 | |
| LOW | docs/chapter6/code/finetune.py | 11 | |
| LOW | docs/chapter6/code/finetune.py | 12 | |
| LOW | docs/chapter6/code/finetune.py | 13 | |
| LOW | docs/chapter6/code/finetune.py | 19 | |
| LOW | docs/chapter6/code/finetune.py | 21 | |
| LOW | docs/chapter6/code/finetune.py | 23 | |
| LOW | docs/chapter6/code/finetune.py | 23 | |
| LOW | docs/chapter6/code/finetune.py | 33 | |
| LOW | docs/chapter6/code/finetune.py | 34 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/demo.py | 2 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/web_demo.py | 3 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 5 | |
| LOW | docs/chapter7/Agent/src/core.py | 7 | |
| LOW | docs/chapter7/Agent/src/utils.py | 2 | |
| LOW | docs/chapter7/Agent/src/utils.py | 3 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 12 | |
| LOW | docs/chapter7/RAG/VectorBase.py | 14 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/LLM.py | 11 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 12 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/Embeddings.py | 13 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 12 | |
| LOW | docs/chapter7/RAG/utils.py | 17 | |
| LOW | docs/chapter5/code/ddp_pretrain.py | 3 | |
| LOW | docs/chapter5/code/ddp_pretrain.py | 8 | |
| LOW | docs/chapter5/code/deal_dataset.py | 1 | |
| LOW | docs/chapter5/code/k_model.py | 2 | |
| LOW | docs/chapter5/code/k_model.py | 3 | |
| LOW | docs/chapter5/code/k_model.py | 4 | |
| LOW | docs/chapter5/code/dataset.py | 2 | |

11 more matches not shown…
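
The report does not say how unused imports are detected. As a rough sketch only, a check of this kind can be approximated with Python's standard `ast` module: collect every name bound by an import statement, then report those that are never referenced. The function name `find_unused_imports` is hypothetical, and the sketch ignores cases such as `__all__` re-exports or names used only inside strings. One entry is emitted per imported name, which would be consistent with the repeated line numbers in the table (several names imported on one line), although the report does not confirm this.

```python
import ast


def find_unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced in source."""
    tree = ast.parse(source)

    # Map each name bound by an import statement to the line that binds it.
    imported: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a" unless aliased.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be checked this way
                imported[alias.asname or alias.name] = node.lineno

    # Every identifier that appears as a plain name anywhere in the module.
    # Attribute access such as os.getcwd() contributes the base name "os".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    return sorted((line, name) for name, line in imported.items() if name not in used)


sample = (
    "import os\n"
    "import sys\n"
    "from typing import List, Dict\n"
    "\n"
    "def cwd_parts(sep: str) -> List[str]:\n"
    "    return os.getcwd().split(sep)\n"
)
print(find_unused_imports(sample))  # [(2, 'sys'), (3, 'Dict')]
```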
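
Cross-File Repetition: 7 hits · 35 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Given an input sequence idx (a long tensor of shape (bz, seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Given an input sequence idx (a long tensor of shape (bz, seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/code/k_model.py | 0 | Given an input sequence idx (a long tensor of shape (bz, seq_len)), complete the sequence by repeatedly generating new tokens. Runs in model.eval() mode. A less efficient sampling version that does not use a k/v cache. |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Generate samples from the given starting text. :param start: the starting prompt for generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens generated per sample :param te |
| HIGH | docs/chapter5/第五章 动手搭建大模型.md | 0 | Generate samples from the given starting text. :param start: the starting prompt for generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens generated per sample :param te |
| HIGH | docs/chapter5/code/model_sample.py | 0 | Generate samples from the given starting text. :param start: the starting prompt for generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens generated per sample :param te |
| HIGH | docs/chapter5/code/model_sample.py | 0 | Generate samples from the given starting text. :param start: the starting prompt for generation :param num_samples: number of text samples to generate :param max_new_tokens: maximum number of tokens generated per sample :param te |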

Excessive Try-Catch Wrapping: 12 hits · 19 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 786 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/第五章 动手搭建大模型.md | 912 | except Exception as e: |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 913 | print(f"Error loading tokenizer: {e}") |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 1094 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/第五章 动手搭建大模型.md | 1199 | except Exception as e: |
| MEDIUM | docs/chapter5/第五章 动手搭建大模型.md | 1200 | print(f"Error loading tokenizer: {e}") |
| MEDIUM | docs/chapter5/code/train_tokenizer.py | 27 | print(f"Error decoding JSON in line {line_num}") |
| LOW | docs/chapter5/code/train_tokenizer.py | 132 | except Exception as e: |
| MEDIUM | docs/chapter5/code/train_tokenizer.py | 133 | print(f"Error loading tokenizer: {e}") |
| LOW | Extra-Chapter/CDDRS/readme.md | 160 | except Exception as e: |
| MEDIUM | Extra-Chapter/CDDRS/readme.md | 161 | print(f"Error reading {file_path}: {e}") |
| LOW | Extra-Chapter/generation-method/llm_generation.py | 149 | except Exception as e: |
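
Several of the flagged snippets pair a broad `except Exception` with a `print` call. One common alternative, sketched here under the assumption that the input is a sequence of JSON Lines strings, is to catch only `json.JSONDecodeError` and report the bad line through `logging`. The function name `iter_json_lines` is hypothetical and is not taken from the repository.

```python
import json
import logging
from typing import Any, Iterable, Iterator

logger = logging.getLogger(__name__)


def iter_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line, skipping malformed lines."""
    for line_num, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are not errors
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Catch only the decoding failure; any other exception still propagates.
            logger.warning("Skipping line %d: invalid JSON (%s)", line_num, exc)


records = list(iter_json_lines(['{"text": "a"}', "not json", "", '{"text": "b"}']))
print(records)  # [{'text': 'a'}, {'text': 'b'}]
```

Narrowing the exception type keeps unrelated failures, such as a missing file or a keyboard interrupt, from being silently swallowed.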

Magic Placeholder Names: 5 hits · 15 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| HIGH | docs/chapter7/第七章 大模型应用.md | 555 | api_key="YOUR_API_KEY", # Replace with your API Key |
| HIGH | docs/chapter7/第七章 大模型应用.md | 563 | > **Note:** You need to replace `YOUR_API_KEY` with a valid API Key obtained from [SiliconFlow](https://cloud.siliconflow.cn/i/ybUFvmqK) or another provider. |
| HIGH | docs/chapter7/第七章 大模型应用.md | 746 | api_key="YOUR_API_KEY", # Replace with your API Key |
| HIGH | Extra-Chapter/CDDRS/readme.md | 80 | api_key="your-api-key-here", |
| HIGH | Extra-Chapter/CDDRS/readme.md | 786 | api_key='your-api-key', |
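
The findings above all hard-code a dummy key in example code. One common alternative, sketched here, is to read the key from an environment variable and fail early if it is missing or still set to a placeholder. The function name `load_api_key` and the variable name `LLM_API_KEY` are hypothetical and do not come from the repository; the placeholder strings are the ones quoted in the table.

```python
import os

# Placeholder values quoted in the findings table, compared case-insensitively.
PLACEHOLDER_KEYS = {"your_api_key", "your-api-key-here", "your-api-key"}


def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset or a placeholder."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key or api_key.lower() in PLACEHOLDER_KEYS:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a valid API key."
        )
    return api_key


# Usage (assumes the variable was exported in the shell beforehand):
#   api_key = load_api_key()
```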
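
Docstring Block Structure: 2 hits · 10 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| HIGH | docs/chapter7/第七章 大模型应用.md | 259 | Get the embedding vector representation of the text Args: text (str): input text model (str): name of the model to use Returns: |
| HIGH | docs/chapter7/RAG/Embeddings.py | 36 | Get the embedding vector representation of the text Args: text (str): input text model (str): name of the model to use Returns: |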

Hyper-Verbose Identifiers: 9 hits · 8 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| LOW | docs/chapter5/code/k_model.py | 378 | def _left_pad_by_attention_mask( |
| LOW | Extra-Chapter/CDDRS/readme.md | 246 | def _compute_semantic_discrepancy(self, embeddings: np.ndarray) -> List[float]: |
| LOW | Extra-Chapter/CDDRS/readme.md | 283 | def _enforce_length_constraints(self, chunks: List[str]) -> List[str]: |
| LOW | Extra-Chapter/CDDRS/readme.md | 423 | def compute_document_length_factor(chunk_length: int, avg_length: int = 100) -> float: |
| LOW | Extra-Chapter/CDDRS/readme.md | 437 | def compute_term_significance(term_freq: int, doc_length_factor: float) -> float: |
| LOW | Extra-Chapter/CDDRS/readme.md | 559 | def _compute_knowledge_scores(self, key_info: Dict[str, Tuple[str, float]]) -> List[float]: |
| LOW | Extra-Chapter/text-data-processing/readme.md | 932 | def test_simple_bpe_tokenizer(): |
| LOW | Extra-Chapter/s1-vllm-thinking-budget/s1.py | 28 | def run_thinking_budget_sample(llm_model, tokenizer, user_input, thinking_budget): |
| LOW | Extra-Chapter/s1-vllm-thinking-budget/readme.md | 41 | def run_thinking_budget_sample(llm_model, tokenizer, user_input, thinking_budget): |

Deep Nesting: 5 hits · 5 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| LOW | docs/chapter6/code/finetune.py | 87 | |
| LOW | docs/chapter7/RAG/utils.py | 34 | |
| LOW | docs/chapter7/RAG/utils.py | 61 | |
| LOW | docs/chapter5/code/dataset.py | 65 | |
| LOW | docs/chapter5/code/train_tokenizer.py | 17 | |

Over-Commented Block: 2 hits · 2 pts

| Severity | File | Line | Snippet |
| --- | --- | --- | --- |
| LOW | docs/chapter5/code/k_model.py | 61 | def forward(self, x): |
| LOW | docs/chapter2/第二章 Transformer架构.md | 301 | # Attention computation |