liguodongiot/llm-action

Adjusted Score: 7.8
Raw Score: 7.8
Time Factor: 100%
Last Push: 2026-07-01
Stars: 24.7K
Language: HTML
Lines of Code: 108.8K
Files: 783
Pattern Hits: 407
Scan Date: 2026-07-14
HC Hit Rate: 0.05

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
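
As a rough check, the headline figures in this report can be reproduced from the definitions above. The sketch below is not the scanner's implementation. It assumes the displayed score is the sum of the per-category points listed under Pattern Findings divided by thousands of lines of code, takes 30 November 2022 as the ChatGPT launch cutoff, and leaves the factor undefined for dates between that cutoff and January 2024 because the report does not say how they are handled. The names `time_factor` and `score_per_kloc` are hypothetical.

```python
from datetime import date

# Figures copied from this report.
CATEGORY_POINTS = [255, 142, 110, 70, 62, 56, 44, 38, 36, 10, 7, 6, 6, 5, 5, 1]
LINES_OF_CODE = 108_800          # displayed as "108.8K", so only approximate
FILES = 783
CRITICAL_HITS, HIGH_HITS = 19, 20

CHATGPT_LAUNCH = date(2022, 11, 30)   # assumed cutoff day for "Nov 2022"
POST_ADOPTION = date(2024, 1, 1)      # "Jan 2024+" in the report


def time_factor(last_push):
    """Temporal discount as far as the report specifies it."""
    if last_push < CHATGPT_LAUNCH:
        return 0.05
    if last_push >= POST_ADOPTION:
        return 1.0
    return None  # intermediate dates: not specified by the report


def score_per_kloc(points, lines_of_code):
    """Normalise a point total per 1,000 lines of code."""
    return points / (lines_of_code / 1000)


total_points = sum(CATEGORY_POINTS)
per_kloc = score_per_kloc(total_points, LINES_OF_CODE)
factor = time_factor(date(2026, 7, 1))        # Last Push of this repository
adjusted = per_kloc * factor
hc_hit_rate = (CRITICAL_HITS + HIGH_HITS) / FILES

print(total_points, round(per_kloc, 1), round(adjusted, 1), round(hc_hit_rate, 2))
```

Under these assumptions the category points sum to 853, which gives about 7.8 per 1,000 lines, and the 39 High and Critical hits over 783 files give about 0.05, both in line with the displayed values. Because the time factor is 100%, the adjusted and raw figures coincide.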

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 19 · HIGH 20 · MEDIUM 54 · LOW 314
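
To show how the coefficients shift the balance between severities, the sketch below weights the hit counts above by their coefficients. It is illustrative only: the base total ignores the context and density multipliers, so it is not expected to equal the points reported per category. The variable names are hypothetical.

```python
# Coefficients and hit counts as stated in this report.
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
SEVERITY_HITS = {"CRITICAL": 19, "HIGH": 20, "MEDIUM": 54, "LOW": 314}

total_hits = sum(SEVERITY_HITS.values())

# Base weight per severity, before any context or cluster multiplier.
base_points = {
    severity: SEVERITY_COEFFICIENT[severity] * hits
    for severity, hits in SEVERITY_HITS.items()
}
base_total = sum(base_points.values())

for severity, hits in SEVERITY_HITS.items():
    hit_share = hits / total_hits
    weight_share = base_points[severity] / base_total
    print(f"{severity:<8} hits {hit_share:6.1%}   base weight {weight_share:6.1%}")
```

The counts sum to the 407 pattern hits reported for the repository. CRITICAL matches are under 5% of hits but over a quarter of the base weight, while LOW matches are about three quarters of hits and under half of the base weight.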

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner identified 407 distinct pattern matches across 16 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
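
The clustering rule can be sketched as a sliding window over the line numbers of all matches in one file. This is a minimal sketch, not the scanner's code: it assumes "within a 10-line window" means line numbers fewer than 10 apart and that matches from every category count towards the three-match threshold. The function name `clustered_lines` is hypothetical.

```python
def clustered_lines(lines, window=10, min_hits=3):
    """Return the line numbers that sit in a window holding min_hits or more matches."""
    ordered = sorted(lines)
    clustered = set()
    right = 0
    for left in range(len(ordered)):
        right = max(right, left)
        # Extend the window while the next match is fewer than `window` lines away.
        while right + 1 < len(ordered) and ordered[right + 1] - ordered[left] < window:
            right += 1
        if right - left + 1 >= min_hits:
            clustered.update(ordered[left:right + 1])
    return clustered


# Line numbers listed in this report for checkpoint_saver_megatron.py
# (Hallucination Indicators, Decorative Section Separators, Deep Nesting).
hits = [247, 249, 302, 303, 306, 307, 309, 310, 311, 312,
        323, 324, 352, 353, 204, 254, 22]

print(sorted(clustered_lines(hits)))
```

On these line numbers the sketch flags 247, 249, 254 and 302 to 312, and leaves 22, 204, 323, 324, 352 and 353 unflagged, which matches the ⚡ marks shown for that file in the tables below.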

Hallucination Indicators · 19 hits · 255 pts

Severity	File	Line	Snippet	Context
CRITICAL	…m-train/pytorch/distribution/tensor-parallel/README.md	54	torch.distributed.tensor.parallel.style.make_input_replicate_1d(input, device_mesh=None)	CODE
CRITICAL	…-train/pytorch/distribution/pipeline-parallel/1-流水线.md	73	- torch.distributed.pipeline.sync.skip.skippable.skippable(stash=(), pop=())	CODE
CRITICAL⚡	…-train/pytorch/distribution/pipeline-parallel/1-流水线.md	113	- torch.distributed.pipeline.sync.skip.skippable.stash(name, tensor)	CODE
CRITICAL⚡	…-train/pytorch/distribution/pipeline-parallel/1-流水线.md	115	- torch.distributed.pipeline.sync.skip.skippable.pop(name)	CODE
CRITICAL⚡	…-train/pytorch/distribution/pipeline-parallel/1-流水线.md	117	- torch.distributed.pipeline.sync.skip.skippable.verify_skippables(module)	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	247	model.language_model.embedding.word_embeddings.weight.data.copy_(out_word_embed[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	249	model.language_model.embedding.position_embeddings.weight.data.copy_(pos_embed)	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	302	l.self_attention.query_key_value.weight.data.copy_(qkv_weight[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	303	l.self_attention.dense.weight.data.copy_(dense_weight[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	306	l.mlp.dense_h_to_4h.weight.data.copy_(mlp_l0_weight[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	307	l.mlp.dense_4h_to_h.weight.data.copy_(mlp_l1_weight[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	309	l.self_attention.query_key_value.bias.data.copy_(qkv_bias[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	310	l.self_attention.dense.bias.data.copy_(dense_bias)	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	311	l.mlp.dense_h_to_4h.bias.data.copy_(mlp_l0_bias[tp_rank])	CODE
CRITICAL⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	312	l.mlp.dense_4h_to_h.bias.data.copy_(mlp_l1_bias)	CODE
CRITICAL	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	323	models[tp_rank].language_model.encoder.final_layernorm.weight.data.copy_(final_layernorm_weight)	CODE
CRITICAL	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	324	models[tp_rank].language_model.encoder.final_layernorm.bias.data.copy_(final_layernorm_bias)	CODE
CRITICAL	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	352	models[tp_rank].language_model.pooler.dense.weight.data.copy_(pooler_weight)	CODE
CRITICAL	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	353	models[tp_rank].language_model.pooler.dense.bias.data.copy_(pooler_bias)	CODE
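
The 255-point total for this category is consistent with the weights stated earlier in this report. The sketch below assumes each match scores its severity coefficient, multiplied by 1.5 in COMMENT context and by a further 1.5 when it carries the ⚡ cluster mark. The function `hit_points` is a hypothetical name, and the scanner may apply other adjustments that the report does not describe.

```python
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def hit_points(severity, context, clustered):
    """Score one match from its severity, lexical context and cluster flag."""
    points = SEVERITY_COEFFICIENT[severity]
    if context == "COMMENT":
        points *= 1.5   # comment-context weight stated in the report
    if clustered:
        points *= 1.5   # density multiplier for clustered matches
    return points


# The table above lists 6 unclustered and 13 clustered CRITICAL matches, all in CODE context.
hallucination_hits = (
    [("CRITICAL", "CODE", False)] * 6
    + [("CRITICAL", "CODE", True)] * 13
)

total = sum(hit_points(*hit) for hit in hallucination_hits)
print(len(hallucination_hits), total)
```

Six matches at 10 points plus thirteen at 15 points give 255, the figure shown in the category heading.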

Unused Imports · 142 hits · 142 pts

Severity	File	Line	Context
LOW	llm-data-engineering/sft-dataset/jinja-llm-bloom.py	2	CODE
LOW	llm-data-engineering/sft-dataset/jinja-llm-baichuan2.py	2	CODE
LOW	llm-data-engineering/sft-dataset/jinja-llm-baichuan.py	2	CODE
LOW	llm-data-engineering/sft-dataset/jinja-llm.py	2	CODE
LOW	llm-data-engineering/sft-dataset/jinja-llm-chatglm3.py	2	CODE
LOW	llm-eval/llm-performance/stat_gpu_memory.py	1	CODE
LOW	…performance/hardware-performance/pynvml-stat-memory.py	1	CODE
LOW	…al/llm-performance/vllm/vllm-locust-qwen1.5-7b-long.py	3	CODE
LOW	…al/llm-performance/vllm/vllm-locust-qwen1.5-7b-long.py	4	CODE
LOW	…al/llm-performance/vllm/vllm-locust-qwen1.5-7b-long.py	4	CODE
LOW	…al/llm-performance/mindie/lantency/stat_input_token.py	4	CODE
LOW	…al/llm-performance/mindie/lantency/stat_input_token.py	4	CODE
LOW	…al/llm-performance/mindie/lantency/perfermance-stat.py	5	CODE
LOW	…performance/mindie/locust-lantency-throughput/hello.py	3	CODE
LOW	…performance/mindie/locust-lantency-throughput/hello.py	4	CODE
LOW	…ocust-lantency-throughput/llm-910b4-chatglm3-6b-2tp.py	3	CODE
LOW	…ocust-lantency-throughput/llm-910b4-chatglm3-6b-2tp.py	4	CODE
LOW	…ocust-lantency-throughput/llm-910b4-chatglm3-6b-2tp.py	4	CODE
LOW	…e/locust-lantency-throughput/llm-910b4-qwen-72b-8tp.py	2	CODE
LOW	…e/locust-lantency-throughput/llm-910b4-qwen-72b-8tp.py	3	CODE
LOW	…e/locust-lantency-throughput/llm-910b4-qwen-72b-8tp.py	3	CODE
LOW	…cust-lantency-throughput/llm-910b4-baichuan2-7b-2tp.py	3	CODE
LOW	…cust-lantency-throughput/llm-910b4-baichuan2-7b-2tp.py	4	CODE
LOW	…cust-lantency-throughput/llm-910b4-baichuan2-7b-2tp.py	4	CODE
LOW	…ie/locust-lantency-throughput/llm-910b4-qwen1.5-4tp.py	2	CODE
LOW	…ie/locust-lantency-throughput/llm-910b4-qwen1.5-4tp.py	3	CODE
LOW	…ie/locust-lantency-throughput/llm-910b4-qwen1.5-4tp.py	3	CODE
LOW	…nference/ascend/mindformers/mindsporelite-inference.py	12	CODE
LOW	llm-inference/ascend/mindformers/mindsporelite-stat.py	3	CODE
LOW	llm-inference/ascend/mindformers/mindsporelite-stat.py	11	CODE
LOW	…nference/ascend/mindformers/baichuan2/baichuan-stat.py	2	CODE
LOW	…nce/ascend/mindformers/baichuan2/baichuan-inference.py	1	CODE
LOW	…nce/ascend/mindformers/baichuan2/baichuan-inference.py	2	CODE
LOW	…-inference/ascend/mindformers/chatglm3/chatglm-stat.py	2	CODE
LOW	…m-inference/ascend/mindformers/chatglm3/chatglm-gen.py	2	CODE
LOW	…rence/ascend/mindformers/chatglm3/chatglm-inference.py	5	CODE
LOW	llm-inference/web/flask/llm-qwen-mindspore-lite.py	4	CODE
LOW	llm-inference/web/flask/llm-qwen-mindspore-lite.py	7	CODE
LOW	llm-inference/web/flask/llm-qwen-mindspore-lite.py	9	CODE
LOW	llm-inference/web/fastapi/llm-qwen-mindspore-lite.py	4	CODE
LOW	llm-inference/web/fastapi/llm-qwen-mindspore-lite.py	7	CODE
LOW	llm-inference/web/fastapi/llm-qwen-mindspore-lite.py	9	CODE
LOW	llm-inference/web/fastapi/llm-qwen-mindspore-lite.py	15	CODE
LOW	llm-inference/web/fastapi/llm-qwen-mindspore-lite.py	17	CODE
LOW	…ter-transformer/bloom/firefly_lambada_1w_stat_token.py	1	CODE
LOW	…ter-transformer/bloom/firefly_lambada_1w_stat_token.py	10	CODE
LOW	…ter-transformer/bloom/firefly_lambada_1w_stat_token.py	12	CODE
LOW	…er-transformer/megatron-gpt2/gpt_summarization_stat.py	7	CODE
LOW	…/faster-transformer/megatron-gpt2/gpt_summarization.py	7	CODE
LOW	llm-inference/triton/resnet50/client.py	5	CODE
LOW	…se/distribution-parallelism/moe-parallel/paddle_moe.py	6	CODE
LOW	…se/distribution-parallelism/moe-parallel/paddle_moe.py	8	CODE
LOW	ai-framework/mxnet/oneflow_cnn_mnist.py	6	CODE
LOW	ai-framework/mxnet/mxnet_cnn_mnist.py	1	CODE
LOW	ai-framework/mxnet/mxnet_cnn_mnist.py	12	CODE
LOW	ai-framework/mxnet/mxnet_cnn_mnist.py	12	CODE
LOW	ai-framework/mxnet/mxnet_cnn_mnist.py	26	CODE
LOW	ai-framework/mxnet/mxnet_mlp_mnist.py	2	CODE
LOW	ai-framework/mxnet/mnist.py	4	CODE
LOW	llm-train/alpaca-lora/generate.py	7	CODE
82 more matches not shown…

Decorative Section Separators · 32 hits · 110 pts

Severity	File	Line	Snippet	Context
MEDIUM	llm-inference/DeepSpeed-Inference.md	9	# ---------------------------------------	COMMENT
MEDIUM	llm-inference/DeepSpeed-Inference.md	11	# ---------------------------------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	24	# ----------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	124	# -------------------------------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	137	# -------------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	246	# -----------------------------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	341	# -------------	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	445	# -------------------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	19	# -------------------------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	37	# ------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	56	# -------------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	119	# -------------------------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	196	# ----------------------------------------------	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	250	# ---------------------	COMMENT
MEDIUM	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	204	#-----------	COMMENT
MEDIUM⚡	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	254	#-------------------	COMMENT
MEDIUM	llm-algo/chatglm/模型架构.md	220	# ===================================	COMMENT
MEDIUM	llm-algo/chatglm/模型架构.md	222	# ===================================	COMMENT
MEDIUM	llm-algo/chatglm/模型架构.md	264	# =========================	COMMENT
MEDIUM	llm-algo/chatglm/模型架构.md	266	# =========================	COMMENT
MEDIUM	llm-algo/chatglm2/模型架构.md	138	# ===========================	COMMENT
MEDIUM	llm-algo/chatglm2/模型架构.md	140	# ===========================	COMMENT
MEDIUM	llm-algo/chatglm2/模型架构.md	160	# =========================	COMMENT
MEDIUM	llm-algo/chatglm2/模型架构.md	162	# =========================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	253	# =================================================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	255	# =================================================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	256	# =====================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	258	# =====================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	322	# ==================================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	324	# ==================================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	328	# =================	COMMENT
MEDIUM⚡	llm-algo/chatglm2/模型架构.md	330	# =================	COMMENT

Cross-File Repetition · 14 hits · 70 pts

Severity	File	Snippet	Context
HIGH	…-inference/ascend/mindformers/chatglm3/chatglm-stat.py	# 2. 自定义修改配置后实例化 config = autoconfig.from_pretrained('/root/workspace/model/chatglm3-6b_ms/run_glm3_6b.yaml') config.use	STRING
HIGH	…m-inference/ascend/mindformers/chatglm3/chatglm-gen.py	# 2. 自定义修改配置后实例化 config = autoconfig.from_pretrained('/root/workspace/model/chatglm3-6b_ms/run_glm3_6b.yaml') config.use	STRING
HIGH	…rence/ascend/mindformers/chatglm3/chatglm-inference.py	# 2. 自定义修改配置后实例化 config = autoconfig.from_pretrained('/root/workspace/model/chatglm3-6b_ms/run_glm3_6b.yaml') config.use	STRING
HIGH	llm-train/alpaca/train_ddp.py	resize tokenizer and embedding. note: this is the unoptimized version that may make your embedding size not be divisible	STRING
HIGH	llm-train/alpaca/train.py	resize tokenizer and embedding. note: this is the unoptimized version that may make your embedding size not be divisible	STRING
HIGH	llm-train/qlora/qlora.py	resize tokenizer and embedding. note: this is the unoptimized version that may make your embedding size not be divisible	STRING
HIGH	llm-train/chinese-llama-alpaca/run_clm_sft_with_peft.py	resize tokenizer and embedding. note: this is the unoptimized version that may make your embedding size not be divisible	STRING
HIGH	llm-localization/ascend/standford-alpaca/train.py	resize tokenizer and embedding. note: this is the unoptimized version that may make your embedding size not be divisible	STRING
HIGH	llm-train/alpaca/train_ddp.py	make dataset and collator for supervised fine-tuning.	STRING
HIGH	llm-train/alpaca/train.py	make dataset and collator for supervised fine-tuning.	STRING
HIGH	llm-localization/ascend/standford-alpaca/train.py	make dataset and collator for supervised fine-tuning.	STRING
HIGH	…tribution/tensor-parallel/sequence_parallel_example.py	main body of the demo of a basic version of tensor parallel by using pytorch native apis.	STRING
HIGH	…ch/distribution/tensor-parallel/2d_parallel_example.py	main body of the demo of a basic version of tensor parallel by using pytorch native apis.	STRING
HIGH	…istribution/tensor-parallel/tensor_parallel_example.py	main body of the demo of a basic version of tensor parallel by using pytorch native apis.	STRING

Over-Commented Block · 71 hits · 62 pts

Severity	File	Line	Snippet	Context
LOW	llm-tools/profiler-recipe.py	21	# Name Self CPU CPU total CPU time avg # of Calls	COMMENT
LOW	llm-tools/profiler-recipe.py	41	# --------------------------------- ------------ -------------------------------------------	COMMENT
LOW	llm-tools/profiler-recipe.py	81		COMMENT
LOW	llm-tools/profiler-recipe.py	121	# --------------------------------- ------------ ------------ ------------	COMMENT
LOW	llm-tools/profiler-recipe.py	141	# aten::empty 94.79 Mb 94.79 Mb 121	COMMENT
LOW	llm-tools/profiler-recipe.py	181	with_stack=True,	COMMENT
LOW	llm-pipeline/REAEMD.md	81	# --llama \	COMMENT
LOW	llm-pipeline/REAEMD.md	101	# --gradient_checkpointing \	COMMENT
LOW	llm-pipeline/REAEMD.md	121	# --num_train_epochs 2 \	COMMENT
LOW	llm-train/qlora/accuracy.py	1	# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.	COMMENT
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	1	#!/usr/bin/env python	COMMENT
LOW	llm-train/chinese-llama-alpaca/run_clm_sft_with_peft.py	1	#!/usr/bin/env python	COMMENT
LOW	…ibution/data-parallel/minGPT-ddp/sbatch_run_sig_opt.sh	1		COMMENT
LOW	…rch/distribution/data-parallel/minGPT-ddp/multinode.sh	1	#!/bin/bash	COMMENT
LOW	…istribution/data-parallel/minGPT-ddp/sbatch_run_sig.sh	1	#!/bin/bash	COMMENT
LOW	…ch/distribution/data-parallel/minGPT-ddp/sbatch_run.sh	1		COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	61		COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	121		COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	141	######################################################################	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	221	######################################################################	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	241	# Need batch dimension first for pipeline parallelism.	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	341	# -------------	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	461	mp.spawn(run_worker, args=(world_size, ), nprocs=world_size, join=True)	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	481	# [RANK 1]: -----------------------------------------------------------------------------------------	COMMENT
LOW	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	501	# [RANK 0]: \| epoch 3 \| 20/ 50 batches \| lr 4.51 \| ms/batch 698.27 \| loss 12.01 \| ppl 164364.60	COMMENT
LOW	llm-train/alpa/train/pipeshard_parallelism.py	41	ray.init()	COMMENT
LOW	llm-train/alpa/train/pipeshard_parallelism.py	181		COMMENT
LOW	llm-train/alpa/train/pipeshard_parallelism.py	201	#	COMMENT
LOW	llm-train/alpa/train/pipeshard_parallelism.py	241	auto_pipeline_actual_state = auto_pipeline_train_step(state, batch)	COMMENT
LOW	llm-train/alpa/train/pipeshard_parallelism.py	261	#	COMMENT
LOW	…egatron/gpt2/merge_ck_and_inference/checkpoint_util.py	1	import argparse	COMMENT
LOW	…egatron/gpt2/merge_ck_and_inference/checkpoint_util.py	21		COMMENT
LOW	…egatron/gpt2/merge_ck_and_inference/checkpoint_util.py	41	# consumed_valid_samples	COMMENT
LOW	…egatron/gpt2/merge_ck_and_inference/checkpoint_util.py	61	# "mlp l1 weight"	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	21	END: Cython Metadata */	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	41	#endif	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	61	#else	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	81	#define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)\	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	101	#endif	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	121	#endif	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	141	#define PyObject_Free(p) PyMem_Free(p)	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	161	#if PY_MAJOR_VERSION >= 3	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	181	#define PyInt_FromSsize_t PyLong_FromSsize_t	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	201	#else	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	221	#define __Pyx_PyType_AsAsync(obj) NULL	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	281		COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	301	#include "stdlib.h"	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	321	# else	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	341	#define __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT 0	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	361	#define __Pyx_sst_abs(value) abs(value)	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	381	#define __Pyx_PyStr_FromString __Pyx_PyBytes_FromString	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	401	#else	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	521		COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	601	#endif	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	621	#define __pyx_atomic_incr_aligned(value, lock) _InterlockedIncrement(value)	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	1041	#define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL;	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	1061	#define __Pyx_XINCREF(r) do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0)	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	1141	#endif	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	1161	int memview_is_new_reference);	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	1181		COMMENT
11 more matches not shown…

Self-Referential Comments · 17 hits · 56 pts

Severity	File	Line	Snippet	Context
MEDIUM	llm-inference/flexflow-serve/benchmark-batch1.py	43	# Create the sampling configs	COMMENT
MEDIUM	ai-framework/deepspeed/hello_bert/train_bert_ds.py	117	# Create the labels first	COMMENT
MEDIUM	ai-framework/deepspeed/hello_bert/train_bert_ds.py	941	# Create the labels first	COMMENT
MEDIUM	ai-framework/deepspeed/hello_bert/train_bert.py	117	# Create the labels first	COMMENT
MEDIUM	llm-train/galore/torchrun_main.py	433	# The below code is only executed during the update step	COMMENT
MEDIUM	…tribution/tensor-parallel/sequence_parallel_example.py	49	# Create a optimizer for the parallelized module.	COMMENT
MEDIUM	…ch/distribution/tensor-parallel/2d_parallel_example.py	75	# Create a optimizer for the parallelized module.	COMMENT
MEDIUM	…istribution/tensor-parallel/tensor_parallel_example.py	57	# Create a optimizer for the parallelized module.	COMMENT
MEDIUM	…pytorch/distribution/pipeline-parallel/ddp_pipeline.py	23	# Define the model	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	101	# Define the training step	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	127	# Define a MLP model with manual stage boundaries.	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	154	# Define the training step.	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	213	# Define the parallel method.	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	222	# Define the training step. The function body is the same as the above one.	COMMENT
MEDIUM⚡	llm-compression/quantization/llm-qat/cfd70ff/utils.py	16	# Define a utility method for setting the logging parameters of a logger	COMMENT
MEDIUM⚡	llm-compression/quantization/llm-qat/cfd70ff/utils.py	24	# Define a formatter for the log messages	COMMENT
MEDIUM⚡	llm-compression/quantization/llm-qat/cfd70ff/utils.py	29	# Create a console handler for outputting log messages to the console	COMMENT

Hyper-Verbose Identifiers · 43 hits · 44 pts

Severity	File	Line	Snippet	Context
LOW	llm-data-engineering/sft-dataset/数据集格式.md	93	def preprocess_function_train(examples):	CODE
LOW	llm-data-engineering/sft-dataset/数据集格式.md	156	def build_inputs_with_special_tokens(	CODE
LOW	llm-data-engineering/sft-dataset/数据集格式.md	434	def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):	CODE
LOW	…nference/ascend/mindformers/mindsporelite-inference.py	26	def pipeline_from_model_paths(args_, tokenizer):	CODE
LOW	…nference/ascend/mindformers/mindsporelite-inference.py	76	def pipeline_from_infer_config(args_, tokenizer):	CODE
LOW	llm-inference/ascend/mindformers/mindsporelite-stat.py	33	def pipeline_from_model_paths(args_, tokenizer):	CODE
LOW	llm-inference/ascend/mindformers/mindsporelite-stat.py	83	def pipeline_from_infer_config(args_, tokenizer):	CODE
LOW	llm-train/alpaca-lora/finetune_metrics_epoch.py	152	def generate_and_tokenize_prompt(data_point):	CODE
LOW	llm-train/alpaca-lora/finetune.py	146	def generate_and_tokenize_prompt(data_point):	CODE
LOW	llm-train/chatglm/main.py	176	def preprocess_function_train(examples):	CODE
LOW	llm-train/alpaca/train_ddp.py	54	def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str):	CODE
LOW	llm-train/alpaca/train_ddp.py	63	def smart_tokenizer_and_embedding_resize(	CODE
LOW	llm-train/alpaca/train_ddp.py	173	def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer, data_args) -> Dict:	CODE
LOW	llm-train/alpaca/train.py	53	def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str):	CODE
LOW	llm-train/alpaca/train.py	62	def smart_tokenizer_and_embedding_resize(	CODE
LOW	llm-train/alpaca/train.py	172	def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer, data_args) -> Dict:	CODE
LOW	llm-train/qlora/qlora.py	346	def print_trainable_parameters(args, model):	CODE
LOW	llm-train/qlora/qlora.py	363	def smart_tokenizer_and_embedding_resize(	CODE
LOW	llm-train/qlora/qlora.py	438	def extract_unnatural_instructions_data(examples, extract_reformulations=False):	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	75	def preprocess_logits_for_metrics(logits, labels):	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	83	def fault_tolerance_data_collator(features: List) -> Dict[str, Any]:	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_sft_with_peft.py	433	def smart_tokenizer_and_embedding_resize(	CODE
LOW	llm-train/alpa/train/pipeshard_parallelism.py	161	def manual_pipeline_train_step(state, batch):	CODE
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	8142	* cdef setitem_slice_assign_scalar(self, memoryview dst, value):	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	8186	* cdef setitem_slice_assign_scalar(self, memoryview dst, value): # <<<<<<<<<<<<<<	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	8213	* cdef setitem_slice_assign_scalar(self, memoryview dst, value):	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	8451	* cdef setitem_slice_assign_scalar(self, memoryview dst, value): # <<<<<<<<<<<<<<	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	14045	* cdef memoryview_copy_from_slice(memoryview memview, __Pyx_memviewslice *memviewslice): # <<<<<<<<<<<<<<	COMMENT
LOW	llm-train/megatron/gpt2/data/cMinhash.cpp	14149	* cdef memoryview_copy_from_slice(memoryview memview, __Pyx_memviewslice *memviewslice): # <<<<<<<<<<<<<<	COMMENT
LOW	llm-algo/chatglm/模型架构.md	176	def apply_rotary_pos_emb_index(q, k, cos, sin, position_id):	CODE
LOW	llm-algo/chatglm/模型架构.md	362	def split_tensor_along_last_dim(self, tensor, num_partitions,	CODE
LOW	llm-algo/chatglm2/模型架构.md	749	def _update_model_kwargs_for_generation(	CODE
LOW	llm-algo/chatglm2/模型架构.md	780	def prepare_inputs_for_generation(	CODE
LOW⚡	llm-compression/quantization/llm-qat/cfd70ff/utils.py	39	def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output_dir: str):	CODE
LOW	llm-localization/ascend/standford-alpaca/train.py	51	def smart_tokenizer_and_embedding_resize(	CODE
LOW	llm-localization/ascend/standford-alpaca/train.py	161	def make_supervised_data_module(tokenizer: transformers.PreTrainedTokenizer, data_args) -> Dict:	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1237	def __compare_simplified_dataset_results(self):	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1370	def __compare_full_dataset_results(self):	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1456	def __patch_hf_transformers_utils(self):	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	799	def process_before_extraction(gen, choice_dict):	STRING
LOW	llm-localization/ascend/mindie/script/model-test.py	977	def __run_full_dataset_truthfulqa(self):	STRING
LOW	llm-localization/ascend/mindie/script/model-test.py	986	def format_prompt_with_answer_strings(question, ans):	STRING
LOW	llm-localization/ascend/mindie/script/model-test.py	1153	def __run_full_dataset_humaneval(self):	STRING

Cross-Language Confusion · 5 hits · 38 pts

Severity	File	Line	Snippet	Context
HIGH⚡	llm-train/megatron/gpt2/data/download.py	251	url varchar(2048) not null,	CODE
HIGH⚡	llm-train/megatron/gpt2/data/download.py	252	domain varchar(255) not null,	CODE
HIGH⚡	llm-train/megatron/gpt2/data/download.py	253	word_count int null,	CODE
HIGH⚡	llm-train/megatron/gpt2/data/download.py	254	elapsed int null,	CODE
HIGH⚡	llm-train/megatron/gpt2/data/download.py	255	scraper varchar(255) not null,	CODE

Deep Nesting · 36 hits · 36 pts

Severity	File	Line	Context
LOW	…er-transformer/megatron-gpt2/gpt_summarization_stat.py	24	CODE
LOW	…er-transformer/megatron-gpt2/gpt_summarization_stat.py	354	CODE
LOW	…/faster-transformer/megatron-gpt2/gpt_summarization.py	21	CODE
LOW	…/faster-transformer/megatron-gpt2/gpt_summarization.py	327	CODE
LOW	ai-framework/mxnet/mxnet_cnn_mnist.py	122	CODE
LOW	llm-train/alpaca-lora/export_state_dict_checkpoint.py	80	CODE
LOW	llm-train/chatglm/main.py	38	CODE
LOW	llm-train/chatglm/main.py	148	CODE
LOW	llm-train/chatglm/main.py	176	CODE
LOW	llm-train/qlora/qlora.py	262	CODE
LOW	llm-train/qlora/qlora.py	438	CODE
LOW	llm-train/qlora/qlora.py	475	CODE
LOW	llm-train/qlora/qlora.py	490	CODE
LOW	llm-train/qlora/qlora.py	514	CODE
LOW	llm-train/qlora/qlora.py	545	CODE
LOW	llm-train/galore/torchrun_main.py	134	CODE
LOW	…/peft/clm/peft_lora_clm_accelerate_ds_zero3_offload.py	109	CODE
LOW	…/chinese-llama-alpaca/merge_llama_with_chinese_lora.py	67	CODE
LOW	…/chinese-llama-alpaca/merge_llama_with_chinese_lora.py	111	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	83	CODE
LOW	…2/merge_ck_and_inference/checkpoint_loader_megatron.py	19	CODE
LOW	…t2/merge_ck_and_inference/checkpoint_saver_megatron.py	22	CODE
LOW	llm-train/megatron/gpt2/data/download.py	193	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	297	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	471	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	535	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	587	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	669	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	789	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	889	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1075	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1153	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1224	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1264	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1499	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	334	CODE

AI Structural Patterns · 10 hits · 10 pts

Severity	File	Line	Context
LOW	…m-inference/ascend/mindformers/text_generator_infer.py	217	CODE
LOW	llm-inference/vllm/api_client.py	32	CODE
LOW	llm-inference/vllm/api_client.py	48	CODE
LOW	ai-framework/transformer-engine/mnist/main_stat.py	43	CODE
LOW	ai-framework/transformer-engine/mnist/main.py	43	CODE
LOW	ai-framework/deepspeed/hello_bert/train_bert.py	566	CODE
LOW	llm-train/alpaca-lora/finetune_metrics_epoch.py	28	CODE
LOW	llm-train/alpaca-lora/finetune.py	28	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	145	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	1044	CODE

Excessive Try-Catch Wrapping · 5 hits · 7 pts

Severity	File	Line	Snippet	Context
MEDIUM	…er-transformer/megatron-gpt2/gpt_summarization_stat.py	342	print('Error with datapoint : ', data_point_idx)	CODE
MEDIUM	…/faster-transformer/megatron-gpt2/gpt_summarization.py	325	print('Error with datapoint : ', data_point_idx)	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	463	except Exception:	CODE
MEDIUM	…ron/gpt2/merge_ck_and_inference/text_generation_cli.py	20	print(f"Error {response.status_code}: {response.json()['message']}")	CODE
LOW	llm-localization/ascend/mindie/script/model-test.py	258	except Exception as e:	STRING

Redundant / Tautological Comments · 4 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW	llm-train/alpaca-lora/finetune_metrics_epoch.py	104	# Check if parameter passed or if set within environ	COMMENT
LOW	llm-train/alpaca-lora/finetune.py	98	# Check if parameter passed or if set within environ	COMMENT
LOW	…-compression/quantization/llm-qat/f4d873a/datautils.py	91	# Loop through the list of dictionaries	COMMENT
LOW	…-compression/quantization/llm-qat/f4d873a/datautils.py	98	# Append the value to the list associated with the key in dict_of_lists	COMMENT

Slop Phrases · 2 hits · 6 pts

Severity	File	Line	Snippet	Context
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	44	# Alternatively, you can use the following command to connect to an existing	COMMENT
MEDIUM	llm-train/alpa/train/pipeshard_parallelism.py	191	# device assignment of each stage, you can use the more advanced	COMMENT

Modern Structural Boilerplate · 5 hits · 5 pts

Severity	File	Line	Snippet	Context
LOW	llm-train/chatglm/main.py	36	logger = logging.getLogger(__name__)	CODE
LOW	llm-train/qlora/qlora.py	46	logger = logging.getLogger(__name__)	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_pt_with_peft.py	310	logger = logging.getLogger(__name__)	CODE
LOW	llm-train/chinese-llama-alpaca/run_clm_sft_with_peft.py	195	logger = logging.getLogger(__name__)	CODE
LOW	llm-train/megatron/gpt2/data/file_utils.py	36	logger = logging.getLogger(__name__) # pylint: disable=invalid-name	CODE

Docstring Block Structure · 1 hit · 5 pts

Severity	File	Line	Snippet	Context
HIGH	llm-train/qlora/accuracy.py	33	Args: predictions (`list` of `int`): Predicted labels. references (`list` of `int`): Ground truth labels. n	STRING

Fake / Example Data · 1 hit · 1 pt

Severity	File	Line	Snippet	Context
LOW	llm-data-engineering/sft-dataset/jinja-demo.py	10	result = template.render(name='John Doe')	CODE
