Dao-AILab/flash-attention

Metric	Value
Adjusted Score	13.8
Raw Score	13.8
Time Factor	100%
Last Push	2026-07-14
Stars	24.4K
Language	Python
Lines of Code	144.2K
Files	429
Pattern Hits	1.2K
Scan Date	2026-07-14
HC Hit Rate	0.16

What These Metrics Mean

Adjusted Score: The primary synthetic-code indicator, computed as the raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. Use it as the comparative metric when ranking repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High and Critical pattern hits per analysed file (here, 67 such hits across 429 files ≈ 0.16). This complementary signal catches repositories where a few files are densely packed with high-severity AI tells, which is a strong indicator even when the normalised score looks moderate because the codebase is large.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
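Taken together, these definitions reduce to a few small formulas. The sketch below is one reading of them, not the scanner's own code. The function names are hypothetical. The text says only "Nov 2022", so the exact cut-off (30 November 2022, ChatGPT's public release) is an assumption. The text also does not say how pushes between that date and January 2024 are discounted, so the linear ramp used for that window is an assumption too.

```python
from datetime import date

CHATGPT_LAUNCH = date(2022, 11, 30)  # pushes before this get the 5% factor
FULL_SIGNAL_FROM = date(2024, 1, 1)  # pushes from this date on get 100%


def time_factor(last_push: date) -> float:
    """Temporal discount multiplier between 0.05 and 1.0."""
    if last_push < CHATGPT_LAUNCH:
        return 0.05
    if last_push >= FULL_SIGNAL_FROM:
        return 1.0
    # ASSUMPTION: the report does not specify this window; ramp linearly.
    elapsed = (last_push - CHATGPT_LAUNCH).days
    span = (FULL_SIGNAL_FROM - CHATGPT_LAUNCH).days
    return 0.05 + 0.95 * elapsed / span


def adjusted_score(raw_total: float, lines_of_code: int, factor: float) -> float:
    """Raw score per 1,000 lines of code, scaled by the time factor."""
    return raw_total / (lines_of_code / 1000) * factor


def hc_hit_rate(high_hits: int, critical_hits: int, files: int) -> float:
    """HIGH + CRITICAL hits per analysed file."""
    return (high_hits + critical_hits) / files


# Figures from this report: last push 2026-07-14; 57 HIGH, 10 CRITICAL, 429 files.
print(time_factor(date(2026, 7, 14)))      # 1.0
print(round(hc_hit_rate(57, 10, 429), 2))  # 0.16
```

The printed values match this report's Time Factor (100%) and HC Hit Rate (0.16).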

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

Severity	Coefficient	Hits
CRITICAL	10	10
HIGH	5	57
MEDIUM	2	170
LOW	1	1001
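As a worked example of the coefficients, the hit counts in the breakdown give a severity-weighted baseline before any context or cluster multipliers are applied. This is plain arithmetic on the reported figures. The scanner's raw score also applies those multipliers, so this baseline is not the reported raw score.

```python
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
HITS = {"CRITICAL": 10, "HIGH": 57, "MEDIUM": 170, "LOW": 1001}

# Points contributed by each severity level before any multipliers.
contributions = {sev: SEVERITY_COEFFICIENT[sev] * n for sev, n in HITS.items()}
total_hits = sum(HITS.values())
base_total = sum(contributions.values())

print(total_hits)     # 1238
print(contributions)  # {'CRITICAL': 100, 'HIGH': 285, 'MEDIUM': 340, 'LOW': 1001}
print(base_total)     # 1726
```

LOW hits make up about 81% of all matches (1001 of 1238) and about 58% of this weighted baseline (1001 of 1726).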

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.
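The per-directory view amounts to grouping match scores by the first path component. The following is a minimal sketch. The `findings` pairs and their point values are hypothetical. Bucketing root-level files such as `setup.py` under `.` is an assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def top_level_dir(path: str) -> str:
    """First path component, or '.' for files at the repository root (assumption)."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "."


def score_by_directory(findings):
    """Sum match scores per top-level directory, highest total first."""
    totals = defaultdict(float)
    for path, score in findings:
        totals[top_level_dir(path)] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (path, points) pairs; the paths are taken from the tables below.
findings = [
    ("tests/cute/mask_mod_definitions.py", 4.5),
    ("tests/cute/test_mask_mod.py", 3.0),
    ("hopper/test_flash_attn.py", 15.0),
    ("setup.py", 1.0),
]
print(score_by_directory(findings))
# [('hopper', 15.0), ('tests', 7.5), ('.', 1.0)]
```

Dividing each total by that directory's line count would give the size-relative comparison the paragraph recommends for spotting disproportionate scores.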

Pattern Findings

The scanner recorded 1238 pattern matches across 18 categories. Each row below marks one location in the source where a detection rule matched. It gives the severity, the file path, and the lexical context (CODE, COMMENT, or STRING), plus the line number and code snippet where the category reports them.

Reading the findings tables: the Severity column gives the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column records whether the match occurred in executable code, an inline comment, or a string literal. Comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ icon marks clustered matches, meaning three or more patterns within a 10-line window. Each clustered match receives an additional ×1.5 density multiplier, because dense clusters are far stronger evidence of synthetic authorship than isolated hits.
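The weighting rules above combine multiplicatively. The sketch below scores a single match and marks clustered lines with a sliding window. Two details are assumptions. First, STRING context is weighted ×1.0, which is inferred from the Cross-File Repetition category, where 48 HIGH string matches score 240 points. Second, a "10-line window" is read as any span of 10 consecutive lines containing matches from any category.

```python
SEVERITY = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
CONTEXT = {"CODE": 1.0, "COMMENT": 1.5, "STRING": 1.0}  # STRING x1.0 is inferred
CLUSTER_MULTIPLIER = 1.5


def match_score(severity: str, context: str, clustered: bool) -> float:
    """Severity coefficient x context multiplier x optional cluster bonus."""
    score = SEVERITY[severity] * CONTEXT[context]
    return score * CLUSTER_MULTIPLIER if clustered else score


def clustered_lines(lines, min_count=3, window=10):
    """Line numbers sharing a `window`-line span with at least min_count matches.

    `lines` holds every match line in one file, across all categories
    (assumption about how the scanner defines a cluster).
    """
    lines = sorted(lines)
    clustered = set()
    start = 0
    for end, line in enumerate(lines):
        # Drop matches that fall outside the window ending at `line`.
        while line - lines[start] >= window:
            start += 1
        if end - start + 1 >= min_count:
            clustered.update(lines[start:end + 1])
    return clustered


# Hallucination Indicators: 4 unclustered + 6 clustered CRITICAL matches in CODE.
total = 4 * match_score("CRITICAL", "CODE", False) + 6 * match_score("CRITICAL", "CODE", True)
print(total)  # 130.0

# Separator lines from tests/cute/mask_mod_definitions.py (this category only).
print(sorted(clustered_lines([14, 17, 19, 21, 176, 180, 388, 390])))  # [14, 17, 19, 21]
```

Under these rules the Hallucination Indicators category reproduces its reported 130 points. The clustered separator lines match the ⚡ markers on lines 14 to 21 of that file.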

Decorative Section Separators · 152 hits · 513 pts

Severity	File	Line	Snippet	Context
MEDIUM	tools/sass_diff.py	24	# ── Parsing ──────────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/sass_diff.py	92	# ── Diffing ──────────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/sass_diff.py	111	# ── Display ──────────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/sass_diff.py	219	# ── Main ─────────────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/ci/run_fa4_ci.py	28	# ── GPU helpers ───────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/ci/run_fa4_ci.py	61	# ── Runtime DSL pin (decouples cutlass-dsl from the baked image) ─────────────────	COMMENT
MEDIUM	tools/ci/run_fa4_ci.py	86	# ── Step plan ─────────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/ci/run_fa4_ci.py	129	# ── Step runner ───────────────────────────────────────────────────────────────	COMMENT
MEDIUM	tools/ci/run_fa4_ci.py	203	# ── CLI ───────────────────────────────────────────────────────────────────────	COMMENT
MEDIUM⚡	tests/cute/test_mask_mod_varlen.py	494	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/test_mask_mod_varlen.py	496	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	56	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	58	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	95	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	97	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	244	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	246	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	421	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	423	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	619	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	623	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	755	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod_varlen.py	757	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_score_mod_varlen.py	70	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_score_mod_varlen.py	72	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_score_mod_varlen.py	178	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_score_mod_varlen.py	180	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/test_score_mod_varlen.py	401	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/test_score_mod_varlen.py	403	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/score_mod_definitions.py	7	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/score_mod_definitions.py	10	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/score_mod_definitions.py	485	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/score_mod_definitions.py	487	# =============================================================================	COMMENT
MEDIUM	tests/cute/score_mod_definitions.py	197	# =============================================================================	COMMENT
MEDIUM	tests/cute/score_mod_definitions.py	201	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod.py	857	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod.py	861	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod.py	1533	# =============================================================================	COMMENT
MEDIUM	tests/cute/test_mask_mod.py	1535	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	14	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	17	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	19	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	21	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	270	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	272	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	333	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/mask_mod_definitions.py	337	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	176	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	180	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	388	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	390	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	492	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	494	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	596	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	598	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	748	# =============================================================================	COMMENT
MEDIUM	tests/cute/mask_mod_definitions.py	750	# =============================================================================	COMMENT
MEDIUM⚡	tests/cute/test_flash_attn.py	2830	# ---------------------------------------------------------------------------	COMMENT
MEDIUM⚡	tests/cute/test_flash_attn.py	2832	# ---------------------------------------------------------------------------	COMMENT
MEDIUM	tests/cute/test_flash_attn.py	2892	# ---------------------------------------------------------------------------	COMMENT
92 more matches not shown…
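Every line flagged in this category is a comment made of a long run of a repeated rule character (`─`, `=` or `-`), sometimes after a short title. A regular expression along the following lines catches those shapes. The 10-character minimum run is a hypothetical threshold, not the scanner's documented rule.

```python
import re

# Hypothetical rule: a line that is only a '#' comment and ends in a run of
# at least 10 rule characters (box-drawing '─', '=' or '-').
SEPARATOR = re.compile(r"^\s*#.*[─=\-]{10,}\s*$")


def separator_lines(source: str):
    """1-based line numbers of decorative separator comments."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if SEPARATOR.match(line)]


sample = "\n".join([
    "# " + "=" * 77,
    "# Mask definitions",
    "# " + "=" * 77,
    "def causal_mask(q_idx, kv_idx):",
    "    return q_idx >= kv_idx  # keep lower triangle",
    "# ── Parsing " + "─" * 60,
])
print(separator_lines(sample))  # [1, 3, 6]
```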

Hyper-Verbose Identifiers · 353 hits · 339 pts

Severity	File	Line	Snippet	Context
LOW	setup.py	92	def get_cuda_bare_metal_version(cuda_dir):	CODE
LOW	setup.py	215	def validate_and_update_archs(archs):	CODE
LOW	csrc/layer_norm/setup.py	16	def get_cuda_bare_metal_version(cuda_dir):	CODE
LOW	csrc/layer_norm/setup.py	25	def check_cuda_torch_binary_vs_bare_metal(cuda_dir):	CODE
LOW	csrc/fused_dense_lib/setup.py	10	def get_cuda_bare_metal_version(cuda_dir):	CODE
LOW	csrc/flash_attn/src/generate_kernels.py	43	def get_fwd_split_align_template() -> str:	CODE
LOW	hopper/test_attn_kvcache.py	155	def test_flash_attn_kvcache_nosplit(nheads_kv, gqa_ratio, num_requests, query_seqlen, context_seqlen, headdim, causal, g	CODE
LOW	hopper/test_attn_kvcache.py	292	def test_flash_attn_kvcache_output(nheads_kv, gqa_ratio, num_requests, query_seqlen, context_seqlen, headdim, causal, us	CODE
LOW	hopper/test_torch_compile_and_export.py	61	def test_compile_and_package_model():	CODE
LOW	hopper/test_flash_attn_triton_amd.py	334	def test_flash_attn_varlen_output(	CODE
LOW	hopper/test_flash_attn_triton_amd.py	1042	def test_flash_attn_race_condition(seqlen_q, seqlen_k, d, causal, dtype):	CODE
LOW	hopper/test_util.py	9	def generate_random_padding_mask(max_seqlen, batch_size, device, mode="random", zero_lengths=False):	CODE
LOW	hopper/setup.py	338	def get_cuda_bare_metal_version(cuda_dir):	STRING
LOW	hopper/test_flash_attn_bwd_determinism.py	391	def test_flash_attn_varlen_output(	CODE
LOW	hopper/test_flash_attn.py	404	def test_flash_attn_varlen_output(	CODE
LOW	hopper/test_flash_attn.py	1133	def test_flash_attn_race_condition(seqlen_q, seqlen_k, d, causal, dtype):	CODE
LOW	hopper/flash_attn_interface.py	313	def _flash_attn_backward_fake(	CODE
LOW	hopper/flash_attn_interface.py	747	def flash_attn_qkvpacked_func(	CODE
LOW	training/src/metrics/num_tokens.py	39	def _forward_reduce_state_update(self, *args: Any, **kwargs: Any) -> Any:	CODE
LOW	training/src/callbacks/speed_monitor.py	35	def on_validation_epoch_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:	CODE
LOW	training/src/optim/param_grouping.py	15	def group_parameters_for_optimizer(model, optimizer_cfg, bias_weight_decay=False,	CODE
LOW	training/src/utils/gpu_affinity.py	57	def set_single_unique_affinity(gpu_id, nproc_per_node):	CODE
LOW	training/src/utils/gpu_affinity.py	80	def set_socket_unique_affinity(gpu_id, nproc_per_node, mode):	CODE
LOW	training/src/utils/checkpoint.py	32	def blockdiag_to_dense_mlp_bert(state_dict):	CODE
LOW	training/src/utils/checkpoint.py	41	def interpolate_pos_embedding(state_dict, out_seqlen, pos_embedding_name='model.pos_encoder.pe', interleave=False):	CODE
LOW	training/src/utils/ddp_zero1.py	24	def get_zero_optimizer_state_dict_local(optimizer, global_rank):	CODE
LOW	tests/test_flash_attn_triton_amd.py	44	def attn_bias_from_alibi_slopes(	CODE
LOW	tests/test_flash_attn_triton_amd.py	73	def generate_random_padding_mask(max_seqlen, batch_size, device, mode="random"):	CODE
LOW	tests/test_flash_attn_triton_amd.py	397	def attention_blocksparse_ref(qkv, blockmask, attn_mask, dropout_p, dropout_mask):	CODE
LOW	tests/test_flash_attn_triton_amd.py	601	def test_flash_attn_qkvpacked(seqlen, d, dropout_p, causal, local, alibi, deterministic, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	748	def test_flash_attn_varlen_qkvpacked(	CODE
LOW	tests/test_flash_attn_triton_amd.py	1191	def test_flash_attn_varlen_output(	CODE
LOW	tests/test_flash_attn_triton_amd.py	1619	def test_flash_attn_varlen_causal(	CODE
LOW	tests/test_flash_attn_triton_amd.py	2230	def test_flash_attn_race_condition(seqlen_q, seqlen_k, d, dropout_p, causal, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	2279	def test_flash_attn_bwd_overflow(seqlen, d, causal, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	2336	def test_flash_attn_bwd_transpose(seqlen, d, causal, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	2389	def test_flash_attn_bwd_varlen_overflow(d, causal, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	2448	def test_flash_attn_deterministic(seqlen_q, seqlen_k, swap_sq_sk, d, causal, local, dtype):	CODE
LOW	tests/test_flash_attn_triton_amd.py	2507	def test_flash_attn_varlen_deterministic(seqlen_q, seqlen_k, swap_sq_sk, d, causal, local, dtype):	CODE
LOW	tests/test_rotary.py	229	def test_rotary_emb_varlen_func(inplace, interleaved, rotary_fraction, seqlen_offsets_type, dtype):	CODE
LOW⚡	tests/test_flash_attn_ck.py	42	def get_bwd_unsupported_reason(d, deterministic):	CODE
LOW⚡	tests/test_flash_attn_ck.py	49	def ck_randval_to_dropout_mask(randval, p):	CODE
LOW⚡	tests/test_flash_attn_ck.py	56	def pad_rearrange_dropout_mask_hts_to_bhss(S_dmask, cu_seqlens_q, seqlen_q_rounded, seqlen_k_rounded):	CODE
LOW	tests/test_flash_attn_ck.py	88	def test_flash_attn_qkvpacked(seqlen, d, dropout_p, causal, local, alibi, deterministic, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	186	def test_flash_attn_varlen_qkvpacked(seqlen, d, dropout_p, causal, local, alibi, deterministic, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	537	def test_flash_attn_varlen_output(	CODE
LOW	tests/test_flash_attn_ck.py	895	def test_flash_attn_varlen_causal(	CODE
LOW	tests/test_flash_attn_ck.py	1327	def test_flash_attn_race_condition(seqlen_q, seqlen_k, d, dropout_p, causal, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1374	def test_flash_attn_bwd_overflow(seqlen, d, causal, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1433	def test_flash_attn_bwd_transpose(seqlen, d, causal, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1486	def test_flash_attn_bwd_varlen_overflow(d, causal, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1522	def test_flash_attn_bwd_varlen_seqq_zero(d, causal, nheads_kv, deterministic, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1583	def test_flash_attn_deterministic(seqlen_q, seqlen_k, swap_sq_sk, d, causal, local, dtype):	CODE
LOW	tests/test_flash_attn_ck.py	1634	def test_flash_attn_varlen_deterministic(seqlen_q, seqlen_k, swap_sq_sk, d, causal, local, dtype):	CODE
LOW	tests/test_util.py	8	def generate_random_padding_mask(max_seqlen, batch_size, device, mode="random", zero_lengths=False):	CODE
LOW	tests/test_flash_attn.py	29	def attn_bias_from_alibi_slopes(	CODE
LOW	tests/test_flash_attn.py	58	def generate_random_padding_mask(max_seqlen, batch_size, device, mode="random"):	CODE
LOW	tests/test_flash_attn.py	382	def attention_blocksparse_ref(qkv, blockmask, attn_mask, dropout_p, dropout_mask):	CODE
LOW	tests/test_flash_attn.py	586	def test_flash_attn_qkvpacked(seqlen, d, dropout_p, causal, local, alibi, deterministic, dtype):	CODE
LOW	tests/test_flash_attn.py	733	def test_flash_attn_varlen_qkvpacked(	CODE
293 more matches not shown…
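Every function name shown in this table is at least 25 characters long. The minimal AST-based sketch below uses that as a cut-off. The threshold is a hypothetical value read off these examples, not a documented rule, and the sample source is stub code using names from the table.

```python
import ast

MIN_LENGTH = 25  # hypothetical cut-off: every name shown in this table is >= 25 chars


def verbose_function_names(source: str, min_length: int = MIN_LENGTH):
    """(line, name) for each function definition whose name is >= min_length chars."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if len(node.name) >= min_length:
                hits.append((node.lineno, node.name))
    return sorted(hits)


sample = '''
def get_cuda_bare_metal_version(cuda_dir):
    pass

def validate_and_update_archs(archs):
    pass

def flash_attn_func(q, k, v):
    pass
'''
print(verbose_function_names(sample))
# [(2, 'get_cuda_bare_metal_version'), (5, 'validate_and_update_archs')]
```

An AST walk only sees real definitions. The STRING-context row for `hopper/setup.py` suggests the scanner also matches definition-like text inside string literals, which this sketch does not attempt.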

Cross-File Repetition · 48 hits · 240 pts

Severity	File	Snippet	Context
HIGH	README.md	if k and v are not none, k_cache and v_cache will be updated inplace with the new values from k and v. this is useful	STRING
HIGH	hopper/flash_attn_interface.py	if k and v are not none, k_cache and v_cache will be updated inplace with the new values from k and v. this is useful	STRING
HIGH	flash_attn/flash_attn_interface.py	if k and v are not none, k_cache and v_cache will be updated inplace with the new values from k and v. this is useful	STRING
HIGH	hopper/test_attn_kvcache.py	arguments: q: (batch_size, seqlen_q, nheads, head_dim) k: (batch_size, seqlen_k, nheads_k, head_dim) v: (batch_size, seq	STRING
HIGH	tests/test_flash_attn_triton_amd.py	arguments: q: (batch_size, seqlen_q, nheads, head_dim) k: (batch_size, seqlen_k, nheads_k, head_dim) v: (batch_size, seq	STRING
HIGH	tests/test_util.py	arguments: q: (batch_size, seqlen_q, nheads, head_dim) k: (batch_size, seqlen_k, nheads_k, head_dim) v: (batch_size, seq	STRING
HIGH	tests/test_flash_attn.py	arguments: q: (batch_size, seqlen_q, nheads, head_dim) k: (batch_size, seqlen_k, nheads_k, head_dim) v: (batch_size, seq	STRING
HIGH	hopper/test_kvcache.py	use pytorch benchmark on the forward pass of an arbitrary function.	STRING
HIGH	benchmarks/benchmark_gemm.py	use pytorch benchmark on the forward pass of an arbitrary function.	STRING
HIGH	flash_attn/cute/benchmark.py	use pytorch benchmark on the forward pass of an arbitrary function.	STRING
HIGH	flash_attn/utils/benchmark.py	use pytorch benchmark on the forward pass of an arbitrary function.	STRING
HIGH	hopper/benchmark_flash_attention_fp8.py	arguments: qkv: (batch_size, seqlen, 3, nheads, head_dim) dropout_p: float output: output: (batch_size, seqlen, nheads,	STRING
HIGH	benchmarks/benchmark_causal.py	arguments: qkv: (batch_size, seqlen, 3, nheads, head_dim) dropout_p: float output: output: (batch_size, seqlen, nheads,	STRING
HIGH	benchmarks/benchmark_flash_attention.py	arguments: qkv: (batch_size, seqlen, 3, nheads, head_dim) dropout_p: float output: output: (batch_size, seqlen, nheads,	STRING
HIGH	hopper/test_util.py	arguments: q: (batch_size, seqlen_q, nheads, d) k: (batch_size, seqlen_k, nheads_k, d) v: (batch_size, seqlen_k, nheads_	STRING
HIGH	flash_attn/utils/testing.py	arguments: q: (batch_size, seqlen_q, nheads, d) k: (batch_size, seqlen_k, nheads_k, d) v: (batch_size, seqlen_k, nheads_	STRING
HIGH	tests/test_flash_attn_triton_amd.py	arguments: q: (batch_size, seqlen_q, nheads, d) k: (batch_size, seqlen_k, nheads_k, d) v: (batch_size, seqlen_k, nheads_	STRING
HIGH	tests/test_util.py	arguments: q: (batch_size, seqlen_q, nheads, d) k: (batch_size, seqlen_k, nheads_k, d) v: (batch_size, seqlen_k, nheads_	STRING
HIGH	tests/test_flash_attn.py	arguments: q: (batch_size, seqlen_q, nheads, d) k: (batch_size, seqlen_k, nheads_k, d) v: (batch_size, seqlen_k, nheads_	STRING
HIGH	tests/test_flash_attn_triton_amd.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/test_flash_attn_ck.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/test_flash_attn.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/test_flash_attn_triton_amd.py	we previously had a bug where we were using the wrong strides of dout, which shows up when dout is not contiguous.	STRING
HIGH	tests/test_flash_attn_ck.py	we previously had a bug where we were using the wrong strides of dout, which shows up when dout is not contiguous.	STRING
HIGH	tests/test_flash_attn.py	we previously had a bug where we were using the wrong strides of dout, which shows up when dout is not contiguous.	STRING
HIGH	tests/test_flash_attn_triton_amd.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/test_flash_attn_ck.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/test_flash_attn.py	we previously had a bug where not masking elements beyond seqlen_k caused nan in dq, in the case where seqlen % 128 != 0	STRING
HIGH	tests/models/test_llama.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_bigcode.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_opt.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_falcon.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_gptj.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_btlm.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_baichuan.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_gpt_neox.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_gpt.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_bert.py	check that our implementation of bert (without any optimizations enabled) matches the hf implementation: the output of o	STRING
HIGH	tests/models/test_llama.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	tests/models/test_falcon.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	tests/models/test_baichuan.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	tests/models/test_opt.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	tests/models/test_gpt_generation_parallel.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	tests/models/test_gpt.py	check that our implementation of gpt2 generation matches the hf implementation: the scores in fp16 should be around the	STRING
HIGH	flash_attn/cute/flash_bwd.py	check if the kernel can be implemented with the given parameters. :param dtype: data type :type dtype: cutlass.numeric :	STRING
HIGH	flash_attn/cute/flash_bwd_postprocess.py	check if the kernel can be implemented with the given parameters. :param dtype: data type :type dtype: cutlass.numeric :	STRING
HIGH	flash_attn/cute/flash_fwd.py	check if the kernel can be implemented with the given parameters. :param dtype: data type :type dtype: cutlass.numeric :	STRING
HIGH	flash_attn/cute/flash_bwd_preprocess.py	check if the kernel can be implemented with the given parameters. :param dtype: data type :type dtype: cutlass.numeric :	STRING
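The snippets in this table appear lowercased and whitespace-collapsed. This suggests the check normalises string text before comparing it across files. Below is a minimal sketch of that idea for docstrings. It groups normalised docstrings by text and reports those that appear in two or more files. The 40-character minimum and the stub sources are hypothetical.

```python
import ast
import re
from collections import defaultdict

DOC_NODES = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def normalise(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def repeated_docstrings(files: dict, min_len: int = 40):
    """Map normalised docstring -> sorted paths, for text found in 2+ files.

    `files` maps a path to its Python source.
    """
    seen = defaultdict(set)
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, DOC_NODES):
                doc = ast.get_docstring(node)
                if doc and len(normalise(doc)) >= min_len:
                    seen[normalise(doc)].add(path)
    return {text: sorted(paths) for text, paths in seen.items() if len(paths) > 1}


# Hypothetical stub sources; the docstring text is adapted from the table above.
body = 'def bench(fn):\n    """Use Pytorch Benchmark on the forward pass\n    of an arbitrary function."""\n'
files = {
    "flash_attn/utils/benchmark.py": body,
    "benchmarks/benchmark_gemm.py": body,
    "setup.py": 'def f():\n    """Short."""\n',
}
print(repeated_docstrings(files))
```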

Unused Imports · 199 hits · 197 pts

Severity	File	Line	Context
LOW	setup.py	25	CODE
LOW	csrc/layer_norm/setup.py	2	CODE
LOW	csrc/layer_norm/setup.py	3	CODE
LOW	csrc/layer_norm/setup.py	8	CODE
LOW	csrc/layer_norm/setup.py	9	CODE
LOW	csrc/fused_dense_lib/setup.py	5	CODE
LOW	tools/ci/run_fa4_ci.py	7	CODE
LOW	tools/ci/assert_dsl_floor.py	13	CODE
LOW	hopper/benchmark_mla_decode.py	11	CODE
LOW	hopper/test_attn_kvcache.py	4	CODE
LOW	hopper/test_attn_kvcache.py	6	CODE
LOW	hopper/test_attn_kvcache.py	8	CODE
LOW	hopper/test_kvcache.py	9	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	3	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	7	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	10	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	12	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	12	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	13	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	13	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	15	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	16	CODE
LOW	hopper/test_flash_attn_triton_amd.py	7	CODE
LOW	hopper/benchmark_attn.py	1	CODE
LOW	hopper/benchmark_attn.py	2	CODE
LOW	hopper/benchmark_attn.py	7	CODE
LOW	hopper/benchmark_attn.py	8	CODE
LOW	hopper/benchmark_attn.py	21	CODE
LOW	hopper/benchmark_attn.py	24	CODE
LOW	hopper/benchmark_attn.py	24	CODE
LOW	hopper/benchmark_attn.py	24	CODE
LOW	hopper/benchmark_attn.py	24	CODE
LOW	hopper/benchmark_attn.py	24	CODE
LOW	hopper/setup.py	25	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	2	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	7	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	8	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	10	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	16	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	16	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	23	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	24	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	24	CODE
LOW	hopper/test_flash_attn.py	7	CODE
LOW	hopper/flash_attn_interface.py	3	CODE
LOW	hopper/flash_attn_interface.py	7	CODE
LOW	hopper/benchmark_split_kv.py	5	CODE
LOW	hopper/flash_attn_3/flash_attn_config.py	1	CODE
LOW	hopper/flash_attn_3/flash_attn_interface.py	2	CODE
LOW	training/tests/datamodules/test_language_modeling_hf.py	6	CODE
LOW	training/src/eval.py	1	CODE
LOW	training/src/eval.py	8	CODE
LOW	training/src/metrics/perplexity.py	9	CODE
LOW	training/src/metrics/accuracy.py	1	CODE
LOW	training/src/metrics/accuracy.py	4	CODE
LOW	training/src/tasks/seq.py	1	CODE
LOW	training/src/tasks/seq.py	4	CODE
LOW	training/src/distributed/ddp_comm_hooks.py	3	CODE
LOW	training/src/distributed/ddp_comm_hooks.py	3	CODE
LOW	training/src/callbacks/flop_count.py	2	CODE
139 more matches not shown…
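Several rows repeat the same line number. For instance, line 24 of `hopper/benchmark_attn.py` has five hits. This is consistent with each unused name on a multi-name import line being reported separately. The minimal AST-based sketch below behaves that way. It is an illustration, not the scanner's implementation. Like most simple checks, it also flags names that are only re-exported or referenced inside strings.

```python
import ast


def unused_imports(source: str):
    """(line, name) for each imported name never referenced as a bare Name.

    Each alias is reported separately, so one `from x import a, b` line can
    yield several hits. Names used only in strings, `__all__`, or re-exports
    are reported too (a known false-positive source for this approach).
    """
    tree = ast.parse(source)
    imported = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                bound = alias.asname or alias.name.split(".")[0]
                imported.append((node.lineno, bound))
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for line, name in imported if name not in used)


sample = "\n".join([
    "import math",
    "import os.path",
    "from typing import Any, Dict, List",
    "",
    "def area(r: float) -> float:",
    "    return math.pi * r * r",
])
print(unused_imports(sample))
# [(2, 'os'), (3, 'Any'), (3, 'Dict'), (3, 'List')]
```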

Hallucination Indicators · 10 hits · 130 pts

Severity	File	Line	Snippet	Context
CRITICAL	hopper/test_flash_attn_triton_amd.py	1140	assert torch.ops.flash_attn_3.fwd.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn_triton_amd.py	1153	assert torch.ops.flash_attn_3.bwd.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn_triton_amd.py	1161	assert torch.ops.flash_attn_3.fwd_combine.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn_triton_amd.py	1165	assert torch.ops.flash_attn_3.get_scheduler_metadata.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL	hopper/test_flash_attn.py	1230	assert torch.ops.flash_attn_3.fwd.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn.py	1243	assert torch.ops.flash_attn_3.bwd.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn.py	1251	assert torch.ops.flash_attn_3.fwd_combine.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL⚡	hopper/test_flash_attn.py	1255	assert torch.ops.flash_attn_3.get_scheduler_metadata.default._schema.is_backward_compatible_with(parse_schema(	CODE
CRITICAL	tests/models/test_btlm.py	214	assert model.transformer.embeddings.word_embeddings.weight.mean().abs() < 1e-4	CODE
CRITICAL	tests/models/test_btlm.py	216	model.transformer.embeddings.word_embeddings.weight.std()	CODE

Deep Nesting · 127 hits · 118 pts

Severity	File	Line	Context
LOW	setup.py	101	CODE
LOW	tools/sass_diff.py	128	CODE
LOW	hopper/benchmark_flash_attention_fp8.py	34	CODE
LOW	hopper/test_flash_attn_triton_amd.py	628	CODE
LOW	hopper/benchmark_attn.py	76	CODE
LOW	hopper/setup.py	138	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	110	CODE
LOW	hopper/test_flash_attn_bwd_determinism.py	391	CODE
LOW	hopper/test_flash_attn.py	715	CODE
LOW	hopper/benchmark_split_kv.py	35	CODE
LOW	training/src/train.py	32	CODE
LOW	training/src/callbacks/norm_monitor.py	33	CODE
LOW	training/src/optim/param_grouping.py	15	CODE
LOW	training/src/utils/gpu_affinity.py	80	CODE
LOW	training/src/utils/gpu_affinity.py	127	CODE
LOW	training/src/utils/ema.py	228	CODE
LOW	training/src/utils/distributed.py	70	CODE
LOW	training/src/models/modules/seq_common.py	15	CODE
LOW	tests/cute/benchmark_mask_mod.py	154	CODE
LOW	tests/cute/benchmark_mask_mod.py	448	CODE
LOW	tests/cute/test_mask_mod_varlen.py	249	CODE
LOW	tests/cute/test_mask_mod_varlen.py	903	CODE
LOW	tests/cute/test_flash_attn_race_condition.py	391	CODE
LOW	tests/cute/test_score_mod_varlen.py	602	CODE
LOW	tests/cute/test_score_mod_varlen.py	950	CODE
LOW	tests/cute/test_mask_mod.py	2193	CODE
LOW	tests/cute/test_mask_mod.py	2239	CODE
LOW	tests/cute/test_flash_attn.py	159	CODE
LOW	tests/cute/test_flash_attn.py	621	CODE
LOW	tests/cute/test_flash_attn.py	1128	CODE
LOW	tests/cute/test_flash_attn.py	2366	CODE
LOW	tests/cute/test_flash_attn.py	2755	CODE
LOW	tests/cute/test_block_sparsity.py	43	CODE
LOW	tests/cute/test_block_sparsity.py	484	CODE
LOW	benchmarks/tune_ex2_emu.py	33	CODE
LOW	benchmarks/tune_ex2_emu.py	225	CODE
LOW	benchmarks/benchmark_attn.py	363	CODE
LOW	benchmarks/bench_sm90.py	334	CODE
LOW	benchmarks/bench_sm90.py	367	CODE
LOW	benchmarks/bench_sm90.py	397	CODE
LOW	benchmarks/bench_sm90.py	452	CODE
LOW	benchmarks/bench_sm90.py	489	CODE
LOW	flash_attn/flash_attn_triton.py	66	CODE
LOW	flash_attn/flash_attn_triton.py	365	CODE
LOW	flash_attn/flash_attn_interface.py	31	CODE
LOW	flash_attn/cute/sm90_config_search.py	174	CODE
LOW	flash_attn/cute/sm90_config_search.py	315	CODE
LOW	flash_attn/cute/mask.py	76	CODE
LOW	flash_attn/cute/mask.py	177	CODE
LOW	flash_attn/cute/mask.py	497	CODE
LOW	flash_attn/cute/mask.py	615	CODE
LOW	flash_attn/cute/mask.py	777	CODE
LOW	flash_attn/cute/mask.py	1444	CODE
LOW	flash_attn/cute/mask.py	1586	CODE
LOW	flash_attn/cute/mask.py	1661	CODE
LOW	flash_attn/cute/sm100_hd256_2cta_fmha_forward.py	589	CODE
LOW	flash_attn/cute/sm100_hd256_2cta_fmha_forward.py	1560	CODE
LOW	flash_attn/cute/flash_bwd_mla_dq_dqv_sm100.py	437	CODE
LOW	flash_attn/cute/ampere_helpers.py	35	CODE
LOW	flash_attn/cute/flash_bwd_mla_sm100.py	1499	CODE
67 more matches not shown…

Over-Commented Block · 108 hits · 104 pts

Severity	File	Line	Snippet	Context
LOW	csrc/flash_attn_ck/flash_common.hpp	1	/******************************************************************************	COMMENT
LOW	csrc/layer_norm/static_switch.h	1	// Inspired by https://github.com/NVIDIA/DALI/blob/main/include/dali/core/static_switch.h	COMMENT
LOW	csrc/fused_dense_lib/fused_dense.cpp	1	// Adapted from https://github.com/NVIDIA/apex/blob/master/csrc/fused_dense.cpp	COMMENT
LOW	csrc/flash_attn/flash_api.cpp	1	/******************************************************************************	COMMENT
LOW	csrc/flash_attn/src/flash_fwd_kernel.h	1	/******************************************************************************	COMMENT
LOW	csrc/flash_attn/src/flash_fwd_kernel.h	221	// for (int i = 0; i < size(tScQ); ++i) {	COMMENT
LOW	csrc/flash_attn/src/utils.h	1	/******************************************************************************	COMMENT
LOW	csrc/flash_attn/src/utils.h	321	cute::clear(D(_, m, _));	COMMENT
LOW	csrc/flash_attn/src/utils.h	341	// if (Is_even_MN || get<0>(identity_MN(0, m, 0)) < max_MN) {	COMMENT
LOW	csrc/flash_attn/src/flash_fwd_launch_template.h	1	/******************************************************************************	COMMENT
LOW	csrc/flash_attn/src/flash_fwd_launch_template.h	261	} else {	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_launch_template.h	1	/******************************************************************************	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_launch_template.h	21	#define KERNEL_PARAM_MODIFIER __grid_constant__	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_launch_template.h	181	// run_flash_bwd<Flash_bwd_kernel_traits<Headdim, 128, 128, 8, 4, 4, 4, true, false, T>, Is_dropout>(params,	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_launch_template.h	241	// printf("max_smem_per_block = %d\n", max_smem_per_block);	COMMENT
LOW	csrc/flash_attn/src/dropout.h	41	uint2 rowcol = make_uint2(block_row_start, block_col_start);	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_kernel.h	1	/***************************************************************************************************	COMMENT
LOW	csrc/flash_attn/src/flash_bwd_kernel.h	321	// If not local, we're guaranteed that m_block_min <= m_block:	COMMENT
LOW	csrc/flash_attn/src/static_switch.h	1	// Inspired by	COMMENT
LOW	tools/ci/build_sif.sh	1	#!/usr/bin/env bash	COMMENT
LOW	hopper/mainloop_fwd_sm90_tma_gmma_ws.hpp	1	/******************************************************************************	COMMENT
LOW	hopper/utils.h	1	/******************************************************************************	COMMENT
LOW	hopper/mainloop_bwd_sm90_tma_gmma_ws.hpp	1	/******************************************************************************	COMMENT
LOW	hopper/benchmark_mla_decode.py	121	print(f"Arithmetic intensity: {flops / mem_io:.1f}")	COMMENT
LOW	hopper/mainloop_bwd_sm80.hpp	1	/******************************************************************************	COMMENT
LOW	hopper/mainloop_bwd_sm80.hpp	561	#pragma unroll	COMMENT
LOW	hopper/mainloop_bwd_sm80.hpp	621	// Instead of passing in tQcQ, we pass in t0QcQ and subtract the offset from the limit	COMMENT
LOW	hopper/flash_fwd_launch_template.h	1	/******************************************************************************	COMMENT
LOW	hopper/benchmark_flash_attention_fp8.py	321	# )()	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	61	@pytest.mark.parametrize("deterministic", [False])	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	221	and dtype != torch.float8_e4m3fn	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	241	# causal,	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	501	):	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	581	# @pytest.mark.parametrize("mha_type", ["mha"])	COMMENT
LOW	hopper/test_flash_attn_triton_amd.py	901	out = output_pad_fn(out)	COMMENT
LOW	hopper/tile_scheduler.hpp	641	// Total number of blocks for the next 31 batches	COMMENT
LOW	hopper/tile_scheduler.hpp	741	int split_idx = bidh - bidh_actual * num_splits;	COMMENT
LOW	hopper/benchmark_attn.py	41	def time_fwd(func, *args, repeats=30, verbose=True, desc="", **kwargs):	COMMENT
LOW	hopper/benchmark_attn.py	241	# bs_seqlen_vals = [(32, 512), (16, 1024)]	COMMENT
LOW	hopper/benchmark_attn.py	401	# print(time_f)	COMMENT
LOW	hopper/flash_bwd_launch_template.h	1	/******************************************************************************	COMMENT
LOW	hopper/test_flash_attn_bwd_determinism.py	61	# @pytest.mark.parametrize("mha_type", ["mqa"])	COMMENT
LOW	hopper/test_flash_attn_bwd_determinism.py	341	# @pytest.mark.parametrize("dtype", [torch.float8_e4m3fn])	COMMENT
LOW	hopper/test_flash_attn_bwd_determinism.py	401	# batch_size = 40	COMMENT
LOW	hopper/static_switch.h	1	// Inspired by	COMMENT
LOW	hopper/mainloop_fwd_sm80.hpp	1	/******************************************************************************	COMMENT
LOW	hopper/test_flash_attn.py	121	# @pytest.mark.parametrize("has_qv", [True])	COMMENT
LOW	hopper/test_flash_attn.py	301	# k,	COMMENT
LOW	hopper/test_flash_attn.py	361	@pytest.mark.parametrize("softcap", [0.0] + ([15.0] if not DISABLE_SOFTCAP else []))	COMMENT
LOW	hopper/test_flash_attn.py	601	# None,	COMMENT
LOW	hopper/test_flash_attn.py	621	dv.masked_fill_(k_zero_masking, 0.0)	COMMENT
LOW	hopper/flash_fwd_kernel_sm90.h	1	/******************************************************************************	COMMENT
LOW	hopper/flash_api.cpp	1221	#ifndef FLASHATTENTION_DISABLE_HDIM256	COMMENT
LOW	hopper/benchmark_split_kv.py	121	causal=causal,	COMMENT
LOW	hopper/flash_api_stable.cpp	1	/******************************************************************************	COMMENT
LOW	hopper/flash_api_stable.cpp	541	#endif	COMMENT
LOW	hopper/flash_api_stable.cpp	1241	if (out_type == torch::headeronly::ScalarType::BFloat16) {	COMMENT
LOW	hopper/flash_api_stable.cpp	1281	if (params.d_rounded == 64) { return run_mha_bwd_<Arch, cutlass::half_t, 64, Has_softcap>(params, stream); }	COMMENT
LOW	hopper/flash_api_stable.cpp	1301	#endif	COMMENT
LOW	training/configs/experiment/owt/gpt2xl-flash.yaml	1	# @package _global_	COMMENT
48 more matches not shown…

AI Structural Patterns · 105 hits · 100 pts

Severity	File	Line	Context
LOW	hopper/test_attn_kvcache.py	45	CODE
LOW	hopper/test_torch_compile_and_export.py	36	CODE
LOW	hopper/test_util.py	226	CODE
LOW	hopper/flash_attn_interface.py	60	CODE
LOW	hopper/flash_attn_interface.py	154	CODE
LOW	hopper/flash_attn_interface.py	259	CODE
LOW	hopper/flash_attn_interface.py	313	CODE
LOW	hopper/flash_attn_interface.py	747	CODE
LOW	hopper/flash_attn_interface.py	809	CODE
LOW	hopper/flash_attn_interface.py	890	CODE
LOW	hopper/flash_attn_interface.py	942	CODE
LOW	hopper/flash_attn_interface.py	1106	CODE
LOW	hopper/flash_attn_interface.py	455	CODE
LOW	hopper/flash_attn_interface.py	555	CODE
LOW	hopper/flash_attn_interface.py	645	CODE
LOW	training/src/datamodules/language_modeling_hf.py	42	CODE
LOW	training/src/datamodules/imagenet.py	63	CODE
LOW	training/src/utils/gpu_affinity.py	42	CODE
LOW	training/src/utils/checkpoint.py	66	CODE
LOW	tests/test_flash_attn_triton_amd.py	232	CODE
LOW	tests/test_flash_attn_triton_amd.py	322	CODE
LOW	tests/test_flash_attn_triton_amd.py	355	CODE
LOW	tests/test_util.py	185	CODE
LOW	tests/test_flash_attn.py	217	CODE
LOW	tests/test_flash_attn.py	307	CODE
LOW	tests/test_flash_attn.py	340	CODE
LOW	tests/cute/test_flash_attn_varlen.py	51	CODE
LOW	tests/cute/score_mod_definitions.py	338	CODE
LOW	tests/cute/test_mask_mod.py	1537	CODE
LOW	benchmarks/clc_bench.py	758	CODE
LOW	benchmarks/bench_sm90.py	107	CODE
LOW	flash_attn/flash_attn_interface.py	154	CODE
LOW	flash_attn/flash_attn_interface.py	207	CODE
LOW	flash_attn/flash_attn_interface.py	1019	CODE
LOW	flash_attn/flash_attn_interface.py	1078	CODE
LOW	flash_attn/flash_attn_interface.py	1156	CODE
LOW	flash_attn/flash_attn_interface.py	1233	CODE
LOW	flash_attn/flash_attn_interface.py	1299	CODE
LOW	flash_attn/flash_attn_interface.py	1391	CODE
LOW	flash_attn/flash_attn_interface.py	1485	CODE
LOW	flash_attn/losses/cross_entropy.py	10	CODE
LOW	flash_attn/cute/mask.py	615	CODE
LOW	flash_attn/cute/mask.py	1078	CODE
LOW	flash_attn/cute/mask.py	1583	CODE
LOW	flash_attn/cute/sm100_hd256_2cta_fmha_forward.py	36	CODE
LOW	flash_attn/cute/sm100_hd256_2cta_fmha_forward.py	168	CODE
LOW	flash_attn/cute/seqlen_info.py	84	CODE
LOW	flash_attn/cute/flash_bwd_mla_sm100.py	44	CODE
LOW	flash_attn/cute/flash_bwd.py	32	CODE
LOW	flash_attn/cute/flash_bwd.py	374	CODE
LOW	flash_attn/cute/flash_fwd_mla_sm100.py	49	CODE
LOW	flash_attn/cute/flash_fwd_mla_sm100.py	350	CODE
LOW	flash_attn/cute/compute_block_sparsity.py	334	CODE
LOW	flash_attn/cute/flash_bwd_sm90.py	49	CODE
LOW	flash_attn/cute/flash_bwd_sm90.py	344	CODE
LOW	flash_attn/cute/flash_bwd_sm90.py	630	CODE
LOW	flash_attn/cute/flash_bwd_sm100.py	52	CODE
LOW	flash_attn/cute/flash_bwd_sm100.py	444	CODE
LOW	flash_attn/cute/interface.py	299	CODE
LOW	flash_attn/cute/interface.py	1217	CODE
45 more matches not shown…

Structural Annotation Overuse · 34 hits · 58 pts

Severity	File	Line	Snippet	Context
LOW	hopper/flash_fwd_combine_kernel.h	229	// Step 1: load LSE_partial from gmem -> smem	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	274	// Step 2: Load O_partial from gmem -> smem for split = 0, 1, ..., kStages - 2.	COMMENT
LOW⚡	hopper/flash_fwd_combine_kernel.h	335	// Step 3: load and transpose LSE_partial from smem -> rmem	COMMENT
LOW⚡	hopper/flash_fwd_combine_kernel.h	345	// Step 4: compute the final LSE along the split dimension	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	394	// Step 5: store final LSE back to gmem	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	417	// Step 6: read O_partial from gmem -> smem -> rmem and accumulate the final O	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	460	// Step 7: Write the final O to gmem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	174	// Step 1: load dQaccum from gmem to smem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	200	// Step 2: Load dQaccum from smem to register, then convert fp32 -> fp16/bf16	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	218	// Step 3: Copy dQ from register to smem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	229	// Step 4: Copy dQ from smem to register to prepare for coalesced write to gmem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	247	// Step 5: Copy dQ from register to gmem	COMMENT
LOW	hopper/epilogue_fwd.hpp	251	// Step 1: Write O from rmem -> smem	COMMENT
LOW	hopper/epilogue_fwd.hpp	281	// Step 2: Write LSE from rmem -> gmem	COMMENT
LOW	hopper/epilogue_fwd.hpp	310	// Step 3: Write O from smem -> gmem	COMMENT
LOW	AI/DEBUG_2CTA.md	5	### Step 1: Build a minimal repro	COMMENT
LOW	AI/DEBUG_2CTA.md	12	### Step 2: Add printf to locate the hang	COMMENT
LOW	AI/DEBUG_2CTA.md	42	### Step 3: Identify the deadlock chain	COMMENT
LOW	AI/DEBUG_2CTA.md	54	### Step 4: Vary the problem size systematically	COMMENT
LOW⚡	AI/DEBUG_2CTA.md	67	### Step 5: Check barrier byte counts (tx_count)	COMMENT
LOW⚡	AI/DEBUG_2CTA.md	77	### Step 6: Check phase / parity tracking	COMMENT
LOW⚡	AI/DEBUG_2CTA.md	81	### Step 7: Beware compiler-as-bug-source	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	492	# Step 1: load dQaccum from gmem to smem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	501	# Step 2: load dQ from smem to rmem	COMMENT
LOW	flash_attn/cute/flash_bwd_postprocess.py	534	# Step 3: Copy dQ from register to smem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	568	# Step 4: Copy dQ from smem to register to prepare for coalesced write to gmem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	577	# Step 5: Copy dQ from register to gmem	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	405	# Step 1: Load LSE_partial from gmem to shared memory	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	442	# Step 2: Load O_partial for pipeline stages	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	495	# Step 3: Load and transpose LSE from smem to registers	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	513	# Step 4: Compute final LSE along split dimension	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	573	# Step 5: Store final LSE to gmem	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	595	# Step 6: Read O_partial and accumulate final O	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	642	# Step 7: Write final O to gmem	COMMENT

Verbosity Indicators · 32 hits · 55 pts

Severity	File	Line	Snippet	Context
LOW	hopper/flash_fwd_combine_kernel.h	229	// Step 1: load LSE_partial from gmem -> smem	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	274	// Step 2: Load O_partial from gmem -> smem for split = 0, 1, ..., kStages - 2.	COMMENT
LOW⚡	hopper/flash_fwd_combine_kernel.h	335	// Step 3: load and transpose LSE_partial from smem -> rmem	COMMENT
LOW⚡	hopper/flash_fwd_combine_kernel.h	345	// Step 4: compute the final LSE along the split dimension	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	394	// Step 5: store final LSE back to gmem	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	417	// Step 6: read O_partial from gmem -> smem -> rmem and accumulate the final O	COMMENT
LOW	hopper/flash_fwd_combine_kernel.h	460	// Step 7: Write the final O to gmem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	174	// Step 1: load dQaccum from gmem to smem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	200	// Step 2: Load dQaccum from smem to register, then convert fp32 -> fp16/bf16	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	218	// Step 3: Copy dQ from register to smem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	229	// Step 4: Copy dQ from smem to register to prepare for coalesced write to gmem	COMMENT
LOW	hopper/flash_bwd_postprocess_kernel.h	247	// Step 5: Copy dQ from register to gmem	COMMENT
LOW	hopper/epilogue_fwd.hpp	251	// Step 1: Write O from rmem -> smem	COMMENT
LOW	hopper/epilogue_fwd.hpp	281	// Step 2: Write LSE from rmem -> gmem	COMMENT
LOW	hopper/epilogue_fwd.hpp	310	// Step 3: Write O from smem -> gmem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd.py	256	# Do we need to check if we overshot kBlockM when we load Q?	COMMENT
LOW⚡	flash_attn/cute/flash_bwd.py	258	# Do we need to check if we overshot kBlockN when we load K?	COMMENT
LOW⚡	flash_attn/cute/flash_bwd.py	265	# Do we need to check if we overshot kBlockN when we load V?	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	492	# Step 1: load dQaccum from gmem to smem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	501	# Step 2: load dQ from smem to rmem	COMMENT
LOW	flash_attn/cute/flash_bwd_postprocess.py	534	# Step 3: Copy dQ from register to smem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	568	# Step 4: Copy dQ from smem to register to prepare for coalesced write to gmem	COMMENT
LOW⚡	flash_attn/cute/flash_bwd_postprocess.py	577	# Step 5: Copy dQ from register to gmem	COMMENT
LOW	flash_attn/cute/flash_fwd.py	496	# Do we need to check if we overshoot kBlockN when we load K?	COMMENT
LOW	flash_attn/cute/flash_fwd.py	542	# Do we need to check if we overshoot kBlockN when we load V?	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	405	# Step 1: Load LSE_partial from gmem to shared memory	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	442	# Step 2: Load O_partial for pipeline stages	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	495	# Step 3: Load and transpose LSE from smem to registers	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	513	# Step 4: Compute final LSE along split dimension	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	573	# Step 5: Store final LSE to gmem	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	595	# Step 6: Read O_partial and accumulate final O	COMMENT
LOW⚡	flash_attn/cute/flash_fwd_combine.py	642	# Step 7: Write final O to gmem	COMMENT

Cross-Language Confusion · 9 hits · 42 pts

Severity	File	Line	Snippet	Context
HIGH	tools/ci/run_fa4_ci.py	165	'"$SP"/quack "$SP"/quack_kernels* 2>/dev/null || true'	CODE
HIGH⚡	hopper/setup.py	308	blocks.append(cuda_compile_rule) # type: ignore[possibly-undefined]	STRING
HIGH⚡	hopper/setup.py	309	blocks.append(cuda_compile_rule_sm80) # type: ignore[possibly-undefined]	STRING
HIGH⚡	hopper/setup.py	310	blocks.append(cuda_compile_rule_sm80_sm90) # type: ignore[possibly-undefined]	STRING
HIGH⚡	hopper/setup.py	311	blocks.append(cuda_compile_rule_sm100) # type: ignore[possibly-undefined]	STRING
HIGH	tests/cute/test_flash_attn_combine.py	183	# Only compare valid positions (beyond seqused, output is undefined)	COMMENT
HIGH	AI/parse_clc_log.py	246	let selectedSm = null;	CODE
HIGH	AI/parse_clc_log.py	309	if (id === query || id.includes(query)) {{	CODE
HIGH	AI/parse_clc_log.py	326	selectedSm = null;	CODE

Modern AI Meta-Vocabulary · 11 hits · 23 pts

Severity	File	Line	Snippet	Context
MEDIUM	README.md	297	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	hopper/flash_attn_interface.py	964	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	hopper/flash_attn_interface.py	1117	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	training/src/datamodules/language_modeling_hf.py	118	# However, it's useful for zero-shot transfer from Openwebtext,	COMMENT
MEDIUM	flash_attn/flash_attn_interface.py	1024	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1084	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1163	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1240	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1309	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1402	window_size=(-1, -1), # -1 means infinite context window	CODE
MEDIUM	flash_attn/flash_attn_interface.py	1499	window_size=(-1, -1), # -1 means infinite context window	CODE

AI Slop Vocabulary · 11 hits · 19 pts

Severity	File	Line	Snippet	Context
LOW	hopper/generate_kernels.py	134	# so we should just pass in packgqa=False to avoid the `_packgqa` in the filename.	STRING
LOW⚡	tests/cute/score_mod_definitions.py	478	# Don't read from aux_tensors at all - just add the global index as bias	COMMENT
MEDIUM	tests/cute/test_mask_mod.py	6	# (identity, document, block_diagonal, etc.) with comprehensive seqlen coverage	COMMENT
LOW	flash_attn/flash_attn_triton.py	145	# [2022-10-30] TD: Triton bug - in the case of EVEN_M=True and EVEN_N=False, if we just call	COMMENT
LOW	flash_attn/flash_attn_triton.py	347	# if we just call tl.store(dv_ptrs), there's a race condition	COMMENT
LOW	flash_attn/flash_attn_triton.py	442	# if we just call tl.load(k_ptrs), we get the wrong output!	COMMENT
LOW	flash_attn/flash_attn_triton.py	521	# [2022-11-01] TD: Triton bug, there's a race condition if we just use m_mask and not d_mask.	COMMENT
LOW	flash_attn/cute/flash_bwd_postprocess.py	187	# We can't just use kHeadDim here. E.g. if MMA shape is 64 x 96 but split across 2 WGs,	COMMENT
MEDIUM	flash_attn/cute/flash_fwd_sm100.py	712	# CLC buffers placed here to utilize padding before sO's 1024-byte alignment.	COMMENT
LOW	flash_attn/cute/flash_fwd_sm90.py	1392	# 2 elements. So we just call ptx directly.	COMMENT
LOW	flash_attn/cute/flash_fwd_sm90.py	1464	# 2 elements. So we just call ptx directly.	COMMENT

Excessive Try-Catch Wrapping · 17 hits · 18 pts

Severity	File	Line	Snippet	Context
LOW	setup.py	186	except Exception as e:	CODE
MEDIUM	setup.py	180	def detect_hipify_v2():	CODE
LOW	tests/cute/benchmark_block_sparsity.py	83	except Exception as e:	CODE
LOW	tests/cute/benchmark_block_sparsity.py	190	except Exception as e:	CODE
LOW	tests/cute/benchmark_block_sparsity.py	375	except Exception as e:	CODE
LOW	benchmarks/tune_ex2_emu.py	307	except Exception as e:	CODE
LOW	benchmarks/tune_ex2_emu.py	370	except Exception as e:	CODE
LOW	benchmarks/bench_sm90.py	126	except Exception as e:	CODE
LOW	benchmarks/bench_sm90.py	165	except Exception as e:	CODE
LOW	benchmarks/bench_sm90.py	175	except Exception as e:	CODE
LOW	benchmarks/benchmark_flash_attention.py	119	except Exception:	CODE
LOW	benchmarks/benchmark_flash_attention.py	134	except Exception:	CODE
LOW	benchmarks/benchmark_flash_attention.py	141	except Exception:	CODE
LOW	flash_attn/cute/cute_dsl_ptxas.py	93	except Exception as e:	CODE
LOW	flash_attn/cute/benchmark_flash_attention_fp8.py	330	except Exception as e:	CODE
LOW	flash_attn/cute/benchmark_flash_attention_fp8.py	402	except Exception as e:	CODE
LOW	flash_attn/cute/utils.py	83	except Exception:	CODE

Redundant / Tautological Comments · 9 hits · 12 pts

Severity	File	Line	Snippet	Context
LOW	setup.py	522	# Check if torch is using hipify v2. Until CK is updated with HIPIFY_V2 macro,	COMMENT
LOW	hopper/setup.py	401	# Set timeout to 300 seconds to prevent the request from hanging forever.	STRING
LOW	tests/cute/test_utils.py	204	# Set __cute_hash__ to simulate Inductor-generated code	COMMENT
LOW	tests/cute/test_block_sparsity.py	111	# Check if ref skipped it entirely (all masked)	COMMENT
LOW	flash_attn/cute/flash_bwd.py	138	# Check if block size setting is out of shared memory capacity	COMMENT
LOW	flash_attn/cute/compute_block_sparsity.py	382	# Check if mask_mod is marked as suitable for 5-point sampling	COMMENT
LOW	flash_attn/cute/interface.py	2398	# Check if configuration can be implemented	COMMENT
LOW	flash_attn/cute/flash_fwd.py	159	# Check if block size setting is out of shared memory capacity	COMMENT
LOW	flash_attn/cute/flash_fwd.py	172	# Check if twice the block size is divisible by the number of threads	COMMENT

Self-Referential Comments · 4 hits · 12 pts

Severity	File	Line	Snippet	Context
MEDIUM	hopper/generate_kernels.py	3	# This file is run to generate the kernel instantiations for the flash_attn kernels	COMMENT
MEDIUM	tests/cute/test_flash_attn_combine.py	254	# Create a permuted batch index mapping: virtual batch -> real batch	COMMENT
MEDIUM	tests/models/test_llama.py	578	# Create a shared test model.	COMMENT
MEDIUM	flash_attn/modules/embedding.py	137	# Create a mask of valid vocab ids (1 means it needs to be masked).	COMMENT

Modern Structural Boilerplate · 6 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW	training/src/metrics/perplexity.py	18	__all__ = ['Perplexity']	CODE
LOW	.github/scripts/bump_beta_tag.py	46	def set_github_output(key: str, value: str) -> None:	CODE
LOW	flash_attn/cute/__init__.py	15	__all__ = [	CODE
LOW	flash_attn/cute/fa_logging.py	77	def set_fa_log_level(level: int | str) -> None:	CODE
LOW	flash_attn/models/bert.py	54	logger = logging.getLogger(__name__)	CODE
LOW	flash_attn/models/gpt.py	59	logger = logging.getLogger(__name__)	CODE

Example Usage Blocks · 3 hits · 4 pts

Severity	File	Line	Snippet	Context
LOW	tools/ci/build_sif.sh	4	# Usage:	COMMENT
LOW	tests/cute/test_mask_mod_varlen.py	8	# Usage:	COMMENT
LOW	tests/cute/test_mask_mod.py	10	# Usage:	COMMENT
