quickwit-oss/tantivy

Adjusted Score: 3.6
Raw Score: 3.6
Time Factor: 100%
Last Push: 2026-07-13
Stars: 15.5K
Language: Rust
Lines of Code: 153.4K
Files: 490
Pattern Hits: 493
Scan Date: 2026-07-14
HC Hit Rate: 0.00

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
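The arithmetic behind these definitions can be checked against the figures in this report. The sketch below is not the scanner's own code: the function names are hypothetical, and the line count is taken as the rounded "153.4K" shown above because the exact figure is not given. It sums the per-category points listed under Pattern Findings, normalises per 1,000 lines, applies the time factor, and computes the HC Hit Rate as High+Critical hits divided by files.

```python
# Category point totals as listed under "Pattern Findings" in this report.
CATEGORY_POINTS = {
    "Over-Commented Block": 449,
    "Structural Annotation Overuse": 44,
    "Self-Referential Comments": 36,
    "AI Slop Vocabulary": 9,
    "Fake / Example Data": 6,
    "Decorative Section Separators": 3,
    "Verbosity Indicators": 2,
}

LINES_OF_CODE = 153_400  # "153.4K" as displayed; the exact count is an assumption
FILES = 490
TIME_FACTOR = 1.00       # 100%
HIGH_CRITICAL_HITS = 0   # CRITICAL 0 + HIGH 0


def score_per_kloc(points: float, lines_of_code: int) -> float:
    """Normalise a point total per 1,000 lines of code."""
    return points / (lines_of_code / 1000)


def adjusted_score(points: float, lines_of_code: int, time_factor: float) -> float:
    """Per-1,000-line score multiplied by the temporal discount factor."""
    return score_per_kloc(points, lines_of_code) * time_factor


def hc_hit_rate(hc_hits: int, files: int) -> float:
    """High+Critical hits per file; defined as 0.0 for an empty repository."""
    return hc_hits / files if files else 0.0


total_points = sum(CATEGORY_POINTS.values())
adjusted = adjusted_score(total_points, LINES_OF_CODE, TIME_FACTOR)
hc_rate = hc_hit_rate(HIGH_CRITICAL_HITS, FILES)

print(f"total points:   {total_points}")
print(f"adjusted score: {adjusted:.1f}")
print(f"HC hit rate:    {hc_rate:.2f}")
```

Under these assumptions the category points total 549, and the per-1,000-line result rounds to the 3.6 shown for the Adjusted Score. The Raw Score is also displayed as 3.6, which suggests the dashboard shows it already normalised per 1,000 lines, although the definition above describes it as an unmodified sum.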

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 0 · HIGH 0 · MEDIUM 16 · LOW 477

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner identified 493 distinct pattern matches across 7 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
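The weighting described here (severity coefficient, ×1.5 for comment context, ×1.5 for clustered matches) can be sketched as a per-hit score. This is a minimal reconstruction from the report's own description, not the scanner's implementation; `hit_score` and the constant names are hypothetical. It is checked against three categories from the findings tables.

```python
# Severity coefficients as stated in the Severity Breakdown section.
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_WEIGHT = 1.5  # applied to matches in COMMENT context
CLUSTER_WEIGHT = 1.5  # applied to matches marked with the bolt icon


def hit_score(severity: str, context: str, clustered: bool = False) -> float:
    """Score one match: coefficient x context weight x cluster weight."""
    score = float(SEVERITY_COEFFICIENT[severity])
    if context == "COMMENT":
        score *= COMMENT_WEIGHT
    if clustered:
        score *= CLUSTER_WEIGHT
    return score


# Structural Annotation Overuse: 21 LOW/COMMENT hits, 16 of them clustered.
structural = 16 * hit_score("LOW", "COMMENT", clustered=True) \
    + 5 * hit_score("LOW", "COMMENT")

# Self-Referential Comments: 12 MEDIUM/COMMENT hits, none clustered.
self_referential = 12 * hit_score("MEDIUM", "COMMENT")

# Fake / Example Data: 6 LOW/CODE hits, none clustered.
fake_data = 6 * hit_score("LOW", "CODE")

print(structural)        # listed in the report as 44 pts
print(self_referential)  # listed in the report as 36 pts
print(fake_data)         # listed in the report as 6 pts
```

The sketch gives 43.5 for Structural Annotation Overuse, consistent with the listed 44 pts if the dashboard rounds category totals. One category does not fit: Over-Commented Block lists 449 pts for 449 LOW hits in COMMENT context, so the comment weight does not appear to be applied there. The report does not explain why.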

Over-Commented Block · 449 hits · 449 pts

Severity	File	Line	Snippet	Context
LOW	cliff.toml	61		COMMENT
LOW	columnar/src/dynamic_column.rs	241	pub fn open(&self) -> io::Result<DynamicColumn> {	COMMENT
LOW	columnar/src/lib.rs	1	//! # Tantivy-Columnar	COMMENT
LOW	columnar/src/value.rs	101	1 => Ok(NumericalType::U64),	COMMENT
LOW	columnar/src/dictionary.rs	21		COMMENT
LOW	columnar/src/columnar/merge/mod.rs	61	/// If several columns with the same name are conflicting with the numerical types in the	COMMENT
LOW	columnar/src/columnar/merge/tests.rs	41	let columnar2 = make_columnar("numbers", &[2u64]);	COMMENT
LOW	columnar/src/column_index/multivalued_index.rs	101	}	COMMENT
LOW	columnar/src/column_index/multivalued_index.rs	241	}	COMMENT
LOW	columnar/src/column_index/multivalued_index.rs	281	pub fn num_docs(&self) -> u32 {	COMMENT
LOW	columnar/src/column_index/mod.rs	141	row_start..row_end	COMMENT
LOW	columnar/src/column_index/optional_index/mod.rs	41	pub fn num_bytes_in_block(&self) -> u32 {	COMMENT
LOW	columnar/src/column_index/optional_index/mod.rs	61	/// block]	COMMENT
LOW	columnar/src/column_values/monotonic_mapping.rs	21	fn from_u64(val: u64) -> Self;	COMMENT
LOW	columnar/src/column_values/mod.rs	41		COMMENT
LOW	columnar/src/column_values/mod.rs	141	if value_range.contains(&val) {	COMMENT
LOW	…values/u128_based/compact_space/build_compact_space.rs	101	let saved_bits = (amplitude_bits - amplitude_new_bits) as usize * total_num_values as usize;	COMMENT
LOW	…mnar/src/column_values/u128_based/compact_space/mod.rs	1	/// This codec takes a large number space (u128) and reduces it to a compact number space.	COMMENT
LOW	columnar/src/column_values/u64_based/line.rs	41	// This is outside of realm we handle.	COMMENT
LOW	columnar/src/column_values/u64_based/line.rs	81	return Line::default();	COMMENT
LOW	columnar/src/column_values/u64_based/line.rs	101	// Without sorting our values, this is a difficult problem.	COMMENT
LOW	columnar/src/column_values/u64_based/mod.rs	21	use crate::iterable::Iterable;	COMMENT
LOW	query-grammar/src/query_grammar.rs	1061	fn ast(inp: &str) -> IResult<&str, UserInputAst> {	COMMENT
LOW	sstable/src/block_match_automaton.rs	41	for kb in &start_key[0..common_prefix_len] {	COMMENT
LOW	sstable/src/block_match_automaton.rs	61	// e.* \|	COMMENT
LOW	sstable/src/block_match_automaton.rs	81	// p	COMMENT
LOW	sstable/src/lib.rs	1	//! `tantivy_sstable` is a crate that provides a sorted string table data structure.	COMMENT
LOW	sstable/src/lib.rs	21	//! builder.insert(b"banana", &2).unwrap();	COMMENT
LOW	sstable/src/streamer.rs	261	/// If the end of the stream as been reached, and `.next()`	COMMENT
LOW	sstable/src/dictionary.rs	21	/// to any kind of typed values.	COMMENT
LOW	sstable/src/dictionary.rs	161	block_addr: BlockAddr,	COMMENT
LOW	sstable/src/dictionary.rs	421	/// lower_bound: Bound::Included(aaa) => Included(0) // "Next" term id	COMMENT
LOW	sstable/src/index/v3.rs	561	max_slope_idx = index;	COMMENT
LOW	stacker/src/fastcmp.rs	1	/// fastcmp employs a trick to speed up the comparison of two slices of bytes.	COMMENT
LOW	stacker/src/arena_hashmap.rs	1	use super::{Addr, MemoryArena};	COMMENT
LOW	stacker/src/arena_hashmap.rs	61	}	COMMENT
LOW	stacker/src/expull.rs	1	use std::mem;	COMMENT
LOW	stacker/src/expull.rs	21	/// It combines the idea of the unrolled linked list and tries to address the	COMMENT
LOW	stacker/src/shared_arena_hashmap.rs	41	fn is_empty(&self) -> bool {	COMMENT
LOW	stacker/src/shared_arena_hashmap.rs	281	let v = memory_arena.read(val_addr);	COMMENT
LOW	stacker/src/memory_arena.rs	1	//! 32-bits Memory arena for types implementing `Copy`.	COMMENT
LOW	stacker/src/memory_arena.rs	21	//! access them as references.	COMMENT
LOW	tokenizer-api/src/lib.rs	1	//! Tokenizer are in charge of chopping text into a stream of tokens	COMMENT
LOW	common/src/group_by.rs	1	use std::cell::RefCell;	COMMENT
LOW	common/src/lib.rs	41	fn len(&self) -> usize {	COMMENT
LOW	common/src/lib.rs	61	///	COMMENT
LOW	common/src/lib.rs	81	/// For simplicity, tantivy internally handles `f64` as `u64`.	COMMENT
LOW	common/src/file_slice.rs	281		COMMENT
LOW	common/src/datetime.rs	21	Milliseconds,	COMMENT
LOW	common/src/bitset.rs	161	// `trailing_zeros` and the bit-clear in parallel instead of	COMMENT
LOW	examples/index_from_multiple_threads.rs	1	// # Indexing from different threads.	COMMENT
LOW	examples/iterating_docs_and_positions.rs	1	// # Iterating docs and positions.	COMMENT
LOW	examples/iterating_docs_and_positions.rs	41	// (Because we indexed a very small number of documents over one thread	COMMENT
LOW	examples/iterating_docs_and_positions.rs	101	// and the [`Postings`](https://docs.rs/tantivy/~0/tantivy/trait.Postings.html) trait	COMMENT
LOW	examples/deleting_updating_documents.rs	1	// # Deleting and Updating (?) documents	COMMENT
LOW	examples/deleting_updating_documents.rs	41		COMMENT
LOW	examples/deleting_updating_documents.rs	101	// # Update = Delete + Insert	COMMENT
LOW	examples/faceted_search.rs	1	// # Faceted Search	COMMENT
LOW	examples/aggregation.rs	21	// category, stock and price will be fast fields as that's the requirement	COMMENT
LOW	examples/aggregation.rs	181	// In this Aggregation we want to get the average price for different groups, depending on how	COMMENT
389 more matches not shown…

Structural Annotation Overuse · 21 hits · 44 pts

Severity	File	Line	Snippet	Context
LOW⚡	.claude/skills/simple-pr/SKILL.md	11	## Step 1: Check workspace state	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	19	## Step 2: Ensure main is up to date	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	25	## Step 3: Review staged changes	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	31	## Step 4: Generate commit message	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	37	## Step 5: Create a new branch	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	45	## Step 6: Commit changes	COMMENT
LOW⚡	.claude/skills/simple-pr/SKILL.md	52	## Step 7: Push and open a PR	COMMENT
LOW⚡	.claude/skills/rationalize-deps/SKILL.md	18	## Step 1: Identify the target	COMMENT
LOW⚡	.claude/skills/rationalize-deps/SKILL.md	25	## Step 2: Analyze current dependencies	COMMENT
LOW⚡	.claude/skills/rationalize-deps/SKILL.md	33	## Step 3: For each candidate dependency	COMMENT
LOW	.claude/skills/rationalize-deps/SKILL.md	76	## Step 4: Document findings	COMMENT
LOW	.claude/skills/rationalize-deps/SKILL.md	84	## Step 5: Verify full build	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	10	## Step 1: Determine the changelog scope	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	16	## Step 2: Find merged PRs not yet in the changelog	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	26	## Step 3: Consolidate related PRs	COMMENT
LOW	.claude/skills/update-changelog/SKILL.md	38	## Step 4: Review the actual code diff	COMMENT
LOW	.claude/skills/update-changelog/SKILL.md	49	## Step 5: Categorize each PR group	COMMENT
LOW	.claude/skills/update-changelog/SKILL.md	61	## Step 6: Format entries	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	75	## Step 7: Present changes to the user	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	79	## Step 8: Update CHANGELOG.md	COMMENT
LOW⚡	.claude/skills/update-changelog/SKILL.md	85	## Step 9: Verify	COMMENT

Self-Referential Comments · 12 hits · 36 pts

Severity	File	Line	Snippet	Context
MEDIUM	examples/snippet.rs	22	// # Defining the schema	COMMENT
MEDIUM	examples/json_field.rs	13	// # Defining the schema	COMMENT
MEDIUM	examples/index_from_multiple_threads.rs	36	// # Defining the schema	COMMENT
MEDIUM	examples/deleting_updating_documents.rs	43	// # Defining the schema	COMMENT
MEDIUM	examples/basic_search.rs	26	// # Defining the schema	COMMENT
MEDIUM	examples/ip_field.rs	12	// # Defining the schema	COMMENT
MEDIUM	examples/custom_collector.rs	123	// # Defining the schema	COMMENT
MEDIUM	examples/custom_tokenizer.rs	1	// # Defining a tokenizer pipeline	COMMENT
MEDIUM	examples/custom_tokenizer.rs	12	// # Defining the schema	COMMENT
MEDIUM	examples/fuzzy_search.rs	25	// # Defining the schema	COMMENT
MEDIUM	examples/date_time_field.rs	11	// # Defining the schema	COMMENT
MEDIUM	src/query/fuzzy_query.rs	194	// # Defining the schema	COMMENT

AI Slop Vocabulary · 3 hits · 9 pts

Severity	File	Line	Snippet	Context
MEDIUM	src/aggregation/bucket/term_agg/term_histogram.rs	233	// value count. (Essentially always true here: the column is full, so its value count	COMMENT
MEDIUM	src/schema/mod.rs	101	//! Some queries may leverage Fast fields when run on a field that is not indexed. This can be	COMMENT
MEDIUM	src/schema/text_options.rs	192	/// Essentially, should we store the term frequency and/or the positions (See	COMMENT

Fake / Example Data · 6 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW	src/functional_test.rs	132	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \	CODE
LOW	src/functional_test.rs	132	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \	CODE
LOW	src/indexer/index_writer.rs	844	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \	CODE
LOW	src/indexer/index_writer.rs	844	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \	CODE
LOW	src/store/mod.rs	68	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \	CODE
LOW	src/store/mod.rs	68	const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \	CODE

Decorative Section Separators · 1 hit · 3 pts

Severity	File	Line	Snippet	Context
MEDIUM	src/postings/recorder.rs	408	// ── TermFrequencyRecorder ─────────────────────────────────────────────────	COMMENT

Verbosity Indicators · 1 hit · 2 pts

Severity	File	Line	Snippet	Context
LOW	src/query/phrase_query/phrase_scorer.rs	570	// So the cost estimation would be the number of times we need to check if a doc is a hit *	COMMENT
