Repository Analysis

quickwit-oss/tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Score 3.2: Likely human-written

Adjusted Score: 3.2
Raw Score: 3.2
Time Factor: 100%
Last Push: 2026-05-27
Stars: 15,292
Language: Rust
Lines of Code: 147,245
Files: 487
Pattern Hits: 448
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 0 · MEDIUM 14 · LOW 434

Pattern Findings

448 matches across 5 categories.

Over-Commented Block: 427 hits · 427 pts
Severity  File:Line  Snippet
LOW  cliff.toml:61
LOW  columnar/src/dynamic_column.rs:241  pub fn open(&self) -> io::Result<DynamicColumn> {
LOW  columnar/src/lib.rs:1  //! # Tantivy-Columnar
LOW  columnar/src/value.rs:101  1 => Ok(NumericalType::U64),
LOW  columnar/src/dictionary.rs:21
LOW  columnar/src/columnar/merge/mod.rs:61  /// input columnars, the first type compatible out of i64, u64, f64 in that order will be used.
LOW  columnar/src/columnar/merge/tests.rs:41  let columnar2 = make_columnar("numbers", &[2u64]);
LOW  columnar/src/column_index/multivalued_index.rs:101  }
LOW  columnar/src/column_index/multivalued_index.rs:241  }
LOW  columnar/src/column_index/multivalued_index.rs:281  pub fn num_docs(&self) -> u32 {
LOW  columnar/src/column_index/mod.rs:141  row_start..row_end
LOW  columnar/src/column_index/optional_index/mod.rs:41  pub fn num_bytes_in_block(&self) -> u32 {
LOW  columnar/src/column_index/optional_index/mod.rs:61  /// block]
LOW  columnar/src/column_values/monotonic_mapping.rs:21  fn from_u64(val: u64) -> Self;
LOW  columnar/src/column_values/mod.rs:41
LOW  columnar/src/column_values/mod.rs:141  }
LOW  …values/u128_based/compact_space/build_compact_space.rs:101  let saved_bits = (amplitude_bits - amplitude_new_bits) as usize * total_num_values as usize;
LOW  …mnar/src/column_values/u128_based/compact_space/mod.rs:1  /// This codec takes a large number space (u128) and reduces it to a compact number space.
LOW  columnar/src/column_values/u64_based/line.rs:41  // This is outside of realm we handle.
LOW  columnar/src/column_values/u64_based/line.rs:81  return Line::default();
LOW  columnar/src/column_values/u64_based/line.rs:101  // Without sorting our values, this is a difficult problem.
LOW  columnar/src/column_values/u64_based/mod.rs:21  use crate::iterable::Iterable;
LOW  sstable/src/block_match_automaton.rs:41  for kb in &start_key[0..common_prefix_len] {
LOW  sstable/src/block_match_automaton.rs:61  // e.* |
LOW  sstable/src/block_match_automaton.rs:81  // p
LOW  sstable/src/lib.rs:1  //! `tantivy_sstable` is a crate that provides a sorted string table data structure.
LOW  sstable/src/lib.rs:21  //! builder.insert(b"banana", &2).unwrap();
LOW  sstable/src/streamer.rs:261  /// If the end of the stream as been reached, and `.next()`
LOW  sstable/src/dictionary.rs:21  /// to any kind of typed values.
LOW  sstable/src/dictionary.rs:161  block_addr: BlockAddr,
LOW  sstable/src/dictionary.rs:421  /// lower_bound: Bound::Included(aaa) => Included(0) // "Next" term id
LOW  sstable/src/index/v3.rs:561  max_slope_idx = index;
LOW  stacker/src/fastcmp.rs:1  /// fastcmp employs a trick to speed up the comparison of two slices of bytes.
LOW  stacker/src/arena_hashmap.rs:1  use super::{Addr, MemoryArena};
LOW  stacker/src/arena_hashmap.rs:61  }
LOW  stacker/src/expull.rs:1  use std::mem;
LOW  stacker/src/expull.rs:21  /// It combines the idea of the unrolled linked list and tries to address the
LOW  stacker/src/shared_arena_hashmap.rs:41  fn is_empty(&self) -> bool {
LOW  stacker/src/shared_arena_hashmap.rs:281  let v = memory_arena.read(val_addr);
LOW  stacker/src/memory_arena.rs:1  //! 32-bits Memory arena for types implementing `Copy`.
LOW  stacker/src/memory_arena.rs:21  //! access them as references.
LOW  tokenizer-api/src/lib.rs:1  //! Tokenizer are in charge of chopping text into a stream of tokens
LOW  common/src/group_by.rs:1  use std::cell::RefCell;
LOW  common/src/lib.rs:41  fn len(&self) -> usize {
LOW  common/src/lib.rs:61  ///
LOW  common/src/lib.rs:81  /// For simplicity, tantivy internally handles `f64` as `u64`.
LOW  common/src/file_slice.rs:281
LOW  common/src/datetime.rs:21  Milliseconds,
LOW  common/src/bitset.rs:161  // `trailing_zeros` and the bit-clear in parallel instead of
LOW  examples/index_from_multiple_threads.rs:1  // # Indexing from different threads.
LOW  examples/iterating_docs_and_positions.rs:1  // # Iterating docs and positions.
LOW  examples/iterating_docs_and_positions.rs:41  // (Because we indexed a very small number of documents over one thread
LOW  examples/iterating_docs_and_positions.rs:101  // and the [`Postings`](https://docs.rs/tantivy/~0/tantivy/trait.Postings.html) trait
LOW  examples/deleting_updating_documents.rs:1  // # Deleting and Updating (?) documents
LOW  examples/deleting_updating_documents.rs:41
LOW  examples/deleting_updating_documents.rs:101  // # Update = Delete + Insert
LOW  examples/faceted_search.rs:1  // # Faceted Search
LOW  examples/aggregation.rs:21  // category, stock and price will be fast fields as that's the requirement
LOW  examples/aggregation.rs:181  // In this Aggregation we want to get the average price for different groups, depending on how
LOW  examples/basic_search.rs:1  // # Basic Example
367 more matches not shown…
Self-Referential Comments: 12 hits · 36 pts
Severity  File:Line  Snippet
MEDIUM  examples/snippet.rs:22  // # Defining the schema
MEDIUM  examples/json_field.rs:13  // # Defining the schema
MEDIUM  examples/index_from_multiple_threads.rs:36  // # Defining the schema
MEDIUM  examples/deleting_updating_documents.rs:43  // # Defining the schema
MEDIUM  examples/basic_search.rs:26  // # Defining the schema
MEDIUM  examples/ip_field.rs:12  // # Defining the schema
MEDIUM  examples/custom_collector.rs:123  // # Defining the schema
MEDIUM  examples/custom_tokenizer.rs:1  // # Defining a tokenizer pipeline
MEDIUM  examples/custom_tokenizer.rs:12  // # Defining the schema
MEDIUM  examples/fuzzy_search.rs:25  // # Defining the schema
MEDIUM  examples/date_time_field.rs:11  // # Defining the schema
MEDIUM  src/query/fuzzy_query.rs:194  // # Defining the schema
Fake / Example Data: 6 hits · 6 pts
Severity  File:Line  Snippet
LOW  src/functional_test.rs:66  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \
LOW  src/functional_test.rs:66  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \
LOW  src/indexer/index_writer.rs:844  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
LOW  src/indexer/index_writer.rs:844  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
LOW  src/store/mod.rs:68  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
LOW  src/store/mod.rs:68  const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do \
AI Slop Vocabulary: 2 hits · 6 pts
Severity  File:Line  Snippet
MEDIUM  src/schema/mod.rs:101  //! Some queries may leverage Fast fields when run on a field that is not indexed. This can be
MEDIUM  src/schema/text_options.rs:192  /// Essentially, should we store the term frequency and/or the positions (See
Verbosity Indicators: 1 hit · 2 pts
Severity  File:Line  Snippet
LOW  src/query/phrase_query/phrase_scorer.rs:570  // So the cost estimation would be the number of times we need to check if a doc is a hit *