Repository Analysis

VectifyAI/PageIndex

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

Adjusted Score: 10.8 (Low AI signal)
Raw Score: 10.8
Time Factor: 100%
Last Push: 2026-05-30
Stars: 32,329
Language: Python
Lines of Code: 8,364
Files: 37
Pattern Hits: 73
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 1 · MEDIUM 6 · LOW 66

Pattern Findings

73 matches across 11 categories.
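
The report does not state how the raw score is derived. As a rough check, the per-category points shown in the category headings sum to 90, and 90 points per 1,000 lines of code rounds to the reported 10.8. The sketch below reproduces that arithmetic; the points-per-thousand-lines formula and the multiplication by the time factor are assumptions inferred from the numbers, not documented behavior of the scanner.

```python
# Points per category, copied from the category headings in this report.
category_points = {
    "Hyper-Verbose Identifiers": 25,
    "Unused Imports": 13,
    "Deep Nesting": 12,
    "Cross-Language Confusion": 8,
    "Self-Referential Comments": 6,
    "Redundant / Tautological Comments": 6,
    "Excessive Try-Catch Wrapping": 6,
    "Decorative Section Separators": 6,
    "Verbosity Indicators": 4,
    "AI Slop Vocabulary": 2,
    "Example Usage Blocks": 2,
}

LINES_OF_CODE = 8364  # "Lines of Code" from the summary
TIME_FACTOR = 1.00    # "Time Factor" of 100% from the summary

total_points = sum(category_points.values())

# ASSUMPTION: raw score = points per 1,000 lines of code.
raw_score = round(total_points / LINES_OF_CODE * 1000, 1)

# ASSUMPTION: adjusted score = raw score scaled by the time factor.
adjusted_score = round(raw_score * TIME_FACTOR, 1)

print(total_points, raw_score, adjusted_score)
```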

Hyper-Verbose Identifiers (26 hits · 25 pts)
Severity | File | Line | Snippet
LOW | pageindex/page_index.py | 48 | async def check_title_appearance_in_start(title, page_text, model=None, logger=None):
LOW | pageindex/page_index.py | 74 | async def check_title_appearance_in_start_concurrent(structure, page_list, model=None, logger=None):
LOW | pageindex/page_index.py | 125 | def check_if_toc_extraction_is_complete(content, toc, model=None):
LOW | pageindex/page_index.py | 143 | def check_if_toc_transformation_is_complete(content, toc, model=None):
LOW | pageindex/page_index.py | 379 | def extract_matching_page_pairs(toc_page, toc_physical_index, start_page_index):
LOW | pageindex/page_index.py | 416 | def add_page_offset_to_toc_json(data, offset):
LOW | pageindex/page_index.py | 494 | def remove_first_physical_index_section(text):
LOW | pageindex/page_index.py | 597 | def process_toc_no_page_numbers(toc_content, toc_page_list, page_list, start_index=1, model=None, logger=None):
LOW | pageindex/page_index.py | 622 | def process_toc_with_page_numbers(toc_content, toc_page_list, page_list, toc_check_page_num=None, model=None, logger=Non
LOW | pageindex/page_index.py | 656 | def process_none_page_numbers(toc_items, page_list, start_index=1, model=None):
LOW | pageindex/page_index.py | 740 | async def single_toc_item_index_fixer(section_title, content, model=None):
LOW | pageindex/page_index.py | 878 | async def fix_incorrect_toc_with_retries(toc_with_page_number, page_list, incorrect_results, start_index=1, max_attempts
LOW | pageindex/page_index.py | 1000 | async def process_large_node_recursively(node, page_list, opt=None, logger=None):
LOW | pageindex/page_index.py | 1124 | def validate_and_truncate_physical_indices(toc_with_page_number, page_list_length, start_index=1, logger=None):
LOW | pageindex/client.py | 18 | def _normalize_retrieve_model(model: str) -> str:
LOW | pageindex/utils.py | 248 | def get_first_start_page_from_text(text):
LOW | pageindex/utils.py | 255 | def get_last_start_page_from_text(text):
LOW | pageindex/utils.py | 420 | def get_text_of_pdf_pages_with_labels(pdf_pages, start_page, end_page):
LOW | pageindex/utils.py | 518 | def convert_physical_index_to_int(data):
LOW | pageindex/utils.py | 565 | def add_node_text_with_labels(node, pdf_pages):
LOW | pageindex/utils.py | 589 | async def generate_summaries_for_structure(structure, model=None):
LOW | pageindex/utils.py | 599 | def create_clean_structure_for_description(structure):
LOW | pageindex/page_index_md.py | 19 | async def generate_summaries_for_structure_md(structure, summary_token_threshold, model=None):
LOW | pageindex/page_index_md.py | 32 | def extract_nodes_from_markdown(markdown_content):
LOW | pageindex/page_index_md.py | 62 | def extract_node_text_content(node_list, markdown_lines):
LOW | pageindex/page_index_md.py | 89 | def update_node_list_with_text_token_count(node_list, model=None):

Unused Imports (13 hits · 13 pts)
Severity | File | Line | Snippet
LOW | run_pageindex.py | 4 |
LOW | examples/agentic_vectorless_rag_demo.py | 31 |
LOW | pageindex/page_index.py | 7 |
LOW | pageindex/page_index.py | 9 |
LOW | pageindex/page_index.py | 9 |
LOW | pageindex/__init__.py | 1 |
LOW | pageindex/__init__.py | 2 |
LOW | pageindex/__init__.py | 3 |
LOW | pageindex/__init__.py | 3 |
LOW | pageindex/__init__.py | 3 |
LOW | pageindex/__init__.py | 4 |
LOW | pageindex/page_index_md.py | 6 |
LOW | pageindex/page_index_md.py | 8 |
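
The report lists only file and line for these hits, so a reader has to reproduce the check to see which names are involved. A minimal sketch of one way to do that with the standard `ast` module follows. It is an illustration, not the scanner's actual method, and `find_unused_imports` is a hypothetical helper name.

```python
import ast


def find_unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced in `source`."""
    tree = ast.parse(source)

    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported[alias.asname or alias.name] = node.lineno

    # Every bare name that appears anywhere in the module
    # (attribute access such as os.path still contains the Name "os").
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    return sorted((line, name) for name, line in imported.items() if name not in used)


if __name__ == "__main__":
    sample = "import os\nimport sys\nfrom json import loads, dumps\nprint(os.getcwd(), loads('1'))\n"
    print(find_unused_imports(sample))
```

A name-based check like this one also flags imports that exist only to re-export names, which is common in an `__init__.py`. Six of the hits above are in `pageindex/__init__.py`, so those may deserve a manual look before anything is removed.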

Deep Nesting (15 hits · 12 pts)
Severity | File | Line | Snippet
LOW | examples/agentic_vectorless_rag_demo.py | 55 |
LOW | examples/agentic_vectorless_rag_demo.py | 89 |
LOW | pageindex/page_index.py | 341 |
LOW | pageindex/page_index.py | 379 |
LOW | pageindex/page_index.py | 656 |
LOW | pageindex/page_index.py | 696 |
LOW | pageindex/page_index.py | 1124 |
LOW | pageindex/client.py | 55 |
LOW | pageindex/utils.py | 32 |
LOW | pageindex/utils.py | 173 |
LOW | pageindex/utils.py | 191 |
LOW | pageindex/utils.py | 387 |
LOW | pageindex/utils.py | 518 |
LOW | pageindex/utils.py | 193 |
LOW | pageindex/page_index_md.py | 135 |
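
Several of these line numbers coincide with function definitions listed under Hyper-Verbose Identifiers (379, 656 and 1124 in pageindex/page_index.py; 518 in pageindex/utils.py), which suggests nesting is reported per function. The sketch below shows one way to measure block nesting per function with the `ast` module. The set of node types counted and the `threshold` default are assumptions chosen for illustration, not the scanner's documented rule.

```python
import ast

# Statement types that open a nested block (an assumption for this sketch).
NESTING_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try,
)


def max_nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested block statements below `node`.

    Note: an `elif` is parsed as an `If` inside the `orelse` of its parent,
    so each `elif` counts as one level deeper in this simple measure.
    """
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting_depth(child, child_depth))
    return deepest


def deeply_nested_functions(source: str, threshold: int = 4):
    """Return sorted (line, name, depth) for functions nested at least `threshold` deep."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            depth = max_nesting_depth(node)
            if depth >= threshold:
                hits.append((node.lineno, node.name, depth))
    return sorted(hits)
```

Running `deeply_nested_functions` over a file's text gives a list that can be compared with the rows above; early returns or small helper functions are the usual ways to bring a flagged function under the threshold.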

Cross-Language Confusion (1 hit · 8 pts)
Severity | File | Line | Snippet
HIGH | pageindex/utils.py | 267 | # In Linux, only '/' and '\0' (null) are invalid in filenames.
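
The flagged comment describes a filename rule: on Linux, a filename component may contain any byte except `/` and the null character. A minimal sketch of a sanitizer built on that rule follows, with a stricter variant that also removes characters Windows rejects. Both function names and the `replacement` parameter are hypothetical, and neither is the repository's implementation.

```python
import re

# Characters Windows rejects in filenames, plus ASCII control characters.
_WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename_linux(name: str, replacement: str = "-") -> str:
    """Replace the two characters that are invalid in a Linux filename component."""
    return name.replace("/", replacement).replace("\0", replacement)


def sanitize_filename_portable(name: str, replacement: str = "-") -> str:
    """Stricter variant: also replace characters that are invalid on Windows."""
    return _WINDOWS_INVALID.sub(replacement, name)


if __name__ == "__main__":
    print(sanitize_filename_linux("reports/2024 Q1.pdf"))
    print(sanitize_filename_portable('draft: "final"?.pdf'))
```

The Linux-only version preserves more of a document title; the portable version is the safer choice if generated files may be copied to other operating systems.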

Self-Referential Comments (2 hits · 6 pts)
Severity | File | Line | Snippet
MEDIUM | pageindex/page_index.py | 1095 | # Create a clean structure without unnecessary fields for description generation
MEDIUM | pageindex/page_index_md.py | 280 | # Create a clean structure without unnecessary fields for description generation

Redundant / Tautological Comments (4 hits · 6 pts)
Severity | File | Line | Snippet
LOW | pageindex/page_index.py | 771 | # Check if list_index is valid
LOW | pageindex/page_index.py | 825 | # Check if the result is correct
LOW | pageindex/utils.py | 212 | # Check if the node is a leaf node
LOW | pageindex/utils.py | 521 | # Check if item is a dictionary and has 'physical_index' key

Excessive Try-Catch Wrapping (5 hits · 6 pts)
Severity | File | Line | Snippet
LOW | pageindex/utils.py | 49 | except Exception as e:
LOW | pageindex/utils.py | 75 | except Exception as e:
LOW | pageindex/utils.py | 128 | except Exception as e:
MEDIUM | pageindex/utils.py | 99 | def extract_json(content):
LOW | pageindex/retrieve.py | 134 | except Exception as e:
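
Four of these hits are bare `except Exception as e:` handlers, and the fifth points at a JSON-parsing function. A common remedy is to catch only the failure the code expects and let everything else propagate. The sketch below illustrates that for JSON parsing; `parse_json_or_none` is a hypothetical helper written for this example and does not reproduce the repository's `extract_json`.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_or_none(content: str):
    """Parse JSON text, handling only the one failure that is expected here.

    Malformed JSON is logged and reported as None. Any other error, such as
    passing a non-string argument, is a bug in the caller and is allowed to
    propagate instead of being swallowed by a broad `except Exception`.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        logger.warning(
            "Invalid JSON at line %d, column %d: %s",
            exc.lineno, exc.colno, exc.msg,
        )
        return None


if __name__ == "__main__":
    print(parse_json_or_none('{"title": "Introduction", "physical_index": 3}'))
    print(parse_json_or_none("{not valid json"))
```

Narrow handlers keep the fallback behavior for bad model output while making unrelated failures visible during development.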

Decorative Section Separators (2 hits · 6 pts)
Severity | File | Line | Snippet
MEDIUM | pageindex/retrieve.py | 10 | # ── Helpers ──────────────────────────────────────────────────────────────────
MEDIUM | pageindex/retrieve.py | 79 | # ── Tool functions ────────────────────────────────────────────────────────────

Verbosity Indicators (3 hits · 4 pts)
Severity | File | Line | Snippet
LOW | examples/agentic_vectorless_rag_demo.py | 158 | # Step 1: Index PDF and view tree structure
LOW | examples/agentic_vectorless_rag_demo.py | 175 | # Step 2: View document metadata
LOW | examples/agentic_vectorless_rag_demo.py | 182 | # Step 3: Agent Query

AI Slop Vocabulary (1 hit · 2 pts)
Severity | File | Line | Snippet
MEDIUM | …es/workspace/12345678-abcd-4321-abcd-123456789abc.json | 263 | "content": "Attention ResidualsTECHNICALREPORT\n[52] Ashish Vaswani et al. “Attention is All you Need”. In:Advance

Example Usage Blocks (1 hit · 2 pts)
Severity | File | Line | Snippet
LOW | .github/scripts/comment-on-duplicates.sh | 5 | # Usage: