📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
73 matches across 11 categories.
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | pageindex/page_index.py | 48 | async def check_title_appearance_in_start(title, page_text, model=None, logger=None): |
| LOW | pageindex/page_index.py | 74 | async def check_title_appearance_in_start_concurrent(structure, page_list, model=None, logger=None): |
| LOW | pageindex/page_index.py | 125 | def check_if_toc_extraction_is_complete(content, toc, model=None): |
| LOW | pageindex/page_index.py | 143 | def check_if_toc_transformation_is_complete(content, toc, model=None): |
| LOW | pageindex/page_index.py | 379 | def extract_matching_page_pairs(toc_page, toc_physical_index, start_page_index): |
| LOW | pageindex/page_index.py | 416 | def add_page_offset_to_toc_json(data, offset): |
| LOW | pageindex/page_index.py | 494 | def remove_first_physical_index_section(text): |
| LOW | pageindex/page_index.py | 597 | def process_toc_no_page_numbers(toc_content, toc_page_list, page_list, start_index=1, model=None, logger=None): |
| LOW | pageindex/page_index.py | 622 | def process_toc_with_page_numbers(toc_content, toc_page_list, page_list, toc_check_page_num=None, model=None, logger=Non |
| LOW | pageindex/page_index.py | 656 | def process_none_page_numbers(toc_items, page_list, start_index=1, model=None): |
| LOW | pageindex/page_index.py | 740 | async def single_toc_item_index_fixer(section_title, content, model=None): |
| LOW | pageindex/page_index.py | 878 | async def fix_incorrect_toc_with_retries(toc_with_page_number, page_list, incorrect_results, start_index=1, max_attempts |
| LOW | pageindex/page_index.py | 1000 | async def process_large_node_recursively(node, page_list, opt=None, logger=None): |
| LOW | pageindex/page_index.py | 1124 | def validate_and_truncate_physical_indices(toc_with_page_number, page_list_length, start_index=1, logger=None): |
| LOW | pageindex/client.py | 18 | def _normalize_retrieve_model(model: str) -> str: |
| LOW | pageindex/utils.py | 248 | def get_first_start_page_from_text(text): |
| LOW | pageindex/utils.py | 255 | def get_last_start_page_from_text(text): |
| LOW | pageindex/utils.py | 420 | def get_text_of_pdf_pages_with_labels(pdf_pages, start_page, end_page): |
| LOW | pageindex/utils.py | 518 | def convert_physical_index_to_int(data): |
| LOW | pageindex/utils.py | 565 | def add_node_text_with_labels(node, pdf_pages): |
| LOW | pageindex/utils.py | 589 | async def generate_summaries_for_structure(structure, model=None): |
| LOW | pageindex/utils.py | 599 | def create_clean_structure_for_description(structure): |
| LOW | pageindex/page_index_md.py | 19 | async def generate_summaries_for_structure_md(structure, summary_token_threshold, model=None): |
| LOW | pageindex/page_index_md.py | 32 | def extract_nodes_from_markdown(markdown_content): |
| LOW | pageindex/page_index_md.py | 62 | def extract_node_text_content(node_list, markdown_lines): |
| LOW | pageindex/page_index_md.py | 89 | def update_node_list_with_text_token_count(node_list, model=None): |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | run_pageindex.py | 4 | |
| LOW | examples/agentic_vectorless_rag_demo.py | 31 | |
| LOW | pageindex/page_index.py | 7 | |
| LOW | pageindex/page_index.py | 9 | |
| LOW | pageindex/page_index.py | 9 | |
| LOW | pageindex/__init__.py | 1 | |
| LOW | pageindex/__init__.py | 2 | |
| LOW | pageindex/__init__.py | 3 | |
| LOW | pageindex/__init__.py | 3 | |
| LOW | pageindex/__init__.py | 3 | |
| LOW | pageindex/__init__.py | 4 | |
| LOW | pageindex/page_index_md.py | 6 | |
| LOW | pageindex/page_index_md.py | 8 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | examples/agentic_vectorless_rag_demo.py | 55 | |
| LOW | examples/agentic_vectorless_rag_demo.py | 89 | |
| LOW | pageindex/page_index.py | 341 | |
| LOW | pageindex/page_index.py | 379 | |
| LOW | pageindex/page_index.py | 656 | |
| LOW | pageindex/page_index.py | 696 | |
| LOW | pageindex/page_index.py | 1124 | |
| LOW | pageindex/client.py | 55 | |
| LOW | pageindex/utils.py | 32 | |
| LOW | pageindex/utils.py | 173 | |
| LOW | pageindex/utils.py | 191 | |
| LOW | pageindex/utils.py | 387 | |
| LOW | pageindex/utils.py | 518 | |
| LOW | pageindex/utils.py | 193 | |
| LOW | pageindex/page_index_md.py | 135 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | pageindex/utils.py | 267 | # In Linux, only '/' and '\0' (null) are invalid in filenames. |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | pageindex/page_index.py | 1095 | # Create a clean structure without unnecessary fields for description generation |
| MEDIUM | pageindex/page_index_md.py | 280 | # Create a clean structure without unnecessary fields for description generation |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | pageindex/page_index.py | 771 | # Check if list_index is valid |
| LOW | pageindex/page_index.py | 825 | # Check if the result is correct |
| LOW | pageindex/utils.py | 212 | # Check if the node is a leaf node |
| LOW | pageindex/utils.py | 521 | # Check if item is a dictionary and has 'physical_index' key |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | pageindex/utils.py | 49 | except Exception as e: |
| LOW | pageindex/utils.py | 75 | except Exception as e: |
| LOW | pageindex/utils.py | 128 | except Exception as e: |
| MEDIUM | pageindex/utils.py | 99 | def extract_json(content): |
| LOW | pageindex/retrieve.py | 134 | except Exception as e: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | pageindex/retrieve.py | 10 | # ── Helpers ────────────────────────────────────────────────────────────────── |
| MEDIUM | pageindex/retrieve.py | 79 | # ── Tool functions ──────────────────────────────────────────────────────────── |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | examples/agentic_vectorless_rag_demo.py | 158 | # Step 1: Index PDF and view tree structure |
| LOW | examples/agentic_vectorless_rag_demo.py | 175 | # Step 2: View document metadata |
| LOW | examples/agentic_vectorless_rag_demo.py | 182 | # Step 3: Agent Query |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | …es/workspace/12345678-abcd-4321-abcd-123456789abc.json | 263 | "content": "Attention ResidualsTECHNICALREPORT\n[52] Ashish Vaswani et al. “Attention is All you Need”. In:Advance |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | .github/scripts/comment-on-duplicates.sh | 5 | # Usage: |