A fast, helpful, and open-source document parser
116 matches across 12 categories.
| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | scripts/bump-version.py | 35 | |
| LOW | packages/python/liteparse/__init__.py | 1 | |
| LOW | packages/python/liteparse/__init__.py | 1 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/__init__.py | 2 | |
| LOW | packages/python/liteparse/types.py | 3 | |
| LOW | packages/python/liteparse/types.py | 6 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | dataset_eval_utils/src/liteparse_eval/__init__.py | 3 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 1 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 1 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 1 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …et_eval_utils/src/liteparse_eval/providers/__init__.py | 2 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 1 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 2 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 3 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 4 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 5 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 6 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 7 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 8 | |
| LOW | …utils/src/liteparse_eval/providers/parsers/__init__.py | 9 | |
| LOW | …val_utils/src/liteparse_eval/providers/llm/__init__.py | 1 | |
| LOW | …val_utils/src/liteparse_eval/providers/llm/__init__.py | 1 | |
| LOW | …val_utils/src/liteparse_eval/providers/llm/__init__.py | 2 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | ocr/paddleocr/test_server.py | 48 | def test_server_health_endpoint(server: PaddleOCRServer) -> None: |
| LOW | ocr/paddleocr/test_server.py | 77 | def test_server_normalizes_documented_language_aliases( |
| LOW | ocr/easyocr/test_server.py | 38 | def test_server_health_endpoint(server: EasyOCRServer) -> None: |
| LOW | packages/python/tests/test_parse_e2e.py | 20 | def test_parse_returns_parse_result(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 24 | def test_parse_result_has_pages(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 29 | def test_parse_result_has_text(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 34 | def test_parse_result_has_json(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 52 | async def test_parse_async_bytes_input(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 135 | def test_multi_page_text_joined(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_parse_e2e.py | 162 | async def test_file_not_found_async(self, parser: LiteParse): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 19 | def test_screenshot_returns_batch_result( |
| LOW | packages/python/tests/test_screenshot_e2e.py | 25 | def test_screenshot_has_screenshots(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 30 | def test_screenshot_result_fields(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 39 | def test_screenshot_output_dir(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 47 | def test_screenshot_png_format(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 52 | def test_screenshot_jpg_format(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_screenshot_e2e.py | 58 | async def test_screensho_async_basic(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_batch_e2e.py | 18 | def test_batch_parse_returns_batch_result( |
| LOW | packages/python/tests/test_batch_e2e.py | 32 | def test_batch_parse_creates_output_files( |
| LOW | packages/python/tests/test_batch_e2e.py | 48 | def test_batch_parse_json_format(self, parser: LiteParse, invoice_pdf: Path): |
| LOW | packages/python/tests/test_batch_e2e.py | 85 | async def test_input_dir_not_found_async(self, parser: LiteParse): |
| LOW | dataset_eval_utils/src/liteparse_eval/processing.py | 80 | def analyze_image_with_claude( |
| LOW | dataset_eval_utils/src/liteparse_eval/report.py | 378 | def _generate_navigation_html(self) -> str: |
| LOW | dataset_eval_utils/src/liteparse_eval/report.py | 411 | def _generate_all_documents_html(self) -> str: |
| LOW | dataset_eval_utils/src/liteparse_eval/report.py | 461 | def _generate_pdf_preview_html(self, pdf_path: Path) -> str: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | ocr/paddleocr/server.py | 91 | except Exception as e: |
| LOW | packages/python/liteparse/parser.py | 153 | except Exception as e: |
| LOW | packages/python/liteparse/parser.py | 199 | except Exception as e: |
| LOW | packages/python/liteparse/cli.py | 13 | except Exception as e: |
| MEDIUM | packages/python/liteparse/cli.py | 14 | print(f"Error: {e}", file=sys.stderr) |
| MEDIUM | packages/python/liteparse/cli.py | 8 | def main() -> None: |
| LOW | dataset_eval_utils/src/liteparse_eval/benchmark.py | 157 | except Exception as e: |
| LOW | dataset_eval_utils/src/liteparse_eval/benchmark.py | 170 | except Exception: |
| LOW | dataset_eval_utils/src/liteparse_eval/benchmark.py | 178 | except Exception as e: |
| MEDIUM | dataset_eval_utils/src/liteparse_eval/benchmark.py | 254 | print(f"Error: Not a directory: {args.input_dir}") |
| LOW | dataset_eval_utils/src/liteparse_eval/evaluation.py | 170 | except Exception as e: |
| LOW | dataset_eval_utils/src/liteparse_eval/evaluation.py | 254 | except Exception as e: |
| LOW | dataset_eval_utils/src/liteparse_eval/evaluation.py | 275 | except Exception as e: |
| LOW | dataset_eval_utils/src/liteparse_eval/evaluation.py | 345 | except Exception as e: |
| LOW | dataset_eval_utils/src/liteparse_eval/processing.py | 192 | except Exception as e: |
| MEDIUM | dataset_eval_utils/src/liteparse_eval/processing.py | 250 | print(f"Error: Input directory does not exist: {args.input_dir}") |
| LOW | dataset_eval_utils/src/liteparse_eval/report.py | 466 | except Exception as e: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | packages/python/liteparse/parser.py | 129 | Parse a document file. Args: file_data: Path to the document file, or raw PDF bytes. |
| HIGH | packages/python/liteparse/parser.py | 162 | Generate screenshots of document pages. Supports PDFs natively. Non-PDF formats (DOCX, XLSX, images, e |
| HIGH | dataset_eval_utils/src/liteparse_eval/report.py | 535 | Convert first page of PDF to base64-encoded image. Uses JPEG compression if PIL is available, otherwis |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | crates/liteparse-wasm/src/wasi_stubs.rs | 1 | //! Stub implementations of libc functions that pdfium's statically-linked |
| LOW | crates/liteparse/src/types.rs | 41 | /// Whether the font has buggy encoding (private-use codepoints, TT subset, etc.) |
| LOW | crates/liteparse/src/conversion.rs | 741 | // Dropping the TempDir removes the directory. |
| LOW | crates/liteparse/src/lib.rs | 21 | // ── Internal modules (available for binding crates, hidden from docs) ── |
| LOW | crates/liteparse/src/extract.rs | 81 | println!("{}", serde_json::to_string(page)?); |
| LOW | crates/pdfium-sys/wrapper.h | 1 | #include "fpdfview.h" |
| LOW | scripts/create-dataset.sh | 1 | #!/usr/bin/env bash |
| LOW | scripts/upload-dataset.sh | 1 | #!/usr/bin/env bash |
| LOW | packages/python/scripts/copy-pdfium.sh | 1 | #!/usr/bin/env bash |
| LOW | packages/node/scripts/copy-pdfium.sh | 1 | #!/usr/bin/env bash |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | crates/liteparse/src/conversion.rs | 749 | // ── find_pdf_in_dir ────────────────────────────────────────────────────── |
| MEDIUM | crates/liteparse/src/lib.rs | 7 | // ── Public API re-exports ────────────────────────────────────────────── |
| MEDIUM | crates/liteparse/src/lib.rs | 14 | // ── Modules with user-facing types (visible in docs) ─────────────────── |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | dataset_eval_utils/src/liteparse_eval/evaluation.py | 368 | # Create a mapping of file paths to results |
| MEDIUM | dataset_eval_utils/src/liteparse_eval/processing.py | 16 | # Define the output schema using Pydantic-like structure |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | ocr/paddleocr/server.py | 49 | |
| LOW | ocr/paddleocr/server.py | 55 | |
| LOW | scripts/bump-version.py | 166 | |
| LOW | dataset_eval_utils/src/liteparse_eval/benchmark.py | 126 | |
| LOW | dataset_eval_utils/src/liteparse_eval/report.py | 534 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | scripts/create-dataset.sh | 12 | # Usage: |
| LOW | scripts/compare-dataset.sh | 4 | # Usage: |
| LOW | scripts/upload-dataset.sh | 4 | # Usage: |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | scripts/compare-dataset.sh | 60 | # Check if file exists |
| LOW | scripts/compare-dataset.sh | 146 | # Check if error was expected |
| LOW | …al_utils/src/liteparse_eval/providers/llm/anthropic.py | 70 | # Check if the response is "<pass>" or "<fail>" |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | crates/liteparse/src/conversion.rs | 456 | /// `.pdf` entry is more robust than constructing a fixed `<stem>.pdf` path. |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | scripts/upload-dataset.sh | 38 | # Step 1: Regenerate dataset from documents in the dataset directory |
| LOW | scripts/upload-dataset.sh | 42 | # Step 2: Upload to HuggingFace |