Repository Analysis

docling-project/docling

Get your documents ready for gen AI

5.1 Low AI signal
Adjusted Score: 5.1
Raw Score: 5.1
Time Factor: 100%
Last Push: 2026-05-29
Stars: 60,628
Language: Python
Lines of Code: 492,008
Files: 915
Pattern Hits: 1931
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 67 · MEDIUM 170 · LOW 1694
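As a quick consistency check, the four severity counts should add up to the 1931 pattern hits reported in the summary. A minimal Python sketch using only the numbers shown in this report (the variable names are illustrative, not part of the scanner):

```python
# Severity counts copied from the breakdown above.
severity_counts = {"CRITICAL": 0, "HIGH": 67, "MEDIUM": 170, "LOW": 1694}

# Total number of findings across all severities.
total_hits = sum(severity_counts.values())

# Share of each severity as a percentage of all findings, rounded to one decimal.
severity_share = {
    level: round(100 * count / total_hits, 1)
    for level, count in severity_counts.items()
}

for level, share in severity_share.items():
    print(f"{level}: {severity_counts[level]} hits ({share}%)")
print(f"Total: {total_hits}")
```

The total matches the "Pattern Hits" figure, and the breakdown shows that most findings are LOW severity.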

Pattern Findings

1931 matches across 16 categories.

Hyper-Verbose Identifiers: 622 hits · 586 pts
Severity | File | Line | Snippet
LOW | tests/test_kserve_v2_binary.py | 14 | def test_bytes_tensor_binary_encoding_round_trip() -> None:
LOW | tests/test_kserve_v2_binary.py | 49 | def test_http_binary_request_serialization() -> None:
LOW | tests/test_kserve_v2_binary.py | 103 | def test_http_binary_response_decoding() -> None:
LOW | tests/test_backend_mets_gbs.py | 81 | def test_max_file_bytes_limit(test_doc_path):
LOW | tests/test_backend_mets_gbs.py | 95 | def test_max_total_bytes_limit(test_doc_path):
LOW | tests/test_backend_mets_gbs.py | 112 | def test_max_member_count_limit(test_doc_path):
LOW | tests/test_backend_mets_gbs.py | 126 | def test_limits_with_valid_values(test_doc_path):
LOW | tests/test_backend_mets_gbs.py | 153 | def test_total_bytes_tracking_across_pages(test_doc_path):
LOW | tests/test_run_pr_fast_checks.py | 46 | def test_collect_targets_limits_scope_to_supported_paths(tmp_path: Path) -> None:
LOW | tests/test_run_pr_fast_checks.py | 95 | def test_collect_targets_uses_smoke_target_for_tooling_only_changes(
LOW | tests/test_run_pr_fast_checks.py | 115 | def test_collect_targets_skips_unrelated_changes(tmp_path: Path) -> None:
LOW | tests/test_run_pr_fast_checks.py | 131 | def test_build_check_units_uses_ty_check(monkeypatch) -> None:
LOW | tests/test_run_pr_fast_checks.py | 153 | def test_git_helpers_accept_synthetic_merge_tree(tmp_path: Path) -> None:
LOW | tests/test_run_pr_fast_checks.py | 196 | def test_log_result_suppresses_success_output(capsys) -> None:
LOW | tests/test_run_pr_fast_checks.py | 214 | def test_log_result_prints_failure_output(capsys) -> None:
LOW | tests/test_run_pr_fast_checks.py | 232 | def test_significant_regression_requires_same_successful_target_set() -> None:
LOW | tests/test_rapid_ocr_lang.py | 34 | def test_rapidocr_uses_english_mobile_assets(monkeypatch, tmp_path: Path) -> None:
LOW | tests/test_rapid_ocr_lang.py | 58 | def test_rapidocr_defaults_to_chinese_mobile_assets(
LOW | tests/test_rapid_ocr_lang.py | 86 | def test_download_models_uses_language_specific_mobile_paths(
LOW | tests/test_rapid_ocr_lang.py | 91 | def fake_download_url_with_progress(url: str, *, progress: bool) -> BytesIO:
LOW | tests/test_rapid_ocr_lang.py | 116 | def test_model_downloader_fetches_both_rapidocr_language_sets(
LOW | tests/test_input_doc.py | 20 | def test_in_doc_from_valid_path():
LOW | tests/test_input_doc.py | 27 | def test_in_doc_from_invalid_path():
LOW | tests/test_input_doc.py | 35 | def test_in_doc_from_valid_buf():
LOW | tests/test_input_doc.py | 43 | def test_in_doc_from_invalid_buf():
LOW | tests/test_input_doc.py | 51 | def test_in_doc_with_page_range():
LOW | tests/test_input_doc.py | 85 | def test_in_doc_with_backend_options():
LOW | tests/test_input_doc.py | 305 | def _make_input_doc_from_stream(doc_stream):
LOW | tests/test_failed_pages.py | 35 | def test_normal_pages_all_present(normal_4pages_path):
LOW | tests/test_failed_pages.py | 169 | def test_failed_pages_have_size_info(skipped_1page_path):
LOW | tests/test_failed_pages.py | 197 | def test_errors_recorded_for_failed_pages(skipped_1page_path):
LOW | tests/test_layout_postprocessor.py | 25 | def test_sort_cells_uses_native_cell_index_order() -> None:
LOW | tests/test_picture_description_vlm_model.py | 86 | def test_legacy_picture_description_vlm_batches_generation() -> None:
LOW | tests/test_picture_description_vlm_model.py | 120 | def test_legacy_picture_description_vlm_skips_empty_batch() -> None:
LOW | tests/test_picture_description_vlm_model.py | 134 | def test_legacy_picture_description_vlm_init_uses_configured_padding_side(
LOW | tests/test_backend_patent_uspto.py | 438 | def test_patent_uspto_grant_aps(patents):
LOW | tests/test_table_structure_granite_vision.py | 83 | def test_parse_multiple_rowspan():
LOW | tests/test_table_structure_granite_vision.py | 171 | def test_model_disabled_skips_pages():
LOW | tests/test_table_structure_granite_vision.py | 200 | def test_model_invalid_backend_returns_empty_prediction():
LOW | tests/test_table_structure_granite_vision.py | 247 | def test_parse_ecel_self_closing():
LOW | tests/test_table_structure_granite_vision.py | 257 | def test_factory_registration():
LOW | tests/test_options.py | 31 | def get_converters_with_table_options():
LOW | tests/test_options.py | 176 | def test_ocr_coverage_threshold(test_doc_path):
LOW | tests/test_options.py | 221 | def test_pipeline_cache_after_initialize(test_doc_path):
LOW | tests/test_options.py | 257 | def test_pipeline_cache_with_chart_extraction():
LOW | tests/test_page_assemble_model.py | 30 | def test_fi_ligature_no_space(self, model):
LOW | tests/test_page_assemble_model.py | 34 | def test_fl_ligature_no_space(self, model):
LOW | tests/test_page_assemble_model.py | 38 | def test_fi_ligature_with_spurious_space(self, model):
LOW | tests/test_page_assemble_model.py | 42 | def test_fl_ligature_with_spurious_space(self, model):
LOW | tests/test_page_assemble_model.py | 74 | def test_ligature_space_at_word_boundary_preserved(self, model):
LOW | tests/test_page_assemble_model.py | 78 | def test_multiple_ligatures_in_text(self, model):
LOW | tests/test_page_assemble_model.py | 83 | def test_ligature_with_spurious_space_in_multiline(self, model):
LOW | tests/test_page_assemble_model.py | 95 | def test_private_use_glyph_stripped(self, model):
LOW | tests/test_page_assemble_model.py | 99 | def test_private_use_glyph_with_spurious_space_stripped(self, model):
LOW | tests/test_page_assemble_model.py | 108 | def test_pua_glyph_at_string_start(self, model):
LOW | tests/test_page_assemble_model.py | 112 | def test_pua_glyph_at_string_end(self, model):
LOW | tests/test_page_assemble_model.py | 120 | def test_pua_glyph_preserves_word_boundary_space(self, model):
LOW | tests/test_page_assemble_model.py | 124 | def test_pua_glyph_no_space_merges(self, model):
LOW | tests/test_page_assemble_model.py | 128 | def test_ij_capital_standalone(self, model):
LOW | tests/test_page_assemble_model.py | 133 | def test_regex_matches_new_codepoints(self, model):
562 more matches not shown…
Unused Imports: 411 hits · 376 pts
Severity | File | Line | Snippet
LOW | tests/test_run_pr_fast_checks.py | 1 |
LOW | tests/test_deepseekocr_vlm.py | 3 |
LOW | tests/test_deepseekocr_vlm.py | 5 |
LOW | tests/test_deepseekocr_vlm.py | 19 |
LOW | tests/test_deepseekocr_vlm.py | 19 |
LOW | tests/test_backend_webp.py | 6 |
LOW | tests/test_backend_webp.py | 9 |
LOW | tests/test_page_assemble_model.py | 13 |
LOW | tests/test_page_assemble_model.py | 135 |
LOW | tests/test_extraction.py | 12 |
LOW | tests/test_pytest_marker_selection.py | 1 |
LOW | tests/test_asr_mlx_whisper.py | 12 |
LOW | tests/test_asr_mlx_whisper.py | 12 |
LOW | tests/test_asr_mlx_whisper.py | 12 |
LOW | tests/test_api_kserve_v2_engine_scaffolding.py | 3 |
LOW | tests/test_conversion_result_json.py | 1 |
LOW | tests/test_conversion_result_json.py | 4 |
LOW | tests/test_conversion_result_json.py | 6 |
LOW | tests/test_conversion_result_json.py | 10 |
LOW | tests/test_kserve_v2_ocr_integration.py | 4 |
LOW | tests/test_check_needs_results.py | 1 |
LOW | tests/test_backend_asciidoc.py | 5 |
LOW | tests/test_backend_asciidoc.py | 5 |
LOW | tests/test_backend_image_native.py | 4 |
LOW | tests/test_threaded_pipeline.py | 1 |
LOW | tests/test_threaded_pipeline.py | 2 |
LOW | tests/test_e2e_ocr_conversion.py | 8 |
LOW | tests/test_vlm_pipeline_status.py | 22 |
LOW | tests/test_asr_pipeline.py | 2 |
LOW | tests/test_backend_docling_parse.py | 9 |
LOW | tests/test_granite_vision_extraction.py | 7 |
LOW | tests/test_latex/test_basic.py | 5 |
LOW | tests/test_latex/conftest.py | 1 |
LOW | tests/test_latex/conftest.py | 5 |
LOW | tests/test_latex/conftest.py | 5 |
LOW | tests/test_latex/conftest.py | 7 |
LOW | tests/test_latex/conftest.py | 8 |
LOW | tests/test_latex/conftest.py | 9 |
LOW | tests/test_latex/conftest.py | 10 |
LOW | tests/test_latex/conftest.py | 10 |
LOW | tests/test_latex/conftest.py | 10 |
LOW | tests/test_latex/conftest.py | 11 |
LOW | tests/test_latex/conftest.py | 14 |
LOW | tests/test_latex/conftest.py | 14 |
LOW | tests/test_latex/test_macros.py | 5 |
LOW | tests/test_latex/test_macros.py | 6 |
LOW | tests/test_latex/test_macros.py | 9 |
LOW | tests/test_latex/test_macros.py | 11 |
LOW | tests/test_latex/test_macros.py | 11 |
LOW | tests/test_latex/test_macros.py | 12 |
LOW | tests/test_latex/test_macros.py | 15 |
LOW | tests/test_latex/test_macros.py | 15 |
LOW | tests/test_latex/test_tables.py | 4 |
LOW | tests/test_latex/test_tables.py | 5 |
LOW | tests/test_latex/test_tables.py | 5 |
LOW | tests/test_latex/test_tables.py | 8 |
LOW | tests/test_latex/test_tables.py | 10 |
LOW | tests/test_latex/test_tables.py | 10 |
LOW | tests/test_latex/test_tables.py | 11 |
LOW | tests/test_latex/test_tables.py | 14 |
351 more matches not shown…
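The report does not say how unused imports are detected. As an illustration only, a check of this kind could be sketched with Python's standard `ast` module: collect every imported name, then report those that never appear as a name reference. The function name `find_unused_imports` is hypothetical, and the sketch ignores re-exports via `__all__`, names used only inside string annotations, and star imports.

```python
import ast


def find_unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced in `source`."""
    tree = ast.parse(source)
    imported: dict[str, int] = {}

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b.c` binds the top-level name `a` unless aliased.
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports cannot be resolved statically here
                imported[alias.asname or alias.name] = node.lineno

    # Every bare identifier in the module, including the base of `os.path`.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    return sorted((line, name) for name, line in imported.items() if name not in used)


example = (
    "import os\n"
    "import sys\n"
    "from pathlib import Path, PurePath\n"
    "\n"
    "print(sys.argv, Path('.'))\n"
)
print(find_unused_imports(example))
```

Under this approach a single `from x import a, b, c` statement can yield several findings on one line, which would be consistent with the repeated line numbers in the table above, though that is an inference rather than something the report states.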
Decorative Section Separators: 88 hits · 282 pts
Severity | File | Line | Snippet
MEDIUM | pyproject.toml | 75 | # ============================================================================
MEDIUM | pyproject.toml | 77 | # ============================================================================
MEDIUM | pyproject.toml | 90 | # ============================================================================
MEDIUM | pyproject.toml | 92 | # ============================================================================
MEDIUM | pyproject.toml | 158 | # ============================================================================
MEDIUM | pyproject.toml | 160 | # ============================================================================
MEDIUM | pyproject.toml | 187 | # ============================================================================
MEDIUM | pyproject.toml | 189 | # ============================================================================
MEDIUM | pyproject.toml | 217 | # ============================================================================
MEDIUM | pyproject.toml | 219 | # ============================================================================
MEDIUM | pyproject.toml | 235 | # ============================================================================
MEDIUM | pyproject.toml | 237 | # ============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 44 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 46 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 305 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 307 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 400 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 402 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 512 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 514 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 607 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 609 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 167 | # =============================================================================
MEDIUM | tests/test_vlm_presets_and_runtime_options.py | 169 | # =============================================================================
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 76 | # ──────────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 78 | # ──────────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 465 | # ──────────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 467 | # ──────────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 482 | # ────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 484 | # ────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 577 | # ────────────────────────────────────────────────────────────────────────
MEDIUM | docling/pipeline/standard_pdf_pipeline.py | 579 | # ────────────────────────────────────────────────────────────────────────
MEDIUM | docling/datamodel/pipeline_options.py | 950 | # =============================================================================
MEDIUM | docling/datamodel/pipeline_options.py | 952 | # =============================================================================
MEDIUM | docling/datamodel/pipeline_options.py | 988 | # =============================================================================
MEDIUM | docling/datamodel/pipeline_options.py | 990 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 865 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 867 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 869 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 871 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 41 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 43 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 114 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 116 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 311 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 313 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 380 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 382 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 423 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 425 | # =============================================================================
MEDIUM | docling/datamodel/stage_model_specs.py | 936 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 938 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 959 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 961 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 976 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 978 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 1365 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 1367 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 1461 | # -----------------------------------------------------------------------------
MEDIUM | docling/datamodel/stage_model_specs.py | 1463 | # -----------------------------------------------------------------------------
28 more matches not shown…
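The snippets above are all comment lines made of one repeated character (`=`, `-`, or `─`). A pattern like this could be matched with a regular expression; the sketch below is an assumption about how such a rule might look, not the scanner's actual rule, and the minimum run length of 10 is an arbitrary choice for illustration.

```python
import re

# A comment line whose body is a single character repeated at least 10 times.
# The character set and the length threshold are illustrative assumptions.
SEPARATOR_RE = re.compile(r"^\s*#\s*([=\-─*~])\1{9,}\s*$")


def find_separator_lines(source: str) -> list[int]:
    """Return 1-based line numbers of decorative separator comments."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SEPARATOR_RE.match(line)
    ]


sample = (
    "# " + "=" * 76 + "\n"
    "# Build settings\n"
    "# " + "=" * 76 + "\n"
    "value = 1  # short note\n"
    "    # " + "─" * 72 + "\n"
    "# ---\n"
)
print(find_separator_lines(sample))
```

Each banner in the findings produces two hits, two lines apart, which fits a heading comment framed by a separator line above and below.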
Fake / Example Data: 194 hits · 261 pts
Severity | File | Line | Snippet
LOW | tests/test_backend_jats.py | 54 | "Jane Doe",
LOW | tests/test_backend_jats.py | 62 | "Jane Doe",
LOW | tests/test_backend_jats.py | 107 | assert "Jane Doe" in md
LOW | tests/test_latex/test_basic.py | 348 | assert "Jane Doe" in md
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 3 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 3 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 9 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 9 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 11 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 11 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 17 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …/data/groundtruth/docling_v2/picture_classification.md | 17 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 2 | <text><loc_109><loc_95><loc_390><loc_183>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 2 | <text><loc_109><loc_95><loc_390><loc_183>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 3 | <text><loc_109><loc_185><loc_390><loc_213>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 3 | <text><loc_109><loc_185><loc_390><loc_213>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 5 | <text><loc_109><loc_265><loc_390><loc_353>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmo
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 5 | <text><loc_109><loc_265><loc_390><loc_353>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmo
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 6 | <text><loc_109><loc_355><loc_390><loc_383>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 6 | <text><loc_109><loc_355><loc_390><loc_383>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 10 | <text><loc_112><loc_89><loc_401><loc_172>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 10 | <text><loc_112><loc_89><loc_401><loc_172>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 11 | <text><loc_112><loc_174><loc_401><loc_208>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 11 | <text><loc_112><loc_174><loc_401><loc_208>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 13 | <text><loc_112><loc_227><loc_401><loc_311>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmo
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 13 | <text><loc_112><loc_227><loc_401><loc_311>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmo
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 14 | <text><loc_112><loc_313><loc_401><loc_353>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 14 | <text><loc_112><loc_313><loc_401><loc_353>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 15 | <text><loc_112><loc_355><loc_401><loc_396>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …ta/groundtruth/docling_v2/code_and_formula.doctags.txt | 15 | <text><loc_112><loc_355><loc_401><loc_396>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie
LOW | …a/groundtruth/docling_v2/inline_and_formatting.md.yaml | 733 | orig: ': Lorem ipsum.'
LOW | …a/groundtruth/docling_v2/inline_and_formatting.md.yaml | 738 | text: ': Lorem ipsum.'
LOW | tests/data/groundtruth/docling_v2/lorem_ipsum.docx.md | 1 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin elit mi, fermentum vitae dolor facilisis, porttitor molli
LOW | tests/data/groundtruth/docling_v2/lorem_ipsum.docx.md | 1 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin elit mi, fermentum vitae dolor facilisis, porttitor molli
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 3 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 3 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 5 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 5 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 13 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 13 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 15 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 15 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 19 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 19 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 21 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 21 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 25 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 25 | Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magn
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 27 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 27 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 29 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | tests/data/groundtruth/docling_v2/code_and_formula.md | 29 | Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nul
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 106 | "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 106 | "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 107 | "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 107 | "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 160 | "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 160 | "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 161 | "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
LOW | …ata/groundtruth/docling_v2/picture_classification.json | 161 | "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
134 more matches not shown…
Cross-File Repetition: 39 hits · 195 pts
Severity | File | Line | Snippet
HIGH | tests/test_glmocr_vlm.py | 0 | verify preset is registered with correct metadata and model spec.
HIGH | tests/test_falcon_ocr_vlm.py | 0 | verify preset is registered with correct metadata and model spec.
HIGH | tests/test_lightonocr_vlm.py | 0 | verify preset is registered with correct metadata and model spec.
HIGH | tests/test_nanonets_ocr_vlm.py | 0 | verify preset is registered with correct metadata and model spec.
HIGH | tests/test_glmocr_vlm.py | 0 | verify engine overrides propagate correctly through get_engine_config.
HIGH | tests/test_falcon_ocr_vlm.py | 0 | verify engine overrides propagate correctly through get_engine_config.
HIGH | tests/test_lightonocr_vlm.py | 0 | verify engine overrides propagate correctly through get_engine_config.
HIGH | tests/test_nanonets_ocr_vlm.py | 0 | verify engine overrides propagate correctly through get_engine_config.
HIGH | tests/test_glmocr_vlm.py | 0 | verify from_preset produces a usable vlmconvertoptions with engine options.
HIGH | tests/test_falcon_ocr_vlm.py | 0 | verify from_preset produces a usable vlmconvertoptions with engine options.
HIGH | tests/test_lightonocr_vlm.py | 0 | verify from_preset produces a usable vlmconvertoptions with engine options.
HIGH | tests/test_nanonets_ocr_vlm.py | 0 | verify from_preset produces a usable vlmconvertoptions with engine options.
HIGH | tests/test_glmocr_vlm.py | 0 | verify legacy inlinevlmoptions/apivlmoptions specs are consistent.
HIGH | tests/test_lightonocr_vlm.py | 0 | verify legacy inlinevlmoptions/apivlmoptions specs are consistent.
HIGH | tests/test_nanonets_ocr_vlm.py | 0 | verify legacy inlinevlmoptions/apivlmoptions specs are consistent.
HIGH | tests/test_glmocr_vlm.py | 0 | e2e test with vllm server (skipped in ci and when server is unavailable).
HIGH | tests/test_lightonocr_vlm.py | 0 | e2e test with vllm server (skipped in ci and when server is unavailable).
HIGH | tests/test_nanonets_ocr_vlm.py | 0 | e2e test with vllm server (skipped in ci and when server is unavailable).
HIGH | tests/test_latex/test_figures.py | 0 | \documentclass{article} \begin{document} \begin{tikzpicture} \draw (0,0) -- (1,1); \end{tikzpicture} \end{document}
HIGH | tests/test_latex/test_figures.py | 0 | \documentclass{article} \begin{document} \begin{tikzpicture} \draw (0,0) -- (1,1); \end{tikzpicture} \end{document}
HIGH | tests/test_latex/test_figures.py | 0 | \documentclass{article} \begin{document} \begin{tikzpicture} \draw (0,0) -- (1,1); \end{tikzpicture} \end{document}
HIGH | docling/backend/xml/uspto_backend.py | 0 | represents an element of interest in the patent application document.
HIGH | docling/backend/xml/uspto_backend.py | 0 | represents an element of interest in the patent application document.
HIGH | docling/backend/xml/uspto_backend.py | 0 | represents an element of interest in the patent application document.
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the start of an element. args: tag: the element tag. attributes: the element attributes.
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the start of an element. args: tag: the element tag. attributes: the element attributes.
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the start of an element. args: tag: the element tag. attributes: the element attributes.
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of a skipped entity. html entities will be skipped by the parser. this method will unescape them an
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of a skipped entity. html entities will be skipped by the parser. this method will unescape them an
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of a skipped entity. html entities will be skipped by the parser. this method will unescape them an
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the end of an element. args: tag: the element tag.
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the end of an element. args: tag: the element tag.
HIGH | docling/backend/xml/uspto_backend.py | 0 | signal the end of an element. args: tag: the element tag.
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of character data. args: content: data reported by the handler.
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of character data. args: content: data reported by the handler.
HIGH | docling/backend/xml/uspto_backend.py | 0 | receive notification of character data. args: content: data reported by the handler.
HIGH | docling/backend/xml/uspto_backend.py | 0 | apply an html style to text. args: text: a string containing plain text. style_tag: an html tag name for styling text. i
HIGH | docling/backend/xml/uspto_backend.py | 0 | apply an html style to text. args: text: a string containing plain text. style_tag: an html tag name for styling text. i
HIGH | docling/backend/xml/uspto_backend.py | 0 | apply an html style to text. args: text: a string containing plain text. style_tag: an html tag name for styling text. i
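The snippets in this category are lowercased with whitespace collapsed, which suggests that repeated text is compared after some normalization. The report does not describe the method, so the following is only a sketch under that assumption: it gathers docstrings with `ast.get_docstring`, normalizes them, and reports any that occur at least `min_count` times. The function names and the threshold of 3 are hypothetical.

```python
import ast
from collections import defaultdict

# Node types that can carry a docstring.
DOCSTRING_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def normalize(text: str) -> str:
    """Lowercase the text and collapse all runs of whitespace to one space."""
    return " ".join(text.lower().split())


def repeated_docstrings(sources: dict[str, str], min_count: int = 3) -> dict[str, list[str]]:
    """Map each normalized docstring seen at least `min_count` times to its files."""
    occurrences: dict[str, list[str]] = defaultdict(list)
    for filename, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, DOCSTRING_NODES):
                doc = ast.get_docstring(node)
                if doc:
                    occurrences[normalize(doc)].append(filename)
    return {doc: files for doc, files in occurrences.items() if len(files) >= min_count}


sources = {
    "a.py": 'def test_x():\n    """Verify preset is registered."""\n',
    "b.py": 'def test_y():\n    """Verify  preset is\n    registered."""\n',
    "c.py": 'def test_z():\n    """verify preset is registered."""\n',
    "d.py": 'def other():\n    """Something unrelated."""\n',
}
print(repeated_docstrings(sources))
```

Because occurrences are counted per docstring rather than per file, the same text repeated inside one file is also reported, as with the `uspto_backend.py` rows above.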
Deep Nesting: 209 hits · 180 pts
Severity | File | Line | Snippet
LOW | tests/test_backend_html.py | 138 |
LOW | tests/test_backend_html.py | 509 |
LOW | tests/test_backend_msword.py | 149 |
LOW | tests/test_backend_msword.py | 165 |
LOW | tests/test_latex/conftest.py | 21 |
LOW | docling/document_extractor.py | 190 |
LOW | docling/document_converter.py | 544 |
LOW | docling/pipeline/legacy_standard_pdf_pipeline.py | 156 |
LOW | docling/pipeline/base_pipeline.py | 106 |
LOW | docling/pipeline/base_pipeline.py | 239 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 132 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 153 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 257 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 360 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 663 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 854 |
LOW | docling/pipeline/standard_pdf_pipeline.py | 704 |
LOW | docling/pipeline/extraction_vlm_pipeline.py | 59 |
LOW | docling/pipeline/extraction_vlm_pipeline.py | 160 |
LOW | docling/pipeline/extraction_vlm_pipeline.py | 198 |
LOW | docling/pipeline/vlm_pipeline.py | 132 |
LOW | docling/pipeline/vlm_pipeline.py | 270 |
LOW | docling/pipeline/vlm_pipeline.py | 435 |
LOW | …/experimental/pipeline/threaded_layout_vlm_pipeline.py | 81 |
LOW | …/experimental/pipeline/threaded_layout_vlm_pipeline.py | 238 |
LOW | …/experimental/pipeline/threaded_layout_vlm_pipeline.py | 363 |
LOW | docling/utils/orientation.py | 9 |
LOW | docling/utils/accelerator_utils.py | 9 |
LOW | docling/utils/glm_utils.py | 21 |
LOW | docling/utils/glm_utils.py | 70 |
LOW | docling/utils/glm_utils.py | 332 |
LOW | docling/utils/api_image_request.py | 146 |
LOW | docling/utils/layout_postprocessor.py | 319 |
LOW | docling/utils/layout_postprocessor.py | 389 |
LOW | docling/utils/layout_postprocessor.py | 460 |
LOW | docling/utils/deepseekocr_utils.py | 27 |
LOW | docling/utils/deepseekocr_utils.py | 122 |
LOW | docling/utils/deepseekocr_utils.py | 177 |
LOW | docling/utils/deepseekocr_utils.py | 231 |
LOW | docling/backend/webvtt_backend.py | 100 |
LOW | docling/backend/webvtt_backend.py | 116 |
LOW | docling/backend/md_backend.py | 174 |
LOW | docling/backend/md_backend.py | 329 |
LOW | docling/backend/msword_backend.py | 256 |
LOW | docling/backend/msword_backend.py | 925 |
LOW | docling/backend/msword_backend.py | 995 |
LOW | docling/backend/msword_backend.py | 1124 |
LOW | docling/backend/msword_backend.py | 1225 |
LOW | docling/backend/msword_backend.py | 1407 |
LOW | docling/backend/msword_backend.py | 1606 |
LOW | docling/backend/msword_backend.py | 1837 |
LOW | docling/backend/msword_backend.py | 1972 |
LOW | docling/backend/msword_backend.py | 2081 |
LOW | docling/backend/msword_backend.py | 2153 |
LOW | docling/backend/msword_backend.py | 2196 |
LOW | docling/backend/msword_backend.py | 2344 |
LOW | docling/backend/msword_backend.py | 2435 |
LOW | docling/backend/csv_backend.py | 52 |
LOW | docling/backend/msexcel_backend.py | 302 |
LOW | docling/backend/msexcel_backend.py | 473 |
149 more matches not shown…
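The report does not say how nesting depth is measured, so the following is only an illustrative sketch of one way such a check could work, using Python's standard `ast` module. The function name `max_nesting_depth` and the set of node types counted as a nesting level are assumptions made here, not the scanner's actual rule or threshold.

```python
import ast

# Node types counted as one nesting level. This choice is an assumption for
# illustration; the scanner's real rule is not documented in this report.
_NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting_depth(source: str) -> int:
    """Return the deepest level of nested control-flow blocks in `source`."""

    def depth(node: ast.AST, current: int) -> int:
        deepest = current
        for child in ast.iter_child_nodes(node):
            # Entering a control-flow block adds one level; other nodes add none.
            step = 1 if isinstance(child, _NESTING_NODES) else 0
            deepest = max(deepest, depth(child, current + step))
        return deepest

    return depth(ast.parse(source), 0)


sample = "for a in b:\n    if a:\n        while a:\n            pass\n"
print(max_nesting_depth(sample))
```

Note that `elif` branches are parsed as an `If` inside the parent's `orelse`, so long `elif` chains count as deeper nesting under this simple measure.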
Excessive Try-Catch Wrapping · 154 hits · 172 pts
Severity File Line Snippet
LOW .actor/actor.sh 268 except Exception as e:
LOW tests/test_deepseekocr_vlm.py 107 except Exception:
LOW tests/test_options.py 93 except Exception as ex:
LOW tests/test_backend_msword.py 89 except Exception:
LOW tests/test_glmocr_vlm.py 128 except Exception:
LOW tests/test_falcon_ocr_vlm.py 93 except Exception:
LOW tests/test_lightonocr_vlm.py 135 except Exception:
LOW tests/test_nanonets_ocr_vlm.py 144 except Exception:
LOW …ified-python/references/advanced/exception-handling.md 49 except Exception:
LOW …ified-python/references/advanced/exception-handling.md 158 except Exception:
LOW …ified-python/references/advanced/exception-handling.md 167 except Exception as e:
LOW …ified-python/references/advanced/exception-handling.md 178 except Exception:
LOW docling/pipeline/asr_pipeline.py 104 except Exception as e:
LOW docling/pipeline/asr_pipeline.py 259 except Exception as exc:
LOW docling/pipeline/asr_pipeline.py 269 except Exception as e:
LOW docling/pipeline/asr_pipeline.py 365 except Exception as exc:
LOW docling/pipeline/base_extraction_pipeline.py 42 except Exception as e:
LOW docling/pipeline/base_pipeline.py 83 except Exception as e:
LOW docling/pipeline/base_pipeline.py 303 except Exception as e:
MEDIUM docling/pipeline/standard_pdf_pipeline.py 242 def _run(self) -> None:
MEDIUM docling/pipeline/standard_pdf_pipeline.py 704 def _produce_pages() -> None:
LOW docling/pipeline/standard_pdf_pipeline.py 250 except Exception: # pragma: no cover - top-level guard
LOW docling/pipeline/standard_pdf_pipeline.py 320 except Exception as exc:
LOW docling/pipeline/standard_pdf_pipeline.py 440 except Exception as exc:
LOW docling/pipeline/standard_pdf_pipeline.py 716 except Exception:
LOW docling/pipeline/standard_pdf_pipeline.py 728 except Exception as exc:
LOW docling/pipeline/extraction_vlm_pipeline.py 130 except Exception as e:
LOW docling/pipeline/extraction_vlm_pipeline.py 137 except Exception as e:
LOW docling/pipeline/extraction_vlm_pipeline.py 190 except Exception as e:
LOW docling/pipeline/extraction_vlm_pipeline.py 193 except Exception as e:
LOW docling/pipeline/vlm_pipeline.py 407 except Exception as exc:
LOW docling/utils/glm_utils.py 32 except Exception:
LOW docling/utils/api_image_request.py 89 except Exception as e:
LOW docling/utils/api_image_request.py 139 except Exception as e:
LOW docling/utils/api_image_request.py 239 except Exception as e:
LOW docling/utils/deepseekocr_utils.py 117 except Exception as e:
LOW docling/backend/webvtt_backend.py 73 except Exception as e:
LOW docling/backend/md_backend.py 168 except Exception as e:
LOW docling/backend/msword_backend.py 195 except Exception as e:
LOW docling/backend/msword_backend.py 340 except Exception:
LOW docling/backend/msword_backend.py 504 except Exception as e:
LOW docling/backend/msword_backend.py 605 except Exception as e:
LOW docling/backend/msword_backend.py 2149 except Exception as e:
LOW docling/backend/msword_backend.py 2405 except Exception as e:
LOW docling/backend/msword_backend.py 2483 except Exception as e:
LOW docling/backend/csv_backend.py 30 except Exception as e:
LOW docling/backend/msexcel_backend.py 162 except Exception as e:
LOW docling/backend/msexcel_backend.py 686 except Exception:
LOW docling/backend/noop_backend.py 38 except Exception as e:
LOW docling/backend/asciidoc_backend.py 44 except Exception as e:
LOW docling/backend/mspowerpoint_backend.py 62 except Exception as e:
LOW docling/backend/mspowerpoint_backend.py 431 except Exception:
LOW docling/backend/mets_gbs_backend.py 185 except Exception:
LOW docling/backend/mets_gbs_backend.py 197 except Exception:
LOW docling/backend/html_backend.py 455 except Exception as e:
LOW docling/backend/image_backend.py 169 except Exception as e:
LOW docling/backend/docling_parse_backend.py 318 except Exception:
LOW docling/backend/latex/backend.py 102 except Exception as e:
LOW docling/backend/latex/backend.py 125 except Exception as e:
LOW docling/backend/latex/backend.py 147 except Exception as e:
94 more matches not shown…
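Nearly every snippet in this table is an `except Exception` clause. As a rough illustration of how such handlers can be located, the sketch below walks a module's syntax tree with the standard `ast` module. The helper name `find_broad_excepts` is hypothetical, and treating bare `except:` and `BaseException` as equally broad is an assumption, not something the report states about its own detector.

```python
import ast


def find_broad_excepts(source: str) -> list[int]:
    """Return line numbers of handlers that catch everything.

    A handler counts as broad here if it is a bare `except:` or names
    `Exception` / `BaseException` directly (an assumed definition).
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        caught = node.type
        is_bare = caught is None
        is_broad_name = isinstance(caught, ast.Name) and caught.id in {
            "Exception",
            "BaseException",
        }
        if is_bare or is_broad_name:
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "try:\n"
    "    x = 1\n"
    "except Exception as e:\n"
    "    pass\n"
    "try:\n"
    "    y = 2\n"
    "except ValueError:\n"
    "    pass\n"
)
print(find_broad_excepts(sample))
```

A syntax-tree check like this only applies to valid Python source; hits in shell scripts or Markdown files, such as the `.actor/actor.sh` and `exception-handling.md` rows above, would need a text-based match instead.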
Self-Referential Comments · 53 hits · 166 pts
Severity File Line Snippet
MEDIUM .actor/actor.sh 19 # Create a temporary home directory with write permissions
MEDIUM .actor/actor.sh 183 # Create a dedicated working directory in /tmp (writable)
MEDIUM .actor/actor.sh 288 # Create the request JSON
MEDIUM tests/test_deepseekocr_vlm.py 42 # Create a page with the DeepSeek OCR markdown as VLM response
MEDIUM tests/test_backend_webp.py 28 # Define the directory you want to search
MEDIUM tests/test_backend_msexcel.py 23 # Define the directory you want to search
MEDIUM tests/test_backend_msexcel.py 286 # Create an InputDocument with the BytesIO stream
MEDIUM tests/test_backend_html.py 259 # Define the directory you want to search
MEDIUM tests/test_backend_msword.py 34 # Define the directory you want to search
MEDIUM tests/test_backend_msword.py 364 # Create a backend instance using any existing docx file
MEDIUM tests/test_e2e_conversion.py 24 # Define the directory you want to search
MEDIUM tests/test_backend_csv.py 18 # Define the directory you want to search
MEDIUM tests/test_e2e_ocr_conversion.py 31 # Define the directory you want to search
MEDIUM tests/test_interfaces.py 122 # Create an InlineVlmOptions with an invalid enum by patching attribute directly
MEDIUM tests/test_backend_markdown.py 77 # Define the directory you want to search
MEDIUM tests/test_asr_pipeline.py 115 # Create an empty ConversionResult with proper InputDocument
MEDIUM tests/test_asr_pipeline.py 152 # Create a proper NoOpBackend instance
MEDIUM tests/test_asr_pipeline.py 285 # Create a real file so backend initializes
MEDIUM tests/test_asr_pipeline.py 459 # Create a real file so backend initializes
MEDIUM tests/test_asr_pipeline.py 511 # Create a real file so backend initializes
MEDIUM tests/test_cli.py 192 # Create a dummy audio file for testing
MEDIUM tests/test_cli.py 212 # Create a dummy audio file for testing
MEDIUM tests/test_backend_pptx.py 16 # Define the directory you want to search
MEDIUM tests/test_latex/test_figures.py 78 # Create a temporary directory and test image
MEDIUM tests/test_latex/test_figures.py 84 # Create a simple test image with known DPI
MEDIUM docling/pipeline/standard_pdf_pipeline.py 522 # Create a copy to avoid mutating pipeline_options in-place,
MEDIUM docling/utils/visualization.py 15 # Create a smaller font for the labels
MEDIUM docling/utils/deepseekocr_utils.py 280 # Create a new document
MEDIUM docling/backend/msword_backend.py 306 # Create a paragraph-like element to process with standard handler
MEDIUM docling/backend/msword_backend.py 1063 # Create a textbox group to contain all text from the textbox
MEDIUM docling/backend/msword_backend.py 1105 # Create a unique identifier based on content and position
MEDIUM docling/backend/msword_backend.py 2110 # Create a temporary document with just these elements
MEDIUM docling/backend/msword_backend.py 2408 # Create a group for this comment in NOTES and add the comment there
MEDIUM docling/backend/html_backend.py 2195 # Create the list container
MEDIUM docling/models/base_ocr_model.py 63 ) # Create a 20x20 structure element (10 pixels in all directions)
MEDIUM docling/models/stages/layout/layout_model.py 130 # Create a deep copy of the original image for both sides
MEDIUM docling/models/stages/vlm_convert/vlm_convert_model.py 73 # Create the engine - pass model_spec, let factory handle config generation
MEDIUM …cling/models/stages/chart_extraction/granite_vision.py 230 # Create a batch of conversations
MEDIUM …ing/models/inference_engines/vlm/auto_inline_engine.py 177 # Create the actual engine
MEDIUM docling/models/inference_engines/vlm/vllm_engine.py 164 # Create a temporary mixin instance for downloading
MEDIUM docling/datamodel/pipeline_options.py 1014 # Define an enum for the backend options
MEDIUM docling/datamodel/pipeline_options.py 1076 # Define an enum for the ocr engines
MEDIUM docling/datamodel/base_models.py 404 # Create a type alias for score values
MEDIUM docling/datamodel/asr_model_specs.py 121 # Create the model instance
MEDIUM docling/datamodel/asr_model_specs.py 69 # Create the model instance
MEDIUM docling/datamodel/asr_model_specs.py 173 # Create the model instance
MEDIUM docling/datamodel/asr_model_specs.py 225 # Create the model instance
MEDIUM docling/datamodel/asr_model_specs.py 277 # Create the model instance
MEDIUM docling/datamodel/asr_model_specs.py 329 # Create the model instance
MEDIUM docling/datamodel/service/options.py 1 # Define the input options for the API
MEDIUM docs/examples/enrich_doclingdocument.py 50 # The following function is responsible for taking an item and applying the required pre-processing for the model.
MEDIUM docs/examples/legacy/vlm_pipeline_api_model_legacy.py 266 # Create the DocumentConverter and launch the conversion.
MEDIUM docs/examples/legacy/minimal_vlm_pipeline_legacy.py 6 # This file is kept to validate backward compatibility with the old API.
Redundant / Tautological Comments · 62 hits · 93 pts
Severity File Line Snippet
LOW .actor/actor.sh 208 # Check if process is still running
LOW tests/test_deepseekocr_vlm.py 100 # Check if ollama is available
LOW tests/test_backend_msword.py 141 # Verify if a particular textbox content is extracted
LOW tests/test_asr_pipeline.py 57 # Check if the test audio file exists
LOW tests/test_latex/test_macros.py 372 # Check if macros were registered
LOW docling/pipeline/vlm_pipeline.py 87 # Check if using new VlmConvertOptions
LOW docling/utils/layout_postprocessor.py 435 # Check if areas are similar (within 20% of each other)
LOW docling/utils/deepseekocr_utils.py 354 # Check if NEXT annotation is a caption for this table/figure/image
LOW docling/backend/msword_backend.py 676 # Check if this is a heading style
LOW docling/backend/msword_backend.py 1246 # Check if this paragraph contains a checkbox
LOW docling/backend/msword_backend.py 1264 # Check if this is actually a numbered list by examining the numFmt
LOW docling/backend/msword_backend.py 2362 # Check if document has any comments
LOW docling/backend/noop_backend.py 27 # Check if stream has content
LOW docling/backend/noop_backend.py 33 # Check if file exists
LOW docling/backend/mspowerpoint_backend.py 340 # Check if it's definitely a list item
LOW docling/backend/mspowerpoint_backend.py 353 # Check if it's definitely not a list item
LOW docling/backend/mspowerpoint_backend.py 458 # Check if master has marker information
LOW docling/backend/html_backend.py 1487 # Check if cell is in a column header or row header
LOW docling/backend/xml/xbrl_backend.py 91 # Check if arelle is available before proceeding
LOW docling/backend/xml/jats_backend.py 676 # Check if cell is in a column header or row header
LOW docling/backend/docx/latex/omml.py 648 # Check if base is a known limit function
LOW docling/backend/docx/latex/omml.py 653 # Check if base is a grouping function (underbrace, overbrace, etc.)
LOW …/stages/page_preprocessing/page_preprocessing_model.py 129 ) # Check if text is mostly slash-number pattern
LOW …ling/models/stages/reading_order/readingorder_model.py 228 # Check if table has no structure prediction
LOW docling/models/stages/ocr/tesseract_ocr_model.py 183 # Check if the detected language is present in the system
LOW docling/models/stages/ocr/tesseract_ocr_cli_model.py 175 # Check if the detected language has been installed
LOW …models/stages/table_structure/table_structure_model.py 228 # Check if word-level cells are available from backend:
LOW …cling/models/stages/chart_extraction/granite_vision.py 168 # Check if the value is numeric - non-numeric cells are row headers
LOW …ing/models/inference_engines/vlm/auto_inline_engine.py 96 # Check if model has explicit MLX export
LOW …/inference_engines/vlm/api_openai_compatible_engine.py 168 # Check if stopped by custom criteria
LOW …ng/models/vlm_pipeline_models/hf_transformers_model.py 291 # Check if it's a GenerationStopper class
LOW docling/datamodel/accelerator_options.py 98 # Check if to set the num_threads from the alternative envvar
LOW docling/datamodel/asr_model_specs.py 132 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 140 # Check if mlx-whisper is available
LOW docling/datamodel/asr_model_specs.py 28 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 36 # Check if mlx-whisper is available
LOW docling/datamodel/asr_model_specs.py 80 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 88 # Check if mlx-whisper is available
LOW docling/datamodel/asr_model_specs.py 184 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 192 # Check if mlx-whisper is available
LOW docling/datamodel/asr_model_specs.py 236 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 244 # Check if mlx-whisper is available
LOW docling/datamodel/asr_model_specs.py 288 # Check if MPS is available (Apple Silicon)
LOW docling/datamodel/asr_model_specs.py 296 # Check if mlx-whisper is available
LOW docling/datamodel/service/options.py 921 # Check if using legacy fields with new fields
LOW docling/datamodel/service/options.py 952 # Check if using legacy fields with new fields
LOW docs/examples/minimal_asr_pipeline.py 66 # Check if the test audio file exists
LOW docs/examples/chart_extraction.py 80 # Check if the picture was classified as a chart.
LOW docs/examples/chart_extraction.py 86 # Check if chart data was extracted.
LOW docs/examples/pictures_description_api.py 101 # Check if running in CI environment
LOW docs/examples/pictures_description_api.py 111 # Check if credentials are available
LOW docs/examples/picture_description_inline.py 45 # Check if running in CI
LOW docs/examples/asr_pipeline_performance_comparison.py 156 # Check if we're on Apple Silicon
LOW docs/examples/post_process_ocr_with_vlm.py 674 # Read file names (strip whitespace, ignore empty lines)
LOW docs/examples/vlm_pipeline_api_model.py 54 # Check if model is already loaded
LOW docs/examples/vlm_pipeline_api_model.py 102 # Check if model exists
LOW docs/examples/vlm_pipeline_api_model.py 168 # Check if LM Studio is running
LOW docs/examples/vlm_pipeline_api_model.py 238 # Check if Ollama is running
LOW docs/examples/vlm_pipeline_api_model.py 300 # Check if VLLM is running
LOW docs/examples/vlm_pipeline_api_model.py 358 # Check if running in CI environment
2 more matches not shown…
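Most rows above share a "# Check if ..." or "# Verify if ..." opening. The sketch below shows how a simple regular expression could surface comments of that shape. It is an assumed approximation only: the scanner's real pattern is evidently broader (one row above is a "# Read file names" comment), and the name `find_check_if_comments` is hypothetical.

```python
import re

# Matches a comment that opens with "Check if" or "Verify if".
# Assumed pattern for illustration; it can also match a "#" inside a string.
_CHECK_IF = re.compile(r"#\s*(?:check|verify) if\b", re.IGNORECASE)


def find_check_if_comments(source: str) -> list[int]:
    """Return 1-based line numbers whose comment starts with 'Check if'."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if _CHECK_IF.search(line)
    ]


sample = (
    "x = 1\n"
    "# Check if file exists\n"
    "if path.exists():\n"
    "    y = load()  # check if done\n"
    "# unrelated note\n"
)
print(find_check_if_comments(sample))
```

Because it works on raw text rather than a syntax tree, the same check applies to shell scripts as well as Python files.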
Docstring Block Structure · 14 hits · 70 pts
Severity File Line Snippet
HIGH docling/document_converter.py 343 Convert one document fetched from a file path, URL, or DocumentStream. Note: If the document content is given a
HIGH docling/document_converter.py 407 Convert multiple documents from file paths, URLs, or DocumentStreams. Args: source: Source of input
HIGH docling/document_converter.py 484 Convert a document given as a string using the specified format. Only Markdown (`InputFormat.MD`) and HTML (`In
HIGH …g/models/stages/code_formula/code_formula_vlm_model.py 136 Construct the prompt for the model based on the element type. Args: label: The type of input, eithe
HIGH docling/models/stages/vlm_convert/vlm_convert_model.py 194 Process raw images without page metadata. This method provides a simpler interface for processing images direct
HIGH docling/models/inference_engines/vlm/factory.py 35 Create a VLM inference engine from options. Args: options: Engine configuration options model_spec:
HIGH docling/models/inference_engines/vlm/_utils.py 21 Convert any image format to RGB PIL Image. Args: image: Input image as PIL Image or numpy array Return
HIGH docling/models/inference_engines/vlm/_utils.py 99 Resolve the path to model artifacts, downloading if needed. This standardizes the logic for finding or downloading
HIGH …ling/models/inference_engines/common/kserve_v2_http.py 225 Execute HTTP request with consistent error handling. Args: url: Target URL method: HTTP
HIGH …ling/models/inference_engines/common/kserve_v2_http.py 322 Execute inference request against KServe v2 endpoint. Args: inputs: Mapping of input tensor names t
HIGH …ling/models/extraction/nuextract_transformers_model.py 28 Process vision information from both messages and in-context examples, supporting batch processing. Args:
HIGH docling/datamodel/pipeline_options.py 1046 Normalize deprecated backend enum values to current ones. Args: backend: The PDF backend enum value to norm
HIGH docling/datamodel/stage_model_specs.py 252 Check if this model has an explicit export for the given engine. An explicit export means either: 1. Th
HIGH docling/datamodel/stage_model_specs.py 518 Get a specific preset. Args: preset_id: The preset identifier Returns: The req
Over-Commented Block · 59 hits · 59 pts
Severity File Line Snippet
LOW tests/verify_utils.py 141 # for l, true_item in enumerate(doc_true.main_text):
LOW …ata/groundtruth/docling_v2/powerpoint_bad_text.pptx.md 1 # X-Library The fully customisable and copyright-free standard content template collection exclusively for our customers
LOW tests/data/groundtruth/docling_v2/unit_test_01.html.md 1 # Title
LOW tests/data/groundtruth/docling_v2/hyperlink_02.html.md 1 ## [Home](/home.html)
LOW tests/data/uspto/pftaps057006474.txt 861 ##STR83##
LOW docling/service_client/watchers.py 241 #
LOW docs/examples/granitedocling_repetition_stopping.py 1 # %% [markdown]
LOW docs/examples/granitedocling_repetition_stopping.py 81 # model=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS.repo_id,
LOW docs/examples/granitedocling_repetition_stopping.py 101 # converter = DocumentConverter(
LOW docs/examples/export_multimodal.py 1 # %% [markdown]
LOW docs/examples/export_multimodal.py 121 f"Document converted and multimodal pages generated in {end_time:.2f} seconds."
LOW docs/examples/inspect_picture_content.py 1 # %% [markdown]
LOW docs/examples/compare_vlm_models.py 1 # %% [markdown]
LOW docs/examples/translate.py 1 # %% [markdown]
LOW docs/examples/export_figures.py 1 # %% [markdown]
LOW docs/examples/enrich_doclingdocument.py 1 # %% [markdown]
LOW docs/examples/gpu_vlm_pipeline.py 1 # %% [markdown]
LOW docs/examples/model_family_engines_example.py 1 # %% [markdown]
LOW docs/examples/run_with_formats.py 1 # %% [markdown]
LOW docs/examples/minimal_asr_pipeline.py 1 # %% [markdown]
LOW docs/examples/gpu_standard_pipeline.py 1 # %% [markdown]
LOW docs/examples/chart_extraction.py 1 # %% [markdown]
LOW docs/examples/pii_obfuscate.py 1 # %% [markdown]
LOW docs/examples/pictures_description_api.py 1 # %% [markdown]
LOW docs/examples/pictures_description_api.py 181 # Run watsonx.ai example (skips if in CI or credentials not found)
LOW docs/examples/pictures_description_api.py 201 # ### Custom API Configuration
LOW docs/examples/tesseract_lang_detection.py 1 # %% [markdown]
LOW docs/examples/picture_description_inline.py 1 # %% [markdown]
LOW docs/examples/picture_description_inline.py 161 #
LOW docs/examples/minimal_vlm_pipeline.py 1 # %% [markdown]
LOW docs/examples/custom_convert.py 1 # %% [markdown]
LOW docs/examples/custom_convert.py 21 # - If you uncomment a backend or OCR option that is not imported above, also
LOW docs/examples/custom_convert.py 61 # The sections below demo combinations of PdfPipelineOptions and backends.
LOW docs/examples/custom_convert.py 81 # pipeline_options = PdfPipelineOptions()
LOW docs/examples/custom_convert.py 121
LOW docs/examples/custom_convert.py 141
LOW docs/examples/custom_convert.py 161 # pipeline_options.table_structure_options = TableStructureOptions(do_cell_matching=True)
LOW docs/examples/minimal.py 1 # %% [markdown]
LOW docs/examples/suryaocr_with_custom_models.py 1 # Example: Integrating SuryaOCR with Docling for PDF OCR and Markdown Export
LOW docs/examples/rapidocr_with_custom_models.py 1 # %% [markdown]
LOW docs/examples/granite_vision_table_structure.py 1 # %% [markdown]
LOW docs/examples/develop_formula_understanding.py 1 # %% [markdown]
LOW docs/examples/export_tables.py 1 # %% [markdown]
LOW docs/examples/parquet_images.py 1 # %% [markdown]
LOW docs/examples/develop_picture_enrichment.py 1 # %% [markdown]
LOW docs/examples/vlm_pipeline_api_model.py 1 # %% [markdown]
LOW docs/examples/vlm_pipeline_api_model.py 481 #
LOW docs/examples/vlm_pipeline_api_model.py 501 #
LOW docs/examples/batch_convert.py 1 # %% [markdown]
LOW docs/examples/run_with_accelerator.py 1 # %% [markdown]
LOW docs/examples/run_with_accelerator.py 41 num_threads=8, device=AcceleratorDevice.CPU
LOW docs/examples/full_page_ocr.py 1 # %% [markdown]
LOW docs/examples/legacy/vlm_pipeline_api_model_legacy.py 1 # %% [markdown]
LOW docs/examples/legacy/vlm_pipeline_api_model_legacy.py 221 enable_remote_services=True # required when calling remote VLM endpoints
LOW docs/examples/legacy/vlm_pipeline_api_model_legacy.py 241 format=ResponseFormat.DOCTAGS,
LOW docs/examples/legacy/pictures_description_api_legacy.py 1 # %% [markdown]
LOW docs/examples/legacy/pictures_description_api_legacy.py 141
LOW …s/examples/legacy/picture_description_inline_legacy.py 1 # %% [markdown]
LOW docs/examples/legacy/minimal_vlm_pipeline_legacy.py 1 # %% [markdown]
Cross-Language Confusion · 10 hits · 38 pts
Severity File Line Snippet
HIGH docling/backend/html_backend.py 709 const width = rect.width || 0;
HIGH docling/backend/html_backend.py 710 const height = rect.height || 0;
HIGH docling/backend/html_backend.py 711 if (width <= 0 && height <= 0) {
HIGH docling/backend/html_backend.py 718 let textLeft = null;
HIGH docling/backend/html_backend.py 719 let textTop = null;
HIGH docling/backend/html_backend.py 720 let textRight = null;
HIGH docling/backend/html_backend.py 721 let textBottom = null;
HIGH docling/backend/html_backend.py 734 const tWidth = tRect.width || 0;
HIGH docling/backend/html_backend.py 735 const tHeight = tRect.height || 0;
HIGH docling/backend/html_backend.py 736 if (tWidth <= 0 && tHeight <= 0) {
AI Slop Vocabulary · 10 hits · 23 pts
Severity File Line Snippet
MEDIUM tests/data/uspto/ipg07997973.xml 3813 <p id="p-0056" num="0055">An enumerated list of items (which may or may not be numbered) does not imply that any or all
MEDIUM tests/data/uspto/ipg07997973.xml 3988 <li id="ul0012-0008" num="0196">&#x2003;The various methods of disguising a game described herein may pr
MEDIUM tests/data/uspto/ipg07997973.xml 4017 <li id="ul0002-0035" num="0215">&#x2003;In various embodiments, the data about the games of a primary player may
MEDIUM tests/data/uspto/ipg07997973.xml 4247 <li id="ul0028-0018" num="0367">&#x2003;Any physical game described herein may be implemented electronically in
MEDIUM tests/data/jats/elife-56337.xml 3 <article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="researc
MEDIUM tests/data/xbrl/grve_10q_htm.xml 2459 <us-gaap:SignificantAccountingPoliciesTextBlock contextRef="From2025-04-01to2025-12-31" id="fid_281">&lt;p style="FO
MEDIUM tests/data/xbrl/grve_10q_htm.xml 2459 <us-gaap:SignificantAccountingPoliciesTextBlock contextRef="From2025-04-01to2025-12-31" id="fid_281">&lt;p style="FO
MEDIUM tests/data/xbrl/grve_10q_htm.xml 2460 <us-gaap:BasisOfAccountingPolicyPolicyTextBlock contextRef="From2025-04-01to2025-12-31" id="fid_289">&lt;p style="FO
MEDIUM tests/data/xbrl/grve_10q_htm.xml 2460 <us-gaap:BasisOfAccountingPolicyPolicyTextBlock contextRef="From2025-04-01to2025-12-31" id="fid_289">&lt;p style="FO
LOW docling/backend/html_backend.py 4234 # Do not fetch the image, just add a placeholder
Magic Placeholder Names · 3 hits · 18 pts
Severity File Line Snippet
HIGH docling/datamodel/backend_options.py 95 examples=[{"Authorization": "Bearer TOKEN"}, {"X-API-Key": "your-api-key"}],
HIGH docs/examples/granitedocling_repetition_stopping.py 87 # # "Authorization": "Bearer YOUR_API_KEY", # if needed
HIGH docs/examples/service_client/README.md 13 export DOCLING_SERVICE_API_KEY="your-api-key" # optional
Synthetic Comment Markers · 1 hit · 8 pts
Severity File Line Snippet
HIGH docs/examples/picture_description_inline.py 7 # - Enriches documents with AI-generated image captions
Slop Phrases · 2 hits · 4 pts
Severity File Line Snippet
MEDIUM …roundtruth/docling_v2/arXiv-2501.01300v2_main.tex.json 972 "orig": "[fig:enhan] are obtained by multiplying continuum extrapolated $\\chi^C_4$ values to ratios $P^C_B/P_C$ a
MEDIUM …roundtruth/docling_v2/arXiv-2501.01300v2_main.tex.json 973 "text": "[fig:enhan] are obtained by multiplying continuum extrapolated $\\chi^C_4$ values to ratios $P^C_B/P_C$ a