Repository Analysis

ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

17.9 · Moderate AI signal

Adjusted Score: 17.9
Raw Score: 17.9
Time Factor: 100%
Last Push: 2026-05-27
Stars: 33,740
Language: Python
Lines of Code: 44,119
Files: 221
Pattern Hits: 650
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 9 · MEDIUM 39 · LOW 602

Pattern Findings

650 matches across 13 categories.

Hyper-Verbose Identifiers: 303 hits · 324 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/test_multi_font_manager.py | 65 | def test_missing_font_directory(): |
| LOW | tests/test_multi_font_manager.py | 75 | def test_select_font_for_arabic_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 83 | def test_select_font_for_persian_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 91 | def test_select_font_for_urdu_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 111 | def test_select_font_for_hindi_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 119 | def test_select_font_for_sanskrit_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 127 | def test_select_font_for_marathi_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 135 | def test_select_font_for_nepali_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 147 | def test_select_font_for_chinese_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 155 | def test_select_font_for_chinese_generic(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 163 | def test_select_font_for_chinese_simplified(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 171 | def test_select_font_for_chinese_traditional(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 179 | def test_select_font_for_japanese_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 187 | def test_select_font_for_korean_language(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 198 | def test_select_font_for_english_text(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 204 | def test_select_font_without_language_hint(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 213 | def test_select_font_arabic_text_without_language_hint(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 222 | def test_devanagari_text_without_language_hint(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 230 | def test_cjk_text_without_language_hint(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 238 | def test_fallback_to_occulta_font(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 246 | def test_fallback_fonts_constant(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 261 | def test_has_all_glyphs_for_english(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 267 | def test_has_all_glyphs_for_arabic(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 274 | def test_has_all_glyphs_for_devanagari(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 281 | def test_has_all_glyphs_for_cjk(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 288 | def test_empty_text_has_all_glyphs(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 293 | def test_has_all_glyphs_missing_font(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 301 | def test_font_selection_caching(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 53 | def test_init_loads_builtin_fonts(multi_font_manager): |
| LOW | tests/test_multi_font_manager.py | 315 | def test_language_font_map_coverage(): |
| LOW | tests/test_multi_font_manager.py | 362 | def test_custom_font_provider(font_dir): |
| LOW | tests/test_multi_font_manager.py | 378 | def test_missing_font_uses_fallback(font_dir): |
| LOW | tests/test_multi_font_manager.py | 393 | def test_builtin_font_provider_loads_expected_fonts(font_dir): |
| LOW | tests/test_multi_font_manager.py | 406 | def test_builtin_font_provider_get_font(font_dir): |
| LOW | tests/test_multi_font_manager.py | 418 | def test_builtin_font_provider_get_fallback(font_dir): |
| LOW | tests/test_multi_font_manager.py | 427 | def test_builtin_font_provider_missing_font_logs_warning(tmp_path, font_dir, caplog): |
| LOW | tests/test_multi_font_manager.py | 443 | def test_builtin_font_provider_missing_occulta_raises(tmp_path): |
| LOW | tests/test_ocr_engine_selection.py | 18 | def test_ocr_engine_option_exists(self): |
| LOW | tests/test_ocr_engine_selection.py | 30 | def test_ocr_engine_accepts_tesseract(self): |
| LOW | tests/test_ocr_engine_selection.py | 39 | def test_ocr_engine_accepts_auto(self): |
| LOW | tests/test_ocr_engine_selection.py | 48 | def test_ocr_engine_accepts_none(self): |
| LOW | tests/test_ocr_engine_selection.py | 57 | def test_ocr_engine_default_is_auto(self): |
| LOW | tests/test_ocr_engine_selection.py | 66 | def test_ocr_engine_rejects_invalid(self): |
| LOW | tests/test_ocr_engine_selection.py | 79 | def test_ocr_options_has_ocr_engine_field(self): |
| LOW | tests/test_ocr_engine_selection.py | 90 | def test_tesseract_selected_when_auto(self): |
| LOW | tests/test_ocr_engine_selection.py | 103 | def test_tesseract_selected_when_tesseract(self): |
| LOW | tests/test_ocr_engine_selection.py | 116 | def test_null_selected_when_none(self): |
| LOW | tests/test_ocr_engine_selection.py | 129 | def test_null_returns_none_when_auto(self): |
| LOW | tests/test_pipeline_generate_ocr.py | 22 | def test_ocr_engine_direct_function_exists(self): |
| LOW | tests/test_pipeline_generate_ocr.py | 28 | def test_ocr_engine_direct_returns_tuple(self, tmp_path): |
| LOW | tests/test_pipeline_generate_ocr.py | 54 | def test_page_result_has_ocr_tree_field(self): |
| LOW | tests/test_pipeline_generate_ocr.py | 61 | def test_page_result_ocr_tree_default_none(self): |
| LOW | tests/test_pipeline_generate_ocr.py | 89 | def test_hocr_result_has_ocr_tree_field(self): |
| LOW | tests/test_pipeline_generate_ocr.py | 96 | def test_hocr_result_ocr_tree_default_none(self): |
| LOW | tests/test_system_font_provider.py | 38 | def test_get_platform_windows(self): |
| LOW | tests/test_system_font_provider.py | 44 | def test_get_platform_freebsd(self): |
| LOW | tests/test_system_font_provider.py | 72 | def test_windows_font_dirs_with_windir(self): |
| LOW | tests/test_system_font_provider.py | 86 | def test_windows_font_dirs_default(self): |
| LOW | tests/test_system_font_provider.py | 99 | def test_windows_font_dirs_with_localappdata(self): |
| LOW | tests/test_system_font_provider.py | 138 | def test_get_font_unknown_name_returns_none(self): |

243 more matches not shown…
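
Every row shown in this category is a pytest-style `test_…` function, where long descriptive names are conventional. The report does not say what makes an identifier "hyper-verbose", so the following is only a sketch of one plausible heuristic: count the underscore-separated words in a name and compare against a threshold. The names `MAX_WORDS`, `word_count` and `is_verbose`, and the threshold value itself, are hypothetical and not taken from the scanner.

```python
# ASSUMPTION: the threshold is illustrative; the scanner's real rule is not documented here.
MAX_WORDS = 3


def word_count(identifier: str) -> int:
    """Count the non-empty underscore-separated words in a snake_case name."""
    return len([word for word in identifier.split("_") if word])


def is_verbose(identifier: str, max_words: int = MAX_WORDS) -> bool:
    """Flag identifiers that contain more words than the threshold allows."""
    return word_count(identifier) > max_words


for name in ("test_missing_font_directory", "have_unpaper", "__init__"):
    print(name, word_count(name), is_verbose(name))
```

Under this heuristic almost any descriptive test name is flagged, which may explain why the category is dominated by files under `tests/`.
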

Unused Imports: 213 hits · 204 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | misc/synology.py | 7 | |
| LOW | misc/pdf_text_diff.py | 6 | |
| LOW | misc/batch.py | 15 | |
| LOW | misc/example_plugin.py | 22 | |
| LOW | misc/webservice.py | 7 | |
| LOW | misc/webservice.py | 13 | |
| LOW | misc/pdf_compare.py | 6 | |
| LOW | misc/watcher.py | 8 | |
| LOW | misc/bisect_pdf.py | 6 | |
| LOW | misc/_webservice.py | 13 | |
| LOW | misc/ocrmypdf_compare.py | 6 | |
| LOW | bin/bump_version.py | 7 | |
| LOW | tests/test_multi_font_manager.py | 6 | |
| LOW | tests/test_ocr_engine_selection.py | 10 | |
| LOW | tests/test_pipeline_generate_ocr.py | 10 | |
| LOW | tests/test_system_font_provider.py | 6 | |
| LOW | tests/test_concurrency.py | 4 | |
| LOW | tests/conftest.py | 4 | |
| LOW | tests/test_rasterizer.py | 6 | |
| LOW | tests/test_rasterizer.py | 23 | |
| LOW | tests/test_validation.py | 4 | |
| LOW | tests/test_logging.py | 4 | |
| LOW | tests/test_hocr_parser.py | 6 | |
| LOW | tests/test_check_pdf.py | 4 | |
| LOW | tests/test_null_ocr_engine.py | 10 | |
| LOW | tests/test_tagged.py | 4 | |
| LOW | tests/test_optimize.py | 4 | |
| LOW | tests/test_stdio.py | 4 | |
| LOW | tests/test_image_input.py | 4 | |
| LOW | tests/test_metadata.py | 4 | |
| LOW | tests/__init__.py | 6 | |
| LOW | tests/test_pdf_renderer.py | 6 | |
| LOW | tests/test_semfree.py | 4 | |
| LOW | tests/test_preprocessing.py | 4 | |
| LOW | tests/test_rotation.py | 4 | |
| LOW | tests/test_ocr_element.py | 6 | |
| LOW | tests/test_page_boxes.py | 4 | |
| LOW | tests/test_ocr_engine_interface.py | 10 | |
| LOW | tests/test_pdfinfo.py | 4 | |
| LOW | tests/test_json_serialization.py | 2 | |
| LOW | tests/test_acroform.py | 4 | |
| LOW | tests/test_hocrtransform.py | 4 | |
| LOW | tests/test_userunit.py | 4 | |
| LOW | tests/test_ghostscript.py | 4 | |
| LOW | tests/test_multilingual_direct.py | 13 | |
| LOW | tests/test_unpaper.py | 4 | |
| LOW | tests/test_annots.py | 4 | |
| LOW | tests/test_tesseract.py | 4 | |
| LOW | tests/test_imageops.py | 4 | |
| LOW | tests/test_page_numbers.py | 4 | |
| LOW | tests/test_helpers.py | 4 | |
| LOW | tests/test_verapdf.py | 6 | |
| LOW | tests/test_api.py | 4 | |
| LOW | tests/test_graft.py | 4 | |
| LOW | tests/test_pdfa.py | 4 | |
| LOW | tests/test_watcher.py | 1 | |
| LOW | tests/test_main.py | 4 | |
| LOW | tests/test_fpdf_renderer.py | 6 | |
| LOW | tests/test_soft_error.py | 4 | |
| LOW | tests/test_completion.py | 4 | |

153 more matches not shown…
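
The report gives no snippets for this category, so the flagged lines cannot be checked from the page alone. As a rough way to reproduce such a check locally, the sketch below uses Python's standard `ast` module to list imported names that are never referenced. The function name `unused_imports` is hypothetical, and this is not the scanner's own rule. It deliberately skips `from __future__ import …` statements, which are compiler directives and are never referenced by name.

```python
import ast


def unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imports whose bound name is never read."""
    tree = ast.parse(source)
    imported: dict[str, int] = {}  # bound name -> line of the import statement

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            if node.module == "__future__":
                continue  # directive, not a usable name
            for alias in node.names:
                if alias.name != "*":
                    imported[alias.asname or alias.name] = node.lineno

    # Every bare name in the module, including the base of "os.path"-style access.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


example = (
    "from __future__ import annotations\n"
    "import os\n"
    "import sys\n"
    "from pathlib import Path\n"
    "print(Path(os.getcwd()))\n"
)
print(unused_imports(example))  # [(3, 'sys')]
```

A check this simple still produces false positives for names re-exported through `__all__` or imported only for side effects, so hits are best treated as prompts for manual review.
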

Self-Referential Comments: 18 hits · 58 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/test_rasterizer.py | 252 | # Create an image with gradients to detect rasterization errors |
| MEDIUM | tests/test_rasterizer.py | 286 | # Create an image with gradients to detect rasterization errors |
| MEDIUM | tests/test_ocr_engine_interface.py | 35 | # Create a minimal concrete implementation |
| MEDIUM | tests/test_pdfinfo.py | 142 | # Create an RGB image and save as JPEG |
| MEDIUM | tests/test_pdfinfo.py | 151 | # Create a PDF with the flate+jpeg image |
| MEDIUM | tests/test_ghostscript.py | 418 | # Create an invalid image object that has both ColorSpace and ImageMask set |
| MEDIUM | tests/test_annots.py | 19 | # Create a broken named destination |
| MEDIUM | tests/test_annots.py | 21 | # Create a valid named destination |
| MEDIUM | tests/test_graft.py | 53 | # Create a PDF with a non-zero mediabox origin |
| MEDIUM | tests/test_fpdf_renderer.py | 107 | # Create a non-page element |
| MEDIUM | tests/test_fpdf_renderer.py | 138 | # Create a simple page with one word |
| MEDIUM | tests/test_fpdf_renderer.py | 412 | # Create a page with multiple words on one line |
| MEDIUM | tests/test_fpdf_renderer.py | 480 | # Create a page with CJK words (Chinese characters) |
| MEDIUM | docs/conf.py | 10 | # This file is execfile()d with the current directory set to its |
| MEDIUM | src/ocrmypdf/_options.py | 485 | # Create a copy of the model data for serialization |
| MEDIUM | src/ocrmypdf/_pipeline.py | 797 | # Create a new single page PDF to hold |
| MEDIUM | src/ocrmypdf/_annots.py | 41 | # Create a set of all named destinations |
| MEDIUM | src/ocrmypdf/fpdf_renderer/renderer.py | 948 | # Create a renderer for this page |

Decorative Section Separators: 16 hits · 54 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/test_multilingual_direct.py | 69 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 71 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 537 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 539 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 141 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 143 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 216 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 218 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 314 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 316 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 381 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 383 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 511 | # ============================================================================= |
| MEDIUM | tests/test_multilingual_direct.py | 513 | # ============================================================================= |
| MEDIUM | src/ocrmypdf/_plugin_manager.py | 113 | # ========================================================================= |
| MEDIUM | src/ocrmypdf/_plugin_manager.py | 115 | # ========================================================================= |
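
The hits above come in pairs two lines apart, which suggests a banner of the form separator, title, separator. A minimal sketch of how such lines could be located with a regular expression follows; the pattern, the minimum run length of ten characters, and the name `separator_lines` are assumptions made for illustration, not the scanner's rule.

```python
import re

# A comment consisting only of one punctuation character repeated at least 10 times.
# ASSUMPTION: both the character set and the minimum length are illustrative.
SEPARATOR = re.compile(r"^\s*#\s*([=\-*#~_])\1{9,}\s*$")


def separator_lines(source: str) -> list[int]:
    """Return the 1-based line numbers of decorative separator comments."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SEPARATOR.match(line)
    ]


banner = "# " + "=" * 77
example = "\n".join([banner, "# Section title", banner, "x = 1  # ordinary comment"])
print(separator_lines(example))  # [1, 3]
```
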

Deep Nesting: 40 hits · 39 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | misc/pdf_compare.py | 34 | |
| LOW | misc/ocrmypdf_compare.py | 50 | |
| LOW | bin/bump_version.py | 101 | |
| LOW | tests/test_pdf_renderer.py | 690 | |
| LOW | tests/test_ghostscript.py | 483 | |
| LOW | tests/test_ghostscript.py | 517 | |
| LOW | tests/test_ghostscript.py | 486 | |
| LOW | tests/plugins/tesseract_cache.py | 66 | |
| LOW | tests/plugins/tesseract_cache.py | 67 | |
| LOW | src/ocrmypdf/optimize.py | 142 | |
| LOW | src/ocrmypdf/optimize.py | 202 | |
| LOW | src/ocrmypdf/_options.py | 89 | |
| LOW | src/ocrmypdf/_options.py | 483 | |
| LOW | src/ocrmypdf/_options.py | 585 | |
| LOW | src/ocrmypdf/_options.py | 489 | |
| LOW | src/ocrmypdf/_graft.py | 177 | |
| LOW | src/ocrmypdf/_graft.py | 512 | |
| LOW | src/ocrmypdf/api.py | 286 | |
| LOW | src/ocrmypdf/imageops.py | 29 | |
| LOW | src/ocrmypdf/_pipeline.py | 67 | |
| LOW | src/ocrmypdf/_pipeline.py | 165 | |
| LOW | src/ocrmypdf/_pipeline.py | 323 | |
| LOW | src/ocrmypdf/_pipeline.py | 511 | |
| LOW | src/ocrmypdf/_pipeline.py | 1234 | |
| LOW | src/ocrmypdf/_validation.py | 159 | |
| LOW | src/ocrmypdf/helpers.py | 252 | |
| LOW | src/ocrmypdf/pdfinfo/layout.py | 294 | |
| LOW | src/ocrmypdf/pdfinfo/_contentstream.py | 81 | |
| LOW | src/ocrmypdf/pdfinfo/info.py | 133 | |
| LOW | src/ocrmypdf/builtin_plugins/pypdfium.py | 117 | |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 245 | |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 310 | |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 266 | |
| LOW | src/ocrmypdf/subprocess/_windows.py | 90 | |
| LOW | src/ocrmypdf/subprocess/_run.py | 79 | |
| LOW | src/ocrmypdf/_pipelines/_common.py | 514 | |
| LOW | src/ocrmypdf/extra_plugins/semfree.py | 120 | |
| LOW | src/ocrmypdf/_exec/tesseract.py | 281 | |
| LOW | src/ocrmypdf/_exec/ghostscript.py | 107 | |
| LOW | src/ocrmypdf/font/system_font_provider.py | 202 | |
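
The report lists only file and line for nesting hits and does not state the depth at which a block counts as "deep". One way to measure nesting independently is to walk the syntax tree and count enclosing control-flow blocks, as sketched below with the standard `ast` module. The function name `max_nesting` and the choice of which node types count as a level are assumptions for illustration.

```python
import ast

# Node types treated as one nesting level each.
# ASSUMPTION: function and class definitions are not counted as levels here.
NESTING_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith, ast.Try,
)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest control-flow nesting found beneath ``node``."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


example = """\
def read_all(paths):
    for path in paths:
        if path:
            with open(path) as handle:
                try:
                    handle.read()
                except OSError:
                    pass
"""
print(max_nesting(ast.parse(example)))  # 4
```

Note that `ast` represents `elif` as an `If` nested inside the previous branch's `orelse`, so a long `elif` chain is reported as deeper than it looks on the page; a stricter tool would special-case that.
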

Docstring Block Structure: 5 hits · 25 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | src/ocrmypdf/api.py | 76 | Set up plugin infrastructure with proper initialization. This function handles: 1. Creating or validating the p |
| HIGH | src/ocrmypdf/api.py | 317 | Construct an options object from the input/output files and keyword arguments. Args: input_file: Input file |
| HIGH | src/ocrmypdf/api.py | 513 | Run OCRmyPDF on one PDF or image. This function supports two calling conventions: **New style (recommended):** |
| HIGH | src/ocrmypdf/pdfa.py | 219 | Attempt to convert a PDF to PDF/A by adding required structures. This function creates a copy of the input PDF and |
| HIGH | src/ocrmypdf/builtin_plugins/tesseract_ocr.py | 30 | Convert string argument to ThresholdingMethod enum. Args: value: String name of thresholding method (auto, |

Excessive Try-Catch Wrapping: 16 hits · 18 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | misc/batch.py | 86 | except Exception: |
| LOW | tests/conftest.py | 51 | except Exception: # pylint: disable=broad-except |
| MEDIUM | tests/conftest.py | 48 | def have_unpaper(): |
| LOW | tests/test_metadata.py | 216 | except Exception: # pylint: disable=broad-except |
| MEDIUM | tests/test_metadata.py | 204 | def libxmp_file_to_dict(): |
| LOW | src/ocrmypdf/optimize.py | 356 | except Exception: # pylint: disable=broad-except |
| LOW | src/ocrmypdf/api.py | 373 | except Exception as e: |
| LOW | src/ocrmypdf/api.py | 837 | except Exception as e: |
| LOW | src/ocrmypdf/api.py | 953 | except Exception as e: |
| LOW | src/ocrmypdf/_pipeline.py | 1036 | except Exception as e: |
| LOW | src/ocrmypdf/builtin_plugins/concurrency.py | 56 | except Exception: # pylint: disable=broad-except |
| LOW | src/ocrmypdf/builtin_plugins/concurrency.py | 171 | except Exception: |
| LOW | src/ocrmypdf/_pipelines/_common.py | 317 | except Exception: # pylint: disable=broad-except |
| LOW | src/ocrmypdf/extra_plugins/semfree.py | 106 | except Exception as e: # pylint: disable=broad-except |
| LOW | src/ocrmypdf/font/font_provider.py | 99 | except Exception as e: |
| LOW | src/ocrmypdf/font/system_font_provider.py | 263 | except Exception as e: |
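
All the LOW rows above are `except Exception` handlers, several carrying a `# pylint: disable=broad-except` marker, which indicates the broad catch was already acknowledged by whoever wrote it. A minimal sketch of how such handlers can be located with the standard `ast` module follows. The function name `broad_excepts` is hypothetical and the scanner's actual criteria are not given in the report.

```python
import ast

BROAD_TYPES = {"Exception", "BaseException"}


def broad_excepts(source: str) -> list[tuple[int, str]]:
    """Return (line, kind) for bare ``except:`` and ``except Exception`` handlers."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if node.type is None:
            hits.append((node.lineno, "bare except"))
        elif isinstance(node.type, ast.Name) and node.type.id in BROAD_TYPES:
            hits.append((node.lineno, node.type.id))
    return sorted(hits)


example = """\
try:
    risky()
except Exception:
    pass
try:
    risky()
except OSError:
    pass
try:
    risky()
except:
    pass
"""
print(broad_excepts(example))  # [(3, 'Exception'), (11, 'bare except')]
```

Because comments are not part of the syntax tree, this sketch cannot see suppression markers such as the pylint pragma; honouring them would require a pass over the raw source lines as well.
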

Redundant / Tautological Comments: 12 hits · 18 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/test_rasterizer.py | 21 | # Check if pypdfium2 is available |
| LOW | tests/plugins/tesseract_cache.py | 101 | # Check if cache has all required files |
| LOW | src/ocrmypdf/_options.py | 673 | # Check if this is a plugin namespace |
| LOW | src/ocrmypdf/_annots.py | 30 | # Check if there are any named destinations |
| LOW | src/ocrmypdf/pdfa.py | 205 | # Check if sRGB OutputIntent already exists |
| LOW | src/ocrmypdf/fpdf_renderer/renderer.py | 820 | # Check if character is in CJK ranges |
| LOW | src/ocrmypdf/builtin_plugins/pypdfium.py | 238 | # Check if user explicitly requested a different rasterizer |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 225 | # Check if user explicitly requested a different rasterizer |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 277 | # Check if it's an image with DCTDecode |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 345 | # Check if output is 1-15 bytes shorter |
| LOW | src/ocrmypdf/builtin_plugins/ghostscript.py | 350 | # Check if the bytes are identical up to the truncation point |
| LOW | src/ocrmypdf/font/multi_font_manager.py | 252 | # Check if text contains non-ASCII characters |

Over-Commented Block: 16 hits · 16 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/test_optimize.py | 341 | image.Filter = Name.CCITTFaxDecode |
| LOW | docs/conf.py | 1 | #!/usr/bin/env python3 |
| LOW | docs/conf.py | 121 | # non-false value, then it is used: |
| LOW | docs/conf.py | 141 | # add_function_parentheses = True |
| LOW | docs/conf.py | 181 | # Add any paths that contain custom themes here, relative to this directory. |
| LOW | docs/conf.py | 201 | # |
| LOW | docs/conf.py | 221 | # If true, SmartyPants will be used to convert quotes and dashes to |
| LOW | docs/conf.py | 241 | # html_use_index = True |
| LOW | docs/conf.py | 261 | # base URL from which the finished HTML is served. |
| LOW | docs/conf.py | 281 | # The name of a javascript file (relative to the configuration directory) that |
| LOW | docs/conf.py | 301 | # Latex figure (float) alignment |
| LOW | docs/conf.py | 321 | # latex_use_parts = False |
| LOW | docs/conf.py | 341 | # If false, no module index is generated. |
| LOW | docs/conf.py | 381 | |
| LOW | src/ocrmypdf/_jobcontext.py | 121 | # Otherwise, we have a fallback Namespace (shouldn't happen in normal operation) |
| LOW | src/ocrmypdf/_exec/ghostscript.py | 321 | stop_on_error = False |

Cross-File Repetition: 3 hits · 15 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | src/ocrmypdf/_pipelines/hocr_to_ocr_pdf.py | 0 | implements the concurrent and page synchronous parts of the pipeline. |
| HIGH | src/ocrmypdf/_pipelines/ocr.py | 0 | implements the concurrent and page synchronous parts of the pipeline. |
| HIGH | src/ocrmypdf/_pipelines/pdf_to_hocr.py | 0 | implements the concurrent and page synchronous parts of the pipeline. |

AI Slop Vocabulary: 5 hits · 8 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/test_fpdf_renderer.py | 326 | """Test rendering comprehensive multilingual 'Hello!' hOCR file. |
| MEDIUM | src/ocrmypdf/_validation_coordinator.py | 28 | """Run comprehensive validation on all options. |
| MEDIUM | src/ocrmypdf/_validation.py | 139 | # Finally, run comprehensive validation using the coordinator |
| LOW | src/ocrmypdf/_exec/unpaper.py | 62 | # No changes, PNG input, just use the file we already have |
| LOW | src/ocrmypdf/_exec/unpaper.py | 65 | # adds a few seconds to test suite - so just use pnm |

Cross-Language Confusion: 1 hit · 5 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | tests/test_api.py | 150 | '"textpdf": {"Path": "c"}, "orientation_correction": 180, "ocr_tree": null}' |
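
The flagged snippet appears to be the tail of a JSON document held in a Python string literal, in which case `null` is the correct JSON spelling rather than a stray JavaScript keyword. The sketch below illustrates the distinction with the standard `json` module. The payload is a trimmed, illustrative string that reuses two keys visible in the snippet; it is not the test's actual fixture.

```python
import json

# ASSUMPTION: illustrative payload, not the real fixture from tests/test_api.py.
payload = '{"orientation_correction": 180, "ocr_tree": null}'

# JSON null decodes to Python None ...
decoded = json.loads(payload)
print(decoded)  # {'orientation_correction': 180, 'ocr_tree': None}

# ... and None encodes back to JSON null, so the literal must say "null".
print(json.dumps(decoded) == payload)  # True
```

If that reading is right, this single HIGH hit is a candidate false positive worth confirming against the file itself.
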

Verbosity Indicators: 2 hits · 4 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/ocrmypdf/_validation_coordinator.py | 38 | # Step 1: Plugin context validation |
| LOW | src/ocrmypdf/_validation_coordinator.py | 41 | # Step 2: Cross-cutting validation |
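
The headline figures can be cross-checked against the per-category totals. The sketch below copies the hit and point counts from the category headers, confirms they agree with the reported 650 pattern hits and the severity breakdown, and then computes points per 1,000 lines of code. The report does not state how the Raw Score is derived; points per 1,000 lines is an assumption that happens to reproduce the reported 17.9, not a documented formula.

```python
# (hits, points) per category, copied from the category headers in this report.
categories = {
    "Hyper-Verbose Identifiers": (303, 324),
    "Unused Imports": (213, 204),
    "Self-Referential Comments": (18, 58),
    "Decorative Section Separators": (16, 54),
    "Deep Nesting": (40, 39),
    "Docstring Block Structure": (5, 25),
    "Excessive Try-Catch Wrapping": (16, 18),
    "Redundant / Tautological Comments": (12, 18),
    "Over-Commented Block": (16, 16),
    "Cross-File Repetition": (3, 15),
    "AI Slop Vocabulary": (5, 8),
    "Cross-Language Confusion": (1, 5),
    "Verbosity Indicators": (2, 4),
}
severity = {"CRITICAL": 0, "HIGH": 9, "MEDIUM": 39, "LOW": 602}
lines_of_code = 44_119

total_hits = sum(hits for hits, _ in categories.values())
total_points = sum(points for _, points in categories.values())

# ASSUMPTION: score = points per 1,000 lines of code (not confirmed by the report).
points_per_kloc = total_points / lines_of_code * 1000

print(len(categories), total_hits, sum(severity.values()))  # 13 650 650
print(total_points, round(points_per_kloc, 1))  # 788 17.9
```

Two categories, Hyper-Verbose Identifiers and Unused Imports, account for 516 of the 650 hits and 528 of the 788 points, so the overall score is driven mostly by LOW-severity findings.
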