Repository Analysis

microsoft/markitdown

Python tool for converting files and office documents to Markdown.

40.8 Strong AI signal

Adjusted Score: 40.8
Raw Score: 40.8
Time Factor: 100%
Last Push: 2026-05-26
Stars: 130,440
Language: Python
Lines of Code: 17,007
Files: 100
Pattern Hits: 382
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 1 · HIGH 8 · MEDIUM 108 · LOW 265

Pattern Findings

382 matches across 13 categories.
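The headline total can be cross-checked against the figures printed elsewhere in this report. The short sketch below copies the per-category hit counts from the category headings and the per-severity counts from the severity breakdown, then sums each set; the variable names are illustrative only.

```python
# Hit counts copied from the category headings in this report.
category_hits = {
    "Decorative Section Separators": 90,
    "Hyper-Verbose Identifiers": 115,
    "Unused Imports": 58,
    "Excessive Try-Catch Wrapping": 41,
    "Deep Nesting": 30,
    "AI Slop Vocabulary": 10,
    "Redundant / Tautological Comments": 17,
    "Self-Referential Comments": 5,
    "Synthetic Comment Markers": 2,
    "Docstring Block Structure": 3,
    "Hallucination Indicators": 1,
    "Cross-File Repetition": 3,
    "Over-Commented Block": 7,
}

# Counts copied from the severity breakdown.
severity_counts = {"CRITICAL": 1, "HIGH": 8, "MEDIUM": 108, "LOW": 265}

total_by_category = sum(category_hits.values())
total_by_severity = sum(severity_counts.values())

print(len(category_hits), total_by_category, total_by_severity)  # 13 382 382
```

Both sums agree with the 382 pattern hits reported in the summary.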

Decorative Section Separators: 90 hits · 324 pts
Severity | File | Line | Snippet
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 114 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 116 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 160 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 162 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 193 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 195 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 308 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 310 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 558 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 560 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 637 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 639 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 719 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 721 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 746 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 748 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 788 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 790 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 906 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 908 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 25 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 27 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 47 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 49 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 662 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown/tests/test_cu_converter.py | 664 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 50 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 52 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 335 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 337 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 436 | # ---------------------------------------------------------------------------
MEDIUM | …/markitdown/src/markitdown/converters/_cu_converter.py | 438 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 103 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 105 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 166 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 168 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 210 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 212 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 55 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 57 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 70 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 72 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 87 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 89 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 120 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 122 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 147 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_docx_converter.py | 149 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 61 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 63 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 75 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 77 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 91 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 93 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 106 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 108 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 121 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 123 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 135 | # ---------------------------------------------------------------------------
MEDIUM | packages/markitdown-ocr/tests/test_pptx_converter.py | 137 | # ---------------------------------------------------------------------------
30 more matches not shown…
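Every snippet in the table above is a comment line made up only of a long run of dashes. As a rough illustration of how such lines could be detected, the sketch below uses a regular expression. The pattern, the 10-character threshold, and the function name `find_separator_lines` are assumptions made for this example and are not taken from the scanner that produced this report.

```python
import re

# Hypothetical pattern: a comment marker followed only by a long run of
# rule characters (dashes, equals signs, asterisks, hashes, or tildes).
SEPARATOR_RE = re.compile(r"^\s*(#|//)\s*[-=*#~]{10,}\s*$")


def find_separator_lines(source: str) -> list[int]:
    """Return 1-based line numbers whose only content is a comment rule."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SEPARATOR_RE.match(line)
    ]


sample = "\n".join(
    [
        "# " + "-" * 75,
        "# Helpers",
        "# " + "-" * 75,
        "def helper():",
        "    return 1",
    ]
)

print(find_separator_lines(sample))  # [1, 3]
```

The paired line numbers in the table (114 and 116, 160 and 162, and so on) are consistent with banner-style headings where a rule sits above and below a one-line title.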
Hyper-Verbose Identifiers: 115 hits · 103 pts
Severity | File | Line | Snippet
LOW | packages/markitdown/tests/test_module_vectors.py | 72 | def test_convert_stream_with_hints(test_vector):
LOW | packages/markitdown/tests/test_module_vectors.py | 93 | def test_convert_stream_without_hints(test_vector):
LOW | packages/markitdown/tests/test_module_vectors.py | 163 | def test_convert_keep_data_uris(test_vector):
LOW | packages/markitdown/tests/test_module_vectors.py | 181 | def test_convert_stream_keep_data_uris(test_vector):
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 17 | def test_partial_numbering_pattern_regex(self):
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 34 | def test_masterformat_partial_numbering_not_split(self):
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 73 | def test_masterformat_content_preserved(self):
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 115 | def test_merge_partial_numbering_with_empty_lines_between(self):
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 148 | def test_multiple_partial_numberings_all_merged(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 86 | def test_page_close_called_on_every_page(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 116 | def test_plain_text_pdf_falls_back_to_pdfminer(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 150 | def test_plain_text_pdf_still_closes_all_pages(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 177 | def test_mixed_pdf_uses_form_extraction_per_page(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 222 | def test_only_one_pdfplumber_open_call(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 249 | def test_real_pdf_page_cleanup(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 303 | def test_memory_does_not_grow_linearly(self):
LOW | packages/markitdown/tests/test_pdf_memory.py | 333 | def test_memory_constant_across_page_counts(self):
LOW | packages/markitdown/tests/test_module_misc.py | 110 | def test_stream_info_operations() -> None:
LOW | packages/markitdown/tests/test_module_misc.py | 291 | def test_deeply_nested_html_fallback() -> None:
LOW | packages/markitdown/tests/test_module_misc.py | 404 | def test_speech_transcription() -> None:
LOW | packages/markitdown/tests/test_module_misc.py | 465 | def test_markitdown_llm_parameters() -> None:
LOW | packages/markitdown/tests/test_docintel_html.py | 15 | def test_docintel_accepts_html_extension():
LOW | packages/markitdown/tests/test_docintel_html.py | 21 | def test_docintel_accepts_html_mimetype():
LOW | packages/markitdown/tests/test_cli_vectors.py | 98 | def test_input_from_stdin_without_hints(shared_tmp_dir, test_vector) -> None:
LOW | packages/markitdown/tests/test_cli_vectors.py | 152 | def test_output_to_file_with_data_uris(shared_tmp_dir, test_vector) -> None:
LOW | packages/markitdown/tests/test_cu_converter.py | 109 | def test_rejects_unsupported_extensions(self, ext):
LOW | packages/markitdown/tests/test_cu_converter.py | 155 | def test_rejects_unsupported_mimetypes(self, mime):
LOW | packages/markitdown/tests/test_cu_converter.py | 168 | def test_restricted_to_pdf_only(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 186 | def test_webm_value_matches_cli_input(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 201 | def test_detects_video_from_mime_without_extension(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 207 | def test_detects_audio_from_mime_without_extension(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 213 | def test_detects_audio_alias_from_mime_without_extension(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 219 | def test_detects_video_alias_from_mime_without_extension(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 298 | def test_file_type_restriction_applies_to_mime(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 316 | def test_document_analyzer_routes_pdf_to_custom(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 566 | def test_known_document_prebuilts(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 578 | def test_known_audio_prebuilts(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 584 | def test_known_video_prebuilts(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 590 | def test_known_image_prebuilts(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 596 | def test_unknown_prebuilt_falls_back_to_get_analyzer(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 618 | def test_custom_analyzer_no_base_defaults_to_document(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 628 | def test_get_analyzer_failure_raises_value_error(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 702 | def test_wav_returns_markdown(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 712 | def test_jpeg_returns_markdown(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 727 | def test_nonexistent_analyzer_raises_value_error(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 754 | def test_cu_registered_before_docintel(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 796 | def test_use_cu_without_endpoint_exits(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 914 | def test_missing_deps_message(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 94 | def test_accepts_supported_extensions(self, ext):
LOW | packages/markitdown/tests/test_cu_converter.py | 143 | def test_accepts_supported_mimetypes(self, mime):
LOW | packages/markitdown/tests/test_cu_converter.py | 267 | def test_content_type_for_resolves_conflicts_to_file_type(
LOW | packages/markitdown/tests/test_cu_converter.py | 273 | def test_conflicting_extension_and_mimetype_in_convert(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 386 | def test_no_analyzer_id_uses_auto_routing(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 406 | def test_no_analyzer_id_routes_image_to_document_search(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 426 | def test_document_analyzer_routes_image_to_custom(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 449 | def test_image_analyzer_routes_jpeg_to_custom(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 472 | def test_image_analyzer_routes_pdf_to_document_prebuilt(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 504 | def test_mime_only_input_uses_auto_routing(self, mimetype, expected_analyzer):
LOW | packages/markitdown/tests/test_cu_converter.py | 521 | def test_mime_alias_input_uses_canonical_content_type(self):
LOW | packages/markitdown/tests/test_cu_converter.py | 539 | def test_extension_only_input_uses_file_type_content_type(self):
55 more matches not shown…
Unused Imports: 58 hits · 56 pts
Severity | File | Line | Snippet
LOW | …sample-plugin/src/markitdown_sample_plugin/__init__.py | 5 |
LOW | …sample-plugin/src/markitdown_sample_plugin/__init__.py | 5 |
LOW | …sample-plugin/src/markitdown_sample_plugin/__init__.py | 5 |
LOW | …sample-plugin/src/markitdown_sample_plugin/__init__.py | 6 |
LOW | packages/markitdown/tests/test_pdf_memory.py | 26 |
LOW | packages/markitdown/tests/test_cu_converter.py | 13 |
LOW | packages/markitdown/src/markitdown/__init__.py | 5 |
LOW | packages/markitdown/src/markitdown/__init__.py | 6 |
LOW | packages/markitdown/src/markitdown/__init__.py | 6 |
LOW | packages/markitdown/src/markitdown/__init__.py | 6 |
LOW | packages/markitdown/src/markitdown/__init__.py | 11 |
LOW | packages/markitdown/src/markitdown/__init__.py | 11 |
LOW | packages/markitdown/src/markitdown/__init__.py | 12 |
LOW | packages/markitdown/src/markitdown/__init__.py | 13 |
LOW | packages/markitdown/src/markitdown/__init__.py | 13 |
LOW | packages/markitdown/src/markitdown/__init__.py | 13 |
LOW | packages/markitdown/src/markitdown/__init__.py | 13 |
LOW | packages/markitdown/src/markitdown/__init__.py | 13 |
LOW | …arkitdown/src/markitdown/converters/_xlsx_converter.py | 13 |
LOW | …arkitdown/src/markitdown/converters/_xlsx_converter.py | 20 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 5 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 6 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 7 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 8 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 9 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 10 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 11 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 12 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 13 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 14 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 14 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 15 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 16 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 17 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 18 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 19 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 20 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 20 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 24 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 24 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 28 |
LOW | …kages/markitdown/src/markitdown/converters/__init__.py | 29 |
LOW | …arkitdown/src/markitdown/converters/_docx_converter.py | 2 |
LOW | …arkitdown/src/markitdown/converters/_docx_converter.py | 3 |
LOW | …arkitdown/src/markitdown/converters/_pptx_converter.py | 9 |
LOW | …markitdown/src/markitdown/converters/_zip_converter.py | 13 |
LOW | …own/src/markitdown/converters/_plain_text_converter.py | 12 |
LOW | …src/markitdown/converter_utils/docx/math/latex_dict.py | 8 |
LOW | packages/markitdown-mcp/src/markitdown_mcp/__init__.py | 5 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 10 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 10 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 11 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 12 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 12 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 16 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 17 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 18 |
LOW | packages/markitdown-ocr/src/markitdown_ocr/__init__.py | 19 |
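Most of the rows above point at `__init__.py` files, where imports are often kept on purpose so that names can be re-exported. The sketch below shows one way an unused-import check could tell the two cases apart by honoring `__all__`. It is an illustration built on Python's standard `ast` module, not the scanner's actual logic, and the module and class names in the sample source (`_core`, `_helpers`, `Converter`, `Helper`) are hypothetical.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """List imported names that are neither referenced nor listed in __all__."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    exported: set[str] = set()

    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind no single checkable name
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
        elif isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets and isinstance(node.value, (ast.List, ast.Tuple)):
                exported.update(
                    elt.value
                    for elt in node.value.elts
                    if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
                )

    referenced = used | exported
    return sorted(name for name in imported if name not in referenced)


# Hypothetical package __init__ that re-exports one of its two imports.
init_source = '''
from ._core import Converter
from ._helpers import Helper
import os

__all__ = ["Converter"]
'''

print(unused_imports(init_source))  # ['Helper', 'os']
```

Under this rule `Converter` is treated as a deliberate re-export, while `Helper` and `os` would still be reported.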
Excessive Try-Catch Wrapping: 41 hits · 46 pts
Severity | File | Line | Snippet
LOW | packages/markitdown/src/markitdown/_markitdown.py | 79 | except Exception:
LOW | packages/markitdown/src/markitdown/_markitdown.py | 268 | except Exception:
LOW | packages/markitdown/src/markitdown/_markitdown.py | 630 | except Exception:
LOW | …rkitdown/src/markitdown/converters/_image_converter.py | 112 | except Exception as e:
LOW | …s/markitdown/src/markitdown/converters/_llm_caption.py | 24 | except Exception as e:
LOW | …wn/src/markitdown/converters/_outlook_msg_converter.py | 66 | except Exception as e:
LOW | …wn/src/markitdown/converters/_outlook_msg_converter.py | 147 | except Exception:
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 576 | except Exception:
LOW | …/markitdown/src/markitdown/converters/_cu_converter.py | 421 | except Exception as exc:
LOW | …itdown/src/markitdown/converters/_youtube_converter.py | 114 | except Exception as e:
MEDIUM | …itdown/src/markitdown/converters/_youtube_converter.py | 115 | print(f"Error extracting description: {e}")
LOW | …itdown/src/markitdown/converters/_youtube_converter.py | 176 | except Exception as e:
MEDIUM | …itdown/src/markitdown/converters/_youtube_converter.py | 179 | print(f"Error fetching transcript: {e}")
LOW | …itdown/src/markitdown/converters/_youtube_converter.py | 232 | except Exception as e:
LOW | …rkitdown/src/markitdown/converters/_ipynb_converter.py | 93 | except Exception as e:
LOW | …arkitdown/src/markitdown/converters/_pptx_converter.py | 127 | except Exception:
LOW | …arkitdown/src/markitdown/converters/_pptx_converter.py | 134 | except Exception:
LOW | …arkitdown/src/markitdown/converters/_pptx_converter.py | 262 | except Exception:
MEDIUM | …arkitdown/src/markitdown/converters/_pptx_converter.py | 235 | def _convert_chart_to_markdown(self, chart):
LOW | …own/src/markitdown/converter_utils/docx/pre_process.py | 150 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 83 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 120 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 123 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 297 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 302 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 332 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 380 | except Exception as e:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 386 | except Exception:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 413 | except Exception as e:
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 419 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py | 134 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py | 208 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py | 211 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 121 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 132 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 248 | except Exception:
MEDIUM | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 222 | def _convert_chart_to_markdown(self, chart):
LOW | …own-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py | 152 | except Exception:
LOW | …own-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py | 155 | except Exception:
LOW | …ages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py | 78 | except Exception:
LOW | …ages/markitdown-ocr/src/markitdown_ocr/_ocr_service.py | 107 | except Exception as e:
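The rows above flag broad `except Exception` handlers, and two of them also flag a `print` call inside the handler. A common alternative, sketched below under stated assumptions, is to catch only the exception the code expects and report it through `logging`. The function `parse_metadata` is a hypothetical stand-in written for this example; it is not code from markitdown, and whether a narrower handler is appropriate at any flagged site depends on context that this report does not show.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_metadata(raw: str) -> dict:
    """Parse a JSON object string, returning {} when the text is malformed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Only the expected failure is handled here. Anything else (for
        # example a TypeError from passing None) still propagates.
        logger.warning("Could not parse metadata: %s", exc)
        return {}


print(parse_metadata('{"title": "Report"}'))  # {'title': 'Report'}
print(parse_metadata("{not json"))  # {}
```

Logging keeps the message routable by the host application, whereas `print` writes to standard output unconditionally.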
Deep Nesting: 30 hits · 30 pts
Severity | File | Line | Snippet
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 115 |
LOW | packages/markitdown/src/markitdown/_exceptions.py | 58 |
LOW | packages/markitdown/src/markitdown/__main__.py | 14 |
LOW | packages/markitdown/src/markitdown/_markitdown.py | 141 |
LOW | packages/markitdown/src/markitdown/_markitdown.py | 275 |
LOW | packages/markitdown/src/markitdown/_markitdown.py | 489 |
LOW | packages/markitdown/src/markitdown/_markitdown.py | 561 |
LOW | packages/markitdown/src/markitdown/_markitdown.py | 696 |
LOW | …down/src/markitdown/converters/_doc_intel_converter.py | 71 |
LOW | …down/src/markitdown/converters/_doc_intel_converter.py | 104 |
LOW | …wn/src/markitdown/converters/_outlook_msg_converter.py | 127 |
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 120 |
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 398 |
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 520 |
LOW | …/markitdown/src/markitdown/converters/_cu_converter.py | 251 |
LOW | …itdown/src/markitdown/converters/_youtube_converter.py | 70 |
LOW | …itdown/src/markitdown/converters/_youtube_converter.py | 211 |
LOW | …down/src/markitdown/converters/_bing_serp_converter.py | 57 |
LOW | …rkitdown/src/markitdown/converters/_ipynb_converter.py | 57 |
LOW | …arkitdown/src/markitdown/converters/_epub_converter.py | 53 |
LOW | …arkitdown/src/markitdown/converters/_pptx_converter.py | 61 |
LOW | …markitdown/src/markitdown/converters/_zip_converter.py | 87 |
LOW | …own/src/markitdown/converter_utils/docx/pre_process.py | 118 |
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 28 |
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 158 |
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 340 |
LOW | …own-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py | 149 |
LOW | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 54 |
LOW | …own-ocr/src/markitdown_ocr/_pptx_converter_with_ocr.py | 87 |
LOW | …own-ocr/src/markitdown_ocr/_docx_converter_with_ocr.py | 126 |
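The report does not say how nesting depth was measured or what threshold triggers a hit. As one plausible approach, the sketch below walks a Python syntax tree and counts how many block statements enclose the deepest statement. The set of node types counted and the function name `max_nesting_depth` are assumptions made for this example.

```python
import ast

# Statement types treated as adding one nesting level (an assumption).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting_depth(source: str) -> int:
    """Return the deepest count of enclosing block statements in source."""

    def depth(node: ast.AST, current: int) -> int:
        if isinstance(node, NESTING_NODES):
            current += 1
        deepest = current
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, current))
        return deepest

    # Note: the AST stores "elif" as an If nested inside the parent's
    # orelse, so long elif chains are overcounted by this simple walk.
    return depth(ast.parse(source), 0)


nested = '''
def read_first(items):
    for item in items:
        if item:
            with open(item) as handle:
                try:
                    return handle.read()
                except OSError:
                    pass
'''

print(max_nesting_depth(nested))  # 4
```

A checker built this way would compare the result against a chosen limit and report the line of the enclosing function.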
AI Slop Vocabulary: 10 hits · 26 pts
Severity | File | Line | Snippet
MEDIUM | …ckages/markitdown/tests/test_files/test_wikipedia.html | 936 | </p><p>In 1990, Microsoft introduced the <a href="/wiki/Microsoft_Office" title="Microsoft Office">Microsoft Office</a>
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
MEDIUM | packages/markitdown/tests/test_files/test_serp.html | 17 | var logJSText=function(n,t){t===void 0&&(t=null);(new Image).src=_G.lsUrl+'&Type=Event.ClientInst&DATA=[{"T":"CI.ClientI
LOW | packages/markitdown/src/markitdown/_markitdown.py | 790 | # There were no other guesses, so just add the base guess
Redundant / Tautological Comments: 17 hits · 26 pts
Severity | File | Line | Snippet
LOW | …-sample-plugin/src/markitdown_sample_plugin/_plugin.py | 63 | # Read the file stream into an str using hte provided charset encoding, or using the system default
LOW | …-sample-plugin/src/markitdown_sample_plugin/_plugin.py | 67 | # Return the result
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 56 | # Check if line contains ONLY a partial numbering (with possible whitespace/pipes)
LOW | packages/markitdown/tests/test_pdf_masterformat.py | 139 | # Check if next non-empty line exists and wasn't merged
LOW | packages/markitdown/src/markitdown/_markitdown.py | 392 | # Check if we have a seekable stream. If not, load the entire stream into memory.
LOW | packages/markitdown/src/markitdown/_markitdown.py | 614 | # Check if the converter will accept the file, and if so, try to convert it
LOW | …rkitdown/src/markitdown/converters/_audio_converter.py | 100 | # Return the result
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 37 | # Check if this line is ONLY a partial numbering
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 296 | # Check if enough rows are table rows (at least 20%)
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 328 | # Check if this row starts a table region
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 383 | # Check if we're inside a table region (not at start)
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 448 | # Assign words to columns
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 477 | # Check if cells contain short, structured data (not long text)
LOW | …markitdown/src/markitdown/converters/_pdf_converter.py | 539 | # Read file stream into BytesIO for compatibility with pdfplumber
LOW | …markitdown/src/markitdown/converters/_csv_converter.py | 44 | # Read the file content
LOW | …down-ocr/src/markitdown_ocr/_pdf_converter_with_ocr.py | 94 | # Check if dimensions are valid
LOW | …own-ocr/src/markitdown_ocr/_xlsx_converter_with_ocr.py | 165 | # Check if sheet has images
Self-Referential Comments: 5 hits · 15 pts
Severity | File | Line | Snippet
MEDIUM | packages/markitdown/tests/test_module_misc.py | 18 | # This file contains module tests that are not directly tested by the FileTestVectors.
MEDIUM | packages/markitdown/tests/test_cli_misc.py | 5 | # This file contains CLI tests that are not directly tested by the FileTestVectors.
MEDIUM | packages/markitdown/src/markitdown/_markitdown.py | 530 | # Create an initial guess from all this information
MEDIUM | packages/markitdown/src/markitdown/_markitdown.py | 569 | # Create a copy of the page_converters list, sorted by priority.
MEDIUM | …own/src/markitdown/converter_utils/docx/pre_process.py | 85 | # Create a new paragraph tag
Synthetic Comment Markers: 2 hits · 15 pts
Severity | File | Line | Snippet
HIGH | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
HIGH | packages/markitdown/tests/test_files/test_rss.xml | 1 | <rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://purl.org/rss/
Docstring Block Structure: 3 hits · 15 pts
Severity | File | Line | Snippet
HIGH | packages/markitdown/src/markitdown/_base_converter.py | 51 | Return a quick determination on if the converter should attempt converting the document. This is primar
HIGH | packages/markitdown/src/markitdown/_base_converter.py | 90 | Convert a document to Markdown text. Parameters: - file_stream: The file-like object to conver
HIGH | …/markitdown/src/markitdown/converters/_cu_converter.py | 397 | Resolve analyzer modality from cache or via get_analyzer() fallback. For known prebuilt-* names, returns the modali
Hallucination Indicators: 1 hit · 15 pts
Severity | File | Line | Snippet
CRITICAL | …arkitdown/src/markitdown/converters/_pptx_converter.py | 133 | alt_text = shape._element._nvXxPr.cNvPr.attrib.get("descr", "")
Cross-File Repetition: 3 hits · 15 pts
Severity | File | Line | Snippet
HIGH | packages/markitdown/tests/test_module_vectors.py | 0 | test the conversion of a stream with no stream info.
HIGH | packages/markitdown/tests/test_module_vectors.py | 0 | test the conversion of a stream with no stream info.
HIGH | packages/markitdown/tests/test_cli_vectors.py | 0 | test the conversion of a stream with no stream info.
Over-Commented Block: 7 hits · 7 pts
Severity | File | Line | Snippet
LOW | .devcontainer/devcontainer.json | 21 | // Use 'forwardPorts' to make a list of ports inside the container available locally.
LOW | packages/markitdown-sample-plugin/tests/__init__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
LOW | …ample-plugin/src/markitdown_sample_plugin/__about__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
LOW | packages/markitdown/tests/__init__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
LOW | packages/markitdown/src/markitdown/__about__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
LOW | packages/markitdown-mcp/tests/__init__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>
LOW | packages/markitdown-mcp/src/markitdown_mcp/__about__.py | 1 | # SPDX-FileCopyrightText: 2024-present Adam Fourney <adamfo@microsoft.com>