google/langextract

Adjusted Score: 26.4
Raw Score: 26.4
Time Factor: 100%
Last Push: 2026-07-02
Stars: 37.1K
Language: Python
Lines of Code: 42.1K
Files: 131
Pattern Hits: 891
Scan Date: 2026-07-14
HC Hit Rate: 0.41

What These Metrics Mean

Adjusted Score: The primary synthetic-code indicator: the raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. It is the main comparative metric; use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. Full signal is only assigned to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
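
The relationship between the headline numbers can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not the scanner's implementation: the function names are hypothetical, the inputs are the values reported on this page, the exact cut-off days are assumptions (the page only says "Nov 2022" and "Jan 2024+"), and the factor between those two dates is left undefined because the page does not describe it.

```python
from datetime import date
from typing import Optional

# Assumed cut-off days; the page gives only the months.
CHATGPT_LAUNCH = date(2022, 11, 30)
POST_ADOPTION_START = date(2024, 1, 1)


def time_factor(last_push: date) -> Optional[float]:
    """Return the temporal discount as a fraction, or None where unspecified."""
    if last_push < CHATGPT_LAUNCH:
        return 0.05
    if last_push >= POST_ADOPTION_START:
        return 1.0
    return None  # The page does not say how the factor ramps up in between.


def adjusted_score(raw_score: float, factor: float) -> float:
    """Adjusted score = displayed raw score times the time factor."""
    return raw_score * factor


def hc_hit_rate(critical_hits: int, high_hits: int, files: int) -> float:
    """High + Critical hits per analysed file."""
    return (critical_hits + high_hits) / files


factor = time_factor(date(2026, 7, 2))   # Last Push reported on this page
score = adjusted_score(26.4, factor)     # Raw Score reported on this page
rate = hc_hit_rate(critical_hits=1, high_hits=53, files=131)

print(f"time factor: {factor:.0%}")      # 100%
print(f"adjusted score: {score:.1f}")    # 26.4
print(f"HC hit rate: {rate:.2f}")        # 0.41
```

With a 100% time factor the adjusted and raw scores coincide, which is why both cards show 26.4 for this repository.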

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 1 · HIGH 53 · MEDIUM 21 · LOW 816
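
As a rough illustration of how the coefficients weight these counts, the following sketch computes a base score before any context or density multipliers are applied. The variable names are hypothetical; only the coefficients and the hit counts come from this page.

```python
# Severity coefficients as stated in the paragraph above.
COEFFICIENTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}

# Hit counts reported for this repository.
hit_counts = {"CRITICAL": 1, "HIGH": 53, "MEDIUM": 21, "LOW": 816}

total_hits = sum(hit_counts.values())
base_points = sum(COEFFICIENTS[sev] * n for sev, n in hit_counts.items())
low_share = hit_counts["LOW"] / total_hits

print(total_hits)           # 891, matching the Pattern Hits figure
print(base_points)          # 1133, before context and density multipliers
print(f"{low_share:.1%}")   # 91.6% of hits are LOW severity
```

The points listed per category under Pattern Findings need not add up to this base figure, since the scanner applies its multipliers match by match.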

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.

Pattern Findings

The scanner identified 891 distinct pattern matches across 18 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.
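
The category totals can be cross-checked against the summary figures. The sketch below simply tallies the hits and points shown in the category headings on this page; the variable names are hypothetical, and because the displayed points are presumably rounded, the totals should be read as approximate.

```python
# (category, hits, points) as listed in the category headings on this page.
CATEGORIES = [
    ("Hyper-Verbose Identifiers", 532, 506),
    ("Docstring Block Structure", 36, 180),
    ("Over-Commented Block", 108, 101),
    ("Unused Imports", 49, 48),
    ("Excessive Try-Catch Wrapping", 44, 45),
    ("Magic Placeholder Names", 9, 42),
    ("Deep Nesting", 38, 37),
    ("Cross-File Repetition", 6, 30),
    ("Self-Referential Comments", 9, 25),
    ("Modern Structural Boilerplate", 22, 22),
    ("Decorative Section Separators", 5, 21),
    ("Fake / Example Data", 13, 12),
    ("AI Structural Patterns", 10, 10),
    ("Hallucination Indicators", 1, 10),
    ("Cross-Language Confusion", 2, 8),
    ("Modern AI Meta-Vocabulary", 3, 6),
    ("AI Slop Vocabulary", 2, 4),
    ("Redundant / Tautological Comments", 2, 3),
]
KLOC = 42.1  # Lines of Code reported on this page, in thousands

total_hits = sum(hits for _, hits, _ in CATEGORIES)
total_points = sum(points for _, _, points in CATEGORIES)
points_per_kloc = total_points / KLOC

print(len(CATEGORIES), total_hits, total_points)  # 18 891 1110
print(round(points_per_kloc, 1))                  # 26.4
```

The per-KLOC result agrees with the 26.4 shown on the score cards, which suggests (though the page does not state it) that the displayed Raw Score is already normalised by codebase size.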

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
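
The two multipliers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scanner's algorithm: the function names are hypothetical, the window is interpreted as "fewer than 10 lines between the first and last match", and only the matches of a single category are considered, whereas the scanner may well count matches from any category when forming clusters. The Decorative Section Separators findings listed further down serve as test data.

```python
from typing import List

COEFFICIENTS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
COMMENT_WEIGHT = 1.5   # comment-context multiplier stated above
CLUSTER_WEIGHT = 1.5   # density multiplier for clustered matches
WINDOW = 10            # lines (assumed interpretation, see lead-in)
MIN_CLUSTER = 3        # matches needed to form a cluster


def clustered_flags(lines: List[int]) -> List[bool]:
    """Flag matches that share a WINDOW-line span with >= MIN_CLUSTER matches.

    Assumes `lines` is sorted and all matches belong to the same file.
    """
    flags = [False] * len(lines)
    for i in range(len(lines)):
        j = i
        # Extend j while the next match still fits in the window starting at i.
        while j + 1 < len(lines) and lines[j + 1] - lines[i] < WINDOW:
            j += 1
        if j - i + 1 >= MIN_CLUSTER:
            for k in range(i, j + 1):
                flags[k] = True
    return flags


def match_points(severity: str, context: str, clustered: bool) -> float:
    """Coefficient times the comment and cluster multipliers, where applicable."""
    points = float(COEFFICIENTS[severity])
    if context == "COMMENT":
        points *= COMMENT_WEIGHT
    if clustered:
        points *= CLUSTER_WEIGHT
    return points


# Decorative Section Separators: all MEDIUM severity, all in COMMENT context.
annotation_lines = [503]                 # tests/annotation_test.py
resolver_lines = [667, 670, 673, 676]    # tests/resolver_test.py

total = 0.0
for file_lines in (annotation_lines, resolver_lines):
    for is_clustered in clustered_flags(file_lines):
        total += match_points("MEDIUM", "COMMENT", is_clustered)

print(total)  # 21.0
```

The result equals the 21 pts shown for that category: one isolated comment match at 2 × 1.5 = 3, plus four clustered comment matches at 2 × 1.5 × 1.5 = 4.5 each. Other categories do not reproduce as cleanly, so additional multipliers not described on this page may be in play.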

Hyper-Verbose Identifiers · 532 hits · 506 pts

Severity	File	Line	Snippet	Context
LOW	tests/chunking_test.py	69	def test_multi_sentence_chunk(self):	CODE
LOW	tests/chunking_test.py	96	def test_sentence_with_multiple_newlines_and_right_interval(self):	CODE
LOW	tests/chunking_test.py	164	def test_long_token_gets_own_chunk(self):	CODE
LOW	tests/chunking_test.py	207	def test_newline_at_chunk_boundary_does_not_create_empty_interval(self):	CODE
LOW	tests/chunking_test.py	261	def test_newlines_is_secondary_sentence_break(self):	CODE
LOW	tests/chunking_test.py	312	def test_tokenizer_propagation(self):	CODE
LOW	tests/chunking_test.py	425	def test_make_batches_of_textchunk(	CODE
LOW	tests/chunking_test.py	473	def test_text_chunk_additional_context(self):	CODE
LOW	tests/chunking_test.py	486	def test_chunk_iterator_without_additional_context(self):	CODE
LOW	tests/chunking_test.py	497	def test_multiple_chunks_with_additional_context(self):	CODE
LOW	tests/chunking_test.py	549	def test_text_chunk_properties(	CODE
LOW⚡	tests/progress_test.py	27	def test_download_progress_bar(self):	CODE
LOW⚡	tests/progress_test.py	37	def test_extraction_progress_bar(self):	CODE
LOW⚡	tests/progress_test.py	47	def test_save_load_progress_bars(self):	CODE
LOW⚡	tests/progress_test.py	57	def test_model_info_extraction(self):	CODE
LOW	tests/progress_test.py	68	def test_formatting_functions(self):	CODE
LOW	tests/format_handler_test.py	147	def test_end_to_end_integration_with_prompt_and_resolver(self):	CODE
LOW	tests/format_handler_test.py	232	def test_format_parse_roundtrip(	STRING
LOW	tests/format_handler_test.py	260	def test_think_tags_stripped_before_parsing(self):	CODE
LOW	tests/format_handler_test.py	276	def test_top_level_list_accepted_as_fallback(self):	CODE
LOW	tests/annotation_test.py	47	def assert_char_interval_match_source(	CODE
LOW	tests/annotation_test.py	80	def test_annotate_text_single_chunk(self):	CODE
LOW	tests/annotation_test.py	206	def test_annotate_text_without_index_suffix(self):	CODE
LOW	tests/annotation_test.py	325	def test_annotate_text_with_attributes_suffix(self):	CODE
LOW	tests/annotation_test.py	469	def test_annotate_text_multiple_chunks(self):	CODE
LOW	tests/annotation_test.py	569	def test_annotate_text_no_extractions(self):	CODE
LOW	tests/annotation_test.py	766	def test_annotate_documents_exceptions(	CODE
LOW	tests/annotation_test.py	817	def test_multipass_extraction_non_overlapping(self):	CODE
LOW	tests/annotation_test.py	867	def test_multipass_extraction_overlapping(self):	CODE
LOW	tests/annotation_test.py	918	def test_multipass_extraction_single_pass(self):	CODE
LOW	tests/annotation_test.py	948	def test_multipass_extraction_empty_passes(self):	CODE
LOW	tests/annotation_test.py	1052	def test_merge_non_overlapping_extractions(	CODE
LOW	tests/annotation_test.py	1160	def test_yields_documents_not_generators(self):	CODE
LOW	tests/annotation_test.py	1220	def test_context_window_includes_previous_chunk_text(self):	CODE
LOW	tests/annotation_test.py	1271	def test_no_context_included_when_disabled(self):	CODE
LOW	tests/annotation_test.py	1309	def test_context_window_per_document_isolation(self):	CODE
LOW⚡	tests/schema_test.py	48	def _openai_attribute_properties(openai_schema, extraction_class):	CODE
LOW⚡	tests/schema_test.py	57	def test_abstract_methods_required(self):	CODE
LOW⚡	tests/schema_test.py	62	def test_subclass_must_implement_all_methods(self):	CODE
LOW⚡	tests/schema_test.py	971	def test_base_schema_rejects_user_schemas_by_default(self):	CODE
LOW⚡	tests/schema_test.py	975	def test_gemini_from_schema_dict_targets_json_schema_field(self):	CODE
LOW⚡	tests/schema_test.py	985	def test_gemini_from_schema_dict_validates_envelope(self):	CODE
LOW⚡	tests/schema_test.py	995	def test_openai_from_schema_dict_builds_response_format(self):	CODE
LOW	tests/schema_test.py	78	def test_get_schema_class_returns_none_by_default(self):	CODE
LOW	tests/schema_test.py	88	def test_apply_schema_stores_instance(self):	CODE
LOW	tests/schema_test.py	257	def test_from_examples_constructs_expected_schema(	CODE
LOW	tests/schema_test.py	264	def test_to_provider_config_returns_response_schema(self):	CODE
LOW	tests/schema_test.py	286	def test_requires_raw_output_returns_true(self):	CODE
LOW	tests/schema_test.py	307	def test_response_format_returns_json_schema_response_format(self):	CODE
LOW	tests/schema_test.py	340	def test_to_provider_config_uses_provider_schema_hook(self):	CODE
LOW	tests/schema_test.py	348	def test_from_examples_constructs_strict_openai_schema(self):	CODE
LOW	tests/schema_test.py	446	def test_from_examples_preserves_list_attribute_schema(self):	CODE
LOW	tests/schema_test.py	473	def test_from_examples_empty_examples_allow_empty_extraction_objects(self):	CODE
LOW	tests/schema_test.py	487	def test_validate_format_rejects_yaml(self):	CODE
LOW	tests/schema_test.py	498	def test_requires_raw_output_returns_true(self):	CODE
LOW	tests/schema_test.py	504	def test_validate_format_warns_when_fences_enabled(self):	CODE
LOW	tests/schema_test.py	517	def test_validate_format_warns_with_wrong_wrapper_key(self):	CODE
LOW	tests/schema_test.py	532	def test_from_examples_preserves_scalar_attribute_types(self):	CODE
LOW	tests/schema_test.py	569	def test_from_examples_preserves_mixed_numeric_attribute_types(self):	CODE
LOW	tests/schema_test.py	620	def test_from_examples_allows_none_attribute_values(self):	CODE
472 more matches not shown…

Docstring Block Structure · 36 hits · 180 pts

Severity	File	Line	Snippet	Context
HIGH	langextract/plugins.py	125	Load a provider class from module:Class specification. Args: spec: Import specification in format "module.path:Cl	STRING
HIGH	langextract/plugins.py	186	Get a provider class by name. Args: name: Provider name (e.g., "gemini", "openai", "ollama"). allow_override:	STRING
HIGH	langextract/annotation.py	124	Iterates over documents to yield text chunks along with the document ID. Args: documents: A sequence of Document	STRING
HIGH	langextract/annotation.py	222	Annotates a sequence of documents with NLP extractions. Breaks documents into chunks, processes them into prompts	STRING
HIGH	langextract/chunking.py	146	Creates a token interval. Args: start_index: first token's index (inclusive). end_index: last token's index +	STRING
HIGH	langextract/chunking.py	173	Get the text within an interval of tokens. Args: tokenized_text: Tokenized documents. token_interval: An inte	STRING
HIGH	langextract/chunking.py	220	Returns the char interval corresponding to the token interval. Args: tokenized_text: Document. token_interval	STRING
HIGH	langextract/chunking.py	247	Converts all whitespace characters in input text to a single space. Args: text: Input to sanitize. Returns:	STRING
HIGH	langextract/io.py	51	Loads the dataset from a CSV file. Args: delimiter: The delimiter to use when reading the CSV file. Yiel	STRING
HIGH	langextract/io.py	148	Loads annotated documents from a JSON Lines file. Args: jsonl_path: The file path to the JSON Lines file. sho	STRING
HIGH	langextract/io.py	198	Reads a CSV file and yields rows as dicts. Args: filepath: The path to the file. column_names: The names of t	STRING
HIGH	langextract/io.py	271	Download text content from a URL with optional progress bar. Args: url: The URL to download from. timeout: Re	STRING
HIGH	langextract/prompt_validation.py	136	Align extractions to their own example text and collect issues. Args: examples: The few-shot examples to validate	STRING
HIGH	langextract/prompting.py	56	Reads a structured prompt template from a file. Args: prompt_path: Path to a file containing PromptTemplateStruct	STRING
HIGH	langextract/factory.py	114	Create a language model instance from configuration. Args: config: Model configuration with optional model_id and	STRING
HIGH	langextract/factory.py	236	Internal helper to create a model with optional schema constraints. This function creates a language model and option	STRING
HIGH	langextract/extraction.py	76	Extracts structured information from text. Retrieves structured information from the provided text or documents using	STRING
HIGH	langextract/resolver.py	282	Runs resolve function on text with YAML/JSON extraction data. Args: input_text: The input text to be proces	STRING
HIGH	langextract/resolver.py	406	Parses a YAML or JSON-formatted string into extraction data. This method is kept for backward compatibility with te	STRING
HIGH	langextract/resolver.py	441	Extracts and orders extraction data based on their associated indexes. This function processes a list of dictionari	STRING
HIGH	langextract/core/output_schema.py	111	Validates the LangExtract output envelope and returns an isolated copy. LangExtract's resolver parses a top-level JSO	STRING
HIGH	langextract/core/output_schema.py	255	Builds a schema for one LangExtract extraction object. Pair this with `extractions_schema()` to produce the full outp	STRING
HIGH	langextract/core/base_model.py	206	Parses model output as JSON or YAML. Note: This expects raw JSON/YAML without code fences. Code fence extractio	STRING
HIGH	langextract/core/tokenizer.py	474	Reconstructs the substring of the original text spanning a given token interval. Args: tokenized_text: A Tokenize	STRING
HIGH	langextract/core/tokenizer.py	586	Finds a 'sentence' interval from a given start index. Sentence boundaries are defined by: - punctuation tokens in	STRING
HIGH	langextract/core/format_handler.py	154	Parse model output to extract data. Args: text: Raw model output. strict: If True, enforce strict schem	STRING
HIGH	langextract/core/format_handler.py	279	Extract content from text, handling fences if configured. Args: text: Input text that may contain fenced bloc	STRING
HIGH	langextract/providers/openai_batch.py	357	Execute batch inference on multiple prompts using OpenAI Batch API. Args: client: OpenAI client instance (or comp	STRING
HIGH	langextract/providers/router.py	140	Resolve a model ID to a provider class. Args: model_id: The model identifier to resolve. Returns: The prov	STRING
HIGH	langextract/providers/router.py	171	Resolve a provider name to a provider class. This allows explicit provider selection by name or class name. Args:	STRING
HIGH	langextract/providers/gemini_batch.py	327	Submit a file-based batch job to Vertex AI using GCS storage. Batch processing is only supported with Vertex AI becau	STRING
HIGH	langextract/providers/gemini_batch.py	562	Poll batch job until completion or timeout. Args: client: google.genai.Client instance for polling job status.	STRING
HIGH	langextract/providers/gemini_batch.py	639	Extract text outputs from file-based batch results, preserving order. Reads results from GCS output directory. Arg	STRING
HIGH	langextract/providers/gemini_batch.py	719	Execute batch inference on multiple prompts using the Vertex AI Batch API. This function provides file-based batch pr	STRING
HIGH	langextract/providers/ollama.py	561	Sends a prompt to an Ollama model and returns the generated response. Note: This is a low-level method. Constructor	STRING
HIGH	scripts/create_provider_plugin.py	240	\ """Schema implementation for {provider_name} provider.""" import langextract as lx from lange	STRING

Over-Commented Block · 108 hits · 101 pts

Severity	File	Line	Snippet	Context
LOW	autoformat.sh	1	#!/bin/bash	COMMENT
LOW	.pre-commit-config.yaml	1	# Copyright 2025 Google LLC.	COMMENT
LOW	pyproject.toml	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tox.ini	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/chunking_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/progress_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/format_handler_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/annotation_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/schema_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/prompting_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/inference_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/provider_schema_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/extract_precedence_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/resolver_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/resolver_test.py	661	extraction_text="prednisone",	COMMENT
LOW	tests/fuzzy_alignment_cases_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/test_kwargs_passthrough.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/gemini_retry_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/data_lib_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/init_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/openai_batch_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/registry_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/tokenizer_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/extract_schema_integration_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/prompt_validation_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/factory_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/visualization_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/io_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/provider_plugin_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/factory_schema_test.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/test_live_api.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/test_ollama_integration.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	tests/test_gemini_batch_api.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/plugins.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/annotation.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/chunking.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/registry.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/data_lib.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/io.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/prompt_validation.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/prompting.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/__init__.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/visualization.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/factory.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/tokenizer.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/extraction.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/resolver.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/inference.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/exceptions.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/progress.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/data.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/schema.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/output_schema.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/__init__.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/types.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/base_model.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/tokenizer.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/format_handler.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/exceptions.py	1	# Copyright 2025 Google LLC.	COMMENT
LOW	langextract/core/data.py	1	# Copyright 2025 Google LLC.	COMMENT
48 more matches not shown…

Unused Imports · 49 hits · 48 pts

Severity	File	Line	Context
LOW	tests/openai_batch_test.py	19	CODE
LOW	langextract/plugins.py	20	CODE
LOW	langextract/registry.py	21	CODE
LOW	langextract/data_lib.py	16	CODE
LOW	langextract/io.py	16	CODE
LOW	langextract/prompt_validation.py	17	CODE
LOW	langextract/prompting.py	16	CODE
LOW	langextract/__init__.py	21	CODE
LOW	langextract/visualization.py	24	CODE
LOW	langextract/factory.py	22	CODE
LOW	langextract/tokenizer.py	21	CODE
LOW	langextract/tokenizer.py	25	CODE
LOW	langextract/extraction.py	17	CODE
LOW	langextract/resolver.py	21	CODE
LOW	langextract/inference.py	21	CODE
LOW	langextract/exceptions.py	22	CODE
LOW	langextract/progress.py	16	CODE
LOW	langextract/data.py	21	CODE
LOW	langextract/data.py	25	CODE
LOW	langextract/schema.py	22	CODE
LOW	langextract/core/output_schema.py	16	CODE
LOW	langextract/core/__init__.py	22	CODE
LOW	langextract/core/types.py	16	CODE
LOW	langextract/core/base_model.py	16	CODE
LOW	langextract/core/format_handler.py	17	CODE
LOW	langextract/core/exceptions.py	21	CODE
LOW	langextract/core/data.py	16	CODE
LOW	langextract/core/schema.py	16	CODE
LOW	langextract/core/debug_utils.py	16	CODE
LOW	langextract/providers/openai_batch.py	22	CODE
LOW	langextract/providers/gemini.py	19	CODE
LOW	langextract/providers/openai.py	18	CODE
LOW	langextract/providers/router.py	22	CODE
LOW	langextract/providers/gemini_batch.py	25	CODE
LOW	langextract/providers/ollama.py	84	CODE
LOW	langextract/providers/schemas/__init__.py	16	CODE
LOW	langextract/providers/schemas/gemini.py	18	CODE
LOW	langextract/providers/schemas/openai.py	18	CODE
LOW	langextract/_compat/registry.py	18	CODE
LOW	langextract/_compat/__init__.py	21	CODE
LOW	langextract/_compat/inference.py	17	CODE
LOW	langextract/_compat/exceptions.py	18	CODE
LOW	langextract/_compat/schema.py	18	CODE
LOW	…amples/custom_provider_plugin/test_example_provider.py	24	CODE
LOW	…ovider_plugin/langextract_provider_example/provider.py	17	CODE
LOW	…ovider_plugin/langextract_provider_example/__init__.py	17	CODE
LOW	…provider_plugin/langextract_provider_example/schema.py	17	CODE
LOW	benchmarks/fuzzy_benchmark.py	29	CODE
LOW	scripts/validate_community_providers.py	18	CODE

Excessive Try-Catch Wrapping · 44 hits · 45 pts

Severity	File	Line	Snippet	Context
LOW	tests/prompt_validation_test.py	457	except Exception: # pylint: disable=broad-except	CODE
LOW	tests/prompt_validation_test.py	505	except Exception: # pylint: disable=broad-except	CODE
LOW	tests/prompt_validation_test.py	526	except Exception: # pylint: disable=broad-except	CODE
LOW	langextract/prompting.py	78	except Exception as e:	CODE
LOW	langextract/visualization.py	59	except Exception:	CODE
LOW	langextract/resolver.py	433	except Exception as e:	CODE
LOW	langextract/core/base_model.py	228	except Exception as e:	CODE
LOW	langextract/core/debug_utils.py	89	except Exception:	CODE
LOW	langextract/core/debug_utils.py	129	except Exception:	CODE
LOW	langextract/core/debug_utils.py	184	except Exception:	CODE
LOW	langextract/providers/openai_batch.py	253	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	277	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	313	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	444	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	470	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	496	except Exception as e:	CODE
LOW	langextract/providers/openai_batch.py	508	except Exception as e:	CODE
LOW	langextract/providers/__init__.py	131	except Exception as e:	CODE
LOW	langextract/providers/__init__.py	136	except Exception as e:	CODE
LOW	langextract/providers/gemini.py	373	except Exception as e:	CODE
LOW	langextract/providers/gemini.py	457	except Exception as e:	CODE
LOW	langextract/providers/gemini.py	494	except Exception as e:	CODE
LOW	langextract/providers/openai.py	258	except Exception as e:	CODE
LOW	langextract/providers/openai.py	337	except Exception as e:	CODE
LOW	langextract/providers/openai.py	375	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	172	except Exception:	CODE
LOW	langextract/providers/gemini_batch.py	253	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	426	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	467	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	478	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	596	except Exception as e:	CODE
LOW	langextract/providers/gemini_batch.py	848	except Exception as e:	CODE
LOW	langextract/providers/ollama.py	313	except Exception as e:	CODE
LOW	…ovider_plugin/langextract_provider_example/provider.py	181	except Exception as e:	CODE
LOW	examples/ollama/demo_ollama.py	452	except Exception as e:	CODE
LOW	examples/ollama/demo_ollama.py	533	except Exception as e:	CODE
MEDIUM	benchmarks/plotting.py	339	print(f"Error loading {json_file}: {e}")	CODE
LOW	scripts/create_provider_plugin.py	375	except Exception as e:	CODE
LOW	scripts/create_provider_plugin.py	393	except Exception:	CODE
LOW	scripts/create_provider_plugin.py	395	except Exception as e:	CODE
LOW	scripts/create_provider_plugin.py	429	except Exception as e:	STRING
LOW	scripts/create_provider_plugin.py	444	except Exception as e:	STRING
LOW	.github/scripts/zenodo_publish.py	211	except Exception as e:	CODE
MEDIUM	.github/scripts/zenodo_publish.py	180	def main() -> int:	CODE

Magic Placeholder Names · 9 hits · 42 pts

Severity	File	Line	Snippet	Context
HIGH	README.md	213	docker run --rm -e LANGEXTRACT_API_KEY="your-api-key" langextract python your_script.py	CODE
HIGH	README.md	236	export LANGEXTRACT_API_KEY="your-api-key-here"	CODE
HIGH	README.md	246	LANGEXTRACT_API_KEY=your-api-key-here	CODE
HIGH	README.md	275	api_key="your-api-key-here" # Only use this for testing/development	CODE
HIGH	docs/examples/medication_examples.md	51	api_key="your-api-key-here" # Optional if LANGEXTRACT_API_KEY environment variable is set	CODE
HIGH	docs/examples/medication_examples.md	171	api_key="your-api-key-here" # Optional if LANGEXTRACT_API_KEY environment variable is set	STRING
HIGH	docs/examples/japanese_extraction.md	43	api_key="your-api-key-here" # Optional if env var is set	CODE
HIGH	examples/custom_provider_plugin/README.md	125	provider_kwargs={"api_key": "your-api-key"},	CODE
HIGH	examples/custom_provider_plugin/README.md	142	provider_kwargs={"api_key": "your-api-key"},	CODE

Deep Nesting · 38 hits · 37 pts

Severity	File	Line	Context
LOW	tests/extract_schema_integration_test.py	116	CODE
LOW	tests/extract_schema_integration_test.py	150	CODE
LOW	langextract/annotation.py	46	CODE
LOW	langextract/annotation.py	285	CODE
LOW	langextract/data_lib.py	27	CODE
LOW	langextract/io.py	85	CODE
LOW	langextract/io.py	265	CODE
LOW	langextract/prompt_validation.py	130	CODE
LOW	langextract/prompting.py	52	CODE
LOW	langextract/factory.py	56	CODE
LOW	langextract/extraction.py	45	CODE
LOW	langextract/resolver.py	1287	CODE
LOW	langextract/resolver.py	437	CODE
LOW	langextract/resolver.py	591	CODE
LOW	langextract/core/tokenizer.py	580	CODE
LOW	langextract/core/tokenizer.py	336	CODE
LOW	langextract/core/format_handler.py	151	CODE
LOW	langextract/providers/openai_batch.py	411	CODE
LOW	langextract/providers/__init__.py	71	CODE
LOW	langextract/providers/__init__.py	149	CODE
LOW	langextract/providers/gemini.py	351	CODE
LOW	langextract/providers/gemini.py	393	CODE
LOW	langextract/providers/openai.py	284	CODE
LOW	langextract/providers/router.py	170	CODE
LOW	langextract/providers/gemini_batch.py	259	CODE
LOW	langextract/providers/gemini_batch.py	633	CODE
LOW	langextract/providers/gemini_batch.py	452	CODE
LOW	langextract/providers/schemas/gemini.py	121	CODE
LOW	langextract/providers/schemas/openai.py	85	CODE
LOW	examples/ollama/demo_ollama.py	418	CODE
LOW	benchmarks/benchmark.py	140	CODE
LOW	benchmarks/benchmark.py	276	CODE
LOW	benchmarks/benchmark.py	311	CODE
LOW	benchmarks/fuzzy_benchmark.py	342	CODE
LOW	benchmarks/plotting.py	170	CODE
LOW	benchmarks/plotting.py	220	CODE
LOW	benchmarks/plotting.py	376	CODE
LOW	benchmarks/plotting.py	492	CODE

Cross-File Repetition · 6 hits · 30 pts

Severity	File	Snippet	Context
HIGH	tests/test_live_api.py	the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but	STRING
HIGH	docs/examples/medication_examples.md	the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but	STRING
HIGH	examples/ollama/demo_ollama.py	the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but	STRING
HIGH	tests/test_live_api.py	extract medications with their details, using attributes to group related information: 1. extract entities in the order	STRING
HIGH	docs/examples/medication_examples.md	extract medications with their details, using attributes to group related information: 1. extract entities in the order	STRING
HIGH	examples/ollama/demo_ollama.py	extract medications with their details, using attributes to group related information: 1. extract entities in the order	STRING

Self-Referential Comments · 9 hits · 25 pts

Severity	File	Line	Snippet	Context
MEDIUM	tests/annotation_test.py	701	# Define a side effect function so return length based on batch length.	COMMENT
MEDIUM	tests/resolver_test.py	2010	# Define a chunk that includes the entire text.	COMMENT
MEDIUM	tests/resolver_test.py	2052	# Define a chunk that includes the entire text.	COMMENT
MEDIUM	tests/resolver_test.py	2098	# Define a chunk that includes too many tokens.	COMMENT
MEDIUM	tests/resolver_test.py	2139	# Define a correct chunk.	COMMENT
MEDIUM	tests/resolver_test.py	2166	# Define a chunk that includes the entire text.	COMMENT
MEDIUM	tests/extract_schema_integration_test.py	191	# Create a mock instance with required attributes	COMMENT
MEDIUM	tests/extract_schema_integration_test.py	242	# Create a mock Gemini schema with validate_format that issues warnings	COMMENT
MEDIUM	langextract/providers/ollama.py	24	# Create an example for few-shot learning	STRING

Modern Structural Boilerplate · 22 hits · 22 pts

Severity	File	Line	Snippet	Context
LOW	tests/openai_batch_test.py	49	def set_content(self, file_id: str, text: str) -> None:	CODE
LOW	langextract/plugins.py	30	__all__ = ["available_providers", "get_provider_class"]	CODE
LOW	langextract/prompt_validation.py	30	__all__ = [	CODE
LOW	langextract/prompting.py	268	def _update_state(self, document_id: str, chunk_text: str) -> None:	CODE
LOW	langextract/__init__.py	30	__all__ = [	CODE
LOW	langextract/exceptions.py	35	__all__ = [	CODE
LOW	langextract/schema.py	47	__all__ = [	CODE
LOW	langextract/core/output_schema.py	26	__all__ = [	CODE
LOW	langextract/core/__init__.py	24	__all__ = [	CODE
LOW	langextract/core/types.py	24	__all__ = [	CODE
LOW	langextract/core/base_model.py	29	__all__ = ['BaseLanguageModel']	CODE
LOW	langextract/core/base_model.py	125	def set_fence_output(self, fence_output: bool | None) -> None:	CODE
LOW	langextract/core/tokenizer.py	35	__all__ = [	CODE
LOW	langextract/core/exceptions.py	23	__all__ = [	CODE
LOW	langextract/core/data.py	30	__all__ = [	CODE
LOW	langextract/core/schema.py	26	__all__ = [	CODE
LOW	langextract/providers/__init__.py	33	__all__ = [	CODE
LOW	langextract/providers/gemini_batch.py	452	def set_multi(self, items: Sequence[tuple[dict, str]]) -> None:	CODE
LOW	langextract/providers/schemas/__init__.py	24	__all__ = ["GeminiSchema", "OpenAISchema"]	CODE
LOW	langextract/_compat/__init__.py	23	__all__ = ["inference", "schema", "exceptions", "registry"]	CODE
LOW	…ovider_plugin/langextract_provider_example/__init__.py	19	__all__ = ["CustomGeminiProvider"]	CODE
LOW	.github/scripts/zenodo_publish.py	138	def update_metadata(draft_id: str) -> None:	CODE

Decorative Section Separators · 5 hits · 21 pts

Severity	File	Line	Snippet	Context
MEDIUM	tests/annotation_test.py	503	# -------------------------------------------------------------------------	COMMENT
MEDIUM⚡	tests/resolver_test.py	667	# --------------------------------------------------------------------	COMMENT
MEDIUM⚡	tests/resolver_test.py	670	# --------------------------------------------------------------------	COMMENT
MEDIUM⚡	tests/resolver_test.py	673	# --------------------------------------------------------------------	COMMENT
MEDIUM⚡	tests/resolver_test.py	676	# --------------------------------------------------------------------	COMMENT

Fake / Example Data · 13 hits · 12 pts

Severity	File	Line	Snippet	Context
LOW	tests/annotation_test.py	91	- patient: "Jane Doe"	CODE
LOW	tests/annotation_test.py	118	extraction_text="Jane Doe",	CODE
LOW	tests/annotation_test.py	217	- patient: "Jane Doe"	CODE
LOW	tests/annotation_test.py	237	extraction_text="Jane Doe",	CODE
LOW	tests/annotation_test.py	336	- patient: "Jane Doe"	CODE
LOW	tests/annotation_test.py	371	extraction_text="Jane Doe",	CODE
LOW	tests/schema_test.py	218	extraction_text="John Doe",	CODE
LOW	tests/resolver_test.py	413	"patient": "Jane Doe",	CODE
LOW	tests/resolver_test.py	430	extraction_text="Jane Doe",	CODE
LOW	tests/resolver_test.py	454	"patient": "John Doe",	CODE
LOW	tests/resolver_test.py	493	extraction_text="John Doe",	CODE
LOW	tests/data_lib_test.py	195	extraction_text="placeholder",	CODE
LOW	tests/tokenizer_test.py	812	expected_substring="Jane Doe",	CODE

AI Structural Patterns · 10 hits · 10 pts

Severity	File	Line	Context
LOW	langextract/annotation.py	209	CODE
LOW	langextract/annotation.py	532	CODE
LOW	langextract/extraction.py	45	CODE
LOW	langextract/resolver.py	327	CODE
LOW	langextract/resolver.py	789	CODE
LOW	langextract/providers/gemini.py	166	CODE
LOW	langextract/providers/openai.py	106	CODE
LOW	langextract/providers/ollama.py	379	CODE
LOW	langextract/providers/ollama.py	476	CODE
LOW	langextract/providers/ollama.py	540	CODE

Hallucination Indicators · 1 hit · 10 pts

Severity	File	Line	Snippet	Context
CRITICAL	langextract/_compat/README.md	16	- `from langextract.inference import InferenceOutputError` → `from langextract.core.exceptions import InferenceOutputErr	CODE

Cross-Language Confusion · 2 hits · 8 pts

Severity	File	Line	Snippet	Context
HIGH	tests/test_kwargs_passthrough.py	700	"""Format key should be omitted from payload when None (not sent as null)."""	STRING
HIGH	langextract/visualization.py	492	let animationInterval = null;	CODE

Modern AI Meta-Vocabulary · 3 hits · 6 pts

Severity	File	Line	Snippet	Context
MEDIUM	langextract/providers/ollama.py	24	# Create an example for few-shot learning	STRING
MEDIUM	docs/examples/japanese_extraction.md	20	# Define example data (few-shot examples help the model understand the task)	COMMENT
MEDIUM	skills/langextract-usage/SKILL.md	114	examples=examples, # few-shot examples (required)	CODE

AI Slop Vocabulary · 2 hits · 4 pts

Severity	File	Line	Snippet	Context
MEDIUM	langextract/core/tokenizer.py	278	# Fallback to the robust regex method	COMMENT
MEDIUM	benchmarks/plotting.py	37	"""Generate comprehensive benchmark visualization.	STRING

Redundant / Tautological Comments · 2 hits · 3 pts

Severity	File	Line	Snippet	Context
LOW	langextract/core/base_model.py	220	# Check if we have a format_type attribute (providers should set this)	COMMENT
LOW	langextract/providers/gemini_batch.py	236	# Check if rule already exists	COMMENT
