Repository Analysis

google/langextract

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

Overall: 25.9 (Moderate AI signal)

Adjusted Score: 25.9
Raw Score: 25.9
Time Factor: 100%
Last Push: 2026-05-21
Stars: 36,700
Language: Python
Lines of Code: 39,896
Files: 129
Pattern Hits: 787
Scan Date: 2026-05-31
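
The three score fields above are consistent with the adjusted score being the raw score scaled by the time factor (25.9 × 100% = 25.9). The report does not state its formula, so the following is only a sketch under that assumption, and `adjusted_score` is a hypothetical helper name rather than part of any scanner.

```python
def adjusted_score(raw_score: float, time_factor: float) -> float:
    """Scale a raw score by a time factor in [0, 1].

    Assumption: adjusted = raw * time_factor. The report shows the three
    values but does not document how they relate.
    """
    if not 0.0 <= time_factor <= 1.0:
        raise ValueError("time_factor must be between 0 and 1")
    return round(raw_score * time_factor, 1)


# Values from the report: raw score 25.9, time factor 100%.
print(adjusted_score(25.9, 1.0))
```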

Severity Breakdown

CRITICAL: 1 · HIGH: 57 · MEDIUM: 18 · LOW: 711

Pattern Findings

787 matches across 15 categories.
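
As a quick sanity check on the figures in this report, the per-category hit counts and the severity counts should both add up to the reported total of 787. The sketch below copies the numbers from this page. `totals_agree` is a hypothetical helper name.

```python
# Hit counts per category, copied from the category headings in this report.
CATEGORY_HITS = {
    "Hyper-Verbose Identifiers": 463,
    "Docstring Block Structure": 34,
    "Over-Commented Block": 107,
    "Cross-File Repetition": 12,
    "Unused Imports": 48,
    "Excessive Try-Catch Wrapping": 44,
    "Magic Placeholder Names": 9,
    "Deep Nesting": 37,
    "Self-Referential Comments": 9,
    "Decorative Section Separators": 5,
    "Fake / Example Data": 12,
    "Hallucination Indicators": 1,
    "Cross-Language Confusion": 2,
    "AI Slop Vocabulary": 2,
    "Redundant / Tautological Comments": 2,
}

# Severity counts, copied from the Severity Breakdown line.
SEVERITY_COUNTS = {"CRITICAL": 1, "HIGH": 57, "MEDIUM": 18, "LOW": 711}


def totals_agree(category_hits, severity_counts, reported_total):
    """Return True when both breakdowns sum to the reported total."""
    return (
        sum(category_hits.values())
        == reported_total
        == sum(severity_counts.values())
    )


print(len(CATEGORY_HITS), totals_agree(CATEGORY_HITS, SEVERITY_COUNTS, 787))
# prints: 15 True
```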

Hyper-Verbose Identifiers: 463 hits · 448 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/chunking_test.py | 69 | def test_multi_sentence_chunk(self): |
| LOW | tests/chunking_test.py | 96 | def test_sentence_with_multiple_newlines_and_right_interval(self): |
| LOW | tests/chunking_test.py | 164 | def test_long_token_gets_own_chunk(self): |
| LOW | tests/chunking_test.py | 207 | def test_newline_at_chunk_boundary_does_not_create_empty_interval(self): |
| LOW | tests/chunking_test.py | 261 | def test_newlines_is_secondary_sentence_break(self): |
| LOW | tests/chunking_test.py | 312 | def test_tokenizer_propagation(self): |
| LOW | tests/chunking_test.py | 425 | def test_make_batches_of_textchunk( |
| LOW | tests/chunking_test.py | 473 | def test_text_chunk_additional_context(self): |
| LOW | tests/chunking_test.py | 486 | def test_chunk_iterator_without_additional_context(self): |
| LOW | tests/chunking_test.py | 497 | def test_multiple_chunks_with_additional_context(self): |
| LOW | tests/chunking_test.py | 549 | def test_text_chunk_properties( |
| LOW | tests/progress_test.py | 27 | def test_download_progress_bar(self): |
| LOW | tests/progress_test.py | 37 | def test_extraction_progress_bar(self): |
| LOW | tests/progress_test.py | 47 | def test_save_load_progress_bars(self): |
| LOW | tests/progress_test.py | 57 | def test_model_info_extraction(self): |
| LOW | tests/progress_test.py | 68 | def test_formatting_functions(self): |
| LOW | tests/format_handler_test.py | 147 | def test_end_to_end_integration_with_prompt_and_resolver(self): |
| LOW | tests/format_handler_test.py | 232 | def test_format_parse_roundtrip( |
| LOW | tests/format_handler_test.py | 260 | def test_think_tags_stripped_before_parsing(self): |
| LOW | tests/format_handler_test.py | 276 | def test_top_level_list_accepted_as_fallback(self): |
| LOW | tests/annotation_test.py | 47 | def assert_char_interval_match_source( |
| LOW | tests/annotation_test.py | 80 | def test_annotate_text_single_chunk(self): |
| LOW | tests/annotation_test.py | 206 | def test_annotate_text_without_index_suffix(self): |
| LOW | tests/annotation_test.py | 325 | def test_annotate_text_with_attributes_suffix(self): |
| LOW | tests/annotation_test.py | 469 | def test_annotate_text_multiple_chunks(self): |
| LOW | tests/annotation_test.py | 569 | def test_annotate_text_no_extractions(self): |
| LOW | tests/annotation_test.py | 766 | def test_annotate_documents_exceptions( |
| LOW | tests/annotation_test.py | 817 | def test_multipass_extraction_non_overlapping(self): |
| LOW | tests/annotation_test.py | 867 | def test_multipass_extraction_overlapping(self): |
| LOW | tests/annotation_test.py | 918 | def test_multipass_extraction_single_pass(self): |
| LOW | tests/annotation_test.py | 948 | def test_multipass_extraction_empty_passes(self): |
| LOW | tests/annotation_test.py | 1052 | def test_merge_non_overlapping_extractions( |
| LOW | tests/annotation_test.py | 1160 | def test_yields_documents_not_generators(self): |
| LOW | tests/annotation_test.py | 1220 | def test_context_window_includes_previous_chunk_text(self): |
| LOW | tests/annotation_test.py | 1271 | def test_no_context_included_when_disabled(self): |
| LOW | tests/annotation_test.py | 1309 | def test_context_window_per_document_isolation(self): |
| LOW | tests/schema_test.py | 47 | def _openai_attribute_properties(openai_schema, extraction_class): |
| LOW | tests/schema_test.py | 56 | def test_abstract_methods_required(self): |
| LOW | tests/schema_test.py | 61 | def test_subclass_must_implement_all_methods(self): |
| LOW | tests/schema_test.py | 77 | def test_get_schema_class_returns_none_by_default(self): |
| LOW | tests/schema_test.py | 87 | def test_apply_schema_stores_instance(self): |
| LOW | tests/schema_test.py | 256 | def test_from_examples_constructs_expected_schema( |
| LOW | tests/schema_test.py | 263 | def test_to_provider_config_returns_response_schema(self): |
| LOW | tests/schema_test.py | 285 | def test_requires_raw_output_returns_true(self): |
| LOW | tests/schema_test.py | 306 | def test_response_format_returns_json_schema_response_format(self): |
| LOW | tests/schema_test.py | 339 | def test_to_provider_config_uses_provider_schema_hook(self): |
| LOW | tests/schema_test.py | 347 | def test_from_examples_constructs_strict_openai_schema(self): |
| LOW | tests/schema_test.py | 445 | def test_from_examples_preserves_list_attribute_schema(self): |
| LOW | tests/schema_test.py | 472 | def test_from_examples_empty_examples_allow_empty_extraction_objects(self): |
| LOW | tests/schema_test.py | 486 | def test_validate_format_rejects_yaml(self): |
| LOW | tests/schema_test.py | 497 | def test_requires_raw_output_returns_true(self): |
| LOW | tests/schema_test.py | 503 | def test_validate_format_warns_when_fences_enabled(self): |
| LOW | tests/schema_test.py | 516 | def test_validate_format_warns_with_wrong_wrapper_key(self): |
| LOW | tests/schema_test.py | 531 | def test_from_examples_preserves_scalar_attribute_types(self): |
| LOW | tests/schema_test.py | 568 | def test_from_examples_preserves_mixed_numeric_attribute_types(self): |
| LOW | tests/schema_test.py | 619 | def test_from_examples_allows_none_attribute_values(self): |
| LOW | tests/schema_test.py | 641 | def test_from_examples_strict_false_emits_non_strict_response_format(self): |
| LOW | tests/schema_test.py | 647 | def test_response_format_returns_isolated_schema_dict(self): |
| LOW | tests/schema_test.py | 658 | def test_instance_is_frozen_and_dict_is_isolated(self): |
| LOW | tests/schema_test.py | 748 | def test_base_schema_no_validation(self): |

403 more matches not shown…
Docstring Block Structure: 34 hits · 170 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | langextract/plugins.py | 125 | Load a provider class from module:Class specification. Args: spec: Import specification in format "module.path:Cl |
| HIGH | langextract/plugins.py | 186 | Get a provider class by name. Args: name: Provider name (e.g., "gemini", "openai", "ollama"). allow_override: |
| HIGH | langextract/annotation.py | 124 | Iterates over documents to yield text chunks along with the document ID. Args: documents: A sequence of Document |
| HIGH | langextract/annotation.py | 222 | Annotates a sequence of documents with NLP extractions. Breaks documents into chunks, processes them into prompts |
| HIGH | langextract/chunking.py | 146 | Creates a token interval. Args: start_index: first token's index (inclusive). end_index: last token's index + |
| HIGH | langextract/chunking.py | 173 | Get the text within an interval of tokens. Args: tokenized_text: Tokenized documents. token_interval: An inte |
| HIGH | langextract/chunking.py | 220 | Returns the char interval corresponding to the token interval. Args: tokenized_text: Document. token_interval |
| HIGH | langextract/chunking.py | 247 | Converts all whitespace characters in input text to a single space. Args: text: Input to sanitize. Returns: |
| HIGH | langextract/io.py | 51 | Loads the dataset from a CSV file. Args: delimiter: The delimiter to use when reading the CSV file. Yiel |
| HIGH | langextract/io.py | 148 | Loads annotated documents from a JSON Lines file. Args: jsonl_path: The file path to the JSON Lines file. sho |
| HIGH | langextract/io.py | 198 | Reads a CSV file and yields rows as dicts. Args: filepath: The path to the file. column_names: The names of t |
| HIGH | langextract/io.py | 271 | Download text content from a URL with optional progress bar. Args: url: The URL to download from. timeout: Re |
| HIGH | langextract/prompt_validation.py | 134 | Align extractions to their own example text and collect issues. Args: examples: The few-shot examples to validate |
| HIGH | langextract/prompting.py | 56 | Reads a structured prompt template from a file. Args: prompt_path: Path to a file containing PromptTemplateStruct |
| HIGH | langextract/factory.py | 110 | Create a language model instance from configuration. Args: config: Model configuration with optional model_id and |
| HIGH | langextract/factory.py | 204 | Internal helper to create a model with optional schema constraints. This function creates a language model and option |
| HIGH | langextract/extraction.py | 66 | Extracts structured information from text. Retrieves structured information from the provided text or documents using |
| HIGH | langextract/resolver.py | 273 | Runs resolve function on text with YAML/JSON extraction data. Args: input_text: The input text to be proces |
| HIGH | langextract/resolver.py | 393 | Parses a YAML or JSON-formatted string into extraction data. This method is kept for backward compatibility with te |
| HIGH | langextract/resolver.py | 428 | Extracts and orders extraction data based on their associated indexes. This function processes a list of dictionari |
| HIGH | langextract/core/base_model.py | 157 | Parses model output as JSON or YAML. Note: This expects raw JSON/YAML without code fences. Code fence extractio |
| HIGH | langextract/core/tokenizer.py | 474 | Reconstructs the substring of the original text spanning a given token interval. Args: tokenized_text: A Tokenize |
| HIGH | langextract/core/tokenizer.py | 586 | Finds a 'sentence' interval from a given start index. Sentence boundaries are defined by: - punctuation tokens in |
| HIGH | langextract/core/format_handler.py | 154 | Parse model output to extract data. Args: text: Raw model output. strict: If True, enforce strict schem |
| HIGH | langextract/core/format_handler.py | 279 | Extract content from text, handling fences if configured. Args: text: Input text that may contain fenced bloc |
| HIGH | langextract/providers/openai_batch.py | 357 | Execute batch inference on multiple prompts using OpenAI Batch API. Args: client: OpenAI client instance (or comp |
| HIGH | langextract/providers/router.py | 140 | Resolve a model ID to a provider class. Args: model_id: The model identifier to resolve. Returns: The prov |
| HIGH | langextract/providers/router.py | 171 | Resolve a provider name to a provider class. This allows explicit provider selection by name or class name. Args: |
| HIGH | langextract/providers/gemini_batch.py | 317 | Submit a file-based batch job to Vertex AI using GCS storage. Batch processing is only supported with Vertex AI becau |
| HIGH | langextract/providers/gemini_batch.py | 552 | Poll batch job until completion or timeout. Args: client: google.genai.Client instance for polling job status. |
| HIGH | langextract/providers/gemini_batch.py | 629 | Extract text outputs from file-based batch results, preserving order. Reads results from GCS output directory. Arg |
| HIGH | langextract/providers/gemini_batch.py | 709 | Execute batch inference on multiple prompts using the Vertex AI Batch API. This function provides file-based batch pr |
| HIGH | langextract/providers/ollama.py | 561 | Sends a prompt to an Ollama model and returns the generated response. Note: This is a low-level method. Constructor |
| HIGH | scripts/create_provider_plugin.py | 240 | \ """Schema implementation for {provider_name} provider.""" import langextract as lx from lange |

Over-Commented Block: 107 hits · 101 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | autoformat.sh | 1 | #!/bin/bash |
| LOW | .pre-commit-config.yaml | 1 | # Copyright 2025 Google LLC. |
| LOW | pyproject.toml | 1 | # Copyright 2025 Google LLC. |
| LOW | tox.ini | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/chunking_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/progress_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/format_handler_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/annotation_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/schema_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/prompting_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/inference_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/provider_schema_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/extract_precedence_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/resolver_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/resolver_test.py | 661 | extraction_text="prednisone", |
| LOW | tests/fuzzy_alignment_cases_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/test_kwargs_passthrough.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/gemini_retry_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/data_lib_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/init_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/openai_batch_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/registry_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/tokenizer_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/extract_schema_integration_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/prompt_validation_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/factory_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/visualization_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/io_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/provider_plugin_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/factory_schema_test.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/test_live_api.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/test_ollama_integration.py | 1 | # Copyright 2025 Google LLC. |
| LOW | tests/test_gemini_batch_api.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/plugins.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/annotation.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/chunking.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/registry.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/data_lib.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/io.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/prompt_validation.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/prompting.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/__init__.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/visualization.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/factory.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/tokenizer.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/extraction.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/resolver.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/inference.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/exceptions.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/progress.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/data.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/schema.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/__init__.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/types.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/base_model.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/tokenizer.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/format_handler.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/exceptions.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/data.py | 1 | # Copyright 2025 Google LLC. |
| LOW | langextract/core/schema.py | 1 | # Copyright 2025 Google LLC. |

47 more matches not shown…
Cross-File Repetition: 12 hits · 60 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | tests/test_live_api.py | 0 | \ extract medication information including medication name, dosage, route, frequency, and duration in the order they app |
| HIGH | tests/test_live_api.py | 0 | \ extract medication information including medication name, dosage, route, frequency, and duration in the order they app |
| HIGH | tests/test_live_api.py | 0 | \ extract medication information including medication name, dosage, route, frequency, and duration in the order they app |
| HIGH | tests/test_live_api.py | 0 | \ extract medication information including medication name, dosage, route, frequency, and duration in the order they app |
| HIGH | tests/test_live_api.py | 0 | the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but |
| HIGH | tests/test_live_api.py | 0 | the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but |
| HIGH | docs/examples/medication_examples.md | 0 | the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but |
| HIGH | examples/ollama/demo_ollama.py | 0 | the patient was prescribed lisinopril and metformin last month. he takes the lisinopril 10mg daily for hypertension, but |
| HIGH | tests/test_live_api.py | 0 | extract medications with their details, using attributes to group related information: 1. extract entities in the order |
| HIGH | tests/test_live_api.py | 0 | extract medications with their details, using attributes to group related information: 1. extract entities in the order |
| HIGH | docs/examples/medication_examples.md | 0 | extract medications with their details, using attributes to group related information: 1. extract entities in the order |
| HIGH | examples/ollama/demo_ollama.py | 0 | extract medications with their details, using attributes to group related information: 1. extract entities in the order |

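
The repeated snippets above appear lowercased, which would be consistent with text being normalized before comparison. The report does not describe how repetition is detected, so the following is only a sketch of one plausible approach: normalize long strings, then group them by the files they occur in. `normalize`, `repeated_strings`, and the `min_length` threshold are hypothetical, and unlike the table above this sketch ignores repeats within a single file.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def repeated_strings(strings_by_file, min_length=40):
    """Map each normalized long string to the sorted files that contain it.

    Only strings found in more than one file are returned.
    """
    locations = defaultdict(set)
    for path, strings in strings_by_file.items():
        for s in strings:
            key = normalize(s)
            if len(key) >= min_length:  # skip short, incidental matches
                locations[key].add(path)
    return {k: sorted(v) for k, v in locations.items() if len(v) > 1}


# Illustrative input only; these strings are not read from the repository.
sample = {
    "a.py": ["The patient was prescribed Lisinopril and Metformin last month."],
    "b.md": ["the patient was  prescribed lisinopril and metformin last month."],
    "c.py": ["short"],
}
print(repeated_strings(sample))
```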
Unused Imports: 48 hits · 48 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/openai_batch_test.py | 19 | |
| LOW | langextract/plugins.py | 20 | |
| LOW | langextract/registry.py | 21 | |
| LOW | langextract/data_lib.py | 16 | |
| LOW | langextract/io.py | 16 | |
| LOW | langextract/prompt_validation.py | 17 | |
| LOW | langextract/prompting.py | 16 | |
| LOW | langextract/__init__.py | 21 | |
| LOW | langextract/visualization.py | 24 | |
| LOW | langextract/factory.py | 22 | |
| LOW | langextract/tokenizer.py | 21 | |
| LOW | langextract/tokenizer.py | 25 | |
| LOW | langextract/extraction.py | 17 | |
| LOW | langextract/resolver.py | 21 | |
| LOW | langextract/inference.py | 21 | |
| LOW | langextract/exceptions.py | 22 | |
| LOW | langextract/progress.py | 16 | |
| LOW | langextract/data.py | 21 | |
| LOW | langextract/data.py | 25 | |
| LOW | langextract/schema.py | 21 | |
| LOW | langextract/core/__init__.py | 22 | |
| LOW | langextract/core/types.py | 16 | |
| LOW | langextract/core/base_model.py | 16 | |
| LOW | langextract/core/format_handler.py | 17 | |
| LOW | langextract/core/exceptions.py | 21 | |
| LOW | langextract/core/data.py | 16 | |
| LOW | langextract/core/schema.py | 16 | |
| LOW | langextract/core/debug_utils.py | 16 | |
| LOW | langextract/providers/openai_batch.py | 22 | |
| LOW | langextract/providers/gemini.py | 19 | |
| LOW | langextract/providers/openai.py | 18 | |
| LOW | langextract/providers/router.py | 22 | |
| LOW | langextract/providers/gemini_batch.py | 25 | |
| LOW | langextract/providers/ollama.py | 84 | |
| LOW | langextract/providers/schemas/__init__.py | 16 | |
| LOW | langextract/providers/schemas/gemini.py | 18 | |
| LOW | langextract/providers/schemas/openai.py | 18 | |
| LOW | langextract/_compat/registry.py | 18 | |
| LOW | langextract/_compat/__init__.py | 21 | |
| LOW | langextract/_compat/inference.py | 17 | |
| LOW | langextract/_compat/exceptions.py | 18 | |
| LOW | langextract/_compat/schema.py | 18 | |
| LOW | …amples/custom_provider_plugin/test_example_provider.py | 24 | |
| LOW | …ovider_plugin/langextract_provider_example/provider.py | 17 | |
| LOW | …ovider_plugin/langextract_provider_example/__init__.py | 17 | |
| LOW | …provider_plugin/langextract_provider_example/schema.py | 17 | |
| LOW | benchmarks/fuzzy_benchmark.py | 29 | |
| LOW | scripts/validate_community_providers.py | 18 | |

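
Because the rows above carry no snippet, it can help to re-check a flagged file by hand. The sketch below is a deliberately naive unused-import check built on Python's standard `ast` module. It is not the scanner's method, and `unused_imports` is a hypothetical helper. A check this simple also reports `from __future__` imports and names that are imported only to be re-exported, so its output needs manual review.

```python
import ast


def unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced as a Name."""
    tree = ast.parse(source)
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                name = (alias.asname or alias.name).split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported[alias.asname or alias.name] = node.lineno
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


example = "import os\nimport sys\nprint(sys.argv)\n"
print(unused_imports(example))
# prints: [(1, 'os')]
```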
Excessive Try-Catch Wrapping: 44 hits · 45 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/prompt_validation_test.py | 457 | except Exception: # pylint: disable=broad-except |
| LOW | tests/prompt_validation_test.py | 505 | except Exception: # pylint: disable=broad-except |
| LOW | tests/prompt_validation_test.py | 526 | except Exception: # pylint: disable=broad-except |
| LOW | langextract/prompting.py | 78 | except Exception as e: |
| LOW | langextract/visualization.py | 59 | except Exception: |
| LOW | langextract/resolver.py | 420 | except Exception as e: |
| LOW | langextract/core/base_model.py | 179 | except Exception as e: |
| LOW | langextract/core/debug_utils.py | 89 | except Exception: |
| LOW | langextract/core/debug_utils.py | 129 | except Exception: |
| LOW | langextract/core/debug_utils.py | 184 | except Exception: |
| LOW | langextract/providers/openai_batch.py | 253 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 277 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 313 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 444 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 470 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 496 | except Exception as e: |
| LOW | langextract/providers/openai_batch.py | 508 | except Exception as e: |
| LOW | langextract/providers/__init__.py | 134 | except Exception as e: |
| LOW | langextract/providers/__init__.py | 139 | except Exception as e: |
| LOW | langextract/providers/gemini.py | 361 | except Exception as e: |
| LOW | langextract/providers/gemini.py | 441 | except Exception as e: |
| LOW | langextract/providers/gemini.py | 478 | except Exception as e: |
| LOW | langextract/providers/openai.py | 258 | except Exception as e: |
| LOW | langextract/providers/openai.py | 337 | except Exception as e: |
| LOW | langextract/providers/openai.py | 375 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 172 | except Exception: |
| LOW | langextract/providers/gemini_batch.py | 253 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 416 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 457 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 468 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 586 | except Exception as e: |
| LOW | langextract/providers/gemini_batch.py | 837 | except Exception as e: |
| LOW | langextract/providers/ollama.py | 313 | except Exception as e: |
| LOW | …ovider_plugin/langextract_provider_example/provider.py | 181 | except Exception as e: |
| LOW | examples/ollama/demo_ollama.py | 452 | except Exception as e: |
| LOW | examples/ollama/demo_ollama.py | 533 | except Exception as e: |
| MEDIUM | benchmarks/plotting.py | 339 | print(f"Error loading {json_file}: {e}") |
| LOW | scripts/create_provider_plugin.py | 375 | except Exception as e: |
| LOW | scripts/create_provider_plugin.py | 393 | except Exception: |
| LOW | scripts/create_provider_plugin.py | 395 | except Exception as e: |
| LOW | scripts/create_provider_plugin.py | 429 | except Exception as e: |
| LOW | scripts/create_provider_plugin.py | 444 | except Exception as e: |
| LOW | .github/scripts/zenodo_publish.py | 211 | except Exception as e: |
| MEDIUM | .github/scripts/zenodo_publish.py | 180 | def main() -> int: |

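
Most of the hits above are broad `except Exception` handlers. Where the expected failures are known, a common remedy is to catch only those exception types and let anything else propagate. The sketch below illustrates this for a JSON-loading helper in the spirit of the `benchmarks/plotting.py` snippet. `load_results` is a hypothetical function, not code from the repository, and whether narrowing is appropriate at each flagged site depends on context the report does not show.

```python
import json
from pathlib import Path


def load_results(json_file: Path):
    """Return parsed JSON, or None when the file is unreadable or malformed."""
    try:
        return json.loads(json_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as e:
        # Only I/O and JSON parse failures are handled; other errors propagate.
        print(f"Error loading {json_file}: {e}")
        return None


# A missing file is reported and skipped instead of crashing the caller.
print(load_results(Path("definitely_missing_results.json")))
```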
Magic Placeholder Names: 9 hits · 42 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | README.md | 206 | docker run --rm -e LANGEXTRACT_API_KEY="your-api-key" langextract python your_script.py |
| HIGH | README.md | 229 | export LANGEXTRACT_API_KEY="your-api-key-here" |
| HIGH | README.md | 239 | LANGEXTRACT_API_KEY=your-api-key-here |
| HIGH | README.md | 268 | api_key="your-api-key-here" # Only use this for testing/development |
| HIGH | docs/examples/medication_examples.md | 51 | api_key="your-api-key-here" # Optional if LANGEXTRACT_API_KEY environment variable is set |
| HIGH | docs/examples/medication_examples.md | 171 | api_key="your-api-key-here" # Optional if LANGEXTRACT_API_KEY environment variable is set |
| HIGH | docs/examples/japanese_extraction.md | 43 | api_key="your-api-key-here" # Optional if env var is set |
| HIGH | examples/custom_provider_plugin/README.md | 125 | provider_kwargs={"api_key": "your-api-key"}, |
| HIGH | examples/custom_provider_plugin/README.md | 142 | provider_kwargs={"api_key": "your-api-key"}, |

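
All nine hits are documentation placeholders for an API key, and several snippets note that the key is optional when the `LANGEXTRACT_API_KEY` environment variable is set. The sketch below shows one way calling code could prefer the environment variable and reject a placeholder that was left in by mistake. `resolve_api_key` and `PLACEHOLDER_PREFIX` are hypothetical names and are not part of langextract.

```python
import os

# Assumption: placeholders in the docs all start with this text.
PLACEHOLDER_PREFIX = "your-api-key"


def resolve_api_key(explicit=None, env_var="LANGEXTRACT_API_KEY"):
    """Return an explicit key if given, else the environment variable's value."""
    key = explicit or os.environ.get(env_var, "")
    if not key or key.startswith(PLACEHOLDER_PREFIX):
        raise ValueError(
            f"Set {env_var} or pass a real key; placeholder values are rejected."
        )
    return key
```

A caller would then pass `resolve_api_key()` wherever the documentation shows `api_key="your-api-key-here"`.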
Deep Nesting: 37 hits · 37 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/extract_schema_integration_test.py | 113 | |
| LOW | tests/extract_schema_integration_test.py | 147 | |
| LOW | langextract/annotation.py | 46 | |
| LOW | langextract/annotation.py | 285 | |
| LOW | langextract/data_lib.py | 27 | |
| LOW | langextract/io.py | 85 | |
| LOW | langextract/io.py | 265 | |
| LOW | langextract/prompt_validation.py | 128 | |
| LOW | langextract/prompting.py | 52 | |
| LOW | langextract/factory.py | 53 | |
| LOW | langextract/extraction.py | 36 | |
| LOW | langextract/resolver.py | 1075 | |
| LOW | langextract/resolver.py | 424 | |
| LOW | langextract/resolver.py | 578 | |
| LOW | langextract/core/tokenizer.py | 580 | |
| LOW | langextract/core/tokenizer.py | 336 | |
| LOW | langextract/core/format_handler.py | 151 | |
| LOW | langextract/providers/openai_batch.py | 411 | |
| LOW | langextract/providers/__init__.py | 74 | |
| LOW | langextract/providers/__init__.py | 152 | |
| LOW | langextract/providers/gemini.py | 337 | |
| LOW | langextract/providers/gemini.py | 381 | |
| LOW | langextract/providers/openai.py | 284 | |
| LOW | langextract/providers/router.py | 170 | |
| LOW | langextract/providers/gemini_batch.py | 623 | |
| LOW | langextract/providers/gemini_batch.py | 442 | |
| LOW | langextract/providers/schemas/gemini.py | 98 | |
| LOW | langextract/providers/schemas/openai.py | 81 | |
| LOW | examples/ollama/demo_ollama.py | 418 | |
| LOW | benchmarks/benchmark.py | 140 | |
| LOW | benchmarks/benchmark.py | 276 | |
| LOW | benchmarks/benchmark.py | 311 | |
| LOW | benchmarks/fuzzy_benchmark.py | 342 | |
| LOW | benchmarks/plotting.py | 170 | |
| LOW | benchmarks/plotting.py | 220 | |
| LOW | benchmarks/plotting.py | 376 | |
| LOW | benchmarks/plotting.py | 492 | |

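
The report gives neither a depth threshold nor a definition of nesting for these hits. As a rough way to inspect a flagged function, the sketch below measures the deepest chain of nested control-flow blocks with the standard `ast` module. `max_nesting` and the choice of node types are assumptions. Note that an `elif` is parsed as an `if` inside the `else` branch, so this count treats it as one level deeper.

```python
import ast

# Statement types counted as one nesting level each (an assumption).
_NESTING = (
    ast.If, ast.For, ast.While, ast.With, ast.Try, ast.AsyncFor, ast.AsyncWith,
)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest count of nested control-flow blocks under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, _NESTING) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


example = (
    "for i in range(3):\n"
    "    if i:\n"
    "        while False:\n"
    "            pass\n"
)
print(max_nesting(ast.parse(example)))
# prints: 3
```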
Self-Referential Comments: 9 hits · 25 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/annotation_test.py | 701 | # Define a side effect function so return length based on batch length. |
| MEDIUM | tests/resolver_test.py | 2010 | # Define a chunk that includes the entire text. |
| MEDIUM | tests/resolver_test.py | 2052 | # Define a chunk that includes the entire text. |
| MEDIUM | tests/resolver_test.py | 2098 | # Define a chunk that includes too many tokens. |
| MEDIUM | tests/resolver_test.py | 2139 | # Define a correct chunk. |
| MEDIUM | tests/resolver_test.py | 2166 | # Define a chunk that includes the entire text. |
| MEDIUM | tests/extract_schema_integration_test.py | 188 | # Create a mock instance with required attributes |
| MEDIUM | tests/extract_schema_integration_test.py | 239 | # Create a mock Gemini schema with validate_format that issues warnings |
| MEDIUM | langextract/providers/ollama.py | 24 | # Create an example for few-shot learning |

Decorative Section Separators: 5 hits · 21 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | tests/annotation_test.py | 503 | # ------------------------------------------------------------------------- |
| MEDIUM | tests/resolver_test.py | 667 | # -------------------------------------------------------------------- |
| MEDIUM | tests/resolver_test.py | 670 | # -------------------------------------------------------------------- |
| MEDIUM | tests/resolver_test.py | 673 | # -------------------------------------------------------------------- |
| MEDIUM | tests/resolver_test.py | 676 | # -------------------------------------------------------------------- |

Fake / Example Data: 12 hits · 12 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | tests/annotation_test.py | 91 | - patient: "Jane Doe" |
| LOW | tests/annotation_test.py | 118 | extraction_text="Jane Doe", |
| LOW | tests/annotation_test.py | 217 | - patient: "Jane Doe" |
| LOW | tests/annotation_test.py | 237 | extraction_text="Jane Doe", |
| LOW | tests/annotation_test.py | 336 | - patient: "Jane Doe" |
| LOW | tests/annotation_test.py | 371 | extraction_text="Jane Doe", |
| LOW | tests/schema_test.py | 217 | extraction_text="John Doe", |
| LOW | tests/resolver_test.py | 413 | "patient": "Jane Doe", |
| LOW | tests/resolver_test.py | 430 | extraction_text="Jane Doe", |
| LOW | tests/resolver_test.py | 454 | "patient": "John Doe", |
| LOW | tests/resolver_test.py | 493 | extraction_text="John Doe", |
| LOW | tests/tokenizer_test.py | 812 | expected_substring="Jane Doe", |

Hallucination Indicators: 1 hit · 10 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| CRITICAL | langextract/_compat/README.md | 16 | - `from langextract.inference import InferenceOutputError` → `from langextract.core.exceptions import InferenceOutputErr |

Cross-Language Confusion: 2 hits · 8 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | tests/test_kwargs_passthrough.py | 700 | """Format key should be omitted from payload when None (not sent as null).""" |
| HIGH | langextract/visualization.py | 492 | let animationInterval = null; |

AI Slop Vocabulary: 2 hits · 4 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | langextract/core/tokenizer.py | 278 | # Fallback to the robust regex method |
| MEDIUM | benchmarks/plotting.py | 37 | """Generate comprehensive benchmark visualization. |

Redundant / Tautological Comments: 2 hits · 3 pts

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | langextract/core/base_model.py | 171 | # Check if we have a format_type attribute (providers should set this) |
| LOW | langextract/providers/gemini_batch.py | 236 | # Check if rule already exists |