NirDiamant/RAG_Techniques

Adjusted Score: 28.4
Raw Score: 28.4
Time Factor: 100%
Last Push: 2026-07-12
Stars: 28.5K
Language: Jupyter Notebook
Lines of Code: 10.0K
Files:
Pattern Hits: 149
Scan Date: 2026-07-14
HC Hit Rate: 0.12

What These Metrics Mean

Adjusted Score: Primary synthetic code indicator. Raw score normalised per 1,000 lines of code and multiplied by the temporal discount factor. This is the definitive comparative metric — use it to rank repositories by AI authorship density.
Raw Score: The unmodified sum of all severity-weighted, context-multiplied pattern match scores before temporal discounting. Reflects the absolute signal strength independent of when the repository was last active.
Time Factor: The temporal discount multiplier (0–100%) applied to the raw score. Repositories last updated before ChatGPT's launch (Nov 2022) receive a 5% factor. The full 100% factor is assigned only to repositories active in the post-adoption era (Jan 2024+).
Pattern Hits: Total count of individual pattern matches across all files and categories. A high hit count with a low score may indicate a very large codebase with isolated AI snippets; a low count with a high score indicates dense, concentrated AI signatures.
HC Hit Rate: High+Critical pattern hits per file, averaged across the repository. This orthogonal signal catches repositories where a few files are densely packed with high-severity AI tells — a strong indicator even when the normalised score appears moderate due to codebase size.
Lines of Code / Files: Total lines and files analysed. The scanner examines 94 file extensions. These denominators are used to normalise the score, enabling fair comparison between repositories of vastly different sizes.
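
As a rough illustration of how the definitions above fit together, the sketch below computes an adjusted score and an HC hit rate. The function names and the sample file count are hypothetical, and the formula is read off the prose above rather than taken from the scanner's source. The 283 points are the sum of the twelve per-category totals listed under Pattern Findings, and 10,000 is the rounded line count shown in the summary.

```python
def adjusted_score(raw_points: float, lines_of_code: int, time_factor: float) -> float:
    """Points per 1,000 lines of code, scaled by the temporal discount factor."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return raw_points / (lines_of_code / 1000) * time_factor


def hc_hit_rate(high_hits: int, critical_hits: int, files: int) -> float:
    """High + Critical pattern hits per analysed file."""
    if files <= 0:
        raise ValueError("files must be positive")
    return (high_hits + critical_hits) / files


# 283 = sum of the per-category points in this report; 10_000 is the rounded LOC.
score = adjusted_score(raw_points=283, lines_of_code=10_000, time_factor=1.0)
print(round(score, 1))  # 28.3

# Hypothetical file count, chosen only to illustrate the ratio.
print(hc_hit_rate(high_hits=4, critical_hits=0, files=50))  # 0.08
```

Under these assumptions the result is 28.3, close to the displayed 28.4. The small gap would be consistent with the line count having been rounded to 10.0K, though the report does not confirm this.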

Score History

This chart maps the temporal evolution of the adjusted synthetic code score across successive scan runs. An upward trajectory indicates ongoing incorporation of AI-generated code or expanding LLM-assisted scaffolding; a stable or declining trajectory may reflect active human refactoring, code removal, or the adoption of stricter authorship policies. The dashed secondary line (right axis) independently tracks total raw pattern hit count, which can diverge from the normalised score when codebase size changes significantly between scans.

Severity Breakdown

Classifies detected patterns by their diagnostic confidence and structural impact. CRITICAL patterns (coefficient 10) represent definitive synthetic signatures — hallucinated imports, explicit LLM attribution metadata — virtually never produced by human authors. HIGH (5) indicates strong structural tells such as cross-file repetition or cross-linguistic idioms. MEDIUM (2) covers recognisable conversational padding and AI-specific vocabulary. LOW (1) captures subtle indicators like tautological comments and generic boilerplate that require density to carry independent signal.

CRITICAL 0 · HIGH 4 · MEDIUM 43 · LOW 102
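
To make the coefficients concrete, here is a minimal sketch that applies them to the severity counts reported for this repository. The names `SEVERITY_COEFFICIENT` and `base_points` are hypothetical. The result is only the severity-weighted base, before any context or density multipliers are applied.

```python
# Coefficients as stated in the Severity Breakdown description.
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def base_points(counts: dict) -> int:
    """Severity-weighted sum of hit counts, with no multipliers applied."""
    return sum(SEVERITY_COEFFICIENT[severity] * n for severity, n in counts.items())


# Counts reported for this repository.
counts = {"CRITICAL": 0, "HIGH": 4, "MEDIUM": 43, "LOW": 102}

print(sum(counts.values()))  # 149 hits in total
print(base_points(counts))   # 208 base points
```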

Directory Score Breakdown

This horizontal bar chart decomposes the repository's raw synthetic code score by top-level directory, allowing you to pinpoint precisely which modules or components carry the highest AI authorship density. Directories with disproportionately high scores relative to their size warrant targeted manual review: concentrated AI signatures often trace back to mass-generated configuration layers, auto-ported test suites, LLM-scaffolded boilerplate classes, or entire subsystems authored under heavy copilot assistance. Use this view to prioritise your human code-review effort.
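
The decomposition described above can be sketched as a simple group-by on the first path component. The point values below are illustrative rather than taken from the report, the `(root)` bucket for top-level files is an assumption, and `score_by_top_level_dir` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def score_by_top_level_dir(findings):
    """Sum match points per top-level directory, highest total first."""
    totals = defaultdict(float)
    for path, points in findings:
        parts = PurePosixPath(path).parts
        # Files at the repository root have no directory component.
        top = parts[0] if len(parts) > 1 else "(root)"
        totals[top] += points
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# (path, points) pairs; the paths appear in this report, the points are made up.
findings = [
    ("helper_functions.py", 4.5),
    ("all_rag_techniques_runnable_scripts/graph_rag.py", 4.5),
    ("all_rag_techniques_runnable_scripts/self_rag.py", 3.0),
    ("tests/test_imports.py", 1.0),
]

for directory, total in score_by_top_level_dir(findings).items():
    print(f"{directory}: {total}")
```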

Pattern Findings

The scanner identified 149 distinct pattern matches across 12 syntactic categories. Each entry below represents a discrete location in the source code where the engine recorded a statistically significant AI authorship indicator. Expand any category row to inspect the individual file paths, line numbers, code snippets, and the lexical context (CODE, COMMENT, or STRING) in which each match was detected.

Reading the findings table: The Severity column indicates the diagnostic confidence level (CRITICAL / HIGH / MEDIUM / LOW). The Context column identifies whether the match occurred inside executable code, an inline comment, or a string literal — comment-context matches receive a ×1.5 weight because LLMs systematically over-annotate. The ⚡ bolt icon marks clustered matches: three or more patterns within a 10-line window, each receiving an additional ×1.5 density multiplier as dense clusters constitute far stronger evidence of synthetic authorship than isolated hits.
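
The weighting rules above can be sketched as follows. This is an assumption-laden reading of the prose, not the scanner's actual code: only the COMMENT weight is stated, so a weight of 1.0 for CODE and STRING is assumed, and "within a 10-line window" is taken to mean a line-number difference of less than 10. The names `flag_clusters` and `match_score` are hypothetical.

```python
SEVERITY_COEFFICIENT = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}
# Only the COMMENT weight is stated in the report; 1.0 elsewhere is an assumption.
CONTEXT_WEIGHT = {"COMMENT": 1.5, "CODE": 1.0, "STRING": 1.0}
CLUSTER_MULTIPLIER = 1.5
WINDOW = 10       # assumed: lines L through L+9 form one window
MIN_CLUSTER = 3   # three or more matches make a cluster


def flag_clusters(lines):
    """Given match line numbers sorted ascending, mark the clustered ones."""
    flags = [False] * len(lines)
    start = 0
    for end in range(len(lines)):
        # Shrink the window until it spans fewer than WINDOW lines.
        while lines[end] - lines[start] >= WINDOW:
            start += 1
        if end - start + 1 >= MIN_CLUSTER:
            for i in range(start, end + 1):
                flags[i] = True
    return flags


def match_score(severity, context, clustered):
    """Severity coefficient times context weight times optional density multiplier."""
    score = SEVERITY_COEFFICIENT[severity] * CONTEXT_WEIGHT[context]
    return score * CLUSTER_MULTIPLIER if clustered else score


print(flag_clusters([10, 12, 15, 40]))        # [True, True, True, False]
print(match_score("MEDIUM", "COMMENT", False))  # 3.0
print(11 * match_score("LOW", "COMMENT", True))  # 24.75
```

The last line reproduces the Verbosity Indicators category: eleven clustered LOW matches in comments give 24.75, which is consistent with the 25 pts shown for it if totals are rounded.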

Self-Referential Comments: 24 hits · 79 pts

Severity	File	Line	Snippet	Context
MEDIUM⚡	helper_functions.py	166	# Define the prompt template for chain-of-thought reasoning	COMMENT
MEDIUM⚡	helper_functions.py	174	# Create a PromptTemplate object with the specified template and input variables	STRING
MEDIUM⚡	helper_functions.py	180	# Create a chain by combining the prompt template and the language model	STRING
MEDIUM	…unnable_scripts/HyPE_Hypothetical_Prompt_Embeddings.py	46	# Create a retriever from the vector store	COMMENT
MEDIUM	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	75	# Define the main class for the RAG method	COMMENT
MEDIUM	…nnable_scripts/HyDe_Hypothetical_Document_Embedding.py	18	# Define the HyDe retriever class - creating vector store, generating hypothetical document, and retrieving	COMMENT
MEDIUM	…l_rag_techniques_runnable_scripts/semantic_chunking.py	56	# Create a vector store and retriever from the semantic chunks	COMMENT
MEDIUM	…iques_runnable_scripts/retrieval_with_feedback_loop.py	25	# Define the Response class	COMMENT
MEDIUM	…iques_runnable_scripts/retrieval_with_feedback_loop.py	104	# Define the main RAG class	STRING
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	108	# Define the knowledge graph class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	109	# Define the Concepts class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	114	# Define the KnowledgeGraph class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	315	# Define the Query Engine class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	316	# Define the AnswerCheck class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	322	# Define the QueryEngine class	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	809	# Create a graph RAG instance	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	47	# Define the document processor class	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	48	# Define the DocumentProcessor class	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	584	# Import necessary libraries	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	590	# Define the Visualizer class	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	731	# Define the graph RAG class	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/self_rag.py	70	# Define main class	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/simple_rag.py	41	# Create a retriever from the vector store	COMMENT
MEDIUM	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	193	# Define the main Adaptive RAG class	COMMENT

Modern AI Meta-Vocabulary: 18 hits · 54 pts

Severity	File	Line	Snippet	Context
MEDIUM⚡	helper_functions.py	166	# Define the prompt template for chain-of-thought reasoning	COMMENT
MEDIUM	README.md	3	# Advanced RAG Techniques 🚀	COMMENT
MEDIUM	README.md	163	### 🌱 Foundational RAG Techniques	COMMENT
MEDIUM	CONTRIBUTING.md	1	# Contributing to RAG Techniques	COMMENT
MEDIUM	CONTRIBUTING.md	41	## Adding a New RAG Method	COMMENT
MEDIUM	CONTRIBUTING.md	80	### 1. [Simple RAG 🌱](https://colab.research.google.com/github/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/si	COMMENT
MEDIUM	…unnable_scripts/HyPE_Hypothetical_Prompt_Embeddings.py	185	# Initialize the HyPE-based RAG Retriever	COMMENT
MEDIUM	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	75	# Define the main class for the RAG method	COMMENT
MEDIUM	…_techniques_runnable_scripts/contextual_compression.py	109	# Main function to run the RAG pipeline	COMMENT
MEDIUM	…nnable_scripts/HyDe_Hypothetical_Document_Embedding.py	76	# Create and run the RAG method instance	COMMENT
MEDIUM	…iques_runnable_scripts/retrieval_with_feedback_loop.py	104	# Define the main RAG class	STRING
MEDIUM	…able_scripts/context_enrichment_window_around_chunk.py	74	# Main class that encapsulates the RAG method	COMMENT
MEDIUM	…able_scripts/context_enrichment_window_around_chunk.py	139	# Initialize and run the RAG method	STRING
MEDIUM	…g_techniques_runnable_scripts/query_transformations.py	62	# Main class for the RAG method	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	809	# Create a graph RAG instance	COMMENT
MEDIUM⚡	all_rag_techniques_runnable_scripts/graph_rag.py	815	# Input a query and get the retrieved information from the graph RAG	COMMENT
MEDIUM	all_rag_techniques_runnable_scripts/graph_rag.py	731	# Define the graph RAG class	COMMENT
MEDIUM	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	193	# Define the main Adaptive RAG class	COMMENT

Unused Imports: 49 hits · 47 pts

Severity	File	Line	Context
LOW	tests/test_imports.py	2	CODE
LOW	…unnable_scripts/HyPE_Hypothetical_Prompt_Embeddings.py	14	CODE
LOW	…unnable_scripts/HyPE_Hypothetical_Prompt_Embeddings.py	15	CODE
LOW	…_techniques_runnable_scripts/contextual_compression.py	9	CODE
LOW	…_techniques_runnable_scripts/contextual_compression.py	10	CODE
LOW	…nnable_scripts/HyDe_Hypothetical_Document_Embedding.py	9	CODE
LOW	…nnable_scripts/HyDe_Hypothetical_Document_Embedding.py	10	CODE
LOW	all_rag_techniques_runnable_scripts/raptor.py	22	CODE
LOW	all_rag_techniques_runnable_scripts/raptor.py	23	CODE
LOW	…l_rag_techniques_runnable_scripts/semantic_chunking.py	6	CODE
LOW	…g_techniques_runnable_scripts/document_augmentation.py	9	CODE
LOW	…g_techniques_runnable_scripts/document_augmentation.py	9	CODE
LOW	…g_techniques_runnable_scripts/document_augmentation.py	21	CODE
LOW	…iques_runnable_scripts/retrieval_with_feedback_loop.py	8	CODE
LOW	…iques_runnable_scripts/retrieval_with_feedback_loop.py	14	CODE
LOW	…iques_runnable_scripts/retrieval_with_feedback_loop.py	15	CODE
LOW	…able_scripts/context_enrichment_window_around_chunk.py	2	CODE
LOW	…able_scripts/context_enrichment_window_around_chunk.py	5	CODE
LOW	…able_scripts/context_enrichment_window_around_chunk.py	6	CODE
LOW	…g_techniques_runnable_scripts/explainable_retrieval.py	6	CODE
LOW	…g_techniques_runnable_scripts/explainable_retrieval.py	7	CODE
LOW	all_rag_techniques_runnable_scripts/reranking.py	14	CODE
LOW	all_rag_techniques_runnable_scripts/reranking.py	15	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	18	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	29	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	33	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	34	CODE
LOW	all_rag_techniques_runnable_scripts/self_rag.py	10	CODE
LOW	all_rag_techniques_runnable_scripts/self_rag.py	11	CODE
LOW	…ag_techniques_runnable_scripts/hierarchical_indices.py	8	CODE
LOW	…ag_techniques_runnable_scripts/hierarchical_indices.py	8	CODE
LOW	…ag_techniques_runnable_scripts/hierarchical_indices.py	11	CODE
LOW	…ag_techniques_runnable_scripts/hierarchical_indices.py	12	CODE
LOW	all_rag_techniques_runnable_scripts/simple_rag.py	10	CODE
LOW	all_rag_techniques_runnable_scripts/simple_rag.py	11	CODE
LOW	all_rag_techniques_runnable_scripts/fusion_retrieval.py	11	CODE
LOW	all_rag_techniques_runnable_scripts/fusion_retrieval.py	12	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	9	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	10	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	10	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	11	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	17	CODE
LOW	…_rag_techniques_runnable_scripts/adaptive_retrieval.py	18	CODE
LOW	evaluation/evalute_rag.py	16	CODE
LOW	evaluation/evalute_rag.py	17	CODE
LOW	evaluation/evalute_rag.py	19	CODE
LOW	evaluation/evalute_rag.py	34	CODE
LOW	evaluation/evalute_rag.py	34	CODE
LOW	evaluation/evalute_rag.py	34	CODE

Verbosity Indicators: 11 hits · 25 pts

Severity	File	Line	Snippet	Context
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	42	# Step 1: Create a vector store	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	45	# Step 2: Create a retriever	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	48	# Step 3: Initialize language model and create a contextual compressor	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	52	# Step 4: Combine the retriever with the compressor	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	58	# Step 5: Create a QA chain with the compressed retriever	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	88	# Step 1: Determine if retrieval is necessary	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	95	# Step 2: Retrieve relevant documents	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	101	# Step 3: Evaluate relevance of retrieved documents	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	119	# Step 4: Generate response using relevant contexts	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	127	# Step 5: Assess support	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	133	# Step 6: Evaluate utility	COMMENT

Structural Annotation Overuse: 11 hits · 25 pts

Severity	File	Line	Snippet	Context
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	42	# Step 1: Create a vector store	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	45	# Step 2: Create a retriever	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	48	# Step 3: Initialize language model and create a contextual compressor	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	52	# Step 4: Combine the retriever with the compressor	COMMENT
LOW⚡	…_techniques_runnable_scripts/contextual_compression.py	58	# Step 5: Create a QA chain with the compressed retriever	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	88	# Step 1: Determine if retrieval is necessary	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	95	# Step 2: Retrieve relevant documents	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	101	# Step 3: Evaluate relevance of retrieved documents	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	119	# Step 4: Generate response using relevant contexts	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	127	# Step 5: Assess support	COMMENT
LOW⚡	all_rag_techniques_runnable_scripts/self_rag.py	133	# Step 6: Evaluate utility	COMMENT

Hyper-Verbose Identifiers: 21 hits · 20 pts

Severity	File	Line	Snippet	Context
LOW⚡	helper_functions.py	129	def retrieve_context_per_question(question, chunks_query_retriever):	CODE
LOW⚡	helper_functions.py	162	def create_question_answer_from_context_chain(llm):	CODE
LOW⚡	helper_functions.py	186	def answer_question_from_context(question, context, question_answer_from_context_chain):	STRING
LOW	helper_functions.py	294	async def retry_with_exponential_backoff(coroutine, max_retries=5):	STRING
LOW	helper_functions.py	338	def get_langchain_embedding_provider(provider: EmbeddingProvider, model_id: str = None):	STRING
LOW	tests/test_imports.py	8	def execute_imports_from_notebook(notebook_path) -> None:	CODE
LOW	tests/test_imports.py	42	def execute_imports_from_script_files(script_path) -> None:	CODE
LOW	…unnable_scripts/HyPE_Hypothetical_Prompt_Embeddings.py	49	def generate_hypothetical_prompt_embeddings(self, chunk_text):	CODE
LOW	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	23	def evaluate_response_time_and_accuracy(chunk_size, eval_questions, eval_documents, faithfulness_evaluator,	CODE
LOW	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	99	def create_faithfulness_evaluator(self):	CODE
LOW	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	110	def create_relevancy_evaluator(self):	STRING
LOW	…nnable_scripts/HyDe_Hypothetical_Document_Embedding.py	34	def generate_hypothetical_document(self, query):	CODE
LOW	…g_techniques_runnable_scripts/document_augmentation.py	52	def clean_and_filter_questions(questions: List[str]) -> List[str]:	CODE
LOW	…able_scripts/context_enrichment_window_around_chunk.py	17	def split_text_to_chunks_with_indices(text: str, chunk_size: int, chunk_overlap: int) -> List[Document]:	CODE
LOW	…able_scripts/context_enrichment_window_around_chunk.py	38	def retrieve_with_context_overlap(vectorstore, retriever, query: str, num_neighbors: int = 1, chunk_size: int = 200,	CODE
LOW⚡	all_rag_techniques_runnable_scripts/graph_rag.py	331	def _create_answer_check_chain(self):	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	95	def compute_similarity_matrix(self, embeddings):	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	206	def _extract_concepts_and_entities(self, content, llm):	CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	567	def _retrieve_relevant_documents(self, query: str):	CODE
LOW	all_rag_techniques_runnable_scripts/fusion_retrieval.py	20	def encode_pdf_and_get_split_documents(path, chunk_size=1000, chunk_overlap=200):	CODE
LOW	evaluation/evalute_rag.py	40	def create_deep_eval_test_cases(	CODE

Docstring Block Structure: 3 hits · 15 pts

Severity	File	Line	Snippet	Context
HIGH	helper_functions.py	80	Encodes a string into a vector store using OpenAI embeddings. Args: content (str): The text content to	STRING
HIGH	helper_functions.py	295	Retries a coroutine using exponential backoff upon encountering a RateLimitError. Args: coroutine:	STRING
HIGH	helper_functions.py	339	Returns an embedding provider based on the specified provider and model ID. Args: provider (EmbeddingP	STRING

Redundant / Tautological Comments: 4 hits · 7 pts

Severity	File	Line	Snippet	Context
LOW⚡	helper_functions.py	115	# Assign metadata to each chunk	COMMENT
LOW	…_techniques_runnable_scripts/contextual_compression.py	85	# Display the result and the source documents	COMMENT
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	456	# Check if we have a complete answer with the current context	COMMENT
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	497	# Check if we have a complete answer after adding the neighbor's content	COMMENT

Excessive Try-Catch Wrapping: 4 hits · 6 pts

Severity	File	Line	Snippet	Context
LOW⚡	helper_functions.py	123	except Exception as e:	CODE
LOW	tests/test_imports.py	33	except Exception as e:	CODE
LOW	tests/test_imports.py	64	except Exception as e:	CODE
MEDIUM	all_rag_techniques_runnable_scripts/crag.py	119	print("Error parsing search results. Returning empty list.")	CODE

Synthetic Comment Markers: 1 hit · 2 pts

Severity	File	Line	Snippet	Context
HIGH	…l_rag_techniques_runnable_scripts/choose_chunk_size.py	26	Evaluate the average response time, faithfulness, and relevancy of responses generated by GPT-3.5-turbo for a given	STRING

Deep Nesting: 2 hits · 2 pts

Severity	File	Line	Snippet	Context
LOW	tests/test_imports.py	8		CODE
LOW	all_rag_techniques_runnable_scripts/graph_rag.py	363		CODE

AI Structural Patterns: 1 hit · 1 pt

Severity	File	Line	Snippet	Context
LOW	all_rag_techniques_runnable_scripts/crag.py	91		CODE
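
The per-category totals above can be cross-checked against the summary figures with a short tally. The dictionary copies the hit and point counts from the category headings in this report; the variable names are hypothetical.

```python
# category -> (hits, points), copied from the category headings above
categories = {
    "Self-Referential Comments": (24, 79),
    "Modern AI Meta-Vocabulary": (18, 54),
    "Unused Imports": (49, 47),
    "Verbosity Indicators": (11, 25),
    "Structural Annotation Overuse": (11, 25),
    "Hyper-Verbose Identifiers": (21, 20),
    "Docstring Block Structure": (3, 15),
    "Redundant / Tautological Comments": (4, 7),
    "Excessive Try-Catch Wrapping": (4, 6),
    "Synthetic Comment Markers": (1, 2),
    "Deep Nesting": (2, 2),
    "AI Structural Patterns": (1, 1),
}

total_hits = sum(hits for hits, _ in categories.values())
total_points = sum(points for _, points in categories.values())

print(len(categories), total_hits, total_points)  # 12 149 283
```

The 12 categories and 149 hits agree with the figures quoted at the start of Pattern Findings.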
