Repository Analysis

vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Signal: Moderate AI signal (15.4)
Adjusted Score: 15.4
Raw Score: 15.4
Time Factor: 100%
Last Push: 2026-05-30
Stars: 81,402
Language: Python
Lines of Code: 1,362,232
Files: 4907
Pattern Hits: 14571
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 9 · HIGH 584 · MEDIUM 1688 · LOW 12290
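
The figures above can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from this report and confirms that the four severity counts add up to the reported pattern hits. The relation between the raw and adjusted score is not stated in the report; treating the adjusted score as raw score times time factor is an assumption, chosen only because it is consistent with the values shown (15.4 at 100%).

```python
# Figures copied from the report summary above.
severity_counts = {"CRITICAL": 9, "HIGH": 584, "MEDIUM": 1688, "LOW": 12290}
pattern_hits = 14571

# The severity buckets should account for every reported match.
total = sum(severity_counts.values())
print(total == pattern_hits)

# ASSUMPTION: adjusted = raw * time_factor. The report does not document
# its scoring formula; this is merely consistent with the numbers shown.
raw_score = 15.4
time_factor = 1.00  # "100%" in the report
adjusted_score = round(raw_score * time_factor, 1)
print(adjusted_score)
```

LOW findings make up the bulk of the total, so the headline hit count is dominated by naming and style patterns rather than by CRITICAL or HIGH items.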

Pattern Findings

14571 matches across 21 categories.

Hyper-Verbose Identifiers (8193 hits · 8109 pts)
Severity | File | Line | Snippet
LOW | setup.py | 52 | def should_require_rust_frontend() -> bool:
LOW | setup.py | 445 | def fetch_metadata_for_variant(
LOW | setup.py | 479 | def detect_system_cuda_variant() -> str:
LOW | setup.py | 533 | def fetch_wheel_from_pypi_index(index_url: str, package: str = "vllm") -> str:
LOW | setup.py | 688 | def extract_precompiled_and_patch_package(
LOW | setup.py | 799 | def get_base_commit_in_main_branch() -> str:
LOW | csrc/cpu/generate_cpu_attn_dispatch.py | 92 | def generate_cases_for_isa_group(isa_list: list[str], include_fp8: bool = False) -> str:
LOW | csrc/quantization/machete/generate.py | 334 | def generate_type_option_name(kernel_types: TypeConfig):
LOW | csrc/quantization/machete/generate.py | 370 | def unsigned_type_with_bitwidth(num_bits):
LOW | tools/generate_versions_json.py | 86 | def generate_bake_native_json(args: dict[str, str]) -> dict:
LOW | tools/install_nixl_from_source_ubuntu.py | 65 | def install_system_dependencies():
LOW | tools/install_nixl_from_source_ubuntu.py | 105 | def build_and_install_prerequisites(args):
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1046 | def _expand_flash_attn_variants(
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1150 | def parse_cuda_priority_lists() -> dict[str, list[str]]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 300 | def parse_mla_prefill_registry() -> dict[str, str]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 320 | def parse_mla_prefill_priorities() -> dict[str, list[str]]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 386 | def parse_mla_prefill_backend_file(class_path: str) -> dict[str, Any] | None:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 466 | def parse_mla_prefill_backends() -> list[dict[str, Any]]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 866 | def parse_flash_attn_features() -> dict[str, dict[str, Any]]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1003 | def parse_flashinfer_trtllm_features() -> dict[str, dict[str, Any]]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1098 | def _expand_flashinfer_variants(
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1203 | def _get_backends_from_return(stmts: list) -> list[str]:
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1490 | def generate_priority_section(priorities: dict[str, list[str]]) -> str:
LOW | tools/profiler/visualize_layerwise_profile.py | 75 | def shorten_plot_legend_strings(legend, max_char_len: int):
LOW | tools/profiler/visualize_layerwise_profile.py | 94 | def attempt_to_make_names_unique(entries_and_traces):
LOW | tools/profiler/visualize_layerwise_profile.py | 144 | def group_trace_by_operations(trace_df: "pd.DataFrame") -> "pd.DataFrame":
LOW | tools/profiler/nsys_profile_tools/gputrc2graph.py | 45 | def gen_nonoverlapped_sum_from_gputrace(self, in_file, out_file):
LOW | tools/profiler/nsys_profile_tools/gputrc2graph.py | 66 | def sum_non_overlapping_intervals(self, df):
LOW | tools/vllm-rocm/pin_rocm_dependencies.py | 20 | def extract_version_from_wheel(wheel_name: str) -> str:
LOW | tools/vllm-rocm/pin_rocm_dependencies.py | 40 | def get_custom_wheel_versions(install_dir: str) -> dict[str, str]:
LOW | tools/vllm-rocm/pin_rocm_dependencies.py | 94 | def pin_dependencies_in_requirements(requirements_path: str, versions: dict[str, str]):
LOW | tests/test_sequence.py | 9 | def test_sequence_intermediate_tensors_equal():
LOW | tests/test_zen_cpu_platform_detection.py | 35 | def test_is_amd_zen_cpu_returns_false_when_cpuinfo_missing():
LOW | tests/test_version.py | 36 | def test_prev_minor_version_was(version_tuple, version_str, expected):
LOW | tests/test_ray_env_utils.py | 36 | def test_arbitrary_var_propagated(self):
LOW | tests/test_ray_env_utils.py | 42 | def test_worker_specific_excluded(self):
LOW | tests/test_ray_env_utils.py | 50 | def test_non_carry_over_blacklist(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 20 | def test_valueerror_converted_to_bypass(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 30 | def test_original_valueerror_chained(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 44 | def test_non_valueerror_propagates(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 54 | def test_normal_return_preserved(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 76 | def test_sentinel_attribute_set(self):
LOW | tests/test_fxgraphcache_pickle_patch.py | 90 | def test_patch_applied_in_current_environment():
LOW | tests/conftest.py | 217 | def init_test_http_connection():
LOW | tests/conftest.py | 264 | def should_do_global_cleanup_after_test(request) -> bool:
LOW | tests/conftest.py | 738 | def _hidden_states_to_seq_logprobs(
LOW | tests/conftest.py | 761 | def _hidden_states_to_logprobs(
LOW | tests/conftest.py | 788 | def generate_greedy_logprobs_limit(
LOW | tests/conftest.py | 1027 | def _final_steps_generate_w_logprobs(
LOW | tests/conftest.py | 1137 | def generate_prompt_perplexity(
LOW | tests/conftest.py | 1250 | def _wait_for_rocm_memory_release(self, gpu_memory_utilization: float) -> None:
LOW | tests/conftest.py | 1295 | def temporary_enable_log_propagate():
LOW | tests/conftest.py | 1499 | def pytest_collection_modifyitems(config, items):
LOW | tests/conftest.py | 1516 | def cli_config_file_with_model():
LOW | tests/conftest.py | 1654 | def clean_gpu_memory_between_tests():
LOW | tests/test_inputs.py | 21 | def test_preprocessor_always_mm_code_path(model_id, prompt):
LOW | tests/test_ray_env.py | 54 | def test_pythonhashseed_in_result(self):
LOW | tests/test_ray_env.py | 118 | def test_worker_specific_host_vars_are_excluded(self):
LOW | tests/test_ray_env.py | 128 | def test_non_carry_over_blacklist(self):
LOW | tests/test_ray_env.py | 142 | def test_additional_vars_passthrough(self):
8133 more matches not shown…
Decorative Section Separators (1222 hits · 4246 pts)
Severity | File | Line | Snippet
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1041 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1043 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1145 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1147 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 24 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 26 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 75 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 77 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 263 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 265 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 536 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 538 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 822 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 824 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1283 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1289 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1382 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1384 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1618 | # ---------------------------------------------------------------------------
MEDIUM | tools/pre_commit/generate_attention_backend_docs.py | 1620 | # ---------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 106 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 108 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 117 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 121 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 135 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 138 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 151 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 157 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 65 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 67 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 78 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 81 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 93 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 95 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 175 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 179 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 192 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 199 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 214 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 217 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 240 | # -----------------------------------------------------------------------------
MEDIUM | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 243 | # -----------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 11 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 13 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 46 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 48 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 61 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 63 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 97 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 99 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 133 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 135 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 147 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_ray_env.py | 149 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 21 | # ------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 23 | # ------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 39 | # ------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 41 | # ------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 172 | # ------------------------------------------------------------------
MEDIUM | tests/test_jit_monitor.py | 174 | # ------------------------------------------------------------------
1162 more matches not shown…
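
The report does not say how separator lines are matched. As a rough illustration only, the sketch below flags comment lines that consist of nothing but a long run of one decorative character, which would catch the `# -----` banners listed above. The regular expression, the 20-character threshold, and the name `find_separator_lines` are all assumptions, not the scanner's actual rule.

```python
import re

# HYPOTHETICAL rule: a comment marker (# or //) followed only by a run of
# at least 20 identical decorative characters. The real scanner's pattern
# is not documented in this report.
SEPARATOR_RE = re.compile(r"^\s*(?:#|//)\s*([-=#*~])\1{19,}\s*$")


def find_separator_lines(text: str) -> list[int]:
    """Return 1-based line numbers that look like decorative separators."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SEPARATOR_RE.match(line)
    ]


# Small sample mimicking the banner style seen in the findings.
sample = "\n".join(
    [
        "# " + "-" * 75,
        "# Helpers",
        "# " + "-" * 75,
        "def helper():",
        "    return 1",
    ]
)
hits = find_separator_lines(sample)
print(hits)
```

Banners of this kind come in pairs around a section title, which matches the way the line numbers above cluster two apart (24 and 26, 75 and 77, and so on).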
Cross-File Repetition (489 hits · 2445 pts)
Severity | File | Line | Snippet
HIGH | tests/test_envs.py | 0 | test that callable choices raise error for invalid values.
HIGH | tests/test_envs.py | 0 | test that callable choices raise error for invalid values.
HIGH | tests/test_envs.py | 0 | test that callable choices raise error for invalid values.
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however mocks are used to ensu
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however mocks are used to ensu
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however mocks are used to ensu
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however it fails before any ch
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however it fails before any ch
HIGH | tests/test_logger.py | 0 | this test calls _configure_vllm_root_logger again to test custom logging config behavior, however it fails before any ch
HIGH | tests/v1/attention/test_mla_prefill_selector.py | 0 | clear lru cache to ensure each test case runs without caching.
HIGH | tests/kernels/attention/test_attention_selector.py | 0 | clear lru cache to ensure each test case runs without caching.
HIGH | tests/kernels/attention/test_mha_attn.py | 0 | clear lru cache to ensure each test case runs without caching.
HIGH | tests/kernels/attention/test_rocm_attention_selector.py | 0 | clear lru cache to ensure each test case runs without caching.
HIGH | tests/v1/logits_processors/utils.py | 0 | fake logit processor to support unit testing and examples
HIGH | docs/features/custom_logitsprocs.md | 0 | fake logit processor to support unit testing and examples
HIGH | examples/features/logits_processor/custom.py | 0 | fake logit processor to support unit testing and examples
HIGH | tests/v1/logits_processors/utils.py | 0 | the request-level logits processor masks out all logits except the token id identified by `target_token`
HIGH | docs/features/custom_logitsprocs.md | 0 | the request-level logits processor masks out all logits except the token id identified by `target_token`
HIGH | examples/features/logits_processor/custom_req.py | 0 | the request-level logits processor masks out all logits except the token id identified by `target_token`
HIGH | examples/features/logits_processor/custom_req_init.py | 0 | the request-level logits processor masks out all logits except the token id identified by `target_token`
HIGH | tests/v1/logits_processors/utils.py | 0 | example of wrapping a fake request-level logit processor to create a batch-level logits processor
HIGH | docs/features/custom_logitsprocs.md | 0 | example of wrapping a fake request-level logit processor to create a batch-level logits processor
HIGH | examples/features/logits_processor/custom_req.py | 0 | example of wrapping a fake request-level logit processor to create a batch-level logits processor
HIGH | tests/v1/logits_processors/utils.py | 0 | this method returns a new request-level logits processor, customized to the `target_token` value associated with a parti
HIGH | docs/features/custom_logitsprocs.md | 0 | this method returns a new request-level logits processor, customized to the `target_token` value associated with a parti
HIGH | examples/features/logits_processor/custom_req.py | 0 | this method returns a new request-level logits processor, customized to the `target_token` value associated with a parti
HIGH | …s/v1/kv_connector/nixl_integration/toy_proxy_server.py | 0 | lifespan context manager to handle startup and shutdown events.
HIGH | …cache/disagg_prefill_lmcache_v1/disagg_proxy_server.py | 0 | lifespan context manager to handle startup and shutdown events.
HIGH | …regated/mooncake_connector/mooncake_connector_proxy.py | 0 | lifespan context manager to handle startup and shutdown events.
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | <seed:think>the user\'s current thinking budget is 512.</seed:cot_budget_reflect>\nlet me analyze the
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | <seed:think>the user\'s current thinking budget is 512.</seed:cot_budget_reflect>\nlet me analyze the
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | <seed:think>the user\'s current thinking budget is 512.</seed:cot_budget_reflect>\nlet me analyze the
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | <seed:think>the user\'s current thinking budget is 512.</seed:cot_budget_reflect>\nlet me analyze the
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | question. the user wants to know the weather in barcelona, spain. looking at the functions available,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | question. the user wants to know the weather in barcelona, spain. looking at the functions available,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | question. the user wants to know the weather in barcelona, spain. looking at the functions available,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | question. the user wants to know the weather in barcelona, spain. looking at the functions available,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | there\'s a get_weather function that can retrieve the current temperature for a given location. \n\nfirst,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | there\'s a get_weather function that can retrieve the current temperature for a given location. \n\nfirst,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | there\'s a get_weather function that can retrieve the current temperature for a given location. \n\nfirst,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | there\'s a get_weather function that can retrieve the current temperature for a given location. \n\nfirst,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | check the parameters required by get_weather: location is mandatory (needs city and country), and unit is
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | check the parameters required by get_weather: location is mandatory (needs city and country), and unit is
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | check the parameters required by get_weather: location is mandatory (needs city and country), and unit is
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | check the parameters required by get_weather: location is mandatory (needs city and country), and unit is
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | optional. the user provided "barcelona spain" as the location, which fits the required format (city,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | optional. the user provided "barcelona spain" as the location, which fits the required format (city,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | optional. the user provided "barcelona spain" as the location, which fits the required format (city,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | optional. the user provided "barcelona spain" as the location, which fits the required format (city,
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | country). \n<seed:cot_budget_reflect>i have used 131 tokens, and there are 381 tokens remaining for use.
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | country). \n<seed:cot_budget_reflect>i have used 131 tokens, and there are 381 tokens remaining for use.
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | country). \n<seed:cot_budget_reflect>i have used 131 tokens, and there are 381 tokens remaining for use.
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | country). \n<seed:cot_budget_reflect>i have used 131 tokens, and there are 381 tokens remaining for use.
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | </seed:cot_budget_reflect>\n since the unit isn\'t specified, the function will default to celsius, which
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | </seed:cot_budget_reflect>\n since the unit isn\'t specified, the function will default to celsius, which
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | </seed:cot_budget_reflect>\n since the unit isn\'t specified, the function will default to celsius, which
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | </seed:cot_budget_reflect>\n since the unit isn\'t specified, the function will default to celsius, which
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | is fine. \n\nthere\'s no need to ask for more information because the location is clear. so i should call
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | is fine. \n\nthere\'s no need to ask for more information because the location is clear. so i should call
HIGH | tests/tool_parsers/test_seed_oss_tool_parser.py | 0 | is fine. \n\nthere\'s no need to ask for more information because the location is clear. so i should call
429 more matches not shown…
Unused Imports (1159 hits · 1088 pts)
Severity | File | Line | Snippet
LOW | …c/cutlass_extensions/vllm_cutlass_library_extension.py | 6 |
LOW | tests/conftest.py | 75 |
LOW | tests/conftest.py | 75 |
LOW | tests/conftest.py | 76 |
LOW | tests/utils.py | 63 |
LOW | tests/utils.py | 1951 |
LOW | tests/v1/attention/test_mla_backends.py | 69 |
LOW | tests/v1/attention/test_attention_backends.py | 46 |
LOW | tests/v1/logits_processors/utils.py | 13 |
LOW | tests/v1/cudagraph/test_cudagraph_mode.py | 38 |
LOW | tests/v1/cudagraph/test_breakable_cudagraph.py | 7 |
LOW | tests/v1/kv_connector/unit/test_tp_mapping.py | 10 |
LOW | tests/v1/kv_connector/unit/test_hf3fs_client.py | 17 |
LOW | tests/v1/kv_connector/unit/test_hf3fs_client.py | 17 |
LOW | tests/v1/kv_connector/unit/test_hf3fs_client.py | 17 |
LOW | tests/v1/kv_connector/unit/test_hf3fs_client.py | 17 |
LOW | tests/v1/kv_connector/unit/test_hf3fs_client.py | 17 |
LOW | …/v1/kv_connector/unit/offloading_connector/conftest.py | 3 |
LOW | tests/v1/spec_decode/test_backup_token_async_spec.py | 9 |
LOW | tests/v1/sample/test_topk_topp_sampler.py | 27 |
LOW | tests/v1/engine/conftest.py | 20 |
LOW | tests/v1/engine/conftest.py | 20 |
LOW | tests/v1/simple_kv_offload/test_scheduler.py | 5 |
LOW | tests/tool_use/test_gemma4_responses_adjust_request.py | 25 |
LOW | tests/renderers/test_chat_utils_prompt_embeds.py | 6 |
LOW | tests/kernels/core/test_vit_fp8_attn.py | 20 |
LOW | tests/kernels/core/test_fused_q_kv_rmsnorm.py | 11 |
LOW | tests/kernels/ir/test_ir_ops.py | 11 |
LOW | tests/kernels/ir/test_layernorm.py | 7 |
LOW | tests/kernels/mamba/test_ssu_dispatch.py | 24 |
LOW | tests/kernels/moe/test_moe.py | 18 |
LOW | tests/distributed/test_eplb_spec_decode.py | 3 |
LOW | …add_dummy_platform/vllm_add_dummy_platform/__init__.py | 10 |
LOW | tests/cuda/scripts/check_device_count_respects_env.py | 14 |
LOW | tests/model_executor/test_oink_integration.py | 37 |
LOW | tests/models/language/pooling/embed_utils.py | 8 |
LOW | tests/models/language/pooling/test_reward.py | 18 |
LOW | tests/models/multimodal/generation/test_pixtral.py | 24 |
LOW | tests/vllm_test_utils/vllm_test_utils/__init__.py | 8 |
LOW | tests/vllm_test_utils/vllm_test_utils/__init__.py | 8 |
LOW | tests/vllm_test_utils/vllm_test_utils/__init__.py | 9 |
LOW | tests/vllm_test_utils/vllm_test_utils/__init__.py | 9 |
LOW | tests/compile/test_compile_ranges.py | 10 |
LOW | tests/compile/test_config.py | 32 |
LOW | tests/compile/test_structured_logging.py | 10 |
LOW | tests/compile/test_graph_partition.py | 21 |
LOW | tests/compile/test_decorator.py | 21 |
LOW | tests/compile/fullgraph/test_toy_llama.py | 34 |
LOW | tests/compile/fullgraph/test_multiple_graphs.py | 28 |
LOW | tests/compile/passes/ir/test_lowering.py | 7 |
LOW | …ranscription/test_transcription_inter_chunk_spacing.py | 11 |
LOW | tests/entrypoints/openai/test_dp_supervisor.py | 17 |
LOW | tests/entrypoints/openai/responses/conftest.py | 3 |
LOW | tests/entrypoints/openai/responses/test_mcp_tools.py | 5 |
LOW | tests/entrypoints/openai/responses/test_harmony.py | 5 |
LOW | tests/standalone_tests/lazy_imports.py | 22 |
LOW | vllm/__init__.py | 7 |
LOW | vllm/__init__.py | 7 |
LOW | vllm/__init__.py | 14 |
LOW | vllm/__init__.py | 42 |
1099 more matches not shown…
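
How the scanner decides that an import is unused is not described in this report. One common approach, sketched below with Python's standard `ast` module, is to collect every name bound by an import statement and report those that never appear as a plain name elsewhere in the module. The function name `unused_imports` is hypothetical, and the check is deliberately naive: it ignores `__all__`, string annotations, and imports kept purely for re-export.

```python
import ast


def unused_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for imported names never referenced.

    Naive sketch: it does not understand __all__, re-exports, or names
    used only inside string annotations.
    """
    tree = ast.parse(source)
    imported: dict[str, int] = {}  # bound name -> line of the import
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = node.lineno

    # Every identifier that is read or written as a bare name.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(
        (line, name) for name, line in imported.items() if name not in used
    )


sample = "import os\nimport sys\nfrom typing import Any\n\nprint(sys.argv)\n"
result = unused_imports(sample)
print(result)
```

A check of this shape reports several names from the same import line separately, which would explain repeated file and line pairs in the table above. It would also flag package `__init__.py` files that import names only to re-export them, so entries such as `vllm/__init__.py` may be intentional rather than dead code.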
Self-Referential Comments (322 hits · 1028 pts)
Severity | File | Line | Snippet
MEDIUM | tools/install_deepgemm.sh | 83 | # Create a temporary directory for the build
MEDIUM | tools/report_build_time_ninja.py | 201 | # Create a list that is in order by time stamp and has entries for the
MEDIUM | tools/pre_commit/update-dockerfile-graph.sh | 26 | # Define the target file path
MEDIUM | tests/conftest.py | 551 | # Create a copy to avoid modifying the original dict
MEDIUM | tests/utils.py | 727 | # Create a dedicated process group so we can kill
MEDIUM | tests/utils.py | 1493 | # Create a unique temporary file to store exception info from child
MEDIUM | tests/test_access_log_filter.py | 259 | # Create a logger with our filter (simulating uvicorn.access)
MEDIUM | tests/test_access_log_filter.py | 266 | # Create a custom handler that tracks messages
MEDIUM | tests/test_config.py | 694 | # Create a new mock and run the method with the same S3 URL
MEDIUM | tests/test_logger.py | 274 | # Create a mock logger to capture log calls
MEDIUM | tests/v1/test_tensor_ipc_queue.py | 193 | # Create a CPU tensor
MEDIUM | tests/v1/test_tensor_ipc_queue.py | 511 | # Create a CPU tensor
MEDIUM | tests/v1/test_tensor_ipc_queue.py | 642 | # Create a CPU tensor
MEDIUM | tests/v1/test_tensor_ipc_queue.py | 905 | # Create a tensor queue
MEDIUM | tests/v1/test_serial_utils.py | 189 | # Create a sample Python object
MEDIUM | tests/v1/test_serial_utils.py | 207 | # Create a sample tensor
MEDIUM | tests/v1/test_serial_utils.py | 227 | # Create a sample numpy array
MEDIUM | tests/v1/test_serial_utils.py | 313 | # Create a request with a non-multimodal tensor
MEDIUM | tests/v1/test_serial_utils.py | 354 | # Create a request with None for the tensor field
MEDIUM | tests/v1/kv_offload/test_file_mapper.py | 43 | # Create a copy of the mock config to avoid modifying the global one
MEDIUM | tests/v1/metrics/test_ray_metrics.py | 58 | # Create the actor and call the async method
MEDIUM | tests/v1/attention/test_mla_backends.py | 279 | # Create a realistic slot mapping that corresponds to the block table
MEDIUM | tests/v1/attention/test_mla_backends.py | 1198 | # Create a summary for the single-line failure message
MEDIUM | tests/v1/attention/test_attention_backends.py | 187 | # Create a realistic slot mapping that corresponds to the block table
MEDIUM | tests/v1/logits_processors/test_correctness.py | 807 | # Define a shuffled batch of requests which individually use a different
MEDIUM | tests/v1/logits_processors/test_custom_offline.py | 28 | # Create a mixture of requests which do and don't utilize the dummy logitproc
MEDIUM | tests/v1/logits_processors/test_custom_offline.py | 63 | # Create a vLLM instance and load custom logitproc
MEDIUM | tests/v1/logits_processors/test_custom_offline.py | 70 | # Create a reference vLLM instance without custom logitproc
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 252 | # Create a list of KVCacheBlock objects
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 255 | # Create a FreeKVCacheBlockQueue with these blocks
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 420 | # Create a list of KVCacheBlock objects
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 423 | # Create a FreeKVCacheBlockQueue with these blocks
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 298 | # Create an empty FreeKVCacheBlockQueue with these blocks
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 346 | # Create an empty FreeKVCacheBlockQueue
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 363 | # Create an empty FreeKVCacheBlockQueue with these blocks
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 1315 | # Create a VllmConfig
MEDIUM | tests/v1/core/test_kv_cache_utils.py | 1351 | # Create a VllmConfig
MEDIUM | tests/v1/core/test_scheduler.py | 2638 | # Create a request and schedule it
MEDIUM | tests/v1/core/test_scheduler.py | 2665 | # Create a high priority request and schedule it
MEDIUM | tests/v1/core/test_scheduler.py | 3424 | # Create a request and schedule it (and to be preempted)
MEDIUM | tests/v1/core/test_scheduler.py | 3474 | # Create a high priority request and schedule it
MEDIUM | tests/v1/core/test_scheduler.py | 4263 | # Create a text-only request (no mm_features).
MEDIUM | tests/v1/cudagraph/test_cudagraph_dispatch.py | 55 | # Create a real LoRAConfig with specialize_active_lora enabled
MEDIUM | tests/v1/kv_connector/unit/test_nixl_connector.py | 1912 | # Create a request that triggers do_remote_decode so that
MEDIUM | tests/v1/kv_connector/unit/test_lmcache_connector.py | 216 | # Create a mock object that is not LMCacheKVEvents
MEDIUM | tests/v1/kv_connector/unit/test_moriio_connector.py | 173 | # Define a fake remote engine id for testing
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 145 | # Create a request with multiple blocks worth of tokens
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 189 | # Create a request
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 211 | # Create a request with just 1 token
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 229 | # Create a request with 2 tokens
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 255 | # Create a request with many blocks
MEDIUM | …ts/v1/kv_connector/unit/test_decode_bench_connector.py | 338 | # Create a request that doesn't align to block boundaries
MEDIUM | tests/v1/kv_connector/unit/test_nixl_connector_hma.py | 75 | # Create a mock worker with just the required attributes
MEDIUM | tests/v1/kv_connector/unit/test_example_connector.py | 146 | # Create the LLM instance
MEDIUM | …r/extract_hidden_states_integration/test_extraction.py | 48 | # Create a minimal Llama config with small dimensions
MEDIUM | …r/extract_hidden_states_integration/test_extraction.py | 63 | # Create a simple tokenizer
MEDIUM | tests/v1/determinism/test_batch_invariance.py | 102 | # Create a batch of size `max_batch_size` and insert the needle at
MEDIUM | tests/v1/distributed/test_external_lb_dp.py | 154 | # Create a client for each server
MEDIUM | tests/v1/distributed/test_hybrid_lb_dp.py | 182 | # Create a client for each node (each node has its own API endpoint)
MEDIUM | tests/v1/streaming_input/test_async_llm_streaming.py | 20 | # Create a minimal mock without initializing the full engine
262 more matches not shown…
Deep Nesting (1055 hits · 966 pts)
Severity | File | Line | Snippet
LOW | use_existing_torch.py | 21 |
LOW | setup.py | 943 |
LOW | setup.py | 987 |
LOW | setup.py | 173 |
LOW | setup.py | 589 |
LOW | setup.py | 688 |
LOW | csrc/cpu/generate_cpu_attn_dispatch.py | 92 |
LOW | csrc/quantization/marlin/generate_kernels.py | 173 |
LOW | csrc/moe/marlin_moe_wna16/generate_kernels.py | 173 |
LOW | tools/report_build_time_ninja.py | 151 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 115 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 140 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 161 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 206 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 320 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 386 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 550 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 594 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 866 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1150 |
LOW | tools/pre_commit/generate_attention_backend_docs.py | 1203 |
LOW | tools/pre_commit/validate_config.py | 73 |
LOW | tools/pre_commit/check_boolean_context_manager.py | 21 |
LOW | tools/pre_commit/check_spdx_header.py | 27 |
LOW | tools/pre_commit/check_spdx_header.py | 65 |
LOW | tools/pre_commit/check_spdx_header.py | 109 |
LOW | tools/vllm-rocm/pin_rocm_dependencies.py | 40 |
LOW | tools/vllm-rocm/pin_rocm_dependencies.py | 94 |
LOW | tests/conftest.py | 525 |
LOW | tests/conftest.py | 941 |
LOW | tests/utils.py | 1150 |
LOW | tests/utils.py | 1484 |
LOW | tests/utils.py | 1610 |
LOW | tests/utils.py | 535 |
LOW | tests/utils.py | 1490 |
LOW | tests/utils.py | 1633 |
LOW | tests/v1/utils.py | 12 |
LOW | tests/v1/kv_offload/cpu/test_gpu_worker.py | 223 |
LOW | tests/v1/attention/test_mla_backends.py | 721 |
LOW | tests/v1/attention/test_sparse_mla_backends.py | 531 |
LOW | tests/v1/logits_processors/test_correctness.py | 304 |
LOW | tests/v1/logits_processors/test_correctness.py | 456 |
LOW | tests/v1/core/test_scheduler.py | 1923 |
LOW | tests/v1/core/utils.py | 176 |
LOW | tests/v1/kv_connector/unit/test_nixl_connector.py | 2561 |
LOW | tests/v1/kv_connector/unit/utils.py | 317 |
LOW | tests/v1/kv_connector/unit/test_mooncake_connector.py | 315 |
LOW | tests/v1/kv_connector/unit/test_mooncake_connector.py | 704 |
LOW | tests/v1/determinism/test_batch_invariance.py | 27 |
LOW | tests/v1/determinism/test_batch_invariance.py | 154 |
LOW | tests/v1/determinism/test_batch_invariance.py | 645 |
LOW | tests/v1/determinism/test_nvfp4_batch_invariant.py | 45 |
LOW | tests/v1/spec_decode/test_eagle.py | 48 |
LOW | tests/v1/spec_decode/test_acceptance_length.py | 178 |
LOW | tests/v1/spec_decode/test_acceptance_length.py | 228 |
LOW | tests/v1/sample/test_logprobs.py | 129 |
LOW | tests/v1/sample/test_logprobs.py | 494 |
LOW | tests/v1/sample/test_logprobs.py | 866 |
LOW | tests/v1/sample/test_logprobs.py | 910 |
LOW | tests/v1/sample/test_logprobs.py | 956 |
995 more matches not shown…
Over-Commented Block: 785 hits · 716 pts
Severity File Line Snippet
LOW CMakeLists.txt 1 cmake_minimum_required(VERSION 3.26)
LOW CMakeLists.txt 41 set(PYTHON_SUPPORTED_VERSIONS "3.10" "3.11" "3.12" "3.13" "3.14")
LOW CMakeLists.txt 1081
LOW csrc/torch_utils.h 1 #pragma once
LOW csrc/torch_bindings.cpp 1 // Provides torch::Tensor for ops.h (previously included transitively via
LOW csrc/torch_bindings.cpp 61 "Tensor q_in, Tensor kv, Tensor! k_cache, "
LOW csrc/cumem_allocator_compat.h 101 } // extern "C"
LOW csrc/cub_helpers.h 1 #pragma once
LOW csrc/launch_bounds_utils.h 1 #pragma once
LOW csrc/cumem_allocator.cpp 1 // A CUDAPluggableAllocator based on cumem* APIs.
LOW csrc/cuda_compat.h 41 #define VLLM_LDG(arg) __ldg(arg)
LOW csrc/cuda_compat.h 61 #endif
LOW csrc/cuda_utils.h 1 #pragma once
LOW csrc/spinloop.cpp 1 #include <Python.h>
LOW csrc/attention/attention_dtypes.h 1 #pragma once
LOW csrc/core/scalar_type.hpp 1 #pragma once
LOW csrc/core/registration.h 1 #pragma once
LOW csrc/cpu/cpu_attn_neon.hpp 1 #ifndef CPU_ATTN_NEON_HPP
LOW csrc/cpu/utils.cpp 1 #ifndef VLLM_NUMA_DISABLED
LOW csrc/cpu/cpu_fused_moe.cpp 1 #include "cpu/cpu_types.hpp"
LOW csrc/cpu/cpu_types.hpp 1 #ifndef CPU_TYPES_HPP
LOW csrc/cpu/cpu_types.hpp 21 #include "cpu_types_scalar.hpp"
LOW csrc/cpu/cpu_types_riscv.hpp 1 #ifndef CPU_TYPES_RISCV_HPP
LOW csrc/cpu/cpu_attn_impl.hpp 901 static constexpr int64_t head_dim = attention_impl_t::HeadDim;
LOW csrc/cpu/cpu_attn_impl.hpp 921 // BlockSizeAlignment
LOW csrc/cpu/cpu_attn_impl.hpp 1741 blocksize_alignment);
LOW csrc/cpu/cpu_attn_impl.hpp 1761 float* curr_partial_q_buffer =
LOW csrc/cpu/cpu_attn_vxe.hpp 381 } // namespace cpu_attention
LOW csrc/cpu/generate_cpu_attn_dispatch.py 141 #ifdef CPU_CAPABILITY_AMXBF16
LOW csrc/cpu/cpu_arch_macros.h 61 #endif
LOW csrc/cpu/cpu_arch_macros.h 161 #include <riscv_vector.h>
LOW csrc/cpu/cpu_attn_rvv.hpp 1 // SPDX-License-Identifier: Apache-2.0
LOW csrc/cpu/cpu_attn_fp8.hpp 1 // SPDX-License-Identifier: Apache-2.0
LOW csrc/cpu/cpu_types_vxe.hpp 1
LOW csrc/cpu/cpu_types_vxe.hpp 21
LOW csrc/cpu/cpu_types_x86.hpp 1
LOW csrc/cpu/shm.cpp 1 #include "cpu/cpu_types.hpp"
LOW csrc/cpu/cpu_types_riscv_impl.hpp 1 #ifndef CPU_TYPES_RISCV_IMPL_HPP
LOW csrc/cpu/cpu_types_riscv_impl.hpp 921 #define CPU_KERNEL_GUARD_IN(NAME)
LOW csrc/cpu/cpu_types_riscv_defs.hpp 1 #ifndef CPU_TYPES_RISCV_DEFS_HPP
LOW csrc/cpu/cpu_types_riscv_defs.hpp 21 #define LMUL_256 m1
LOW csrc/cpu/cpu_types_arm.hpp 21 #define VLLM_DISPATCH_CASE_FLOATING_TYPES(...) \
LOW csrc/cpu/cpu_attn_vsx.hpp 1 // SPDX-License-Identifier: Apache-2.0
LOW csrc/cpu/sgl-kernels/gemm.cpp 81 constexpr int BLOCK_N = block_size_n();
LOW csrc/cpu/sgl-kernels/fla.cpp 1 // Adapted from
LOW csrc/cpu/sgl-kernels/gemm.h 1 // Adapted from
LOW csrc/cpu/sgl-kernels/vec.h 401 __m512i vec_zero = _mm512_setzero_epi32();
LOW csrc/cpu/sgl-kernels/moe.cpp 1 // Adapted from
LOW csrc/cpu/sgl-kernels/moe.cpp 21 // allocates 2 intermediate_caches instead of 3
LOW csrc/cpu/sgl-kernels/moe.cpp 1261 // unlike triton kernel, we fuse silu with gemm1 so only need 2 intermediate_caches:
LOW csrc/cpu/sgl-kernels/moe_int4.cpp 61 // num_threads * BLOCK_M * K +
LOW csrc/cpu/sgl-kernels/common.h 1 // Adapted from
LOW csrc/cpu/sgl-kernels/common.h 201 }
LOW csrc/cpu/sgl-kernels/common.h 281 return std::max(1, (actual_nth >> 1) * 2);
LOW csrc/cpu/sgl-kernels/conv.cpp 141 Unroll<ROWS * COLS>{}(loadb);
LOW csrc/cpu/sgl-kernels/conv.cpp 641 seqlen,
LOW csrc/libtorch_stable/torch_utils.h 1 #pragma once
LOW csrc/libtorch_stable/torch_bindings.cpp 1 #include "ops.h"
LOW …e/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp 41
LOW csrc/libtorch_stable/mamba/selective_scan.h 1 /******************************************************************************
725 more matches not shown…
Excessive Try-Catch Wrapping: 624 hits · 677 pts
Severity File Line Snippet
MEDIUM setup.py 110 def find_tcmalloc() -> Path | None:
MEDIUM setup.py 799 def get_base_commit_in_main_branch() -> str:
LOW setup.py 118 except Exception:
LOW setup.py 207 except Exception as e:
LOW setup.py 497 except Exception:
LOW setup.py 508 except Exception:
LOW setup.py 635 except Exception as e:
LOW setup.py 853 except Exception as err:
LOW setup.py 924 except Exception:
MEDIUM tools/generate_cmake_presets.py 168 print(f"Error writing file: {e}")
LOW tools/install_nixl_from_source_ubuntu.py 30 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 311 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 332 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 401 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 743 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 757 except Exception as e:
LOW tools/pre_commit/generate_attention_backend_docs.py 840 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 877 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 1014 except Exception:
LOW tools/pre_commit/generate_attention_backend_docs.py 1171 except Exception:
LOW tools/profiler/nsys_profile_tools/gputrc2graph.py 239 except Exception:
LOW tools/vllm-rocm/pin_rocm_dependencies.py 79 except Exception as e:
LOW tests/conftest.py 1281 except Exception:
LOW tests/conftest.py 1543 except Exception as e:
MEDIUM tests/utils.py 117 def _nvml():
LOW tests/utils.py 81 except Exception as e:
LOW tests/utils.py 274 except Exception:
LOW tests/utils.py 555 except Exception as e:
LOW tests/utils.py 641 except Exception:
LOW tests/utils.py 1522 except Exception as e:
LOW tests/utils.py 1545 except Exception:
LOW tests/utils.py 1739 except Exception as e:
LOW tests/utils.py 1954 except Exception:
LOW tests/v1/test_tensor_ipc_queue.py 83 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 120 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 308 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 409 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 443 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 532 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 570 except Exception as e:
LOW tests/v1/test_tensor_ipc_queue.py 721 except Exception as e:
LOW tests/v1/utils.py 64 except Exception as e:
LOW tests/v1/kv_offload/cpu/test_shared_offload_region.py 164 except Exception as e:
LOW tests/v1/kv_offload/cpu/test_shared_offload_region.py 123 except Exception as e:
LOW tests/v1/shutdown/test_forward_error.py 81 except Exception as e:
LOW tests/v1/shutdown/test_processor_error.py 39 except Exception as e:
LOW tests/v1/kv_connector/unit/test_hf3fs_client.py 28 except Exception:
LOW tests/v1/kv_connector/unit/test_multi_connector.py 383 except Exception as e:
LOW tests/v1/kv_connector/unit/utils.py 352 except Exception as e:
LOW …/v1/kv_connector/unit/test_mooncake_store_connector.py 575 except Exception:
LOW …/v1/kv_connector/unit/test_mooncake_store_connector.py 612 except Exception:
LOW tests/v1/kv_connector/unit/test_rixl_gpu_mem_diag.py 37 except Exception:
LOW …s/v1/kv_connector/nixl_integration/toy_proxy_server.py 253 except Exception as e:
MEDIUM …s/v1/kv_connector/nixl_integration/toy_proxy_server.py 258 print(f"Error occurred in disagg prefill proxy server - {api} endpoint")
MEDIUM …/kv_connector/nixl_integration/test_disagg_accuracy.py 159 print(f"Error writing to file: {e}")
MEDIUM …/kv_connector/nixl_integration/test_disagg_accuracy.py 168 print(f"Error writing to file: {e}")
LOW tests/v1/determinism/test_online_batch_invariance.py 42 except Exception as e: # pragma: no cover
LOW tests/v1/distributed/test_external_lb_dp.py 91 except Exception as e:
LOW tests/v1/distributed/test_external_lb_dp.py 118 except Exception as e:
MEDIUM tests/v1/distributed/test_external_lb_dp.py 119 print(f"Error stopping servers: {e}")
564 more matches not shown…
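For readers triaging this category, the sketch below contrasts the shape of the flagged snippets (a catch-all `except Exception` followed by a `print`) with a narrower handler. It assumes the scanner keys on catch-all handlers, which the snippets above suggest but the report does not state; `read_config_broad` and `read_config_narrow` are hypothetical names and do not come from vLLM.

```python
import json
import logging

logger = logging.getLogger(__name__)


def read_config_broad(path: str) -> dict:
    """Shape resembling the flagged snippets: catch everything, print, continue."""
    try:
        with open(path) as f:
            return json.load(f)
    except Exception as e:  # a missing file and malformed JSON look identical here
        print(f"Error reading file: {e}")
        return {}


def read_config_narrow(path: str) -> dict:
    """Narrower variant: handle only the expected failure, let the rest surface."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning("config %s not found, using defaults", path)
        return {}
```

With the narrow variant, malformed JSON raises `json.JSONDecodeError` at the call site instead of being silently converted into an empty dictionary. Whether that is preferable depends on the caller; in test teardown code a catch-all can be a deliberate choice.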
Redundant / Tautological Comments: 312 hits · 482 pts
Severity File Line Snippet
LOW setup.py 821 # Check if the upstream_main_commit exists in the local repo
LOW tools/install_torchcodec_rocm.sh 15 # Check if torchcodec is already installed and working
LOW tools/pre_commit/generate_attention_backend_docs.py 348 # Check if it's a capability.major == 10 check (Blackwell)
LOW tools/pre_commit/generate_attention_backend_docs.py 766 # Check if this is an MLA backend by parent class or naming
LOW tools/pre_commit/generate_attention_backend_docs.py 1188 # Check if this is the "if use_mla:" branch
LOW tools/pre_commit/check_forbidden_imports.py 99 # Check if it's allowed
LOW tools/pre_commit/update-dockerfile-graph.sh 10 # Check if docker/Dockerfile is among the provided files
LOW tools/pre_commit/update-dockerfile-graph.sh 14 # Check if Docker is installed and running
LOW tools/pre_commit/update-dockerfile-graph.sh 71 # Check if the graph has changed
LOW tools/vllm-rocm/pin_rocm_dependencies.py 148 # Check if this line is for one of our custom packages
LOW tests/conftest.py 385 # Set this to avoid hanging issue
LOW tests/conftest.py 897 # Set this to avoid hanging issue
LOW tests/utils.py 503 os.kill(spid, 0) # Check if still alive
LOW tests/test_config.py 478 # Check if LONGCHAT_ROPE_PARAMETERS entries are in longchat_model_config
LOW tests/test_logger.py 358 # Set max_log_len to 10
LOW tests/v1/attention/test_mla_backends.py 830 # Set num_speculative_tokens to query_len - 1
LOW tests/v1/attention/test_sparse_mla_backends.py 611 # Set some to -1 to test masking
LOW tests/v1/attention/test_sparse_mla_backends.py 615 # Set some to out of bounds
LOW tests/v1/attention/test_sparse_mla_backends.py 671 # Set some to -1 to test masking
LOW tests/v1/attention/test_sparse_mla_backends.py 675 # Set some to out of bounds
LOW tests/v1/core/test_scheduler.py 1987 # Verify if position length is identical
LOW tests/v1/core/test_scheduler.py 2883 # Check if scheduled_encoder_inputs is empty as expected
LOW tests/v1/core/test_scheduler.py 3235 # Set up to test different encoder cache existence scenario after preemption
LOW tests/v1/core/test_scheduler.py 3565 # Set up to test different encoder cache existence scenario after preemption
LOW tests/v1/core/utils.py 231 # Verify if position length is identical
LOW …extract_hidden_states_integration/predictable_llama.py 81 # Check if we need auxiliary hidden states
LOW …nnector/nixl_integration/config_sweep_accuracy_test.sh 97 # Check if cross-layers is enabled (non-empty)
LOW tests/v1/determinism/test_batch_invariance.py 279 # Check if tokens match first
LOW tests/v1/determinism/test_batch_invariance.py 558 # Check if tokens match first
LOW tests/v1/determinism/test_batch_invariance.py 787 # Check if tokens match
LOW tests/v1/determinism/test_batch_invariance.py 805 # Check if logprobs match bitwise
LOW tests/v1/spec_decode/test_acceptance_length.py 102 # Check if get_valid_backends is actually defined in the platform class
LOW …1/ec_connector/integration/run_epd_correctness_test.sh 25 # Set 1 to use multimodal prompts; else to use text-only
LOW …ts/v1/ec_connector/integration/test_epd_correctness.py 218 # Check if server is ready
LOW tests/v1/e2e/spec_decode/test_async_spec_decode.py 34 # Increment counter
LOW tests/v1/engine/test_engine_core_client.py 284 # Check if all request IDs in outputs have finished
LOW tests/v1/engine/utils.py 131 # Check if the sampled_token_id occurs in choice_tensor[1:]
LOW tests/utils_/test_network_utils.py 79 # Check if IPv6 is supported by trying to create an IPv6 socket
LOW tests/tool_parsers/test_minimax_tool_parser.py 494 # Check if function name is sent (should happen only once)
LOW tests/tool_parsers/test_minimax_tool_parser.py 500 # Check if arguments are sent incrementally
LOW tests/tool_parsers/test_mistral_tool_parser.py 135 # Check if the slice from the current position matches the target sequence
LOW tests/kernels/moe/test_cpu_int4_moe.py 17 # Check if the dynamic_4bit_int_moe op is available
LOW tests/kernels/moe/test_cpu_int4_moe.py 21 # Check if KleidiAI ops are available
LOW tests/kernels/moe/test_moe_layer.py 1791 # Check if enough GPUs available
LOW tests/kernels/moe/test_rocm_aiter_topk.py 26 # Check if aiter package is installed
LOW tests/kernels/moe/test_rocm_aiter_topk.py 35 # Check if the op exists in torch.ops.vllm
LOW tests/kernels/moe/test_rocm_aiter_topk.py 38 # Check if the op is callable
LOW tests/kernels/moe/test_rocm_aiter_topk.py 44 # Check if the op exists in torch.ops.vllm
LOW tests/kernels/moe/test_rocm_aiter_topk.py 47 # Check if the op is callable
LOW tests/evals/gsm8k/gsm8k_eval.py 297 # Print results to terminal
LOW tests/distributed/test_eplb_execute.py 150 # Check if the weights are correct
LOW tests/model_executor/test_eagle_quantization.py 112 # Check if get_cache_scale is called and returns expected value
LOW tests/model_executor/test_qwen3_omni.py 29 # Check if it's a special token that should be compressed
LOW …model_loader/runai_streamer_loader/test_runai_utils.py 56 # Read the file in chunks to handle large files efficiently
LOW tests/models/multimodal/generation/test_maverick.py 59 # Print the outputs
LOW tests/quantization/test_gptq_v2.py 43 # Check if gptq_v2 format is correctly loaded
LOW tests/quantization/test_gptq_v2.py 105 # Print the output sequences if failed
LOW tests/compile/test_config.py 50 # Check if get_raw_stream exists in builtins
LOW tests/compile/fusions_e2e/conftest.py 44 # Print the outputs.
LOW tests/compile/fullgraph/test_full_graph.py 251 # Print the outputs.
252 more matches not shown…
AI Slop Vocabulary: 113 hits · 283 pts
Severity File Line Snippet
LOW csrc/quantization/machete/generate.py 435 # For now, we can just use the first accumulator type seen since
MEDIUM docker/entrypoints/test_vllm_nonroot_entrypoint.sh 145 # More robust: count lines matching our UID.
MEDIUM tests/v1/logits_processors/test_custom_offline.py 28 # Create a mixture of requests which do and don't utilize the dummy logitproc
LOW tests/v1/cudagraph/test_breakable_cudagraph.py 235 # Outside capture: decorator should just call through.
LOW tests/tool_parsers/test_mistral_tool_parser.py 141 # Otherwise, just add the current token and move to the next one
MEDIUM tests/tool_parsers/test_glm4_moe_tool_parser.py 295 # depending on how robust we want the parsing to be
MEDIUM tests/renderers/test_sparse_tensor_validation.py 58 # explicitly so this fixture is robust to process-wide invariant-check state
MEDIUM tests/kernels/mamba/test_mamba_mixer2.py 106 # - utilize mock patching to disable TP when
MEDIUM tests/distributed/test_pynccl.py 382 # Essentially this is an all-gather operation.
MEDIUM tests/distributed/test_context_parallel.py 42 # .buildkite/lm-eval-harness/configs/DeepSeek-V2-Lite-Chat.yaml
MEDIUM tests/distributed/test_context_parallel.py 44 # .buildkite/lm-eval-harness/configs/Qwen2.5-1.5B-Instruct.yaml
LOW tests/model_executor/test_qwen3_omni.py 41 # Regular token, just add it
LOW vllm/env_override.py 356 # functions just return True.
LOW vllm/v1/attention/backends/flash_attn.py 291 # but for now just set it to `UNIFORM_BATCH` to get use to drop down
LOW vllm/v1/attention/backends/utils.py 263 # then we can simply use a cdiv for the rest.
MEDIUM vllm/v1/attention/ops/triton_decode_attention.py 362 # explicitly facilitate overlapping load/compute
MEDIUM vllm/v1/core/encoder_cache_manager.py 321 # utilize the cache and this class will fold into EncoderCacheManager, as
LOW vllm/v1/spec_decode/llm_base_proposer.py 1661 # Therefore, we can just return the logits.
MEDIUM vllm/v1/worker/gpu_worker.py 638 # CUDAGraph memory size and may not utilize all gpu memory.
MEDIUM vllm/v1/worker/gpu_worker.py 888 # Generate the trace name by combining prefix with comprehensive rank suffix
LOW vllm/v1/worker/gpu/model_runner.py 1175 # For piecewise and eager mode, just call model().
LOW vllm/v1/engine/parallel_sampling.py 116 # If streaming, just return the current output
MEDIUM vllm/tool_parsers/llama_tool_parser.py 262 # re-set stuff pertaining to progress in the current tool
MEDIUM vllm/tool_parsers/jamba_tool_parser.py 218 # re-set stuff pertaining to progress in the current tool
MEDIUM vllm/tool_parsers/granite_tool_parser.py 186 # re-set stuff pertaining to progress in the current tool
LOW vllm/tool_parsers/xlam_tool_parser.py 558 # If we encounter an error, just return the delta text as regular content
MEDIUM vllm/tool_parsers/granite_20b_fc_tool_parser.py 203 # re-set stuff pertaining to progress in the current tool
MEDIUM vllm/tool_parsers/step3_tool_parser.py 51 # Explicit state flags for robust streaming
LOW vllm/tokenizers/mistral.py 535 # if underlying tokenizer is sentencepiece, we just add "�".
LOW vllm/platforms/cuda.py 586 # users can just use IR op priority directly
MEDIUM vllm/distributed/utils.py 302 """A robust barrier to synchronize all ranks.
MEDIUM …distributed/kv_transfer/kv_connector/v1/nixl/worker.py 615 # we can leverage host_buffer for permute
MEDIUM …ransfer/kv_connector/v1/mooncake/mooncake_connector.py 756 # Tasks can await async events, so a surplus (2x is a robust heuristic)
MEDIUM vllm/config/model.py 310 """Enable the custom cumem allocator to leverage advanced GPU memory
MEDIUM vllm/config/parallel.py 580 # To make the initialization more robust we retry a few times
LOW vllm/model_executor/layers/mamba/ops/causal_conv1d.py 185 # first chunk and does not have prior-token, so just set to 0
MEDIUM vllm/model_executor/layers/fused_moe/modular_kernel.py 47 # The goal is to be able to utilize different communication mechanisms with
LOW …m/model_executor/layers/fused_moe/runner/moe_runner.py 259 # Once the MK can be created upfront, we can just pass in the proper
MEDIUM …xecutor/layers/fused_moe/prepare_finalize/deepep_ll.py 40 # TODO (varun) : Optimize leverage num_tokens_per_expert counts
MEDIUM …executor/layers/fused_moe/experts/fused_humming_moe.py 233 # Neighboring nodes are required to utilize distinct workspaces.
MEDIUM vllm/model_executor/layers/quantization/fp8.py 281 # For GPUs that lack FP8 hardware support, we can leverage the Marlin
MEDIUM vllm/model_executor/layers/quantization/modelopt.py 317 # Normalize quant_algo for robust matching (ModelOpt may emit lowercase).
MEDIUM vllm/model_executor/layers/quantization/fbgemm_fp8.py 53 # For GPUs that lack FP8 hardware support, we can leverage the Marlin
MEDIUM …executor/layers/quantization/utils/marlin_utils_fp4.py 101 # to fully utilize the E4M3 dynamic range (e.g., global_scale=1).
MEDIUM …executor/layers/quantization/utils/marlin_utils_fp4.py 161 # For GPUs that lack FP4 hardware support, we can leverage the
MEDIUM …executor/layers/quantization/utils/marlin_utils_fp8.py 53 # For GPUs that lack FP8 hardware support, we can leverage the
LOW …odel_executor/layers/quantization/utils/quant_utils.py 406 # Unquantized layer: just return base weights
MEDIUM …/model_executor/layers/quantization/quark/quark_moe.py 166 # For GPUs that lack FP8 hardware support, we can leverage the Marlin
MEDIUM …rs/quantization/compressed_tensors/transform/module.py 85 # do not fold into weight in order to utilize FWHT
MEDIUM vllm/model_executor/models/grok1.py 93 # Check for Grok2-specific attributes (both for robust detection)
MEDIUM vllm/model_executor/models/deepseek_ocr.py 133 """Example of overriding the wrapper class `__init__()` in order to utilize
MEDIUM vllm/model_executor/models/deepseek_ocr.py 133 """Example of overriding the wrapper class `__init__()` in order to utilize
LOW vllm/model_executor/models/qwen3_asr.py 464 # No audio features, just return linear positions
LOW vllm/model_executor/models/transformers/multimodal.py 197 # NOTE: we can't just set caching=False because base class method
MEDIUM …dels/deepseek_v4/nvidia/ops/fused_indexer_q_cutedsl.py 167 # all threads in a warp to be active since we utilize warp shuffle later.
LOW vllm/reasoning/granite_reasoning_parser.py 198 # corrected; just return the delta text as normal content.
LOW vllm/reasoning/hunyuan_a13b_reasoning_parser.py 91 # this id is not part of content, so just return [] here.
LOW vllm/reasoning/olmo3_reasoning_parser.py 271 # this id is not part of content, so just return [] here.
MEDIUM vllm/multimodal/evs.py 255 # exact timestamp count. This is robust even when early
MEDIUM vllm/benchmarks/datasets/datasets.py 2910 # leverage CustomDataset sample
53 more matches not shown…
Verbosity Indicators: 147 hits · 265 pts
Severity File Line Snippet
LOW tests/v1/logits_processors/test_correctness.py 1180 # Step 1: think-start token appears.
LOW tests/v1/kv_connector/unit/test_lmcache_connector.py 567 # Step 1: Get events from lmcache engine
LOW tests/v1/kv_connector/unit/test_lmcache_connector.py 576 # Step 2: Update connector output (simulate receiving from worker)
LOW tests/v1/kv_connector/unit/test_lmcache_connector.py 582 # Step 3: Take events
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 433 # Step 2: 5 blocks are in use (2 new for remote blocks).
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 441 # Step 3: finish recving (5 blocks in use)
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 450 # Step 4: try to schedule, remote request is put to running list
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 460 # Step 5: Remote request will be put back to waiting list
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 468 # Step 6: finish the request, free it.
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 477 # Step 7: now we can schedule (with 2 blocks computed),
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 489 # Step 8: free everything.
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 536 # Step 2: 3 blocks are in use,
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 547 # Step 3: finish the request, free it.
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 556 # Step 4: now we can initiate KV transfer (with 2 blocks computed).
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 564 # Step 5: finish recving (5 blocks in use)
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 573 # Step 6: schedule remote request
LOW …/v1/kv_connector/unit/test_remote_prefill_lifecycle.py 580 # Step 7: free everything.
LOW tests/v1/determinism/test_batch_invariance.py 700 # Step 1: Run decode and collect logprobs
LOW tests/v1/determinism/test_batch_invariance.py 719 # Step 2: For each token position, run prefill and compare
LOW tests/v1/streaming_input/test_scheduler_streaming.py 373 # Step 2: Schedule creates NewRequestData
LOW tests/v1/streaming_input/test_scheduler_streaming.py 447 # Step 7: Schedule again - now request uses cached state
LOW tests/v1/streaming_input/test_scheduler_streaming.py 514 # Step 12: Add new streaming request with seq_id=1
LOW tests/v1/streaming_input/test_scheduler_streaming.py 388 # Step 3: Simulate model runner caching the prompt_token_ids
LOW tests/v1/streaming_input/test_scheduler_streaming.py 421 # Step 6: Verify request state after Cycle 1
LOW tests/v1/streaming_input/test_scheduler_streaming.py 459 # Step 8: Calculate num_tokens like gpu_model_runner.py:1284 does
LOW tests/v1/streaming_input/test_scheduler_streaming.py 495 # Step 11: Verify request transitioned to WAITING_FOR_STREAMING_REQ
LOW tests/v1/streaming_input/test_scheduler_streaming.py 526 # Step 13: Scheduler schedules the updated session
LOW tests/v1/streaming_input/test_scheduler_streaming.py 544 # Step 14: Model runner caches NEW prompt_token_ids reference
LOW tests/v1/streaming_input/test_scheduler_streaming.py 557 # Step 15: FINAL CRITICAL VERIFICATION
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 84 # Step 1: Add initial request with 3 prompt tokens, all computed
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 100 # Step 2: Create streaming update with extended prompt
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 116 # Step 3: Verify no free_indices leak (old slot recycled)
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 155 # Step 1: Add initial request with one audio feature
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 175 # Step 2: Create streaming update with additional multimodal feature
LOW …/streaming_input/test_gpu_model_runner_v2_streaming.py 192 # Step 3: Verify no free_indices leak
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 59 # Step 1: Create initial request state with some computed tokens
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 77 # Step 2: Create new request data with extended prompt
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 96 # Step 3: Update the request
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 101 # Step 4: Verify the request state was updated correctly
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 131 # Step 1: Create initial request state with one multimodal feature
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 156 # Step 2: Create new request data with additional multimodal feature
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 176 # Step 3: Update the request
LOW …/v1/streaming_input/test_gpu_model_runner_streaming.py 181 # Step 4: Verify the request state was updated correctly
LOW …1/ec_connector/integration/run_epd_correctness_test.sh 459 # Step 1: Run baseline
LOW …1/ec_connector/integration/run_epd_correctness_test.sh 462 # Step 2: Test 1E + 1PD
LOW …1/ec_connector/integration/run_epd_correctness_test.sh 465 # Step 3: Test baseline 1P + 1D
LOW …1/ec_connector/integration/run_epd_correctness_test.sh 468 # Step 4: Test 1E + 1P + 1D
LOW tests/kernels/test_fused_inv_rope_fp8_quant.py 683 # Step 1: In-place CUDA RoPE (same as production)
LOW tests/kernels/test_fused_inv_rope_fp8_quant.py 695 # Step 2: Reshape + quant + reshape (same as production)
LOW …ultimodal/generation/test_vit_backend_functionality.py 415 # Step 1: Backend filtering
LOW …ultimodal/generation/test_vit_backend_functionality.py 425 # Step 2: Apply GPU marks dynamically
LOW …ultimodal/generation/test_vit_backend_functionality.py 430 # Step 3: Route to appropriate handler
LOW …ts/entrypoints/llm/test_mm_cache_external_injection.py 91 # Step 1: Normal requests to populate the cache
LOW …ts/entrypoints/llm/test_mm_cache_external_injection.py 99 # Step 2: Use a second image to get valid expanded tokens and
LOW …ntrypoints/weight_transfer/test_weight_transfer_llm.py 263 # Step 1: Initialize weight transfer engine
LOW …ntrypoints/weight_transfer/test_weight_transfer_llm.py 268 # Step 2: Start weight update
LOW …ntrypoints/weight_transfer/test_weight_transfer_llm.py 271 # Step 3: Update weights
LOW …ntrypoints/weight_transfer/test_weight_transfer_llm.py 282 # Step 4: Finish weight update
LOW tests/entrypoints/openai/responses/test_harmony.py 1069 # Step 1: Get a function call from the model
LOW tests/entrypoints/openai/responses/test_harmony.py 1095 # Step 2: Build full conversation history
87 more matches not shown…
Cross-Language Confusion: 50 hits · 246 pts
Severity File Line Snippet
HIGH setup.py 656 "build_tag": null,
HIGH setup.py 660 "variant": null,
HIGH csrc/cpu/generate_cpu_attn_dispatch.py 156 (__riscv_v_min_vlen == 128 || __riscv_v_min_vlen == 256)
HIGH csrc/cpu/generate_cpu_attn_dispatch.py 228 "&& (__riscv_v_min_vlen == 128 || __riscv_v_min_vlen == 256)",
HIGH csrc/quantization/machete/generate.py 512 "M > 256 && K <= 16384 && N <= 4096": ((128, 128), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 515 "M > 128 && K <= 4096 && N <= 4096": ((128, 64), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 516 "M > 128 && K <= 8192 && N <= 8192": ((128, 128), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 519 "M > 64 && K <= 4069 && N <= 4069": ((128, 32), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 520 "M > 64 && K <= 4069 && N <= 8192": ((128, 64), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 521 "M > 64 && K >= 8192 && N >= 12288": ((256, 128), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 524 "M > 32 && K <= 6144 && N <= 6144": ((128, 16), (1, 1, 1)),
HIGH csrc/quantization/machete/generate.py 525 "M > 32 && K >= 16384 && N >= 12288": ((256, 64), (2, 1, 1)),
HIGH csrc/quantization/machete/generate.py 528 "M > 16 && K <= 12288 && N <= 8192": ((128, 32), (2, 1, 1)),
HIGH tests/v1/attention/test_trtllm_attention_integration.py 140 # Randomly permute blocks (starting from block 1; block 0 is null).
HIGH tests/v1/attention/test_mla_backends.py 256 # Permute the context blocks (excluding block 0 which is null)
HIGH tests/v1/attention/test_attention_backends.py 164 # Permute the context blocks (excluding block 0 which is null)
HIGH tests/v1/core/test_scheduler.py 2630 num_blocks=5, # Can hold 64 tokens (first block is null)
HIGH tests/v1/core/test_scheduler.py 3412 num_blocks=15, # can hold 244 tokens with 14 blocks (first block is null)
HIGH tests/v1/core/test_scheduler.py 3653 num_blocks=11, # Can hold 160 tokens (first block is null)
HIGH tests/v1/cudagraph/test_cudagraph_mode.py 67 # when above code raises, `llm` may be undefined, so we need to catch that
HIGH tests/v1/cudagraph/test_cudagraph_mode.py 123 # when above code raises, `llm` may be undefined, so we need to catch that
HIGH tests/tool_parsers/test_granite_tool_parser.py 36 "null_field": null,
HIGH tests/tool_parsers/test_gemma4_tool_parser.py 91 # instead of `{"param": null}` for nullable tool parameters.
HIGH tests/tool_parsers/test_gemma4_tool_parser.py 94 assert json.dumps(result) == '{"param": null}'
HIGH tests/tool_parsers/test_granite_20b_fc_tool_parser.py 36 "null_field": null,
HIGH tests/tool_parsers/test_olmo3_tool_parser.py 34 "role=null, "
HIGH tests/tool_parsers/test_olmo3_tool_parser.py 43 '"role": null, '
HIGH tests/tool_parsers/test_pythonic_tool_parser.py 35 '"role": null, '
HIGH tests/tool_parsers/test_deepseekv3_tool_parser.py 43 """"bool_field": true, "null_field": null, """
HIGH tests/tool_parsers/test_internlm2_tool_parser.py 54 "null_field": null,
HIGH tests/tool_parsers/test_llama4_pythonic_tool_parser.py 35 '"role": null, '
HIGH tests/tool_parsers/test_phi4mini_tool_parser.py 51 "null_field": null,
HIGH tests/tool_parsers/test_lfm2_tool_parser.py 38 '"role": null, '
HIGH tests/tool_parsers/test_longcat_tool_parser.py 54 "null_field": null,
HIGH tests/tool_parsers/test_hunyuan_a13b_tool_parser.py 45 '<tool_calls>[{"name": "get_weather", "arguments": {"city": "San Francisco", "metric": "celsius"}}, {"name":
HIGH tests/tool_parsers/test_qwen3coder_tool_parser.py 633 # Multi non-null: anyOf[string, integer, null] → first non-null is string
HIGH tests/tool_parsers/test_qwen3coder_tool_parser.py 189 [{"question": "Pick a color", "multiSelect": false, "answer": null}]
HIGH …entrypoints/serve/disagg/test_return_routed_experts.py 36 '{"sliding_window": null}',
HIGH tests/entrypoints/openai/test_return_routed_experts.py 33 '{"sliding_window": null}',
HIGH vllm/_custom_ops.py 1812 assert k_times_2 % 2 == 0, "input width must be even (gate || up layout)"
HIGH vllm/_custom_ops.py 1921 assert k_times_2 % 2 == 0, "input width must be even (gate || up layout)"
HIGH vllm/_custom_ops.py 1792 input_tensor: The input tensor with gate || up layout [m_topk, k*2]
HIGH vllm/v1/core/single_type_kv_cache_manager.py 777 # result [null] [null] ... [null] [hit block 1 (1st block contain
HIGH vllm/v1/core/single_type_kv_cache_manager.py 338 every (non-null) block — the default for full attention.
HIGH vllm/v1/core/single_type_kv_cache_manager.py 731 [null, null, block 3], otherwise, we return [null, null]
HIGH vllm/v1/core/single_type_kv_cache_manager.py 736 we return 4 blocks[null, null, null, null]
HIGH vllm/tool_parsers/utils.py 280 (null, true, false) that some models produce instead of Python
HIGH …ibuted/kv_transfer/kv_connector/v1/flexkv_connector.py 47 cd FlexKV && bash build.sh
HIGH …isaggregated/flexkv_connector/prefix_caching_flexkv.py 12 2. cd FlexKV && bash build.sh
HIGH .buildkite/scripts/generate-nightly-index.py 209 "build_tag": null,
Docstring Block Structure · 39 hits · 195 pts
Severity | File | Line | Snippet
HIGH | vllm/v1/attention/backends/registry.py | 208 | Register or override a backend implementation. Args: backend: The AttentionBackendEnum member to register
HIGH | vllm/v1/attention/backends/mla/prefill/registry.py | 102 | Register or override an MLA prefill backend implementation. Args: backend: The MLAPrefillBackendEnum member
HIGH | vllm/v1/core/single_type_kv_cache_manager.py | 718 | For chunked local attention, we need to find the longest cache hit prefix of the blocks that is not lon
HIGH | vllm/v1/structured_output/utils.py | 290 | Check if grammar appears to use Lark syntax. Args: grammar_str: Input grammar string Returns:
HIGH | vllm/v1/structured_output/utils.py | 322 | Convert a Lark grammar string to EBNF format. EBNF reference: https://github.com/ggerganov/llama.cpp/blob/
HIGH | vllm/v1/worker/utils.py | 264 | Select a block size that is supported by all backends and is a factor of kv_manager_block_size. If kv_mana
HIGH | vllm/tool_parsers/apertus_tool_parser.py | 137 | Buffers incoming delta chunks to prevent fragmentation of multi-token special tags. If a chunk
HIGH | vllm/tool_parsers/apertus_tool_parser.py | 181 | Extracts tool calls from a completely generated model response (Non-Streaming). Args: mode
HIGH | vllm/tool_parsers/apertus_tool_parser.py | 283 | Handles streaming chunks Args: previous_text: The complete model text generated prior to t
HIGH | vllm/tool_parsers/apertus_tool_parser.py | 494 | Calculates the exact string difference to safely append new tool parameters. This ensures characters l
HIGH | …ed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py | 738 | Get number of new tokens that can be loaded from the external KV cache beyond the num_computed_tokens.
HIGH | vllm/distributed/kv_transfer/kv_connector/v1/base.py | 459 | Get number of new tokens that can be loaded from the external KV cache beyond the num_computed_tokens.
HIGH | …ed/kv_transfer/kv_connector/v1/moriio/moriio_engine.py | 205 | Get remote allocation info for a request. Args: transfer_id:TransferId The request ID Retu
HIGH | …nector/v1/lmcache_integration/multi_process_adapter.py | 198 | Submit a new lookup request to LMCache if there is no ongoing request. Supports both token-based and h
HIGH | …nector/v1/lmcache_integration/multi_process_adapter.py | 601 | Check and get the finished store and retrieve requests. Args: finished_req_ids_from_engine
HIGH | …/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py | 107 | Allocates a memory block of at least the requested size. Args: size (int): Minimum size of memory t
HIGH | …/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py | 184 | Stores a CUDA tensor in pinned host memory. Args: tensor (torch.Tensor): CUDA tensor to store
HIGH | …/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py | 229 | Loads a tensor from pinned host memory to the specified device. Args: addr (int): Address where ten
HIGH | vllm/distributed/weight_transfer/factory.py | 82 | Create a weight transfer engine instance. Args: config: Weight transfer configuration containing th
HIGH | vllm/distributed/weight_transfer/base.py | 86 | Construct typed init info from dict with validation. Args: init_dict: Dictionary containin
HIGH | vllm/distributed/weight_transfer/base.py | 106 | Construct typed update info from dict with validation. Args: update_dict: Dictionary conta
HIGH | vllm/logging_utils/formatter.py | 22 | Shortens a file path for logging display: - Removes leading 'vllm' folder if present.
HIGH | vllm/model_executor/kernels/linear/__init__.py | 441 | Choose a _KernelT that can implement the given config for the given compute capability. Attempts to choose the
HIGH | vllm/model_executor/kernels/linear/__init__.py | 619 | Choose an MPLinearKernel that can implement the given config for the given compute capability. Attempts to cho
HIGH | …/model_executor/layers/fused_moe/expert_map_manager.py | 337 | Map global expert ID to local expert ID. Args: global_id: Global expert ID (0 to global_nu
HIGH | vllm/model_executor/layers/fla/ops/chunk.py | 153 | Args: q (torch.Tensor): Queries of shape `[B, T, H, K]`. k (torch.Tensor):
HIGH | vllm/model_executor/layers/fla/ops/fused_recurrent.py | 530 | Args: q (torch.Tensor): queries of shape `[B, T, H, K]`. k (torch.Tensor):
HIGH | vllm/model_executor/models/keye_vl1_5.py | 79 | Return num_patches per video. Args: grid_thw: Tensor with shape [N, 3] containing temporal, height, wi
HIGH | vllm/model_executor/models/isaac.py | 246 | Apply pixel shuffle to a packed vision sequence without unpacking per image. Args: x (`torch.Tensor`):
HIGH | vllm/parser/parser_manager.py | 37 | Retrieve a registered or lazily registered Parser class. Args: name: The registered name o
HIGH | vllm/multimodal/audio.py | 90 | Normalize audio to the specified format. This function handles channel reduction for multi-channel audio, suppo
HIGH | vllm/benchmarks/lib/ready_checker.py | 25 | Wait for an endpoint to become available before starting benchmarks. Args: request_func: The async req
HIGH | vllm/entrypoints/chat_utils.py | 1440 | Parses a given multi-modal content part based on its type. Args: part: A dict containing the content p
HIGH | vllm/transformers_utils/gguf_utils.py | 174 | Extract vision config parameters from mmproj.gguf metadata. Reads vision encoder configuration from GGUF metadata f
HIGH | vllm/transformers_utils/processors/isaac.py | 192 | Convert normalized images into flattened ViT-style patches. Args: image (`torch.Tensor`): Tenso
HIGH | vllm/lora/resolver.py | 72 | Get a registered resolver instance by name. Args: resolver_name: Name of the resolver to get.
HIGH | …ications/chatbot/streamlit_openai_chatbot_webserver.py | 111 | Generate and stream LLM response with optional reasoning process. Args: messages (list): List of conversati
HIGH | benchmarks/benchmark_long_document_qa_throughput.py | 68 | Repeat each prompt in the list for a specified number of times. The order of prompts in the output list depends
HIGH | benchmarks/attention_benchmarks/batch_spec.py | 74 | Parse batch specification string into list of BatchRequest objects. Grammar: (<count>?) q<q_len>(k?) (s<seq_le
Hallucination Indicators · 9 hits · 95 pts
Severity | File | Line | Snippet
CRITICAL | rust/src/cmd/src/cli/unsupported.rs | 138 | /// - `vllm.entrypoints.cli.serve.ServeSubcommand.subparser_init(...)`
CRITICAL | tests/v1/e2e/general/test_mamba_prefix_cache.py | 773 | assert engine.llm_engine.engine_core.engine_core.scheduler.reset_prefix_cache()
CRITICAL | tests/distributed/test_torchrun_example_moe.py | 68 | llm.llm_engine.model_executor.driver_worker.worker.model_runner.model.parameters()
CRITICAL | tests/distributed/test_torchrun_example.py | 59 | llm.llm_engine.model_executor.driver_worker.worker.model_runner.model.parameters()
CRITICAL | tests/models/language/generation/test_gemma.py | 19 | lambda self: self.model_runner.model.language_model.model.normalizer.cpu().item() # noqa: E501
CRITICAL | tests/models/language/generation/test_gemma.py | 24 | lambda self: self.model_runner.model.model.normalizer.cpu().item()
CRITICAL | vllm/v1/spec_decode/llm_base_proposer.py | 1311 | self.model.model.embed_tokens.weight.cpu(),
CRITICAL | vllm/model_executor/layers/fla/ops/utils.py | 170 | triton.runtime.driver.active.utils.get_device_properties(i)[
CRITICAL | docs/training/layerwise.md | 124 | model = llm.llm_engine.engine_core.engine_core.model_executor.driver_worker.worker.get_model()
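The report does not say how the Hallucination Indicators category is matched. All nine snippets listed for it do share one visible trait: a dotted identifier chain of at least six segments. The following is a minimal sketch of a heuristic that would flag such lines. The function names, the regular expression, and the threshold of six are assumptions made for illustration, not the scanner's actual rule.

```python
import re

# Assumed pattern: two or more identifiers joined by dots, e.g. a.b.c
_CHAIN = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")


def max_chain_depth(line: str) -> int:
    """Return the segment count of the longest dotted identifier chain."""
    depths = (m.group(0).count(".") + 1 for m in _CHAIN.finditer(line))
    return max(depths, default=0)


def is_deep_chain(line: str, threshold: int = 6) -> bool:
    """Flag a line whose longest chain has at least `threshold` segments.

    The default of 6 is a hypothetical value chosen to match the rows above.
    """
    return max_chain_depth(line) >= threshold


if __name__ == "__main__":
    sample = "self.model.model.embed_tokens.weight.cpu(),"
    print(max_chain_depth(sample), is_deep_chain(sample))
```

A call such as `os.path.join(a, b)` has only three segments and would not be flagged under this threshold. Deep chains like these are common in legitimate test code that reaches into engine internals, so a match is a prompt for review, not proof of a problem.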
Fake / Example Data · 20 hits · 23 pts
Severity | File | Line | Snippet
LOW | tests/tool_parsers/test_kimi_k2_tool_parser.py | 85 | '{"to": "user@example.com", "subject": "Daily Update"}',
LOW | tests/tool_parsers/test_kimi_k2_tool_parser.py | 92 | {"to": "user@example.com", "subject": "Daily Update"},
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 303 | "name": "John Doe",
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 693 | "name": "John Doe",
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 1056 | "name": "John Doe",
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 296 | """[TOOL_CALLS] [{"arguments":{"name": "John Doe"}, "name": "get_age"}]""", # noqa: E501
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 686 | """[TOOL_CALLS] [{"arguments": {"name": "John Doe"}, "name": "get_age"}]""", # noqa: E501
LOW | tests/tool_parsers/test_mistral_tool_parser.py | 1049 | """[TOOL_CALLS] [{"arguments": {"name": "John Doe"}, "name": "get_age"}]""", # noqa: E501
LOW | tests/tool_parsers/test_olmo3_tool_parser.py | 23 | "register_user(name='John Doe', "
LOW | tests/tool_parsers/test_olmo3_tool_parser.py | 31 | "register_user(name='John Doe', "
LOW | tests/tool_parsers/test_olmo3_tool_parser.py | 40 | arguments='{"name": "John Doe", '
LOW | tests/tool_parsers/test_pythonic_tool_parser.py | 23 | "register_user(name='John Doe', "
LOW | tests/tool_parsers/test_pythonic_tool_parser.py | 32 | arguments='{"name": "John Doe", '
LOW | tests/tool_parsers/test_lfm2_tool_parser.py | 26 | "register_user(name='John Doe', "
LOW | tests/tool_parsers/test_lfm2_tool_parser.py | 35 | arguments='{"name": "John Doe", '
LOW | tests/tool_parsers/test_lfm2_tool_parser.py | 339 | "deliveryAddress='123 Main St')]"
LOW | tests/tool_parsers/test_hunyuan_a13b_tool_parser.py | 45 | '<tool_calls>[{"name": "get_weather", "arguments": {"city": "San Francisco", "metric": "celsius"}}, {"name":
LOW | tests/tool_parsers/test_hunyuan_a13b_tool_parser.py | 53 | "name": "John Doe",
LOW | tests/benchmarks/test_txt_slices_dataset.py | 20 | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
LOW | tests/benchmarks/test_txt_slices_dataset.py | 20 | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
Example Usage Blocks · 14 hits · 20 pts
Severity | File | Line | Snippet
LOW | tools/setup_deepgemm_pythons.sh | 6 | # Usage:
LOW | tools/vllm-rocm/generate-rocm-wheels-root-index.sh | 11 | # Usage:
LOW | docker/docker-bake.hcl | 5 | # Usage:
LOW | docker/entrypoints/test_vllm_nonroot_entrypoint.sh | 9 | # Usage:
LOW | …/nixl_integration/run_multi_connector_accuracy_test.sh | 15 | # Usage:
LOW | …nector/nixl_integration/spec_decode_acceptance_test.sh | 13 | # Usage:
LOW | …nixl_integration/run_multi_connector_edge_case_test.sh | 15 | # Usage:
LOW | examples/tool_calling/chat_with_tools_offline.py | 44 | # Usage:
LOW | examples/ray_serving/run_cluster.sh | 8 | # Usage:
LOW | examples/ray_serving/multi-node-serving.sh | 11 | # Example usage:
LOW | examples/generate/multimodal/mistral-small_offline.py | 51 | # Usage:
LOW | …nchmarks/attention_benchmarks/configs/mla_prefill.yaml | 14 | # Usage:
LOW | .buildkite/scripts/cache-rocm-base-wheels.sh | 10 | # Usage:
LOW | .buildkite/scripts/tool_call/run-bfcl-eval.sh | 5 | # Usage:
Magic Placeholder Names · 5 hits · 20 pts
Severity | File | Line | Snippet
HIGH | …les/pooling/embed/openai_embedding_long_text/client.py | 22 | --api-key your-api-key
HIGH | …les/pooling/embed/openai_embedding_long_text/client.py | 32 | --api-key your-api-key
HIGH | …les/pooling/embed/openai_embedding_long_text/client.py | 44 | API_KEY = "your-api-key" # Replace with your actual API key
HIGH | …es/pooling/embed/openai_embedding_long_text/service.sh | 19 | API_KEY=${API_KEY:-"your-api-key"}
HIGH | .github/ISSUE_TEMPLATE/400-bug-report.yml | 20 | Consider redacting or replacing sensitive values with placeholders like `<YOUR_TOKEN_HERE>` when sharing configura
Slop Phrases · 7 hits · 16 pts
Severity | File | Line | Snippet
MEDIUM | tests/models/multimodal/generation/test_common.py | 64 | # model arch happens to be a substring of another one, you can add a
MEDIUM | tests/models/multimodal/generation/test_common.py | 85 | # NOTE you can add --collect-only to any of the above commands to see
LOW | …distributed/kv_transfer/kv_connector/v1/nixl/worker.py | 1974 | # while processing the next batch, we make sure to only set an
MEDIUM | vllm/model_executor/models/interfaces.py | 224 | as a language model component.
MEDIUM | …ache/disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh | 24 | # secure random value. This is set to a fixed value for demonstration purposes only.
MEDIUM | examples/rl/rlhf_ipc.py | 17 | for demonstration purposes we simply zero out the weights.
MEDIUM | .buildkite/test-amd.yaml | 1 | # In this file, you can add more tests to run either by adding a new step or
Synthetic Comment Markers · 1 hit · 8 pts
Severity | File | Line | Snippet
HIGH | …tic_prefix_caching/automatic_prefix_caching_offline.py | 26 | # A prompt containing a large markdown table. The table is randomly generated by GPT-4.
Overly Generic Function Names · 4 hits · 4 pts
Severity | File | Line | Snippet
LOW | …gated/disaggregated_serving/moriio_toy_proxy_server.py | 227 | async def handle_request(api: str, request: Request):
LOW | …aggregated/p2p_nccl_xpyd/disagg_proxy_p2p_nccl_xpyd.py | 125 | async def handle_request():
LOW | …marks/disagg_benchmarks/disagg_prefill_proxy_server.py | 243 | async def handle_request():
LOW | benchmarks/disagg_benchmarks/round_robin_proxy.py | 16 | async def handle_request(self, request):
Dead Code · 1 hit · 2 pts
Severity | File | Line | Snippet
MEDIUM | tests/v1/e2e/general/test_streaming_input.py | 502 |