Repository Analysis

EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of language models.

12.5 Low AI signal

Adjusted Score: 12.5
Raw Score: 12.5
Time Factor: 100%
Last Push: 2026-05-11
Stars: 12,750
Language: Python
Lines of Code: 289,478
Files: 14963
Pattern Hits: 1813
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 366 · MEDIUM 149 · LOW 1298

Pattern Findings

1813 matches across 15 categories.
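
As a quick consistency check, the four severity counts in the breakdown should add up to the 1813 matches reported. The sketch below redoes that arithmetic and derives each severity's share of the total. The variable names are illustrative and not part of the report.

```python
# Severity counts copied from the breakdown in this report.
severity_counts = {"CRITICAL": 0, "HIGH": 366, "MEDIUM": 149, "LOW": 1298}

total_matches = sum(severity_counts.values())

# Share of all matches per severity, as a percentage rounded to one decimal.
severity_share = {
    level: round(100 * count / total_matches, 1)
    for level, count in severity_counts.items()
}

print(total_matches)
print(severity_share)
```

The counts do sum to 1813, and LOW findings account for roughly 72% of all matches.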

Cross-File Repetition: 353 hits · 1765 pts
Severity | File | Line | Snippet
HIGH | lm_eval/filters/selection.py | 0 | can define custom behavior here, if an individual instantiation of a filter class should have state.
HIGH | lm_eval/filters/selection.py | 0 | can define custom behavior here, if an individual instantiation of a filter class should have state.
HIGH | lm_eval/api/filter.py | 0 | can define custom behavior here, if an individual instantiation of a filter class should have state.
HIGH | lm_eval/filters/selection.py | 0 | assuming each entry of `resps` is a list of model responses, we discard all but the first response.
HIGH | …val/tasks/darija_bench/darija_transliteration/utils.py | 0 | assuming each entry of `resps` is a list of model responses, we discard all but the first response.
HIGH | lm_eval/tasks/darija_bench/darija_translation/utils.py | 0 | assuming each entry of `resps` is a list of model responses, we discard all but the first response.
HIGH | …_eval/tasks/darija_bench/darija_summarization/utils.py | 0 | assuming each entry of `resps` is a list of model responses, we discard all but the first response.
HIGH | lm_eval/tasks/super_glue/record/t5_utils.py | 0 | lower text and remove punctuation, articles and extra whitespace.
HIGH | lm_eval/tasks/french_bench/utils.py | 0 | lower text and remove punctuation, articles and extra whitespace.
HIGH | lm_eval/tasks/longbench/metrics.py | 0 | lower text and remove punctuation, articles and extra whitespace.
HIGH | lm_eval/tasks/mlqa/utils.py | 0 | lower text and remove punctuation, articles and extra whitespace.
HIGH | lm_eval/tasks/tinyBenchmarks/utils_truthfulqa.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/truthfulqa-multi/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/truthfulqa/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/noreval/nortruthfulqa/generation/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/noreval/norsumm/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/galician_bench/utils.py | 0 | returns `t5` style bleu scores. see the related implementation: https://github.com/google-research/text-to-text-transfer
HIGH | lm_eval/tasks/tinyBenchmarks/utils_truthfulqa.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/truthfulqa-multi/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/truthfulqa/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/noreval/nortruthfulqa/generation/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/noreval/norsumm/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/galician_bench/utils.py | 0 | returns `t5` style rouge scores. see the related implementation: https://github.com/google-research/text-to-text-transfe
HIGH | lm_eval/tasks/darijammlu/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/mmlusr/config.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/e2lmc/noor/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/egymmlu/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/arab_culture/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | …val/tasks/arab_culture_completion/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/mmlu/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/tmmluplus/default/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/arabicmmlu/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/tmlu/default/_generate_configs.py | 0 | take in a yaml, and output all "other" splits with this yaml
HIGH | lm_eval/tasks/mgsm/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrimmlu/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/paws-x/_generate_config.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/translation/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/xnli/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/openai_mmlu/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/adr/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/mafand/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/naijarc/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/belebele/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/injongointent/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/xlsum/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/afrisenti/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/masakhapos/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/ntrex/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/flores/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/salt/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/masakhanews/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/uhura-arc-easy/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/sib/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/afriqa/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrobench/masakhaner/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrixnli/gen_utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/afrixnli/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
HIGH | lm_eval/tasks/xwinograd/utils.py | 0 | generate a yaml file for each language. :param output_dir: the directory to output the files to. :param overwrite: wheth
293 more matches not shown…
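
The report does not say how repeated text is matched across files. One plausible approach, sketched here purely as an assumption, is to collect docstrings with Python's `ast` module, normalize case and whitespace (the snippets above appear lowercased), and keep any docstring found in two or more distinct files. The function names, the `min_files` parameter, and the sample sources are all hypothetical.

```python
import ast
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(text.lower().split())


def repeated_docstrings(sources: dict[str, str], min_files: int = 2) -> dict[str, list[str]]:
    """Return docstrings that occur in at least `min_files` distinct files.

    `sources` maps a file path to its Python source text.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for path, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                doc = ast.get_docstring(node)
                if doc:
                    seen[normalize(doc)].append(path)
    return {
        doc: sorted(set(paths))
        for doc, paths in seen.items()
        if len(set(paths)) >= min_files
    }


# Made-up three-file example; the paths and function bodies are illustrative only.
sources = {
    "a/utils.py": 'def clean(s):\n    """Lower text and remove punctuation."""\n    return s\n',
    "b/utils.py": 'def clean(s):\n    """lower text and remove punctuation."""\n    return s\n',
    "c/utils.py": 'def other(s):\n    """Something unrelated."""\n    return s\n',
}
print(repeated_docstrings(sources))
```

A duplicate found this way is often a sign that a helper was copied between task directories. Moving it into one shared module is one common remedy, though whether that suits this repository is a judgment for its maintainers.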
Hyper-Verbose Identifiers: 655 hits · 580 pts
Severity | File | Line | Snippet
LOW | lm_eval/evaluator_utils.py | 173 | def _compute_task_aggregations(
LOW | lm_eval/evaluator_utils.py | 319 | def _collect_groups_bottom_up(groups: dict[str, Group]) -> list[Group]:
LOW | lm_eval/evaluator_utils.py | 404 | def _propagate_higher_is_better(
LOW | lm_eval/utils.py | 47 | def is_transformers_available() -> bool:
LOW | lm_eval/utils.py | 324 | def get_sample_results_filenames(filenames: list[str]) -> list[str]:
LOW | lm_eval/utils.py | 331 | def get_rolling_token_windows(
LOW | lm_eval/utils.py | 844 | def check_remote_tokenizer_support(
LOW | lm_eval/tasks/__init__.py | 36 | def get_task_name_from_config(task_config: dict[str, str]) -> str:
LOW | lm_eval/tasks/__init__.py | 50 | def get_task_name_from_object(task_object):
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 207 | def generate_optimal_plans_for_problem_state(P, state, num_plans, timeout):
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 330 | def create_tmp_dom_prob_replace_init(P, state, result_domain_file, result_problem_file):
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 671 | def str_remove_before_first_parentheses(s):
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 680 | def str_remove_after_last_parentheses(s):
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 207 | def generate_optimal_plans_for_problem_state(P, state, num_plans, timeout):
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 330 | def create_tmp_dom_prob_replace_init(P, state, result_domain_file, result_problem_file):
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 671 | def str_remove_before_first_parentheses(s):
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 680 | def str_remove_after_last_parentheses(s):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 35 | def test_normalize_comma_only_between_digits(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 58 | def test_extract_answer_multiline_with_answer(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 100 | def test_exact_numerical_match(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 115 | def test_non_numeric_fallback(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 123 | def test_same_unit_different_values(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 145 | def test_missing_optional_fields(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 163 | def test_exact_and_numerical_match(self):
LOW | lm_eval/tasks/jfinqa/test_jfinqa_utils.py | 169 | def test_numerical_match_only(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 133 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 170 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 243 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 293 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 340 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 379 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 427 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 476 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 550 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 600 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 660 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 721 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 784 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 853 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 911 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 939 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1017 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1108 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1153 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1208 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1241 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1295 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1331 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1356 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1432 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1460 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1492 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1523 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1580 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/instructions.py | 1612 | def get_instruction_args_keys(self):
LOW | lm_eval/tasks/ifeval/utils.py | 24 | def test_instruction_following_strict(
LOW | lm_eval/tasks/ifeval/utils.py | 57 | def test_instruction_following_loose(
LOW | lm_eval/tasks/ifeval/multilingual/utils.py | 23 | def test_instruction_following_strict(
LOW | lm_eval/tasks/ifeval/multilingual/utils.py | 56 | def test_instruction_following_loose(
LOW | …ks/ifeval/multilingual/instructions/es_instructions.py | 104 | def get_instruction_args_keys(self):
595 more matches not shown…
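
The rule behind this category is not stated in the report. The function names listed above all run to roughly 25 characters or more, so a simple length threshold is one rule that would be consistent with them. The sketch below implements that guess with `ast`; the `min_length` default and the function name are assumptions, not the scanner's actual logic.

```python
import ast


def long_function_names(code: str, min_length: int = 25) -> list[tuple[int, str]]:
    """Return (line, name) for each function whose name has at least `min_length` characters."""
    hits = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if len(node.name) >= min_length:
                hits.append((node.lineno, node.name))
    return sorted(hits)


# Illustrative source: one long method name taken from the table, one short one.
example = (
    "class Checker:\n"
    "    def get_instruction_args_keys(self):\n"
    "        return []\n"
    "\n"
    "    def build(self):\n"
    "        return None\n"
)
print(long_function_names(example))
```

A rule this blunt would also flag descriptive test names and interface methods that many subclasses must implement, which may explain why the same method name recurs so often in the table.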
Decorative Section Separators: 98 hits · 358 pts
Severity | File | Line | Snippet
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 223 | # ============================================================================
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 225 | # ============================================================================
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 241 | # ============================================================================
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 243 | # ============================================================================
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 425 | # ============================================================================
MEDIUM | lm_eval/tasks/cruxeval/utils.py | 427 | # ============================================================================
MEDIUM | lm_eval/api/registry.py | 414 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 416 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 460 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 462 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 520 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 522 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 570 | # =============================================================================
MEDIUM | lm_eval/api/registry.py | 572 | # =============================================================================
MEDIUM | lm_eval/api/group.py | 327 | # =============================================================================
MEDIUM | lm_eval/api/group.py | 329 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 13 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 15 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 24 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 26 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 58 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 60 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 112 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 114 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 717 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 719 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 180 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 182 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 455 | # =============================================================================
MEDIUM | tests/test_fewshot_context.py | 457 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 12 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 14 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 359 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 361 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 921 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 923 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 84 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 86 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 205 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 207 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 777 | # =============================================================================
MEDIUM | tests/test_task_manager.py | 779 | # =============================================================================
MEDIUM | tests/test_samplers.py | 38 | # =============================================================================
MEDIUM | tests/test_samplers.py | 40 | # =============================================================================
MEDIUM | tests/test_samplers.py | 213 | # =============================================================================
MEDIUM | tests/test_samplers.py | 215 | # =============================================================================
MEDIUM | tests/test_samplers.py | 268 | # =============================================================================
MEDIUM | tests/test_samplers.py | 270 | # =============================================================================
MEDIUM | tests/test_samplers.py | 304 | # =============================================================================
MEDIUM | tests/test_samplers.py | 306 | # =============================================================================
MEDIUM | tests/test_samplers.py | 15 | # =============================================================================
MEDIUM | tests/test_samplers.py | 17 | # =============================================================================
MEDIUM | tests/test_aggregation_pipeline.py | 26 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_aggregation_pipeline.py | 28 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_aggregation_pipeline.py | 107 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_aggregation_pipeline.py | 109 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_evaluator_utils.py | 115 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_evaluator_utils.py | 117 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_evaluator_utils.py | 139 | # ---------------------------------------------------------------------------
MEDIUM | tests/test_evaluator_utils.py | 141 | # ---------------------------------------------------------------------------
38 more matches not shown…
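
Every snippet above is a comment line consisting of a single repeated character. A regular expression along the following lines would match them. This is only a sketch: the character set and the minimum run of ten characters are assumptions, and the report does not describe the pattern it actually uses.

```python
import re

# A comment line made of one repeated punctuation character, e.g. "# =====" or "# -----".
# The minimum run length of 10 is an assumed threshold.
SEPARATOR = re.compile(r"^\s*#\s*([=\-*~])\1{9,}\s*$")


def separator_lines(text: str) -> list[int]:
    """Return the 1-based line numbers of decorative separator comments."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SEPARATOR.match(line)
    ]


# Illustrative input: a banner around a heading, an inline comment, and a dashed rule.
example = "\n".join(
    [
        "# " + "=" * 77,
        "# Fixtures",
        "# " + "=" * 77,
        "value = 1  # --- not a separator",
        "# " + "-" * 75,
    ]
)
print(separator_lines(example))
```

Hits in this category come in pairs two lines apart in the table, which is what a banner above and below a one-line section heading would produce.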
Excessive Try-Catch Wrapping: 164 hits · 202 pts
Severity | File | Line | Snippet
LOW | lm_eval/tasks/_index.py | 63 | except Exception as err:
LOW | lm_eval/tasks/realtoxicityprompts/metric.py | 36 | except Exception:
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 123 | except Exception as e:
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 148 | except Exception:
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 325 | except Exception as e:
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 676 | except Exception:
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 1058 | except Exception as e:
MEDIUM | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 1053 | def parse_prediction(prediction):
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 123 | except Exception as e:
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 148 | except Exception:
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 325 | except Exception as e:
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 676 | except Exception:
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 1058 | except Exception as e:
MEDIUM | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 1053 | def parse_prediction(prediction):
LOW | lm_eval/tasks/slr_bench/lm_eval_slr_bench.py | 14 | except Exception as e:
LOW | lm_eval/tasks/slr_bench/lm_eval_slr_bench.py | 59 | except Exception as e:
MEDIUM | lm_eval/tasks/slr_bench/lm_eval_slr_bench.py | 60 | print(f"Error in process_results: {e}")
LOW | lm_eval/tasks/humaneval/utils.py | 9 | except Exception as e:
LOW | lm_eval/tasks/aime/utils.py | 49 | except Exception:
LOW | lm_eval/tasks/hendrycks_math/utils.py | 49 | except Exception:
LOW | lm_eval/tasks/hrm8k/default/utils.py | 32 | except Exception:
LOW | lm_eval/tasks/hrm8k/default/utils.py | 70 | except Exception:
LOW | lm_eval/tasks/hrm8k/default/utils.py | 84 | except Exception:
LOW | lm_eval/tasks/hrm8k/default/utils.py | 158 | except Exception:
LOW | lm_eval/tasks/hrm8k/default/utils.py | 189 | except Exception:
LOW | lm_eval/tasks/hrm8k/en/utils.py | 32 | except Exception:
LOW | lm_eval/tasks/hrm8k/en/utils.py | 70 | except Exception:
LOW | lm_eval/tasks/hrm8k/en/utils.py | 84 | except Exception:
LOW | lm_eval/tasks/hrm8k/en/utils.py | 158 | except Exception:
LOW | lm_eval/tasks/hrm8k/en/utils.py | 189 | except Exception:
LOW | lm_eval/tasks/humaneval_infilling/utils.py | 9 | except Exception as e:
LOW | lm_eval/tasks/medtext/utils.py | 18 | except Exception as e:
LOW | lm_eval/tasks/medtext/utils.py | 27 | except Exception as e:
LOW | lm_eval/tasks/medtext/utils.py | 33 | except Exception as e:
LOW | lm_eval/tasks/medtext/utils.py | 39 | except Exception as e:
LOW | lm_eval/tasks/medtext/utils.py | 47 | except Exception as e:
MEDIUM | lm_eval/tasks/medtext/utils.py | 24 | def doc_eval(pred, refs):
LOW | lm_eval/tasks/olaph/utils.py | 19 | except Exception as e:
LOW | lm_eval/tasks/olaph/utils.py | 28 | except Exception as e:
LOW | lm_eval/tasks/olaph/utils.py | 34 | except Exception as e:
LOW | lm_eval/tasks/olaph/utils.py | 40 | except Exception as e:
LOW | lm_eval/tasks/olaph/utils.py | 48 | except Exception as e:
MEDIUM | lm_eval/tasks/olaph/utils.py | 25 | def doc_eval(pred, refs):
LOW | lm_eval/tasks/minerva_math/utils.py | 197 | except Exception as e:
LOW | lm_eval/tasks/leaderboard/math/utils.py | 209 | except Exception as e:
LOW | lm_eval/tasks/toksuite/utils.py | 474 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 494 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 518 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 522 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 555 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 578 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 601 | except Exception:
LOW | lm_eval/tasks/toksuite/utils.py | 605 | except Exception:
LOW | lm_eval/tasks/meqsum/utils.py | 18 | except Exception as e:
LOW | lm_eval/tasks/meqsum/utils.py | 52 | except Exception as e:
LOW | lm_eval/tasks/meqsum/utils.py | 58 | except Exception as e:
LOW | lm_eval/tasks/meqsum/utils.py | 64 | except Exception as e:
LOW | lm_eval/tasks/meqsum/utils.py | 72 | except Exception as e:
LOW | lm_eval/tasks/med_prescriptions/utils.py | 2060 | except Exception:
LOW | lm_eval/tasks/med_prescriptions/utils.py | 2066 | except Exception:
104 more matches not shown…
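
Nearly all LOW rows above are `except Exception` handlers. A check for such handlers can be sketched with `ast` as below. It is a minimal illustration under the assumption that "broad" means a bare `except:` or a handler for `Exception` or `BaseException`; the report does not define the term, and the function name is hypothetical.

```python
import ast

BROAD_NAMES = {"Exception", "BaseException"}


def broad_except_lines(code: str) -> list[int]:
    """Return line numbers of bare `except:` and `except Exception` handlers."""
    hits = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        is_bare = node.type is None
        is_broad = isinstance(node.type, ast.Name) and node.type.id in BROAD_NAMES
        if is_bare or is_broad:
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative source: one broad handler and one narrow handler.
example = (
    "def parse(text):\n"
    "    try:\n"
    "        return int(text)\n"
    "    except Exception:\n"
    "        return None\n"
    "\n"
    "def parse_strict(text):\n"
    "    try:\n"
    "        return int(text)\n"
    "    except ValueError:\n"
    "        return None\n"
)
print(broad_except_lines(example))
```

Only the first handler is reported. The second shows the usual alternative: catching the specific exception the call is expected to raise, so that unrelated failures still surface.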
Unused Imports: 181 hits · 179 pts
Severity | File | Line | Snippet
LOW | lm_eval/evaluator_utils.py | 1 |
LOW | lm_eval/__init__.py | 2 |
LOW | lm_eval/evaluator.py | 1 |
LOW | lm_eval/filters/__init__.py | 1 |
LOW | lm_eval/filters/__init__.py | 6 |
LOW | lm_eval/filters/__init__.py | 8 |
LOW | lm_eval/filters/__init__.py | 8 |
LOW | lm_eval/filters/__init__.py | 8 |
LOW | lm_eval/filters/__init__.py | 8 |
LOW | lm_eval/filters/extraction.py | 1 |
LOW | lm_eval/tasks/_index.py | 1 |
LOW | lm_eval/tasks/_factory.py | 1 |
LOW | lm_eval/tasks/_yaml_loader.py | 1 |
LOW | lm_eval/tasks/manager.py | 1 |
LOW | lm_eval/tasks/manager.py | 20 |
LOW | lm_eval/tasks/babilong/common_utils.py | 11 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 3 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 4 |
LOW | lm_eval/tasks/jfinqa/utils.py | 12 |
LOW | lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py | 2 |
LOW | lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py | 8 |
LOW | lm_eval/tasks/aime/utils.py | 1 |
LOW | lm_eval/tasks/noreval/norsumm/utils.py | 1 |
LOW | lm_eval/tasks/noreval/norsumm/utils.py | 7 |
LOW | lm_eval/tasks/spanish_bench/utils.py | 2 |
LOW | lm_eval/tasks/spanish_bench/utils.py | 5 |
LOW | lm_eval/tasks/minerva_math/utils.py | 5 |
LOW | lm_eval/tasks/minerva_math/utils.py | 5 |
LOW | lm_eval/tasks/minerva_math/utils.py | 14 |
LOW | lm_eval/tasks/darija_bench/darija_sentiment/utils.py | 1 |
LOW | lm_eval/tasks/darija_bench/darija_sentiment/utils.py | 2 |
LOW | …_eval/tasks/darija_bench/darija_summarization/utils.py | 1 |
LOW | lm_eval/tasks/longbench/utils.py | 1 |
LOW | lm_eval/tasks/longbench/utils.py | 2 |
LOW | lm_eval/tasks/longbench/utils.py | 3 |
LOW | lm_eval/tasks/leaderboard/gpqa/utils.py | 1 |
LOW | lm_eval/tasks/xquad/utils.py | 1 |
LOW | lm_eval/tasks/xquad/utils.py | 2 |
LOW | lm_eval/tasks/xquad/utils.py | 4 |
LOW | lm_eval/tasks/xquad/utils.py | 7 |
LOW | lm_eval/tasks/cnn_dailymail/utils.py | 6 |
LOW | lm_eval/tasks/score/utils.py | 21 |
LOW | lm_eval/tasks/score/utils.py | 25 |
LOW | lm_eval/tasks/ruler/vt_utils.py | 30 |
LOW | lm_eval/tasks/ruler/vt_utils.py | 30 |
LOW | lm_eval/tasks/ruler/fwe_utils.py | 20 |
LOW | lm_eval/tasks/ruler/common_utils.py | 10 |
LOW | lm_eval/tasks/afrobench/nollysenti/prompt_5/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/nollysenti/prompt_2/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/nollysenti/prompt_3/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/nollysenti/prompt_4/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/nollysenti/prompt_1/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/injongointent/prompt_5/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/injongointent/prompt_2/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/injongointent/prompt_3/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/injongointent/prompt_4/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/injongointent/prompt_1/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/afrisenti/prompt_5/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/afrisenti/prompt_2/utils.py | 1 |
LOW | lm_eval/tasks/afrobench/afrisenti/prompt_3/utils.py | 1 |
121 more matches not shown…
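
The table gives only file and line for each hit, with no snippet. As a rough illustration of how such hits can be found, the sketch below compares the names a module imports against the names it references, using `ast`. It is a simplified check written for this note, not the scanner's method. It ignores `__all__`, string annotations, and re-exports, so it would flag intentional re-exports of the kind often placed in `__init__.py` files.

```python
import ast


def unused_imports(code: str) -> list[tuple[int, str]]:
    """Return (line, bound name) for imports that are never referenced in the module."""
    tree = ast.parse(code)
    imported: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a` unless an alias is given.
                name = alias.asname or alias.name.split(".")[0]
                imported[name] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported[alias.asname or alias.name] = node.lineno
    # Attribute access such as `os.getcwd` still contains a Name node for `os`.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


# Illustrative module: `sys` and `Any` are imported but never used.
example = (
    "import os\n"
    "import sys\n"
    "from typing import Any, Optional\n"
    "\n"
    "def cwd(flag: Optional[int]):\n"
    "    return os.getcwd()\n"
)
print(unused_imports(example))
```

Two names imported on the same line produce two hits with the same line number, which matches the repeated file and line pairs in the table.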
Deep Nesting: 166 hits · 166 pts
Severity | File | Line | Snippet
LOW | lm_eval/evaluator_utils.py | 404 |
LOW | lm_eval/evaluator_utils.py | 483 |
LOW | lm_eval/evaluator.py | 424 |
LOW | lm_eval/filters/extraction.py | 39 |
LOW | lm_eval/filters/extraction.py | 157 |
LOW | lm_eval/filters/extraction.py | 42 |
LOW | lm_eval/tasks/_factory.py | 127 |
LOW | lm_eval/tasks/realtoxicityprompts/metric.py | 12 |
LOW | lm_eval/tasks/evalita_llm/metrics.py | 49 |
LOW | lm_eval/tasks/evalita_llm/metrics.py | 63 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 11 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 30 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 91 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 193 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 246 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 439 |
LOW | lm_eval/tasks/evalita_llm/utils.py | 526 |
LOW | lm_eval/tasks/simple_cooccurrence_bias/utils.py | 29 |
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 642 |
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 730 |
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 91 |
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 917 |
LOW | lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py | 993 |
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 642 |
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 730 |
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 91 |
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 917 |
LOW | lm_eval/tasks/acpbench/gen_2shot/acp_utils.py | 993 |
LOW | lm_eval/tasks/mgsm/utils.py | 131 |
LOW | lm_eval/tasks/chartqa/utils.py | 192 |
LOW | …asks/catalan_bench/flores_ca/create_yamls_flores_ca.py | 273 |
LOW | lm_eval/tasks/truthfulqa-multi/utils.py | 38 |
LOW | lm_eval/tasks/mmmu/utils.py | 105 |
LOW | lm_eval/tasks/mmmu/utils.py | 223 |
LOW | lm_eval/tasks/mmmu/utils.py | 316 |
LOW | lm_eval/tasks/mmmu/utils.py | 230 |
LOW | lm_eval/tasks/translation/utils.py | 41 |
LOW | lm_eval/tasks/aime/utils.py | 97 |
LOW | lm_eval/tasks/hendrycks_math/utils.py | 97 |
LOW | lm_eval/tasks/qasper/utils.py | 6 |
LOW | lm_eval/tasks/qasper/utils.py | 9 |
LOW | lm_eval/tasks/qasper/utils.py | 31 |
LOW | …s/portuguese_bench/flores_pt/create_yamls_flores_pt.py | 272 |
LOW | lm_eval/tasks/bbq/utils.py | 212 |
LOW | lm_eval/tasks/bbq/utils.py | 300 |
LOW | lm_eval/tasks/bbq/utils.py | 303 |
LOW | lm_eval/tasks/hrm8k/default/utils.py | 146 |
LOW | lm_eval/tasks/hrm8k/en/utils.py | 146 |
LOW | …asks/spanish_bench/flores_es/create_yamls_flores_es.py | 272 |
LOW | lm_eval/tasks/minerva_math/utils.py | 159 |
LOW | lm_eval/tasks/leaderboard/math/utils.py | 170 |
LOW | lm_eval/tasks/toksuite/utils.py | 423 |
LOW | lm_eval/tasks/toksuite/utils.py | 532 |
LOW | lm_eval/tasks/med_prescriptions/utils.py | 2178 |
LOW | lm_eval/tasks/med_prescriptions/utils.py | 2271 |
LOW | lm_eval/tasks/score/utils.py | 74 |
LOW | lm_eval/tasks/score/utils.py | 199 |
LOW | lm_eval/tasks/score/utils.py | 93 |
LOW | lm_eval/tasks/score/non_greedy_summarizer.py | 33 |
LOW | lm_eval/tasks/score/non_greedy_summarizer.py | 117 |
106 more matches not shown…
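
The report does not state what depth counts as "deep" or which statements it counts. One way to measure nesting, sketched here under those open assumptions, is to walk each function's syntax tree and track how many block statements enclose the current node. The `limit` of 4 and both function names are hypothetical.

```python
import ast

# Statements that open a new nested block. Note that `elif` parses as an `If`
# inside the parent's `orelse`, so this simple count treats it as one level deeper.
NESTING_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.With, ast.AsyncWith, ast.Try)


def max_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest block nesting found beneath `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_depth(child, child_depth))
    return deepest


def deeply_nested(code: str, limit: int = 4) -> list[tuple[int, str, int]]:
    """Return (line, function name, depth) for functions nested at least `limit` deep."""
    hits = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            depth = max_depth(node)
            if depth >= limit:
                hits.append((node.lineno, node.name, depth))
    return sorted(hits)


# Illustrative source: one flat function and one with four nested blocks.
example = (
    "def flat(xs):\n"
    "    return [x for x in xs if x]\n"
    "\n"
    "def nested(rows):\n"
    "    for row in rows:\n"
    "        if row:\n"
    "            for cell in row:\n"
    "                if cell:\n"
    "                    print(cell)\n"
)
print(deeply_nested(example))
```

Early returns, guard clauses, or extracting the inner loop into a helper are the usual ways to bring such a function back under the limit.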
Over-Commented Block · 88 hits · 86 pts
Severity File Line Snippet
LOW lm_eval/result_schema.py 21 {
LOW lm_eval/result_schema.py 41 # Per-task list of per-document sample results.
LOW lm_eval/result_schema.py 61 "upper_git_hash": str | None,
LOW lm_eval/result_schema.py 81 # Model source identifier (e.g. "hf").
LOW lm_eval/tasks/tinyBenchmarks/utils_truthfulqa.py 61 # bleurt_scores_true = self.bleurt.compute(
LOW lm_eval/tasks/ifeval/instructions.py 1 # Copyright 2023 The Google Research Authors.
LOW lm_eval/tasks/ifeval/instructions_util.py 1 # Copyright 2023 The Google Research Authors.
LOW lm_eval/tasks/ifeval/instructions_registry.py 1 # Copyright 2023 The Google Research Authors.
LOW …val/tasks/ifeval/multilingual/instructions_registry.py 1 # Copyright 2024 The Google Research Authors.
LOW …multilingual/instruction_utils/ca_instructions_util.py 1 # coding=utf-8
LOW …multilingual/instruction_utils/es_instructions_util.py 1 # coding=utf-8
LOW …ks/ifeval/multilingual/instructions/es_instructions.py 1 # coding=utf-8
LOW …ks/ifeval/multilingual/instructions/ca_instructions.py 1 # Copyright 2024 The Google Research Authors.
LOW lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py 181 # bleurt_scores_false = self.bleurt.compute(
LOW lm_eval/tasks/truthfulqa-multi/utils.py 81 completion = results[0]
LOW lm_eval/tasks/truthfulqa-multi/utils.py 101 bleu_scores = [bleu([[ref]], [completion]) for ref in all_refs]
LOW lm_eval/tasks/truthfulqa-multi/utils.py 121 # rouge2_max = rouge2_correct
LOW lm_eval/tasks/truthfulqa/utils.py 61
LOW lm_eval/tasks/longbench/metrics.py 1 # MIT License
LOW lm_eval/tasks/longbench/_generate_config.py 1 # MIT License
LOW lm_eval/tasks/leaderboard/ifeval/instructions.py 1 # Copyright 2023 The Google Research Authors.
LOW lm_eval/tasks/leaderboard/ifeval/instructions_util.py 1 # Copyright 2023 The Google Research Authors.
LOW …eval/tasks/leaderboard/ifeval/instructions_registry.py 1 # Copyright 2023 The Google Research Authors.
LOW lm_eval/tasks/logiqa2/utils_logiqa2.py 21 # # https://github.com/csitfun/LogiQA2.0/blob/main/logiqa2nli/nli-prompt.py
LOW lm_eval/tasks/score/utils.py 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW lm_eval/tasks/score/non_greedy_summarizer.py 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW lm_eval/tasks/score/mmlu_pro/utils_mmlu_pro.py 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …ore/math/prompt_robustness_math_counting_and_prob.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/tasks/score/math/prompt_robustness_math_geometry.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …ks/score/math/non_greedy_robustness_math_geometry.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/score/math/non_greedy_robustness_math_num_theory.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …l/tasks/score/math/prompt_robustness_math_precalc.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/score/math/non_greedy_robustness_math_prealgebra.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …asks/score/math/prompt_robustness_math_num_theory.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …math/non_greedy_robustness_math_counting_and_prob.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …asks/score/math/prompt_robustness_math_prealgebra.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/math/prompt_robustness_math_intermediate_algebra.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW lm_eval/tasks/score/math/math_grader.py 1 # Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
LOW lm_eval/tasks/score/math/math_grader.py 21 # copies of the Software, and to permit persons to whom the Software is
LOW lm_eval/tasks/score/math/math_grader.py 41 # copies of the Software, and to permit persons to whom the Software is
LOW lm_eval/tasks/score/math/math_grader.py 61 # copies of the Software, and to permit persons to whom the Software is
LOW …sks/score/math/non_greedy_robustness_math_precalc.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …h/non_greedy_robustness_math_intermediate_algebra.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/agi_eval/option_order_robustness_agieval_lsat_rc.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …score/agi_eval/prompt_robustness_agieval_lstat_lr.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …e/agi_eval/non_greedy_robustness_agieval_sat_math.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …agi_eval/option_order_robustness_agieval_sat_math.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …e/agi_eval/non_greedy_robustness_agieval_lstat_ar.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW lm_eval/tasks/score/agi_eval/utils_agieval.py 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/agi_eval/option_order_robustness_agieval_lsat_ar.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …s/score/agi_eval/prompt_robustness_agieval_sat_en.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …ore/agi_eval/non_greedy_robustness_agieval_sat_en.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/agi_eval/non_greedy_robustness_agieval_logiqa_en.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …e/agi_eval/non_greedy_robustness_agieval_lstat_lr.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …gi_eval/option_order_robustness_agieval_logiqa_en.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …score/agi_eval/prompt_robustness_agieval_lstat_ar.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …score/agi_eval/prompt_robustness_agieval_sat_math.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …re/agi_eval/non_greedy_robustness_agieval_lsat_rc.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …core/agi_eval/prompt_robustness_agieval_logiqa_en.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
LOW …/score/agi_eval/prompt_robustness_agieval_lsat_rc.yaml 1 # Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
28 more matches not shown…
Redundant / Tautological Comments · 44 hits · 68 pts
Severity File Line Snippet
LOW lm_eval/tasks/_yaml_loader.py 50 # Check if this is a built-in task module
LOW lm_eval/tasks/_yaml_loader.py 69 # Check if we need to reload the module
LOW lm_eval/tasks/_yaml_loader.py 72 # Check if it was modified
LOW lm_eval/tasks/evalita_llm/utils.py 126 if results: # Check if results is not empty
LOW lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py 183 # Check if new_plan is a plan
LOW lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py 807 # Check if the answer is equal (as a set) to the real stored answer
LOW lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py 860 # Check if the plan candidate from the answer (a) is a proper subsequence of the plan in the question and (b
LOW lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py 978 # Check if the answer is equal as sets to the correct answers.
LOW lm_eval/tasks/acpbench/gen_2shot/acp_utils.py 183 # Check if new_plan is a plan
LOW lm_eval/tasks/acpbench/gen_2shot/acp_utils.py 807 # Check if the answer is equal (as a set) to the real stored answer
LOW lm_eval/tasks/acpbench/gen_2shot/acp_utils.py 860 # Check if the plan candidate from the answer (a) is a proper subsequence of the plan in the question and (b
LOW lm_eval/tasks/acpbench/gen_2shot/acp_utils.py 978 # Check if the answer is equal as sets to the correct answers.
LOW …ks/ifeval/multilingual/instructions/es_instructions.py 1366 # Check if the last character in value is a dot (.)
LOW …ks/ifeval/multilingual/instructions/es_instructions.py 1506 # Check if all normalized, alphabetic characters are uppercase, ignoring non-alphabetic characters
LOW …ks/ifeval/multilingual/instructions/ca_instructions.py 1364 # Check if the last character in value is a dot (.)
LOW …ks/ifeval/multilingual/instructions/ca_instructions.py 1504 # Check if all normalized, alphabetic characters are uppercase, ignoring non-alphabetic characters
LOW lm_eval/tasks/chartqa/utils.py 211 # Check if the number is in the model answer with commas (e.g. 1,000)
LOW lm_eval/tasks/chartqa/utils.py 214 # Check if the number is in the model answer without commas (e.g. 1000)
LOW lm_eval/tasks/graphwalks/utils.py 42 # Check if formatted correctly
LOW lm_eval/tasks/aime/utils.py 26 # Check if answer matches target
LOW lm_eval/tasks/bbq/utils.py 224 # Check if answer is "Not known"
LOW lm_eval/tasks/med_prescriptions/utils.py 2106 # Check if the text contains any Indian script characters
LOW lm_eval/tasks/arab_culture/utils_mcq.py 17 ### Set this to one to add the country and region information to the prompt
LOW lm_eval/tasks/arab_culture/utils_mcq.py 19 ### Set this to one to add the region information to the prompt
LOW lm_eval/tasks/arab_culture/utils_mcq.py 21 ### Set this to change between Arabic and English for the answer keys and the choices keys
LOW lm_eval/tasks/jsonschema_bench/metrics.py 28 # Check if the schema is valid
LOW lm_eval/tasks/afrobench/masakhaner/prompt_5/utils.py 17 if pair: # Check if the line is not empty
LOW lm_eval/tasks/afrobench/masakhaner/prompt_2/utils.py 17 if pair: # Check if the line is not empty
LOW lm_eval/tasks/afrobench/masakhaner/prompt_3/utils.py 17 if pair: # Check if the line is not empty
LOW lm_eval/tasks/afrobench/masakhaner/prompt_4/utils.py 17 if pair: # Check if the line is not empty
LOW lm_eval/tasks/afrobench/masakhaner/prompt_1/utils.py 17 if pair: # Check if the line is not empty
LOW …eval/tasks/arab_culture_completion/utils_completion.py 18 ### Set this to one to add the country and region information to the prompt
LOW …eval/tasks/arab_culture_completion/utils_completion.py 20 ### Set this to one to add the region information to the prompt
LOW …eval/tasks/arab_culture_completion/utils_completion.py 22 ### Set this to change between Arabic and English for the answer keys and the choices keys
LOW lm_eval/decontamination/decontaminate.py 61 # Check if we've decontaminated this combination before
LOW lm_eval/models/winml.py 326 # Check if encoding empty string gives BOS token
LOW lm_eval/models/winml.py 556 # Check if greedy (argmax matches actual token)
LOW lm_eval/models/hf_vlms.py 586 # Check if per-token argmax is exactly equal to continuation
LOW lm_eval/models/neuron_optimum.py 542 # Check if per-token argmax is exactly equal to continuation
LOW lm_eval/models/huggingface.py 1529 # Check if per-token argmax is exactly equal to continuation
LOW lm_eval/models/megatron_lm.py 987 # Check if greedy
LOW lm_eval/_cli/run.py 478 # Print results
LOW lm_eval/api/task.py 1078 # Check if answer is provided (handle a=0 as valid answer index)
LOW tests/test_tasks.py 28 # Check if task_classes is empty
Docstring Block Structure · 11 hits · 55 pts
Severity File Line Snippet
HIGH lm_eval/models/winml.py 388 Run inference using ONNX Runtime GenAI to get full logits sequence. Args: input_text: Inpu
HIGH lm_eval/models/ibm_watsonx_ai.py 229 Determines whether a stop token has been generated in the `response_tokens` compared to the `context_tokens`.
HIGH lm_eval/models/utils.py 280 Generates and yields batches from the reordered array. The method of grouping and batching depends on the param
HIGH lm_eval/models/utils.py 504 This function checks if the (Hugging Face) tokenizer has a padding token and sets it if not present. Some tokenizers req
HIGH lm_eval/models/utils.py 611 Normalize generation kwargs for consistent handling across model backends. Model implementations may have different
HIGH lm_eval/models/utils.py 829 Truncates input tokens and/or reduces max_gen_toks to fit within max_model_len. Strategy: 1. No truncation
HIGH lm_eval/api/registry.py 102 Materialize a lazy placeholder into the actual object. This is at module level to avoid memory leaks from lru_cache
HIGH lm_eval/api/registry.py 188 Register an object under one or more aliases. Can be used as a decorator or called directly for direct registra
HIGH lm_eval/api/registry.py 279 Retrieve an object by alias, materializing if needed. Thread-safe lazy loading: if the alias points to a placeh
HIGH lm_eval/api/registry.py 492 Get a model class by name. Args: model_name: The registered name of the model Returns: The mod
HIGH lm_eval/api/registry.py 546 Get a filter by name. Args: filter_name: The registered name of the filter, or a callable Returns:
AI Slop Vocabulary · 20 hits · 54 pts
Severity File Line Snippet
MEDIUM lm_eval/evaluator.py 198 # See https://github.com/EleutherAI/lm-evaluation-harness/pull/1412
MEDIUM lm_eval/tasks/tinyBenchmarks/utils_truthfulqa.py 160 # init RougeScorer once (https://github.com/EleutherAI/lm-evaluation-harness/issues/1692)--rouge_types are const
MEDIUM lm_eval/tasks/ifeval/instructions_util.py 29 # see https://github.com/EleutherAI/lm-evaluation-harness/issues/2210
MEDIUM lm_eval/tasks/aime/utils.py 35 # string normalization from https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/hendrycks_math
MEDIUM lm_eval/tasks/hendrycks_math/utils.py 35 # string normalization from https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/hendrycks_math
MEDIUM lm_eval/tasks/truthfulqa/utils.py 164 # init RougeScorer once (https://github.com/EleutherAI/lm-evaluation-harness/issues/1692)--rouge_types are const
LOW lm_eval/tasks/bbq/utils.py 65 # If all elements are NaN, then we simply return NaN
MEDIUM lm_eval/tasks/noreval/nortruthfulqa/generation/utils.py 137 # init RougeScorer once (https://github.com/EleutherAI/lm-evaluation-harness/issues/1692)--rouge_types are const
MEDIUM lm_eval/tasks/noreval/norsumm/utils.py 87 # init RougeScorer once (https://github.com/EleutherAI/lm-evaluation-harness/issues/1692)--rouge_types are const
MEDIUM lm_eval/tasks/minerva_math/utils.py 28 # https://github.com/wellecks/lm-evaluation-harness/blob/master/lm_eval/tasks/minerva_math.py
LOW lm_eval/tasks/longbench/_generate_config.py 177 # Now we just set a boolean flag to indicate whether we need a newline
MEDIUM lm_eval/tasks/leaderboard/ifeval/instructions_util.py 28 # see https://github.com/EleutherAI/lm-evaluation-harness/issues/2210
MEDIUM lm_eval/tasks/leaderboard/math/utils.py 25 # https://github.com/wellecks/lm-evaluation-harness/blob/master/lm_eval/tasks/minerva_math.py
MEDIUM lm_eval/tasks/cruxeval/utils.py 242 # lm-evaluation-harness Integration Functions
MEDIUM lm_eval/models/openai_completions.py 314 "Loglikelihood (and therefore `multiple_choice`-type tasks) is not supported for chat completions as OpenAI
MEDIUM lm_eval/models/huggingface.py 1393 # See: https://github.com/EleutherAI/lm-evaluation-harness/issues/1678
MEDIUM lm_eval/models/sglang_causallms.py 40 # batch args from lm-eval interface: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interfa
LOW lm_eval/models/megatron_lm.py 837 # We just pass through the requests without additional splitting
LOW lm_eval/models/megatron_lm.py 857 # We just return results without additional gathering
MEDIUM lm_eval/api/metrics.py 614 # See https://github.com/EleutherAI/lm-evaluation-harness/pull/1390 for more documentation.
Self-Referential Comments · 17 hits · 50 pts
Severity File Line Snippet
MEDIUM lm_eval/tasks/slr_bench/lm_eval_slr_bench.py 33 # Create the reference in the required format
MEDIUM lm_eval/tasks/toksuite/utils.py 500 # Create the summary row with column averages
MEDIUM lm_eval/tasks/med_prescriptions/utils.py 2101 # Create a regular expression pattern for Indian scripts
MEDIUM lm_eval/tasks/ruler/vt_utils.py 75 # Create a list of the repeated noise
MEDIUM lm_eval/loggers/utils.py 25 # Define the pattern to match ',none' at the end of the string
MEDIUM lm_eval/config/evaluate_config.py 223 # Create an instance and validate
MEDIUM lm_eval/models/api_models.py 269 """This method is responsible for creating the json payload that will be sent to the API."""
MEDIUM tests/test_registry.py 157 # Create a class to test with
MEDIUM tests/test_metrics.py 12 # Create a minimal config
MEDIUM tests/test_cli_subcommands.py 444 # Create a minimal valid task yaml
MEDIUM tests/test_cli_subcommands.py 855 # Create a YAML config file
MEDIUM tests/test_task_manager.py 529 # Create a custom arc_easy.yaml that has a different metric
MEDIUM tests/test_task_manager.py 588 # Create a custom task using a real dataset
MEDIUM tests/test_task_manager.py 640 # Create a completely new task (not overriding any default)
MEDIUM tests/models/test_vllm_context_length.py 24 # Create a mock VLLM instance with required attributes
MEDIUM tests/models/test_vllm_context_length.py 205 # Create a mock request
MEDIUM tests/scripts/test_zeno_visualize.py 17 # Define the process_model_args function that replicates the fixed logic in zeno_visualize.py
Verbosity Indicators · 11 hits · 23 pts
Severity File Line Snippet
LOW lm_eval/tasks/score/math/prompt_templates.json 11 "prompt": "You should solve this math problem.\nIf the problem is easy, provide a brief solution with little
LOW lm_eval/tasks/infinitebench/utils.py 367 # Step 1: find last standalone A-D letter (official regex)
LOW lm_eval/tasks/infinitebench/utils.py 372 # Step 2: empty prediction
LOW lm_eval/tasks/infinitebench/utils.py 376 # Step 3: first character
LOW lm_eval/tasks/infinitebench/utils.py 380 # Step 4: full prediction matches label letter
LOW lm_eval/tasks/infinitebench/utils.py 384 # Step 5: replace punctuation, check prefixes (matching official chars)
LOW lm_eval/tasks/infinitebench/utils.py 395 # Step 6: scan words for first A-D letter
LOW lm_eval/tasks/infinitebench/utils.py 430 # Step 1: find last standalone A-J letter (official regex)
LOW lm_eval/tasks/infinitebench/utils.py 437 # Step 2: replace chars and consolidate spaces (matching official)
LOW lm_eval/tasks/infinitebench/utils.py 447 # Step 3: check startswith
LOW lm_eval/tasks/infinitebench/utils.py 453 # Step 4: check answer prefixes (matching official set)
Cross-Language Confusion · 1 hit · 8 pts
Severity File Line Snippet
HIGH lm_eval/tasks/bbq/utils.py 75 # Unfortunately, bias score for `n_non_unk = 0` is undefined,
Synthetic Comment Markers · 1 hit · 8 pts
Severity File Line Snippet
HIGH lm_eval/tasks/arabic_leaderboard_complete/README.md 181 * `arabic_leaderboard_acva`: Arabic-Culture-Value-Alignment (ACVA) is a yes/no question dataset, generated by GPT3.5 Tur
Dead Code · 3 hits · 6 pts
Severity File Line Snippet
MEDIUM lm_eval/models/hf_vlms.py 413
MEDIUM lm_eval/models/hf_vlms.py 414
MEDIUM lm_eval/models/hf_vlms.py 435