Repository Analysis

huggingface/sentence-transformers

State-of-the-Art Embeddings, Retrieval, and Reranking

23.4 (Moderate AI signal)

Adjusted Score: 23.4
Raw Score: 23.4
Time Factor: 100%
Last Push: 2026-05-28
Stars: 18,755
Language: Python
Lines of Code: 109,703
Files: 581
Pattern Hits: 1959
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL: 0 · HIGH: 118 · MEDIUM: 104 · LOW: 1737
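As a quick consistency check, the four severity counts should add up to the 1959 pattern hits reported in the summary. The sketch below uses only the numbers printed in this report; the variable names are illustrative and not part of the scanner's output format.

```python
# Severity counts copied from the Severity Breakdown above.
severity_counts = {"CRITICAL": 0, "HIGH": 118, "MEDIUM": 104, "LOW": 1737}

# Total pattern hits reported in the summary statistics.
reported_pattern_hits = 1959

total = sum(severity_counts.values())
print(f"sum of severity counts: {total}")  # sum of severity counts: 1959
print(f"matches reported total: {total == reported_pattern_hits}")  # matches reported total: True

# Share of each severity level as a percentage of all hits.
for level, count in severity_counts.items():
    print(f"{level}: {100 * count / total:.1f}%")
```

Most findings fall in the LOW bucket (1737 of 1959 hits, about 89%).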

Pattern Findings

1959 matches across 12 categories.

Hyper-Verbose Identifiers: 850 hits · 796 pts
Severity | File | Line | Snippet
LOW | …entence_transformer/deprecated_model_card_templates.py | 125 | def model_card_get_pooling_function(pooling_mode):
LOW | sentence_transformers/sentence_transformer/model.py | 926 | def set_pooling_include_prompt(self, include_prompt: bool) -> None:
LOW | sentence_transformers/sentence_transformer/model.py | 972 | def get_sentence_embedding_dimension(self) -> int | None:
LOW | sentence_transformers/sentence_transformer/model.py | 1010 | def truncate_sentence_embeddings(self, truncate_dim: int | None) -> Iterator[None]:
LOW | …ntence_transformers/sentence_transformer/model_card.py | 118 | def get_model_specific_metadata(self) -> dict[str, Any]:
LOW | …sentence_transformer/losses/batch_semi_hard_triplet.py | 110 | def batch_semi_hard_triplet_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
LOW | …tence_transformer/losses/multiple_negatives_ranking.py | 235 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …ence_transformers/sentence_transformer/losses/angle.py | 83 | def compute_loss_from_embeddings(
LOW | …ransformer/losses/cached_multiple_negatives_ranking.py | 416 | def calculate_loss_and_cache_gradients(self, reps: list[list[Tensor]]) -> Tensor:
LOW | …transformer/losses/global_orthogonal_regularization.py | 139 | def compute_loss_from_embeddings(
LOW | …mers/sentence_transformer/losses/batch_hard_triplet.py | 228 | def get_anchor_positive_triplet_mask(labels: Tensor) -> Tensor:
LOW | …mers/sentence_transformer/losses/batch_hard_triplet.py | 247 | def get_anchor_negative_triplet_mask(labels: Tensor) -> Tensor:
LOW | …transformers/sentence_transformer/losses/margin_mse.py | 177 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …rmers/sentence_transformer/losses/cosine_similarity.py | 83 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …nce_transformers/sentence_transformer/losses/cosent.py | 86 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …ce_transformers/sentence_transformer/losses/triplet.py | 87 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …e_transformer/losses/batch_hard_soft_margin_triplet.py | 99 | def batch_hard_triplet_soft_margin_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
LOW | …sformers/sentence_transformer/losses/distill_kl_div.py | 141 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], labels: Tensor) -> Tensor:
LOW | …rmers/sentence_transformer/losses/cached_gist_embed.py | 278 | def calculate_loss_and_cache_gradients(self, reps: list[list[Tensor]], reps_guided: list[list[Tensor]]) -> Tensor:
LOW | …nsformers/sentence_transformer/losses/embed_distill.py | 196 | def compute_loss_from_embeddings(self, embeddings: list[Tensor], teacher_embeddings: list[Tensor]) -> Tensor:
LOW | …ntence_transformer/evaluation/binary_classification.py | 316 | def find_best_acc_and_threshold(scores, labels, high_score_more_similar: bool):
LOW | …nsformers/sentence_transformer/evaluation/reranking.py | 291 | def compute_metrics_individual(self, model: SentenceTransformer):
LOW | …nsformers/sentence_transformer/evaluation/nano_beir.py | 468 | def _load_dataset_subset_split(self, subset: str, split: str, required_columns: list[str]):
LOW | …nsformers/sentence_transformer/evaluation/nano_beir.py | 523 | def store_metrics_in_model_card_data(self, *args, **kwargs):
LOW | …e_transformers/sentence_transformer/modules/pooling.py | 26 | def _convert_legacy_pooling_kwargs(kwargs: dict[str, Any]) -> None:
LOW | …e_transformers/sentence_transformer/modules/pooling.py | 46 | def _deprecated_pooling_mode_kwargs(func):
LOW | …e_transformers/sentence_transformer/modules/pooling.py | 151 | def _exclude_prompt_from_mask(attention_mask: Tensor, prompt_length: int) -> Tensor:
LOW | …ransformers/sentence_transformer/modules/clip_model.py | 36 | def _get_default_modality_config(config: dict[str, Any]) -> tuple[ModalityConfig, str]:
LOW | sentence_transformers/util/deprecated_import.py | 261 | def setup_deprecated_module_imports() -> None:
LOW | sentence_transformers/util/similarity.py | 299 | def to_similarity_pairwise_fn(
LOW | sentence_transformers/util/retrieval.py | 89 | def paraphrase_mining_embeddings(
LOW | sentence_transformers/util/environment.py | 28 | def suggest_extra_on_exception() -> Generator[None, None, None]:
LOW | sentence_transformers/util/environment.py | 72 | def check_package_availability(package_name: str, owner: str) -> bool:
LOW | sentence_transformers/util/file_io.py | 34 | def is_sentence_transformer_model(
LOW | sentence_transformers/util/decorators.py | 48 | def transformer_kwargs_decorator(func):
LOW | sentence_transformers/util/decorators.py | 92 | def cross_encoder_init_args_decorator(func):
LOW | sentence_transformers/util/decorators.py | 165 | def cross_encoder_predict_rank_args_decorator(func):
LOW | sentence_transformers/util/decorators.py | 190 | def save_to_hub_args_decorator(func):
LOW | sentence_transformers/backend/optimize.py | 19 | def export_optimized_onnx_model(
LOW | sentence_transformers/backend/quantize.py | 24 | def export_dynamic_quantized_onnx_model(
LOW | sentence_transformers/backend/quantize.py | 107 | def export_static_quantized_openvino_model(
LOW | sentence_transformers/backend/utils.py | 155 | def save_or_push_to_hub_model(
LOW | sentence_transformers/cross_encoder/model.py | 402 | def get_default_activation_fn(self) -> Callable:
LOW | sentence_transformers/cross_encoder/model.py | 463 | def default_activation_function(self) -> Callable:
LOW | sentence_transformers/cross_encoder/fit_mixin.py | 397 | def smart_batching_collate_text_only(self, batch: list[InputExample]) -> BatchEncoding:
LOW | sentence_transformers/cross_encoder/model_card.py | 269 | def get_model_specific_metadata(self) -> dict[str, Any]:
LOW | …ss_encoder/losses/cached_multiple_negatives_ranking.py | 239 | def calculate_loss_and_cache_gradients(self, logits: list[Tensor], batch_size: int) -> Tensor:
LOW | …nce_transformers/cross_encoder/evaluation/nano_beir.py | 389 | def _load_dataset_subset_split(self, subset: str, split: str, required_columns: list[str]):
LOW | sentence_transformers/sparse_encoder/model.py | 597 | def set_pooling_include_prompt(self, include_prompt: bool) -> None:
LOW | sentence_transformers/sparse_encoder/model.py | 825 | def get_sentence_embedding_dimension(self) -> int | None:
LOW | sentence_transformers/sparse_encoder/model.py | 1110 | def splade_pooling_chunk_size(self) -> int | None:
LOW | sentence_transformers/sparse_encoder/model.py | 1131 | def splade_pooling_chunk_size(self, value: int | None) -> None:
LOW | sentence_transformers/sparse_encoder/model_card.py | 118 | def get_model_specific_metadata(self) -> dict[str, Any]:
LOW | sentence_transformers/sparse_encoder/search_engines.py | 160 | def semantic_search_elasticsearch(
LOW | sentence_transformers/sparse_encoder/search_engines.py | 428 | def semantic_search_opensearch(
LOW | …ce_transformers/sparse_encoder/losses/cached_splade.py | 182 | def calculate_loss_and_cache_gradients(self, reps: list[list[Tensor]], labels: Tensor | None) -> Tensor:
LOW | sentence_transformers/sparse_encoder/losses/flops.py | 47 | def compute_loss_from_embeddings(self, embeddings: torch.Tensor) -> torch.Tensor:
LOW | sentence_transformers/sparse_encoder/losses/csr.py | 15 | def normalized_mean_squared_error(reconstruction: torch.Tensor, original_input: torch.Tensor) -> torch.Tensor:
LOW | sentence_transformers/sparse_encoder/losses/csr.py | 68 | def compute_loss_from_embeddings(self, outputs: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
LOW | …arse_encoder/evaluation/sparse_embedding_similarity.py | 159 | def store_metrics_in_model_card_data(
790 more matches not shown…
Unused Imports: 553 hits · 522 pts
Severity | File | Line | Snippet
LOW | sentence_transformers/__init__.py | 1 |
LOW | sentence_transformers/__init__.py | 10 |
LOW | sentence_transformers/__init__.py | 10 |
LOW | sentence_transformers/__init__.py | 10 |
LOW | sentence_transformers/__init__.py | 15 |
LOW | sentence_transformers/__init__.py | 15 |
LOW | sentence_transformers/__init__.py | 16 |
LOW | sentence_transformers/__init__.py | 16 |
LOW | sentence_transformers/__init__.py | 16 |
LOW | sentence_transformers/__init__.py | 16 |
LOW | sentence_transformers/__init__.py | 22 |
LOW | sentence_transformers/__init__.py | 22 |
LOW | sentence_transformers/__init__.py | 23 |
LOW | sentence_transformers/__init__.py | 24 |
LOW | sentence_transformers/__init__.py | 25 |
LOW | sentence_transformers/__init__.py | 26 |
LOW | sentence_transformers/__init__.py | 27 |
LOW | sentence_transformers/__init__.py | 28 |
LOW | sentence_transformers/__init__.py | 28 |
LOW | sentence_transformers/__init__.py | 28 |
LOW | sentence_transformers/__init__.py | 28 |
LOW | sentence_transformers/__init__.py | 34 |
LOW | sentence_transformers/__init__.py | 36 |
LOW | sentence_transformers/__init__.py | 37 |
LOW | sentence_transformers/__init__.py | 38 |
LOW | …entence_transformer/deprecated_model_card_templates.py | 6 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 1 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 3 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 4 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 5 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 6 |
LOW | sentence_transformers/sentence_transformer/__init__.py | 7 |
LOW | …nce_transformers/sentence_transformer/training_args.py | 1 |
LOW | …nce_transformers/sentence_transformer/training_args.py | 5 |
LOW | …nce_transformers/sentence_transformer/training_args.py | 5 |
LOW | sentence_transformers/sentence_transformer/model.py | 1 |
LOW | …nce_transformers/sentence_transformer/data_collator.py | 1 |
LOW | sentence_transformers/sentence_transformer/trainer.py | 1 |
LOW | sentence_transformers/sentence_transformer/fit_mixin.py | 1 |
LOW | …ntence_transformers/sentence_transformer/model_card.py | 1 |
LOW | …sentence_transformer/losses/batch_semi_hard_triplet.py | 1 |
LOW | …/losses/cached_multiple_negatives_symmetric_ranking.py | 1 |
LOW | …ntence_transformers/sentence_transformer/losses/mse.py | 1 |
LOW | …tence_transformer/losses/multiple_negatives_ranking.py | 1 |
LOW | …transformers/sentence_transformer/losses/gist_embed.py | 1 |
LOW | …rmers/sentence_transformer/losses/batch_all_triplet.py | 1 |
LOW | …ransformers/sentence_transformer/losses/contrastive.py | 1 |
LOW | …ence_transformers/sentence_transformer/losses/angle.py | 1 |
LOW | …ransformer/losses/cached_multiple_negatives_ranking.py | 1 |
LOW | …transformer/losses/global_orthogonal_regularization.py | 1 |
LOW | …mers/sentence_transformer/losses/batch_hard_triplet.py | 1 |
LOW | …transformers/sentence_transformer/losses/margin_mse.py | 1 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 2 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 4 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 6 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 7 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 8 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 9 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 10 |
LOW | …e_transformers/sentence_transformer/losses/__init__.py | 10 |
493 more matches not shown…
Cross-File Repetition: 100 hits · 500 pts
Severity | File | Line | Snippet
HIGH | …sentence_transformer/losses/batch_semi_hard_triplet.py | 0 | @misc{hermans2017defense, title={in defense of the triplet loss for person re-identification}, author={alexander hermans
HIGH | …rmers/sentence_transformer/losses/batch_all_triplet.py | 0 | @misc{hermans2017defense, title={in defense of the triplet loss for person re-identification}, author={alexander hermans
HIGH | …mers/sentence_transformer/losses/batch_hard_triplet.py | 0 | @misc{hermans2017defense, title={in defense of the triplet loss for person re-identification}, author={alexander hermans
HIGH | …ce_transformers/sentence_transformer/losses/triplet.py | 0 | @misc{hermans2017defense, title={in defense of the triplet loss for person re-identification}, author={alexander hermans
HIGH | …e_transformer/losses/batch_hard_soft_margin_triplet.py | 0 | @misc{hermans2017defense, title={in defense of the triplet loss for person re-identification}, author={alexander hermans
HIGH | …ransformer/losses/cached_multiple_negatives_ranking.py | 0 | random-state context manager class. reference: https://github.com/luyug/gradcache. this class will back up the pytorch's
HIGH | …rmers/sentence_transformer/losses/cached_gist_embed.py | 0 | random-state context manager class. reference: https://github.com/luyug/gradcache. this class will back up the pytorch's
HIGH | …ss_encoder/losses/cached_multiple_negatives_ranking.py | 0 | random-state context manager class. reference: https://github.com/luyug/gradcache. this class will back up the pytorch's
HIGH | …ransformer/losses/cached_multiple_negatives_ranking.py | 0 | a backward hook to backpropagate the cached gradients mini-batch by mini-batch.
HIGH | …rmers/sentence_transformer/losses/cached_gist_embed.py | 0 | a backward hook to backpropagate the cached gradients mini-batch by mini-batch.
HIGH | …ss_encoder/losses/cached_multiple_negatives_ranking.py | 0 | a backward hook to backpropagate the cached gradients mini-batch by mini-batch.
HIGH | …ransformer/losses/cached_multiple_negatives_ranking.py | 0 | do forward pass on all the minibatches of the input features and yield corresponding embeddings.
HIGH | …rmers/sentence_transformer/losses/cached_gist_embed.py | 0 | do forward pass on all the minibatches of the input features and yield corresponding embeddings.
HIGH | …ss_encoder/losses/cached_multiple_negatives_ranking.py | 0 | do forward pass on all the minibatches of the input features and yield corresponding embeddings.
HIGH | …rs/sentence_transformer/datasets/parallel_sentences.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …_transformers/sentence_transformer/readers/nli_data.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …_transformers/sentence_transformer/readers/sts_data.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …nsformers/sentence_transformer/readers/paired_files.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …sformers/sentence_transformer/readers/input_example.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …formers/sentence_transformer/readers/label_sentence.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | …e_transformers/sentence_transformer/readers/triplet.py | 0 | this file contains deprecated code that can only be used with the old `model.fit`-style sentence transformers v2.x train
HIGH | sentence_transformers/cross_encoder/losses/rank_net.py | 0 | get configuration parameters for this loss function. returns: dictionary containing the configuration parameters
HIGH | sentence_transformers/cross_encoder/losses/list_net.py | 0 | get configuration parameters for this loss function. returns: dictionary containing the configuration parameters
HIGH | sentence_transformers/cross_encoder/losses/plist_mle.py | 0 | get configuration parameters for this loss function. returns: dictionary containing the configuration parameters
HIGH | sentence_transformers/cross_encoder/losses/list_mle.py | 0 | get configuration parameters for this loss function. returns: dictionary containing the configuration parameters
HIGH | …tence_transformers/cross_encoder/losses/lambda_loss.py | 0 | get configuration parameters for this loss function. returns: dictionary containing the configuration parameters
HIGH | …ce_transformers/cross_encoder/evaluation/deprecated.py | 0 | this evaluator has been deprecated in favor of the more general crossencoderclassificationevaluator.
HIGH | …ce_transformers/cross_encoder/evaluation/deprecated.py | 0 | this evaluator has been deprecated in favor of the more general crossencoderclassificationevaluator.
HIGH | …ce_transformers/cross_encoder/evaluation/deprecated.py | 0 | this evaluator has been deprecated in favor of the more general crossencoderclassificationevaluator.
HIGH | …ce_transformers/cross_encoder/evaluation/deprecated.py | 0 | this evaluator has been deprecated in favor of the more general crossencoderclassificationevaluator.
HIGH | sentence_transformers/sparse_encoder/losses/splade.py | 0 | get the configuration dictionary. returns: dictionary containing the configuration parameters
HIGH | sentence_transformers/sparse_encoder/losses/csr.py | 0 | get the configuration dictionary. returns: dictionary containing the configuration parameters
HIGH | sentence_transformers/sparse_encoder/losses/csr.py | 0 | get the configuration dictionary. returns: dictionary containing the configuration parameters
HIGH | tests/sentence_transformer/test_model_card.py | 0 | dummy dataset for testing purposes. the dataset looks as follows: { "anchor": ["anchor 1", "anchor 2", ..., "anchor 10"]
HIGH | tests/cross_encoder/test_model_card.py | 0 | dummy dataset for testing purposes. the dataset looks as follows: { "anchor": ["anchor 1", "anchor 2", ..., "anchor 10"]
HIGH | tests/sparse_encoder/test_model_card.py | 0 | dummy dataset for testing purposes. the dataset looks as follows: { "anchor": ["anchor 1", "anchor 2", ..., "anchor 10"]
HIGH | …ransformer/evaluation/test_label_accuracy_evaluator.py | 0 | tests the correct computation of evaluation scores from binaryclassificationevaluator
HIGH | …mer/evaluation/test_binary_classification_evaluator.py | 0 | tests the correct computation of evaluation scores from binaryclassificationevaluator
HIGH | …sformer/evaluation/test_paraphrase_mining_evaluator.py | 0 | tests the correct computation of evaluation scores from binaryclassificationevaluator
HIGH | tests/util/test_hard_negatives.py | 0 | return a sample dataset with multiple matching passages for each query.
HIGH | tests/util/test_hard_negatives.py | 0 | return a sample dataset with multiple matching passages for each query.
HIGH | tests/util/test_hard_negatives.py | 0 | return a sample dataset with multiple matching passages for each query.
HIGH | tests/cross_encoder/test_model.py | 0 | \ <|im_start|>system judge whether the document meets the requirements based on the query and the instruct provided. not
HIGH | tests/cross_encoder/test_model.py | 0 | \ <|im_start|>system judge whether the document meets the requirements based on the query and the instruct provided. not
HIGH | tests/cross_encoder/test_model.py | 0 | \ <|im_start|>system judge whether the document meets the requirements based on the query and the instruct provided. not
HIGH | tests/base/test_modality.py | 0 | malformed urls (e.g. containing unclosed brackets) should not crash.
HIGH | tests/base/test_modality.py | 0 | malformed urls (e.g. containing unclosed brackets) should not crash.
HIGH | tests/base/test_modality.py | 0 | malformed urls (e.g. containing unclosed brackets) should not crash.
HIGH | tests/base/modules/transformer/test_text_generation.py | 0 | create a transformer instance and return it with its supported modalities.
HIGH | tests/base/modules/transformer/test_fill_mask.py | 0 | create a transformer instance and return it with its supported modalities.
HIGH | tests/base/modules/transformer/test_any_to_any.py | 0 | create a transformer instance and return it with its supported modalities.
HIGH | …se/modules/transformer/test_sequence_classification.py | 0 | create a transformer instance and return it with its supported modalities.
HIGH | tests/base/modules/transformer/test_text_generation.py | 0 | test inference with each supported modality (single and multi-modal).
HIGH | …ts/base/modules/transformer/test_feature_extraction.py | 0 | test inference with each supported modality (single and multi-modal).
HIGH | tests/base/modules/transformer/test_any_to_any.py | 0 | test inference with each supported modality (single and multi-modal).
HIGH | …se/modules/transformer/test_sequence_classification.py | 0 | test inference with each supported modality (single and multi-modal).
HIGH | tests/base/modules/transformer/test_text_generation.py | 0 | test inference with pair inputs for each supported modality combination.
HIGH | tests/base/modules/transformer/test_any_to_any.py | 0 | test inference with pair inputs for each supported modality combination.
HIGH | …se/modules/transformer/test_sequence_classification.py | 0 | test inference with pair inputs for each supported modality combination.
HIGH | …former/unsupervised_learning/TSDAE/train_stsb_tsdae.py | 0 | applies noise by randomly deleting words. warning: nltk's tokenization/detokenization is designed primarily for english.
40 more matches not shown…
Self-Referential Comments: 96 hits · 291 pts
Severity | File | Line | Snippet
MEDIUM | sentence_transformers/sentence_transformer/model.py | 856 | # Create a pool if not provided, but a list of devices is
MEDIUM | …transformers/sentence_transformer/losses/gist_embed.py | 193 | # Define the anchor threshold
MEDIUM | …transformers/sentence_transformer/losses/gist_embed.py | 211 | # Create a mask to protect true positive pairs in the anchor-positive matrix (i.e., diagonal elements)
MEDIUM | …rmers/sentence_transformer/losses/cached_gist_embed.py | 333 | # Define the anchor threshold
MEDIUM | …rmers/sentence_transformer/losses/cached_gist_embed.py | 351 | # Create a mask to protect true positive pairs in the anchor-positive matrix (i.e., diagonal elements)
MEDIUM | …s/sentence_transformer/evaluation/paraphrase_mining.py | 66 | # Create a mapping from qid to question & a list of duplicates (qid1, qid2)
MEDIUM | sentence_transformers/util/tensor.py | 160 | # Create a mask of zeros, then set the top-k positions to 1
MEDIUM | sentence_transformers/util/tensor.py | 165 | # Create a sparse tensor with only the top values
MEDIUM | sentence_transformers/util/retrieval.py | 163 | """This function is deprecated. Use semantic_search instead"""
MEDIUM | sentence_transformers/cross_encoder/model.py | 297 | # Create a pool if is not provided, but a list of devices is
MEDIUM | …ers/cross_encoder/losses/multiple_negatives_ranking.py | 137 | # Create a mask for each anchor to each candidate index, where the matching positive
MEDIUM | sentence_transformers/cross_encoder/losses/plist_mle.py | 224 | # Create a mask for valid entries
MEDIUM | …ransformers/cross_encoder/evaluation/classification.py | 51 | # Create a list of pairs, and map the labels to the labels that the model knows
MEDIUM | sentence_transformers/sparse_encoder/model.py | 723 | # Create a pool if not provided, but a list of devices is
MEDIUM | sentence_transformers/base/sampler.py | 533 | # Create a random numpy permutation using int32 (or int64 if necessary)
MEDIUM | sentence_transformers/base/modules/transformer.py | 1940 | # This method is only called if this model has a modules.json, i.e. it's already been saved
MEDIUM | sentence_transformers/base/modules/router.py | 81 | # Create an asymmetric model with different encoders for queries and documents
MEDIUM | tests/sentence_transformer/test_model.py | 1201 | # Create a simple dataset with a text column
MEDIUM | tests/sentence_transformer/test_model.py | 1322 | # Create a Router with mixed modules
MEDIUM | tests/sentence_transformer/test_model.py | 986 | # Create a mock model with required prompts
MEDIUM | tests/sentence_transformer/test_model.py | 1043 | # Create a mock model with required prompts
MEDIUM | tests/sentence_transformer/test_multi_process.py | 89 | # Create a pool
MEDIUM | tests/sentence_transformer/test_multi_process.py | 163 | # Create a pool
MEDIUM | tests/sentence_transformer/test_trainer.py | 197 | # Create a new model card if a Trainer was initialized
MEDIUM | tests/sentence_transformer/test_trainer.py | 801 | # Define a custom batch sampler function
MEDIUM | tests/sentence_transformer/test_trainer.py | 931 | # Define a custom multi-dataset batch sampler function
MEDIUM | tests/util/test_hard_negatives.py | 985 | # Create a larger dataset with 32 entries
MEDIUM | tests/util/test_hard_negatives.py | 1100 | # Create a dataset with just 2 pairs
MEDIUM | tests/cross_encoder/test_model.py | 753 | # Create a simple dataset with a text column
MEDIUM | tests/cross_encoder/test_trainer.py | 101 | # Create a new model card if a Trainer was initialized
MEDIUM | tests/sparse_encoder/test_model.py | 426 | # Create a simple dataset with a text column
MEDIUM | tests/sparse_encoder/test_model.py | 136 | # Create an empty sparse tensor
MEDIUM | tests/sparse_encoder/test_model.py | 171 | # Create a batch where the first sample has values but the second is all zeros
MEDIUM | tests/sparse_encoder/test_multi_process.py | 62 | # Create a pool
MEDIUM | tests/sparse_encoder/modules/test_csr.py | 11 | # Create a wrapper to measure outputs of the forward method
MEDIUM | tests/sparse_encoder/modules/test_csr.py | 61 | # Create the wrapper and replace the forward method
MEDIUM | tests/base/samplers/test_no_duplicates_batch_sampler.py | 39 | # Create a list of two 0's, two 1's, two 2's, ... two 49's. Then shuffle.
MEDIUM | tests/base/modules/test_router.py | 683 | # Create a Router with different module configurations for each route
MEDIUM | tests/base/modules/test_router.py | 693 | # Create a SentenceTransformer with static_embedding followed by router
MEDIUM | tests/base/modules/test_router.py | 67 | # Create a custom ModuleDict subclass to track access
MEDIUM | tests/base/modules/test_router.py | 209 | # Create a Router with StaticEmbedding modules
MEDIUM | tests/base/modules/test_router.py | 313 | # Create a Router with StaticEmbedding modules
MEDIUM | tests/base/modules/test_router.py | 439 | # Create a Router with StaticEmbedding modules
MEDIUM | tests/base/modules/test_router.py | 457 | # Create a loss function that works with router
MEDIUM | tests/base/modules/test_router.py | 481 | # Create a Router with StaticEmbedding modules
MEDIUM | tests/base/modules/test_router.py | 493 | # Create a loss function that works with router
MEDIUM | tests/base/modules/test_router.py | 1356 | # Create an object that will fail modality inference
MEDIUM | tests/base/modules/test_dense.py | 92 | # Create a Dense layer with custom keys
MEDIUM | …/training/data_augmentation/train_sts_indomain_bm25.py | 218 | # Define the training arguments
MEDIUM | …/training/data_augmentation/train_sts_indomain_bm25.py | 239 | # Create the trainer & start training
MEDIUM | …raining/data_augmentation/train_sts_indomain_nlpaug.py | 137 | # Define the training arguments
MEDIUM | …raining/data_augmentation/train_sts_indomain_nlpaug.py | 158 | # Create the trainer & start training
MEDIUM | …transformer/training/other/training_batch_hard_trec.py | 45 | # Create a dev set from train set
MEDIUM | …ansformer/training/other/training_gooaq_infonce_gor.py | 42 | # Define a custom loss that combines InfoNCE and Global Orthogonal Regularization
MEDIUM | …ransformer/training/distillation/model_distillation.py | 111 | # Create a relatively small dataset for evaluation
MEDIUM | …ransformer/training/distillation/model_distillation.py | 116 | # Create an STSB evaluator
MEDIUM | …ransformer/training/distillation/model_distillation.py | 165 | # Define the training arguments
MEDIUM | …ransformer/training/distillation/model_distillation.py | 189 | # Create the trainer & start training
MEDIUM | …ing/distillation/model_distillation_layer_reduction.py | 58 | # Create a smaller student model by using only some of the teacher layers
MEDIUM | …ing/distillation/model_distillation_layer_reduction.py | 134 | # Create a relatively small dataset for evaluation
36 more matches not shown…
Excessive Try-Catch Wrapping: 174 hits · 176 pts
Severity | File | Line | Snippet
LOW | …entence_transformer/deprecated_model_card_templates.py | 189 | except Exception as e:
MEDIUM | …entence_transformer/deprecated_model_card_templates.py | 160 | def get_train_objective_info(dataloader, loss):
LOW | …nsformers/sentence_transformer/evaluation/nano_beir.py | 477 | except Exception as e:
LOW | sentence_transformers/util/hard_negatives.py | 457 | except Exception:
LOW | sentence_transformers/util/logging.py | 19 | except Exception:
MEDIUM | sentence_transformers/util/logging.py | 12 | def emit(self, record) -> None:
LOW | sentence_transformers/util/misc.py | 69 | except Exception:
LOW | sentence_transformers/util/file_io.py | 192 | except Exception as first_error:
LOW | sentence_transformers/util/file_io.py | 249 | except Exception:
LOW | sentence_transformers/cross_encoder/model.py | 383 | except Exception as e:
LOW | sentence_transformers/cross_encoder/model.py | 387 | except Exception:
LOW | …nce_transformers/cross_encoder/evaluation/nano_beir.py | 398 | except Exception as e:
LOW | sentence_transformers/sparse_encoder/model.py | 788 | except Exception as e:
LOW | sentence_transformers/sparse_encoder/model.py | 792 | except Exception:
LOW | sentence_transformers/base/model.py | 754 | except Exception:
LOW | sentence_transformers/base/model.py | 920 | except Exception as exc:
LOW | sentence_transformers/base/model.py | 1093 | except Exception as exc:
LOW | sentence_transformers/base/model.py | 1103 | except Exception:
LOW | sentence_transformers/base/trainer.py | 685 | except Exception:
LOW | sentence_transformers/base/trainer.py | 690 | except Exception as exc:
MEDIUM | sentence_transformers/base/model_card.py | 1424 | def try_to_float(metric_value):
LOW | sentence_transformers/base/model_card.py | 1937 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 1945 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 1953 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 1960 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 456 | except Exception:
LOW | sentence_transformers/base/model_card.py | 851 | except Exception:
LOW | sentence_transformers/base/model_card.py | 894 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1318 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1387 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1427 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1557 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1565 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1587 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1842 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1894 | except Exception:
LOW | sentence_transformers/base/model_card.py | 1905 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 1911 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 1922 | except Exception as exc:
LOW | sentence_transformers/base/model_card.py | 495 | except Exception:
LOW | sentence_transformers/base/modules/transformer.py | 1735 | except Exception:
LOW | tests/cross_encoder/test_model_card.py | 290 | except Exception:
LOW | tests/sparse_encoder/test_pretrained.py | 142 | except Exception as e:
LOW | tests/sparse_encoder/test_pretrained.py | 153 | except Exception as e:
LOW | tests/sparse_encoder/test_pretrained.py | 183 | except Exception as e:
LOW | tests/sparse_encoder/test_trainer.py | 127 | except Exception as e:
LOW | tests/base/test_model_card.py | 281 | except Exception:
LOW | tests/base/test_modality.py | 694 | except Exception as exc:
LOW | tests/base/modules/transformer/test_text_generation.py | 86 | except Exception as e:
LOW | tests/base/modules/transformer/conftest.py | 193 | except Exception as e:
LOW | tests/base/modules/transformer/conftest.py | 686 | except Exception:
LOW | tests/base/modules/transformer/test_fill_mask.py | 78 | except Exception as e:
LOW | …ts/base/modules/transformer/test_feature_extraction.py | 96 | except Exception as e:
LOW | tests/base/modules/transformer/test_any_to_any.py | 106 | except Exception as e:
LOW | …se/modules/transformer/test_sequence_classification.py | 87 | except Exception as e:
LOW | docs/cross_encoder/training_overview.md | 865 | except Exception:
LOW | docs/cross_encoder/training_overview.md | 1069 | except Exception:
LOW | …ples/sentence_transformer/training/nli/training_nli.py | 117 | except Exception:
LOW | …s/sentence_transformer/training/nli/training_nli_v2.py | 126 | except Exception:
LOW | …entence_transformer/training/nli/training_nli_angle.py | 127 | except Exception:
114 more matches not shown…
Docstring Block Structure: 18 hits · 90 pts
Severity | File | Line | Snippet
HIGH | sentence_transformers/sentence_transformer/trainer.py | 37 | SentenceTransformerTrainer is a simple but feature-complete training and eval loop for PyTorch based on the 🤗 T
HIGH | …rmers/sentence_transformer/modules/static_embedding.py | 182 | Creates a StaticEmbedding instance from a distillation process using the `model2vec` package. Args:
HIGH | …rmers/sentence_transformer/modules/static_embedding.py | 245 | Create a StaticEmbedding instance from a model2vec model. This method loads a pre-trained model2vec model
HIGH | sentence_transformers/util/misc.py | 41 | Import a dotted module path and return the attribute/class designated by the last name in the path. Raise Impor
HIGH | sentence_transformers/util/similarity.py | 264 | Converts a similarity function name or enum value to the corresponding similarity function. Args:
HIGH | sentence_transformers/util/similarity.py | 302 | Converts a similarity function into a pairwise similarity function. The pairwise similarity function r
HIGH | sentence_transformers/util/quantization.py | 31 | Performs semantic search using the FAISS library. Rescoring will be performed if: 1. `rescore` is True
HIGH | sentence_transformers/util/quantization.py | 198 | Performs semantic search using the usearch library. Rescoring will be performed if: 1. `rescore` is True
HIGH | sentence_transformers/util/file_io.py | 41 | Checks if the given model name or path corresponds to a SentenceTransformer model. Args: model_name_or
HIGH | sentence_transformers/util/file_io.py | 82 | Loads a file from a local or remote location. Args: model_name_or_path (str): The model name or path.
HIGH | sentence_transformers/util/file_io.py | 141 | Loads the subfolder path for a given model name or path. Args: model_name_or_path (str): The name or p
HIGH | sentence_transformers/util/file_io.py | 206 | Download a URL to a local file with a progress bar. The content is streamed in chunks and first written to a tempor
HIGH | sentence_transformers/backend/optimize.py | 27 | Export an optimized ONNX model from a SentenceTransformer, SparseEncoder, or CrossEncoder model. The O1-O4 opt
HIGH | sentence_transformers/backend/quantize.py | 32 | Export a quantized ONNX model from a SentenceTransformer, SparseEncoder, or CrossEncoder model. This function
HIGH | sentence_transformers/backend/quantize.py | 119 | Export a quantized OpenVINO model from a SentenceTransformer, SparseEncoder, or CrossEncoder model. This funct
HIGH | sentence_transformers/cross_encoder/model.py | 556 | Performs predictions with the CrossEncoder on the given input pairs. .. tip:: Adjusting `
HIGH | sentence_transformers/base/modality.py | 550 | Infer the modality of a single input sample by inspecting its type/structure. Pure type-based detection, does not r
HIGH | sentence_transformers/base/modules/module.py | 311 | A utility function to load the PyTorch weights of a model from a checkpoint. The checkpoint can be either a
Deep Nesting · 91 hits · 84 pts
Severity | File | Line | Snippet
LOW | sentence_transformers/sentence_transformer/model.py | 481
LOW | sentence_transformers/sentence_transformer/model.py | 838
LOW | sentence_transformers/sentence_transformer/model.py | 903
LOW | sentence_transformers/sentence_transformer/fit_mixin.py | 63
LOW | sentence_transformers/sentence_transformer/fit_mixin.py | 407
LOW | sentence_transformers/sentence_transformer/fit_mixin.py | 465
LOW | …tence_transformer/losses/multiple_negatives_ranking.py | 235
LOW | …ransformer/losses/cached_multiple_negatives_ranking.py | 65
LOW | …ransformer/losses/cached_multiple_negatives_ranking.py | 153
LOW | …ransformer/losses/cached_multiple_negatives_ranking.py | 425
LOW | …transformers/sentence_transformer/losses/matryoshka.py | 92
LOW | …rmers/sentence_transformer/losses/cached_gist_embed.py | 46
LOW | …tence_transformer/datasets/no_duplicates_dataloader.py | 29
LOW | …ntence_transformer/evaluation/information_retrieval.py | 299
LOW | …ntence_transformer/evaluation/information_retrieval.py | 447
LOW | …formers/sentence_transformer/evaluation/translation.py | 102
LOW | …s/sentence_transformer/evaluation/paraphrase_mining.py | 245
LOW | …e_transformers/sentence_transformer/modules/pooling.py | 163
LOW | …e_transformers/sentence_transformer/modules/pooling.py | 237
LOW | …rmers/sentence_transformer/modules/tokenizer/phrase.py | 63
LOW | sentence_transformers/util/hard_negatives.py | 25
LOW | sentence_transformers/util/deprecated_import.py | 210
LOW | sentence_transformers/util/quantization.py | 18
LOW | sentence_transformers/util/quantization.py | 185
LOW | sentence_transformers/util/quantization.py | 371
LOW | sentence_transformers/util/retrieval.py | 89
LOW | sentence_transformers/util/retrieval.py | 167
LOW | sentence_transformers/util/retrieval.py | 258
LOW | sentence_transformers/util/environment.py | 40
LOW | sentence_transformers/util/file_io.py | 205
LOW | sentence_transformers/util/decorators.py | 92
LOW | sentence_transformers/util/decorators.py | 115
LOW | sentence_transformers/backend/utils.py | 155
LOW | sentence_transformers/cross_encoder/model.py | 279
LOW | sentence_transformers/cross_encoder/model.py | 353
LOW | sentence_transformers/cross_encoder/fit_mixin.py | 61
LOW | sentence_transformers/cross_encoder/fit_mixin.py | 417
LOW | sentence_transformers/sparse_encoder/model.py | 706
LOW | …rs/sparse_encoder/evaluation/reciprocal_rank_fusion.py | 105
LOW | …_transformers/sparse_encoder/modules/splade_pooling.py | 65
LOW | sentence_transformers/base/model.py | 830
LOW | sentence_transformers/base/model.py | 1028
LOW | sentence_transformers/base/modality.py | 231
LOW | sentence_transformers/base/modality.py | 440
LOW | sentence_transformers/base/modality.py | 464
LOW | sentence_transformers/base/data_collator.py | 114
LOW | sentence_transformers/base/trainer.py | 392
LOW | sentence_transformers/base/trainer.py | 417
LOW | sentence_transformers/base/trainer.py | 532
LOW | sentence_transformers/base/trainer.py | 815
LOW | sentence_transformers/base/trainer.py | 1206
LOW | sentence_transformers/base/model_card.py | 429
LOW | sentence_transformers/base/model_card.py | 512
LOW | sentence_transformers/base/model_card.py | 594
LOW | sentence_transformers/base/model_card.py | 696
LOW | sentence_transformers/base/model_card.py | 792
LOW | sentence_transformers/base/model_card.py | 1361
LOW | sentence_transformers/base/model_card.py | 1889
LOW | sentence_transformers/base/modules/transformer.py | 452
LOW | sentence_transformers/base/modules/transformer.py | 581
31 more matches not shown…
Redundant / Tautological Comments · 48 hits · 64 pts
Severity | File | Line | Snippet
LOW | …mers/sentence_transformer/losses/batch_hard_triplet.py | 240 | # Check if labels[i] == labels[j]
LOW | …mers/sentence_transformer/losses/batch_hard_triplet.py | 254 | # Check if labels[i] != labels[k]
LOW | sentence_transformers/util/tensor.py | 23 | # Check if list contains sparse tensors
LOW | sentence_transformers/util/retrieval.py | 331 | # Check if we need to increase sort_max_size
LOW | sentence_transformers/cross_encoder/model_card.py | 211 | # Check if any pair element is non-text (from usage_examples before asset saving)
LOW | …arse_encoder/evaluation/sparse_embedding_similarity.py | 78 | # Print the results
LOW | …rse_encoder/evaluation/sparse_information_retrieval.py | 128 | # Print the results
LOW | …ansformers/sparse_encoder/evaluation/sparse_triplet.py | 90 | # Print the results
LOW | …sformers/sparse_encoder/evaluation/sparse_nano_beir.py | 171 | # Print the results
LOW | …rse_encoder/evaluation/sparse_binary_classification.py | 104 | # Print the results
LOW | …e_transformers/sparse_encoder/evaluation/sparse_mse.py | 78 | # Print the results
LOW | …ormers/sparse_encoder/evaluation/sparse_translation.py | 75 | # Print the results
LOW | …sformers/sparse_encoder/evaluation/sparse_reranking.py | 102 | # Print the results
LOW | …mers/sparse_encoder/modules/sparse_static_embedding.py | 202 | # Check if we have a JSON path in config
LOW | sentence_transformers/base/model.py | 971 | # Check if this is a Sentence Transformer model
LOW | sentence_transformers/base/model.py | 1057 | # Check if the config_sentence_transformers.json file exists (exists since v2 of the framework)
LOW | sentence_transformers/base/model.py | 1082 | # Check if a readme exists. README is optional metadata; a transient Hub error
LOW | sentence_transformers/base/model.py | 1136 | # Check if the `load` method only accepts a single parameter (the path to the local directory).
LOW | sentence_transformers/base/model_card.py | 651 | # Check if the model has a tuple modality whose parts all match available columns.
LOW | tests/sparse_encoder/utils.py | 26 | # Check if shape matches
LOW | tests/sparse_encoder/utils.py | 40 | # Check if indices are the same
LOW | tests/sparse_encoder/utils.py | 44 | # Check if values are close
LOW | tests/sparse_encoder/test_trainer.py | 116 | # Check if model parameters have changed after training
LOW | …sparse_encoder/modules/test_sparse_static_embedding.py | 55 | # Check if embeddings are the same before and after save/load
LOW | …sparse_encoder/modules/test_sparse_static_embedding.py | 58 | # Check if SparseStaticEmbedding weights are maintained after loading
LOW | …modules/transformer/update_transformers_tiny_models.py | 42 | # Check if the model_id contains the architecture name
LOW | …raining/data_augmentation/train_sts_qqp_crossdomain.py | 58 | # Check if the QQP dataset exists. If not, download and extract
LOW | …er/training/quora_duplicate_questions/create_splits.py | 510 | ####### Write files for Information Retrieval #####
LOW | …/embedding-quantization/semantic_search_recommended.py | 120 | # Output the results
LOW | …transformer/applications/clustering/fast_clustering.py | 32 | # Check if the dataset exists. If not, download and extract
LOW | …tions/semantic-search/semantic_search_quora_pytorch.py | 35 | # Check if embedding cache path exists
LOW | …tions/semantic-search/semantic_search_quora_pytorch.py | 37 | # Check if the dataset exists. If not, download and extract
LOW | …cations/semantic-search/semantic_search_quora_annoy.py | 57 | # Check if embedding cache path exists
LOW | …cations/semantic-search/semantic_search_quora_annoy.py | 59 | # Check if the dataset exists. If not, download and extract
LOW | …tions/semantic-search/semantic_search_quora_hnswlib.py | 47 | # Check if embedding cache path exists
LOW | …tions/semantic-search/semantic_search_quora_hnswlib.py | 49 | # Check if the dataset exists. If not, download and extract
LOW | …cations/semantic-search/semantic_search_quora_faiss.py | 61 | # Check if embedding cache path exists
LOW | …cations/semantic-search/semantic_search_quora_faiss.py | 63 | # Check if the dataset exists. If not, download and extract
LOW | …/cross_encoder/applications/cross_encoder_reranking.py | 41 | # Check if embedding cache path exists
LOW | …parse_encoder/evaluation/sparse_reranking_evaluator.py | 53 | # Print the results
LOW | …coder/evaluation/sparse_nanobeir_advanced_evaluator.py | 45 | # Print the results
LOW | …arse_encoder/evaluation/sparse_similarity_evaluator.py | 31 | # Print the results
LOW | …rse_encoder/evaluation/sparse_translation_evaluator.py | 30 | # Print the results
LOW | …/sparse_encoder/evaluation/sparse_triplet_evaluator.py | 37 | # Print the results
LOW | …parse_encoder/evaluation/sparse_retrieval_evaluator.py | 69 | # Print the results
LOW | …sparse_encoder/evaluation/sparse_nanobeir_evaluator.py | 94 | # Print the results
LOW | …_encoder/evaluation/sparse_classification_evaluator.py | 58 | # Print the results
LOW | …ples/sparse_encoder/evaluation/sparse_mse_evaluator.py | 32 | # Print the results
AI Slop Vocabulary · 7 hits · 14 pts
Severity | File | Line | Snippet
LOW | …transformers/sentence_transformer/losses/gist_embed.py | 240 | # so the label for anchor[i] is i. This means that we can just use arange
LOW | …rmers/sentence_transformer/losses/cached_gist_embed.py | 316 | # so the label for anchor[i] is i. This means that we can just use arange
LOW | sentence_transformers/base/trainer.py | 514 | # if loss is nan or inf simply add the average of previous logged losses
LOW | sentence_transformers/base/trainer.py | 567 | # would not accept it. If None, we just call the super().log() method without it so that it works with all versi
LOW | sentence_transformers/base/trainer.py | 644 | # If the evaluator is not defined, we can just return the output
MEDIUM | …applications/parallel-sentence-mining/bitext_mining.py | 27 | # Model we want to use for bitext mining. sentence-transformers/LaBSE achieves state-of-the-art performance
MEDIUM | …rmer/applications/parallel-sentence-mining/bucc2018.py | 25 | # Model we want to use for bitext mining. sentence-transformers/LaBSE achieves state-of-the-art performance
Verbosity Indicators · 9 hits · 14 pts
Severity | File | Line | Snippet
LOW | …raining/data_augmentation/train_sts_qqp_crossdomain.py | 107 | # Step 1: Train cross-encoder model with STSbenchmark
LOW | …raining/data_augmentation/train_sts_qqp_crossdomain.py | 152 | # Step 2: Label QQP train dataset using cross-encoder (BERT) model
LOW | …raining/data_augmentation/train_sts_qqp_crossdomain.py | 177 | # Step 3: Train bi-encoder (SBERT) model with QQP dataset - Augmented SBERT
LOW | …/training/data_augmentation/train_sts_indomain_bm25.py | 89 | # Step 1: Train cross-encoder model with (gold) STS benchmark dataset
LOW | …/training/data_augmentation/train_sts_indomain_bm25.py | 132 | # Step 2: Label BM25 sampled STSb (silver dataset) using cross-encoder model
LOW | …/training/data_augmentation/train_sts_indomain_bm25.py | 190 | # Step 3: Train bi-encoder model with both (gold + silver) STSbenchmark dataset - Augmented SBERT
LOW | …ining/data_augmentation/train_sts_indomain_semantic.py | 97 | # Step 1: Train cross-encoder model with STSbenchmark
LOW | …ining/data_augmentation/train_sts_indomain_semantic.py | 141 | # Step 2: Find silver pairs to label
LOW | …ining/data_augmentation/train_sts_indomain_semantic.py | 205 | # Step 3: Train bi-encoder model with both STSbenchmark and labeled AllNlI - Augmented SBERT
Over-Commented Block · 11 hits · 11 pts
Severity | File | Line | Snippet
LOW | docs/conf.py | 1 | # Configuration file for the Sphinx documentation builder.
LOW | docs/sparse_encoder/training_overview.md | 141 | router = Router.for_query_document(
LOW | …ransformer/training/distillation/model_quantization.py | 61 | },
LOW | …transformer/training/multilingual/make_multilingual.py | 81 | # If we want, we can limit the maximum sequence length for the model
LOW | …ations/embedding-quantization/semantic_search_faiss.py | 61 | # In the first call we'll provide the `corpus_embeddings` and get the `corpus_index` back, which
LOW | …ions/embedding-quantization/semantic_search_usearch.py | 61 | # In the first call we'll provide the `corpus_embeddings` and get the `corpus_index` back, which
LOW | …ncoder/training/ms_marco/training_ms_marco_plistmle.py | 101 | # lambda_weight = PListMLELambdaWeight(rank_discount_fn=custom_discount)
LOW | examples/cross_encoder/training/distillation/README.md | 61 | # {"corpus_id": 4, "score": 0.91639173},
LOW | .github/workflows/sync-skills.yml | 1 | name: Sync skill to huggingface/skills
LOW | …train_sentence_transformer_static_embedding_example.py | 1 | #!/usr/bin/env python3
LOW | …cripts/train_sentence_transformer_with_lora_example.py | 1 | #!/usr/bin/env python3
Slop Phrases · 2 hits · 6 pts
Severity | File | Line | Snippet
MEDIUM | …te_questions/application_duplicate_questions_mining.py | 10 | # For demonstration purposes, we limit it to a few questions which all have on duplicate
MEDIUM | …ce_transformer/training/prompts/training_nq_prompts.py | 26 | # Feel free to adjust these variables: