Repository Analysis

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

10.5 Low AI signal

Adjusted Score: 10.5
Raw Score: 10.5
Time Factor: 100%
Last Push: 2026-05-25
Stars: 22,166
Language: Python
Lines of Code: 3,994
Files: 26
Pattern Hits: 35
Scan Date: 2026-05-31

Severity Breakdown

CRITICAL 0 · HIGH 0 · MEDIUM 2 · LOW 33

Pattern Findings

35 matches across 6 categories.

Hyper-Verbose Identifiers · 14 hits · 17 pts
Severity | File | Line | Snippet
LOW | whisperx/asr.py | 22 | def find_numeral_symbol_tokens(tokenizer):
LOW | whisperx/SubtitlesProcessor.py | 47 | def estimate_timestamp_for_word(self, words, i, next_segment_start_time=None):
LOW | whisperx/SubtitlesProcessor.py | 99 | def determine_advanced_split_points(self, segment, next_segment_start_time=None):
LOW | whisperx/SubtitlesProcessor.py | 141 | def generate_subtitles_from_split_points(self, segment, split_points, next_start_time=None):
LOW | tests/test_word_timestamp_interpolation.py | 88 | def test_known_chars_get_timestamps(self):
LOW | tests/test_word_timestamp_interpolation.py | 96 | def test_unknown_word_gets_timestamps(self):
LOW | tests/test_word_timestamp_interpolation.py | 105 | def test_mixed_word_gets_timestamps(self):
LOW | tests/test_word_timestamp_interpolation.py | 114 | def test_unknown_word_does_not_corrupt_neighbors(self):
LOW | tests/test_word_timestamp_interpolation.py | 124 | def test_all_unknown_segment_gets_timestamps(self):
LOW | tests/test_word_timestamp_interpolation.py | 132 | def test_timestamps_are_ordered(self):
LOW | tests/test_word_timestamp_interpolation.py | 155 | def test_ignore_does_not_crash(self):
LOW | tests/test_word_timestamp_interpolation.py | 164 | def test_ignore_segments_have_valid_timestamps(self):
LOW | tests/test_word_timestamp_interpolation.py | 177 | def test_ignore_preserves_nans(self):
LOW | tests/test_word_timestamp_interpolation.py | 188 | def test_ignore_word_assignment_integration(self):
Deep Nesting · 10 hits · 10 pts
Severity | File | Line | Snippet
LOW | whisperx/alignment.py | 117 |
LOW | whisperx/asr.py | 315 |
LOW | whisperx/asr.py | 114 |
LOW | whisperx/diarize.py | 185 |
LOW | whisperx/utils.py | 252 |
LOW | whisperx/utils.py | 262 |
LOW | whisperx/transcribe.py | 20 |
LOW | whisperx/SubtitlesProcessor.py | 76 |
LOW | whisperx/SubtitlesProcessor.py | 99 |
LOW | whisperx/vads/pyannote.py | 108 |
Unused Imports · 6 hits · 6 pts
Severity | File | Line | Snippet
LOW | whisperx/vads/__init__.py | 1 |
LOW | whisperx/vads/__init__.py | 2 |
LOW | whisperx/vads/__init__.py | 3 |
LOW | whisperx/vads/vad.py | 3 |
LOW | whisperx/vads/vad.py | 4 |
LOW | whisperx/vads/vad.py | 4 |
Excessive Try-Catch Wrapping · 2 hits · 3 pts
Severity | File | Line | Snippet
LOW | whisperx/alignment.py | 103 | except Exception as e:
MEDIUM | whisperx/alignment.py | 105 | print(f"Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 m
AI Slop Vocabulary · 1 hit · 3 pts
Severity | File | Line | Snippet
MEDIUM | whisperx/alignment.py | 319 | # increment word_idx, nltk word tokenization would probably be more robust here, but us space for now...
Redundant / Tautological Comments · 2 hits · 3 pts
Severity | File | Line | Snippet
LOW | whisperx/diarize.py | 237 | # Assign speaker to words
LOW | whisperx/vads/pyannote.py | 34 | # Check if the resolved model file exists
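
The per-category hit counts listed above can be cross-checked against the summary figures (35 pattern hits, 6 categories, and the severity breakdown). The following is a minimal sketch of that arithmetic only; the dictionary names are illustrative and not part of the scanner, and the counts are copied by hand from this report.

```python
# Hit counts per category, copied from the findings in this report.
# The variable names are hypothetical; the scanner's own data model is not shown here.
category_hits = {
    "Hyper-Verbose Identifiers": 14,
    "Deep Nesting": 10,
    "Unused Imports": 6,
    "Excessive Try-Catch Wrapping": 2,
    "AI Slop Vocabulary": 1,
    "Redundant / Tautological Comments": 2,
}

# Severity breakdown, copied from the summary section of this report.
severity_counts = {"CRITICAL": 0, "HIGH": 0, "MEDIUM": 2, "LOW": 33}

total_hits = sum(category_hits.values())
total_by_severity = sum(severity_counts.values())

# Both totals should agree with the reported "Pattern Hits" figure.
print(f"categories: {len(category_hits)}")
print(f"hits by category: {total_hits}")
print(f"hits by severity: {total_by_severity}")
```

Both sums come to 35, matching the reported Pattern Hits value. The report does not state how per-category points map to the 10.5 score, so that step is not reproduced here.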