SoTA open-source TTS
96 matches across 11 categories.
| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | src/chatterbox/tts.py | 0 | quick cleanup func for punctuation from llms or containing chars not seen often in the dataset |
| HIGH | src/chatterbox/tts_turbo.py | 0 | quick cleanup func for punctuation from llms or containing chars not seen often in the dataset |
| HIGH | src/chatterbox/mtl_tts.py | 0 | quick cleanup func for punctuation from llms or containing chars not seen often in the dataset |
| HIGH | src/chatterbox/tts.py | 0 | conditionals for t3 and s3gen - t3 conditionals: - speaker_emb - clap_emb - cond_prompt_speech_tokens - cond_prompt_spee |
| HIGH | src/chatterbox/tts_turbo.py | 0 | conditionals for t3 and s3gen - t3 conditionals: - speaker_emb - clap_emb - cond_prompt_speech_tokens - cond_prompt_spee |
| HIGH | src/chatterbox/mtl_tts.py | 0 | conditionals for t3 and s3gen - t3 conditionals: - speaker_emb - clap_emb - cond_prompt_speech_tokens - cond_prompt_spee |
| HIGH | src/chatterbox/models/s3gen/transformer/subsampling.py | 0 | input x. args: x (torch.tensor): input tensor (#batch, time, idim). x_mask (torch.tensor): input mask (#batch, 1, time). |
| HIGH | src/chatterbox/models/s3gen/transformer/subsampling.py | 0 | input x. args: x (torch.tensor): input tensor (#batch, time, idim). x_mask (torch.tensor): input mask (#batch, 1, time). |
| HIGH | src/chatterbox/models/s3gen/transformer/subsampling.py | 0 | input x. args: x (torch.tensor): input tensor (#batch, time, idim). x_mask (torch.tensor): input mask (#batch, 1, time). |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | src/chatterbox/models/s3gen/decoder.py | 244 | Forward pass of the UNet1DConditional model. Args: x: (B, 80, T) mask (_type_) |
| HIGH | src/chatterbox/models/s3gen/xvector.py | 16 | Perform padding for the list of tensors. Args: xs (List): List of Tensors [(T_1, `*`), (T_2, `*`), ..., (T_ |
| HIGH | src/chatterbox/models/s3gen/utils/mask.py | 19 | def subsequent_mask( size: int, device: torch.device = torch.device("cpu"), ) -> torch.Tensor: """C |
| HIGH | src/chatterbox/models/s3gen/utils/mask.py | 60 | Create mask for subsequent steps (size, size) with chunk size, this is for streaming encoder Args: s |
| HIGH | src/chatterbox/models/s3gen/utils/mask.py | 168 | Make mask tensor containing indices of padded part. See description of make_non_pad_mask. Args: length |
| HIGH | src/chatterbox/models/s3gen/matcha/decoder.py | 364 | Forward pass of the UNet1DConditional model. Args: x (torch.Tensor): shape (batch_size, in_channels |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | example_tts_turbo.py | 2 | |
| LOW | src/chatterbox/__init__.py | 9 | |
| LOW | src/chatterbox/__init__.py | 10 | |
| LOW | src/chatterbox/__init__.py | 11 | |
| LOW | src/chatterbox/__init__.py | 11 | |
| LOW | src/chatterbox/models/tokenizers/__init__.py | 1 | |
| LOW | src/chatterbox/models/tokenizers/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3tokenizer/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3tokenizer/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3tokenizer/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3tokenizer/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3tokenizer/__init__.py | 1 | |
| LOW | src/chatterbox/models/t3/__init__.py | 1 | |
| LOW | src/chatterbox/models/t3/t3.py | 4 | |
| LOW | src/chatterbox/models/t3/t3.py | 4 | |
| LOW | src/chatterbox/models/t3/inference/t3_hf_backend.py | 4 | |
| LOW | src/chatterbox/models/t3/modules/learned_pos_emb.py | 1 | |
| LOW | src/chatterbox/models/t3/modules/learned_pos_emb.py | 4 | |
| LOW | src/chatterbox/models/voice_encoder/__init__.py | 1 | |
| LOW | src/chatterbox/models/voice_encoder/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3gen/__init__.py | 1 | |
| LOW | src/chatterbox/models/s3gen/__init__.py | 2 | |
| LOW | src/chatterbox/models/s3gen/flow.py | 23 | |
| LOW | src/chatterbox/models/s3gen/flow_matching.py | 14 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/chatterbox/models/t3/t3.py | 281 | speech_head=self.speech_head, |
| LOW | src/chatterbox/models/s3gen/decoder.py | 1 | # Copyright (c) 2024 Alibaba Inc (authors: Xiang Lyu, Zhihao Du) |
| LOW | src/chatterbox/models/s3gen/flow.py | 1 | # Copyright (c) 2024 Alibaba Inc (authors: Xiang Lyu, Zhihao Du) |
| LOW | src/chatterbox/models/s3gen/hifigan.py | 1 | # jrm: adapted from CosyVoice/cosyvoice/hifigan/generator.py |
| LOW | src/chatterbox/models/s3gen/f0_predictor.py | 1 | # Copyright (c) 2024 Alibaba Inc (authors: Xiang Lyu, Kai Hu) |
| LOW | src/chatterbox/models/s3gen/s3gen.py | 1 | # Modified from CosyVoice https://github.com/FunAudioLLM/CosyVoice |
| LOW | src/chatterbox/models/s3gen/flow_matching.py | 1 | # Copyright (c) 2024 Alibaba Inc (authors: Xiang Lyu, Zhihao Du) |
| LOW | src/chatterbox/models/s3gen/transformer/attention.py | 1 | # Copyright (c) 2019 Shigeki Karita |
| LOW | src/chatterbox/models/s3gen/transformer/attention.py | 161 | Returns: |
| LOW | src/chatterbox/models/s3gen/transformer/attention.py | 281 | # cache(1, head, 0, d_k * 2) (16/-1, -1/-1, 16/0 mode) |
| LOW | src/chatterbox/models/s3gen/transformer/subsampling.py | 1 | # Copyright (c) 2021 Mobvoi Inc (Binbin Zhang, Di Wu) |
| LOW | src/chatterbox/models/s3gen/transformer/convolution.py | 1 | # Copyright (c) 2020 Mobvoi Inc. (authors: Binbin Zhang, Di Wu) |
| LOW | …hatterbox/models/s3gen/transformer/upsample_encoder.py | 1 | # Copyright (c) 2021 Mobvoi Inc (Binbin Zhang, Di Wu) |
| LOW | src/chatterbox/models/s3gen/transformer/embedding.py | 1 | # Copyright (c) 2020 Mobvoi Inc. (authors: Binbin Zhang, Di Wu) |
| LOW | …c/chatterbox/models/s3gen/transformer/encoder_layer.py | 1 | # Copyright (c) 2021 Mobvoi Inc (Binbin Zhang, Di Wu) |
| LOW | src/chatterbox/models/s3gen/transformer/activation.py | 1 | # Copyright (c) 2020 Johns Hopkins University (Shinji Watanabe) |
| LOW | …/models/s3gen/transformer/positionwise_feed_forward.py | 1 | # Copyright (c) 2019 Shigeki Karita |
| LOW | src/chatterbox/models/s3gen/utils/class_utils.py | 1 | # Copyright [2023-11-28] <sxc19@mails.tsinghua.edu.cn, Xingchen Song> |
| LOW | src/chatterbox/models/s3gen/utils/mask.py | 1 | # Copyright (c) 2019 Shigeki Karita |

| Severity | File | Line | Snippet |
|---|---|---|---|
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 63 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 64 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 66 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 69 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 70 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 71 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 73 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 74 | |
| MEDIUM | src/chatterbox/models/s3gen/flow_matching.py | 76 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| HIGH | gradio_tts_turbo_app.py | 62 | if (end < current_text.length && current_text[end] === ' ') suffix = ""; |
| HIGH | gradio_tts_app.py | 61 | min_p = gr.Slider(0.00, 1.00, step=0.01, label="min_p \|\| Newer Sampler. Recommend 0.02 > 0.1. Handles Hi |
| HIGH | gradio_tts_app.py | 62 | top_p = gr.Slider(0.00, 1.00, step=0.01, label="top_p \|\| Original Sampler. 1.0 Disables(recommended). Or |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | multilingual_app.py | 150 | except Exception as e: |
| MEDIUM | multilingual_app.py | 151 | print(f"Error loading model: {e}") |
| LOW | multilingual_app.py | 158 | except Exception as e: |
| LOW | src/chatterbox/tts_turbo.py | 212 | except Exception as e: |
| MEDIUM | src/chatterbox/tts_turbo.py | 204 | def norm_loudness(self, wav, sr, target_lufs=-27): |
| LOW | src/chatterbox/models/tokenizers/tokenizer.py | 131 | except Exception as e: |
| LOW | src/chatterbox/models/tokenizers/tokenizer.py | 187 | except Exception as e: |
| LOW | src/chatterbox/models/tokenizers/tokenizer.py | 251 | except Exception as e: |
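
The rows above list broad `except Exception as e:` handlers and one `print`-based error message. The report does not state why they were flagged or what fix is expected, so the following is only a minimal sketch of one common alternative: catch the expected error type, log through the standard `logging` module, and re-raise anything unexpected. The function name `load_model`, the `loader` callable, and the logger name are hypothetical and do not come from the scanned repository.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("multilingual_app")


def load_model(loader):
    """Call a model loader, logging failures instead of printing them.

    `loader` is a hypothetical zero-argument callable standing in for the
    real model constructor, which the report does not show.
    """
    try:
        return loader()
    except OSError as err:
        # Assumed "expected" failure: missing or unreadable checkpoint files
        # (FileNotFoundError is a subclass of OSError).
        logger.error("Error loading model: %s", err)
        return None
    except Exception:
        # Anything else is unexpected: record the traceback, then re-raise
        # so the failure is not silently swallowed.
        logger.exception("Unexpected error while loading model")
        raise
```

Whether returning `None` on an expected failure is appropriate depends on how the callers handle a missing model; the sketch only illustrates the split between anticipated and unanticipated errors.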

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/chatterbox/models/tokenizers/tokenizer.py | 75 | |
| LOW | src/chatterbox/models/tokenizers/tokenizer.py | 285 | |
| LOW | src/chatterbox/models/s3gen/decoder.py | 229 | |
| LOW | src/chatterbox/models/s3gen/xvector.py | 130 | |
| LOW | src/chatterbox/models/s3gen/utils/mask.py | 89 | |
| LOW | src/chatterbox/models/s3gen/matcha/decoder.py | 345 | |
| LOW | src/chatterbox/models/s3gen/matcha/transformer.py | 96 | |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | multilingual_app.py | 120 | def get_supported_languages_display() -> str: |
| LOW | src/chatterbox/models/t3/inference/t3_hf_backend.py | 34 | def prepare_inputs_for_generation( |
| LOW | src/chatterbox/models/t3/modules/perceiver.py | 22 | def _relative_position_bucket(relative_position, causal=True, num_buckets=32, max_distance=128): |
| LOW | src/chatterbox/models/t3/modules/perceiver.py | 84 | def scaled_dot_product_attention(self, q, k, v, mask=None): |
| LOW | src/chatterbox/models/s3gen/utils/mel.py | 15 | def dynamic_range_compression_torch(x, C=1, clip_val=1e-5): |
| LOW | src/chatterbox/models/s3gen/utils/intmeanflow.py | 5 | def get_intmeanflow_time_mixer(dims): |

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/chatterbox/vc.py | 63 | # Check if MPS is available on macOS |
| LOW | src/chatterbox/tts.py | 169 | # Check if MPS is available on macOS |
| LOW | src/chatterbox/tts_turbo.py | 187 | # Check if MPS is available on macOS |
| LOW | src/chatterbox/mtl_tts.py | 233 | # Check if MPS is available on macOS |
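
The same comment, `# Check if MPS is available on macOS`, appears in four files, which suggests the device check is repeated in each one. One possible consolidation is a single shared helper, sketched below. The name `resolve_device` is hypothetical, and the fallback to `"cpu"` is an assumption about what the flagged code does, since the report shows only the comment line. The availability flag is passed in as an argument so the sketch runs without PyTorch installed; in practice it would typically come from `torch.backends.mps.is_available()`.

```python
def resolve_device(requested: str, mps_available: bool) -> str:
    """Return the device to use, falling back to CPU when MPS is missing.

    Assumption: the repeated check in the scanned files falls back to CPU;
    the report shows only the comment, not the code under it.
    """
    if requested == "mps" and not mps_available:
        return "cpu"
    return requested
```

Each of the four call sites could then call the helper once instead of carrying its own copy of the check.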

| Severity | File | Line | Snippet |
|---|---|---|---|
| LOW | src/chatterbox/models/s3gen/transformer/convolution.py | 124 | # It's better we just return None if no cache is required, |