sachin committed · 54103f9
1 Parent(s): ecf3eb5
update

- docs/issues.md +6 -0
- src/server/main.py +4 -4
docs/issues.md
ADDED
@@ -0,0 +1,6 @@
+ 2025-03-17 22:33:24,340 - parler_tts.modeling_parler_tts - WARNING - `prompt_attention_mask` is specified but `attention_mask` is not. A full `attention_mask` will be created. Make sure this is the intended behaviour.
+ W0317 22:33:36.322000 1 torch/_inductor/utils.py:1137] [0/0] Not enough SMs to use max_autotune_gemm mode
+ CUDAGraph supports dynamic shapes by recording a new graph for each distinct input size. Recording too many CUDAGraphs may lead to extra overhead. We have observed 51 distinct sizes. Please consider the following options for better performance: a) padding inputs to a few fixed number of shapes; or b) set torch._inductor.config.triton.cudagraph_skip_dynamic_graphs=True. Set torch._inductor.config.triton.cudagraph_dynamic_shape_warn_limit=None to silence this warning.
+
+
+
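For reference, the CUDAGraph warning above names its own remediations; here is a minimal sketch of option (b), with both flag names taken verbatim from the log (they must be set before the first compiled call runs):

import torch._inductor.config as inductor_config

# Option (b): fall back to non-CUDAGraph execution for dynamic shapes
# instead of recording a new CUDA graph per distinct input size.
inductor_config.triton.cudagraph_skip_dynamic_graphs = True

# Alternatively, keep recording graphs but silence the re-recording warning:
# inductor_config.triton.cudagraph_dynamic_shape_warn_limit = None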
src/server/main.py
CHANGED
@@ -97,17 +97,17 @@ class TTSModelManager:
 
 
         # TODO - temporary disable -torch.compile
-
+        '''
         # Update model configuration
         model.config.pad_token_id = tokenizer.pad_token_id
         # Update for deprecation: use max_batch_size instead of batch_size
         if hasattr(model.generation_config.cache_config, 'max_batch_size'):
             model.generation_config.cache_config.max_batch_size = 1
         model.generation_config.cache_implementation = "static"
-
+        '''
         # Compile the model
-
-        compile_mode = "reduce-overhead"
+        compile_mode = "default"
+        #compile_mode = "reduce-overhead"
 
         model.forward = torch.compile(model.forward, mode=compile_mode)
 
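Net effect of this hunk: the static-cache setup is commented out with a triple-quoted string, and torch.compile switches from "reduce-overhead" to "default". The "reduce-overhead" mode cuts Python overhead by using CUDA graphs, which is what produced the re-recording warning now captured in docs/issues.md; "default" compiles without CUDA graphs and sidesteps it. Should "reduce-overhead" be restored, the log's option (a), padding inputs to a few fixed shapes, could look like the sketch below (BUCKETS and pad_to_bucket are hypothetical, not part of this repo):

import torch

# Hypothetical bucket sizes; tune so most real inputs land in a few shapes.
BUCKETS = (128, 256, 512)

def pad_to_bucket(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Right-pad (batch, seq_len) token ids to the next bucket size so
    CUDAGraph records at most len(BUCKETS) graphs, not one per length."""
    seq_len = input_ids.shape[-1]
    target = next((b for b in BUCKETS if b >= seq_len), seq_len)
    if target == seq_len:
        return input_ids
    pad = input_ids.new_full((*input_ids.shape[:-1], target - seq_len), pad_token_id)
    return torch.cat([input_ids, pad], dim=-1)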