---
library_name: keras-hub
license: mit
tags:
- speech-recognition
- keras
- automatic-speech-recognition
pipeline_tag: automatic-speech-recognition
---
### Model Overview
⚠️ Whisper is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.
A Whisper encoder-decoder network for speech.
This class implements a Transformer-based encoder-decoder model as
described in
["Robust Speech Recognition via Large-Scale Weak Supervision"](https://arxiv.org/abs/2212.04356).
It includes the embedding lookups and transformer layers, but not the head
for predicting the next token.
The default constructor gives a fully customizable, randomly initialized Whisper
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset()` constructor.
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/openai/whisper).
## Links
* Whisper Quickstart Notebook (coming soon)
* [Whisper API Documentation](https://keras.io/keras_hub/api/models/whisper/)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
## Installation
Keras and KerasHub can be installed with:
```
pip install -U -q keras-hub
pip install -U -q keras
```
JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
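Since Keras can run on any of these backends, it may help to pick one explicitly. The sketch below uses the `KERAS_BACKEND` environment variable, which Keras reads at import time; the choice of `"jax"` here is only an example, and `"tensorflow"` or `"torch"` should work the same way if that backend is installed.

```python
import os

# Select the backend BEFORE the first `import keras` / `import keras_hub`;
# changing it afterwards has no effect in the running process.
# "jax" is an arbitrary choice for this sketch.
os.environ["KERAS_BACKEND"] = "jax"

selected_backend = os.environ["KERAS_BACKEND"]
print(f"Keras will use the {selected_backend} backend on import")
```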
__Arguments__
- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer encoder layers and
transformer decoder layers.
- __num_heads__: int. The number of attention heads for each transformer.
The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The size of the transformer encoding and pooler layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
a two-layer feedforward network for each transformer.
- __num_mels__: int. The number of mel-frequency filters. Defaults to `80`.
- __dropout__: float. Dropout probability for the Transformer encoder.
- __max_encoder_sequence_length__: int. The maximum sequence length that the
audio encoder can consume. Since the second convolutional layer in
the encoder reduces the sequence length by half (stride of 2), we
use `max_encoder_sequence_length // 2` as the sequence length for the
positional embedding layer.
- __max_decoder_sequence_length__: int. The maximum sequence length that the
text decoder can consume.
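The two numeric constraints in the argument list (divisibility of `hidden_dim` by `num_heads`, and the halved encoder length used for positional embeddings) can be checked by hand before building a model. The helper below is a hypothetical illustration, not part of KerasHub; the per-head size `hidden_dim // num_heads` is the usual multi-head attention convention and is assumed here.

```python
def check_whisper_config(num_heads, hidden_dim, max_encoder_sequence_length):
    """Hypothetical helper (not a KerasHub API) that mirrors the argument notes."""
    # The hidden size must be divisible by the number of attention heads.
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )
    return {
        # Assumed per-head size under the standard multi-head convention.
        "head_dim": hidden_dim // num_heads,
        # The stride-2 convolution halves the encoder sequence length, so the
        # positional embedding covers max_encoder_sequence_length // 2 steps.
        "encoder_position_length": max_encoder_sequence_length // 2,
    }


derived = check_whisper_config(
    num_heads=4, hidden_dim=256, max_encoder_sequence_length=128
)
print(derived)
```

With the values used in the example configuration on this page, both derived sizes come out to 64.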
## Example Usage
```python
import keras_hub
import keras
import numpy as np
```
```python
input_data = {
    # Log-mel features are real-valued: (batch, frames, num_mels).
    "encoder_features": np.ones(shape=(1, 12, 80), dtype="float32"),
    "decoder_token_ids": np.ones(shape=(1, 12), dtype="int32"),
    # 1 marks a real token, 0 marks padding.
    "decoder_padding_mask": np.array(
        [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
    ),
}

# Randomly initialized Whisper encoder-decoder model with a custom config.
model = keras_hub.models.WhisperBackbone(
    vocabulary_size=51864,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    max_encoder_sequence_length=128,
    max_decoder_sequence_length=128,
)
model(input_data)
```
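The `decoder_padding_mask` in the example is written out by hand. When the true token counts are known, a mask of the same form can be derived from them. This is a minimal NumPy sketch; `make_padding_mask` is a hypothetical helper, not a KerasHub function.

```python
import numpy as np


def make_padding_mask(lengths, max_length):
    """Return a (batch, max_length) int32 mask: 1 for real tokens, 0 for padding."""
    lengths = np.asarray(lengths)
    positions = np.arange(max_length)
    # Broadcast (1, max_length) against (batch, 1): a position is "real"
    # when its index is smaller than that row's token count.
    return (positions[None, :] < lengths[:, None]).astype("int32")


# One sequence with 10 real tokens padded to length 12, as in the example input.
mask = make_padding_mask([10], max_length=12)
print(mask)
```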
## Example Usage with Hugging Face URI
```python
import keras_hub
import keras
import numpy as np
```
```python
input_data = {
    # Log-mel features are real-valued: (batch, frames, num_mels).
    "encoder_features": np.ones(shape=(1, 12, 80), dtype="float32"),
    "decoder_token_ids": np.ones(shape=(1, 12), dtype="int32"),
    # 1 marks a real token, 0 marks padding.
    "decoder_padding_mask": np.array(
        [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
    ),
}

# Randomly initialized Whisper encoder-decoder model with a custom config.
model = keras_hub.models.WhisperBackbone(
    vocabulary_size=51864,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    max_encoder_sequence_length=128,
    max_decoder_sequence_length=128,
)
model(input_data)
```