---
license: apache-2.0
language:
  - ko
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
  - speech
  - audio
---

# hubert-large-korean

## Model Details

HuBERT (Hidden-Unit BERT) is a speech representation learning model proposed by Facebook. Unlike conventional speech recognition models, HuBERT uses a self-supervised learning approach that learns directly from the raw waveform of the speech signal.

This model was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC).

### Model Description

|                            | Base                  | Large                 |
| -------------------------- | --------------------- | --------------------- |
| CNN Encoder strides        | 5, 2, 2, 2, 2, 2, 2   | 5, 2, 2, 2, 2, 2, 2   |
| CNN Encoder kernel width   | 10, 3, 3, 3, 3, 2, 2  | 10, 3, 3, 3, 3, 2, 2  |
| CNN Encoder channels       | 512                   | 512                   |
| Transformer Encoder layers | 12                    | 24                    |
| embedding dim              | 768                   | 1024                  |
| inner FFN dim              | 3072                  | 4096                  |
| attention heads            | 8                     | 16                    |
| Projection dim             | 256                   | 768                   |
| Params                     | 95M                   | 317M                  |
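
These dimensions can be double-checked against the published configuration without downloading the weights. A quick sketch using the standard `HubertConfig` fields, where the expected values are taken from the table above:

```python
from transformers import HubertConfig

# Fetch only the model config (no weights) and compare against the table
config = HubertConfig.from_pretrained("team-lucid/hubert-large-korean")
print(config.num_hidden_layers)    # expected: 24
print(config.hidden_size)          # expected: 1024
print(config.intermediate_size)    # expected: 4096
print(config.num_attention_heads)  # expected: 16
```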

## How to Get Started with the Model

### PyTorch

```python
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("team-lucid/hubert-large-korean")

wav = torch.ones(1, 16000)
outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
print(f"Output:  {outputs.last_hidden_state.shape}")  # [1, 49, 1024]
```

### JAX/Flax

```python
import jax.numpy as jnp
from transformers import FlaxAutoModel

model = FlaxAutoModel.from_pretrained("team-lucid/hubert-large-korean", trust_remote_code=True)

wav = jnp.ones((1, 16000))
outputs = model(wav)
print(f"Input:   {wav.shape}")  # [1, 16000]
print(f"Output:  {outputs.last_hidden_state.shape}")  # [1, 49, 1024]
```

## Training Details

### Training Data

This model was trained on about 4,000 hours of speech extracted from the Free Conversation Speech (general male/female), Multi-Speaker Speech Synthesis, and Broadcast Content Conversational Speech Recognition datasets, which were built with funding from the Ministry of Science and ICT and support from the National Information Society Agency (NIA) of Korea.

### Training Procedure

As in the original paper, a Base model was first trained on MFCC-based targets; k-means with 500 clusters was then run on its features, and the Base and Large models were trained again on the resulting cluster assignments.
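
For illustration, these cluster targets amount to k-means assignments over frame-level features. A minimal sketch with scikit-learn, where the feature array is a placeholder (per the description above, the first iteration clusters MFCCs and later iterations cluster features from the previously trained Base model):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Placeholder frame-level features, e.g. 39-dim MFCCs (13 + deltas + delta-deltas)
features = np.random.randn(100_000, 39).astype(np.float32)

# 500 clusters, matching the procedure described above
kmeans = MiniBatchKMeans(n_clusters=500, batch_size=10_000)
kmeans.fit(features)

# Each frame's cluster id becomes the pseudo-label the model learns to predict
targets = kmeans.predict(features)
```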

### Training Hyperparameters

| Hyperparameter      | Base    | Large   |
| ------------------- | ------- | ------- |
| Warmup Steps        | 32,000  | 32,000  |
| Learning Rate       | 5e-4    | 1.5e-3  |
| Batch Size          | 128     | 128     |
| Weight Decay        | 0.01    | 0.01    |
| Max Steps           | 400,000 | 400,000 |
| Learning Rate Decay | 0.1     | 0.1     |
| Adam $\beta_1$      | 0.9     | 0.9     |
| Adam $\beta_2$      | 0.99    | 0.99    |
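
Plugged into an optimizer, the Large column might look like the following `optax` sketch; the exact schedule shape (here linear warmup followed by linear decay to 0.1x the peak rate) is our reading of the table, not confirmed training code:

```python
import optax

# Large-model values from the table above
peak_lr = 1.5e-3
schedule = optax.join_schedules(
    schedules=[
        # Linear warmup from 0 to the peak learning rate over 32k steps
        optax.linear_schedule(0.0, peak_lr, transition_steps=32_000),
        # Assumed: linear decay to 0.1x the peak over the remaining steps
        optax.linear_schedule(peak_lr, 0.1 * peak_lr, transition_steps=400_000 - 32_000),
    ],
    boundaries=[32_000],
)

optimizer = optax.adamw(learning_rate=schedule, b1=0.9, b2=0.99, weight_decay=0.01)
```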