Tamerstito committed on
Commit a79b400 · verified · 1 Parent(s): 8efba52

Delete NeMo-2.2.0

This view is limited to 50 files because it contains too many changes.
Files changed (50):
  1. NeMo-2.2.0/CITATION.cff +0 -41
  2. NeMo-2.2.0/CONTRIBUTING.md +0 -79
  3. NeMo-2.2.0/Dockerfile.ci +0 -86
  4. NeMo-2.2.0/Dockerfile.speech +0 -190
  5. NeMo-2.2.0/LICENSE +0 -201
  6. NeMo-2.2.0/MANIFEST.in +0 -1
  7. NeMo-2.2.0/README.md +0 -723
  8. NeMo-2.2.0/codecov.yml +0 -7
  9. NeMo-2.2.0/docs/Makefile +0 -216
  10. NeMo-2.2.0/docs/source/_static/css/custom.css +0 -372
  11. NeMo-2.2.0/docs/source/_static/js/pk_scripts.js +0 -19
  12. NeMo-2.2.0/docs/source/_templates/layout.html +0 -14
  13. NeMo-2.2.0/docs/source/apis.rst +0 -49
  14. NeMo-2.2.0/docs/source/asr/all_chkpt.rst +0 -236
  15. NeMo-2.2.0/docs/source/asr/api.rst +0 -343
  16. NeMo-2.2.0/docs/source/asr/asr_all.bib +0 -1043
  17. NeMo-2.2.0/docs/source/asr/asr_language_modeling_and_customization.rst +0 -663
  18. NeMo-2.2.0/docs/source/asr/configs.rst +0 -1122
  19. NeMo-2.2.0/docs/source/asr/data/asrlm_results.csv +0 -2
  20. NeMo-2.2.0/docs/source/asr/data/benchmark_by.csv +0 -4
  21. NeMo-2.2.0/docs/source/asr/data/benchmark_ca.csv +0 -3
  22. NeMo-2.2.0/docs/source/asr/data/benchmark_canary.csv +0 -2
  23. NeMo-2.2.0/docs/source/asr/data/benchmark_cn.csv +0 -3
  24. NeMo-2.2.0/docs/source/asr/data/benchmark_code_switching.csv +0 -3
  25. NeMo-2.2.0/docs/source/asr/data/benchmark_cs.csv +0 -4
  26. NeMo-2.2.0/docs/source/asr/data/benchmark_de.csv +0 -9
  27. NeMo-2.2.0/docs/source/asr/data/benchmark_en.csv +0 -37
  28. NeMo-2.2.0/docs/source/asr/data/benchmark_eo.csv +0 -3
  29. NeMo-2.2.0/docs/source/asr/data/benchmark_es.csv +0 -11
  30. NeMo-2.2.0/docs/source/asr/data/benchmark_fa.csv +0 -2
  31. NeMo-2.2.0/docs/source/asr/data/benchmark_fastconformer_hybrid.csv +0 -16
  32. NeMo-2.2.0/docs/source/asr/data/benchmark_fr.csv +0 -11
  33. NeMo-2.2.0/docs/source/asr/data/benchmark_hi.csv +0 -2
  34. NeMo-2.2.0/docs/source/asr/data/benchmark_hr.csv +0 -5
  35. NeMo-2.2.0/docs/source/asr/data/benchmark_it.csv +0 -6
  36. NeMo-2.2.0/docs/source/asr/data/benchmark_jp.csv +0 -2
  37. NeMo-2.2.0/docs/source/asr/data/benchmark_ka.csv +0 -3
  38. NeMo-2.2.0/docs/source/asr/data/benchmark_kab.csv +0 -2
  39. NeMo-2.2.0/docs/source/asr/data/benchmark_kz.csv +0 -2
  40. NeMo-2.2.0/docs/source/asr/data/benchmark_mr.csv +0 -2
  41. NeMo-2.2.0/docs/source/asr/data/benchmark_multilingual.csv +0 -5
  42. NeMo-2.2.0/docs/source/asr/data/benchmark_nl.csv +0 -2
  43. NeMo-2.2.0/docs/source/asr/data/benchmark_parakeet.csv +0 -7
  44. NeMo-2.2.0/docs/source/asr/data/benchmark_pl.csv +0 -4
  45. NeMo-2.2.0/docs/source/asr/data/benchmark_ru.csv +0 -7
  46. NeMo-2.2.0/docs/source/asr/data/benchmark_rw.csv +0 -3
  47. NeMo-2.2.0/docs/source/asr/data/benchmark_ua.csv +0 -4
  48. NeMo-2.2.0/docs/source/asr/data/benchmark_uz.csv +0 -2
  49. NeMo-2.2.0/docs/source/asr/data/benchmark_zh.csv +0 -4
  50. NeMo-2.2.0/docs/source/asr/data/scores/be/conformer_be.csv +0 -3
NeMo-2.2.0/CITATION.cff DELETED
@@ -1,41 +0,0 @@
- cff-version: 1.2.0
- message: "If you use this software, please cite it as below."
- title: "NeMo: a toolkit for Conversational AI and Large Language Models"
- url: https://nvidia.github.io/NeMo/
- repository-code: https://github.com/NVIDIA/NeMo
- authors:
- - family-names: Harper
-   given-names: Eric
- - family-names: Majumdar
-   given-names: Somshubra
- - family-names: Kuchaiev
-   given-names: Oleksii
- - family-names: Jason
-   given-names: Li
- - family-names: Zhang
-   given-names: Yang
- - family-names: Bakhturina
-   given-names: Evelina
- - family-names: Noroozi
-   given-names: Vahid
- - family-names: Subramanian
-   given-names: Sandeep
- - family-names: Nithin
-   given-names: Koluguri
- - family-names: Jocelyn
-   given-names: Huang
- - family-names: Jia
-   given-names: Fei
- - family-names: Balam
-   given-names: Jagadeesh
- - family-names: Yang
-   given-names: Xuesong
- - family-names: Livne
-   given-names: Micha
- - family-names: Dong
-   given-names: Yi
- - family-names: Naren
-   given-names: Sean
- - family-names: Ginsburg
-   given-names: Boris
-
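A CITATION.cff file like the one deleted above is plain YAML, so individual fields can be pulled out with standard text tools even without a YAML parser. A minimal sketch; the inline sample below is a hypothetical three-line excerpt, not the full file:

```shell
# Extract the top-level title from a CITATION.cff-style document with sed.
# The pattern strips the `title:` key and the surrounding double quotes.
cff='cff-version: 1.2.0
title: "NeMo: a toolkit for Conversational AI and Large Language Models"
url: https://nvidia.github.io/NeMo/'

title=$(printf '%s\n' "$cff" | sed -n 's/^title: *"\(.*\)"$/\1/p')
echo "$title"
```

For anything beyond flat top-level keys (e.g. the nested `authors` list), a real YAML parser is the safer choice.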
NeMo-2.2.0/CONTRIBUTING.md DELETED
@@ -1,79 +0,0 @@
- # Contributions are welcome!
- 
- We do all of NeMo's development in the open. Contributions from the NeMo community are welcome.
- 
- 
- # Pull Requests (PR) Guidelines
- 
- **Send your PRs to the `main` branch**
- 
- 1) Make sure your PR does one thing. Have a clear answer to "What does this PR do?".
- 2) Read the General Principles and style guide below.
- 3) Make sure you sign your commits, e.g. use ``git commit -s`` when committing.
- 4) Make sure all unit tests finish successfully before sending the PR: run ``pytest`` (or, if your dev box does not have a GPU, ``pytest --cpu``) from NeMo's root folder.
- 5) Send your PR and request a review.
- 
- ## Unit tests
- Quick tests (locally, while developing):
- ```
- pytest
- # If you don't have an NVIDIA GPU do:
- # pytest --cpu
- ```
- Full tests, including pre-trained model downloads:
- ```
- pytest --with_downloads
- ```
- 
- ## Whom should you ask for review:
- 1. For changes to NeMo's core: @ericharper, @titu1994, @blisc, or @okuchaiev
- 1. For changes to NeMo's ASR collection: @titu1994, @redoctopus, @jbalam-nv, or @okuchaiev
- 1. For changes to NeMo's NLP collection: @MaximumEntropy, @ericharper, @ekmb, @yzhang123, @VahidooX, @vladgets, or @okuchaiev
- 1. For changes to NeMo's TTS collection: @blisc or @okuchaiev
- 
- Note that some people may self-assign to review your PR; in that case, please wait for them to add a review.
- 
- Your pull request must pass all checks and peer review before it can be merged.
- 
- # General principles
- 1. **User-oriented**: make it easy for end users, even at the cost of writing more code in the background.
- 1. **Robust**: make it hard for users to make mistakes.
- 1. **Well-tested**: please add simple, fast unit tests. Consider adding CI tests for end-to-end functionality.
- 1. **Reusable**: for every piece of code, think about how it could be reused in the future, and make that easy.
- 1. **Readable**: code should be easy to read.
- 1. **Legal**: if you copy even one line of code from the Internet, make sure its license is compatible with NeMo's. Give credit and link back to the code.
- 1. **Sensible**: code should make sense. If you think a piece of code might be confusing, write comments.
- 
- ## Class naming conventions
- * No "I", "Interface", "NM" or "NeMo" pre/postfixes anywhere
- * Core interfaces have simple names: Typing, Cloud, Serialization, FileIO
- * Core classes have the simplest possible names: NeuralModule, Model, Graph, Dataset, Loss, Module
- * Abstract classes in the Model hierarchy have the Model postfix
- * A config class for MyModel should be called MyModelConfig
- * Leaf Neural Module classes have simple names without any postfixes (e.g. AudioPreprocess)
- * Leaf Datasets have the Dataset postfix (e.g. AudioToSpeechLabelDataset)
- * Leaf Losses have the Loss postfix (e.g. CTCLoss)
- * Leaf Models do not have any postfix, just a name (e.g. QuartzNet)
- 
- ## Python style
- We use ``black`` as our style guide. To check whether your code will pass the style check, run ``python setup.py style`` (from NeMo's repo folder); if it does not pass, run ``python setup.py style --fix``.
- 
- 1. Include docstrings for every class and method exposed to the user.
- 1. Use Python 3 type hints for every class and method exposed to the user.
- 1. Avoid wild imports (``from X import *``) unless ``__all__`` is defined in ``X.py``.
- 1. Minimize the use of ``**kwargs``.
- 1. Raising an error is preferred to ``assert``: write ``if X: raise Error`` instead of ``assert X``.
- 1. Classes are preferred to standalone methods.
- 1. Methods should be atomic. A method shouldn't be longer than 75 lines, i.e. it should fit on the screen without scrolling.
- 1. If a method has arguments that don't fit into one line, each argument should be on its own line for readability.
- 1. Add ``__init__.py`` for every folder.
- 1. F-strings are preferred to formatted strings.
- 1. Loggers are preferred to print. In NeMo, you can use the logger from ``from nemo.utils import logging``.
- 1. Private functions (functions starting with ``_``) shouldn't be called outside their host file.
- 1. If a comment spans multiple lines, use ``'''`` instead of ``#``.
- 
- # Collections
- A collection is a logical grouping of related Neural Modules that share a domain area or semantics.
- When contributing a module to a collection, please make sure it belongs to that category.
- If you would like to start a new one and contribute it back to the platform, you are very welcome to do so.
NeMo-2.2.0/Dockerfile.ci DELETED
@@ -1,86 +0,0 @@
- # syntax=docker/dockerfile:1-labs
- 
- # Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
- #
- # Licensed under the Apache License, Version 2.0 (the "License");
- # you may not use this file except in compliance with the License.
- # You may obtain a copy of the License at
- #
- #     http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
- 
- ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.07-py3
- 
- FROM ${BASE_IMAGE} as nemo-bump
- ARG NEMO_TAG
- WORKDIR /opt
- # NeMo
- RUN <<"EOF" bash -exu
- if [[ ! -d NeMo ]]; then
-     git clone https://github.com/NVIDIA/NeMo.git
- fi
- cd NeMo/
- git fetch origin '+refs/pull/*/merge:refs/remotes/pull/*/merge'
- git fetch origin $NEMO_TAG
- git checkout -f $NEMO_TAG
- EOF
- 
- FROM ${BASE_IMAGE}
- ARG IMAGE_LABEL
- LABEL "nemo.library"=${IMAGE_LABEL}
- 
- ENV TRANSFORMERS_OFFLINE=0
- ENV HYDRA_FULL_ERROR=1
- ENV PYTHONUNBUFFERED=1
- 
- # APT packages
- RUN <<"EOF" bash -ex
- apt-get update
- apt-get install -y bc libsox-fmt-all
- apt-get clean
- EOF
- 
- ARG MLM_REPO
- ARG MLM_TAG
- RUN --mount=type=bind,from=nemo-bump,source=/opt/NeMo/reinstall.sh,target=/opt/NeMo/reinstall.sh \
-     bash /opt/NeMo/reinstall.sh --library mcore --mode build && \
-     ls -al /opt/Megatron-LM || true
- 
- WORKDIR /workspace
- RUN \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/requirements,target=/tmp/NeMo/requirements \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/tools/ctc_segmentation/requirements.txt,target=/tmp/NeMo/tools/ctc_segmentation/requirements.txt \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/reinstall.sh,target=/tmp/NeMo/reinstall.sh \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/setup.py,target=/tmp/NeMo/setup.py \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/README.md,target=/tmp/NeMo/README.md \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/nemo/package_info.py,target=/tmp/NeMo/nemo/package_info.py \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/nemo/__init__.py,target=/tmp/NeMo/nemo/__init__.py <<"EOF" bash -ex
- export NEMO_DIR=/tmp/NeMo
- bash /tmp/NeMo/reinstall.sh --library mcore --mode install
- bash /tmp/NeMo/reinstall.sh --library nemo --mode install
- rm -rf $NEMO_DIR || true
- EOF
- 
- # Copy over NeMo code
- ARG NEMO_REPO
- ARG NEMO_TAG
- RUN \
-     --mount=type=bind,from=nemo-bump,source=/opt/NeMo/reinstall.sh,target=/tmp/reinstall.sh <<"EOF" bash -ex
- bash /tmp/reinstall.sh --library mcore --mode install
- bash /tmp/reinstall.sh --library nemo --mode install
- 
- # Copy into workspace
- cp -a /opt/NeMo/. /workspace/
- cp -r /opt/Megatron-LM/ /workspace/
- 
- # set permission
- chmod 777 -R /workspace
- EOF
- 
- ENV PYTHONPATH="${PYTHONPATH}:/workspace/Megatron-LM"
- ENV NEMO_HOME="/home/TestData/nemo_home"
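The deleted Dockerfile.ci above feeds each RUN instruction a quoted heredoc (``RUN <<"EOF" bash -ex ... EOF``, a BuildKit feature enabled by the ``syntax=docker/dockerfile:1-labs`` directive). The same construct works in plain shell, which is an easy way to try those script bodies outside a build. A minimal sketch; the variable names are illustrative:

```shell
# Heredoc piped to an explicit interpreter, as in the RUN instructions above.
# The quoted delimiter ("EOF") suppresses expansion by the outer shell, so the
# body reaches `bash -ex` verbatim; -x traces each command (sent to stderr,
# discarded here) and -e aborts on the first failure.
result=$(bash -ex 2>/dev/null <<"EOF"
greeting="hello from the heredoc"
echo "${greeting}"
EOF
)
echo "${result}"
```

Inside a Dockerfile the same body runs as one layer, which is why the apt-get update/install/clean steps above can share state without `&&` chaining.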
NeMo-2.2.0/Dockerfile.speech DELETED
@@ -1,190 +0,0 @@
- # syntax=docker/dockerfile:experimental
- 
- # Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
- #
- # Licensed under the Apache License, Version 2.0 (the "License");
- # you may not use this file except in compliance with the License.
- # You may obtain a copy of the License at
- #
- #     http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
- 
- ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.02-py3
- 
- # build an image that includes only the nemo dependencies, ensures that dependencies
- # are included first for optimal caching, and useful for building a development
- # image (by specifying build target as `nemo-deps`)
- FROM ${BASE_IMAGE} as nemo-deps
- 
- # dependency flags; should be declared after FROM
- # torchaudio: not required by default
- ARG REQUIRE_TORCHAUDIO=false
- # k2: not required by default
- ARG REQUIRE_K2=false
- # ais cli: not required by default, install only if required
- ARG REQUIRE_AIS_CLI=false
- 
- # Ensure apt-get won't prompt for selecting options
- ENV DEBIAN_FRONTEND=noninteractive
- # libavdevice-dev required for latest torchaudio
- RUN apt-get update && \
-     apt-get upgrade -y && \
-     apt-get install -y \
-         libsndfile1 sox \
-         libfreetype6 \
-         swig \
-         ffmpeg \
-         libavdevice-dev && \
-     rm -rf /var/lib/apt/lists/*
- 
- # libtool, ..., libgts-dev are required for graphviz
- # graphviz is required for k2 and pynini visualization
- RUN apt-get update && \
-     apt-get install -y \
-         libtool \
-         libltdl-dev \
-         automake \
-         autoconf \
-         bison \
-         flex \
-         tcl \
-         ghostscript \
-         libgd-dev \
-         fontconfig \
-         libcairo2-dev \
-         libpango1.0-dev \
-         libgts-dev && \
-     rm -rf /var/lib/apt/lists/*
- 
- WORKDIR /workspace/
- 
- ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
- ARG MCORE_TAG=338af51452a53982d202e8386db6233adad1ce86
- ARG APEX_TAG=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
- # Install Megatron Core; this can be removed once the 0.3 pip package is released.
- # We leave it here in case we need to work off of a specific commit in main.
- RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
-     cd Megatron-LM && \
-     git checkout ${MCORE_TAG} && \
-     pip install .
- 
- # Performance optimizations for distributed optimizer: https://github.com/NVIDIA/apex/pull/1771
- RUN git clone https://github.com/NVIDIA/apex.git && \
-     cd apex && \
-     git checkout ${APEX_TAG} && \
-     pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir \
-         --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./
- 
- # Transformer Engine 1.2.0
- RUN git clone https://github.com/NVIDIA/TransformerEngine.git && \
-     cd TransformerEngine && \
-     git fetch origin ${TE_TAG} && \
-     git checkout FETCH_HEAD && \
-     git submodule init && git submodule update && \
-     NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .
- 
- WORKDIR /tmp/
- 
- # uninstall stuff from base container
- RUN pip3 uninstall -y sacrebleu torchtext
- 
- # build torchaudio
- WORKDIR /tmp/torchaudio_build
- COPY scripts/installers /tmp/torchaudio_build/scripts/installers/
- RUN INSTALL_MSG=$(/bin/bash /tmp/torchaudio_build/scripts/installers/install_torchaudio_latest.sh); INSTALL_CODE=$?; \
-     echo ${INSTALL_MSG}; \
-     if [ ${INSTALL_CODE} -ne 0 ]; then \
-         echo "torchaudio installation failed"; \
-         if [ "${REQUIRE_TORCHAUDIO}" = true ]; then \
-             exit ${INSTALL_CODE}; \
-         else echo "Skipping failed torchaudio installation"; fi \
-     else echo "torchaudio installed successfully"; fi
- 
- COPY scripts /tmp/nemo/scripts/
- # install correct graphviz version (k2 and pynini visualization tool), skip if installation fails
- RUN INSTALL_MSG=$(/bin/bash /tmp/nemo/scripts/installers/install_graphviz.sh --docker); INSTALL_CODE=$?; \
-     echo ${INSTALL_MSG}; \
-     if [ ${INSTALL_CODE} -ne 0 ]; then \
-         echo "graphviz installation failed"; \
-         if [ "${REQUIRE_K2}" = true ]; then \
-             exit ${INSTALL_CODE}; \
-         else echo "Skipping failed graphviz installation"; fi \
-     else echo "graphviz installed successfully"; fi
- 
- # install k2, skip if installation fails
- COPY scripts /tmp/nemo/scripts/
- RUN INSTALL_MSG=$(/bin/bash /tmp/nemo/scripts/installers/install_k2.sh); INSTALL_CODE=$?; \
-     echo ${INSTALL_MSG}; \
-     if [ ${INSTALL_CODE} -ne 0 ]; then \
-         echo "k2 installation failed"; \
-         if [ "${REQUIRE_K2}" = true ]; then \
-             exit ${INSTALL_CODE}; \
-         else echo "Skipping failed k2 installation"; fi \
-     else echo "k2 installed successfully"; fi
- 
- # install nemo dependencies
- WORKDIR /tmp/nemo
- ENV LHOTSE_REQUIRE_TORCHAUDIO=0
- COPY requirements .
- # exclude requirements_vllm.txt, since `vllm==0.5.x` breaks the container due to its hardcoded requirement `torch==2.3.0`
- RUN for f in $(ls requirements*.txt | grep -v 'requirements_vllm.txt'); do \
-         pip3 install --disable-pip-version-check --no-cache-dir -r $f; done
- 
- # install flash attention
- RUN pip install flash-attn
- # install numba for latest containers
- RUN pip install "numba>=0.57.1"
- 
- # copy nemo source into a scratch image
- FROM scratch as nemo-src
- COPY . .
- 
- # start building the final container
- FROM nemo-deps as nemo
- ARG NEMO_VERSION=2.0.0
- 
- # Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
- # version information as runtime environment variables for introspection purposes.
- RUN /usr/bin/test -n "$NEMO_VERSION" && \
-     /bin/echo "export NEMO_VERSION=${NEMO_VERSION}" >> /root/.bashrc && \
-     /bin/echo "export BASE_IMAGE=${BASE_IMAGE}" >> /root/.bashrc
- 
- # Install NeMo
- RUN --mount=from=nemo-src,target=/tmp/nemo,rw cd /tmp/nemo && pip install ".[all]"
- 
- # Check install
- # NB: adjusting LD_LIBRARY_PATH (only here, should not be persistent!) is a temporary hack
- # to avoid failure if CUDA is unavailable (`docker build` does not expose GPUs).
- # The error is raised in NeMo Core, and the main reason is the reinstalled Transformer Engine.
- RUN export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_HOME}/compat/lib.real && \
-     python -c "import nemo.collections.asr as nemo_asr" && \
-     python -c "import nemo.collections.nlp as nemo_nlp" && \
-     python -c "import nemo.collections.tts as nemo_tts" && \
-     python -c "import nemo_text_processing.text_normalization as text_normalization"
- 
- 
- # copy scripts/examples/tests into container for end user
- WORKDIR /workspace/nemo
- COPY scripts /workspace/nemo/scripts
- COPY examples /workspace/nemo/examples
- COPY tests /workspace/nemo/tests
- COPY tutorials /workspace/nemo/tutorials
- # COPY README.rst LICENSE /workspace/nemo/
- 
- RUN printf "#!/bin/bash\njupyter lab --no-browser --allow-root --ip=0.0.0.0" >> start-jupyter.sh && \
-     chmod +x start-jupyter.sh
- 
- # If required, install AIS CLI and Python AIS SDK
- RUN INSTALL_MSG=$(/bin/bash /tmp/nemo/scripts/installers/install_ais_cli_latest.sh && pip install aistore); INSTALL_CODE=$?; \
-     echo ${INSTALL_MSG}; \
-     if [ ${INSTALL_CODE} -ne 0 ]; then \
-         echo "AIS CLI installation failed"; \
-         if [ "${REQUIRE_AIS_CLI}" = true ]; then \
-             exit ${INSTALL_CODE}; \
-         else echo "Skipping AIS CLI installation"; fi \
-     else echo "AIS CLI installed successfully"; fi
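The torchaudio, graphviz, k2, and AIS CLI steps in the deleted Dockerfile.speech above all repeat one pattern: attempt the install, and fail the build only when the matching REQUIRE_* flag is true. That pattern can be factored into a single shell function; a minimal sketch, with an illustrative function name (the Dockerfile itself inlines the logic rather than using a helper):

```shell
# optional_install NAME REQUIRED CMD...
# Runs CMD; on failure, aborts only when REQUIRED is "true", otherwise
# warns and continues, mirroring the REQUIRE_TORCHAUDIO / REQUIRE_K2 /
# REQUIRE_AIS_CLI handling in the Dockerfile above.
optional_install() {
  local name="$1" required="$2"
  shift 2
  if "$@"; then
    echo "${name} installed successfully"
  elif [ "${required}" = true ]; then
    echo "${name} installation failed" >&2
    return 1
  else
    echo "Skipping failed ${name} installation"
  fi
}
```

Usage would look like `optional_install k2 "${REQUIRE_K2}" bash scripts/installers/install_k2.sh`, keeping each RUN layer down to one readable line.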
NeMo-2.2.0/LICENSE DELETED
@@ -1,201 +0,0 @@
-                                  Apache License
-                            Version 2.0, January 2004
-                         http://www.apache.org/licenses/
- 
-    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
- 
-    1. Definitions.
- 
-       "License" shall mean the terms and conditions for use, reproduction,
-       and distribution as defined by Sections 1 through 9 of this document.
- 
-       "Licensor" shall mean the copyright owner or entity authorized by
-       the copyright owner that is granting the License.
- 
-       "Legal Entity" shall mean the union of the acting entity and all
-       other entities that control, are controlled by, or are under common
-       control with that entity. For the purposes of this definition,
-       "control" means (i) the power, direct or indirect, to cause the
-       direction or management of such entity, whether by contract or
-       otherwise, or (ii) ownership of fifty percent (50%) or more of the
-       outstanding shares, or (iii) beneficial ownership of such entity.
- 
-       "You" (or "Your") shall mean an individual or Legal Entity
-       exercising permissions granted by this License.
- 
-       "Source" form shall mean the preferred form for making modifications,
-       including but not limited to software source code, documentation
-       source, and configuration files.
- 
-       "Object" form shall mean any form resulting from mechanical
-       transformation or translation of a Source form, including but
-       not limited to compiled object code, generated documentation,
-       and conversions to other media types.
- 
-       "Work" shall mean the work of authorship, whether in Source or
-       Object form, made available under the License, as indicated by a
-       copyright notice that is included in or attached to the work
-       (an example is provided in the Appendix below).
- 
-       "Derivative Works" shall mean any work, whether in Source or Object
-       form, that is based on (or derived from) the Work and for which the
-       editorial revisions, annotations, elaborations, or other modifications
-       represent, as a whole, an original work of authorship. For the purposes
-       of this License, Derivative Works shall not include works that remain
-       separable from, or merely link (or bind by name) to the interfaces of,
-       the Work and Derivative Works thereof.
- 
-       "Contribution" shall mean any work of authorship, including
-       the original version of the Work and any modifications or additions
-       to that Work or Derivative Works thereof, that is intentionally
-       submitted to Licensor for inclusion in the Work by the copyright owner
-       or by an individual or Legal Entity authorized to submit on behalf of
-       the copyright owner. For the purposes of this definition, "submitted"
-       means any form of electronic, verbal, or written communication sent
-       to the Licensor or its representatives, including but not limited to
-       communication on electronic mailing lists, source code control systems,
-       and issue tracking systems that are managed by, or on behalf of, the
-       Licensor for the purpose of discussing and improving the Work, but
-       excluding communication that is conspicuously marked or otherwise
-       designated in writing by the copyright owner as "Not a Contribution."
- 
-       "Contributor" shall mean Licensor and any individual or Legal Entity
-       on behalf of whom a Contribution has been received by Licensor and
-       subsequently incorporated within the Work.
- 
-    2. Grant of Copyright License. Subject to the terms and conditions of
-       this License, each Contributor hereby grants to You a perpetual,
-       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-       copyright license to reproduce, prepare Derivative Works of,
-       publicly display, publicly perform, sublicense, and distribute the
-       Work and such Derivative Works in Source or Object form.
- 
-    3. Grant of Patent License. Subject to the terms and conditions of
-       this License, each Contributor hereby grants to You a perpetual,
-       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-       (except as stated in this section) patent license to make, have made,
-       use, offer to sell, sell, import, and otherwise transfer the Work,
-       where such license applies only to those patent claims licensable
-       by such Contributor that are necessarily infringed by their
-       Contribution(s) alone or by combination of their Contribution(s)
-       with the Work to which such Contribution(s) was submitted. If You
-       institute patent litigation against any entity (including a
-       cross-claim or counterclaim in a lawsuit) alleging that the Work
-       or a Contribution incorporated within the Work constitutes direct
-       or contributory patent infringement, then any patent licenses
-       granted to You under this License for that Work shall terminate
-       as of the date such litigation is filed.
- 
-    4. Redistribution. You may reproduce and distribute copies of the
-       Work or Derivative Works thereof in any medium, with or without
-       modifications, and in Source or Object form, provided that You
-       meet the following conditions:
- 
-       (a) You must give any other recipients of the Work or
-           Derivative Works a copy of this License; and
- 
-       (b) You must cause any modified files to carry prominent notices
-           stating that You changed the files; and
- 
-       (c) You must retain, in the Source form of any Derivative Works
-           that You distribute, all copyright, patent, trademark, and
-           attribution notices from the Source form of the Work,
-           excluding those notices that do not pertain to any part of
-           the Derivative Works; and
- 
-       (d) If the Work includes a "NOTICE" text file as part of its
-           distribution, then any Derivative Works that You distribute must
-           include a readable copy of the attribution notices contained
-           within such NOTICE file, excluding those notices that do not
-           pertain to any part of the Derivative Works, in at least one
-           of the following places: within a NOTICE text file distributed
-           as part of the Derivative Works; within the Source form or
-           documentation, if provided along with the Derivative Works; or,
-           within a display generated by the Derivative Works, if and
-           wherever such third-party notices normally appear. The contents
-           of the NOTICE file are for informational purposes only and
-           do not modify the License. You may add Your own attribution
-           notices within Derivative Works that You distribute, alongside
-           or as an addendum to the NOTICE text from the Work, provided
-           that such additional attribution notices cannot be construed
-           as modifying the License.
- 
-       You may add Your own copyright statement to Your modifications and
-       may provide additional or different license terms and conditions
-       for use, reproduction, or distribution of Your modifications, or
-       for any such Derivative Works as a whole, provided Your use,
-       reproduction, and distribution of the Work otherwise complies with
-       the conditions stated in this License.
- 
-    5. Submission of Contributions. Unless You explicitly state otherwise,
-       any Contribution intentionally submitted for inclusion in the Work
-       by You to the Licensor shall be under the terms and conditions of
-       this License, without any additional terms or conditions.
-       Notwithstanding the above, nothing herein shall supersede or modify
-       the terms of any separate license agreement you may have executed
-       with Licensor regarding such Contributions.
- 
-    6. Trademarks. This License does not grant permission to use the trade
-       names, trademarks, service marks, or product names of the Licensor,
-       except as required for reasonable and customary use in describing the
-       origin of the Work and reproducing the content of the NOTICE file.
- 
-    7. Disclaimer of Warranty. Unless required by applicable law or
-       agreed to in writing, Licensor provides the Work (and each
-       Contributor provides its Contributions) on an "AS IS" BASIS,
-       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-       implied, including, without limitation, any warranties or conditions
-       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-       PARTICULAR PURPOSE. You are solely responsible for determining the
-       appropriateness of using or redistributing the Work and assume any
-       risks associated with Your exercise of permissions under this License.
- 
-    8. Limitation of Liability. In no event and under no legal theory,
-       whether in tort (including negligence), contract, or otherwise,
-       unless required by applicable law (such as deliberate and grossly
-       negligent acts) or agreed to in writing, shall any Contributor be
-       liable to You for damages, including any direct, indirect, special,
-       incidental, or consequential damages of any character arising as a
-       result of this License or out of the use or inability to use the
-       Work (including but not limited to damages for loss of goodwill,
-       work stoppage, computer failure or malfunction, or any and all
-       other commercial damages or losses), even if such Contributor
-       has been advised of the possibility of such damages.
- 
-    9. Accepting Warranty or Additional Liability. While redistributing
-       the Work or Derivative Works thereof, You may choose to offer,
-       and charge a fee for, acceptance of support, warranty, indemnity,
-       or other liability obligations and/or rights consistent with this
-       License. However, in accepting such obligations, You may act only
-       on Your own behalf and on Your sole responsibility, not on behalf
-       of any other Contributor, and only if You agree to indemnify,
-       defend, and hold each Contributor harmless for any liability
-       incurred by, or claims asserted against, such Contributor by reason
-       of your accepting any such warranty or additional liability.
- 
-    END OF TERMS AND CONDITIONS
- 
-    APPENDIX: How to apply the Apache License to your work.
- 
-       To apply the Apache License to your work, attach the following
-       boilerplate notice, with the fields enclosed by brackets "[]"
-       replaced with your own identifying information. (Don't include
-       the brackets!) The text should be enclosed in the appropriate
-       comment syntax for the file format. We also recommend that a
-       file or class name and description of purpose be included on the
-       same "printed page" as the copyright notice for easier
-       identification within third-party archives.
- 
-    Copyright [yyyy] [name of copyright owner]
- 
-    Licensed under the Apache License, Version 2.0 (the "License");
-    you may not use this file except in compliance with the License.
-    You may obtain a copy of the License at
- 
-        http://www.apache.org/licenses/LICENSE-2.0
- 
-    Unless required by applicable law or agreed to in writing, software
-    distributed under the License is distributed on an "AS IS" BASIS,
-    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-    See the License for the specific language governing permissions and
-    limitations under the License.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
NeMo-2.2.0/MANIFEST.in DELETED
@@ -1 +0,0 @@
- include requirements/*

NeMo-2.2.0/README.md DELETED
@@ -1,723 +0,0 @@
- [![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
- [![Documentation](https://readthedocs.com/projects/nvidia-nemo/badge/?version=main)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/)
- [![CodeQL](https://github.com/nvidia/nemo/actions/workflows/codeql.yml/badge.svg?branch=main&event=push)](https://github.com/nvidia/nemo/actions/workflows/codeql.yml)
- [![NeMo core license and license for collections in this repo](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://github.com/NVIDIA/NeMo/blob/master/LICENSE)
- [![Release version](https://badge.fury.io/py/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
- [![Python version](https://img.shields.io/pypi/pyversions/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
- [![PyPi total downloads](https://static.pepy.tech/personalized-badge/nemo-toolkit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)](https://pepy.tech/project/nemo-toolkit)
- [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-
- # **NVIDIA NeMo Framework**
-
- ## Latest News
-
- <!-- markdownlint-disable -->
- <details open>
- <summary><b>NeMo 2.0</b></summary>
- We've released NeMo 2.0, an update on the NeMo Framework which prioritizes modularity and ease-of-use. Please refer to the <a href=https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html>NeMo Framework User Guide</a> to get started.
- </details>
- <details open>
- <summary><b>New Cosmos World Foundation Models Support</b></summary>
- <details>
- <summary> <a href="https://developer.nvidia.com/blog/advancing-physical-ai-with-nvidia-cosmos-world-foundation-model-platform">Advancing Physical AI with NVIDIA Cosmos World Foundation Model Platform</a> (2025-01-09)
- </summary>
- The end-to-end NVIDIA Cosmos platform accelerates world model development for physical AI systems. Built on CUDA, Cosmos combines state-of-the-art world foundation models, video tokenizers, and AI-accelerated data processing pipelines. Developers can accelerate world model development by fine-tuning Cosmos world foundation models or building new ones from the ground up. These models create realistic synthetic videos of environments and interactions, providing a scalable foundation for training complex systems, from simulating humanoid robots performing advanced actions to developing end-to-end autonomous driving models.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/accelerate-custom-video-foundation-model-pipelines-with-new-nvidia-nemo-framework-capabilities/">
- Accelerate Custom Video Foundation Model Pipelines with New NVIDIA NeMo Framework Capabilities
- </a> (2025-01-07)
- </summary>
- The NeMo Framework now supports training and customizing the <a href="https://github.com/NVIDIA/Cosmos">NVIDIA Cosmos</a> collection of world foundation models. Cosmos leverages advanced text-to-world generation techniques to create fluid, coherent video content from natural language prompts.
- <br><br>
- You can also now accelerate your video processing step using the <a href="https://developer.nvidia.com/nemo-curator-video-processing-early-access">NeMo Curator</a> library, which provides optimized video processing and captioning features that can deliver up to 89x faster video processing when compared to an unoptimized CPU pipeline.
- <br><br>
- </details>
- </details>
- <details open>
- <summary><b>Large Language Models and Multimodal Models</b></summary>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/state-of-the-art-multimodal-generative-ai-model-development-with-nvidia-nemo/">
- State-of-the-Art Multimodal Generative AI Model Development with NVIDIA NeMo
- </a> (2024-11-06)
- </summary>
- NVIDIA recently announced significant enhancements to the NeMo platform, focusing on multimodal generative AI models. The update includes NeMo Curator and the Cosmos tokenizer, which streamline the data curation process and enhance the quality of visual data. These tools are designed to handle large-scale data efficiently, making it easier to develop high-quality AI models for various applications, including robotics and autonomous driving. The Cosmos tokenizers, in particular, efficiently map visual data into compact, semantic tokens, which is crucial for training large-scale generative models. The tokenizer is available now on the <a href=https://github.com/NVIDIA/cosmos-tokenizer>NVIDIA/cosmos-tokenizer</a> GitHub repo and on <a href=https://huggingface.co/nvidia/Cosmos-Tokenizer-CV8x8x8>Hugging Face</a>.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/llama/index.html#new-llama-3-1-support">
- New Llama 3.1 Support
- </a> (2024-07-23)
- </summary>
- The NeMo Framework now supports training and customizing the Llama 3.1 collection of LLMs from Meta.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://aws.amazon.com/blogs/machine-learning/accelerate-your-generative-ai-distributed-training-workloads-with-the-nvidia-nemo-framework-on-amazon-eks/">
- Accelerate your Generative AI Distributed Training Workloads with the NVIDIA NeMo Framework on Amazon EKS
- </a> (2024-07-16)
- </summary>
- NVIDIA NeMo Framework now runs distributed training workloads on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. For step-by-step instructions on creating an EKS cluster and running distributed training workloads with NeMo, see the GitHub repository <a href="https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/2.nemo-launcher/EKS/">here</a>.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/nvidia-nemo-accelerates-llm-innovation-with-hybrid-state-space-model-support/">
- NVIDIA NeMo Accelerates LLM Innovation with Hybrid State Space Model Support
- </a> (2024/06/17)
- </summary>
- NVIDIA NeMo and Megatron Core now support pre-training and fine-tuning of state space models (SSMs). NeMo also supports training models based on the Griffin architecture as described by Google DeepMind.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://huggingface.co/models?sort=trending&search=nvidia%2Fnemotron-4-340B">
- NVIDIA releases 340B base, instruct, and reward models pretrained on a total of 9T tokens.
- </a> (2024-06-18)
- </summary>
- See documentation and tutorials for SFT, PEFT, and PTQ with
- <a href="https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html">
- Nemotron 340B
- </a>
- in the NeMo Framework User Guide.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/nvidia-sets-new-generative-ai-performance-and-scale-records-in-mlperf-training-v4-0/">
- NVIDIA sets new generative AI performance and scale records in MLPerf Training v4.0
- </a> (2024/06/12)
- </summary>
- Using NVIDIA NeMo Framework and NVIDIA Hopper GPUs, NVIDIA was able to scale to 11,616 H100 GPUs and achieve near-linear performance scaling on LLM pretraining.
- NVIDIA also achieved the highest LLM fine-tuning performance and raised the bar for text-to-image training.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://cloud.google.com/blog/products/compute/gke-and-nvidia-nemo-framework-to-train-generative-ai-models">
- Accelerate your generative AI journey with NVIDIA NeMo Framework on GKE
- </a> (2024/03/16)
- </summary>
- An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke.
- The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.
- <br><br>
- </details>
- </details>
- <details open>
- <summary><b>Speech Recognition</b></summary>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/accelerating-leaderboard-topping-asr-models-10x-with-nvidia-nemo/">
- Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo
- </a> (2024/09/24)
- </summary>
- The NVIDIA NeMo team released a number of inference optimizations for CTC, RNN-T, and TDT models that resulted in up to 10x inference speed-up.
- These models now exceed an inverse real-time factor (RTFx) of 2,000, with some reaching RTFx of even 6,000.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/new-standard-for-speech-recognition-and-translation-from-the-nvidia-nemo-canary-model/">
- New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model
- </a> (2024/04/18)
- </summary>
- The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization.
- Canary also provides bi-directional translation between English and the three other supported languages.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/">
- Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models
- </a> (2024/04/18)
- </summary>
- NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet family of automatic speech recognition (ASR) models.
- These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/turbocharge-asr-accuracy-and-speed-with-nvidia-nemo-parakeet-tdt/">
- Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT
- </a> (2024/04/18)
- </summary>
- NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released Parakeet-TDT.
- This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed over the previously best model, Parakeet-RNNT-1.1B.
- <br><br>
- </details>
- </details>
- <!-- markdownlint-enable -->
-
- ## Introduction
-
- NVIDIA NeMo Framework is a scalable and cloud-native generative AI
- framework built for researchers and PyTorch developers working on Large
- Language Models (LLMs), Multimodal Models (MMs), Automatic Speech
- Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV)
- domains. It is designed to help you efficiently create, customize, and
- deploy new generative AI models by leveraging existing code and
- pre-trained model checkpoints.
-
- For technical documentation, please see the [NeMo Framework User
- Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html).
-
- ## What's New in NeMo 2.0
-
- NVIDIA NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability.
-
- - **Python-Based Configuration** - NeMo 2.0 transitions from YAML files to a Python-based configuration, providing more flexibility and control. This shift makes it easier to extend and customize configurations programmatically.
-
- - **Modular Abstractions** - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. This modular approach allows developers to more easily modify and experiment with different components of their models.
-
- - **Scalability** - NeMo 2.0 seamlessly scales large-scale experiments across thousands of GPUs using [NeMo-Run](https://github.com/NVIDIA/NeMo-Run), a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.
-
- Overall, these enhancements make NeMo 2.0 a powerful, scalable, and user-friendly framework for AI model development.
-
- > [!IMPORTANT]
- > NeMo 2.0 is currently supported by the LLM (large language model) and VLM (vision language model) collections.
-
- ### Get Started with NeMo 2.0
-
- - Refer to the [Quickstart](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html) for examples of using NeMo-Run to launch NeMo 2.0 experiments locally and on a Slurm cluster.
- - For more information about NeMo 2.0, see the [NeMo Framework User Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html).
- - [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) contains additional examples of launching large-scale runs using NeMo 2.0 and NeMo-Run.
- - For an in-depth exploration of the main features of NeMo 2.0, see the [Feature Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/index.html#feature-guide).
- - To transition from NeMo 1.0 to 2.0, see the [Migration Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html#migration-guide) for step-by-step instructions.
-
- ### Get Started with Cosmos
-
- NeMo Curator and NeMo Framework support video curation and post-training of the Cosmos World Foundation Models, which are open and available on [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cosmos/collections/cosmos) and [Hugging Face](https://huggingface.co/collections/nvidia/cosmos-6751e884dc10e013a0a0d8e6). For more information on video datasets, refer to [NeMo Curator](https://developer.nvidia.com/nemo-curator). To post-train World Foundation Models using the NeMo Framework for your custom physical AI tasks, see the [Cosmos Diffusion models](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md) and the [Cosmos Autoregressive models](https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/post_training/README.md).
-
- ## LLMs and MMs Training, Alignment, and Customization
-
- All NeMo models are trained with
- [Lightning](https://github.com/Lightning-AI/lightning). Training is
- automatically scalable to 1000s of GPUs. You can check the performance benchmarks using the
- latest NeMo Framework container [here](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance_summary.html).
-
- When applicable, NeMo models leverage cutting-edge distributed training
- techniques, incorporating [parallelism
- strategies](https://docs.nvidia.com/nemo-framework/user-guide/latest/modeloverview.html)
- to enable efficient training of very large models. These techniques
- include Tensor Parallelism (TP), Pipeline Parallelism (PP), Fully
- Sharded Data Parallelism (FSDP), Mixture-of-Experts (MoE), and Mixed
- Precision Training with BFloat16 and FP8, as well as others.
-
- NeMo Transformer-based LLMs and MMs utilize [NVIDIA Transformer
- Engine](https://github.com/NVIDIA/TransformerEngine) for FP8 training on
- NVIDIA Hopper GPUs, while leveraging [NVIDIA Megatron
- Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core) for
- scaling Transformer model training.
-
- NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM,
- Direct Preference Optimization (DPO), and Reinforcement Learning from
- Human Feedback (RLHF). See [NVIDIA NeMo
- Aligner](https://github.com/NVIDIA/NeMo-Aligner) for more information.
-
- In addition to supervised fine-tuning (SFT), NeMo also supports the
- latest parameter efficient fine-tuning (PEFT) techniques such as LoRA,
- P-Tuning, Adapters, and IA3. Refer to the [NeMo Framework User
- Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/index.html)
- for the full list of supported models and techniques.
-
- ## LLMs and MMs Deployment and Optimization
-
- NeMo LLMs and MMs can be deployed and optimized with [NVIDIA NeMo
- Microservices](https://developer.nvidia.com/nemo-microservices-early-access).
-
- ## Speech AI
-
- NeMo ASR and TTS models can be optimized for inference and deployed for
- production use cases with [NVIDIA Riva](https://developer.nvidia.com/riva).
-
- ## NeMo Framework Launcher
-
- > [!IMPORTANT]
- > NeMo Framework Launcher is compatible with NeMo version 1.0 only. [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) is recommended for launching experiments using NeMo 2.0.
-
- [NeMo Framework
- Launcher](https://github.com/NVIDIA/NeMo-Megatron-Launcher) is a
- cloud-native tool that streamlines the NeMo Framework experience. It is
- used for launching end-to-end NeMo Framework training jobs on CSPs and
- Slurm clusters.
-
- The NeMo Framework Launcher includes extensive recipes, scripts,
- utilities, and documentation for training NeMo LLMs. It also includes
- the NeMo Framework [Autoconfigurator](https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration),
- which is designed to find the optimal model parallel configuration for
- training on a specific cluster.
-
- To get started quickly with the NeMo Framework Launcher, please see the
- [NeMo Framework
- Playbooks](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html).
- The NeMo Framework Launcher does not currently support ASR and TTS
- training, but it will soon.
-
- ## Get Started with NeMo Framework
-
- Getting started with NeMo Framework is easy. State-of-the-art pretrained
- NeMo models are freely available on [Hugging Face
- Hub](https://huggingface.co/models?library=nemo&sort=downloads&search=nvidia)
- and [NVIDIA
- NGC](https://catalog.ngc.nvidia.com/models?query=nemo&orderBy=weightPopularDESC).
- These models can be used to generate text or images, transcribe audio,
- and synthesize speech in just a few lines of code.
-
- We have extensive
- [tutorials](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html)
- that can be run on [Google Colab](https://colab.research.google.com) or
- with our [NGC NeMo Framework
- Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo).
- We also have
- [playbooks](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html)
- for users who want to train NeMo models with the NeMo Framework
- Launcher.
-
- For advanced users who want to train NeMo models from scratch or
- fine-tune existing NeMo models, we have a full suite of [example
- scripts](https://github.com/NVIDIA/NeMo/tree/main/examples) that support
- multi-GPU/multi-node training.
-
- ## Key Features
-
- - [Large Language Models](nemo/collections/nlp/README.md)
- - [Multimodal](nemo/collections/multimodal/README.md)
- - [Automatic Speech Recognition](nemo/collections/asr/README.md)
- - [Text to Speech](nemo/collections/tts/README.md)
- - [Computer Vision](nemo/collections/vision/README.md)
-
- ## Requirements
-
- - Python 3.10 or above
- - PyTorch 1.13.1 or above
- - NVIDIA GPU (if you intend to do model training)
-
- ## Developer Documentation
-
- | Version | Status | Description |
- | ------- | ------ | ----------- |
- | Latest | [![Documentation Status](https://readthedocs.com/projects/nvidia-nemo/badge/?version=main)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/) | [Documentation of the latest (i.e. main) branch.](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/) |
- | Stable | [![Documentation Status](https://readthedocs.com/projects/nvidia-nemo/badge/?version=stable)](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/) | [Documentation of the stable (i.e. most recent release) branch.](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/) |
-
- ## Install NeMo Framework
-
- The NeMo Framework can be installed in a variety of ways, depending on
- your needs. Depending on the domain, you may find one of the following
- installation methods more suitable.
-
- - Conda / Pip - Refer to [Conda](#conda) and [Pip](#pip) for
- installation instructions.
- - This is the recommended method for ASR and TTS domains.
- - When using an NVIDIA PyTorch container as the base, this is the
- recommended method for all domains.
- - Docker Containers - Refer to [Docker containers](#docker-containers)
- for installation instructions.
- - NeMo Framework container -
- `nvcr.io/nvidia/nemo:24.05`
- - LLMs and MMs Dependencies - Refer to [LLMs and MMs
- Dependencies](#install-llms-and-mms-dependencies) for installation
- instructions.
-
- **Important: We strongly recommend that you start with a base NVIDIA
- PyTorch container: nvcr.io/nvidia/pytorch:24.02-py3.**
-
- ### Conda
-
- Install NeMo in a fresh Conda environment:
-
- ```bash
- conda create --name nemo python==3.10.12
- conda activate nemo
- ```
-
- Install PyTorch using their
- [configurator](https://pytorch.org/get-started/locally/):
-
- ```bash
- conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
- ```
-
- The command to install PyTorch may depend on your system. Use the
- configurator linked above to find the right command for your system.
-
- Then, install NeMo via Pip or from source. We do not provide NeMo on
- conda-forge or any other Conda channel.
-
- ### Pip
-
- To install the nemo_toolkit, use the following installation method:
-
- ```bash
- apt-get update && apt-get install -y libsndfile1 ffmpeg
- pip install Cython packaging
- pip install nemo_toolkit['all']
- ```
-
- Depending on the shell used, you may need to use the
- `"nemo_toolkit[all]"` specifier instead in the above command.
-
- ### Pip from a Specific Domain
-
- To install a specific domain of NeMo, you must first install the
- nemo_toolkit using the instructions listed above. Then, you run the
- following domain-specific commands:
-
- ```bash
- pip install nemo_toolkit['asr']
- pip install nemo_toolkit['nlp']
- pip install nemo_toolkit['tts']
- pip install nemo_toolkit['vision']
- pip install nemo_toolkit['multimodal']
- ```
-
- ### Pip from a Source Branch
-
- If you want to work with a specific version of NeMo from a particular
- GitHub branch (e.g., main), use the following installation method:
-
- ```bash
- apt-get update && apt-get install -y libsndfile1 ffmpeg
- pip install Cython packaging
- python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]
- ```
-
- ### Build from Source
-
- If you want to clone the NeMo GitHub repository and contribute to NeMo
- open-source development work, use the following installation method:
-
- ```bash
- apt-get update && apt-get install -y libsndfile1 ffmpeg
- git clone https://github.com/NVIDIA/NeMo
- cd NeMo
- ./reinstall.sh
- ```
-
- If you only want the toolkit without the additional Conda-based
- dependencies, you can replace `reinstall.sh` with `pip install -e .`
- when your PWD is the root of the NeMo repository.
-
- ### Mac Computers with Apple Silicon
-
- To install NeMo on Mac computers with the Apple M-Series GPU, you need
- to create a new Conda environment, install PyTorch 2.0 or higher, and
- then install the nemo_toolkit.
-
- **Important: This method is only applicable to the ASR domain.**
-
- Run the following code:
-
- ```shell
- # [optional] install mecab using Homebrew, to use sacrebleu for the NLP collection
- # you can install Homebrew here: https://brew.sh
- brew install mecab
-
- # [optional] install pynini using Conda, to use text normalization
- conda install -c conda-forge pynini
-
- # install Cython manually
- pip install cython packaging
-
- # clone the repo and install in development mode
- git clone https://github.com/NVIDIA/NeMo
- cd NeMo
- pip install 'nemo_toolkit[all]'
-
- # Note that only the ASR toolkit is guaranteed to work on MacBook - so for MacBook use pip install 'nemo_toolkit[asr]'
- ```
-
- ### Windows Computers
-
- To install the Windows Subsystem for Linux (WSL), run the following code
- in PowerShell:
-
- ```shell
- wsl --install
- # [note] If you run wsl --install and see the WSL help text, it means WSL is already installed.
- ```
-
- To learn more about installing WSL, refer to [Microsoft's official
- documentation](https://learn.microsoft.com/en-us/windows/wsl/install).
-
- After installing your Linux distribution with WSL, two options are
- available:
-
- **Option 1:** Open the distribution (Ubuntu by default) from the Start
- menu and follow the instructions.
-
- **Option 2:** Launch the Terminal application. Download it from
- [Microsoft's Windows Terminal
- page](https://learn.microsoft.com/en-us/windows/terminal) if not
- installed.
-
- Next, follow the instructions for Linux systems, as provided above. For
- example:
-
- ```bash
- apt-get update && apt-get install -y libsndfile1 ffmpeg
- git clone https://github.com/NVIDIA/NeMo
- cd NeMo
- ./reinstall.sh
- ```
-
- ### RNNT
-
- For optimal performance of a Recurrent Neural Network Transducer (RNNT),
- install the Numba package from Conda.
-
- Run the following code:
-
- ```bash
- conda remove numba
- pip uninstall numba
- conda install -c conda-forge numba
- ```
480
-
481
- ## Install LLMs and MMs Dependencies
482
-
483
- If you work with the LLM and MM domains, three additional dependencies
484
- are required: NVIDIA Apex, NVIDIA Transformer Engine, and NVIDIA
485
- Megatron Core. When working with the [main]{.title-ref} branch, these
486
- dependencies may require a recent commit.
487
-
488
- The most recent working versions of these dependencies are here:
489
-
490
- ```bash
491
- export apex_commit=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
492
- export te_commit=bfe21c3d68b0a9951e5716fb520045db53419c5e
493
- export mcore_commit=02871b4df8c69fac687ab6676c4246e936ce92d0
494
- export nv_pytorch_tag=24.02-py3
495
- ```
496
-
497
- When using a released version of NeMo, please refer to the [Software
498
- Component
499
- Versions](https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html)
500
- for the correct versions.
501
-
502
- ### PyTorch Container
503
-
504
- We recommended that you start with a base NVIDIA PyTorch container:
505
- nvcr.io/nvidia/pytorch:24.02-py3.
506
-
507
- If starting with a base NVIDIA PyTorch container, you must first launch
508
- the container:
509
-
510
- ```bash
511
- docker run \
512
- --gpus all \
513
- -it \
514
- --rm \
515
- --shm-size=16g \
516
- --ulimit memlock=-1 \
517
- --ulimit stack=67108864 \
518
- nvcr.io/nvidia/pytorch:$nv_pytorch_tag
519
- ```
520
-
521
- Next, you need to install the dependencies.
522
-
523
- ### Apex
524
-
525
- NVIDIA Apex is required for LLM and MM domains. Although Apex is
526
- pre-installed in the NVIDIA PyTorch container, you may need to update it
527
- to a newer version.
528
-
529
- To install Apex, run the following code:
530
-
531
- ```bash
532
- git clone https://github.com/NVIDIA/apex.git
533
- cd apex
534
- git checkout $apex_commit
535
- pip install . -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm"
536
- ```
537
-
538
- When attempting to install Apex separately from the NVIDIA PyTorch
539
- container, you might encounter an error if the CUDA version on your
540
- system is different from the one used to compile PyTorch. To bypass this
541
- error, you can comment out the relevant line in the setup file located
542
- in the Apex repository on GitHub here:
543
- <https://github.com/NVIDIA/apex/blob/master/setup.py#L32>.
544
-
545
- cuda-nvprof is needed to install Apex. The version should match the CUDA
546
- version that you are using.
547
-
548
- To install cuda-nvprof, run the following code:
549
-
550
- ```bash
551
- conda install -c nvidia cuda-nvprof=11.8
552
- ```
553
-
554
- Finally, install the packaging:
555
-
556
- ```bash
557
- pip install packaging
558
- ```
559
-
560
- To install the most recent versions of Apex locally, it might be
561
- necessary to remove the [pyproject.toml]{.title-ref} file from the Apex
562
- directory.
563
-
564
- ### Transformer Engine
-
- NVIDIA Transformer Engine is required for LLM and MM domains. Although
- the Transformer Engine is pre-installed in the NVIDIA PyTorch container,
- you may need to update it to a newer version.
-
- The Transformer Engine facilitates training with FP8 precision on NVIDIA
- Hopper GPUs and introduces many enhancements for the training of
- Transformer-based models. Refer to [Transformer Engine](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html)
- for more information.
-
- To install Transformer Engine, run the following code:
-
- ```bash
- git clone https://github.com/NVIDIA/TransformerEngine.git && \
- cd TransformerEngine && \
- git checkout $te_commit && \
- git submodule init && git submodule update && \
- NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .
- ```
-
- Transformer Engine requires PyTorch to be built with at least CUDA 11.8.
-
- ### Megatron Core
-
- Megatron Core is required for LLM and MM domains. Megatron Core is a
- library for scaling large Transformer-based models. NeMo LLMs and MMs
- leverage Megatron Core for model parallelism, transformer architectures,
- and optimized PyTorch datasets.
-
- To install Megatron Core, run the following code:
-
- ```bash
- git clone https://github.com/NVIDIA/Megatron-LM.git && \
- cd Megatron-LM && \
- git checkout $mcore_commit && \
- pip install . && \
- cd megatron/core/datasets && \
- make
- ```
-
- ## NeMo Text Processing
-
- NeMo Text Processing, specifically Inverse Text Normalization, is now a
- separate repository. It is located here:
- <https://github.com/NVIDIA/NeMo-text-processing>.
-
- ## Docker Containers
-
- NeMo containers are launched concurrently with NeMo version updates.
- NeMo Framework now supports LLMs, MMs, ASR, and TTS in a single
- consolidated Docker container. You can find additional information about
- released containers on the [NeMo releases
- page](https://github.com/NVIDIA/NeMo/releases).
-
- To use a pre-built container, run the following code:
-
- ```bash
- docker pull nvcr.io/nvidia/nemo:24.05
- ```
-
- To build a NeMo container with the Dockerfile from a branch, run the
- following code:
-
- ```bash
- DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .
- ```
-
- If you choose to work with the main branch, we recommend using NVIDIA's
- PyTorch container version 23.10-py3 and then installing from GitHub.
-
- ```bash
- docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
- -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
- stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:23.10-py3
- ```
-
- ## Future Work
-
- The NeMo Framework Launcher does not currently support ASR and TTS
- training, but it will soon.
-
- ## Discussions Board
-
- The FAQ can be found on the NeMo [Discussions
- board](https://github.com/NVIDIA/NeMo/discussions). You are welcome to
- ask questions or start discussions on the board.
-
- ## Contribute to NeMo
-
- We welcome community contributions! Please refer to
- [CONTRIBUTING.md](https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md)
- for the process.
-
- ## Publications
-
- We provide an ever-growing list of
- [publications](https://nvidia.github.io/NeMo/publications/) that utilize
- the NeMo Framework.
-
- To contribute an article to the collection, please submit a pull request
- to the `gh-pages-src` branch of this repository. For detailed
- information, please consult the README located at the [gh-pages-src
- branch](https://github.com/NVIDIA/NeMo/tree/gh-pages-src#readme).
-
- ## Blogs
-
- <!-- markdownlint-disable -->
- <details open>
- <summary><b>Large Language Models and Multimodal Models</b></summary>
- <details>
- <summary>
- <a href="https://blogs.nvidia.com/blog/bria-builds-responsible-generative-ai-using-nemo-picasso/">
- Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso
- </a> (2024/03/06)
- </summary>
- Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework.
- The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation.
- Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/">
- New NVIDIA NeMo Framework Features and NVIDIA H200
- </a> (2023/12/06)
- </summary>
- NVIDIA NeMo Framework now includes several optimizations and enhancements,
- including:
- 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models,
- 2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale,
- 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and
- 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
- <br><br>
- <a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility">
- <img src="https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png" alt="H200-NeMo-performance" style="width: 600px;"></a>
- <br><br>
- </details>
- <details>
- <summary>
- <a href="https://blogs.nvidia.com/blog/nemo-amazon-titan/">
- NVIDIA now powers training for Amazon Titan Foundation models
- </a> (2023/11/28)
- </summary>
- NVIDIA NeMo Framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs).
- The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock.
- The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.
- <br><br>
- </details>
- </details>
- <!-- markdownlint-enable -->
-
- ## Licenses
-
- - [NeMo GitHub Apache 2.0
- license](https://github.com/NVIDIA/NeMo?tab=Apache-2.0-1-ov-file#readme)
- - NeMo is licensed under the [NVIDIA AI PRODUCT
- AGREEMENT](https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/).
- By pulling and using the container, you accept the terms and
- conditions of this license.
NeMo-2.2.0/codecov.yml DELETED
@@ -1,7 +0,0 @@
- comment: false
- coverage:
-   status:
-     patch: false
-     project: false
- fixes:
-   - "/workspace/::"
NeMo-2.2.0/docs/Makefile DELETED
@@ -1,216 +0,0 @@
- # Makefile for Sphinx documentation
- #
-
- # You can set these variables from the command line.
- SPHINXOPTS =
- SPHINXBUILD = sphinx-build
- PAPER =
- BUILDDIR = build
-
- # User-friendly check for sphinx-build
- ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
- $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
- endif
-
- # Internal variables.
- PAPEROPT_a4 = -D latex_paper_size=a4
- PAPEROPT_letter = -D latex_paper_size=letter
- ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
- # the i18n builder cannot share the environment and doctrees with the others
- I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
-
- .PHONY: help
- help:
- 	@echo "Please use \`make <target>' where <target> is one of"
- 	@echo " html to make standalone HTML files"
- 	@echo " dirhtml to make HTML files named index.html in directories"
- 	@echo " singlehtml to make a single large HTML file"
- 	@echo " pickle to make pickle files"
- 	@echo " json to make JSON files"
- 	@echo " htmlhelp to make HTML files and a HTML help project"
- 	@echo " qthelp to make HTML files and a qthelp project"
- 	@echo " applehelp to make an Apple Help Book"
- 	@echo " devhelp to make HTML files and a Devhelp project"
- 	@echo " epub to make an epub"
- 	@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
- 	@echo " latexpdf to make LaTeX files and run them through pdflatex"
- 	@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
- 	@echo " text to make text files"
- 	@echo " man to make manual pages"
- 	@echo " texinfo to make Texinfo files"
- 	@echo " info to make Texinfo files and run them through makeinfo"
- 	@echo " gettext to make PO message catalogs"
- 	@echo " changes to make an overview of all changed/added/deprecated items"
- 	@echo " xml to make Docutils-native XML files"
- 	@echo " pseudoxml to make pseudoxml-XML files for display purposes"
- 	@echo " linkcheck to check all external links for integrity"
- 	@echo " doctest to run all doctests embedded in the documentation (if enabled)"
- 	@echo " coverage to run coverage check of the documentation (if enabled)"
-
- .PHONY: clean
- clean:
- 	rm -rf $(BUILDDIR)/*
-
- .PHONY: html
- html:
- 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
- 	@echo
- 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
-
- .PHONY: dirhtml
- dirhtml:
- 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
- 	@echo
- 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
-
- .PHONY: singlehtml
- singlehtml:
- 	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
- 	@echo
- 	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
-
- .PHONY: pickle
- pickle:
- 	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
- 	@echo
- 	@echo "Build finished; now you can process the pickle files."
-
- .PHONY: json
- json:
- 	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
- 	@echo
- 	@echo "Build finished; now you can process the JSON files."
-
- .PHONY: htmlhelp
- htmlhelp:
- 	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
- 	@echo
- 	@echo "Build finished; now you can run HTML Help Workshop with the" \
- 	".hhp project file in $(BUILDDIR)/htmlhelp."
-
- .PHONY: qthelp
- qthelp:
- 	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
- 	@echo
- 	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
- 	".qhcp project file in $(BUILDDIR)/qthelp, like this:"
- 	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/OpenSeq2Seq.qhcp"
- 	@echo "To view the help file:"
- 	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/OpenSeq2Seq.qhc"
-
- .PHONY: applehelp
- applehelp:
- 	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
- 	@echo
- 	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
- 	@echo "N.B. You won't be able to view it unless you put it in" \
- 	"~/Library/Documentation/Help or install it in your application" \
- 	"bundle."
-
- .PHONY: devhelp
- devhelp:
- 	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
- 	@echo
- 	@echo "Build finished."
- 	@echo "To view the help file:"
- 	@echo "# mkdir -p $$HOME/.local/share/devhelp/OpenSeq2Seq"
- 	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/OpenSeq2Seq"
- 	@echo "# devhelp"
-
- .PHONY: epub
- epub:
- 	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
- 	@echo
- 	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
-
- .PHONY: latex
- latex:
- 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
- 	@echo
- 	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
- 	@echo "Run \`make' in that directory to run these through (pdf)latex" \
- 	"(use \`make latexpdf' here to do that automatically)."
-
- .PHONY: latexpdf
- latexpdf:
- 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
- 	@echo "Running LaTeX files through pdflatex..."
- 	$(MAKE) -C $(BUILDDIR)/latex all-pdf
- 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
- .PHONY: latexpdfja
- latexpdfja:
- 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
- 	@echo "Running LaTeX files through platex and dvipdfmx..."
- 	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
- 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
- .PHONY: text
- text:
- 	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
- 	@echo
- 	@echo "Build finished. The text files are in $(BUILDDIR)/text."
-
- .PHONY: man
- man:
- 	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
- 	@echo
- 	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
-
- .PHONY: texinfo
- texinfo:
- 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
- 	@echo
- 	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
- 	@echo "Run \`make' in that directory to run these through makeinfo" \
- 	"(use \`make info' here to do that automatically)."
-
- .PHONY: info
- info:
- 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
- 	@echo "Running Texinfo files through makeinfo..."
- 	make -C $(BUILDDIR)/texinfo info
- 	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
-
- .PHONY: gettext
- gettext:
- 	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
- 	@echo
- 	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
-
- .PHONY: changes
- changes:
- 	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
- 	@echo
- 	@echo "The overview file is in $(BUILDDIR)/changes."
-
- .PHONY: linkcheck
- linkcheck:
- 	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
- 	@echo
- 	@echo "Link check complete; look for any errors in the above output " \
- 	"or in $(BUILDDIR)/linkcheck/output.txt."
-
- .PHONY: doctest
- doctest:
- 	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
- 	@echo "Testing of doctests in the sources finished, look at the " \
- 	"results in $(BUILDDIR)/doctest/output.txt."
-
- .PHONY: coverage
- coverage:
- 	$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
- 	@echo "Testing of coverage in the sources finished, look at the " \
- 	"results in $(BUILDDIR)/coverage/python.txt."
-
- .PHONY: xml
- xml:
- 	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
- 	@echo
- 	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
-
- .PHONY: pseudoxml
- pseudoxml:
- 	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
- 	@echo
- 	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
NeMo-2.2.0/docs/source/_static/css/custom.css DELETED
@@ -1,372 +0,0 @@
- @import url("theme.css");
-
- body {
-     font-size: 100%;
-     font-family: 'NVIDIA Sans', sans-serif;
- }
-
-
- /* Width of template */
-
- .wy-nav-content {
-     max-width: 1200px !important;
- }
-
-
-
- /* Standard Text Formatting */
-
- h1 {
-     color: #76b900;
-     text-align: center;
-     /* background-color: #ffffff; */
- }
-
- h2 {
-     color: #ffffff;
-     /* background-color: #ffffff; */
-     /* #76b900 */
-     Padding: 5px;
- }
-
- h3 {
-     padding-top: 0px;
-     border-top: solid 3px #000000;
-     /* #76b900 */
-     border-bottom: solid 3px #000000;
-     /* #76b900 */
- }
-
- p {
-     margin-bottom: 24px;
- }
-
- /* Link Colors */
- /*
- a {
-     color: #76b900;
- }
- /*
-
- /*
- a:visited {
-     color: #218219;
- }
- */
-
- .container-xl {
-     margin-right: unset;
-     margin-left: unset;
- }
-
- section {
-     overflow-x: auto;
- }
-
- /* ----------------------------------------------TABLES--------------------------------------- */
- section table {
-     overflow-x: auto;
-     display: block;
- }
-
- table {
-     font-size: small;
- }
-
- /* Table head Color */
- thead td {
-     background-color: #333333 !important;
- }
-
- .row-odd p {
-     /*padding-bottom: 0px;*/
-     /*margin-bottom: 0px;*/
- }
-
- /* even rows*/
-
- .row-even tr {
-     background-color: #e5f1e6 !important;
- }
-
- /* odd rows*/
-
-
- .wy-table-responsive table tr {
-     background-color: #ffffff !important;
- }
-
-
-
- .wy-table-responsive table td {
-     white-space: normal;
- }
-
-
- /* Removes bottom margin in tables*/
-
- .rst-content .line-block {
-     margin-bottom: 0px;
- }
-
- .wy-table-responsive {
-     overflow: visible !important;
- }
-
- /* reduces the size of text in multiline table columns. */
-
- .rst-content table.docutils td {
-     font-size: 80%;
- }
-
- .rst-content dl:not(.docutils) dt {
-
-     background-color: inherit;
-     color: #000000;
-     border-top: solid 0px #000000;
-
- }
-
- .rst-content dl:not(.docutils) dt:before {
-     color: #333333;
- }
-
- .rst-content .line-block {
-     margin-bottom: 0px;
- }
-
- .wy-side-nav-search,
- .wy-nav-top {
-     background-color: #000000;
-     padding: 0;
- }
-
- .wy-side-nav-search img {
-     padding: 0px;
-     padding: 0px 0px;
-     margin-bottom: 0;
- }
-
- .wy-side-nav-search input[type=text] {
-     border-radius: 0px;
- }
-
-
- .wy-menu-vertical p.caption {
-     color: #76b900;
- }
-
-
- .wy-side-nav-search>a img.logo,
- .wy-side-nav-search .wy-dropdown>a img.logo {
-     margin: 0px 0px 0px 0px;
- }
-
- .wy-nav-content {
-     margin: 0;
-     min-height: 100%;
-     height: 100%;
-     background: #ffffff;
- }
-
- /* List (numbered, bulleted) padding Fix */
-
-
- .wy-plain-list-decimal li {
-     margin-top: -6px;
-     margin-bottom: -6px;
- }
-
- .rst-content .section ol.loweralpha {
-     margin-top: -6px;
-     margin-bottom: 12px;
- }
-
- .wy-plain-list-disc,
- .rst-content .toctree-wrapper ul,
- article ul {
-     margin-top: 0px !important;
-     margin-bottom: 12px;
- }
-
- /* Alert Boxes */
- /* Background color of Alert Box Title */
-
- .rst-content .section ul {
-     margin-top: -12px;
-     margin-bottom: 16px;
- }
-
- .wy-alert.wy-alert-info .wy-alert-title,
- .rst-content .note .wy-alert-title,
- .rst-content .wy-alert-info.attention .wy-alert-title,
- .rst-content .wy-alert-info.caution .wy-alert-title,
- .rst-content .wy-alert-info.danger .wy-alert-title,
- .rst-content .wy-alert-info.error .wy-alert-title,
- .rst-content .wy-alert-info.hint .wy-alert-title,
- .rst-content .wy-alert-info.important .wy-alert-title,
- .rst-content .wy-alert-info.tip .wy-alert-title,
- .rst-content .wy-alert-info.warning .wy-alert-title,
- .rst-content .seealso .wy-alert-title,
- .rst-content .wy-alert-info.admonition-todo .wy-alert-title,
- .rst-content .wy-alert-info.admonition .wy-alert-title,
- .wy-alert.wy-alert-info .rst-content .admonition-title,
- .rst-content .wy-alert.wy-alert-info .admonition-title,
- .rst-content .note .admonition-title,
- .rst-content .wy-alert-info.attention .admonition-title,
- .rst-content .wy-alert-info.caution .admonition-title,
- .rst-content .wy-alert-info.danger .admonition-title,
- .rst-content .wy-alert-info.error .admonition-title,
- .rst-content .wy-alert-info.hint .admonition-title,
- .rst-content .wy-alert-info.important .admonition-title,
- .rst-content .wy-alert-info.tip .admonition-title,
- .rst-content .wy-alert-info.warning .admonition-title,
- .rst-content .seealso .admonition-title,
- .rst-content .wy-alert-info.admonition-todo .admonition-title,
- .rst-content .wy-alert-info.admonition .admonition-title {
-     background: #76b900;
- }
-
- /* Background and Font Color of Alert Box Main Body*/
- .wy-alert.wy-alert-info,
- .rst-content .note,
- .rst-content .wy-alert-info.attention,
- .rst-content .wy-alert-info.caution,
- .rst-content .wy-alert-info.danger,
- .rst-content .wy-alert-info.error,
- .rst-content .wy-alert-info.hint,
- .rst-content .wy-alert-info.important,
- .rst-content .wy-alert-info.tip,
- .rst-content .wy-alert-info.warning,
- .rst-content .seealso,
- .rst-content .wy-alert-info.admonition-todo,
- .rst-content .wy-alert-info.admonition {
-     background: #333333;
-     color: #999999;
- }
-
- .section {
-     margin-top: 50px;
- }
-
- /* Logo */
- .navbar-brand-box {
-     background-color: #ffffff;
- }
-
- /* ---------------------------------------------- Media Queries --------------------------------------- */
- @media (min-width: 1200px) {
-     .container-xl {
-         max-width: 100%;
-     }
- }
-
- @media (min-width: none) {
-     body {
-         font-size: 18px;
-     }
-
-     #site-navigation nav ul.nav {
-         font-size: 18px;
-     }
-
-     #site-navigation nav.bd-links p {
-         font-size: 18px;
-     }
-
-     #site-navigation {
-         width: 350px;
-     }
-
-     .toc-h2 {
-         font-size: 18px;
-     }
-
-     .toc-h3 {
-         font-size: 1rem;
-     }
-
-     .toc-h4 {
-         font-size: 0.85rem;
-     }
-
-     .header-article .bd-toc {
-         font-size: 18px;
-     }
-
-     #main-content>div {
-         margin-left: 10%;
-         margin-right: 10%;
-     }
- }
-
- /* ---------------------------------------------- NVIDIA Sans --------------------------------------- */
-
- :root {
-     --md-text-font: "NVIDIA Sans";
-     /* --md-code-font: "NVIDIA Sans"; */
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/5/2/52891dda673228d54e5d57bf1e4a3880d4b22405.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/0/e090b7dda7a582522c7f9045c6ce949cce60134f.woff) format("woff");
-     font-weight: 300;
-     font-style: normal;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/a/1/a107baabcbf6b241099122336bce7429bcfd377a.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/3/a/3a6060a4e3bce70e5552ba0de8af4b22c6cf9144.woff) format("woff");
-     font-weight: 300;
-     font-style: italic;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/9/9/9920d2b172b01d92fc9c1c0e521dcf45b59c47c3.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/6/c/6c7d947928a7e4ef3e80ed409bef6c243f2148cb.woff) format("woff");
-     font-weight: 400;
-     font-style: normal;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/8/e8e63fe1244372cd942d957f44a5616a1eba0644.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/0/f/0f1fb2af0283ab09d36e7097bb07d895c3228f12.woff) format("woff");
-     font-weight: 400;
-     font-style: italic;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/7/9/79d3c513a9cd72c59f65354f39f89ca52dc17dd2.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/2/5/2581ac533f5d01f4985d8a7245b0766b4630ced8.woff) format("woff");
-     font-weight: 500;
-     font-style: normal;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/3/9/39d9ef1ee9770dd503f19bb2ace2fdb4eff3bb50.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/7/b/7bb5d5e2e71b2e13c8098b2e67c0a0ed9258e6c7.woff) format("woff");
-     font-weight: 500;
-     font-style: italic;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/0/5/05276a55a43eb3f74981ec1e93252727afcd9d16.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/9/c/9cfec7ed941b06564aa4d5ca14610e81542d070f.woff) format("woff");
-     font-weight: 700;
-     font-style: normal;
- }
-
- @font-face {
-     font-family: "NVIDIA Sans";
-     src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/a/e/aebd14d09ba56f541e1b8735fb051e33710f9ae7.woff2) format("woff2"),
-         url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/d/edbdabef43acc5c12e84a94baaa5542c9404cfeb.woff) format("woff");
-     font-weight: 700;
-     font-style: italic;
- }
NeMo-2.2.0/docs/source/_static/js/pk_scripts.js DELETED
@@ -1,19 +0,0 @@
- document.addEventListener("DOMContentLoaded", function () {
-     var params = window.location.search.substring(1).split("&").reduce(function (params, param) {
-         if (!param) {
-             return params;
-         }
-
-         var values = param.split("=");
-         var name = values[0];
-         var value = values[1];
-         params[name] = value;
-         return params;
-     }, {});
-
-     var form = document.getElementById("feedback-form");
-     for (var name in params) {
-         var input = form.querySelector("[name=" + name + "]");
-         input.value = params[name];
-     }
- });
NeMo-2.2.0/docs/source/_templates/layout.html DELETED
@@ -1,14 +0,0 @@
- {% extends "!layout.html" %}
-
- {% block extrahead %}
-
- <script type="text/javascript"
-     src="//assets.adobedtm.com/b92787824f2e0e9b68dc2e993f9bd995339fe417/satelliteLib-7ba51e58dc61bcb0e9311aadd02a0108ab24cc6c.js"></script>
-
- {% endblock %}
-
- {% block footer %}
-
- <script type="text/javascript">_satellite.pageBottom();</script>
-
- {% endblock %}
NeMo-2.2.0/docs/source/apis.rst DELETED
@@ -1,49 +0,0 @@
- =========
- NeMo APIs
- =========
-
- You can learn more about the underlying principles of the NeMo codebase in this section.
-
- The `NeMo Framework codebase <https://github.com/NVIDIA/NeMo>`__ is composed of a `core <https://github.com/NVIDIA/NeMo/tree/main/nemo/core>`__ section which contains the main building blocks of the framework, and various `collections <https://github.com/NVIDIA/NeMo/tree/main/nemo/collections>`__ which help you
- build specialized AI models.
-
- You can learn more about aspects of the NeMo "core" by following the links below:
-
- .. toctree::
-    :maxdepth: 1
-    :name: core
-    :titlesonly:
-
-    core/core
-    core/neural_modules
-    core/exp_manager
-    core/neural_types
-    core/export
-    core/adapters/intro
-
- You can learn more about aspects of the NeMo APIs by following the links below:
-
- .. toctree::
-    :maxdepth: 1
-    :name: API
-    :titlesonly:
-
-    core/api
-    common/intro
-    nlp/api
-    multimodal/api
-    asr/api
-    tts/api
-
-
- Alternatively, you can jump straight to the documentation for the individual collections:
-
- * :doc:`Large Language Models (LLMs) <../nlp/nemo_megatron/intro>`
-
- * :doc:`Automatic Speech Recognition (ASR) <../asr/intro>`
-
- * :doc:`Multimodal Models (MMs) <../multimodal/mllm/intro>`
-
- * :doc:`Text-to-Speech (TTS) <../tts/intro>`
-
- * :doc:`Computer Vision (CV) <../vision/intro>`
NeMo-2.2.0/docs/source/asr/all_chkpt.rst DELETED
@@ -1,236 +0,0 @@
- All Checkpoints
- ===============
- English
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_en.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- German
- ^^^^^^
- .. csv-table::
-    :file: data/benchmark_de.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Spanish
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_es.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- French
- ^^^^^^
- .. csv-table::
-    :file: data/benchmark_fr.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Russian
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_ru.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Japanese
- ^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_jp.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Chinese
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_cn.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Georgian
- ^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_ka.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Kazakh
- ^^^^^^
- .. csv-table::
-    :file: data/benchmark_kz.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Persian
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_fa.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Uzbek
- ^^^^^
- .. csv-table::
-    :file: data/benchmark_uz.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Ukrainian
- ^^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_ua.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Polish
- ^^^^^^
- .. csv-table::
-    :file: data/benchmark_pl.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Italian
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_it.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Belarusian
- ^^^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_by.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Croatian
- ^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_hr.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Esperanto
- ^^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_eo.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Kabyle
- ^^^^^^
- .. csv-table::
-    :file: data/benchmark_kab.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Dutch
- ^^^^^
- .. csv-table::
-    :file: data/benchmark_nl.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Catalan
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_ca.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Hindi
- ^^^^^
- .. csv-table::
-    :file: data/benchmark_hi.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Marathi
- ^^^^^^^
- .. csv-table::
-    :file: data/benchmark_mr.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Mandarin
- ^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_zh.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
-
- -----------------------------
-
- Kinyarwanda
- ^^^^^^^^^^^
- .. csv-table::
-    :file: data/benchmark_rw.csv
-    :align: left
-    :widths: 50,50
-    :header-rows: 1
 
NeMo-2.2.0/docs/source/asr/api.rst DELETED
@@ -1,343 +0,0 @@
- NeMo ASR API
- ============
-
-
- Model Classes
- -------------
-
- .. autoclass:: nemo.collections.asr.models.EncDecCTCModel
-     :show-inheritance:
-     :members: transcribe, change_vocabulary, setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.EncDecCTCModelBPE
-     :show-inheritance:
-     :members: transcribe, change_vocabulary, setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.EncDecRNNTModel
-     :show-inheritance:
-     :members: transcribe, change_vocabulary, setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.EncDecRNNTBPEModel
-     :show-inheritance:
-     :members: transcribe, change_vocabulary, setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.EncDecClassificationModel
-     :show-inheritance:
-     :members: setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.EncDecSpeakerLabelModel
-     :show-inheritance:
-     :members: setup_training_data, setup_optimization, setup_validation_data, setup_test_data, register_artifact
-
-
- .. autoclass:: nemo.collections.asr.models.hybrid_asr_tts_models.ASRWithTTSModel
-     :show-inheritance:
-     :members: from_asr_config, from_pretrained_models, save_asr_model_to, setup_training_data
-
- .. _confidence-ensembles-api:
-
- .. autoclass:: nemo.collections.asr.models.confidence_ensemble.ConfidenceEnsembleModel
-     :show-inheritance:
-     :members: transcribe
-
- Modules
- -------
-
- .. autoclass:: nemo.collections.asr.modules.ConvASREncoder
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.ConvASRDecoder
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.ConvASRDecoderClassification
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.SpeakerDecoder
-     :show-inheritance:
-     :members:
-
- .. _conformer-encoder-api:
-
- .. autoclass:: nemo.collections.asr.modules.ConformerEncoder
-     :show-inheritance:
-     :members:
-
- .. _squeezeformer-encoder-api:
-
- .. autoclass:: nemo.collections.asr.modules.SqueezeformerEncoder
-     :show-inheritance:
-     :members:
-
- .. _rnn-encoder-api:
-
- .. autoclass:: nemo.collections.asr.modules.RNNEncoder
-     :show-inheritance:
-     :members:
-
- .. _rnnt-decoder-api:
-
- .. autoclass:: nemo.collections.asr.modules.RNNTDecoder
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.StatelessTransducerDecoder
-     :show-inheritance:
-     :members:
-
- .. _rnnt-joint-api:
-
- .. autoclass:: nemo.collections.asr.modules.RNNTJoint
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.SampledRNNTJoint
-     :show-inheritance:
-     :members:
-
-
-
- Parts
- -----
-
- .. autoclass:: nemo.collections.asr.parts.submodules.jasper.JasperBlock
-     :show-inheritance:
-     :members:
-
-
- Mixins
- ------
-
- .. autoclass:: nemo.collections.asr.parts.mixins.mixins.ASRBPEMixin
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.mixins.mixins.ASRModuleMixin
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.mixins.transcription.TranscriptionMixin
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.mixins.transcription.TranscribeConfig
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.mixins.interctc_mixin.InterCTCMixin
-     :show-inheritance:
-     :members:
-
- Datasets
- --------
-
- Character Encoding Datasets
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.data.audio_to_text.AudioToCharDataset
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.data.audio_to_text.TarredAudioToCharDataset
-     :show-inheritance:
-     :members:
-
-
- Text-to-Text Datasets for Hybrid ASR-TTS models
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.data.text_to_text.TextToTextDataset
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.data.text_to_text.TextToTextIterableDataset
-     :show-inheritance:
-     :members:
-
-
- Subword Encoding Datasets
- ~~~~~~~~~~~~~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.data.audio_to_text.AudioToBPEDataset
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.data.audio_to_text.TarredAudioToBPEDataset
-     :show-inheritance:
-     :members:
-
- Audio Preprocessors
- -------------------
-
- .. autoclass:: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.AudioToMFCCPreprocessor
-     :show-inheritance:
-     :members:
-
- Audio Augmentors
- ----------------
-
- .. autoclass:: nemo.collections.asr.modules.SpectrogramAugmentation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.modules.CropOrPadSpectrogramAugmentation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.SpeedPerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.TimeStretchPerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.GainPerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.ImpulsePerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.ShiftPerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.NoisePerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.WhiteNoisePerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.RirAndNoisePerturbation
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.preprocessing.perturb.TranscodePerturbation
-     :show-inheritance:
-     :members:
-
- Miscellaneous Classes
- ---------------------
-
- CTC Decoding
- ~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.submodules.ctc_decoding.CTCDecoding
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.ctc_decoding.CTCBPEDecoding
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.ctc_greedy_decoding.GreedyCTCInfer
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.ctc_beam_decoding.BeamCTCInfer
-     :show-inheritance:
-     :members:
-
- RNNT Decoding
- ~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_decoding.RNNTDecoding
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_decoding.RNNTBPEDecoding
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_greedy_decoding.GreedyRNNTInfer
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_greedy_decoding.GreedyBatchedRNNTInfer
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_beam_decoding.BeamRNNTInfer
-     :show-inheritance:
-     :members:
-
- TDT Decoding
- ~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_greedy_decoding.GreedyTDTInfer
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.rnnt_greedy_decoding.GreedyBatchedTDTInfer
-     :show-inheritance:
-     :members:
-
- .. autoclass:: nemo.collections.asr.parts.submodules.tdt_beam_decoding.BeamTDTInfer
-     :show-inheritance:
-     :members:
-
- Hypotheses
- ~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.utils.rnnt_utils.Hypothesis
-     :show-inheritance:
-     :no-members:
-
- .. autoclass:: nemo.collections.asr.parts.utils.rnnt_utils.NBestHypotheses
-     :show-inheritance:
-     :no-members:
-
- Adapter Networks
- ~~~~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MultiHeadAttentionAdapter
-     :show-inheritance:
-     :members:
-     :member-order: bysource
-
- -----
-
- .. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionMultiHeadAttentionAdapter
-     :show-inheritance:
-     :members:
-     :member-order: bysource
-
- -----
-
- .. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.PositionalEncodingAdapter
-     :show-inheritance:
-     :members:
-     :member-order: bysource
-
- -----
-
- .. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.RelPositionalEncodingAdapter
-     :show-inheritance:
-     :members:
-     :member-order: bysource
-
-
- Adapter Strategies
- ~~~~~~~~~~~~~~~~~~
-
- .. autoclass:: nemo.collections.asr.parts.submodules.adapters.multi_head_attention_adapter_module.MHAResidualAddAdapterStrategy
-     :show-inheritance:
-     :members:
-     :member-order: bysource
-     :undoc-members: adapter_module_names
-
 
NeMo-2.2.0/docs/source/asr/asr_all.bib DELETED
@@ -1,1043 +0,0 @@
- @article{matchboxnet,
-     title={{MatchboxNet}: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition},
-     author={Majumdar, Somshubra and Ginsburg, Boris},
-     journal={Proc. Interspeech 2020},
-     year={2020}
- }
-
- @article{marblenet,
-     title={MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection},
-     author={Jia, Fei and Majumdar, Somshubra and Ginsburg, Boris},
-     journal={arXiv preprint arXiv:2010.13886},
-     year={2020}
- }
-
- @inproceedings{panayotov2015librispeech,
-     title={Librispeech: an ASR corpus based on public domain audio books},
-     author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
-     booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
-     pages={5206--5210},
-     year={2015},
-     organization={IEEE}
- }
-
- @article{luong17,
-     author = {Minh{-}Thang Luong and Eugene Brevdo and Rui Zhao},
-     title = {Neural Machine Translation (seq2seq) Tutorial},
-     journal = {https://github.com/tensorflow/nmt},
-     year = {2017},
- }
-
- @INPROCEEDINGS{LaurentSeqWiseBN,
-     author={C. {Laurent} and G. {Pereyra} and P. {Brakel} and Y. {Zhang} and Y. {Bengio}},
-     booktitle={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
-     title={Batch normalized recurrent neural networks},
-     year={2016},
-     volume={},
-     number={},
-     pages={2657--2661},
-     keywords={feedforward neural nets;learning (artificial intelligence);recurrent neural nets;speech recognition;batch normalized recurrent neural networks;RNN;sequential data;long-term dependency learning;convergence rate improvement;intermediate representation normalization;feedforward neural networks;speech recognition task;language modeling;training criterion;Training;Recurrent neural networks;Convergence;Speech recognition;Computer architecture;Speech;batch normalization;RNN;LSTM;optimization},
-     doi={10.1109/ICASSP.2016.7472159},
-     ISSN={2379-190X},
-     month={March},
- }
-
- @article{graves2005,
-     author = {Alex Graves and J{\"u}rgen Schmidhuber},
-     title = {Framewise phoneme classification with bidirectional LSTM and other neural network architectures},
-     journal = {Neural Networks, vol. 18},
-     pages = {602--610},
-     year = {2005},
- }
-
- @inproceedings{graves2006,
-     title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks},
-     author={Graves, Alex and Fern{\'a}ndez, Santiago and Gomez, Faustino and Schmidhuber, J{\"u}rgen},
-     booktitle={Proceedings of the 23rd international conference on Machine learning},
-     pages={369--376},
-     year={2006},
-     organization={ACM}
- }
-
- @article{li2019jasper,
-     title={Jasper: An End-to-End Convolutional Neural Acoustic Model},
-     author={Li, Jason and Lavrukhin, Vitaly and Ginsburg, Boris and Leary, Ryan and Kuchaiev, Oleksii and Cohen, Jonathan M and Nguyen, Huyen and Gadde, Ravi Teja},
-     journal={arXiv preprint arXiv:1904.03288},
-     year={2019}
- }
-
- @misc{ardila2019common,
-     title={Common Voice: A Massively-Multilingual Speech Corpus},
-     author={Rosana Ardila and Megan Branson and Kelly Davis and Michael Henretty and Michael Kohler and Josh Meyer and Reuben Morais and Lindsay Saunders and Francis M. Tyers and Gregor Weber},
-     year={2019},
-     eprint={1912.06670},
-     archivePrefix={arXiv},
-     primaryClass={cs.CL}
- }
-
- @article{graves2012,
-     title={Sequence Transduction with Recurrent Neural Networks},
-     author={Graves, Alex},
-     journal={arXiv preprint arXiv:1211.3711},
-     year={2012}
- }
-
-
- @article{graves2013,
-     title={Generating sequences with recurrent neural networks},
-     author={Graves, Alex},
-     journal={arXiv preprint arXiv:1308.0850},
-     year={2013}
- }
-
- @article{sergeev2018horovod,
-     title={Horovod: fast and easy distributed deep learning in TensorFlow},
-     author={Sergeev, Alexander and Del Balso, Mike},
-     journal={arXiv preprint arXiv:1802.05799},
-     year={2018}
- }
-
- @misc{NVVolta,
-     title = {NVIDIA TESLA V100 GPU ARCHITECTURE},
-     howpublished = {\url{http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf}},
-     note = {Accessed: 2018-10-09}
- }
-
- @misc{NVTuring,
-     title = {NVIDIA TURING GPU ARCHITECTURE},
-     howpublished = {\url{https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf}},
-     author = {NVIDIA},
-     year = {2018},
-     note = {Accessed: 2018-10-09}
- }
-
- @misc{Rygaard2015,
-     title = {Using Synthesized Speech to Improve Speech Recognition for Low-Resource Languages},
-     author = {Luise Valentin Rygaard},
-     howpublished = {\url{https://parasol.tamu.edu/dreu2015/Rygaard/report.pdf}},
-     year = {2015},
- }
-
- @misc{OpenSeq2Seq,
-     title = {OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models},
-     author = {Kuchaiev, Oleksii and Ginsburg, Boris and Gitman, Igor and Lavrukhin, Vitaly and Case, Carl and Micikevicius, Paulius},
-     howpublished = {\url{https://arxiv.org/abs/1805.10387}},
-     year = {2018},
- }
-
- @misc{MPGuide,
-     title = {Training with Mixed Precision},
-     howpublished = {\url{http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/}},
-     note = {Accessed: 2018-04-06},
- }
-
- @misc{Mozilla,
-     title = {Mozilla: A Journey to less than 10\% Word Error Rate},
-     howpublished = {\url{https://hacks.mozilla.org/2017/11/a-journey-to-10-word-error-rate/}},
-     note = {Accessed: 2018-04-06},
- }
-
- @article{Waibel1989,
-     title={A time-delay neural network architecture for isolated word recognition},
-     author={Waibel, Alexander and Hanazawa, Toshiyuki and Hinton, Geoffrey and Shikano, Kiyohiro and Lang, Kevin},
-     journal={IEEE Trans. on Acoustics, Speech and Signal Processing},
-     year={1989}
- }
-
- @article{Lang1990,
-     title={A time-delay neural network architecture for isolated word recognition},
-     author={Lang, Kevin and Waibel, Alexander and Hinton, Geoffrey},
-     journal={Neural Networks},
-     year={1990}
- }
-
- @book{Bengio1996,
-     Author = {Bengio, Y.},
-     Publisher = {International Thomson Computer Press},
-     Title = {Neural Networks for Speech and Sequence Recognition},
-     Year = {1996}
- }
-
- @article{Bengio1992,
-     title={Global optimization of a neural network-hidden Markov model hybrid},
-     author={Bengio, Y. and De Mori, R. and Flammia, G. and Kompe, R.},
-     journal={IEEE Transactions on Neural Networks, 3(2), 252--259},
-     year={1992}
- }
-
- @article{Bourlard1994,
-     title={Connectionist speech recognition: a hybrid approach},
-     author={Bourlard, H. A. and Morgan, N.},
-     journal={volume 247 Springer},
-     year={1994}
- }
-
- @article{srivastava14a,
-     author = {Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
-     title = {Dropout: A Simple Way to Prevent Neural Networks from Overfitting},
-     journal = {Journal of Machine Learning Research},
-     year = {2014},
-     volume = {15},
-     pages = {1929--1958},
-     url = {http://jmlr.org/papers/v15/srivastava14a.html}
- }
-
-
- @article{Hinton2012,
-     title={Deep Neural Networks for Acoustic Modeling in Speech Recognition},
-     author={Hinton, Geoffrey and Deng, Li and Yu, Dong and Dahl, George and Mohamed, Abdel-rahman and Jaitly, Navdeep and Senior, Andrew and Vanhoucke, Vincent and Nguyen, Patrick and Kingsbury, Brian and Sainath, Tara},
-     journal={IEEE Signal Processing Magazine},
-     year={2012}
- }
-
- @article{Graves2014,
-     title={Towards End-to-End Speech Recognition with Recurrent Neural Networks},
-     author={Graves, Alex and Jaitly, Navdeep},
-     journal={International Conference on Machine Learning},
-     year={2014}
- }
-
- @article{Chorowski2014,
-     title={End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results},
-     author={Chorowski, Jan and Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
-     journal={Neural Information Processing Systems: Workshop Deep Learning and Representation Learning Workshop},
-     year={2014}
- }
-
- @article{Sak2014,
-     title={Long short-term memory recurrent neural network architectures for large scale acoustic modeling},
-     author={Sak, Hasim and Senior, Andrew and Beaufays, Francoise},
-     journal={Interspeech 2014},
-     year={2014}
- }
-
- @article{Ko2015,
-     title={Audio Augmentation for Speech Recognition},
-     author={Ko, Tom and Peddinti, Vijayaditya and Povey, Daniel and Khudanpur, Sanjeev},
-     journal={Interspeech 2015},
-     year={2015}
- }
-
- @article{Tjandra2017,
-     title={Listening while Speaking: Speech Chain by Deep Learning},
-     author={Tjandra, Andros and Sakti, Sakriani and Nakamura, Satoshi},
-     journal={ASRU 2017},
-     year={2017}
- }
-
- @article{Tjandra2018,
-     title={Machine Speech Chain with One-shot Speaker Adaptation},
-     author={Tjandra, Andros and Sakti, Sakriani and Nakamura, Satoshi},
-     journal={Interspeech 2018},
-     year={2018}
- }
-
- @article{bahdanau2014neural,
-     title={Neural machine translation by jointly learning to align and translate},
-     author={Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
-     journal={arXiv preprint arXiv:1409.0473},
-     year={2014}
- }
-
- @article{cho2014learning,
-     title={Learning phrase representations using RNN encoder-decoder for statistical machine translation},
-     author={Cho, Kyunghyun and Van Merri{\"e}nboer, Bart and Gulcehre, Caglar and Bahdanau, Dzmitry and Bougares, Fethi and Schwenk, Holger and Bengio, Yoshua},
-     journal={arXiv preprint arXiv:1406.1078},
-     year={2014}
- }
-
- @article{rush2015neural,
-     title={A neural attention model for abstractive sentence summarization},
-     author={Rush, Alexander M and Chopra, Sumit and Weston, Jason},
-     journal={arXiv preprint arXiv:1509.00685},
-     year={2015}
- }
-
- @article{micikevicius2017mixed,
-     title={Mixed precision training},
-     author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
-     journal={arXiv preprint arXiv:1710.03740},
-     year={2017}
- }
-
- @ARTICLE{Britz:2017,
-     author = {{Britz}, Denny and {Goldie}, Anna and {Luong}, Thang and {Le}, Quoc},
-     title = {Massive Exploration of Neural Machine Translation Architectures},
-     journal = {ArXiv e-prints arXiv:1703.03906},
-     archivePrefix = "arXiv",
-     eprinttype = {arxiv},
-     eprint = {1703.03906},
-     primaryClass = "cs.CL",
-     keywords = {Computer Science - Computation and Language},
-     year = 2017,
-     month = mar
- }
-
- @inproceedings{abadi2016tensorflow,
-     title={TensorFlow: A System for Large-Scale Machine Learning.},
-     author={Abadi, Mart{\'\i}n and Barham, Paul and Chen, Jianmin and Chen, Zhifeng and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and Ghemawat, Sanjay and Irving, Geoffrey and Isard, Michael and others},
-     booktitle={OSDI},
-     volume={16},
-     pages={265--283},
-     year={2016}
- }
-
- @article{tensor2tensor,
-     author = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and Noam Shazeer and Jakob Uszkoreit},
-     title = {Tensor2Tensor for Neural Machine Translation},
-     journal = {CoRR},
-     volume = {abs/1803.07416},
-     year = {2018},
-     url = {http://arxiv.org/abs/1803.07416},
- }
-
- @article{gehring2017convs2s,
-     author = {Gehring, Jonas and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N},
-     title = "{Convolutional Sequence to Sequence Learning}",
-     journal = {ArXiv e-prints arXiv:1705.03122},
-     archivePrefix = "arXiv",
-     eprinttype = {arxiv},
-     eprint = {1705.03122},
-     primaryClass = "cs.CL",
-     keywords = {Computer Science - Computation and Language},
-     year = 2017,
-     month = May,
- }
-
- @inproceedings{chan2015,
-     title={Listen, attend and spell},
-     author={Chan, William and Jaitly, Navdeep and Le, Quoc V and Vinyals, Oriol},
-     booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on},
-     pages={5206--5210},
-     year={2016},
-     organization={IEEE}
- }
-
- @inproceedings{xu2015show,
-     title={Show, attend and tell: Neural image caption generation with visual attention},
-     author={Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Cho, Kyunghyun and Courville, Aaron and Salakhudinov, Ruslan and Zemel, Rich and Bengio, Yoshua},
-     booktitle={International Conference on Machine Learning},
-     pages={2048--2057},
-     year={2015}
- }
-
- @incollection{Sutskever2014,
-     title = {Sequence to Sequence Learning with Neural Networks},
-     author = {Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V},
-     booktitle = {Advances in Neural Information Processing Systems 27},
-     editor = {Z. Ghahramani and M. Welling and C. Cortes and N. D. Lawrence and K. Q. Weinberger},
-     pages = {3104--3112},
-     year = {2014},
-     publisher = {Curran Associates, Inc.},
-     url = {http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf}
- }
-
- @article{DeepSpeech2014,
-     title = {Deep Speech: Scaling up end-to-end speech recognition},
-     author = {Awni Y. Hannun and Carl Case and Jared Casper and Bryan Catanzaro and Greg Diamos and Erich Elsen and Ryan Prenger and Sanjeev Satheesh and Shubho Sengupta and Adam Coates and Andrew Y. Ng},
-     journal = {CoRR},
-     volume = {abs/1412.5567},
-     year = {2014},
-     url = {http://arxiv.org/abs/1412.5567},
-     archivePrefix = {arXiv},
-     eprint = {1412.5567},
-     timestamp = {Mon, 13 Aug 2018 16:48:07 +0200},
-     biburl = {https://dblp.org/rec/bib/journals/corr/HannunCCCDEPSSCN14},
-     bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @inproceedings{DeepSpeech2,
-     author = {Amodei, Dario and Ananthanarayanan, Sundaram and Anubhai, Rishita and Bai, Jingliang and Battenberg, Eric and Case, Carl and Casper, Jared and Catanzaro, Bryan and Cheng, Qiang and Chen, Guoliang and Chen, Jie and Chen, Jingdong and Chen, Zhijie and Chrzanowski, Mike and Coates, Adam and Diamos, Greg and Ding, Ke and Du, Niandong and Elsen, Erich and Engel, Jesse and Fang, Weiwei and Fan, Linxi and Fougner, Christopher and Gao, Liang and Gong, Caixia and Hannun, Awni and Han, Tony and Johannes, Lappi Vaino and Jiang, Bing and Ju, Cai and Jun, Billy and LeGresley, Patrick and Lin, Libby and Liu, Junjie and Liu, Yang and Li, Weigao and Li, Xiangang and Ma, Dongpeng and Narang, Sharan and Ng, Andrew and Ozair, Sherjil and Peng, Yiping and Prenger, Ryan and Qian, Sheng and Quan, Zongfeng and Raiman, Jonathan and Rao, Vinay and Satheesh, Sanjeev and Seetapun, David and Sengupta, Shubho and Srinet, Kavya and Sriram, Anuroop and Tang, Haiyuan and Tang, Liliang and Wang, Chong and Wang, Jidong and Wang, Kaifu and Wang, Yi and Wang, Zhijian and Wang, Zhiqian and Wu, Shuang and Wei, Likai and Xiao, Bo and Xie, Wen and Xie, Yan and Yogatama, Dani and Yuan, Bin and Zhan, Jun and Zhu, Zhenyao},
-     title = {Deep Speech 2: End-to-end Speech Recognition in English and Mandarin},
-     booktitle = {Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48},
-     series = {ICML'16},
-     year = {2016},
-     location = {New York, NY, USA},
-     pages = {173--182},
-     numpages = {10},
-     url = {http://dl.acm.org/citation.cfm?id=3045390.3045410},
-     acmid = {3045410},
-     publisher = {JMLR.org},
- }
-
- @inproceedings{prabhavalkar2017comparison,
-     title={A comparison of sequence-to-sequence models for speech recognition},
-     author={Prabhavalkar, Rohit and Rao, Kanishka and Sainath, Tara N and Li, Bo and Johnson, Leif and Jaitly, Navdeep},
-     booktitle={Proc. Interspeech},
-     pages={939--943},
-     year={2017}
- }
-
- @article{chiu2017state,
-     title={State-of-the-art speech recognition with sequence-to-sequence models},
-     author={Chiu, Chung-Cheng and Sainath, Tara N and Wu, Yonghui and Prabhavalkar, Rohit and Nguyen, Patrick and Chen, Zhifeng and Kannan, Anjuli and Weiss, Ron J and Rao, Kanishka and Gonina, Katya and others},
-     journal={arXiv preprint arXiv:1712.01769},
-     year={2017}
- }
-
- @misc{NVMixed,
-     title = {{NVIDIA's Mixed-Precision Training - TensorFlow example}},
-     howpublished = {\url{https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/#example_tensorflow}},
-     author={NVIDIA},
-     note = {Accessed: 2018-10-09},
-     year={2018}
- }
-
- @article{gehring2017,
-     title={Convolutional sequence to sequence learning},
-     author={Gehring, Jonas and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N},
-     journal={arXiv preprint arXiv:1705.03122},
-     year={2017}
- }
-
- @article{collobert2016,
-     title={Wav2letter: an end-to-end convnet-based speech recognition system},
-     author={Collobert, Ronan and Puhrsch, Christian and Synnaeve, Gabriel},
-     journal={arXiv preprint arXiv:1609.03193},
-     year={2016}
- }
-
- @inproceedings{Zhang2016,
-     author={Ying Zhang and Mohammad Pezeshki and Philémon Brakel and Saizheng Zhang and César Laurent and Yoshua Bengio and Aaron Courville},
-     title={Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks},
-     year=2016,
-     booktitle={Interspeech 2016},
-     doi={10.21437/Interspeech.2016-1446},
407
- url={http://dx.doi.org/10.21437/Interspeech.2016-1446},
408
- pages={410--414}
409
- }
410
-
411
- @inproceedings{Zhang2017,
412
- title={Very deep convolutional networks for end-to-end speech recognition},
413
- author={Zhang, Yu, and Chan, William, and Jaitly, Navdeep},
414
- booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on},
415
- year={2017},
416
- organization={IEEE}
417
- }
418
-
419
-
420
- @article{Wang2017,
421
- title={Tacotron: Towards End-to-End Speech Synthesis},
422
- author={ Wang, Yuxuan, and Skerry-Ryan, RJ, and Stanton, Daisy and Wu, Yonghui and Weiss, Ron, and Jaitly, Navdeep and Yang, Zongheng and Xiao, Ying and Chen,Zhifeng and Bengio, Samy and Le, Quoc and Agiomyrgiannakis, Yannis and Clark,Rob and Saurous, Rif A.},
423
- journal={arXiv preprint arXiv:1703.10135},
424
- year={2017}
425
- }
426
-
427
- @article{griffin1984signal,
428
- title={Signal estimation from modified short-time Fourier transform},
429
- author={Griffin, Daniel and Lim, Jae},
430
- journal={IEEE Transactions on Acoustics, Speech, and Signal Processing},
431
- volume={32},
432
- number={2},
433
- pages={236--243},
434
- year={1984},
435
- publisher={IEEE}
436
- }
437
-
438
- @misc{ito2017lj,
439
- title={The LJ speech dataset},
440
- author={Ito, Keith and others},
441
- year={2017}
442
- }
443
-
- @misc{mailabs,
- title = {{The M-AILABS Speech Dataset}},
- howpublished = {\url{http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/}},
- author={M-AILABS},
- note = {Accessed: 2018-10-09},
- year={2018}
- }
-
- @article{merity2016pointer,
- title={Pointer sentinel mixture models},
- author={Merity, Stephen and Xiong, Caiming and Bradbury, James and Socher, Richard},
- journal={arXiv preprint arXiv:1609.07843},
- year={2016}
- }
-
- @inproceedings{socher2013recursive,
- title={Recursive deep models for semantic compositionality over a sentiment treebank},
- author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
- booktitle={Proceedings of the 2013 conference on empirical methods in natural language processing},
- pages={1631--1642},
- year={2013}
- }
-
- @InProceedings{maas-EtAl:2011:ACL-HLT2011,
- author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
- title = {Learning Word Vectors for Sentiment Analysis},
- booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
- month = {June},
- year = {2011},
- address = {Portland, Oregon, USA},
- publisher = {Association for Computational Linguistics},
- pages = {142--150},
- url = {http://www.aclweb.org/anthology/P11-1015}
- }
-
- @inproceedings{Povey2018SemiOrthogonalLM,
- title={Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks},
- author={Daniel Povey and Gaofeng Cheng and Yiming Wang and Ke Li and Hainan Xu and Mahsa Yarmohammadi and Sanjeev Khudanpur},
- booktitle={Interspeech},
- year={2018}
- }
-
- @article{CAPIO2017,
- author = {Kyu J. Han and Akshay Chandrashekaran and Jungsuk Kim and Ian R. Lane},
- title = {The {CAPIO} 2017 Conversational Speech Recognition System},
- journal = {CoRR},
- volume = {abs/1801.00059},
- year = {2018},
- url = {http://arxiv.org/abs/1801.00059},
- archivePrefix = {arXiv},
- eprint = {1801.00059},
- timestamp = {Mon, 13 Aug 2018 16:49:10 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1801-00059},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{WaveNet,
- author = {A{\"{a}}ron van den Oord and Sander Dieleman and Heiga Zen and Karen Simonyan and Oriol Vinyals and Alex Graves and Nal Kalchbrenner and Andrew W. Senior and Koray Kavukcuoglu},
- title = {WaveNet: {A} Generative Model for Raw Audio},
- journal = {CoRR},
- volume = {abs/1609.03499},
- year = {2016},
- url = {http://arxiv.org/abs/1609.03499},
- archivePrefix = {arXiv},
- eprint = {1609.03499},
- timestamp = {Mon, 13 Aug 2018 16:49:15 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/OordDZSVGKSK16},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{FacebookGERENGBackTranslation,
- author = {Rico Sennrich and Barry Haddow and Alexandra Birch},
- title = {Improving Neural Machine Translation Models with Monolingual Data},
- journal = {CoRR},
- volume = {abs/1511.06709},
- year = {2015},
- url = {http://arxiv.org/abs/1511.06709},
- archivePrefix = {arXiv},
- eprint = {1511.06709},
- timestamp = {Mon, 13 Aug 2018 16:47:05 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/SennrichHB15a},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{GlobalStyleTokens,
- author = {Yuxuan Wang and Daisy Stanton and Yu Zhang and R. J. Skerry{-}Ryan and Eric Battenberg and Joel Shor and Ying Xiao and Fei Ren and Ye Jia and Rif A. Saurous},
- title = {Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis},
- journal = {CoRR},
- volume = {abs/1803.09017},
- year = {2018},
- url = {http://arxiv.org/abs/1803.09017},
- archivePrefix = {arXiv},
- eprint = {1803.09017},
- timestamp = {Mon, 13 Aug 2018 16:46:53 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1803-09017},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{IoffeS15BatchNorm,
- author = {Sergey Ioffe and Christian Szegedy},
- title = {Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift},
- journal = {CoRR},
- volume = {abs/1502.03167},
- year = {2015},
- url = {http://arxiv.org/abs/1502.03167},
- archivePrefix = {arXiv},
- eprint = {1502.03167},
- timestamp = {Mon, 13 Aug 2018 16:47:06 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/IoffeS15},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{kingma,
- author = {Diederik P. Kingma and Jimmy Ba},
- title = {Adam: {A} Method for Stochastic Optimization},
- journal = {CoRR},
- volume = {abs/1412.6980},
- year = {2014},
- url = {http://arxiv.org/abs/1412.6980},
- archivePrefix = {arXiv},
- eprint = {1412.6980},
- timestamp = {Mon, 13 Aug 2018 01:00:00 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/KingmaB14},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @incollection{Salimans2016WeightNorm,
- title = {Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks},
- author = {Salimans, Tim and Kingma, Durk P},
- booktitle = {Advances in Neural Information Processing Systems 29},
- editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
- pages = {901--909},
- year = {2016},
- publisher = {Curran Associates, Inc.},
- url = {http://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf}
- }
-
- @article{wu2016google,
- title={Google's neural machine translation system: Bridging the gap between human and machine translation},
- author={Wu, Yonghui and Schuster, Mike and Chen, Zhifeng and Le, Quoc V and Norouzi, Mohammad and Macherey, Wolfgang and Krikun, Maxim and Cao, Yuan and Gao, Qin and Macherey, Klaus and others},
- journal={arXiv preprint arXiv:1609.08144},
- year={2016}
- }
-
- @inproceedings{opennmt,
- author = {Guillaume Klein and Yoon Kim and Yuntian Deng and Jean Senellart and Alexander M. Rush},
- title = {OpenNMT: Open-Source Toolkit for Neural Machine Translation},
- booktitle = {Proc. ACL},
- year = {2017},
- url = {https://doi.org/10.18653/v1/P17-4012},
- doi = {10.18653/v1/P17-4012}
- }
-
- @article{paszke2017automatic,
- title={Automatic differentiation in PyTorch},
- author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
- year={2017}
- }
-
- @article{yu2014introduction,
- title={An introduction to computational networks and the computational network toolkit},
- author={Yu, Dong and Eversole, Adam and Seltzer, Mike and Yao, Kaisheng and Huang, Zhiheng and Guenter, Brian and Kuchaiev, Oleksii and Zhang, Yu and Seide, Frank and Wang, Huaming and others},
- journal={Microsoft Technical Report MSR-TR-2014--112},
- year={2014}
- }
-
- @article{nvidia2017v100,
- title={V100 GPU architecture. The world’s most advanced data center GPU. Version WP-08608-001\_v1. 1},
- author={NVIDIA, Tesla},
- journal={NVIDIA. Aug},
- pages={108},
- year={2017}
- }
-
- @article{Ba2016LayerNorm,
- author = {Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E Hinton},
- title = {Layer normalization},
- journal = {CoRR},
- volume = {abs/1607.06450},
- year = {2016},
- url = {http://arxiv.org/abs/1607.06450},
- archivePrefix = {arXiv},
- }
-
- @inproceedings{Dauphin2017GLU,
- author = {Dauphin, Yann N. and Fan, Angela and Auli, Michael and Grangier, David},
- title = {Language Modeling with Gated Convolutional Networks},
- booktitle = {Proceedings of the 34th International Conference on Machine Learning - Volume 70},
- series = {ICML'17},
- year = {2017},
- location = {Sydney, NSW, Australia},
- pages = {933--941},
- numpages = {9},
- url = {http://dl.acm.org/citation.cfm?id=3305381.3305478},
- acmid = {3305478},
- publisher = {JMLR.org},
- }
-
- @incollection{Oord2016PixelCNN,
- title = {Conditional Image Generation with PixelCNN Decoders},
- author = {van den Oord, Aaron and Kalchbrenner, Nal and Espeholt, Lasse and kavukcuoglu, koray and Vinyals, Oriol and Graves, Alex},
- booktitle = {Advances in Neural Information Processing Systems 29},
- editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
- pages = {4790--4798},
- year = {2016},
- publisher = {Curran Associates, Inc.},
- url = {http://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders.pdf}
- }
-
- @article{he2015,
- title={Deep residual learning for image recognition},
- author={K. He and X. Zhang and S. Ren and J. Sun},
- journal={arXiv preprint arXiv:1512.03385},
- year={2015}
- }
-
- @article{huang2016,
- title={Densely Connected Convolutional Networks},
- author={Gao Huang and Zhuang Liu and Laurens van der Maaten and Kilian Q. Weinberger},
- journal={arXiv preprint arXiv:1608.06993},
- year={2016}
- }
-
- @inproceedings{heafield2011kenlm,
- title={KenLM: Faster and smaller language model queries},
- author={Heafield, Kenneth},
- booktitle={Proceedings of the sixth workshop on statistical machine translation},
- pages={187--197},
- year={2011},
- organization={Association for Computational Linguistics}
- }
-
- @article{dai2018transformer,
- title={Transformer-XL: Language Modeling with Longer-Term Dependency},
- author={Dai, Zihang and Yang, Zhilin and Yang, Yiming and Cohen, William W and Carbonell, Jaime and Le, Quoc V and Salakhutdinov, Ruslan},
- year={2018},
- journal = {CoRR},
- volume = {abs/1901.02860},
- url = {http://arxiv.org/abs/1901.02860},
- archivePrefix = {arXiv},
- eprint = {1901.02860},
- timestamp = {Fri, 01 Feb 2019 13:39:59 +0100},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1901-02860},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @inproceedings{Saon+2016,
- author={George Saon and Tom Sercu and Steven Rennie and Hong-Kwang J. Kuo},
- title={The IBM 2016 English Conversational Telephone Speech Recognition System},
- year=2016,
- booktitle={Interspeech 2016},
- doi={10.21437/Interspeech.2016-1460},
- url={http://dx.doi.org/10.21437/Interspeech.2016-1460},
- pages={7--11}
- }
-
- @INPROCEEDINGS{Sercu-2016,
- author={T. {Sercu} and C. {Puhrsch} and B. {Kingsbury} and Y. {LeCun}},
- booktitle={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
- title={Very deep multilingual convolutional neural networks for LVCSR},
- year={2016},
- pages={4955-4959},
- keywords={natural language processing;neural nets;speech recognition;very deep multilingual convolutional neural networks;LVCSR;CNN;large vocabulary continuous speech recognition systems;word error rate;Training;Context;Hidden Markov models;Neural networks;Computer architecture;Kernel;Training data;Convolutional Networks;Multilingual;Acoustic Modeling;Speech Recognition;Neural Networks},
- doi={10.1109/ICASSP.2016.7472620},
- ISSN={2379-190X},
- month={March},
- }
-
- @inproceedings{Sercu+2016,
- author={Tom Sercu and Vaibhava Goel},
- title={Advances in Very Deep Convolutional Neural Networks for LVCSR},
- year=2016,
- booktitle={Interspeech 2016},
- doi={10.21437/Interspeech.2016-1033},
- url={http://dx.doi.org/10.21437/Interspeech.2016-1033},
- pages={3429--3433}
- }
-
- @INPROCEEDINGS{Xiong-2018,
- author={W. {Xiong} and L. {Wu} and F. {Alleva} and J. {Droppo} and X. {Huang} and A. {Stolcke}},
- booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
- title={The Microsoft 2017 Conversational Speech Recognition System},
- year={2018},
- pages={5934-5938},
- keywords={convolution;feedforward neural nets;natural language processing;speaker recognition;speech processing;language model rescoring step;senone level;switchboard domains;character-based LSTM language models;NIST 2000 switchboard test set;frame level;word-level voting;acoustic model posteriors;dialog session aware LSTM language models;CNN-BLSTM acoustic model;Microsoft 2017 conversational speech recognition system;Acoustics;Error analysis;Training;Speech recognition;Switches;Computational modeling;Context modeling;Conversational speech recognition;CNN;LACE;BLSTM;LSTM-LM;system combination;human parity},
- doi={10.1109/ICASSP.2018.8461870},
- ISSN={2379-190X},
- month={April},
- }
-
- @inproceedings{zeyer2018improved,
- author={Albert Zeyer and Kazuki Irie and Ralf Schlüter and Hermann Ney},
- title={Improved Training of End-to-end Attention Models for Speech Recognition},
- year=2018,
- booktitle={Proc. Interspeech 2018},
- pages={7--11},
- doi={10.21437/Interspeech.2018-1616},
- url={http://dx.doi.org/10.21437/Interspeech.2018-1616}
- }
-
- @article{Wav2LetterV2,
- author = {Vitaliy Liptchinsky and Gabriel Synnaeve and Ronan Collobert},
- title = {Letter-Based Speech Recognition with Gated ConvNets},
- journal = {CoRR},
- volume = {abs/1712.09444},
- year = {2017},
- url = {http://arxiv.org/abs/1712.09444},
- archivePrefix = {arXiv},
- eprint = {1712.09444},
- timestamp = {Mon, 13 Aug 2018 16:46:33 +0200},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1712-09444},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{zeghidour2018,
- author = {Neil Zeghidour and Qiantong Xu and Vitaliy Liptchinsky and Nicolas Usunier and Gabriel Synnaeve and Ronan Collobert},
- title = {Fully Convolutional Speech Recognition},
- journal = {CoRR},
- volume = {abs/1812.06864},
- year = {2018},
- url = {http://arxiv.org/abs/1812.06864},
- archivePrefix = {arXiv},
- eprint = {1812.06864},
- timestamp = {Tue, 01 Jan 2019 15:01:25 +0100},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1812-06864},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @inproceedings{Hadian2018,
- author={Hossein Hadian and Hossein Sameti and Daniel Povey and Sanjeev Khudanpur},
- title={End-to-end Speech Recognition Using Lattice-free MMI},
- year=2018,
- booktitle={Proc. Interspeech 2018},
- pages={12--16},
- doi={10.21437/Interspeech.2018-1423},
- url={http://dx.doi.org/10.21437/Interspeech.2018-1423}
- }
-
- @inproceedings{Tang2018,
- author={Jian Tang and Yan Song and Lirong Dai and Ian McLoughlin},
- title={Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition},
- year=2018,
- booktitle={Proc. Interspeech 2018},
- pages={1783--1787},
- doi={10.21437/Interspeech.2018-1089},
- url={http://dx.doi.org/10.21437/Interspeech.2018-1089}
- }
-
- @article{Kurata2017LanguageMW,
- title={Language modeling with highway LSTM},
- author={Gakuto Kurata and Bhuvana Ramabhadran and George Saon and Abhinav Sethy},
- journal={2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
- year={2017},
- pages={244-251}
- }
-
- @inproceedings{Saon2017,
- author={George Saon and Gakuto Kurata and Tom Sercu and Kartik Audhkhasi and Samuel Thomas and Dimitrios Dimitriadis and Xiaodong Cui and Bhuvana Ramabhadran and Michael Picheny and Lynn-Li Lim and Bergul Roomi and Phil Hall},
- title={English Conversational Telephone Speech Recognition by Humans and Machines},
- year=2017,
- booktitle={Proc. Interspeech 2017},
- pages={132--136},
- doi={10.21437/Interspeech.2017-405},
- url={http://dx.doi.org/10.21437/Interspeech.2017-405}
- }
-
- @inproceedings{Povey+2016,
- author={Daniel Povey and Vijayaditya Peddinti and Daniel Galvez and Pegah Ghahremani and Vimal Manohar and Xingyu Na and Yiming Wang and Sanjeev Khudanpur},
- title={Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI},
- year=2016,
- booktitle={Interspeech 2016},
- doi={10.21437/Interspeech.2016-595},
- url={http://dx.doi.org/10.21437/Interspeech.2016-595},
- pages={2751--2755}
- }
-
- @article{Yang2018,
- author = {Xuerui Yang and Jiwei Li and Xi Zhou},
- title = {A novel pyramidal-FSMN architecture with lattice-free {MMI} for speech recognition},
- journal = {CoRR},
- volume = {abs/1810.11352},
- year = {2018},
- url = {http://arxiv.org/abs/1810.11352},
- archivePrefix = {arXiv},
- eprint = {1810.11352},
- timestamp = {Wed, 31 Oct 2018 14:24:29 +0100},
- biburl = {https://dblp.org/rec/bib/journals/corr/abs-1810-11352},
- bibsource = {dblp computer science bibliography, https://dblp.org}
- }
-
- @article{liptchinsky2017based,
- title={Letter-Based Speech Recognition with Gated ConvNets},
- author={Liptchinsky, Vitaliy and Synnaeve, Gabriel and Collobert, Ronan},
- journal={arXiv preprint arXiv:1712.09444},
- year={2017}
- }
-
- @inproceedings{Weng2018,
- author={Chao Weng and Jia Cui and Guangsen Wang and Jun Wang and Chengzhu Yu and Dan Su and Dong Yu},
- title={Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition},
- year=2018,
- booktitle={Proc. Interspeech 2018},
- pages={761--765},
- doi={10.21437/Interspeech.2018-1030},
- url={http://dx.doi.org/10.21437/Interspeech.2018-1030}
- }
-
- @INPROCEEDINGS{Battenberg2017,
- author={E. {Battenberg} and J. {Chen} and R. {Child} and A. {Coates} and Y. G. Y. {Li} and H. {Liu} and S. {Satheesh} and A. {Sriram} and Z. {Zhu}},
- booktitle={2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
- title={Exploring neural transducers for end-to-end speech recognition},
- year={2017},
- pages={206-213},
- keywords={recurrent neural nets;speech recognition;Hub500 benchmark;CTC models;speech recognition pipeline;RNN-Transducer models;language model;Seq2Seq models;end-to-end speech recognition;neural transducers;Decoding;Hidden Markov models;Transducers;Task analysis;Speech;Mathematical model;Neural networks},
- doi={10.1109/ASRU.2017.8268937},
- month={Dec},
- }
-
- @inproceedings{loshchilov2018,
- title={Decoupled Weight Decay Regularization},
- author={Ilya Loshchilov and Frank Hutter},
- booktitle={International Conference on Learning Representations},
- year={2019},
- url={https://openreview.net/forum?id=Bkg6RiCqY7},
- }
-
- @article{zhang2017ndadam,
- author = {Zijun Zhang and Lin Ma and Zongpeng Li and Chuan Wu},
- title = {Normalized Direction-preserving Adam},
- journal = {arXiv e-prints arXiv:1709.04546},
- year = {2017},
- }
-
- @article{park2019,
- author = {{Park}, Daniel S. and {Chan}, William and {Zhang}, Yu and {Chiu}, Chung-Cheng and {Zoph}, Barret and {Cubuk}, Ekin D. and {Le}, Quoc V.},
- title = "{SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition}",
- journal = {arXiv e-prints},
- year = "2019",
- eid = {arXiv:1904.08779},
- eprint = {1904.08779},
- }
-
- @article{novograd2019,
- author = {{Ginsburg}, Boris and {Castonguay}, Patrice and {Hrinchuk}, Oleksii and {Kuchaiev}, Oleksii and {Lavrukhin}, Vitaly and {Leary}, Ryan and {Li}, Jason and {Nguyen}, Huyen and {Cohen}, Jonathan M.},
- title = "{Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks}",
- journal = {arXiv e-prints},
- year = "2019",
- eid = {arXiv:1905.11286},
- eprint = {1905.11286},
- }
-
- @article{kriman2019quartznet,
- title={Quartznet: {Deep} automatic speech recognition with 1d time-channel separable convolutions},
- author={Kriman, Samuel and Beliaev, Stanislav and Ginsburg, Boris and Huang, Jocelyn and Kuchaiev, Oleksii and Lavrukhin, Vitaly and Leary, Ryan and Li, Jason and Zhang, Yang},
- journal={arXiv preprint arXiv:1910.10261},
- year={2019}
- }
-
- @misc{itu1988g711,
- title={{ITU-T} {G.711} - {Pulse} code modulation ({PCM}) of voice frequencies},
- author={ITU-T Geneva Switzerland},
- year={1988},
- }
-
- @article{han2020contextnet,
- title={ContextNet: Improving convolutional neural networks for automatic speech recognition with global context},
- author={Han, Wei and Zhang, Zhengdong and Zhang, Yu and Yu, Jiahui and Chiu, Chung-Cheng and Qin, James and Gulati, Anmol and Pang, Ruoming and Wu, Yonghui},
- journal={arXiv:2005.03191},
- year={2020}
- }
-
- @inproceedings{hu2018squeeze,
- title={Squeeze-and-excitation networks},
- author={Hu, Jie and Shen, Li and Sun, Gang},
- booktitle={CVPR},
- year={2018}
- }
-
- @article{koluguri2020speakernet,
- title={SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification},
- author={Koluguri, Nithin Rao and Li, Jason and Lavrukhin, Vitaly and Ginsburg, Boris},
- journal={arXiv preprint arXiv:2010.12653},
- year={2020}
- }
-
- @article{gulati2020conformer,
- title={Conformer: Convolution-augmented transformer for speech recognition},
- author={Gulati, Anmol and Qin, James and Chiu, Chung-Cheng and Parmar, Niki and Zhang, Yu and Yu, Jiahui and Han, Wei and Wang, Shibo and Zhang, Zhengdong and Wu, Yonghui and others},
- journal={arXiv preprint arXiv:2005.08100},
- year={2020}
- }
-
- @article{koluguri2021titanet,
- title={TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context},
- author={Koluguri, Nithin Rao and Park, Taejin and Ginsburg, Boris},
- journal={arXiv preprint arXiv:2110.04410},
- year={2021}
- }
-
- @article{Dawalatabad_2021,
- title={ECAPA-TDNN Embeddings for Speaker Diarization},
- url={http://dx.doi.org/10.21437/Interspeech.2021-941},
- DOI={10.21437/interspeech.2021-941},
- journal={Interspeech 2021},
- publisher={ISCA},
- author={Dawalatabad, Nauman and Ravanelli, Mirco and Grondin, François and Thienpondt, Jenthe and Desplanques, Brecht and Na, Hwidong},
- year={2021},
- month={Aug}
- }
-
- @inproceedings{he2019streaming,
- title={Streaming end-to-end speech recognition for mobile devices},
- author={He, Yanzhang and Sainath, Tara N and Prabhavalkar, Rohit and McGraw, Ian and Alvarez, Raziel and Zhao, Ding and Rybach, David and Kannan, Anjuli and Wu, Yonghui and Pang, Ruoming and others},
- booktitle={ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
- pages={6381--6385},
- year={2019},
- organization={IEEE}
- }
-
- @misc{wav2vec2,
- doi = {10.48550/ARXIV.2006.11477},
- url = {https://arxiv.org/abs/2006.11477},
- author = {Baevski, Alexei and Zhou, Henry and Mohamed, Abdelrahman and Auli, Michael},
- title = {wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
- publisher = {arXiv},
- year = {2020},
- copyright = {arXiv.org perpetual, non-exclusive license}
- }
-
- @misc{w2v_bert,
- doi = {10.48550/ARXIV.2108.06209},
- url = {https://arxiv.org/abs/2108.06209},
- author = {Chung, Yu-An and Zhang, Yu and Han, Wei and Chiu, Chung-Cheng and Qin, James and Pang, Ruoming and Wu, Yonghui},
- title = {W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training},
- publisher = {arXiv},
- year = {2021},
- copyright = {arXiv.org perpetual, non-exclusive license}
- }
-
- @misc{ssl_inter,
- doi = {10.48550/ARXIV.2112.08778},
- url = {https://arxiv.org/abs/2112.08778},
- author = {Wang, Chengyi and Wu, Yu and Chen, Sanyuan and Liu, Shujie and Li, Jinyu and Qian, Yao and Yang, Zhenglu},
- title = {Self-Supervised Learning for speech recognition with Intermediate layer supervision},
- publisher = {arXiv},
- year = {2021},
- copyright = {arXiv.org perpetual, non-exclusive license}
- }
-
- @misc{kim2022squeezeformer,
- doi = {10.48550/ARXIV.2206.00888},
- url = {https://arxiv.org/abs/2206.00888},
- author = {Kim, Sehoon and Gholami, Amir and Shaw, Albert and Lee, Nicholas and Mangalam, Karttikeya and Malik, Jitendra and Mahoney, Michael W. and Keutzer, Kurt},
- keywords = {Audio and Speech Processing (eess.AS), Computation and Language (cs.CL), Sound (cs.SD), FOS: Electrical engineering, electronic engineering, information engineering, FOS: Computer and information sciences},
- title = {Squeezeformer: An Efficient Transformer for Automatic Speech Recognition},
- publisher = {arXiv},
- year = {2022},
- copyright = {arXiv.org perpetual, non-exclusive license}
- }
-
- @misc{park2022multi,
- doi = {10.48550/ARXIV.2203.15974},
- url = {https://arxiv.org/abs/2203.15974},
- author = {Park, Tae Jin and Koluguri, Nithin Rao and Balam, Jagadeesh and Ginsburg, Boris},
- keywords = {Audio and Speech Processing (eess.AS), Computation and Language (cs.CL), FOS: Electrical engineering, electronic engineering, information engineering, FOS: Computer and information sciences},
- title = {Multi-scale Speaker Diarization with Dynamic Scale Weighting},
- publisher = {arXiv},
- year = {2022},
- copyright = {Creative Commons Attribution 4.0 International}
- }
-
- @inproceedings{vaswani2017aayn,
- title={Attention is all you need},
- author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
- booktitle={Advances in Neural Information Processing Systems},
- pages={6000--6010},
- year={2017}
- }
NeMo-2.2.0/docs/source/asr/asr_language_modeling_and_customization.rst DELETED
@@ -1,663 +0,0 @@
- #######################################
- ASR Language Modeling and Customization
- #######################################
-
- Language models have been shown to improve the accuracy of ASR models. NeMo supports the following two approaches to incorporate language models into ASR models:
-
- * :ref:`ngram_modeling`
- * :ref:`neural_rescoring`
-
- It is possible to use both approaches on the same ASR model.
-
-
- .. _ngram_modeling:
-
- ************************
- N-gram Language Modeling
- ************************
-
- In this approach, an N-gram LM is trained on text data and then used in fusion with beam search decoding to find the
- best candidates. The beam search decoders in NeMo support language models trained with the KenLM library
- (`https://github.com/kpu/kenlm <https://github.com/kpu/kenlm>`__).
- The beam search decoders and the KenLM library are not installed by default in NeMo; you need to install them to be able to use beam search decoding with an N-gram LM.
- Please refer to `scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh>`__
- for how to install them. Alternatively, you can build the Docker image
- `scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA/NeMo/blob/stable/scripts/installers/Dockerfile.ngramtools>`__ with all the necessary dependencies.
-
- NeMo supports both character-based and BPE-based models for N-gram LMs. An N-gram LM can be used with beam search
- decoders on top of ASR models to produce more accurate candidates. The beam search decoder incorporates
- the scores produced by the N-gram LM into its score calculation as follows:
-
- .. code-block::
-
-     final_score = acoustic_score + beam_alpha*lm_score + beam_beta*seq_length
-
- where ``acoustic_score`` is the score predicted by the acoustic encoder and ``lm_score`` is the one estimated by the LM.
- The parameter ``beam_alpha`` determines the weight given to the N-gram language model, while ``beam_beta`` is a penalty term that accounts for sequence length in the scores. A larger ``beam_alpha`` places more emphasis on the language model and less on the acoustic model. Negative values of ``beam_beta`` penalize longer sequences, encouraging the decoder to prefer shorter predictions; positive values favor longer candidates.
-
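For illustration, the scoring rule above can be sketched in plain Python (the variable names mirror the formula; the numeric values are made up):

```python
def fused_score(acoustic_score: float, lm_score: float, seq_length: int,
                beam_alpha: float, beam_beta: float) -> float:
    """Combine acoustic and N-gram LM scores as in the beam search formula."""
    return acoustic_score + beam_alpha * lm_score + beam_beta * seq_length

# A larger beam_alpha weights the LM more heavily; a positive beam_beta
# favors longer candidate sequences.
score = fused_score(acoustic_score=-12.0, lm_score=-3.5, seq_length=7,
                    beam_alpha=1.0, beam_beta=0.5)
```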
- .. _train-ngram-lm:
-
- Train N-gram LM
- ===============
-
- The script to train an N-gram language model with KenLM can be found at:
- `scripts/asr_language_modeling/ngram_lm/train_kenlm.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/train_kenlm.py>`__.
-
- This script trains an N-gram language model with the KenLM library, which can then be used with the beam search decoders on top of ASR models. The script supports both character-level and BPE-level encodings; the model type is detected automatically.
-
- You can train an N-gram model as follows:
-
- .. code-block::
-
-     python train_kenlm.py nemo_model_file=<path to the .nemo file of the model> \
-         train_paths=<list of paths to the training text or JSON manifest files> \
-         kenlm_bin_path=<path to the bin folder of KenLM library> \
-         kenlm_model_file=<path to store the binary KenLM model> \
-         ngram_length=<order of N-gram model> \
-         preserve_arpa=true
-
- The ``train_paths`` parameter accepts various input types, such as a list of text files, JSON manifests, or directories, as the training data.
- If a file's extension is anything other than ``.json``, the data format is assumed to be plain text, with one sample per line.
- For JSON manifests, the file must contain one JSON-formatted sample per line, like this:
-
- .. code-block::
-
-     {"audio_filepath": "/data_path/file1.wav", "text": "The transcript of the audio file."}
-
- The script extracts the ``text`` field from each line to create the training text file. After the N-gram model is trained, it is stored at the path specified by ``kenlm_model_file``.
-
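A minimal sketch of what this extraction step does (hypothetical input lines; the actual script handles more input types and tokenization):

```python
import json

def extract_training_text(manifest_lines):
    """Pull the `text` field out of JSON-manifest lines; plain-text lines pass through."""
    texts = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        try:
            texts.append(json.loads(line)["text"])
        except json.JSONDecodeError:
            texts.append(line)  # plain-text input: one sample per line
    return texts

lines = ['{"audio_filepath": "/data_path/file1.wav", "text": "the transcript"}',
         "a plain text sample"]
print(extract_training_text(lines))  # ['the transcript', 'a plain text sample']
```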
- The following is the list of the arguments for the training script:
-
- .. list-table::
-    :header-rows: 1
-
-    * - Argument
-      - Type
-      - Default
-      - Description
-    * - nemo_model_file
-      - str
-      - Required
-      - The path to the ``.nemo`` file of the ASR model, or the name of a pretrained NeMo model, used to extract the tokenizer.
-    * - train_paths
-      - List[str]
-      - Required
-      - List of training files or folders. Files can be plain text, ``.json`` manifests, or ``.json.gz``.
-    * - kenlm_model_file
-      - str
-      - Required
-      - The path to store the KenLM binary model file.
-    * - kenlm_bin_path
-      - str
-      - Required
-      - The path to the ``bin`` folder of KenLM (the folder named ``bin`` under where KenLM is installed).
-    * - ngram_length
-      - int
-      - Required
-      - Specifies the order of the N-gram LM.
-    * - ngram_prune
-      - List[int]
-      - [0]
-      - List of thresholds to prune N-grams. Example: [0,0,1]. See the Pruning section at https://kheafield.com/code/kenlm/estimation
-    * - cache_path
-      - str
-      - ``""``
-      - Cache path to save tokenized files.
-    * - preserve_arpa
-      - bool
-      - ``False``
-      - Whether to preserve the intermediate ARPA file after the binary file is built.
-    * - verbose
-      - int
-      - 1
-      - Verbosity level.
-
- .. note::
-     It is recommended that you use 6 as the order of the N-gram model for BPE-based models. Higher orders may require re-compiling KenLM to support them.
-
- Evaluate by Beam Search Decoding and N-gram LM
- ==============================================
-
- NeMo's beam search decoders are capable of using KenLM's N-gram models to find the best candidates.
- The script to evaluate an ASR model with beam search decoding and N-gram models can be found at
- `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
-
- This script has a large number of possible argument overrides; therefore, it is recommended that you use ``python eval_beamsearch_ngram_ctc.py --help`` to see the full list of arguments.
-
- You can evaluate an ASR model as follows:
-
- .. code-block::
-
-     python eval_beamsearch_ngram_ctc.py nemo_model_file=<path to the .nemo file of the model> \
-         input_manifest=<path to the evaluation JSON manifest file> \
-         kenlm_model_file=<path to the binary KenLM model> \
-         beam_width=[<list of the beam widths, separated with commas>] \
-         beam_alpha=[<list of the beam alphas, separated with commas>] \
-         beam_beta=[<list of the beam betas, separated with commas>] \
-         preds_output_folder=<optional folder to store the predictions> \
-         probs_cache_file=null \
-         decoding_mode=beamsearch_ngram \
-         decoding_strategy="<beam library such as beam, pyctcdecode or flashlight>"
-
- The script can evaluate a model in the following three modes, set via the argument ``decoding_mode``:
-
- * greedy: Only greedy decoding is performed; no beam search decoding is done.
- * beamsearch: Beam search decoding is performed, but without the N-gram language model. Final results are equivalent to setting the weight of the LM (``beam_alpha``) to zero.
- * beamsearch_ngram: Beam search decoding is performed with an N-gram LM.
-
- In ``beamsearch`` mode, the evaluation is performed using beam search decoding without any language model. The performance is reported in terms of Word Error Rate (WER) and Character Error Rate (CER). Moreover, the WER/CER of the best candidate among all candidates is also reported, which can serve as an indicator of the quality of the predicted candidates.
-
- The script initially loads the ASR model and predicts the outputs of the model's encoder as log probabilities. This part is computed in batches on the device specified by ``device``, which can be either a CPU (``device=cpu``) or a single GPU (``device=cuda:0``).
- The batch size for this part is specified by ``acoustic_batch_size``. Using the largest feasible batch size can speed up the calculation of log probabilities. Additionally, you can use ``use_amp`` to accelerate the calculation and allow for larger ``acoustic_batch_size`` values.
- Currently, multi-GPU support is not available for calculating log probabilities. However, using ``probs_cache_file`` can help: this option stores the log probabilities produced by the model's encoder in a pickle file, allowing you to skip the first step in future runs.
-
- The following is the list of the important arguments for the evaluation script:
-
- .. list-table::
-    :header-rows: 1
-
-    * - Argument
-      - Type
-      - Default
-      - Description
-    * - nemo_model_file
-      - str
-      - Required
-      - The path of the ``.nemo`` file of the ASR model, used to extract the tokenizer.
-    * - input_manifest
-      - str
-      - Required
-      - Path to the evaluation file; it can be a text file or a JSON manifest.
-    * - kenlm_model_file
-      - str
-      - Required
-      - The path of the KenLM binary model file.
-    * - preds_output_folder
-      - str
-      - None
-      - The path to an optional folder to store the predictions.
-    * - probs_cache_file
-      - str
-      - None
-      - The cache file for storing the outputs of the model.
-    * - acoustic_batch_size
-      - int
-      - 16
-      - The batch size used to calculate log probabilities.
-    * - use_amp
-      - bool
-      - False
-      - Whether to use AMP, if available, to calculate log probabilities.
-    * - device
-      - str
-      - cuda
-      - The device to load the model onto to calculate log probabilities. It can be ``cpu``, ``cuda``, ``cuda:0``, ``cuda:1``, ...
-    * - decoding_mode
-      - str
-      - beamsearch_ngram
-      - The decoding scheme to be used for evaluation.
-    * - beam_width
-      - float
-      - Required
-      - List of the widths of the beam search decoding.
-    * - beam_alpha
-      - float
-      - Required
-      - List of the alpha parameters for the beam search decoding.
-    * - beam_beta
-      - float
-      - Required
-      - List of the beta parameters for the beam search decoding.
-    * - beam_batch_size
-      - int
-      - 128
-      - The batch size used for beam search decoding. A larger batch size can be a little faster, but uses more memory.
-    * - decoding_strategy
-      - str
-      - beam
-      - String argument for the type of decoding strategy for the model.
-    * - decoding
-      - Dict Config
-      - BeamCTCInferConfig
-      - Subdict of beam search configs. Values can be found via ``python eval_beamsearch_ngram_ctc.py --help``.
-    * - text_processing.do_lowercase
-      - bool
-      - ``False``
-      - Whether to make the evaluation text all lower case.
-    * - text_processing.punctuation_marks
-      - str
-      - ``""``
-      - String with punctuation marks to process. Example: ".\,?"
-    * - text_processing.rm_punctuation
-      - bool
-      - ``False``
-      - Whether to remove punctuation marks from text.
-    * - text_processing.separate_punctuation
-      - bool
-      - ``True``
-      - Whether to separate punctuation from the previous word by a space.
-
- The width of the beam search (``beam_width``) specifies the number of top candidates the beam search decoder will consider. Larger beam widths result in more accurate but slower predictions.
-
- .. note::
-
-     The ``eval_beamsearch_ngram_ctc.py`` script contains the entire subconfig used for CTC beam decoding.
-     Therefore, it is possible to forward arguments for various beam search libraries such as ``flashlight``
-     and ``pyctcdecode`` via the ``decoding`` subconfig.
-
- To learn more about evaluating ASR models with an N-gram LM, refer to the tutorial
- `Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Offline_ASR.ipynb>`_.
-
- Beam Search Engines
- -------------------
-
- NeMo ASR CTC supports multiple beam search engines for decoding. The default engine is ``beam``, which is the OpenSeq2Seq decoding library.
-
- OpenSeq2Seq (``beam``)
- ~~~~~~~~~~~~~~~~~~~~~~
-
- A CPU-based beam search engine that is quite efficient and supports char and subword models. It requires a character/subword
- KenLM model to be provided.
-
- The config for this decoding library is described above.
-
- Flashlight (``flashlight``)
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- Flashlight is a C++ library for ASR decoding provided at `https://github.com/flashlight/flashlight <https://github.com/flashlight/flashlight>`_. It is a CPU- and CUDA-based beam search engine that is quite efficient and supports char and subword models. It requires an ARPA KenLM file.
-
- It supports several advanced features, such as lexicon-based decoding, lexicon-free decoding, beam pruning thresholds, and more.
-
- .. code-block:: python
-
-     @dataclass
-     class FlashlightConfig:
-         lexicon_path: Optional[str] = None
-         boost_path: Optional[str] = None
-         beam_size_token: int = 16
-         beam_threshold: float = 20.0
-         unk_weight: float = -math.inf
-         sil_weight: float = 0.0
-
- .. code-block::
-
-     # Lexicon-based decoding
-     python eval_beamsearch_ngram_ctc.py ... \
-         decoding_strategy="flashlight" \
-         decoding.beam.flashlight_cfg.lexicon_path='/path/to/lexicon.lexicon' \
-         decoding.beam.flashlight_cfg.beam_size_token=32 \
-         decoding.beam.flashlight_cfg.beam_threshold=25.0
-
-     # Lexicon-free decoding
-     python eval_beamsearch_ngram_ctc.py ... \
-         decoding_strategy="flashlight" \
-         decoding.beam.flashlight_cfg.beam_size_token=32 \
-         decoding.beam.flashlight_cfg.beam_threshold=25.0
-
- PyCTCDecode (``pyctcdecode``)
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- PyCTCDecode is a Python library for ASR decoding provided at `https://github.com/kensho-technologies/pyctcdecode <https://github.com/kensho-technologies/pyctcdecode>`_. It is a CPU-based beam search engine that is somewhat efficient for a pure Python library, and supports char and subword models. It requires a character/subword KenLM ARPA or binary model to be provided.
-
- It has advanced features, such as word boosting, which can be useful for transcript customization.
-
- .. code-block:: python
-
-     @dataclass
-     class PyCTCDecodeConfig:
-         beam_prune_logp: float = -10.0
-         token_min_logp: float = -5.0
-         prune_history: bool = False
-         hotwords: Optional[List[str]] = None
-         hotword_weight: float = 10.0
-
- .. code-block::
-
-     # PyCTCDecode decoding
-     python eval_beamsearch_ngram_ctc.py ... \
-         decoding_strategy="pyctcdecode" \
-         decoding.beam.pyctcdecode_cfg.beam_prune_logp=-10.0 \
-         decoding.beam.pyctcdecode_cfg.token_min_logp=-5.0 \
-         decoding.beam.pyctcdecode_cfg.hotwords=[<list of words>] \
-         decoding.beam.pyctcdecode_cfg.hotword_weight=10.0
-
-
- Hyperparameter Grid Search
- --------------------------
-
- Beam search decoding with an N-gram LM has three main hyperparameters: ``beam_width``, ``beam_alpha``, and ``beam_beta``.
- The accuracy of the model depends on the values of these parameters, specifically ``beam_alpha`` and ``beam_beta``. To perform a grid search, you can specify a single value or a list of values for each of these parameters; beam search decoding is then performed on all combinations of the three hyperparameters.
- For example, the following set of parameters would result in 2*1*2=4 beam search decodings:
-
- .. code-block::
-
-     python eval_beamsearch_ngram_ctc.py ... \
-         beam_width=[64,128] \
-         beam_alpha=[1.0] \
-         beam_beta=[1.0,0.5]
-
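The number of decodings is just the product of the list lengths; a quick Python check of the example above:

```python
from itertools import product

beam_width = [64, 128]
beam_alpha = [1.0]
beam_beta = [1.0, 0.5]

# Every (width, alpha, beta) combination is decoded once: 2 * 1 * 2 = 4 runs.
combos = list(product(beam_width, beam_alpha, beam_beta))
print(len(combos))  # 4
```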
-
- Beam Search N-gram Decoding for Transducer Models (RNNT and HAT)
- ==================================================================
-
- A similar script to evaluate an RNNT/HAT model with beam search decoding and N-gram models can be found at:
- `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py>`_
-
- .. code-block::
-
-     python eval_beamsearch_ngram_transducer.py nemo_model_file=<path to the .nemo file of the model> \
-         input_manifest=<path to the evaluation JSON manifest file> \
-         kenlm_model_file=<path to the binary KenLM model> \
-         beam_width=[<list of the beam widths, separated with commas>] \
-         beam_alpha=[<list of the beam alphas, separated with commas>] \
-         preds_output_folder=<optional folder to store the predictions> \
-         probs_cache_file=null \
-         decoding_strategy=<greedy_batch or maes decoding> \
-         maes_prefix_alpha=[<list of the maes prefix alphas, separated with commas>] \
-         maes_expansion_gamma=[<list of the maes expansion gammas, separated with commas>] \
-         hat_subtract_ilm=<for HAT models: whether to subtract the internal LM (True/False)> \
-         hat_ilm_weight=[<for HAT models: list of the HAT internal LM weights, separated with commas>]
-
-
- .. _neural_rescoring:
-
- ****************
- Neural Rescoring
- ****************
-
- In the neural rescoring approach, a neural network is used to score candidates. A candidate is a text transcript predicted by the ASR model's decoder. The top K candidates produced by beam search decoding (with a beam width of K) are given to a neural language model for ranking. The language model assigns a score to each candidate, which is usually combined with the scores from beam search decoding to produce the final scores and rankings.
-
- Train Neural Rescorer
- =====================
-
- An example script to train such a language model with a Transformer can be found at `examples/nlp/language_modeling/transformer_lm.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/transformer_lm.py>`__.
- It trains a ``TransformerLMModel``, which can be used as a neural rescorer for an ASR system. Full documentation on language model training is available at
- :doc:`../nlp/language_modeling`.
-
- You can also use a pretrained language model from the Hugging Face library, such as Transformer-XL or GPT, instead of training your own model.
- Models like BERT and RoBERTa are not supported by this script because they are trained as masked language models; as a result, they are not efficient or effective for scoring sentences out of the box.
-
- Evaluation
- ==========
-
- Given a trained ``TransformerLMModel`` ``.nemo`` file or a pretrained HF model, the script available at
- `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__
- can be used to re-score beams obtained with an ASR model. To use this script, you need a ``.tsv`` file containing the candidates produced
- by the acoustic model and beam search decoding. The candidates can be the result of beam
- search decoding alone or of fusion with an N-gram LM. You can generate this file by specifying ``preds_output_folder`` for
- `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
-
- The neural rescorer re-scores the beams/candidates using the two parameters ``rescorer_alpha`` and ``rescorer_beta``, as follows:
-
- .. code-block::
-
-     final_score = beam_search_score + rescorer_alpha*neural_rescorer_score + rescorer_beta*seq_length
-
- The parameter ``rescorer_alpha`` specifies the importance placed on the neural rescorer model, while ``rescorer_beta`` is a penalty term that accounts for sequence length in the scores. These parameters have effects similar to ``beam_alpha`` and ``beam_beta`` in the beam search decoder with an N-gram language model.
-
- Use the following steps to evaluate a neural LM:
-
- #. Obtain a ``.tsv`` file with beams and their corresponding scores. Scores can come from a regular beam search decoder or
-    from fusion with N-gram LM scores. For a given beam size ``beam_size`` and a number of examples
-    for evaluation ``num_eval_examples``, it should contain (``num_eval_examples`` x ``beam_size``) lines of the
-    form ``beam_candidate_text \t score``. This file can be generated by `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
-
- #. Rescore the candidates with `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__:
-
-    .. code-block::
-
-        python eval_neural_rescorer.py \
-            --lm_model=[path to .nemo file of the LM or the name of a HF pretrained model] \
-            --beams_file=[path to beams .tsv file] \
-            --beam_size=[size of the beams] \
-            --eval_manifest=[path to eval manifest .json file] \
-            --batch_size=[batch size used for inference on the LM model] \
-            --alpha=[the value for the parameter rescorer_alpha] \
-            --beta=[the value for the parameter rescorer_beta] \
-            --scores_output_file=[the optional path to store the rescored candidates]
-
- The candidates, along with their new scores, are stored in the file specified by ``--scores_output_file``.
-
- The following is the list of the arguments for the evaluation script:
-
- .. list-table::
-    :header-rows: 1
-
-    * - Argument
-      - Type
-      - Default
-      - Description
-    * - lm_model
-      - str
-      - Required
-      - The path of the ``.nemo`` file of the LM, or the name of a Hugging Face pretrained model like ``transfo-xl-wt103`` or ``gpt2``.
-    * - eval_manifest
-      - str
-      - Required
-      - Path to the evaluation manifest file (``.json`` manifest file).
-    * - beams_file
-      - str
-      - Required
-      - Path to the beams file (``.tsv``) containing the candidates and their scores.
-    * - beam_size
-      - int
-      - Required
-      - The width of the beams (number of candidates) generated by the decoder.
-    * - alpha
-      - float
-      - None
-      - The value for the parameter ``rescorer_alpha``. Not passing a value enables linear search for ``rescorer_alpha``.
-    * - beta
-      - float
-      - None
-      - The value for the parameter ``rescorer_beta``. Not passing a value enables linear search for ``rescorer_beta``.
-    * - batch_size
-      - int
-      - 16
-      - The batch size used to calculate the scores.
-    * - max_seq_length
-      - int
-      - 512
-      - Maximum sequence length (in tokens) for the input.
-    * - scores_output_file
-      - str
-      - None
-      - The optional file to store the rescored beams.
-    * - use_amp
-      - bool
-      - ``False``
-      - Whether to use AMP, if available, to calculate the scores.
-    * - device
-      - str
-      - cuda
-      - The device to load the LM model onto to calculate the scores. It can be ``cpu``, ``cuda``, ``cuda:0``, ``cuda:1``, ...
-
-
399
- Hyperparameter Linear Search
400
- ----------------------------
401
-
402
- The hyperparameter linear search script also supports linear search for parameters `alpha` and `beta`. If any of the two is not
403
- provided, a linear search is performed to find the best value for that parameter. When linear search is used, initially
404
- `beta` is set to zero and the best value for `alpha` is found, then `alpha` is fixed with
405
- that value and another linear search is done to find the best value for `beta`.
406
- If any of the of these two parameters is already specified, then search for that one is skipped. After each search for a
407
- parameter, the plot of WER% for different values of the parameter is also shown.
408
-
409
- It is recommended to first use the linear search for both parameters on a validation set by not providing any values for `--alpha` and `--beta`.
410
- Then check the WER curves and decide on the best values for each parameter. Finally, evaluate the best values on the test set.
411
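The coordinate-wise search described above can be sketched in a few lines. This is an illustrative sketch, not the NeMo script: `score_wer` is a hypothetical callable standing in for a full beam-search evaluation, and the toy WER surface and grid bounds are made up.

```python
def linear_search(score_wer, alphas, betas):
    """Coordinate-wise search: fix beta=0, pick the best alpha, then search beta."""
    best_alpha = min(alphas, key=lambda a: score_wer(a, 0.0))
    best_beta = min(betas, key=lambda b: score_wer(best_alpha, b))
    return best_alpha, best_beta

# Toy WER surface for illustration only (minimum at alpha=1.0, beta=0.5)
wer = lambda a, b: (a - 1.0) ** 2 + (b - 0.5) ** 2
alphas = [i * 0.25 for i in range(9)]  # 0.0 .. 2.0
betas = [i * 0.25 for i in range(9)]
print(linear_search(wer, alphas, betas))  # -> (1.0, 0.5)
```

Note that this coordinate-wise scheme is cheaper than a full 2-D grid but can miss the joint optimum when the parameters interact strongly.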
-
412
-
413
- Word Boosting
414
- =============
415
-
416
- The Flashlight decoder supports word boosting during CTC decoding using a KenLM binary and corresponding lexicon. Word boosting only works in lexicon-decoding mode and does not function in lexicon-free mode. It allows you to bias the decoder for certain words by manually increasing or decreasing the probability of emitting specific words. This can be very helpful if you have uncommon or industry-specific terms that you want to ensure are transcribed correctly.
417
-
418
- For more information, see `word boosting <https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#word-boosting>`__.
419
-
420
- To use word boosting in NeMo, create a simple tab-separated text file. Each line should contain a word to be boosted, followed by a tab, and then the boosted score for that word.
421
-
422
- For example:
423
-
424
- .. code-block::
425
-
426
- nvidia 40
427
- geforce 50
428
- riva 80
429
- turing 30
430
- badword -100
431
-
432
- Positive scores boost words higher in the LM decoding step so they show up more frequently, whereas negative scores
433
- squelch words so they show up less frequently. The recommended range for the boost score is +/- 20 to 100.
434
-
435
- The boost file handles both in-vocabulary (IV) and out-of-vocabulary (OOV) words, so you can specify both with their corresponding scores.
436
-
437
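As a quick sanity check, the tab-separated boost file can be generated programmatically. The word/score pairs below are the illustrative values from the example above; the file name is arbitrary.

```python
# Illustrative word -> boost-score pairs (positive boosts, negative squelches)
boosts = {"nvidia": 40, "geforce": 50, "riva": 80, "turing": 30, "badword": -100}

with open("my_boost_file.boost", "w") as f:
    for word, score in boosts.items():
        # One word per line, separated from its score by a tab
        f.write(f"{word}\t{score}\n")
```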
- You can then pass this file to your Flashlight config object during decoding:
438
-
439
- .. code-block::
440
-
441
- # Lexicon-based decoding
442
- python eval_beamsearch_ngram_ctc.py ... \
443
- decoding_strategy="flashlight" \
444
- decoding.beam.flashlight_cfg.lexicon_path='/path/to/lexicon.lexicon' \
445
- decoding.beam.flashlight_cfg.boost_path='/path/to/my_boost_file.boost' \
446
- decoding.beam.flashlight_cfg.beam_size_token=32 \
- decoding.beam.flashlight_cfg.beam_threshold=25.0
448
-
449
-
450
- Combine N-gram Language Models
451
- ==============================
452
-
453
- Before combining N-gram LMs, install the required OpenGrm NGram library using `scripts/installers/install_opengrm.sh <https://github.com/NVIDIA/NeMo/blob/stable/scripts/installers/install_opengrm.sh>`__.
454
- Alternatively, you can use the Docker image `scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA/NeMo/blob/stable/scripts/installers/Dockerfile.ngramtools>`__, which includes all the necessary dependencies.
- 
- The merging script is located at
- `scripts/asr_language_modeling/ngram_lm/ngram_merge.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/ngram_merge.py>`__.
458
-
459
- This script interpolates two ARPA N-gram language models and creates a KenLM binary file that can be used with the beam search decoders on top of ASR models.
460
- You can specify weights (`--alpha` and `--beta`) for each of the models (`--ngram_a` and `--ngram_b`) correspondingly: `alpha` * `ngram_a` + `beta` * `ngram_b`.
461
- This script supports both character-level and BPE-level encodings, which are detected automatically from the model type.
462
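Conceptually, the merge computes a weighted mixture of the two models' probabilities. The sketch below is a toy scalar version of that mixture, not the OpenGrm implementation (which interpolates in FST space and renormalizes per n-gram context); the probability values are made up.

```python
def interpolate(p_a, p_b, alpha, beta):
    """Linear interpolation of two n-gram probabilities: alpha * p_a + beta * p_b."""
    return alpha * p_a + beta * p_b

# e.g. mixing two probability estimates of the same n-gram with equal weights
mixed = interpolate(0.10, 0.30, 0.5, 0.5)
print(mixed)
```

For the mixture itself to remain a probability distribution, the weights are typically chosen so that `alpha + beta = 1`.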
-
463
- To combine two N-gram models, you can use the following command:
464
-
465
- .. code-block::
466
-
467
- python ngram_merge.py --kenlm_bin_path <path to the bin folder of KenLM library> \
468
- --ngram_bin_path <path to the bin folder of OpenGrm Ngram library> \
469
- --arpa_a <path to the ARPA N-gram model file A> \
470
- --alpha <weight of N-gram model A> \
471
- --arpa_b <path to the ARPA N-gram model file B> \
472
- --beta <weight of N-gram model B> \
473
- --out_path <path to folder to store the output files>
474
-
475
-
476
-
477
- If you provide `--test_file` and `--nemo_model_file`, the script will also calculate the perplexity of the resulting N-gram model on the test set.
- Note that the result of each step of the process is cached in a temporary file in the `--out_path` directory to speed up further runs.
- You can use the `--force` flag to discard the cache and recalculate everything from scratch.
480
-
481
- .. code-block::
482
-
483
- python ngram_merge.py --kenlm_bin_path <path to the bin folder of KenLM library> \
484
- --ngram_bin_path <path to the bin folder of OpenGrm Ngram library> \
485
- --arpa_a <path to the ARPA N-gram model file A> \
486
- --alpha <weight of N-gram model A> \
487
- --arpa_b <path to the ARPA N-gram model file B> \
488
- --beta <weight of N-gram model B> \
489
- --out_path <path to folder to store the output files> \
490
- --nemo_model_file <path to the .nemo file of the model> \
491
- --test_file <path to the test file> \
492
- --symbols <path to symbols (.syms) file> \
493
- --force <flag to recalculate and rewrite all cached files>
494
-
495
-
496
- The following is the list of the arguments for the opengrm script:
497
-
498
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
499
- | **Argument** |**Type**| **Default** | **Description** |
500
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
501
- | kenlm_bin_path | str | Required | The path to the bin folder of KenLM library. It is a folder named `bin` under where KenLM is installed. |
502
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
503
- | ngram_bin_path | str | Required | The path to the bin folder of OpenGrm Ngram. It is a folder named `bin` under where OpenGrm Ngram is installed. |
504
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
505
- | arpa_a | str | Required | Path to the ARPA N-gram model file A. |
506
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
507
- | alpha | float | Required | Weight of N-gram model A. |
508
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
509
- | arpa_b | str | Required | Path to the ARPA N-gram model file B. |
510
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
511
- | beta | float | Required | Weight of N-gram model B. |
512
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
513
- | out_path | str | Required | Path for writing temporary and resulting files. |
514
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
515
- | test_file | str | None | Path to test file to count perplexity if provided. |
516
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
517
- | symbols | str | None | Path to the symbols (.syms) file. It will be computed if not provided. |
518
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
519
- | nemo_model_file | str | None | The path to '.nemo' file of the ASR model, or name of a pretrained NeMo model. |
520
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
521
- | force | bool | ``False`` | Whether to recompile and rewrite all files. |
522
- +----------------------+--------+------------------+-----------------------------------------------------------------------------------------------------------------+
523
-
524
- .. _wfst-ctc-decoding:
525
-
526
- WFST CTC decoding
527
- =================
528
- Weighted Finite-State Transducers (WFST) are finite-state machines with input and output symbols on each transition and some weight element of a semiring. WFSTs can act as N-gram LMs in a special type of LM-forced beam search, called WFST decoding.
529
-
530
- .. note::
531
-
532
- More precisely, WFST decoding is more of a greedy N-depth search with LM.
533
- Thus, it is asymptotically worse than conventional beam search decoding algorithms, but faster.
534
-
535
- .. warning::
- 
- At the moment, NeMo supports WFST decoding only for CTC models and word-based LMs.
537
-
538
- To run WFST decoding in NeMo, one needs to provide a NeMo ASR model and either an ARPA LM or a WFST LM (advanced). An ARPA LM can be built from source text with KenLM as follows: ``<kenlm_bin_path>/lmplz -o <ngram_length> --arpa <out_arpa_path> --prune <ngram_prune>``.
539
-
540
- The script to evaluate an ASR model with WFST decoding and N-gram models can be found at
541
- `scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py
542
- <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py>`__.
543
-
544
- This script has a large number of possible argument overrides; therefore, it is advised to use ``python eval_wfst_decoding_ctc.py --help`` to see the full list of arguments.
545
-
546
- You can evaluate an ASR model as follows:
547
-
548
- .. code-block::
549
-
550
- python eval_wfst_decoding_ctc.py nemo_model_file=<path to the .nemo file of the model> \
551
- input_manifest=<path to the evaluation JSON manifest file> \
552
- arpa_model_file=<path to the ARPA LM model> \
553
- decoding_wfst_file=<path to the decoding WFST file> \
554
- beam_width=[<list of the beam widths, separated with commas>] \
555
- lm_weight=[<list of the LM weight multipliers, separated with commas>] \
556
- open_vocabulary_decoding=<whether to use open vocabulary mode for WFST decoding> \
557
- decoding_mode=<decoding mode, affects output. Usually "nbest"> \
558
- decoding_search_type=<WFST decoding library. Usually "riva"> \
559
- preds_output_folder=<optional folder to store the predictions> \
560
- probs_cache_file=null
561
-
562
- .. note::
563
-
564
- Since WFST decoding is LM-forced (the search goes over the WFST graph), only word sequences accepted by the WFST can appear in the decoding results.
565
- To circumvent this restriction, one can pass ``open_vocabulary_decoding=true`` (experimental feature).
566
-
567
-
568
- Quick start example
569
- -------------------
570
-
571
- .. code-block::
572
-
573
- wget -O - https://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz | \
574
- gunzip -c | tr '[:upper:]' '[:lower:]' > 3-gram.pruned.1e-7.arpa && \
575
- python eval_wfst_decoding_ctc.py nemo_model_file="stt_en_conformer_ctc_small_ls" \
576
- input_manifest="<data_dir>/Librispeech/test_other.json" \
577
- arpa_model_file="3-gram.pruned.1e-7.arpa" \
578
- decoding_wfst_file="3-gram.pruned.1e-7.fst" \
579
- beam_width=[8] \
580
- lm_weight=[0.5,0.6,0.7,0.8,0.9]
581
-
582
- .. note::
583
-
584
- Building a decoding WFST is a long process, so it is better to provide a ``decoding_wfst_file`` path even if you don't have it.
585
- This way, the decoding WFST will be buffered to the specified file path and there will be no need to re-build it on the next run.
586
-
587
-
588
- ***************************************************
589
- Context-biasing (Word Boosting) without External LM
590
- ***************************************************
591
-
592
- The NeMo toolkit supports a fast context-biasing method for CTC and Transducer (RNN-T) ASR models with a CTC-based Word Spotter (CTC-WS).
593
- The method involves decoding CTC log probabilities with a context graph built for words and phrases from the context-biasing list.
594
- The spotted context-biasing candidates (with their scores and time intervals) are compared by score with words from the greedy CTC decoding results to improve recognition accuracy and prevent false accepts of context-biasing.
595
-
596
- A Hybrid Transducer-CTC model (a shared encoder trained together with CTC and Transducer output heads) enables the use of the CTC-WS method for the Transducer model.
597
- Context-biasing candidates obtained by CTC-WS are also filtered by the scores with greedy CTC predictions and then merged with greedy Transducer results.
598
-
599
- Scheme of the CTC-WS method:
600
-
601
- .. image:: https://github.com/NVIDIA/NeMo/releases/download/v1.22.0/asset-post-v1.22.0-ctcws_scheme_1.png
602
- :align: center
603
- :alt: CTC-WS scheme
604
- :width: 80%
605
-
606
- High-level overview of the context-biasing words replacement with CTC-WS method:
607
-
608
- .. image:: https://github.com/NVIDIA/NeMo/releases/download/v1.22.0/asset-post-v1.22.0-ctcws_scheme_2.png
609
- :align: center
610
- :alt: CTC-WS high level overview
611
- :width: 80%
612
-
613
- More details about CTC-WS context-biasing can be found in the `tutorial <https://github.com/NVIDIA/NeMo/tree/main/tutorials/asr/ASR_Context_Biasing.ipynb>`__.
614
-
615
- To use CTC-WS context-biasing, you need to create a context-biasing text file that contains the words/phrases to be boosted, with their transcriptions (spellings) separated by underscores.
- Multiple transcriptions can be useful for abbreviations ("gpu" -> "g p u"), compound words ("nvlink" -> "nv link"),
- or words that the ASR model commonly misrecognizes ("nvidia" -> "n video").
618
-
619
- Example of the context-biasing file:
620
-
621
- .. code-block::
622
-
623
- nvidia_nvidia
624
- omniverse_omniverse
625
- gpu_gpu_g p u
626
- dgx_dgx_d g x_d gx
627
- nvlink_nvlink_nv link
628
- ray tracing_ray tracing
629
-
630
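A small helper can assemble such a file from a dict mapping each phrase to its alternative spellings. The entries below are illustrative, and the file name is arbitrary; the format matches the underscore-separated example above.

```python
# Illustrative phrase -> alternative spellings (the phrase itself is always included)
cb_entries = {
    "gpu": ["gpu", "g p u"],
    "nvlink": ["nvlink", "nv link"],
    "nvidia": ["nvidia"],
}

with open("context_biasing.txt", "w") as f:
    for word, spellings in cb_entries.items():
        # Phrase followed by its spellings, all joined with underscores
        f.write("_".join([word] + spellings) + "\n")
```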
- The main script for CTC-WS context-biasing in NeMo is:
631
-
632
- .. code-block::
633
-
634
- {NEMO_DIR_PATH}/scripts/asr_context_biasing/eval_greedy_decoding_with_context_biasing.py
635
-
636
- Context-biasing is managed by ``apply_context_biasing`` parameter [true or false].
637
- Other important context-biasing parameters are:
638
-
639
- * ``beam_threshold`` - threshold for CTC-WS beam pruning.
640
- * ``context_score`` - per token weight for context biasing.
641
- * ``ctc_ali_token_weight`` - per token weight for CTC alignment (prevents false acceptances of context-biasing words).
642
-
643
- All the context-biasing parameters are selected according to the default values in the script.
644
- You can tune them according to your data and ASR model (list all the values in the [] separated by commas)
645
- for example: ``beam_threshold=[7.0,8.0,9.0]``, ``context_score=[3.0,4.0,5.0]``, ``ctc_ali_token_weight=[0.5,0.6,0.7]``.
646
- The script will run the recognition with all the combinations of the parameters and will select the best one based on WER value.
647
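The exhaustive sweep over parameter combinations amounts to a grid search. The sketch below is illustrative only: `eval_wer` is a hypothetical callable standing in for a full decoding run, and the toy WER function and grids are made up.

```python
from itertools import product

def grid_search(eval_wer, beam_thresholds, context_scores, ali_weights):
    """Try every parameter combination and keep the one with the lowest WER."""
    return min(
        product(beam_thresholds, context_scores, ali_weights),
        key=lambda params: eval_wer(*params),
    )

# Toy WER function for illustration only (minimum at 8.0, 4.0, 0.6)
wer = lambda bt, cs, aw: abs(bt - 8.0) + abs(cs - 4.0) + abs(aw - 0.6)
best = grid_search(wer, [7.0, 8.0, 9.0], [3.0, 4.0, 5.0], [0.5, 0.6, 0.7])
print(best)  # -> (8.0, 4.0, 0.6)
```

Because each combination requires a full decoding pass, the cost grows multiplicatively with the grid sizes, so keep the per-parameter lists short.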
-
648
- .. code-block::
649
-
650
- # Context-biasing with the CTC-WS method for CTC ASR model
651
- python {NEMO_DIR_PATH}/scripts/asr_context_biasing/eval_greedy_decoding_with_context_biasing.py \
652
- nemo_model_file={ctc_model_name} \
653
- input_manifest={test_nemo_manifest} \
654
- preds_output_folder={exp_dir} \
655
- decoder_type="ctc" \
656
- acoustic_batch_size=64 \
657
- apply_context_biasing=true \
658
- context_file={cb_list_file_modified} \
659
- beam_threshold=[7.0] \
660
- context_score=[3.0] \
661
- ctc_ali_token_weight=[0.5]
662
-
663
- To use the Transducer head of the Hybrid Transducer-CTC model, you need to set ``decoder_type=rnnt``.
NeMo-2.2.0/docs/source/asr/configs.rst DELETED
@@ -1,1122 +0,0 @@
1
- NeMo ASR Configuration Files
2
- ============================
3
-
4
- This section describes the NeMo configuration file setup that is specific to models in the ASR collection. For general information
5
- about how to set up and run experiments that is common to all NeMo models (e.g. Experiment Manager and PyTorch Lightning trainer
6
- parameters), see the :doc:`../core/core` section.
7
-
8
- The model section of the NeMo ASR configuration files generally requires information about the dataset(s) being used, the preprocessor
9
- for audio files, parameters for any augmentation being performed, as well as the model architecture specification. The sections on
10
- this page cover each of these in more detail.
11
-
12
- Example configuration files for all of the NeMo ASR scripts can be found in the
13
- `config directory of the examples <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/conf>`_.
14
-
15
-
16
- Dataset Configuration
17
- ---------------------
18
-
19
- Training, validation, and test parameters are specified using the ``train_ds``, ``validation_ds``, and
20
- ``test_ds`` sections in the configuration file, respectively. Depending on the task, there may be arguments specifying the sample rate
21
- of the audio files, the vocabulary of the dataset (for character prediction), whether or not to shuffle the dataset, and so on. You may
22
- also decide to leave fields such as the ``manifest_filepath`` blank, to be specified via the command-line at runtime.
23
-
24
- Any initialization parameter that is accepted for the Dataset class used in the experiment can be set in the config file.
25
- Refer to the `Datasets <./api.html#Datasets>`__ section of the API for a list of Datasets and their respective parameters.
26
-
27
- An example ASR train and validation configuration should look similar to the following:
28
-
29
- .. code-block:: yaml
30
-
31
- # Specified at the beginning of the config file
32
- labels: &labels [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
33
- "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
34
-
35
- model:
36
- train_ds:
37
- manifest_filepath: ???
38
- sample_rate: 16000
39
- labels: *labels # Uses the labels above
40
- batch_size: 32
41
- trim_silence: True
42
- max_duration: 16.7
43
- shuffle: True
44
- num_workers: 8
45
- pin_memory: true
46
- # tarred datasets
47
- is_tarred: false # If set to true, uses the tarred version of the Dataset
48
- tarred_audio_filepaths: null # Not used if is_tarred is false
49
- shuffle_n: 2048 # Not used if is_tarred is false
50
- # bucketing params
51
- bucketing_strategy: "synced_randomized"
52
- bucketing_batch_size: null
53
- bucketing_weights: null
54
-
55
- validation_ds:
56
- manifest_filepath: ???
57
- sample_rate: 16000
58
- labels: *labels # Uses the labels above
59
- batch_size: 32
60
- shuffle: False # No need to shuffle the validation data
61
- num_workers: 8
62
- pin_memory: true
63
-
64
- There are two ways to test/validate on more than one manifest:
65
-
66
- - Specify a list in the `manifest_filepath` field. Results will be reported for each, the first one being used for overall loss / WER (specify `val_dl_idx` if you wish to change that). In this case, all manifests will share configuration parameters.
67
- Use the ``ds_item`` key and pass a list of config objects to it. This allows you to use differently configured datasets for validation, e.g.
68
-
69
- .. code-block:: yaml
70
-
71
- model:
72
- validation_ds:
73
- ds_item:
74
- - name: dataset1
75
- manifest_filepath: ???
76
- # Config parameters for dataset1
77
- ...
78
- - name: dataset2
79
- manifest_filepath: ???
80
- # Config parameters for dataset2
81
- ...
82
-
83
- By default, dataloaders are set up when the model is instantiated. However, dataloader setup can be deferred to
84
- the model's ``setup()`` method by setting ``defer_setup`` in the configuration.
85
-
86
- For example, training data setup can be deferred as follows:
87
-
88
- .. code-block:: yaml
89
-
90
- model:
91
- train_ds:
92
- # Configure training data as usual
93
- ...
94
- # Defer train dataloader setup from `__init__` to `setup`
95
- defer_setup: true
96
-
97
-
98
- Preprocessor Configuration
99
- --------------------------
100
-
101
- If you are loading audio files for your experiment, you will likely want to use a preprocessor to convert from the
102
- raw audio signal to features (e.g. mel-spectrogram or MFCC). The ``preprocessor`` section of the config specifies the audio
103
- preprocessor to be used via the ``_target_`` field, as well as any initialization parameters for that preprocessor.
104
-
105
- An example of specifying a preprocessor is as follows:
106
-
107
- .. code-block:: yaml
108
-
109
- model:
110
- ...
111
- preprocessor:
112
- # _target_ is the audio preprocessor module you want to use
113
- _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
114
- normalize: "per_feature"
115
- window_size: 0.02
116
- ...
117
- # Other parameters for the preprocessor
118
-
119
- Refer to the `Audio Preprocessors <./api.html#Audio Preprocessors>`__ API section for the preprocessor options, expected arguments,
120
- and defaults.
121
-
122
- Augmentation Configurations
123
- ---------------------------
124
-
125
- There are a few on-the-fly spectrogram augmentation options for NeMo ASR, which can be specified by the
126
- configuration file using a ``spec_augment`` section.
127
-
128
- For example, there are options for `Cutout <https://arxiv.org/abs/1708.04552>`_ and
129
- `SpecAugment <https://arxiv.org/abs/1904.08779>`_ available via the ``SpectrogramAugmentation`` module.
130
-
131
- The following example sets up both ``Cutout`` (via the ``rect_*`` parameters) and ``SpecAugment`` (via the ``freq_*``
132
- and ``time_*`` parameters).
133
-
134
- .. code-block:: yaml
135
-
136
- model:
137
- ...
138
- spec_augment:
139
- _target_: nemo.collections.asr.modules.SpectrogramAugmentation
140
- # Cutout parameters
141
- rect_masks: 5 # Number of rectangles to cut from any given spectrogram
142
- rect_freq: 50 # Max cut of size 50 along the frequency dimension
143
- rect_time: 120 # Max cut of size 120 along the time dimension
144
- # SpecAugment parameters
145
- freq_masks: 2 # Cut two frequency bands
146
- freq_width: 15 # ... of width 15 at maximum
147
- time_masks: 5 # Cut five time bands
148
- time_width: 25 # ... of width 25 at maximum
149
-
150
- You can use any combination of ``Cutout``, frequency/time ``SpecAugment``, or neither of them.
151
-
152
- With NeMo ASR, you can also add augmentation pipelines that can be used to simulate various kinds of noise
153
- added to audio in the channel. Augmentors in a pipeline are applied on the audio data read in the data layer. Online
154
- augmentors can be specified in the config file using an ``augmentor`` section in ``train_ds``. The following example
155
- adds an augmentation pipeline that first adds white noise to an audio sample with a probability of 0.5 and at a level
156
- randomly picked between -50 dB and -10 dB and then passes the resultant samples through a room impulse response randomly
157
- picked from the manifest file provided for ``impulse`` augmentation in the config file.
158
-
159
- .. code-block:: yaml
160
-
161
- model:
162
- ...
163
- train_ds:
164
- ...
165
- augmentor:
166
- white_noise:
167
- prob: 0.5
168
- min_level: -50
169
- max_level: -10
170
- impulse:
171
- prob: 0.3
172
- manifest_path: /path/to/impulse_manifest.json
173
-
174
- Refer to the `Audio Augmentors <./api.html#Audio Augmentors>`__ API section for more details.
175
-
176
- Tokenizer Configurations
177
- ------------------------
178
-
179
- Some models utilize sub-word encoding via an external tokenizer instead of explicitly defining their vocabulary.
180
-
181
- For such models, a ``tokenizer`` section is added to the model config. ASR models currently support two types of
182
- custom tokenizers:
183
-
184
- - Google Sentencepiece tokenizers (tokenizer type of ``bpe`` in the config)
185
- - HuggingFace WordPiece tokenizers (tokenizer type of ``wpe`` in the config)
186
- - Aggregate tokenizers ((tokenizer type of ``agg`` in the config), see below)
187
-
188
- In order to build custom tokenizers, refer to the ``ASR_with_Subword_Tokenization`` notebook available in the
189
- ASR tutorials directory.
190
-
191
- The following example sets up a ``SentencePiece Tokenizer`` at a path specified by the user:
192
-
193
- .. code-block:: yaml
194
-
195
- model:
196
- ...
197
- tokenizer:
198
- dir: "<path to the directory that contains the custom tokenizer files>"
199
- type: "bpe" # can be "bpe" or "wpe"
200
-
201
- The Aggregate (``agg``) tokenizer feature makes it possible to combine tokenizers in order to train multilingual
202
- models. The config file would look like this:
203
-
204
- .. code-block:: yaml
205
-
206
- model:
207
- ...
208
- tokenizer:
209
- type: "agg" # aggregate tokenizer
210
- langs:
211
- en:
212
- dir: "<path to the directory that contains the tokenizer files>"
213
- type: "bpe" # can be "bpe" or "wpe"
214
- es:
215
- dir: "<path to the directory that contains the tokenizer files>"
216
- type: "bpe" # can be "bpe" or "wpe"
217
-
218
- In the above config file, each language is associated with its own pre-trained tokenizer, which gets assigned
219
- a token id range in the order the tokenizers are listed. To train a multilingual model, one needs to populate the
220
- ``lang`` field in the manifest file, allowing the routing of each sample to the correct tokenizer. At inference time,
221
- the routing is done based on the inferred token id range.
222
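Conceptually, the aggregate tokenizer concatenates the sub-tokenizers' vocabularies and routes each token id by the range it falls into. The sketch below illustrates only that routing idea; it is not the NeMo implementation, and the per-language vocabulary sizes are made up.

```python
# Hypothetical per-language vocabulary sizes, in config order
vocab_sizes = {"en": 1024, "es": 512}

# Assign each language a contiguous token-id range [lo, hi)
ranges, offset = {}, 0
for lang, size in vocab_sizes.items():
    ranges[lang] = (offset, offset + size)
    offset += size

def lang_of(token_id):
    """Route a token id to its language by the range it falls into."""
    for lang, (lo, hi) in ranges.items():
        if lo <= token_id < hi:
            return lang
    raise ValueError(f"token id {token_id} out of range")

print(lang_of(100))   # -> en
print(lang_of(1200))  # -> es
```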
-
223
- For models which utilize sub-word tokenization, we share the decoder module (``ConvASRDecoder``) with character tokenization models.
224
- All parameters are shared, but for models which utilize sub-word encoding, there are minor differences when setting up the config. For
225
- such models, the tokenizer is utilized to fill in the missing information when the model is constructed automatically.
226
-
227
- For example, a decoder config corresponding to a sub-word tokenization model should look similar to the following:
228
-
229
- .. code-block:: yaml
230
-
231
- model:
232
- ...
233
- decoder:
234
- _target_: nemo.collections.asr.modules.ConvASRDecoder
235
- feat_in: *enc_final
236
- num_classes: -1 # filled with vocabulary size from tokenizer at runtime
237
- vocabulary: [] # filled with vocabulary from tokenizer at runtime
238
-
239
-
240
- On-the-fly Code Switching
241
- -------------------------
242
-
243
- NeMo supports creating code-switched synthetic utterances on the fly during training/validation/testing. This allows you to create ASR models which
- support intra-utterance code switching. If you have NeMo-formatted audio data on disk (either JSON manifests or tarred audio data), you
245
- can easily mix as many of these audio sources together as desired by adding some extra parameters to your `train_ds`, `validation_ds`, and `test_ds`.
246
-
247
- Please note that this allows you to mix any kind of audio sources together to create synthetic utterances which sample from all sources. The most
248
- common use case for this is blending different languages together to create a multilingual code-switched model, but you can also blend
249
- together different audio sources from the same languages (or language families), to create noise robust data, or mix fast and slow speech from the
250
- same language.
251
-
252
- For multilingual code-switched models, we recommend using the aggregate (``agg``) tokenizer if mixing different languages.
253
-
254
- The following example shows how to mix 3 different languages: English (en), German (de), and Japanese (ja) added to the `train_ds` model block, however
255
- you can add similar logic to your `validation_ds` and `test_ds` blocks for on-the-fly code-switched validation and test data too. This example mixes
256
- together 3 languages, but you can use as many as you want. However, be advised that the more languages you add, the higher your `min_duration` and `max_duration`
257
- need to be set to ensure all languages are sampled into each synthetic utterance, and setting these hyperparameters higher will use more VRAM per mini-batch during
258
- training and evaluation.
259
-
260
- .. code-block:: yaml
261
-
262
- model:
263
- train_ds:
264
- manifest_filepath: [/path/to/EN/tarred_manifest.json, /path/to/DE/tarred_manifest.json, /path/to/JA/tarred_manifest.json]
265
- tarred_audio_filepaths: ['/path/to/EN/tars/audio__OP_0..511_CL_.tar', '/path/to/DE/tars/audio__OP_0..1023_CL_.tar', '/path/to/JA/tars/audio__OP_0..2047_CL_.tar']
266
- is_code_switched: true
267
- is_tarred: true
268
- shuffle: true
269
- code_switched: # add this block for code-switching
270
- min_duration: 12 # the minimum number of seconds for each synthetic code-switched utterance
271
- max_duration: 20 # the maximum number of seconds for each synthetic code-switched utterance
272
- min_monolingual: 0.3 # the minimum percentage of utterances which will be pure monolingual (0.3 = 30%)
273
- probs: [0.25, 0.5, 0.25] # the probability to sample each language (matches order of `language` above) if not provided, assumes uniform distribution
274
- force_monochannel: true # if your source data is multi-channel, then setting this to True will force the synthetic utterances to be mono-channel
275
- sampling_scales: 0.75 # allows you to down/up sample individual languages. Can set this as an array for individual languages, or a scalar for all languages
276
- seed: 123 # add a seed for replicability in future runs (highly useful for `validation_ds` and `test_ds`)
277
-
278
-
279
- Model Architecture Configurations
280
- ---------------------------------
281
-
282
- Each configuration file should describe the model architecture being used for the experiment. Models in the NeMo ASR collection need
283
- an ``encoder`` section and a ``decoder`` section, with the ``_target_`` field specifying the module to use for each.
284
-
285
- Here is the list of the parameters in the model section which are shared among most of the ASR models:
286
-
287
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
288
- | **Parameter** | **Datatype** | **Description** | **Supported Values** |
289
- +=========================+==================+===============================================================================================================+=================================+
290
- | :code:`log_prediction` | bool | Whether a random sample should be printed in the output at each step, along with its predicted transcript. | |
291
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
292
- | :code:`ctc_reduction` | string | Specifies the reduction type of CTC loss. Defaults to ``mean_batch`` which would take the average over the | :code:`none`, |
293
- | | | batch after taking the average over the length of each sample. | :code:`mean_batch` |
294
- | | | | :code:`mean`, :code:`sum` |
295
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
296
-
297
- The following sections go into more detail about the specific configurations of each model architecture.
298
-
299
- For more information about the ASR models, refer to the :doc:`Models <./models>` section.
300
-
301
- Jasper and QuartzNet
302
- ~~~~~~~~~~~~~~~~~~~~
303
-
304
- The `Jasper <./models.html#Jasper>`__ and `QuartzNet <./models.html#QuartzNet>`__ models are very similar, and as such the components in their
305
- configs are very similar as well.
306
-
307
- Both architectures use the ``ConvASREncoder`` for the ``encoder``, with parameters detailed in the table below. The encoder parameters
308
- include details about the Jasper/QuartzNet ``[BxR]`` encoder architecture, including how many blocks to use (``B``), how many times
309
- to repeat each sub-block (``R``), and the convolution parameters for each block.
310
-
311
- The number of blocks ``B`` is determined by the number of list elements under ``jasper`` minus the one prologue and two epilogue blocks.
312
- The number of sub-blocks ``R`` is determined by setting the ``repeat`` parameter.
313
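For example, the ``B`` and ``R`` values can be recovered from a loaded config list (the entries below are illustrative placeholders, not a real config):

```python
# Each dict stands in for one block entry under `encoder.jasper`.
jasper_blocks = (
    [{"name": "prologue", "repeat": 1}]
    + [{"name": f"block{i}", "repeat": 5} for i in range(1, 16)]  # B = 15 blocks
    + [{"name": "epilogue1", "repeat": 1}, {"name": "epilogue2", "repeat": 1}]
)

# B = total entries minus the one prologue and two epilogue blocks.
B = len(jasper_blocks) - 3
# R = the `repeat` value of any main block.
R = jasper_blocks[1]["repeat"]
print(f"QuartzNet {B}x{R}")  # → QuartzNet 15x5
```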
-
314
- To use QuartzNet (which uses more compact time-channel separable convolutions) instead of Jasper, add :code:`separable: true` to all
315
- but the last block in the architecture.
316
-
317
- Note that the encoder block list is specified under the parameter name ``jasper`` for both architectures.
318
-
319
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------+
320
- | **Parameter** | **Datatype** | **Description** | **Supported Values** |
321
- +=========================+==================+===============================================================================================================+=====================================+
322
- | :code:`feat_in` | int | The number of input features. Should be equal to :code:`features` in the preprocessor parameters. | |
323
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------+
324
- | :code:`activation` | string | Which activation function to use in the encoder. | :code:`hardtanh`, :code:`relu`, |
325
- | | | | :code:`selu`, :code:`swish` |
326
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------+
327
- | :code:`conv_mask` | bool | Whether to use masked convolutions in the encoder. Defaults to ``true``. | |
328
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------+
329
- | :code:`jasper` | | A list of blocks that specifies your encoder architecture. Each entry in this list represents one block in | |
330
- | | | the architecture and contains the parameters for that block, including convolution parameters, dropout, and | |
331
- | | | the number of times the block is repeated. Refer to the `Jasper <https://arxiv.org/pdf/1904.03288.pdf>`_ and | |
332
- | | | `QuartzNet <https://arxiv.org/pdf/1910.10261.pdf>`_ papers for details about specific model configurations. | |
333
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------+
334
-
335
- A QuartzNet 15x5 (fifteen blocks, each sub-block repeated five times) encoder configuration should look similar to the following example:
336
-
337
- .. code-block:: yaml
338
-
339
- # Specified at the beginning of the file for convenience
340
- n_mels: &n_mels 64 # Used for both the preprocessor and encoder as number of input features
341
- repeat: &repeat 5 # R=5
342
- dropout: &dropout 0.0
343
- separable: &separable true # Set to true for QN. Set to false for Jasper.
344
-
345
- model:
346
- ...
347
- encoder:
348
- _target_: nemo.collections.asr.modules.ConvASREncoder
349
- feat_in: *n_mels # Should match "features" in the preprocessor.
350
- activation: relu
351
- conv_mask: true
352
-
353
- jasper: # This field name should be "jasper" for both types of models.
354
-
355
- # Prologue block
356
- - dilation: [1]
357
- dropout: *dropout
358
- filters: 256
359
- kernel: [33]
360
- repeat: 1 # Prologue block is not repeated.
361
- residual: false
362
- separable: *separable
363
- stride: [2]
364
-
365
- # Block 1
366
- - dilation: [1]
367
- dropout: *dropout
368
- filters: 256
369
- kernel: [33]
370
- repeat: *repeat
371
- residual: true
372
- separable: *separable
373
- stride: [1]
374
-
375
- ... # Entries for blocks 2~14
376
-
377
- # Block 15
378
- - dilation: [1]
379
- dropout: *dropout
380
- filters: 512
381
- kernel: [75]
382
- repeat: *repeat
383
- residual: true
384
- separable: *separable
385
- stride: [1]
386
-
387
- # Two epilogue blocks
388
- - dilation: [2]
389
- dropout: *dropout
390
- filters: 512
391
- kernel: [87]
392
- repeat: 1 # Epilogue blocks are not repeated
393
- residual: false
394
- separable: *separable
395
- stride: [1]
396
-
397
- - dilation: [1]
398
- dropout: *dropout
399
- filters: &enc_filters 1024
400
- kernel: [1]
401
- repeat: 1 # Epilogue blocks are not repeated
402
- residual: false
403
- stride: [1]
404
-
405
- Both Jasper and QuartzNet use the ``ConvASRDecoder`` as the decoder. The decoder parameters are detailed in the following table.
406
-
407
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
408
- | **Parameter** | **Datatype** | **Description** | **Supported Values** |
409
- +=========================+==================+===============================================================================================================+=================================+
410
- | :code:`feat_in` | int | The number of input features to the decoder. Should be equal to the number of filters in the last block of | |
411
- | | | the encoder. | |
412
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
413
- | :code:`vocabulary` | list | A list of the valid output characters for your model. For example, for an English dataset, this could be a | |
414
- | | | list of all lowercase letters, space, and apostrophe. | |
415
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
416
- | :code:`num_classes` | int | Number of output classes, i.e. the length of :code:`vocabulary`. | |
417
- +-------------------------+------------------+---------------------------------------------------------------------------------------------------------------+---------------------------------+
418
-
419
- For example, a decoder config corresponding to the encoder above should look similar to the following:
420
-
421
- .. code-block:: yaml
422
-
423
- model:
424
- ...
425
- decoder:
426
- _target_: nemo.collections.asr.modules.ConvASRDecoder
427
- feat_in: *enc_filters
428
- vocabulary: *labels
429
- num_classes: 28 # Length of the vocabulary list
430
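As a sanity check on the decoder config above, the vocabulary length for a typical English character set (lowercase letters, space, and apostrophe) works out to 28:

```python
import string

# Lowercase letters, space, and apostrophe - a typical English character vocabulary.
vocabulary = list(string.ascii_lowercase) + [" ", "'"]

# `num_classes` in the decoder config must equal the vocabulary length.
num_classes = len(vocabulary)
print(num_classes)  # → 28
```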
-
431
- Citrinet
432
- ~~~~~~~~
433
-
434
- The `Citrinet <./models.html#Citrinet>`__ and `QuartzNet <./models.html#QuartzNet>`__ models are very similar, and as such the
435
- components in their configs are very similar as well. Citrinet utilizes Squeeze and Excitation, as well as sub-word tokenization, in
436
- contrast to QuartzNet. Depending on the dataset, we utilize different tokenizers. For Librispeech, we utilize the HuggingFace WordPiece
437
- tokenizer, and for all other datasets we utilize the Google Sentencepiece tokenizer - usually the ``unigram`` tokenizer type.
438
-
439
- Both architectures use the ``ConvASREncoder`` for the ``encoder``, with parameters detailed above. The encoder parameters include
440
- details about the Citrinet-C encoder architecture, including how many filters are used per channel (``C``). The Citrinet-C
441
- configuration is a shortform notation for Citrinet-21x5xC, such that ``B = 21`` and ``R = 5`` are the default and should generally
442
- not be changed.
443
-
444
- To use Citrinet instead of QuartzNet, refer to the ``citrinet_512.yaml`` configuration found inside the ``examples/asr/conf/citrinet``
445
- directory. Citrinet is primarily comprised of the same :class:`~nemo.collections.asr.parts.submodules.jasper.JasperBlock` as ``Jasper`` or
446
- ``QuartzNet``.
447
-
448
- While the configs for Citrinet and QuartzNet are similar, we note the additional flags used for Citrinet below. Refer to the
449
- ``JasperBlock`` documentation for the meaning of these arguments.
450
-
451
- +---------------------------+------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------+
452
- | **Parameter** | **Datatype** | **Description** | **Supported Values** |
453
- +===========================+==================+===========================================================================================================+===================================+
454
- | :code:`se` | bool | Whether to apply squeeze-and-excitation mechanism or not. | :code:`true` or :code:`false` |
455
- +---------------------------+------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------+
456
- | :code:`se_context_size` | int | SE context size. -1 means global context. | :code:`-1` or :code:`+ve int` |
457
- +---------------------------+------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------+
458
- | :code:`stride_last` | bool | Stride on the final repeated block or all repeated blocks. | :code:`true` or :code:`false` |
459
- +---------------------------+------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------+
460
- | :code:`residual_mode` | str | Type of residual branch to construct. | :code:`"add"` or |
461
- | | | Can be pointwise residual addition or pointwise strided residual attention | :code:`"stride_add"` |
462
- +---------------------------+------------------+-----------------------------------------------------------------------------------------------------------+-----------------------------------+
463
-
464
- A Citrinet-512 config should look similar to the following:
465
-
466
- .. code-block:: yaml
467
-
468
- model:
469
- ...
470
- # Specify some defaults across the entire model
471
- model_defaults:
472
- repeat: 5
473
- dropout: 0.1
474
- separable: true
475
- se: true
476
- se_context_size: -1
477
- ...
478
- encoder:
479
- _target_: nemo.collections.asr.modules.ConvASREncoder
480
- feat_in: *n_mels # Should match "features" in the preprocessor.
481
- activation: relu
482
- conv_mask: true
483
-
484
- jasper: # This field name should be "jasper" for the JasperBlock (which constructs Citrinet).
485
-
486
- # Prologue block
487
- - filters: 512
488
- repeat: 1
489
- kernel: [5]
490
- stride: [1]
491
- dilation: [1]
492
- dropout: 0.0
493
- residual: false
494
- separable: ${model.model_defaults.separable}
495
- se: ${model.model_defaults.se}
496
- se_context_size: ${model.model_defaults.se_context_size}
497
-
498
- # Block 1
499
- - filters: 512
500
- repeat: ${model.model_defaults.repeat}
501
- kernel: [11]
502
- stride: [2]
503
- dilation: [1]
504
- dropout: ${model.model_defaults.dropout}
505
- residual: true
506
- separable: ${model.model_defaults.separable}
507
- se: ${model.model_defaults.se}
508
- se_context_size: ${model.model_defaults.se_context_size}
509
- stride_last: true
510
- residual_mode: "stride_add"
511
-
512
- ... # Entries for blocks 2~21
513
-
514
- # Block 22
515
- - filters: 512
516
- repeat: ${model.model_defaults.repeat}
517
- kernel: [39]
518
- stride: [1]
519
- dilation: [1]
520
- dropout: ${model.model_defaults.dropout}
521
- residual: true
522
- separable: ${model.model_defaults.separable}
523
- se: ${model.model_defaults.se}
524
- se_context_size: ${model.model_defaults.se_context_size}
525
-
526
- # Epilogue block
527
-
528
- - filters: &enc_final 640
529
- repeat: 1
530
- kernel: [41]
531
- stride: [1]
532
- dilation: [1]
533
- dropout: 0.0
534
- residual: false
535
- separable: ${model.model_defaults.separable}
536
- se: ${model.model_defaults.se}
537
- se_context_size: ${model.model_defaults.se_context_size}
538
-
539
- As mentioned above, Citrinet uses the ``ConvASRDecoder`` as the decoder layer similar to QuartzNet. Only the configuration must be
540
- changed slightly as Citrinet utilizes sub-word tokenization.
541
-
542
- .. note::
543
- The following information is relevant to any of the above models that implements its encoder as an :class:`~nemo.collections.asr.modules.conv_asr.ConvASREncoder`, and utilizes the ``SqueezeExcite`` mechanism.
544
-
545
- The ``SqueezeExcite`` block within a :class:`~nemo.collections.asr.modules.conv_asr.ConvASREncoder` network can be modified to utilize a different context window after the model has been instantiated (even after the model has been trained) so as to evaluate the model with limited context. This can be achieved using the :meth:`~nemo.collections.asr.parts.mixins.mixins.ASRModuleMixin.change_conv_asr_se_context_window` method.
546
-
547
- .. code-block:: python
548
-
549
- # Here, model can be any model that has a `ConvASREncoder` as its encoder and utilizes `SqueezeExcite` blocks
550
- # `context_window` : It is an integer representing the number of timeframes (each corresponding to some window stride).
551
- # `update_config` : Bool flag which determines whether the config of the model should be updated to reflect the new context window.
552
-
553
- # Here, we specify that 128 timeframes of 0.01s stride should be the context window
554
- # This is equivalent to 128 * 0.01s context window for `SqueezeExcite`
555
- model.change_conv_asr_se_context_window(context_window=128, update_config=True)
556
-
557
- Conformer-CTC
558
- ~~~~~~~~~~~~~
559
-
560
- The config files for Conformer-CTC model contain character-based encoding and sub-word encoding at
561
- ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_ctc_char.yaml`` and ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_ctc_bpe.yaml``
562
- respectively. Some components of the configs of `Conformer-CTC <./models.html#Conformer-CTC>`__ include the following sections:
563
-
564
- * ``train_ds``, ``validation_ds``, and ``test_ds``
565
- * optimizer (``optim``)
566
- * augmentation (``spec_augment``)
567
- * ``decoder``
568
- * ``trainer``
569
- * ``exp_manager``
570
-
571
- These sections are similar to those of other ASR models like `QuartzNet <./models.html#QuartzNet>`__. There should be a tokenizer section where you can
572
- specify the tokenizer if you want to use sub-word encoding instead of character-based encoding.
573
-
574
-
575
- The encoder section includes the details about the Conformer-CTC encoder architecture. You may find more information in the
576
- config files and also :ref:`nemo.collections.asr.modules.ConformerEncoder <conformer-encoder-api>`.
577
-
578
- Squeezeformer-CTC
579
- ~~~~~~~~~~~~~~~~~
580
-
581
- The config files for Squeezeformer-CTC model contain character-based encoding and sub-word encoding at
582
- ``<NeMo_git_root>/examples/asr/conf/squeezeformer/squeezeformer_ctc_char.yaml`` and ``<NeMo_git_root>/examples/asr/conf/squeezeformer/squeezeformer_ctc_bpe.yaml``
583
- respectively. The config components of `Squeezeformer-CTC <./models.html#Squeezeformer-CTC>`__ are similar to those of `Conformer-CTC <./configs.html#Conformer-CTC>`__.
584
-
585
- The encoder section includes the details about the Squeezeformer-CTC encoder architecture. You may find more information in the
586
- config files and also :ref:`nemo.collections.asr.modules.SqueezeformerEncoder <squeezeformer-encoder-api>`.
587
-
588
-
589
- ContextNet
590
- ~~~~~~~~~~
591
-
592
- Please refer to the model page of `ContextNet <./models.html#ContextNet>`__ for more information on this model.
593
-
594
- Conformer-Transducer
595
- ~~~~~~~~~~~~~~~~~~~~
596
-
597
- Please refer to the model page of `Conformer-Transducer <./models.html#Conformer-Transducer>`__ for more information on this model.
598
-
599
- LSTM-Transducer and LSTM-CTC
600
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
601
-
602
- The config files for LSTM-Transducer and LSTM-CTC models can be found at ``<NeMo_git_root>/examples/asr/conf/lstm/lstm_transducer_bpe.yaml`` and ``<NeMo_git_root>/examples/asr/conf/lstm/lstm_ctc_bpe.yaml`` respectively.
603
- Most of the config sections are similar to those of other CTC or transducer models. The main difference is the encoder part.
604
- The encoder section includes the details about the RNN-based encoder architecture. You may find more information in the
605
- config files and also :ref:`nemo.collections.asr.modules.RNNEncoder <rnn-encoder-api>`.
606
-
607
-
608
- InterCTC Config
609
- ---------------
610
-
611
- All CTC-based models also support `InterCTC loss <https://arxiv.org/abs/2102.03216>`_. To use it, you need to specify
612
- two parameters, as in the example below:
613
-
614
- .. code-block:: yaml
615
-
616
- model:
617
- # ...
618
- interctc:
619
- loss_weights: [0.3]
620
- apply_at_layers: [8]
621
-
622
- which can be used to reproduce the default setup from the paper (assuming the total number of layers is 18).
623
- You can also specify multiple CTC losses from different layers, e.g., to get 2 losses from layers 3 and 8 with
624
- weights 0.1 and 0.3, specify:
625
-
626
- .. code-block:: yaml
627
-
628
- model:
629
- # ...
630
- interctc:
631
- loss_weights: [0.1, 0.3]
632
- apply_at_layers: [3, 8]
633
-
634
- Note that the final-layer CTC loss weight is automatically computed so that
635
- all weights sum to 1 (0.6 in the example above).
636
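The weight normalization described above can be computed directly; a standalone sketch (the helper name is hypothetical, not a NeMo API):

```python
def final_ctc_weight(loss_weights):
    """Weight of the final-layer CTC loss so that all weights sum to 1."""
    w = 1.0 - sum(loss_weights)
    assert w > 0, "intermediate loss weights must sum to less than 1"
    return w

print(final_ctc_weight([0.3]))       # → 0.7
print(final_ctc_weight([0.1, 0.3]))  # → 0.6
```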
-
637
-
638
- Stochastic Depth Config
639
- -----------------------
640
-
641
- `Stochastic Depth <https://arxiv.org/abs/2102.03216>`_ is a useful technique for regularizing ASR model training.
642
- Currently it's only supported for :ref:`nemo.collections.asr.modules.ConformerEncoder <conformer-encoder-api>`. To
643
- use it, specify the following parameters in the encoder config file to reproduce the default setup from the paper:
644
-
645
- .. code-block:: yaml
646
-
647
- model:
648
- # ...
649
- encoder:
650
- # ...
651
- stochastic_depth_drop_prob: 0.3
652
- stochastic_depth_mode: linear # linear or uniform
653
- stochastic_depth_start_layer: 1
654
-
655
- See :ref:`documentation of ConformerEncoder <conformer-encoder-api>` for more details. Note that stochastic depth
656
- is supported for both CTC and Transducer model variations (or any other kind of model/loss that's using
657
- conformer as encoder).
658
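The ``linear`` mode ramps the drop probability up from the start layer to the final layer. The exact schedule lives in NeMo's ``ConformerEncoder`` implementation and may differ from the sketch below, which shows one plausible linear ramp purely for illustration:

```python
def linear_drop_probs(num_layers, drop_prob, start_layer):
    """Illustrative linear ramp: layers before `start_layer` never drop;
    later layers ramp linearly up to `drop_prob` at the final layer.
    NOTE: this formula is an assumption for illustration - check NeMo's
    ConformerEncoder documentation for the exact schedule."""
    probs = []
    for layer in range(1, num_layers + 1):
        if layer < start_layer:
            probs.append(0.0)
        else:
            frac = (layer - start_layer + 1) / (num_layers - start_layer + 1)
            probs.append(drop_prob * frac)
    return probs

probs = linear_drop_probs(num_layers=4, drop_prob=0.3, start_layer=1)
print([round(p, 3) for p in probs])  # → [0.075, 0.15, 0.225, 0.3]
```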
-
659
-
660
- Transducer Configurations
661
- -------------------------
662
-
663
- All CTC-based ASR model configs can be modified to support Transducer loss training. Below, we discuss the modifications required in the config to enable Transducer training. All modifications are made to the ``model`` config.
664
-
665
- Model Defaults
666
- ~~~~~~~~~~~~~~
667
-
668
- ``model.model_defaults`` is a subsection of the model config that holds default values shared across the entire model.
669
-
670
- There are three values that are primary components of a transducer model. They are:
671
-
672
- * ``enc_hidden``: The hidden dimension of the final layer of the Encoder network.
673
- * ``pred_hidden``: The hidden dimension of the final layer of the Prediction network.
674
- * ``joint_hidden``: The hidden dimension of the intermediate layer of the Joint network.
675
-
676
- One can access these values inside the config by using OmegaConf interpolation as follows:
677
-
678
- .. code-block:: yaml
679
-
680
- model:
681
- ...
682
- model_defaults:
683
- enc_hidden: 256
684
- pred_hidden: 256
685
- joint_hidden: 256
686
- ...
687
- decoder:
688
- ...
689
- prednet:
690
- pred_hidden: ${model.model_defaults.pred_hidden}
691
-
692
- Acoustic Encoder Model
693
- ~~~~~~~~~~~~~~~~~~~~~~
694
-
695
- The transducer model is comprised of three sub-models. One of these is the Acoustic (encoder) model. We should be able to drop in any CTC Acoustic model config into this section of the transducer config.
696
-
697
- The only condition that needs to be met is that **the final layer of the acoustic model must have the hidden dimension defined in ``model_defaults.enc_hidden``**.
698
-
699
- Decoder / Prediction Model
700
- ~~~~~~~~~~~~~~~~~~~~~~~~~~
701
-
702
- The Prediction model is generally an autoregressive, causal model that consumes text tokens and returns embeddings that will be used by the Joint model. The base config for an LSTM-based Prediction network can be found in the ``decoder`` section of `ContextNet <./models.html#ContextNet>`__ or other Transducer architectures. For further information refer to the ``Intro to Transducers`` tutorial in the ASR tutorial section.
703
-
704
- **This config can be copy-pasted into any custom transducer model with no modification.**
705
-
706
- Let us discuss some of the important arguments:
707
-
708
- * ``blank_as_pad``: In ordinary transducer models, the embedding matrix does not acknowledge the ``Transducer Blank`` token (similar to CTC Blank). However, this causes the autoregressive loop to be more complicated and less efficient. Instead, this flag which is set by default, will add the ``Transducer Blank`` token to the embedding matrix - and use it as a pad value (zeros tensor). This enables more efficient inference without harming training. For further information refer to the ``Intro to Transducers`` tutorial in the ASR tutorial section.
709
-
710
- * ``prednet.pred_hidden``: The hidden dimension of the LSTM and the output dimension of the Prediction network.
711
-
712
- .. code-block:: yaml
713
-
714
- decoder:
715
- _target_: nemo.collections.asr.modules.RNNTDecoder
716
- normalization_mode: null
717
- random_state_sampling: false
718
- blank_as_pad: true
719
-
720
- prednet:
721
- pred_hidden: ${model.model_defaults.pred_hidden}
722
- pred_rnn_layers: 1
723
- t_max: null
724
- dropout: 0.0
725
-
726
- Joint Model
727
- ~~~~~~~~~~~
728
-
729
- The Joint model is a simple feed-forward Multi-Layer Perceptron network. This MLP accepts the output of the Acoustic and Prediction models and computes a joint probability distribution over the entire vocabulary space. The base config for the Joint network can be found in the ``joint`` section of `ContextNet <./models.html#ContextNet>`__ or other Transducer architectures. For further information refer to the ``Intro to Transducers`` tutorial in the ASR tutorial section.
730
-
731
- **This config can be copy-pasted into any custom transducer model with no modification.**
732
-
733
- The Joint model config has several essential components which we discuss below :
734
-
735
- * ``log_softmax``: Due to the cost of computing softmax on such large tensors, the Numba CUDA implementation of RNNT loss will implicitly compute the log softmax when called (so its inputs should be logits). The CPU version of the loss doesn't face such memory issues, so it requires log-probabilities instead. Since the behaviour differs between CPU and GPU, the ``None`` value automatically switches behaviour depending on whether the input tensor is on a CPU or GPU device.
736
-
737
- * ``preserve_memory``: This flag will call ``torch.cuda.empty_cache()`` at certain critical sections when computing the Joint tensor. While this operation might allow us to preserve some memory, the empty_cache() operation is tremendously slow and will slow down training by an order of magnitude or more. It is available to use but not recommended.
738
-
739
- * ``fuse_loss_wer``: This flag performs "batch splitting" and then "fused loss + metric" calculation. It will be discussed in detail in the next tutorial that will train a Transducer model.
740
-
741
- * ``fused_batch_size``: When the above flag is set to True, the model will have two distinct "batch sizes". The batch size provided in the three data loader configs (``model.*_ds.batch_size``) will now be the ``Acoustic model`` batch size, whereas the ``fused_batch_size`` will be the batch size of the ``Prediction model``, the ``Joint model``, the ``transducer loss`` module and the ``decoding`` module.
742
-
743
- * ``jointnet.joint_hidden``: The hidden intermediate dimension of the joint network.
744
-
745
- .. code-block:: yaml
746
-
747
- joint:
748
- _target_: nemo.collections.asr.modules.RNNTJoint
749
- log_softmax: null # sets it according to cpu/gpu device
750
-
751
- # fused mode
752
- fuse_loss_wer: false
753
- fused_batch_size: 16
754
-
755
- jointnet:
756
- joint_hidden: ${model.model_defaults.joint_hidden}
757
- activation: "relu"
758
- dropout: 0.0
759
-
760
- Sampled Softmax Joint Model
761
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
762
-
763
- There are some situations where a Transducer model must use a large vocabulary - for example, multilingual models covering a large
764
- number of languages. In this setting, the memory cost of training the Transducer network can become prohibitive for
765
- large vocabularies.
766
-
767
- For such cases, one can utilize the ``SampledRNNTJoint`` module in place of the usual ``RNNTJoint`` module, in order
768
- to compute the loss using a sampled subset of the vocabulary rather than the full vocabulary.
769
-
770
- It adds only one additional parameter:
771
-
772
- * ``n_samples``: Specifies the minimum number of tokens to sample from the vocabulary space,
773
- excluding the RNNT blank token. If a given value is larger than the entire vocabulary size,
774
- then the full vocabulary will be used.
775
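The fallback rule above amounts to a clamp against the vocabulary size; a one-line standalone sketch (the helper is hypothetical, not a NeMo API):

```python
def effective_sample_count(n_samples, vocab_size):
    """If `n_samples` exceeds the vocabulary size, the full vocabulary is used."""
    return min(n_samples, vocab_size)

print(effective_sample_count(500, 10000))  # → 500
print(effective_sample_count(500, 128))    # → 128 (falls back to the full vocabulary)
```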
-
776
- The only difference in config required is to replace ``nemo.collections.asr.modules.RNNTJoint`` with ``nemo.collections.asr.modules.SampledRNNTJoint``:
777
-
778
- .. code-block:: yaml
779
-
780
- joint:
781
- _target_: nemo.collections.asr.modules.SampledRNNTJoint
782
- n_samples: 500
783
- ... # All other arguments from RNNTJoint can be used after this.
784
-
785
-
786
- Effect of Batch Splitting / Fused Batch step
787
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
788
-
789
- The following section explains why memory is an issue when training Transducer models and how NeMo tackles it with its Fused Batch step. Read it for a thorough understanding, or skip it otherwise. You can also follow these steps in the "ASR_with_Transducers" tutorial.
790
-
791
- **Diving deeper into the memory costs of Transducer Joint**
792
-
793
- One of the significant limitations of Transducers is the exorbitant memory cost of computing the Joint module. The Joint module is comprised of two steps.
794
-
795
- 1) Projecting the Acoustic and Transcription feature dimensions to some standard hidden dimension (specified by model.model_defaults.joint_hidden)
796
-
797
- 2) Projecting this intermediate hidden dimension to the final vocabulary space to obtain the transcription.
798
-
799
- Take the following example.
800
-
801
- BS = 32; T (after 2x stride) = 800; U (with character encoding) = 400-450 tokens; vocabulary size V = 28 (26 alphabet characters, space, and apostrophe). Let the hidden dimension of the Joint model be 640 (most Google Transducer papers use a hidden dimension of 640).
802
-
803
- * :math:`Memory \, (Hidden, \, gb) = 32 \times 800 \times 450 \times 640 \times 4 = 29.49` gigabytes (4 bytes per float).
804
-
805
- * :math:`Memory \, (Joint, \, gb) = 32 \times 800 \times 450 \times 28 \times 4 = 1.290` gigabytes (4 bytes per float)
806
-
807
- **NOTE**: This is just for the forward pass! We need to double this memory to store gradients! This much memory is also just for the Joint model **alone**. Far more memory is required for the Prediction model as well as the large Acoustic model itself and its gradients!
808
-
809
- Even with mixed precision, that's :math:`\sim 30` GB of GPU RAM for just 1 part of the network + its gradients.
810
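The figures above can be checked directly; this standalone sketch reproduces the arithmetic (decimal gigabytes, 4 bytes per float):

```python
def joint_memory_gb(batch, t, u, dim, bytes_per_float=4):
    """Size of a (batch, T, U, dim) activation tensor, in decimal gigabytes."""
    return batch * t * u * dim * bytes_per_float / 1e9

# Hidden joint activation: (32, 800, 450, 640) floats
print(round(joint_memory_gb(32, 800, 450, 640), 2))  # → 29.49
# Final vocabulary projection: (32, 800, 450, 28) floats
print(round(joint_memory_gb(32, 800, 450, 28), 3))   # → 1.29
```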
-
811
- Effect of Fused Batch Step
812
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
813
-
814
- The fundamental problem is that the joint tensor grows in size when ``[T x U]`` grows in size. This growth in memory cost is due to many reasons - either by model construction (downsampling) or the choice of dataset preprocessing (character tokenization vs. sub-word tokenization).
815
-
816
- Another dimension that NeMo can control is **batch**. Due to how we batch our samples, small and large samples all get clumped together into a single batch. So even though the individual samples are not all as long as the maximum length of T and U in that batch, when a batch of such samples is constructed, it will consume a significant amount of memory for the sake of compute efficiency.
817
-
818
- So, as is always the case, we **trade off compute speed for memory savings**.
819
-
820
- The fused operation goes as follows :
821
-
822
- 1) Forward the entire acoustic model in a single pass. (Use global batch size here for acoustic model - found in ``model.*_ds.batch_size``)
823
-
824
- 2) Split the Acoustic Model's logits by ``fused_batch_size`` and loop over these sub-batches.
825
-
826
- 3) Construct a sub-batch of the same ``fused_batch_size`` for the Prediction model. Now the target sequence length is :math:`U_{sub-batch} < U`.
827
-
828
- 4) Feed this :math:`U_{sub-batch}` into the Joint model, along with a sub-batch from the Acoustic model (with :math:`T_{sub-batch} < T`). Remember, we only have to slice off a part of the acoustic model here since we have the full batch of samples :math:`(B, T, D)` from the acoustic model.
829
-
830
- 5) Performing steps (3) and (4) yields :math:`T_{sub-batch}` and :math:`U_{sub-batch}`. Perform sub-batch joint step - costing an intermediate :math:`(B, T_{sub-batch}, U_{sub-batch}, V)` in memory.
831
-
832
- 6) Compute loss on sub-batch and preserve in a list to be later concatenated.
833
-
834
- 7) Compute sub-batch metrics (such as Character / Word Error Rate) using the above Joint tensor and sub-batch of ground truth labels. Preserve the scores to be averaged across the entire batch later.
835
-
836
- 8) Delete the sub-batch joint matrix :math:`(B, T_{sub-batch}, U_{sub-batch}, V)`. Only gradients from .backward() are preserved now in the computation graph.
837
-
838
- 9) Repeat steps (3) - (8) until all sub-batches are consumed.
839
-
840
- 10) Cleanup step. Compute full batch WER and log. Concatenate loss list and pass to PTL to compute the equivalent of the original (full batch) Joint step. Delete ancillary objects necessary for sub-batching.
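The sub-batching loop above can be sketched in plain Python (hypothetical helper names; the actual implementation lives inside NeMo's RNNT training step and operates on tensors rather than lists):

```python
# Minimal sketch of the fused batch-splitting loop (steps 2-10 above).
# `joint_fn` and `loss_fn` are toy stand-ins for the Joint network and
# the RNNT loss; names here are illustrative, not the NeMo API.

def fused_joint_loss(enc_batch, targets, fused_batch_size, joint_fn, loss_fn):
    """Split a full batch into sub-batches, run the joint + loss per
    sub-batch, and return the concatenated per-sample losses."""
    losses = []
    for start in range(0, len(enc_batch), fused_batch_size):
        end = start + fused_batch_size
        enc_sub = enc_batch[start:end]    # step 2: slice acoustic logits
        tgt_sub = targets[start:end]      # step 3: matching target sub-batch
        joint_out = [joint_fn(e, t) for e, t in zip(enc_sub, tgt_sub)]  # step 5
        losses.extend(loss_fn(j) for j in joint_out)                    # step 6
        del joint_out                     # step 8: free the joint tensor early
    return losses                         # step 10: concatenated loss list


if __name__ == "__main__":
    enc = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    tgt = [[1], [2], [3]]
    out = fused_joint_loss(enc, tgt, fused_batch_size=2,
                           joint_fn=lambda e, t: sum(e) + sum(t),
                           loss_fn=lambda j: j * 0.5)
    print(out)  # → [2.0, 4.5, 7.0]
```

Note that only the small :math:`(B, T_{sub-batch}, U_{sub-batch}, V)` intermediate ever exists at once, which is the entire point of the trade-off.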
Transducer Decoding
~~~~~~~~~~~~~~~~~~~

Models which have been trained with CTC can transcribe text simply by performing a regular argmax over the output of their decoder. For transducer-based models, the three networks must operate in a synchronized manner in order to transcribe the acoustic features. The base config for the transducer decoding step can be found in the ``decoding`` section of `ContextNet <./models.html#ContextNet>`__ or other Transducer architectures. For further information, refer to the ``Intro to Transducers`` tutorial in the ASR tutorial section.

**This config can be copy-pasted into any custom transducer model with no modification.**

The most important component at the top level is the ``strategy``. It can take one of many values:

* ``greedy``: This is sample-level greedy decoding. It is generally exceptionally slow, as each sample in the batch is decoded independently. For publications, this should be used alongside a batch size of 1 for exact results.

* ``greedy_batch``: This is the general default and should nearly match the ``greedy`` decoding scores (if the acoustic features are not affected by feature mixing in batch mode). Even for small batch sizes, this strategy is significantly faster than ``greedy``.

* ``beam``: Runs beam search with the implicit language model of the Prediction model. It will generally be quite slow, and might need some tuning of the beam size to get better transcriptions.

* ``tsd``: Time synchronous decoding. Please refer to the paper `Alignment-Length Synchronous Decoding for RNN Transducer <https://ieeexplore.ieee.org/document/9053040>`_ for details on the algorithm implemented. Time synchronous decoding (TSD) execution time grows by the factor T * max_symmetric_expansions. For longer sequences, T is greater, and it can therefore take a long time for beams to obtain good results. TSD also requires more memory to execute.

* ``alsd``: Alignment-length synchronous decoding. Please refer to the paper `Alignment-Length Synchronous Decoding for RNN Transducer <https://ieeexplore.ieee.org/document/9053040>`_ for details on the algorithm implemented. Alignment-length synchronous decoding (ALSD) execution time is faster than TSD, with a growth factor of T + U_max, where U_max is the maximum target length expected during execution. Generally, T + U_max < T * max_symmetric_expansions. However, ALSD beams are non-unique, so it is necessary to use larger beam sizes to achieve the same (or close to the same) decoding accuracy as TSD. For a given decoding accuracy, it is possible to attain faster decoding via ALSD than TSD.

* ``maes``: Modified Adaptive Expansion Search decoding. Please refer to the paper `Accelerating RNN Transducer Inference via Adaptive Expansion Search <https://ieeexplore.ieee.org/document/9250505>`_. Modified Adaptive Expansion Search (mAES) execution time is adaptive w.r.t. the number of expansions (for tokens) required per timestep. The number of expansions can usually be constrained to 1 or 2, and in most cases 2 is sufficient. This beam search technique can possibly obtain superior WER while sacrificing some evaluation time.
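For intuition, the per-frame loop of the ``greedy`` strategy can be sketched as follows. This is a toy stand-in, not the NeMo API: ``joint_fn`` collapses the Joint network plus the Prediction-network state into a single function of the current frame and the last emitted label.

```python
BLANK = 0  # index of the blank token in the vocabulary (assumption for this sketch)

def greedy_rnnt_decode(enc_frames, joint_fn, max_symbols=10):
    """Frame-synchronous greedy RNNT decoding sketch.

    enc_frames: iterable of per-frame encoder features.
    joint_fn(frame, last_label) -> list of scores over the vocabulary.
    """
    hyp = []
    last = BLANK  # prediction-network state reduced to the last label
    for frame in enc_frames:
        for _ in range(max_symbols):          # cap emissions per frame
            scores = joint_fn(frame, last)
            k = max(range(len(scores)), key=scores.__getitem__)
            if k == BLANK:                    # blank => advance to next frame
                break
            hyp.append(k)                     # emit a non-blank label
            last = k
    return hyp
```

The ``max_symbols`` cap corresponds to ``greedy.max_symbols`` in the config below; ``greedy_batch`` vectorizes this same loop across the batch.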
.. code-block:: yaml

  decoding:
    strategy: "greedy_batch"

    # preserve decoding alignments
    preserve_alignments: false

    # Overrides the fused batch size after training.
    # Setting it to -1 will process the whole batch at once when combined with the `greedy_batch` decoding strategy
    fused_batch_size: -1

    # greedy strategy config
    greedy:
      max_symbols: 10

    # beam strategy config
    beam:
      beam_size: 2
      score_norm: true
      softmax_temperature: 1.0  # scale the logits by some temperature prior to softmax
      tsd_max_sym_exp: 10  # for Time Synchronous Decoding, int > 0
      alsd_max_target_len: 5.0  # for Alignment-Length Synchronous Decoding, float > 1.0
      maes_num_steps: 2  # for modified Adaptive Expansion Search, int > 0
      maes_prefix_alpha: 1  # for modified Adaptive Expansion Search, int > 0
      maes_expansion_beta: 2  # for modified Adaptive Expansion Search, int >= 0
      maes_expansion_gamma: 2.3  # for modified Adaptive Expansion Search, float >= 0
Transducer Loss
~~~~~~~~~~~~~~~

This section configures the type of Transducer loss itself, along with possible sub-sections. By default, an optimized implementation of the Transducer loss is used, which depends on Numba for CUDA acceleration. The base config for the Transducer loss section can be found in the ``loss`` section of `ContextNet <./models.html#ContextNet>`__ or other Transducer architectures. For further information, refer to the ``Intro to Transducers`` tutorial in the ASR tutorial section.

**This config can be copy-pasted into any custom transducer model with no modification.**

The loss config is based on a resolver pattern and can be used as follows:

1) ``loss_name``: ``default`` is generally a good option. This selects one of the available resolved losses and matches it with the kwargs from the sub-config passed via the explicit ``{loss_name}_kwargs`` sub-config.

2) ``{loss_name}_kwargs``: This sub-config is passed to the resolved loss above and can be used to configure it.

.. code-block:: yaml

  loss:
    loss_name: "default"
    warprnnt_numba_kwargs:
      fastemit_lambda: 0.0

FastEmit Regularization
^^^^^^^^^^^^^^^^^^^^^^^

FastEmit Regularization is supported for the default Numba-based WarpRNNT loss. The regularization approach proposed in `FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization <https://arxiv.org/abs/2010.11148>`_ allows near-direct control over the latency of transducer models.

Refer to the above paper for results and recommendations regarding ``fastemit_lambda``.
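At a high level, FastEmit up-weights the part of the per-node transducer loss that corresponds to emitting a label (rather than blank), scaled by ``fastemit_lambda``. This is a conceptual sketch; see the paper for the exact per-node gradient formulation:

.. math::

    \tilde{\mathcal{L}} = \mathcal{L} + \lambda \, \mathcal{L}_{emit}

With :math:`\lambda = 0` (the default above) this reduces to the standard transducer loss; larger :math:`\lambda` trades some accuracy for lower emission latency.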
.. _Hybrid-ASR-TTS_model__Config:

Hybrid ASR-TTS Model Configuration
----------------------------------

The :ref:`Hybrid ASR-TTS model <Hybrid-ASR-TTS_model>` consists of three parts:

* ASR model (``EncDecCTCModelBPE``, ``EncDecRNNTBPEModel`` or ``EncDecHybridRNNTCTCBPEModel``)
* TTS Mel Spectrogram Generator (currently, only the :ref:`FastPitch <FastPitch_model>` model is supported)
* :ref:`Enhancer model <SpectrogramEnhancer_model>` (optional)

The config also allows specifying a :ref:`text-only dataset <Hybrid-ASR-TTS_model__Text-Only-Data>`.

Main parts of the config:

* ASR model

  * ``asr_model_path``: path to the ASR model checkpoint (``.nemo``) file, loaded only once; the config of the ASR model is then stored in the ``asr_model`` field
  * ``asr_model_type``: needed only when training from scratch. ``rnnt_bpe`` corresponds to ``EncDecRNNTBPEModel``, ``ctc_bpe`` to ``EncDecCTCModelBPE``, ``hybrid_rnnt_ctc_bpe`` to ``EncDecHybridRNNTCTCBPEModel``
  * ``asr_model_fuse_bn``: fuses BatchNorm in the pretrained ASR model; can improve quality in the finetuning scenario

* TTS model

  * ``tts_model_path``: path to the pretrained TTS model checkpoint (``.nemo``) file, loaded only once; the config of the model is then stored in the ``tts_model`` field

* Enhancer model

  * ``enhancer_model_path``: optional path to the enhancer model. Loaded only once; the config is stored in the ``enhancer_model`` field

* ``train_ds``

  * ``text_data``: properties related to text-only data

    * ``manifest_filepath``: path (or paths) to :ref:`text-only dataset <Hybrid-ASR-TTS_model__Text-Only-Data>` manifests
    * ``speakers_filepath``: path (or paths) to the text file containing speaker ids for the multi-speaker TTS model (speakers are sampled randomly during training)
    * ``min_words`` and ``max_words``: parameters to filter text-only manifests by the number of words
    * ``tokenizer_workers``: number of workers for initial tokenization (when loading the data). ``num_CPUs / num_GPUs`` is a recommended value.

  * ``asr_tts_sampling_technique``, ``asr_tts_sampling_temperature``, ``asr_tts_sampling_probabilities``: sampling parameters for text-only and audio-text data (if both are specified). These correspond to the ``sampling_technique``, ``sampling_temperature``, and ``sampling_probabilities`` parameters of :mod:`ConcatDataset <nemo.collections.common.data.dataset.ConcatDataset>`.
  * all other components are similar to conventional ASR models

* ``validation_ds`` and ``test_ds`` correspond to the underlying ASR model
.. code-block:: yaml

  model:
    sample_rate: 16000

    # asr model
    asr_model_path: ???
    asr_model: null
    asr_model_type: null  # rnnt_bpe, ctc_bpe or hybrid_rnnt_ctc_bpe; needed only if instantiating from config, otherwise the type is auto-inferred
    asr_model_fuse_bn: false  # only ConformerEncoder supported now, use false for other models

    # tts model
    tts_model_path: ???
    tts_model: null

    # enhancer model
    enhancer_model_path: null
    enhancer_model: null

    train_ds:
      text_data:
        manifest_filepath: ???
        speakers_filepath: ???
        min_words: 1
        max_words: 45  # 45 - recommended value, ~16.7 sec for LibriSpeech
        tokenizer_workers: 1
      asr_tts_sampling_technique: round-robin  # random, round-robin, temperature
      asr_tts_sampling_temperature: null
      asr_tts_sampling_probabilities: null  # [0.5,0.5] – ASR,TTS
      manifest_filepath: ???
      batch_size: 16  # you may increase batch_size if your memory allows
      # other params
Finetuning with Text-Only Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To finetune an existing ASR model using text-only data, use the ``<NeMo_git_root>/examples/asr/asr_with_tts/speech_to_text_bpe_with_text_finetune.py`` script with the corresponding config ``<NeMo_git_root>/examples/asr/conf/asr_tts/hybrid_asr_tts.yaml``.

Please specify the paths to all the required models (ASR, TTS, and Enhancer checkpoints), along with ``train_ds.text_data.manifest_filepath`` and ``train_ds.text_data.speakers_filepath``.

.. code-block:: shell

    python speech_to_text_bpe_with_text_finetune.py \
        model.asr_model_path=<path to ASR model> \
        model.tts_model_path=<path to compatible TTS model> \
        model.enhancer_model_path=<optional path to enhancer model> \
        model.asr_model_fuse_bn=<true recommended if ConformerEncoder with BatchNorm, false otherwise> \
        model.train_ds.manifest_filepath=<path to manifest with audio-text pairs or null> \
        model.train_ds.text_data.manifest_filepath=<path(s) to manifest with train text> \
        model.train_ds.text_data.speakers_filepath=<path(s) to speakers list> \
        model.train_ds.text_data.tokenizer_workers=4 \
        model.validation_ds.manifest_filepath=<path to validation manifest> \
        model.train_ds.batch_size=<batch_size>
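NeMo manifests are JSON-lines files; for a text-only manifest, each line typically carries only a ``text`` field (this follows the common NeMo manifest convention - verify the expected fields against your NeMo version). A minimal writer sketch:

```python
import json
import os
import tempfile

def write_text_manifest(path, utterances):
    """Write a text-only manifest: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for text in utterances:
            f.write(json.dumps({"text": text}) + "\n")

# Demo: write a two-utterance manifest into a temporary directory.
manifest = os.path.join(tempfile.mkdtemp(), "train_text_manifest.json")
write_text_manifest(manifest, ["hello world", "text only utterance"])
```

The resulting path can then be passed as ``model.train_ds.text_data.manifest_filepath``.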
Training from Scratch
~~~~~~~~~~~~~~~~~~~~~

To train an ASR model from scratch using text-only data, use the ``<NeMo_git_root>/examples/asr/asr_with_tts/speech_to_text_bpe_with_text.py`` script with a conventional ASR model config, e.g. ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_ctc_bpe.yaml`` or ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_transducer_bpe.yaml``.

Please specify the ASR model type, the path to the TTS model, and the (optional) enhancer, along with the text-only data-related fields.
Use the ``++`` or ``+`` markers for these options, since the options are not present in the original ASR model config.

.. code-block:: shell

    python speech_to_text_bpe_with_text.py \
        ++asr_model_type=<rnnt_bpe or ctc_bpe> \
        ++tts_model_path=<path to compatible tts model> \
        ++enhancer_model_path=<optional path to enhancer model> \
        ++model.train_ds.text_data.manifest_filepath=<path(s) to manifests with train text> \
        ++model.train_ds.text_data.speakers_filepath=<path(s) to speakers list> \
        ++model.train_ds.text_data.min_words=1 \
        ++model.train_ds.text_data.max_words=45 \
        ++model.train_ds.text_data.tokenizer_workers=4
Fine-tuning Configurations
--------------------------

All ASR scripts support easy fine-tuning by partially/fully loading the pretrained weights from a checkpoint into the **currently instantiated model**. Note that the currently instantiated model must have parameters that match the pre-trained checkpoint (so that the weights can load properly). In order to directly fine-tune a pre-existing checkpoint, please follow the tutorial `ASR Language Fine-tuning <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb>`_.

Models can be fine-tuned in two ways:

* By updating or retaining the current tokenizer alone
* By updating the model architecture and tokenizer

Fine-tuning by updating or retaining current tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case, the model architecture is not updated. The model can be initialized with the pre-trained weights in one of two ways:

1) Providing a path to a NeMo model (via ``init_from_nemo_model``)
2) Providing the name of a pretrained NeMo model, which will be downloaded from the cloud (via ``init_from_pretrained_model``)

Users can then keep the existing tokenizer or update it with a new vocabulary. This is useful when users don't want to update the model architecture but do want to update the tokenizer with a new vocabulary.

The same script can be used to finetune CTC, RNNT, or Hybrid models.

The ``<NeMo_repo>/examples/asr/speech_to_text_finetune.py`` script supports this type of fine-tuning with the following arguments:

.. code-block:: sh

    python examples/asr/speech_to_text_finetune.py \
        --config-path=<path to dir of configs> \
        --config-name=<name of config without .yaml> \
        model.train_ds.manifest_filepath="<path to manifest file>" \
        model.validation_ds.manifest_filepath="<path to manifest file>" \
        model.tokenizer.update_tokenizer=<True/False> \ # True to update tokenizer, False to retain existing tokenizer
        model.tokenizer.dir=<path to tokenizer dir> \ # Path to tokenizer dir when update_tokenizer=True
        model.tokenizer.type=<tokenizer type> \ # tokenizer type when update_tokenizer=True
        trainer.devices=-1 \
        trainer.accelerator='gpu' \
        trainer.max_epochs=50 \
        +init_from_nemo_model="<path to .nemo model file>" # (or +init_from_pretrained_model="<name of pretrained checkpoint>")

Refer to ``<NeMo_repo>/examples/asr/conf/asr_finetune/speech_to_text_finetune.yaml`` for more details.
Finetune ASR Models using HuggingFace Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Users can utilize HuggingFace Datasets for finetuning NeMo ASR models. The following config file can be used for this purpose:
``<NeMo_repo>/examples/asr/conf/asr_finetune/speech_to_text_hf_finetune.yaml``

As mentioned earlier, users can update the tokenizer or use an existing one based on their requirements. If users want to create a new tokenizer
from HuggingFace Datasets, they can use the following script:
``<NeMo_repo>/scripts/tokenizers/get_hf_text_data.py``
Fine-tuning by changing model architecture and tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If users want to update the model architecture as well, they can use the following scripts.

The pre-trained weights can be provided in multiple ways:

1) Providing a path to a NeMo model (via ``init_from_nemo_model``)
2) Providing the name of a pretrained NeMo model, which will be downloaded from the cloud (via ``init_from_pretrained_model``)
3) Providing a path to a PyTorch Lightning checkpoint file (via ``init_from_ptl_ckpt``)

There are multiple ASR subtasks inside the ``examples/asr/`` directory; substitute the ``<subtask>`` tag below accordingly.

.. code-block:: sh

    python examples/asr/<subtask>/script_to_<script_name>.py \
        --config-path=<path to dir of configs> \
        --config-name=<name of config without .yaml> \
        model.train_ds.manifest_filepath="<path to manifest file>" \
        model.validation_ds.manifest_filepath="<path to manifest file>" \
        trainer.devices=-1 \
        trainer.accelerator='gpu' \
        trainer.max_epochs=50 \
        +init_from_nemo_model="<path to .nemo model file>" # (or +init_from_pretrained_model, +init_from_ptl_ckpt)

To reinitialize part of the model, making it different from the pretrained model, users can specify the modules to load through the config:

.. code-block:: yaml

    init_from_nemo_model: "<path to .nemo model file>"
    asr_model:
      include: ["preprocessor","encoder"]
      exclude: ["decoder"]
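The include/exclude filtering above amounts to keeping only the checkpoint weights whose top-level module name is in ``include`` and not in ``exclude``. A minimal sketch of the idea (hypothetical helper, not the NeMo internals):

```python
def filter_state_dict(state_dict, include, exclude):
    """Keep weights whose top-level module name (the prefix before the
    first dot) is listed in `include` and not listed in `exclude`."""
    out = {}
    for key, value in state_dict.items():
        module = key.split(".", 1)[0]
        if module in include and module not in exclude:
            out[key] = value
    return out

# Demo mirroring the YAML above: keep preprocessor + encoder, drop decoder.
sd = {"encoder.w": 1, "preprocessor.w": 2, "decoder.w": 3}
kept = filter_state_dict(sd, include=["preprocessor", "encoder"], exclude=["decoder"])
```

The decoder weights are then freshly initialized by the instantiated model rather than loaded from the checkpoint.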
Fine-tuning Execution Flow Diagram
----------------------------------

When preparing your own training or fine-tuning scripts, please follow the order in the execution flow diagram for correct inference.

Depending on the type of model, there may be extra steps that must be performed:

* CTC Models - `Examples directory for CTC Models <https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_ctc/README.md>`_
* RNN Transducer Models - `Examples directory for Transducer Models <https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_transducer/README.md>`_
NeMo-2.2.0/docs/source/asr/data/asrlm_results.csv DELETED
@@ -1,2 +0,0 @@
Model Name,Model Base Class,Model Card
asrlm_en_transformer_large_ls,TransformerLMModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:asrlm_en_transformer_large_ls"

NeMo-2.2.0/docs/source/asr/data/benchmark_by.csv DELETED
@@ -1,4 +0,0 @@
Model,Model Base Class
`stt_uk_citrinet_1024_gamma_0_25 <https://huggingface.co/nvidia/stt_uk_citrinet_1024_gamma_0_25>`_,EncDecCTCModel
`stt_ua_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ua_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_ca.csv DELETED
@@ -1,3 +0,0 @@
model_name,base_model,link
stt_ca_conformer_ctc_large,ConformerBaseModel,https://huggingface.co/nvidia/stt_ca_conformer_ctc_large
stt_ca_conformer_transducer_large,ConformerBaseModel,https://huggingface.co/nvidia/stt_ca_conformer_transducer_large

NeMo-2.2.0/docs/source/asr/data/benchmark_canary.csv DELETED
@@ -1,2 +0,0 @@
Model,Language
`canary-1b <https://huggingface.co/nvidia/canary-1b>`_,"English, French, German, Spanish"

NeMo-2.2.0/docs/source/asr/data/benchmark_cn.csv DELETED
@@ -1,3 +0,0 @@
Model,Model Base Class
`stt_zh_citrinet_1024_gamma_0_25 <https://huggingface.co/nvidia/stt_zh_citrinet_1024_gamma_0_25>`_,EncDecCTCModel
`stt_zh_conformer_transducer_large <https://huggingface.co/nvidia/stt_zh_conformer_transducer_large>`_,EncDecRNNTBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_code_switching.csv DELETED
@@ -1,3 +0,0 @@
Model,Language
`stt_enes_conformer_ctc_large_codesw <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_ctc_large_codesw>`_,"English, Spanish"
`stt_enes_conformer_transducer_large_codesw <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_transducer_large_codesw>`_,"English, Spanish"

NeMo-2.2.0/docs/source/asr/data/benchmark_cs.csv DELETED
@@ -1,4 +0,0 @@
Model,Model Base Class
`stt_ca_conformer_ctc_large <https://huggingface.co/nvidia/stt_ca_conformer_ctc_large>`_,EncDecCTCModelBPE
`stt_ca_conformer_transducer_large <https://huggingface.co/nvidia/stt_ca_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_ca_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_ca_quartznet15x5>`_,EncDecCTCModel

NeMo-2.2.0/docs/source/asr/data/benchmark_de.csv DELETED
@@ -1,9 +0,0 @@
Model,Model Base Class
`canary-1b <https://huggingface.co/nvidia/canary-1b>`_,EncDecMultiTaskModel
`stt_de_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_de_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_de_conformer_ctc_large <https://huggingface.co/nvidia/stt_de_conformer_ctc_large>`_,EncDecCTCModelBPE
`stt_de_conformer_transducer_large <https://huggingface.co/nvidia/stt_de_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_de_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_de_quartznet15x5>`_,EncDecCTCModel
`stt_de_contextnet_1024 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_de_contextnet_1024>`_,EncDecRNNTBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc_blend_eu <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc_blend_eu>`_,EncDecHybridRNNTCTCBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_en.csv DELETED
@@ -1,37 +0,0 @@
Model,Model Base Class
`parakeet-tdt-1.1b <https://huggingface.co/nvidia/parakeet-tdt-1.1b>`_,EncDecRNNTBPEModel
`parakeet-tdt_ctc-1.1b <https://huggingface.co/nvidia/parakeet-tdt_ctc-1.1b>`_,ASRModel
`parakeet-tdt_ctc-110m <https://huggingface.co/nvidia/parakeet-tdt_ctc-110m>`_,ASRModel
`canary-1b <https://huggingface.co/nvidia/canary-1b>`_,EncDecMultiTaskModel
`stt_en_conformer_ctc_large <https://huggingface.co/nvidia/stt_en_conformer_ctc_large>`_,EncDecCTCModelBPE
`parakeet-ctc-0.6b <https://huggingface.co/nvidia/parakeet-ctc-0.6b>`_,EncDecCTCModelBPE
`parakeet-ctc-1.1b <https://huggingface.co/nvidia/parakeet-ctc-1.1b>`_,EncDecCTCModelBPE
`stt_en_conformer_transducer_xlarge <https://huggingface.co/nvidia/stt_en_conformer_transducer_xlarge>`_,EncDecRNNTBPEModel
`stt_en_fastconformer_ctc_large <https://huggingface.co/nvidia/stt_en_fastconformer_ctc_large>`_,EncDecCTCModelBPE
`stt_en_citrinet_256_ls <https://huggingface.co/nvidia/stt_en_citrinet_256_ls>`_,EncDecCTCModelBPE
`stt_en_fastconformer_hybrid_large_streaming_multi <https://huggingface.co/nvidia/stt_en_fastconformer_hybrid_large_streaming_multi>`_,EncDecHybridRNNTCTCBPEModel
`stt_en_fastconformer_ctc_xxlarge <https://huggingface.co/nvidia/stt_en_fastconformer_ctc_xxlarge>`_,EncDecCTCModelBPE
`stt_en_conformer_transducer_large <https://huggingface.co/nvidia/stt_en_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_en_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_en_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_en_citrinet_512_ls <https://huggingface.co/nvidia/stt_en_citrinet_512_ls>`_,EncDecCTCModelBPE
`stt_en_conformer_ctc_small <https://huggingface.co/nvidia/stt_en_conformer_ctc_small>`_,EncDecCTCModelBPE
`stt_en_citrinet_1024_gamma_0_25 <https://huggingface.co/nvidia/stt_en_citrinet_1024_gamma_0_25>`_,EncDecCTCModelBPE
`stt_en_fastconformer_transducer_large <https://huggingface.co/nvidia/stt_en_fastconformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_en_fastconformer_transducer_xlarge <https://huggingface.co/nvidia/stt_en_fastconformer_transducer_xlarge>`_,EncDecRNNTBPEModel
`stt_en_fastconformer_transducer_xxlarge <https://huggingface.co/nvidia/stt_en_fastconformer_transducer_xxlarge>`_,EncDecRNNTBPEModel
`stt_en_citrinet_768_ls <https://huggingface.co/nvidia/stt_en_citrinet_768_ls>`_,EncDecCTCModelBPE
`stt_en_fastconformer_ctc_xlarge <https://huggingface.co/nvidia/stt_en_fastconformer_ctc_xlarge>`_,EncDecCTCModelBPE
`stt_en_citrinet_384_ls <https://huggingface.co/nvidia/stt_en_citrinet_384_ls>`_,EncDecCTCModelBPE
`stt_en_citrinet_1024_ls <https://huggingface.co/nvidia/stt_en_citrinet_1024_ls>`_,EncDecCTCModelBPE
`QuartzNet15x5Base-En <https://ngc.nvidia.com/catalog/models/nvidia:nemospeechmodels>`_,EncDecCTCModel
`stt_en_jasper10x5dr <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_jasper10x5dr>`_,EncDecCTCModel
`stt_en_contextnet_256_mls <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_256_mls>`_,EncDecRNNTBPEModel
`stt_en_contextnet_512_mls <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_512_mls>`_,EncDecRNNTBPEModel
`stt_en_contextnet_1024_mls <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_1024_mls>`_,EncDecRNNTBPEModel
`stt_en_contextnet_256 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_256>`_,EncDecRNNTBPEModel
`stt_en_contextnet_512 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_512>`_,EncDecRNNTBPEModel
`stt_en_contextnet_1024 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_contextnet_1024>`_,EncDecRNNTBPEModel
`stt_enes_conformer_ctc_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_ctc_large>`_,EncDecCTCModelBPE
`stt_enes_conformer_transducer_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc_blend_eu <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc_blend_eu>`_,EncDecHybridRNNTCTCBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_eo.csv DELETED
@@ -1,3 +0,0 @@
Model,Model Base Class
`stt_eo_conformer_transducer_large <https://huggingface.co/nvidia/stt_eo_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_eo_conformer_ctc_large <https://huggingface.co/nvidia/stt_eo_conformer_ctc_large>`_,EncDecCTCModelBPE

NeMo-2.2.0/docs/source/asr/data/benchmark_es.csv DELETED
@@ -1,11 +0,0 @@
Model,Model Base Class
`canary-1b <https://huggingface.co/nvidia/canary-1b>`_,EncDecMultiTaskModel
`stt_es_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_es_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_es_conformer_transducer_large <https://huggingface.co/nvidia/stt_es_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_es_conformer_ctc_large <https://huggingface.co/nvidia/stt_es_conformer_ctc_large>`_,EncDecCTCModelBPE
`stt_es_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_es_quartznet15x5>`_,EncDecCTCModel
`stt_es_contextnet_1024 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_es_contextnet_1024>`_,EncDecRNNTBPEModel
`stt_enes_conformer_ctc_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_ctc_large>`_,EncDecCTCModelBPE
`stt_enes_conformer_transducer_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_transducer_large>`_,EncDecRNNTBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
`stt_multilingual_fastconformer_hybrid_large_pc_blend_eu <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc_blend_eu>`_,EncDecHybridRNNTCTCBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_fa.csv DELETED
@@ -1,2 +0,0 @@
Model,Model Base Class
`stt_fa_fastconformer_hybrid_large <https://huggingface.co/nvidia/stt_fa_fastconformer_hybrid_large>`_,EncDecHybridRNNTCTCBPEModel

NeMo-2.2.0/docs/source/asr/data/benchmark_fastconformer_hybrid.csv DELETED
@@ -1,16 +0,0 @@
- Model,Language
- `stt_be_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_be_fastconformer_hybrid_large_pc>`_,Belarusian
- `stt_hr_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_hr_fastconformer_hybrid_large_pc>`_,Croatian
- `stt_nl_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_nl_fastconformer_hybrid_large_pc>`_,Dutch
- `stt_en_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_en_fastconformer_hybrid_large_pc>`_,English
- `stt_fr_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_fr_fastconformer_hybrid_large_pc>`_,French
- `stt_ka_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_large_pc>`_,Georgian
- `stt_de_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_de_fastconformer_hybrid_large_pc>`_,German
- `stt_it_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_it_fastconformer_hybrid_large_pc>`_,Italian
- `stt_kk_ru_fastconformer_hybrid_large <https://huggingface.co/nvidia/stt_kk_ru_fastconformer_hybrid_large>`_,"Kazakh, Russian"
- `stt_fa_fastconformer_hybrid_large <https://huggingface.co/nvidia/stt_fa_fastconformer_hybrid_large>`_,Persian
- `stt_pl_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_pl_fastconformer_hybrid_large_pc>`_,Polish
- `stt_ru_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ru_fastconformer_hybrid_large_pc>`_,Russian
- `stt_es_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_es_fastconformer_hybrid_large_pc>`_,Spanish
- `stt_ua_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ua_fastconformer_hybrid_large_pc>`_,Ukrainian
- `stt_uz_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_uz_fastconformer_hybrid_large_pc>`_,Uzbek
 
NeMo-2.2.0/docs/source/asr/data/benchmark_fr.csv DELETED
@@ -1,11 +0,0 @@
- Model,Model Base Class
- `canary-1b <https://huggingface.co/nvidia/canary-1b>`_,EncDecMultiTaskModel
- `stt_fr_conformer_ctc_large <https://huggingface.co/nvidia/stt_fr_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_fr_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_fr_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_fr_conformer_transducer_large <https://huggingface.co/nvidia/stt_fr_conformer_transducer_large>`_,EncDecRNNTBPEModel
- `stt_fr_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_fr_quartznet15x5>`_,EncDecCTCModel
- `stt_fr_contextnet_1024 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_fr_contextnet_1024>`_,EncDecRNNTBPEModel
- `stt_fr_no_hyphen_citrinet_1024_gamma_0_25 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_fr_citrinet_1024_gamma_0_25>`_,EncDecCTCModelBPE
- `stt_fr_no_hyphen_conformer_ctc_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_fr_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_multilingual_fastconformer_hybrid_large_pc_blend_eu <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc_blend_eu>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_hi.csv DELETED
@@ -1,2 +0,0 @@
- Model Name,Model Base Class
- `stt_hi_conformer_ctc_medium <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_hi_conformer_ctc_medium>`_,EncDecCTCModelBPE
 
NeMo-2.2.0/docs/source/asr/data/benchmark_hr.csv DELETED
@@ -1,5 +0,0 @@
- Model,Model Base Class
- `stt_hr_conformer_transducer_large <https://huggingface.co/nvidia/stt_hr_conformer_transducer_large>`_,EncDecRNNTBPEModel
- `stt_hr_conformer_ctc_large <https://huggingface.co/nvidia/stt_hr_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_hr_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_hr_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_it.csv DELETED
@@ -1,6 +0,0 @@
- Model,Model Base Class
- `stt_it_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_it_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_it_conformer_ctc_large <https://huggingface.co/nvidia/stt_it_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_it_conformer_transducer_large <https://huggingface.co/nvidia/stt_it_conformer_transducer_large>`_,EncDecRNNTBPEModel
- `stt_it_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_it_quartznet15x5>`_,EncDecCTCModel
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_jp.csv DELETED
@@ -1,2 +0,0 @@
- Model,Model Base Class
- `parakeet-tdt_ctc-0.6b-ja <https://huggingface.co/nvidia/parakeet-tdt_ctc-0.6b-ja>`_,ASRModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_ka.csv DELETED
@@ -1,3 +0,0 @@
- Model,Model Base Class
- `stt_ka_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc <https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_kab.csv DELETED
@@ -1,2 +0,0 @@
- Model,Model Base Class
- `stt_kab_conformer_transducer_large <https://huggingface.co/nvidia/stt_kab_conformer_transducer_large>`_,EncDecRNNTBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_kz.csv DELETED
@@ -1,2 +0,0 @@
- Model,Model Base Class
- `stt_kk_ru_fastconformer_hybrid_large <https://huggingface.co/nvidia/stt_kk_ru_fastconformer_hybrid_large>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_mr.csv DELETED
@@ -1,2 +0,0 @@
- Model Name,Model Base Class
- `stt_mr_conformer_ctc_medium <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_mr_conformer_ctc_medium>`_,EncDecCTCModelBPE
 
NeMo-2.2.0/docs/source/asr/data/benchmark_multilingual.csv DELETED
@@ -1,5 +0,0 @@
- Model,Model Base Class,Model Card
- stt_enes_conformer_ctc_large,EncDecCTCModelBPE,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_ctc_large"
- stt_enes_conformer_transducer_large,EncDecRNNTBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_enes_conformer_transducer_large"
- stt_multilingual_fastconformer_hybrid_large_pc,EncDecHybridRNNTCTCBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc"
- stt_multilingual_fastconformer_hybrid_large_pc_blend_eu,EncDecHybridRNNTCTCBPEModel,"https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc_blend_eu"
 
NeMo-2.2.0/docs/source/asr/data/benchmark_nl.csv DELETED
@@ -1,2 +0,0 @@
- Model,Model Base Class
- `stt_nl_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_nl_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_parakeet.csv DELETED
@@ -1,7 +0,0 @@
- Model,Language
- `parakeet-ctc-0.6b <https://huggingface.co/nvidia/parakeet-ctc-0.6b>`_,English
- `parakeet-ctc-1.1b <https://huggingface.co/nvidia/parakeet-ctc-1.1b>`_,English
- `parakeet-tdt-1.1b <https://huggingface.co/nvidia/parakeet-tdt-1.1b>`_,English
- `parakeet-tdt_ctc-110m <https://huggingface.co/nvidia/parakeet-tdt_ctc-110m>`_,English
- `parakeet-tdt_ctc-1.1b <https://huggingface.co/nvidia/parakeet-tdt_ctc-1.1b>`_,English
- `parakeet-tdt_ctc-0.6b-ja <https://huggingface.co/nvidia/parakeet-tdt_ctc-0.6b-ja>`_,Japanese
 
NeMo-2.2.0/docs/source/asr/data/benchmark_pl.csv DELETED
@@ -1,4 +0,0 @@
- Model,Model Base Class
- `stt_pl_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_pl_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_pl_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_pl_quartznet15x5>`_,EncDecCTCModel
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_ru.csv DELETED
@@ -1,7 +0,0 @@
- Model,Model Base Class
- `stt_ru_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ru_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_ru_conformer_transducer_large <https://huggingface.co/nvidia/stt_ru_conformer_transducer_large>`_,EncDecRNNTBPEModel
- `stt_kk_ru_fastconformer_hybrid_large <https://huggingface.co/nvidia/stt_kk_ru_fastconformer_hybrid_large>`_,EncDecHybridRNNTCTCBPEModel
- `stt_ru_conformer_ctc_large <https://huggingface.co/nvidia/stt_ru_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_ru_quartznet15x5 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_ru_quartznet15x5>`_,EncDecCTCModel
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_rw.csv DELETED
@@ -1,3 +0,0 @@
- Model,Model Base Class
- `stt_rw_conformer_ctc_large <https://huggingface.co/nvidia/stt_rw_conformer_ctc_large>`_,EncDecCTCModelBPE
- `stt_rw_conformer_transducer_large <https://huggingface.co/nvidia/stt_rw_conformer_transducer_large>`_,EncDecRNNTBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_ua.csv DELETED
@@ -1,4 +0,0 @@
- Model,Model Base Class
- `stt_uk_citrinet_1024_gamma_0_25 <https://huggingface.co/nvidia/stt_uk_citrinet_1024_gamma_0_25>`_,EncDecCTCModel
- `stt_ua_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_ua_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
- `stt_multilingual_fastconformer_hybrid_large_pc <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_multilingual_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_uz.csv DELETED
@@ -1,2 +0,0 @@
- Model,Model Base Class
- `stt_uz_fastconformer_hybrid_large_pc <https://huggingface.co/nvidia/stt_uz_fastconformer_hybrid_large_pc>`_,EncDecHybridRNNTCTCBPEModel
 
NeMo-2.2.0/docs/source/asr/data/benchmark_zh.csv DELETED
@@ -1,4 +0,0 @@
- Model,Model Base Class
- `stt_zh_citrinet_512 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_citrinet_512>`_,EncDecCTCModel
- `stt_zh_citrinet_1024_gamma_0_25 <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_citrinet_1024_gamma_0_25>`_,EncDecCTCModel
- `stt_zh_conformer_transducer_large <https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_zh_conformer_transducer_large>`_,EncDecRNNTModel
 
NeMo-2.2.0/docs/source/asr/data/scores/be/conformer_be.csv DELETED
@@ -1,3 +0,0 @@
- Model Name,Language,MCV Test-Set v10 (be)
- stt_be_conformer_ctc_large,be,4.7 %
- stt_be_conformer_transducer_large,be,3.8 %