+ MAERec is a scene text recognition model composed of a ViT backbone and an auto-regressive
+ Transformer decoder. It shows outstanding performance in scene text recognition, especially when
+ pre-trained on Union14M-U via MAE.
+
+
+ In this demo, we combine MAERec with the DBNet++ text detector to build an
+ end-to-end scene text detection and recognition pipeline.
+
+
+ """)
+ gr.Image('github/maerec.png')
+ with gr.Column(scale=1):
+ input_image = gr.Image(label='Input Image')
+ output_image = gr.Image(label='Output Image')
+ use_detector = gr.Checkbox(
+ label=
+ 'Use Scene Text Detector or Not (Disabled for Recognition Only)',
+ value=True)
+ det_results = gr.Textbox(label='Detection Results')
+ mmocr = gr.Button('Run MMOCR')
+ gr.Markdown("## Image Examples")
+ with gr.Row():
+ gr.Examples(
+ examples=[
+ 'github/author.jpg', 'github/gradio1.jpeg',
+ 'github/Art_Curve_178.jpg', 'github/cute_3.jpg',
+ 'github/cute_168.jpg', 'github/hiercurve_2229.jpg',
+ 'github/ic15_52.jpg', 'github/ic15_698.jpg',
+ 'github/Art_Curve_352.jpg'
+ ],
+ inputs=input_image,
+ )
+ mmocr.click(
+ fn=run_mmocr,
+ inputs=[input_image, use_detector],
+ outputs=[output_image, det_results])
+ demo.launch(debug=True)
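For reference, the `run_mmocr` callback bound to the button above is defined earlier in the demo file, outside this hunk. The following is a minimal sketch of how a DBNet++ detector and a MAERec recognizer could be chained through MMOCR 1.x's `MMOCRInferencer`; the config and checkpoint paths are placeholders rather than the demo's actual files.

```python
# Minimal sketch, assuming MMOCR 1.x is installed; all config/checkpoint
# paths are hypothetical placeholders, not the demo's actual files.
from mmocr.apis import MMOCRInferencer

ocr = MMOCRInferencer(
    det='path/to/dbnetpp_config.py',      # placeholder DBNet++ config
    det_weights='path/to/dbnetpp.pth',    # placeholder DBNet++ checkpoint
    rec='path/to/maerec_config.py',       # placeholder MAERec config
    rec_weights='path/to/maerec.pth')     # placeholder MAERec checkpoint

# Detect text regions, recognize each crop, and return a rendered image.
result = ocr('demo/demo_text_ocr.jpg', return_vis=True)
print(result['predictions'])              # detected polygons + recognized texts
vis_image = result['visualization'][0]    # annotated image as a numpy array
```

Recognition-only mode (the checkbox above unticked) would correspond to constructing the inferencer without the `det`/`det_weights` arguments, so the whole input image is treated as a single text instance.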
diff --git a/mmocr-dev-1.x/.circleci/config.yml b/mmocr-dev-1.x/.circleci/config.yml
new file mode 100644
index 0000000000000000000000000000000000000000..05bf6d2c08bf787b91abeecf09586ac9aecad71c
--- /dev/null
+++ b/mmocr-dev-1.x/.circleci/config.yml
@@ -0,0 +1,34 @@
+version: 2.1
+
+# this allows you to use CircleCI's dynamic configuration feature
+setup: true
+
+# the path-filtering orb is required to continue a pipeline based on
+# the path of an updated fileset
+orbs:
+ path-filtering: circleci/path-filtering@0.1.2
+
+workflows:
+ # the always-run workflow is always triggered, regardless of the pipeline parameters.
+ always-run:
+ jobs:
+ # the path-filtering/filter job determines which pipeline
+ # parameters to update.
+ - path-filtering/filter:
+ name: check-updated-files
+ # 3-column, whitespace-delimited mapping. One mapping per
+ # line:
+ # <regex path-to-test> <parameter-to-set> <value-of-pipeline-parameter>
+ mapping: |
+ mmocr/.* lint_only false
+ requirements/.* lint_only false
+ tests/.* lint_only false
+ tools/.* lint_only false
+ configs/.* lint_only false
+ .circleci/.* lint_only false
+ base-revision: dev-1.x
+ # this is the path of the configuration we should trigger once
+ # path filtering and pipeline parameter value updates are
+ # complete. In this case, we are using the parent dynamic
+ # configuration itself.
+ config-path: .circleci/test.yml
diff --git a/mmocr-dev-1.x/.circleci/docker/Dockerfile b/mmocr-dev-1.x/.circleci/docker/Dockerfile
new file mode 100644
index 0000000000000000000000000000000000000000..d9cf8cc7712d5241975c3b748fb0d01a5545b4fd
--- /dev/null
+++ b/mmocr-dev-1.x/.circleci/docker/Dockerfile
@@ -0,0 +1,11 @@
+ARG PYTORCH="1.8.1"
+ARG CUDA="10.2"
+ARG CUDNN="7"
+
+FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
+
+# To fix GPG key error when running apt-get update
+RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
+RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
+
+RUN apt-get update && apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx
diff --git a/mmocr-dev-1.x/.circleci/test.yml b/mmocr-dev-1.x/.circleci/test.yml
new file mode 100644
index 0000000000000000000000000000000000000000..c24bebcb50465251f879a0506caf4587f3cc92a6
--- /dev/null
+++ b/mmocr-dev-1.x/.circleci/test.yml
@@ -0,0 +1,196 @@
+version: 2.1
+
+# the default pipeline parameters, which will be updated according to
+# the results of the path-filtering orb
+parameters:
+ lint_only:
+ type: boolean
+ default: true
+
+jobs:
+ lint:
+ docker:
+ - image: cimg/python:3.7.4
+ steps:
+ - checkout
+ - run:
+ name: Install pre-commit hook
+ command: |
+ pip install pre-commit
+ pre-commit install
+ - run:
+ name: Linting
+ command: pre-commit run --all-files
+ - run:
+ name: Check docstring coverage
+ command: |
+ pip install interrogate
+ interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-magic --ignore-regex "__repr__" --fail-under 90 mmocr
+ build_cpu:
+ parameters:
+ # The python version must match available image tags in
+ # https://circleci.com/developer/images/image/cimg/python
+ python:
+ type: string
+ torch:
+ type: string
+ torchvision:
+ type: string
+ docker:
+ - image: cimg/python:<< parameters.python >>
+ resource_class: large
+ steps:
+ - checkout
+ - run:
+ name: Install Libraries
+ command: |
+ sudo apt-get update
+ sudo apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx libjpeg-dev zlib1g-dev libtinfo-dev libncurses5 libgeos-dev
+ - run:
+ name: Configure Python & pip
+ command: |
+ pip install --upgrade pip
+ pip install wheel
+ - run:
+ name: Install PyTorch
+ command: |
+ python -V
+ pip install torch==<< parameters.torch >>+cpu torchvision==<< parameters.torchvision >>+cpu -f https://download.pytorch.org/whl/torch_stable.html
+ - run:
+ name: Install mmocr dependencies
+ command: |
+ pip install git+https://github.com/open-mmlab/mmengine.git@main
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ pip install -r requirements/tests.txt
+ - run:
+ name: Build and install
+ command: |
+ pip install -e .
+ - run:
+ name: Run unittests
+ command: |
+ coverage run --branch --source mmocr -m pytest tests/
+ coverage xml
+ coverage report -m
+ build_cuda:
+ parameters:
+ torch:
+ type: string
+ cuda:
+ type: enum
+ enum: ["10.1", "10.2", "11.1", "11.7"]
+ cudnn:
+ type: integer
+ default: 7
+ machine:
+ image: ubuntu-2004-cuda-11.4:202110-01
+ # docker_layer_caching: true
+ resource_class: gpu.nvidia.small
+ steps:
+ - checkout
+ - run:
+ # Cloning repos in VM since Docker doesn't have access to the private key
+ name: Clone Repos
+ command: |
+ git clone -b main --depth 1 https://github.com/open-mmlab/mmengine.git /home/circleci/mmengine
+ git clone -b dev-3.x --depth 1 https://github.com/open-mmlab/mmdetection.git /home/circleci/mmdetection
+ - run:
+ name: Build Docker image
+ command: |
+ docker build .circleci/docker -t mmocr:gpu --build-arg PYTORCH=<< parameters.torch >> --build-arg CUDA=<< parameters.cuda >> --build-arg CUDNN=<< parameters.cudnn >>
+ docker run --gpus all -t -d -v /home/circleci/project:/mmocr -v /home/circleci/mmengine:/mmengine -v /home/circleci/mmdetection:/mmdetection -w /mmocr --name mmocr mmocr:gpu
+ - run:
+ name: Install mmocr dependencies
+ command: |
+ docker exec mmocr pip install -e /mmengine
+ docker exec mmocr pip install -U openmim
+ docker exec mmocr mim install 'mmcv >= 2.0.0rc1'
+ docker exec mmocr pip install -e /mmdetection
+ docker exec mmocr pip install -r requirements/tests.txt
+ - run:
+ name: Build and install
+ command: |
+ docker exec mmocr pip install -e .
+ - run:
+ name: Run unittests
+ command: |
+ docker exec mmocr pytest tests/
+
+workflows:
+ pr_stage_lint:
+ when: << pipeline.parameters.lint_only >>
+ jobs:
+ - lint:
+ name: lint
+ filters:
+ branches:
+ ignore:
+ - dev-1.x
+ - 1.x
+ - main
+ pr_stage_test:
+ when:
+ not:
+ << pipeline.parameters.lint_only >>
+ jobs:
+ - lint:
+ name: lint
+ filters:
+ branches:
+ ignore:
+ - dev-1.x
+ - test-1.x
+ - main
+ - build_cpu:
+ name: minimum_version_cpu
+ torch: 1.6.0
+ torchvision: 0.7.0
+ python: "3.7"
+ requires:
+ - lint
+ - build_cpu:
+ name: maximum_version_cpu
+ torch: 2.0.0
+ torchvision: 0.15.1
+ python: 3.9.0
+ requires:
+ - minimum_version_cpu
+ - hold:
+ type: approval
+ requires:
+ - maximum_version_cpu
+ - build_cuda:
+ name: mainstream_version_gpu
+ torch: 1.8.1
+ # Use double quotation mark to explicitly specify its type
+ # as string instead of number
+ cuda: "10.2"
+ requires:
+ - hold
+ - build_cuda:
+ name: maximum_version_gpu
+ torch: 2.0.0
+ # Use double quotation mark to explicitly specify its type
+ # as string instead of number
+ cuda: "11.7"
+ cudnn: 8
+ requires:
+ - hold
+ merge_stage_test:
+ when:
+ not:
+ << pipeline.parameters.lint_only >>
+ jobs:
+ - build_cuda:
+ name: minimum_version_gpu
+ torch: 1.6.0
+ # Use double quotation mark to explicitly specify its type
+ # as string instead of number
+ cuda: "10.1"
+ filters:
+ branches:
+ only:
+ - dev-1.x
+ - main
diff --git a/mmocr-dev-1.x/.codespellrc b/mmocr-dev-1.x/.codespellrc
new file mode 100644
index 0000000000000000000000000000000000000000..d9a0a76c5862203c2951d0b3703da9c1322417e8
--- /dev/null
+++ b/mmocr-dev-1.x/.codespellrc
@@ -0,0 +1,5 @@
+[codespell]
+skip = *.ipynb
+count =
+quiet-level = 3
+ignore-words-list = convertor,convertors,formating,nin,wan,datas,hist,ned
diff --git a/mmocr-dev-1.x/.coveragerc b/mmocr-dev-1.x/.coveragerc
new file mode 100644
index 0000000000000000000000000000000000000000..a7ee638287be67483ce907295325c53264af4c8c
--- /dev/null
+++ b/mmocr-dev-1.x/.coveragerc
@@ -0,0 +1,3 @@
+[run]
+omit =
+ */__init__.py
diff --git a/mmocr-dev-1.x/.dev_scripts/benchmark_full_models.txt b/mmocr-dev-1.x/.dev_scripts/benchmark_full_models.txt
new file mode 100644
index 0000000000000000000000000000000000000000..5d7d7bf4e36369bdd34ad1c73e99104a37191ab8
--- /dev/null
+++ b/mmocr-dev-1.x/.dev_scripts/benchmark_full_models.txt
@@ -0,0 +1,18 @@
+textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
+textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py
+textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py
+textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py
+textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py
+textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py
+textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py
+textrecog/abinet/abinet-vision_20e_st-an_mj.py
+textrecog/crnn/crnn_mini-vgg_5e_mj.py
+textrecog/master/master_resnet31_12e_st_mj_sa.py
+textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py
+textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py
+textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real.py
+textrecog/satrn/satrn_shallow-small_5e_st_mj.py
+textrecog/satrn/satrn_shallow-small_5e_st_mj.py
+textrecog/aster/aster_resnet45_6e_st_mj.py
+textrecog/svtr/svtr-small_20e_st_mj.py
diff --git a/mmocr-dev-1.x/.dev_scripts/benchmark_options.py b/mmocr-dev-1.x/.dev_scripts/benchmark_options.py
new file mode 100644
index 0000000000000000000000000000000000000000..e10c7adbccb430f513ae58517a47050444cdecc1
--- /dev/null
+++ b/mmocr-dev-1.x/.dev_scripts/benchmark_options.py
@@ -0,0 +1,7 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+
+third_part_libs = [
+ 'pip install -r ../requirements/albu.txt',
+]
+
+default_floating_range = 0.5
diff --git a/mmocr-dev-1.x/.dev_scripts/benchmark_train_models.txt b/mmocr-dev-1.x/.dev_scripts/benchmark_train_models.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8cba62d0cae75b93cac6b004aff0c860abcfcaaa
--- /dev/null
+++ b/mmocr-dev-1.x/.dev_scripts/benchmark_train_models.txt
@@ -0,0 +1,9 @@
+textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py
+textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py
+textrecog/abinet/abinet-vision_20e_st-an_mj.py
+textrecog/crnn/crnn_mini-vgg_5e_mj.py
+textrecog/aster/aster_resnet45_6e_st_mj.py
+textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py
+textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real.py
+textrecog/svtr/svtr-small_20e_st_mj.py
diff --git a/mmocr-dev-1.x/.dev_scripts/covignore.cfg b/mmocr-dev-1.x/.dev_scripts/covignore.cfg
new file mode 100644
index 0000000000000000000000000000000000000000..00ec54b01a343447ab555390112b6d3e9676738c
--- /dev/null
+++ b/mmocr-dev-1.x/.dev_scripts/covignore.cfg
@@ -0,0 +1,18 @@
+# Each line should be the relative path to the root directory
+# of this repo. Support regular expression as well.
+# For example:
+# mmocr/models/textdet/postprocess/utils.py
+# .*/utils.py
+.*/__init__.py
+
+# It will be removed after all models have been refactored
+mmocr/utils/bbox_utils.py
+
+# Major part is covered, however, it's hard to cover model's output.
+mmocr/models/textdet/detectors/mmdet_wrapper.py
+
+# It will be removed after KieVisualizer and TextSpotterVisualizer
+mmocr/visualization/visualize.py
+
+# Add tests for data preparers later
+mmocr/datasets/preparers
diff --git a/mmocr-dev-1.x/.dev_scripts/diff_coverage_test.sh b/mmocr-dev-1.x/.dev_scripts/diff_coverage_test.sh
new file mode 100755
index 0000000000000000000000000000000000000000..588d6dbd4070e314f7fa4c5fda1311b3589221b2
--- /dev/null
+++ b/mmocr-dev-1.x/.dev_scripts/diff_coverage_test.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+set -e
+
+readarray -t IGNORED_FILES < $( dirname "$0" )/covignore.cfg
+
+REUSE_COVERAGE_REPORT=${REUSE_COVERAGE_REPORT:-0}
+REPO=${1:-"origin"}
+BRANCH=${2:-"refactor_dev"}
+
+git fetch $REPO $BRANCH
+
+PY_FILES=""
+for FILE_NAME in $(git diff --name-only ${REPO}/${BRANCH}); do
+ # Only test python files in mmocr/ existing in current branch, and not ignored in covignore.cfg
+ if [ ${FILE_NAME: -3} == ".py" ] && [ ${FILE_NAME:0:6} == "mmocr/" ] && [ -f "$FILE_NAME" ]; then
+ IGNORED=false
+ for IGNORED_FILE_NAME in "${IGNORED_FILES[@]}"; do
+ # Skip blank lines
+ if [ -z "$IGNORED_FILE_NAME" ]; then
+ continue
+ fi
+ if [ "${IGNORED_FILE_NAME::1}" != "#" ] && [[ "$FILE_NAME" =~ $IGNORED_FILE_NAME ]]; then
+ echo "Ignoring $FILE_NAME"
+ IGNORED=true
+ break
+ fi
+ done
+ if [ "$IGNORED" = false ]; then
+ PY_FILES="$PY_FILES $FILE_NAME"
+ fi
+ fi
+done
+
+# Only test the coverage when PY_FILES is not empty; otherwise the entire project would be tested
+if [ ! -z "${PY_FILES}" ]
+then
+ if [ "$REUSE_COVERAGE_REPORT" == "0" ]; then
+ coverage run --branch --source mmocr -m pytest tests/
+ fi
+ coverage report --fail-under 90 -m $PY_FILES
+ interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-magic --ignore-regex "__repr__" --fail-under 95 $PY_FILES
+fi
diff --git a/mmocr-dev-1.x/.github/CODE_OF_CONDUCT.md b/mmocr-dev-1.x/.github/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000000000000000000000000000000000000..92afad1c5ab5d5781115dee45c131d3751d3cd31
--- /dev/null
+++ b/mmocr-dev-1.x/.github/CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to making participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+- Using welcoming and inclusive language
+- Being respectful of differing viewpoints and experiences
+- Gracefully accepting constructive criticism
+- Focusing on what is best for the community
+- Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+- The use of sexualized language or imagery and unwelcome sexual attention or
+ advances
+- Trolling, insulting/derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic
+ address, without explicit permission
+- Other conduct which could reasonably be considered inappropriate in a
+ professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community. Examples of
+representing a project or community include using an official project e-mail
+address, posting via an official social media account, or acting as an appointed
+representative at an online or offline event. Representation of a project may be
+further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at chenkaidev@gmail.com. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
+
+[homepage]: https://www.contributor-covenant.org
diff --git a/mmocr-dev-1.x/.github/CONTRIBUTING.md b/mmocr-dev-1.x/.github/CONTRIBUTING.md
new file mode 100644
index 0000000000000000000000000000000000000000..7c7d23f22866eae2c12844e365c0f8a03c0e501b
--- /dev/null
+++ b/mmocr-dev-1.x/.github/CONTRIBUTING.md
@@ -0,0 +1 @@
+We appreciate all contributions to improve MMOCR. Please read [Contribution Guide](/docs/en/notes/contribution_guide.md) for step-by-step instructions to make a contribution to MMOCR, and [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guideline.
diff --git a/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/1-bug-report.yml b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/1-bug-report.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6faa7b762abbec60f3444db838dcdfc072ba5f25
--- /dev/null
+++ b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/1-bug-report.yml
@@ -0,0 +1,121 @@
+name: "🐞 Bug report"
+description: "Create a report to help us reproduce and fix the bug"
+labels: kind/bug
+title: "[Bug] "
+
+body:
+ - type: markdown
+ attributes:
+ value: |
+ ## Note
+ For general usage questions or idea discussions, please post them to our [**Forum**](https://github.com/open-mmlab/mmocr/discussions)
+ If this issue is about installing MMCV, please file an issue at [MMCV](https://github.com/open-mmlab/mmcv/issues/new/choose).
+ If it's anything about model deployment, please raise it to [MMDeploy](https://github.com/open-mmlab/mmdeploy)
+
+ Please fill in as **much** of the following form as you're able to. **The clearer the description is, the less time it will take to solve it.**
+
+ - type: checkboxes
+ attributes:
+ label: Prerequisite
+ description: Please check the following items before creating a new issue.
+ options:
+ - label: I have searched [Issues](https://github.com/open-mmlab/mmocr/issues) and [Discussions](https://github.com/open-mmlab/mmocr/discussions) but cannot get the expected help.
+ required: true
+ # - label: I have read the [FAQ documentation](https://mmocr.readthedocs.io/en/1.x/notes/4_faq.html) but cannot get the expected help.
+ # required: true
+ - label: The bug has not been fixed in the [latest version (0.x)](https://github.com/open-mmlab/mmocr) or [latest version (1.x)](https://github.com/open-mmlab/mmocr/tree/dev-1.x).
+ required: true
+
+ - type: dropdown
+ id: task
+ attributes:
+ label: Task
+ description: The problem arises when
+ options:
+ - I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
+ - I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
+ validations:
+ required: true
+
+ - type: dropdown
+ id: branch
+ attributes:
+ label: Branch
+ description: The problem arises when I'm working on
+ options:
+ - main branch https://github.com/open-mmlab/mmocr
+ - 1.x branch https://github.com/open-mmlab/mmocr/tree/dev-1.x
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Environment
+ description: |
+ Please run `python mmocr/utils/collect_env.py` to collect necessary environment information and copy-paste it here.
+ You may add additional information that may be helpful for locating the problem, such as
+ - How you installed PyTorch \[e.g., pip, conda, source\]
+ - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Reproduces the problem - code sample
+ description: |
+ Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
+ placeholder: |
+ ```python
+ # Sample code to reproduce the problem
+ ```
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Reproduces the problem - command or script
+ description: |
+ What command or script did you run?
+ placeholder: |
+ ```shell
+ The command or script you run.
+ ```
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Reproduces the problem - error message
+ description: |
+ Please provide the error message or logs you got, with the full traceback.
+
+ Tip: You can attach images or log files by dragging them into the text area.
+ placeholder: |
+ ```
+ The error message or logs you got, with the full traceback.
+ ```
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Additional information
+ description: |
+ Tell us anything else you think we should know.
+
+ Tip: You can attach images or log files by dragging them into the text area.
+ placeholder: |
+ 1. What's your expected result?
+ 2. What dataset did you use?
+ 3. What do you think might be the reason?
+
+ - type: markdown
+ attributes:
+ value: |
+ ## Acknowledgement
+ Thanks for taking the time to fill out this report.
+
+ If you have already identified the reason, we strongly appreciate you creating a new PR to fix it [**Here**](https://github.com/open-mmlab/mmocr/pulls)!
+ Please refer to [**Contribution Guide**](https://mmocr.readthedocs.io/en/dev-1.x/notes/contribution_guide.html) for contributing.
+
+ Welcome to join our [**Community**](https://mmocr.readthedocs.io/en/latest/contact.html) to discuss together. 💬
diff --git a/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/2-feature_request.yml b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/2-feature_request.yml
new file mode 100644
index 0000000000000000000000000000000000000000..56dad87138c0285a155fb7366f27ab7da869ed9b
--- /dev/null
+++ b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/2-feature_request.yml
@@ -0,0 +1,39 @@
+name: 🚀 Feature request
+description: Suggest an idea for this project
+labels: [feature-request]
+title: "[Feature] "
+
+body:
+ - type: markdown
+ attributes:
+ value: |
+ ## Note
+ For general usage questions or idea discussions, please post them to our [**Forum**](https://github.com/open-mmlab/mmocr/discussions)
+
+ Please fill in as **much** of the following form as you're able to. **The clearer the description is, the less time it will take to solve it.**
+
+ - type: textarea
+ attributes:
+ label: What is the feature?
+ description: Tell us more about the feature and how this feature can help.
+ placeholder: |
+ E.g., It is inconvenient when \[....\].
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Any other context?
+ description: |
+ Have you considered any alternative solutions or features? If so, what are they? Also, feel free to add any other context or screenshots about the feature request here.
+
+ - type: markdown
+ attributes:
+ value: |
+ ## Acknowledgement
+ Thanks for taking the time to fill out this report.
+
+ We strongly appreciate you creating a new PR to implement it [**Here**](https://github.com/open-mmlab/mmocr/pulls)!
+ Please refer to [**Contribution Guide**](https://mmocr.readthedocs.io/en/dev-1.x/notes/contribution_guide.html) for contributing.
+
+ Welcome to join our [**Community**](https://mmocr.readthedocs.io/en/latest/contact.html) to discuss together. 💬
diff --git a/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/3-new-model.yml b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/3-new-model.yml
new file mode 100644
index 0000000000000000000000000000000000000000..ea5491cca4ebb4392bb3413a547d43ea14c2f021
--- /dev/null
+++ b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/3-new-model.yml
@@ -0,0 +1,51 @@
+name: "\U0001F31F New model/dataset/scheduler addition"
+description: Submit a proposal/request to implement a new model / dataset / scheduler
+labels: [ "feature-request" ]
+title: "[New Models] "
+
+
+body:
+ - type: markdown
+ attributes:
+ value: |
+ ## Note
+ For general usage questions or idea discussions, please post them to our [**Forum**](https://github.com/open-mmlab/mmocr/discussions)
+
+ Please fill in as **much** of the following form as you're able to. **The clearer the description is, the less time it will take to solve it.**
+
+ - type: textarea
+ id: description-request
+ validations:
+ required: true
+ attributes:
+ label: Model/Dataset/Scheduler description
+ description: |
+ Put any and all important information relevant to the model/dataset/scheduler.
+
+ - type: checkboxes
+ attributes:
+ label: Open source status
+ description: |
+ Please provide the open-source status, which would be very helpful
+ options:
+ - label: "The model implementation is available"
+ - label: "The model weights are available."
+
+ - type: textarea
+ id: additional-info
+ attributes:
+ label: Provide useful links for the implementation
+ description: |
+ Please provide information regarding the implementation, the weights, and the authors.
+ Please mention the authors by @gh-username if you're aware of their usernames.
+
+ - type: markdown
+ attributes:
+ value: |
+ ## Acknowledgement
+ Thanks for taking the time to fill out this report.
+
+ We strongly appreciate you creating a new PR to implement it [**Here**](https://github.com/open-mmlab/mmocr/pulls)!
+ Please refer to [**Contribution Guide**](https://mmocr.readthedocs.io/en/dev-1.x/notes/contribution_guide.html) for contributing.
+
+ Welcome to join our [**Community**](https://mmocr.readthedocs.io/en/latest/contact.html) to discuss together. 💬
diff --git a/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/4-documentation.yml b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/4-documentation.yml
new file mode 100644
index 0000000000000000000000000000000000000000..f19e070f56fda1fed144a1241ba3827c84bd4f18
--- /dev/null
+++ b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/4-documentation.yml
@@ -0,0 +1,48 @@
+name: 📚 Documentation
+description: Report an issue related to the documentation.
+labels: "docs"
+title: "[Docs] "
+
+body:
+ - type: markdown
+ attributes:
+ value: |
+ ## Note
+ For general usage questions or idea discussions, please post them to our [**Forum**](https://github.com/open-mmlab/mmocr/discussions)
+ Please fill in as **much** of the following form as you're able to. **The clearer the description is, the less time it will take to solve it.**
+
+ - type: dropdown
+ id: branch
+ attributes:
+ label: Branch
+ description: This issue is related to the
+ options:
+ - master branch https://mmocr.readthedocs.io/en/latest/
+ - 1.x branch https://mmocr.readthedocs.io/en/dev-1.x/
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: 📚 The doc issue
+ description: >
+ A clear and concise description of the issue.
+ validations:
+ required: true
+
+ - type: textarea
+ attributes:
+ label: Suggest a potential alternative/fix
+ description: >
+ Tell us how we could improve the documentation in this regard.
+
+ - type: markdown
+ attributes:
+ value: |
+ ## Acknowledgement
+ Thanks for taking the time to fill out this report.
+
+ If you have already identified the reason, we strongly appreciate you creating a new PR to fix it [**here**](https://github.com/open-mmlab/mmocr/pulls)!
+ Please refer to [**Contribution Guide**](https://mmocr.readthedocs.io/en/dev-1.x/notes/contribution_guide.html) for contributing.
+
+ Welcome to join our [**Community**](https://mmocr.readthedocs.io/en/latest/contact.html) to discuss together. 💬
diff --git a/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/config.yml b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000000000000000000000000000000000000..fca6615a0531179134ef7bcd94a37a10fc0718c2
--- /dev/null
+++ b/mmocr-dev-1.x/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,12 @@
+blank_issues_enabled: false
+
+contact_links:
+ - name: ❓ FAQ
+ url: https://mmocr.readthedocs.io/en/dev-1.x/get_started/faq.html
+ about: Is your question frequently asked?
+ - name: 💬 Forum
+ url: https://github.com/open-mmlab/mmocr/discussions
+ about: Ask general usage questions and discuss with other MMOCR community members
+ - name: 🌐 Explore OpenMMLab
+ url: https://openmmlab.com/
+ about: Get to know more about OpenMMLab
diff --git a/mmocr-dev-1.x/.github/pull_request_template.md b/mmocr-dev-1.x/.github/pull_request_template.md
new file mode 100644
index 0000000000000000000000000000000000000000..e010f972caa5699d8cfcac64a859b6eb7d1d7749
--- /dev/null
+++ b/mmocr-dev-1.x/.github/pull_request_template.md
@@ -0,0 +1,33 @@
+Thanks for your contribution, and we appreciate it a lot. The following instructions will help keep your pull request healthy and make it easier to get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
+
+## Motivation
+
+Please describe the motivation for this PR and the goal you want to achieve through it.
+
+## Modification
+
+Please briefly describe what modification is made in this PR.
+
+## BC-breaking (Optional)
+
+Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
+If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
+
+## Use cases (Optional)
+
+If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
+
+## Checklist
+
+**Before PR**:
+
+- [ ] I have read and followed the workflow indicated in the [CONTRIBUTING.md](https://github.com/open-mmlab/mmocr/blob/main/.github/CONTRIBUTING.md) to create this PR.
+- [ ] Pre-commit or linting tools indicated in [CONTRIBUTING.md](https://github.com/open-mmlab/mmocr/blob/main/.github/CONTRIBUTING.md) are used to fix the potential lint issues.
+- [ ] Bug fixes are covered by unit tests; the case that triggered the bug should be added to the unit tests.
+- [ ] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
+- [ ] The documentation has been modified accordingly, including docstring or example tutorials.
+
+**After PR**:
+
+- [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects.
+- [ ] CLA has been signed and all committers have signed the CLA in this PR.
diff --git a/mmocr-dev-1.x/.github/workflows/lint.yml b/mmocr-dev-1.x/.github/workflows/lint.yml
new file mode 100644
index 0000000000000000000000000000000000000000..e9cdba667ba986019473046d13315b2755bd5de6
--- /dev/null
+++ b/mmocr-dev-1.x/.github/workflows/lint.yml
@@ -0,0 +1,27 @@
+name: lint
+
+on: [push, pull_request]
+
+concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}
+ cancel-in-progress: true
+
+jobs:
+ lint:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python 3.7
+ uses: actions/setup-python@v2
+ with:
+ python-version: 3.7
+ - name: Install pre-commit hook
+ run: |
+ pip install pre-commit
+ pre-commit install
+ - name: Linting
+ run: pre-commit run --all-files
+ - name: Check docstring coverage
+ run: |
+ pip install interrogate
+ interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-regex "__repr__" --fail-under 90 mmocr
diff --git a/mmocr-dev-1.x/.github/workflows/merge_stage_test.yml b/mmocr-dev-1.x/.github/workflows/merge_stage_test.yml
new file mode 100644
index 0000000000000000000000000000000000000000..856ede8335a691ccb5585ecae04728fcd1e958bb
--- /dev/null
+++ b/mmocr-dev-1.x/.github/workflows/merge_stage_test.yml
@@ -0,0 +1,160 @@
+name: merge_stage_test
+
+on:
+ push:
+ paths-ignore:
+ - 'README.md'
+ - 'README_zh-CN.md'
+ - 'docs/**'
+ - 'demo/**'
+ - '.dev_scripts/**'
+ - '.circleci/**'
+ - 'projects/**'
+ branches:
+ - dev-1.x
+
+concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}
+ cancel-in-progress: true
+
+jobs:
+ build_cpu_py:
+ runs-on: ubuntu-22.04
+ strategy:
+ matrix:
+ python-version: [3.8, 3.9]
+ torch: [1.8.1]
+ include:
+ - torch: 1.8.1
+ torchvision: 0.9.1
+ steps:
+ - uses: actions/checkout@v3
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Upgrade pip
+ run: pip install pip --upgrade
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+ - name: Install MMEngine
+ run: pip install git+https://github.com/open-mmlab/mmengine.git@main
+ - name: Install MMCV
+ run: |
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ - name: Install MMDet
+ run: pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ - name: Install other dependencies
+ run: pip install -r requirements/tests.txt
+ - name: Build and install
+ run: rm -rf .eggs && pip install -e .
+ - name: Run unittests and generate coverage report
+ run: |
+ coverage run --branch --source mmocr -m pytest tests/
+ coverage xml
+ coverage report -m
+
+ build_cpu_pt:
+ runs-on: ubuntu-22.04
+ strategy:
+ matrix:
+ python-version: [3.7]
+ torch: [1.6.0, 1.7.1, 1.8.1, 1.9.1, 1.10.1, 1.11.0, 1.12.1, 1.13.0]
+ include:
+ - torch: 1.6.0
+ torchvision: 0.7.0
+ - torch: 1.7.1
+ torchvision: 0.8.2
+ - torch: 1.8.1
+ torchvision: 0.9.1
+ - torch: 1.9.1
+ torchvision: 0.10.1
+ - torch: 1.10.1
+ torchvision: 0.11.2
+ - torch: 1.11.0
+ torchvision: 0.12.0
+ - torch: 1.12.1
+ torchvision: 0.13.1
+ - torch: 1.13.0
+ torchvision: 0.14.0
+ - torch: 2.0.0
+ torchvision: 0.15.1
+ python-version: 3.8
+ steps:
+ - uses: actions/checkout@v3
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Upgrade pip
+ run: pip install pip --upgrade
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+ - name: Install MMEngine
+ run: pip install git+https://github.com/open-mmlab/mmengine.git@main
+ - name: Install MMCV
+ run: |
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ - name: Install MMDet
+ run: pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ - name: Install other dependencies
+ run: pip install -r requirements/tests.txt
+ - name: Build and install
+ run: rm -rf .eggs && pip install -e .
+ - name: Run unittests and generate coverage report
+ run: |
+ coverage run --branch --source mmocr -m pytest tests/
+ coverage xml
+ coverage report -m
+ # Only upload coverage report for python3.7 && pytorch1.8.1 cpu
+ - name: Upload coverage to Codecov
+ if: ${{matrix.torch == '1.8.1' && matrix.python-version == '3.7'}}
+ uses: codecov/codecov-action@v1.0.14
+ with:
+ file: ./coverage.xml
+ flags: unittests
+ env_vars: OS,PYTHON
+ name: codecov-umbrella
+ fail_ci_if_error: false
+
+
+ build_windows:
+ runs-on: windows-2022
+ strategy:
+ matrix:
+ python: [3.7]
+ platform: [cpu, cu111]
+ torch: [1.8.1]
+ torchvision: [0.9.1]
+ include:
+ - python: 3.8
+ platform: cu117
+ torch: 2.0.0
+ torchvision: 0.15.1
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python ${{ matrix.python }}
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python }}
+ - name: Upgrade pip
+ run: python -m pip install --upgrade pip
+ - name: Install lmdb
+ run: pip install lmdb
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+${{matrix.platform}} torchvision==${{matrix.torchvision}}+${{matrix.platform}} -f https://download.pytorch.org/whl/${{matrix.platform}}/torch_stable.html
+ - name: Install mmocr dependencies
+ run: |
+ pip install git+https://github.com/open-mmlab/mmengine.git@main
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ pip install -r requirements/tests.txt
+ - name: Build and install
+ run: |
+ pip install -e .
+ - name: Run unittests and generate coverage report
+ run: |
+ pytest tests/
diff --git a/mmocr-dev-1.x/.github/workflows/pr_stage_test.yml b/mmocr-dev-1.x/.github/workflows/pr_stage_test.yml
new file mode 100644
index 0000000000000000000000000000000000000000..e9344e5a056f58db0e3c5e0bfd588078f2817613
--- /dev/null
+++ b/mmocr-dev-1.x/.github/workflows/pr_stage_test.yml
@@ -0,0 +1,102 @@
+name: pr_stage_test
+
+on:
+ pull_request:
+ paths-ignore:
+ - 'README.md'
+ - 'README_zh-CN.md'
+ - 'docs/**'
+ - 'demo/**'
+ - '.dev_scripts/**'
+ - '.circleci/**'
+ - 'projects/**'
+
+concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}
+ cancel-in-progress: true
+
+jobs:
+ build_cpu:
+ runs-on: ubuntu-22.04
+ strategy:
+ matrix:
+ python-version: [3.7]
+ include:
+ - torch: 1.8.1
+ torchvision: 0.9.1
+ steps:
+ - uses: actions/checkout@v3
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Upgrade pip
+ run: pip install pip --upgrade
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+ - name: Install MMEngine
+ run: pip install git+https://github.com/open-mmlab/mmengine.git@main
+ - name: Install MMCV
+ run: |
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ - name: Install MMDet
+ run: pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ - name: Install other dependencies
+ run: pip install -r requirements/tests.txt
+ - name: Build and install
+ run: rm -rf .eggs && pip install -e .
+ - name: Run unittests and generate coverage report
+ run: |
+ coverage run --branch --source mmocr -m pytest tests/
+ coverage xml
+ coverage report -m
+ # Upload coverage report for python3.7 && pytorch1.8.1 cpu
+ - name: Upload coverage to Codecov
+ uses: codecov/codecov-action@v1.0.14
+ with:
+ file: ./coverage.xml
+ flags: unittests
+ env_vars: OS,PYTHON
+ name: codecov-umbrella
+ fail_ci_if_error: false
+
+
+ build_windows:
+ runs-on: windows-2022
+ strategy:
+ matrix:
+ python: [3.7]
+ platform: [cpu, cu111]
+ torch: [1.8.1]
+ torchvision: [0.9.1]
+ include:
+ - python: 3.8
+ platform: cu117
+ torch: 2.0.0
+ torchvision: 0.15.1
+ steps:
+ - uses: actions/checkout@v3
+ - name: Set up Python ${{ matrix.python }}
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ matrix.python }}
+ - name: Upgrade pip
+ run: python -m pip install --upgrade pip
+ - name: Install lmdb
+ run: pip install lmdb
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+${{matrix.platform}} torchvision==${{matrix.torchvision}}+${{matrix.platform}} -f https://download.pytorch.org/whl/${{matrix.platform}}/torch_stable.html
+ - name: Install mmocr dependencies
+ run: |
+ pip install git+https://github.com/open-mmlab/mmengine.git@main
+ pip install -U openmim
+ mim install 'mmcv >= 2.0.0rc1'
+ pip install git+https://github.com/open-mmlab/mmdetection.git@dev-3.x
+ pip install -r requirements/tests.txt
+ - name: Build and install
+ run: |
+ pip install -e .
+ - name: Run unittests and generate coverage report
+ run: |
+ pytest tests/
diff --git a/mmocr-dev-1.x/.github/workflows/publish-to-pypi.yml b/mmocr-dev-1.x/.github/workflows/publish-to-pypi.yml
new file mode 100644
index 0000000000000000000000000000000000000000..fc8e5f4fa230670134149e69cf301723a7176de9
--- /dev/null
+++ b/mmocr-dev-1.x/.github/workflows/publish-to-pypi.yml
@@ -0,0 +1,26 @@
+name: deploy
+
+on: push
+
+concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}
+ cancel-in-progress: true
+
+jobs:
+ build-n-publish:
+ runs-on: ubuntu-latest
+ if: startsWith(github.event.ref, 'refs/tags')
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python 3.7
+ uses: actions/setup-python@v1
+ with:
+ python-version: 3.7
+ - name: Build MMOCR
+ run: |
+ pip install wheel
+ python setup.py sdist bdist_wheel
+ - name: Publish distribution to PyPI
+ run: |
+ pip install twine
+ twine upload dist/* -u __token__ -p ${{ secrets.pypi_password }}
diff --git a/mmocr-dev-1.x/.github/workflows/test_mim.yml b/mmocr-dev-1.x/.github/workflows/test_mim.yml
new file mode 100644
index 0000000000000000000000000000000000000000..2c1c170d6375a7bedc761404f228db80030cec44
--- /dev/null
+++ b/mmocr-dev-1.x/.github/workflows/test_mim.yml
@@ -0,0 +1,44 @@
+name: test-mim
+
+on:
+ push:
+ paths:
+ - 'model-index.yml'
+ - 'configs/**'
+
+ pull_request:
+ paths:
+ - 'model-index.yml'
+ - 'configs/**'
+
+concurrency:
+ group: ${{ github.workflow }}-${{ github.ref }}
+ cancel-in-progress: true
+
+jobs:
+ build_cpu:
+ runs-on: ubuntu-18.04
+ strategy:
+ matrix:
+ python-version: [3.7]
+ torch: [1.8.0]
+ include:
+ - torch: 1.8.0
+ torch_version: torch1.8
+ torchvision: 0.9.0
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Upgrade pip
+ run: pip install pip --upgrade
+ - name: Install PyTorch
+ run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
+ - name: Install openmim
+ run: pip install openmim
+ - name: Build and install
+ run: rm -rf .eggs && mim install -e .
+ - name: test commands of mim
+ run: mim search mmocr
diff --git a/mmocr-dev-1.x/.gitignore b/mmocr-dev-1.x/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..54567836d29f65c1630d89d44d0c48af9718c8f3
--- /dev/null
+++ b/mmocr-dev-1.x/.gitignore
@@ -0,0 +1,146 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+*.ipynb
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/en/_build/
+docs/zh_cn/_build/
+docs/*/api/generated/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
+# cython generated cpp
+!data/dict
+/data
+.vscode
+.idea
+
+# custom
+*.pkl
+*.pkl.json
+*.log.json
+work_dirs/
+exps/
+*~
+show_dir/
+
+# Pytorch
+*.pth
+
+# demo
+!tests/data
+tests/results
+
+#temp files
+.DS_Store
+
+checkpoints
+
+htmlcov
+*.swp
+log.txt
+workspace.code-workspace
+results
+mmocr/core/font.TTF
+mmocr/.mim
+workdirs/
+.history/
+.dev/
+data/
diff --git a/mmocr-dev-1.x/.owners.yml b/mmocr-dev-1.x/.owners.yml
new file mode 100644
index 0000000000000000000000000000000000000000..c21ef0dc20db0d2994136a04637844f726a82162
--- /dev/null
+++ b/mmocr-dev-1.x/.owners.yml
@@ -0,0 +1,9 @@
+assign:
+ strategy:
+ random
+ # daily-shift-based
+ schedule:
+ '*/1 * * * *'
+ assignees:
+ - gaotongxiao
+ - Harold-lkk
diff --git a/mmocr-dev-1.x/.pre-commit-config.yaml b/mmocr-dev-1.x/.pre-commit-config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..bf71f4f9b198b3d81c4bfe335de3fc3a7df35ca0
--- /dev/null
+++ b/mmocr-dev-1.x/.pre-commit-config.yaml
@@ -0,0 +1,63 @@
+exclude: ^tests/data/
+repos:
+ - repo: https://github.com/PyCQA/flake8
+ rev: 5.0.4
+ hooks:
+ - id: flake8
+ - repo: https://github.com/zhouzaida/isort
+ rev: 5.12.1
+ hooks:
+ - id: isort
+ - repo: https://github.com/pre-commit/mirrors-yapf
+ rev: v0.32.0
+ hooks:
+ - id: yapf
+ - repo: https://github.com/codespell-project/codespell
+ rev: v2.2.1
+ hooks:
+ - id: codespell
+ - repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v4.3.0
+ hooks:
+ - id: trailing-whitespace
+ exclude: |
+ (?x)^(
+ dicts/|
+ projects/.*?/dicts/
+ )
+ - id: check-yaml
+ - id: end-of-file-fixer
+ exclude: |
+ (?x)^(
+ dicts/|
+ projects/.*?/dicts/
+ )
+ - id: requirements-txt-fixer
+ - id: double-quote-string-fixer
+ - id: check-merge-conflict
+ - id: fix-encoding-pragma
+ args: ["--remove"]
+ - id: mixed-line-ending
+ args: ["--fix=lf"]
+ - id: mixed-line-ending
+ args: ["--fix=lf"]
+ - repo: https://github.com/executablebooks/mdformat
+ rev: 0.7.9
+ hooks:
+ - id: mdformat
+ args: ["--number", "--table-width", "200"]
+ additional_dependencies:
+ - mdformat-openmmlab
+ - mdformat_frontmatter
+ - linkify-it-py
+ - repo: https://github.com/myint/docformatter
+ rev: v1.3.1
+ hooks:
+ - id: docformatter
+ args: ["--in-place", "--wrap-descriptions", "79"]
+ - repo: https://github.com/open-mmlab/pre-commit-hooks
+ rev: v0.2.0 # Use the ref you want to point at
+ hooks:
+ - id: check-algo-readme
+ - id: check-copyright
+ args: ["mmocr", "tests", "tools"] # these directories will be checked
diff --git a/mmocr-dev-1.x/.pylintrc b/mmocr-dev-1.x/.pylintrc
new file mode 100644
index 0000000000000000000000000000000000000000..d7a39be85d13c69aea978374a5edf921a5d4cc39
--- /dev/null
+++ b/mmocr-dev-1.x/.pylintrc
@@ -0,0 +1,621 @@
+[MASTER]
+
+# A comma-separated list of package or module names from where C extensions may
+# be loaded. Extensions are loading into the active Python interpreter and may
+# run arbitrary code.
+extension-pkg-whitelist=
+
+# Specify a score threshold to be exceeded before program exits with error.
+fail-under=10.0
+
+# Add files or directories to the blacklist. They should be base names, not
+# paths.
+ignore=CVS,configs
+
+# Add files or directories matching the regex patterns to the blacklist. The
+# regex matches against base names, not paths.
+ignore-patterns=
+
+# Python code to execute, usually for sys.path manipulation such as
+# pygtk.require().
+#init-hook=
+
+# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
+# number of processors available to use.
+jobs=1
+
+# Control the amount of potential inferred values when inferring a single
+# object. This can help the performance when dealing with large functions or
+# complex, nested conditions.
+limit-inference-results=100
+
+# List of plugins (as comma separated values of python module names) to load,
+# usually to register additional checkers.
+load-plugins=
+
+# Pickle collected data for later comparisons.
+persistent=yes
+
+# When enabled, pylint would attempt to guess common misconfiguration and emit
+# user-friendly hints instead of false-positive error messages.
+suggestion-mode=yes
+
+# Allow loading of arbitrary C extensions. Extensions are imported into the
+# active Python interpreter and may run arbitrary code.
+unsafe-load-any-extension=no
+
+
+[MESSAGES CONTROL]
+
+# Only show warnings with the listed confidence levels. Leave empty to show
+# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED.
+confidence=
+
+# Disable the message, report, category or checker with the given id(s). You
+# can either give multiple identifiers separated by comma (,) or put this
+# option multiple times (only on the command line, not in the configuration
+# file where it should appear only once). You can also use "--disable=all" to
+# disable everything first and then reenable specific checks. For example, if
+# you want to run only the similarities checker, you can use "--disable=all
+# --enable=similarities". If you want to run only the classes checker, but have
+# no Warning level messages displayed, use "--disable=all --enable=classes
+# --disable=W".
+disable=print-statement,
+ parameter-unpacking,
+ unpacking-in-except,
+ old-raise-syntax,
+ backtick,
+ long-suffix,
+ old-ne-operator,
+ old-octal-literal,
+ import-star-module-level,
+ non-ascii-bytes-literal,
+ raw-checker-failed,
+ bad-inline-option,
+ locally-disabled,
+ file-ignored,
+ suppressed-message,
+ useless-suppression,
+ deprecated-pragma,
+ use-symbolic-message-instead,
+ apply-builtin,
+ basestring-builtin,
+ buffer-builtin,
+ cmp-builtin,
+ coerce-builtin,
+ execfile-builtin,
+ file-builtin,
+ long-builtin,
+ raw_input-builtin,
+ reduce-builtin,
+ standarderror-builtin,
+ unicode-builtin,
+ xrange-builtin,
+ coerce-method,
+ delslice-method,
+ getslice-method,
+ setslice-method,
+ no-absolute-import,
+ old-division,
+ dict-iter-method,
+ dict-view-method,
+ next-method-called,
+ metaclass-assignment,
+ indexing-exception,
+ raising-string,
+ reload-builtin,
+ oct-method,
+ hex-method,
+ nonzero-method,
+ cmp-method,
+ input-builtin,
+ round-builtin,
+ intern-builtin,
+ unichr-builtin,
+ map-builtin-not-iterating,
+ zip-builtin-not-iterating,
+ range-builtin-not-iterating,
+ filter-builtin-not-iterating,
+ using-cmp-argument,
+ eq-without-hash,
+ div-method,
+ idiv-method,
+ rdiv-method,
+ exception-message-attribute,
+ invalid-str-codec,
+ sys-max-int,
+ bad-python3-import,
+ deprecated-string-function,
+ deprecated-str-translate-call,
+ deprecated-itertools-function,
+ deprecated-types-field,
+ next-method-defined,
+ dict-items-not-iterating,
+ dict-keys-not-iterating,
+ dict-values-not-iterating,
+ deprecated-operator-function,
+ deprecated-urllib-function,
+ xreadlines-attribute,
+ deprecated-sys-function,
+ exception-escape,
+ comprehension-escape,
+ no-member,
+ invalid-name,
+ too-many-branches,
+ wrong-import-order,
+ too-many-arguments,
+ missing-function-docstring,
+ missing-module-docstring,
+ too-many-locals,
+ too-few-public-methods,
+ abstract-method,
+ broad-except,
+ too-many-nested-blocks,
+ too-many-instance-attributes,
+ missing-class-docstring,
+ duplicate-code,
+ not-callable,
+ protected-access,
+ dangerous-default-value,
+ no-name-in-module,
+ logging-fstring-interpolation,
+ super-init-not-called,
+ redefined-builtin,
+ attribute-defined-outside-init,
+ arguments-differ,
+ cyclic-import,
+ bad-super-call,
+ too-many-statements
+
+# Enable the message, report, category or checker with the given id(s). You can
+# either give multiple identifier separated by comma (,) or put this option
+# multiple time (only on the command line, not in the configuration file where
+# it should appear only once). See also the "--disable" option for examples.
+enable=c-extension-no-member
+
+
+[REPORTS]
+
+# Python expression which should return a score less than or equal to 10. You
+# have access to the variables 'error', 'warning', 'refactor', and 'convention'
+# which contain the number of messages in each category, as well as 'statement'
+# which is the total number of statements analyzed. This score is used by the
+# global evaluation report (RP0004).
+evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)
+
+# Template used to display messages. This is a python new-style format string
+# used to format the message information. See doc for all details.
+#msg-template=
+
+# Set the output format. Available formats are text, parseable, colorized, json
+# and msvs (visual studio). You can also give a reporter class, e.g.
+# mypackage.mymodule.MyReporterClass.
+output-format=text
+
+# Tells whether to display a full report or only the messages.
+reports=no
+
+# Activate the evaluation score.
+score=yes
+
+
+[REFACTORING]
+
+# Maximum number of nested blocks for function / method body
+max-nested-blocks=5
+
+# Complete name of functions that never returns. When checking for
+# inconsistent-return-statements if a never returning function is called then
+# it will be considered as an explicit return statement and no message will be
+# printed.
+never-returning-functions=sys.exit
+
+
+[TYPECHECK]
+
+# List of decorators that produce context managers, such as
+# contextlib.contextmanager. Add to this list to register other decorators that
+# produce valid context managers.
+contextmanager-decorators=contextlib.contextmanager
+
+# List of members which are set dynamically and missed by pylint inference
+# system, and so shouldn't trigger E1101 when accessed. Python regular
+# expressions are accepted.
+generated-members=
+
+# Tells whether missing members accessed in mixin class should be ignored. A
+# mixin class is detected if its name ends with "mixin" (case insensitive).
+ignore-mixin-members=yes
+
+# Tells whether to warn about missing members when the owner of the attribute
+# is inferred to be None.
+ignore-none=yes
+
+# This flag controls whether pylint should warn about no-member and similar
+# checks whenever an opaque object is returned when inferring. The inference
+# can return multiple potential results while evaluating a Python object, but
+# some branches might not be evaluated, which results in partial inference. In
+# that case, it might be useful to still emit no-member and other checks for
+# the rest of the inferred objects.
+ignore-on-opaque-inference=yes
+
+# List of class names for which member attributes should not be checked (useful
+# for classes with dynamically set attributes). This supports the use of
+# qualified names.
+ignored-classes=optparse.Values,thread._local,_thread._local
+
+# List of module names for which member attributes should not be checked
+# (useful for modules/projects where namespaces are manipulated during runtime
+# and thus existing member attributes cannot be deduced by static analysis). It
+# supports qualified module names, as well as Unix pattern matching.
+ignored-modules=
+
+# Show a hint with possible names when a member name was not found. The aspect
+# of finding the hint is based on edit distance.
+missing-member-hint=yes
+
+# The minimum edit distance a name should have in order to be considered a
+# similar match for a missing member name.
+missing-member-hint-distance=1
+
+# The total number of similar names that should be taken in consideration when
+# showing a hint for a missing member.
+missing-member-max-choices=1
+
+# List of decorators that change the signature of a decorated function.
+signature-mutators=
+
+
+[SPELLING]
+
+# Limits count of emitted suggestions for spelling mistakes.
+max-spelling-suggestions=4
+
+# Spelling dictionary name. Available dictionaries: none. To make it work,
+# install the python-enchant package.
+spelling-dict=
+
+# List of comma separated words that should not be checked.
+spelling-ignore-words=
+
+# A path to a file that contains the private dictionary; one word per line.
+spelling-private-dict-file=
+
+# Tells whether to store unknown words to the private dictionary (see the
+# --spelling-private-dict-file option) instead of raising a message.
+spelling-store-unknown-words=no
+
+
+[LOGGING]
+
+# The type of string formatting that logging methods do. `old` means using %
+# formatting, `new` is for `{}` formatting.
+logging-format-style=old
+
+# Logging modules to check that the string format arguments are in logging
+# function parameter format.
+logging-modules=logging
+
+
+[VARIABLES]
+
+# List of additional names supposed to be defined in builtins. Remember that
+# you should avoid defining new builtins when possible.
+additional-builtins=
+
+# Tells whether unused global variables should be treated as a violation.
+allow-global-unused-variables=yes
+
+# List of strings which can identify a callback function by name. A callback
+# name must start or end with one of those strings.
+callbacks=cb_,
+ _cb
+
+# A regular expression matching the name of dummy variables (i.e. expected to
+# not be used).
+dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_
+
+# Argument names that match this expression will be ignored. Default to name
+# with leading underscore.
+ignored-argument-names=_.*|^ignored_|^unused_
+
+# Tells whether we should check for unused import in __init__ files.
+init-import=no
+
+# List of qualified module names which can have objects that can redefine
+# builtins.
+redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io
+
+
+[FORMAT]
+
+# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
+expected-line-ending-format=
+
+# Regexp for a line that is allowed to be longer than the limit.
+ignore-long-lines=^\s*(# )?<?https?://\S+>?$
+
+# Number of spaces of indent required inside a hanging or continued line.
+indent-after-paren=4
+
+# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1
+# tab).
+indent-string=' '
+
+# Maximum number of characters on a single line.
+max-line-length=100
+
+# Maximum number of lines in a module.
+max-module-lines=1000
+
+# Allow the body of a class to be on the same line as the declaration if body
+# contains single statement.
+single-line-class-stmt=no
+
+# Allow the body of an if to be on the same line as the test if there is no
+# else.
+single-line-if-stmt=no
+
+
+[STRING]
+
+# This flag controls whether inconsistent-quotes generates a warning when the
+# character used as a quote delimiter is used inconsistently within a module.
+check-quote-consistency=no
+
+# This flag controls whether the implicit-str-concat should generate a warning
+# on implicit string concatenation in sequences defined over several lines.
+check-str-concat-over-line-jumps=no
+
+
+[SIMILARITIES]
+
+# Ignore comments when computing similarities.
+ignore-comments=yes
+
+# Ignore docstrings when computing similarities.
+ignore-docstrings=yes
+
+# Ignore imports when computing similarities.
+ignore-imports=no
+
+# Minimum lines number of a similarity.
+min-similarity-lines=4
+
+
+[MISCELLANEOUS]
+
+# List of note tags to take in consideration, separated by a comma.
+notes=FIXME,
+ XXX,
+ TODO
+
+# Regular expression of note tags to take in consideration.
+#notes-rgx=
+
+
+[BASIC]
+
+# Naming style matching correct argument names.
+argument-naming-style=snake_case
+
+# Regular expression matching correct argument names. Overrides argument-
+# naming-style.
+#argument-rgx=
+
+# Naming style matching correct attribute names.
+attr-naming-style=snake_case
+
+# Regular expression matching correct attribute names. Overrides attr-naming-
+# style.
+#attr-rgx=
+
+# Bad variable names which should always be refused, separated by a comma.
+bad-names=foo,
+ bar,
+ baz,
+ toto,
+ tutu,
+ tata
+
+# Bad variable names regexes, separated by a comma. If names match any regex,
+# they will always be refused
+bad-names-rgxs=
+
+# Naming style matching correct class attribute names.
+class-attribute-naming-style=any
+
+# Regular expression matching correct class attribute names. Overrides class-
+# attribute-naming-style.
+#class-attribute-rgx=
+
+# Naming style matching correct class names.
+class-naming-style=PascalCase
+
+# Regular expression matching correct class names. Overrides class-naming-
+# style.
+#class-rgx=
+
+# Naming style matching correct constant names.
+const-naming-style=UPPER_CASE
+
+# Regular expression matching correct constant names. Overrides const-naming-
+# style.
+#const-rgx=
+
+# Minimum line length for functions/classes that require docstrings, shorter
+# ones are exempt.
+docstring-min-length=-1
+
+# Naming style matching correct function names.
+function-naming-style=snake_case
+
+# Regular expression matching correct function names. Overrides function-
+# naming-style.
+#function-rgx=
+
+# Good variable names which should always be accepted, separated by a comma.
+good-names=i,
+ j,
+ k,
+ ex,
+ Run,
+ _,
+ x,
+ y,
+ w,
+ h,
+ a,
+ b
+
+# Good variable names regexes, separated by a comma. If names match any regex,
+# they will always be accepted
+good-names-rgxs=
+
+# Include a hint for the correct naming format with invalid-name.
+include-naming-hint=no
+
+# Naming style matching correct inline iteration names.
+inlinevar-naming-style=any
+
+# Regular expression matching correct inline iteration names. Overrides
+# inlinevar-naming-style.
+#inlinevar-rgx=
+
+# Naming style matching correct method names.
+method-naming-style=snake_case
+
+# Regular expression matching correct method names. Overrides method-naming-
+# style.
+#method-rgx=
+
+# Naming style matching correct module names.
+module-naming-style=snake_case
+
+# Regular expression matching correct module names. Overrides module-naming-
+# style.
+#module-rgx=
+
+# Colon-delimited sets of names that determine each other's naming style when
+# the name regexes allow several styles.
+name-group=
+
+# Regular expression which should only match function or class names that do
+# not require a docstring.
+no-docstring-rgx=^_
+
+# List of decorators that produce properties, such as abc.abstractproperty. Add
+# to this list to register other decorators that produce valid properties.
+# These decorators are taken in consideration only for invalid-name.
+property-classes=abc.abstractproperty
+
+# Naming style matching correct variable names.
+variable-naming-style=snake_case
+
+# Regular expression matching correct variable names. Overrides variable-
+# naming-style.
+#variable-rgx=
+
+
+[DESIGN]
+
+# Maximum number of arguments for function / method.
+max-args=5
+
+# Maximum number of attributes for a class (see R0902).
+max-attributes=7
+
+# Maximum number of boolean expressions in an if statement (see R0916).
+max-bool-expr=5
+
+# Maximum number of branch for function / method body.
+max-branches=12
+
+# Maximum number of locals for function / method body.
+max-locals=15
+
+# Maximum number of parents for a class (see R0901).
+max-parents=7
+
+# Maximum number of public methods for a class (see R0904).
+max-public-methods=20
+
+# Maximum number of return / yield for function / method body.
+max-returns=6
+
+# Maximum number of statements in function / method body.
+max-statements=50
+
+# Minimum number of public methods for a class (see R0903).
+min-public-methods=2
+
+
+[IMPORTS]
+
+# List of modules that can be imported at any level, not just the top level
+# one.
+allow-any-import-level=
+
+# Allow wildcard imports from modules that define __all__.
+allow-wildcard-with-all=no
+
+# Analyse import fallback blocks. This can be used to support both Python 2 and
+# 3 compatible code, which means that the block might have code that exists
+# only in one or another interpreter, leading to false positives when analysed.
+analyse-fallback-blocks=no
+
+# Deprecated modules which should not be used, separated by a comma.
+deprecated-modules=optparse,tkinter.tix
+
+# Create a graph of external dependencies in the given file (report RP0402 must
+# not be disabled).
+ext-import-graph=
+
+# Create a graph of every (i.e. internal and external) dependencies in the
+# given file (report RP0402 must not be disabled).
+import-graph=
+
+# Create a graph of internal dependencies in the given file (report RP0402 must
+# not be disabled).
+int-import-graph=
+
+# Force import order to recognize a module as part of the standard
+# compatibility libraries.
+known-standard-library=
+
+# Force import order to recognize a module as part of a third party library.
+known-third-party=enchant
+
+# Couples of modules and preferred modules, separated by a comma.
+preferred-modules=
+
+
+[CLASSES]
+
+# List of method names used to declare (i.e. assign) instance attributes.
+defining-attr-methods=__init__,
+ __new__,
+ setUp,
+ __post_init__
+
+# List of member names, which should be excluded from the protected access
+# warning.
+exclude-protected=_asdict,
+ _fields,
+ _replace,
+ _source,
+ _make
+
+# List of valid names for the first argument in a class method.
+valid-classmethod-first-arg=cls
+
+# List of valid names for the first argument in a metaclass class method.
+valid-metaclass-classmethod-first-arg=cls
+
+
+[EXCEPTIONS]
+
+# Exceptions that will emit a warning when being caught. Defaults to
+# "BaseException, Exception".
+overgeneral-exceptions=BaseException,
+ Exception
diff --git a/mmocr-dev-1.x/.readthedocs.yml b/mmocr-dev-1.x/.readthedocs.yml
new file mode 100644
index 0000000000000000000000000000000000000000..5d508503d475b736ef08545e5c130cb8f373cd1a
--- /dev/null
+++ b/mmocr-dev-1.x/.readthedocs.yml
@@ -0,0 +1,9 @@
+version: 2
+
+formats: all
+
+python:
+ version: 3.7
+ install:
+ - requirements: requirements/docs.txt
+ - requirements: requirements/readthedocs.txt
diff --git a/mmocr-dev-1.x/CITATION.cff b/mmocr-dev-1.x/CITATION.cff
new file mode 100644
index 0000000000000000000000000000000000000000..7d1d93a7c68daf442bc6540b197b401e7a38b91c
--- /dev/null
+++ b/mmocr-dev-1.x/CITATION.cff
@@ -0,0 +1,9 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+title: "OpenMMLab Text Detection, Recognition and Understanding Toolbox"
+authors:
+ - name: "MMOCR Contributors"
+version: 0.3.0
+date-released: 2020-08-15
+repository-code: "https://github.com/open-mmlab/mmocr"
+license: Apache-2.0
diff --git a/mmocr-dev-1.x/LICENSE b/mmocr-dev-1.x/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..3076a4378396deea4db311adbe1fbfd8b8b05920
--- /dev/null
+++ b/mmocr-dev-1.x/LICENSE
@@ -0,0 +1,203 @@
+Copyright (c) MMOCR Authors. All rights reserved.
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2021 MMOCR Authors. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/mmocr-dev-1.x/MANIFEST.in b/mmocr-dev-1.x/MANIFEST.in
new file mode 100644
index 0000000000000000000000000000000000000000..2ba112301a27e08af6939187dbd8e24cd85e852d
--- /dev/null
+++ b/mmocr-dev-1.x/MANIFEST.in
@@ -0,0 +1,5 @@
+include requirements/*.txt
+include mmocr/.mim/model-index.yml
+include mmocr/.mim/dicts/*.txt
+recursive-include mmocr/.mim/configs *.py *.yml
+recursive-include mmocr/.mim/tools *.sh *.py
diff --git a/mmocr-dev-1.x/README.md b/mmocr-dev-1.x/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1acff842ecf6d84845e1fd347d480f7c6dd88f9d
--- /dev/null
+++ b/mmocr-dev-1.x/README.md
@@ -0,0 +1,251 @@
+
+
+## Latest Updates
+
+**The default branch is now `main` and the code on the branch has been upgraded to v1.0.0. The old `main` branch (v0.6.3) code now exists on the `0.x` branch.** If you have been using the `main` branch and encounter upgrade issues, please read the [Migration Guide](https://mmocr.readthedocs.io/en/dev-1.x/migration/overview.html) and notes on [Branches](https://mmocr.readthedocs.io/en/dev-1.x/migration/branches.html).
+
+v1.0.0 was released on 2023-04-06. Major updates from 1.0.0rc6 include:
+
+1. Support for SCUT-CTW1500, SynthText, and MJSynth datasets in Dataset Preparer
+2. Updated FAQ and documentation
+3. Deprecation of file_client_args in favor of backend_args
+4. Added a new MMOCR tutorial notebook
+
+To learn more about the updates in MMOCR 1.0, please refer to [What's New in MMOCR 1.x](https://mmocr.readthedocs.io/en/dev-1.x/migration/news.html), or
+read the [Changelog](https://mmocr.readthedocs.io/en/dev-1.x/notes/changelog.html) for more details.
+
+## Introduction
+
+MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is part of the [OpenMMLab](https://openmmlab.com/) project.
+
+The main branch works with **PyTorch 1.6+**.
+
+
+
+
+
+### Major Features
+
+- **Comprehensive Pipeline**
+
+ The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.
+
+- **Multiple Models**
+
+ The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition and key information extraction.
+
+- **Modular Design**
+
+ The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads as well as losses. Please refer to [Overview](https://mmocr.readthedocs.io/en/dev-1.x/get_started/overview.html) for how to construct a customized model.
+
+- **Numerous Utilities**
+
+  The toolbox provides a comprehensive set of utilities that help users assess model performance. It includes visualizers for displaying images together with ground-truth and predicted bounding boxes, and a validation tool for evaluating checkpoints during training. It also includes data converters that demonstrate how to convert your own data into the annotation format the toolbox supports.
+
+## Installation
+
+MMOCR depends on [PyTorch](https://pytorch.org/), [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection).
+Below are quick steps for installation.
+Please refer to the [Install Guide](https://mmocr.readthedocs.io/en/dev-1.x/get_started/install.html) for more detailed instructions.
+
+```shell
+conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
+conda activate open-mmlab
+pip3 install openmim
+git clone https://github.com/open-mmlab/mmocr.git
+cd mmocr
+mim install -e .
+```
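+
+After installation, a quick functional check can be run from Python. This is a minimal sketch rather than part of the official guide: it assumes network access to download the `DBNet` and `CRNN` weights from the model zoo and uses the sample image `demo/demo_text_ocr.jpg` shipped with this repository.
+
+```python
+from mmocr.apis import MMOCRInferencer
+
+# Build an end-to-end OCR inferencer; pretrained weights are downloaded automatically.
+infer = MMOCRInferencer(det='DBNet', rec='CRNN')
+
+# Detect and recognize text in the sample image, then print the predictions.
+result = infer('demo/demo_text_ocr.jpg')
+print(result['predictions'])
+```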
+
+## Get Started
+
+Please see [Quick Run](https://mmocr.readthedocs.io/en/dev-1.x/get_started/quick_run.html) for the basic usage of MMOCR.
+
+## [Model Zoo](https://mmocr.readthedocs.io/en/dev-1.x/modelzoo.html)
+
+Supported algorithms:
+
+
+### BackBone
+
+- [x] [oCLIP](configs/backbone/oclip/README.md) (ECCV'2022)
+
+
+
+
+### Text Detection
+
+- [x] [DBNet](configs/textdet/dbnet/README.md) (AAAI'2020) / [DBNet++](configs/textdet/dbnetpp/README.md) (TPAMI'2022)
+- [x] [Mask R-CNN](configs/textdet/maskrcnn/README.md) (ICCV'2017)
+- [x] [PANet](configs/textdet/panet/README.md) (ICCV'2019)
+- [x] [PSENet](configs/textdet/psenet/README.md) (CVPR'2019)
+- [x] [TextSnake](configs/textdet/textsnake/README.md) (ECCV'2018)
+- [x] [DRRG](configs/textdet/drrg/README.md) (CVPR'2020)
+- [x] [FCENet](configs/textdet/fcenet/README.md) (CVPR'2021)
+
+
+
+
+### Text Recognition
+
+- [x] [ABINet](configs/textrecog/abinet/README.md) (CVPR'2021)
+- [x] [ASTER](configs/textrecog/aster/README.md) (TPAMI'2018)
+- [x] [CRNN](configs/textrecog/crnn/README.md) (TPAMI'2016)
+- [x] [MASTER](configs/textrecog/master/README.md) (PR'2021)
+- [x] [NRTR](configs/textrecog/nrtr/README.md) (ICDAR'2019)
+- [x] [RobustScanner](configs/textrecog/robust_scanner/README.md) (ECCV'2020)
+- [x] [SAR](configs/textrecog/sar/README.md) (AAAI'2019)
+- [x] [SATRN](configs/textrecog/satrn/README.md) (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
+- [x] [SVTR](configs/textrecog/svtr/README.md) (IJCAI'2022)
+
+
+
+
+### Key Information Extraction
+
+- [x] [SDMG-R](configs/kie/sdmgr/README.md) (ArXiv'2021)
+
+
+
+
+### Text Spotting
+
+- [x] [ABCNet](projects/ABCNet/README.md) (CVPR'2020)
+- [x] [ABCNetV2](projects/ABCNet/README_V2.md) (TPAMI'2021)
+- [x] [SPTS](projects/SPTS/README.md) (ACM MM'2022)
+
+
+
+Please refer to [model_zoo](https://mmocr.readthedocs.io/en/dev-1.x/modelzoo.html) for more details.
+
+## Projects
+
+[Here](projects/README.md) are some implementations of SOTA models and solutions built on MMOCR, which are supported and maintained by community users. These projects demonstrate the best practices based on MMOCR for research and product development. We welcome and appreciate all contributions to the OpenMMLab ecosystem.
+
+## Contributing
+
+We appreciate all contributions to improve MMOCR. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guidelines.
+
+## Acknowledgement
+
+MMOCR is an open-source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.
+We hope the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new OCR methods.
+
+## Citation
+
+If you find this project useful in your research, please consider citing:
+
+```bibtex
+@article{mmocr2021,
+ title={MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding},
+ author={Kuang, Zhanghui and Sun, Hongbin and Li, Zhizhong and Yue, Xiaoyu and Lin, Tsui Hin and Chen, Jianyong and Wei, Huaqiang and Zhu, Yiqin and Gao, Tong and Zhang, Wenwei and Chen, Kai and Zhang, Wayne and Lin, Dahua},
+ journal= {arXiv preprint arXiv:2108.06543},
+ year={2021}
+}
+```
+
+## License
+
+This project is released under the [Apache 2.0 license](LICENSE).
+
+## OpenMMLab Family
+
+- [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab foundational library for training deep learning models
+- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision.
+- [MIM](https://github.com/open-mmlab/mim): MIM installs OpenMMLab packages.
+- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox and benchmark.
+- [MMDetection](https://github.com/open-mmlab/mmdetection): OpenMMLab detection toolbox and benchmark.
+- [MMDetection3D](https://github.com/open-mmlab/mmdetection3d): OpenMMLab's next-generation platform for general 3D object detection.
+- [MMRotate](https://github.com/open-mmlab/mmrotate): OpenMMLab rotated object detection toolbox and benchmark.
+- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation): OpenMMLab semantic segmentation toolbox and benchmark.
+- [MMOCR](https://github.com/open-mmlab/mmocr): OpenMMLab text detection, recognition, and understanding toolbox.
+- [MMPose](https://github.com/open-mmlab/mmpose): OpenMMLab pose estimation toolbox and benchmark.
+- [MMHuman3D](https://github.com/open-mmlab/mmhuman3d): OpenMMLab 3D human parametric model toolbox and benchmark.
+- [MMSelfSup](https://github.com/open-mmlab/mmselfsup): OpenMMLab self-supervised learning toolbox and benchmark.
+- [MMRazor](https://github.com/open-mmlab/mmrazor): OpenMMLab model compression toolbox and benchmark.
+- [MMFewShot](https://github.com/open-mmlab/mmfewshot): OpenMMLab fewshot learning toolbox and benchmark.
+- [MMAction2](https://github.com/open-mmlab/mmaction2): OpenMMLab's next-generation action understanding toolbox and benchmark.
+- [MMTracking](https://github.com/open-mmlab/mmtracking): OpenMMLab video perception toolbox and benchmark.
+- [MMFlow](https://github.com/open-mmlab/mmflow): OpenMMLab optical flow toolbox and benchmark.
+- [MMEditing](https://github.com/open-mmlab/mmediting): OpenMMLab image and video editing toolbox.
+- [MMGeneration](https://github.com/open-mmlab/mmgeneration): OpenMMLab image and video generative models toolbox.
+- [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework.
+
+## Welcome to the OpenMMLab community
+
+Scan the QR code below to follow the OpenMMLab team's [**Zhihu Official Account**](https://www.zhihu.com/people/openmmlab) and join the OpenMMLab team's [**QQ Group**](https://jq.qq.com/?_wv=1027&k=aCvMxdr3), add us on WeChat to join the official communication group, or join our [**Slack**](https://join.slack.com/t/mmocrworkspace/shared_invite/zt-1ifqhfla8-yKnLO_aKhVA2h71OrK8GZw).
+
+
+
+
+
+In the OpenMMLab community, we will:
+
+- share the latest core technologies of AI frameworks
+- explain the source code of common PyTorch modules
+- publish news about OpenMMLab releases
+- introduce cutting-edge algorithms developed by OpenMMLab
+- provide more efficient answers to questions and feedback
+- offer a platform for communication with developers from all walks of life
+
+The OpenMMLab community looks forward to your participation!
diff --git a/mmocr-dev-1.x/README_zh-CN.md b/mmocr-dev-1.x/README_zh-CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..c38839637ec2e410e830cbcc8eb45160b178f8fd
--- /dev/null
+++ b/mmocr-dev-1.x/README_zh-CN.md
@@ -0,0 +1,250 @@
+
+
+In the OpenMMLab community, we will:
+
+- share the latest core technologies of AI frameworks
+- explain the source code of common PyTorch modules
+- publish news about OpenMMLab releases
+- introduce cutting-edge algorithms developed by OpenMMLab
+- provide more efficient answers to questions and feedback
+- offer a platform for communication with developers from all walks of life
+
+The OpenMMLab community is packed with practical know-how and looks forward to your joining!
diff --git a/mmocr-dev-1.x/configs/backbone/oclip/README.md b/mmocr-dev-1.x/configs/backbone/oclip/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e29cf971f6f8e6ba6c4fc640e6d06c5583d2909d
--- /dev/null
+++ b/mmocr-dev-1.x/configs/backbone/oclip/README.md
@@ -0,0 +1,41 @@
+# oCLIP
+
+> [Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880282.pdf)
+
+
+
+## Abstract
+
+Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-language tasks by jointly learning visual and textual representations, which intuitively helps in Optical Character Recognition (OCR) tasks due to the rich visual and textual information in scene text images. However, these methods cannot well cope with OCR tasks because of the difficulty in both instance-level text encoding and image-text pair acquisition (i.e. images and captured texts in them). This paper presents a weakly supervised pre-training method, oCLIP, which can acquire effective scene text representations by jointly learning and aligning visual and textual information. Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations. With the learning of textual features, the pre-trained model can attend texts in images well with character awareness. Besides, these designs enable the learning from weakly annotated texts (i.e. partial texts in images without text bounding boxes) which mitigates the data annotation constraint greatly. Experiments over the weakly annotated images in ICDAR2019-LSVT show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks, respectively. In addition, the proposed method outperforms existing pre-training techniques consistently across multiple public datasets (e.g., +3.2% and +1.3% for Total-Text and CTW1500).
+
+
+
+
+
+## Models
+
+| Backbone | Pre-train Data | Model |
+| :-------: | :------------: | :-------------------------------------------------------------------------------: |
+| ResNet-50 | SynthText | [Link](https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth) |
+
+```{note}
+The model is converted from the official [oCLIP](https://github.com/bytedance/oclip.git).
+```
+
+## Supported Text Detection Models
+
+|           | [DBNet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnet) | [DBNet++](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnetpp) | [FCENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fcenet) | [TextSnake](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#textsnake) | [PSENet](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) | [DRRG](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#drrg) | [Mask R-CNN](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) |
+| :-------: | :------------------------------------------------------------------------: | :----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------: | :----------------------------------------------------------------------: | :----------------------------------------------------------------------------------: |
+| ICDAR2015 | ✓ | ✓ | ✓ |   | ✓ |   | ✓ |
+| CTW1500   |   |   | ✓ | ✓ | ✓ | ✓ | ✓ |
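+
+To reuse the pre-trained backbone in one of the detectors above, a detector config only needs to swap its backbone for the oCLIP ResNet-50 and load the released weights. The following partial config is an illustrative sketch, assuming the backbone is registered as `CLIPResNet` (as listed in this collection's metafile); see the released oCLIP detector configs under `configs/textdet/` for the exact settings.
+
+```python
+# Partial config override (illustrative): plug the oCLIP ResNet-50 into a detector.
+model = dict(
+    backbone=dict(
+        type='CLIPResNet',
+        init_cfg=dict(
+            type='Pretrained',
+            checkpoint='https://download.openmmlab.com/mmocr/backbone/'
+            'resnet50-oclip-7ba0c533.pth')))
+```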
+
+## Citation
+
+```bibtex
+@article{xue2022language,
+ title={Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting},
+ author={Xue, Chuhui and Zhang, Wenqing and Hao, Yu and Lu, Shijian and Torr, Philip and Bai, Song},
+ journal={Proceedings of the European Conference on Computer Vision (ECCV)},
+ year={2022}
+}
+```
diff --git a/mmocr-dev-1.x/configs/backbone/oclip/metafile.yml b/mmocr-dev-1.x/configs/backbone/oclip/metafile.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8953af1b6b3c7b6190602be0af9e07753ed67518
--- /dev/null
+++ b/mmocr-dev-1.x/configs/backbone/oclip/metafile.yml
@@ -0,0 +1,13 @@
+Collections:
+- Name: oCLIP
+ Metadata:
+ Training Data: SynthText
+ Architecture:
+ - CLIPResNet
+ Paper:
+ URL: https://arxiv.org/abs/2203.03911
+ Title: 'Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting'
+ README: configs/backbone/oclip/README.md
+
+Models:
+ Weights: https://download.openmmlab.com/mmocr/backbone/resnet50-oclip-7ba0c533.pth
diff --git a/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt-openset.py b/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt-openset.py
new file mode 100644
index 0000000000000000000000000000000000000000..f82512839cdea57e559bd375be2a3f4146558af3
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt-openset.py
@@ -0,0 +1,26 @@
+wildreceipt_openset_data_root = 'data/wildreceipt/'
+
+wildreceipt_openset_train = dict(
+ type='WildReceiptDataset',
+ data_root=wildreceipt_openset_data_root,
+ metainfo=dict(category=[
+ dict(id=0, name='bg'),
+ dict(id=1, name='key'),
+ dict(id=2, name='value'),
+ dict(id=3, name='other')
+ ]),
+ ann_file='openset_train.txt',
+ pipeline=None)
+
+wildreceipt_openset_test = dict(
+ type='WildReceiptDataset',
+ data_root=wildreceipt_openset_data_root,
+ metainfo=dict(category=[
+ dict(id=0, name='bg'),
+ dict(id=1, name='key'),
+ dict(id=2, name='value'),
+ dict(id=3, name='other')
+ ]),
+ ann_file='openset_test.txt',
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt.py b/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt.py
new file mode 100644
index 0000000000000000000000000000000000000000..9c1122edd53c5c8df4bad55ad764c12e1714026a
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/_base_/datasets/wildreceipt.py
@@ -0,0 +1,16 @@
+wildreceipt_data_root = 'data/wildreceipt/'
+
+wildreceipt_train = dict(
+ type='WildReceiptDataset',
+ data_root=wildreceipt_data_root,
+ metainfo=wildreceipt_data_root + 'class_list.txt',
+ ann_file='train.txt',
+ pipeline=None)
+
+wildreceipt_test = dict(
+ type='WildReceiptDataset',
+ data_root=wildreceipt_data_root,
+ metainfo=wildreceipt_data_root + 'class_list.txt',
+ ann_file='test.txt',
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/kie/_base_/default_runtime.py b/mmocr-dev-1.x/configs/kie/_base_/default_runtime.py
new file mode 100644
index 0000000000000000000000000000000000000000..bcc5b3fa02a0f3259f701cddecbc307988424a6b
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/_base_/default_runtime.py
@@ -0,0 +1,33 @@
+default_scope = 'mmocr'
+env_cfg = dict(
+ cudnn_benchmark=False,
+ mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+ dist_cfg=dict(backend='nccl'),
+)
+randomness = dict(seed=None)
+
+default_hooks = dict(
+ timer=dict(type='IterTimerHook'),
+ logger=dict(type='LoggerHook', interval=100),
+ param_scheduler=dict(type='ParamSchedulerHook'),
+ checkpoint=dict(type='CheckpointHook', interval=1),
+ sampler_seed=dict(type='DistSamplerSeedHook'),
+ sync_buffer=dict(type='SyncBuffersHook'),
+ visualization=dict(
+ type='VisualizationHook',
+ interval=1,
+ enable=False,
+ show=False,
+ draw_gt=False,
+ draw_pred=False),
+)
+
+# Logging
+log_level = 'INFO'
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+load_from = None
+resume = False
+
+visualizer = dict(
+ type='KIELocalVisualizer', name='visualizer', is_openset=False)
diff --git a/mmocr-dev-1.x/configs/kie/_base_/schedules/schedule_adam_60e.py b/mmocr-dev-1.x/configs/kie/_base_/schedules/schedule_adam_60e.py
new file mode 100644
index 0000000000000000000000000000000000000000..fd7147e2b86a8640966617bae1eb86d3347057f9
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/_base_/schedules/schedule_adam_60e.py
@@ -0,0 +1,10 @@
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper', optimizer=dict(type='Adam', weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=60, val_interval=1)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning rate
+param_scheduler = [
+ dict(type='MultiStepLR', milestones=[40, 50], end=60),
+]
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/README.md b/mmocr-dev-1.x/configs/kie/sdmgr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..921af5310e46803c937168c6e1c0bdf17a372798
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/README.md
@@ -0,0 +1,41 @@
+# SDMGR
+
+> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
+
+
+
+## Abstract
+
+Key information extraction from document images is of paramount importance in office automation. Conventional template matching based approaches fail to generalize well to document images of unseen templates, and are not robust against text recognition errors. In this paper, we propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images. We model document images as dual-modality graphs, nodes of which encode both the visual and textual features of detected text regions, and edges of which represent the spatial relations between neighboring text regions. The key information extraction is solved by iteratively propagating messages along graph edges and reasoning the categories of graph nodes. In order to roundly evaluate our proposed method as well as boost the future research, we release a new dataset named WildReceipt, which is collected and annotated tailored for the evaluation of key information extraction from document images of unseen templates in the wild. It contains 25 key information categories, a total of about 69000 text boxes, and is about 2 times larger than the existing public datasets. Extensive experiments validate that all information including visual features, textual features and spatial relations can benefit key information extraction. It has been shown that SDMG-R can effectively extract key information from document images of unseen templates, and obtain new state-of-the-art results on the recent popular benchmark SROIE and our WildReceipt. Our code and dataset will be publicly released.
+
+
+
+
+
+## Results and models
+
+### WildReceipt
+
+| Method | Modality | Macro F1-Score | Download |
+| :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: |
+| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) |
+| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) |
+
+### WildReceiptOpenset
+
+| Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
+| :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
+| [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) |
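+
+A released SDMGR checkpoint can also be tried end-to-end through the high-level inferencer. The snippet below is a minimal sketch, assuming the `SDMGR` alias from `metafile.yml` resolves to the `sdmgr_unet16_60e_wildreceipt` model and that a detector and recognizer are paired with it to supply text boxes and transcriptions; `demo/demo_kie.jpeg` is the sample receipt image shipped with the repository.
+
+```python
+from mmocr.apis import MMOCRInferencer
+
+# Detection and recognition results are fed into the SDMGR head for KIE.
+kie = MMOCRInferencer(det='DBNet', rec='SAR', kie='SDMGR')
+kie('demo/demo_kie.jpeg', save_vis=True, out_dir='outputs/')
+```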
+
+## Citation
+
+```bibtex
+@misc{sun2021spatial,
+ title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
+ author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
+ year={2021},
+ eprint={2103.14470},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_novisual.py b/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_novisual.py
new file mode 100644
index 0000000000000000000000000000000000000000..5e85de2f78f020bd5695858098ad143dbbd09ed0
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_novisual.py
@@ -0,0 +1,35 @@
+num_classes = 26
+
+model = dict(
+ type='SDMGR',
+ kie_head=dict(
+ type='SDMGRHead',
+ visual_dim=16,
+ num_classes=num_classes,
+ module_loss=dict(type='SDMGRModuleLoss'),
+ postprocessor=dict(type='SDMGRPostProcessor')),
+ dictionary=dict(
+ type='Dictionary',
+ dict_file='{{ fileDirname }}/../../../dicts/sdmgr_dict.txt',
+ with_padding=True,
+ with_unknown=True,
+ unknown_token=None),
+)
+
+train_pipeline = [
+ dict(type='LoadKIEAnnotations'),
+ dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+ dict(type='PackKIEInputs')
+]
+test_pipeline = [
+ dict(type='LoadKIEAnnotations'),
+ dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+ dict(type='PackKIEInputs'),
+]
+
+val_evaluator = dict(
+ type='F1Metric',
+ mode='macro',
+ num_classes=num_classes,
+ ignored_classes=[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25])
+test_evaluator = val_evaluator
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_unet16.py b/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_unet16.py
new file mode 100644
index 0000000000000000000000000000000000000000..76aa631bdfbbf29013d27ac76c0e160d232d1500
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/_base_sdmgr_unet16.py
@@ -0,0 +1,28 @@
+_base_ = '_base_sdmgr_novisual.py'
+
+model = dict(
+ backbone=dict(type='UNet', base_channels=16),
+ roi_extractor=dict(
+ type='mmdet.SingleRoIExtractor',
+ roi_layer=dict(type='RoIAlign', output_size=7),
+ featmap_strides=[1]),
+ data_preprocessor=dict(
+ type='ImgDataPreprocessor',
+ mean=[123.675, 116.28, 103.53],
+ std=[58.395, 57.12, 57.375],
+ bgr_to_rgb=True,
+ pad_size_divisor=32),
+)
+
+train_pipeline = [
+ dict(type='LoadImageFromFile'),
+ dict(type='LoadKIEAnnotations'),
+ dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+ dict(type='PackKIEInputs')
+]
+test_pipeline = [
+ dict(type='LoadImageFromFile'),
+ dict(type='LoadKIEAnnotations'),
+ dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+ dict(type='PackKIEInputs', meta_keys=('img_path', )),
+]
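+
+# Note (illustrative): this base config is consumed through `_base_` inheritance,
+# e.g. by sdmgr_unet16_60e_wildreceipt.py. The fully merged config can be
+# inspected with MMEngine before training:
+#
+#   from mmengine.config import Config
+#   cfg = Config.fromfile('configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py')
+#   print(cfg.model.backbone)  # -> {'type': 'UNet', 'base_channels': 16}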
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/metafile.yml b/mmocr-dev-1.x/configs/kie/sdmgr/metafile.yml
new file mode 100644
index 0000000000000000000000000000000000000000..da430e3d87ab7fe02a9560f7d0e441cce2ccf929
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/metafile.yml
@@ -0,0 +1,52 @@
+Collections:
+- Name: SDMGR
+ Metadata:
+ Training Data: KIEDataset
+ Training Techniques:
+ - Adam
+ Training Resources: 1x NVIDIA A100-SXM4-80GB
+ Architecture:
+ - UNet
+ - SDMGRHead
+ Paper:
+    URL: https://arxiv.org/abs/2103.14470
+ Title: 'Spatial Dual-Modality Graph Reasoning for Key Information Extraction'
+ README: configs/kie/sdmgr/README.md
+
+Models:
+ - Name: sdmgr_unet16_60e_wildreceipt
+ Alias: SDMGR
+ In Collection: SDMGR
+ Config: configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
+ Metadata:
+ Training Data: wildreceipt
+ Results:
+ - Task: Key Information Extraction
+ Dataset: wildreceipt
+ Metrics:
+ macro_f1: 0.890
+ Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth
+ - Name: sdmgr_novisual_60e_wildreceipt
+ In Collection: SDMGR
+ Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
+ Metadata:
+ Training Data: wildreceipt
+ Results:
+ - Task: Key Information Extraction
+ Dataset: wildreceipt
+ Metrics:
+ macro_f1: 0.873
+ Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth
+ - Name: sdmgr_novisual_60e_wildreceipt_openset
+ In Collection: SDMGR
+ Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
+ Metadata:
+ Training Data: wildreceipt-openset
+ Results:
+ - Task: Key Information Extraction
+ Dataset: wildreceipt
+ Metrics:
+ macro_f1: 0.931
+ micro_f1: 0.940
+ edge_micro_f1: 0.792
+ Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
new file mode 100644
index 0000000000000000000000000000000000000000..bc3d52a1ce93d4baf267edc923c71f2b9482e767
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
@@ -0,0 +1,71 @@
+_base_ = [
+ '../_base_/default_runtime.py',
+ '../_base_/datasets/wildreceipt-openset.py',
+ '../_base_/schedules/schedule_adam_60e.py',
+ '_base_sdmgr_novisual.py',
+]
+
+node_num_classes = 4 # 4 classes: bg, key, value and other
+edge_num_classes = 2 # edge connectivity
+key_node_idx = 1
+value_node_idx = 2
+
+model = dict(
+ type='SDMGR',
+ kie_head=dict(
+ num_classes=node_num_classes,
+ postprocessor=dict(
+ link_type='one-to-many',
+ key_node_idx=key_node_idx,
+ value_node_idx=value_node_idx)),
+)
+
+test_pipeline = [
+ dict(
+ type='LoadKIEAnnotations',
+ key_node_idx=key_node_idx,
+ value_node_idx=value_node_idx), # Keep key->value edges for evaluation
+ dict(type='Resize', scale=(1024, 512), keep_ratio=True),
+ dict(type='PackKIEInputs'),
+]
+
+wildreceipt_openset_train = _base_.wildreceipt_openset_train
+wildreceipt_openset_train.pipeline = _base_.train_pipeline
+wildreceipt_openset_test = _base_.wildreceipt_openset_test
+wildreceipt_openset_test.pipeline = test_pipeline
+
+train_dataloader = dict(
+ batch_size=4,
+ num_workers=1,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=wildreceipt_openset_train)
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=1,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=wildreceipt_openset_test)
+test_dataloader = val_dataloader
+
+val_evaluator = [
+ dict(
+ type='F1Metric',
+ prefix='node',
+ key='labels',
+ mode=['micro', 'macro'],
+ num_classes=node_num_classes,
+ cared_classes=[key_node_idx, value_node_idx]),
+ dict(
+ type='F1Metric',
+ prefix='edge',
+ mode='micro',
+ key='edge_labels',
+ cared_classes=[1], # Collapse to binary F1 score
+ num_classes=edge_num_classes)
+]
+test_evaluator = val_evaluator
+
+visualizer = dict(
+ type='KIELocalVisualizer', name='visualizer', is_openset=True)
+auto_scale_lr = dict(base_batch_size=4)
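+
+# Usage sketch (illustrative): train and evaluate this openset variant with the
+# standard entry points, assuming WildReceipt is prepared under data/wildreceipt/
+# and checkpoints land in the default work_dirs location:
+#
+#   python tools/train.py configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py
+#   python tools/test.py configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py \
+#       work_dirs/sdmgr_novisual_60e_wildreceipt-openset/epoch_60.pth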
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
new file mode 100644
index 0000000000000000000000000000000000000000..b56c2b9b665b1bd5c2734aa41fa1e563feda5a81
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
@@ -0,0 +1,28 @@
+_base_ = [
+ '../_base_/default_runtime.py',
+ '../_base_/datasets/wildreceipt.py',
+ '../_base_/schedules/schedule_adam_60e.py',
+ '_base_sdmgr_novisual.py',
+]
+
+wildreceipt_train = _base_.wildreceipt_train
+wildreceipt_train.pipeline = _base_.train_pipeline
+wildreceipt_test = _base_.wildreceipt_test
+wildreceipt_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+ batch_size=4,
+ num_workers=1,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=wildreceipt_train)
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=1,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=wildreceipt_test)
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=4)
diff --git a/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
new file mode 100644
index 0000000000000000000000000000000000000000..d49cbbc33798e815a24cb29cf3bc008460948c88
--- /dev/null
+++ b/mmocr-dev-1.x/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py
@@ -0,0 +1,29 @@
+_base_ = [
+ '../_base_/default_runtime.py',
+ '../_base_/datasets/wildreceipt.py',
+ '../_base_/schedules/schedule_adam_60e.py',
+ '_base_sdmgr_unet16.py',
+]
+
+wildreceipt_train = _base_.wildreceipt_train
+wildreceipt_train.pipeline = _base_.train_pipeline
+wildreceipt_test = _base_.wildreceipt_test
+wildreceipt_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+ batch_size=4,
+ num_workers=4,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=wildreceipt_train)
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=1,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=wildreceipt_test)
+
+test_dataloader = val_dataloader
+
+auto_scale_lr = dict(base_batch_size=4)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/ctw1500.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/ctw1500.py
new file mode 100644
index 0000000000000000000000000000000000000000..3361f734d0d92752336d13b60f293b785a92e927
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/ctw1500.py
@@ -0,0 +1,15 @@
+ctw1500_textdet_data_root = 'data/ctw1500'
+
+ctw1500_textdet_train = dict(
+ type='OCRDataset',
+ data_root=ctw1500_textdet_data_root,
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+ctw1500_textdet_test = dict(
+ type='OCRDataset',
+ data_root=ctw1500_textdet_data_root,
+ ann_file='textdet_test.json',
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2015.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2015.py
new file mode 100644
index 0000000000000000000000000000000000000000..958cb4fa17f50ed7dc967ccceb11cfb9426cd867
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2015.py
@@ -0,0 +1,15 @@
+icdar2015_textdet_data_root = 'data/icdar2015'
+
+icdar2015_textdet_train = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root,
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+icdar2015_textdet_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root,
+ ann_file='textdet_test.json',
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2017.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2017.py
new file mode 100644
index 0000000000000000000000000000000000000000..804cb26f96f2bcfb3fdf9803cf36d79e997c57a8
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/icdar2017.py
@@ -0,0 +1,17 @@
+icdar2017_textdet_data_root = 'data/det/icdar_2017'
+
+icdar2017_textdet_train = dict(
+ type='OCRDataset',
+ data_root=icdar2017_textdet_data_root,
+ ann_file='instances_training.json',
+ data_prefix=dict(img_path='imgs/'),
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+icdar2017_textdet_test = dict(
+ type='OCRDataset',
+ data_root=icdar2017_textdet_data_root,
+ ann_file='instances_test.json',
+ data_prefix=dict(img_path='imgs/'),
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/synthtext.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/synthtext.py
new file mode 100644
index 0000000000000000000000000000000000000000..9b2310c36fbd89be9a99d2ecba6f823d28532e35
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/synthtext.py
@@ -0,0 +1,8 @@
+synthtext_textdet_data_root = 'data/synthtext'
+
+synthtext_textdet_train = dict(
+ type='OCRDataset',
+ data_root=synthtext_textdet_data_root,
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/totaltext.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/totaltext.py
new file mode 100644
index 0000000000000000000000000000000000000000..29efc842fb0c558b98c1b8e805973360013b804e
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/totaltext.py
@@ -0,0 +1,15 @@
+totaltext_textdet_data_root = 'data/totaltext'
+
+totaltext_textdet_train = dict(
+ type='OCRDataset',
+ data_root=totaltext_textdet_data_root,
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+totaltext_textdet_test = dict(
+ type='OCRDataset',
+ data_root=totaltext_textdet_data_root,
+ ann_file='textdet_test.json',
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/datasets/toy_data.py b/mmocr-dev-1.x/configs/textdet/_base_/datasets/toy_data.py
new file mode 100644
index 0000000000000000000000000000000000000000..50138769b7bfd99babafcc2aa6e85593c2b0dbf1
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/datasets/toy_data.py
@@ -0,0 +1,17 @@
+toy_det_data_root = 'tests/data/det_toy_dataset'
+
+toy_det_train = dict(
+ type='OCRDataset',
+ data_root=toy_det_data_root,
+ ann_file='instances_training.json',
+ data_prefix=dict(img_path='imgs/'),
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+toy_det_test = dict(
+ type='OCRDataset',
+ data_root=toy_det_data_root,
+ ann_file='instances_test.json',
+ data_prefix=dict(img_path='imgs/'),
+ test_mode=True,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/default_runtime.py b/mmocr-dev-1.x/configs/textdet/_base_/default_runtime.py
new file mode 100644
index 0000000000000000000000000000000000000000..81480273b5a7b30d5d7113fb1cb9380b16de5e8f
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/default_runtime.py
@@ -0,0 +1,41 @@
+default_scope = 'mmocr'
+env_cfg = dict(
+ cudnn_benchmark=False,
+ mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+ dist_cfg=dict(backend='nccl'),
+)
+randomness = dict(seed=None)
+
+default_hooks = dict(
+ timer=dict(type='IterTimerHook'),
+ logger=dict(type='LoggerHook', interval=5),
+ param_scheduler=dict(type='ParamSchedulerHook'),
+ checkpoint=dict(type='CheckpointHook', interval=20),
+ sampler_seed=dict(type='DistSamplerSeedHook'),
+ sync_buffer=dict(type='SyncBuffersHook'),
+ visualization=dict(
+ type='VisualizationHook',
+ interval=1,
+ enable=False,
+ show=False,
+ draw_gt=False,
+ draw_pred=False),
+)
+
+# Logging
+log_level = 'INFO'
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+load_from = None
+resume = False
+
+# Evaluation
+val_evaluator = dict(type='HmeanIOUMetric')
+test_evaluator = val_evaluator
+
+# Visualization
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+ type='TextDetLocalVisualizer',
+ name='visualizer',
+ vis_backends=vis_backends)
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/pretrain_runtime.py b/mmocr-dev-1.x/configs/textdet/_base_/pretrain_runtime.py
new file mode 100644
index 0000000000000000000000000000000000000000..cb2800d50a570881475035e3b0da9c81e88712d1
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/pretrain_runtime.py
@@ -0,0 +1,14 @@
+_base_ = 'default_runtime.py'
+
+default_hooks = dict(
+ logger=dict(type='LoggerHook', interval=1000),
+ checkpoint=dict(
+ type='CheckpointHook',
+ interval=10000,
+ by_epoch=False,
+ max_keep_ckpts=1),
+)
+
+# Evaluation
+val_evaluator = None
+test_evaluator = None
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_adam_600e.py b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_adam_600e.py
new file mode 100644
index 0000000000000000000000000000000000000000..eb61f7b9ee1b2ab18c8f75f24e7a204a9f90ee54
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_adam_600e.py
@@ -0,0 +1,9 @@
+# optimizer
+optim_wrapper = dict(type='OptimWrapper', optimizer=dict(type='Adam', lr=1e-3))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=600, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning rate
+param_scheduler = [
+ dict(type='PolyLR', power=0.9, end=600),
+]
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_100k.py b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_100k.py
new file mode 100644
index 0000000000000000000000000000000000000000..f760774b7b2e21886fc3bbe0746fe3bf843d3471
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_100k.py
@@ -0,0 +1,12 @@
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper',
+ optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+
+train_cfg = dict(type='IterBasedTrainLoop', max_iters=100000)
+test_cfg = None
+val_cfg = None
+# learning policy
+param_scheduler = [
+ dict(type='PolyLR', power=0.9, eta_min=1e-7, by_epoch=False, end=100000),
+]
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_1200e.py b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_1200e.py
new file mode 100644
index 0000000000000000000000000000000000000000..f8555e468bccaa6e5dbca23c9d2821164e21e516
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_1200e.py
@@ -0,0 +1,11 @@
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper',
+ optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning policy
+param_scheduler = [
+ dict(type='PolyLR', power=0.9, eta_min=1e-7, end=1200),
+]
diff --git a/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_base.py b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_base.py
new file mode 100644
index 0000000000000000000000000000000000000000..baf559de231db06382529079be7d5bba071b209e
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/_base_/schedules/schedule_sgd_base.py
@@ -0,0 +1,15 @@
+# Note: This schedule config serves as a base config for other schedules.
+# Users would have to at least fill in "max_epochs" and "val_interval"
+# in order to use this config in their experiments.
+
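+# For example (a sketch, not an actual config shipped with MMOCR), a downstream
+# config could inherit from this file and fill in the missing fields like so:
+#
+#   _base_ = ['../_base_/schedules/schedule_sgd_base.py']
+#   train_cfg = dict(max_epochs=1200, val_interval=20)
+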
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper',
+ optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=None, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning policy
+param_scheduler = [
+ dict(type='ConstantLR', factor=1.0),
+]
diff --git a/mmocr-dev-1.x/configs/textdet/dbnet/README.md b/mmocr-dev-1.x/configs/textdet/dbnet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..07c91edbaf8c8bbe96ae59fc8d17725314da47c8
--- /dev/null
+++ b/mmocr-dev-1.x/configs/textdet/dbnet/README.md
@@ -0,0 +1,47 @@
+# DBNet
+
+> [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
+
+
+
+## Abstract
+
+Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
+
+
+
+In this tutorial, we will introduce some common interfaces of the Dataset class, and the usage of Dataset implementations in MMOCR as well as the annotation types they support.
+
+```{tip}
+Dataset classes support some advanced features, such as lazy initialization and data serialization, and take advantage of various dataset wrappers to perform data concatenation, repetition, and category balancing. These topics will not be covered in this tutorial, but you can read {external+mmengine:doc}`MMEngine: BaseDataset ` for more details.
+```
+
+## Common Interfaces
+
+Now, let's look at a concrete example and learn some typical interfaces of a Dataset class.
+`OCRDataset` is a widely used Dataset implementation in MMOCR, and is suggested as a default Dataset type in MMOCR as its associated annotation format is flexible enough to support *all* the OCR tasks ([more info](#ocrdataset)). Now we will instantiate an `OCRDataset` object wherein the toy dataset in `tests/data/det_toy_dataset` will be loaded.
+
+```python
+from mmocr.datasets import OCRDataset
+from mmengine.registry import init_default_scope
+init_default_scope('mmocr')
+
+train_pipeline = [
+ dict(
+ type='LoadImageFromFile'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ ),
+ dict(type='RandomCrop', min_side_ratio=0.1),
+ dict(type='Resize', scale=(640, 640), keep_ratio=True),
+ dict(type='Pad', size=(640, 640)),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+dataset = OCRDataset(
+ data_root='tests/data/det_toy_dataset',
+ ann_file='textdet_test.json',
+ test_mode=False,
+ pipeline=train_pipeline)
+
+```
+
+Let's peek at the size of this dataset:
+
+```python
+>>> print(len(dataset))
+
+10
+```
+
+Typically, a Dataset class loads and stores two types of information: (1) **meta information**: Some meta descriptors of the dataset's property, such as available object categories in this dataset. (2) **annotation**: The path to images, and their labels. We can access the meta information in `dataset.metainfo`:
+
+```python
+>>> from pprint import pprint
+>>> pprint(dataset.metainfo)
+
+{'category': [{'id': 0, 'name': 'text'}],
+ 'dataset_type': 'TextDetDataset',
+ 'task_name': 'textdet'}
+```
+
+As for the annotations, we can access them via `dataset.get_data_info(idx)`, which returns a dictionary containing the information of the `idx`-th sample as initially parsed, before it is processed by the [data pipeline](./transforms.md).
+
+```python
+>>> from pprint import pprint
+>>> pprint(dataset.get_data_info(0))
+
+{'height': 720,
+ 'img_path': 'tests/data/det_toy_dataset/test/img_10.jpg',
+ 'instances': [{'bbox': [260.0, 138.0, 284.0, 158.0],
+ 'bbox_label': 0,
+ 'ignore': True,
+ 'polygon': [261, 138, 284, 140, 279, 158, 260, 158]},
+ ...,
+ {'bbox': [1011.0, 157.0, 1079.0, 173.0],
+ 'bbox_label': 0,
+ 'ignore': True,
+ 'polygon': [1011, 157, 1079, 160, 1076, 173, 1011, 170]}],
+ 'sample_idx': 0,
+ 'seg_map': 'test/gt_img_10.txt',
+ 'width': 1280}
+
+```
+
+On the other hand, we can get the sample fully processed by the data pipeline via `dataset[idx]` or `dataset.__getitem__(idx)`, which can be fed directly to models to perform a full train/test cycle. It has two fields:
+
+- `inputs`: The image after data augmentation;
+- `data_samples`: The [DataSample](./structures.md) that contains the augmented annotations, and meta information appended by some data transforms to keep track of some key properties of this sample.
+
+```python
+>>> pprint(dataset[0])
+
+{'data_samples': <TextDetDataSample(...) at 0x7f735a0508e0>,
+ 'inputs': tensor([[[129, 111, 131, ..., 0, 0, 0], ...
+ [ 19, 18, 15, ..., 0, 0, 0]]], dtype=torch.uint8)}
+```
+
+## Dataset Classes and Annotation Formats
+
+Each Dataset implementation can only load datasets in a specific annotation format. The following lists all supported Dataset classes and their compatible annotation formats, as well as an example config showcasing how to use them in practice.
+
+```{note}
+If you are not familiar with the config system, you may find [Dataset Configuration](../user_guides/dataset_prepare.md#dataset-configuration) helpful.
+```
+
+### OCRDataset
+
+Usually, there are many different types of annotations in OCR datasets, and the formats often vary between different subtasks, such as text detection and text recognition. These differences can result in the need for different data loading code when using different datasets, increasing the learning and maintenance costs for users.
+
+In MMOCR, we propose a unified dataset format that can adapt to all three subtasks of OCR: text detection, text recognition, and text spotting. This design maximizes the uniformity of the dataset, allows for the reuse of data annotations across different tasks, and makes dataset management more convenient. Considering that popular dataset formats are still inconsistent, MMOCR provides [Dataset Preparer](../user_guides/data_prepare/dataset_preparer.md) to help users convert their datasets to MMOCR format. We also strongly encourage researchers to develop their own datasets based on this data format.
+
+#### Annotation Format
+
+This annotation file is a `.json` file that stores a `dict` containing both `metainfo` and `data_list`, where the former includes basic information about the dataset and the latter consists of the label item of each target instance. The following presents an extensive list of all the fields in the annotation file; some fields are only used in a subset of tasks and can be ignored in the others.
+
+```python
+{
+ "metainfo":
+ {
+ "dataset_type": "TextDetDataset", # Options: TextDetDataset/TextRecogDataset/TextSpotterDataset
+ "task_name": "textdet", # Options: textdet/textspotter/textrecog
+ "category": [{"id": 0, "name": "text"}] # Used in textdet/textspotter
+ },
+ "data_list":
+ [
+ {
+ "img_path": "test_img.jpg",
+ "height": 604,
+ "width": 640,
+ "instances": # multiple instances in one image
+ [
+ {
+ "bbox": [0, 0, 10, 20], # in textdet/textspotter, [x1, y1, x2, y2].
+ "bbox_label": 0, # The object category, always 0 (text) in MMOCR
+ "polygon": [0, 0, 0, 10, 10, 20, 20, 0], # in textdet/textspotter. [x1, y1, x2, y2, ....]
+ "text": "mmocr", # in textspotter/textrecog
+ "ignore": False # in textspotter/textdet. Whether to ignore this sample during training
+ },
+ #...
+ ],
+ }
+ #... multiple images
+ ]
+}
+```
+
+#### Example Config
+
+Here is part of a config example where we make `train_dataloader` use `OCRDataset` to load the ICDAR2015 dataset for a text detection model. Keep in mind that `OCRDataset` can load any OCR dataset prepared by Dataset Preparer regardless of the task. That is, you can use it for text recognition and text spotting as well, but you still have to modify the transform types in `pipeline` according to the needs of different tasks.
+
+```python
+pipeline = [
+ dict(
+ type='LoadImageFromFile'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ ),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+icdar2015_textdet_train = dict(
+ type='OCRDataset',
+ data_root='data/icdar2015',
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=pipeline)
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=icdar2015_textdet_train)
+```
+
+### RecogLMDBDataset
+
+Reading images or labels from individual files can be slow when the amount of data is large, e.g. on the scale of millions of samples. Besides, in academia, most scene text recognition datasets, including both images and labels, are stored in LMDB format. ([Example](https://github.com/clovaai/deep-text-recognition-benchmark))
+
+To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR supports loading images and labels from lmdb datasets via `RecogLMDBDataset`.
+
+#### Annotation Format
+
+MMOCR requires the following keys for LMDB datasets:
+
+- `num-samples`: The total number of samples in the dataset.
+- The keys of images and labels are in the
+ format of `image-000000001` and `label-000000001`, respectively. The index starts from 1.
+
+MMOCR has a toy LMDB dataset in `tests/data/rec_toy_dataset/imgs.lmdb`.
+You can get a sense of the format with the following code snippet.
+
+```python
+>>> import lmdb
+>>>
+>>> env = lmdb.open('tests/data/rec_toy_dataset/imgs.lmdb')
+>>> txn = env.begin()
+>>> for k, v in txn.cursor():
+...     print(k, v)
+
+b'image-000000001' b'\xff...'
+b'image-000000002' b'\xff...'
+b'image-000000003' b'\xff...'
+b'image-000000004' b'\xff...'
+b'image-000000005' b'\xff...'
+b'image-000000006' b'\xff...'
+b'image-000000007' b'\xff...'
+b'image-000000008' b'\xff...'
+b'image-000000009' b'\xff...'
+b'image-000000010' b'\xff...'
+b'label-000000001' b'GRAND'
+b'label-000000002' b'HOTEL'
+b'label-000000003' b'HOTEL'
+b'label-000000004' b'PACIFIC'
+b'label-000000005' b'03/09/2009'
+b'label-000000006' b'ANING'
+b'label-000000007' b'Virgin'
+b'label-000000008' b'america'
+b'label-000000009' b'ATTACK'
+b'label-000000010' b'DAVIDSON'
+b'num-samples' b'10'
+```
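+
+If you need to build such a file for your own data, the following is a minimal sketch of how one might write it with the `lmdb` package. The sample list and output path are made up for illustration; in practice you would typically rely on MMOCR's dataset preparation tools where available.
+
+```python
+import lmdb
+
+# (image file, transcription) pairs -- hypothetical data
+samples = [('word_1.png', 'GRAND'), ('word_2.png', 'HOTEL')]
+
+env = lmdb.open('my_rec_dataset.lmdb', map_size=1 << 30)  # up to 1 GB
+with env.begin(write=True) as txn:
+    for i, (img_file, text) in enumerate(samples, start=1):
+        with open(img_file, 'rb') as f:
+            img_bytes = f.read()  # store the raw encoded image bytes
+        txn.put(f'image-{i:09d}'.encode(), img_bytes)
+        txn.put(f'label-{i:09d}'.encode(), text.encode())
+    txn.put(b'num-samples', str(len(samples)).encode())
+env.close()
+```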
+
+#### Example Config
+
+Here is part of a config example where we make `train_dataloader` use `RecogLMDBDataset` to load the toy dataset. Since `RecogLMDBDataset` loads images as numpy arrays, don't forget to use `LoadImageFromNDArray` instead of `LoadImageFromFile` in the pipeline for successful loading.
+
+```python
+pipeline = [
+ dict(
+ type='LoadImageFromNDArray'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_text=True,
+ ),
+ dict(
+ type='PackTextRecogInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+toy_textrecog_train = dict(
+ type='RecogLMDBDataset',
+ data_root='tests/data/rec_toy_dataset/',
+ ann_file='imgs.lmdb',
+ pipeline=pipeline)
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=toy_textrecog_train)
+```
+
+### RecogTextDataset
+
+Prior to MMOCR 1.0, MMOCR 0.x took plain text files as annotation input for text recognition. This format has been deprecated in MMOCR 1.0, and this class may be removed at any time in the future. [More info](../migration/dataset.md)
+
+#### Annotation Format
+
+Text files can be in either `txt` format or `jsonl` format. The simple `.txt` annotations separate the image name and the word annotation with a blank space, and therefore cannot handle text instances that contain spaces.
+
+```text
+img1.jpg OpenMMLab
+img2.jpg MMOCR
+```
+
+The JSON Line format uses a dictionary-like structure to represent the annotations, where the keys `filename` and `text` store the image name and word label, respectively.
+
+```json
+{"filename": "img1.jpg", "text": "OpenMMLab"}
+{"filename": "img2.jpg", "text": "MMOCR"}
+```
+
+#### Example Config
+
+Here is part of a config example where we use `RecogTextDataset` to load the old txt labels for training, and the old jsonl labels for testing.
+
+```python
+pipeline = [
+    dict(
+        type='LoadImageFromFile'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_text=True,
+    ),
+    dict(
+        type='PackTextRecogInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+# loading 0.x txt format annos
+txt_dataset = dict(
+    type='RecogTextDataset',
+    data_root=data_root,
+    ann_file='old_label.txt',
+    data_prefix=dict(img_path='imgs'),
+    parser_cfg=dict(
+        type='LineStrParser',
+        keys=['filename', 'text'],
+        keys_idx=[0, 1]),
+    pipeline=pipeline)
+
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=txt_dataset)
+
+# loading 0.x json line format annos
+jsonl_dataset = dict(
+    type='RecogTextDataset',
+    data_root=data_root,
+    ann_file='old_label.jsonl',
+    data_prefix=dict(img_path='imgs'),
+    parser_cfg=dict(
+        type='LineJsonParser',
+        keys=['filename', 'text']),
+    pipeline=pipeline)
+
+test_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=jsonl_dataset)
+```
+
+### IcdarDataset
+
+Prior to MMOCR 1.0, MMOCR 0.x took COCO-like annotations as input for text detection. This format has been deprecated in MMOCR 1.0, and this class may be removed at any time in the future. [More info](../migration/dataset.md)
+
+#### Annotation Format
+
+```json
+{
+ "images": [
+ {
+ "id": 1,
+ "width": 800,
+ "height": 600,
+ "file_name": "test.jpg"
+ }
+ ],
+ "annotations": [
+ {
+ "id": 1,
+ "image_id": 1,
+ "category_id": 1,
+ "bbox": [0,0,10,10],
+ "segmentation": [
+ [0,0,10,0,10,10,0,10]
+ ],
+ "area": 100,
+ "iscrowd": 0
+ }
+ ]
+}
+```
+
+#### Example Config
+
+Here is part of a config example where we make `train_dataloader` use `IcdarDataset` to load the old labels.
+
+```python
+pipeline = [
+ dict(
+ type='LoadImageFromFile'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ ),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+
+icdar2015_textdet_train = dict(
+ type='IcdarDataset',
+ data_root='data/det/icdar2015',
+ ann_file='instances_training.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=pipeline)
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=icdar2015_textdet_train)
+```
+
+### WildReceiptDataset
+
+It's customized for [WildReceipt](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/datasetzoo.html#wildreceipt) dataset only.
+
+#### Annotation Format
+
+```json
+// Close Set
+{
+ "file_name": "image_files/Image_16/11/d5de7f2a20751e50b84c747c17a24cd98bed3554.jpeg",
+ "height": 1200,
+ "width": 1600,
+ "annotations":
+ [
+ {
+ "box": [550.0, 190.0, 937.0, 190.0, 937.0, 104.0, 550.0, 104.0],
+ "text": "SAFEWAY",
+ "label": 1
+ },
+ {
+ "box": [1048.0, 211.0, 1074.0, 211.0, 1074.0, 196.0, 1048.0, 196.0],
+ "text": "TM",
+ "label": 25
+ }
+ ], //...
+}
+
+// Open Set
+{
+ "file_name": "image_files/Image_12/10/845be0dd6f5b04866a2042abd28d558032ef2576.jpeg",
+ "height": 348,
+ "width": 348,
+ "annotations":
+ [
+ {
+ "box": [114.0, 19.0, 230.0, 19.0, 230.0, 1.0, 114.0, 1.0],
+ "text": "CHOEUN",
+ "label": 2,
+ "edge": 1
+ },
+ {
+ "box": [97.0, 35.0, 236.0, 35.0, 236.0, 19.0, 97.0, 19.0],
+ "text": "KOREANRESTAURANT",
+ "label": 2,
+ "edge": 1
+ }
+ ]
+}
+```
+
+#### Example Config
+
+Please refer to [SDMGR's config](https://github.com/open-mmlab/mmocr/blob/f30c16ce96bd2393570c04eeb9cf48a7916315cc/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) for more details.
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/engine.md b/mmocr-dev-1.x/docs/en/basic_concepts/engine.md
new file mode 100644
index 0000000000000000000000000000000000000000..a113015ac6e77292e4e43779c2c498af12ea927c
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/engine.md
@@ -0,0 +1,3 @@
+# Engine\[coming soon\]
+
+Coming Soon!
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/evaluation.md b/mmocr-dev-1.x/docs/en/basic_concepts/evaluation.md
new file mode 100644
index 0000000000000000000000000000000000000000..ef477e967d646c6b000e44a75587440896d80490
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/evaluation.md
@@ -0,0 +1,197 @@
+# Evaluation
+
+```{note}
+Before reading this document, we recommend that you first read {external+mmengine:doc}`MMEngine: Model Accuracy Evaluation Basics `.
+```
+
+## Metrics
+
+MMOCR implements widely-used evaluation metrics for text detection, text recognition and key information extraction tasks based on the {external+mmengine:doc}`MMEngine: BaseMetric ` base class. Users can specify the metric used in the validation and test phases by modifying the `val_evaluator` and `test_evaluator` fields in the configuration file. For example, the following config shows how to use `HmeanIOUMetric` to evaluate the model performance in the text detection task.
+
+```python
+val_evaluator = dict(type='HmeanIOUMetric')
+test_evaluator = val_evaluator
+
+# In addition, MMOCR also supports the combined evaluation of multiple metrics for the same task, such as using WordMetric and CharMetric at the same time
+val_evaluator = [
+ dict(type='WordMetric', mode=['exact', 'ignore_case', 'ignore_case_symbol']),
+ dict(type='CharMetric')
+]
+```
+
+```{tip}
+More evaluation related configurations can be found in the [evaluation configuration tutorial](../user_guides/config.md#evaluation-configuration).
+```
+
+As shown in the following table, MMOCR currently supports 5 evaluation metrics for text detection, text recognition, and key information extraction tasks, including `HmeanIOUMetric`, `WordMetric`, `CharMetric`, `OneMinusNEDMetric`, and `F1Metric`.
+
+| Metric | Task | Input Field | Output Field |
+| --------------------------------------- | ------- | ------------------------------------------------- | --------------------------------------------------------------------- |
+| [HmeanIOUMetric](#hmeanioumetric) | TextDet | `pred_polygons` `pred_scores` `gt_polygons` | `recall` `precision` `hmean` |
+| [WordMetric](#wordmetric) | TextRec | `pred_text` `gt_text` | `word_acc` `word_acc_ignore_case` `word_acc_ignore_case_symbol` |
+| [CharMetric](#charmetric) | TextRec | `pred_text` `gt_text` | `char_recall` `char_precision` |
+| [OneMinusNEDMetric](#oneminusnedmetric) | TextRec | `pred_text` `gt_text` | `1-N.E.D` |
+| [F1Metric](#f1metric) | KIE | `pred_labels` `gt_labels` | `macro_f1` `micro_f1` |
+
+In general, the evaluation metric used in each task is conventionally determined. Users usually do not need to understand or manually modify the internal implementation of the evaluation metric. However, to facilitate more customized requirements, this document will further introduce the specific implementation details and configurable parameters of the built-in metrics in MMOCR.
+
+### HmeanIOUMetric
+
+[HmeanIOUMetric](mmocr.evaluation.metrics.hmean_iou_metric.HmeanIOUMetric) is one of the most widely used evaluation metrics in text detection tasks, because it calculates the harmonic mean (H-mean) between the detection precision (P) and recall rate (R). The `HmeanIOUMetric` can be calculated by the following equation:
+
+```{math}
+H = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P+R}
+```
+
+In addition, since it is equivalent to the F-score (also known as F-measure or F-metric) when {math}`\beta = 1`, `HmeanIOUMetric` is sometimes written as `F1Metric` or `f1-score`:
+
+```{math}
+F_1=(1+\beta^2)\cdot\frac{PR}{\beta^2\cdot P+R} = \frac{2PR}{P+R}
+```
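+
+As a quick numeric illustration with made-up values, a detector with a precision of 0.843 and a recall of 0.786 would obtain:
+
+```python
+P, R = 0.843, 0.786  # hypothetical precision / recall
+hmean = 2 * P * R / (P + R)
+print(round(hmean, 3))  # 0.814
+```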
+
+In MMOCR, the calculation of `HmeanIOUMetric` can be summarized as the following steps:
+
+1. Filter out invalid predictions
+
+ - Filter out predictions with a score lower than `pred_score_thrs`
+ - Filter out predictions overlapping with `ignored` ground truth boxes with an overlap ratio higher than `ignore_precision_thr`
+
+ It is worth noting that, by default, the metric **automatically searches** for the **best score threshold** within the range specified by `pred_score_thrs`, and users can also customize the search range by modifying the configuration file:
+
+ ```python
+ # By default, HmeanIOUMetric searches the best threshold within the range [0.3, 0.9] with a step size of 0.1
+ val_evaluator = dict(type='HmeanIOUMetric', pred_score_thrs=dict(start=0.3, stop=0.9, step=0.1))
+ ```
+
+2. Calculate the IoU matrix
+
+ - At the data processing stage, `HmeanIOUMetric` will calculate and maintain an {math}`M \times N` IoU matrix `iou_metric` for the convenience of the subsequent bounding box pairing step, where M and N represent the number of ground truth bounding boxes and filtered prediction bounding boxes, respectively. Each element of this matrix stores the IoU between the m-th ground truth box and the n-th prediction box (a rough sketch of such a polygon IoU computation is given after this list).
+
+3. Compute the number of GT samples that can be accurately matched based on the corresponding pairing strategy
+
+ Although `HmeanIOUMetric` can be calculated by a fixed formula, there may still be some subtle differences in the specific implementations. These differences mainly reflect the use of different strategies to match gt and predicted bounding boxes, which leads to the difference in final scores. Currently, MMOCR supports two matching strategies, namely `vanilla` and `max_matching`, for the `HmeanIOUMetric`. As shown below, users can specify the matching strategies in the config.
+
+ - `vanilla` matching strategy
+
+ By default, `HmeanIOUMetric` adopts the `vanilla` matching strategy, which is consistent with the `hmean-iou` implementation in MMOCR 0.x and the **official** text detection competition evaluation standard of ICDAR series. The matching strategy adopts the first-come-first-served matching method to pair the labels and predictions.
+
+ ```python
+ # By default, HmeanIOUMetric adopts 'vanilla' matching strategy
+ val_evaluator = dict(type='HmeanIOUMetric')
+ ```
+
+ - `max_matching` matching strategy
+
+ To address the shortcomings of the existing matching mechanism, MMOCR has implemented a more efficient matching strategy to maximize the number of matches.
+
+ ```python
+ # Specify to use 'max_matching' matching strategy
+ val_evaluator = dict(type='HmeanIOUMetric', strategy='max_matching')
+ ```
+
+ ```{note}
+ We recommend that research-oriented developers use the default `vanilla` matching strategy to ensure consistency with other papers. For industry-oriented developers, you can use the `max_matching` matching strategy to achieve optimized performance.
+ ```
+
+4. Compute the final evaluation score according to the aforementioned matching strategy
+
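+As a rough illustration of step 2 above, the IoU between two polygons can be computed with `shapely`, and the pairwise values collected into an {math}`M \times N` matrix. This is only a sketch of the idea with made-up polygons; MMOCR's actual implementation additionally handles invalid geometries and score filtering.
+
+```python
+import numpy as np
+from shapely.geometry import Polygon
+
+
+def poly_iou(poly_a, poly_b) -> float:
+    """IoU between two polygons given as lists of (x, y) points."""
+    a, b = Polygon(poly_a), Polygon(poly_b)
+    inter = a.intersection(b).area
+    union = a.union(b).area
+    return inter / union if union > 0 else 0.0
+
+
+# hypothetical ground truth (M=1) and predicted (N=2) polygons
+gt_polys = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
+pred_polys = [[(1, 1), (9, 1), (9, 9), (1, 9)],
+              [(20, 20), (30, 20), (30, 30), (20, 30)]]
+
+iou_matrix = np.array([[poly_iou(g, p) for p in pred_polys] for g in gt_polys])
+print(iou_matrix)  # [[0.64 0.  ]]
+```
+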
+### WordMetric
+
+[WordMetric](mmocr.evaluation.metrics.recog_metric.WordMetric) implements **word-level** text recognition evaluation metrics and includes three text matching modes, namely `exact`, `ignore_case`, and `ignore_case_symbol`. Users can freely combine the output of one or more text matching modes in the configuration file by modifying the `mode` field.
+
+```python
+# Use WordMetric for text recognition task
+val_evaluator = [
+ dict(type='WordMetric', mode=['exact', 'ignore_case', 'ignore_case_symbol'])
+]
+```
+
+- `exact`: Full matching mode, i.e., the prediction is counted as correct only when the predicted text and the ground truth text are exactly the same.
+- `ignore_case`: This mode ignores the case of the predicted text and the ground truth text.
+- `ignore_case_symbol`: This mode ignores both case and symbols in the predicted text and the ground truth text. This is also the text recognition accuracy reported by most academic papers. The performance reported by MMOCR uses the `ignore_case_symbol` mode by default.
+
+Assume that the real label is `MMOCR!` and the model output is `mmocr`. The `WordMetric` scores under the three matching modes are: `{'exact': 0, 'ignore_case': 0, 'ignore_case_symbol': 1}`.
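+
+The behaviour of the three modes can be mimicked with a few lines of plain Python. Note that this is only an illustration; the exact symbol set stripped by MMOCR's `ignore_case_symbol` mode may differ slightly.
+
+```python
+import re
+
+
+def word_match(pred: str, gt: str, mode: str) -> bool:
+    if mode == 'ignore_case':
+        pred, gt = pred.lower(), gt.lower()
+    elif mode == 'ignore_case_symbol':
+        # keep letters and digits only, then ignore case
+        pred = re.sub(r'[^A-Za-z0-9]', '', pred).lower()
+        gt = re.sub(r'[^A-Za-z0-9]', '', gt).lower()
+    return pred == gt
+
+
+print({m: int(word_match('mmocr', 'MMOCR!', m))
+       for m in ('exact', 'ignore_case', 'ignore_case_symbol')})
+# {'exact': 0, 'ignore_case': 0, 'ignore_case_symbol': 1}
+```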
+
+### CharMetric
+
+[CharMetric](mmocr.evaluation.metrics.recog_metric.CharMetric) implements **character-level** text recognition evaluation metrics that are **case-insensitive**.
+
+```python
+# Use CharMetric for text recognition task
+val_evaluator = [dict(type='CharMetric')]
+```
+
+Specifically, `CharMetric` will output two evaluation metrics, namely `char_precision` and `char_recall`. Let the number of correctly predicted characters (True Positive) be {math}`\sigma_{tp}`, then the precision *P* and recall *R* can be calculated by the following equation:
+
+```{math}
+P=\frac{\sigma_{tp}}{\sigma_{pred}}, R = \frac{\sigma_{tp}}{\sigma_{gt}}
+```
+
+where {math}`\sigma_{gt}` and {math}`\sigma_{pred}` represent the total number of characters in the label text and the predicted text, respectively.
+
+For example, assume that the label text is "MM**O**CR" and the predicted text is "mm**0**cR**1**". The score of the `CharMetric` is:
+
+```{math}
+P=\frac{4}{6}, R=\frac{4}{5}
+```
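+
+The character-level true positives can be estimated with `difflib.SequenceMatcher`, which is close in spirit to (though not necessarily identical to) MMOCR's internal counting. The snippet below reproduces the numbers above.
+
+```python
+from difflib import SequenceMatcher
+
+
+def char_precision_recall(pred: str, gt: str):
+    pred, gt = pred.lower(), gt.lower()  # CharMetric is case-insensitive
+    # number of characters covered by matching blocks = true positives
+    tp = sum(m.size
+             for m in SequenceMatcher(None, gt, pred).get_matching_blocks())
+    return tp / len(pred), tp / len(gt)  # precision, recall
+
+
+print(char_precision_recall('mm0cR1', 'MMOCR'))  # (0.666..., 0.8)
+```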
+
+### OneMinusNEDMetric
+
+[OneMinusNEDMetric(1-N.E.D)](mmocr.evaluation.metrics.recog_metric.OneMinusNEDMetric) is commonly used for text recognition evaluation of Chinese or English **text line-level** annotations. Unlike the full matching metric that requires the prediction and the gt text to be exactly the same, `1-N.E.D` uses the normalized [edit distance](https://en.wikipedia.org/wiki/Edit_distance) (also known as Levenshtein Distance) to measure the difference between the predicted and the gt text, so that the performance difference of the model can be better distinguished when evaluating long texts. Assume that the real and predicted texts are {math}`s_i` and {math}`\hat{s_i}`, respectively, and their lengths are {math}`l_{i}` and {math}`\hat{l_i}`, respectively. The `OneMinusNEDMetric` score can be calculated by the following formula:
+
+```{math}
+score = 1 - \frac{1}{N}\sum_{i=1}^{N}\frac{D(s_i, \hat{s_{i}})}{max(l_{i},\hat{l_{i}})}
+```
+
+where *N* is the total number of samples, and {math}`D(s_1, s_2)` is the edit distance between two strings.
+
+For example, assume that the real label is "OpenMMLabMMOCR", the prediction of model A is "0penMMLabMMOCR", and the prediction of model B is "uvwxyz". The results of the full matching and `OneMinusNEDMetric` evaluation metrics are as follows:
+
+| Model   | Full-match | 1 - N.E.D. |
+| ------- | ---------- | ---------- |
+| Model A | 0 | 0.92857 |
+| Model B | 0 | 0 |
+
+As shown in the table above, although model A only predicted one letter incorrectly, both models score 0 under the full-match strategy. However, the `OneMinusNEDMetric` evaluation metric can better distinguish the performance of the two models on **long texts**.
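+
+The score for model A can be reproduced with a plain dynamic-programming edit distance (a sketch for illustration; dedicated libraries such as `rapidfuzz` would normally be used instead):
+
+```python
+def edit_distance(a: str, b: str) -> int:
+    """Levenshtein distance between two strings."""
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        curr = [i]
+        for j, cb in enumerate(b, start=1):
+            curr.append(min(prev[j] + 1,                 # deletion
+                            curr[j - 1] + 1,             # insertion
+                            prev[j - 1] + (ca != cb)))   # substitution
+        prev = curr
+    return prev[-1]
+
+
+gt, pred = 'OpenMMLabMMOCR', '0penMMLabMMOCR'
+score = 1 - edit_distance(gt, pred) / max(len(gt), len(pred))
+print(round(score, 5))  # 0.92857
+```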
+
+### F1Metric
+
+[F1Metric](mmocr.evaluation.metrics.f_metric.F1Metric) implements the F1-Metric evaluation metric for KIE tasks and provides two modes, namely `micro` and `macro`.
+
+```python
+val_evaluator = [
+    dict(type='F1Metric', mode=['micro', 'macro'])
+]
+```
+
+- `micro` mode: Calculate the global F1-Metric score based on the total number of True Positive, False Negative, and False Positive.
+
+- `macro` mode: Calculate the F1-Metric score for each class separately and then take their average (a small numeric sketch is shown below).
+
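+The difference between the two modes can be seen from a small, made-up example with two classes:
+
+```python
+# per-class true positives / false positives / false negatives (hypothetical)
+stats = {'class_a': dict(tp=90, fp=10, fn=10),
+         'class_b': dict(tp=1, fp=4, fn=9)}
+
+
+def f1(tp, fp, fn):
+    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
+
+
+# micro: pool the counts over all classes, then compute a single F1
+tp = sum(s['tp'] for s in stats.values())
+fp = sum(s['fp'] for s in stats.values())
+fn = sum(s['fn'] for s in stats.values())
+micro_f1 = f1(tp, fp, fn)
+
+# macro: compute F1 per class, then average
+macro_f1 = sum(f1(**s) for s in stats.values()) / len(stats)
+
+print(round(micro_f1, 3), round(macro_f1, 3))  # 0.847 0.517
+```
+
+Because `macro` averages over classes, rare classes (like `class_b` here) pull the score down much more than they do in `micro` mode.
+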
+### Customized Metric
+
+MMOCR supports the implementation of customized evaluation metrics for users who pursue higher customization. In general, users only need to create a customized evaluation metric class `CustomizedMetric` that inherits from {external+mmengine:doc}`MMEngine: BaseMetric `, override the data format processing method `process` and the metric calculation method `compute_metrics`, and finally add the class to the `METRICS` registry.
+
+```python
+from typing import Dict, List, Sequence
+
+from mmengine.evaluator import BaseMetric
+
+from mmocr.registry import METRICS
+
+@METRICS.register_module()
+class CustomizedMetric(BaseMetric):
+
+ def process(self, data_batch: Sequence[Dict], predictions: Sequence[Dict]):
+ """ process receives two parameters, data_batch stores the gt label information, and predictions stores the predicted results.
+ """
+ pass
+
+ def compute_metrics(self, results: List):
+ """ compute_metric receives the results of the process method as input and returns the evaluation results.
+ """
+ pass
+```
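+
+Once registered, the metric can be referenced from a config just like the built-in ones (assuming the class above):
+
+```python
+val_evaluator = dict(type='CustomizedMetric')
+```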
+
+```{note}
+More details can be found in {external+mmengine:doc}`MMEngine Documentation: BaseMetric `.
+```
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/models.md b/mmocr-dev-1.x/docs/en/basic_concepts/models.md
new file mode 100644
index 0000000000000000000000000000000000000000..7eab561e7276af01c63ca7ae8c1452c5c6317c25
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/models.md
@@ -0,0 +1,3 @@
+# Models\[coming soon\]
+
+Coming Soon!
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/overview.md b/mmocr-dev-1.x/docs/en/basic_concepts/overview.md
new file mode 100644
index 0000000000000000000000000000000000000000..9e31fefa5fc8cc9e7f86be30b18f0b62aa1c85d5
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/overview.md
@@ -0,0 +1,3 @@
+# Overview & Features\[coming soon\]
+
+Coming Soon!
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/structures.md b/mmocr-dev-1.x/docs/en/basic_concepts/structures.md
new file mode 100644
index 0000000000000000000000000000000000000000..0f73a77286457a4f1360fcb8cd83353202ea786d
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/structures.md
@@ -0,0 +1,219 @@
+# Data Structures and Elements
+
+MMOCR uses {external+mmengine:doc}`MMEngine: Abstract Data Element ` to encapsulate the data required for each task into `data_sample`. The base class has implemented basic add/delete/update/check functions and supports data migration between different devices, as well as dictionary-like and tensor-like operations, which also allows the interfaces of different algorithms to be unified.
+
+Thanks to the unified data structures, the data flow between each module in the algorithm libraries, such as [`visualizer`](./visualizers.md), [`evaluator`](./evaluation.md), [`dataset`](./datasets.md), is greatly simplified. In MMOCR, we have the following conventions for different data types.
+
+- **xxxData**: Single granularity data annotation or model output. Currently MMEngine has three built-in granularities of {external+mmengine:doc}`data elements `, including instance-level data (`InstanceData`), pixel-level data (`PixelData`) and image-level label data (`LabelData`). Among the tasks currently supported by MMOCR, text detection and key information extraction tasks use `InstanceData` to encapsulate the bounding boxes and the corresponding box label, while the text recognition task uses `LabelData` to encapsulate the text content.
+- **xxxDataSample**: Inherited from {external+mmengine:doc}`MMEngine: Base Data Element `, used to hold **all** annotation and prediction information required by a single task. For example, [`TextDetDataSample`](mmocr.structures.textdet_data_sample.TextDetDataSample) for text detection, [`TextRecogDataSample`](mmocr.structures.textrecog_data_sample.TextRecogDataSample) for text recognition, and [`KIEDataSample`](mmocr.structures.kie_data_sample.KIEDataSample) for the key information extraction task.
+
+In the following, we will introduce the practical application of data elements **xxxData** and data samples **xxxDataSample** in MMOCR, respectively.
+
+## Data Elements - xxxData
+
+`InstanceData` and `LabelData` are `BaseDataElement` subclasses defined in `MMEngine` for encapsulating different granularities of annotation data or model output. In MMOCR, we use `InstanceData` and `LabelData` to encapsulate the data types actually used in OCR-related tasks.
+
+### InstanceData
+
+In the **text detection** task, the detector concentrates on instance-level text samples, so we use `InstanceData` to encapsulate the data needed for this task. Typically, its required training annotations and prediction outputs contain rectangular or polygonal bounding boxes, as well as bounding box labels. Since the text detection task has only one positive sample class, "text", in MMOCR we use `0` to number this class by default. The following code example shows how to use `InstanceData` to encapsulate the data used in the text detection task.
+
+```python
+import torch
+from mmengine.structures import InstanceData
+
+# defining gt_instance for encapsulating the ground truth data
+gt_instance = InstanceData()
+gt_instance.bboxes = torch.Tensor([[0, 0, 10, 10], [10, 10, 20, 20]])
+gt_instance.polygons = torch.Tensor([[[0, 0], [10, 0], [10, 10], [0, 10]],
+ [[10, 10], [20, 10], [20, 20], [10, 20]]])
+gt_instance.labels = torch.LongTensor([0, 0])
+
+# defining pred_instance for encapsulating the prediction data
+pred_instances = InstanceData()
+pred_polygons, scores = model(input)
+pred_instances.polygons = pred_polygons
+pred_instances.scores = scores
+```
+
+The conventions for the fields in `InstanceData` in MMOCR are shown in the table below. It is important to note that the length of each field in `InstanceData` must be equal to the number of instances `N` in the sample.
+
+| Field | Type | Description |
+| ----------- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| bboxes | `torch.FloatTensor` | Bounding boxes `[x1, y1, x2, y2]` with the shape `(N, 4)`. |
+| labels | `torch.LongTensor` | Instance label with the shape `(N, )`. By default, MMOCR uses `0` to represent the "text" class. |
+| polygons | `list[np.array(dtype=np.float32)]` | Polygonal bounding boxes with the shape `(N, )`. |
+| scores | `torch.Tensor` | Confidence scores of the predictions of bounding boxes. `(N, )`. |
+| ignored | `torch.BoolTensor` | Whether to ignore the current sample with the shape `(N, )`. |
+| texts | `list[str]` | The text content of each instance with the shape `(N, )`, used for e2e text spotting or KIE task. |
+| text_scores | `torch.FloatTensor` | Confidence scores of the predictions of text contents with the shape `(N, )`, used for e2e text spotting task. |
+| edge_labels | `torch.IntTensor` | The node adjacency matrix with the shape `(N, N)`. In KIE, the optional values for the state between nodes are `-1` (ignored, not involved in loss calculation), `0` (disconnected) and `1` (connected). |
+| edge_scores | `torch.FloatTensor` | The prediction confidence of each edge in the KIE task, with the shape `(N, N)`. |
+
+### LabelData
+
+For **text recognition** tasks, both labeled content and predicted content are wrapped using `LabelData`.
+
+```python
+import torch
+from mmengine.structures import LabelData
+
+# defining gt_text for encapsulating the ground truth data
+gt_text = LabelData()
+gt_text.item = 'MMOCR'
+
+# defining pred_text for encapsulating the prediction data
+pred_text = LabelData()
+index, score = model(input)
+text = dictionary.idx2str(index)
+pred_text.score = score
+pred_text.item = text
+```
+
+The conventions for the `LabelData` fields in MMOCR are shown in the following table.
+
+| Field | Type | Description |
+| -------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| item | `str` | Text content. |
+| score | `list[float]` | Confidence score of the predicted text. |
+| indexes | `torch.LongTensor` | A sequence of text characters encoded by [dictionary](../basic_concepts/models.md#dictionary) and containing all special characters except ``. |
+| padded_indexes | `torch.LongTensor` | If the length of indexes is less than the maximum sequence length and `pad_idx` exists, this field holds the encoded text sequence padded to the maximum sequence length of `max_seq_len`. |
+
+## Data Samples - xxxDataSample
+
+By defining a uniform data structure, we can easily encapsulate the annotation data and prediction results in a unified way, making data transfer between different modules of the code base easier. In MMOCR, we have designed three data structures based on the data needed in three tasks: [`TextDetDataSample`](mmocr.structures.textdet_data_sample.TextDetDataSample), [`TextRecogDataSample`](mmocr.structures.textrecog_data_sample.TextRecogDataSample), and [`KIEDataSample`](mmocr.structures.kie_data_sample.KIEDataSample). These data structures all inherit from {external+mmengine:doc}`MMEngine: Base Data Element `, which is used to hold all annotation and prediction information required by each task.
+
+### Text Detection - TextDetDataSample
+
+[TextDetDataSample](mmocr.structures.textdet_data_sample.TextDetDataSample) is used to encapsulate the data needed for the text detection task. It contains two main fields `gt_instances` and `pred_instances`, which are used to store the annotation information and prediction results respectively.
+
+| Field | Type | Description |
+| -------------- | ------------------------------- | ----------------------- |
+| gt_instances | [`InstanceData`](#instancedata) | Annotation information. |
+| pred_instances | [`InstanceData`](#instancedata) | Prediction results. |
+
+The fields of [`InstanceData`](#instancedata) that will be used are:
+
+| Field | Type | Description |
+| -------- | ---------------------------------- | ------------------------------------------------------------------------------------------------ |
+| bboxes | `torch.FloatTensor` | Bounding boxes `[x1, y1, x2, y2]` with the shape `(N, 4)`. |
+| labels | `torch.LongTensor` | Instance label with the shape `(N, )`. By default, MMOCR uses `0` to represent the "text" class. |
+| polygons | `list[np.array(dtype=np.float32)]` | Polygonal bounding boxes with the shape `(N, )`. |
+| scores | `torch.Tensor` | Confidence scores of the predictions of bounding boxes. `(N, )`. |
+| ignored | `torch.BoolTensor` | Boolean flags with the shape `(N, )`, indicating whether to ignore the current sample. |
+
+Since text detection models usually only output one of bboxes or polygons, we only need to make sure that one of these two fields is assigned a value.
+
+The following sample code demonstrates the use of `TextDetDataSample`.
+
+```python
+import torch
+from mmengine.structures import InstanceData
+
+from mmocr.structures import TextDetDataSample
+
+data_sample = TextDetDataSample()
+# Define the ground truth data
+img_meta = dict(img_shape=(800, 1196, 3), pad_shape=(800, 1216, 3))
+gt_instances = InstanceData(metainfo=img_meta)
+gt_instances.bboxes = torch.rand((5, 4))
+gt_instances.labels = torch.zeros((5,), dtype=torch.long)
+data_sample.gt_instances = gt_instances
+
+# Define the prediction data
+pred_instances = InstanceData()
+pred_instances.bboxes = torch.rand((5, 4))
+pred_instances.labels = torch.zeros((5,), dtype=torch.long)
+data_sample.pred_instances = pred_instances
+```
+
+### Text Recognition - TextRecogDataSample
+
+[`TextRecogDataSample`](mmocr.structures.textrecog_data_sample.TextRecogDataSample) is used to encapsulate the data for the text recognition task. It has two fields, `gt_text` and `pred_text` , which are used to store annotation information and prediction results, respectively.
+
+| Field | Type | Description |
+| --------- | ------------------------------------------ | ------------------- |
+| gt_text | [`LabelData`](#text-recognition-labeldata) | Label information. |
+| pred_text | [`LabelData`](#text-recognition-labeldata) | Prediction results. |
+
+The following sample code demonstrates the use of [`TextRecogDataSample`](mmocr.structures.textrecog_data_sample.TextRecogDataSample).
+
+```python
+import torch
+from mmengine.structures import LabelData
+
+from mmocr.structures import TextRecogDataSample
+
+data_sample = TextRecogDataSample()
+# Define the ground truth data
+img_meta = dict(img_shape=(800, 1196, 3), pad_shape=(800, 1216, 3))
+gt_text = LabelData(metainfo=img_meta)
+gt_text.item = 'mmocr'
+data_sample.gt_text = gt_text
+
+# Define the prediction data
+pred_text = LabelData(metainfo=img_meta)
+pred_text.item = 'mmocr'
+data_sample.pred_text = pred_text
+```
+
+The fields of `LabelData` that will be used are:
+
+| Field | Type | Description |
+| -------------- | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| item | `list[str]` | The text corresponding to the instance, of length (N, ), for end-to-end OCR tasks and KIE |
+| score | `torch.FloatTensor` | Confidence of the text prediction, of length (N, ), for the end-to-end OCR task |
+| indexes | `torch.LongTensor` | A sequence of text characters encoded by [dictionary](../basic_concepts/models.md#dictionary) and containing all special characters except ``. |
+| padded_indexes | `torch.LongTensor` | If the length of indexes is less than the maximum sequence length and `pad_idx` exists, this field holds the encoded text sequence padded to the maximum sequence length of `max_seq_len`. |
+
+### Key Information Extraction - KIEDataSample
+
+[`KIEDataSample`](mmocr.structures.kie_data_sample.KIEDataSample) is used to encapsulate the data needed for the KIE task. It also contains two fields, `gt_instances` and `pred_instances`, which are used to store annotation information and prediction results respectively.
+
+| Field | Type | Description |
+| -------------- | ---------------------------------------------- | ----------------------- |
+| gt_instances | [`InstanceData`](#text-detection-instancedata) | Annotation information. |
+| pred_instances | [`InstanceData`](#text-detection-instancedata) | Prediction results. |
+
+The [`InstanceData`](#text-detection-instancedata) fields that will be used by this task are shown in the following table.
+
+| Field | Type | Description |
+| ----------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| bboxes | `torch.FloatTensor` | Bounding boxes `[x1, y1, x2, y2]` with the shape `(N, 4)`. |
+| labels | `torch.LongTensor` | Instance label with the shape `(N, )`. |
+| texts | `list[str]` | The text content of each instance with the shape `(N, )`, used for e2e text spotting or KIE task. |
+| edge_labels | `torch.IntTensor` | The node adjacency matrix with the shape `(N, N)`. In the KIE task, the optional values for the state between nodes are `-1` (ignored, not involved in loss calculation), `0` (disconnected) and `1` (connected). |
+| edge_scores | `torch.FloatTensor` | The prediction confidence of each edge in the KIE task, with the shape `(N, N)`. |
+| scores | `torch.FloatTensor` | The confidence scores for node label predictions, with the shape `(N,)`. |
+
+```{warning}
+Since there is no unified standard for model implementation of KIE tasks, the design currently considers only [SDMGR](../../../configs/kie/sdmgr/README.md) model usage scenarios. Therefore, the design is subject to change as we support more KIE models.
+```
+
+The following sample code shows the use of [`KIEDataSample`](mmocr.structures.kie_data_sample.KIEDataSample).
+
+```python
+import torch
+from mmengine.structures import InstanceData
+
+from mmocr.structures import KIEDataSample
+
+data_sample = KIEDataSample()
+# Define the ground truth data
+img_meta = dict(img_shape=(800, 1196, 3), pad_shape=(800, 1216, 3))
+gt_instances = InstanceData(metainfo=img_meta)
+gt_instances.bboxes = torch.rand((5, 4))
+gt_instances.labels = torch.zeros((5,), dtype=torch.long)
+gt_instances.texts = ['text1', 'text2', 'text3', 'text4', 'text5']
+gt_instances.edge_labels = torch.randint(-1, 2, (5, 5))
+data_sample.gt_instances = gt_instances
+
+# Define the prediction data
+pred_instances = InstanceData()
+pred_instances.bboxes = torch.rand((5, 4))
+pred_instances.labels = torch.rand((5,))
+pred_instances.edge_labels = torch.randint(-1, 2, (5, 5))
+pred_instances.edge_scores = torch.rand((5, 5))
+data_sample.pred_instances = pred_instances
+```
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/transforms.md b/mmocr-dev-1.x/docs/en/basic_concepts/transforms.md
new file mode 100644
index 0000000000000000000000000000000000000000..0a19208156b4b6b0e7ad3b7eaf9014e09e586d00
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/transforms.md
@@ -0,0 +1,226 @@
+# Data Transforms and Pipeline
+
+In the design of MMOCR, dataset construction and data preparation are decoupled. That is, dataset construction classes such as [`OCRDataset`](mmocr.datasets.ocr_dataset.OCRDataset) are responsible for loading and parsing annotation files, while data transforms further apply data preprocessing, augmentation, formatting, and other related functions. Currently, there are five types of data transforms implemented in MMOCR, as shown in the following table.
+
+| Transforms Type | File | Description |
+| -------------------------------- | --------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
+| Data Loading | loading.py | Implementing the data loading functions. |
+| Data Formatting | formatting.py | Formatting the data required by different tasks. |
+| Cross Project Data Adapter | adapters.py | Converting the data format between other OpenMMLab projects and MMOCR. |
+| Data Augmentation Functions | ocr_transforms.py textdet_transforms.py textrecog_transforms.py | Various built-in data augmentation methods designed for different tasks. |
+| Wrappers of Third Party Packages | wrappers.py | Wrapping the transforms implemented in popular third party packages such as [ImgAug](https://github.com/aleju/imgaug), and adapting them to MMOCR format. |
+
+Since the data transform classes are independent of each other, we can easily combine any data transforms to build a data pipeline once the data fields are defined. In MMOCR, a typical training data pipeline consists of three stages: **data loading**, **data augmentation**, and **data formatting**. Users only need to define the data pipeline list in the configuration file and specify the data transform classes and their parameters:
+
+
+
+```python
+train_pipeline_r18 = [
+ # Loading images
+ dict(
+ type='LoadImageFromFile',
+ color_type='color_ignore_orientation'),
+ # Loading annotations
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ ),
+ # Data augmentation
+ dict(
+ type='ImgAugWrapper',
+ args=[['Fliplr', 0.5],
+ dict(cls='Affine', rotate=[-10, 10]), ['Resize', [0.5, 3.0]]]),
+ dict(type='RandomCrop', min_side_ratio=0.1),
+ dict(type='Resize', scale=(640, 640), keep_ratio=True),
+ dict(type='Pad', size=(640, 640)),
+ # Data formatting
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape'))
+]
+```
+
+```{tip}
+More tutorials about data pipeline configuration can be found in the [Config Doc](../user_guides/config.md#data-pipeline-configuration). Next, we will briefly introduce the data transforms supported in MMOCR according to their categories.
+```
+
+For each data transform, MMOCR provides a detailed docstring. For example, in the header of each data transform class, we annotate `Required Keys`, `Modified Keys` and `Added Keys`. The `Required Keys` represent the mandatory fields that should be included in the input required by the data transform, while the `Modified Keys` and `Added Keys` indicate that the transform may modify or add fields to the original data. For example, `LoadImageFromFile` implements the image loading function, whose `Required Keys` contain only the image path `img_path`, while its `Modified Keys` include the loaded image `img`, the current image size `img_shape`, the original image size `ori_shape`, and other image attributes.
+
+```python
+@TRANSFORMS.register_module()
+class LoadImageFromFile(MMCV_LoadImageFromFile):
+ # We provide detailed docstring for each data transform.
+ """Load an image from file.
+
+ Required Keys:
+
+ - img_path
+
+ Modified Keys:
+
+ - img
+ - img_shape
+ - ori_shape
+ """
+```
+
+```{note}
+In the data pipeline of MMOCR, the image and label information are saved in a dictionary. By using the unified fields, the data can be freely transferred between different data transforms. Therefore, it is very important to understand the conventional fields used in MMOCR.
+```
+
+For your convenience, the following table lists the conventional keys used in MMOCR data transforms.
+
+| Key | Type | Description |
+| ---------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| img | `np.array(dtype=np.uint8)` | Image array, shape of `(h, w, c)`. |
+| img_shape | `tuple(int, int)` | Current image size `(h, w)`. |
+| ori_shape | `tuple(int, int)` | Original image size `(h, w)`. |
+| scale | `tuple(int, int)` | Stores the target image size `(h, w)` specified by the user in the `Resize` data transform series. Note: This value may not correspond to the actual image size after the transformation. |
+| scale_factor | `tuple(float, float)` | Stores the target image scale factor `(w_scale, h_scale)` specified by the user in the `Resize` data transform series. Note: This value may not correspond to the actual image size after the transformation. |
+| keep_ratio | `bool` | Boolean flag determines whether to keep the aspect ratio while scaling images. |
+| flip | `bool` | Boolean flags to indicate whether the image has been flipped. |
+| flip_direction | `str` | Flipping direction, options are `horizontal`, `vertical`, `diagonal`. |
+| gt_bboxes | `np.array(dtype=np.float32)` | Ground-truth bounding boxes. |
+| gt_polygons | `list[np.array(dtype=np.float32)]` | Ground-truth polygons. |
+| gt_bboxes_labels | `np.array(dtype=np.int64)` | Category label of bounding boxes. By default, MMOCR uses `0` to represent "text" instances. |
+| gt_texts | `list[str]` | Ground-truth text content of the instance. |
+| gt_ignored | `np.array(dtype=np.bool_)` | Boolean flag indicating whether ignoring the instance (used in text detection). |
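+
+To illustrate how these fields travel through a pipeline, here is a minimal sketch of a custom transform (the class name and padding behaviour are made up for this example). It assumes the common pattern of inheriting from MMCV's `BaseTransform` and registering the class in MMOCR's `TRANSFORMS` registry.
+
+```python
+import numpy as np
+from mmcv.transforms import BaseTransform
+
+from mmocr.registry import TRANSFORMS
+
+
+@TRANSFORMS.register_module()
+class PadToMultiple(BaseTransform):
+    """Pad ``img`` so that its height and width are multiples of ``divisor``.
+
+    Required Keys:
+
+    - img
+
+    Modified Keys:
+
+    - img
+    - img_shape
+    """
+
+    def __init__(self, divisor: int = 32) -> None:
+        self.divisor = divisor
+
+    def transform(self, results: dict) -> dict:
+        img = results['img']
+        h, w = img.shape[:2]
+        new_h = -(-h // self.divisor) * self.divisor  # ceil to multiple
+        new_w = -(-w // self.divisor) * self.divisor
+        padded = np.zeros((new_h, new_w, *img.shape[2:]), dtype=img.dtype)
+        padded[:h, :w] = img
+        results['img'] = padded
+        results['img_shape'] = (new_h, new_w)
+        return results
+```
+
+Such a transform could then be dropped into a pipeline list just like the built-in ones, e.g. `dict(type='PadToMultiple', divisor=32)`.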
+
+## Data Loading
+
+Data loading transforms mainly implement the functions of loading data from different formats and backends. Currently, the following data loading transforms are implemented in MMOCR:
+
+| Transform Name     | Required Keys                                 | Modified/Added Keys                                                    | Description                                                                                                                                                                               |
+| ------------------ | --------------------------------------------- | ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| LoadImageFromFile  | `img_path`                                    | `img` `img_shape` `ori_shape`                                          | Load an image from the specified path, supporting different file storage backends (e.g. `disk`, `http`, `petrel`) and decoding backends (e.g. `cv2`, `turbojpeg`, `pillow`, `tifffile`). |
+| LoadOCRAnnotations | `bbox` `bbox_label` `polygon` `ignore` `text` | `gt_bboxes` `gt_bboxes_labels` `gt_polygons` `gt_ignored` `gt_texts`   | Parse the annotations required by OCR tasks.                                                                                                                                              |
+| LoadKIEAnnotations | `bboxes` `bbox_labels` `edge_labels` `texts`  | `gt_bboxes` `gt_bboxes_labels` `gt_edge_labels` `gt_texts` `ori_shape` | Parse the annotations required by the KIE task.                                                                                                                                           |
+
+## Data Augmentation
+
+Data augmentation is an indispensable process in text detection and recognition tasks. Currently, MMOCR has implemented dozens of data augmentation modules commonly used in OCR fields, which are classified into [ocr_transforms.py](/mmocr/datasets/transforms/ocr_transforms.py), [textdet_transforms.py](/mmocr/datasets/transforms/textdet_transforms.py), and [textrecog_transforms.py](/mmocr/datasets/transforms/textrecog_transforms.py).
+
+Specifically, `ocr_transforms.py` implements generic OCR data augmentation modules such as `RandomCrop` and `RandomRotate`:
+
+| Transform Name | Required Keys                                                                         | Modified/Added Keys                                                                                | Description                                                                                                                                                                                                                                     |
+| -------------- | ------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| RandomCrop     | `img` `gt_bboxes` `gt_bboxes_labels` `gt_polygons` `gt_ignored` `gt_texts` (optional) | `img` `img_shape` `gt_bboxes` `gt_bboxes_labels` `gt_polygons` `gt_ignored` `gt_texts` (optional)    | Randomly crop the image while making sure the cropped image contains at least one text instance. The optional parameter `min_side_ratio` controls the ratio of the short side of the cropped image to the original image; it defaults to `0.4`. |
+| RandomRotate   | `img` `img_shape` `gt_bboxes` (optional) `gt_polygons` (optional)                      | `img` `img_shape` `gt_bboxes` (optional) `gt_polygons` (optional) `rotated_angle`                    | Randomly rotate the image and optionally fill the blank areas of the rotated image.                                                                                                                                                            |
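+
+As a rough illustration (the parameter values here are examples, not recommendations), these generic transforms are typically placed in a detection pipeline like this:
+
+```python
+train_pipeline = [
+    # ... loading transforms ...
+    dict(type='RandomCrop', min_side_ratio=0.4),  # keep at least one text instance
+    dict(type='RandomRotate', max_angle=10),      # rotate within [-10, 10] degrees
+    # ... formatting transforms ...
+]
+```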
+
+`textdet_transforms.py` implements text detection related data augmentation modules:
+
+| Transform Name    | Required Keys                   | Modified/Added Keys                                     | Description                                                                                           |
+| ----------------- | ------------------------------- | ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
+| RandomFlip        | `img` `gt_bboxes` `gt_polygons` | `img` `gt_bboxes` `gt_polygons` `flip` `flip_direction` | Random flip, supporting the `horizontal`, `vertical` and `diagonal` modes. Defaults to `horizontal`.  |
+| FixInvalidPolygon | `gt_polygons` `gt_ignored`      | `gt_polygons` `gt_ignored`                              | Automatically fix the invalid polygons included in the annotations.                                    |
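+
+For example, a text detection pipeline may combine them as follows (an illustrative sketch; surrounding transforms are omitted):
+
+```python
+train_pipeline = [
+    # ...
+    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
+    # repair polygons that become invalid after geometric transforms
+    dict(type='FixInvalidPolygon', min_poly_points=4),
+    # ...
+]
+```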
+
+`textrecog_transforms.py` implements text recognition related data augmentation modules:
+
+| Transform Name  | Required Keys | Modified/Added Keys                                   | Description                                                                                                                                                |
+| --------------- | ------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| RescaleToHeight | `img`         | `img` `img_shape` `scale` `scale_factor` `keep_ratio` | Scales the image to the specified height while keeping the aspect ratio. When `min_width` and `max_width` are specified, the aspect ratio may be changed. |
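+
+A typical recognition pipeline rescales images to a fixed height while bounding the width, for instance (values are illustrative):
+
+```python
+train_pipeline = [
+    # ...
+    dict(
+        type='RescaleToHeight',
+        height=48,       # target height
+        min_width=48,    # clamp the resulting width into [min_width, max_width]
+        max_width=160),
+    # ...
+]
+```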
+
+```{warning}
+The above table only briefly introduces some selected data augmentation methods, for more information please refer to the [API documentation](../api.rst) or the code docstrings.
+```
+
+## Data Formatting
+
+Data formatting transforms are responsible for packaging images, ground truth labels, and other information into a dictionary. Different tasks usually rely on different formatting transforms. For example:
+
+| Transform Name      | Required Keys | Modified/Added Keys | Description                                    |
+| ------------------- | ------------- | ------------------- | ---------------------------------------------- |
+| PackTextDetInputs   | -             | -                   | Pack the inputs required by text detection.    |
+| PackTextRecogInputs | -             | -                   | Pack the inputs required by text recognition.  |
+| PackKIEInputs       | -             | -                   | Pack the inputs required by KIE.               |
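+
+A packing transform is usually the last step of a pipeline. For instance, a text detection pipeline commonly ends with something like the following, where `meta_keys` selects which of the conventional fields listed earlier are kept as meta-information (the exact key set is model-dependent):
+
+```python
+train_pipeline = [
+    # ... loading and augmentation transforms ...
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')),
+]
+```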
+
+## Cross Project Data Adapters
+
+The cross-project data adapters bridge the data formats between MMOCR and other OpenMMLab libraries such as [MMDetection](https://github.com/open-mmlab/mmdetection), making it possible to call models implemented in other OpenMMLab projects. Currently, MMOCR has implemented [`MMDet2MMOCR`](mmocr.datasets.transforms.MMDet2MMOCR) and [`MMOCR2MMDet`](mmocr.datasets.transforms.MMOCR2MMDet), allowing data to be converted between MMDetection and MMOCR formats; with these adapters, users can easily train any detectors supported by MMDetection in MMOCR. For example, we provide a [tutorial](#todo) to show how to train Mask R-CNN as a text detector in MMOCR.
+
+| Transform Name | Required Keys                          | Modified/Added Keys          | Description                                |
+| -------------- | -------------------------------------- | ---------------------------- | ------------------------------------------ |
+| MMDet2MMOCR    | `gt_masks` `gt_ignore_flags`           | `gt_polygons` `gt_ignored`   | Convert the fields used in MMDet to MMOCR. |
+| MMOCR2MMDet    | `img_shape` `gt_polygons` `gt_ignored` | `gt_masks` `gt_ignore_flags` | Convert the fields used in MMOCR to MMDet. |
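+
+As a rough sketch of how the adapters fit into a pipeline (the transforms and parameters around the adapter are illustrative), MMOCR-style annotations can be converted to MMDetection's format right before being packed with MMDetection's `PackDetInputs`:
+
+```python
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    # produces MMOCR-style fields: gt_polygons, gt_ignored, ...
+    dict(
+        type='LoadOCRAnnotations',
+        with_bbox=True,
+        with_polygon=True,
+        with_label=True),
+    # convert them to MMDet-style fields: gt_masks, gt_ignore_flags
+    dict(type='MMOCR2MMDet', poly2mask=True),
+    # pack with MMDetection's formatting transform
+    dict(
+        type='PackDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')),
+]
+```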
+
+## Wrappers
+
+To facilitate the use of popular third-party CV libraries in MMOCR, we provide wrappers in `wrappers.py` to unify the data format between MMOCR and other third-party libraries. Users can directly configure the data transforms provided by these libraries in the configuration file of MMOCR. The supported wrappers are as follows:
+
+| Transform Name     | Required Keys | Modified/Added Keys | Description |
+| ------------------ | ------------- | ------------------- | ----------- |
+| ImgAugWrapper      | `img` `gt_polygons` (optional for text recognition) `gt_bboxes` (optional for text recognition) `gt_bboxes_labels` (optional for text recognition) `gt_ignored` (optional for text recognition) `gt_texts` (optional) | `img` `gt_polygons` (optional for text recognition) `gt_bboxes` (optional for text recognition) `gt_bboxes_labels` (optional for text recognition) `gt_ignored` (optional for text recognition) `img_shape` (optional) `gt_texts` (optional) | [ImgAug](https://github.com/aleju/imgaug) wrapper, which bridges the data format and configuration between ImgAug and MMOCR, allowing users to configure the data augmentation methods supported by ImgAug in MMOCR. |
+| TorchVisionWrapper | `img`         | `img` `img_shape`   | [TorchVision](https://github.com/pytorch/vision) wrapper, which bridges the data format and configuration between TorchVision and MMOCR, allowing users to configure the data transforms supported by `torchvision.transforms` in MMOCR. |
+
+### `ImgAugWrapper` Example
+
+For example, in the original ImgAug, we can define a `Sequential` type data augmentation pipeline as follows to perform random flipping, random rotation and random scaling on the image:
+
+```python
+import imgaug.augmenters as iaa
+
+aug = iaa.Sequential([
+    iaa.Fliplr(0.5),               # horizontally flip 50% of all images
+    iaa.Affine(rotate=(-10, 10)),  # rotate by -10 to +10 degrees
+    iaa.Resize((0.5, 3.0)),        # scale images to 50-300% of their size
+])
+```
+
+In MMOCR, we can directly configure the above data augmentation pipeline in `train_pipeline` as follows:
+
+```python
+dict(
+    type='ImgAugWrapper',
+    args=[
+        ['Fliplr', 0.5],
+        dict(cls='Affine', rotate=[-10, 10]),
+        ['Resize', [0.5, 3.0]],
+    ]
+)
+```
+
+Specifically, the `args` parameter accepts a list, and each element in the list can be a list or a dictionary. If it is a list, the first element of the list is the class name in `imgaug.augmenters`, and the following elements are the initialization parameters of the class; if it is a dictionary, the `cls` key corresponds to the class name in `imgaug.augmenters`, and the other key-value pairs correspond to the initialization parameters of the class.
+
+### `TorchVisionWrapper` Example
+
+For example, in the original TorchVision, we can define a `Compose` type data transformation pipeline as follows to perform color jittering on the image:
+
+```python
+import torchvision.transforms as transforms
+
+aug = transforms.Compose([
+    transforms.ColorJitter(
+        brightness=32.0 / 255,  # brightness jittering range
+        saturation=0.5)         # saturation jittering range
+])
+```
+
+In MMOCR, we can directly configure the above data transformation pipeline in `train_pipeline` as follows:
+
+```python
+dict(
+    type='TorchVisionWrapper',
+    op='ColorJitter',
+    brightness=32.0 / 255,
+    saturation=0.5
+)
+```
+
+Specifically, the `op` parameter is the class name in `torchvision.transforms`, and the following parameters correspond to the initialization parameters of the class.
diff --git a/mmocr-dev-1.x/docs/en/basic_concepts/visualizers.md b/mmocr-dev-1.x/docs/en/basic_concepts/visualizers.md
new file mode 100644
index 0000000000000000000000000000000000000000..bf620e1b7f531a8638242bbea7879f0d4430536f
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/basic_concepts/visualizers.md
@@ -0,0 +1,3 @@
+# Visualizers\[coming soon\]
+
+Coming Soon!
diff --git a/mmocr-dev-1.x/docs/en/conf.py b/mmocr-dev-1.x/docs/en/conf.py
new file mode 100644
index 0000000000000000000000000000000000000000..b406fa6debf1ab5e0a98d0f0d51eef1a8461830e
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/conf.py
@@ -0,0 +1,176 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+
+import os
+import subprocess
+import sys
+
+import pytorch_sphinx_theme
+
+sys.path.insert(0, os.path.abspath('../../'))
+
+# -- Project information -----------------------------------------------------
+
+project = 'MMOCR'
+copyright = '2020-2030, OpenMMLab'
+author = 'OpenMMLab'
+
+# The full version, including alpha/beta/rc tags
+version_file = '../../mmocr/version.py'
+with open(version_file) as f:
+ exec(compile(f.read(), version_file, 'exec'))
+__version__ = locals()['__version__']
+release = __version__
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+ 'sphinx.ext.autodoc',
+ 'sphinx.ext.napoleon',
+ 'sphinx.ext.viewcode',
+ 'sphinx_markdown_tables',
+ 'sphinx_copybutton',
+ 'myst_parser',
+ 'sphinx.ext.intersphinx',
+ 'sphinx.ext.autodoc.typehints',
+ 'sphinx.ext.autosummary',
+ 'sphinx.ext.autosectionlabel',
+ 'sphinx_tabs.tabs',
+]
+autodoc_typehints = 'description'
+autodoc_mock_imports = ['mmcv._ext']
+autosummary_generate = True # Turn on sphinx.ext.autosummary
+
+# Ignore >>> when copying code
+copybutton_prompt_text = r'>>> |\.\.\. '
+copybutton_prompt_is_regexp = True
+
+myst_enable_extensions = ['colon_fence']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+#
+source_suffix = {
+ '.rst': 'restructuredtext',
+ '.md': 'markdown',
+}
+
+# The master toctree document.
+master_doc = 'index'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+# html_theme = 'sphinx_rtd_theme'
+html_theme = 'pytorch_sphinx_theme'
+html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()]
+html_theme_options = {
+ 'logo_url':
+ 'https://mmocr.readthedocs.io/en/dev-1.x/',
+ 'menu': [
+ {
+ 'name':
+ 'Tutorial',
+ 'url':
+ 'https://colab.research.google.com/github/open-mmlab/mmocr/blob/'
+ 'dev-1.x/demo/tutorial.ipynb'
+ },
+ {
+ 'name': 'GitHub',
+ 'url': 'https://github.com/open-mmlab/mmocr'
+ },
+ {
+ 'name':
+ 'Upstream',
+ 'children': [
+ {
+ 'name':
+ 'MMEngine',
+ 'url':
+ 'https://github.com/open-mmlab/mmengine',
+ 'description':
+ 'Foundational library for training deep '
+ 'learning models'
+ },
+ {
+ 'name': 'MMCV',
+ 'url': 'https://github.com/open-mmlab/mmcv',
+ 'description': 'Foundational library for computer vision'
+ },
+ {
+ 'name': 'MMDetection',
+ 'url': 'https://github.com/open-mmlab/mmdetection',
+ 'description': 'Object detection toolbox and benchmark'
+ },
+ ]
+ },
+ ],
+ # Specify the language of shared menu
+ 'menu_lang':
+ 'en'
+}
+
+language = 'en'
+
+master_doc = 'index'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+html_css_files = [
+ 'https://cdn.datatables.net/1.13.2/css/dataTables.bootstrap5.min.css',
+ 'css/readthedocs.css'
+]
+html_js_files = [
+ 'https://cdn.datatables.net/1.13.2/js/jquery.dataTables.min.js',
+ 'https://cdn.datatables.net/1.13.2/js/dataTables.bootstrap5.min.js',
+ 'js/collapsed.js',
+ 'js/table.js',
+]
+
+myst_heading_anchors = 4
+
+intersphinx_mapping = {
+ 'python': ('https://docs.python.org/3', None),
+ 'numpy': ('https://numpy.org/doc/stable', None),
+ 'torch': ('https://pytorch.org/docs/stable/', None),
+ 'mmcv': ('https://mmcv.readthedocs.io/en/2.x/', None),
+ 'mmengine': ('https://mmengine.readthedocs.io/en/latest/', None),
+ 'mmdetection': ('https://mmdetection.readthedocs.io/en/dev-3.x/', None),
+}
+
+
+def builder_inited_handler(app):
+ subprocess.run(['./merge_docs.sh'])
+ subprocess.run(['./stats.py'])
+ subprocess.run(['./dataset_zoo.py'])
+ subprocess.run(['./project_zoo.py'])
+
+
+def setup(app):
+ app.connect('builder-inited', builder_inited_handler)
diff --git a/mmocr-dev-1.x/docs/en/contact.md b/mmocr-dev-1.x/docs/en/contact.md
new file mode 100644
index 0000000000000000000000000000000000000000..c8a4321e3b1dd21457c82f8823a0a5c5d71e256e
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/contact.md
@@ -0,0 +1,18 @@
+## Welcome to the OpenMMLab community
+
+Scan the QR code below to follow the OpenMMLab team's [**Zhihu Official Account**](https://www.zhihu.com/people/openmmlab) and join the OpenMMLab team's [**QQ Group**](https://jq.qq.com/?_wv=1027&k=aCvMxdr3), add us on WeChat to join the official communication WeChat group, or join our [**Slack**](https://join.slack.com/t/mmocrworkspace/shared_invite/zt-1ifqhfla8-yKnLO_aKhVA2h71OrK8GZw).
+
+In the OpenMMLab community, we will:
+
+- Share the latest core technologies of AI frameworks
+- Explain the source code of common PyTorch modules
+- Post news about OpenMMLab releases
+- Introduce cutting-edge algorithms developed by OpenMMLab
+- Provide more efficient answers and feedback
+- Offer a platform for communication with developers from all walks of life
+
+The OpenMMLab community looks forward to your participation!
diff --git a/mmocr-dev-1.x/docs/en/dataset_zoo.py b/mmocr-dev-1.x/docs/en/dataset_zoo.py
new file mode 100755
index 0000000000000000000000000000000000000000..733dc5cdaff09922f6a52c3405602dff8e28d011
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/dataset_zoo.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python
+import os
+import os.path as osp
+import re
+
+import yaml
+
+dataset_zoo_path = '../../dataset_zoo'
+datasets = os.listdir(dataset_zoo_path)
+datasets.sort()
+
+table = '# Overview\n'
+table += '## Supported Datasets\n'
+table += '| Dataset Name | Text Detection | Text Recognition | Text Spotting | KIE |\n' \
+ '|--------------|----------------|------------------|---------------|-----|\n' # noqa: E501
+details = '## Dataset Details\n'
+
+for dataset in datasets:
+    meta = yaml.safe_load(
+        open(osp.join(dataset_zoo_path, dataset, 'metafile.yml')))
+    dataset_name = meta['Name']
+    detail_link = re.sub('[^A-Za-z0-9- ]', '',
+                         dataset_name).replace(' ', '-').lower()
+    paper = meta['Paper']
+    data = meta['Data']
+
+    table += '| [{}](#{}) | {} | {} | {} | {} |\n'.format(
+        dataset,
+        detail_link,
+        '✓' if 'textdet' in data['Tasks'] else '',
+        '✓' if 'textrecog' in data['Tasks'] else '',
+        '✓' if 'textspotting' in data['Tasks'] else '',
+        '✓' if 'kie' in data['Tasks'] else '',
+    )
+
+    details += '### {}\n'.format(dataset_name)
+    details += "> \"{}\", *{}*, {}. [PDF]({})\n\n".format(
+        paper['Title'], paper['Venue'], paper['Year'], paper['URL'])
+
+    # Basic Info
+    details += 'A. Basic Info\n'
+    details += ' - Official Website: [{}]({})\n'.format(
+        dataset, data['Website'])
+    details += ' - Year: {}\n'.format(paper['Year'])
+    details += ' - Language: {}\n'.format(data['Language'])
+    details += ' - Scene: {}\n'.format(data['Scene'])
+    details += ' - Annotation Granularity: {}\n'.format(data['Granularity'])
+    details += ' - Supported Tasks: {}\n'.format(data['Tasks'])
+    details += ' - License: [{}]({})\n'.format(data['License']['Type'],
+                                               data['License']['Link'])
+
+    # Format
+    details += 'B. Annotation Format\n\n'
+    sample_path = osp.join(dataset_zoo_path, dataset, 'sample_anno.md')
+    if osp.exists(sample_path):
+        with open(sample_path, 'r') as f:
+            samples = f.readlines()
+            samples = ''.join(samples)
+            details += samples
+    details += '\n\n'
+
+    # Reference
+    details += 'C. Reference\n'
+    details += '```bibtex\n{}\n```\n'.format(paper['BibTeX'])
+
+datasetzoo = table + details
+
+with open('user_guides/data_prepare/datasetzoo.md', 'w') as f:
+    f.write(datasetzoo)
diff --git a/mmocr-dev-1.x/docs/en/docutils.conf b/mmocr-dev-1.x/docs/en/docutils.conf
new file mode 100644
index 0000000000000000000000000000000000000000..0c00c84688701117f231fd0c8ec295fb747b7d8f
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/docutils.conf
@@ -0,0 +1,2 @@
+[html writers]
+table_style: colwidths-auto
diff --git a/mmocr-dev-1.x/docs/en/get_started/faq.md b/mmocr-dev-1.x/docs/en/get_started/faq.md
new file mode 100644
index 0000000000000000000000000000000000000000..0c72f3238db0a079edb0236ec54267e759d8da6a
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/get_started/faq.md
@@ -0,0 +1,83 @@
+# FAQ
+
+## General
+
+**Q1** I'm getting a warning like `unexpected key in source state_dict: fc.weight, fc.bias`. Is there something wrong?
+
+**A** It's not an error. It occurs because the backbone network is pretrained on image classification tasks, where the last fc layer is required to generate the classification output. However, the fc layer is no longer needed when the backbone network is used to extract features in downstream tasks, and therefore these weights can be safely skipped when loading the checkpoint.
+
+**Q2** MMOCR terminates with an error: `shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry`. How could I fix it?
+
+**A** This error occurs because some invalid polygons (e.g., polygons with self-intersections) exist in the dataset, or are generated by some non-rigorous data transforms. These polygons can be fixed by adding a `FixInvalidPolygon` transform after the transform that is likely to introduce invalid polygons. For example, a common practice is to append it after `LoadOCRAnnotations` in both the train and test pipelines. The resulting pipeline should look like:
+
+```python
+train_pipeline = [
+    ...
+    dict(
+        type='LoadOCRAnnotations',
+        with_polygon=True,
+        with_bbox=True,
+        with_label=True,
+    ),
+    dict(type='FixInvalidPolygon', min_poly_points=4),
+    ...
+]
+```
+
+In practice, we find that Totaltext contains some invalid polygons and using `FixInvalidPolygon` is a must. [Here](https://github.com/open-mmlab/mmocr/blob/27b6a68586b9a040678fe083bcf60662ae1b9261/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_totaltext.py) is an example config.
+
+**Q3** I'm getting `libpng warning: iCCP: known incorrect sRGB profile` when loading images with the `cv2` backend.
+
+**A** This is a warning from `libpng` and it is safe to ignore. It is caused by the ICC profile embedded in the image. You can switch to the `pillow` backend to avoid it:
+
+```python
+train_pipeline = [
+    dict(
+        type='LoadImageFromFile',
+        imdecode_backend='pillow'),
+    ...
+]
+```
+
+## Text Recognition
+
+**Q1** What are the steps to train text recognition models with my own dictionary?
+
+**A** In MMOCR 1.0, you only need to modify the config and point `Dictionary` to your custom dict file. For example, if you want to train the [SAR model](https://github.com/open-mmlab/mmocr/blob/75c06d34bbc01d3d11dfd7afc098b6cdeee82579/configs/textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real.py) with your own dictionary placed at `/my/dict.txt`, you can modify the `dictionary.dict_file` field in the [base config](https://github.com/open-mmlab/mmocr/blob/75c06d34bbc01d3d11dfd7afc098b6cdeee82579/configs/textrecog/sar/_base_sar_resnet31_parallel-decoder.py#L1) to:
+
+```python
+dictionary = dict(
+    type='Dictionary',
+    dict_file='/my/dict.txt',
+    with_start=True,
+    with_end=True,
+    same_start_end=True,
+    with_padding=True,
+    with_unknown=True)
+```
+
+Now you are good to go. You can also find more information in [Dictionary API](https://mmocr.readthedocs.io/en/dev-1.x/api/generated/mmocr.models.common.Dictionary.html#mmocr.models.common.Dictionary).
+
+**Q2** How to properly visualize non-English characters?
+
+**A** You can customize `font_families` or `font_properties` in visualizer. For example, to visualize Korean:
+
+`configs/textrecog/_base_/default_runtime.py`:
+
+```python
+visualizer = dict(
+    type='TextRecogLocalVisualizer',
+    name='visualizer',
+    font_families='NanumGothic',  # new feature
+    vis_backends=vis_backends)
+```
+
+It's also fine to pass the font path to visualizer:
+
+```python
+visualizer = dict(
+    type='TextRecogLocalVisualizer',
+    name='visualizer',
+    font_properties='path/to/font_file',
+    vis_backends=vis_backends)
+```
diff --git a/mmocr-dev-1.x/docs/en/get_started/install.md b/mmocr-dev-1.x/docs/en/get_started/install.md
new file mode 100644
index 0000000000000000000000000000000000000000..e892ba37272903e853a60cf7d7a2f0c8ee0cdc05
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/get_started/install.md
@@ -0,0 +1,244 @@
+# Installation
+
+## Prerequisites
+
+- Linux | Windows | macOS
+- Python 3.7
+- PyTorch 1.6 or higher
+- torchvision 0.7.0
+- CUDA 10.1
+- NCCL 2
+- GCC 5.4.0 or higher
+
+## Environment Setup
+
+```{note}
+If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation-steps). Otherwise, you can follow these steps for the preparation.
+```
+
+**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
+
+**Step 1.** Create a conda environment and activate it.
+
+```shell
+conda create --name openmmlab python=3.8 -y
+conda activate openmmlab
+```
+
+**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
+
+````{tabs}
+
+```{code-tab} shell GPU Platform
+conda install pytorch torchvision -c pytorch
+```
+
+```{code-tab} shell CPU Platform
+conda install pytorch torchvision cpuonly -c pytorch
+```
+
+````
+
+## Installation Steps
+
+We recommend that users follow our best practices to install MMOCR. However, the whole process is highly customizable. See [Customize Installation](#customize-installation) section for more information.
+
+### Best Practices
+
+**Step 0.** Install [MMEngine](https://github.com/open-mmlab/mmengine), [MMCV](https://github.com/open-mmlab/mmcv) and [MMDetection](https://github.com/open-mmlab/mmdetection) using [MIM](https://github.com/open-mmlab/mim).
+
+```shell
+pip install -U openmim
+mim install mmengine
+mim install mmcv
+mim install mmdet
+```
+
+**Step 1.** Install MMOCR.
+
+If you wish to run and develop MMOCR directly, install it from **source** (recommended).
+
+If you use MMOCR as a dependency or third-party package, install it via **MIM**.
+
+`````{tabs}
+
+````{group-tab} Install from Source
+
+```shell
+
+git clone https://github.com/open-mmlab/mmocr.git
+cd mmocr
+pip install -v -e .
+# "-v" increases pip's verbosity.
+# "-e" means installing the project in editable mode,
+# That is, any local modifications on the code will take effect immediately.
+
+```
+
+````
+
+````{group-tab} Install via MIM
+
+```shell
+
+mim install mmocr
+
+```
+
+````
+
+`````
+
+**Step 2. (Optional)** If you wish to use any transform involving `albumentations` (For example, `Albu` in ABINet's pipeline), or any dependency for building documentation or running unit tests, please install the dependency using the following command:
+
+`````{tabs}
+
+````{group-tab} Install from Source
+
+```shell
+# install albu
+pip install -r requirements/albu.txt
+# install the dependencies for building documentation and running unit tests
+pip install -r requirements.txt
+```
+
+````
+
+````{group-tab} Install via MIM
+
+```shell
+pip install 'albumentations>=1.1.0' --no-binary qudida,albumentations
+```
+
+````
+
+`````
+
+```{note}
+
+We recommend checking the environment after installing `albumentations` to
+ensure that `opencv-python` and `opencv-python-headless` are not installed together, otherwise it might cause unexpected issues. If that's unfortunately the case, please uninstall `opencv-python-headless` to make sure MMOCR's visualization utilities can work.
+
+Refer
+to [albumentations's official documentation](https://albumentations.ai/docs/getting_started/installation/#note-on-opencv-dependencies) for more details.
+
+```
+
+### Verify the installation
+
+You may verify the installation via this inference demo.
+
+`````{tabs}
+
+````{tab} Python
+
+Run the following code in a Python interpreter:
+
+```python
+>>> from mmocr.apis import MMOCRInferencer
+>>> ocr = MMOCRInferencer(det='DBNet', rec='CRNN')
+>>> ocr('demo/demo_text_ocr.jpg', show=True, print_result=True)
+```
+````
+
+````{tab} Shell
+
+If you installed MMOCR from source, you can run the following in MMOCR's root directory:
+
+```shell
+python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec CRNN --show --print-result
+```
+````
+
+`````
+
+You should be able to see a pop-up image and the inference result printed out in the console upon successful verification.
+
+```bash
+# Inference result
+{'predictions': [{'rec_texts': ['cbanks', 'docecea', 'grouf', 'pwate', 'chobnsonsg', 'soxee', 'oeioh', 'c', 'sones', 'lbrandec', 'sretalg', '11', 'to8', 'round', 'sale', 'year',
+'ally', 'sie', 'sall'], 'rec_scores': [...], 'det_polygons': [...], 'det_scores':
+[...]}]}
+```
+
+```{note}
+If you are running MMOCR on a server without GUI or via SSH tunnel with X11 forwarding disabled, you may not see the pop-up window.
+```
+
+## Customize Installation
+
+### CUDA versions
+
+When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
+
+- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
+- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
+
+Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.
+
+```{note}
+Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch. i.e., the specified version of cudatoolkit in `conda install` command.
+```
+
+### Install MMCV without MIM
+
+MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
+
+To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/latest/get_started/installation.html). This requires manually specifying a find-url based on PyTorch version and its CUDA version.
+
+For example, the following command installs `mmcv` built for PyTorch 1.10.x and CUDA 11.3:
+
+```shell
+pip install 'mmcv>=2.0.0rc1' -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
+```
+
+### Install on CPU-only platforms
+
+MMOCR can be built for a CPU-only environment. In CPU mode, you can train (requires MMCV >= 1.4.4), test, or run inference with a model.
+
+However, some functionalities are missing in this mode:
+
+- Deformable Convolution
+- Modulated Deformable Convolution
+- ROI pooling
+- SyncBatchNorm
+
+If you try to train/test/run inference with a model containing the above ops, an error will be raised.
+The following table lists the affected algorithms.
+
+| Operator | Model |
+| :-----------------------------------------------------: | :-----------------------------------------------------: |
+| Deformable Convolution/Modulated Deformable Convolution | DBNet (r50dcnv2), DBNet++ (r50dcnv2), FCENet (r50dcnv2) |
+| SyncBatchNorm | PANet, PSENet |
+
+### Using MMOCR with Docker
+
+We provide a [Dockerfile](https://github.com/open-mmlab/mmocr/blob/master/docker/Dockerfile) to build an image.
+
+```shell
+# build an image with PyTorch 1.6, CUDA 10.1
+docker build -t mmocr docker/
+```
+
+Run it with
+
+```shell
+docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmocr/data mmocr
+```
+
+## Dependency on MMEngine, MMCV & MMDetection
+
+MMOCR has different version requirements on MMEngine, MMCV and MMDetection at each release to guarantee the implementation correctness. Please refer to the table below and ensure the package versions fit the requirement.
+
+| MMOCR | MMEngine | MMCV | MMDetection |
+| -------------- | --------------------------- | -------------------------- | --------------------------- |
+| dev-1.x | 0.7.1 \<= mmengine \< 1.0.0 | 2.0.0rc4 \<= mmcv \< 2.1.0 | 3.0.0rc5 \<= mmdet \< 3.1.0 |
+| 1.0.0 | 0.7.1 \<= mmengine \< 1.0.0 | 2.0.0rc4 \<= mmcv \< 2.1.0 | 3.0.0rc5 \<= mmdet \< 3.1.0 |
+| 1.0.0rc6 | 0.6.0 \<= mmengine \< 1.0.0 | 2.0.0rc4 \<= mmcv \< 2.1.0 | 3.0.0rc5 \<= mmdet \< 3.1.0 |
+| 1.0.0rc\[4-5\] | 0.1.0 \<= mmengine \< 1.0.0 | 2.0.0rc1 \<= mmcv \< 2.1.0 | 3.0.0rc0 \<= mmdet \< 3.1.0 |
+| 1.0.0rc\[0-3\] | 0.0.0 \<= mmengine \< 0.2.0 | 2.0.0rc1 \<= mmcv \< 2.1.0 | 3.0.0rc0 \<= mmdet \< 3.1.0 |
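+
+A quick way to check whether the installed packages satisfy these constraints is to print their versions from Python, for example:
+
+```python
+import mmcv
+import mmdet
+import mmengine
+
+import mmocr
+
+print('mmengine:', mmengine.__version__)
+print('mmcv:', mmcv.__version__)
+print('mmdet:', mmdet.__version__)
+print('mmocr:', mmocr.__version__)
+```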
diff --git a/mmocr-dev-1.x/docs/en/get_started/overview.md b/mmocr-dev-1.x/docs/en/get_started/overview.md
new file mode 100644
index 0000000000000000000000000000000000000000..7bbb67b142750fb1d148b44c0f9d79d605a061ad
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/get_started/overview.md
@@ -0,0 +1,20 @@
+# Overview
+
+MMOCR is an open source toolkit based on [PyTorch](https://pytorch.org/) and [MMDetection](https://github.com/open-mmlab/mmdetection), supporting numerous OCR-related models, including text detection, text recognition, and key information extraction. In addition, it supports widely-used academic datasets and provides many useful tools, assisting users in exploring various aspects of models and datasets and implementing high-quality algorithms. Generally, it has the following features.
+
+- **One-stop, Multi-model**: MMOCR supports various OCR-related tasks and implements the latest models for text detection, recognition, and key information extraction.
+- **Modular Design**: MMOCR's modular design allows users to define and reuse modules in the model on demand.
+- **Various Useful Tools**: MMOCR provides a number of analysis tools, including visualizers, validation scripts, evaluators, etc., to help users troubleshoot, finetune or compare models.
+- **Powered by [OpenMMLab](https://openmmlab.com/)**: Like other algorithm libraries in OpenMMLab family, MMOCR follows OpenMMLab's rigorous development guidelines and interface conventions, significantly reducing the learning cost of users familiar with other projects in OpenMMLab family. In addition, benefiting from the unified interfaces among OpenMMLab, you can easily call the models implemented in other OpenMMLab projects (e.g. MMDetection) in MMOCR, facilitating cross-domain research and real-world applications.
+
+Together with the release of OpenMMLab 2.0, MMOCR now also comes to its 1.0.0 version, which has made significant BC-breaking changes, resulting in less code redundancy, higher code efficiency and an overall more systematic and consistent design.
+
+Considering that there are some backward incompatible changes in this version compared to 0.x, we have prepared a detailed [migration guide](../migration/overview.md). It lists all the changes made in the new version and the steps required to migrate. We hope this guide can help users familiar with the old framework to complete the upgrade as quickly as possible. Though this may take some time, we believe that the new features brought by MMOCR and the OpenMMLab ecosystem will make it all worthwhile. ๐
+
+Next, please read the section according to your actual needs.
+
+- We recommend that beginners go through [Quick Run](quick_run.md) to get familiar with MMOCR and master the usage of MMOCR by reading the examples in **User Guides**.
+- Intermediate and advanced developers are suggested to learn the background, conventions, and recommended implementations of each component from **Basic Concepts**.
+- Read our [FAQ](faq.md) to find answers to frequently asked questions.
+- If you can't find the answers you need in the documentation, feel free to raise an [issue](https://github.com/open-mmlab/mmocr/issues).
+- Everyone is welcome to be a contributor! Read the [contribution guide](../notes/contribution_guide.md) to learn how to contribute to MMOCR!
diff --git a/mmocr-dev-1.x/docs/en/get_started/quick_run.md b/mmocr-dev-1.x/docs/en/get_started/quick_run.md
new file mode 100644
index 0000000000000000000000000000000000000000..5c5f01a4491fbbc64e2c4bbc63bf69b1d7f949d4
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/get_started/quick_run.md
@@ -0,0 +1,203 @@
+# Quick Run
+
+This chapter will take you through the basic functions of MMOCR, and we assume you have [installed MMOCR from source](install.md#best-practices). You may also check out the [tutorial notebook](https://colab.research.google.com/github/open-mmlab/mmocr/blob/dev-1.x/demo/tutorial.ipynb) for how to perform inference, training and testing interactively.
+
+## Inference
+
+Run the following in MMOCR's root directory:
+
+```shell
+python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec CRNN --show --print-result
+```
+
+You should be able to see a pop-up image and the inference result printed out in the console.
+
+```bash
+# Inference result
+{'predictions': [{'rec_texts': ['cbanks', 'docecea', 'grouf', 'pwate', 'chobnsonsg', 'soxee', 'oeioh', 'c', 'sones', 'lbrandec', 'sretalg', '11', 'to8', 'round', 'sale', 'year',
+'ally', 'sie', 'sall'], 'rec_scores': [...], 'det_polygons': [...], 'det_scores':
+[...]}]}
+```
+
+```{note}
+If you are running MMOCR on a server without GUI or via SSH tunnel with X11 forwarding disabled, you may not see the pop-up window.
+```
+
+A detailed description of MMOCR's inference interface can be found [here](../user_guides/inference.md).
+
+In addition to using our well-provided pre-trained models, you can also train models on your own datasets. In the next section, we will take you through the basic functions of MMOCR by training DBNet on the mini [ICDAR 2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) dataset as an example.
+
+## Prepare a Dataset
+
+Since the variety of OCR dataset formats is not conducive to either switching between or jointly training on multiple datasets, MMOCR proposes a uniform [data format](../user_guides/dataset_prepare.md) and provides a [dataset preparer](../user_guides/data_prepare/dataset_preparer.md) for commonly used OCR datasets. Usually, to use those datasets in MMOCR, you just need to follow the steps to get them ready for use.
+
+```{note}
+For this demo, however, we will save time by using a small, pre-packaged dataset.
+```
+
+Here, we have prepared a lite version of ICDAR 2015 dataset for demonstration purposes. Download our pre-prepared [zip](https://download.openmmlab.com/mmocr/data/icdar2015/mini_icdar2015.tar.gz) and extract it to the `data/` directory under mmocr to get our prepared image and annotation file.
+
+```Bash
+wget https://download.openmmlab.com/mmocr/data/icdar2015/mini_icdar2015.tar.gz
+mkdir -p data/
+tar xzvf mini_icdar2015.tar.gz -C data/
+```
+
+## Modify the Config
+
+Once the dataset is prepared, we will then specify the location of the training set and the training parameters by modifying the config file.
+
+In this example, we will train DBNet with ResNet-18 as its backbone. Since MMOCR already has a config file for the full ICDAR 2015 dataset (`configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py`), we just need to make some modifications on top of it.
+
+We first need to modify the path to the dataset. In this config, most of the key config files are imported via `_base_`, such as the dataset configuration from `configs/textdet/_base_/datasets/icdar2015.py`. Open that file and replace the path assigned to `icdar2015_textdet_data_root` in its first line with:
+
+```Python
+icdar2015_textdet_data_root = 'data/mini_icdar2015'
+```
+
+Also, because of the reduced dataset size, we have to reduce the number of training epochs to 400 accordingly, shorten the validation interval as well as the checkpoint saving interval to 10 epochs, and drop the learning rate decay strategy. The following lines of configuration can be directly put into `configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py` to take effect.
+
+```Python
+# Save checkpoints every 10 epochs, and only keep the latest checkpoint
+default_hooks = dict(
+    checkpoint=dict(
+        type='CheckpointHook',
+        interval=10,
+        max_keep_ckpts=1,
+    ))
+# Set the maximum number of epochs to 400, and validate the model every 10 epochs
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=400, val_interval=10)
+# Fix learning rate as a constant
+param_scheduler = [
+    dict(type='ConstantLR', factor=1.0),
+]
+```
+
+Here, we have rewritten the corresponding parameters in the base configuration directly through the inheritance ({external+mmengine:doc}`MMEngine: Config `) mechanism of the config. The original fields are distributed in `configs/textdet/_base_/schedules/schedule_sgd_1200e.py` and `configs/textdet/_base_/default_runtime.py`.
+
+```{note}
+For a more detailed description of config, please refer to [here](../user_guides/config.md).
+```
+
+## Browse the Dataset
+
+Before we start the training, we can also visualize the image processed by training-time [data transforms](../basic_concepts/transforms.md). It's quite simple: pass the config file we need to visualize into the [browse_dataset.py](/tools/analysis_tools/browse_dataset.py) script.
+
+```Bash
+python tools/analysis_tools/browse_dataset.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
+```
+
+The transformed images and annotations will be displayed one by one in a pop-up window.
+
+```{note}
+For details on the parameters and usage of this script, please refer to [here](../user_guides/useful_tools.md).
+```
+
+```{tip}
+In addition to satisfying our curiosity, visualization can also help us check the parts that may affect the model's performance before training, such as problems in configs, datasets and data transforms.
+```
+
+## Training
+
+Start the training by running the following command:
+
+```Bash
+python tools/train.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py
+```
+
+Depending on the system environment, MMOCR will automatically use the best device for training. If a GPU is available, a single GPU training will be started by default. When you start to see the output of the losses, you have successfully started the training.
+
+```Bash
+2022/08/22 18:42:22 - mmengine - INFO - Epoch(train) [1][5/7] lr: 7.0000e-03 memory: 7730 data_time: 0.4496 loss_prob: 14.6061 loss_thr: 2.2904 loss_db: 0.9879 loss: 17.8843 time: 1.8666
+2022/08/22 18:42:24 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_icdar2015
+2022/08/22 18:42:28 - mmengine - INFO - Epoch(train) [2][5/7] lr: 7.0000e-03 memory: 6695 data_time: 0.2052 loss_prob: 6.7840 loss_thr: 1.4114 loss_db: 0.9855 loss: 9.1809 time: 0.7506
+2022/08/22 18:42:29 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_icdar2015
+2022/08/22 18:42:33 - mmengine - INFO - Epoch(train) [3][5/7] lr: 7.0000e-03 memory: 6690 data_time: 0.2101 loss_prob: 3.0700 loss_thr: 1.1800 loss_db: 0.9967 loss: 5.2468 time: 0.6244
+2022/08/22 18:42:33 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_icdar2015
+```
+
+Without extra configurations, model weights will be saved to `work_dirs/dbnet_resnet18_fpnc_1200e_icdar2015/`, while the logs will be stored in `work_dirs/dbnet_resnet18_fpnc_1200e_icdar2015/TIMESTAMP/`. Next, we just need to wait with some patience for training to finish.
+
+```{note}
+For advanced usage of training, such as CPU training, multi-GPU training, and cluster training, please refer to [Training and Testing](../user_guides/train_test.md).
+```
+
+## Testing
+
+After 400 epochs, we observe that DBNet performs best in the last epoch, with `hmean` reaching 0.6086 (you may see a different result):
+
+```Bash
+08/22 19:24:52 - mmengine - INFO - Epoch(val) [400][100/100] icdar/precision: 0.7285 icdar/recall: 0.5226 icdar/hmean: 0.6086
+```
+
+```{note}
+It may not have been trained to be optimal, but it is sufficient for a demo.
+```
+
+However, this value only reflects the performance of DBNet on the mini ICDAR 2015 dataset. For a comprehensive evaluation, we also need to see how it performs on out-of-distribution datasets. For example, `tests/data/det_toy_dataset` is a very small real dataset that we can use to verify the actual performance of DBNet.
+
+Before testing, we also need to make some changes to the location of the dataset. Open `configs/textdet/_base_/datasets/icdar2015.py` and change `data_root` of `icdar2015_textdet_test` to `tests/data/det_toy_dataset`:
+
+```Python
+# ...
+icdar2015_textdet_test = dict(
+    type='OCRDataset',
+    data_root='tests/data/det_toy_dataset',
+    # ...
+    )
+```
+
+Start testing:
+
+```Bash
+python tools/test.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py work_dirs/dbnet_resnet18_fpnc_1200e_icdar2015/epoch_400.pth
+```
+
+And get the outputs like:
+
+```Bash
+08/21 21:45:59 - mmengine - INFO - Epoch(test) [5/10] memory: 8562
+08/21 21:45:59 - mmengine - INFO - Epoch(test) [10/10] eta: 0:00:00 time: 0.4893 data_time: 0.0191 memory: 283
+08/21 21:45:59 - mmengine - INFO - Evaluating hmean-iou...
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.30, recall: 0.6190, precision: 0.4815, hmean: 0.5417
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.40, recall: 0.6190, precision: 0.5909, hmean: 0.6047
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.50, recall: 0.6190, precision: 0.6842, hmean: 0.6500
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.60, recall: 0.6190, precision: 0.7222, hmean: 0.6667
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.70, recall: 0.3810, precision: 0.8889, hmean: 0.5333
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.80, recall: 0.0000, precision: 0.0000, hmean: 0.0000
+08/21 21:45:59 - mmengine - INFO - prediction score threshold: 0.90, recall: 0.0000, precision: 0.0000, hmean: 0.0000
+08/21 21:45:59 - mmengine - INFO - Epoch(test) [10/10] icdar/precision: 0.7222 icdar/recall: 0.6190 icdar/hmean: 0.6667
+```
+
+The model achieves an hmean of 0.6667 on this dataset.
+
+```{note}
+For advanced usage of testing, such as CPU testing, multi-GPU testing, and cluster testing, please refer to [Training and Testing](../user_guides/train_test.md).
+```
+
+## Visualize the Outputs
+
+We can also visualize the prediction outputs of `test.py`. You can open a pop-up visualization window with the `--show` parameter, and specify the directory to which the prediction result images are exported with the `--show-dir` parameter.
+
+```Bash
+python tools/test.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py work_dirs/dbnet_resnet18_fpnc_1200e_icdar2015/epoch_400.pth --show-dir imgs/
+```
+
+The true labels and predicted values are displayed in a tiled fashion in the visualization results. The green boxes in the left panel indicate the true labels and the red boxes in the right panel indicate the predicted values.
+
+```{note}
+For a description of more visualization features, see [here](../user_guides/visualization.md).
+```
diff --git a/mmocr-dev-1.x/docs/en/index.rst b/mmocr-dev-1.x/docs/en/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..123b9b933e987b201f9265baac0f1a0225e38f8f
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/index.rst
@@ -0,0 +1,113 @@
+Welcome to MMOCR's documentation!
+=======================================
+
+You can switch between English and Chinese in the lower-left corner of the layout.
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Get Started
+
+ get_started/overview.md
+ get_started/install.md
+ get_started/quick_run.md
+ get_started/faq.md
+
+.. toctree::
+ :maxdepth: 2
+ :caption: User Guides
+
+ user_guides/inference.md
+ user_guides/config.md
+ user_guides/dataset_prepare.md
+ user_guides/train_test.md
+ user_guides/visualization.md
+ user_guides/useful_tools.md
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Basic Concepts
+
+ basic_concepts/structures.md
+ basic_concepts/transforms.md
+ basic_concepts/evaluation.md
+ basic_concepts/datasets.md
+ basic_concepts/overview.md
+ basic_concepts/data_flow.md
+ basic_concepts/models.md
+ basic_concepts/visualizers.md
+ basic_concepts/convention.md
+ basic_concepts/engine.md
+
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Dataset Zoo
+
+ user_guides/data_prepare/datasetzoo.md
+ user_guides/data_prepare/dataset_preparer.md
+ user_guides/data_prepare/det.md
+ user_guides/data_prepare/recog.md
+ user_guides/data_prepare/kie.md
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Model Zoo
+
+ modelzoo.md
+ projectzoo.md
+ backbones.md
+ textdet_models.md
+ textrecog_models.md
+ kie_models.md
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+
+ notes/branches.md
+ notes/contribution_guide.md
+ notes/changelog.md
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Migrating from MMOCR 0.x
+
+ migration/overview.md
+ migration/news.md
+ migration/branches.md
+ migration/code.md
+ migration/dataset.md
+ migration/model.md
+ migration/transforms.md
+
+.. toctree::
+ :maxdepth: 1
+ :caption: API Reference
+
+ mmocr.apis
+ mmocr.structures
+ mmocr.datasets
+ mmocr.transforms
+ mmocr.models
+ mmocr.evaluation
+ mmocr.visualization
+ mmocr.engine
+ mmocr.utils
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Contact Us
+
+ contact.md
+
+.. toctree::
+ :caption: Switch Language
+
+ switch_language.md
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`search`
diff --git a/mmocr-dev-1.x/docs/en/make.bat b/mmocr-dev-1.x/docs/en/make.bat
new file mode 100644
index 0000000000000000000000000000000000000000..8a3a0e25b49a52ade52c4f69ddeb0bc3d12527ff
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/make.bat
@@ -0,0 +1,36 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+ echo.
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+ echo.installed, then set the SPHINXBUILD environment variable to point
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
+ echo.may add the Sphinx directory to PATH.
+ echo.
+ echo.If you don't have Sphinx installed, grab it from
+ echo.http://sphinx-doc.org/
+ exit /b 1
+)
+
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/mmocr-dev-1.x/docs/en/merge_docs.sh b/mmocr-dev-1.x/docs/en/merge_docs.sh
new file mode 100755
index 0000000000000000000000000000000000000000..9835eab21fc3dc8a2de63332148dcd0b4f2fa6a2
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/merge_docs.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+# gather models
+sed -e '$a\\n' -s ../../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Key Information Extraction Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >kie_models.md
+sed -e '$a\\n' -s ../../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
+sed -e '$a\\n' -s ../../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
+sed -e '$a\\n' -s ../../configs/backbone/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# BackBones' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >backbones.md
diff --git a/mmocr-dev-1.x/docs/en/migration/branches.md b/mmocr-dev-1.x/docs/en/migration/branches.md
new file mode 100644
index 0000000000000000000000000000000000000000..d5b02ae2461f2e743d5323d9f1978c622740ef86
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/branches.md
@@ -0,0 +1,38 @@
+# Branch Migration
+
+At an earlier stage, MMOCR had three branches: `main`, `1.x`, and `dev-1.x`. Some of these branches have been renamed together with the official MMOCR 1.0.0 release, and here is the changelog.
+
+- `main` branch housed the code for MMOCR 0.x (e.g., v0.6.3). Now it has been renamed to `0.x`.
+- `1.x` contained the code for MMOCR 1.x (e.g., 1.0.0rc6). Now it is an alias of `main`, and will be removed in mid 2023.
+- `dev-1.x` was the development branch for MMOCR 1.x. Now it remains unchanged.
+
+For more information about the branches, check out [branches](../notes/branches.md).
+
+## Resolving Conflicts When Upgrading the `main` branch
+
+For users who wish to upgrade from the old `main` branch that has the code for MMOCR 0.x, the non-fast-forwardable nature of the upgrade may cause conflicts. To resolve them, follow the steps below:
+
+1. Commit any changes you have on `main`, then back up your current `main` branch by creating a copy.
+
+   ```bash
+   git checkout main
+   git add --all
+   git commit -m 'backup'
+   git checkout -b main_backup
+   ```
+
+2. Fetch the latest changes from the remote repository.
+
+   ```bash
+   git remote add openmmlab git@github.com:open-mmlab/mmocr.git
+   git fetch openmmlab
+   ```
+
+3. Reset the `main` branch to the latest `main` branch on the remote repository by running `git reset --hard openmmlab/main`.
+
+   ```bash
+   git checkout main
+   git reset --hard openmmlab/main
+   ```
+
+By following these steps, you can successfully upgrade your `main` branch.
diff --git a/mmocr-dev-1.x/docs/en/migration/code.md b/mmocr-dev-1.x/docs/en/migration/code.md
new file mode 100644
index 0000000000000000000000000000000000000000..31b84b4985bd6690a7b79a0fce4463f0c7d26f73
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/code.md
@@ -0,0 +1,151 @@
+# Code Migration
+
+To balance the tasks of text detection, recognition and key information extraction, the initial version of MMOCR made a number of design compromises. In this 1.0 release, MMOCR introduces a new model architecture that aligns as much as possible with the overall OpenMMLab design and achieves structural uniformity within the algorithm library. Although this upgrade is not fully backward compatible, we summarize below the changes that may be of interest to developers.
+
+## Fundamental Changes
+
+The functional boundaries of modules were not clearly defined in MMOCR 0.x. In MMOCR 1.0, we address this issue by refactoring the design of model modules. Here are some major changes in 1.0:
+
+- MMOCR 1.0 no longer supports named entity recognition tasks since it's not in the scope of OCR.
+
+- The module that computes the loss in a model is named as *Module Loss*, which is also responsible for the conversion of gold annotations into loss targets. Another module, *Postprocessor*, is responsible for decoding the model raw output into `DataSample` for the corresponding task at prediction time.
+
+- The inputs of all models are now organized as a dictionary that consists of two keys: `inputs`, containing the original features of the images, and `data_samples` (a `List[DataSample]`), containing the meta-information of the images. At training time, the output format of a model is standardized to a dictionary containing the loss tensors. Similarly, a model generates a sequence of `DataSample`s containing the prediction outputs at test time.
+
+- In MMOCR 0.x, the majority of classes named `XXLoss` had implementations closely bound to the corresponding models, while their names made it hard for users to tell them apart from generic losses like `DiceLoss`. In 1.0, they are renamed to the form `XXModuleLoss` (e.g. `DBLoss` was renamed to `DBModuleLoss`), and the key to their configurations in config files is also changed from `loss` to `module_loss` (see the configuration sketch at the end of this list).
+
+- The names of generic loss classes that are not related to the model implementation are kept as `XXLoss`. (e.g. [`MaskedBCELoss`](mmocr.models.common.losses.MaskedBCELoss)) They are all placed under `mmocr/models/common/losses`.
+
+- Changes under `mmocr/models/common/losses`: `DiceLoss` is renamed to [`MaskedDiceLoss`](mmocr.models.common.losses.MaskedDiceLoss). `FocalLoss` has been removed.
+
+- MMOCR 1.0 adds a *Dictionary* module which originates from *label converter*. It is used in text recognition and key information extraction tasks.
+
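+The renaming of losses and their config keys is easiest to see in a config snippet. Below is an illustrative, simplified comparison for a DBNet-style detection head; the exact fields and values depend on the model:
+
+```python
+# MMOCR 0.x (illustrative)
+bbox_head = dict(
+    type='DBHead',
+    in_channels=256,
+    loss=dict(type='DBLoss'),
+    postprocessor=dict(type='DBPostprocessor', text_repr_type='quad'))
+
+# MMOCR 1.0 (illustrative)
+det_head = dict(
+    type='DBHead',
+    in_channels=256,
+    module_loss=dict(type='DBModuleLoss'),
+    postprocessor=dict(type='DBPostprocessor', text_repr_type='quad'))
+```
+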
+## Text Detection Models
+
+### Key Changes (TL;DR)
+
+- The model weights from MMOCR 0.x still work in 1.0, but the fields starting with `bbox_head` in the `state_dict` need to be renamed to `det_head` (see the sketch after this list).
+
+- `XXTargets` transforms, which were responsible for generating detection targets, have been merged into `XXModuleLoss`.
+
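+A minimal conversion sketch (the checkpoint file names here are hypothetical; only the key prefix is renamed):
+
+```python
+import torch
+
+ckpt = torch.load('dbnet_0x.pth', map_location='cpu')  # hypothetical 0.x checkpoint
+state_dict = ckpt['state_dict']
+ckpt['state_dict'] = {
+    # rename only the leading 'bbox_head' prefix to 'det_head'
+    ('det_head' + k[len('bbox_head'):] if k.startswith('bbox_head') else k): v
+    for k, v in state_dict.items()
+}
+torch.save(ckpt, 'dbnet_1x.pth')
+```
+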
+### SingleStageTextDetector
+
+- The original inheritance chain was `mmdet.BaseDetector->SingleStageDetector->SingleStageTextDetector`. Now `SingleStageTextDetector` inherits directly from `BaseDetector` without an extra dependency on MMDetection, and `SingleStageDetector` has been deleted.
+
+- `bbox_head` is renamed to `det_head`.
+
+- `train_cfg`, `test_cfg` and `pretrained` fields are removed.
+
+- `forward_train()` and `simple_test()` are refactored to `loss()` and `predict()`. The part of `simple_test()` that was responsible for splitting the raw output of the model and feeding it into `head.get_boundary()` is integrated into `BaseTextDetPostProcessor`.
+
+- `TextDetectorMixin` has been removed since its implementation overlaps with `TextDetLocalVisualizer`.
+
+### Head
+
+- `HeadMixin`, the base class that `XXXHead` had to inherit from in version 0.x, has been replaced by `BaseTextDetHead`. `get_boundary()` and `resize_boundary()` are now rewritten as `__call__()` and `rescale()` in `BaseTextDetPostProcessor`.
+
+### ModuleLoss
+
+- The data transforms `XXXTargets` in text detection tasks have all been moved into `XXXModuleLoss._get_target_single()`. Target-related configurations are no longer specified in the data pipeline but in `XXXModuleLoss` instead.
+
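+  A rough sketch of this move, taking DBNet as an example; the parameter shown (`shrink_ratio`) is illustrative and other target-related options follow the same pattern:
+
+  ```python
+  # MMOCR 0.x: target generation configured in the data pipeline
+  train_pipeline = [
+      # ...
+      dict(type='DBNetTargets', shrink_ratio=0.4),
+  ]
+
+  # MMOCR 1.x: the same options move into the module loss config
+  model = dict(
+      det_head=dict(
+          type='DBHead',
+          module_loss=dict(type='DBModuleLoss', shrink_ratio=0.4)))
+  ```
+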
+### Postprocessor
+
+- The logic in the original `XXXPostprocessor.__call__()` has been transferred to the refactored `XXXPostprocessor.get_text_instances()`.
+
+- `BasePostprocessor` is refactored to `BaseTextDetPostProcessor`. This base class splits and processes the model output predictions one by one and supports automatic scaling of the output polygon or bounding box based on `scale_factor`.
+
+## Text Recognition
+
+### Key Changes (TL;DR)
+
+- Due to changes in the character order and fixes to several bugs in the model architecture, the recognition model weights in 0.x can no longer be directly used in 1.0. We will provide a migration script and tutorial for those who need them.
+
+- Support for SegOCR has been removed. TPS-CRNN will still be supported in a later version.
+
+- Test time augmentation will be supported in the upcoming release.
+
+- *Label converter* module has been removed and its functions have been split into *Dictionary*, *ModuleLoss* and *Postprocessor*.
+
+- The definition of `max_seq_len` has been unified and now it represents the original output length of the model.
+
+### Label Converter
+
+- The original label converters had spelling errors (written as label convertors). We fixed them by removing label converters from this project.
+
+- The part responsible for converting characters/strings to and from numeric indexes was extracted to *Dictionary*.
+
+- In older versions, different label converters would have different special character sets and character order. In version 0.x, the character order was as follows.
+
+| Converter | Character order |
+| ------------------------------- | ----------------------------------------- |
+| `AttnConvertor`, `ABIConvertor` | `<UKN>`, `<BOS/EOS>`, `<PAD>`, characters |
+| `CTCConvertor`                  | `<BLK>`, `<UKN>`, characters              |
+
+In 1.0, instead of designing different dictionaries and character orders for different tasks, we have a unified *Dictionary* implementation with the character order always being characters, `<BOS/EOS>`, `<PAD>`, `<UKN>`. `<BLK>` in `CTCConvertor` has been equivalently replaced by `<PAD>`.
+
+- *Label convertor* originally supported three ways to initialize dictionaries: `dict_type`, `dict_file` and `dict_list`, which are now reduced to `dict_file` only in `Dictionary`. Also, we have put those pre-defined character sets originally supported in `dict_type` into `dicts/` directory now. The corresponding mapping is as follows:
+
+ | MMOCR 0.x: `dict_type` | MMOCR 1.0: Dict path |
+ | ---------------------- | -------------------------------------- |
+ | DICT90 | dicts/english_digits_symbols.txt |
+ | DICT91 | dicts/english_digits_symbols_space.txt |
+ | DICT36 | dicts/lower_english_digits.txt |
+ | DICT37 | dicts/lower_english_digits_space.txt |
+
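+  A possible `Dictionary` configuration using one of these files may look as follows; the special-token flags are illustrative and should match what the chosen model expects:
+
+  ```python
+  # A Dictionary initialized from a pre-defined character file
+  dictionary = dict(
+      type='Dictionary',
+      dict_file='dicts/lower_english_digits.txt',
+      with_start=True,
+      with_end=True,
+      with_padding=True,
+      with_unknown=True)
+  ```
+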
+- The implementation of `str2tensor()` in *label converter* has been moved to `ModuleLoss.get_targets()`. The following table shows the correspondence between the old and new method implementations. Note that the old and new implementations are not identical.
+
+ | MMOCR 0.x | MMOCR 1.0 | Note |
+ | --------------------------------------------------------- | --------------------------------------- | -------------------------------------------------------------------------------------------------------- |
+ | `ABIConvertor.str2tensor()`, `AttnConvertor.str2tensor()` | `BaseTextRecogModuleLoss.get_targets()` | The different implementations between `ABIConvertor.str2tensor()` and `AttnConvertor.str2tensor()` have been unified in the new version. |
+ | `CTCConvertor.str2tensor()` | `CTCModuleLoss.get_targets()` | |
+
+- The implementation of `tensor2idx()` in *label converter* has been moved to `Postprocessor.get_single_prediction()`. The following table shows the correspondence between the old and new method implementations. Note that the old and new implementations are not identical.
+
+ | MMOCR 0.x | MMOCR 1.0 |
+ | --------------------------------------------------------- | ------------------------------------------------ |
+ | `ABIConvertor.tensor2idx()`, `AttnConvertor.tensor2idx()` | `AttentionPostprocessor.get_single_prediction()` |
+ | `CTCConvertor.tensor2idx()` | `CTCPostProcessor.get_single_prediction()` |
+
+## Key Information Extraction
+
+### Key Changes (TL;DR)
+
+- Due to changes in the inputs to the model, the model weights obtained in 0.x can no longer be directly used in 1.0.
+
+### KIEDataset & OpensetKIEDataset
+
+- The part that reads data is kept in `WildReceiptDataset`.
+
+- The part that additionally processes the nodes and edges is moved to `LoadKIEAnnotation`.
+
+- The part that uses dictionaries to transform text is moved to `SDMGRHead.convert_text()`, with the help of *Dictionary*.
+
+- The part of `compute_relation()` that computes the relationships between text boxes is moved to `SDMGRHead.compute_relations()`. It's now done inside the model.
+
+- The part that evaluates the model performance is done in [`F1Metric`](mmocr.evaluation.metric.F1Metric).
+
+- The part of `OpensetKIEDataset` that processes model's edge outputs is moved to `SDMGRPostProcessor`.
+
+### SDMGR
+
+- `show_result()` is integrated into `KIEVisualizer`.
+
+- The part of `forward_test()` that post-processes the output is organized in `SDMGRPostProcessor`.
+
+## Utils Migration
+
+Utility functions are now grouped together under `mmocr/utils/`. Here are the scopes of the files in this directory:
+
+- bbox_utils.py: bounding box related functions.
+- check_argument.py: used to check argument type.
+- collect_env.py: used to collect running environment.
+- data_converter_utils.py: used for data format conversion.
+- fileio.py: file input and output related functions.
+- img_utils.py: image processing related functions.
+- mask_utils.py: mask related functions.
+- ocr.py: used for MMOCR inference.
+- parsers.py: used for parsing datasets.
+- polygon_utils.py: polygon related functions.
+- setup_env.py: used to initialize MMOCR.
+- string_utils.py: string related functions.
+- typing.py: defines the abbreviation of types used in MMOCR.
diff --git a/mmocr-dev-1.x/docs/en/migration/dataset.md b/mmocr-dev-1.x/docs/en/migration/dataset.md
new file mode 100644
index 0000000000000000000000000000000000000000..6238c344fee15183e43bb67d67ea573b669f586d
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/dataset.md
@@ -0,0 +1,257 @@
+# Dataset Migration
+
+Based on the new design of [BaseDataset](mmengine.dataset.BaseDataset) in [MMEngine](https://github.com/open-mmlab/mmengine), we have refactored the base OCR dataset class [`OCRDataset`](mmocr.datasets.OCRDataset) in MMOCR 1.0. The following document describes the differences between the old and new dataset formats in MMOCR, and how to migrate from the deprecated version to the latest. For users who do not want to migrate datasets at this time, we also provide a temporary solution in [Section Compatibility](#compatibility).
+
+```{note}
+The Key Information Extraction task still uses the original WildReceipt dataset annotation format.
+```
+
+## Review of Old Dataset Formats
+
+MMOCR version 0.x implements a number of dataset classes, such as `IcdarDataset` and `TextDetDataset` for text detection tasks, and `OCRDataset` and `OCRSegDataset` for text recognition tasks. At the same time, the annotations may be stored in different formats, such as `.txt`, `.json`, and `.jsonl`, and users have to manually configure the `Loader` and the `Parser` when customizing the datasets.
+
+### Text Detection
+
+For the text detection task, `IcdarDataset` uses a COCO-like annotation format.
+
+```json
+{
+ "images": [
+ {
+ "id": 1,
+ "width": 800,
+ "height": 600,
+ "file_name": "test.jpg"
+ }
+ ],
+ "annotations": [
+ {
+ "id": 1,
+ "image_id": 1,
+ "category_id": 1,
+ "bbox": [0,0,10,10],
+ "segmentation": [
+ [0,0,10,0,10,10,0,10]
+ ],
+ "area": 100,
+ "iscrowd": 0
+ }
+ ]
+}
+```
+
+The `TextDetDataset` uses the JSON Line storage format, converting COCO-like labels into strings and saving them in `.txt` or `.jsonl` files.
+
+```text
+{"file_name": "test/img_2.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 0, "category_id": 1, "bbox": [602.0, 173.0, 33.0, 24.0], "segmentation": [[602, 173, 635, 175, 634, 197, 602, 196]]}, {"iscrowd": 0, "category_id": 1, "bbox": [734.0, 310.0, 58.0, 54.0], "segmentation": [[734, 310, 792, 320, 792, 364, 738, 361]]}]}
+{"file_name": "test/img_5.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [405.0, 409.0, 32.0, 52.0], "segmentation": [[408, 409, 437, 436, 434, 461, 405, 433]]}, {"iscrowd": 1, "category_id": 1, "bbox": [435.0, 434.0, 8.0, 33.0], "segmentation": [[437, 434, 443, 440, 441, 467, 435, 462]]}]}
+```
+
+### Text Recognition
+
+For text recognition tasks, there are two annotation formats in MMOCR version 0.x. The simple `.txt` annotations separate image name and word annotation by a blank space, which cannot handle the case when spaces are included in a text instance.
+
+```text
+img1.jpg OpenMMLab
+img2.jpg MMOCR
+```
+
+The JSON Line format uses a dictionary-like structure to represent the annotations, where the keys `filename` and `text` store the image name and word label, respectively.
+
+```json
+{"filename": "img1.jpg", "text": "OpenMMLab"}
+{"filename": "img2.jpg", "text": "MMOCR"}
+```
+
+## New Dataset Format
+
+To solve the dataset issues, MMOCR 1.x adopts a unified dataset design introduced in MMEngine. Each annotation file is a `.json` file that stores a `dict`, containing both `metainfo` and `data_list`, where the former includes basic information about the dataset and the latter consists of the label item of each target instance.
+
+```json
+{
+ "metainfo":
+ {
+    "classes": ["cat", "dog"],
+ // ...
+ },
+ "data_list":
+ [
+ {
+ "img_path": "xxx/xxx_0.jpg",
+ "img_label": 0,
+ // ...
+ },
+ // ...
+ ]
+}
+```
+
+Based on the above structure, we introduced `TextDetDataset`, `TextRecogDataset` for MMOCR-specific tasks.
+
+### Text Detection
+
+#### Introduction of the New Format
+
+The `TextDetDataset` holds the information required by the text detection task, such as bounding boxes and labels. We refer users to `tests/data/det_toy_dataset/instances_test.json` which is an example annotation for `TextDetDataset`.
+
+```json
+{
+ "metainfo":
+ {
+ "dataset_type": "TextDetDataset",
+ "task_name": "textdet",
+ "category": [{"id": 0, "name": "text"}]
+ },
+ "data_list":
+ [
+ {
+ "img_path": "test_img.jpg",
+ "height": 640,
+ "width": 640,
+ "instances":
+ [
+ {
+ "polygon": [0, 0, 0, 10, 10, 20, 20, 0],
+ "bbox": [0, 0, 10, 20],
+ "bbox_label": 0,
+            "ignore": false
+          },
+ // ...
+ ]
+ }
+ ]
+}
+```
+
+The bounding box format is as follows: `[min_x, min_y, max_x, max_y]`
+
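+For reference, a COCO-style `[x, y, width, height]` box can be converted to this format with a small helper like the one below (a sketch for illustration only, not part of the migration tools):
+
+```python
+def xywh2xyxy(bbox):
+    """Convert a COCO-style [x, y, w, h] box to [min_x, min_y, max_x, max_y]."""
+    x, y, w, h = bbox
+    return [x, y, x + w, y + h]
+```
+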
+#### Migration Script
+
+We provide a migration script to help users migrate old annotation files to the new format.
+
+```bash
+python tools/dataset_converters/textdet/data_migrator.py ${IN_PATH} ${OUT_PATH}
+```
+
+| ARGS     | Type                             | Description |
+| -------- | -------------------------------- | ----------- |
+| in_path  | str                              | (Required) Path to the old annotation file. |
+| out_path | str                              | (Required) Path to the new annotation file. |
+| --task   | 'auto', 'textdet', 'textspotter' | Specifies the compatible task for the output dataset annotation. If 'textdet' is specified, the text field in COCO format will not be dumped. The default is 'auto', which automatically determines the output format based on the old annotation files. |
+
+### Text Recognition
+
+#### Introduction of the New Format
+
+The `TextRecogDataset` holds the information required by the text recognition task, such as text and image paths. We refer users to `tests/data/rec_toy_dataset/labels.json`, which is an example annotation for `TextRecogDataset`.
+
+```json
+{
+ "metainfo":
+ {
+ "dataset_type": "TextRecogDataset",
+    "task_name": "textrecog"
+ },
+ "data_list":
+ [
+ {
+ "img_path": "test_img.jpg",
+ "instances":
+ [
+ {
+ "text": "GRAND"
+ }
+ ]
+ }
+ ]
+}
+```
+
+#### Migration Script
+
+We provide a migration script to help users migrate old annotation files to the new format.
+
+```bash
+python tools/dataset_converters/textrecog/data_migrator.py ${IN_PATH} ${OUT_PATH} --format ${txt, jsonl, lmdb}
+```
+
+| ARGS | Type | Description |
+| -------- | ---------------------- | ------------------------------------------------- |
+| in_path  | str                    | (Required) Path to the old annotation file.       |
+| out_path | str                    | (Required) Path to the new annotation file.       |
+| --format | 'txt', 'jsonl', 'lmdb' | Specify the format of the old dataset annotation. |
+
+## Compatibility
+
+In consideration of the cost to users for data migration, we have temporarily made MMOCR version 1.x compatible with the old MMOCR 0.x format.
+
+```{note}
+The code and components used for compatibility with the old data format may be completely removed in a future release. Therefore, we strongly recommend that users migrate their datasets to the new data format.
+```
+
+Specifically, we provide three dataset classes [IcdarDataset](mmocr.datasets.IcdarDataset), [RecogTextDataset](mmocr.datasets.RecogTextDataset), [RecogLMDBDataset](mmocr.datasets.RecogLMDBDataset) to support the old formats.
+
+1. [IcdarDataset](mmocr.datasets.IcdarDataset) supports COCO-like format annotations for text detection. You just need to add a new dataset config to `configs/textdet/_base_/datasets` and specify its dataset type as `IcdarDataset`.
+
+ ```python
+ data_root = 'data/det/icdar2015'
+ train_anno_path = 'instances_training.json'
+
+ train_dataset = dict(
+ type='IcdarDataset',
+ data_root=data_root,
+ ann_file=train_anno_path,
+ data_prefix=dict(img_path='imgs/'),
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+ ```
+
+2. [RecogTextDataset](mmocr.datasets.RecogTextDataset) supports `.txt` and `.jsonl` format annotations for text recognition. You just need to add a new dataset config to `configs/textrecog/_base_/datasets` and specify its dataset type as `RecogTextDataset`. For example, the following example shows how to configure and load the 0.x format labels `old_label.txt` and `old_label.jsonl` from the toy dataset.
+
+ ```python
+ data_root = 'tests/data/rec_toy_dataset/'
+
+ # loading 0.x txt format annos
+ txt_dataset = dict(
+ type='RecogTextDataset',
+ data_root=data_root,
+ ann_file='old_label.txt',
+ data_prefix=dict(img_path='imgs'),
+ parser_cfg=dict(
+ type='LineStrParser',
+ keys=['filename', 'text'],
+ keys_idx=[0, 1]),
+ pipeline=[])
+
+ # loading 0.x json line format annos
+ jsonl_dataset = dict(
+ type='RecogTextDataset',
+ data_root=data_root,
+ ann_file='old_label.jsonl',
+ data_prefix=dict(img_path='imgs'),
+ parser_cfg=dict(
+ type='LineJsonParser',
+            keys=['filename', 'text']),
+        pipeline=[])
+ ```
+
+3. [RecogLMDBDataset](mmocr.datasets.RecogLMDBDataset) supports LMDB-format datasets (images + labels) for text recognition. You just need to add a new dataset config to `configs/textrecog/_base_/datasets` and specify its dataset type as `RecogLMDBDataset`. For example, the following example shows how to configure and load `imgs.lmdb`, which contains **both labels and images**, from the toy dataset.
+
+- set the dataset type to `RecogLMDBDataset`
+
+```python
+# Specify the dataset type as RecogLMDBDataset
+data_root = 'tests/data/rec_toy_dataset/'
+
+lmdb_dataset = dict(
+    type='RecogLMDBDataset',
+    data_root=data_root,
+    ann_file='imgs.lmdb',
+    pipeline=None)
+```
+
+- replace [`LoadImageFromFile`](mmocr.datasets.transforms.LoadImageFromFile) with [`LoadImageFromNDArray`](mmocr.datasets.transforms.LoadImageFromNDArray) in the data pipelines `train_pipeline` and `test_pipeline`, for example:
+
+```python
+train_pipeline = [dict(type='LoadImageFromNDArray')]
+```
diff --git a/mmocr-dev-1.x/docs/en/migration/model.md b/mmocr-dev-1.x/docs/en/migration/model.md
new file mode 100644
index 0000000000000000000000000000000000000000..2ab507470ed9892b12f420c40eb81bbecd0bfd29
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/model.md
@@ -0,0 +1,5 @@
+# Pretrained Model Migration
+
+Due to the extensive refactoring and fixing of the model structure in the new version, MMOCR 1.x does not support loading weights trained with the old version. We have updated the pre-trained weights and logs of all models on our website.
+
+In addition, we are working on a weight migration tool for text detection tasks and plan to release it in the near future. Since the text recognition and key information extraction models have been modified too heavily and the migration would be lossy, we do not plan to support them for the time being. If you have specific requirements, please feel free to raise an [Issue](https://github.com/open-mmlab/mmocr/issues).
diff --git a/mmocr-dev-1.x/docs/en/migration/news.md b/mmocr-dev-1.x/docs/en/migration/news.md
new file mode 100644
index 0000000000000000000000000000000000000000..1dc991186953916a4a6d4a47d4a1c30c7d8ada7d
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/news.md
@@ -0,0 +1,19 @@
+# What's New in MMOCR 1.x
+
+Here are some highlights of MMOCR 1.x compared to 0.x.
+
+1. **New engines**. MMOCR 1.x is based on [MMEngine](https://github.com/open-mmlab/mmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.
+
+2. **Unified interfaces**. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in these interfaces and logic to allow the emergence of multi-task/modality algorithms.
+
+3. **Cross project calling**. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through `MMDetWrapper`. Check our documents for more details. More wrappers will be released in the future.
+
+4. **Stronger visualization**. We provide a series of useful tools which are mostly based on brand-new visualizers. As a result, it is more convenient for the users to explore the models and datasets now.
+
+5. **More documentation and tutorials**. We add a bunch of documentation and tutorials to help users get started more smoothly.
+
+6. **One-stop Dataset Preparation**. Multiple datasets are instantly ready with only one line of command, via our [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html).
+
+7. **Embracing more `projects/`**: We now introduce the `projects/` folder, where some experimental features, frameworks and models can be placed, only needing to satisfy the minimum requirements on code quality. Everyone is welcome to post their implementation of any great ideas in this folder! Learn more from our [example project](https://github.com/open-mmlab/mmocr/blob/dev-1.x/projects/example_project/).
+
+8. **More models**. MMOCR 1.0 supports more tasks and more state-of-the-art models!
diff --git a/mmocr-dev-1.x/docs/en/migration/overview.md b/mmocr-dev-1.x/docs/en/migration/overview.md
new file mode 100644
index 0000000000000000000000000000000000000000..e389781dc5279340ef43301e373e0e5fa10d63cc
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/overview.md
@@ -0,0 +1,18 @@
+# Overview
+
+Along with the release of OpenMMLab 2.0, MMOCR 1.0 made many significant changes, resulting in less redundant, more efficient code and a more consistent overall design. However, these changes break backward compatibility. We understand that with such huge changes, it is not easy for users familiar with the old version to adapt to the new version. Therefore, we prepared a detailed migration guide to make the transition as smooth as possible so that all users can enjoy the productivity benefits of the new MMOCR and the entire OpenMMLab 2.0 ecosystem.
+
+```{warning}
+MMOCR 1.0 depends on [MMEngine](https://github.com/open-mmlab/mmengine), the new foundational library for training deep learning models, and therefore has an entirely different dependency chain compared with MMOCR 0.x. Even if you already have a well-rounded MMOCR 0.x environment, you still need to create a new Python environment for MMOCR 1.0. We provide a detailed [installation guide](../get_started/install.md) for reference.
+```
+
+Next, please read the sections according to your requirements.
+
+- Read [What's new in MMOCR 1.x](./news.md) to learn about the new features and changes in MMOCR 1.x.
+- If you want to migrate a model trained in version 0.x to use it directly in version 1.0, please read [Pretrained Model Migration](./model.md).
+- If you want to train the model, please read [Dataset Migration](./dataset.md) and [Data Transform Migration](./transforms.md).
+- If you want to develop on MMOCR, please read [Code Migration](code.md), [Branch Migration](branches.md) and [Upstream Library Changes](https://github.com/open-mmlab/mmengine/tree/main/docs/en/migration).
+
+As shown in the following figure, the maintenance plan of MMOCR 1.x is mainly divided into three stages, namely the "RC Period", the "Compatibility Period" and the "Maintenance Period". For old versions, we will no longer add major new features. Therefore, we strongly recommend that users migrate to MMOCR 1.x as soon as possible.
+
+![plan](https://user-images.githubusercontent.com/45810070/192927112-70c0108d-58ed-4c77-8a0a-9d9685a48333.png)
diff --git a/mmocr-dev-1.x/docs/en/migration/transforms.md b/mmocr-dev-1.x/docs/en/migration/transforms.md
new file mode 100644
index 0000000000000000000000000000000000000000..33661313d5abae59b49e59e6fcf53591c40b3959
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/migration/transforms.md
@@ -0,0 +1,583 @@
+# Data Transform Migration
+
+## Introduction
+
+In MMOCR version 0.x, we implemented a series of **Data Transform** methods in `mmocr/datasets/pipelines/xxx_transforms.py`. However, these modules are scattered all over the place and lack a standardized design. Therefore, we refactored all the data transform modules in MMOCR version 1.x. According to the task type, they are now defined in `ocr_transforms.py`, `textdet_transforms.py`, and `textrecog_transforms.py`, respectively, under `mmocr/datasets/transforms`. Specifically, `ocr_transforms.py` implements the data augmentation methods for OCR-related tasks in general, while `textdet_transforms.py` and `textrecog_transforms.py` implement data augmentation transforms related to text detection and text recognition tasks, respectively.
+
+Since some of the modules were renamed, merged or separated during the refactoring process, the new interfaces and default parameters may be inconsistent with the old version. Therefore, this migration guide introduces how to configure the new data transforms to achieve behavior identical to the old version.
+
+## Configuration Migration Guide
+
+### Data Formatting Related Data Transforms
+
+1. `Collect` + `CustomFormatBundle` -> [`PackTextDetInputs`](mmocr.datasets.transforms.formatting.PackTextDetInputs)/[`PackTextRecogInputs`](mmocr.datasets.transforms.formatting.PackTextRecogInputs)
+
+`PackxxxInputs` covers the functionality of both `Collect` and `CustomFormatBundle` and no longer takes `key` parameters; the generation of training targets has been moved into the `loss` modules.
+
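+A rough before/after sketch of this change; the target keys in the 0.x snippet are illustrative and depend on the model:
+
+```python
+# MMOCR 0.x: format bundle plus explicit target keys at the end of the pipeline
+old_pipeline_tail = [
+    dict(type='CustomFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_shrink', 'gt_shrink_mask']),
+]
+
+# MMOCR 1.x: a single packing transform; targets are generated in the module loss
+new_pipeline_tail = [
+    dict(
+        type='PackTextDetInputs',
+        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor')),
+]
+```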
+
+
+### Data Augmentation Related Data Transforms
+
+1. `ResizeOCR` -> [`Resize`](mmocr.datasets.transforms.Resize), [`RescaleToHeight`](mmocr.datasets.transforms.RescaleToHeight), [`PadToWidth`](mmocr.datasets.transforms.PadToWidth)
+
+ The original `ResizeOCR` is now split into three data augmentation modules.
+
+ When `keep_aspect_ratio=False`, it is equivalent to `Resize` in version 1.x. Its configuration can be modified as follows.
+
+
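+A possible mapping for this case (the concrete sizes are illustrative):
+
+```python
+# MMOCR 0.x
+dict(
+    type='ResizeOCR',
+    height=32,
+    min_width=100,
+    max_width=100,
+    keep_aspect_ratio=False)
+
+# MMOCR 1.x
+dict(type='Resize', scale=(100, 32), keep_ratio=False)
+```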
+
+When `keep_aspect_ratio=True` and `max_width=None`, the image is rescaled to a fixed height while keeping the original aspect ratio.
+
+
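+A possible mapping for this case (the concrete sizes and `width_divisor` are illustrative):
+
+```python
+# MMOCR 0.x
+dict(
+    type='ResizeOCR',
+    height=32,
+    min_width=32,
+    max_width=None,
+    keep_aspect_ratio=True)
+
+# MMOCR 1.x
+dict(
+    type='RescaleToHeight',
+    height=32,
+    min_width=32,
+    max_width=None,
+    width_divisor=16)
+```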
+
+When `keep_aspect_ratio=True` and `max_width` is a fixed value, the image is rescaled to a fixed height while keeping the original aspect ratio, and the width is then padded or cropped to `max_width`. That is, the shape of the output image is always `(height, max_width)`.
+
+
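+A possible mapping for this case (the concrete sizes are illustrative):
+
+```python
+# MMOCR 0.x
+dict(
+    type='ResizeOCR',
+    height=32,
+    min_width=32,
+    max_width=100,
+    keep_aspect_ratio=True)
+
+# MMOCR 1.x
+dict(
+    type='RescaleToHeight',
+    height=32,
+    min_width=32,
+    max_width=100,
+    width_divisor=16),
+dict(type='PadToWidth', width=100)
+```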
+
+2. `RandomRotateTextDet` & `RandomRotatePolyInstances` -> [`RandomRotate`](mmocr.datasets.transforms.RandomRotate)
+
+ We implemented all random rotation-related data augmentation in `RandomRotate` in version 1.x. Its default behavior is identical to the `RandomRotateTextDet` in version 0.x.
+
+```{note}
+  The default value of "max_angle" might differ from the old version, so users are advised to set it manually.
+```
+
+
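+A minimal sketch of this default mapping, assuming the commonly used 0.x rotation range of 10 degrees:
+
+```python
+# MMOCR 0.x
+dict(type='RandomRotateTextDet')
+
+# MMOCR 1.x
+dict(type='RandomRotate', max_angle=10)
+```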
+
+For `RandomRotatePolyInstances`, `use_canvas=True` should be set.
+
+
+
+
+**MMOCR 0.x Configuration**
+
+```python
+dict(
+ type='RandomRotatePolyInstances',
+ rotate_ratio=0.5, # Specify the execution probability
+ max_angle=60,
+ pad_with_fixed_color=False)
+```
+
+**MMOCR 1.x Configuration**
+
+```python
+# Wrap the data transforms with RandomApply and specify the execution probability
+dict(
+ type='RandomApply',
+ transforms=[
+ dict(type='RandomRotate',
+ max_angle=60,
+ pad_with_fixed_color=False,
+ use_canvas=True)],
+ prob=0.5) # Specify the execution probability
+```
+
+
+
+
+
+```{note}
+In version 0.x, some data augmentation methods specified execution probability by defining an internal variable "xxx_ratio", such as "rotate_ratio", "crop_ratio", etc. In version 1.x, these parameters have been removed. Now we can use "RandomApply" to wrap different data transforms and specify their execution probabilities.
+```
+
+3. `RandomCropFlip` -> [`TextDetRandomCropFlip`](mmocr.datasets.transforms.TextDetRandomCropFlip)
+
+ Currently, only the method name has been changed, and other parameters remain the same.
+
+4. `RandomCropPolyInstances` -> [`RandomCrop`](mmocr.datasets.transforms.RandomCrop)
+
+ In MMOCR version 1.x, `crop_ratio` and `instance_key` are removed. The `gt_polygons` is now used as the target for cropping.
+
+
+
+
+**MMOCR 0.x Configuration**
+
+```python
+dict(
+ type='RandomCropPolyInstances',
+ instance_key='gt_masks',
+ crop_ratio=0.8, # Specify the execution probability
+ min_side_ratio=0.3)
+```
+
+**MMOCR 1.x Configuration**
+
+```python
+# Wrap the data transforms with RandomApply and specify the execution probability
+dict(
+ type='RandomApply',
+ transforms=[dict(type='RandomCrop', min_side_ratio=0.3)],
+ prob=0.8) # Specify the execution probability
+```
+
+
+
+
+
+5. `RandomCropInstances` -> [`TextDetRandomCrop`](mmocr.datasets.transforms.TextDetRandomCrop)
+
+ In MMOCR version 1.x, `crop_ratio` and `instance_key` are removed. The `gt_polygons` is now used as the target for cropping.
+
+
+
+6. `EastRandomCrop` -> [`RandomCrop`](mmocr.datasets.transforms.RandomCrop) + [`Resize`](mmocr.datasets.transforms.Resize) + [`mmengine.Pad`](mmcv.transforms.Pad)
+
+ `EastRandomCrop` was implemented by applying cropping, scaling and padding to the input image. Now, the same effect can be achieved by combining three data transforms.
+
+
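+One possible equivalent combination; the target size and crop ratio are illustrative:
+
+```python
+# MMOCR 0.x
+dict(type='EastRandomCrop', target_size=(640, 640))
+
+# MMOCR 1.x
+dict(type='RandomCrop', min_side_ratio=0.1),
+dict(type='Resize', scale=(640, 640), keep_ratio=True),
+dict(type='Pad', size=(640, 640))
+```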
+
+7. `RandomScaling` -> [`mmengine.RandomResize`](mmcv.transforms.RandomResize)
+
+ The `RandomScaling` is now replaced with [`mmengine.RandomResize`](mmcv.transforms.RandomResize).
+
+
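+A possible mapping (the sizes and ratio range are illustrative):
+
+```python
+# MMOCR 0.x
+dict(type='RandomScaling', size=800, scale=(0.75, 2.5))
+
+# MMOCR 1.x
+dict(
+    type='RandomResize',
+    scale=(800, 800),
+    ratio_range=(0.75, 2.5),
+    keep_ratio=True)
+```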
+
+```{note}
+By default, the data pipeline will search for each data transform in the registry of the current *scope*, and if it does not exist there, the search continues in upstream libraries such as MMCV and MMEngine. For example, the `RandomResize` transform is not implemented in MMOCR, but it can still be used directly in the configuration, as the program will automatically find it in MMCV. In addition, you can specify the *scope* explicitly by adding a prefix. For example, `mmengine.RandomResize` forces the use of the `RandomResize` implemented in MMEngine, which is useful when a transform of the same name exists in both upstream and downstream libraries. Note that all of the data transforms implemented in MMCV are registered to MMEngine, which is why we write `mmengine.RandomResize` rather than `mmcv.RandomResize`.
+```
+
+8. `SquareResizePad` -> [`Resize`](mmocr.datasets.transforms.Resize) + [`SourceImagePad`](mmocr.datasets.transforms.SourceImagePad)
+
+ `SquareResizePad` implements two branches and uses one of them randomly based on the `pad_ratio`. Specifically, one branch first resizes the image and then pads it to a certain size; while the other branch only resizes the image. To enhance the reusability of the different modules, we split this data transform into a combination of `Resize` + `SourceImagePad` in version 1.x, and control the branches via `RandomChoice`.
+
+
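+A sketch of this combination, assuming the old `pad_ratio` was 0.6 (i.e. the padding branch is chosen with probability 0.6); the sizes are illustrative:
+
+```python
+# MMOCR 1.x
+dict(
+    type='RandomChoice',
+    transforms=[
+        # Branch 1: resize then pad to a square canvas
+        [
+            dict(type='Resize', scale=(800, 800), keep_ratio=True),
+            dict(type='SourceImagePad', target_scale=800)
+        ],
+        # Branch 2: resize only
+        [dict(type='Resize', scale=(800, 800), keep_ratio=False)],
+    ],
+    prob=[0.6, 0.4])
+```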
+
+```{note}
+In version 1.x, the random choice wrapper "RandomChoice" replaces "OneOfWrapper", allowing random selection of data transform combinations.
+```
+
+9. `RandomWrapper` -> [`mmengine.RandomApply`](mmcv.transforms.RandomApply)
+
+ In version 1.x, the `RandomWrapper` wrapper has been replaced with `RandomApply` in MMEngine, which is used to specify the probability of performing a data transform. And the probability `p` is now named `prob`.
+
+
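+A minimal sketch of this renaming; the probability and the wrapped transform are illustrative:
+
+```python
+# MMOCR 0.x
+dict(type='RandomWrapper', p=0.25, transforms=[dict(type='PyramidRescale')])
+
+# MMOCR 1.x
+dict(type='RandomApply', prob=0.25, transforms=[dict(type='PyramidRescale')])
+```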
+
+10. `OneOfWrapper` -> [`mmengine.RandomChoice`](mmcv.transforms.RandomChoice)
+
+ The random choice wrapper is now renamed to `RandomChoice` and is used in exactly the same way as before.
+
+11. `ScaleAspectJitter` -> [`ShortScaleAspectJitter`](mmocr.datasets.transforms.ShortScaleAspectJitter), [`BoundedScaleAspectJitter`](mmocr.datasets.transforms.BoundedScaleAspectJitter)
+
+    The `ScaleAspectJitter` transform implemented several different image-size jittering strategies, which have now been split into several independent data transforms.
+
+ When `resize_type='indep_sample_in_range'`, it is equivalent to `RandomResize`.
+
+
+
+When `resize_type='long_short_bound'`, we implemented `BoundedScaleAspectJitter`, which randomly rescales the image so that the long and short sides of the image are around the bound; then jitters the aspect ratio.
+
+
+
+When `resize_type='round_min_img_scale'`, we implemented `ShortScaleAspectJitter`, which rescales the image so that its shorter side reaches `short_size`, then jitters the aspect ratio, and finally makes the output shape divisible by `scale_divisor`.
+
+
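+A possible mapping for this case (the concrete sizes and `scale_divisor` are illustrative):
+
+```python
+# MMOCR 0.x
+dict(
+    type='ScaleAspectJitter',
+    img_scale=[(3000, 736)],
+    ratio_range=(0.7, 1.3),
+    aspect_ratio_range=(0.9, 1.1),
+    multiscale_mode='value',
+    keep_ratio=False)
+
+# MMOCR 1.x
+dict(
+    type='ShortScaleAspectJitter',
+    short_size=736,
+    ratio_range=(0.7, 1.3),
+    aspect_ratio_range=(0.9, 1.1),
+    scale_divisor=32)
+```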
diff --git a/mmocr-dev-1.x/docs/en/notes/branches.md b/mmocr-dev-1.x/docs/en/notes/branches.md
new file mode 100644
index 0000000000000000000000000000000000000000..9b799946895de88065724684dd28939d635cfc83
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/notes/branches.md
@@ -0,0 +1,25 @@
+# Branches
+
+This documentation aims to provide a comprehensive understanding of the purpose and features of each branch in MMOCR.
+
+## Branch Overview
+
+### 1. `main`
+
+The `main` branch serves as the default branch for the MMOCR project. It contains the latest stable version of MMOCR, currently housing the code for MMOCR 1.x (e.g. v1.0.0). The `main` branch ensures users have access to the most recent and reliable version of the software.
+
+### 2. `dev-1.x`
+
+The `dev-1.x` branch is dedicated to the development of the next major version of MMOCR. This branch routinely undergoes reliability testing, and passing commits are squashed into releases and published to the `main` branch. By having a separate development branch, the project can continue to evolve without impacting the stability of the `main` branch. **All PRs should be merged into the `dev-1.x` branch.**
+
+### 3. `0.x`
+
+The `0.x` branch serves as an archive for MMOCR 0.x (e.g. v0.6.3). This branch will no longer actively receive updates or improvements, but it remains accessible for historical reference or for users who have not yet upgraded to MMOCR 1.x.
+
+### 4. `1.x`
+
+It is an alias of the `main` branch, intended to ease the transition from the compatibility period. It will be removed in mid-2023.
+
+```{note}
+The branch mapping was changed on 2023.04.06. For the legacy branch mapping and the migration guide, please refer to the [branch migration guide](../migration/branches.md).
+```
diff --git a/mmocr-dev-1.x/docs/en/notes/changelog.md b/mmocr-dev-1.x/docs/en/notes/changelog.md
new file mode 100644
index 0000000000000000000000000000000000000000..b04e69531cb606f4f7e2e9d326d9955c6deab5f2
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/notes/changelog.md
@@ -0,0 +1,458 @@
+# Changelog of v1.x
+
+## v1.0.0 (04/06/2023)
+
+We are excited to announce the first official release of MMOCR 1.0, with numerous enhancements, bug fixes, and the introduction of new dataset support!
+
+### Highlights
+
+- Support for SCUT-CTW1500, SynthText, and MJSynth datasets
+- Updated FAQ and documentation
+- Deprecation of file_client_args in favor of backend_args
+- Added a new MMOCR tutorial notebook
+
+### New Features & Enhancements
+
+- Add SCUT-CTW1500 by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1677
+- Cherry Pick #1205 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1774
+- Make lanms-neo optional by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1772
+- SynthText by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1779
+- Deprecate file_client_args and use backend_args instead by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1765
+- MJSynth by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1791
+- Add MMOCR tutorial notebook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1771
+- decouple batch_size to det_batch_size, rec_batch_size and kie_batch_size in MMOCRInferencer by @hugotong6425 in https://github.com/open-mmlab/mmocr/pull/1801
+- Accepts local-rank in train.py and test.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1806
+- update stitch_boxes_into_lines by @cherryjm in https://github.com/open-mmlab/mmocr/pull/1824
+- Add tests for pytorch 2.0 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1836
+
+### Docs
+
+- FAQ by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1773
+- Remove LoadImageFromLMDB from docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1767
+- Mark projects in docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1766
+- add opendatalab download link by @jorie-peng in https://github.com/open-mmlab/mmocr/pull/1753
+- Fix some deadlinks in the docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1469
+- Fix quick run by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1775
+- Dataset by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1782
+- Update faq by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1817
+- more social network links by @fengshiwest in https://github.com/open-mmlab/mmocr/pull/1818
+- Update docs after branch switching by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1834
+
+### Bug Fixes
+
+- Place dicts to .mim by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1781
+- Test svtr_small instead of svtr_tiny by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1786
+- Add pse weight to metafile by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1787
+- Synthtext metafile by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1788
+- Clear up some unused scripts by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1798
+- if dst not exists, when move a single file may raise a file not exists error. by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1803
+- CTW1500 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1814
+- MJSynth & SynthText Dataset Preparer config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1805
+- Use poly_intersection instead of poly.intersection to avoid sup… by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1811
+- Abinet: fix ValueError: Blur limit must be odd when centered=True. Got: (3, 6) by @hugotong6425 in https://github.com/open-mmlab/mmocr/pull/1821
+- Bug generated during kie inference visualization by @Yangget in https://github.com/open-mmlab/mmocr/pull/1830
+- Revert sync bn in inferencer by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1832
+- Fix mmdet digit version by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1840
+
+### New Contributors
+
+- @jorie-peng made their first contribution in https://github.com/open-mmlab/mmocr/pull/1753
+- @hugotong6425 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1801
+- @fengshiwest made their first contribution in https://github.com/open-mmlab/mmocr/pull/1818
+- @cherryjm made their first contribution in https://github.com/open-mmlab/mmocr/pull/1824
+- @Yangget made their first contribution in https://github.com/open-mmlab/mmocr/pull/1830
+
+Thank you to all the contributors for making this release possible! We're excited about the new features and enhancements in this version, and we're looking forward to your feedback and continued support. Happy coding!
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc6...v1.0.0
+
+## v1.0.0rc6 (03/07/2023)
+
+### Highlights
+
+1. Two new models, ABCNet v2 (inference only) and SPTS are added to `projects/` folder.
+2. Announcing `Inferencer`, a unified inference interface in OpenMMLab for everyone's easy access and quick inference with all the pre-trained weights. [Docs](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/inference.html)
+3. Users can use test-time augmentation for text recognition tasks. [Docs](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/train_test.html#test-time-augmentation)
+4. Support [batch augmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf) through [`BatchAugSampler`](https://github.com/open-mmlab/mmocr/pull/1757), which is a technique used in SPTS.
+5. Dataset Preparer has been refactored to allow more flexible configurations. Besides, users are now able to prepare text recognition datasets in LMDB formats. [Docs](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html#lmdb-format)
+6. Some textspotting datasets have been revised to enhance the correctness and consistency with the common practice.
+7. Potential spurious warnings from `shapely` have been eliminated.
+
+### Dependency
+
+This version requires MMEngine >= 0.6.0, MMCV >= 2.0.0rc4 and MMDet >= 3.0.0rc5.
+
+### New Features & Enhancements
+
+- Discard deprecated lmdb dataset format and only support img+label now by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1681
+- abcnetv2 inference by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1657
+- Add RepeatAugSampler by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1678
+- SPTS by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1696
+- Refactor Inferencers by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1608
+- Dynamic return type for rescale_polygons by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1702
+- Revise upstream version limit by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1703
+- TextRecogCropConverter add crop with opencv warpPersepective function by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1667
+- change cudnn benchmark to false by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1705
+- Add ST-pretrained DB-series models and logs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1635
+- Only keep meta and state_dict when publish model by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1729
+- Rec TTA by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1401
+- Speedup formatting by replacing np.transpose with torch… by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1719
+- Support auto import modules from registry. by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1731
+- Support batch visualization & dumping in Inferencer by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1722
+- add a new argument font_properties to set a specific font file in order to draw Chinese characters properly by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1709
+- Refactor data converter and gather by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1707
+- Support batch augmentation through BatchAugSampler by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1757
+- Put all registry into registry.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1760
+- train by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1756
+- configs for regression benchmark by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1755
+- Support lmdb format in Dataset Preparer by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1762
+
+### Docs
+
+- update the link of DBNet by @AllentDan in https://github.com/open-mmlab/mmocr/pull/1672
+- Add notice for default branch switching by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1693
+- docs: Add twitter discord medium youtube link by @vansin in https://github.com/open-mmlab/mmocr/pull/1724
+- Remove unsupported datasets in docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1670
+
+### Bug Fixes
+
+- Update dockerfile by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1671
+- Explicitly create np object array for compatibility by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1691
+- Fix a minor error in docstring by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1685
+- Fix lint by @triple-Mu in https://github.com/open-mmlab/mmocr/pull/1694
+- Fix LoadOCRAnnotation ut by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1695
+- Fix isort pre-commit error by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1697
+- Update owners by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1699
+- Detect intersection before using shapley.intersection to eliminate spurious warnings by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1710
+- Fix some inferencer bugs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1706
+- Fix textocr ignore flag by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1712
+- Add missing softmax in ASTER forward_test by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1718
+- Fix head in readme by @vansin in https://github.com/open-mmlab/mmocr/pull/1727
+- Fix some browse dataset script bugs and draw textdet gt instance with ignore flags by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1701
+- icdar textrecog ann parser skip data with ignore flag by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1708
+- bezier_to_polygon -> bezier2polygon by @double22a in https://github.com/open-mmlab/mmocr/pull/1739
+- Fix docs recog CharMetric P/R error definition by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1740
+- Remove outdated resources in demo/ by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1747
+- Fix wrong ic13 textspotting split data; add lexicons to ic13, ic15 and totaltext by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1758
+- SPTS readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1761
+
+### New Contributors
+
+- @triple-Mu made their first contribution in https://github.com/open-mmlab/mmocr/pull/1694
+- @double22a made their first contribution in https://github.com/open-mmlab/mmocr/pull/1739
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc5...v1.0.0rc6
+
+## v1.0.0rc5 (01/06/2023)
+
+### Highlights
+
+1. Two models, Aster and SVTR, are added to our model zoo. The full implementation of ABCNet is also available now.
+2. Dataset Preparer supports 5 more datasets: CocoTextV2, FUNSD, TextOCR, NAF, SROIE.
+3. We have 4 more text recognition transforms, and two helper transforms. See https://github.com/open-mmlab/mmocr/pull/1646 https://github.com/open-mmlab/mmocr/pull/1632 https://github.com/open-mmlab/mmocr/pull/1645 for details.
+4. The transform, `FixInvalidPolygon`, is getting smarter at dealing with invalid polygons, and now capable of handling more weird annotations. As a result, a complete training cycle on TotalText dataset can be performed bug-free. The weights of DBNet and FCENet pretrained on TotalText are also released.
+
+### New Features & Enhancements
+
+- Update ic15 det config according to DataPrepare by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1617
+- Refactor icdardataset metainfo to lowercase. by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1620
+- Add ASTER Encoder by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1239
+- Add ASTER decoder by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1625
+- Add ASTER config by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1238
+- Update ASTER config by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1629
+- Support browse_dataset.py to visualize original dataset by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1503
+- Add CocoTextv2 to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1514
+- Add Funsd to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1550
+- Add TextOCR to Dataset Preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1543
+- Refine example projects and readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1628
+- Enhance FixInvalidPolygon, add RemoveIgnored transform by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1632
+- ConditionApply by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1646
+- Add NAF to dataset preparer by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1609
+- Add SROIE to dataset preparer by @FerryHuang in https://github.com/open-mmlab/mmocr/pull/1639
+- Add svtr decoder by @willpat1213 in https://github.com/open-mmlab/mmocr/pull/1448
+- Add missing unit tests by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1651
+- Add svtr encoder by @willpat1213 in https://github.com/open-mmlab/mmocr/pull/1483
+- ABCNet train by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1610
+- Totaltext cfgs for DB and FCE by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1633
+- Add Aliases to models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1611
+- SVTR transforms by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1645
+- Add SVTR framework and configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1621
+- Issue Template by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1663
+
+### Docs
+
+- Add Chinese translation for browse_dataset.py by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1647
+- updata abcnet doc by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1658
+- update the dbnetpp\`s readme file by @zhuyue66 in https://github.com/open-mmlab/mmocr/pull/1626
+- Inferencer docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1744
+
+### Bug Fixes
+
+- nn.SmoothL1Loss beta can not be zero in PyTorch 1.13 version by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1616
+- ctc loss bug if target is empty by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1618
+- Add torch 1.13 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1619
+- Remove outdated tutorial link by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1627
+- Dev 1.x some doc mistakes by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1630
+- Support custom font to visualize some languages (e.g. Korean) by @ProtossDragoon in https://github.com/open-mmlab/mmocr/pull/1567
+- db_module_loss：negative number encountered in sqrt by @KevinNuNu in https://github.com/open-mmlab/mmocr/pull/1640
+- Use int instead of np.int by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1636
+- Remove support for py3.6 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1660
+
+### New Contributors
+
+- @zhuyue66 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1626
+- @KevinNuNu made their first contribution in https://github.com/open-mmlab/mmocr/pull/1630
+- @FerryHuang made their first contribution in https://github.com/open-mmlab/mmocr/pull/1639
+- @willpat1213 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1448
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc4...v1.0.0rc5
+
+## v1.0.0rc4 (12/06/2022)
+
+### Highlights
+
+1. Dataset Preparer can automatically generate base dataset configs at the end of the preparation process, and supports 6 more datasets: IIIT5k, CUTE80, ICDAR2013, ICDAR2015, SVT, SVTP.
+2. Introducing our `projects/` folder - implementing new models and features into OpenMMLab's algorithm libraries has long been complained to be troublesome due to the rigorous requirements on code quality, which could hinder the fast iteration of SOTA models and might discourage community members from sharing their latest outcome here. We now introduce `projects/` folder, where some experimental features, frameworks and models can be placed, only needed to satisfy the minimum requirement on the code quality. Everyone is welcome to post their implementation of any great ideas in this folder! We also add the first [example project](https://github.com/open-mmlab/mmocr/tree/dev-1.x/projects/example_project) to illustrate what we expect a good project to have (check out the raw content of README.md for more info!).
+3. Inside the `projects/` folder, we are releasing the preview version of ABCNet, which is the first implementation of text spotting models in MMOCR. It's inference-only now, but the full implementation will be available very soon.
+
+### New Features & Enhancements
+
+- Add SVT to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1521
+- Polish bbox2poly by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1532
+- Add SVTP to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1523
+- Iiit5k converter by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1530
+- Add cute80 to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1522
+- Add IC13 preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1531
+- Add 'Projects/' folder, and the first example project by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1524
+- Rename to {dataset-name}\_task_train/test by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1541
+- Add print_config.py to the tools by @IncludeMathH in https://github.com/open-mmlab/mmocr/pull/1547
+- Add get_md5 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1553
+- Add config generator by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1552
+- Support IC15_1811 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1556
+- Update CT80 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1555
+- Add config generators to all textdet and textrecog configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1560
+- Refactor TPS by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1240
+- Add TextSpottingConfigGenerator by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1561
+- Add common typing by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1596
+- Update textrecog config and readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1597
+- Support head loss or postprocessor is None for only infer by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1594
+- Textspotting datasample by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1593
+- Simplify mono_gather by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1588
+- ABCNet v1 infer by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1598
+
+### Docs
+
+- Add Chinese Guidance on How to Add New Datasets to Dataset Preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1506
+- Update the qq group link by @vansin in https://github.com/open-mmlab/mmocr/pull/1569
+- Collapse some sections; update logo url by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1571
+- Update dataset preparer (CN) by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1591
+
+### Bug Fixes
+
+- Fix two bugs in dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1513
+- Register bug of CLIPResNet by @jyshee in https://github.com/open-mmlab/mmocr/pull/1517
+- Being more conservative on Dataset Preparer by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1520
+- python -m pip upgrade in windows by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1525
+- Fix wildreceipt metafile by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1528
+- Fix Dataset Preparer Extract by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1527
+- Fix ICDARTxtParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1529
+- Fix Dataset Zoo Script by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1533
+- Fix crop without padding and recog metainfo delete unuse info by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1526
+- Automatically create nonexistent directory for base configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1535
+- Change mmcv.dump to mmengine.dump by @ProtossDragoon in https://github.com/open-mmlab/mmocr/pull/1540
+- mmocr.utils.typing -> mmocr.utils.typing_utils by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1538
+- Wildreceipt tests by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1546
+- Fix judge exist dir by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1542
+- Fix IC13 textdet config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1563
+- Fix IC13 textrecog annotations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1568
+- Auto scale lr by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1584
+- Fix icdar data parse for text containing separator by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1587
+- Fix textspotting ut by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1599
+- Fix TextSpottingConfigGenerator and TextSpottingDataConverter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1604
+- Keep E2E Inferencer output simple by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1559
+
+### New Contributors
+
+- @jyshee made their first contribution in https://github.com/open-mmlab/mmocr/pull/1517
+- @ProtossDragoon made their first contribution in https://github.com/open-mmlab/mmocr/pull/1540
+- @IncludeMathH made their first contribution in https://github.com/open-mmlab/mmocr/pull/1547
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc3...v1.0.0rc4
+
+## v1.0.0rc3 (11/03/2022)
+
+### Highlights
+
+1. We release several pretrained models using [oCLIP-ResNet](https://github.com/open-mmlab/mmocr/blob/1.x/configs/backbone/oclip/README.md) as the backbone, which is a ResNet variant trained with [oCLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880282.pdf) and can significantly boost the performance of text detection models.
+
+2. Preparing datasets is troublesome and tedious, especially in OCR domain where multiple datasets are usually required. In order to free our users from laborious work, we designed a [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html) to help you get a bunch of datasets ready for use, with only **one line of command**! Dataset Preparer is also crafted to consist of a series of reusable modules, each responsible for handling one of the standardized phases throughout the preparation process, shortening the development cycle on supporting new datasets.
+
+### New Features & Enhancements
+
+- Add Dataset Preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1484
+
+* support modified resnet structure used in oCLIP by @HannibalAPE in https://github.com/open-mmlab/mmocr/pull/1458
+* Add oCLIP configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1509
+
+### Docs
+
+- Update install.md by @rogachevai in https://github.com/open-mmlab/mmocr/pull/1494
+- Refine some docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1455
+- Update some dataset preparer related docs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1502
+- oclip readme by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1505
+
+### Bug Fixes
+
+- Fix offline_eval error caused by new data flow by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1500
+
+### New Contributors
+
+- @rogachevai made their first contribution in https://github.com/open-mmlab/mmocr/pull/1494
+- @HannibalAPE made their first contribution in https://github.com/open-mmlab/mmocr/pull/1458
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc2...v1.0.0rc3
+
+## v1.0.0rc2 (10/14/2022)
+
+This release relaxes the version requirement of `MMEngine` to `>=0.1.0, < 1.0.0`.
+
+## v1.0.0rc1 (10/09/2022)
+
+### Highlights
+
+This release fixes a severe bug that led to inaccurate metric reports in multi-GPU training.
+We release the weights for all the text recognition models in the MMOCR 1.0 architecture, and their inference shorthands are added back to `ocr.py`. Besides, more documentation chapters are available now.
+
+### New Features & Enhancements
+
+- Simplify the Mask R-CNN config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1391
+- auto scale lr by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1326
+- Update paths to pretrain weights by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1416
+- Streamline duplicated split_result in pan_postprocessor by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1418
+- Update model links in ocr.py and inference.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1431
+- Update rec configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1417
+- Visualizer refine by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1411
+- Support get flops and parameters in dev-1.x by @vansin in https://github.com/open-mmlab/mmocr/pull/1414
+
+### Docs
+
+- intersphinx and api by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1367
+- Fix quickrun by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1374
+- Fix some docs issues by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1385
+- Add Documents for DataElements by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1381
+- config english by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1372
+- Metrics by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1399
+- Add version switcher to menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1407
+- Data Transforms by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1392
+- Fix inference docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1415
+- Fix some docs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1410
+- Add maintenance plan to migration guide by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1413
+- Update Recog Models by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1402
+
+### Bug Fixes
+
+- clear metric.results only done in main process by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1379
+- Fix a bug in MMDetWrapper by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1393
+- Fix browse_dataset.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1398
+- ImgAugWrapper: Do not clip polygons if not applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1231
+- Fix CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1365
+- Fix merge stage test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1370
+- Del CI support for torch 1.5.1 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1371
+- Test windows cu111 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1373
+- Fix windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1387
+- Upgrade pre commit hooks by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1429
+- Skip invalid augmented polygons in ImgAugWrapper by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1434
+
+### New Contributors
+
+- @vansin made their first contribution in https://github.com/open-mmlab/mmocr/pull/1414
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc0...v1.0.0rc1
+
+## v1.0.0rc0 (09/01/2022)
+
+We are excited to announce the release of MMOCR 1.0.0rc0.
+MMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects.
+Built upon the new [training engine](https://github.com/open-mmlab/mmengine),
+MMOCR 1.x unifies the interfaces of datasets, models, evaluation, and visualization, and brings faster training and testing speed.
+
+### Highlights
+
+1. **New engines**. MMOCR 1.x is based on [MMEngine](https://github.com/open-mmlab/mmengine), which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.
+
+2. **Unified interfaces**. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.
+
+3. **Cross project calling**. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through `MMDetWrapper`. Check our documents for more details. More wrappers will be released in the future.
+
+4. **Stronger visualization**. We provide a series of useful tools which are mostly based on brand-new visualizers. As a result, it is more convenient for the users to explore the models and datasets now.
+
+5. **More documentation and tutorials**. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it [here](https://mmocr.readthedocs.io/en/dev-1.x/).
+
+### Breaking Changes
+
+We briefly list the major breaking changes here.
+We will update the [migration guide](../migration.md) to provide complete details and migration instructions.
+
+#### Dependencies
+
+- MMOCR 1.x relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab 2.0 projects. The dependencies of file IO and training are migrated from MMCV 1.x to MMEngine.
+- MMOCR 1.x relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMOCR 1.x still relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the package `mmcv` is the version that provides pre-built CUDA operators while `mmcv-lite` does not, and `mmcv-full` has been deprecated.
+
+#### Training and testing
+
+- MMOCR 1.x uses the Runner in [MMEngine](https://github.com/open-mmlab/mmengine) rather than the one in MMCV. The new Runner implements and unifies the building logic of dataset, model, evaluation, and visualizer. Therefore, MMOCR 1.x no longer maintains the building logic of those modules in `mmocr.train.apis` and `tools/train.py`. That code has been migrated into [MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py). Please refer to the [migration guide of Runner in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for more details.
+- The Runner in MMEngine also supports testing and validation. The testing scripts are also simplified and build the runner with logic similar to that of the training scripts.
+- The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the [migration guide of Hook in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/hook.html) for more details.
+- Learning rate and momentum scheduling has been migrated from `Hook` to `Parameter Scheduler` in MMEngine. Please refer to the [migration guide of Parameter Scheduler in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/param_scheduler.html) for more details.
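+
+As a rough illustration of this migration (the scheduler types and values below are made up for the example and follow the general MMEngine convention rather than any particular MMOCR config), the old hook-based `lr_config` is replaced by a list of parameter schedulers:
+
+```python
+# MMOCR 0.x / MMCV style (hook-based), shown for comparison:
+# lr_config = dict(policy='step', step=[8, 11])
+
+# MMOCR 1.x / MMEngine style: a list of parameter schedulers.
+param_scheduler = [
+    # Linear warmup over the first 500 iterations.
+    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
+    # Step decay at epochs 8 and 11.
+    dict(type='MultiStepLR', by_epoch=True, milestones=[8, 11], gamma=0.1),
+]
+```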
+
+#### Configs
+
+- The [Runner in MMEngine](https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/runner.py) uses a different config structure to ease the understanding of the components in the runner. Users can read the [config example of MMOCR](../user_guides/config.md) or refer to the [migration guide in MMEngine](https://mmengine.readthedocs.io/en/latest/migration/runner.html) for migration details.
+- The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. Please refer to the [user guides of config](../user_guides/config.md) for more details.
+
+#### Dataset
+
+The Dataset classes implemented in MMOCR 1.x all inherit from `BaseDetDataset`, which in turn inherits from the [BaseDataset in MMEngine](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/basedataset.html). There are several changes to the Dataset classes in MMOCR 1.x.
+
+- All datasets support serializing the data list to reduce memory usage when multiple dataloader workers are spawned to accelerate data loading.
+- The interfaces are changed accordingly.
+
+#### Data Transforms
+
+The data transforms in MMOCR 1.x all inherit from those in MMCV>=2.0.0rc0, which follow a new convention in OpenMMLab 2.0 projects.
+The changes are listed below:
+
+- The interfaces are also changed. Please refer to the [API Reference](https://mmocr.readthedocs.io/en/dev-1.x/).
+- The functionality of some data transforms (e.g., `Resize`) is decomposed into several transforms.
+- The same data transforms in different OpenMMLab 2.0 libraries share the same augmentation implementation and argument semantics, i.e., `Resize` in MMDet 3.x and MMOCR 1.x will resize the image in the exact same manner given the same arguments.
+
+#### Model
+
+The models in MMOCR 1.x all inherit from `BaseModel` in MMEngine, which defines a new convention for models in OpenMMLab 2.0 projects. Users can refer to the [tutorial of model](https://mmengine.readthedocs.io/en/latest/tutorials/model.html) in MMEngine for more details. Accordingly, there are several changes as follows:
+
+- The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMOCR 1.x. Specifically, all the input data in training and testing are packed into `inputs` and `data_samples`, where `inputs` contains model inputs like a list of image tensors, and `data_samples` contains other information of the current data sample such as ground truths and model predictions. In this way, different tasks in MMOCR 1.x can share the same input arguments, which makes the models more general and suitable for multi-task learning.
+- The model has a data preprocessor module, which is used to pre-process the input data of the model. In MMOCR 1.x, the data preprocessor usually does the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
+- The internal logic of the models has been changed. In MMOCR 0.x, models used `forward_train` and `simple_test` to deal with different forward logics. In MMOCR 1.x and OpenMMLab 2.0, the forward function has three modes: `loss`, `predict`, and `tensor` for training, inference, and tracing or other purposes, respectively. The forward function calls `self.loss()`, `self.predict()`, and `self._forward()` given the modes `loss`, `predict`, and `tensor`, respectively, as sketched below.
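+
+A minimal sketch of this dispatch (purely illustrative; the real `BaseModel` in MMEngine also handles data preprocessing, device placement, and more):
+
+```python
+from torch import nn
+
+
+class ToyOCRModel(nn.Module):
+    """Illustrative only: mimics the three forward modes described above."""
+
+    def loss(self, inputs, data_samples):
+        # Return a dict of losses for training.
+        return {'loss_dummy': inputs.sum() * 0}
+
+    def predict(self, inputs, data_samples):
+        # Return post-processed predictions attached to the data samples.
+        return data_samples
+
+    def _forward(self, inputs, data_samples=None):
+        # Return raw tensors, e.g. for tracing or deployment.
+        return inputs
+
+    def forward(self, inputs, data_samples=None, mode='tensor'):
+        if mode == 'loss':
+            return self.loss(inputs, data_samples)
+        if mode == 'predict':
+            return self.predict(inputs, data_samples)
+        if mode == 'tensor':
+            return self._forward(inputs, data_samples)
+        raise RuntimeError(f'Invalid mode "{mode}"')
+```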
+
+#### Evaluation
+
+MMOCR 1.x mainly implements corresponding metrics for each task, which are manipulated by [Evaluator](https://mmengine.readthedocs.io/en/latest/design/evaluator.html) to complete the evaluation.
+In addition, users can build an evaluator in MMOCR 1.x to conduct offline evaluation, i.e., evaluate predictions that may not be produced by MMOCR, as long as they follow our dataset conventions. More details can be found in the [Evaluation Tutorial](https://mmengine.readthedocs.io/en/latest/tutorials/evaluation.html) in MMEngine.
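+
+For instance, an evaluator is typically configured as one metric or a list of metrics in the config. The class names below are assumed from the dev-1.x metrics and are only meant as a sketch; check the API reference of your installed version for the exact metric set:
+
+```python
+# Text detection: hmean-iou style metric.
+val_evaluator = dict(type='HmeanIOUMetric')
+
+# Text recognition: several metrics can be combined into one evaluator.
+# val_evaluator = [
+#     dict(type='WordMetric', mode=['exact', 'ignore_case', 'ignore_case_symbol']),
+#     dict(type='CharMetric'),
+# ]
+```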
+
+#### Visualization
+
+The visualization functions from MMOCR 0.x are removed. Instead, in OpenMMLab 2.0 projects, we use [Visualizer](https://mmengine.readthedocs.io/en/latest/design/visualization.html) to visualize data. MMOCR 1.x implements `TextDetLocalVisualizer`, `TextRecogLocalVisualizer`, and `KIELocalVisualizer` to allow visualization of ground truths, model predictions, feature maps, etc., at any place, for the three tasks supported in MMOCR. It also supports dumping the visualization data to any external visualization backends such as Tensorboard and Wandb. Check our [Visualization Document](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/visualization.html) for more details.
+
+### Improvements
+
+- Most models enjoy a performance improvement from the new framework and refactor of data transforms. For example, in MMOCR 1.x, DBNet-R50 achieves **0.854** hmean score on ICDAR 2015, while the counterpart can only get **0.840** hmean score in MMOCR 0.x.
+- Support mixed precision training for most of the models. However, the [remaining models](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/train_test.html#mixed-precision-training) are not supported yet because the operators they use might not be representable in fp16. We will update the documentation and list the results of mixed precision training.
+
+### Ongoing changes
+
+1. Test-time augmentation: TTA was supported in MMOCR 0.x but is not implemented in this version yet due to the limited time slot. We will support it in the following releases with a new and simplified design.
+2. Inference interfaces: a unified inference interface will be supported in the future to ease the use of released models.
+3. Interfaces of useful tools that can be used in notebooks: more useful tools implemented in the `tools/` directory will get Python interfaces so that they can be used through notebooks and in downstream libraries.
+4. Documentation: we will add more design docs, tutorials, and migration guidance so that the community can deep dive into our new design, participate in the future development, and smoothly migrate downstream libraries to MMOCR 1.x.
diff --git a/mmocr-dev-1.x/docs/en/notes/changelog_v0.x.md b/mmocr-dev-1.x/docs/en/notes/changelog_v0.x.md
new file mode 100644
index 0000000000000000000000000000000000000000..6b087b1d55cc73b5051fcd4272d97afa0bcbf753
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/notes/changelog_v0.x.md
@@ -0,0 +1,904 @@
+# Changelog
+
+## 0.6.0 (05/05/2022)
+
+### Highlights
+
+1. A new recognition algorithm [MASTER](https://arxiv.org/abs/1910.02562) has been added into MMOCR, which was the championship solution for the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
+2. [DBNet++](https://arxiv.org/abs/2202.10304) has been released now! A new Adaptive Scale Fusion module has been equipped for feature enhancement. Benefiting from this, the new model achieved 2% better h-mean score than its predecessor on the ICDAR2015 dataset.
+3. Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo ([Det](https://mmocr.readthedocs.io/en/latest/datasets/det.html#) & [Recog](https://mmocr.readthedocs.io/en/latest/datasets/recog.html) ) to explore further information.
+4. To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the [doc](https://mmocr.readthedocs.io/en/latest/tools.html#convert-text-recognition-dataset-to-lmdb-format).
+5. Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. [Doc](https://mmocr.readthedocs.io/en/latest/tutorials/dataset_types.html#getting-mean-evaluation-scores)
+6. Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. ([Doc](https://mmocr.readthedocs.io/en/latest/tutorials/dataset_types.html#evaluation)) If too many results are flooding your text recognition training log, you can trim it by specifying a subset of metrics in the evaluation config. Check out the [Evaluation](https://mmocr.readthedocs.io/en/latest/tutorials/dataset_types.html#ocrdataset) section for details.
+7. MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit **Labelme** to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read [tutorial docs](https://mmocr.readthedocs.io/en/latest/tools.html) to get started.
+
+### Lmdb Dataset
+
+Reading images or labels from files can be slow when the amount of data is excessive, e.g. on a scale of millions. Besides, in academia, most of the scene text recognition datasets, including images and labels, are stored in lmdb format. To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline [LoadImageFromLMDB](https://github.com/open-mmlab/mmocr/blob/878383b9de8d0e598f31fbb844ffcb0c305deb8b/mmocr/datasets/pipelines/loading.py#L140).
+This section is intended to serve as a quick walkthrough for you to master this update and apply it to facilitate your research.
+
+#### Specifications
+
+To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:
+
+- The parameter describing the data volume of the dataset is `num-samples` instead of `total_number` (deprecated).
+- Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.
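+
+For illustration, the expected key layout could be produced with the `lmdb` package roughly as follows (a minimal sketch with made-up data, not the official converter; use the `lmdb_converter` tool described below for real datasets):
+
+```python
+import lmdb
+
+# Made-up samples: (image bytes, transcription) pairs.
+samples = [(b'<raw jpeg bytes of img1>', 'HELLO'),
+           (b'<raw jpeg bytes of img2>', 'WORLD')]
+
+env = lmdb.open('toy_lmdb', map_size=1 << 30)  # 1 GB map size
+with env.begin(write=True) as txn:
+    for idx, (img_bytes, text) in enumerate(samples, start=1):
+        # Keys follow the image-000000001 / label-000000001 convention.
+        txn.put(f'image-{idx:09d}'.encode(), img_bytes)
+        txn.put(f'label-{idx:09d}'.encode(), text.encode())
+    # Record the data volume under the `num-samples` key.
+    txn.put(b'num-samples', str(len(samples)).encode())
+```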
+
+#### Usage
+
+1. Use existing academic lmdb datasets if they meet the specifications; otherwise, use the tool provided by MMOCR to pack images & annotations into an lmdb dataset.
+
+- Previously, MMOCR had a function `txt2lmdb` (deprecated) that only supported converting labels to lmdb format. However, it is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility [lmdb_converter](https://github.com/open-mmlab/mmocr/blob/main/tools/data/utils/lmdb_converter.py) to convert recognition datasets with both images and labels to lmdb format.
+
+- Say that your recognition data in MMOCR's format are organized as follows. (See an example in [ocr_toy_dataset](https://github.com/open-mmlab/mmocr/tree/main/tests/data/ocr_toy_dataset)).
+
+ ```text
+ # Directory structure
+
+  ├── img_path
+  │   ├── img1.jpg
+  │   ├── img2.jpg
+  │   └── ...
+  └── label.txt (or label.jsonl)
+
+ # Annotation format
+
+ label.txt: img1.jpg HELLO
+ img2.jpg WORLD
+ ...
+
+ label.jsonl: {'filename':'img1.jpg', 'text':'HELLO'}
+ {'filename':'img2.jpg', 'text':'WORLD'}
+ ...
+ ```
+
+- Then pack these files up:
+
+ ```bash
+ python tools/data/utils/lmdb_converter.py {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
+ ```
+
+- Check out [tools.md](https://github.com/open-mmlab/mmocr/blob/main/docs/en/tools.md) for more details.
+
+2. The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:
+
+- Set parser as `LineJsonParser` and `file_format` as 'lmdb' in [dataset config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/recog_datasets/ST_MJ_train.py#L9)
+
+ ```python
+ # configs/_base_/recog_datasets/ST_MJ_train.py
+ train1 = dict(
+ type='OCRDataset',
+ img_prefix=train_img_prefix1,
+ ann_file=train_ann_file1,
+ loader=dict(
+ type='AnnFileLoader',
+ repeat=1,
+ file_format='lmdb',
+ parser=dict(
+ type='LineJsonParser',
+ keys=['filename', 'text'],
+ )),
+ pipeline=None,
+ test_mode=False)
+ ```
+
+- Use `LoadImageFromLMDB` in [pipeline](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/recog_pipelines/crnn_pipeline.py#L4):
+
+ ```python
+ # configs/_base_/recog_pipelines/crnn_pipeline.py
+ train_pipeline = [
+ dict(type='LoadImageFromLMDB', color_type='grayscale'),
+ ...
+ ```
+
+3. You are good to go! Start training and MMOCR will load data from your lmdb dataset.
+
+### New Features & Enhancements
+
+- Add analyze_logs in tools and its description in docs by @Y-M-Y in https://github.com/open-mmlab/mmocr/pull/899
+- Add LSVT Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/896
+- Add RCTW dataset converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/914
+- Support computing mean scores in UniformConcatDataset by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/981
+- Support loading images and labels from lmdb file by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/982
+- Add recog2lmdb and new toy dataset files by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/979
+- Add labelme converter for textdet and textrecog by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/972
+- Update CircleCI configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/918
+- Update Git Action by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/930
+- More customizable fields in dataloaders by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/933
+- Skip CIs when docs are modified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/941
+- Rename Github tests, fix ignored paths by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/946
+- Support latest MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/959
+- Support dynamic threshold range in eval_hmean by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/962
+- Update the version requirement of mmdet in docker by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/966
+- Replace `opencv-python-headless` with `opencv-python` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/970
+- Update Dataset Configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/980
+- Add SynthText dataset config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/983
+- Automatically report mean scores when applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/995
+- Add DBNet++ by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/973
+- Add MASTER by @JiaquanYe in https://github.com/open-mmlab/mmocr/pull/807
+- Allow choosing metrics to report in text recognition tasks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/989
+- Add HierText converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/948
+- Fix lint_only in CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/998
+
+### Bug Fixes
+
+- Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/927
+- Fix a deprecate warning about mmdet.datasets.pipelines.formating by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/944
+- Fix a Bug in ResNet plugin by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/967
+- revert a wrong setting in db_r18 cfg by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/978
+- Fix TotalText Anno version issue by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/945
+- Update installation step of `albumentations` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/984
+- Fix ImgAug transform by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/949
+- Fix GPG key error in CI and docker by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/988
+- update label.lmdb by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/991
+- correct meta key by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/926
+- Use new image by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/976
+- Fix Data Converter Issues by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/955
+
+### Docs
+
+- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/905
+- Fix the misleading description in test.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/908
+- Update recog.md for lmdb Generation by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/934
+- Add MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/954
+- Add wechat QR code to CN readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/960
+- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/947
+- Use QR codes from MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/971
+- Renew dataset_types.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/997
+
+### New Contributors
+
+- @Y-M-Y made their first contribution in https://github.com/open-mmlab/mmocr/pull/899
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.5.0...v0.6.0
+
+## 0.5.0 (31/03/2022)
+
+### Highlights
+
+1. MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain `.txt` file to JSON line format `.jsonl`, and then revise a few configurations to enable the `LineJsonParser` (a short conversion sketch is shown right after this list). For more information, please read our step-by-step [tutorial](https://mmocr.readthedocs.io/en/latest/tutorials/blank_recog.html).
+2. [Tesseract](https://github.com/tesseract-ocr/tesseract) is now available in MMOCR! While MMOCR is flexible enough to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in `mmocr.utils.ocr` by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object by `MMOCR(det='Tesseract', recog='Tesseract')`. Credit to @garvan2021
+3. We release data converters for **16** widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( [Det](https://mmocr.readthedocs.io/en/latest/datasets/det.html#) & [Recog](https://mmocr.readthedocs.io/en/latest/datasets/recog.html) ) to explore further information.
+4. Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who had actively participated in documentation translation!
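+
+As a quick illustration of the annotation conversion mentioned in the first highlight, a minimal sketch could look like the one below. It assumes that each line of the original `.txt` file is `<filename> <transcription>` with the transcription possibly containing spaces, and the file names here are hypothetical:
+
+```python
+import json
+
+# Hypothetical input/output paths; adapt them to your own dataset layout.
+with open('label.txt') as fin, open('label.jsonl', 'w') as fout:
+    for line in fin:
+        line = line.rstrip('\n')
+        if not line:
+            continue
+        # First token: image filename; the rest: transcription (may contain spaces).
+        filename, text = line.split(' ', 1)
+        fout.write(json.dumps({'filename': filename, 'text': text}) + '\n')
+```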
+
+### Migration Guide - ResNet
+
+Some refactoring processes are still going on. For text recognition models, we unified the [`ResNet-like` architectures](https://github.com/open-mmlab/mmocr/blob/72f945457324e700f0d14796dd10a51535c01a57/mmocr/models/textrecog/backbones/resnet.py) which are used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.
+
+#### Plugin
+
+- `Plugin` is a module category inherited from MMCV's implementation of `PLUGIN_LAYERS`, which can be inserted between each stage of ResNet or into a basicblock. You can find a simple implementation of plugin at [mmocr/models/textrecog/plugins/common.py](https://github.com/open-mmlab/mmocr/blob/72f945457324e700f0d14796dd10a51535c01a57/mmocr/models/textrecog/plugins/common.py), or click the button below.
+
+
+  **Plugin Example**
+
+ ```python
+ @PLUGIN_LAYERS.register_module()
+ class Maxpool2d(nn.Module):
+ """A wrapper around nn.Maxpool2d().
+
+ Args:
+ kernel_size (int or tuple(int)): Kernel size for max pooling layer
+ stride (int or tuple(int)): Stride for max pooling layer
+ padding (int or tuple(int)): Padding for pooling layer
+ """
+
+ def __init__(self, kernel_size, stride, padding=0, **kwargs):
+ super(Maxpool2d, self).__init__()
+ self.model = nn.MaxPool2d(kernel_size, stride, padding)
+
+ def forward(self, x):
+ """
+ Args:
+ x (Tensor): Input feature map
+
+ Returns:
+ Tensor: The tensor after Maxpooling layer.
+ """
+ return self.model(x)
+ ```
+
+
+
+#### Stage-wise Plugins
+
+- ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of basicblocks. For each stage, we provide two ports to insert stage-wise plugins by giving `plugins` parameters in ResNet.
+
+ ```text
+ [port1: before stage] ---> [stage] ---> [port2: after stage]
+ ```
+
+- E.g., take a ResNet with four stages as an example. Suppose we want to insert an additional convolution layer before each stage, and another convolution layer after stages 1, 2 and 4. Then you can define the special ResNet18 like this:
+
+ ```python
+  resnet18_special = ResNet(
+ # for simplicity, some required
+ # parameters are omitted
+ plugins=[
+ dict(
+ cfg=dict(
+ type='ConvModule',
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ stages=(True, True, True, True),
+              position='before_stage'),
+ dict(
+ cfg=dict(
+ type='ConvModule',
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ stages=(True, True, False, True),
+ position='after_stage')
+ ])
+ ```
+
+- You can also insert more than one plugin in each port and those plugins will be executed in order. Let's take ResNet in [MASTER](https://arxiv.org/abs/1910.02562) as an example:
+
+
+  **Multiple Plugins Example**
+
+  - The ResNet in MASTER is based on ResNet31, and a module named `GCAModule` is used after each stage. The `GCAModule` is inserted before the stage-wise convolution layer in ResNet31, so there are two plugins at the `after_stage` port at the same time.
+
+ ```python
+ resnet_master = ResNet(
+ # for simplicity, some required
+ # parameters are omitted
+ plugins=[
+ dict(
+ cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
+ stages=(True, True, False, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
+ stages=(False, False, True, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
+ stages=[True, True, True, True],
+ position='after_stage'),
+ dict(
+ cfg=dict(
+ type='ConvModule',
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ stages=(True, True, True, True),
+ position='after_stage')
+ ])
+
+ ```
+
+
+
+ - In each plugin, we will pass two parameters (`in_channels`, `out_channels`) to support operations that need the information of current channels.
+
+#### Block-wise Plugin (Experimental)
+
+- We also refactored the `BasicBlock` used in ResNet. Now it can be customized with block-wise plugins. Check [here](https://github.com/open-mmlab/mmocr/blob/72f945457324e700f0d14796dd10a51535c01a57/mmocr/models/textrecog/layers/conv_layer.py) for more details.
+
+- BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports to insert plugins.
+
+ ```text
+ [port1: before_conv1] ---> [conv1] --->
+ [port2: after_conv1] ---> [conv2] --->
+ [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
+ ```
+
+- In each plugin, we will pass a parameter `in_channels` to support operations that need the information of current channels.
+
+- E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:
+
+
+  **Block-wise Plugin Example**
+
+ ```python
+ resnet_31 = ResNet(
+ in_channels=3,
+ stem_channels=[64, 128],
+ block_cfgs=dict(type='BasicBlock'),
+ arch_layers=[1, 2, 5, 3],
+ arch_channels=[256, 256, 512, 512],
+ strides=[1, 1, 1, 1],
+ plugins=[
+ dict(
+ cfg=dict(type='Maxpool2d',
+ kernel_size=2,
+ stride=(2, 2)),
+ stages=(True, True, False, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(type='Maxpool2d',
+ kernel_size=(2, 1),
+ stride=(2, 1)),
+ stages=(False, False, True, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(
+ type='ConvModule',
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ stages=(True, True, True, True),
+ position='after_stage')
+ ])
+ ```
+
+
+
+#### Full Examples
+
+
+**ResNet without plugins**
+
+- ResNet45 is used in ASTER and ABINet without any plugins.
+
+ ```python
+ resnet45_aster = ResNet(
+ in_channels=3,
+ stem_channels=[64, 128],
+ block_cfgs=dict(type='BasicBlock', use_conv1x1='True'),
+ arch_layers=[3, 4, 6, 6, 3],
+ arch_channels=[32, 64, 128, 256, 512],
+ strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])
+
+ resnet45_abi = ResNet(
+ in_channels=3,
+ stem_channels=32,
+ block_cfgs=dict(type='BasicBlock', use_conv1x1='True'),
+ arch_layers=[3, 4, 6, 6, 3],
+ arch_channels=[32, 64, 128, 256, 512],
+ strides=[2, 1, 2, 1, 1])
+ ```
+
+
+
+**ResNet with plugins**
+
+- ResNet31 is a typical architecture that uses stage-wise plugins. Before each of the first three stages, a Maxpooling layer is used. After each stage, a convolution layer with BN and ReLU is used.
+
+ ```python
+ resnet_31 = ResNet(
+ in_channels=3,
+ stem_channels=[64, 128],
+ block_cfgs=dict(type='BasicBlock'),
+ arch_layers=[1, 2, 5, 3],
+ arch_channels=[256, 256, 512, 512],
+ strides=[1, 1, 1, 1],
+ plugins=[
+ dict(
+ cfg=dict(type='Maxpool2d',
+ kernel_size=2,
+ stride=(2, 2)),
+ stages=(True, True, False, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(type='Maxpool2d',
+ kernel_size=(2, 1),
+ stride=(2, 1)),
+ stages=(False, False, True, False),
+ position='before_stage'),
+ dict(
+ cfg=dict(
+ type='ConvModule',
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ stages=(True, True, True, True),
+ position='after_stage')
+ ])
+ ```
+
+
+
+### Migration Guide - Dataset Annotation Loader
+
+The annotation loaders, `LmdbLoader` and `HardDiskLoader`, are unified into `AnnFileLoader` for a more consistent design and wider support for different file formats and storage backends. `AnnFileLoader` can load annotations from the `disk` (default), `http` and `petrel` backends, and parse annotations in `txt` or `lmdb` format. `LmdbLoader` and `HardDiskLoader` are deprecated, and users are recommended to modify their configs to use the new `AnnFileLoader`. Users can migrate their legacy `HardDiskLoader` configs referring to the following example:
+
+```python
+# Legacy config
+train = dict(
+ type='OCRDataset',
+ ...
+ loader=dict(
+ type='HardDiskLoader',
+ ...))
+
+# Suggested config
+train = dict(
+ type='OCRDataset',
+ ...
+ loader=dict(
+ type='AnnFileLoader',
+ file_storage_backend='disk',
+ file_format='txt',
+ ...))
+```
+
+Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoader` is strongly recommended.
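+
+For example, the lmdb counterpart of the suggested config above would simply switch the file format (illustrative; keep the rest of your dataset settings unchanged):
+
+```python
+# Suggested config for lmdb annotations
+train = dict(
+    type='OCRDataset',
+    ...
+    loader=dict(
+        type='AnnFileLoader',
+        file_storage_backend='disk',
+        file_format='lmdb',
+        ...))
+```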
+
+### New Features & Enhancements
+
+- Update mmcv install by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/775
+- Upgrade isort by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/771
+- Automatically infer device for inference if not specified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/781
+- Add open-mmlab precommit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/787
+- Add windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/790
+- Add CurvedSyntext150k Converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/719
+- Add FUNSD Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/808
+- Support loading annotation file with petrel/http backend by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/793
+- Support different seeds on different ranks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/820
+- Support json in recognition converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/844
+- Add args and docs for multi-machine training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/849
+- Add warning info for LineStrParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/850
+- Deploy openmmlab-bot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/876
+- Add Tesserocr Inference by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/814
+- Add LV Dataset Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/871
+- Add SROIE Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/810
+- Add NAF Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/815
+- Add DeText Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/818
+- Add IMGUR Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/825
+- Add ILST Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/833
+- Add KAIST Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/835
+- Add IC11 (Born-digital Images) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/857
+- Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/861
+- Add BID Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/862
+- Add Vintext Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/864
+- Add MTWI Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/867
+- Add COCO Text v2 Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/872
+- Add ReCTS Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/892
+- Refactor ResNets by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/809
+
+### Bug Fixes
+
+- Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in https://github.com/open-mmlab/mmocr/pull/763
+- Update mmdet version limit by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/773
+- Minimum version requirement of albumentations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/769
+- Disable worker in the dataloader of gpu unit test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/780
+- Standardize the type of torch.device in ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/800
+- Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/685
+- Add num_classes to configs of ABINet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/805
+- Support loading space character from dict file by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/854
+- Description in tools/data/utils/txt2lmdb.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/870
+- ignore_index in SARLoss by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/869
+- Fix a bug that may cause inplace operation error by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/884
+- Use hyphen instead of underscores in script args by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/890
+
+### Docs
+
+- Add deprecation message for deploy tools by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/801
+- Reorganizing OpenMMLab projects in readme by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/806
+- Add demo/README_zh.md by @EighteenSprings in https://github.com/open-mmlab/mmocr/pull/802
+- Add detailed version requirement table by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/778
+- Correct misleading section title in training.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/819
+- Update README_zh-CN document URL by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/823
+- translate testing.md. by @yangrisheng in https://github.com/open-mmlab/mmocr/pull/822
+- Fix confused description for load-from and resume-from by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/842
+- Add documents getting_started in docs/zh by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/841
+- Add the model serving translation document by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/845
+- Update docs about installation on Windows by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/852
+- Update tutorial notebook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/853
+- Update Instructions for New Data Converters by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/900
+- Brief installation instruction in README by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/897
+- update doc for ILST, VinText, BID by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/902
+- Fix typos in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/903
+- Recog dataset doc by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/893
+- Reorganize the directory structure section in det.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/894
+
+### New Contributors
+
+- @GPhilo made their first contribution in https://github.com/open-mmlab/mmocr/pull/763
+- @xinke-wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/801
+- @EighteenSprings made their first contribution in https://github.com/open-mmlab/mmocr/pull/802
+- @BeyondYourself made their first contribution in https://github.com/open-mmlab/mmocr/pull/823
+- @yangrisheng made their first contribution in https://github.com/open-mmlab/mmocr/pull/822
+- @Mountchicken made their first contribution in https://github.com/open-mmlab/mmocr/pull/844
+- @garvan2021 made their first contribution in https://github.com/open-mmlab/mmocr/pull/814
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.4.1...v0.5.0
+
+## v0.4.1 (27/01/2022)
+
+### Highlights
+
+1. Visualizing edge weights in OpenSet KIE is now supported! https://github.com/open-mmlab/mmocr/pull/677
+2. Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. https://github.com/open-mmlab/mmocr/pull/757
+3. Now you can use CPU to train/debug your model! https://github.com/open-mmlab/mmocr/pull/752
+4. We have fixed a severe bug that prevented users from calling `mmocr.apis.test` with our pre-built wheels. https://github.com/open-mmlab/mmocr/pull/667
+
+### New Features & Enhancements
+
+- Show edge score for openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/677
+- Download flake8 from github as pre-commit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/695
+- Deprecate the support for 'python setup.py test' by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/722
+- Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/721
+- Extend ctw1500 converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/729
+- Extend totaltext converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/728
+- Speed up training by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/739
+- Add setup multi-processing both in train and test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/757
+- Support CPU training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/752
+- Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/756
+- Remove unnecessary custom_import from test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/758
+
+### Bug Fixes
+
+- Fix satrn onnxruntime test by @AllentDan in https://github.com/open-mmlab/mmocr/pull/679
+- Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/675
+- Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/667
+- Fix a bug for sar decoder when bi-rnn is used by @MhLiao in https://github.com/open-mmlab/mmocr/pull/690
+- Fix opencv version to avoid some bugs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/694
+- Fix py39 ci error by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/707
+- Update visualize.py by @TommyZihao in https://github.com/open-mmlab/mmocr/pull/715
+- Fix link of config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/726
+- Use yaml.safe_load instead of load by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/753
+- Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/754
+
+### Docs
+
+- Fix recog.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/674
+- Add config tutorial by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/683
+- Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/692
+- Add recog & det model summary by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/693
+- Update docs link by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/710
+- add pull request template.md by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/711
+- Add website links to readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/731
+- update readme according to standard by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/742
+
+### New Contributors
+
+- @MhLiao made their first contribution in https://github.com/open-mmlab/mmocr/pull/690
+- @TommyZihao made their first contribution in https://github.com/open-mmlab/mmocr/pull/715
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.4.0...v0.4.1
+
+## v0.4.0 (15/12/2021)
+
+### Highlights
+
+1. We release a new text recognition model - [ABINet](https://arxiv.org/pdf/2103.06495.pdf) (CVPR 2021, Oral). With its dedicated model design and useful data augmentation transforms, ABINet can achieve the best performance on irregular text recognition tasks. [Check it out!](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition)
+2. We are also working hard to fulfill the requests from our community.
+   [OpenSet KIE](https://mmocr.readthedocs.io/en/latest/kie_models.html#wildreceiptopenset) is one of the achievements, which extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide
+   a demo script to convert WildReceipt to the open-set domain, though it cannot
+   take full advantage of the OpenSet format. For more information, please read our
+ [tutorial](https://mmocr.readthedocs.io/en/latest/tutorials/kie_closeset_openset.html).
+3. APIs of models can be exposed through TorchServe. [Docs](https://mmocr.readthedocs.io/en/latest/model_serving.html)
+
+### Breaking Changes & Migration Guide
+
+#### Postprocessor
+
+Some refactoring processes are still going on. For all text detection models, we unified their `decode` implementations into a new module category, `POSTPROCESSOR`, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the `text_repr_type` argument in `bbox_head` is deprecated and will be removed in the future release.
+
+**Migration Guide**: Find a similar line from detection model's config:
+
+```
+text_repr_type=xxx,
+```
+
+And replace it with
+
+```
+postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx)),
+```
+
+Take a snippet of PANet's config as an example. Before the change, its config for `bbox_head` looks like:
+
+```
+ bbox_head=dict(
+ type='PANHead',
+ text_repr_type='poly',
+ in_channels=[128, 128, 128, 128],
+ out_channels=6,
+ module_loss=dict(type='PANModuleLoss')),
+```
+
+Afterwards:
+
+```
+ bbox_head=dict(
+ type='PANHead',
+ in_channels=[128, 128, 128, 128],
+ out_channels=6,
+ module_loss=dict(type='PANModuleLoss'),
+ postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),
+```
+
+There are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in `mmocr/models/textdet/postprocess` or through our [api docs](https://mmocr.readthedocs.io/en/latest/api.html#textdet-postprocess).
+
+#### New Config Structure
+
+We reorganized the `configs/` directory by extracting reusable sections into `configs/_base_`. Now the directory tree of `configs/_base_` is organized as follows:
+
+```
+_base_
+├── det_datasets
+├── det_models
+├── det_pipelines
+├── recog_datasets
+├── recog_models
+├── recog_pipelines
+└── schedules
+
+Most model configs now make full use of base configs, which makes the overall structure clearer and facilitates fair
+comparison across models. Despite the seemingly significant hierarchical difference, **these changes would not break the backward compatibility** as the names of model configs remain the same.
+
+### New Features
+
+- Support openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/498
+- Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in https://github.com/open-mmlab/mmocr/pull/497
+- Support Chinese for kie show result by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/464
+- Add TorchServe support for text detection and recognition by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/522
+- Save filename in text detection test results by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/570
+- Add codespell pre-commit hook and fix typos by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/520
+- Avoid duplicate placeholder docs in CN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/582
+- Save results to json file for kie. by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/589
+- Add SAR_CN to ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/579
+- mim extension for windows by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/641
+- Support multiple pipelines for different datasets by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/657
+- ABINet Framework by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/651
+
+### Refactoring
+
+- Refactor textrecog config structure by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/617
+- Refactor text detection config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/626
+- refactor transformer modules by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/618
+- refactor textdet postprocess by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/640
+
+### Docs
+
+- C++ example section by @apiaccess21 in https://github.com/open-mmlab/mmocr/pull/593
+- install.md Chinese section by @A465539338 in https://github.com/open-mmlab/mmocr/pull/364
+- Add Chinese Translation of deployment.md. by @fatfishZhao in https://github.com/open-mmlab/mmocr/pull/506
+- Fix a model link and add the metafile for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/473
+- Improve docs style by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/474
+- Enhancement & sync Chinese docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/492
+- TorchServe docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/539
+- Update docs menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/564
+- Docs for KIE CloseSet & OpenSet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/573
+- Fix broken links by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/576
+- Docstring for text recognition models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/562
+- Add MMFlow & MIM by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/597
+- Add MMFewShot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/621
+- Update model readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/604
+- Add input size check to model_inference by @mpena-vina in https://github.com/open-mmlab/mmocr/pull/633
+- Docstring for textdet models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/561
+- Add MMHuman3D in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/644
+- Use shared menu from theme instead by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/655
+- Refactor docs structure by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/662
+- Docs fix by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/664
+
+### Enhancements
+
+- Use bounding box around polygon instead of within polygon by @alexander-soare in https://github.com/open-mmlab/mmocr/pull/469
+- Add CITATION.cff by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/476
+- Add py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/475
+- update model-index.yml by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/484
+- Use container in CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/502
+- CircleCI Setup by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/611
+- Remove unnecessary custom_import from train.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/603
+- Change the upper version of mmcv to 1.5.0 by @zhouzaida in https://github.com/open-mmlab/mmocr/pull/628
+- Update CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/631
+- Pass custom_hooks to MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/609
+- Skip CI when some specific files were changed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/642
+- Add markdown linter in pre-commit hook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/643
+- Use shape from loaded image by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/652
+- Cancel previous runs that are not completed by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/666
+
+### Bug Fixes
+
+- Modify algorithm "sar" weights path in metafile by @ShoupingShan in https://github.com/open-mmlab/mmocr/pull/581
+- Fix Cuda CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/472
+- Fix image export in test.py for KIE models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/486
+- Allow invalid polygons in intersection and union by default by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/471
+- Update checkpoints' links for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/518
+- Fix converting to onnx bug because of changing key from img_shape to resize_shape by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/523
+- Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/540
+- Fix paper field in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/550
+- Unify recognition task names in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/548
+- Fix py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/563
+- Always map location to cpu when loading checkpoint by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/567
+- Fix wrong model builder in recog_test_imgs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/574
+- Improve dbnet r50 by fixing img std by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/578
+- Fix resource warning: unclosed file by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/577
+- Fix bug that same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/587
+- Keep original texts for kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/588
+- Fix random seed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/600
+- Fix DBNet_r50 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/625
+- Change SBC case to DBC case by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/632
+- Fix kie demo by @innerlee in https://github.com/open-mmlab/mmocr/pull/610
+- fix type check by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/650
+- Remove depreciated image validator in totaltext converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/661
+- Fix change locals() dict by @Fei-Wang in https://github.com/open-mmlab/mmocr/pull/663
+- fix #614: textsnake targets by @HolyCrap96 in https://github.com/open-mmlab/mmocr/pull/660
+
+### New Contributors
+
+- @alexander-soare made their first contribution in https://github.com/open-mmlab/mmocr/pull/469
+- @A465539338 made their first contribution in https://github.com/open-mmlab/mmocr/pull/364
+- @fatfishZhao made their first contribution in https://github.com/open-mmlab/mmocr/pull/506
+- @baudm made their first contribution in https://github.com/open-mmlab/mmocr/pull/497
+- @ShoupingShan made their first contribution in https://github.com/open-mmlab/mmocr/pull/581
+- @apiaccess21 made their first contribution in https://github.com/open-mmlab/mmocr/pull/593
+- @zhouzaida made their first contribution in https://github.com/open-mmlab/mmocr/pull/628
+- @mpena-vina made their first contribution in https://github.com/open-mmlab/mmocr/pull/633
+- @Fei-Wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/663
+
+**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.3.0...0.4.0
+
+## v0.3.0 (25/8/2021)
+
+### Highlights
+
+1. We add a new text recognition model -- SATRN! Its pretrained checkpoint achieves the best performance among the provided text recognition models. A lighter version of SATRN is also released, which obtains ~98% of the original model's performance at a size of only 45 MB. ([@2793145003](https://github.com/2793145003)) [#405](https://github.com/open-mmlab/mmocr/pull/405)
+2. Improve the demo script, `ocr.py`, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. ([@samayala22](https://github.com/samayala22), [@manjrekarom](https://github.com/manjrekarom)) [#371](https://github.com/open-mmlab/mmocr/pull/371), [#386](https://github.com/open-mmlab/mmocr/pull/386), [#400](https://github.com/open-mmlab/mmocr/pull/400), [#374](https://github.com/open-mmlab/mmocr/pull/374), [#428](https://github.com/open-mmlab/mmocr/pull/428)
+3. Our documentation is reorganized into a clearer structure. More useful contents are on the way! [#409](https://github.com/open-mmlab/mmocr/pull/409), [#454](https://github.com/open-mmlab/mmocr/pull/454)
+4. The requirement of `Polygon3` is removed since this project is no longer maintained or distributed. All its references have been replaced with equivalent substitutions in `shapely`. [#448](https://github.com/open-mmlab/mmocr/pull/448)
+
+### Breaking Changes & Migration Guide
+
+1. Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs [#382](https://github.com/open-mmlab/mmocr/pull/382)
+2. MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. ([#436](https://github.com/open-mmlab/mmocr/pull/436)) The modified hierarchical structure of the model registries is now organized as follows.
+
+```text
+mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
+mmcv.MODELS -> mmdet.NECKS -> NECKS
+mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
+mmcv.MODELS -> mmdet.HEADS -> HEADS
+mmcv.MODELS -> mmdet.LOSSES -> LOSSES
+mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
+mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
+mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS
+```
+
+To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including `build_detectors`) from `mmdet.models.builder` to `mmocr.models.builder`. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add `mmdet.` or `mmcv.` prefix to its name to inform the model builder of the right namespace to work on.
+
+Interested users may check out [MMCV's tutorial on Registry](https://mmcv.readthedocs.io/en/latest/understand_mmcv/registry.html) for in-depth explanations on its mechanism.
+
+### New Features
+
+- Automatically replace SyncBN with BN for inference [#420](https://github.com/open-mmlab/mmocr/pull/420), [#453](https://github.com/open-mmlab/mmocr/pull/453)
+- Support batch inference for CRNN and SegOCR [#407](https://github.com/open-mmlab/mmocr/pull/407)
+- Support exporting documentation in pdf or epub format [#406](https://github.com/open-mmlab/mmocr/pull/406)
+- Support `persistent_workers` option in data loader [#459](https://github.com/open-mmlab/mmocr/pull/459)
+
+### Bug Fixes
+
+- Remove deprecated key in kie_test_imgs.py [#381](https://github.com/open-mmlab/mmocr/pull/381)
+- Fix dimension mismatch in batch testing/inference of DBNet [#383](https://github.com/open-mmlab/mmocr/pull/383)
+- Fix the problem of dice loss which stays at 1 with an empty target given [#408](https://github.com/open-mmlab/mmocr/pull/408)
+- Fix a wrong link in ocr.py ([@naarkhoo](https://github.com/naarkhoo)) [#417](https://github.com/open-mmlab/mmocr/pull/417)
+- Fix undesired assignment to "pretrained" in test.py [#418](https://github.com/open-mmlab/mmocr/pull/418)
+- Fix a problem in polygon generation of DBNet [#421](https://github.com/open-mmlab/mmocr/pull/421), [#443](https://github.com/open-mmlab/mmocr/pull/443)
+- Skip invalid annotations in totaltext_converter [#438](https://github.com/open-mmlab/mmocr/pull/438)
+- Add zero division handler in poly utils, remove Polygon3 [#448](https://github.com/open-mmlab/mmocr/pull/448)
+
+### Improvements
+
+- Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to [@gen-ko](https://github.com/gen-ko) who has re-distributed this package!)
+- Support MIM [#394](https://github.com/open-mmlab/mmocr/pull/394)
+- Add tests for PyTorch 1.9 in CI [#401](https://github.com/open-mmlab/mmocr/pull/401)
+- Enable fullscreen layout in readthedocs [#413](https://github.com/open-mmlab/mmocr/pull/413)
+- General documentation enhancement [#395](https://github.com/open-mmlab/mmocr/pull/395)
+- Update version checker [#427](https://github.com/open-mmlab/mmocr/pull/427)
+- Add copyright info [#439](https://github.com/open-mmlab/mmocr/pull/439)
+- Update citation information [#440](https://github.com/open-mmlab/mmocr/pull/440)
+
+### Contributors
+
+We thank [@2793145003](https://github.com/2793145003), [@samayala22](https://github.com/samayala22), [@manjrekarom](https://github.com/manjrekarom), [@naarkhoo](https://github.com/naarkhoo), [@gen-ko](https://github.com/gen-ko), [@duanjiaqi](https://github.com/duanjiaqi), [@gaotongxiao](https://github.com/gaotongxiao), [@cuhk-hbsun](https://github.com/cuhk-hbsun), [@innerlee](https://github.com/innerlee), [@wdsd641417025](https://github.com/wdsd641417025) for their contribution to this release!
+
+## v0.2.1 (20/7/2021)
+
+### Highlights
+
+1. Upgrade to use MMCV-full **>= 1.3.8** and MMDetection **>= 2.13.0** for latest features
+2. Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) [#278](https://github.com/open-mmlab/mmocr/pull/278), [#291](https://github.com/open-mmlab/mmocr/pull/291), [#300](https://github.com/open-mmlab/mmocr/pull/300), [#328](https://github.com/open-mmlab/mmocr/pull/328)
+3. Unified parameter initialization method which uses init_cfg in config files [#365](https://github.com/open-mmlab/mmocr/pull/365)
+
+### New Features
+
+- Support TextOCR dataset [#293](https://github.com/open-mmlab/mmocr/pull/293)
+- Support Total-Text dataset [#266](https://github.com/open-mmlab/mmocr/pull/266), [#273](https://github.com/open-mmlab/mmocr/pull/273), [#357](https://github.com/open-mmlab/mmocr/pull/357)
+- Support grouping text detection box into lines [#290](https://github.com/open-mmlab/mmocr/pull/290), [#304](https://github.com/open-mmlab/mmocr/pull/304)
+- Add benchmark_processing script that benchmarks data loading process [#261](https://github.com/open-mmlab/mmocr/pull/261)
+- Add SynthText preprocessor for text recognition models [#351](https://github.com/open-mmlab/mmocr/pull/351), [#361](https://github.com/open-mmlab/mmocr/pull/361)
+- Support batch inference during testing [#310](https://github.com/open-mmlab/mmocr/pull/310)
+- Add user-friendly OCR inference script [#366](https://github.com/open-mmlab/mmocr/pull/366)
+
+### Bug Fixes
+
+- Fix improper class ignorance in SDMGR Loss [#221](https://github.com/open-mmlab/mmocr/pull/221)
+- Fix potential numerical zero division error in DRRG [#224](https://github.com/open-mmlab/mmocr/pull/224)
+- Fix installing requirements with pip and mim [#242](https://github.com/open-mmlab/mmocr/pull/242)
+- Fix dynamic input error of DBNet [#269](https://github.com/open-mmlab/mmocr/pull/269)
+- Fix space parsing error in LineStrParser [#285](https://github.com/open-mmlab/mmocr/pull/285)
+- Fix textsnake decode error [#264](https://github.com/open-mmlab/mmocr/pull/264)
+- Correct isort setup [#288](https://github.com/open-mmlab/mmocr/pull/288)
+- Fix a bug in SDMGR config [#316](https://github.com/open-mmlab/mmocr/pull/316)
+- Fix kie_test_img for KIE nonvisual [#319](https://github.com/open-mmlab/mmocr/pull/319)
+- Fix metafiles [#342](https://github.com/open-mmlab/mmocr/pull/342)
+- Fix different device problem in FCENet [#334](https://github.com/open-mmlab/mmocr/pull/334)
+- Ignore improper trailing empty characters in annotation files [#358](https://github.com/open-mmlab/mmocr/pull/358)
+- Docs fixes [#247](https://github.com/open-mmlab/mmocr/pull/247), [#255](https://github.com/open-mmlab/mmocr/pull/255), [#265](https://github.com/open-mmlab/mmocr/pull/265), [#267](https://github.com/open-mmlab/mmocr/pull/267), [#268](https://github.com/open-mmlab/mmocr/pull/268), [#270](https://github.com/open-mmlab/mmocr/pull/270), [#276](https://github.com/open-mmlab/mmocr/pull/276), [#287](https://github.com/open-mmlab/mmocr/pull/287), [#330](https://github.com/open-mmlab/mmocr/pull/330), [#355](https://github.com/open-mmlab/mmocr/pull/355), [#367](https://github.com/open-mmlab/mmocr/pull/367)
+- Fix NRTR config [#356](https://github.com/open-mmlab/mmocr/pull/356), [#370](https://github.com/open-mmlab/mmocr/pull/370)
+
+### Improvements
+
+- Add backend for resizeocr [#244](https://github.com/open-mmlab/mmocr/pull/244)
+- Skip image processing pipelines in SDMGR novisual [#260](https://github.com/open-mmlab/mmocr/pull/260)
+- Speedup DBNet [#263](https://github.com/open-mmlab/mmocr/pull/263)
+- Update mmcv installation method in workflow [#323](https://github.com/open-mmlab/mmocr/pull/323)
+- Add part of Chinese documentations [#353](https://github.com/open-mmlab/mmocr/pull/353), [#362](https://github.com/open-mmlab/mmocr/pull/362)
+- Add support for ConcatDataset with two workflows [#348](https://github.com/open-mmlab/mmocr/pull/348)
+- Add list_from_file and list_to_file utils [#226](https://github.com/open-mmlab/mmocr/pull/226)
+- Speed up sort_vertex [#239](https://github.com/open-mmlab/mmocr/pull/239)
+- Support distributed evaluation of KIE [#234](https://github.com/open-mmlab/mmocr/pull/234)
+- Add pretrained FCENet on IC15 [#258](https://github.com/open-mmlab/mmocr/pull/258)
+- Support CPU for OCR demo [#227](https://github.com/open-mmlab/mmocr/pull/227)
+- Avoid extra image pre-processing steps [#375](https://github.com/open-mmlab/mmocr/pull/375)
+
+## v0.2.0 (18/5/2021)
+
+### Highlights
+
+1. Add the NER approach Bert-softmax (NAACL'2019)
+2. Add the text detection method DRRG (CVPR'2020)
+3. Add the text detection method FCENet (CVPR'2021)
+4. Improve ease of use by adding an end-to-end text detection and recognition demo, and a Colab online demo.
+5. Simplify the installation.
+
+### New Features
+
+- Add Bert-softmax for Ner task [#148](https://github.com/open-mmlab/mmocr/pull/148)
+- Add DRRG [#189](https://github.com/open-mmlab/mmocr/pull/189)
+- Add FCENet [#133](https://github.com/open-mmlab/mmocr/pull/133)
+- Add end-to-end demo [#105](https://github.com/open-mmlab/mmocr/pull/105)
+- Support batch inference [#86](https://github.com/open-mmlab/mmocr/pull/86) [#87](https://github.com/open-mmlab/mmocr/pull/87) [#178](https://github.com/open-mmlab/mmocr/pull/178)
+- Add TPS preprocessor for text recognition [#117](https://github.com/open-mmlab/mmocr/pull/117) [#135](https://github.com/open-mmlab/mmocr/pull/135)
+- Add demo documentation [#151](https://github.com/open-mmlab/mmocr/pull/151) [#166](https://github.com/open-mmlab/mmocr/pull/166) [#168](https://github.com/open-mmlab/mmocr/pull/168) [#170](https://github.com/open-mmlab/mmocr/pull/170) [#171](https://github.com/open-mmlab/mmocr/pull/171)
+- Add checkpoint for Chinese recognition [#156](https://github.com/open-mmlab/mmocr/pull/156)
+- Add metafile [#175](https://github.com/open-mmlab/mmocr/pull/175) [#176](https://github.com/open-mmlab/mmocr/pull/176) [#177](https://github.com/open-mmlab/mmocr/pull/177) [#182](https://github.com/open-mmlab/mmocr/pull/182) [#183](https://github.com/open-mmlab/mmocr/pull/183)
+- Add support for numpy array inference [#74](https://github.com/open-mmlab/mmocr/pull/74)
+
+### Bug Fixes
+
+- Fix the duplicated point bug due to transform for textsnake [#130](https://github.com/open-mmlab/mmocr/pull/130)
+- Fix CTC loss NaN [#159](https://github.com/open-mmlab/mmocr/pull/159)
+- Fix error raised if result is empty in demo [#144](https://github.com/open-mmlab/mmocr/pull/141)
+- Fix results missing if one image has a large number of boxes [#98](https://github.com/open-mmlab/mmocr/pull/98)
+- Fix package missing in dockerfile [#109](https://github.com/open-mmlab/mmocr/pull/109)
+
+### Improvements
+
+- Simplify installation procedure via removing compiling [#188](https://github.com/open-mmlab/mmocr/pull/188)
+- Speed up panet post processing so that it can detect dense texts [#188](https://github.com/open-mmlab/mmocr/pull/188)
+- Add zh-CN README [#70](https://github.com/open-mmlab/mmocr/pull/70) [#95](https://github.com/open-mmlab/mmocr/pull/95)
+- Support windows [#89](https://github.com/open-mmlab/mmocr/pull/89)
+- Add Colab [#147](https://github.com/open-mmlab/mmocr/pull/147) [#199](https://github.com/open-mmlab/mmocr/pull/199)
+- Add 1-step installation using conda environment [#193](https://github.com/open-mmlab/mmocr/pull/193) [#194](https://github.com/open-mmlab/mmocr/pull/194) [#195](https://github.com/open-mmlab/mmocr/pull/195)
+
+## v0.1.0 (7/4/2021)
+
+### Highlights
+
+- MMOCR is released.
+
+### Main Features
+
+- Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
+- For text detection, support both single-step (`PSENet`, `PANet`, `DBNet`, `TextSnake`) and two-step (`MaskRCNN`) methods.
+- For text recognition, support the CTC-loss based method `CRNN`; encoder-decoder (with attention) based methods `SAR` and `RobustScanner`; the segmentation-based method `SegOCR`; and the Transformer-based method `NRTR`.
+- For key information extraction, support GCN based method `SDMG-R`.
+- Provide checkpoints and log files for all of the methods above.
diff --git a/mmocr-dev-1.x/docs/en/notes/contribution_guide.md b/mmocr-dev-1.x/docs/en/notes/contribution_guide.md
new file mode 100644
index 0000000000000000000000000000000000000000..94cf4ce165196baeaff18c6615f1f683dfaa70eb
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/notes/contribution_guide.md
@@ -0,0 +1,134 @@
+# Contribution Guide
+
+OpenMMLab welcomes everyone who is interested in contributing to our projects and accepts contributions in the form of PRs.
+
+## What is PR
+
+`PR` is the abbreviation of `Pull Request`. Here's the definition of `PR` in the [official document](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) of GitHub.
+
+```
+Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.
+```
+
+## Basic Workflow
+
+1. Get the most recent codebase
+2. Checkout a new branch from `dev-1.x` branch, depending on the version of the codebase you want to contribute to.
+3. Commit your changes ([Don't forget to use pre-commit hooks!](#3-commit-your-changes))
+4. Push your changes and create a PR
+5. Discuss and review your code
+6. Merge your branch to `dev-1.x` branch
+
+## Procedures in detail
+
+### 1. Get the most recent codebase
+
+- When you work on your first PR
+
+  Fork the OpenMMLab repository: click the **fork** button at the top right corner of the GitHub page
+ ![avatar](https://user-images.githubusercontent.com/22607038/195038780-06a46340-8376-4bde-a07f-2577f231a204.png)
+
+  Clone the forked repository to your local machine
+
+ ```bash
+ git clone git@github.com:XXX/mmocr.git
+ ```
+
+  Add the source repository as the upstream remote
+
+ ```bash
+ git remote add upstream git@github.com:open-mmlab/mmocr
+ ```
+
+- After your first PR
+
+  Check out the latest branch of the local repository and pull the latest branch of the source repository. Here we assume that you are working on the `dev-1.x` branch.
+
+ ```bash
+ git checkout dev-1.x
+ git pull upstream dev-1.x
+ ```
+
+### 2. Checkout a new branch from `dev-1.x` branch
+
+```bash
+git checkout -b branchname
+```
+
+```{tip}
+To keep the commit history clean, we strongly recommend that you check out the `dev-1.x` branch before creating a new branch.
+```
+
+### 3. Commit your changes
+
+- If you are a first-time contributor, please install and initialize pre-commit hooks from the repository root directory first.
+
+ ```bash
+ pip install -U pre-commit
+ pre-commit install
+ ```
+
+- Commit your changes as usual. Pre-commit hooks will be triggered to stylize your code before each commit.
+
+ ```bash
+ # coding
+ git add [files]
+ git commit -m 'messages'
+ ```
+
+ ```{note}
+ Sometimes your code may be changed by pre-commit hooks. In this case, please remember to re-stage the modified files and commit again.
+ ```
+
+### 4. Push your changes to the forked repository and create a PR
+
+- Push the branch to your forked remote repository
+
+ ```bash
+ git push origin branchname
+ ```
+
+- Create a PR
+ ![avatar](https://user-images.githubusercontent.com/22607038/195053564-71bd3cb4-b8d4-4ed9-9075-051e138b7fd4.png)
+
+- Revise the PR message template to describe your motivation and the modifications made in this PR. You can also link the related issue to the PR manually in the PR message (for more information, check out the [official guidance](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)).
+
+- Specifically, if you are contributing to `dev-1.x`, you will have to change the base branch of the PR to `dev-1.x` in the PR page, since the default base branch is `main`.
+
+ ![avatar](https://user-images.githubusercontent.com/22607038/195045928-f3ceedc8-0162-46a7-ae1a-7e22829fe189.png)
+
+- You can also ask a specific person to review the changes you've proposed.
+
+### 5. Discuss and review your code
+
+- Modify your code according to the reviewers' suggestions and then push your changes.
+
+### 6. Merge your branch to `dev-1.x` branch and delete the branch
+
+- After the PR is merged by the maintainer, you can delete the branch you created in your forked repository.
+
+ ```bash
+ git branch -d branchname # delete local branch
+ git push origin --delete branchname # delete remote branch
+ ```
+
+## PR Specs
+
+1. Use [pre-commit](https://pre-commit.com) hooks to avoid code style issues
+
+2. One short-lived branch should correspond to only one PR
+
+3. Make one focused and complete change in each PR. Avoid overly large PRs
+
+ - Bad: Support Faster R-CNN
+ - Acceptable: Add a box head to Faster R-CNN
+ - Good: Add a parameter to box head to support custom conv-layer number
+
+4. Provide clear and meaningful commit messages
+
+5. Provide clear and meaningful PR description
+
+   - The task name should be clarified in the title. The general format is: \[Prefix\] Short description of the PR (Suffix)
+   - Prefix: \[Feature\] for adding a new feature, \[Fix\] for fixing a bug, \[Docs\] for documentation-related changes, \[WIP\] for work in progress (which will not be reviewed for the time being)
+   - Introduce the main changes, results, and influence on other modules in the short description
+ - Associate related issues and pull requests with a milestone
diff --git a/mmocr-dev-1.x/docs/en/project_zoo.py b/mmocr-dev-1.x/docs/en/project_zoo.py
new file mode 100755
index 0000000000000000000000000000000000000000..ec5671793371fa22e754537b9fd12db22656ae42
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/project_zoo.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+import os.path as osp
+import re
+
+# This script reads /projects/selected.txt and generates projectzoo.md
+
+files = []
+
+project_zoo = """
+# SOTA Models
+
+Here are some selected project implementations that are not yet included in
+the MMOCR package, but are ready to use.
+
+"""
+
+files = open('../../projects/selected.txt').readlines()
+
+for file in files:
+ file = file.strip()
+ with open(osp.join('../../', file)) as f:
+ content = f.read()
+
+ # Extract title
+ expr = '# (.*?)\n'
+ title = re.search(expr, content).group(1)
+ project_zoo += f'## {title}\n\n'
+
+ # Locate the description
+ expr = '## Description\n(.*?)##'
+ description = re.search(expr, content, re.DOTALL).group(1)
+ project_zoo += f'{description}\n'
+
+ # check milestone 1
+ expr = r'- \[(.?)\] Milestone 1'
+ state = re.search(expr, content, re.DOTALL).group(1)
+    infer_state = '✔' if state == 'x' else '❌'
+
+ # check milestone 2
+ expr = r'- \[(.?)\] Milestone 2'
+ state = re.search(expr, content, re.DOTALL).group(1)
+    training_state = '✔' if state == 'x' else '❌'
+
+ # add table
+ readme_link = f'https://github.com/open-mmlab/mmocr/blob/dev-1.x/{file}'
+ project_zoo += '### Status \n'
+ project_zoo += '| Inference | Train | README |\n'
+ project_zoo += '| --------- | -------- | ------ |\n'
+    project_zoo += f'|{infer_state}|{training_state}|[link]({readme_link})|\n'
+
+with open('projectzoo.md', 'w') as f:
+ f.write(project_zoo)
diff --git a/mmocr-dev-1.x/docs/en/requirements.txt b/mmocr-dev-1.x/docs/en/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..89fbf86c01cb29f10f7e99c910248c4d5229da58
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/requirements.txt
@@ -0,0 +1,4 @@
+recommonmark
+sphinx
+sphinx_markdown_tables
+sphinx_rtd_theme
diff --git a/mmocr-dev-1.x/docs/en/stats.py b/mmocr-dev-1.x/docs/en/stats.py
new file mode 100755
index 0000000000000000000000000000000000000000..3238686937660a189b9db21b8653b519bb12627c
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/stats.py
@@ -0,0 +1,131 @@
+#!/usr/bin/env python
+# Copyright (c) OpenMMLab. All rights reserved.
+import functools as func
+import re
+from os.path import basename, splitext
+
+import numpy as np
+import titlecase
+from weight_list import gen_weight_list
+
+
+def title2anchor(name):
+ return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-',
+ name.strip().lower())).strip('-')
+
+
+# Count algorithms
+
+files = [
+ 'backbones.md', 'textdet_models.md', 'textrecog_models.md', 'kie_models.md'
+]
+
+stats = []
+
+for f in files:
+ with open(f) as content_file:
+ content = content_file.read()
+
+    # Remove the blockquote notation from the paper link under the title
+ # for better layout in readthedocs
+ expr = r'(^## \s*?.*?\s+?)>\s*?(\[.*?\]\(.*?\))'
+ content = re.sub(expr, r'\1\2', content, flags=re.MULTILINE)
+ with open(f, 'w') as content_file:
+ content_file.write(content)
+
+ # title
+ title = content.split('\n')[0].replace('#', '')
+
+ # count papers
+ exclude_papertype = ['ABSTRACT', 'IMAGE']
+ exclude_expr = ''.join(f'(?!{s})' for s in exclude_papertype)
+    # Match paper-type markers such as <!-- [ALGORITHM] --> followed by a bibtex title field
+    expr = rf'<!-- \[({exclude_expr}\w+)\] -->' \
+        r'\s*\n.*?\btitle\s*=\s*{(.*?)}'
+ papers = {(papertype, titlecase.titlecase(paper.lower().strip()))
+ for (papertype, paper) in re.findall(expr, content, re.DOTALL)}
+ print(papers)
+ # paper links
+ revcontent = '\n'.join(list(reversed(content.splitlines())))
+ paperlinks = {}
+ for _, p in papers:
+ q = p.replace('\\', '\\\\').replace('?', '\\?')
+ paper_link = title2anchor(
+ re.search(
+ rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
+ revcontent, re.DOTALL | re.IGNORECASE).group(1))
+ paperlinks[p] = f'[{p}]({splitext(basename(f))[0]}.md#{paper_link})'
+ paperlist = '\n'.join(
+ sorted(f' - [{t}] {paperlinks[x]}' for t, x in papers))
+ # count configs
+ configs = {
+ x.lower().strip()
+ for x in re.findall(r'https.*configs/.*\.py', content)
+ }
+
+ # count ckpts
+ ckpts = {
+ x.lower().strip()
+ for x in re.findall(r'https://download.*\.pth', content)
+ if 'mmocr' in x
+ }
+
+ statsmsg = f"""
+### [{title}]({f})
+
+* Number of checkpoints: {len(ckpts)}
+* Number of configs: {len(configs)}
+* Number of papers: {len(papers)}
+{paperlist}
+
+ """
+
+ stats.append((papers, configs, ckpts, statsmsg))
+
+allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats])
+allconfigs = func.reduce(lambda a, b: a.union(b), [c for _, c, _, _ in stats])
+allckpts = func.reduce(lambda a, b: a.union(b), [c for _, _, c, _ in stats])
+msglist = '\n'.join(x for _, _, _, x in stats)
+
+papertypes, papercounts = np.unique([t for t, _ in allpapers],
+ return_counts=True)
+countstr = '\n'.join(
+ [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
+
+# get model list
+weight_list = gen_weight_list()
+
+modelzoo = f"""
+# Overview
+
+## Weights
+
+Here is the list of weights available for
+[Inference](user_guides/inference.md).
+
+For ease of reference, some weights may have shorter aliases, which will be
+separated by `/` in the table.
+For example, "`DB_r18 / dbnet_resnet18_fpnc_1200e_icdar2015`" means that you can
+use either `DB_r18` or `dbnet_resnet18_fpnc_1200e_icdar2015`
+to initialize the Inferencer:
+
+```python
+>>> from mmocr.apis import TextDetInferencer
+>>> inferencer = TextDetInferencer(model='DB_r18')
+>>> # equivalent to
+>>> inferencer = TextDetInferencer(model='dbnet_resnet18_fpnc_1200e_icdar2015')
+```
+
+{weight_list}
+
+## Statistics
+
+* Number of checkpoints: {len(allckpts)}
+* Number of configs: {len(allconfigs)}
+* Number of papers: {len(allpapers)}
+{countstr}
+
+{msglist}
+""" # noqa
+
+with open('modelzoo.md', 'w') as f:
+ f.write(modelzoo)
diff --git a/mmocr-dev-1.x/docs/en/switch_language.md b/mmocr-dev-1.x/docs/en/switch_language.md
new file mode 100644
index 0000000000000000000000000000000000000000..7baa29992eb3b36ab2804b577d3bb76db8cc4233
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/switch_language.md
@@ -0,0 +1,3 @@
+## English
+
+## 简体中文
diff --git a/mmocr-dev-1.x/docs/en/user_guides/config.md b/mmocr-dev-1.x/docs/en/user_guides/config.md
new file mode 100644
index 0000000000000000000000000000000000000000..c2573d8488c67173e23d0b50258bb970dbe48e22
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/config.md
@@ -0,0 +1,707 @@
+# Config
+
+MMOCR mainly uses Python files as configuration files. The design of its configuration file system integrates the ideas of modularity and inheritance to facilitate various experiments.
+
+## Common Usage
+
+```{note}
+This section is recommended to be read together with the primary usage in {external+mmengine:doc}`MMEngine: Config `.
+```
+
+There are three most common operations in MMOCR: inheritance of configuration files, reference to `_base_` variables, and modification of `_base_` variables. Config provides two syntaxes for inheriting and modifying `_base_`: one shared by Python, JSON, and YAML, and one available only for Python configuration files. In MMOCR, we **prefer the Python-only syntax**, which is therefore used as the basis for the rest of this description.
+
+`configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py` is used as an example to illustrate the three common usages.
+
+```Python
+_base_ = [
+ '_base_dbnet_resnet18_fpnc.py',
+ '../_base_/datasets/icdar2015.py',
+ '../_base_/default_runtime.py',
+ '../_base_/schedules/schedule_sgd_1200e.py',
+]
+
+# dataset settings
+icdar2015_textdet_train = _base_.icdar2015_textdet_train
+icdar2015_textdet_train.pipeline = _base_.train_pipeline
+icdar2015_textdet_test = _base_.icdar2015_textdet_test
+icdar2015_textdet_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=icdar2015_textdet_train)
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=icdar2015_textdet_test)
+```
+
+### Configuration Inheritance
+
+There is an inheritance mechanism for configuration files, i.e. one configuration file A can use another configuration file B as its base and inherit all the fields directly from it, thus avoiding a lot of copy-pasting.
+
+In `dbnet_resnet18_fpnc_1200e_icdar2015.py` you can see that
+
+```Python
+_base_ = [
+ '_base_dbnet_resnet18_fpnc.py',
+ '../_base_/datasets/icdar2015.py',
+ '../_base_/default_runtime.py',
+ '../_base_/schedules/schedule_sgd_1200e.py',
+]
+```
+
+The above statement reads all the base configuration files in the list, and all the fields in them are loaded into `dbnet_resnet18_fpnc_1200e_icdar2015.py`. We can see the structure of the parsed configuration file by running the following statements in a Python interpreter.
+
+```Python
+from mmengine import Config
+db_config = Config.fromfile('configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py')
+print(db_config)
+```
+
+It can be found that the parsed configuration contains all the fields and information in the base configuration.
+
+```{note}
+Variables with the same name cannot exist in different `_base_` configuration files.
+```
+
+### `_base_` Variable References
+
+Sometimes we may need to reference some fields in the `_base_` configuration directly to avoid duplicate definitions. Suppose we want to get the variable `pseudo` in the `_base_` configuration; we can access it directly via `_base_.pseudo`.
+
+This syntax has been used extensively in the configuration of MMOCR, and the dataset and pipeline configurations for each model in MMOCR are referenced in the *_base_* configuration. For example,
+
+```Python
+icdar2015_textdet_train = _base_.icdar2015_textdet_train
+# ...
+train_dataloader = dict(
+ # ...
+ dataset=icdar2015_textdet_train)
+```
+
+
+
+### `_base_` Variable Modification
+
+In MMOCR, different algorithms usually have different pipelines for different datasets, so there are often scenarios where the `pipeline` of a dataset needs to be modified. There are also many scenarios where you need to modify variables in the `_base_` configuration, for example, modifying the training strategy of an algorithm or replacing some of its modules (backbone, etc.). Users can directly modify the referenced `_base_` variables using Python syntax. For dicts, we also provide a method similar to class attribute modification to modify the contents of the dictionary directly.
+
+1. Dictionary
+
+ Here is an example of modifying `pipeline` in a dataset.
+
+ The dictionary can be modified using Python syntax:
+
+ ```Python
+ # Get the dataset in _base_
+ icdar2015_textdet_train = _base_.icdar2015_textdet_train
+ # You can modify the variables directly with Python's update
+ icdar2015_textdet_train.update(pipeline=_base_.train_pipeline)
+ ```
+
+ It can also be modified in the same way as changing Python class attributes.
+
+ ```Python
+ # Get the dataset in _base_
+ icdar2015_textdet_train = _base_.icdar2015_textdet_train
+   # Modify the field in the same way as a class attribute
+ icdar2015_textdet_train.pipeline = _base_.train_pipeline
+ ```
+
+2. List
+
+ Suppose the variable `pseudo = [1, 2, 3]` in the `_base_` configuration needs to be modified to `[1, 2, 4]`:
+
+ ```Python
+ # pseudo.py
+ pseudo = [1, 2, 3]
+ ```
+
+   It can be rewritten directly as:
+
+ ```Python
+ _base_ = ['pseudo.py']
+ pseudo = [1, 2, 4]
+ ```
+
+ Or modify the list using Python syntax:
+
+ ```Python
+ _base_ = ['pseudo.py']
+ pseudo = _base_.pseudo
+ pseudo[2] = 4
+ ```
+
+### Command Line Modification
+
+Sometimes we only want to modify part of the configuration without changing the configuration file itself. For example, if you want to change the learning rate during an experiment but do not want to write a new configuration file, you can pass in parameters on the command line to override the relevant configuration.
+
+We can pass `--cfg-options` on the command line and modify the corresponding fields directly with the arguments after it. For example, we can run the following command to modify the learning rate temporarily for this training session.
+
+```Shell
+python tools/train.py example.py --cfg-options optim_wrapper.optimizer.lr=1
+```
+
+For more detailed usage, refer to {external+mmengine:doc}`MMEngine: Command Line Modification `.
+
+## Configuration Content
+
+With config files and the Registry, MMOCR can modify the training parameters as well as the model configuration without touching the code. Specifically, users can customize the following modules in the configuration file: environment configuration, hook configuration, log configuration, training strategy configuration, data-related configuration, model-related configuration, evaluation configuration, and visualization configuration.
+
+This document will take the text detection algorithm `DBNet` and the text recognition algorithm `CRNN` as examples to introduce the contents of Config in detail.
+
+
+
+### Environment Configuration
+
+```Python
+default_scope = 'mmocr'
+env_cfg = dict(
+ cudnn_benchmark=True,
+ mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+ dist_cfg=dict(backend='nccl'))
+randomness = dict(seed=None)
+```
+
+There are three main components:
+
+- Set the default `scope` of all registries to `mmocr`, ensuring that all modules are searched first from the `MMOCR` codebase. If the module does not exist, the search will continue from the upstream algorithm libraries `MMEngine` and `MMCV`, see {external+mmengine:doc}`MMEngine: Registry ` for more details.
+
+- `env_cfg` configures the distributed environment, see {external+mmengine:doc}`MMEngine: Runner ` for more details.
+
+- `randomness`: Some settings to make the experiment as reproducible as possible,
+  such as the random seed and the deterministic option. See {external+mmengine:doc}`MMEngine: Runner ` for more details.
+
+
+
+### Hook Configuration
+
+Hooks are divided into two main parts: default hooks, which are required for all tasks, and custom hooks, which generally serve specific algorithms or specific tasks (there are no custom hooks in MMOCR so far).
+
+```Python
+default_hooks = dict(
+ timer=dict(type='IterTimerHook'), # Time recording, including data time as well as model inference time
+ logger=dict(type='LoggerHook', interval=1), # Collect logs from different components
+ param_scheduler=dict(type='ParamSchedulerHook'), # Update some hyper-parameters in optimizer
+    checkpoint=dict(type='CheckpointHook', interval=1),  # Save checkpoints. `interval` controls the save interval
+ sampler_seed=dict(type='DistSamplerSeedHook'), # Data-loading sampler for distributed training.
+ sync_buffer=dict(type='SyncBuffersHook'), # Synchronize buffer in case of distributed training
+ visualization=dict( # Visualize the results of val and test
+ type='VisualizationHook',
+ interval=1,
+ enable=False,
+ show=False,
+ draw_gt=False,
+ draw_pred=False))
+custom_hooks = []
+```
+
+Here is a brief description of a few hooks whose parameters may be changed frequently. For a general modification method, refer to Modify configuration.
+
+- `LoggerHook`: Used to configure the behavior of the logger. For example, by modifying `interval` you can control the interval of log printing, so that the log is printed once every `interval` iterations. For more settings, refer to [LoggerHook API](mmengine.hooks.LoggerHook).
+
+- `CheckpointHook`: Used to configure checkpoint-related behavior, such as saving the optimal and/or latest weights. You can also modify `interval` to control the checkpoint saving interval. More settings can be found in [CheckpointHook API](mmengine.hooks.CheckpointHook); a minimal sketch is given at the end of this subsection.
+
+- `VisualizationHook`: Used to configure visualization-related behavior, such as visualizing predicted results during validation or testing. **Default is off**. This Hook also depends on [Visualization Configuration](#Visualization-configuration). You can refer to [Visualizer](visualization.md) for more details. For more configuration, you can refer to [VisualizationHook API](mmocr.engine.hooks.VisualizationHook).
+
+If you want to learn more about the configuration of the default hooks and their functions, you can refer to {external+mmengine:doc}`MMEngine: Hooks `.
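+
+For instance, a minimal sketch of tweaking `CheckpointHook` (the values below are illustrative; all argument names come from [CheckpointHook API](mmengine.hooks.CheckpointHook)):
+
+```Python
+default_hooks = dict(
+    checkpoint=dict(
+        type='CheckpointHook',
+        interval=1,         # save a checkpoint every epoch
+        max_keep_ckpts=3,   # keep only the latest 3 checkpoints on disk
+        save_best='auto',   # additionally track the best checkpoint by the first reported metric
+        rule='greater'))    # a larger metric value is considered better
+```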
+
+
+
+### Log Configuration
+
+This section is mainly used to configure the log level and the log processor.
+
+```Python
+log_level = 'INFO' # Logging Level
+log_processor = dict(type='LogProcessor',
+ window_size=10,
+ by_epoch=True)
+```
+
+- The logging severity level is the same as that of {external+python:doc}`Python: logging `
+
+- The log processor is mainly used to control the format of the output, detailed functions can be found in {external+mmengine:doc}`MMEngine: logging `.
+
+  - `by_epoch=True` indicates that the logs are output in accordance with epochs, and the log format needs to be consistent with the `type='EpochBasedTrainLoop'` parameter in `train_cfg`. For example, if you want to output logs by iteration number, you need to set `by_epoch=False` in `log_processor` and `type='IterBasedTrainLoop'` in `train_cfg` (see the sketch after this list).
+
+  - `window_size` indicates the smoothing window of the loss, i.e. the average of the losses over the last `window_size` iterations. The final loss value printed by the logger is this smoothed average.
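+
+As a minimal sketch (the iteration counts are illustrative), the paired settings for iteration-based logging would look like:
+
+```Python
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=False)
+train_cfg = dict(type='IterBasedTrainLoop', max_iters=100000, val_interval=1000)
+```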
+
+
+
+### Training Strategy Configuration
+
+This section mainly contains optimizer settings, learning rate schedules and `Loop` settings.
+
+Training strategies usually vary for different tasks (text detection, text recognition, key information extraction). Here we explain the example configuration in `CRNN`, which is a text recognition model.
+
+```Python
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper', optimizer=dict(type='Adadelta', lr=1.0))
+param_scheduler = [dict(type='ConstantLR', factor=1.0)]
+train_cfg = dict(type='EpochBasedTrainLoop',
+ max_epochs=5, # train epochs
+ val_interval=1) # val interval
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+```
+
+- `optim_wrapper` : It contains two main parts, OptimWrapper and Optimizer. Detailed usage information can be found in {external+mmengine:doc}`MMEngine: Optimizer Wrapper `.
+
+  - The optimizer wrapper supports different training strategies, including mixed-precision training (AMP), gradient accumulation, and gradient clipping (a minimal sketch follows this list).
+
+ - All PyTorch optimizers are supported in the optimizer settings. All supported optimizers are available in {external+torch:ref}`PyTorch Optimizer List `.
+
+- `param_scheduler`: the learning rate tuning strategy. It supports most of the learning rate schedulers in PyTorch, such as `ExponentialLR`, `LinearLR`, `StepLR`, `MultiStepLR`, etc., and is used in much the same way. See the [scheduler interface](mmengine.optim.scheduler) and {external+mmengine:doc}`MMEngine: Optimizer Parameter Tuning Strategy ` for more features.
+
+- `train/val/test_cfg`: the execution flow of the task. MMEngine provides four kinds of loops: `EpochBasedTrainLoop`, `IterBasedTrainLoop`, `ValLoop` and `TestLoop`. More can be found in {external+mmengine:doc}`MMEngine: loop controller `.
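+
+For example, a minimal sketch of enabling mixed-precision training and gradient clipping for the `CRNN` setup above (the clipping threshold is illustrative):
+
+```Python
+# Switch the wrapper type to AmpOptimWrapper to enable automatic mixed precision
+optim_wrapper = dict(
+    type='AmpOptimWrapper',
+    optimizer=dict(type='Adadelta', lr=1.0),
+    # Gradient clipping is configured on the wrapper via `clip_grad`
+    clip_grad=dict(max_norm=5, norm_type=2))
+```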
+
+### Data-related Configuration
+
+
+
+#### Dataset Configuration
+
+It is mainly about two parts.
+
+- The location of the dataset(s), including images and annotation files.
+
+- Data augmentation related configurations. In the OCR domain, data augmentation is usually strongly associated with the model.
+
+More parameter configurations can be found in [Data Base Class](#TODO).
+
+The naming convention for dataset fields in MMOCR is
+
+```Python
+{dataset}_{task}_{train/val/test} = dict(...)
+```
+
+- dataset: See [dataset abbreviations](#TODO)
+
+- task: `det`(text detection), `rec`(text recognition), `kie`(key information extraction)
+
+- train/val/test: Dataset split.
+
+For example, for text recognition tasks, Syn90k is used as the training set, while icdar2013 and icdar2015 serve as the test sets. These are configured as follows.
+
+```Python
+# text recognition dataset configuration
+mjsynth_textrecog_train = dict(
+ type='OCRDataset',
+ data_root='data/rec/Syn90k/',
+ data_prefix=dict(img_path='mnt/ramdisk/max/90kDICT32px'),
+ ann_file='train_labels.json',
+ test_mode=False,
+ pipeline=None)
+
+icdar2013_textrecog_test = dict(
+ type='OCRDataset',
+ data_root='data/rec/icdar_2013/',
+ data_prefix=dict(img_path='Challenge2_Test_Task3_Images/'),
+ ann_file='test_labels.json',
+ test_mode=True,
+ pipeline=None)
+
+icdar2015_textrecog_test = dict(
+ type='OCRDataset',
+ data_root='data/rec/icdar_2015/',
+ data_prefix=dict(img_path='ch4_test_word_images_gt/'),
+ ann_file='test_labels.json',
+ test_mode=True,
+ pipeline=None)
+```
+
+
+
+#### Data Pipeline Configuration
+
+In MMOCR, dataset construction and data preparation are decoupled from each other. In other words, dataset classes such as `OCRDataset` are responsible for reading and parsing annotation files, while Data Transforms further implement data loading, data augmentation, data formatting and other related functions.
+
+In general, there are different augmentation strategies for training and testing, so there are usually a `train_pipeline` and a `test_pipeline`. More information can be found in [Data Transforms](../basic_concepts/transforms.md).
+
+- The data augmentation process of the training pipeline is usually: data loading (LoadImageFromFile) -> annotation loading (LoadXXXAnnotations) -> data augmentation -> data formatting (PackXXXInputs).
+
+- The data augmentation process of the test pipeline is usually: data loading (LoadImageFromFile) -> data augmentation -> annotation loading (LoadXXXAnnotations) -> data formatting (PackXXXInputs).
+
+Due to the specificity of the OCR task, different models have different data augmentation techniques, and even the same model can have different data augmentation strategies for different datasets. Take `CRNN` as an example.
+
+```Python
+# Data Augmentation
+train_pipeline = [
+ dict(
+ type='LoadImageFromFile',
+ color_type='grayscale',
+ ignore_empty=True,
+ min_size=5),
+ dict(type='LoadOCRAnnotations', with_text=True),
+ dict(type='Resize', scale=(100, 32), keep_ratio=False),
+ dict(
+ type='PackTextRecogInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'valid_ratio'))
+]
+test_pipeline = [
+ dict(
+ type='LoadImageFromFile',
+ color_type='grayscale'),
+ dict(
+ type='RescaleToHeight',
+ height=32,
+ min_width=32,
+ max_width=None,
+ width_divisor=16),
+ dict(type='LoadOCRAnnotations', with_text=True),
+ dict(
+ type='PackTextRecogInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'valid_ratio'))
+]
+```
+
+#### Dataloader Configuration
+
+This section provides the main configuration information needed to construct the dataloader; see {external+torch:doc}`PyTorch DataLoader ` for more tutorials.
+
+```Python
+# Dataloader
+train_dataloader = dict(
+ batch_size=64,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=dict(
+ type='ConcatDataset',
+ datasets=[mjsynth_textrecog_train],
+ pipeline=train_pipeline))
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ drop_last=False,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=dict(
+ type='ConcatDataset',
+ datasets=[icdar2013_textrecog_test, icdar2015_textrecog_test],
+ pipeline=test_pipeline))
+test_dataloader = val_dataloader
+```
+
+### Model-related Configuration
+
+
+
+#### Network Configuration
+
+This section configures the network architecture. Different algorithmic tasks use different network architectures. You can find more information about network architectures in [structures](../basic_concepts/structures.md).
+
+##### Text Detection
+
+Text detection consists of several parts:
+
+- `data_preprocessor`: [data_preprocessor](mmocr.models.textdet.data_preprocessors.TextDetDataPreprocessor)
+- `backbone`: backbone network configuration
+- `neck`: neck network configuration
+- `det_head`: detection head network configuration
+ - `module_loss`: module loss configuration
+ - `postprocessor`: postprocessor configuration
+
+We present the model configuration in text detection using DBNet as an example.
+
+```Python
+model = dict(
+ type='DBNet',
+ data_preprocessor=dict(
+ type='TextDetDataPreprocessor',
+ mean=[123.675, 116.28, 103.53],
+ std=[58.395, 57.12, 57.375],
+ bgr_to_rgb=True,
+        pad_size_divisor=32),
+ backbone=dict(
+ type='mmdet.ResNet',
+ depth=18,
+ num_stages=4,
+ out_indices=(0, 1, 2, 3),
+ frozen_stages=-1,
+ norm_cfg=dict(type='BN', requires_grad=True),
+ init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
+ norm_eval=False,
+ style='caffe'),
+ neck=dict(
+ type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
+ det_head=dict(
+ type='DBHead',
+ in_channels=256,
+ module_loss=dict(type='DBModuleLoss'),
+ postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')))
+```
+
+##### Text Recognition
+
+Text recognition mainly contains:
+
+- `data_preprocessor`: [data preprocessor configuration](mmocr.models.textrecog.data_preprocessors.TextRecogDataPreprocessor)
+- `preprocessor`: network preprocessor configuration, e.g. TPS
+- `backbone`: backbone configuration
+- `encoder`: encoder configuration
+- `decoder`: decoder configuration
+ - `module_loss`: decoder module loss configuration
+ - `postprocessor`: decoder postprocessor configuration
+ - `dictionary`: dictionary configuration
+
+Using CRNN as an example.
+
+```Python
+# model
+model = dict(
+ type='CRNN',
+ data_preprocessor=dict(
+        type='TextRecogDataPreprocessor', mean=[127], std=[127]),
+ preprocessor=None,
+ backbone=dict(type='VeryDeepVgg', leaky_relu=False, input_channels=1),
+ encoder=None,
+ decoder=dict(
+ type='CRNNDecoder',
+ in_channels=512,
+ rnn_flag=True,
+ module_loss=dict(type='CTCModuleLoss', letter_case='lower'),
+ postprocessor=dict(type='CTCPostProcessor'),
+ dictionary=dict(
+ type='Dictionary',
+ dict_file='dicts/lower_english_digits.txt',
+ with_padding=True)))
+```
+
+
+
+#### Checkpoint Loading Configuration
+
+The model weights in a checkpoint file can be loaded via the `load_from` field; simply set `load_from` to the path of the checkpoint file.
+
+You can also resume training by setting `resume=True` to load the training status information in the checkpoint. When both `load_from` and `resume=True` are set, MMEngine will load the training state from the checkpoint file at the `load_from` path.
+
+If only `resume=True` is set, the executor will try to find and read the latest checkpoint file from the `work_dir` folder.
+
+```Python
+load_from = None # Path to load checkpoint
+resume = False # whether resume
+```
+
+More can be found in {external+mmengine:doc}`MMEngine: Load Weights or Recover Training ` and [OCR Advanced Tips - Resume Training from Checkpoints](train_test.md#resume-training-from-a-checkpoint).
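+
+As a minimal sketch (the checkpoint path below is hypothetical), resuming from a specific checkpoint looks like:
+
+```Python
+load_from = 'work_dirs/dbnet_resnet18_fpnc_1200e_icdar2015/epoch_100.pth'  # hypothetical path
+resume = True  # restore optimizer, scheduler and epoch/iteration states from this checkpoint
+```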
+
+
+
+### Evaluation Configuration
+
+In model validation and model testing, quantitative measurement of model accuracy is often required. MMOCR performs this function by means of `Metric` and `Evaluator`. For more information, please refer to {external+mmengine:doc}`MMEngine: Evaluation ` and [Evaluation](../basic_concepts/evaluation.md)
+
+#### Evaluator
+
+Evaluator is mainly used to manage multiple datasets and multiple `Metrics`. For single and multiple dataset cases, there are single and multiple dataset evaluators, both of which can manage multiple `Metrics`.
+
+The single-dataset evaluator is configured as follows.
+
+```Python
+# Single Dataset Single Metric
+val_evaluator = dict(
+ type='Evaluator',
+ metrics=dict())
+
+# Single Dataset Multiple Metric
+val_evaluator = dict(
+ type='Evaluator',
+ metrics=[...])
+```
+
+`MultiDatasetsEvaluator` differs from the single-dataset evaluator in two aspects: `type` and `dataset_prefixes`. The evaluator type must be `MultiDatasetsEvaluator` and cannot be omitted. `dataset_prefixes` is mainly used to distinguish the results of different datasets evaluated with the same metrics; see [MultiDatasetsEvaluation](../basic_concepts/evaluation.md).
+
+Assuming that we need to test accuracy on IC13 and IC15 datasets, the configuration is as follows.
+
+```Python
+# Multiple datasets, single Metric
+val_evaluator = dict(
+ type='MultiDatasetsEvaluator',
+ metrics=dict(),
+ dataset_prefixes=['IC13', 'IC15'])
+
+# Multiple datasets, multiple Metrics
+val_evaluator = dict(
+ type='MultiDatasetsEvaluator',
+ metrics=[...],
+ dataset_prefixes=['IC13', 'IC15'])
+```
+
+#### Metric
+
+A metric evaluates a model's performance from a specific perspective. While there is no single metric that fits all tasks, MMOCR provides enough flexibility such that multiple metrics serving the same task can be used simultaneously. Here we list task-specific metrics for reference.
+
+Text detection: [`HmeanIOUMetric`](mmocr.evaluation.metrics.HmeanIOUMetric)
+
+Text recognition: [`WordMetric`](mmocr.evaluation.metrics.WordMetric), [`CharMetric`](mmocr.evaluation.metrics.CharMetric), [`OneMinusNEDMetric`](mmocr.evaluation.metrics.OneMinusNEDMetric)
+
+Key information extraction: [`F1Metric`](mmocr.evaluation.metrics.F1Metric)
+
+Take text detection as an example, where a single `Metric` is used for single-dataset evaluation.
+
+```Python
+val_evaluator = dict(type='HmeanIOUMetric')
+```
+
+Take text recognition as an example, multiple datasets (`IC13` and `IC15`) are evaluated using multiple `Metric`s (`WordMetric` and `CharMetric`).
+
+```Python
+val_evaluator = dict(
+ type='MultiDatasetsEvaluator',
+ metrics=[
+ dict(
+ type='WordMetric',
+ mode=['exact', 'ignore_case', 'ignore_case_symbol']),
+ dict(type='CharMetric')
+ ],
+ dataset_prefixes=['IC13', 'IC15'])
+test_evaluator = val_evaluator
+```
+
+
+
+### Visualization Configuration
+
+Each task is bound to a task-specific visualizer, which is mainly used for visualizing or storing intermediate results of user models and for visualizing val and test prediction results. The visualization results can also be written to different backends, such as WandB or TensorBoard, through the corresponding visualization backends (see the sketch after the example below). Commonly used modification operations can be found in [visualization](visualization.md).
+
+The default configuration of visualization for text detection is as follows.
+
+```Python
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+ type='TextDetLocalVisualizer', # Different visualizers for different tasks
+ vis_backends=vis_backends,
+ name='visualizer')
+```
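+
+For instance, a minimal sketch of additionally writing the visualizations to TensorBoard with MMEngine's `TensorboardVisBackend`:
+
+```Python
+vis_backends = [
+    dict(type='LocalVisBackend'),        # save visualizations to local files
+    dict(type='TensorboardVisBackend'),  # also log them to TensorBoard
+]
+visualizer = dict(
+    type='TextDetLocalVisualizer',
+    vis_backends=vis_backends,
+    name='visualizer')
+```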
+
+## Directory Structure
+
+All configuration files of `MMOCR` are placed under the `configs` folder. To keep config files from becoming too long and to improve their reusability and clarity, MMOCR takes advantage of the inheritance mechanism and splits config files into eight sections. Since each section is closely related to the task type, MMOCR provides a task folder for each task in `configs/`, namely `textdet` (text detection), `textrecog` (text recognition), and `kie` (key information extraction). Each folder is further divided into two parts: the `_base_` folder and the algorithm configuration folders.
+
+1. The `_base_` folder stores general config files unrelated to specific algorithms, organized by directory into datasets, training schedules, and runtime configurations.
+
+2. The algorithm configuration folders store config files that are strongly related to the algorithms. They contain two kinds of config files.
+
+   1. Config files starting with `_base_`: these configure the model and data pipeline of an algorithm. In the OCR domain, data augmentation strategies are generally strongly related to the algorithm, so the model and data pipeline are usually placed in the same config file.
+
+   2. Other config files, i.e. the algorithm-specific configurations on the specific dataset(s): these are the full config files that further configure training and testing settings, aggregating `_base_` configurations that are scattered in different locations. Inside these files, some modifications to the fields in the `_base_` configs may be performed, such as the data pipeline and the training strategy.
+
+All these config files are distributed in different folders according to their contents.
+
+
+
+
+
+The final directory structure is as follows.
+
+```Python
+configs
+├── textdet
+│   ├── _base_
+│   │   ├── datasets
+│   │   │   ├── icdar2015.py
+│   │   │   ├── icdar2017.py
+│   │   │   └── totaltext.py
+│   │   ├── schedules
+│   │   │   └── schedule_adam_600e.py
+│   │   └── default_runtime.py
+│   └── dbnet
+│       ├── _base_dbnet_resnet18_fpnc.py
+│       └── dbnet_resnet18_fpnc_1200e_icdar2015.py
+├── textrecog
+│   ├── _base_
+│   │   ├── datasets
+│   │   │   ├── icdar2015.py
+│   │   │   ├── icdar2017.py
+│   │   │   └── totaltext.py
+│   │   ├── schedules
+│   │   │   └── schedule_adam_base.py
+│   │   └── default_runtime.py
+│   └── crnn
+│       ├── _base_crnn_mini-vgg.py
+│       └── crnn_mini-vgg_5e_mj.py
+└── kie
+    ├── _base_
+    │   ├── datasets
+    │   └── default_runtime.py
+    └── sgdmr
+        └── sdmgr_novisual_60e_wildreceipt_openset.py
+```
+
+## Naming Conventions
+
+MMOCR has a convention to name config files, and contributors to the codebase need to follow the same naming rules. The file names are divided into four sections: algorithm information, module information, training information, and data information. Words that logically belong to different sections are connected by an underscore `'_'`, and multiple words in the same section are connected by a hyphen `'-'`. A concrete example is given at the end of this section.
+
+```Python
+{{algorithm info}}_{{module info}}_{{training info}}_{{data info}}.py
+```
+
+- algorithm info: the name of the algorithm, such as dbnet, crnn, etc.
+
+- module info: list some intermediate modules in the order of data flow. Its content depends on the algorithm, and some modules strongly related to the model will be omitted to avoid an overly long name. For example:
+
+ - For the text detection task and the key information extraction task :
+
+ ```Python
+ {{algorithm info}}_{{backbone}}_{{neck}}_{{head}}_{{training info}}_{{data info}}.py
+ ```
+
+ `{head}` is usually omitted since it's algorithm-specific.
+
+ - For text recognition tasks.
+
+ ```Python
+ {{algorithm info}}_{{backbone}}_{{encoder}}_{{decoder}}_{{training info}}_{{data info}}.py
+ ```
+
+ Since encoder and decoder are generally bound to the algorithm, they are usually omitted.
+
+- training info: some settings of the training strategy, including batch size, schedule, etc.
+
+- data info: dataset name, modality, input size, etc., such as icdar2015 and synthtext.
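+
+Taking the config `dbnet_resnet18_fpnc_1200e_icdar2015.py` used earlier in this document as an example, its name can be read as:
+
+```Python
+# dbnet_resnet18_fpnc_1200e_icdar2015.py
+# {algorithm info}: dbnet          (DBNet)
+# {module info}:    resnet18_fpnc  (ResNet-18 backbone + FPNC neck; the DB head is omitted)
+# {training info}:  1200e          (1200 training epochs)
+# {data info}:      icdar2015      (ICDAR2015 dataset)
+```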
diff --git a/mmocr-dev-1.x/docs/en/user_guides/data_prepare/dataset_preparer.md b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/dataset_preparer.md
new file mode 100644
index 0000000000000000000000000000000000000000..55174cc5894e54db0daf9706ef406fce6f8d14c6
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/dataset_preparer.md
@@ -0,0 +1,776 @@
+# Dataset Preparer (Beta)
+
+```{note}
+Dataset Preparer is still in beta version and might not be stable enough. You are welcome to try it out and report any issues to us.
+```
+
+## One-click data preparation script
+
+MMOCR provides a unified one-stop data preparation script `prepare_dataset.py`.
+
+Only one command is needed to complete data downloading, decompression, format conversion, and basic configuration generation.
+
+```bash
+python tools/dataset_converters/prepare_dataset.py [-h] [--nproc NPROC] [--task {textdet,textrecog,textspotting,kie}] [--splits SPLITS [SPLITS ...]] [--lmdb] [--overwrite-cfg] [--dataset-zoo-path DATASET_ZOO_PATH] datasets [datasets ...]
+```
+
+| ARGS               | Type | Description                                                                                                                                |
+| ------------------ | ---- | ------------------------------------------------------------------------------------------------------------------------------------------ |
+| datasets           | str  | (required) One or more dataset names.                                                                                                        |
+| --nproc            | int  | Number of processes to be used. Defaults to 4.                                                                                               |
+| --task             | str  | Convert the dataset to the format of a specified task supported by MMOCR. Options are: 'textdet', 'textrecog', 'textspotting', and 'kie'.   |
+| --splits           | str  | Splits of the dataset to be prepared. Multiple splits can be accepted. Defaults to `train val test`.                                         |
+| --lmdb             | bool | Store the data in LMDB format. Only valid when the task is `textrecog`.                                                                      |
+| --overwrite-cfg    | bool | Whether to overwrite the dataset config file if it already exists in `configs/{task}/_base_/datasets`.                                      |
+| --dataset-zoo-path | str  | Path to the dataset config files. If not specified, the default path is `./dataset_zoo`.                                                    |
+
+For example, the following command shows how to use the script to prepare the ICDAR2015 dataset for the text detection task.
+
+```bash
+python tools/dataset_converters/prepare_dataset.py icdar2015 --task textdet --overwrite-cfg
+```
+
+Also, the script supports preparing multiple datasets at the same time. For example, the following command shows how to prepare the ICDAR2015 and TotalText datasets for the text recognition task.
+
+```bash
+python tools/dataset_converters/prepare_dataset.py icdar2015 totaltext --task textrecog --overwrite-cfg
+```
+
+To check the datasets supported by Dataset Preparer, please refer to [Dataset Zoo](./datasetzoo.md). Some other datasets that need to be prepared manually are listed in [Text Detection](./det.md) and [Text Recognition](./recog.md).
+
+For users in China, more datasets can be downloaded from the open-source dataset platform [OpenDataLab](https://opendatalab.com/). After downloading the data, place the files listed in `data_obtainer.save_name` in `data/cache` and rerun the script.
+
+## Advanced Usage
+
+### LMDB Format
+
+In text recognition tasks, we usually use LMDB format to store data to speed up data loading. When using the `prepare_dataset.py` script to prepare data, you can store data to the LMDB format by the `--lmdb` parameter. For example:
+
+```bash
+python tools/dataset_converters/prepare_dataset.py icdar2015 --task textrecog --lmdb
+```
+
+Once the dataset is prepared, Dataset Preparer will generate `icdar2015_lmdb.py` in the `configs/textrecog/_base_/datasets/` directory. You can inherit this file and point the `dataloader` to the LMDB dataset. Moreover, the LMDB dataset needs to be loaded by [`LoadImageFromNDArray`](mmocr.datasets.transforms.LoadImageFromNDArray), so you also need to modify the `pipeline`.
+
+For example, if we want to change the training set of `configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py` to the ICDAR2015 LMDB dataset generated above, we need to make the following modifications:
+
+1. Modify `configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py`:
+
+ ```python
+ _base_ = [
+ '../_base_/datasets/icdar2015_lmdb.py', # point to icdar2015 lmdb dataset
+ ...
+ ]
+
+ train_list = [_base_.icdar2015_lmdb_textrecog_train]
+ ...
+ ```
+
+2. Modify `train_pipeline` in `configs/textrecog/crnn/_base_crnn_mini-vgg.py`, change `LoadImageFromFile` to `LoadImageFromNDArray`:
+
+ ```python
+ train_pipeline = [
+ dict(
+ type='LoadImageFromNDArray',
+ color_type='grayscale',
+ file_client_args=file_client_args,
+ ignore_empty=True,
+ min_size=2),
+ ...
+ ]
+ ```
+
+## Design
+
+There are many OCR datasets with different languages, annotation formats, and scenarios. There are generally two ways to use them: quickly looking up the relevant information about a dataset, or using it to train models. To cover both scenarios, MMOCR provides an automatic dataset preparation script. The script uses a modular design, which greatly enhances scalability and allows users to easily configure other public or private datasets. The configuration files for the script are uniformly stored in the `dataset_zoo/` directory, where users can find all the configuration files of the datasets officially supported by MMOCR. The directory structure of this folder is as follows:
+
+```text
+dataset_zoo/
+├── icdar2015
+│   ├── metafile.yml
+│   ├── sample_anno.md
+│   ├── textdet.py
+│   ├── textrecog.py
+│   └── textspotting.py
+└── wildreceipt
+    ├── metafile.yml
+    ├── sample_anno.md
+    ├── kie.py
+    ├── textdet.py
+    ├── textrecog.py
+    └── textspotting.py
+```
+
+### Dataset-related Information
+
+The relevant information of a dataset includes its annotation format, annotation examples, and basic statistics. Although this information can be found on each dataset's official website, it is scattered across various sites, and users have to spend a lot of time collecting it. Therefore, MMOCR has designed some paradigms to help users quickly understand the basics of a dataset. The information is divided into two parts: one is the basic information of the dataset, including the year of publication, the authors of the paper, copyright information, etc.; the other is the annotation information, including the annotation format and annotation examples. MMOCR provides a paradigm for each part, which contributors can fill in accordingly. For the basic information, MMOCR provides a `metafile.yml` file. This file is not mandatory during dataset preparation (so users can ignore it when adding their own private datasets), but to better understand the various public datasets, MMOCR recommends reading the corresponding metafile before using the dataset preparation script, to check whether the characteristics of the dataset meet your needs. Taking ICDAR2015 as an example, its content is shown below:
+
+```yaml
+Name: 'Incidental Scene Text IC15'
+Paper:
+ Title: ICDAR 2015 Competition on Robust Reading
+ URL: https://rrc.cvc.uab.es/files/short_rrc_2015.pdf
+ Venue: ICDAR
+ Year: '2015'
+ BibTeX: '@inproceedings{karatzas2015icdar,
+ title={ICDAR 2015 competition on robust reading},
+ author={Karatzas, Dimosthenis and Gomez-Bigorda, Lluis and Nicolaou, Anguelos and Ghosh, Suman and Bagdanov, Andrew and Iwamura, Masakazu and Matas, Jiri and Neumann, Lukas and Chandrasekhar, Vijay Ramaseshan and Lu, Shijian and others},
+ booktitle={2015 13th international conference on document analysis and recognition (ICDAR)},
+ pages={1156--1160},
+ year={2015},
+ organization={IEEE}}'
+Data:
+ Website: https://rrc.cvc.uab.es/?ch=4
+ Language:
+ - English
+ Scene:
+ - Natural Scene
+ Granularity:
+ - Word
+ Tasks:
+ - textdet
+ - textrecog
+ - textspotting
+ License:
+ Type: CC BY 4.0
+ Link: https://creativecommons.org/licenses/by/4.0/
+```
+
+Specifically, MMOCR lists the meaning of each field in the following table:
+
+| Field Name | Meaning |
+| :--------------- | :------------------------------------------------------------------------------------------------------- |
+| Name | The name of the dataset |
+| Paper.Title | The title of the paper for the dataset |
+| Paper.URL | The URL of the paper for the dataset |
+| Paper.Venue | The venue of the paper for the dataset |
+| Paper.Year | The year of publication for the paper |
+| Paper.BibTeX | The BibTeX citation of the paper for the dataset |
+| Data.Website | The official website of the dataset |
+| Data.Language | The supported languages of the dataset |
+| Data.Scene | The supported scenes of the dataset, such as `Natural Scene`, `Document`, `Handwritten`, etc. |
+| Data.Granularity | The supported granularities of the dataset, such as `Character`, `Word`, `Line`, etc. |
+| Data.Tasks | The supported tasks of the dataset, such as `textdet`, `textrecog`, `textspotting`, `kie`, etc. |
+| Data.License | License information for the dataset. Use `N/A` if no license exists. |
+| Data.Format | File format of the annotation files, such as `.txt`, `.xml`, `.json`, etc. |
+| Data.Keywords | Keywords describing the characteristics of the dataset, such as `Horizontal`, `Vertical`, `Curved`, etc. |
+
+For the annotation information, MMOCR provides a `sample_anno.md` file, which users can use as a template to fill in the annotation information of a dataset, so that other users can quickly understand its annotations. Taking ICDAR2015 as an example, the sample content is as follows:
+
+````markdown
+ **Text Detection**
+
+ ```text
+ # x1,y1,x2,y2,x3,y3,x4,y4,trans
+
+ 377,117,463,117,465,130,378,130,Genaxis Theatre
+ 493,115,519,115,519,131,493,131,[06]
+ 374,155,409,155,409,170,374,170,###
+ ```
+````
+
+`sample_anno.md` provides the annotation information for the different tasks of a dataset, including the format of the annotation files (`text` corresponds to `txt` files; the file format can also be found in `metafile.yml`) and annotation examples.
+
+With the information in these two files, users can quickly understand the basic information of a dataset. Additionally, MMOCR has summarized the basic information of all supported datasets in the [Overview](./overview.md).
+
+### Dataset Usage
+
+After decades of development, the OCR field has seen a series of related datasets emerge, often providing text annotation files in various styles, making it necessary for users to perform format conversion when using these datasets. Therefore, to facilitate dataset preparation for users, we have designed the Dataset Preparer to help users quickly prepare datasets in the format supported by MMOCR. For details, please refer to the [Dataset Format](../../basic_concepts/datasets.md) document. The following figure shows a typical workflow for running the Dataset Preparer.
+
+![workflow](https://user-images.githubusercontent.com/87774050/233025618-aa3c3ad6-c595-49a3-b080-a6284748c0c1.jpg)
+
+The figure shows that when running the Dataset Preparer, the following operations will be performed in sequence:
+
+1. For the training set, validation set, and test set, the preparers will perform:
+   1. [Dataset download, extraction, and movement (Obtainer)](#dataset-download-extraction-and-movement-obtainer)
+ 2. [Matching annotations with images (Gatherer)](#dataset-collection-gatherer)
+ 3. [Parsing original annotations (Parser)](#dataset-parsing-parser)
+ 4. [Packing annotations into a unified format (Packer)](#dataset-conversion-packer)
+ 5. [Saving annotations (Dumper)](#annotation-saving-dumper)
+2. Delete files (Delete)
+3. Generate the configuration file for the dataset (Config Generator).
+
+To handle various types of datasets, MMOCR has designed each component as a plug-and-play module, and allows users to configure the dataset preparation process through configuration files located in `dataset_zoo/`. These configuration files are in Python format and can be used in the same way as other configuration files in MMOCR, as described in the [Configuration File documentation](../config.md).
+
+In `dataset_zoo/`, each dataset has its own folder, and the configuration files are named after the task to distinguish different configurations under different tasks. Taking the text detection part of ICDAR2015 as an example, the sample configuration file `dataset_zoo/icdar2015/textdet.py` is shown below:
+
+```python
+data_root = 'data/icdar2015'
+cache_path = 'data/cache'
+train_preparer = dict(
+ obtainer=dict(
+ type='NaiveDataObtainer',
+ cache_path=cache_path,
+ files=[
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/ch4_training_images.zip',
+ save_name='ic15_textdet_train_img.zip',
+ md5='c51cbace155dcc4d98c8dd19d378f30d',
+ content=['image'],
+ mapping=[['ic15_textdet_train_img', 'textdet_imgs/train']]),
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/'
+ 'ch4_training_localization_transcription_gt.zip',
+ save_name='ic15_textdet_train_gt.zip',
+ md5='3bfaf1988960909014f7987d2343060b',
+ content=['annotation'],
+ mapping=[['ic15_textdet_train_gt', 'annotations/train']]),
+ ]),
+ gatherer=dict(
+ type='PairGatherer',
+ img_suffixes=['.jpg', '.JPG'],
+ rule=[r'img_(\d+)\.([jJ][pP][gG])', r'gt_img_\1.txt']),
+ parser=dict(type='ICDARTxtTextDetAnnParser', encoding='utf-8-sig'),
+ packer=dict(type='TextDetPacker'),
+ dumper=dict(type='JsonDumper'),
+)
+
+test_preparer = dict(
+ obtainer=dict(
+ type='NaiveDataObtainer',
+ cache_path=cache_path,
+ files=[
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/ch4_test_images.zip',
+ save_name='ic15_textdet_test_img.zip',
+ md5='97e4c1ddcf074ffcc75feff2b63c35dd',
+ content=['image'],
+ mapping=[['ic15_textdet_test_img', 'textdet_imgs/test']]),
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/'
+ 'Challenge4_Test_Task4_GT.zip',
+ save_name='ic15_textdet_test_gt.zip',
+ md5='8bce173b06d164b98c357b0eb96ef430',
+ content=['annotation'],
+ mapping=[['ic15_textdet_test_gt', 'annotations/test']]),
+ ]),
+ gatherer=dict(
+ type='PairGatherer',
+ img_suffixes=['.jpg', '.JPG'],
+ rule=[r'img_(\d+)\.([jJ][pP][gG])', r'gt_img_\1.txt']),
+ parser=dict(type='ICDARTxtTextDetAnnParser', encoding='utf-8-sig'),
+ packer=dict(type='TextDetPacker'),
+ dumper=dict(type='JsonDumper'),
+)
+
+delete = ['annotations', 'ic15_textdet_test_img', 'ic15_textdet_train_img']
+config_generator = dict(type='TextDetConfigGenerator')
+```
+
+#### Dataset download extraction and movement (Obtainer)
+
+The `obtainer` module in Dataset Preparer is responsible for downloading, extracting, and moving the dataset. Currently, MMOCR only provides the `NaiveDataObtainer`. Generally speaking, the built-in `NaiveDataObtainer` is sufficient for downloading most datasets that can be accessed through direct links, and supports operations such as extraction, moving files, and renaming. However, MMOCR currently does not support automatically downloading datasets stored in resources that require login, such as Baidu or Google Drive. Here is a brief introduction to the `NaiveDataObtainer`.
+
+| Field Name | Meaning |
+| ---------- | -------------------------------------------------------------------------------------------- |
+| cache_path | Dataset cache path, used to store the compressed files downloaded during dataset preparation |
+| data_root | Root directory where the dataset is stored |
+| files | Dataset file list, used to describe the download information of the dataset |
+
+The `files` field is a list, and each element in the list is a dictionary used to describe the download information of a dataset file. The table below shows the meaning of each field:
+
+| Field Name | Meaning |
+| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
+| url | Download link for the dataset file |
+| save_name | Name used to save the dataset file |
+| md5 (optional) | MD5 hash of the dataset file, used to check if the downloaded file is complete |
+| split (optional) | Dataset split the file belongs to, such as `train`, `test`, etc., this field can be omitted |
+| content (optional) | Content of the dataset file, such as `image`, `annotation`, etc., this field can be omitted |
+| mapping (optional) | Decompression mapping of the dataset file, used to specify the storage location of the file after decompression, this field can be omitted |
+
+The Dataset Preparer follows these conventions:
+
+- Images of different types of datasets are moved to the corresponding category `{taskname}_imgs/{split}/` folder, such as `textdet_imgs/train/`.
+- For an annotation file that contains the annotations of all images, the annotations are moved to the `annotations/{split}.*` file, such as `annotations/train.json`.
+- For an annotation file that contains the annotations of a single image, all annotation files are moved to the `annotations/{split}/` folder, such as `annotations/train/`.
+- For some other special cases, such as when all training, testing, and validation images are in one folder, the images can be moved to a self-defined folder, such as `{taskname}_imgs/imgs/`, and the image storage location should then be specified in the subsequent `gatherer` module.
+
+An example configuration is as follows:
+
+```python
+ obtainer=dict(
+ type='NaiveDataObtainer',
+ cache_path=cache_path,
+ files=[
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/ch4_training_images.zip',
+ save_name='ic15_textdet_train_img.zip',
+ md5='c51cbace155dcc4d98c8dd19d378f30d',
+ content=['image'],
+ mapping=[['ic15_textdet_train_img', 'textdet_imgs/train']]),
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/'
+ 'ch4_training_localization_transcription_gt.zip',
+ save_name='ic15_textdet_train_gt.zip',
+ md5='3bfaf1988960909014f7987d2343060b',
+ content=['annotation'],
+ mapping=[['ic15_textdet_train_gt', 'annotations/train']]),
+ ]),
+```
+
+#### Dataset collection (Gatherer)
+
+The `gatherer` module traverses the files in the dataset directory, matches image files with their corresponding annotation files, and organizes a file list for the `parser` module to read. Therefore, it is necessary to know the matching rules between image files and annotation files in the current dataset. There are two commonly used annotation storage formats for OCR datasets: one is multiple annotation files corresponding to multiple images, and the other is a single annotation file corresponding to multiple images, for example:
+
+```text
+Many-to-Many
+├── {taskname}_imgs/{split}/img_1.jpg
+├── annotations/{split}/gt_img_1.txt
+├── {taskname}_imgs/{split}/img_2.jpg
+├── annotations/{split}/gt_img_2.txt
+├── {taskname}_imgs/{split}/img_3.JPG
+└── annotations/{split}/gt_img_3.txt
+
+One-to-Many
+├── {taskname}/{split}/img_1.jpg
+├── {taskname}/{split}/img_2.jpg
+├── {taskname}/{split}/img_3.JPG
+└── annotations/gt.txt
+```
+
+Specific design is as follows:
+
+![Gatherer](https://user-images.githubusercontent.com/24622904/224935300-9f27e471-e87d-42db-a11d-adc8f603a7c9.png)
+
+MMOCR has built-in `PairGatherer` and `MonoGatherer` to handle the two common cases mentioned above. `PairGatherer` is used for many-to-many situations, while `MonoGatherer` is used for one-to-many situations.
+
+```{note}
+To simplify processing, the gatherer assumes that the dataset's images and annotations are stored separately in `{taskname}_imgs/{split}/` and `annotations/`, respectively. In particular, for many-to-many situations, the annotation file needs to be placed in `annotations/{split}`.
+```
+
+- In the many-to-many case, `PairGatherer` needs to find the image files and the corresponding annotation files according to a certain naming convention. First, the image suffixes need to be specified by the `img_suffixes` parameter, as in the example above: `img_suffixes=['.jpg', '.JPG']`. In addition, a pair of [regular expressions](https://docs.python.org/3/library/re.html) `rule` is used to specify the correspondence between image and annotation files, for example, `rule=[r'img_(\d+)\.([jJ][pP][gG])', r'gt_img_\1.txt']`. The first regular expression matches the image file name: `\d+` matches the image sequence number, and `([jJ][pP][gG])` matches the image suffix. The second regular expression produces the annotation file name, where `\1` carries over the matched image sequence number. An example configuration is:
+
+```python
+ gatherer=dict(
+ type='PairGatherer',
+ img_suffixes=['.jpg', '.JPG'],
+ rule=[r'img_(\d+)\.([jJ][pP][gG])', r'gt_img_\1.txt']),
+```
+
+The one-to-many case is usually simpler: the user only needs to specify the annotation file name. For example, for the training set:
+
+```python
+ gatherer=dict(type='MonoGatherer', ann_name='train.txt'),
+```
+
+MMOCR has also made conventions on the return value of `Gatherer`: it returns a tuple with two elements. The first element is either a list of all image paths or the folder containing all images; the second element is either a list of all annotation file paths or the path of a single annotation file that contains the annotations of all images. Specifically, `PairGatherer` returns (list of image paths, list of annotation file paths), as shown below:
+
+```python
+ (['{taskname}_imgs/{split}/img_1.jpg', '{taskname}_imgs/{split}/img_2.jpg', '{taskname}_imgs/{split}/img_3.JPG'],
+ ['annotations/{split}/gt_img_1.txt', 'annotations/{split}/gt_img_2.txt', 'annotations/{split}/gt_img_3.txt'])
+```
+
+`MonoGatherer` returns a tuple containing the path to the image directory and the path to the annotation file, as follows:
+
+```python
+ ('{taskname}/{split}', 'annotations/gt.txt')
+```
+
+#### Dataset parsing (Parser)
+
+`Parser` is mainly used to parse the original annotation files. Since the original annotation formats vary greatly, MMOCR provides `BaseParser` as a base class, which users can inherit to implement their own `Parser`. In `BaseParser`, MMOCR has designed two interfaces, `parse_files` and `parse_file`, where the annotation parsing is carried out by convention. For the two different input situations of `Gatherer` (many-to-many and one-to-many), the implementations of these two interfaces should be different.
+
+- `BaseParser` handles the many-to-many situation by default: `parse_files` distributes the data to multiple `parse_file` processes in parallel, and each `parse_file` parses the annotations of a single image.
+- For the one-to-many situation, the user needs to override `parse_files` to implement loading the annotation and returning standardized results.
+
+The interface of `BaseParser` is defined as follows:
+
+```python
+class BaseParser:
+ def __call__(self, img_paths, ann_paths):
+ return self.parse_files(img_paths, ann_paths)
+
+ def parse_files(self, img_paths: Union[List[str], str],
+ ann_paths: Union[List[str], str]) -> List[Tuple]:
+ samples = track_parallel_progress_multi_args(
+ self.parse_file, (img_paths, ann_paths), nproc=self.nproc)
+ return samples
+
+ @abstractmethod
+ def parse_file(self, img_path: str, ann_path: str) -> Tuple:
+
+ raise NotImplementedError
+```
+
+In order to ensure the uniformity of subsequent modules, MMOCR has made conventions for the return values of `parse_files` and `parse_file`. The return value of `parse_file` is a tuple, the first element of which is the image path, and the second element is the annotation information. The annotation information is a list, each element of which is a dictionary with the fields `poly`, `text`, and `ignore`, as shown below:
+
+```python
+# An example of returned values:
+(
+ 'imgs/train/xxx.jpg',
+ [
+ dict(
+ poly=[0, 1, 1, 1, 1, 0, 0, 0],
+ text='hello',
+ ignore=False),
+ ...
+ ]
+)
+```
+
+The output of `parse_files` is a list, and each element in the list is the return value of `parse_file`. An example is:
+
+```python
+[
+ (
+ 'imgs/train/xxx.jpg',
+ [
+ dict(
+ poly=[0, 1, 1, 1, 1, 0, 0, 0],
+ text='hello',
+ ignore=False),
+ ...
+ ]
+ ),
+ ...
+]
+```
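+
+As a reference, the following is a minimal sketch of what a custom parser for the many-to-many case might look like. The annotation format (`x1,y1,x2,y2,x3,y3,x4,y4,text`, one instance per line), the class name, and the exact import and registry paths are assumptions for illustration and should be checked against the code base:
+
+```python
+# A hypothetical parser sketch; verify the import paths against
+# mmocr/datasets/preparers/parsers/ before relying on them.
+from mmocr.datasets.preparers.parsers.base import BaseParser
+from mmocr.registry import DATA_PARSERS
+
+
+@DATA_PARSERS.register_module()
+class MyTxtTextDetAnnParser(BaseParser):
+    """Parse one txt annotation file into the (img_path, instances) tuple
+    described above."""
+
+    def parse_file(self, img_path: str, ann_path: str) -> tuple:
+        instances = []
+        with open(ann_path, encoding='utf-8') as f:
+            for line in f:
+                line = line.strip()
+                if not line:
+                    continue
+                # assumed format: x1,y1,x2,y2,x3,y3,x4,y4,text
+                *coords, text = line.split(',')
+                instances.append(
+                    dict(
+                        poly=[float(c) for c in coords],
+                        text=text,
+                        ignore=text == '###'))
+        return img_path, instances
+```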
+
+#### Dataset Conversion (Packer)
+
+`Packer` is mainly used to convert data into a unified annotation format. Since the input data is the output of the parser and its format is already fixed, `Packer` only needs to convert it into the unified annotation format of each task. Currently, MMOCR supports tasks such as text detection, text recognition, end-to-end OCR, and key information extraction, and each task has a corresponding packer, as shown below:
+
+![Packer](https://user-images.githubusercontent.com/24622904/225248832-11be894f-7b44-4ffa-83e1-8478c37b5e63.png)
+
+For text detection, end-to-end OCR, and key information extraction, MMOCR has a single corresponding `Packer` for each task. For text recognition, however, MMOCR provides two packers, `TextRecogPacker` and `TextRecogCropPacker`, because there are two types of datasets:
+
+- Each image is a recognition sample, and the annotation information returned by the `parser` is only a `dict(text='xxx')`. In this case, `TextRecogPacker` can be used.
+- The dataset does not crop the text out of the full images; its annotations are essentially end-to-end OCR annotations that contain both the position and the transcription of each text instance. `TextRecogCropPacker` crops the text instances from the images and then converts them into the unified text recognition format (see the configuration sketch below).
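+
+For the second type of dataset, a preparer would therefore configure the packer as follows:
+
+```python
+# Crop the text instances out of the full images before packing them into
+# the unified text recognition format.
+packer=dict(type='TextRecogCropPacker')
+```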
+
+#### Annotation Saving (Dumper)
+
+The `dumper` module is used to determine what format the data should be saved in. Currently, MMOCR supports `JsonDumper`, `WildreceiptOpensetDumper`, and `TextRecogLMDBDumper`. They are used to save data in the standard MMOCR JSON format, the Wildreceipt format, and the LMDB format commonly used in the academic community for text recognition, respectively.
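+
+A dumper is configured in the same way as the other modules. For example, to export a text recognition dataset to LMDB (the format used when `--lmdb` is passed, as described above):
+
+```python
+dumper=dict(type='TextRecogLMDBDumper')
+```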
+
+#### Delete files (Delete)
+
+When processing a dataset, temporary files that are not needed may be generated. Here, a list of such files or folders can be passed in, which will be deleted when the conversion is finished.
+
+#### Generate the configuration file for the dataset (ConfigGenerator)
+
+In order to automatically generate basic configuration files after preparing the dataset, MMOCR has implemented `TextDetConfigGenerator`, `TextRecogConfigGenerator`, and `TextSpottingConfigGenerator` for each task. The main parameters supported by these generators are as follows:
+
+| Field Name | Meaning |
+| ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| data_root | Root directory where the dataset is stored. |
+| train_anns  | str | Path to the training set annotations in the configuration file. If not specified, it defaults to `[dict(ann_file='{taskname}_train.json', dataset_postfix='')]`. |
+| val_anns | Path to the validation set annotations in the configuration file. If not specified, it defaults to an empty string. |
+| test_anns   | str | Path to the test set annotations in the configuration file. If not specified, it defaults to `[dict(ann_file='{taskname}_test.json', dataset_postfix='')]`. |
+| config_path | Path to the directory where the configuration files for the algorithm are stored. The configuration generator will write the default configuration to `{config_path}/{taskname}/_base_/datasets/{dataset_name}.py`. If not specified, it defaults to `configs/`. |
+
+After preparing all the files for the dataset, the configuration generator will automatically generate the basic configuration files required to call the dataset. Below is a minimal example of a `TextDetConfigGenerator` configuration:
+
+```python
+config_generator = dict(type='TextDetConfigGenerator')
+```
+
+The generated file will be placed by default under `configs/{task}/_base_/datasets/`. In this example, the basic configuration file for the ICDAR 2015 dataset will be generated at `configs/textdet/_base_/datasets/icdar2015.py`.
+
+```python
+icdar2015_textdet_data_root = 'data/icdar2015'
+
+icdar2015_textdet_train = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root,
+ ann_file='textdet_train.json',
+ filter_cfg=dict(filter_empty_gt=True, min_size=32),
+ pipeline=None)
+
+icdar2015_textdet_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root,
+ ann_file='textdet_test.json',
+ test_mode=True,
+ pipeline=None)
+```
+
+If the dataset is special and there are several variants of the annotations, the configuration generator also supports generating variables pointing to each variant in the base configuration. However, this requires users to differentiate them by using different `dataset_postfix` when setting up. For example, the ICDAR 2015 text recognition dataset has two annotation versions for the test set, the original version and the 1811 version, which can be specified in `test_anns` as follows:
+
+```python
+config_generator = dict(
+ type='TextRecogConfigGenerator',
+ test_anns=[
+ dict(ann_file='textrecog_test.json'),
+        dict(dataset_postfix='1811', ann_file='textrecog_test_1811.json')
+ ])
+```
+
+The configuration generator will generate the following configurations:
+
+```python
+icdar2015_textrecog_data_root = 'data/icdar2015'
+
+icdar2015_textrecog_train = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textrecog_data_root,
+ ann_file='textrecog_train.json',
+ pipeline=None)
+
+icdar2015_textrecog_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textrecog_data_root,
+ ann_file='textrecog_test.json',
+ test_mode=True,
+ pipeline=None)
+
+icdar2015_1811_textrecog_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textrecog_data_root,
+ ann_file='textrecog_test_1811.json',
+ test_mode=True,
+ pipeline=None)
+```
+
+With this file, MMOCR can directly import this dataset into the `dataloader` from the model configuration file (the following sample is excerpted from [`configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py`](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py)):
+
+```python
+_base_ = [
+ '../_base_/datasets/icdar2015.py',
+ # ...
+]
+
+# dataset settings
+icdar2015_textdet_train = _base_.icdar2015_textdet_train
+icdar2015_textdet_test = _base_.icdar2015_textdet_test
+# ...
+
+train_dataloader = dict(
+ dataset=icdar2015_textdet_train)
+
+val_dataloader = dict(
+ dataset=icdar2015_textdet_test)
+
+test_dataloader = val_dataloader
+```
+
+```{note}
+By default, the configuration generator does not overwrite existing base configuration files unless the user manually specifies `--overwrite-cfg` when running the script.
+```
+
+## Adding a new dataset to Dataset Preparer
+
+### Adding Public Datasets
+
+MMOCR has already supported many [commonly used public datasets](./datasetzoo.md). If the dataset you want to use has not been supported yet and you are willing to [contribute to the MMOCR](../../notes/contribution_guide.md) open-source community, you can follow the steps below to add a new dataset.
+
+In the following example, we will show you how to add the **ICDAR2013** dataset step by step.
+
+#### Adding `metafile.yml`
+
+First, make sure that the dataset you want to add does not already exist in `dataset_zoo/`. Then, create a new folder named after the dataset you want to add, such as `icdar2013/` (usually, use lowercase alphanumeric characters without symbols to name the dataset). In the `icdar2013/` folder, create a `metafile.yml` file and fill in the basic information of the dataset according to the following template:
+
+```yaml
+Name: 'Focused Scene Text IC13'
+Paper:
+ Title: ICDAR 2013 Robust Reading Competition
+ URL: https://www.imlab.jp/publication_data/1352/icdar_competition_report.pdf
+ Venue: ICDAR
+ Year: '2013'
+ BibTeX: '@inproceedings{karatzas2013icdar,
+ title={ICDAR 2013 robust reading competition},
+ author={Karatzas, Dimosthenis and Shafait, Faisal and Uchida, Seiichi and Iwamura, Masakazu and i Bigorda, Lluis Gomez and Mestre, Sergi Robles and Mas, Joan and Mota, David Fernandez and Almazan, Jon Almazan and De Las Heras, Lluis Pere},
+ booktitle={2013 12th international conference on document analysis and recognition},
+ pages={1484--1493},
+ year={2013},
+ organization={IEEE}}'
+Data:
+ Website: https://rrc.cvc.uab.es/?ch=2
+ Language:
+ - English
+ Scene:
+ - Natural Scene
+ Granularity:
+ - Word
+ Tasks:
+ - textdet
+ - textrecog
+ - textspotting
+ License:
+ Type: N/A
+ Link: N/A
+ Format: .txt
+ Keywords:
+ - Horizontal
+```
+
+#### Add Annotation Examples
+
+Then, you can add an annotation example file `sample_anno.md` under the `dataset_zoo/icdar2013/` directory to help the documentation script insert annotation examples when generating the documentation. The annotation example file is a Markdown file that typically contains the raw data format of a single sample. For example, the following code block shows a sample data file for the ICDAR2013 dataset:
+
+````markdown
+ **Text Detection**
+
+ ```text
+ # train split
+ # x1 y1 x2 y2 "transcript"
+
+ 158 128 411 181 "Footpath"
+ 443 128 501 169 "To"
+ 64 200 363 243 "Colchester"
+
+ # test split
+ # x1, y1, x2, y2, "transcript"
+
+ 38, 43, 920, 215, "Tiredness"
+ 275, 264, 665, 450, "kills"
+ 0, 699, 77, 830, "A"
+  ```
+````
+
+#### Add configuration files for corresponding tasks
+
+In the `dataset_zoo/icdar2013` directory, add a `.py` configuration file named after the task. For example, `textdet.py`, `textrecog.py`, `textspotting.py`, `kie.py`, etc. The configuration template is shown below:
+
+```python
+data_root = ''
+cache_path = 'data/cache'
+train_preparer = dict(
+    obtainer=dict(
+        type='NaiveDataObtainer',
+        cache_path=cache_path,
+ files=[
+ dict(
+ url='xx',
+ md5='',
+ save_name='xxx',
+ mapping=list())
+ ]),
+ gatherer=dict(type='xxxGatherer', **kwargs),
+ parser=dict(type='xxxParser', **kwargs),
+ packer=dict(type='TextxxxPacker'), # Packer for the task
+ dumper=dict(type='JsonDumper'),
+)
+test_preparer = dict(
+    obtainer=dict(
+        type='NaiveDataObtainer',
+        cache_path=cache_path,
+ files=[
+ dict(
+ url='xx',
+ md5='',
+ save_name='xxx',
+ mapping=list())
+ ]),
+ gatherer=dict(type='xxxGatherer', **kwargs),
+ parser=dict(type='xxxParser', **kwargs),
+ packer=dict(type='TextxxxPacker'), # Packer for the task
+ dumper=dict(type='JsonDumper'),
+)
+```
+
+Taking the text detection task as an example, let's walk through the specific content of the configuration file. In general, users do not need to implement new `obtainer`, `gatherer`, `packer`, or `dumper` modules, but usually need to implement a new `parser` according to the annotation format of the dataset.
+
+Regarding the configuration of `obtainer`, we will not go into detail here; please refer to [Dataset download, extraction, and movement (Obtainer)](#dataset-download-extraction-and-movement-obtainer).
+
+For the `gatherer`, by observing the obtained ICDAR2013 dataset files, we found that each image has a corresponding `.txt` format annotation file:
+
+```text
+data_root
+├── textdet_imgs/train/
+│   ├── img_1.jpg
+│   ├── img_2.jpg
+│   └── ...
+└── annotations/train/
+    ├── gt_img_1.txt
+    ├── gt_img_2.txt
+    └── ...
+```
+
+Moreover, the name of each annotation file corresponds to the image: `gt_img_1.txt` corresponds to `img_1.jpg`, and so on. Therefore, `PairGatherer` can be used to match them.
+
+```python
+gatherer=dict(
+ type='PairGatherer',
+ img_suffixes=['.jpg'],
+ rule=[r'(\w+)\.jpg', r'gt_\1.txt'])
+```
+
+The first regular expression in the rule matches the image file name, and the second one generates the corresponding annotation file name: `(\w+)` captures the image file stem, and `\1` in `gt_\1.txt` refers to that captured content. In effect, `img_xx.jpg` is mapped to `gt_img_xx.txt`.
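+
+The name mapping that the rule expresses can be reproduced with Python's `re` module. This is only an illustration of the substitution itself, not of how `PairGatherer` applies it internally:
+
+```python
+import re
+
+# The same pair of patterns used by the rule above, applied by hand
+img_name = 'img_100.jpg'
+ann_name = re.sub(r'(\w+)\.jpg', r'gt_\1.txt', img_name)
+print(ann_name)  # gt_img_100.txt
+```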
+
+Next, you need to implement a `parser` to parse the original annotation files into a standard format. Usually, before adding a new dataset, users can browse the [details page](./datasetzoo.md) of the supported datasets and check if there is a dataset with the same format. If there is, you can use the parser of that dataset directly. Otherwise, you need to implement a new format parser.
+
+Data format parsers are stored in the `mmocr/datasets/preparers/parsers` directory. All parsers need to inherit from `BaseParser` and implement the `parse_file` or `parse_files` method. For more information, please refer to [Parsing original annotations (Parser)](#dataset-parsing-parser).
+
+By observing the annotation files of the ICDAR2013 dataset:
+
+```text
+158 128 411 181 "Footpath"
+443 128 501 169 "To"
+64 200 363 243 "Colchester"
+542, 710, 938, 841, "break"
+87, 884, 457, 1021, "could"
+517, 919, 831, 1024, "save"
+```
+
+We found that the built-in `ICDARTxtTextDetAnnParser` already meets the requirements, so we can directly use this parser and configure it in the `preparer`.
+
+```python
+parser=dict(
+ type='ICDARTxtTextDetAnnParser',
+ remove_strs=[',', '"'],
+ encoding='utf-8',
+ format='x1 y1 x2 y2 trans',
+ separator=' ',
+ mode='xyxy')
+```
+
+In the configuration for `ICDARTxtTextDetAnnParser`, `remove_strs=[',', '"']` is specified to remove the extra quotes and commas in the annotation files. The `format` field, `x1 y1 x2 y2 trans`, indicates that each line in the annotation file contains four coordinate values and a transcription, separated by spaces (`separator=' '`). Also, `mode` is set to `xyxy`, which means the coordinates are those of the top-left and bottom-right corners, so that `ICDARTxtTextDetAnnParser` can parse the annotations into the unified format.
+
+For the `packer`, taking the text detection task as an example, the corresponding `packer` is `TextDetPacker`, and its configuration is as follows:
+
+```python
+packer=dict(type='TextDetPacker')
+```
+
+Finally, specify the `dumper`; the annotations are generally saved in JSON format. Its configuration is as follows:
+
+```python
+dumper=dict(type='JsonDumper')
+```
+
+After the above configuration, the configuration file for the ICDAR2013 training set is as follows:
+
+```python
+train_preparer = dict(
+ obtainer=dict(
+ type='NaiveDataObtainer',
+ cache_path=cache_path,
+ files=[
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/'
+ 'Challenge2_Training_Task12_Images.zip',
+ save_name='ic13_textdet_train_img.zip',
+ md5='a443b9649fda4229c9bc52751bad08fb',
+ content=['image'],
+ mapping=[['ic13_textdet_train_img', 'textdet_imgs/train']]),
+ dict(
+ url='https://rrc.cvc.uab.es/downloads/'
+ 'Challenge2_Training_Task1_GT.zip',
+ save_name='ic13_textdet_train_gt.zip',
+ md5='f3a425284a66cd67f455d389c972cce4',
+ content=['annotation'],
+ mapping=[['ic13_textdet_train_gt', 'annotations/train']]),
+ ]),
+ gatherer=dict(
+ type='PairGatherer',
+ img_suffixes=['.jpg'],
+ rule=[r'(\w+)\.jpg', r'gt_\1.txt']),
+ parser=dict(
+ type='ICDARTxtTextDetAnnParser',
+ remove_strs=[',', '"'],
+ format='x1 y1 x2 y2 trans',
+ separator=' ',
+ mode='xyxy'),
+ packer=dict(type='TextDetPacker'),
+ dumper=dict(type='JsonDumper'),
+)
+```
+
+To automatically generate the basic configuration after the dataset is prepared, you also need to configure the corresponding task's `config_generator`.
+
+In this example, since it is a text detection task, you only need to set the generator to `TextDetConfigGenerator`.
+
+```python
+config_generator = dict(type='TextDetConfigGenerator')
+```
+
+### Use DataPreparer to prepare customized dataset
+
+\[Coming Soon\]
diff --git a/mmocr-dev-1.x/docs/en/user_guides/data_prepare/det.md b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/det.md
new file mode 100644
index 0000000000000000000000000000000000000000..8221215000d13e747495f09aaa11398f1aa1d774
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/det.md
@@ -0,0 +1,635 @@
+# Text Detection
+
+```{note}
+This page is a manual preparation guide for datasets not yet supported by [Dataset Preparer](./dataset_preparer.md), into which all these scripts will eventually be migrated.
+```
+
+## Overview
+
+| Dataset | Images | Annotation Files (training) | Annotation Files (validation) | Annotation Files (testing) |     |
+| :---------------: | :------------------------------------------------------: | :-----------------------------------------------------------------: | :-----: | :-: | :-: |
+| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - | | |
+| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - | |
+| CurvedSynText150k | [homepage](https://github.com/aim-uofa/AdelaiDet/blob/master/datasets/README.md) \| [Part1](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) \| [Part2](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) | [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) | - | - | |
+| DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - | - | |
+| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - | - | |
+| LSVT | [homepage](https://rrc.cvc.uab.es/?ch=16) | - | - | - | |
+| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - | - | |
+| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - | - | |
+| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - | - | |
+| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - | - | |
+| IIIT-ILST | [homepage](http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst) | - | - | - | |
+| VinText | [homepage](https://github.com/VinAIResearch/dict-guided) | - | - | - | |
+| BID | [homepage](https://github.com/ricardobnjunior/Brazilian-Identity-Document-Dataset) | - | - | - | |
+| RCTW | [homepage](https://rctw.vlrlab.net/index.html) | - | - | - | |
+| HierText | [homepage](https://github.com/google-research-datasets/hiertext) | - | - | - | |
+| ArT | [homepage](https://rrc.cvc.uab.es/?ch=14) | - | - | - | |
+
+### Install AWS CLI (optional)
+
+- Since there are some datasets that require the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) to be installed in advance, we provide a quick installation guide here:
+
+ ```bash
+ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+ unzip awscliv2.zip
+  sudo ./aws/install
+  # or, to install to a custom location:
+  # ./aws/install -i /usr/local/aws-cli -b /usr/local/bin
+  aws configure
+ # this command will require you to input keys, you can skip them except
+ # for the Default region name
+ # AWS Access Key ID [None]:
+ # AWS Secret Access Key [None]:
+ # Default region name [None]: us-east-1
+  # Default output format [None]:
+ ```
+
+For users in China, these datasets can also be downloaded from [OpenDataLab](https://opendatalab.com/) with high speed:
+
+- [CTW1500](https://opendatalab.com/SCUT-CTW1500?source=OpenMMLab%20GitHub)
+- [ICDAR2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [ICDAR2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [MSRA-TD500](https://opendatalab.com/MSRA-TD500?source=OpenMMLab%20GitHub)
+
+## Important Note
+
+```{note}
+**For users who want to train models on CTW1500, ICDAR 2015/2017, and Totaltext dataset,** there might be some images containing orientation info in EXIF data. The default OpenCV
+backend used in MMCV would read them and apply the rotation on the images. However, their gold annotations are made on the raw pixels, and such
+inconsistency results in false examples in the training set. Therefore, users should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in pipelines to change MMCV's default loading behaviour. (see [DBNet's pipeline config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py) for example)
+```
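+
+For instance, a detection pipeline could override its loading step as sketched below. Only the first transform is the point here; the remaining transforms are placeholders and should follow your model's own pipeline config:
+
+```python
+train_pipeline = [
+    # color_type='color_ignore_orientation' prevents MMCV from applying the
+    # EXIF rotation, so the images match the raw-pixel annotations.
+    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+    dict(
+        type='LoadOCRAnnotations',
+        with_bbox=True,
+        with_polygon=True,
+        with_label=True),
+    # ... the rest of the pipeline
+]
+```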
+
+## ICDAR 2011 (Born-Digital Images)
+
+- Step1: Download `Challenge1_Training_Task12_Images.zip`, `Challenge1_Training_Task1_GT.zip`, `Challenge1_Test_Task12_Images.zip`, and `Challenge1_Test_Task1_GT.zip` from the `Task 1.1: Text Localization (2013 edition)` section of the [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads).
+
+ ```bash
+ mkdir icdar2011 && cd icdar2011
+ mkdir imgs && mkdir annotations
+
+ # Download ICDAR 2011
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task12_Images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task1_GT.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task12_Images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task1_GT.zip --no-check-certificate
+
+ # For images
+ unzip -q Challenge1_Training_Task12_Images.zip -d imgs/training
+ unzip -q Challenge1_Test_Task12_Images.zip -d imgs/test
+ # For annotations
+ unzip -q Challenge1_Training_Task1_GT.zip -d annotations/training
+ unzip -q Challenge1_Test_Task1_GT.zip -d annotations/test
+
+ rm Challenge1_Training_Task12_Images.zip && rm Challenge1_Test_Task12_Images.zip && rm Challenge1_Training_Task1_GT.zip && rm Challenge1_Test_Task1_GT.zip
+ ```
+
+- Step 2: Generate `instances_training.json` and `instances_test.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textdet/ic11_converter.py PATH/TO/icdar2011 --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── icdar2011
+  │   ├── imgs
+  │   ├── instances_test.json
+  │   └── instances_training.json
+ ```
+
+## ICDAR 2017
+
+- Follow similar steps as [ICDAR 2015](#icdar-2015).
+
+- The resulting directory structure looks like the following:
+
+ ```text
+  ├── icdar2017
+  │   ├── imgs
+  │   ├── annotations
+  │   ├── instances_training.json
+  │   └── instances_val.json
+ ```
+
+## CurvedSynText150k
+
+- Step1: Download [syntext1.zip](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) and [syntext2.zip](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) to `CurvedSynText150k/`.
+
+- Step2:
+
+ ```bash
+ unzip -q syntext1.zip
+ mv train.json train1.json
+ unzip images.zip
+ rm images.zip
+
+ unzip -q syntext2.zip
+ mv train.json train2.json
+ unzip images.zip
+ rm images.zip
+ ```
+
+- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) to `CurvedSynText150k/`
+
+- Or, generate `instances_training.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/common/curvedsyntext_converter.py PATH/TO/CurvedSynText150k --nproc 4
+ ```
+
+- The resulting directory structure looks like the following:
+
+ ```text
+  ├── CurvedSynText150k
+  │   ├── syntext_word_eng
+  │   ├── emcs_imgs
+  │   └── instances_training.json
+ ```
+
+## DeText
+
+- Step1: Download `ch9_training_images.zip`, `ch9_training_localization_transcription_gt.zip`, `ch9_validation_images.zip`, and `ch9_validation_localization_transcription_gt.zip` from **Task 3: End to End** on the [homepage](https://rrc.cvc.uab.es/?ch=9).
+
+ ```bash
+ mkdir detext && cd detext
+ mkdir imgs && mkdir annotations && mkdir imgs/training && mkdir imgs/val && mkdir annotations/training && mkdir annotations/val
+
+ # Download DeText
+ wget https://rrc.cvc.uab.es/downloads/ch9_training_images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_training_localization_transcription_gt.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_validation_images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_validation_localization_transcription_gt.zip --no-check-certificate
+
+ # Extract images and annotations
+ unzip -q ch9_training_images.zip -d imgs/training && unzip -q ch9_training_localization_transcription_gt.zip -d annotations/training && unzip -q ch9_validation_images.zip -d imgs/val && unzip -q ch9_validation_localization_transcription_gt.zip -d annotations/val
+
+ # Remove zips
+ rm ch9_training_images.zip && rm ch9_training_localization_transcription_gt.zip && rm ch9_validation_images.zip && rm ch9_validation_localization_transcription_gt.zip
+ ```
+
+- Step2: Generate `instances_training.json` and `instances_val.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textdet/detext_converter.py PATH/TO/detext --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── detext
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json
+ ```
+
+## Lecture Video DB
+
+- Step1: Download [IIIT-CVid.zip](http://cdn.iiit.ac.in/cdn/preon.iiit.ac.in/~kartik/IIIT-CVid.zip) to `lv/`.
+
+ ```bash
+ mkdir lv && cd lv
+
+ # Download LV dataset
+ wget http://cdn.iiit.ac.in/cdn/preon.iiit.ac.in/~kartik/IIIT-CVid.zip
+ unzip -q IIIT-CVid.zip
+
+ mv IIIT-CVid/Frames imgs
+
+ rm IIIT-CVid.zip
+ ```
+
+- Step2: Generate `instances_training.json`, `instances_val.json`, and `instances_test.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textdet/lv_converter.py PATH/TO/lv --nproc 4
+ ```
+
+- The resulting directory structure looks like the following:
+
+ ```text
+  ├── lv
+  │   ├── imgs
+  │   ├── instances_test.json
+  │   ├── instances_training.json
+  │   └── instances_val.json
+ ```
+
+## LSVT
+
+- Step1: Download [train_full_images_0.tar.gz](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_0.tar.gz), [train_full_images_1.tar.gz](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_1.tar.gz), and [train_full_labels.json](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_labels.json) to `lsvt/`.
+
+ ```bash
+ mkdir lsvt && cd lsvt
+
+ # Download LSVT dataset
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_0.tar.gz
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_1.tar.gz
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_labels.json
+
+ mkdir annotations
+ tar -xf train_full_images_0.tar.gz && tar -xf train_full_images_1.tar.gz
+ mv train_full_labels.json annotations/ && mv train_full_images_1/*.jpg train_full_images_0/
+ mv train_full_images_0 imgs
+
+ rm train_full_images_0.tar.gz && rm train_full_images_1.tar.gz && rm -rf train_full_images_1
+ ```
+
+- Step2: Generate `instances_training.json` and `instances_val.json` (optional) with the following command:
+
+ ```bash
+  # The annotations of the LSVT test split are not publicly available. Split a
+  # validation set by adding --val-ratio 0.2
+ python tools/dataset_converters/textdet/lsvt_converter.py PATH/TO/lsvt
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── lsvt
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+ ```
+
+## IMGUR
+
+- Step1: Run `download_imgur5k.py` to download images. You can merge [PR#5](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset/pull/5) in your local repository to enable a **much faster** parallel execution of image download.
+
+ ```bash
+ mkdir imgur && cd imgur
+
+ git clone https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset.git
+
+ # Download images from imgur.com. This may take SEVERAL HOURS!
+ python ./IMGUR5K-Handwriting-Dataset/download_imgur5k.py --dataset_info_dir ./IMGUR5K-Handwriting-Dataset/dataset_info/ --output_dir ./imgs
+
+ # For annotations
+ mkdir annotations
+ mv ./IMGUR5K-Handwriting-Dataset/dataset_info/*.json annotations
+
+ rm -rf IMGUR5K-Handwriting-Dataset
+ ```
+
+- Step2: Generate `instances_training.json`, `instances_val.json`, and `instances_test.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textdet/imgur_converter.py PATH/TO/imgur
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── imgur
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_test.json
+  │   ├── instances_training.json
+  │   └── instances_val.json
+ ```
+
+## KAIST
+
+- Step1: Download [KAIST_all.zip](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) to `kaist/`.
+
+ ```bash
+ mkdir kaist && cd kaist
+ mkdir imgs && mkdir annotations
+
+ # Download KAIST dataset
+ wget http://www.iapr-tc11.org/dataset/KAIST_SceneText/KAIST_all.zip
+ unzip -q KAIST_all.zip
+
+ rm KAIST_all.zip
+ ```
+
+- Step2: Extract zips:
+
+ ```bash
+ python tools/dataset_converters/common/extract_kaist.py PATH/TO/kaist
+ ```
+
+- Step3: Generate `instances_training.json` and `instances_val.json` (optional) with the following command:
+
+ ```bash
+ # Since KAIST does not provide an official split, you can split the dataset by adding --val-ratio 0.2
+ python tools/dataset_converters/textdet/kaist_converter.py PATH/TO/kaist --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── kaist
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+ ```
+
+## MTWI
+
+- Step1: Download `mtwi_2018_train.zip` from [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us).
+
+ ```bash
+ mkdir mtwi && cd mtwi
+
+ unzip -q mtwi_2018_train.zip
+ mv image_train imgs && mv txt_train annotations
+
+ rm mtwi_2018_train.zip
+ ```
+
+- Step2: Generate `instances_training.json` and `instances_val.json` (optional) with the following command:
+
+ ```bash
+  # The annotations of the MTWI test split are not publicly available. Split a
+  # validation set by adding --val-ratio 0.2
+ python tools/dataset_converters/textdet/mtwi_converter.py PATH/TO/mtwi --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── mtwi
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+ ```
+
+## ReCTS
+
+- Step1: Download [ReCTS.zip](https://datasets.cvc.uab.es/rrc/ReCTS.zip) to `rects/` from the [homepage](https://rrc.cvc.uab.es/?ch=12&com=downloads).
+
+ ```bash
+ mkdir rects && cd rects
+
+ # Download ReCTS dataset
+ # You can also find Google Drive link on the dataset homepage
+ wget https://datasets.cvc.uab.es/rrc/ReCTS.zip --no-check-certificate
+ unzip -q ReCTS.zip
+
+ mv img imgs && mv gt_unicode annotations
+
+ rm ReCTS.zip && rm -rf gt
+ ```
+
+- Step2: Generate `instances_training.json` and `instances_val.json` (optional) with the following command:
+
+ ```bash
+  # The annotations of the ReCTS test split are not publicly available. Split a
+  # validation set by adding --val-ratio 0.2
+ python tools/dataset_converters/textdet/rects_converter.py PATH/TO/rects --nproc 4 --val-ratio 0.2
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── rects
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_val.json (optional)
+  │   └── instances_training.json
+ ```
+
+## ILST
+
+- Step1: Download `IIIT-ILST` from [onedrive](https://iiitaphyd-my.sharepoint.com/:f:/g/personal/minesh_mathew_research_iiit_ac_in/EtLvCozBgaBIoqglF4M-lHABMgNcCDW9rJYKKWpeSQEElQ?e=zToXZP)
+
+- Step2: Run the following commands
+
+ ```bash
+ unzip -q IIIT-ILST.zip && rm IIIT-ILST.zip
+ cd IIIT-ILST
+
+ # rename files
+ cd Devanagari && for i in `ls`; do mv -f $i `echo "devanagari_"$i`; done && cd ..
+ cd Malayalam && for i in `ls`; do mv -f $i `echo "malayalam_"$i`; done && cd ..
+ cd Telugu && for i in `ls`; do mv -f $i `echo "telugu_"$i`; done && cd ..
+
+ # transfer image path
+ mkdir imgs && mkdir annotations
+ mv Malayalam/{*jpg,*jpeg} imgs/ && mv Malayalam/*xml annotations/
+ mv Devanagari/*jpg imgs/ && mv Devanagari/*xml annotations/
+ mv Telugu/*jpeg imgs/ && mv Telugu/*xml annotations/
+
+ # remove unnecessary files
+ rm -rf Devanagari && rm -rf Malayalam && rm -rf Telugu && rm -rf README.txt
+ ```
+
+- Step3: Generate `instances_training.json` and `instances_val.json` (optional). Since the original dataset doesn't have a validation set, you may specify `--val-ratio` to split the dataset. E.g., if `--val-ratio` is 0.2, then 20% of the data are held out as the validation set.
+
+ ```bash
+ python tools/dataset_converters/textdet/ilst_converter.py PATH/TO/IIIT-ILST --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── IIIT-ILST
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_val.json (optional)
+  │   └── instances_training.json
+ ```
+
+## VinText
+
+- Step1: Download [vintext.zip](https://drive.google.com/drive/my-drive) to `vintext`
+
+ ```bash
+ mkdir vintext && cd vintext
+
+ # Download dataset from google drive
+  wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UUQhNvzgpZy7zXBFQp0Qox-BBjunZ0ml' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1UUQhNvzgpZy7zXBFQp0Qox-BBjunZ0ml" -O vintext.zip && rm -rf /tmp/cookies.txt
+
+ # Extract images and annotations
+ unzip -q vintext.zip && rm vintext.zip
+ mv vietnamese/labels ./ && mv vietnamese/test_image ./ && mv vietnamese/train_images ./ && mv vietnamese/unseen_test_images ./
+ rm -rf vietnamese
+
+ # Rename files
+ mv labels annotations && mv test_image test && mv train_images training && mv unseen_test_images unseen_test
+ mkdir imgs
+ mv training imgs/ && mv test imgs/ && mv unseen_test imgs/
+ ```
+
+- Step2: Generate `instances_training.json`, `instances_test.json`, and `instances_unseen_test.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textdet/vintext_converter.py PATH/TO/vintext --nproc 4
+ ```
+
+- After running the above codes, the directory structure should be as follows:
+
+ ```text
+  ├── vintext
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_test.json
+  │   ├── instances_unseen_test.json
+  │   └── instances_training.json
+ ```
+
+## BID
+
+- Step1: Download [BID Dataset.zip](https://drive.google.com/file/d/1Oi88TRcpdjZmJ79WDLb9qFlBNG8q2De6/view)
+
+- Step2: Run the following commands to preprocess the dataset
+
+ ```bash
+ # Rename
+ mv BID\ Dataset.zip BID_Dataset.zip
+
+ # Unzip and Rename
+ unzip -q BID_Dataset.zip && rm BID_Dataset.zip
+ mv BID\ Dataset BID
+
+ # The BID dataset has a problem of permission, and you may
+ # add permission for this file
+ chmod -R 777 BID
+ cd BID
+ mkdir imgs && mkdir annotations
+
+ # For images and annotations
+ mv CNH_Aberta/*in.jpg imgs && mv CNH_Aberta/*txt annotations && rm -rf CNH_Aberta
+ mv CNH_Frente/*in.jpg imgs && mv CNH_Frente/*txt annotations && rm -rf CNH_Frente
+ mv CNH_Verso/*in.jpg imgs && mv CNH_Verso/*txt annotations && rm -rf CNH_Verso
+ mv CPF_Frente/*in.jpg imgs && mv CPF_Frente/*txt annotations && rm -rf CPF_Frente
+ mv CPF_Verso/*in.jpg imgs && mv CPF_Verso/*txt annotations && rm -rf CPF_Verso
+ mv RG_Aberto/*in.jpg imgs && mv RG_Aberto/*txt annotations && rm -rf RG_Aberto
+ mv RG_Frente/*in.jpg imgs && mv RG_Frente/*txt annotations && rm -rf RG_Frente
+ mv RG_Verso/*in.jpg imgs && mv RG_Verso/*txt annotations && rm -rf RG_Verso
+
+ # Remove unnecessary files
+ rm -rf desktop.ini
+ ```
+
+- Step3: Generate `instances_training.json` and `instances_val.json` (optional). Since the original dataset doesn't have a validation set, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
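+  # Since the original dataset has no validation split, you may optionally add
+  # --val-ratio 0.2 to hold out 20% of the data as a validation set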
+ python tools/dataset_converters/textdet/bid_converter.py PATH/TO/BID --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── BID
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+  ```
+
+## RCTW
+
+- Step1: Download `train_images.zip.001`, `train_images.zip.002`, and `train_gts.zip` from the [homepage](https://rctw.vlrlab.net/dataset.html), and extract them to `rctw/imgs` and `rctw/annotations`, respectively (a command sketch is given below).
+
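+  A minimal sketch for Step1, assuming the three archives sit in the current working directory and that `7z` is available to extract the split zip (adjust paths and tools to your setup):
+
+  ```bash
+  mkdir -p rctw/imgs rctw/annotations
+
+  # train_images.zip.001 and .002 form a split archive; 7z can extract it from the first part
+  7z x train_images.zip.001 -orctw/imgs
+
+  # Extract the ground truth annotations
+  unzip -q train_gts.zip -d rctw/annotations
+  ```
+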
+- Step2: Generate `instances_training.json` and `instances_val.json` (optional). Since the test annotations are not publicly available, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
+ # Annotations of RCTW test split is not publicly available, split a validation set by adding --val-ratio 0.2
+ python tools/dataset_converters/textdet/rctw_converter.py PATH/TO/rctw --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── rctw
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+  ```
+
+## HierText
+
+- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/det.html#install-aws-cli-optional).
+
+- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations
+
+ ```bash
+ mkdir HierText
+ git clone https://github.com/google-research-datasets/hiertext.git
+ ```
+
+- Step3: Download `train.tgz` and `validation.tgz` from AWS
+
+ ```bash
+ aws s3 --no-sign-request cp s3://open-images-dataset/ocr/train.tgz .
+ aws s3 --no-sign-request cp s3://open-images-dataset/ocr/validation.tgz .
+ ```
+
+- Step4: Process raw data
+
+ ```bash
+ # process annotations
+ mv hiertext/gt ./
+ rm -rf hiertext
+ mv gt annotations
+ gzip -d annotations/train.jsonl.gz
+ gzip -d annotations/validation.jsonl.gz
+ # process images
+ mkdir imgs
+ mv train.tgz imgs/
+ mv validation.tgz imgs/
+ tar -xzvf imgs/train.tgz
+ tar -xzvf imgs/validation.tgz
+ ```
+
+- Step5: Generate `instances_training.json` and `instances_val.json`. HierText provides annotations at three levels: paragraph, line, and word; check the original [paper](https://arxiv.org/pdf/2203.15143.pdf) for details. Set `--level paragraph`, `--level line`, or `--level word` to choose the annotation level.
+
+ ```bash
+ # Collect word annotation from HierText --level word
+ python tools/dataset_converters/textdet/hiertext_converter.py PATH/TO/HierText --level word --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── HierText
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json
+  ```
+
+## ArT
+
+- Step1: Download `train_images.tar.gz`, and `train_labels.json` from the [homepage](https://rrc.cvc.uab.es/?ch=14&com=downloads) to `art/`
+
+ ```bash
+ mkdir art && cd art
+ mkdir annotations
+
+ # Download ArT dataset
+ wget https://dataset-bj.cdn.bcebos.com/art/train_images.tar.gz --no-check-certificate
+ wget https://dataset-bj.cdn.bcebos.com/art/train_labels.json --no-check-certificate
+
+ # Extract
+ tar -xf train_images.tar.gz
+ mv train_images imgs
+ mv train_labels.json annotations/
+
+ # Remove unnecessary files
+ rm train_images.tar.gz
+ ```
+
+- Step2: Generate `instances_training.json` and `instances_val.json` (optional). Since the test annotations are not publicly available, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
+ # Annotations of ArT test split is not publicly available, split a validation set by adding --val-ratio 0.2
+ python tools/data/textdet/art_converter.py PATH/TO/art --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── art
+  │   ├── annotations
+  │   ├── imgs
+  │   ├── instances_training.json
+  │   └── instances_val.json (optional)
+  ```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/data_prepare/kie.md b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/kie.md
new file mode 100644
index 0000000000000000000000000000000000000000..9d324383c45726c564ed86d040a93e0c6fcc2d80
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/kie.md
@@ -0,0 +1,42 @@
+# Key Information Extraction
+
+```{note}
+This page is a manual preparation guide for datasets not yet supported by [Dataset Preparer](./dataset_preparer.md), into which all these scripts will eventually be migrated.
+```
+
+## Overview
+
+The structure of the key information extraction dataset directory is organized as follows.
+
+```text
+├── wildreceipt
+│   ├── class_list.txt
+│   ├── dict.txt
+│   ├── image_files
+│   ├── openset_train.txt
+│   ├── openset_test.txt
+│   ├── test.txt
+│   └── train.txt
+```
+
+## Preparation Steps
+
+### WildReceipt
+
+- Download and extract [wildreceipt.tar](https://download.openmmlab.com/mmocr/data/wildreceipt.tar); a command sketch is given below.
+
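+A minimal sketch, assuming you want the data under `data/` and that the archive unpacks into a `wildreceipt/` folder:
+
+```bash
+mkdir -p data && cd data
+
+# Download and extract the images and annotations
+wget https://download.openmmlab.com/mmocr/data/wildreceipt.tar
+tar -xf wildreceipt.tar
+rm wildreceipt.tar
+```
+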
+### WildReceiptOpenset
+
+- Step0: Have [WildReceipt](#wildreceipt) prepared.
+- Step1: Convert annotation files to OpenSet format:
+
+```bash
+# You may find more available arguments by running
+# python tools/data/kie/closeset_to_openset.py -h
+python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
+python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
+```
+
+```{note}
+You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](../tutorials/kie_closeset_openset.md).
+```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/data_prepare/recog.md b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/recog.md
new file mode 100644
index 0000000000000000000000000000000000000000..e4a021581c770d8455eec1f69cb42320dc67c555
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/data_prepare/recog.md
@@ -0,0 +1,784 @@
+# Text Recognition
+
+```{note}
+This page is a manual preparation guide for datasets not yet supported by [Dataset Preparer](./dataset_preparer.md), into which all these scripts will eventually be migrated.
+```
+
+## Overview
+
+| Dataset | images | annotation file (training) | annotation file (test) |
+| :--------------: | :------------------------------------------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------: |
+| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_labels.json](#TODO) | - |
+| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - |
+| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/synthtext_add/train_labels.json) | - |
+| OpenVINO | [Open Images](https://github.com/cvdfoundation/open-images-dataset) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |
+| DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - |
+| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - |
+| LSVT | [homepage](https://rrc.cvc.uab.es/?ch=16) | - | - |
+| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - |
+| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - |
+| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - |
+| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - |
+| IIIT-ILST | [homepage](http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst) | - | - |
+| VinText | [homepage](https://github.com/VinAIResearch/dict-guided) | - | - |
+| BID | [homepage](https://github.com/ricardobnjunior/Brazilian-Identity-Document-Dataset) | - | - |
+| RCTW | [homepage](https://rctw.vlrlab.net/index.html) | - | - |
+| HierText | [homepage](https://github.com/google-research-datasets/hiertext) | - | - |
+| ArT | [homepage](https://rrc.cvc.uab.es/?ch=14) | - | - |
+
+(\*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
+
+### Install AWS CLI (optional)
+
+- Since there are some datasets that require the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) to be installed in advance, we provide a quick installation guide here:
+
+ ```bash
+ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+ unzip awscliv2.zip
+ sudo ./aws/install
+  # Or specify the installation location explicitly:
+  # ./aws/install -i /usr/local/aws-cli -b /usr/local/bin
+  aws configure
+  # this command will ask you to input keys; you can skip them except
+  # for the Default region name
+  # AWS Access Key ID [None]:
+  # AWS Secret Access Key [None]:
+  # Default region name [None]: us-east-1
+  # Default output format [None]:
+ ```
+
+For users in China, these datasets can also be downloaded from [OpenDataLab](https://opendatalab.com/) with high speed:
+
+- [icdar_2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [icdar_2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [IIIT5K](https://opendatalab.com/IIIT_5K?source=OpenMMLab%20GitHub)
+- [ct80](https://opendatalab.com/CUTE_80?source=OpenMMLab%20GitHub)
+- [svt](https://opendatalab.com/SVT?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [IAM](https://opendatalab.com/IAM_Handwriting?source=OpenMMLab%20GitHub)
+
+## ICDAR 2011 (Born-Digital Images)
+
+- Step1: Download `Challenge1_Training_Task3_Images_GT.zip`, `Challenge1_Test_Task3_Images.zip`, and `Challenge1_Test_Task3_GT.txt` from the [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads) under `Task 1.3: Word Recognition (2013 edition)`.
+
+ ```bash
+ mkdir icdar2011 && cd icdar2011
+ mkdir annotations
+
+ # Download ICDAR 2011
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task3_Images_GT.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task3_Images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task3_GT.txt --no-check-certificate
+
+ # For images
+ mkdir crops
+ unzip -q Challenge1_Training_Task3_Images_GT.zip -d crops/train
+ unzip -q Challenge1_Test_Task3_Images.zip -d crops/test
+
+ # For annotations
+ mv Challenge1_Test_Task3_GT.txt annotations && mv crops/train/gt.txt annotations/Challenge1_Train_Task3_GT.txt
+ ```
+
+- Step2: Convert original annotations to `train_labels.json` and `test_labels.json` with the following command:
+
+ ```bash
+ python tools/dataset_converters/textrecog/ic11_converter.py PATH/TO/icdar2011
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── icdar2011
+  │   ├── crops
+  │   ├── train_labels.json
+  │   └── test_labels.json
+  ```
+
+## coco_text
+
+- Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)
+
+- Step2: Download [train_labels.json](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_labels.json)
+
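+  A minimal sketch of the two steps above, assuming the images downloaded from the homepage are extracted into a `train_words` folder:
+
+  ```bash
+  mkdir coco_text && cd coco_text
+
+  # Step1: place the images downloaded from the homepage here
+  mv /path/to/train_words ./
+
+  # Step2: download the converted annotation file
+  wget https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_labels.json
+  ```
+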
+- After the above steps, the directory structure should be as follows:
+
+  ```text
+  ├── coco_text
+  │   ├── train_labels.json
+  │   └── train_words
+  ```
+
+## SynthAdd
+
+- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x)
+
+- Step2: Download [train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/synthtext_add/train_labels.json)
+
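+  For Step2, a one-line sketch, assuming `wget` is available:
+
+  ```bash
+  wget https://download.openmmlab.com/mmocr/data/1.x/recog/synthtext_add/train_labels.json
+  ```
+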
+- Step3: Run the following commands to organize the files
+
+ ```bash
+ mkdir SynthAdd && cd SynthAdd
+
+ mv /path/to/SynthText_Add.zip .
+
+ unzip SynthText_Add.zip
+
+ mv /path/to/train_labels.json .
+
+ # create soft link
+ cd /path/to/mmocr/data/recog
+
+ ln -s /path/to/SynthAdd SynthAdd
+
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── SynthAdd
+  │   ├── train_labels.json
+  │   └── SynthText_Add
+  ```
+
+## OpenVINO
+
+- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).
+
+- Step2: Download [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) subsets `train_1`, `train_2`, `train_5`, `train_f`, and `validation` to `openvino/`.
+
+ ```bash
+ mkdir openvino && cd openvino
+
+ # Download Open Images subsets
+ for s in 1 2 5 f; do
+ aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
+ done
+ aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .
+
+ # Download annotations
+ for s in 1 2 5 f; do
+ wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
+ done
+ wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json
+
+ # Extract images
+ mkdir -p openimages_v5/val
+ for s in 1 2 5 f; do
+ tar zxf train_${s}.tar.gz -C openimages_v5
+ done
+ tar zxf validation.tar.gz -C openimages_v5/val
+ ```
+
+- Step3: Generate `train_{1,2,5,f}_labels.json`, `val_labels.json` and crop images using 4 processes with the following command:
+
+ ```bash
+ python tools/dataset_converters/textrecog/openvino_converter.py /path/to/openvino 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── OpenVINO
+  │   ├── image_1
+  │   ├── image_2
+  │   ├── image_5
+  │   ├── image_f
+  │   ├── image_val
+  │   ├── train_1_labels.json
+  │   ├── train_2_labels.json
+  │   ├── train_5_labels.json
+  │   ├── train_f_labels.json
+  │   └── val_labels.json
+  ```
+
+## DeText
+
+- Step1: Download `ch9_training_images.zip`, `ch9_training_localization_transcription_gt.zip`, `ch9_validation_images.zip`, and `ch9_validation_localization_transcription_gt.zip` from **Task 3: End to End** on the [homepage](https://rrc.cvc.uab.es/?ch=9).
+
+ ```bash
+ mkdir detext && cd detext
+ mkdir imgs && mkdir annotations && mkdir imgs/training && mkdir imgs/val && mkdir annotations/training && mkdir annotations/val
+
+ # Download DeText
+ wget https://rrc.cvc.uab.es/downloads/ch9_training_images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_training_localization_transcription_gt.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_validation_images.zip --no-check-certificate
+ wget https://rrc.cvc.uab.es/downloads/ch9_validation_localization_transcription_gt.zip --no-check-certificate
+
+ # Extract images and annotations
+ unzip -q ch9_training_images.zip -d imgs/training && unzip -q ch9_training_localization_transcription_gt.zip -d annotations/training && unzip -q ch9_validation_images.zip -d imgs/val && unzip -q ch9_validation_localization_transcription_gt.zip -d annotations/val
+
+ # Remove zips
+ rm ch9_training_images.zip && rm ch9_training_localization_transcription_gt.zip && rm ch9_validation_images.zip && rm ch9_validation_localization_transcription_gt.zip
+ ```
+
+- Step2: Generate `train_labels.json` and `test_labels.json` with the following command:
+
+ ```bash
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/detext/ignores
+ python tools/dataset_converters/textrecog/detext_converter.py PATH/TO/detext --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── detext
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── test_labels.json
+  ```
+
+## NAF
+
+- Step1: Download [labeled_images.tar.gz](https://github.com/herobd/NAF_dataset/releases/tag/v1.0) to `naf/`.
+
+ ```bash
+ mkdir naf && cd naf
+
+ # Download NAF dataset
+ wget https://github.com/herobd/NAF_dataset/releases/download/v1.0/labeled_images.tar.gz
+ tar -zxf labeled_images.tar.gz
+
+ # For images
+ mkdir annotations && mv labeled_images imgs
+
+ # For annotations
+ git clone https://github.com/herobd/NAF_dataset.git
+ mv NAF_dataset/train_valid_test_split.json annotations/ && mv NAF_dataset/groups annotations/
+
+ rm -rf NAF_dataset && rm labeled_images.tar.gz
+ ```
+
+- Step2: Generate `train_labels.json`, `val_labels.json`, and `test_labels.json` with the following command:
+
+ ```bash
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/naf/ignores
+ python tools/dataset_converters/textrecog/naf_converter.py PATH/TO/naf --nproc 4
+
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── naf
+  │   ├── crops
+  │   ├── train_labels.json
+  │   ├── val_labels.json
+  │   └── test_labels.json
+  ```
+
+## Lecture Video DB
+
+```{warning}
+This section is not fully tested yet.
+```
+
+```{note}
+The LV dataset already provides cropped images and the corresponding annotations.
+```
+
+- Step1: Download [IIIT-CVid.zip](http://cdn.iiit.ac.in/cdn/preon.iiit.ac.in/~kartik/IIIT-CVid.zip) to `lv/`.
+
+ ```bash
+ mkdir lv && cd lv
+
+ # Download LV dataset
+ wget http://cdn.iiit.ac.in/cdn/preon.iiit.ac.in/~kartik/IIIT-CVid.zip
+ unzip -q IIIT-CVid.zip
+
+ # For image
+ mv IIIT-CVid/Crops ./
+
+ # For annotation
+ mv IIIT-CVid/train.txt train_labels.json && mv IIIT-CVid/val.txt val_label.txt && mv IIIT-CVid/test.txt test_labels.json
+
+ rm IIIT-CVid.zip
+ ```
+
+- Step2: Generate `train_labels.json`, `val_labels.json`, and `test_labels.json` with the following command:
+
+ ```bash
+  python tools/dataset_converters/textrecog/lv_converter.py PATH/TO/lv
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── lv
+  │   ├── Crops
+  │   ├── train_labels.json
+  │   └── test_labels.json
+  ```
+
+## LSVT
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download [train_full_images_0.tar.gz](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_0.tar.gz), [train_full_images_1.tar.gz](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_1.tar.gz), and [train_full_labels.json](https://dataset-bj.cdn.bcebos.com/lsvt/train_full_labels.json) to `lsvt/`.
+
+ ```bash
+ mkdir lsvt && cd lsvt
+
+ # Download LSVT dataset
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_0.tar.gz
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_images_1.tar.gz
+ wget https://dataset-bj.cdn.bcebos.com/lsvt/train_full_labels.json
+
+ mkdir annotations
+ tar -xf train_full_images_0.tar.gz && tar -xf train_full_images_1.tar.gz
+ mv train_full_labels.json annotations/ && mv train_full_images_1/*.jpg train_full_images_0/
+ mv train_full_images_0 imgs
+
+ rm train_full_images_0.tar.gz && rm train_full_images_1.tar.gz && rm -rf train_full_images_1
+ ```
+
+- Step2: Generate `train_labels.json` and `val_label.json` (optional) with the following command:
+
+ ```bash
+ # Annotations of LSVT test split is not publicly available, split a validation
+ # set by adding --val-ratio 0.2
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/lsvt/ignores
+  python tools/dataset_converters/textrecog/lsvt_converter.py PATH/TO/lsvt --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── lsvt
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## IMGUR
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Run `download_imgur5k.py` to download images. You can merge [PR#5](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset/pull/5) in your local repository to enable a **much faster** parallel execution of image download.
+
+ ```bash
+ mkdir imgur && cd imgur
+
+ git clone https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset.git
+
+ # Download images from imgur.com. This may take SEVERAL HOURS!
+ python ./IMGUR5K-Handwriting-Dataset/download_imgur5k.py --dataset_info_dir ./IMGUR5K-Handwriting-Dataset/dataset_info/ --output_dir ./imgs
+
+ # For annotations
+ mkdir annotations
+ mv ./IMGUR5K-Handwriting-Dataset/dataset_info/*.json annotations
+
+ rm -rf IMGUR5K-Handwriting-Dataset
+ ```
+
+- Step2: Generate `train_labels.json`, `val_label.json`, and `test_labels.json` and crop images with the following command:
+
+ ```bash
+ python tools/dataset_converters/textrecog/imgur_converter.py PATH/TO/imgur
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── imgur
+  │   ├── crops
+  │   ├── train_labels.json
+  │   ├── test_labels.json
+  │   └── val_label.json
+  ```
+
+## KAIST
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download [KAIST_all.zip](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) to `kaist/`.
+
+ ```bash
+ mkdir kaist && cd kaist
+ mkdir imgs && mkdir annotations
+
+ # Download KAIST dataset
+ wget http://www.iapr-tc11.org/dataset/KAIST_SceneText/KAIST_all.zip
+ unzip -q KAIST_all.zip && rm KAIST_all.zip
+ ```
+
+- Step2: Extract zips:
+
+ ```bash
+ python tools/dataset_converters/common/extract_kaist.py PATH/TO/kaist
+ ```
+
+- Step3: Generate `train_labels.json` and `val_label.json` (optional) with the following command:
+
+ ```bash
+ # Since KAIST does not provide an official split, you can split the dataset by adding --val-ratio 0.2
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/kaist/ignores
+ python tools/dataset_converters/textrecog/kaist_converter.py PATH/TO/kaist --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── kaist
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## MTWI
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download `mtwi_2018_train.zip` from [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us).
+
+ ```bash
+ mkdir mtwi && cd mtwi
+
+ unzip -q mtwi_2018_train.zip
+ mv image_train imgs && mv txt_train annotations
+
+ rm mtwi_2018_train.zip
+ ```
+
+- Step2: Generate `train_labels.json` and `val_label.json` (optional) with the following command:
+
+ ```bash
+ # Annotations of MTWI test split is not publicly available, split a validation
+ # set by adding --val-ratio 0.2
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/mtwi/ignores
+ python tools/dataset_converters/textrecog/mtwi_converter.py PATH/TO/mtwi --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── mtwi
+  │   ├── crops
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## ReCTS
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download [ReCTS.zip](https://datasets.cvc.uab.es/rrc/ReCTS.zip) to `rects/` from the [homepage](https://rrc.cvc.uab.es/?ch=12&com=downloads).
+
+ ```bash
+ mkdir rects && cd rects
+
+ # Download ReCTS dataset
+ # You can also find Google Drive link on the dataset homepage
+ wget https://datasets.cvc.uab.es/rrc/ReCTS.zip --no-check-certificate
+ unzip -q ReCTS.zip
+
+ mv img imgs && mv gt_unicode annotations
+
+ rm ReCTS.zip -f && rm -rf gt
+ ```
+
+- Step2: Generate `train_labels.json` and `val_label.json` (optional) with the following command:
+
+ ```bash
+ # Annotations of ReCTS test split is not publicly available, split a validation
+ # set by adding --val-ratio 0.2
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise
+ # vertical images will be filtered and stored in PATH/TO/rects/ignores
+ python tools/dataset_converters/textrecog/rects_converter.py PATH/TO/rects --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── rects
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## ILST
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download `IIIT-ILST.zip` from [onedrive link](https://iiitaphyd-my.sharepoint.com/:f:/g/personal/minesh_mathew_research_iiit_ac_in/EtLvCozBgaBIoqglF4M-lHABMgNcCDW9rJYKKWpeSQEElQ?e=zToXZP)
+
+- Step2: Run the following commands
+
+ ```bash
+ unzip -q IIIT-ILST.zip && rm IIIT-ILST.zip
+ cd IIIT-ILST
+
+ # rename files
+ cd Devanagari && for i in `ls`; do mv -f $i `echo "devanagari_"$i`; done && cd ..
+ cd Malayalam && for i in `ls`; do mv -f $i `echo "malayalam_"$i`; done && cd ..
+ cd Telugu && for i in `ls`; do mv -f $i `echo "telugu_"$i`; done && cd ..
+
+ # transfer image path
+ mkdir imgs && mkdir annotations
+ mv Malayalam/{*jpg,*jpeg} imgs/ && mv Malayalam/*xml annotations/
+ mv Devanagari/*jpg imgs/ && mv Devanagari/*xml annotations/
+ mv Telugu/*jpeg imgs/ && mv Telugu/*xml annotations/
+
+ # remove unnecessary files
+ rm -rf Devanagari && rm -rf Malayalam && rm -rf Telugu && rm -rf README.txt
+ ```
+
+- Step3: Generate `train_labels.json` and `val_label.json` (optional) and crop images using 4 processes with the following command (add `--preserve-vertical` if you wish to preserve the images containing vertical texts). Since the original dataset doesn't have a validation set, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
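+  # Add --preserve-vertical to keep vertical texts for training; otherwise they
+  # are filtered out and stored in PATH/TO/IIIT-ILST/ignores
+  # Add --val-ratio 0.2 to hold out 20% of the data as a validation set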
+ python tools/dataset_converters/textrecog/ilst_converter.py PATH/TO/IIIT-ILST --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── IIIT-ILST
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## VinText
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download [vintext.zip](https://drive.google.com/drive/my-drive) to `vintext`
+
+ ```bash
+ mkdir vintext && cd vintext
+
+ # Download dataset from google drive
+ wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UUQhNvzgpZy7zXBFQp0Qox-BBjunZ0ml' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1UUQhNvzgpZy7zXBFQp0Qox-BBjunZ0ml" -O vintext.zip && rm -rf /tmp/cookies.txt
+
+ # Extract images and annotations
+ unzip -q vintext.zip && rm vintext.zip
+ mv vietnamese/labels ./ && mv vietnamese/test_image ./ && mv vietnamese/train_images ./ && mv vietnamese/unseen_test_images ./
+ rm -rf vietnamese
+
+ # Rename files
+ mv labels annotations && mv test_image test && mv train_images training && mv unseen_test_images unseen_test
+ mkdir imgs
+ mv training imgs/ && mv test imgs/ && mv unseen_test imgs/
+ ```
+
+- Step2: Generate `train_labels.json`, `test_labels.json`, `unseen_test_labels.json`, and crop images using 4 processes with the following command (add `--preserve-vertical` if you wish to preserve the images containing vertical texts).
+
+ ```bash
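+  # Add --preserve-vertical to keep vertical texts for training; otherwise they
+  # are filtered out and stored in the ignores directory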
+ python tools/dataset_converters/textrecog/vintext_converter.py PATH/TO/vietnamese --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── vintext
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   ├── test_labels.json
+  │   └── unseen_test_labels.json
+  ```
+
+## BID
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download [BID Dataset.zip](https://drive.google.com/file/d/1Oi88TRcpdjZmJ79WDLb9qFlBNG8q2De6/view)
+
+- Step2: Run the following commands to preprocess the dataset
+
+ ```bash
+ # Rename
+ mv BID\ Dataset.zip BID_Dataset.zip
+
+ # Unzip and Rename
+ unzip -q BID_Dataset.zip && rm BID_Dataset.zip
+ mv BID\ Dataset BID
+
+ # The BID dataset has a problem of permission, and you may
+ # add permission for this file
+ chmod -R 777 BID
+ cd BID
+ mkdir imgs && mkdir annotations
+
+ # For images and annotations
+ mv CNH_Aberta/*in.jpg imgs && mv CNH_Aberta/*txt annotations && rm -rf CNH_Aberta
+ mv CNH_Frente/*in.jpg imgs && mv CNH_Frente/*txt annotations && rm -rf CNH_Frente
+ mv CNH_Verso/*in.jpg imgs && mv CNH_Verso/*txt annotations && rm -rf CNH_Verso
+ mv CPF_Frente/*in.jpg imgs && mv CPF_Frente/*txt annotations && rm -rf CPF_Frente
+ mv CPF_Verso/*in.jpg imgs && mv CPF_Verso/*txt annotations && rm -rf CPF_Verso
+ mv RG_Aberto/*in.jpg imgs && mv RG_Aberto/*txt annotations && rm -rf RG_Aberto
+ mv RG_Frente/*in.jpg imgs && mv RG_Frente/*txt annotations && rm -rf RG_Frente
+ mv RG_Verso/*in.jpg imgs && mv RG_Verso/*txt annotations && rm -rf RG_Verso
+
+ # Remove unnecessary files
+ rm -rf desktop.ini
+ ```
+
+- Step3: Generate `train_labels.json` and `val_label.json` (optional) and crop images using 4 processes with the following command (add `--preserve-vertical` if you wish to preserve the images containing vertical texts). Since the original dataset doesn't have a validation set, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
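+  # Add --preserve-vertical to keep vertical texts for training; otherwise they
+  # are filtered out and stored in PATH/TO/BID/ignores
+  # Add --val-ratio 0.2 to hold out 20% of the data as a validation set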
+ python tools/dataset_converters/textrecog/bid_converter.py PATH/TO/BID --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── BID
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## RCTW
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download `train_images.zip.001`, `train_images.zip.002`, and `train_gts.zip` from the [homepage](https://rctw.vlrlab.net/dataset.html), and extract them to `rctw/imgs` and `rctw/annotations`, respectively.
+
+- Step2: Generate `train_labels.json` and `val_label.json` (optional). Since the original dataset doesn't have a validation set, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
+ # Annotations of RCTW test split is not publicly available, split a validation set by adding --val-ratio 0.2
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise vertical images will be filtered and stored in PATH/TO/rctw/ignores
+ python tools/dataset_converters/textrecog/rctw_converter.py PATH/TO/rctw --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── rctw
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
+
+## HierText
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).
+
+- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations
+
+ ```bash
+ mkdir HierText
+ git clone https://github.com/google-research-datasets/hiertext.git
+ ```
+
+- Step3: Download `train.tgz` and `validation.tgz` from AWS
+
+ ```bash
+ aws s3 --no-sign-request cp s3://open-images-dataset/ocr/train.tgz .
+ aws s3 --no-sign-request cp s3://open-images-dataset/ocr/validation.tgz .
+ ```
+
+- Step4: Process raw data
+
+ ```bash
+ # process annotations
+ mv hiertext/gt ./
+ rm -rf hiertext
+ mv gt annotations
+  gzip -d annotations/train.jsonl.gz
+  gzip -d annotations/validation.jsonl.gz
+ # process images
+ mkdir imgs
+ mv train.tgz imgs/
+ mv validation.tgz imgs/
+ tar -xzvf imgs/train.tgz
+ tar -xzvf imgs/validation.tgz
+ ```
+
+- Step5: Generate `train_labels.json` and `val_label.json`. HierText provides annotations at three levels: `paragraph`, `line`, and `word`; check the original [paper](https://arxiv.org/pdf/2203.15143.pdf) for details. Set `--level paragraph`, `--level line`, or `--level word` to choose the annotation level.
+
+ ```bash
+ # Collect word annotation from HierText --level word
+ # Add --preserve-vertical to preserve vertical texts for training, otherwise vertical images will be filtered and stored in PATH/TO/HierText/ignores
+ python tools/dataset_converters/textrecog/hiertext_converter.py PATH/TO/HierText --level word --nproc 4
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── HierText
+  │   ├── crops
+  │   ├── ignores
+  │   ├── train_labels.json
+  │   └── val_label.json
+  ```
+
+## ArT
+
+```{warning}
+This section is not fully tested yet.
+```
+
+- Step1: Download `train_task2_images.tar.gz` and `train_task2_labels.json` from the [homepage](https://rrc.cvc.uab.es/?ch=14&com=downloads) to `art/`
+
+ ```bash
+ mkdir art && cd art
+ mkdir annotations
+
+ # Download ArT dataset
+ wget https://dataset-bj.cdn.bcebos.com/art/train_task2_images.tar.gz
+ wget https://dataset-bj.cdn.bcebos.com/art/train_task2_labels.json
+
+ # Extract
+ tar -xf train_task2_images.tar.gz
+ mv train_task2_images crops
+ mv train_task2_labels.json annotations/
+
+ # Remove unnecessary files
+  rm train_task2_images.tar.gz
+ ```
+
+- Step2: Generate `train_labels.json` and `val_label.json` (optional). Since the test annotations are not publicly available, you may specify `--val-ratio` to split the dataset. E.g., with `--val-ratio 0.2`, 20% of the data are held out as the validation set.
+
+ ```bash
+ # Annotations of ArT test split is not publicly available, split a validation set by adding --val-ratio 0.2
+ python tools/dataset_converters/textrecog/art_converter.py PATH/TO/art
+ ```
+
+- After running the above commands, the directory structure should be as follows:
+
+  ```text
+  ├── art
+  │   ├── crops
+  │   ├── train_labels.json
+  │   └── val_label.json (optional)
+  ```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/dataset_prepare.md b/mmocr-dev-1.x/docs/en/user_guides/dataset_prepare.md
new file mode 100644
index 0000000000000000000000000000000000000000..02c3ac0f914264754539918554e3708d35e05ace
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/dataset_prepare.md
@@ -0,0 +1,153 @@
+# Dataset Preparation
+
+## Introduction
+
+After decades of development, the OCR community has produced a series of related datasets that often provide annotations of text in a variety of styles, making it necessary for users to convert these datasets to the required format when using them. MMOCR supports dozens of commonly used text-related datasets and provides a [data preparation script](./data_prepare/dataset_preparer.md) to help users prepare the datasets with only one command.
+
+In this section, we will introduce a typical process of preparing a dataset for MMOCR:
+
+1. [Download datasets and convert its format to the suggested one](#downloading-datasets-and-converting-format)
+2. [Modify the config file](#dataset-configuration)
+
+However, the first step is not necessary if you already have a dataset in the format that MMOCR supports. You can read [Dataset Classes](../basic_concepts/datasets.md#dataset-classes-and-annotation-formats) for more details.
+
+## Downloading Datasets and Converting Format
+
+As an example of the data preparation steps, you can use the following command to prepare the ICDAR 2015 dataset for the text detection task.
+
+```shell
+python tools/dataset_converters/prepare_dataset.py icdar2015 --task textdet
+```
+
+The dataset will then be downloaded and converted to the MMOCR format, resulting in the following directory structure:
+
+```text
+data/icdar2015
+├── textdet_imgs
+│   ├── test
+│   └── train
+├── textdet_test.json
+└── textdet_train.json
+```
+
+Once your dataset has been prepared, you can use [browse_dataset.py](./useful_tools.md#dataset-visualization-tool) to visualize the dataset and check whether the annotations are correct.
+
+```bash
+python tools/analysis_tools/browse_dataset.py configs/textdet/_base_/datasets/icdar2015.py
+```
+
+## Dataset Configuration
+
+### Single Dataset Training
+
+When training or evaluating a model on new datasets, we need to write the dataset config, where the image path, annotation path, and image prefix are set. The path `configs/xxx/_base_/datasets/` is pre-configured with the commonly used datasets in MMOCR (if you use `prepare_dataset.py` to prepare a dataset, this config will be generated automatically). Here we take the ICDAR 2015 dataset as an example (see `configs/textdet/_base_/datasets/icdar2015.py`).
+
+```Python
+icdar2015_textdet_data_root = 'data/icdar2015' # dataset root path
+
+# Train set config
+icdar2015_textdet_train = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root, # dataset root path
+ ann_file='textdet_train.json', # name of annotation
+ filter_cfg=dict(filter_empty_gt=True, min_size=32), # filtering empty images
+ pipeline=None)
+# Test set config
+icdar2015_textdet_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textdet_data_root,
+ ann_file='textdet_test.json',
+ test_mode=True,
+ pipeline=None)
+```
+
+After configuring the dataset, we can import it in the corresponding model configs. For example, the following config trains the "DBNet_R18" model on the ICDAR 2015 dataset.
+
+```Python
+_base_ = [
+ '_base_dbnet_r18_fpnc.py',
+ '../_base_/datasets/icdar2015.py', # import the dataset config
+ '../_base_/default_runtime.py',
+ '../_base_/schedules/schedule_sgd_1200e.py',
+]
+
+icdar2015_textdet_train = _base_.icdar2015_textdet_train # specify the training set
+icdar2015_textdet_train.pipeline = _base_.train_pipeline # specify the training pipeline
+icdar2015_textdet_test = _base_.icdar2015_textdet_test # specify the testing set
+icdar2015_textdet_test.pipeline = _base_.test_pipeline # specify the testing pipeline
+
+train_dataloader = dict(
+ batch_size=16,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=icdar2015_textdet_train) # specify the dataset in train_dataloader
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=icdar2015_textdet_test) # specify the dataset in val_dataloader
+
+test_dataloader = val_dataloader
+```
+
+### Multi-dataset Training
+
+In addition, [`ConcatDataset`](mmocr.datasets.ConcatDataset) enables users to train or test the model on a combination of multiple datasets. You just need to set the dataset type in the dataloader to `ConcatDataset` in the configuration file and specify the corresponding list of datasets.
+
+```Python
+train_list = [ic11, ic13, ic15]
+train_dataloader = dict(
+ dataset=dict(
+ type='ConcatDataset', datasets=train_list, pipeline=train_pipeline))
+```
+
+For example, the following configuration uses the MJSynth dataset for training and 6 academic datasets (CUTE80, IIIT5K, SVT, SVTP, ICDAR2013, ICDAR2015) for testing.
+
+```Python
+_base_ = [ # Import all dataset configurations you want to use
+ '../_base_/datasets/mjsynth.py',
+ '../_base_/datasets/cute80.py',
+ '../_base_/datasets/iiit5k.py',
+ '../_base_/datasets/svt.py',
+ '../_base_/datasets/svtp.py',
+ '../_base_/datasets/icdar2013.py',
+ '../_base_/datasets/icdar2015.py',
+ '../_base_/default_runtime.py',
+ '../_base_/schedules/schedule_adadelta_5e.py',
+ '_base_crnn_mini-vgg.py',
+]
+
+# List of training datasets
+train_list = [_base_.mjsynth_textrecog_train]
+# List of testing datasets
+test_list = [
+ _base_.cute80_textrecog_test, _base_.iiit5k_textrecog_test, _base_.svt_textrecog_test,
+ _base_.svtp_textrecog_test, _base_.icdar2013_textrecog_test, _base_.icdar2015_textrecog_test
+]
+
+# Use ConcatDataset to combine the datasets in the list
+train_dataset = dict(
+ type='ConcatDataset', datasets=train_list, pipeline=_base_.train_pipeline)
+test_dataset = dict(
+ type='ConcatDataset', datasets=test_list, pipeline=_base_.test_pipeline)
+
+train_dataloader = dict(
+ batch_size=192 * 4,
+ num_workers=32,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=train_dataset)
+
+test_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ drop_last=False,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=test_dataset)
+
+val_dataloader = test_dataloader
+```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/inference.md b/mmocr-dev-1.x/docs/en/user_guides/inference.md
new file mode 100644
index 0000000000000000000000000000000000000000..0687a327320017b9a4e268c8947644aa95aa4ed5
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/inference.md
@@ -0,0 +1,538 @@
+# Inference
+
+In OpenMMLab, all the inference operations are unified into a new interface, `Inferencer`. `Inferencer` is designed to expose a neat and simple API to users, and shares a very similar interface across different OpenMMLab libraries.
+
+In MMOCR, Inferencers are constructed in different levels of task abstraction.
+
+- **Standard Inferencer**: Following OpenMMLab's convention, each fundamental task in MMOCR has a standard Inferencer, namely `TextDetInferencer` (text detection), `TextRecInferencer` (text recognition), `TextSpottingInferencer` (end-to-end OCR), and `KIEInferencer` (key information extraction). They are designed to perform inference on a single task, and can be chained together to perform inference on a series of tasks. They also share a very similar interface, have a standard input/output protocol, and overall follow the OpenMMLab design.
+- **MMOCRInferencer**: We also provide `MMOCRInferencer`, a convenient inference interface designed only for MMOCR. It encapsulates and chains all the Inferencers in MMOCR, so users can use this Inferencer to perform a series of tasks on an image and directly get the final result in an end-to-end manner. *However, it has a relatively different interface from the other standard Inferencers, and some standard Inferencer functionalities might be sacrificed for the sake of simplicity.*
+
+For new users, we recommend using **MMOCRInferencer** to test out different combinations of models.
+
+If you are a developer and wish to integrate the models into your own project, we recommend using **standard Inferencers**, as they are more flexible and standardized, equipped with full functionalities.
+
+## Basic Usage
+
+`````{tabs}
+
+````{group-tab} MMOCRInferencer
+
+As of now, `MMOCRInferencer` can perform inference on the following tasks:
+
+- Text detection
+- Text recognition
+- OCR (text detection + text recognition)
+- Key information extraction (text detection + text recognition + key information extraction)
+- *OCR (text spotting)* (coming soon)
+
+For convenience, `MMOCRInferencer` provides both Python and command line interfaces. For example, if you want to perform OCR inference on `demo/demo_text_ocr.jpg` with `DBNet` as the text detection model and `CRNN` as the text recognition model, you can simply run the following command:
+
+::::{tabs}
+
+:::{code-tab} python
+>>> from mmocr.apis import MMOCRInferencer
+>>> # Load models into memory
+>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
+>>> # Perform inference
+>>> ocr('demo/demo_text_ocr.jpg', show=True)
+:::
+
+:::{code-tab} bash
+python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show
+:::
+::::
+
+The resulting OCR output will be displayed in a new window:
+
+
+
+
+
+```{note}
+If you are running MMOCR on a server without GUI or via SSH tunnel with X11 forwarding disabled, the `show` option will not work. However, you can still save visualizations to files by setting `out_dir` and `save_vis=True` arguments. Read [Dumping Results](#dumping-results) for details.
+```
+
+Depending on the initialization arguments, `MMOCRInferencer` can run in different modes. For example, it can run in KIE mode if it is initialized with `det`, `rec` and `kie` specified.
+
+::::{tabs}
+
+:::{code-tab} python
+>>> kie = MMOCRInferencer(det='DBNet', rec='SAR', kie='SDMGR')
+>>> kie('demo/demo_kie.jpeg', show=True)
+:::
+
+:::{code-tab} bash
+python tools/infer.py demo/demo_kie.jpeg --det DBNet --rec SAR --kie SDMGR --show
+:::
+
+::::
+
+The output image should look like this:
+
+
+
+
+
+
+You may have found that the Python interface and the command line interface of `MMOCRInferencer` are very similar. The following sections will use the Python interface as an example to introduce the usage of `MMOCRInferencer`. For more information about the command line interface, please refer to [Command Line Interface](#command-line-interface).
+
+````
+
+````{group-tab} Standard Inferencer
+
+In general, all the standard Inferencers across OpenMMLab share a very similar interface. The following example shows how to use `TextDetInferencer` to perform inference on a single image.
+
+```python
+>>> from mmocr.apis import TextDetInferencer
+>>> # Load models into memory
+>>> inferencer = TextDetInferencer(model='DBNet')
+>>> # Inference
+>>> inferencer('demo/demo_text_ocr.jpg', show=True)
+```
+
+The visualization result should look like:
+
+
+
+
+
+````
+
+`````
+
+## Initialization
+
+Each Inferencer must be initialized with a model. You can also choose the inference device during initialization.
+
+### Model Initialization
+
+`````{tabs}
+
+````{group-tab} MMOCRInferencer
+
+For each task, `MMOCRInferencer` takes two arguments in the form of `xxx` and `xxx_weights` (e.g. `det` and `det_weights`) for initialization, and there are many ways to initialize a model for inference. We will take `det` and `det_weights` as an example to illustrate some typical ways to initialize a model.
+
+- To infer with MMOCR's pre-trained model, passing its name to the argument `det` can work. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo. Check [Weights](../modelzoo.md#weights) for available model names.
+
+ ```python
+ >>> MMOCRInferencer(det='DBNet')
+ ```
+
+- To load custom config and weight, you can pass the path to the config file to `det` and the path to the weight to `det_weights`.
+
+ ```python
+ >>> MMOCRInferencer(det='path/to/dbnet_config.py', det_weights='path/to/dbnet.pth')
+ ```
+
+You may click on the "Standard Inferencer" tab to find more initialization methods.
+
+````
+
+````{group-tab} Standard Inferencer
+
+Every standard `Inferencer` accepts two parameters, `model` and `weights`. (In `MMOCRInferencer`, they are referred to as `xxx` and `xxx_weights`)
+
+- `model` takes either the name of a model, or the path to a config file as input. The name of a model is obtained from the model's metafile ([Example](https://github.com/open-mmlab/mmocr/blob/1.x/configs/textdet/dbnet/metafile.yml)) indexed from [model-index.yml](https://github.com/open-mmlab/mmocr/blob/1.x/model-index.yml). You can find the list of available weights [here](../modelzoo.md#weights).
+
+- `weights` accepts the path to a weight file.
+
+
+
+There are various ways to initialize a model.
+
+- To infer with MMOCR's pre-trained model, you can pass its name to `model`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo.
+
+ ```python
+ >>> from mmocr.apis import TextDetInferencer
+ >>> inferencer = TextDetInferencer(model='DBNet')
+ ```
+
+ ```{note}
+ The model type must match the Inferencer type.
+ ```
+
+ You can load another weight by passing its path/url to `weights`.
+
+ ```python
+ >>> inferencer = TextDetInferencer(model='DBNet', weights='path/to/dbnet.pth')
+ ```
+
+- To load custom config and weight, you can pass the path to the config file to `model` and the path to the weight to `weights`.
+
+ ```python
+ >>> inferencer = TextDetInferencer(model='path/to/dbnet_config.py', weights='path/to/dbnet.pth')
+ ```
+
+- By default, [MMEngine](https://github.com/open-mmlab/mmengine/) dumps config to the weight. If you have a weight trained on MMEngine, you can also pass the path to the weight file to `weights` without specifying `model`:
+
+ ```python
+ >>> # It will raise an error if the config file cannot be found in the weight
+ >>> inferencer = TextDetInferencer(weights='path/to/dbnet.pth')
+ ```
+
+- Passing a config file to `model` without specifying `weights` will result in a randomly initialized model.
+
+````
+`````
+
+### Device
+
+Each Inferencer instance is bound to a device.
+By default, the best device is automatically decided by [MMEngine](https://github.com/open-mmlab/mmengine/). You can also alter the device by specifying the `device` argument. For example, you can use the following code to create an Inferencer on GPU 1.
+
+`````{tabs}
+
+````{group-tab} MMOCRInferencer
+
+```python
+>>> inferencer = MMOCRInferencer(det='DBNet', device='cuda:1')
+```
+
+````
+
+````{group-tab} Standard Inferencer
+
+```python
+>>> inferencer = TextDetInferencer(model='DBNet', device='cuda:1')
+```
+
+````
+
+`````
+
+To create an Inferencer on CPU:
+
+`````{tabs}
+
+````{group-tab} MMOCRInferencer
+
+```python
+>>> inferencer = MMOCRInferencer(det='DBNet', device='cpu')
+```
+
+````
+
+````{group-tab} Standard Inferencer
+
+```python
+>>> inferencer = TextDetInferencer(model='DBNet', device='cpu')
+```
+
+````
+
+`````
+
+Refer to [torch.device](torch.device) for all the supported forms.
+
+## Inference
+
+Once the Inferencer is initialized, you can directly pass in the raw data to be inferred and get the inference results from return values.
+
+### Input
+
+`````{tabs}
+
+````{tab} MMOCRInferencer / TextDetInferencer / TextRecInferencer / TextSpottingInferencer
+
+Input can be either of these types:
+
+- str: Path/URL to the image.
+
+ ```python
+ >>> inferencer('demo/demo_text_ocr.jpg')
+ ```
+
+- array: Image in numpy array. It should be in BGR order.
+
+ ```python
+ >>> import mmcv
+ >>> array = mmcv.imread('demo/demo_text_ocr.jpg')
+ >>> inferencer(array)
+ ```
+
+- list: A list of basic types above. Each element in the list will be processed separately.
+
+ ```python
+  >>> inferencer(['img_1.jpg', 'img_2.jpg'])
+ >>> # You can even mix the types
+ >>> inferencer(['img_1.jpg', array])
+ ```
+
+- str: Path to the directory. All images in the directory will be processed.
+
+ ```python
+ >>> inferencer('tests/data/det_toy_dataset/imgs/test/')
+ ```
+
+````
+
+````{tab} KIEInferencer
+
+Input can be a dict or list[dict], where each dictionary contains
+following keys:
+
+- `img` (str or ndarray): Path to the image or the image itself. If the KIE Inferencer is used in no-visual mode, this key is not required. If it's a numpy array, it should be in BGR order.
+- `img_shape` (tuple(int, int)): Image shape in (H, W). Only required when KIE Inferencer is used in no-visual mode and no `img` is provided.
+- `instances` (list[dict]): A list of instances.
+
+Each `instance` looks like the following:
+
+```python
+{
+ # A nested list of 4 numbers representing the bounding box of
+ # the instance, in (x1, y1, x2, y2) order.
+ "bbox": np.array([[x1, y1, x2, y2], [x1, y1, x2, y2], ...],
+ dtype=np.int32),
+
+ # List of texts.
+ "texts": ['text1', 'text2', ...],
+}
+```
+
+````
+`````
+
+### Output
+
+By default, each `Inferencer` returns the prediction results in a dictionary format.
+
+- `visualization` contains the visualized predictions. But it's an empty list by default unless `return_vis=True`.
+
+- `predictions` contains the prediction results in a json-serializable format. As presented below, the contents are slightly different depending on the task type.
+
+ `````{tabs}
+
+ :::{group-tab} MMOCRInferencer
+
+ ```python
+ {
+ 'predictions' : [
+ # Each instance corresponds to an input image
+ {
+ 'det_polygons': [...], # 2d list of length (N,), format: [x1, y1, x2, y2, ...]
+ 'det_scores': [...], # float list of length (N,)
+ 'det_bboxes': [...], # 2d list of shape (N, 4), format: [min_x, min_y, max_x, max_y]
+ 'rec_texts': [...], # str list of length (N,)
+ 'rec_scores': [...], # float list of length (N,)
+ 'kie_labels': [...], # node labels, length (N, )
+ 'kie_scores': [...], # node scores, length (N, )
+ 'kie_edge_scores': [...], # edge scores, shape (N, N)
+ 'kie_edge_labels': [...] # edge labels, shape (N, N)
+ },
+ ...
+ ],
+ 'visualization' : [
+ array(..., dtype=uint8),
+ ]
+ }
+ ```
+
+ :::
+
+ :::{group-tab} Standard Inferencer
+
+ ````{tabs}
+ ```{code-tab} python TextDetInferencer
+
+ {
+ 'predictions' : [
+ # Each instance corresponds to an input image
+ {
+ 'polygons': [...], # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
+ 'bboxes': [...], # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
+ 'scores': [...] # list of float, len (N, )
+ },
+ ]
+ 'visualization' : [
+ array(..., dtype=uint8),
+ ]
+ }
+ ```
+
+ ```{code-tab} python TextRecInferencer
+ {
+ 'predictions' : [
+ # Each instance corresponds to an input image
+ {
+ 'text': '...', # a string
+ 'scores': 0.1, # a float
+ },
+ ...
+ ]
+ 'visualization' : [
+ array(..., dtype=uint8),
+ ]
+ }
+ ```
+
+ ```{code-tab} python TextSpottingInferencer
+ {
+ 'predictions' : [
+ # Each instance corresponds to an input image
+ {
+ 'polygons': [...], # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
+ 'bboxes': [...], # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
+ 'scores': [...] # list of float, len (N, )
+ 'texts': ['...',] # list of texts, len (N, )
+ },
+ ]
+ 'visualization' : [
+ array(..., dtype=uint8),
+ ]
+ }
+ ```
+
+ ```{code-tab} python KIEInferencer
+ {
+ 'predictions' : [
+ # Each instance corresponds to an input image
+ {
+ 'labels': [...], # node label, len (N,)
+ 'scores': [...], # node scores, len (N, )
+ 'edge_scores': [...], # edge scores, shape (N, N)
+ 'edge_labels': [...], # edge labels, shape (N, N)
+ },
+ ]
+ 'visualization' : [
+ array(..., dtype=uint8),
+ ]
+ }
+ ```
+ ````
+
+ :::
+
+ `````
+
+If you wish to get the raw outputs from the model, you can set `return_datasamples` to `True` to get the original [DataSample](structures.md), which will be stored in `predictions`.
+
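+For example, a minimal sketch that reuses the `inferencer` and the demo image from the examples above:
+
+```python
+>>> # Get the raw DataSample objects instead of the JSON-serializable dict
+>>> result = inferencer('demo/demo_text_ocr.jpg', return_datasamples=True)
+>>> data_sample = result['predictions'][0]
+```
+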
+### Dumping Results
+
+Apart from obtaining predictions from the return value, you can also export the predictions/visualizations to files by setting `out_dir` and `save_pred`/`save_vis` arguments.
+
+```python
+>>> inferencer('img_1.jpg', out_dir='outputs/', save_pred=True, save_vis=True)
+```
+
+This results in a directory structure like:
+
+```text
+outputs
+├── preds
+│   └── img_1.json
+└── vis
+    └── img_1.jpg
+```
+
+The filename of each file is the same as the corresponding input image filename. If the input image is an array, the filename will be a number starting from 0.
+
+### Batch Inference
+
+You can customize the batch size by setting `batch_size`. The default batch size is 1.
+
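+For example, a minimal sketch that reuses the `inferencer` from the examples above to process a folder of images in batches of 4:
+
+```python
+>>> # Run batched inference over all images in the directory
+>>> inferencer('tests/data/det_toy_dataset/imgs/test/', batch_size=4)
+```
+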
+## API
+
+Here are extensive lists of parameters that you can use.
+
+````{tabs}
+
+```{group-tab} MMOCRInferencer
+
+**MMOCRInferencer.\_\_init\_\_():**
+
+| Arguments | Type | Default | Description |
+| ------------- | ---------------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `det` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text detection algorithm. It's the path to the config file or the model name defined in metafile. |
+| `det_weights` | str, optional | None | Path to the custom checkpoint file of the selected det model. If it is not specified and "det" is a model name of metafile, the weights will be loaded from metafile. |
+| `rec` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text recognition algorithm. It's the path to the config file or the model name defined in metafile. |
+| `rec_weights` | str, optional | None | Path to the custom checkpoint file of the selected rec model. If it is not specified and "rec" is a model name of metafile, the weights will be loaded from metafile. |
+| `kie` \[1\] | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained key information extraction algorithm. It's the path to the config file or the model name defined in metafile. |
+| `kie_weights` | str, optional | None | Path to the custom checkpoint file of the selected kie model. If it is not specified and "kie" is a model name of metafile, the weights will be loaded from metafile. |
+| `device` | str, optional | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |
+
+\[1\]: `kie` is only effective when both text detection and recognition models are specified.
+
+**MMOCRInferencer.\_\_call\_\_()**
+
+| Arguments | Type | Default | Description |
+| -------------------- | ----------------------- | ------------ | ------------------------------------------------------------------------------------------------ |
+| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
+| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
+| `batch_size` | int | 1 | Inference batch size. |
+| `det_batch_size` | int, optional | None | Inference batch size for the text detection model. Overrides `batch_size` if it is not None. |
+| `rec_batch_size` | int, optional | None | Inference batch size for the text recognition model. Overrides `batch_size` if it is not None. |
+| `kie_batch_size` | int, optional | None | Inference batch size for the KIE model. Overrides `batch_size` if it is not None. |
+| `return_vis` | bool | False | Whether to return the visualization result. |
+| `print_result` | bool | False | Whether to print the inference result to the console. |
+| `show` | bool | False | Whether to display the visualization results in a popup window. |
+| `wait_time` | float | 0 | The interval of show(s). |
+| `out_dir` | str | `results/` | Output directory of results. |
+| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
+| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |
+
+```
+
+```{group-tab} Standard Inferencer
+
+**Inferencer.\_\_init\_\_():**
+
+| Arguments | Type | Default | Description |
+| --------- | ---------------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `model` | str or [Weights](../modelzoo.html#weights), optional | None | Path to the config file or the model name defined in metafile. |
+| `weights` | str, optional | None | Path to the custom checkpoint file of the selected model. If it is not specified and `model` is a model name of metafile, the weights will be loaded from metafile. |
+| `device` | str, optional | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |
+
+**Inferencer.\_\_call\_\_()**
+
+| Arguments | Type | Default | Description |
+| -------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
+| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
+| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
+| `batch_size` | int | 1 | Inference batch size. |
+| `progress_bar` | bool | True | Whether to show a progress bar. |
+| `return_vis` | bool | False | Whether to return the visualization result. |
+| `print_result` | bool | False | Whether to print the inference result to the console. |
+| `show` | bool | False | Whether to display the visualization results in a popup window. |
+| `wait_time` | float | 0 | The interval of show(s). |
+| `draw_pred` | bool | True | Whether to draw predicted bounding boxes. *Only applicable on `TextDetInferencer` and `TextSpottingInferencer`.* |
+| `out_dir` | str | `results/` | Output directory of results. |
+| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
+| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |
+
+```
+````
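+
+To tie several of these arguments together, here is a brief usage sketch of `MMOCRInferencer` (the model aliases and the image path are illustrative):
+
+```python
+from mmocr.apis import MMOCRInferencer
+
+# Build an end-to-end det + rec pipeline; 'DBNet' and 'SAR' are model names from the metafile.
+ocr = MMOCRInferencer(det='DBNet', rec='SAR', device='cpu')
+
+# Run inference with per-stage batch sizes, saving both predictions and visualizations.
+results = ocr(
+    'demo/demo_text_ocr.jpg',
+    det_batch_size=2,
+    rec_batch_size=8,
+    out_dir='outputs/',
+    save_pred=True,
+    save_vis=True)
+```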
+
+## Command Line Interface
+
+```{note}
+This section is only applicable to `MMOCRInferencer`.
+```
+
+You can use `tools/infer.py` to perform inference through `MMOCRInferencer`.
+Its general usage is as follows:
+
+```bash
+python tools/infer.py INPUT_PATH [--det DET] [--det-weights ...] ...
+```
+
+where `INPUT_PATH` is a required field and should be a path to an image or a folder. Command-line parameters map to the Python interface parameters as follows:
+
+- To convert a Python interface parameter to its command-line counterpart, prefix it with `--` and replace underscores `_` with hyphens `-`. For example, `out_dir` becomes `--out-dir`.
+- For boolean parameters, adding the flag to the command is equivalent to setting it to True. For example, `--show` sets the `show` parameter to True.
+
+In addition, the command line will not display the inference result by default. You can use the `--print-result` parameter to view the inference result.
+
+Here is an example:
+
+```bash
+python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show --print-result
+```
+
+Running this command will give the following result:
+
+```bash
+{'predictions': [{'rec_texts': ['CBank', 'Docbcba', 'GROUP', 'MAUN', 'CROBINSONS', 'AOCOC', '916M3', 'BOO9', 'Oven', 'BRANDS', 'ARETAIL', '14', '70S', 'ROUND', 'SALE', 'YEAR', 'ALLY', 'SALE', 'SALE'],
+'rec_scores': [0.9753464579582214, ...], 'det_polygons': [[551.9930285844646, 411.9138765335083, 553.6153911653112,
+383.53195309638977, 620.2410061195247, 387.33785033226013, 618.6186435386782, 415.71977376937866], ...], 'det_scores': [0.8230461478233337, ...]}]}
+```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/train_test.md b/mmocr-dev-1.x/docs/en/user_guides/train_test.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e825217f89017ea04ece03f170eef4f9c53a4bc
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/train_test.md
@@ -0,0 +1,323 @@
+# Training and Testing
+
+To meet diverse requirements, MMOCR supports training and testing models on various devices, including PCs, workstations, computation clusters, etc.
+
+## Single GPU Training and Testing
+
+### Training
+
+`tools/train.py` provides the basic training service. MMOCR recommends using a GPU for model training and testing, but CPU-only training and testing is also supported. For example, the following commands demonstrate how to train a DBNet model using a single GPU or CPU.
+
+```bash
+# Train the specified MMOCR model by calling tools/train.py
+CUDA_VISIBLE_DEVICES= python tools/train.py ${CONFIG_FILE} [PY_ARGS]
+
+# Training
+# Example 1: Training DBNet with CPU
+CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+
+# Example 2: Specify to train DBNet with gpu:0, specify the working directory as dbnet/, and turn on mixed precision (amp) training
+CUDA_VISIBLE_DEVICES=0 python tools/train.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py --work-dir dbnet/ --amp
+```
+
+```{note}
+If multiple GPUs are available, you can specify a certain GPU, e.g. GPU 3, by setting CUDA_VISIBLE_DEVICES=3.
+```
+
+The following table lists all the arguments supported by `train.py`. Args without the `--` prefix are mandatory, while others are optional.
+
+| ARGS | Type | Description |
+| --------------- | ---- | --------------------------------------------------------------------------- |
+| config | str | (required) Path to config. |
+| --work-dir | str | Specify the working directory for the training logs and model checkpoints. |
+| --resume | bool | Whether to resume training from the latest checkpoint. |
+| --amp | bool | Whether to use automatic mixed precision for training. |
+| --auto-scale-lr | bool | Whether to use automatic learning rate scaling. |
+| --cfg-options | str | Override some settings in the configs. [Example](<>) |
+| --launcher | str | Option for the launcher: \['none', 'pytorch', 'slurm', 'mpi'\]. |
+| --local_rank | int | Rank of the local machine, used for distributed training. Defaults to 0. |
+| --tta | bool | Whether to use test time augmentation. |
+
+### Test
+
+`tools/test.py` provides the basic testing service, which is used in a similar way to the training script. For example, the following commands demonstrate how to test a DBNet model on a single GPU or CPU.
+
+```bash
+# Test a pretrained MMOCR model by calling tools/test.py
+CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [PY_ARGS]
+
+# Test
+# Example 1: Testing DBNet with CPU
+CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth
+
+# Example 2: Testing DBNet on gpu:0
+CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth
+```
+
+The following table lists all the arguments supported by `test.py`. Args without the `--` prefix are mandatory, while others are optional.
+
+| ARGS | Type | Description |
+| ------------- | ----- | -------------------------------------------------------------------- |
+| config | str | (required) Path to config. |
+| checkpoint | str | (required) The model to be tested. |
+| --work-dir | str | Specify the working directory for the logs. |
+| --save-preds | bool | Whether to save the predictions to a pkl file. |
+| --show | bool | Whether to visualize the predictions. |
+| --show-dir | str | Path to save the visualization results. |
+| --wait-time | float | Interval of visualization (s), defaults to 2. |
+| --cfg-options | str | Override some settings in the configs. [Example](<>) |
+| --launcher | str | Option for the launcher: \['none', 'pytorch', 'slurm', 'mpi'\]. |
+| --local_rank | int | Rank of the local machine, used for distributed training. Defaults to 0. |
+
+## Training and Testing with Multiple GPUs
+
+For large models, distributed training or testing significantly improves the efficiency. For this purpose, MMOCR provides distributed scripts `tools/dist_train.sh` and `tools/dist_test.sh` implemented based on [MMDistributedDataParallel](mmengine.model.wrappers.MMDistributedDataParallel).
+
+```bash
+# Training
+NNODES=${NNODES} NODE_RANK=${NODE_RANK} PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
+
+# Testing
+NNODES=${NNODES} NODE_RANK=${NODE_RANK} PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
+```
+
+The following table lists the arguments supported by `dist_*.sh`.
+
+| ARGS | Type | Description |
+| --------------- | ---- | --------------------------------------------------------------------------------------------- |
+| NNODES | int | The number of nodes. Defaults to 1. |
+| NODE_RANK | int | The rank of current node. Defaults to 0. |
+| PORT | int | The master port that will be used by rank 0 node, ranging from 0 to 65535. Defaults to 29500. |
+| MASTER_ADDR | str | The address of rank 0 node. Defaults to "127.0.0.1". |
+| CONFIG_FILE | str | (required) The path to config. |
+| CHECKPOINT_FILE | str | (required, only used in dist_test.sh) The path to the checkpoint to be tested. |
+| GPU_NUM | int | (required) The number of GPUs to be used per node. |
+| \[PY_ARGS\] | str | Arguments to be parsed by tools/train.py and tools/test.py. |
+
+These two scripts enable training and testing on **single-machine multi-GPU** or **multi-machine multi-GPU**. See the following examples for usage.
+
+### Single-machine Multi-GPU
+
+The following commands demonstrate how to train and test with a specified number of GPUs on a **single machine** with multiple GPUs.
+
+1. **Training**
+
+ Training DBNet using 4 GPUs on a single machine.
+
+ ```bash
+ tools/dist_train.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py 4
+ ```
+
+2. **Testing**
+
+ Testing DBNet using 4 GPUs on a single machine.
+
+ ```bash
+ tools/dist_test.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth 4
+ ```
+
+### Launching Multiple Tasks on Single Machine
+
+For a workstation equipped with multiple GPUs, the user can launch multiple tasks simultaneously by specifying the GPU IDs. For example, the following command demonstrates how to test DBNet with GPU `[0, 1, 2, 3]` and train CRNN on GPU `[4, 5, 6, 7]`.
+
+```bash
+# Specify gpu:0,1,2,3 for testing and assign port number 29500
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_test.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth 4
+
+# Specify gpu:4,5,6,7 for training and assign port number 29501
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh configs/textrecog/crnn/crnn_academic_dataset.py 4
+```
+
+```{note}
+`dist_train.sh` sets `MASTER_PORT` to `29500` by default. When other processes already occupy this port, the program will get a runtime error `RuntimeError: Address already in use`. In this case, you need to set `MASTER_PORT` to another free port number in the range of `(0~65535)`.
+```
+
+### Multi-machine Multi-GPU Training and Testing
+
+You can launch a task on multiple machines connected to the same network. MMOCR relies on the `torch.distributed` package for distributed training. Find more information at PyTorch's [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
+
+1. **Training**
+
+ The following command demonstrates how to train DBNet on two machines with a total of 4 GPUs.
+
+ ```bash
+ # Say that you want to launch the training job on two machines
+ # On the first machine:
+ NNODES=2 NODE_RANK=0 PORT=29500 MASTER_ADDR=10.140.0.169 tools/dist_train.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py 2
+ # On the second machine:
+   NNODES=2 NODE_RANK=1 PORT=29500 MASTER_ADDR=10.140.0.169 tools/dist_train.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py 2
+ ```
+
+2. **Testing**
+
+ The following command demonstrates how to test DBNet on two machines with a total of 4 GPUs.
+
+ ```bash
+ # Say that you want to launch the testing job on two machines
+ # On the first machine:
+ NNODES=2 NODE_RANK=0 PORT=29500 MASTER_ADDR=10.140.0.169 tools/dist_test.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth 2
+ # On the second machine:
+   NNODES=2 NODE_RANK=1 PORT=29500 MASTER_ADDR=10.140.0.169 tools/dist_test.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth 2
+ ```
+
+ ```{note}
+ The speed of the network could be the bottleneck of training.
+ ```
+
+## Training and Testing with Slurm Cluster
+
+If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh` and `tools/slurm_test.sh`.
+
+```bash
+# tools/slurm_train.sh provides scripts for submitting training tasks on clusters managed by Slurm
+GPUS=${GPUS} GPUS_PER_NODE=${GPUS_PER_NODE} CPUS_PER_TASK=${CPUS_PER_TASK} SRUN_ARGS=${SRUN_ARGS} ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
+
+# tools/slurm_test.sh provides scripts for submitting testing tasks on clusters managed by Slurm
+GPUS=${GPUS} GPUS_PER_NODE=${GPUS_PER_NODE} CPUS_PER_TASK=${CPUS_PER_TASK} SRUN_ARGS=${SRUN_ARGS} ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${WORK_DIR} [PY_ARGS]
+```
+
+| ARGS | Type | Description |
+| --------------- | ---- | ----------------------------------------------------------------------------------------------------------- |
+| GPUS | int | The number of GPUs to be used by this task. Defaults to 8. |
+| GPUS_PER_NODE | int | The number of GPUs to be allocated per node. Defaults to 8. |
+| CPUS_PER_TASK | int | The number of CPUs to be allocated per task. Defaults to 5. |
+| SRUN_ARGS | str | Arguments to be parsed by srun. Available options can be found [here](https://slurm.schedmd.com/srun.html). |
+| PARTITION | str | (required) Specify the partition on cluster. |
+| JOB_NAME | str | (required) Name of the submitted job. |
+| WORK_DIR | str | (required) Specify the working directory for saving the logs and checkpoints. |
+| CHECKPOINT_FILE | str | (required, only used in slurm_test.sh) Path to the checkpoint to be tested. |
+| PY_ARGS | str | Arguments to be parsed by `tools/train.py` and `tools/test.py`. |
+
+These scripts enable training and testing on Slurm clusters; see the following examples.
+
+1. Training
+
+ Here is an example of using 1 GPU to train a DBNet model on the `dev` partition.
+
+ ```bash
+ # Example: Request 1 GPU resource on dev partition for DBNet training task
+ GPUS=1 GPUS_PER_NODE=1 CPUS_PER_TASK=5 tools/slurm_train.sh dev db_r50 configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py work_dir
+ ```
+
+2. Testing
+
+ Similarly, the following example requests 1 GPU for testing.
+
+ ```bash
+ # Example: Request 1 GPU resource on dev partition for DBNet testing task
+ GPUS=1 GPUS_PER_NODE=1 CPUS_PER_TASK=5 tools/slurm_test.sh dev db_r50 configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth work_dir
+ ```
+
+## Advanced Tips
+
+### Resume Training from a Checkpoint
+
+`tools/train.py` allows users to resume training from a checkpoint by specifying the `--resume` parameter, in which case training will automatically resume from the latest saved checkpoint.
+
+```bash
+# Example: Resuming training from the latest checkpoint
+python tools/train.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py --resume
+```
+
+By default, the program will automatically resume training from the last successfully saved checkpoint of the previous training session, i.e. `latest.pth`. However, you can also specify a particular checkpoint to load by setting `load_from` in the configuration file:
+
+```python
+# Example: Set the path of the checkpoint you want to load in the configuration file
+load_from = 'work_dir/dbnet/models/epoch_10000.pth'
+```
+
+### Mixed Precision Training
+
+Mixed precision training offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network. In MMOCR, users can enable automatic mixed precision training by simply adding the `--amp` flag.
+
+```bash
+# Example: Using automatic mixed precision training
+python tools/train.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py --amp
+```
+
+The following table shows the support of each algorithm in MMOCR for automatic mixed precision training.
+
+| Model | Supports AMP | Description |
+| ------------- | :-----------------: | :-------------------------------------: |
+| | Text Detection | |
+| DBNet | Y | |
+| DBNetpp | Y | |
+| DRRG | N | roi_align_rotated does not support fp16 |
+| FCENet | N | BCELoss does not support fp16 |
+| Mask R-CNN | Y | |
+| PANet | Y | |
+| PSENet | Y | |
+| TextSnake | N | |
+| | Text Recognition | |
+| ABINet | Y | |
+| CRNN | Y | |
+| MASTER | Y | |
+| NRTR | Y | |
+| RobustScanner | Y | |
+| SAR | Y | |
+| SATRN | Y | |
+
+### Automatic Learning Rate Scaling
+
+MMOCR sets default initial learning rates for each model in the configuration file. However, these initial learning rates may not be applicable when the user uses a different `batch_size` than our preset `base_batch_size`. Therefore, we provide a tool to automatically scale the learning rate, which can be enabled by adding the `--auto-scale-lr` flag.
+
+```bash
+# Example: Using automatic learning rate scaling
+python tools/train.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py --auto-scale-lr
+```
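+
+The scaling is driven by the `base_batch_size` preset in the schedule config; a hedged sketch of what such an entry typically looks like (the value is illustrative, check the model's own config):
+
+```python
+# Illustrative only: with --auto-scale-lr, the initial lr is multiplied by
+# (actual_batch_size / base_batch_size).
+auto_scale_lr = dict(base_batch_size=16)
+```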
+
+### Visualize the Predictions
+
+`tools/test.py` provides the visualization interface to facilitate the qualitative analysis of the OCR models.
+
+
+
+![Detection](../../../demo/resources/det_vis.png)
+
+(Green boxes are GTs, while red boxes are predictions)
+
+
+
+
+
+![Recognition](../../../demo/resources/rec_vis.png)
+
+(Green font is the GT, red font is the prediction)
+
+
+
+
+
+![KIE](../../../demo/resources/kie_vis.png)
+
+(From left to right: original image, text detection and recognition result, text classification result, relationship)
+
+
+
+```bash
+# Example 1: Show the visualization results per 2 seconds
+python tools/test.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth --show --wait-time 2
+
+# Example 2: For systems that do not support graphical interfaces (such as computing clusters, etc.), the visualization results can be dumped in the specified path
+python tools/test.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py dbnet_r50.pth --show-dir ./vis_results
+```
+
+The visualization-related parameters in `tools/test.py` are described as follows.
+
+| ARGS | Type | Description |
+| ----------- | ----- | --------------------------------------------- |
+| --show | bool | Whether to show the visualization results. |
+| --show-dir | str | Path to save the visualization results. |
+| --wait-time | float | Interval of visualization (s), defaults to 2. |
+
+### Test Time Augmentation
+
+Test time augmentation (TTA) is a technique that improves model performance by applying data augmentation to the input image at test time. It is simple yet effective. In MMOCR, TTA can be enabled as follows:
+
+```{note}
+TTA is only supported for text recognition models.
+```
+
+```bash
+python tools/test.py configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py checkpoints/crnn_mini-vgg_5e_mj.pth --tta
+```
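+
+Conceptually, TTA aggregates the model's predictions over several augmented views of the same input. The following illustrative sketch (not MMOCR's internal implementation) shows the idea:
+
+```python
+# Illustrative only: MMOCR implements TTA through its config system, not through this helper.
+def tta_predict(model, image, augments, merge):
+    """Run `model` on each augmented view of `image` and merge the outputs."""
+    outputs = [model(aug(image)) for aug in augments]
+    return merge(outputs)  # e.g. keep the recognition result with the highest score
+```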
diff --git a/mmocr-dev-1.x/docs/en/user_guides/useful_tools.md b/mmocr-dev-1.x/docs/en/user_guides/useful_tools.md
new file mode 100644
index 0000000000000000000000000000000000000000..9828198f62fcd818946cfa19459b0d841d7cd4e4
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/useful_tools.md
@@ -0,0 +1,241 @@
+# Useful Tools
+
+## Visualization Tools
+
+### Dataset Visualization Tool
+
+MMOCR provides a dataset visualization tool `tools/visualizations/browse_dataset.py` to help users troubleshoot possible dataset-related problems. You just need to specify the path to the training config (usually stored in `configs/textdet/dbnet/xxx.py`) or the dataset config (usually stored in `configs/textdet/_base_/datasets/xxx.py`), and the tool will automatically plot the transformed (or original) images and labels.
+
+#### Usage
+
+```bash
+python tools/visualizations/browse_dataset.py \
+ ${CONFIG_FILE} \
+ [-o, --output-dir ${OUTPUT_DIR}] \
+ [-p, --phase ${DATASET_PHASE}] \
+ [-m, --mode ${DISPLAY_MODE}] \
+ [-t, --task ${DATASET_TASK}] \
+ [-n, --show-number ${NUMBER_IMAGES_DISPLAY}] \
+    [-i, --show-interval ${SHOW_INTERVAL}] \
+ [--cfg-options ${CFG_OPTIONS}]
+```
+
+| ARGS | Type | Description |
+| ------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| config | str | (required) Path to the config. |
+| -o, --output-dir | str | If a GUI is not available, specify an output path to save the visualization results. |
+| -p, --phase | str | Phase of dataset to visualize. Use "train", "test" or "val" if you just want to visualize the default split. It's also possible to be a dataset variable name, which might be useful when a dataset split has multiple variants in the config. |
+| -m, --mode | `original`, `transformed`, `pipeline` | Display mode: display original pictures or transformed pictures or comparison pictures.`original` only visualizes the original dataset & annotations; `transformed` shows the resulting images processed through all the transforms; `pipeline` shows all the intermediate images. Defaults to "transformed". |
+| -t, --task | `auto`, `textdet`, `textrecog` | Specify the task type of the dataset. If `auto`, the task type will be inferred from the config. If the script is unable to infer the task type, you need to specify it manually. Defaults to `auto`. |
+| -n, --show-number | int | The number of samples to visualize. If not specified, display all images in the dataset. |
+| -i, --show-interval | float | Interval of visualization (s), defaults to 2. |
+| --cfg-options | str | Override configs. [Example](./config.md#command-line-modification) |
+
+#### Examples
+
+The following example demonstrates how to use the tool to visualize the training data used by the "DBNet_R50_icdar2015" model.
+
+```Bash
+# Example: Visualizing the training data used by the dbnet_r50dcn_v2_fpnc_1200e_icdar2015 model
+python tools/visualizations/browse_dataset.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py
+```
+
+By default, the visualization mode is "transformed", and you will see the images & annotations being transformed by the pipeline:
+
+
+
+
+
+If you just want to visualize the original dataset, simply set the mode to "original":
+
+```Bash
+python tools/visualizations/browse_dataset.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py -m original
+```
+
+
+
+Or, to visualize the entire pipeline:
+
+```Bash
+python tools/visualizations/browse_dataset.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py -m pipeline
+```
+
+
+
+In addition, users can also visualize the original images and their corresponding labels of the dataset by specifying the path to the dataset config file, for example:
+
+```Bash
+python tools/visualizations/browse_dataset.py configs/textrecog/_base_/datasets/icdar2015.py
+```
+
+Some datasets might have multiple variants. For example, the test split of `icdar2015` textrecog dataset has two variants, which the [base dataset config](/configs/textrecog/_base_/datasets/icdar2015.py) defines as follows:
+
+```python
+icdar2015_textrecog_test = dict(
+ ann_file='textrecog_test.json',
+ # ...
+ )
+
+icdar2015_1811_textrecog_test = dict(
+ ann_file='textrecog_test_1811.json',
+ # ...
+)
+```
+
+In this case, you can specify the variant name to visualize the corresponding dataset:
+
+```Bash
+python tools/visualizations/browse_dataset.py configs/textrecog/_base_/datasets/icdar2015.py -p icdar2015_1811_textrecog_test
+```
+
+Based on this tool, users can easily verify if the annotation of a custom dataset is correct.
+
+### Hyper-parameter Scheduler Visualization
+
+This tool aims to help the user check the hyper-parameter scheduler of the optimizer (without training), and supports the "learning rate" and "momentum" schedules.
+
+#### Introduction to the scheduler visualization tool
+
+```bash
+python tools/visualizations/vis_scheduler.py \
+ ${CONFIG_FILE} \
+ [-p, --parameter ${PARAMETER_NAME}] \
+ [-d, --dataset-size ${DATASET_SIZE}] \
+ [-n, --ngpus ${NUM_GPUs}] \
+ [-s, --save-path ${SAVE_PATH}] \
+ [--title ${TITLE}] \
+ [--style ${STYLE}] \
+ [--window-size ${WINDOW_SIZE}] \
+ [--cfg-options]
+```
+
+**Description of all arguments**:
+
+- `config`: The path of a model config file.
+- **`-p, --parameter`**: The parameter whose change curve will be visualized; choose from "lr" and "momentum". Defaults to "lr".
+- **`-d, --dataset-size`**: The size of the dataset. If set, `build_dataset` will be skipped and `${DATASET_SIZE}` will be used as the size. Defaults to using the function `build_dataset`.
+- **`-n, --ngpus`**: The number of GPUs used in training. Defaults to 1.
+- **`-s, --save-path`**: The path to save the parameter curve plot. Not saved by default.
+- `--title`: Title of the figure. If not set, defaults to the config file name.
+- `--style`: Style of the plot. If not set, defaults to `whitegrid`.
+- `--window-size`: The shape of the display window. If not specified, it will be set to `12*7`. If used, it must be in the format `'W*H'`.
+- `--cfg-options`: Modifications to the configuration file; refer to [Learn about Configs](../user_guides/config.md).
+
+```{note}
+Loading annotations may consume a lot of time; you can directly specify the size of the dataset with `-d, --dataset-size` to save time.
+```
+
+#### How to plot the learning rate curve without training
+
+You can use the following command to plot the step learning rate schedule used in the config `configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py`:
+
+```bash
+python tools/visualizations/vis_scheduler.py configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py -d 100
+```
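+
+The plotted curve is determined by the `param_scheduler` field of the config; an illustrative example of such a field (not the actual DBNet schedule) looks like this:
+
+```python
+# Illustrative only: a linear warm-up followed by polynomial decay, in MMEngine config style.
+param_scheduler = [
+    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=1000),
+    dict(type='PolyLR', power=0.9, eta_min=1e-7, by_epoch=True, begin=0, end=1200),
+]
+```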
+
+
+
+## Analysis Tools
+
+### Offline Evaluation Tool
+
+For saved prediction results, we provide an offline evaluation script `tools/analysis_tools/offline_eval.py`. The following example demonstrates how to use this tool to evaluate the output of the "PSENet" model offline.
+
+```Bash
+# When running the test script for the first time, you can save the output of the model by specifying the --save-preds parameter
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --save-preds
+# Example: Testing on PSENet
+python tools/test.py configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py epoch_600.pth --save-preds
+
+# Then, using the saved outputs for offline evaluation
+python tools/analysis_tools/offline_eval.py ${CONFIG_FILE} ${PRED_FILE}
+# Example: Offline evaluation of saved PSENet results
+python tools/analysis_tools/offline_eval.py configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py work_dirs/psenet_r50_fpnf_600e_icdar2015/epoch_600.pth_predictions.pkl
+```
+
+`--save-preds` saves the output to `work_dir/CONFIG_NAME/MODEL_NAME_predictions.pkl` by default.
+
+In addition, based on this tool, users can also convert predictions obtained from other libraries into MMOCR-supported formats, then use MMOCR's built-in metrics to evaluate them.
+
+| ARGS | Type | Description |
+| ------------- | ----- | ----------------------------------------------------------------- |
+| config | str | (required) Path to the config. |
+| pkl_results | str | (required) The saved predictions. |
+| --cfg-options | str | Override configs. [Example](./config.md#command-line-modification) |
+
+### Calculate FLOPs and the Number of Parameters
+
+We provide a method to calculate the FLOPs and the number of parameters. First, install the dependencies using the following command.
+
+```shell
+pip install fvcore
+```
+
+The usage of the script to calculate FLOPs and the number of parameters is as follows.
+
+```shell
+python tools/analysis_tools/get_flops.py ${config} --shape ${IMAGE_SHAPE}
+```
+
+| ARGS | Type | Description |
+| ------- | ---- | ----------------------------------------------------------------------------------------- |
+| config | str | (required) Path to the config. |
+| --shape | int | Image size to use when calculating FLOPs, such as `--shape 320 320`. Defaults to `640 640`. |
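+
+The tool relies on fvcore under the hood. If you want to count FLOPs and parameters for an arbitrary `nn.Module` yourself, a minimal sketch looks like this (the toy model is purely illustrative):
+
+```python
+import torch
+from torch import nn
+from fvcore.nn import FlopCountAnalysis, parameter_count
+
+model = nn.Sequential(
+    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 3, 3, padding=1))
+inputs = torch.randn(1, 3, 64, 64)
+print(FlopCountAnalysis(model, inputs).total())  # total FLOPs of the toy model
+print(parameter_count(model)[''])                # total number of parameters
+```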
+
+For example, you can run the following command to get FLOPs and the number of parameters of `dbnet_resnet18_fpnc_100k_synthtext.py`:
+
+```shell
+python tools/analysis_tools/get_flops.py configs/textdet/dbnet/dbnet_resnet18_fpnc_100k_synthtext.py --shape 1024 1024
+```
+
+The output is as follows:
+
+```shell
+input shape is (1, 3, 1024, 1024)
+| module | #parameters or shape | #flops |
+| :------------------------ | :------------------- | :------ |
+| model | 12.341M | 63.955G |
+| backbone | 11.177M | 38.159G |
+| backbone.conv1 | 9.408K | 2.466G |
+| backbone.conv1.weight | (64, 3, 7, 7) | |
+| backbone.bn1 | 0.128K | 83.886M |
+| backbone.bn1.weight | (64,) | |
+| backbone.bn1.bias | (64,) | |
+| backbone.layer1 | 0.148M | 9.748G |
+| backbone.layer1.0 | 73.984K | 4.874G |
+| backbone.layer1.1 | 73.984K | 4.874G |
+| backbone.layer2 | 0.526M | 8.642G |
+| backbone.layer2.0 | 0.23M | 3.79G |
+| backbone.layer2.1 | 0.295M | 4.853G |
+| backbone.layer3 | 2.1M | 8.616G |
+| backbone.layer3.0 | 0.919M | 3.774G |
+| backbone.layer3.1 | 1.181M | 4.842G |
+| backbone.layer4 | 8.394M | 8.603G |
+| backbone.layer4.0 | 3.673M | 3.766G |
+| backbone.layer4.1 | 4.721M | 4.837G |
+| neck | 0.836M | 14.887G |
+| neck.lateral_convs | 0.246M | 2.013G |
+| neck.lateral_convs.0.conv | 16.384K | 1.074G |
+| neck.lateral_convs.1.conv | 32.768K | 0.537G |
+| neck.lateral_convs.2.conv | 65.536K | 0.268G |
+| neck.lateral_convs.3.conv | 0.131M | 0.134G |
+| neck.smooth_convs | 0.59M | 12.835G |
+| neck.smooth_convs.0.conv | 0.147M | 9.664G |
+| neck.smooth_convs.1.conv | 0.147M | 2.416G |
+| neck.smooth_convs.2.conv | 0.147M | 0.604G |
+| neck.smooth_convs.3.conv | 0.147M | 0.151G |
+| det_head | 0.329M | 10.909G |
+| det_head.binarize | 0.164M | 10.909G |
+| det_head.binarize.0 | 0.147M | 9.664G |
+| det_head.binarize.1 | 0.128K | 20.972M |
+| det_head.binarize.3 | 16.448K | 1.074G |
+| det_head.binarize.4 | 0.128K | 83.886M |
+| det_head.binarize.6 | 0.257K | 67.109M |
+| det_head.threshold | 0.164M | |
+| det_head.threshold.0 | 0.147M | |
+| det_head.threshold.1 | 0.128K | |
+| det_head.threshold.3 | 16.448K | |
+| det_head.threshold.4 | 0.128K | |
+| det_head.threshold.6 | 0.257K | |
+!!!Please be cautious if you use the results in papers. You may need to check if all ops are supported and verify that the flops computation is correct.
+```
diff --git a/mmocr-dev-1.x/docs/en/user_guides/visualization.md b/mmocr-dev-1.x/docs/en/user_guides/visualization.md
new file mode 100644
index 0000000000000000000000000000000000000000..2ce21cf30fb6798f198206fb41aed17fc61afe3d
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/user_guides/visualization.md
@@ -0,0 +1,107 @@
+# Visualization
+
+Before reading this tutorial, it is recommended to read MMEngine's {external+mmengine:doc}`MMEngine: Visualization ` documentation to get a first glimpse of the `Visualizer` definition and usage.
+
+In brief, the [`Visualizer`](mmengine.visualization.Visualizer) is implemented in MMEngine to meet the daily visualization needs, and contains three main functions:
+
+- Implement common drawing APIs, such as [`draw_bboxes`](mmengine.visualization.Visualizer.draw_bboxes), which draws bounding boxes, and [`draw_lines`](mmengine.visualization.Visualizer.draw_lines), which draws lines.
+- Support writing visualization results, learning rate curves, loss function curves, and validation accuracy curves to various backends, including local disks and common deep learning training logging tools such as [TensorBoard](https://www.tensorflow.org/tensorboard) and [Wandb](https://wandb.ai/site).
+- Support calling anywhere in the code to visualize or record intermediate states of the model during training or testing, such as feature maps and validation results.
+
+Based on MMEngine's Visualizer, MMOCR comes with a variety of pre-built visualization tools that can be used simply by modifying the configuration files:
+
+- The `tools/visualizations/browse_dataset.py` script provides a dataset visualization function that draws images and corresponding annotations after Data Transforms, as described in [`browse_dataset.py`](useful_tools.md).
+- MMEngine implements `LoggerHook`, which uses `Visualizer` to write the learning rate, loss and evaluation results to the backend set by `Visualizer`. Therefore, by modifying the `Visualizer` backend in the configuration file, for example to `TensorboardVisBackend` or `WandbVisBackend`, you can log to common training logging tools such as TensorBoard or Wandb, making it easy for users to analyze and monitor the training process.
+- The `VisualizationHook` is implemented in MMOCR, which uses `Visualizer` to visualize or store the prediction results of the validation or prediction phase into the backend set by `Visualizer`. Therefore, by modifying the `Visualizer` backend in the configuration file, for example to `TensorboardVisBackend` or `WandbVisBackend`, you can store the predicted images to TensorBoard or Wandb.
+
+## Configuration
+
+Thanks to the registration mechanism, we can set the behavior of the `Visualizer` in MMOCR by modifying the configuration file. Usually, we define the default configuration for the visualizer in `task/_base_/default_runtime.py`; see the [configuration tutorial](config.md) for details.
+
+```Python
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+ type='TextxxxLocalVisualizer', # use different visualizers for different tasks
+ vis_backends=vis_backends,
+ name='visualizer')
+```
+
+Based on the above example, we can see that the configuration of `Visualizer` consists of two main parts, namely, the type of `Visualizer` and the visualization backend `vis_backends` it uses.
+
+- For different OCR tasks, various visualizers are pre-configured in MMOCR, including [`TextDetLocalVisualizer`](mmocr.visualization.TextDetLocalVisualizer), [`TextRecogLocalVisualizer`](mmocr.visualization.TextRecogLocalVisualizer), [`TextSpottingLocalVisualizer`](mmocr.visualization.TextSpottingLocalVisualizer) and [`KIELocalVisualizer`](mmocr.visualization.KIELocalVisualizer). These visualizers extend the basic Visualizer API according to the characteristics of their tasks and implement the corresponding annotation information interface `add_datasample`. For example, users can directly use `TextDetLocalVisualizer` to visualize labels or predictions for text detection tasks.
+- MMOCR sets the visualization backend `vis_backend` to the local visualization backend `LocalVisBackend` by default, saving all visualization results and other training information in a local folder.
+
+## Storage
+
+MMOCR uses the local visualization backend [`LocalVisBackend`](mmengine.visualization.LocalVisBackend) by default. The information written by `VisualizationHook` and `LoggerHook`, including loss, learning rate, model evaluation accuracy and visualization results, will be saved to the `{work_dir}/{config_name}/{time}/{vis_data}` folder by default. In addition, MMOCR also supports other common visualization backends, such as `TensorboardVisBackend` and `WandbVisBackend`; you only need to change the `vis_backends` type in the configuration file to the corresponding visualization backend. For example, you can store data to `TensorBoard` and `Wandb` by simply inserting the following code block into the configuration file.
+
+```Python
+_base_.visualizer.vis_backends = [
+ dict(type='LocalVisBackend'),
+ dict(type='TensorboardVisBackend'),
+ dict(type='WandbVisBackend'),]
+```
+
+## Plot
+
+### Plot the prediction results
+
+MMOCR mainly uses [`VisualizationHook`](mmocr.engine.hooks.VisualizationHook) to plot the prediction results of validation and test. By default, `VisualizationHook` is off, and its default configuration is as follows.
+
+```Python
+visualization=dict( # visualization of validation and test results
+ type='VisualizationHook',
+ enable=False,
+ interval=1,
+ show=False,
+ draw_gt=False,
+ draw_pred=False)
+```
+
+The following table shows the parameters supported by `VisualizationHook`.
+
+| Parameters | Description |
+| :--------: | :-----------------------------------------------------------------------------------------------------------: |
+| enable | Whether to enable `VisualizationHook`. Disabled by default. |
+| interval | The interval (in iterations) at which val or test results are stored or displayed, effective only when `VisualizationHook` is enabled. |
+| show | Whether to display the val or test results in a window. |
+| draw_gt | Whether to draw the ground-truth annotations in the val or test results. |
+| draw_pred | Whether to draw the predictions in the val or test results. |
+
+If you want to enable `VisualizationHook`-related functions and configurations during training or testing, you only need to modify the configuration. Taking `dbnet_resnet18_fpnc_1200e_icdar2015.py` as an example, to draw annotations and predictions at the same time and display the images, the configuration can be modified as follows:
+
+```Python
+visualization = _base_.default_hooks.visualization
+visualization.update(
+ dict(enable=True, show=True, draw_gt=True, draw_pred=True))
+```
+
+
+
+
+
+If you only want to see the prediction results, just set `draw_pred=True`:
+
+```Python
+visualization = _base_.default_hooks.visualization
+visualization.update(
+ dict(enable=True, show=True, draw_gt=False, draw_pred=True))
+```
+
+
+
+
+
+`tools/test.py` further simplifies this procedure by providing the `--show` and `--show-dir` parameters, which can be used to visualize the annotation and prediction results during testing without modifying the configuration.
+
+```Shell
+# Show test results
+python tools/test.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_1200e_icdar2015/epoch_400.pth --show
+
+# Specify where to store the prediction results
+python tools/test.py configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_1200e_icdar2015/epoch_400.pth --show-dir imgs/
+```
+
+
+
+
diff --git a/mmocr-dev-1.x/docs/en/weight_list.py b/mmocr-dev-1.x/docs/en/weight_list.py
new file mode 100644
index 0000000000000000000000000000000000000000..0d7fee5678c4162cadf7ac7e2670821ec29205e6
--- /dev/null
+++ b/mmocr-dev-1.x/docs/en/weight_list.py
@@ -0,0 +1,115 @@
+import os.path as osp
+
+from mmengine.fileio import load
+from tabulate import tabulate
+
+
+class BaseWeightList:
+ """Class for generating model list in markdown format.
+
+ Args:
+ dataset_list (list[str]): List of dataset names.
+ table_header (list[str]): List of table header.
+ msg (str): Message to be displayed.
+ task_abbr (str): Abbreviation of task name.
+ metric_name (str): Metric name.
+ """
+
+ base_url: str = 'https://github.com/open-mmlab/mmocr/blob/1.x/'
+ table_cfg: dict = dict(
+ tablefmt='pipe', floatfmt='.2f', numalign='right', stralign='center')
+ dataset_list: list
+ table_header: list
+ msg: str
+ task_abbr: str
+ metric_name: str
+
+ def __init__(self):
+ data = (d + f' ({self.metric_name})' for d in self.dataset_list)
+ self.table_header = ['Model', 'README', *data]
+
+ def _get_model_info(self, task_name: str):
+ meta_indexes = load('../../model-index.yml')
+ for meta_path in meta_indexes['Import']:
+ meta_path = osp.join('../../', meta_path)
+ metainfo = load(meta_path)
+ collection2md = {}
+ for item in metainfo['Collections']:
+ url = self.base_url + item['README']
+ collection2md[item['Name']] = f'[link]({url})'
+ for item in metainfo['Models']:
+ if task_name not in item['Config']:
+ continue
+ name = f'`{item["Name"]}`'
+ if item.get('Alias', None):
+ if isinstance(item['Alias'], str):
+ item['Alias'] = [item['Alias']]
+ aliases = [f'`{alias}`' for alias in item['Alias']]
+ aliases.append(name)
+ name = ' / '.join(aliases)
+ readme = collection2md[item['In Collection']]
+ eval_res = self._get_eval_res(item)
+ yield (name, readme, *eval_res)
+
+ def _get_eval_res(self, item):
+ eval_res = {k: '-' for k in self.dataset_list}
+ for res in item['Results']:
+ if res['Dataset'] in self.dataset_list:
+ eval_res[res['Dataset']] = res['Metrics'][self.metric_name]
+ return (eval_res[k] for k in self.dataset_list)
+
+ def gen_model_list(self):
+ content = f'\n{self.msg}\n'
+ content += '```{table}\n:class: model-summary nowrap field-list '
+ content += 'table table-hover\n'
+ content += tabulate(
+ self._get_model_info(self.task_abbr), self.table_header,
+ **self.table_cfg)
+ content += '\n```\n'
+ return content
+
+
+class TextDetWeightList(BaseWeightList):
+
+ dataset_list = ['ICDAR2015', 'CTW1500', 'Totaltext']
+ msg = '### Text Detection'
+ task_abbr = 'textdet'
+ metric_name = 'hmean-iou'
+
+
+class TextRecWeightList(BaseWeightList):
+
+ dataset_list = [
+ 'Avg', 'IIIT5K', 'SVT', 'ICDAR2013', 'ICDAR2015', 'SVTP', 'CT80'
+ ]
+ msg = ('### Text Recognition\n'
+ '```{note}\n'
+ 'Avg is the average on IIIT5K, SVT, ICDAR2013, ICDAR2015, SVTP,'
+ ' CT80.\n```\n')
+ task_abbr = 'textrecog'
+ metric_name = 'word_acc'
+
+ def _get_eval_res(self, item):
+ eval_res = {k: '-' for k in self.dataset_list}
+ avg = []
+ for res in item['Results']:
+ if res['Dataset'] in self.dataset_list:
+ eval_res[res['Dataset']] = res['Metrics'][self.metric_name]
+ avg.append(res['Metrics'][self.metric_name])
+ eval_res['Avg'] = sum(avg) / len(avg)
+ return (eval_res[k] for k in self.dataset_list)
+
+
+class KIEWeightList(BaseWeightList):
+
+ dataset_list = ['wildreceipt']
+ msg = '### Key Information Extraction'
+ task_abbr = 'kie'
+ metric_name = 'macro_f1'
+
+
+def gen_weight_list():
+ content = TextDetWeightList().gen_model_list()
+ content += TextRecWeightList().gen_model_list()
+ content += KIEWeightList().gen_model_list()
+ return content
diff --git a/mmocr-dev-1.x/docs/zh_cn/Makefile b/mmocr-dev-1.x/docs/zh_cn/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..d4bb2cbb9eddb1bb1b4f366623044af8e4830919
--- /dev/null
+++ b/mmocr-dev-1.x/docs/zh_cn/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+ @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+ @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/mmocr-dev-1.x/docs/zh_cn/_static/css/readthedocs.css b/mmocr-dev-1.x/docs/zh_cn/_static/css/readthedocs.css
new file mode 100644
index 0000000000000000000000000000000000000000..c4736f9dc728b2b0a49fd8e10d759c5d58e506d1
--- /dev/null
+++ b/mmocr-dev-1.x/docs/zh_cn/_static/css/readthedocs.css
@@ -0,0 +1,6 @@
+.header-logo {
+ background-image: url("../images/mmocr.png");
+ background-size: 110px 40px;
+ height: 40px;
+ width: 110px;
+}
diff --git a/mmocr-dev-1.x/docs/zh_cn/_static/images/mmocr.png b/mmocr-dev-1.x/docs/zh_cn/_static/images/mmocr.png
new file mode 100755
index 0000000000000000000000000000000000000000..363e34989e376b23b78ca4c31933542f15ec78ee
Binary files /dev/null and b/mmocr-dev-1.x/docs/zh_cn/_static/images/mmocr.png differ
diff --git a/mmocr-dev-1.x/docs/zh_cn/_static/js/collapsed.js b/mmocr-dev-1.x/docs/zh_cn/_static/js/collapsed.js
new file mode 100644
index 0000000000000000000000000000000000000000..bedebadc0183105fe5c5978fb6e07d4afca2a149
--- /dev/null
+++ b/mmocr-dev-1.x/docs/zh_cn/_static/js/collapsed.js
@@ -0,0 +1 @@
+var collapsedSections = ['MMOCR 0.x 迁移指南', 'API 文档']
diff --git a/mmocr-dev-1.x/docs/zh_cn/_static/js/table.js b/mmocr-dev-1.x/docs/zh_cn/_static/js/table.js
new file mode 100644
index 0000000000000000000000000000000000000000..8dacf477f33e81bba3a0c0edc11b135f648b1f0a
--- /dev/null
+++ b/mmocr-dev-1.x/docs/zh_cn/_static/js/table.js
@@ -0,0 +1,31 @@
+$(document).ready(function () {
+ table = $('.model-summary').DataTable({
+ "stateSave": false,
+ "lengthChange": false,
+ "pageLength": 10,
+ "order": [],
+ "scrollX": true,
+ "columnDefs": [
+ { "type": "summary", targets: '_all' },
+ ]
+ });
+ // Override the default sorting for the summary columns, which
+ // never takes the "-" character into account.
+ jQuery.extend(jQuery.fn.dataTableExt.oSort, {
+ "summary-asc": function (str1, str2) {
+ if (str1 == "
+
+## Description
+
+This is an implementation of [ABCNet](https://github.com/aim-uofa/AdelaiDet) based on [MMOCR](https://github.com/open-mmlab/mmocr/tree/dev-1.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
+
+**ABCNet** is a conceptually novel, efficient, and fully convolutional framework for text spotting, which addresses the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our realtime version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy.
+
+
+
+
+
+## Usage
+
+
+
+### Prerequisites
+
+- Python 3.7
+- PyTorch 1.6 or higher
+- [MIM](https://github.com/open-mmlab/mim)
+- [MMOCR](https://github.com/open-mmlab/mmocr)
+
+All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In `ABCNet/` root directory, run the following line to add the current directory to `PYTHONPATH`:
+
+```shell
+# Linux
+export PYTHONPATH=`pwd`:$PYTHONPATH
+# Windows PowerShell
+$env:PYTHONPATH=Get-Location
+```
+
+If the data is not in `ABCNet/`, you can link the data into `ABCNet/`:
+
+```shell
+# Linux
+ln -s ${DataPath} $PYTHONPATH
+# Windows PowerShell
+New-Item -ItemType SymbolicLink -Path $env:PYTHONPATH -Name data -Target ${DataPath}
+```
+
+### Training commands
+
+In the current directory, run the following command to train the model:
+
+```bash
+mim train mmocr config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py --work-dir work_dirs/
+```
+
+To train on multiple GPUs, e.g. 8 GPUs, run the following command:
+
+```bash
+mim train mmocr config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py --work-dir work_dirs/ --launcher pytorch --gpus 8
+```
+
+### Testing commands
+
+In the current directory, run the following command to test the model:
+
+```bash
+mim test mmocr config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py --work-dir work_dirs/ --checkpoint ${CHECKPOINT_PATH}
+```
+
+## Results
+
+Here we provide the baseline version of ABCNet with ResNet50 backbone.
+
+To find more variants, please visit the [official model zoo](https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/README.md).
+
+| Name | Pretrained Model | E2E-None-Hmean | det-Hmean | Download |
+| :-------------------: | :--------------------------------------------------------------------------------: | :------------: | :-------: | :------------------------------------------------------------------------: |
+| v1-icdar2015-finetune | [SynthText](https://download.openmmlab.com/mmocr/textspotting/abcnet/abcnet_resnet50_fpn_500e_icdar2015/abcnet_resnet50_fpn_pretrain-d060636c.pth) | 0.6127 | 0.8753 | [model](https://download.openmmlab.com/mmocr/textspotting/abcnet/abcnet_resnet50_fpn_500e_icdar2015/abcnet_resnet50_fpn_500e_icdar2015-326ac6f4.pth) \| [log](https://download.openmmlab.com/mmocr/textspotting/abcnet/abcnet_resnet50_fpn_500e_icdar2015/20221210_170401.log) |
+
+## Citation
+
+If you find ABCNet useful in your research or applications, please cite ABCNet with the following BibTeX entry.
+
+```BibTeX
+@inproceedings{liu2020abcnet,
+ title = {{ABCNet}: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network},
+ author = {Liu, Yuliang and Chen, Hao and Shen, Chunhua and He, Tong and Jin, Lianwen and Wang, Liangwei},
+ booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
+ year = {2020}
+}
+```
+
+## Checklist
+
+
+
+- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
+
+ - [x] Finish the code
+
+
+
+ - [x] Basic docstrings & proper citation
+
+
+
+ - [x] Test-time correctness
+
+
+
+ - [x] A full README
+
+
+
+- [x] Milestone 2: Indicates a successful model implementation.
+
+ - [x] Training-time correctness
+
+
+
+- [ ] Milestone 3: Good to be a part of our core package!
+
+ - [ ] Type hints and docstrings
+
+
+
+ - [ ] Unit tests
+
+
+
+ - [ ] Code polishing
+
+
+
+ - [ ] Metafile.yml
+
+
+
+- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
+
+
+
+- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
diff --git a/mmocr-dev-1.x/projects/ABCNet/README_V2.md b/mmocr-dev-1.x/projects/ABCNet/README_V2.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e4580446dd61307a785de49742157e56acfc981
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/README_V2.md
@@ -0,0 +1,137 @@
+# ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting
+
+
+
+## Description
+
+This is an implementation of [ABCNetV2](https://github.com/aim-uofa/AdelaiDet) based on [MMOCR](https://github.com/open-mmlab/mmocr/tree/dev-1.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
+
+**ABCNetV2** contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance of arbitrary shapes, significantly improving the precision of recognition over previous methods. 3) Different from previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a simple pipeline with the only post-processing non-maximum suppression (NMS). 4) As the performance of text recognition closely depends on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which leads to a considerable improvement with negligible computation overhead. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance while maintaining very high efficiency.
+
+
+
+
+
+## Usage
+
+
+
+### Prerequisites
+
+- Python 3.7
+- PyTorch 1.6 or higher
+- [MIM](https://github.com/open-mmlab/mim)
+- [MMOCR](https://github.com/open-mmlab/mmocr)
+
+All the commands below rely on the correct configuration of `PYTHONPATH`, which should point to the project's directory so that Python can locate the module files. In `ABCNet/` root directory, run the following line to add the current directory to `PYTHONPATH`:
+
+```shell
+# Linux
+export PYTHONPATH=`pwd`:$PYTHONPATH
+# Windows PowerShell
+$env:PYTHONPATH=Get-Location
+```
+
+If the data is not in `ABCNet/`, you can link the data into `ABCNet/`:
+
+```shell
+# Linux
+ln -s ${DataPath} $PYTHONPATH
+# Windows PowerShell
+New-Item -ItemType SymbolicLink -Path $env:PYTHONPATH -Name data -Target ${DataPath}
+```
+
+### Testing commands
+
+In the current directory, run the following command to test the model:
+
+```bash
+mim test mmocr config/abcnet_v2/abcnet-v2_resnet50_bifpn_500e_icdar2015.py --work-dir work_dirs/ --checkpoint ${CHECKPOINT_PATH}
+```
+
+## Results
+
+Here we provide the baseline version of ABCNet with ResNet50 backbone.
+
+To find more variants, please visit the [official model zoo](https://github.com/aim-uofa/AdelaiDet/blob/master/configs/BAText/README.md).
+
+| Name | Pretrained Model | E2E-None-Hmean | det-Hmean | Download |
+| :-------------------: | :--------------: | :------------: | :-------: | :------------------------------------------------------------------------------------------------------------------------------------------: |
+| v2-icdar2015-finetune | SynthText | 0.6628 | 0.8886 | [model](https://download.openmmlab.com/mmocr/textspotting/abcnet-v2/abcnet-v2_resnet50_bifpn/abcnet-v2_resnet50_bifpn_500e_icdar2015-5e4cc7ed.pth) |
+
+## Citation
+
+If you find ABCNetV2 useful in your research or applications, please cite ABCNetV2 with the following BibTeX entry.
+
+```BibTeX
+@ARTICLE{9525302,
+ author={Liu, Yuliang and Shen, Chunhua and Jin, Lianwen and He, Tong and Chen, Peng and Liu, Chongyu and Chen, Hao},
+ journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+ title={ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting},
+ year={2021},
+ volume={},
+ number={},
+ pages={1-1},
+ doi={10.1109/TPAMI.2021.3107437}}
+```
+
+## Checklist
+
+
+
+- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
+
+ - [x] Finish the code
+
+
+
+ - [x] Basic docstrings & proper citation
+
+
+
+ - [x] Test-time correctness
+
+
+
+ - [x] A full README
+
+
+
+- [ ] Milestone 2: Indicates a successful model implementation.
+
+ - [ ] Training-time correctness
+
+
+
+- [ ] Milestone 3: Good to be a part of our core package!
+
+ - [ ] Type hints and docstrings
+
+
+
+ - [ ] Unit tests
+
+
+
+ - [ ] Code polishing
+
+
+
+ - [ ] Metafile.yml
+
+
+
+- [ ] Move your modules into the core package following the codebase's file hierarchy structure.
+
+
+
+- [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/__init__.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..79dc69bc8e1ce59b7396a733009f8df5d964722f
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/__init__.py
@@ -0,0 +1,5 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .metric import * # NOQA
+from .model import * # NOQA
+from .utils import * # NOQA
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/__init__.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..cbf8e944556c9c3c699958e54a9cb0e20fbe3134
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .e2e_hmean_iou_metric import E2EHmeanIOUMetric
+
+__all__ = ['E2EHmeanIOUMetric']
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/e2e_hmean_iou_metric.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/e2e_hmean_iou_metric.py
new file mode 100644
index 0000000000000000000000000000000000000000..bdab4375e41e3ff8051ebc1b873842ebc04f1e44
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/metric/e2e_hmean_iou_metric.py
@@ -0,0 +1,370 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Dict, List, Optional, Sequence
+
+import numpy as np
+import torch
+from mmengine.evaluator import BaseMetric
+from mmengine.logging import MMLogger
+from scipy.sparse import csr_matrix
+from scipy.sparse.csgraph import maximum_bipartite_matching
+from shapely.geometry import Polygon
+
+from mmocr.evaluation.functional import compute_hmean
+from mmocr.registry import METRICS
+from mmocr.utils import poly_intersection, poly_iou, polys2shapely
+
+
+@METRICS.register_module()
+class E2EHmeanIOUMetric(BaseMetric):
+    """Hmean-IoU metric for end-to-end text spotting.
+
+    The metric is computed in the following steps:
+
+    - Filter the prediction polygons:
+
+      - Score is smaller than the minimum prediction score threshold.
+      - The proportion of the area that intersects with a gt ignored polygon
+        is greater than ``ignore_precision_thr``.
+
+    - Compute an M x N IoU matrix, where each element E_mn is the IoU
+      between the m-th valid GT and the n-th valid prediction.
+    - For each prediction score threshold:
+
+      - Obtain the ignored predictions according to the prediction score.
+        The filtered predictions will not be involved in the later metric
+        computations.
+      - Based on the IoU matrix, get the matches according to
+        ``match_iou_thr``.
+      - Based on the chosen ``strategy``, accumulate the match number. With
+        the 'vanilla' strategy, a matched pair counts as a hit only if the
+        recognized text also matches the ground truth (exact match in
+        word-spotting mode, or a case-insensitive match tolerating
+        leading/trailing special characters otherwise).
+      - Calculate the H-mean under this prediction score threshold.
+
+ Args:
+ match_iou_thr (float): IoU threshold for a match. Defaults to 0.5.
+ ignore_precision_thr (float): Precision threshold when prediction and\
+ gt ignored polygons are matched. Defaults to 0.5.
+        pred_score_thrs (dict): Best prediction score threshold searching
+            space. Defaults to dict(start=0.3, stop=0.9, step=0.1).
+        lexicon_path (str, optional): Path to a lexicon file. Reserved for
+            lexicon-based evaluation; not used by the current implementation.
+            Defaults to None.
+        word_spotting (bool): Whether to evaluate in word-spotting mode, in
+            which ground truths that do not qualify as dictionary words are
+            ignored and transcriptions must match exactly. Defaults to False.
+        min_length_case_word (int): Minimum length of a ground truth word to
+            be kept in word-spotting mode. Defaults to 3.
+        special_characters (str): Characters treated as punctuation and
+            stripped or tolerated during text matching.
+        strategy (str): Polygon matching strategy. Options are 'max_matching'
+            and 'vanilla'. 'max_matching' refers to the optimum strategy that
+            maximizes the number of matches. The vanilla strategy matches gt
+            and pred polygons if neither of them has been matched before. It
+            was used in MMOCR 0.x and in academia. Defaults to 'vanilla'.
+ collect_device (str): Device name used for collecting results from
+ different ranks during distributed training. Must be 'cpu' or
+ 'gpu'. Defaults to 'cpu'.
+ prefix (str, optional): The prefix that will be added in the metric
+ names to disambiguate homonymous metrics of different evaluators.
+ If prefix is not provided in the argument, self.default_prefix
+ will be used instead. Defaults to None
+ """
+ default_prefix: Optional[str] = 'e2e_icdar'
+
+ def __init__(self,
+ match_iou_thr: float = 0.5,
+ ignore_precision_thr: float = 0.5,
+ pred_score_thrs: Dict = dict(start=0.3, stop=0.9, step=0.1),
+ lexicon_path: Optional[str] = None,
+ word_spotting: bool = False,
+ min_length_case_word: int = 3,
+                 special_characters: str = "'!?.:,*\"()·[]/",
+ strategy: str = 'vanilla',
+ collect_device: str = 'cpu',
+ prefix: Optional[str] = None) -> None:
+ super().__init__(collect_device=collect_device, prefix=prefix)
+ self.match_iou_thr = match_iou_thr
+ self.ignore_precision_thr = ignore_precision_thr
+ self.pred_score_thrs = np.arange(**pred_score_thrs)
+ self.word_spotting = word_spotting
+ self.min_length_case_word = min_length_case_word
+ self.special_characters = special_characters
+ assert strategy in ['max_matching', 'vanilla']
+ self.strategy = strategy
+
+ def process(self, data_batch: Sequence[Dict],
+ data_samples: Sequence[Dict]) -> None:
+ """Process one batch of data samples and predictions. The processed
+ results should be stored in ``self.results``, which will be used to
+ compute the metrics when all batches have been processed.
+
+ Args:
+ data_batch (Sequence[Dict]): A batch of data from dataloader.
+ data_samples (Sequence[Dict]): A batch of outputs from
+ the model.
+ """
+ for data_sample in data_samples:
+
+ pred_instances = data_sample.get('pred_instances')
+ pred_polygons = pred_instances.get('polygons')
+ pred_scores = pred_instances.get('scores')
+ if isinstance(pred_scores, torch.Tensor):
+ pred_scores = pred_scores.cpu().numpy()
+ pred_scores = np.array(pred_scores, dtype=np.float32)
+ pred_texts = pred_instances.get('texts')
+
+ gt_instances = data_sample.get('gt_instances')
+ gt_polys = gt_instances.get('polygons')
+ gt_ignore_flags = gt_instances.get('ignored')
+ gt_texts = gt_instances.get('texts')
+ if isinstance(gt_ignore_flags, torch.Tensor):
+ gt_ignore_flags = gt_ignore_flags.cpu().numpy()
+ gt_polys = polys2shapely(gt_polys)
+ pred_polys = polys2shapely(pred_polygons)
+ if self.word_spotting:
+ gt_ignore_flags, gt_texts = self._word_spotting_filter(
+ gt_ignore_flags, gt_texts)
+ pred_ignore_flags = self._filter_preds(pred_polys, gt_polys,
+ pred_scores,
+ gt_ignore_flags)
+ pred_indexes = self._true_indexes(~pred_ignore_flags)
+ gt_indexes = self._true_indexes(~gt_ignore_flags)
+ pred_texts = [pred_texts[i] for i in pred_indexes]
+ gt_texts = [gt_texts[i] for i in gt_indexes]
+
+ gt_num = np.sum(~gt_ignore_flags)
+ pred_num = np.sum(~pred_ignore_flags)
+ iou_metric = np.zeros([gt_num, pred_num])
+
+ # Compute IoU scores amongst kept pred and gt polygons
+ for pred_mat_id, pred_poly_id in enumerate(pred_indexes):
+ for gt_mat_id, gt_poly_id in enumerate(gt_indexes):
+ iou_metric[gt_mat_id, pred_mat_id] = poly_iou(
+ gt_polys[gt_poly_id], pred_polys[pred_poly_id])
+
+ result = dict(
+ gt_texts=gt_texts,
+ pred_texts=pred_texts,
+ iou_metric=iou_metric,
+ pred_scores=pred_scores[~pred_ignore_flags])
+ self.results.append(result)
+
+ def compute_metrics(self, results: List[Dict]) -> Dict:
+ """Compute the metrics from processed results.
+
+ Args:
+ results (list[dict]): The processed results of each batch.
+
+ Returns:
+ dict: The computed metrics. The keys are the names of the metrics,
+ and the values are corresponding results.
+ """
+ logger: MMLogger = MMLogger.get_current_instance()
+
+ best_eval_results = dict(hmean=-1)
+ logger.info('Evaluating hmean-iou...')
+
+ dataset_pred_num = np.zeros_like(self.pred_score_thrs)
+ dataset_hit_num = np.zeros_like(self.pred_score_thrs)
+ dataset_gt_num = 0
+ for result in results:
+ iou_metric = result['iou_metric'] # (gt_num, pred_num)
+ pred_scores = result['pred_scores'] # (pred_num)
+ gt_texts = result['gt_texts']
+ pred_texts = result['pred_texts']
+ dataset_gt_num += iou_metric.shape[0]
+
+ # Filter out predictions by IoU threshold
+ for i, pred_score_thr in enumerate(self.pred_score_thrs):
+ pred_ignore_flags = pred_scores < pred_score_thr
+                # texts of the predictions kept under this threshold; use a
+                # new name so that ``pred_texts`` stays aligned with
+                # ``pred_scores`` for the following thresholds
+                kept_pred_texts = [
+                    pred_texts[j]
+                    for j in self._true_indexes(~pred_ignore_flags)
+                ]
+ matched_metric = iou_metric[:, ~pred_ignore_flags] \
+ > self.match_iou_thr
+ if self.strategy == 'max_matching':
+ csr_matched_metric = csr_matrix(matched_metric)
+ matched_preds = maximum_bipartite_matching(
+ csr_matched_metric, perm_type='row')
+ # -1 denotes unmatched pred polygons
+ dataset_hit_num[i] += np.sum(matched_preds != -1)
+ else:
+ # first come first matched
+ matched_gt_indexes = set()
+ matched_pred_indexes = set()
+ matched_e2e_gt_indexes = set()
+ for gt_idx, pred_idx in zip(*np.nonzero(matched_metric)):
+ if gt_idx in matched_gt_indexes or \
+ pred_idx in matched_pred_indexes:
+ continue
+ matched_gt_indexes.add(gt_idx)
+ matched_pred_indexes.add(pred_idx)
+ if self.word_spotting:
+                            if gt_texts[gt_idx] == kept_pred_texts[pred_idx]:
+                                matched_e2e_gt_indexes.add(gt_idx)
+                        else:
+                            if self.text_match(
+                                    gt_texts[gt_idx].upper(),
+                                    kept_pred_texts[pred_idx].upper()):
+ matched_e2e_gt_indexes.add(gt_idx)
+ dataset_hit_num[i] += len(matched_e2e_gt_indexes)
+ dataset_pred_num[i] += np.sum(~pred_ignore_flags)
+
+ for i, pred_score_thr in enumerate(self.pred_score_thrs):
+ recall, precision, hmean = compute_hmean(
+ int(dataset_hit_num[i]), int(dataset_hit_num[i]),
+ int(dataset_gt_num), int(dataset_pred_num[i]))
+ eval_results = dict(
+ precision=precision, recall=recall, hmean=hmean)
+ logger.info(f'prediction score threshold: {pred_score_thr:.2f}, '
+ f'recall: {eval_results["recall"]:.4f}, '
+ f'precision: {eval_results["precision"]:.4f}, '
+ f'hmean: {eval_results["hmean"]:.4f}\n')
+ if eval_results['hmean'] > best_eval_results['hmean']:
+ best_eval_results = eval_results
+ return best_eval_results
+
+ def _filter_preds(self, pred_polys: List[Polygon], gt_polys: List[Polygon],
+ pred_scores: List[float],
+ gt_ignore_flags: np.ndarray) -> np.ndarray:
+ """Filter out the predictions by score threshold and whether it
+ overlaps ignored gt polygons.
+
+ Args:
+ pred_polys (list[Polygon]): Pred polygons.
+ gt_polys (list[Polygon]): GT polygons.
+ pred_scores (list[float]): Pred scores of polygons.
+ gt_ignore_flags (np.ndarray): 1D boolean array indicating
+ the positions of ignored gt polygons.
+
+ Returns:
+ np.ndarray: 1D boolean array indicating the positions of ignored
+ pred polygons.
+ """
+
+ # Filter out predictions based on the minimum score threshold
+ pred_ignore_flags = pred_scores < self.pred_score_thrs.min()
+ pred_indexes = self._true_indexes(~pred_ignore_flags)
+ gt_indexes = self._true_indexes(gt_ignore_flags)
+ # Filter out pred polygons which overlaps any ignored gt polygons
+ for pred_id in pred_indexes:
+ for gt_id in gt_indexes:
+ # Match pred with ignored gt
+ precision = poly_intersection(
+ gt_polys[gt_id], pred_polys[pred_id]) / (
+ pred_polys[pred_id].area + 1e-5)
+ if precision > self.ignore_precision_thr:
+ pred_ignore_flags[pred_id] = True
+ break
+
+ return pred_ignore_flags
+
+ def _true_indexes(self, array: np.ndarray) -> np.ndarray:
+ """Get indexes of True elements from a 1D boolean array."""
+ return np.where(array)[0]
+
+ def _include_in_dictionary(self, text):
+ """Function used in Word Spotting that finds if the Ground Truth text
+ meets the rules to enter into the dictionary.
+
+        If not, the text will be treated as "don't care".
+ """
+ # special case 's at final
+ if text[len(text) - 2:] == "'s" or text[len(text) - 2:] == "'S":
+ text = text[0:len(text) - 2]
+
+ # hyphens at init or final of the word
+ text = text.strip('-')
+
+ for character in self.special_characters:
+ text = text.replace(character, ' ')
+
+ text = text.strip()
+
+ if len(text) != len(text.replace(' ', '')):
+ return False
+
+ if len(text) < self.min_length_case_word:
+ return False
+
+        notAllowed = '×÷·'
+
+ range1 = [ord(u'a'), ord(u'z')]
+ range2 = [ord(u'A'), ord(u'Z')]
+        range3 = [ord(u'À'), ord(u'ƿ')]
+        range4 = [ord(u'Ǆ'), ord(u'ɿ')]
+        range5 = [ord(u'Ά'), ord(u'Ͽ')]
+ range6 = [ord(u'-'), ord(u'-')]
+
+ for char in text:
+ charCode = ord(char)
+ if (notAllowed.find(char) != -1):
+ return False
+ # TODO: optimize it with for loop
+ valid = (charCode >= range1[0] and charCode <= range1[1]) or (
+ charCode >= range2[0] and charCode <= range2[1]
+ ) or (charCode >= range3[0] and charCode <= range3[1]) or (
+ charCode >= range4[0] and charCode <= range4[1]) or (
+ charCode >= range5[0]
+ and charCode <= range5[1]) or (charCode >= range6[0]
+ and charCode <= range6[1])
+ if not valid:
+ return False
+
+ return True
+
+ def _include_in_dictionary_text(self, text):
+ """Function applied to the Ground Truth texts used in Word Spotting.
+
+ It removes special characters or terminations
+ """
+ # special case 's at final
+ if text[len(text) - 2:] == "'s" or text[len(text) - 2:] == "'S":
+ text = text[0:len(text) - 2]
+
+ # hyphens at init or final of the word
+ text = text.strip('-')
+
+ for character in self.special_characters:
+ text = text.replace(character, ' ')
+
+ text = text.strip()
+
+ return text
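+
+    def _word_spotting_filter(self, gt_ignore_flags, gt_texts):
+        """Filter ground truths for word-spotting evaluation.
+
+        A minimal sketch following the ICDAR word-spotting protocol:
+        ground truths whose transcriptions do not qualify for the dictionary
+        (see ``_include_in_dictionary``) are additionally marked as ignored,
+        and all transcriptions are normalized with
+        ``_include_in_dictionary_text``.
+        """
+        gt_ignore_flags = gt_ignore_flags.copy()
+        new_gt_texts = []
+        for i, text in enumerate(gt_texts):
+            if not gt_ignore_flags[i] and not self._include_in_dictionary(
+                    text):
+                gt_ignore_flags[i] = True
+            new_gt_texts.append(self._include_in_dictionary_text(text))
+        return gt_ignore_flags, new_gt_texts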
+
+ def text_match(self,
+ gt_text,
+ pred_text,
+ only_remove_first_end_character=True):
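+        """Check whether ``pred_text`` matches ``gt_text``.
+
+        By default (``only_remove_first_end_character=True``), a special
+        character is tolerated at the first and/or last position of the
+        ground truth; otherwise leading and trailing special characters are
+        stripped from both strings before the comparison.
+        """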
+
+ if only_remove_first_end_character:
+ # special characters in GT are allowed only at initial or final
+ # position
+ if (gt_text == pred_text):
+ return True
+
+ if self.special_characters.find(gt_text[0]) > -1:
+ if gt_text[1:] == pred_text:
+ return True
+
+ if self.special_characters.find(gt_text[-1]) > -1:
+ if gt_text[0:len(gt_text) - 1] == pred_text:
+ return True
+
+ if self.special_characters.find(
+ gt_text[0]) > -1 and self.special_characters.find(
+ gt_text[-1]) > -1:
+ if gt_text[1:len(gt_text) - 1] == pred_text:
+ return True
+ return False
+ else:
+ # Special characters are removed from the beginning and the end of
+ # both Detection and GroundTruth
+ while len(gt_text) > 0 and self.special_characters.find(
+ gt_text[0]) > -1:
+ gt_text = gt_text[1:]
+
+ while len(pred_text) > 0 and self.special_characters.find(
+ pred_text[0]) > -1:
+ pred_text = pred_text[1:]
+
+ while len(gt_text) > 0 and self.special_characters.find(
+ gt_text[-1]) > -1:
+ gt_text = gt_text[0:len(gt_text) - 1]
+
+ while len(pred_text) > 0 and self.special_characters.find(
+ pred_text[-1]) > -1:
+ pred_text = pred_text[0:len(pred_text) - 1]
+
+ return gt_text == pred_text
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/__init__.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..f22d9b4f1f7999226030438b99569c14c6f2c4a8
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/__init__.py
@@ -0,0 +1,21 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .abcnet import ABCNet
+from .abcnet_det_head import ABCNetDetHead
+from .abcnet_det_module_loss import ABCNetDetModuleLoss
+from .abcnet_det_postprocessor import ABCNetDetPostprocessor
+from .abcnet_postprocessor import ABCNetPostprocessor
+from .abcnet_rec import ABCNetRec
+from .abcnet_rec_backbone import ABCNetRecBackbone
+from .abcnet_rec_decoder import ABCNetRecDecoder
+from .abcnet_rec_encoder import ABCNetRecEncoder
+from .bezier_roi_extractor import BezierRoIExtractor
+from .bifpn import BiFPN
+from .coordinate_head import CoordinateHead
+from .rec_roi_head import RecRoIHead
+
+__all__ = [
+ 'ABCNetDetHead', 'ABCNetDetPostprocessor', 'ABCNetRecBackbone',
+ 'ABCNetRecDecoder', 'ABCNetRecEncoder', 'ABCNet', 'ABCNetRec',
+ 'BezierRoIExtractor', 'RecRoIHead', 'ABCNetPostprocessor',
+ 'ABCNetDetModuleLoss', 'BiFPN', 'CoordinateHead'
+]
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a341226ac91cf4d9154bea8080a0ba6f808235f
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet.py
@@ -0,0 +1,8 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from mmocr.registry import MODELS
+from .two_stage_text_spotting import TwoStageTextSpotter
+
+
+@MODELS.register_module()
+class ABCNet(TwoStageTextSpotter):
+ """CTC-loss based recognizer."""
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_head.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..4eb45d905e56f937bfcfd2a6e74831468bbc2373
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_head.py
@@ -0,0 +1,197 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch.nn as nn
+from mmcv.cnn import ConvModule, Scale
+from mmdet.models.utils import multi_apply
+
+from mmocr.models.textdet.heads.base import BaseTextDetHead
+from mmocr.registry import MODELS
+
+INF = 1e8
+
+
+@MODELS.register_module()
+class ABCNetDetHead(BaseTextDetHead):
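+    """Detection head used in ABCNet.
+
+    An FCOS-style anchor-free head that, for every location of the
+    multi-level features, predicts classification scores, the four distances
+    to the box boundaries, a centerness score and, when ``with_bezier`` is
+    True, 16 offsets to the Bezier control points of the text instance.
+    """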
+
+ def __init__(self,
+ in_channels,
+ module_loss=dict(type='ABCNetLoss'),
+ postprocessor=dict(type='ABCNetDetPostprocessor'),
+ num_classes=1,
+ strides=(4, 8, 16, 32, 64),
+ feat_channels=256,
+ stacked_convs=4,
+ dcn_on_last_conv=False,
+ conv_bias='auto',
+ norm_on_bbox=False,
+ centerness_on_reg=False,
+ use_sigmoid_cls=True,
+ with_bezier=False,
+ use_scale=False,
+ conv_cfg=None,
+ norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
+ init_cfg=dict(
+ type='Normal',
+ layer='Conv2d',
+ std=0.01,
+ override=dict(
+ type='Normal',
+ name='conv_cls',
+ std=0.01,
+ bias_prob=0.01))):
+ super().__init__(
+ module_loss=module_loss,
+ postprocessor=postprocessor,
+ init_cfg=init_cfg)
+ self.num_classes = num_classes
+ self.in_channels = in_channels
+ self.strides = strides
+ self.feat_channels = feat_channels
+ self.stacked_convs = stacked_convs
+ self.dcn_on_last_conv = dcn_on_last_conv
+ assert conv_bias == 'auto' or isinstance(conv_bias, bool)
+ self.conv_bias = conv_bias
+ self.norm_on_bbox = norm_on_bbox
+ self.centerness_on_reg = centerness_on_reg
+ self.conv_cfg = conv_cfg
+ self.norm_cfg = norm_cfg
+ self.with_bezier = with_bezier
+ self.use_scale = use_scale
+ self.use_sigmoid_cls = use_sigmoid_cls
+ if self.use_sigmoid_cls:
+ self.cls_out_channels = num_classes
+ else:
+ self.cls_out_channels = num_classes + 1
+
+ self._init_layers()
+
+ def _init_layers(self):
+ """Initialize layers of the head."""
+ self._init_cls_convs()
+ self._init_reg_convs()
+ self._init_predictor()
+ self.conv_centerness = nn.Conv2d(self.feat_channels, 1, 3, padding=1)
+ # if self.use_scale:
+ self.scales = nn.ModuleList([Scale(1.0) for _ in self.strides])
+
+ def _init_cls_convs(self):
+ """Initialize classification conv layers of the head."""
+ self.cls_convs = nn.ModuleList()
+ for i in range(self.stacked_convs):
+ chn = self.in_channels if i == 0 else self.feat_channels
+ if self.dcn_on_last_conv and i == self.stacked_convs - 1:
+ conv_cfg = dict(type='DCNv2')
+ else:
+ conv_cfg = self.conv_cfg
+ self.cls_convs.append(
+ ConvModule(
+ chn,
+ self.feat_channels,
+ 3,
+ stride=1,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=self.norm_cfg,
+ bias=self.conv_bias))
+
+ def _init_reg_convs(self):
+ """Initialize bbox regression conv layers of the head."""
+ self.reg_convs = nn.ModuleList()
+ for i in range(self.stacked_convs):
+ chn = self.in_channels if i == 0 else self.feat_channels
+ if self.dcn_on_last_conv and i == self.stacked_convs - 1:
+ conv_cfg = dict(type='DCNv2')
+ else:
+ conv_cfg = self.conv_cfg
+ self.reg_convs.append(
+ ConvModule(
+ chn,
+ self.feat_channels,
+ 3,
+ stride=1,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=self.norm_cfg,
+ bias=self.conv_bias))
+
+ def _init_predictor(self):
+ """Initialize predictor layers of the head."""
+ self.conv_cls = nn.Conv2d(
+ self.feat_channels, self.cls_out_channels, 3, padding=1)
+ self.conv_reg = nn.Conv2d(self.feat_channels, 4, 3, padding=1)
+ if self.with_bezier:
+ self.conv_bezier = nn.Conv2d(
+ self.feat_channels, 16, kernel_size=3, stride=1, padding=1)
+
+ def forward(self, feats, data_samples=None):
+ """Forward features from the upstream network.
+
+ Args:
+ feats (tuple[Tensor]): Features from the upstream network, each is
+ a 4D-tensor.
+
+ Returns:
+ tuple:
+ cls_scores (list[Tensor]): Box scores for each scale level, \
+ each is a 4D-tensor, the channel number is \
+ num_points * num_classes.
+ bbox_preds (list[Tensor]): Box energies / deltas for each \
+ scale level, each is a 4D-tensor, the channel number is \
+ num_points * 4.
+ centernesses (list[Tensor]): centerness for each scale level, \
+ each is a 4D-tensor, the channel number is num_points * 1.
+ """
+
+ return multi_apply(self.forward_single, feats[1:], self.scales,
+ self.strides)
+
+ def forward_single(self, x, scale, stride):
+ """Forward features of a single scale level.
+
+ Args:
+ x (Tensor): FPN feature maps of the specified stride.
+ scale (:obj: `mmcv.cnn.Scale`): Learnable scale module to resize
+ the bbox prediction.
+ stride (int): The corresponding stride for feature maps, only
+ used to normalize the bbox prediction when self.norm_on_bbox
+ is True.
+
+ Returns:
+ tuple: scores for each class, bbox predictions and centerness \
+ predictions of input feature maps. If ``with_bezier`` is True,
+ Bezier prediction will also be returned.
+ """
+ cls_feat = x
+ reg_feat = x
+
+ for cls_layer in self.cls_convs:
+ cls_feat = cls_layer(cls_feat)
+ cls_score = self.conv_cls(cls_feat)
+
+ for reg_layer in self.reg_convs:
+ reg_feat = reg_layer(reg_feat)
+ bbox_pred = self.conv_reg(reg_feat)
+ if self.with_bezier:
+ bezier_pred = self.conv_bezier(reg_feat)
+
+ if self.centerness_on_reg:
+ centerness = self.conv_centerness(reg_feat)
+ else:
+ centerness = self.conv_centerness(cls_feat)
+ # scale the bbox_pred of different level
+ # float to avoid overflow when enabling FP16
+ if self.use_scale:
+ bbox_pred = scale(bbox_pred).float()
+ else:
+ bbox_pred = bbox_pred.float()
+ if self.norm_on_bbox:
+ # bbox_pred needed for gradient computation has been modified
+ # by F.relu(bbox_pred) when run with PyTorch 1.10. So replace
+ # F.relu(bbox_pred) with bbox_pred.clamp(min=0)
+ bbox_pred = bbox_pred.clamp(min=0)
+ else:
+ bbox_pred = bbox_pred.exp()
+
+ if self.with_bezier:
+ return cls_score, bbox_pred, centerness, bezier_pred
+ else:
+ return cls_score, bbox_pred, centerness
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_module_loss.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_module_loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..a8becc48dd381ec37c5a178508c6af5fec47337b
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_module_loss.py
@@ -0,0 +1,359 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Dict, List, Tuple
+
+import torch
+from mmdet.models.task_modules.prior_generators import MlvlPointGenerator
+from mmdet.models.utils import multi_apply
+from mmdet.utils import reduce_mean
+from torch import Tensor
+
+from mmocr.models.textdet.module_losses.base import BaseTextDetModuleLoss
+from mmocr.registry import MODELS, TASK_UTILS
+from mmocr.structures import TextDetDataSample
+from mmocr.utils import ConfigType, DetSampleList, RangeType
+from ..utils import poly2bezier
+
+INF = 1e8
+
+
+@MODELS.register_module()
+class ABCNetDetModuleLoss(BaseTextDetModuleLoss):
+    """Module loss of the ABCNet detection head.
+
+    Following FCOS, it assigns classification, bbox-distance and centerness
+    targets to multi-level points, together with 16-dimensional Bezier
+    control-point targets converted from the ground-truth polygons, and
+    computes the configured classification, bbox, centerness and Bezier
+    losses (focal, GIoU, cross-entropy and smooth-L1 by default).
+    """
+
+ def __init__(
+ self,
+ num_classes: int = 1,
+ bbox_coder: ConfigType = dict(type='mmdet.DistancePointBBoxCoder'),
+ regress_ranges: RangeType = ((-1, 64), (64, 128), (128, 256),
+ (256, 512), (512, INF)),
+ strides: List[int] = (8, 16, 32, 64, 128),
+ center_sampling: bool = True,
+ center_sample_radius: float = 1.5,
+ norm_on_bbox: bool = True,
+ loss_cls: ConfigType = dict(
+ type='mmdet.FocalLoss',
+ use_sigmoid=True,
+ gamma=2.0,
+ alpha=0.25,
+ loss_weight=1.0),
+ loss_bbox: ConfigType = dict(type='mmdet.GIoULoss', loss_weight=1.0),
+ loss_centerness: ConfigType = dict(
+ type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
+ loss_bezier: ConfigType = dict(
+ type='mmdet.SmoothL1Loss', reduction='mean', loss_weight=1.0)
+ ) -> None:
+ super().__init__()
+ self.num_classes = num_classes
+ self.strides = strides
+ self.prior_generator = MlvlPointGenerator(strides)
+ self.regress_ranges = regress_ranges
+ self.center_sampling = center_sampling
+ self.center_sample_radius = center_sample_radius
+ self.norm_on_bbox = norm_on_bbox
+ self.loss_centerness = MODELS.build(loss_centerness)
+ self.loss_cls = MODELS.build(loss_cls)
+ self.loss_bbox = MODELS.build(loss_bbox)
+ self.loss_bezier = MODELS.build(loss_bezier)
+ self.bbox_coder = TASK_UTILS.build(bbox_coder)
+ use_sigmoid_cls = loss_cls.get('use_sigmoid', False)
+ if use_sigmoid_cls:
+ self.cls_out_channels = num_classes
+ else:
+ self.cls_out_channels = num_classes + 1
+
+ def forward(self, inputs: Tuple[Tensor],
+ data_samples: DetSampleList) -> Dict:
+ """Compute ABCNet loss.
+
+ Args:
+            inputs (tuple(Tensor)): Raw predictions from the model,
+                containing ``cls_scores``, ``bbox_preds``, ``centernesses``
+                and ``bezier_preds``. Each is a list of multi-level tensors
+                of shape :math:`(N, C, H, W)`.
+ data_samples (list[TextDetDataSample]): The data samples.
+
+ Returns:
+ dict: The dict for abcnet-det losses with loss_cls, loss_bbox,
+ loss_centerness and loss_bezier.
+ """
+        cls_scores, bbox_preds, centernesses, bezier_preds = inputs
+        assert len(cls_scores) == len(bbox_preds) == len(centernesses) == len(
+            bezier_preds)
+ featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
+ all_level_points = self.prior_generator.grid_priors(
+ featmap_sizes,
+ dtype=bbox_preds[0].dtype,
+ device=bbox_preds[0].device)
+ labels, bbox_targets, bezier_targets = self.get_targets(
+ all_level_points, data_samples)
+
+ num_imgs = cls_scores[0].size(0)
+ # flatten cls_scores, bbox_preds and centerness
+ flatten_cls_scores = [
+ cls_score.permute(0, 2, 3, 1).reshape(-1, self.cls_out_channels)
+ for cls_score in cls_scores
+ ]
+ flatten_bbox_preds = [
+ bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
+ for bbox_pred in bbox_preds
+ ]
+ flatten_centerness = [
+ centerness.permute(0, 2, 3, 1).reshape(-1)
+ for centerness in centernesses
+ ]
+ flatten_bezier_preds = [
+ bezier_pred.permute(0, 2, 3, 1).reshape(-1, 16)
+            for bezier_pred in bezier_preds
+ ]
+ flatten_cls_scores = torch.cat(flatten_cls_scores)
+ flatten_bbox_preds = torch.cat(flatten_bbox_preds)
+ flatten_centerness = torch.cat(flatten_centerness)
+ flatten_bezier_preds = torch.cat(flatten_bezier_preds)
+ flatten_labels = torch.cat(labels)
+ flatten_bbox_targets = torch.cat(bbox_targets)
+ flatten_bezier_targets = torch.cat(bezier_targets)
+ # repeat points to align with bbox_preds
+ flatten_points = torch.cat(
+ [points.repeat(num_imgs, 1) for points in all_level_points])
+
+ # FG cat_id: [0, num_classes -1], BG cat_id: num_classes
+ bg_class_ind = self.num_classes
+ pos_inds = ((flatten_labels >= 0)
+ & (flatten_labels < bg_class_ind)).nonzero().reshape(-1)
+ num_pos = torch.tensor(
+ len(pos_inds), dtype=torch.float, device=bbox_preds[0].device)
+ num_pos = max(reduce_mean(num_pos), 1.0)
+ loss_cls = self.loss_cls(
+ flatten_cls_scores, flatten_labels, avg_factor=num_pos)
+
+ pos_bbox_preds = flatten_bbox_preds[pos_inds]
+ pos_centerness = flatten_centerness[pos_inds]
+ pos_bezier_preds = flatten_bezier_preds[pos_inds]
+ pos_bbox_targets = flatten_bbox_targets[pos_inds]
+ pos_centerness_targets = self.centerness_target(pos_bbox_targets)
+ pos_bezier_targets = flatten_bezier_targets[pos_inds]
+ # centerness weighted iou loss
+ centerness_denorm = max(
+ reduce_mean(pos_centerness_targets.sum().detach()), 1e-6)
+
+ if len(pos_inds) > 0:
+ pos_points = flatten_points[pos_inds]
+ pos_decoded_bbox_preds = self.bbox_coder.decode(
+ pos_points, pos_bbox_preds)
+ pos_decoded_target_preds = self.bbox_coder.decode(
+ pos_points, pos_bbox_targets)
+ loss_bbox = self.loss_bbox(
+ pos_decoded_bbox_preds,
+ pos_decoded_target_preds,
+ weight=pos_centerness_targets,
+ avg_factor=centerness_denorm)
+ loss_centerness = self.loss_centerness(
+ pos_centerness, pos_centerness_targets, avg_factor=num_pos)
+ loss_bezier = self.loss_bezier(
+ pos_bezier_preds,
+ pos_bezier_targets,
+ weight=pos_centerness_targets[:, None],
+ avg_factor=centerness_denorm)
+ else:
+ loss_bbox = pos_bbox_preds.sum()
+ loss_centerness = pos_centerness.sum()
+ loss_bezier = pos_bezier_preds.sum()
+
+ return dict(
+ loss_cls=loss_cls,
+ loss_bbox=loss_bbox,
+ loss_centerness=loss_centerness,
+ loss_bezier=loss_bezier)
+
+    def get_targets(self, points: List[Tensor], data_samples: DetSampleList
+                    ) -> Tuple[List[Tensor], List[Tensor], List[Tensor]]:
+        """Compute classification, bbox regression and Bezier regression
+        targets for points in multiple images.
+
+ Args:
+ points (list[Tensor]): Points of each fpn level, each has shape
+ (num_points, 2).
+ data_samples: Batch of data samples. Each data sample contains
+ a gt_instance, which usually includes bboxes and labels
+ attributes.
+
+ Returns:
+ tuple: Targets of each level.
+
+            - concat_lvl_labels (list[Tensor]): Labels of each level.
+            - concat_lvl_bbox_targets (list[Tensor]): BBox targets of each \
+                level.
+            - concat_lvl_bezier_targets (list[Tensor]): Bezier control-point \
+                targets of each level.
+        """
+ assert len(points) == len(self.regress_ranges)
+ num_levels = len(points)
+ # expand regress ranges to align with points
+ expanded_regress_ranges = [
+ points[i].new_tensor(self.regress_ranges[i])[None].expand_as(
+ points[i]) for i in range(num_levels)
+ ]
+ # concat all levels points and regress ranges
+ concat_regress_ranges = torch.cat(expanded_regress_ranges, dim=0)
+ concat_points = torch.cat(points, dim=0)
+
+ # the number of points per img, per lvl
+ num_points = [center.size(0) for center in points]
+
+ # get labels and bbox_targets of each image
+ labels_list, bbox_targets_list, bezier_targets_list = multi_apply(
+ self._get_targets_single,
+ data_samples,
+ points=concat_points,
+ regress_ranges=concat_regress_ranges,
+ num_points_per_lvl=num_points)
+
+ # split to per img, per level
+ labels_list = [labels.split(num_points, 0) for labels in labels_list]
+ bbox_targets_list = [
+ bbox_targets.split(num_points, 0)
+ for bbox_targets in bbox_targets_list
+ ]
+ bezier_targets_list = [
+ bezier_targets.split(num_points, 0)
+ for bezier_targets in bezier_targets_list
+ ]
+ # concat per level image
+ concat_lvl_labels = []
+ concat_lvl_bbox_targets = []
+ concat_lvl_bezier_targets = []
+ for i in range(num_levels):
+ concat_lvl_labels.append(
+ torch.cat([labels[i] for labels in labels_list]))
+ bbox_targets = torch.cat(
+ [bbox_targets[i] for bbox_targets in bbox_targets_list])
+ bezier_targets = torch.cat(
+ [bezier_targets[i] for bezier_targets in bezier_targets_list])
+ if self.norm_on_bbox:
+ bbox_targets = bbox_targets / self.strides[i]
+ bezier_targets = bezier_targets / self.strides[i]
+ concat_lvl_bbox_targets.append(bbox_targets)
+ concat_lvl_bezier_targets.append(bezier_targets)
+ return (concat_lvl_labels, concat_lvl_bbox_targets,
+ concat_lvl_bezier_targets)
+
+ def _get_targets_single(self, data_sample: TextDetDataSample,
+ points: Tensor, regress_ranges: Tensor,
+ num_points_per_lvl: List[int]
+ ) -> Tuple[Tensor, Tensor, Tensor]:
+ """Compute regression and classification targets for a single image."""
+ num_points = points.size(0)
+ gt_instances = data_sample.gt_instances
+ gt_instances = gt_instances[~gt_instances.ignored]
+ num_gts = len(gt_instances)
+ gt_bboxes = gt_instances.bboxes
+ gt_labels = gt_instances.labels
+ data_sample.gt_instances = gt_instances
+ polygons = gt_instances.polygons
+ beziers = gt_bboxes.new([poly2bezier(poly) for poly in polygons])
+ gt_instances.beziers = beziers
+ if num_gts == 0:
+ return gt_labels.new_full((num_points,), self.num_classes), \
+ gt_bboxes.new_zeros((num_points, 4)), \
+ gt_bboxes.new_zeros((num_points, 16))
+
+ areas = (gt_bboxes[:, 2] - gt_bboxes[:, 0]) * (
+ gt_bboxes[:, 3] - gt_bboxes[:, 1])
+ # TODO: figure out why these two are different
+ # areas = areas[None].expand(num_points, num_gts)
+ areas = areas[None].repeat(num_points, 1)
+ regress_ranges = regress_ranges[:, None, :].expand(
+ num_points, num_gts, 2)
+ gt_bboxes = gt_bboxes[None].expand(num_points, num_gts, 4)
+ xs, ys = points[:, 0], points[:, 1]
+ xs = xs[:, None].expand(num_points, num_gts)
+ ys = ys[:, None].expand(num_points, num_gts)
+
+ left = xs - gt_bboxes[..., 0]
+ right = gt_bboxes[..., 2] - xs
+ top = ys - gt_bboxes[..., 1]
+ bottom = gt_bboxes[..., 3] - ys
+ bbox_targets = torch.stack((left, top, right, bottom), -1)
+
+ beziers = beziers.reshape(-1, 8,
+ 2)[None].expand(num_points, num_gts, 8, 2)
+ beziers_left = beziers[..., 0] - xs[..., None]
+ beziers_right = beziers[..., 1] - ys[..., None]
+ bezier_targets = torch.stack((beziers_left, beziers_right), dim=-1)
+ bezier_targets = bezier_targets.view(num_points, num_gts, 16)
+ if self.center_sampling:
+ # condition1: inside a `center bbox`
+ radius = self.center_sample_radius
+ center_xs = (gt_bboxes[..., 0] + gt_bboxes[..., 2]) / 2
+ center_ys = (gt_bboxes[..., 1] + gt_bboxes[..., 3]) / 2
+ center_gts = torch.zeros_like(gt_bboxes)
+ stride = center_xs.new_zeros(center_xs.shape)
+
+ # project the points on current lvl back to the `original` sizes
+ lvl_begin = 0
+ for lvl_idx, num_points_lvl in enumerate(num_points_per_lvl):
+ lvl_end = lvl_begin + num_points_lvl
+ stride[lvl_begin:lvl_end] = self.strides[lvl_idx] * radius
+ lvl_begin = lvl_end
+
+ x_mins = center_xs - stride
+ y_mins = center_ys - stride
+ x_maxs = center_xs + stride
+ y_maxs = center_ys + stride
+ center_gts[..., 0] = torch.where(x_mins > gt_bboxes[..., 0],
+ x_mins, gt_bboxes[..., 0])
+ center_gts[..., 1] = torch.where(y_mins > gt_bboxes[..., 1],
+ y_mins, gt_bboxes[..., 1])
+ center_gts[..., 2] = torch.where(x_maxs > gt_bboxes[..., 2],
+ gt_bboxes[..., 2], x_maxs)
+ center_gts[..., 3] = torch.where(y_maxs > gt_bboxes[..., 3],
+ gt_bboxes[..., 3], y_maxs)
+
+ cb_dist_left = xs - center_gts[..., 0]
+ cb_dist_right = center_gts[..., 2] - xs
+ cb_dist_top = ys - center_gts[..., 1]
+ cb_dist_bottom = center_gts[..., 3] - ys
+ center_bbox = torch.stack(
+ (cb_dist_left, cb_dist_top, cb_dist_right, cb_dist_bottom), -1)
+ inside_gt_bbox_mask = center_bbox.min(-1)[0] > 0
+ else:
+ # condition1: inside a gt bbox
+ inside_gt_bbox_mask = bbox_targets.min(-1)[0] > 0
+
+ # condition2: limit the regression range for each location
+ max_regress_distance = bbox_targets.max(-1)[0]
+ inside_regress_range = (
+ (max_regress_distance >= regress_ranges[..., 0])
+ & (max_regress_distance <= regress_ranges[..., 1]))
+
+ # if there are still more than one objects for a location,
+ # we choose the one with minimal area
+ areas[inside_gt_bbox_mask == 0] = INF
+ areas[inside_regress_range == 0] = INF
+ min_area, min_area_inds = areas.min(dim=1)
+
+ labels = gt_labels[min_area_inds]
+ labels[min_area == INF] = self.num_classes # set as BG
+ bbox_targets = bbox_targets[range(num_points), min_area_inds]
+ bezier_targets = bezier_targets[range(num_points), min_area_inds]
+
+ return labels, bbox_targets, bezier_targets
+
+ def centerness_target(self, pos_bbox_targets: Tensor) -> Tensor:
+ """Compute centerness targets.
+
+ Args:
+ pos_bbox_targets (Tensor): BBox targets of positive bboxes in shape
+ (num_pos, 4)
+
+ Returns:
+ Tensor: Centerness target.
+ """
+ # only calculate pos centerness targets, otherwise there may be nan
+ left_right = pos_bbox_targets[:, [0, 2]]
+ top_bottom = pos_bbox_targets[:, [1, 3]]
+ if len(left_right) == 0:
+ centerness_targets = left_right[..., 0]
+ else:
+ centerness_targets = (
+ left_right.min(dim=-1)[0] / left_right.max(dim=-1)[0]) * (
+ top_bottom.min(dim=-1)[0] / top_bottom.max(dim=-1)[0])
+ return torch.sqrt(centerness_targets)
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_postprocessor.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_postprocessor.py
new file mode 100644
index 0000000000000000000000000000000000000000..db9a4d141c32ab840d8fe25640ad9c3fed00db5b
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_det_postprocessor.py
@@ -0,0 +1,228 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from functools import partial
+from typing import List
+
+import numpy as np
+import torch
+from mmcv.ops import batched_nms
+from mmdet.models.task_modules.prior_generators import MlvlPointGenerator
+from mmdet.models.utils import (filter_scores_and_topk, multi_apply,
+ select_single_mlvl)
+from mmengine.structures import InstanceData
+
+from mmocr.models.textdet.postprocessors.base import BaseTextDetPostProcessor
+from mmocr.registry import MODELS, TASK_UTILS
+
+
+@MODELS.register_module()
+class ABCNetDetPostprocessor(BaseTextDetPostProcessor):
+ """Post-processing methods for ABCNet.
+
+ Args:
+ num_classes (int): Number of classes.
+ use_sigmoid_cls (bool): Whether to use sigmoid for classification.
+ strides (tuple): Strides of each feature map.
+ norm_by_strides (bool): Whether to normalize the regression targets by
+ the strides.
+ bbox_coder (dict): Config dict for bbox coder.
+ text_repr_type (str): Text representation type, 'poly' or 'quad'.
+ with_bezier (bool): Whether to use bezier curve for text detection.
+ train_cfg (dict): Config dict for training.
+ test_cfg (dict): Config dict for testing.
+ """
+
+ def __init__(
+ self,
+ num_classes=1,
+ use_sigmoid_cls=True,
+ strides=(4, 8, 16, 32, 64),
+ norm_by_strides=True,
+ bbox_coder=dict(type='mmdet.DistancePointBBoxCoder'),
+ text_repr_type='poly',
+ rescale_fields=None,
+ with_bezier=False,
+ train_cfg=None,
+ test_cfg=None,
+ ):
+ super().__init__(
+ text_repr_type=text_repr_type,
+ rescale_fields=rescale_fields,
+ train_cfg=train_cfg,
+ test_cfg=test_cfg,
+ )
+ self.strides = strides
+ self.norm_by_strides = norm_by_strides
+ self.prior_generator = MlvlPointGenerator(strides)
+ self.bbox_coder = TASK_UTILS.build(bbox_coder)
+ self.use_sigmoid_cls = use_sigmoid_cls
+ self.with_bezier = with_bezier
+ if self.use_sigmoid_cls:
+ self.cls_out_channels = num_classes
+ else:
+ self.cls_out_channels = num_classes + 1
+
+ def split_results(self, pred_results: List[torch.Tensor]):
+ """Split the prediction results into multi-level features. The
+ prediction results are concatenated in the first dimension.
+ Args:
+ pred_results (list[list[torch.Tensor]): Prediction results of all
+ head with multi-level features.
+ The first dimension of pred_results is the number of outputs of
+ head. The second dimension is the number of level. The third
+ dimension is the feature with (N, C, H, W).
+
+ Returns:
+ list[list[torch.Tensor]]:
+ [Batch_size, Number of heads]
+ """
+
+ results = []
+ num_levels = len(pred_results[0])
+ bs = pred_results[0][0].size(0)
+ featmap_sizes = [
+ pred_results[0][i].shape[-2:] for i in range(num_levels)
+ ]
+ mlvl_priors = self.prior_generator.grid_priors(
+ featmap_sizes,
+ dtype=pred_results[0][0].dtype,
+ device=pred_results[0][0].device)
+ for img_id in range(bs):
+ single_results = [mlvl_priors]
+ for pred_result in pred_results:
+ single_results.append(select_single_mlvl(pred_result, img_id))
+ results.append(single_results)
+ return results
+
+ def get_text_instances(
+ self,
+ pred_results,
+ data_sample,
+ nms_pre=-1,
+ score_thr=0,
+ max_per_img=100,
+ nms=dict(type='nms', iou_threshold=0.5),
+ ):
+ """Get text instance predictions of one image."""
+ pred_instances = InstanceData()
+
+ (mlvl_bboxes, mlvl_scores, mlvl_labels, mlvl_score_factors,
+ mlvl_beziers) = multi_apply(
+ self._get_preds_single_level,
+ *pred_results,
+ self.strides,
+ img_shape=data_sample.get('img_shape'),
+ nms_pre=nms_pre,
+ score_thr=score_thr)
+
+ mlvl_bboxes = torch.cat(mlvl_bboxes)
+ mlvl_scores = torch.cat(mlvl_scores)
+ mlvl_labels = torch.cat(mlvl_labels)
+ if self.with_bezier:
+ mlvl_beziers = torch.cat(mlvl_beziers)
+
+ if mlvl_score_factors is not None:
+ mlvl_score_factors = torch.cat(mlvl_score_factors)
+ mlvl_scores = mlvl_scores * mlvl_score_factors
+ mlvl_scores = torch.sqrt(mlvl_scores)
+
+ if mlvl_bboxes.numel() == 0:
+ pred_instances.bboxes = mlvl_bboxes.detach().cpu().numpy()
+ pred_instances.scores = mlvl_scores.detach().cpu().numpy()
+ pred_instances.labels = mlvl_labels.detach().cpu().numpy()
+ if self.with_bezier:
+ pred_instances.beziers = mlvl_beziers.detach().reshape(-1, 16)
+ pred_instances.polygons = []
+ data_sample.pred_instances = pred_instances
+ return data_sample
+ det_bboxes, keep_idxs = batched_nms(mlvl_bboxes, mlvl_scores,
+ mlvl_labels, nms)
+ det_bboxes, scores = np.split(det_bboxes, [-1], axis=1)
+ pred_instances.bboxes = det_bboxes[:max_per_img].detach().cpu().numpy()
+ pred_instances.scores = scores[:max_per_img].detach().cpu().numpy(
+ ).squeeze(-1)
+ pred_instances.labels = mlvl_labels[keep_idxs][:max_per_img].detach(
+ ).cpu().numpy()
+ if self.with_bezier:
+ pred_instances.beziers = mlvl_beziers[
+ keep_idxs][:max_per_img].detach().reshape(-1, 16)
+ data_sample.pred_instances = pred_instances
+ return data_sample
+
+ def _get_preds_single_level(self,
+ priors,
+ cls_scores,
+ bbox_preds,
+ centernesses,
+ bezier_preds=None,
+ stride=1,
+ score_thr=0,
+ nms_pre=-1,
+ img_shape=None):
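+        """Decode the predictions of a single feature level into bboxes,
+        scores, labels, centerness scores and, if ``with_bezier`` is True,
+        Bezier control points, keeping at most ``nms_pre`` candidates whose
+        scores exceed ``score_thr``."""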
+ assert cls_scores.size()[-2:] == bbox_preds.size()[-2:]
+ if self.norm_by_strides:
+ bbox_preds = bbox_preds * stride
+ bbox_preds = bbox_preds.permute(1, 2, 0).reshape(-1, 4)
+ if self.with_bezier:
+ if self.norm_by_strides:
+ bezier_preds = bezier_preds * stride
+ bezier_preds = bezier_preds.permute(1, 2, 0).reshape(-1, 8, 2)
+ centernesses = centernesses.permute(1, 2, 0).reshape(-1).sigmoid()
+ cls_scores = cls_scores.permute(1, 2,
+ 0).reshape(-1, self.cls_out_channels)
+ if self.use_sigmoid_cls:
+ scores = cls_scores.sigmoid()
+ else:
+ # remind that we set FG labels to [0, num_class-1]
+ # since mmdet v2.0
+ # BG cat_id: num_class
+ scores = cls_scores.softmax(-1)[:, :-1]
+
+ # After https://github.com/open-mmlab/mmdetection/pull/6268/,
+ # this operation keeps fewer bboxes under the same `nms_pre`.
+ # There is no difference in performance for most models. If you
+ # find a slight drop in performance, you can set a larger
+ # `nms_pre` than before.
+ results = filter_scores_and_topk(
+ scores, score_thr, nms_pre,
+ dict(bbox_preds=bbox_preds, priors=priors))
+ scores, labels, keep_idxs, filtered_results = results
+
+ bbox_preds = filtered_results['bbox_preds']
+ priors = filtered_results['priors']
+ centernesses = centernesses[keep_idxs]
+ bboxes = self.bbox_coder.decode(
+ priors, bbox_preds, max_shape=img_shape)
+ if self.with_bezier:
+ bezier_preds = bezier_preds[keep_idxs]
+ bezier_preds = priors[:, None, :] + bezier_preds
+ bezier_preds[:, :, 0].clamp_(min=0, max=img_shape[1])
+ bezier_preds[:, :, 1].clamp_(min=0, max=img_shape[0])
+ return bboxes, scores, labels, centernesses, bezier_preds
+ else:
+ return bboxes, scores, labels, centernesses
+
+ def __call__(self, pred_results, data_samples, training: bool = False):
+ """Postprocess pred_results according to metainfos in data_samples.
+
+ Args:
+ pred_results (Union[Tensor, List[Tensor]]): The prediction results
+ stored in a tensor or a list of tensor. Usually each item to
+ be post-processed is expected to be a batched tensor.
+ data_samples (list[TextDetDataSample]): Batch of data_samples,
+ each corresponding to a prediction result.
+ training (bool): Whether the model is in training mode. Defaults to
+ False.
+
+ Returns:
+ list[TextDetDataSample]: Batch of post-processed datasamples.
+ """
+ if training:
+ return data_samples
+ cfg = self.train_cfg if training else self.test_cfg
+ if cfg is None:
+ cfg = {}
+ pred_results = self.split_results(pred_results)
+ process_single = partial(self._process_single, **cfg)
+ results = list(map(process_single, pred_results, data_samples))
+
+ return results
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_postprocessor.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_postprocessor.py
new file mode 100644
index 0000000000000000000000000000000000000000..1f75635652a80b688884244a23a07e1b59ba53f4
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_postprocessor.py
@@ -0,0 +1,100 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from mmocr.models.textdet.postprocessors.base import BaseTextDetPostProcessor
+from mmocr.registry import MODELS
+from ..utils import bezier2poly
+
+
+@MODELS.register_module()
+class ABCNetPostprocessor(BaseTextDetPostProcessor):
+ """Post-processing methods for ABCNet.
+
+ Args:
+ num_classes (int): Number of classes.
+ use_sigmoid_cls (bool): Whether to use sigmoid for classification.
+ strides (tuple): Strides of each feature map.
+ norm_by_strides (bool): Whether to normalize the regression targets by
+ the strides.
+ bbox_coder (dict): Config dict for bbox coder.
+ text_repr_type (str): Text representation type, 'poly' or 'quad'.
+ with_bezier (bool): Whether to use bezier curve for text detection.
+ train_cfg (dict): Config dict for training.
+ test_cfg (dict): Config dict for testing.
+ """
+
+ def __init__(
+ self,
+ text_repr_type='poly',
+ rescale_fields=['beziers', 'polygons'],
+ ):
+ super().__init__(
+ text_repr_type=text_repr_type, rescale_fields=rescale_fields)
+
+ def merge_predict(self, spotting_data_samples, recog_data_samples):
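+        """Assign the flattened recognition results back to the spotting
+        data samples, following the order of the detected instances."""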
+ texts = [ds.pred_text.item for ds in recog_data_samples]
+ start = 0
+ for spotting_data_sample in spotting_data_samples:
+ end = start + len(spotting_data_sample.pred_instances)
+ spotting_data_sample.pred_instances.texts = texts[start:end]
+ start = end
+ return spotting_data_samples
+
+    def __call__(self,
+                 spotting_data_samples,
+                 recog_data_samples,
+                 training: bool = False):
+        """Postprocess detection predictions and merge recognition results.
+
+        Args:
+            spotting_data_samples (list[TextDetDataSample]): Batch of data
+                samples carrying the detector predictions.
+            recog_data_samples (list[TextRecogDataSample]): Batch of data
+                samples carrying the recognizer predictions, flattened over
+                all detected instances.
+            training (bool): Whether the model is in training mode.
+                Defaults to False.
+
+        Returns:
+            list[TextDetDataSample]: Batch of post-processed data samples
+            with ``pred_instances.texts`` filled in.
+        """
+ spotting_data_samples = list(
+ map(self._process_single, spotting_data_samples))
+ return self.merge_predict(spotting_data_samples, recog_data_samples)
+
+ def _process_single(self, data_sample):
+ """Process prediction results from one image.
+
+        Args:
+            data_sample (TextDetDataSample): Datasample of an image.
+
+        Returns:
+            TextDetDataSample: The post-processed datasample.
+        """
+ data_sample = self.get_text_instances(data_sample)
+ if self.rescale_fields and len(self.rescale_fields) > 0:
+ assert isinstance(self.rescale_fields, list)
+ assert set(self.rescale_fields).issubset(
+ set(data_sample.pred_instances.keys()))
+ data_sample = self.rescale(data_sample, data_sample.scale_factor)
+ return data_sample
+
+ def get_text_instances(self, data_sample, **kwargs):
+ """Get text instance predictions of one image.
+
+        Args:
+            data_sample (TextDetDataSample): Datasample of an image.
+            **kwargs: Other parameters. Configurable via ``__init__.train_cfg``
+                and ``__init__.test_cfg``.
+
+ Returns:
+ TextDetDataSample: A new DataSample with predictions filled in.
+ The polygon/bbox results are usually saved in
+ ``TextDetDataSample.pred_instances.polygons`` or
+ ``TextDetDataSample.pred_instances.bboxes``. The confidence scores
+ are saved in ``TextDetDataSample.pred_instances.scores``.
+ """
+ data_sample = data_sample.cpu().numpy()
+ pred_instances = data_sample.pred_instances
+ data_sample.pred_instances.polygons = list(
+ map(bezier2poly, pred_instances.beziers))
+ return data_sample
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec.py
new file mode 100644
index 0000000000000000000000000000000000000000..599a36d41f855a21ecf623389198162e03ee7d50
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec.py
@@ -0,0 +1,8 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from mmocr.models.textrecog import EncoderDecoderRecognizer
+from mmocr.registry import MODELS
+
+
+@MODELS.register_module()
+class ABCNetRec(EncoderDecoderRecognizer):
+ """CTC-loss based recognizer."""
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_backbone.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_backbone.py
new file mode 100644
index 0000000000000000000000000000000000000000..7d77cf2e6f07cd609df16a7feaf83da609b3da3a
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_backbone.py
@@ -0,0 +1,52 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch.nn as nn
+from mmcv.cnn import ConvModule
+from mmengine.model import BaseModule, Sequential
+
+from mmocr.registry import MODELS
+
+
+@MODELS.register_module()
+class ABCNetRecBackbone(BaseModule):
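+    """Recognition backbone of ABCNet.
+
+    A small convolutional stack that halves the feature height twice and
+    finally average-pools it to 1, producing a :math:`(N, C, 1, W)` feature
+    map for the recognition encoder.
+    """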
+
+ def __init__(self, init_cfg=None):
+ super().__init__(init_cfg)
+
+ self.convs = Sequential(
+ ConvModule(
+ in_channels=256,
+ out_channels=256,
+ kernel_size=3,
+ padding=1,
+ bias='auto',
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ ConvModule(
+ in_channels=256,
+ out_channels=256,
+ kernel_size=3,
+ padding=1,
+ bias='auto',
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU')),
+ ConvModule(
+ in_channels=256,
+ out_channels=256,
+ kernel_size=3,
+ padding=1,
+ stride=(2, 1),
+ bias='auto',
+ norm_cfg=dict(type='GN', num_groups=32),
+ act_cfg=dict(type='ReLU')),
+ ConvModule(
+ in_channels=256,
+ out_channels=256,
+ kernel_size=3,
+ padding=1,
+ stride=(2, 1),
+ bias='auto',
+ norm_cfg=dict(type='GN', num_groups=32),
+ act_cfg=dict(type='ReLU')), nn.AdaptiveAvgPool2d((1, None)))
+
+ def forward(self, x):
+ return self.convs(x)
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_decoder.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_decoder.py
new file mode 100644
index 0000000000000000000000000000000000000000..e96f3a3b4fa6d33d79f8433320ee166da9ce0784
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_decoder.py
@@ -0,0 +1,161 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import random
+from typing import Dict, Optional, Sequence, Union
+
+import torch
+import torch.nn as nn
+from torch.nn import functional as F
+
+from mmocr.models.common.dictionary import Dictionary
+from mmocr.models.textrecog.decoders.base import BaseDecoder
+from mmocr.registry import MODELS
+from mmocr.structures import TextRecogDataSample
+
+
+@MODELS.register_module()
+class ABCNetRecDecoder(BaseDecoder):
+ """Decoder for ABCNet.
+
+ Args:
+ in_channels (int): Number of input channels.
+        dropout_prob (float): Probability of dropout. Defaults to 0.5.
+ teach_prob (float): Probability of teacher forcing. Defaults to 0.5.
+ dictionary (dict or :obj:`Dictionary`): The config for `Dictionary` or
+ the instance of `Dictionary`.
+ module_loss (dict, optional): Config to build module_loss. Defaults
+ to None.
+ postprocessor (dict, optional): Config to build postprocessor.
+ Defaults to None.
+ max_seq_len (int, optional): Max sequence length. Defaults to 30.
+ init_cfg (dict or list[dict], optional): Initialization configs.
+ Defaults to None.
+ """
+
+ def __init__(self,
+ in_channels: int = 256,
+ dropout_prob: float = 0.5,
+ teach_prob: float = 0.5,
+ dictionary: Union[Dictionary, Dict] = None,
+ module_loss: Dict = None,
+ postprocessor: Dict = None,
+ max_seq_len: int = 30,
+ init_cfg=dict(type='Xavier', layer='Conv2d'),
+ **kwargs):
+ super().__init__(
+ init_cfg=init_cfg,
+ dictionary=dictionary,
+ module_loss=module_loss,
+ postprocessor=postprocessor,
+ max_seq_len=max_seq_len)
+ self.in_channels = in_channels
+ self.teach_prob = teach_prob
+ self.embedding = nn.Embedding(self.dictionary.num_classes, in_channels)
+ self.attn_combine = nn.Linear(in_channels * 2, in_channels)
+ self.dropout = nn.Dropout(dropout_prob)
+ self.gru = nn.GRU(in_channels, in_channels)
+ self.out = nn.Linear(in_channels, self.dictionary.num_classes)
+ self.vat = nn.Linear(in_channels, 1)
+ self.softmax = nn.Softmax(dim=-1)
+
+ def forward_train(
+ self,
+ feat: torch.Tensor,
+ out_enc: Optional[torch.Tensor] = None,
+ data_samples: Optional[Sequence[TextRecogDataSample]] = None
+ ) -> torch.Tensor:
+ """
+ Args:
+ feat (Tensor): A Tensor of shape :math:`(N, C, 1, W)`.
+ out_enc (torch.Tensor, optional): Encoder output. Defaults to None.
+ data_samples (list[TextRecogDataSample], optional): Batch of
+ TextRecogDataSample, containing gt_text information. Defaults
+ to None.
+
+ Returns:
+ Tensor: The raw logit tensor. Shape :math:`(N, W, C)` where
+ :math:`C` is ``num_classes``.
+ """
+ bs = out_enc.size()[1]
+ trg_seq = []
+ for target in data_samples:
+ trg_seq.append(target.gt_text.padded_indexes.to(feat.device))
+ decoder_input = torch.zeros(bs).long().to(out_enc.device)
+ trg_seq = torch.stack(trg_seq, dim=0)
+ decoder_hidden = torch.zeros(1, bs,
+ self.in_channels).to(out_enc.device)
+ decoder_outputs = []
+ for index in range(trg_seq.shape[1]):
+ # decoder_output (nbatch, ncls)
+ decoder_output, decoder_hidden = self._attention(
+ decoder_input, decoder_hidden, out_enc)
+            # feed the ground-truth token next (teacher forcing) when the
+            # sampled value exceeds ``teach_prob``
+            teach_forcing = random.random() > self.teach_prob
+ if teach_forcing:
+ decoder_input = trg_seq[:, index] # Teacher forcing
+ else:
+ _, topi = decoder_output.data.topk(1)
+ decoder_input = topi.squeeze()
+ decoder_outputs.append(decoder_output)
+
+ return torch.stack(decoder_outputs, dim=1)
+
+ def forward_test(
+ self,
+ feat: Optional[torch.Tensor] = None,
+ out_enc: Optional[torch.Tensor] = None,
+ data_samples: Optional[Sequence[TextRecogDataSample]] = None
+ ) -> torch.Tensor:
+ """
+ Args:
+ feat (Tensor): A Tensor of shape :math:`(N, C, 1, W)`.
+ out_enc (torch.Tensor, optional): Encoder output. Defaults to None.
+ data_samples (list[TextRecogDataSample]): Batch of
+ TextRecogDataSample, containing ``gt_text`` information.
+ Defaults to None.
+
+ Returns:
+            Tensor: Character probabilities of shape
+            :math:`(N, self.max_seq_len, C)`, where :math:`C` is
+            ``num_classes``.
+ """
+ bs = out_enc.size()[1]
+ outputs = []
+ decoder_input = torch.zeros(bs).long().to(out_enc.device)
+ decoder_hidden = torch.zeros(1, bs,
+ self.in_channels).to(out_enc.device)
+ for _ in range(self.max_seq_len):
+ # decoder_output (nbatch, ncls)
+ decoder_output, decoder_hidden = self._attention(
+ decoder_input, decoder_hidden, out_enc)
+ _, topi = decoder_output.data.topk(1)
+ decoder_input = topi.squeeze()
+ outputs.append(decoder_output)
+ outputs = torch.stack(outputs, dim=1)
+ return self.softmax(outputs)
+
+ def _attention(self, input, hidden, encoder_outputs):
+ embedded = self.embedding(input)
+ embedded = self.dropout(embedded)
+
+        # encoder_outputs: (T, n, hidden_size)
+ batch_size = encoder_outputs.shape[1]
+
+ alpha = hidden + encoder_outputs
+ alpha = alpha.view(-1, alpha.shape[-1]) # (T * n, hidden_size)
+ attn_weights = self.vat(torch.tanh(alpha)) # (T * n, 1)
+ attn_weights = attn_weights.view(-1, 1, batch_size).permute(
+ (2, 1, 0)) # (T, 1, n) -> (n, 1, T)
+ attn_weights = F.softmax(attn_weights, dim=2)
+
+ attn_applied = torch.matmul(attn_weights,
+ encoder_outputs.permute((1, 0, 2)))
+
+ if embedded.dim() == 1:
+ embedded = embedded.unsqueeze(0)
+ output = torch.cat((embedded, attn_applied.squeeze(1)), 1)
+ output = self.attn_combine(output).unsqueeze(0) # (1, n, hidden_size)
+
+ output = F.relu(output)
+ output, hidden = self.gru(output, hidden) # (1, n, hidden_size)
+ output = self.out(output[0])
+ return output, hidden
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_encoder.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_encoder.py
new file mode 100644
index 0000000000000000000000000000000000000000..5657ef096583efca964519dae36cc17c6ddf4034
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/abcnet_rec_encoder.py
@@ -0,0 +1,54 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Dict, Optional, Sequence
+
+import torch
+
+from mmocr.models.textrecog.encoders.base import BaseEncoder
+from mmocr.models.textrecog.layers import BidirectionalLSTM
+from mmocr.registry import MODELS
+from mmocr.structures import TextRecogDataSample
+
+
+@MODELS.register_module()
+class ABCNetRecEncoder(BaseEncoder):
+ """Encoder for ABCNet.
+
+    Args:
+        in_channels (int): Number of input channels. Defaults to 256.
+        hidden_channels (int): Number of hidden channels of the LSTM.
+            Defaults to 256.
+        out_channels (int): Number of output channels. Defaults to 256.
+        init_cfg (dict or list[dict], optional): Initialization configs.
+            Defaults to None.
+    """
+
+ def __init__(self,
+ in_channels: int = 256,
+ hidden_channels: int = 256,
+ out_channels: int = 256,
+ init_cfg: Dict = None) -> None:
+ super().__init__(init_cfg=init_cfg)
+
+ self.layer = BidirectionalLSTM(in_channels, hidden_channels,
+ out_channels)
+
+ def forward(
+ self,
+ feat: torch.Tensor,
+ data_samples: Optional[Sequence[TextRecogDataSample]] = None
+ ) -> torch.Tensor:
+ """
+ Args:
+ feat (Tensor): Image features with the shape of
+ :math:`(N, C_{in}, H, W)`.
+ data_samples (list[TextRecogDataSample], optional): Batch of
+ TextRecogDataSample, containing valid_ratio information.
+ Defaults to None.
+
+ Returns:
+            Tensor: A tensor of shape :math:`(W, N, C_{out})`.
+ """
+ assert feat.size(2) == 1, 'feature height must be 1'
+ feat = feat.squeeze(2)
+ feat = feat.permute(2, 0, 1) # NxCxW -> WxNxC
+ feat = self.layer(feat)
+ # feat = feat.permute(1, 0, 2).contiguous()
+ return feat
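The encoder's only job is to flatten the unit-height feature map into a sequence and run a bidirectional LSTM over it. A rough shape sketch, with `nn.LSTM` standing in for MMOCR's `BidirectionalLSTM` purely to keep the snippet self-contained:

```python
# Sketch of the encoder's shape handling; nn.LSTM is a stand-in for
# mmocr's BidirectionalLSTM (an assumed substitution for illustration).
import torch
import torch.nn as nn

N, C, W = 2, 256, 64
feat = torch.randn(N, C, 1, W)             # RoI feature with height 1

assert feat.size(2) == 1, 'feature height must be 1'
seq = feat.squeeze(2).permute(2, 0, 1)     # (N, C, W) -> (W, N, C)

lstm = nn.LSTM(input_size=C, hidden_size=128, bidirectional=True)
out, _ = lstm(seq)                         # (W, N, 2 * 128)
print(out.shape)                           # torch.Size([64, 2, 256])
```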
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_extractor.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_extractor.py
new file mode 100644
index 0000000000000000000000000000000000000000..372a23c2e428e2ca4364d134b726f6618d51af0e
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_extractor.py
@@ -0,0 +1,79 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from abc import ABCMeta, abstractmethod
+from typing import List, Tuple
+
+import torch.nn as nn
+from mmcv import ops
+from mmengine.model import BaseModule
+from torch import Tensor
+
+from mmocr.utils import ConfigType, OptMultiConfig
+
+
+class BaseRoIExtractor(BaseModule, metaclass=ABCMeta):
+ """Base class for RoI extractor.
+
+ Args:
+ roi_layer (:obj:`ConfigDict` or dict): Specify RoI layer type and
+ arguments.
+ out_channels (int): Output channels of RoI layers.
+ featmap_strides (list[int]): Strides of input feature maps.
+ init_cfg (:obj:`ConfigDict` or dict or list[:obj:`ConfigDict` or \
+ dict], optional): Initialization config dict. Defaults to None.
+ """
+
+ def __init__(self,
+ roi_layer: ConfigType,
+ out_channels: int,
+ featmap_strides: List[int],
+ init_cfg: OptMultiConfig = None) -> None:
+ super().__init__(init_cfg=init_cfg)
+ self.roi_layers = self.build_roi_layers(roi_layer, featmap_strides)
+ self.out_channels = out_channels
+ self.featmap_strides = featmap_strides
+
+ @property
+ def num_inputs(self) -> int:
+ """int: Number of input feature maps."""
+ return len(self.featmap_strides)
+
+ def build_roi_layers(self, layer_cfg: ConfigType,
+ featmap_strides: List[int]) -> nn.ModuleList:
+ """Build RoI operator to extract feature from each level feature map.
+
+ Args:
+ layer_cfg (:obj:`ConfigDict` or dict): Dictionary to construct and
+ config RoI layer operation. Options are modules under
+ ``mmcv/ops`` such as ``RoIAlign``.
+ featmap_strides (list[int]): The stride of input feature map w.r.t
+ to the original image size, which would be used to scale RoI
+ coordinate (original image coordinate system) to feature
+ coordinate system.
+
+ Returns:
+ :obj:`nn.ModuleList`: The RoI extractor modules for each level
+ feature map.
+ """
+
+ cfg = layer_cfg.copy()
+ layer_type = cfg.pop('type')
+ assert hasattr(ops, layer_type)
+ layer_cls = getattr(ops, layer_type)
+ roi_layers = nn.ModuleList(
+ [layer_cls(spatial_scale=1 / s, **cfg) for s in featmap_strides])
+ return roi_layers
+
+ @abstractmethod
+ def forward(self, feats: Tuple[Tensor], data_samples) -> Tensor:
+ """Extractor ROI feats.
+
+ Args:
+ feats (Tuple[Tensor]): Multi-scale features.
+            data_samples (List[TextSpottingDataSample]): The batch data
+                samples, each of which contains:
+
+                - proposals (InstanceData): The proposals of text detection.
+
+ Returns:
+ Tensor: RoI feature.
+ """
+ pass
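`build_roi_layers` resolves the RoI operator class by name from `mmcv.ops` and instantiates one operator per feature stride. The same config-driven pattern can be sketched against `torchvision.ops` so it runs without mmcv (a stand-in, not what the class above actually imports):

```python
# Sketch of the config-driven RoI-layer construction above, using
# torchvision.ops instead of mmcv.ops so the snippet runs standalone.
import torch.nn as nn
import torchvision.ops as ops

def build_roi_layers(layer_cfg, featmap_strides):
    cfg = dict(layer_cfg)                     # don't mutate the caller's cfg
    layer_type = cfg.pop('type')
    layer_cls = getattr(ops, layer_type)      # resolve the class by name
    return nn.ModuleList(
        layer_cls(spatial_scale=1 / s, **cfg) for s in featmap_strides)

roi_layers = build_roi_layers(
    dict(type='RoIAlign', output_size=(8, 32), sampling_ratio=1),
    featmap_strides=[4, 8, 16])
print(roi_layers)
```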
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_head.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..15652841fe6248ab9a81cf4e052ad67d7c93da5a
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/base_roi_head.py
@@ -0,0 +1,58 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from abc import ABCMeta, abstractmethod
+from typing import Tuple
+
+from mmengine.model import BaseModule
+from torch import Tensor
+
+from mmocr.utils import DetSampleList
+
+
+class BaseRoIHead(BaseModule, metaclass=ABCMeta):
+ """Base class for RoIHeads."""
+
+ @property
+ def with_rec_head(self):
+ """bool: whether the RoI head contains a `mask_head`"""
+ return hasattr(self, 'rec_head') and self.rec_head is not None
+
+ @property
+ def with_extractor(self):
+ """bool: whether the RoI head contains a `mask_head`"""
+ return hasattr(self,
+ 'roi_extractor') and self.roi_extractor is not None
+
+ # @abstractmethod
+ # def init_assigner_sampler(self, *args, **kwargs):
+ # """Initialize assigner and sampler."""
+ # pass
+
+ @abstractmethod
+ def loss(self, x: Tuple[Tensor], data_samples: DetSampleList):
+ """Perform forward propagation and loss calculation of the roi head on
+ the features of the upstream network."""
+
+ @abstractmethod
+ def predict(self, x: Tuple[Tensor],
+ data_samples: DetSampleList) -> DetSampleList:
+ """Perform forward propagation of the roi head and predict detection
+ results on the features of the upstream network.
+
+ Args:
+ x (tuple[Tensor]): Features from upstream network. Each
+ has shape (N, C, H, W).
+            data_samples (List[:obj:`DetDataSample`]): The data samples.
+                It usually includes ``gt_instances``.
+
+        Returns:
+            list[:obj:`DetDataSample`]: Detection results of each image.
+            Each item usually contains the following keys in
+            ``pred_instances``:
+
+ - scores (Tensor): Classification scores, has a shape
+ (num_instance, )
+ - labels (Tensor): Labels of bboxes, has a shape
+ (num_instances, ).
+ - bboxes (Tensor): Has a shape (num_instances, 4),
+ the last dimension 4 arrange as (x1, y1, x2, y2).
+ - polygon (List[Tensor]): Has a shape (num_instances, H, W).
+ """
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bezier_roi_extractor.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bezier_roi_extractor.py
new file mode 100644
index 0000000000000000000000000000000000000000..a4848d18e7c33eb6edad873eb376ed8f47480265
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bezier_roi_extractor.py
@@ -0,0 +1,120 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import List, Tuple
+
+import torch
+from mmengine.structures import InstanceData
+from torch import Tensor
+
+from mmocr.registry import MODELS
+from mmocr.utils import ConfigType, OptMultiConfig
+from .base_roi_extractor import BaseRoIExtractor
+
+
+@MODELS.register_module()
+class BezierRoIExtractor(BaseRoIExtractor):
+ """Extract RoI features from a single level feature map.
+
+ If there are multiple input feature levels, each RoI is mapped to a level
+ according to its scale. The mapping rule is proposed in
+    `FPN <https://arxiv.org/abs/1612.03144>`_.
+
+ Args:
+ roi_layer (:obj:`ConfigDict` or dict): Specify RoI layer type and
+ arguments.
+ out_channels (int): Output channels of RoI layers.
+ featmap_strides (List[int]): Strides of input feature maps.
+ finest_scale (int): Scale threshold of mapping to level 0.
+            Defaults to 96.
+ init_cfg (:obj:`ConfigDict` or dict or list[:obj:`ConfigDict` or \
+ dict], optional): Initialization config dict. Defaults to None.
+ """
+
+ def __init__(self,
+ roi_layer: ConfigType,
+ out_channels: int,
+ featmap_strides: List[int],
+ finest_scale: int = 96,
+ init_cfg: OptMultiConfig = None) -> None:
+ super().__init__(
+ roi_layer=roi_layer,
+ out_channels=out_channels,
+ featmap_strides=featmap_strides,
+ init_cfg=init_cfg)
+ self.finest_scale = finest_scale
+
+    def to_roi(self, beziers: List[Tensor]) -> Tensor:
+        """Concatenate per-image Bezier control points into RoIs of shape
+        (K, 17), prepending the batch index as the first column."""
+ rois_list = []
+ for img_id, bezier in enumerate(beziers):
+ img_inds = bezier.new_full((bezier.size(0), 1), img_id)
+ rois = torch.cat([img_inds, bezier], dim=-1)
+ rois_list.append(rois)
+ rois = torch.cat(rois_list, 0)
+ return rois
+
+ def map_roi_levels(self, beziers: Tensor, num_levels: int) -> Tensor:
+ """Map rois to corresponding feature levels by scales.
+
+ - scale < finest_scale * 2: level 0
+ - finest_scale * 2 <= scale < finest_scale * 4: level 1
+ - finest_scale * 4 <= scale < finest_scale * 8: level 2
+ - scale >= finest_scale * 8: level 3
+
+        Args:
+            beziers (Tensor): Input Bezier RoIs of shape (k, 17), where the
+                first column is the batch index.
+            num_levels (int): Total level number.
+
+        Returns:
+            Tensor: Level index (0-based) of each RoI, of shape (k, ).
+        """
+
+ p1 = beziers[:, 1:3]
+ p2 = beziers[:, 15:]
+ scale = ((p1 - p2)**2).sum(dim=1).sqrt() * 2
+ target_lvls = torch.floor(torch.log2(scale / self.finest_scale + 1e-6))
+ target_lvls = target_lvls.clamp(min=0, max=num_levels - 1).long()
+ return target_lvls
+
+ def forward(self, feats: Tuple[Tensor],
+ proposal_instances: List[InstanceData]) -> Tensor:
+ """Extractor ROI feats.
+
+ Args:
+ feats (Tuple[Tensor]): Multi-scale features.
+ proposal_instances(List[InstanceData]): Proposal instances.
+
+ Returns:
+ Tensor: RoI feature.
+ """
+ beziers = [p_i.beziers for p_i in proposal_instances]
+ rois = self.to_roi(beziers)
+ # convert fp32 to fp16 when amp is on
+ rois = rois.type_as(feats[0])
+ out_size = self.roi_layers[0].output_size
+ feats = feats[:3]
+ num_levels = len(feats)
+ roi_feats = feats[0].new_zeros(
+ rois.size(0), self.out_channels, *out_size)
+
+ if num_levels == 1:
+ if len(rois) == 0:
+ return roi_feats
+ return self.roi_layers[0](feats[0], rois)
+
+ target_lvls = self.map_roi_levels(rois, num_levels)
+
+ for i in range(num_levels):
+ mask = target_lvls == i
+ inds = mask.nonzero(as_tuple=False).squeeze(1)
+ if inds.numel() > 0:
+ rois_ = rois[inds]
+ roi_feats_t = self.roi_layers[i](feats[i], rois_)
+ roi_feats[inds] = roi_feats_t
+ else:
+ # Sometimes some pyramid levels will not be used for RoI
+ # feature extraction and this will cause an incomplete
+ # computation graph in one GPU, which is different from those
+ # in other GPUs and will cause a hanging error.
+ # Therefore, we add it to ensure each feature pyramid is
+ # included in the computation graph to avoid runtime bugs.
+ roi_feats += sum(
+ x.view(-1)[0]
+ for x in self.parameters()) * 0. + feats[i].sum() * 0.
+ return roi_feats
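`map_roi_levels` derives a scale from two Bezier control points and buckets each RoI into a pyramid level via `floor(log2(scale / finest_scale))`. A small numeric sketch of that bucketing, assuming `finest_scale=96` and the three levels used in `forward` above:

```python
# Numeric sketch of the level assignment used by map_roi_levels,
# assuming finest_scale=96 and 3 pyramid levels.
import torch

finest_scale, num_levels = 96, 3
scale = torch.tensor([40.0, 150.0, 400.0, 900.0])   # hypothetical RoI scales
lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
lvls = lvls.clamp(min=0, max=num_levels - 1).long()
print(lvls)   # tensor([0, 0, 2, 2])
```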
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bifpn.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bifpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..7f117dffe62bcb12f267df612abd56b22ad6e547
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/bifpn.py
@@ -0,0 +1,242 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import List
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from mmcv.cnn import ConvModule
+from mmengine.model import BaseModule
+
+from mmocr.registry import MODELS
+from mmocr.utils import ConfigType, MultiConfig, OptConfigType
+
+
+@MODELS.register_module()
+class BiFPN(BaseModule):
+ """illustration of a minimal bifpn unit P7_0 ------------------------->
+ P7_2 -------->
+
+ |-------------| โ โ |
+ P6_0 ---------> P6_1 ---------> P6_2 -------->
+ |-------------|--------------โ โ โ | P5_0
+ ---------> P5_1 ---------> P5_2 --------> |-------------|--------------โ
+ โ โ | P4_0 ---------> P4_1 ---------> P4_2
+ --------> |-------------|--------------โ โ
+ |--------------โ | P3_0 -------------------------> P3_2 -------->
+ """
+
+ def __init__(self,
+ in_channels: List[int],
+ out_channels: int,
+ num_outs: int,
+ repeat_times: int = 2,
+ start_level: int = 0,
+ end_level: int = -1,
+ add_extra_convs: bool = False,
+ relu_before_extra_convs: bool = False,
+ no_norm_on_lateral: bool = False,
+ conv_cfg: OptConfigType = None,
+ norm_cfg: OptConfigType = None,
+ act_cfg: OptConfigType = None,
+ laterial_conv1x1: bool = False,
+ upsample_cfg: ConfigType = dict(mode='nearest'),
+ pool_cfg: ConfigType = dict(),
+ init_cfg: MultiConfig = dict(
+ type='Xavier', layer='Conv2d', distribution='uniform')):
+ super().__init__(init_cfg=init_cfg)
+ assert isinstance(in_channels, list)
+ self.in_channels = in_channels
+ self.out_channels = out_channels
+ self.num_ins = len(in_channels)
+ self.num_outs = num_outs
+ self.relu_before_extra_convs = relu_before_extra_convs
+ self.no_norm_on_lateral = no_norm_on_lateral
+ self.upsample_cfg = upsample_cfg.copy()
+ self.repeat_times = repeat_times
+ if end_level == -1 or end_level == self.num_ins - 1:
+ self.backbone_end_level = self.num_ins
+ assert num_outs >= self.num_ins - start_level
+ else:
+ # if end_level is not the last level, no extra level is allowed
+ self.backbone_end_level = end_level + 1
+ assert end_level < self.num_ins
+ assert num_outs == end_level - start_level + 1
+ self.start_level = start_level
+ self.end_level = end_level
+ self.add_extra_convs = add_extra_convs
+
+ self.lateral_convs = nn.ModuleList()
+ self.extra_convs = nn.ModuleList()
+ self.bifpn_convs = nn.ModuleList()
+ for i in range(self.start_level, self.backbone_end_level):
+ if in_channels[i] == out_channels:
+ l_conv = nn.Identity()
+ else:
+ l_conv = ConvModule(
+ in_channels[i],
+ out_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ bias=True,
+ act_cfg=act_cfg,
+ inplace=False)
+ self.lateral_convs.append(l_conv)
+
+ for _ in range(repeat_times):
+ self.bifpn_convs.append(
+ BiFPNLayer(
+ channels=out_channels,
+ levels=num_outs,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg,
+ pool_cfg=pool_cfg))
+
+ # add extra conv layers (e.g., RetinaNet)
+ extra_levels = num_outs - self.backbone_end_level + self.start_level
+ if add_extra_convs and extra_levels >= 1:
+ for i in range(extra_levels):
+ if i == 0:
+ in_channels = self.in_channels[self.backbone_end_level - 1]
+ else:
+ in_channels = out_channels
+ if in_channels == out_channels:
+ extra_fpn_conv = nn.MaxPool2d(
+ kernel_size=3, stride=2, padding=1)
+ else:
+ extra_fpn_conv = nn.Sequential(
+ ConvModule(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=1,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg),
+ nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
+ self.extra_convs.append(extra_fpn_conv)
+
+ def forward(self, inputs):
+
+ def extra_convs(inputs, extra_convs):
+ outputs = list()
+ for extra_conv in extra_convs:
+ inputs = extra_conv(inputs)
+ outputs.append(inputs)
+ return outputs
+
+ assert len(inputs) == len(self.in_channels)
+
+ # build laterals
+ laterals = [
+ lateral_conv(inputs[i + self.start_level])
+ for i, lateral_conv in enumerate(self.lateral_convs)
+ ]
+ if self.num_outs > len(laterals) and self.add_extra_convs:
+ extra_source = inputs[self.backbone_end_level - 1]
+ for extra_conv in self.extra_convs:
+ extra_source = extra_conv(extra_source)
+ laterals.append(extra_source)
+
+ for bifpn_module in self.bifpn_convs:
+ laterals = bifpn_module(laterals)
+ outs = laterals
+
+ return tuple(outs)
+
+
+def swish(x):
+ return x * x.sigmoid()
+
+
+class BiFPNLayer(BaseModule):
+
+ def __init__(self,
+ channels,
+ levels,
+ init=0.5,
+ conv_cfg=None,
+ norm_cfg=None,
+ act_cfg=None,
+ upsample_cfg=None,
+ pool_cfg=None,
+ eps=0.0001,
+ init_cfg=None):
+ super().__init__(init_cfg=init_cfg)
+ self.act_cfg = act_cfg
+ self.upsample_cfg = upsample_cfg
+ self.pool_cfg = pool_cfg
+ self.eps = eps
+ self.levels = levels
+ self.bifpn_convs = nn.ModuleList()
+ # weighted
+ self.weight_two_nodes = nn.Parameter(
+ torch.Tensor(2, levels).fill_(init))
+ self.weight_three_nodes = nn.Parameter(
+ torch.Tensor(3, levels - 2).fill_(init))
+ self.relu = nn.ReLU()
+ for _ in range(2):
+ for _ in range(self.levels - 1): # 1,2,3
+ fpn_conv = nn.Sequential(
+ ConvModule(
+ channels,
+ channels,
+ 3,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg,
+ inplace=False))
+ self.bifpn_convs.append(fpn_conv)
+
+ def forward(self, inputs):
+ assert len(inputs) == self.levels
+ # build top-down and down-top path with stack
+ levels = self.levels
+ # w relu
+ w1 = self.relu(self.weight_two_nodes)
+ w1 /= torch.sum(w1, dim=0) + self.eps # normalize
+ w2 = self.relu(self.weight_three_nodes)
+ # w2 /= torch.sum(w2, dim=0) + self.eps # normalize
+ # build top-down
+ idx_bifpn = 0
+ pathtd = inputs
+ inputs_clone = []
+ for in_tensor in inputs:
+ inputs_clone.append(in_tensor.clone())
+
+ for i in range(levels - 1, 0, -1):
+ _, _, h, w = pathtd[i - 1].shape
+ # pathtd[i - 1] = (
+ # w1[0, i - 1] * pathtd[i - 1] + w1[1, i - 1] *
+ # F.interpolate(pathtd[i], size=(h, w), mode='nearest')) / (
+ # w1[0, i - 1] + w1[1, i - 1] + self.eps)
+            pathtd[i - 1] = (
+                w1[0, i - 1] * pathtd[i - 1] +
+                w1[1, i - 1] * F.interpolate(
+                    pathtd[i], size=(h, w), mode='nearest'))
+ pathtd[i - 1] = swish(pathtd[i - 1])
+ pathtd[i - 1] = self.bifpn_convs[idx_bifpn](pathtd[i - 1])
+ idx_bifpn = idx_bifpn + 1
+ # build down-top
+ for i in range(0, levels - 2, 1):
+ tmp_path = torch.stack([
+ inputs_clone[i + 1], pathtd[i + 1],
+ F.max_pool2d(pathtd[i], kernel_size=3, stride=2, padding=1)
+ ],
+ dim=-1)
+ norm_weight = w2[:, i] / (w2[:, i].sum() + self.eps)
+ pathtd[i + 1] = (norm_weight * tmp_path).sum(dim=-1)
+ # pathtd[i + 1] = w2[0, i] * inputs_clone[i + 1]
+ # + w2[1, i] * pathtd[
+ # i + 1] + w2[2, i] * F.max_pool2d(
+ # pathtd[i], kernel_size=3, stride=2, padding=1)
+ pathtd[i + 1] = swish(pathtd[i + 1])
+ pathtd[i + 1] = self.bifpn_convs[idx_bifpn](pathtd[i + 1])
+ idx_bifpn = idx_bifpn + 1
+
+ pathtd[levels - 1] = w1[0, levels - 1] * pathtd[levels - 1] + w1[
+ 1, levels - 1] * F.max_pool2d(
+ pathtd[levels - 2], kernel_size=3, stride=2, padding=1)
+ pathtd[levels - 1] = swish(pathtd[levels - 1])
+ pathtd[levels - 1] = self.bifpn_convs[idx_bifpn](pathtd[levels - 1])
+ return pathtd
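Inside `BiFPNLayer`, node inputs are fused with learnable non-negative weights that are ReLU-ed and normalized before the weighted sum, followed by a swish activation and a 3x3 conv. A minimal sketch of that fusion for a single three-input node (illustrative sizes only, not the full layer):

```python
# Sketch of the weighted (fast-normalized) fusion used inside BiFPNLayer,
# for one node with three inputs; tensor sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def swish(x):
    return x * x.sigmoid()

eps = 1e-4
w = nn.Parameter(torch.full((3,), 0.5))        # one weight per input edge
x1, x2, x3 = (torch.randn(2, 256, 32, 32) for _ in range(3))

w_pos = F.relu(w)
norm_w = w_pos / (w_pos.sum() + eps)           # normalize so weights sum ~ 1
fused = norm_w[0] * x1 + norm_w[1] * x2 + norm_w[2] * x3
fused = swish(fused)                           # a 3x3 conv follows in the layer
print(fused.shape)                             # torch.Size([2, 256, 32, 32])
```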
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/coordinate_head.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/coordinate_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..dc31e88a628d0d8cd2f82cbdd4cc010eaea39938
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/coordinate_head.py
@@ -0,0 +1,56 @@
+import torch
+import torch.nn as nn
+from mmcv.cnn import ConvModule
+from mmengine.model import BaseModule
+
+from mmocr.registry import MODELS
+
+
+@MODELS.register_module()
+class CoordinateHead(BaseModule):
+
+ def __init__(self,
+ in_channel=256,
+ conv_num=4,
+ norm_cfg=dict(type='BN'),
+ act_cfg=dict(type='ReLU'),
+ init_cfg=None):
+ super().__init__(init_cfg=init_cfg)
+
+ mask_convs = list()
+ for i in range(conv_num):
+ if i == 0:
+ mask_conv = ConvModule(
+ in_channels=in_channel + 2, # 2 for coord
+ out_channels=in_channel,
+ kernel_size=3,
+ padding=1,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+ else:
+ mask_conv = ConvModule(
+ in_channels=in_channel,
+ out_channels=in_channel,
+ kernel_size=3,
+ padding=1,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+ mask_convs.append(mask_conv)
+ self.mask_convs = nn.Sequential(*mask_convs)
+
+ def forward(self, features):
+ coord_features = list()
+ for feature in features:
+ x_range = torch.linspace(
+ -1, 1, feature.shape[-1], device=feature.device)
+ y_range = torch.linspace(
+ -1, 1, feature.shape[-2], device=feature.device)
+ y, x = torch.meshgrid(y_range, x_range)
+ y = y.expand([feature.shape[0], 1, -1, -1])
+ x = x.expand([feature.shape[0], 1, -1, -1])
+ coord = torch.cat([x, y], 1)
+ feature_with_coord = torch.cat([feature, coord], dim=1)
+ feature_with_coord = self.mask_convs(feature_with_coord)
+ feature_with_coord = feature_with_coord + feature
+ coord_features.append(feature_with_coord)
+ return coord_features
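`CoordinateHead` prepends two normalized coordinate channels, CoordConv-style, before its conv stack. The coordinate-map construction in isolation, with arbitrary feature sizes:

```python
# Sketch of the normalized coordinate channels that CoordinateHead
# concatenates to each feature map; shapes are illustrative only.
import torch

feature = torch.randn(2, 256, 20, 32)          # (N, C, H, W)
x_range = torch.linspace(-1, 1, feature.shape[-1], device=feature.device)
y_range = torch.linspace(-1, 1, feature.shape[-2], device=feature.device)
y, x = torch.meshgrid(y_range, x_range)        # default 'ij' ordering, as above
y = y.expand(feature.shape[0], 1, -1, -1)
x = x.expand(feature.shape[0], 1, -1, -1)
coord = torch.cat([x, y], dim=1)               # (N, 2, H, W)
feature_with_coord = torch.cat([feature, coord], dim=1)
print(feature_with_coord.shape)                # torch.Size([2, 258, 20, 32])
```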
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/rec_roi_head.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/rec_roi_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..a102902c530dca85f2f87be1b5dec8882ac26b2b
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/rec_roi_head.py
@@ -0,0 +1,70 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Tuple
+
+from mmengine.structures import LabelData
+from torch import Tensor
+
+from mmocr.registry import MODELS, TASK_UTILS
+from mmocr.structures import TextRecogDataSample # noqa F401
+from mmocr.utils import DetSampleList, OptMultiConfig, RecSampleList
+from .base_roi_head import BaseRoIHead
+
+
+@MODELS.register_module()
+class RecRoIHead(BaseRoIHead):
+ """Simplest base roi head including one bbox head and one mask head."""
+
+ def __init__(self,
+ neck=None,
+ sampler: OptMultiConfig = None,
+ roi_extractor: OptMultiConfig = None,
+ rec_head: OptMultiConfig = None,
+ init_cfg=None):
+ super().__init__(init_cfg)
+ if sampler is not None:
+ self.sampler = TASK_UTILS.build(sampler)
+ if neck is not None:
+ self.neck = MODELS.build(neck)
+ self.roi_extractor = MODELS.build(roi_extractor)
+ self.rec_head = MODELS.build(rec_head)
+
+ def loss(self, inputs: Tuple[Tensor], data_samples: DetSampleList) -> dict:
+ """Perform forward propagation and loss calculation of the detection
+ roi on the features of the upstream network.
+
+ Args:
+ x (tuple[Tensor]): List of multi-level img features.
+ rpn_results_list (list[:obj:`InstanceData`]): List of region
+ proposals.
+ DetSampleList (list[:obj:`DetDataSample`]): The batch
+ data samples. It usually includes information such
+ as `gt_instance` or `gt_panoptic_seg` or `gt_sem_seg`.
+
+ Returns:
+ dict[str, Tensor]: A dictionary of loss components
+ """
+ proposals = [
+ ds.gt_instances[~ds.gt_instances.ignored] for ds in data_samples
+ ]
+
+ proposals = [p for p in proposals if len(p) > 0]
+ bbox_feats = self.roi_extractor(inputs, proposals)
+ rec_data_samples = [
+ TextRecogDataSample(gt_text=LabelData(item=text))
+ for proposal in proposals for text in proposal.texts
+ ]
+ return self.rec_head.loss(bbox_feats, rec_data_samples)
+
+ def predict(self, inputs: Tuple[Tensor],
+ data_samples: DetSampleList) -> RecSampleList:
+ if hasattr(self, 'neck') and self.neck is not None:
+ inputs = self.neck(inputs)
+ pred_instances = [ds.pred_instances for ds in data_samples]
+ bbox_feats = self.roi_extractor(inputs, pred_instances)
+ if bbox_feats.size(0) == 0:
+ return []
+ len_instance = sum(
+ [len(instance_data) for instance_data in pred_instances])
+ rec_data_samples = [TextRecogDataSample() for _ in range(len_instance)]
+ rec_data_samples = self.rec_head.predict(bbox_feats, rec_data_samples)
+ return rec_data_samples
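In `loss`, the head flattens all non-ignored proposals across the batch into one recognition sample per text instance before calling the recognition head. The flattening pattern, shown with plain dicts standing in for `InstanceData` and `TextRecogDataSample` (stand-ins for illustration only):

```python
# Sketch of the proposal-to-recognition-sample flattening in RecRoIHead.loss,
# with plain dicts as stand-ins for InstanceData / TextRecogDataSample.
proposals_per_image = [
    {'texts': ['STOP', 'EXIT']},     # image 0: two kept proposals
    {'texts': []},                   # image 1: everything ignored
    {'texts': ['SALE']},             # image 2: one kept proposal
]

kept = [p for p in proposals_per_image if len(p['texts']) > 0]
rec_samples = [{'gt_text': text} for p in kept for text in p['texts']]
print(rec_samples)
# [{'gt_text': 'STOP'}, {'gt_text': 'EXIT'}, {'gt_text': 'SALE'}]
```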
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/model/two_stage_text_spotting.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/two_stage_text_spotting.py
new file mode 100644
index 0000000000000000000000000000000000000000..4a9bd8efc7f832b6fa3273af4eff9a6b670b3356
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/model/two_stage_text_spotting.py
@@ -0,0 +1,93 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Dict
+
+import torch
+
+from mmocr.models.textdet.detectors.base import BaseTextDetector
+from mmocr.registry import MODELS
+from mmocr.utils import OptConfigType, OptDetSampleList, OptMultiConfig
+
+
+@MODELS.register_module()
+class TwoStageTextSpotter(BaseTextDetector):
+ """Two-stage text spotter.
+
+ Args:
+ backbone (dict, optional): Config dict for text spotter backbone.
+ Defaults to None.
+ neck (dict, optional): Config dict for text spotter neck. Defaults to
+ None.
+ det_head (dict, optional): Config dict for text spotter head. Defaults
+ to None.
+ roi_head (dict, optional): Config dict for text spotter roi head.
+ Defaults to None.
+ data_preprocessor (dict, optional): Config dict for text spotter data
+ preprocessor. Defaults to None.
+ init_cfg (dict, optional): Initialization config dict. Defaults to
+ None.
+ """
+
+ def __init__(self,
+ backbone: OptConfigType = None,
+ neck: OptConfigType = None,
+ det_head: OptConfigType = None,
+ roi_head: OptConfigType = None,
+ postprocessor: OptConfigType = None,
+ data_preprocessor: OptConfigType = None,
+ init_cfg: OptMultiConfig = None) -> None:
+
+ super().__init__(
+ data_preprocessor=data_preprocessor, init_cfg=init_cfg)
+
+ self.backbone = MODELS.build(backbone)
+ if neck is not None:
+ self.neck = MODELS.build(neck)
+ if det_head is not None:
+ self.det_head = MODELS.build(det_head)
+
+ if roi_head is not None:
+ self.roi_head = MODELS.build(roi_head)
+
+ if postprocessor is not None:
+ self.postprocessor = MODELS.build(postprocessor)
+
+ @property
+ def with_det_head(self):
+ """bool: whether the detector has RPN"""
+ return hasattr(self, 'det_head') and self.det_head is not None
+
+ @property
+ def with_roi_head(self):
+ """bool: whether the detector has a RoI head"""
+ return hasattr(self, 'roi_head') and self.roi_head is not None
+
+ def extract_feat(self, img):
+ """Directly extract features from the backbone+neck."""
+ x = self.backbone(img)
+ if self.with_neck:
+ x = self.neck(x)
+ return x
+
+ def loss(self, inputs: torch.Tensor,
+ data_samples: OptDetSampleList) -> Dict:
+ losses = dict()
+ inputs = self.extract_feat(inputs)
+ det_loss, data_samples = self.det_head.loss_and_predict(
+ inputs, data_samples)
+ roi_losses = self.roi_head.loss(inputs, data_samples)
+ losses.update(det_loss)
+ losses.update(roi_losses)
+ return losses
+
+ def predict(self, inputs: torch.Tensor,
+ data_samples: OptDetSampleList) -> OptDetSampleList:
+ """Predict results from a batch of inputs and data samples with post-
+ processing."""
+ inputs = self.extract_feat(inputs)
+ data_samples = self.det_head.predict(inputs, data_samples)
+ rec_data_samples = self.roi_head.predict(inputs, data_samples)
+ return self.postprocessor(data_samples, rec_data_samples)
+
+ def _forward(self, inputs: torch.Tensor,
+ data_samples: OptDetSampleList) -> torch.Tensor:
+ pass
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/__init__.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..d0007ffae850901ee62e43beebfe56fc2865cf73
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .bezier_utils import bezier2poly, poly2bezier
+
+__all__ = ['poly2bezier', 'bezier2poly']
diff --git a/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/bezier_utils.py b/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/bezier_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..d93a6293926e2d807eb089bf92835e39a4ef5d84
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/abcnet/utils/bezier_utils.py
@@ -0,0 +1,62 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import numpy as np
+from scipy.special import comb as n_over_k
+
+from mmocr.utils.typing_utils import ArrayLike
+
+
+def bezier_coefficient(n, t, k):
+ return t**k * (1 - t)**(n - k) * n_over_k(n, k)
+
+
+def bezier_coefficients(time, point_num, ratios):
+ return [[bezier_coefficient(time, ratio, num) for num in range(point_num)]
+ for ratio in ratios]
+
+
+def linear_interpolation(point1: np.ndarray,
+ point2: np.ndarray,
+ number: int = 2) -> np.ndarray:
+ t = np.linspace(0, 1, number + 2).reshape(-1, 1)
+ return point1 + (point2 - point1) * t
+
+
+def curve2bezier(curve: ArrayLike):
+ curve = np.array(curve).reshape(-1, 2)
+ if len(curve) == 2:
+ return linear_interpolation(curve[0], curve[1])
+ diff = curve[1:] - curve[:-1]
+ distance = np.linalg.norm(diff, axis=-1)
+ norm_distance = distance / distance.sum()
+ norm_distance = np.hstack(([0], norm_distance))
+ cum_norm_dis = norm_distance.cumsum()
+ pseudo_inv = np.linalg.pinv(bezier_coefficients(3, 4, cum_norm_dis))
+ control_points = pseudo_inv.dot(curve)
+ return control_points
+
+
+def bezier2curve(bezier: np.ndarray, num_sample: int = 10):
+ bezier = np.asarray(bezier)
+ t = np.linspace(0, 1, num_sample)
+ return np.array(bezier_coefficients(3, 4, t)).dot(bezier)
+
+
+def poly2bezier(poly):
+ poly = np.array(poly).reshape(-1, 2)
+ points_num = len(poly)
+ up_curve = poly[:points_num // 2]
+ down_curve = poly[points_num // 2:]
+ up_bezier = curve2bezier(up_curve)
+ down_bezier = curve2bezier(down_curve)
+ up_bezier[0] = up_curve[0]
+ up_bezier[-1] = up_curve[-1]
+ down_bezier[0] = down_curve[0]
+ down_bezier[-1] = down_curve[-1]
+ return np.vstack((up_bezier, down_bezier)).flatten().tolist()
+
+
+def bezier2poly(bezier, num_sample=20):
+ bezier = bezier.reshape(2, 4, 2)
+ curve_top = bezier2curve(bezier[0], num_sample)
+ curve_bottom = bezier2curve(bezier[1], num_sample)
+ return np.vstack((curve_top, curve_bottom)).flatten().tolist()
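A quick sanity check for these helpers is a round trip on a simple polygon: fit Bezier control points to the top and bottom curves, then sample them back. The import below assumes the `abcnet` package from this project is on the Python path (as arranged by `custom_imports` in the configs):

```python
# Round-trip sketch for the Bezier helpers above; the import path assumes
# this file lives at projects/ABCNet/abcnet/utils/bezier_utils.py.
import numpy as np
from abcnet.utils import poly2bezier, bezier2poly

# An 8-point polygon: 4 points along the top edge, 4 along the bottom.
poly = [0, 0, 10, 0, 20, 0, 30, 0,
        30, 10, 20, 10, 10, 10, 0, 10]

bezier = poly2bezier(poly)                    # 8 control points -> 16 floats
curve = bezier2poly(np.array(bezier), num_sample=5)
print(len(bezier), len(curve))                # 16 20
```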
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/_base_/datasets/icdar2015.py b/mmocr-dev-1.x/projects/ABCNet/config/_base_/datasets/icdar2015.py
new file mode 100644
index 0000000000000000000000000000000000000000..240f1347fda7057aa20f009e493aca368d097954
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/_base_/datasets/icdar2015.py
@@ -0,0 +1,15 @@
+icdar2015_textspotting_data_root = 'data/icdar2015'
+
+icdar2015_textspotting_train = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textspotting_data_root,
+ ann_file='textspotting_train.json',
+ pipeline=None)
+
+icdar2015_textspotting_test = dict(
+ type='OCRDataset',
+ data_root=icdar2015_textspotting_data_root,
+ ann_file='textspotting_test.json',
+ test_mode=True,
+ # indices=50,
+ pipeline=None)
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/_base_/default_runtime.py b/mmocr-dev-1.x/projects/ABCNet/config/_base_/default_runtime.py
new file mode 100644
index 0000000000000000000000000000000000000000..4b9b72c53f6285ebb2a205982226066b4e21178e
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/_base_/default_runtime.py
@@ -0,0 +1,41 @@
+default_scope = 'mmocr'
+env_cfg = dict(
+ cudnn_benchmark=False,
+ mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+ dist_cfg=dict(backend='nccl'),
+)
+randomness = dict(seed=None)
+
+default_hooks = dict(
+ timer=dict(type='IterTimerHook'),
+ logger=dict(type='LoggerHook', interval=5),
+ param_scheduler=dict(type='ParamSchedulerHook'),
+ checkpoint=dict(type='CheckpointHook', interval=20),
+ sampler_seed=dict(type='DistSamplerSeedHook'),
+ sync_buffer=dict(type='SyncBuffersHook'),
+ visualization=dict(
+ type='VisualizationHook',
+ interval=1,
+ enable=False,
+ show=False,
+ draw_gt=False,
+ draw_pred=False),
+)
+
+# Logging
+log_level = 'INFO'
+log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
+
+load_from = None
+resume = False
+
+# Evaluation
+val_evaluator = [dict(type='E2EHmeanIOUMetric'), dict(type='HmeanIOUMetric')]
+test_evaluator = val_evaluator
+
+# Visualization
+vis_backends = [dict(type='LocalVisBackend')]
+visualizer = dict(
+ type='TextSpottingLocalVisualizer',
+ name='visualizer',
+ vis_backends=vis_backends)
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/_base_/schedules/schedule_sgd_500e.py b/mmocr-dev-1.x/projects/ABCNet/config/_base_/schedules/schedule_sgd_500e.py
new file mode 100644
index 0000000000000000000000000000000000000000..431c48ff9ddfbcd25425007c633014d68f5a64e0
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/_base_/schedules/schedule_sgd_500e.py
@@ -0,0 +1,12 @@
+# optimizer
+optim_wrapper = dict(
+ type='OptimWrapper',
+ optimizer=dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001),
+ clip_grad=dict(type='value', clip_value=1))
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=500, val_interval=20)
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+# learning policy
+param_scheduler = [
+ dict(type='LinearLR', end=1000, start_factor=0.001, by_epoch=False),
+]
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/abcnet/_base_abcnet_resnet50_fpn.py b/mmocr-dev-1.x/projects/ABCNet/config/abcnet/_base_abcnet_resnet50_fpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..05d570132485a43aa1afb8646f9aaa609a42f286
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/abcnet/_base_abcnet_resnet50_fpn.py
@@ -0,0 +1,165 @@
+num_classes = 1
+strides = [8, 16, 32, 64, 128]
+bbox_coder = dict(type='mmdet.DistancePointBBoxCoder')
+with_bezier = True
+norm_on_bbox = True
+use_sigmoid_cls = True
+
+dictionary = dict(
+ type='Dictionary',
+ dict_file='{{ fileDirname }}/../../dicts/abcnet.txt',
+ with_start=False,
+ with_end=False,
+ same_start_end=False,
+ with_padding=True,
+ with_unknown=True)
+
+model = dict(
+ type='ABCNet',
+ data_preprocessor=dict(
+ type='TextDetDataPreprocessor',
+ mean=[123.675, 116.28, 103.53][::-1],
+ std=[1, 1, 1],
+ bgr_to_rgb=False,
+ pad_size_divisor=32),
+ backbone=dict(
+ type='mmdet.ResNet',
+ depth=50,
+ num_stages=4,
+ out_indices=(0, 1, 2, 3),
+ frozen_stages=1,
+ norm_cfg=dict(type='BN', requires_grad=False),
+ norm_eval=True,
+ style='caffe',
+ init_cfg=dict(
+ type='Pretrained',
+ checkpoint='open-mmlab://detectron2/resnet50_caffe')),
+ neck=dict(
+ type='mmdet.FPN',
+ in_channels=[256, 512, 1024, 2048],
+ out_channels=256,
+ start_level=0,
+ add_extra_convs='on_output', # use P5
+ num_outs=6,
+ relu_before_extra_convs=True),
+ det_head=dict(
+ type='ABCNetDetHead',
+ num_classes=num_classes,
+ in_channels=256,
+ stacked_convs=4,
+ feat_channels=256,
+ strides=strides,
+ norm_on_bbox=norm_on_bbox,
+ use_sigmoid_cls=use_sigmoid_cls,
+ centerness_on_reg=True,
+ dcn_on_last_conv=False,
+ conv_bias=True,
+ use_scale=False,
+ with_bezier=with_bezier,
+ init_cfg=dict(
+ type='Normal',
+ layer='Conv2d',
+ std=0.01,
+ override=dict(
+ type='Normal',
+ name='conv_cls',
+ std=0.01,
+ bias=-4.59511985013459), # -log((1-p)/p) where p=0.01
+ ),
+ module_loss=dict(
+ type='ABCNetDetModuleLoss',
+ num_classes=num_classes,
+ strides=strides,
+ center_sampling=True,
+ center_sample_radius=1.5,
+ bbox_coder=bbox_coder,
+ norm_on_bbox=norm_on_bbox,
+ loss_cls=dict(
+ type='mmdet.FocalLoss',
+ use_sigmoid=use_sigmoid_cls,
+ gamma=2.0,
+ alpha=0.25,
+ loss_weight=1.0),
+ loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=1.0),
+ loss_centerness=dict(
+ type='mmdet.CrossEntropyLoss',
+ use_sigmoid=True,
+ loss_weight=1.0)),
+ postprocessor=dict(
+ type='ABCNetDetPostprocessor',
+ use_sigmoid_cls=use_sigmoid_cls,
+ strides=[8, 16, 32, 64, 128],
+ bbox_coder=dict(type='mmdet.DistancePointBBoxCoder'),
+ with_bezier=True,
+ test_cfg=dict(
+ nms_pre=1000,
+ nms=dict(type='nms', iou_threshold=0.5),
+ score_thr=0.3))),
+ roi_head=dict(
+ type='RecRoIHead',
+ roi_extractor=dict(
+ type='BezierRoIExtractor',
+ roi_layer=dict(
+ type='BezierAlign', output_size=(8, 32), sampling_ratio=1.0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16]),
+ rec_head=dict(
+ type='ABCNetRec',
+ backbone=dict(type='ABCNetRecBackbone'),
+ encoder=dict(type='ABCNetRecEncoder'),
+ decoder=dict(
+ type='ABCNetRecDecoder',
+ dictionary=dictionary,
+ postprocessor=dict(
+ type='AttentionPostprocessor',
+ ignore_chars=['padding', 'unknown']),
+ module_loss=dict(
+ type='CEModuleLoss',
+ ignore_first_char=False,
+ ignore_char=-1,
+ reduction='mean'),
+ max_seq_len=25))),
+ postprocessor=dict(
+ type='ABCNetPostprocessor',
+ rescale_fields=['polygons', 'bboxes', 'beziers'],
+ ))
+
+test_pipeline = [
+ dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+ dict(type='Resize', scale=(2000, 4000), keep_ratio=True, backend='pillow'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ with_text=True),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
+
+train_pipeline = [
+ dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ with_text=True),
+ dict(type='RemoveIgnored'),
+ dict(type='RandomCrop', min_side_ratio=0.1),
+ dict(
+ type='RandomRotate',
+ max_angle=30,
+ pad_with_fixed_color=True,
+ use_canvas=True),
+ dict(
+ type='RandomChoiceResize',
+ scales=[(980, 2900), (1044, 2900), (1108, 2900), (1172, 2900),
+ (1236, 2900), (1300, 2900), (1364, 2900), (1428, 2900),
+ (1492, 2900)],
+ keep_ratio=True),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py b/mmocr-dev-1.x/projects/ABCNet/config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py
new file mode 100644
index 0000000000000000000000000000000000000000..424a35254ebdd3050e8e13b506b7ee5d97a565fb
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py
@@ -0,0 +1,37 @@
+_base_ = [
+ '_base_abcnet_resnet50_fpn.py',
+ '../_base_/datasets/icdar2015.py',
+ '../_base_/default_runtime.py',
+ '../_base_/schedules/schedule_sgd_500e.py',
+]
+
+# dataset settings
+icdar2015_textspotting_train = _base_.icdar2015_textspotting_train
+icdar2015_textspotting_train.pipeline = _base_.train_pipeline
+icdar2015_textspotting_test = _base_.icdar2015_textspotting_test
+icdar2015_textspotting_test.pipeline = _base_.test_pipeline
+
+train_dataloader = dict(
+ batch_size=2,
+ num_workers=8,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=True),
+ dataset=icdar2015_textspotting_train)
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=icdar2015_textspotting_test)
+
+test_dataloader = val_dataloader
+
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+
+custom_imports = dict(imports=['abcnet'], allow_failed_imports=False)
+
+load_from = 'https://download.openmmlab.com/mmocr/textspotting/abcnet/abcnet_resnet50_fpn_500e_icdar2015/abcnet_resnet50_fpn_pretrain-d060636c.pth' # noqa
+
+find_unused_parameters = True
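Before launching a run, the composed config can be loaded and inspected with MMEngine's `Config`; the path below assumes the repository layout shown in this diff, and training itself would typically go through MMOCR's `tools/train.py` with the same file:

```python
# Sketch: load and inspect the composed ABCNet config with MMEngine.
from mmengine.config import Config

cfg = Config.fromfile(
    'projects/ABCNet/config/abcnet/abcnet_resnet50_fpn_500e_icdar2015.py')
print(cfg.model.type)                    # 'ABCNet'
print(cfg.train_dataloader.batch_size)   # 2
print(cfg.load_from)                     # pretrained checkpoint URL
```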
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/_base_abcnet-v2_resnet50_bifpn.py b/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/_base_abcnet-v2_resnet50_bifpn.py
new file mode 100644
index 0000000000000000000000000000000000000000..b6bca5a6c292b663ba440df087265828f76a646a
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/_base_abcnet-v2_resnet50_bifpn.py
@@ -0,0 +1,118 @@
+num_classes = 1
+strides = [8, 16, 32, 64, 128]
+bbox_coder = dict(type='mmdet.DistancePointBBoxCoder')
+with_bezier = True
+norm_on_bbox = True
+use_sigmoid_cls = True
+
+dictionary = dict(
+ type='Dictionary',
+ dict_file='{{ fileDirname }}/../../dicts/abcnet.txt',
+ with_start=False,
+ with_end=False,
+ same_start_end=False,
+ with_padding=True,
+ with_unknown=True)
+
+model = dict(
+ type='ABCNet',
+ data_preprocessor=dict(
+ type='TextDetDataPreprocessor',
+ mean=[123.675, 116.28, 103.53][::-1],
+ std=[1, 1, 1],
+ bgr_to_rgb=False,
+ pad_size_divisor=32),
+ backbone=dict(
+ type='mmdet.ResNet',
+ depth=50,
+ num_stages=4,
+ out_indices=(0, 1, 2, 3),
+ frozen_stages=1,
+ norm_cfg=dict(type='BN', requires_grad=False),
+ norm_eval=True,
+ style='caffe',
+ init_cfg=dict(
+ type='Pretrained',
+ checkpoint='open-mmlab://detectron2/resnet50_caffe')),
+ neck=dict(
+ type='BiFPN',
+ in_channels=[256, 512, 1024, 2048],
+ out_channels=256,
+ start_level=0,
+ add_extra_convs=True, # use P5
+ norm_cfg=dict(type='BN'),
+ num_outs=6,
+ relu_before_extra_convs=True),
+ det_head=dict(
+ type='ABCNetDetHead',
+ num_classes=num_classes,
+ in_channels=256,
+ stacked_convs=4,
+ feat_channels=256,
+ strides=strides,
+ norm_on_bbox=norm_on_bbox,
+ use_sigmoid_cls=use_sigmoid_cls,
+ centerness_on_reg=True,
+ dcn_on_last_conv=False,
+ conv_bias=True,
+ use_scale=False,
+ with_bezier=with_bezier,
+ init_cfg=dict(
+ type='Normal',
+ layer='Conv2d',
+ std=0.01,
+ override=dict(
+ type='Normal',
+ name='conv_cls',
+ std=0.01,
+ bias=-4.59511985013459), # -log((1-p)/p) where p=0.01
+ ),
+ module_loss=None,
+ postprocessor=dict(
+ type='ABCNetDetPostprocessor',
+ # rescale_fields=['polygons', 'bboxes'],
+ use_sigmoid_cls=use_sigmoid_cls,
+ strides=[8, 16, 32, 64, 128],
+ bbox_coder=dict(type='mmdet.DistancePointBBoxCoder'),
+ with_bezier=True,
+ test_cfg=dict(
+ # rescale_fields=['polygon', 'bboxes', 'bezier'],
+ nms_pre=1000,
+ nms=dict(type='nms', iou_threshold=0.4),
+ score_thr=0.3))),
+ roi_head=dict(
+ type='RecRoIHead',
+ neck=dict(type='CoordinateHead'),
+ roi_extractor=dict(
+ type='BezierRoIExtractor',
+ roi_layer=dict(
+ type='BezierAlign', output_size=(16, 64), sampling_ratio=1.0),
+ out_channels=256,
+ featmap_strides=[4, 8, 16]),
+ rec_head=dict(
+ type='ABCNetRec',
+ backbone=dict(type='ABCNetRecBackbone'),
+ encoder=dict(type='ABCNetRecEncoder'),
+ decoder=dict(
+ type='ABCNetRecDecoder',
+ dictionary=dictionary,
+ postprocessor=dict(type='AttentionPostprocessor'),
+ max_seq_len=25))),
+ postprocessor=dict(
+ type='ABCNetPostprocessor',
+ rescale_fields=['polygons', 'bboxes', 'beziers'],
+ ))
+
+test_pipeline = [
+ dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
+ dict(type='Resize', scale=(2000, 4000), keep_ratio=True, backend='pillow'),
+ dict(
+ type='LoadOCRAnnotations',
+ with_polygon=True,
+ with_bbox=True,
+ with_label=True,
+ with_text=True),
+ dict(
+ type='PackTextDetInputs',
+ meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
+]
diff --git a/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/abcnet-v2_resnet50_bifpn_500e_icdar2015.py b/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/abcnet-v2_resnet50_bifpn_500e_icdar2015.py
new file mode 100644
index 0000000000000000000000000000000000000000..5b51f562438981299cd009349f795a4379eb9f96
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/config/abcnet_v2/abcnet-v2_resnet50_bifpn_500e_icdar2015.py
@@ -0,0 +1,23 @@
+_base_ = [
+ '_base_abcnet-v2_resnet50_bifpn.py',
+ '../_base_/datasets/icdar2015.py',
+ '../_base_/default_runtime.py',
+]
+
+# dataset settings
+icdar2015_textspotting_test = _base_.icdar2015_textspotting_test
+icdar2015_textspotting_test.pipeline = _base_.test_pipeline
+
+val_dataloader = dict(
+ batch_size=1,
+ num_workers=4,
+ persistent_workers=True,
+ sampler=dict(type='DefaultSampler', shuffle=False),
+ dataset=icdar2015_textspotting_test)
+
+test_dataloader = val_dataloader
+
+val_cfg = dict(type='ValLoop')
+test_cfg = dict(type='TestLoop')
+
+custom_imports = dict(imports=['abcnet'], allow_failed_imports=False)
diff --git a/mmocr-dev-1.x/projects/ABCNet/dicts/abcnet.txt b/mmocr-dev-1.x/projects/ABCNet/dicts/abcnet.txt
new file mode 100644
index 0000000000000000000000000000000000000000..173d6c4a7ad83dcb6cdb3d177456d0b4d553c01c
--- /dev/null
+++ b/mmocr-dev-1.x/projects/ABCNet/dicts/abcnet.txt
@@ -0,0 +1,95 @@
+
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+.
+/
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+:
+;
+<
+=
+>
+?
+@
+A
+B
+C
+D
+E
+F
+G
+H
+I
+J
+K
+L
+M
+N
+O
+P
+Q
+R
+S
+T
+U
+V
+W
+X
+Y
+Z
+[
+\
+]
+^
+_
+`
+a
+b
+c
+d
+e
+f
+g
+h
+i
+j
+k
+l
+m
+n
+o
+p
+q
+r
+s
+t
+u
+v
+w
+x
+y
+z
+{
+|
+}
+~
\ No newline at end of file
diff --git a/mmocr-dev-1.x/projects/README.md b/mmocr-dev-1.x/projects/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..b9dc68752a1dd491eb2d3c43debe665fd00fa77a
--- /dev/null
+++ b/mmocr-dev-1.x/projects/README.md
@@ -0,0 +1,13 @@
+# Projects
+
+The OpenMMLab ecosystem can only grow through the contributions of the community.
+Everyone is welcome to post their implementation of any great ideas in this folder! If you wish to start your own project, please go through the [example project](example_project/) for the best practice. For common questions about projects, please read our [faq](faq.md).
+
+## External Projects
+
+Here we list some selected external projects released in the community and built upon MMOCR:
+
+- [TableMASTER-mmocr](https://github.com/JiaquanYe/TableMASTER-mmocr)
+- [WordArt](https://github.com/xdxie/WordArt)
+
+Note: The core maintainers of MMOCR only ensure that the results are reproducible and that the code quality meets its claims at the time each project was submitted, but they may not be responsible for future maintenance. The original authors take responsibility for maintaining their own projects.
diff --git a/mmocr-dev-1.x/projects/SPTS/README.md b/mmocr-dev-1.x/projects/SPTS/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..af4a4f9b3ba78979ff725c5ba2e58b8472984e3f
--- /dev/null
+++ b/mmocr-dev-1.x/projects/SPTS/README.md
@@ -0,0 +1,186 @@
+# SPTS: Single-Point Text Spotting
+
+
+
+## Description
+
+This is an implementation of [SPTS](https://github.com/shannanyinxiang/SPTS) based on [MMOCR](https://github.com/open-mmlab/mmocr/tree/dev-1.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
+
+Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance. We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective, which can achieve state-of-the-art results on widely used benchmarks. Most significantly, we show that the performance is not very sensitive to the positions of the point annotation, meaning that it can be much easier to be annotated or even be automatically generated than the bounding box that requires precise positions. We believe that such a pioneer attempt indicates a significant opportunity for scene text spotting applications of a much larger scale than previously possible.
+
+