Spaces: Running
MeYourHint committed
Commit • c0eac48
Parent(s): 08572f0
first demo version
This view is limited to 50 files because it contains too many changes. See raw diff
- .DS_Store +0 -0
- LICENSE +21 -0
- README.md +221 -13
- app.py +203 -0
- assets/mapping.json +1 -0
- assets/mapping6.json +1 -0
- assets/text_prompt.txt +12 -0
- common/__init__.py +0 -0
- common/quaternion.py +423 -0
- common/skeleton.py +199 -0
- data/__init__.py +0 -0
- data/t2m_dataset.py +348 -0
- dataset/__init__.py +0 -0
- edit_t2m.py +195 -0
- environment.yml +204 -0
- eval_t2m_trans_res.py +199 -0
- eval_t2m_vq.py +123 -0
- example_data/000612.mp4 +0 -0
- example_data/000612.npy +3 -0
- gen_t2m.py +261 -0
- models/.DS_Store +0 -0
- models/__init__.py +0 -0
- models/mask_transformer/__init__.py +0 -0
- models/mask_transformer/tools.py +165 -0
- models/mask_transformer/transformer.py +1039 -0
- models/mask_transformer/transformer_trainer.py +359 -0
- models/t2m_eval_modules.py +182 -0
- models/t2m_eval_wrapper.py +191 -0
- models/vq/__init__.py +0 -0
- models/vq/encdec.py +68 -0
- models/vq/model.py +124 -0
- models/vq/quantizer.py +180 -0
- models/vq/residual_vq.py +194 -0
- models/vq/resnet.py +84 -0
- models/vq/vq_trainer.py +359 -0
- motion_loaders/__init__.py +0 -0
- motion_loaders/dataset_motion_loader.py +27 -0
- options/__init__.py +0 -0
- options/base_option.py +61 -0
- options/eval_option.py +38 -0
- options/train_option.py +64 -0
- options/vq_option.py +89 -0
- prepare/.DS_Store +0 -0
- prepare/download_evaluator.sh +24 -0
- prepare/download_glove.sh +9 -0
- prepare/download_models.sh +31 -0
- prepare/download_models_demo.sh +10 -0
- requirements.txt +140 -0
- train_res_transformer.py +171 -0
- train_t2m_transformer.py +153 -0
.DS_Store
ADDED
Binary file (6.15 kB)
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Chuan Guo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
CHANGED
@@ -1,13 +1,221 @@
# MoMask: Generative Masked Modeling of 3D Human Motions
## [[Project Page]](https://ericguo5513.github.io/momask) [[Paper]](https://arxiv.org/abs/2312.00063)
![teaser_image](https://ericguo5513.github.io/momask/static/images/teaser.png)

If you find our code or paper helpful, please consider citing:
```
@article{guo2023momask,
  title={MoMask: Generative Masked Modeling of 3D Human Motions},
  author={Chuan Guo and Yuxuan Mu and Muhammad Gohar Javed and Sen Wang and Li Cheng},
  year={2023},
  eprint={2312.00063},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## :postbox: News
📢 **2023-12-19** --- Release scripts for temporal inpainting.

📢 **2023-12-15** --- Release code and models for MoMask, including training/eval/generation scripts.

📢 **2023-11-29** --- Initialized the webpage and git project.

## :round_pushpin: Get You Ready

<details>

### 1. Conda Environment
```
conda env create -f environment.yml
conda activate momask
pip install git+https://github.com/openai/CLIP.git
```
We tested our code on Python 3.7.13 and PyTorch 1.7.1.
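
After activating the environment, a quick sanity check can confirm the pinned versions and the CLIP install. This is a minimal sketch, not part of the repository; the printed values are simply what we would expect under the setup above:
```python
# Minimal environment sanity check (illustrative only).
import torch
import clip  # installed via: pip install git+https://github.com/openai/CLIP.git

print(torch.__version__)            # tested with 1.7.1
print(torch.cuda.is_available())    # True if a GPU is visible
print(clip.available_models()[:3])  # e.g. ['RN50', 'RN101', 'RN50x4']
```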


### 2. Models and Dependencies

#### Download Pre-trained Models
```
bash prepare/download_models.sh
```

#### Download Evaluation Models and GloVe
For evaluation only.
```
bash prepare/download_evaluator.sh
bash prepare/download_glove.sh
```

#### Troubleshooting
To fix the gdown download error "Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses", a potential solution is to run `pip install --upgrade --no-cache-dir gdown`, as suggested at https://github.com/wkentaro/gdown/issues/43.

#### (Optional) Download Manually
Visit [[Google Drive]](https://drive.google.com/drive/folders/1b3GnAbERH8jAoO5mdWgZhyxHB73n23sK?usp=drive_link) to download the models and evaluators manually.

### 3. Get Data

You have two options here:
* **Skip getting data**, if you just want to generate motions using your *own* descriptions.
* **Get full data**, if you want to *re-train* and *evaluate* the model.

**(a). Full data (text + motion)**

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then copy the resulting dataset to our repository:
```
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```
**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then place the result in `./dataset/KIT-ML`.

</details>

## :rocket: Demo
<details>

### (a) Generate from a single prompt
```
python gen_t2m.py --gpu_id 1 --ext exp1 --text_prompt "A person is running on a treadmill."
```
### (b) Generate from a prompt file
An example prompt file is given in `./assets/text_prompt.txt`. Each line follows the format `<text description>#<motion length>`. The motion length is the number of poses; it must be an integer and will be rounded to a multiple of 4. In our work, motions are at 20 fps.

If you write `<text description>#NA`, our model will determine a length. Note that once there is **one** NA, all the others will be treated as **NA** automatically.

```
python gen_t2m.py --gpu_id 1 --ext exp2 --text_path ./assets/text_prompt.txt
```
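
For clarity, here is a minimal sketch of how a prompt file in this format can be parsed. The `parse_prompt_file` helper is illustrative and not part of the repository; the snap-to-multiple-of-4 step is an assumption mirroring the rounding note above:
```python
# Illustrative parser for the <text description>#<motion length> prompt format.
def parse_prompt_file(path):
    prompts = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            text, _, length = line.rpartition("#")
            if length.upper() == "NA":
                prompts.append((text, None))        # let the model pick a length
            else:
                n = int(length)                     # must be an integer
                prompts.append((text, n // 4 * 4))  # assumed: snapped to a multiple of 4
    return prompts

# Example: parse_prompt_file("./assets/text_prompt.txt")
```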

A few more parameters you may be interested in:
* `--repeat_times`: number of replications for generation, default `1`.
* `--motion_length`: specify the number of poses for generation; only applicable in (a).

The output files are stored under the folder `./generation/<ext>/`. They are:
* `numpy files`: generated motions with shape (nframe, 22, 3), under subfolder `./joints` (see the loading sketch below).
* `video files`: stick-figure animations in mp4 format, under subfolder `./animation`.
* `bvh files`: bvh files of the generated motions, under subfolder `./animation`.

We also apply a naive foot IK to the generated motions; see the files with suffix `_ik`. It sometimes works well, but sometimes fails.

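A minimal sketch for inspecting one of the generated joint files (the path below is illustrative; substitute your own `<ext>` and sample name):
```python
import numpy as np

# Illustrative path; adjust <ext> and the sample file name to your own run.
joints = np.load("./generation/exp1/joints/0/sample0_repeat0_len196.npy")
print(joints.shape)  # expected (nframe, 22, 3): 22 joints in 3D per frame
```
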
</details>

## :dancers: Visualization
<details>

All the animations are manually rendered in Blender. We use the characters from [mixamo](https://www.mixamo.com/#/). You need to download the characters in T-pose with skeleton.

### Retargeting
For retargeting, we found that Rokoko usually leads to large errors on the feet. On the other hand, [keemap.rig.transfer](https://github.com/nkeeline/Keemap-Blender-Rig-ReTargeting-Addon/releases) gives more precise retargeting. You can watch the [tutorial](https://www.youtube.com/watch?v=EG-VCMkVpxg) here.

Follow these steps:
* Download keemap.rig.transfer from GitHub and install it in Blender.
* Import both the motion file (.bvh) and the character file (.fbx) in Blender.
* `Shift + Select` both the source and target skeletons. (They do not need to be in Rest Position.)
* Switch to `Pose Mode`, then unfold the `KeeMapRig` tool at the top-right corner of the view window.
* Load and read the bone mapping file `./assets/mapping.json` (or `mapping6.json` if it doesn't work). This file was made manually by us and works for most characters in mixamo; you could make your own (see the sketch below).
* Adjust the `Number of Samples`, `Source Rig`, and `Destination Rig Name`.
* Click `Transfer Animation from Source Destination` and wait a few seconds.

We didn't try other retargeting tools. You are welcome to comment if you find others more useful.

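If you need to adapt the mapping to your own character, a quick way to see which bones are remapped is to read the JSON directly. A minimal read-only sketch, based on the `assets/mapping.json` layout shown later in this diff:
```python
import json

# List the source -> destination bone pairs in the KeeMap mapping file.
with open("assets/mapping.json") as f:
    mapping = json.load(f)

for bone in mapping["bones"]:
    print(bone["SourceBoneName"], "->", bone["DestinationBoneName"])
# e.g. Hips -> mixamorig:Hips; edit the DestinationBoneName entries for your rig.
```
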

### Scene

We use this [scene](https://drive.google.com/file/d/1lg62nugD7RTAIz0Q_YP2iZsxpUzzOkT1/view?usp=sharing) for animation.


</details>

## :clapper: Temporal Inpainting
<details>
We conduct mask-based editing in the m-transformer stage, followed by regeneration of the residual tokens for the entire sequence. To load your own motion, provide its path through `--source_motion`. Use `-msec` to specify the masked section, given either as a ratio or as frame indices. For instance, `-msec 0.3,0.6` with `max_motion_length=196` is equivalent to `-msec 59,118`, indicating editing of the frame section [59, 118].

```
python edit_t2m.py --gpu_id 1 --ext exp3 --use_res_model -msec 0.4,0.7 --text_prompt "A man picks something from the ground using his right hand."
```
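
A minimal sketch of the ratio-to-frame conversion implied by the example above (assuming ratio endpoints are simply scaled by `max_motion_length` and rounded; the actual script may differ in details):
```python
# Convert a -msec section given as ratios into frame indices.
# Assumption: ratios are scaled by max_motion_length and rounded, which
# reproduces the example above: (0.3, 0.6) -> (59, 118) for length 196.
def msec_to_frames(start, end, max_motion_length=196):
    if start <= 1.0 and end <= 1.0:          # values given as ratios
        return round(start * max_motion_length), round(end * max_motion_length)
    return int(start), int(end)              # values already given as frames

print(msec_to_frames(0.3, 0.6))  # -> (59, 118)
```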

Note: Presently, the source motion must adhere to the format of a HumanML3D dim-263 feature vector. An example motion vector from the HumanML3D test set is available in `example_data/000612.npy`. To process your own motion data, you can use the `process_file` function from `utils/motion_process.py`.

</details>

## :space_invader: Train Your Own Models
<details>


**Note**: You have to train the RVQ **BEFORE** training the masked/residual transformers. The latter two can be trained simultaneously.

### Train RVQ
```
python train_vq.py --name rvq_name --gpu_id 1 --dataset_name t2m --batch_size 512 --num_quantizers 6 --max_epoch 500 --quantize_drop_prob 0.2
```

### Train Masked Transformer
```
python train_t2m_transformer.py --name mtrans_name --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name
```

### Train Residual Transformer
```
python train_res_transformer.py --name rtrans_name --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name --cond_drop_prob 0.2 --share_weight
```

* `--dataset_name`: motion dataset, `t2m` for HumanML3D and `kit` for KIT-ML.
* `--name`: name of your model. This creates the model space `./checkpoints/<dataset_name>/<name>`.
* `--gpu_id`: GPU id.
* `--batch_size`: we use `512` for RVQ training. For the masked/residual transformers, we use `64` on HumanML3D and `16` on KIT-ML.
* `--num_quantizers`: number of quantization layers; `6` is used in our case.
* `--quantize_drop_prob`: quantization dropout ratio; `0.2` is used.
* `--vq_name`: when training the masked/residual transformer, you need to specify the name of the RVQ model used for tokenization.
* `--cond_drop_prob`: condition drop ratio, for classifier-free guidance; `0.2` is used.
* `--share_weight`: whether to share the projection/embedding weights in the residual transformer.

All the pre-trained models and intermediate results will be saved in `./checkpoints/<dataset_name>/<name>`.
</details>

## :book: Evaluation
<details>

### Evaluate RVQ Reconstruction:
HumanML3D:
```
python eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2 --dataset_name t2m --ext rvq_nq6
```
KIT-ML:
```
python eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2_k --dataset_name kit --ext rvq_nq6
```

### Evaluate Text2motion Generation:
HumanML3D:
```
python eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw --dataset_name t2m --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns --gpu_id 1 --cond_scale 4 --time_steps 10 --ext evaluation
```
KIT-ML:
```
python eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw_k --dataset_name kit --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns_k --gpu_id 0 --cond_scale 2 --time_steps 10 --ext evaluation
```

* `--res_name`: model name of the `residual transformer`.
* `--name`: model name of the `masked transformer`.
* `--cond_scale`: scale of classifier-free guidance.
* `--time_steps`: number of iterations for inference.
* `--ext`: filename for saving evaluation results.

The final evaluation results will be saved in `./checkpoints/<dataset_name>/<name>/eval/<ext>.log`.

</details>

## Acknowledgements

We sincerely thank the authors of the following open-source works, on which our code is based:

[deep-motion-editing](https://github.com/DeepMotionEditing/deep-motion-editing), [Muse](https://github.com/lucidrains/muse-maskgit-pytorch), [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch), [T2M-GPT](https://github.com/Mael-zys/T2M-GPT), [MDM](https://github.com/GuyTevet/motion-diffusion-model/tree/main) and [MLD](https://github.com/ChenFengYe/motion-latent-diffusion/tree/main)

## License
This code is distributed under an [MIT LICENSE](https://github.com/EricGuo5513/momask-codes/tree/main?tab=MIT-1-ov-file#readme).

Note that our code depends on other libraries (including SMPL, SMPL-X, and PyTorch3D) and uses datasets, each of which has its own license that must also be followed.
app.py
ADDED
@@ -0,0 +1,203 @@
from functools import partial
import os
import uuid  # each generation request gets its own output folder name

import torch
import numpy as np
import gradio as gr
import gdown


WEBSITE = """
<div class="embed_hidden">
<h1 style='text-align: center'> MoMask: Generative Masked Modeling of 3D Human Motions </h1>
<h2 style='text-align: center'>
<a href="https://ericguo5513.github.io" target="_blank"><nobr>Chuan Guo*</nobr></a>
<a href="https://yxmu.foo/" target="_blank"><nobr>Yuxuan Mu*</nobr></a>
<a href="https://scholar.google.com/citations?user=w4e-j9sAAAAJ&hl=en" target="_blank"><nobr>Muhammad Gohar Javed*</nobr></a>
<a href="https://sites.google.com/site/senwang1312home/" target="_blank"><nobr>Sen Wang</nobr></a>
<a href="https://www.ece.ualberta.ca/~lcheng5/" target="_blank"><nobr>Li Cheng</nobr></a>
</h2>
<h2 style='text-align: center'>
<nobr>arXiv 2023</nobr>
</h2>
<h3 style="text-align:center;">
<a target="_blank" href="https://arxiv.org/abs/2312.00063"> <button type="button" class="btn btn-primary btn-lg"> Paper </button></a>
<a target="_blank" href="https://github.com/EricGuo5513/momask-codes"> <button type="button" class="btn btn-primary btn-lg"> Code </button></a>
<a target="_blank" href="https://ericguo5513.github.io/momask/"> <button type="button" class="btn btn-primary btn-lg"> Webpage </button></a>
<a target="_blank" href="https://ericguo5513.github.io/source_files/momask_2023_bib.txt"> <button type="button" class="btn btn-primary btn-lg"> BibTex </button></a>
</h3>
<h3> Description </h3>
<p>
This space illustrates <a href='https://ericguo5513.github.io/momask/' target='_blank'><b>MoMask</b></a>, a method for text-to-motion generation.
</p>
</div>
"""

EXAMPLES = [
    "A person is walking slowly",
    "A person is walking in a circle",
    "A person is jumping rope",
    "Someone is doing a backflip",
    "A person is doing a moonwalk",
    "A person walks forward and then turns back",
    "Picking up an object",
    "A person is swimming in the sea",
    "A human is squatting",
    "Someone is jumping with one foot",
    "A person is chopping vegetables",
    "Someone walks backward",
    "Somebody is ascending a staircase",
    "A person is sitting down",
    "A person is taking the stairs",
    "Someone is doing jumping jacks",
    "The person walked forward and is picking up his toolbox",
    "The person angrily punching the air",
]

# Show closest text in the training


# CSS to make videos look nice
# var(--block-border-color); TODO
CSS = """
.retrieved_video {
    position: relative;
    margin: 0;
    box-shadow: var(--block-shadow);
    border-width: var(--block-border-width);
    border-color: #000000;
    border-radius: var(--block-radius);
    background: var(--block-background-fill);
    width: 100%;
    line-height: var(--line-sm);
}
"""


DEFAULT_TEXT = "A person is "


def generate(text, uid, motion_length=0, seed=351540, repeat_times=4):
    # Run the generation script; outputs are written under ./generation/<uid>/.
    os.system(
        f'python gen_t2m.py --gpu_id 0 --seed {seed} --ext {uid} '
        f'--repeat_times {repeat_times} --motion_length {motion_length} '
        f'--text_prompt "{text}"'
    )
    datas = []
    for n in range(repeat_times):  # one rendered video per repetition
        data_unit = {
            # The demo assumes the default 196-frame output with foot IK applied.
            "url": f"./generation/{uid}/animations/0/sample0_repeat{n}_len196_ik.mp4"
        }
        datas.append(data_unit)
    return datas


# HTML component
def get_video_html(data, video_id, width=700, height=700):
    url = data["url"]
    # class="wrap default svelte-gjihhp hide"
    # <div class="contour_video" style="position: absolute; padding: 10px;">
    # width="{width}" height="{height}"
    video_html = f"""
<video class="retrieved_video" width="{width}" height="{height}" preload="auto" muted playsinline onpause="this.load()"
autoplay loop disablepictureinpicture id="{video_id}">
  <source src="{url}" type="video/mp4">
  Your browser does not support the video tag.
</video>
"""
    return video_html


def generate_component(generate_function, text):
    if text == DEFAULT_TEXT or text == "" or text is None:
        return [None for _ in range(4)]

    uid = uuid.uuid4().hex  # unique output folder for this request
    datas = generate_function(text, uid)
    htmls = [get_video_html(data, idx) for idx, data in enumerate(datas)]
    return htmls


if not os.path.exists("checkpoints/t2m"):
    os.system("bash prepare/download_models.sh")


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# LOADING

# DEMO
theme = gr.themes.Default(primary_hue="blue", secondary_hue="gray")
generate_and_show = partial(generate_component, generate)

with gr.Blocks(css=CSS, theme=theme) as demo:
    gr.Markdown(WEBSITE)
    videos = []

    with gr.Row():
        with gr.Column(scale=3):
            with gr.Column(scale=2):
                text = gr.Textbox(
                    show_label=True,
                    label="Text prompt",
                    value=DEFAULT_TEXT,
                )
            with gr.Column(scale=1):
                gen_btn = gr.Button("Generate", variant="primary")
                clear = gr.Button("Clear", variant="secondary")

        with gr.Column(scale=2):

            def generate_example(text):
                return generate_and_show(text)

            examples = gr.Examples(
                examples=[[x, None, None] for x in EXAMPLES],
                inputs=[text],
                examples_per_page=20,
                run_on_click=False,
                cache_examples=False,
                fn=generate_example,
                outputs=[],
            )

    i = -1
    # should indent
    for _ in range(1):
        with gr.Row():
            for _ in range(4):
                i += 1
                video = gr.HTML()
                videos.append(video)

    # connect the examples to the output
    # a bit hacky
    examples.outputs = videos

    def load_example(example_id):
        processed_example = examples.non_none_processed_examples[example_id]
        return gr.utils.resolve_singleton(processed_example)

    examples.dataset.click(
        load_example,
        inputs=[examples.dataset],
        outputs=examples.inputs_with_examples,  # type: ignore
        show_progress=False,
        postprocess=False,
        queue=False,
    ).then(fn=generate_example, inputs=examples.inputs, outputs=videos)

    gen_btn.click(
        fn=generate_and_show,
        inputs=[text],
        outputs=videos,
    )
    text.submit(
        fn=generate_and_show,
        inputs=[text],
        outputs=videos,
    )

    def clear_videos():
        return [None for x in range(4)] + [DEFAULT_TEXT]

    clear.click(fn=clear_videos, outputs=videos + [text])

demo.launch()
assets/mapping.json
ADDED
@@ -0,0 +1 @@
{"bones": [{"name": "Hips", "label": "", "description": "", "SourceBoneName": "Hips", "DestinationBoneName": "mixamorig:Hips", "keyframe_this_bone": true, "CorrectionFactorX": 2.6179938316345215, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": true, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 0.2588190734386444, "QuatCorrectionFactorx": 0.965925931930542, "QuatCorrectionFactory": 2.7939677238464355e-09, "QuatCorrectionFactorz": -2.7939677238464355e-09, "scale_secondary_bone_name": ""}, {"name": "RightUpLeg", "label": "", "description": "", "SourceBoneName": "RightUpLeg", "DestinationBoneName": "mixamorig:RightUpLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftUpLeg", "label": "", "description": "", "SourceBoneName": "LeftUpLeg", "DestinationBoneName": "mixamorig:LeftUpLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightLeg", "label": "", "description": "", "SourceBoneName": "RightLeg", "DestinationBoneName": "mixamorig:RightLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 2.094395160675049, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, 
{"name": "LeftLeg", "label": "", "description": "", "SourceBoneName": "LeftLeg", "DestinationBoneName": "mixamorig:LeftLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 3.665191411972046, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightShoulder", "label": "", "description": "", "SourceBoneName": "RightShoulder", "DestinationBoneName": "mixamorig:RightShoulder", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftShoulder", "label": "", "description": "", "SourceBoneName": "LeftShoulder", "DestinationBoneName": "mixamorig:LeftShoulder", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightArm", "label": "", "description": "", "SourceBoneName": "RightArm", "DestinationBoneName": "mixamorig:RightArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": -1.0471975803375244, "CorrectionFactorZ": -0.1745329201221466, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftArm", "label": "", 
"description": "", "SourceBoneName": "LeftArm", "DestinationBoneName": "mixamorig:LeftArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 1.0471975803375244, "CorrectionFactorZ": 0.1745329201221466, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightForeArm", "label": "", "description": "", "SourceBoneName": "RightForeArm", "DestinationBoneName": "mixamorig:RightForeArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": -2.094395160675049, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftForeArm", "label": "", "description": "", "SourceBoneName": "LeftForeArm", "DestinationBoneName": "mixamorig:LeftForeArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 1.5707963705062866, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine", "label": "", "description": "", "SourceBoneName": "Spine", "DestinationBoneName": "mixamorig:Spine", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine1", "label": "", "description": "", "SourceBoneName": 
"Spine1", "DestinationBoneName": "mixamorig:Spine1", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine2", "label": "", "description": "", "SourceBoneName": "Spine2", "DestinationBoneName": "mixamorig:Spine2", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Neck", "label": "", "description": "", "SourceBoneName": "Neck", "DestinationBoneName": "mixamorig:Neck", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Head", "label": "", "description": "", "SourceBoneName": "Head", "DestinationBoneName": "mixamorig:Head", "keyframe_this_bone": true, "CorrectionFactorX": 0.3490658402442932, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightFoot", "label": "", "description": "", "SourceBoneName": "RightFoot", "DestinationBoneName": "mixamorig:RightFoot", "keyframe_this_bone": true, "CorrectionFactorX": 
-0.19192171096801758, "CorrectionFactorY": 2.979980945587158, "CorrectionFactorZ": -0.05134282633662224, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": -0.082771435379982, "QuatCorrectionFactorx": -0.0177358016371727, "QuatCorrectionFactory": -0.9920229315757751, "QuatCorrectionFactorz": -0.09340716898441315, "scale_secondary_bone_name": ""}, {"name": "LeftFoot", "label": "", "description": "", "SourceBoneName": "LeftFoot", "DestinationBoneName": "mixamorig:LeftFoot", "keyframe_this_bone": true, "CorrectionFactorX": -0.25592508912086487, "CorrectionFactorY": -2.936899423599243, "CorrectionFactorZ": 0.2450830191373825, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 0.11609010398387909, "QuatCorrectionFactorx": 0.10766097158193588, "QuatCorrectionFactory": -0.9808290004730225, "QuatCorrectionFactorz": -0.11360746622085571, "scale_secondary_bone_name": ""}], "start_frame_to_apply": 0, "number_of_frames_to_apply": 196, "keyframe_every_n_frames": 1, "source_rig_name": "bvh_batch1_sample30_repeat1_len48", "destination_rig_name": "Armature", "bone_rotation_mode": "EULER", "bone_mapping_file": "C:\\Users\\cguo2\\Documents\\CVPR2024_MoMask\\mapping.json"}
assets/mapping6.json
ADDED
@@ -0,0 +1 @@
{"bones": [{"name": "Hips", "label": "", "description": "", "SourceBoneName": "Hips", "DestinationBoneName": "mixamorig6:Hips", "keyframe_this_bone": true, "CorrectionFactorX": 2.6179938316345215, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": true, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 0.2588190734386444, "QuatCorrectionFactorx": 0.965925931930542, "QuatCorrectionFactory": 2.7939677238464355e-09, "QuatCorrectionFactorz": -2.7939677238464355e-09, "scale_secondary_bone_name": ""}, {"name": "RightUpLeg", "label": "", "description": "", "SourceBoneName": "RightUpLeg", "DestinationBoneName": "mixamorig6:RightUpLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftUpLeg", "label": "", "description": "", "SourceBoneName": "LeftUpLeg", "DestinationBoneName": "mixamorig6:LeftUpLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightLeg", "label": "", "description": "", "SourceBoneName": "RightLeg", "DestinationBoneName": "mixamorig6:RightLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 2.094395160675049, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, 
{"name": "LeftLeg", "label": "", "description": "", "SourceBoneName": "LeftLeg", "DestinationBoneName": "mixamorig6:LeftLeg", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 3.665191411972046, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightShoulder", "label": "", "description": "", "SourceBoneName": "RightShoulder", "DestinationBoneName": "mixamorig6:RightShoulder", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftShoulder", "label": "", "description": "", "SourceBoneName": "LeftShoulder", "DestinationBoneName": "mixamorig6:LeftShoulder", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightArm", "label": "", "description": "", "SourceBoneName": "RightArm", "DestinationBoneName": "mixamorig6:RightArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": -1.0471975803375244, "CorrectionFactorZ": -0.1745329201221466, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftArm", "label": "", 
"description": "", "SourceBoneName": "LeftArm", "DestinationBoneName": "mixamorig6:LeftArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 1.0471975803375244, "CorrectionFactorZ": 0.1745329201221466, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "RightForeArm", "label": "", "description": "", "SourceBoneName": "RightForeArm", "DestinationBoneName": "mixamorig6:RightForeArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": -2.094395160675049, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "LeftForeArm", "label": "", "description": "", "SourceBoneName": "LeftForeArm", "DestinationBoneName": "mixamorig6:LeftForeArm", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 1.5707963705062866, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine", "label": "", "description": "", "SourceBoneName": "Spine", "DestinationBoneName": "mixamorig6:Spine", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine1", "label": "", "description": "", "SourceBoneName": 
"Spine1", "DestinationBoneName": "mixamorig6:Spine1", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Spine2", "label": "", "description": "", "SourceBoneName": "Spine2", "DestinationBoneName": "mixamorig6:Spine2", "keyframe_this_bone": true, "CorrectionFactorX": 0.0, "CorrectionFactorY": 0.0, "CorrectionFactorZ": 0.0, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 1.0, "QuatCorrectionFactorx": 0.0, "QuatCorrectionFactory": 0.0, "QuatCorrectionFactorz": 0.0, "scale_secondary_bone_name": ""}, {"name": "Neck", "label": "", "description": "", "SourceBoneName": "Neck", "DestinationBoneName": "mixamorig6:Neck", "keyframe_this_bone": true, "CorrectionFactorX": -0.994345486164093, "CorrectionFactorY": -0.006703000050038099, "CorrectionFactorZ": 0.04061730206012726, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 0.8787809014320374, "QuatCorrectionFactorx": -0.4767816960811615, "QuatCorrectionFactory": -0.01263047568500042, "QuatCorrectionFactorz": 0.016250507906079292, "scale_secondary_bone_name": ""}, {"name": "Head", "label": "", "description": "", "SourceBoneName": "Head", "DestinationBoneName": "mixamorig6:Head", "keyframe_this_bone": true, "CorrectionFactorX": -0.07639937847852707, "CorrectionFactorY": 0.011205507442355156, "CorrectionFactorZ": 0.011367863975465298, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": 0.9992374181747437, "QuatCorrectionFactorx": -0.038221005350351334, "QuatCorrectionFactory": 0.0053814793936908245, "QuatCorrectionFactorz": 0.005893632769584656, 
"scale_secondary_bone_name": ""}, {"name": "RightFoot", "label": "", "description": "", "SourceBoneName": "RightFoot", "DestinationBoneName": "mixamorig6:RightFoot", "keyframe_this_bone": true, "CorrectionFactorX": -0.17194896936416626, "CorrectionFactorY": 2.7372374534606934, "CorrectionFactorZ": -0.029542576521635056, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": -0.20128199458122253, "QuatCorrectionFactorx": 0.002824343740940094, "QuatCorrectionFactory": -0.9761614799499512, "QuatCorrectionFactorz": -0.08115538209676743, "scale_secondary_bone_name": ""}, {"name": "LeftFoot", "label": "", "description": "", "SourceBoneName": "LeftFoot", "DestinationBoneName": "mixamorig6:LeftFoot", "keyframe_this_bone": true, "CorrectionFactorX": -0.09363158047199249, "CorrectionFactorY": -2.9336421489715576, "CorrectionFactorZ": -0.17343592643737793, "has_twist_bone": false, "TwistBoneName": "", "set_bone_position": false, "set_bone_rotation": true, "bone_rotation_application_axis": "XYZ", "position_correction_factorX": 0.0, "position_correction_factorY": 0.0, "position_correction_factorZ": 0.0, "position_gain": 1.0, "position_pole_distance": 0.30000001192092896, "postion_type": "SINGLE_BONE_OFFSET", "set_bone_scale": false, "scale_gain": 1.0, "scale_max": 1.0, "scale_min": 0.5, "bone_scale_application_axis": "Y", "QuatCorrectionFactorw": -0.09925344586372375, "QuatCorrectionFactorx": 0.09088610112667084, "QuatCorrectionFactory": 0.9893556833267212, "QuatCorrectionFactorz": 0.05535021424293518, "scale_secondary_bone_name": ""}], "start_frame_to_apply": 0, "number_of_frames_to_apply": 196, "keyframe_every_n_frames": 1, "source_rig_name": "MoMask__02_ik", "destination_rig_name": "Armature", "bone_rotation_mode": "EULER", "bone_mapping_file": "C:\\Users\\cguo2\\Documents\\CVPR2024_MoMask\\mapping6.json"}
|
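The JSON above (the tail of assets/mapping6.json) appears to be a bone-retargeting configuration: each entry maps a bone of the generated rig ("MoMask__02_ik") to a Mixamo-style destination bone ("mixamorig6:*" on the "Armature" rig), together with per-bone Euler/quaternion correction factors and keyframing flags. Below is a minimal, hedged inspection sketch in Python; the only assumptions beyond the keys visible above are that the script runs from the repo root and that the per-bone entries sit under some top-level list whose key name is not shown here, so the sketch locates the first list value instead of hard-coding a name.

import json

# Minimal sketch: inspect the retargeting map (assumption: run from the repo root).
with open("assets/mapping6.json") as f:
    mapping = json.load(f)

# The per-bone entries form a list; its top-level key is not visible above,
# so find it generically rather than guessing its name.
bones = next(v for v in mapping.values() if isinstance(v, list))

for bone in bones:
    print(f'{bone["SourceBoneName"]:>10} -> {bone["DestinationBoneName"]:<22}'
          f' rot={bone["set_bone_rotation"]} pos={bone["set_bone_position"]}')

print(mapping["source_rig_name"], "->", mapping["destination_rig_name"],
      "| frames to apply:", mapping["number_of_frames_to_apply"])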
assets/text_prompt.txt
ADDED
@@ -0,0 +1,12 @@
1 |
+
the person holds his left foot with his left hand, puts his right foot up and left hand up too.#132
|
2 |
+
a man bends down and picks something up with his left hand.#84
|
3 |
+
A man stands for few seconds and picks up his arms and shakes them.#176
|
4 |
+
A person walks with a limp, their left leg get injured.#192
|
5 |
+
a person jumps up and then lands.#52
|
6 |
+
a person performs a standing back kick.#52
|
7 |
+
A person pokes their right hand along the ground, like they might be planting seeds.#60
|
8 |
+
the person steps forward and uses the left leg to kick something forward.#92
|
9 |
+
the man walked forward, spun right on one foot and walked back to his original position.#92
|
10 |
+
the person was pushed but did not fall.#124
|
11 |
+
this person stumbles left and right while moving forward.#132
|
12 |
+
a person reaching down and picking something up.#148
|
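Each line of assets/text_prompt.txt is of the form "caption#length", where the number after '#' is the requested motion length in frames; at the 20 fps used by the plotting calls later in this commit (fps=20), 132 frames is about 6.6 s. A minimal parsing sketch follows, assuming only this caption#frames convention; the file is presumably the prompt list consumed by gen_t2m.py.

# Minimal sketch: parse "caption#num_frames" prompt lines (20 fps assumed).
with open("assets/text_prompt.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        caption, n_frames = line.rsplit("#", 1)
        n_frames = int(n_frames)
        print(f"{n_frames:4d} frames ({n_frames / 20.0:4.1f} s): {caption}")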
common/__init__.py
ADDED
File without changes
|
common/quaternion.py
ADDED
@@ -0,0 +1,423 @@
1 |
+
# Copyright (c) 2018-present, Facebook, Inc.
|
2 |
+
# All rights reserved.
|
3 |
+
#
|
4 |
+
# This source code is licensed under the license found in the
|
5 |
+
# LICENSE file in the root directory of this source tree.
|
6 |
+
#
|
7 |
+
|
8 |
+
import torch
|
9 |
+
import numpy as np
|
10 |
+
|
11 |
+
_EPS4 = np.finfo(float).eps * 4.0
|
12 |
+
|
13 |
+
_FLOAT_EPS = np.finfo(np.float64).eps  # np.float was removed in NumPy 1.24; float64 is the same dtype
|
14 |
+
|
15 |
+
# PyTorch-backed implementations
|
16 |
+
def qinv(q):
|
17 |
+
assert q.shape[-1] == 4, 'q must be a tensor of shape (*, 4)'
|
18 |
+
mask = torch.ones_like(q)
|
19 |
+
mask[..., 1:] = -mask[..., 1:]
|
20 |
+
return q * mask
|
21 |
+
|
22 |
+
|
23 |
+
def qinv_np(q):
|
24 |
+
assert q.shape[-1] == 4, 'q must be a tensor of shape (*, 4)'
|
25 |
+
return qinv(torch.from_numpy(q).float()).numpy()
|
26 |
+
|
27 |
+
|
28 |
+
def qnormalize(q):
|
29 |
+
assert q.shape[-1] == 4, 'q must be a tensor of shape (*, 4)'
|
30 |
+
return q / torch.norm(q, dim=-1, keepdim=True)
|
31 |
+
|
32 |
+
|
33 |
+
def qmul(q, r):
|
34 |
+
"""
|
35 |
+
Multiply quaternion(s) q with quaternion(s) r.
|
36 |
+
Expects two equally-sized tensors of shape (*, 4), where * denotes any number of dimensions.
|
37 |
+
Returns q*r as a tensor of shape (*, 4).
|
38 |
+
"""
|
39 |
+
assert q.shape[-1] == 4
|
40 |
+
assert r.shape[-1] == 4
|
41 |
+
|
42 |
+
original_shape = q.shape
|
43 |
+
|
44 |
+
# Compute outer product
|
45 |
+
terms = torch.bmm(r.view(-1, 4, 1), q.view(-1, 1, 4))
|
46 |
+
|
47 |
+
w = terms[:, 0, 0] - terms[:, 1, 1] - terms[:, 2, 2] - terms[:, 3, 3]
|
48 |
+
x = terms[:, 0, 1] + terms[:, 1, 0] - terms[:, 2, 3] + terms[:, 3, 2]
|
49 |
+
y = terms[:, 0, 2] + terms[:, 1, 3] + terms[:, 2, 0] - terms[:, 3, 1]
|
50 |
+
z = terms[:, 0, 3] - terms[:, 1, 2] + terms[:, 2, 1] + terms[:, 3, 0]
|
51 |
+
return torch.stack((w, x, y, z), dim=1).view(original_shape)
|
52 |
+
|
53 |
+
|
54 |
+
def qrot(q, v):
|
55 |
+
"""
|
56 |
+
Rotate vector(s) v about the rotation described by quaternion(s) q.
|
57 |
+
Expects a tensor of shape (*, 4) for q and a tensor of shape (*, 3) for v,
|
58 |
+
where * denotes any number of dimensions.
|
59 |
+
Returns a tensor of shape (*, 3).
|
60 |
+
"""
|
61 |
+
assert q.shape[-1] == 4
|
62 |
+
assert v.shape[-1] == 3
|
63 |
+
assert q.shape[:-1] == v.shape[:-1]
|
64 |
+
|
65 |
+
original_shape = list(v.shape)
|
66 |
+
# print(q.shape)
|
67 |
+
q = q.contiguous().view(-1, 4)
|
68 |
+
v = v.contiguous().view(-1, 3)
|
69 |
+
|
70 |
+
qvec = q[:, 1:]
|
71 |
+
uv = torch.cross(qvec, v, dim=1)
|
72 |
+
uuv = torch.cross(qvec, uv, dim=1)
|
73 |
+
return (v + 2 * (q[:, :1] * uv + uuv)).view(original_shape)
|
74 |
+
|
75 |
+
|
76 |
+
def qeuler(q, order, epsilon=0, deg=True):
|
77 |
+
"""
|
78 |
+
Convert quaternion(s) q to Euler angles.
|
79 |
+
Expects a tensor of shape (*, 4), where * denotes any number of dimensions.
|
80 |
+
Returns a tensor of shape (*, 3).
|
81 |
+
"""
|
82 |
+
assert q.shape[-1] == 4
|
83 |
+
|
84 |
+
original_shape = list(q.shape)
|
85 |
+
original_shape[-1] = 3
|
86 |
+
q = q.view(-1, 4)
|
87 |
+
|
88 |
+
q0 = q[:, 0]
|
89 |
+
q1 = q[:, 1]
|
90 |
+
q2 = q[:, 2]
|
91 |
+
q3 = q[:, 3]
|
92 |
+
|
93 |
+
if order == 'xyz':
|
94 |
+
x = torch.atan2(2 * (q0 * q1 - q2 * q3), 1 - 2 * (q1 * q1 + q2 * q2))
|
95 |
+
y = torch.asin(torch.clamp(2 * (q1 * q3 + q0 * q2), -1 + epsilon, 1 - epsilon))
|
96 |
+
z = torch.atan2(2 * (q0 * q3 - q1 * q2), 1 - 2 * (q2 * q2 + q3 * q3))
|
97 |
+
elif order == 'yzx':
|
98 |
+
x = torch.atan2(2 * (q0 * q1 - q2 * q3), 1 - 2 * (q1 * q1 + q3 * q3))
|
99 |
+
y = torch.atan2(2 * (q0 * q2 - q1 * q3), 1 - 2 * (q2 * q2 + q3 * q3))
|
100 |
+
z = torch.asin(torch.clamp(2 * (q1 * q2 + q0 * q3), -1 + epsilon, 1 - epsilon))
|
101 |
+
elif order == 'zxy':
|
102 |
+
x = torch.asin(torch.clamp(2 * (q0 * q1 + q2 * q3), -1 + epsilon, 1 - epsilon))
|
103 |
+
y = torch.atan2(2 * (q0 * q2 - q1 * q3), 1 - 2 * (q1 * q1 + q2 * q2))
|
104 |
+
z = torch.atan2(2 * (q0 * q3 - q1 * q2), 1 - 2 * (q1 * q1 + q3 * q3))
|
105 |
+
elif order == 'xzy':
|
106 |
+
x = torch.atan2(2 * (q0 * q1 + q2 * q3), 1 - 2 * (q1 * q1 + q3 * q3))
|
107 |
+
y = torch.atan2(2 * (q0 * q2 + q1 * q3), 1 - 2 * (q2 * q2 + q3 * q3))
|
108 |
+
z = torch.asin(torch.clamp(2 * (q0 * q3 - q1 * q2), -1 + epsilon, 1 - epsilon))
|
109 |
+
elif order == 'yxz':
|
110 |
+
x = torch.asin(torch.clamp(2 * (q0 * q1 - q2 * q3), -1 + epsilon, 1 - epsilon))
|
111 |
+
y = torch.atan2(2 * (q1 * q3 + q0 * q2), 1 - 2 * (q1 * q1 + q2 * q2))
|
112 |
+
z = torch.atan2(2 * (q1 * q2 + q0 * q3), 1 - 2 * (q1 * q1 + q3 * q3))
|
113 |
+
elif order == 'zyx':
|
114 |
+
x = torch.atan2(2 * (q0 * q1 + q2 * q3), 1 - 2 * (q1 * q1 + q2 * q2))
|
115 |
+
y = torch.asin(torch.clamp(2 * (q0 * q2 - q1 * q3), -1 + epsilon, 1 - epsilon))
|
116 |
+
z = torch.atan2(2 * (q0 * q3 + q1 * q2), 1 - 2 * (q2 * q2 + q3 * q3))
|
117 |
+
else:
|
118 |
+
raise ValueError('Unknown Euler order: %s' % order)
|
119 |
+
|
120 |
+
if deg:
|
121 |
+
return torch.stack((x, y, z), dim=1).view(original_shape) * 180 / np.pi
|
122 |
+
else:
|
123 |
+
return torch.stack((x, y, z), dim=1).view(original_shape)
|
124 |
+
|
125 |
+
|
126 |
+
# Numpy-backed implementations
|
127 |
+
|
128 |
+
def qmul_np(q, r):
|
129 |
+
q = torch.from_numpy(q).contiguous().float()
|
130 |
+
r = torch.from_numpy(r).contiguous().float()
|
131 |
+
return qmul(q, r).numpy()
|
132 |
+
|
133 |
+
|
134 |
+
def qrot_np(q, v):
|
135 |
+
q = torch.from_numpy(q).contiguous().float()
|
136 |
+
v = torch.from_numpy(v).contiguous().float()
|
137 |
+
return qrot(q, v).numpy()
|
138 |
+
|
139 |
+
|
140 |
+
def qeuler_np(q, order, epsilon=0, use_gpu=False):
|
141 |
+
if use_gpu:
|
142 |
+
q = torch.from_numpy(q).cuda().float()
|
143 |
+
return qeuler(q, order, epsilon).cpu().numpy()
|
144 |
+
else:
|
145 |
+
q = torch.from_numpy(q).contiguous().float()
|
146 |
+
return qeuler(q, order, epsilon).numpy()
|
147 |
+
|
148 |
+
|
149 |
+
def qfix(q):
|
150 |
+
"""
|
151 |
+
Enforce quaternion continuity across the time dimension by selecting
|
152 |
+
the representation (q or -q) with minimal distance (or, equivalently, maximal dot product)
|
153 |
+
between two consecutive frames.
|
154 |
+
|
155 |
+
Expects a tensor of shape (L, J, 4), where L is the sequence length and J is the number of joints.
|
156 |
+
Returns a tensor of the same shape.
|
157 |
+
"""
|
158 |
+
assert len(q.shape) == 3
|
159 |
+
assert q.shape[-1] == 4
|
160 |
+
|
161 |
+
result = q.copy()
|
162 |
+
dot_products = np.sum(q[1:] * q[:-1], axis=2)
|
163 |
+
mask = dot_products < 0
|
164 |
+
mask = (np.cumsum(mask, axis=0) % 2).astype(bool)
|
165 |
+
result[1:][mask] *= -1
|
166 |
+
return result
|
167 |
+
|
168 |
+
|
169 |
+
def euler2quat(e, order, deg=True):
|
170 |
+
"""
|
171 |
+
Convert Euler angles to quaternions.
|
172 |
+
"""
|
173 |
+
assert e.shape[-1] == 3
|
174 |
+
|
175 |
+
original_shape = list(e.shape)
|
176 |
+
original_shape[-1] = 4
|
177 |
+
|
178 |
+
e = e.view(-1, 3)
|
179 |
+
|
180 |
+
## if euler angles in degrees
|
181 |
+
if deg:
|
182 |
+
e = e * np.pi / 180.
|
183 |
+
|
184 |
+
x = e[:, 0]
|
185 |
+
y = e[:, 1]
|
186 |
+
z = e[:, 2]
|
187 |
+
|
188 |
+
rx = torch.stack((torch.cos(x / 2), torch.sin(x / 2), torch.zeros_like(x), torch.zeros_like(x)), dim=1)
|
189 |
+
ry = torch.stack((torch.cos(y / 2), torch.zeros_like(y), torch.sin(y / 2), torch.zeros_like(y)), dim=1)
|
190 |
+
rz = torch.stack((torch.cos(z / 2), torch.zeros_like(z), torch.zeros_like(z), torch.sin(z / 2)), dim=1)
|
191 |
+
|
192 |
+
result = None
|
193 |
+
for coord in order:
|
194 |
+
if coord == 'x':
|
195 |
+
r = rx
|
196 |
+
elif coord == 'y':
|
197 |
+
r = ry
|
198 |
+
elif coord == 'z':
|
199 |
+
r = rz
|
200 |
+
else:
|
201 |
+
raise ValueError('Unknown axis in order: %s' % coord)
|
202 |
+
if result is None:
|
203 |
+
result = r
|
204 |
+
else:
|
205 |
+
result = qmul(result, r)
|
206 |
+
|
207 |
+
# Reverse antipodal representation to have a non-negative "w"
|
208 |
+
if order in ['xyz', 'yzx', 'zxy']:
|
209 |
+
result *= -1
|
210 |
+
|
211 |
+
return result.view(original_shape)
|
212 |
+
|
213 |
+
|
214 |
+
def expmap_to_quaternion(e):
|
215 |
+
"""
|
216 |
+
Convert axis-angle rotations (aka exponential maps) to quaternions.
|
217 |
+
Stable formula from "Practical Parameterization of Rotations Using the Exponential Map".
|
218 |
+
Expects a tensor of shape (*, 3), where * denotes any number of dimensions.
|
219 |
+
Returns a tensor of shape (*, 4).
|
220 |
+
"""
|
221 |
+
assert e.shape[-1] == 3
|
222 |
+
|
223 |
+
original_shape = list(e.shape)
|
224 |
+
original_shape[-1] = 4
|
225 |
+
e = e.reshape(-1, 3)
|
226 |
+
|
227 |
+
theta = np.linalg.norm(e, axis=1).reshape(-1, 1)
|
228 |
+
w = np.cos(0.5 * theta).reshape(-1, 1)
|
229 |
+
xyz = 0.5 * np.sinc(0.5 * theta / np.pi) * e
|
230 |
+
return np.concatenate((w, xyz), axis=1).reshape(original_shape)
|
231 |
+
|
232 |
+
|
233 |
+
def euler_to_quaternion(e, order):
|
234 |
+
"""
|
235 |
+
Convert Euler angles to quaternions.
|
236 |
+
"""
|
237 |
+
assert e.shape[-1] == 3
|
238 |
+
|
239 |
+
original_shape = list(e.shape)
|
240 |
+
original_shape[-1] = 4
|
241 |
+
|
242 |
+
e = e.reshape(-1, 3)
|
243 |
+
|
244 |
+
x = e[:, 0]
|
245 |
+
y = e[:, 1]
|
246 |
+
z = e[:, 2]
|
247 |
+
|
248 |
+
rx = np.stack((np.cos(x / 2), np.sin(x / 2), np.zeros_like(x), np.zeros_like(x)), axis=1)
|
249 |
+
ry = np.stack((np.cos(y / 2), np.zeros_like(y), np.sin(y / 2), np.zeros_like(y)), axis=1)
|
250 |
+
rz = np.stack((np.cos(z / 2), np.zeros_like(z), np.zeros_like(z), np.sin(z / 2)), axis=1)
|
251 |
+
|
252 |
+
result = None
|
253 |
+
for coord in order:
|
254 |
+
if coord == 'x':
|
255 |
+
r = rx
|
256 |
+
elif coord == 'y':
|
257 |
+
r = ry
|
258 |
+
elif coord == 'z':
|
259 |
+
r = rz
|
260 |
+
else:
|
261 |
+
raise ValueError('Unknown axis in order: %s' % coord)
|
262 |
+
if result is None:
|
263 |
+
result = r
|
264 |
+
else:
|
265 |
+
result = qmul_np(result, r)
|
266 |
+
|
267 |
+
# Reverse antipodal representation to have a non-negative "w"
|
268 |
+
if order in ['xyz', 'yzx', 'zxy']:
|
269 |
+
result *= -1
|
270 |
+
|
271 |
+
return result.reshape(original_shape)
|
272 |
+
|
273 |
+
|
274 |
+
def quaternion_to_matrix(quaternions):
|
275 |
+
"""
|
276 |
+
Convert rotations given as quaternions to rotation matrices.
|
277 |
+
Args:
|
278 |
+
quaternions: quaternions with real part first,
|
279 |
+
as tensor of shape (..., 4).
|
280 |
+
Returns:
|
281 |
+
Rotation matrices as tensor of shape (..., 3, 3).
|
282 |
+
"""
|
283 |
+
r, i, j, k = torch.unbind(quaternions, -1)
|
284 |
+
two_s = 2.0 / (quaternions * quaternions).sum(-1)
|
285 |
+
|
286 |
+
o = torch.stack(
|
287 |
+
(
|
288 |
+
1 - two_s * (j * j + k * k),
|
289 |
+
two_s * (i * j - k * r),
|
290 |
+
two_s * (i * k + j * r),
|
291 |
+
two_s * (i * j + k * r),
|
292 |
+
1 - two_s * (i * i + k * k),
|
293 |
+
two_s * (j * k - i * r),
|
294 |
+
two_s * (i * k - j * r),
|
295 |
+
two_s * (j * k + i * r),
|
296 |
+
1 - two_s * (i * i + j * j),
|
297 |
+
),
|
298 |
+
-1,
|
299 |
+
)
|
300 |
+
return o.reshape(quaternions.shape[:-1] + (3, 3))
|
301 |
+
|
302 |
+
|
303 |
+
def quaternion_to_matrix_np(quaternions):
|
304 |
+
q = torch.from_numpy(quaternions).contiguous().float()
|
305 |
+
return quaternion_to_matrix(q).numpy()
|
306 |
+
|
307 |
+
|
308 |
+
def quaternion_to_cont6d_np(quaternions):
|
309 |
+
rotation_mat = quaternion_to_matrix_np(quaternions)
|
310 |
+
cont_6d = np.concatenate([rotation_mat[..., 0], rotation_mat[..., 1]], axis=-1)
|
311 |
+
return cont_6d
|
312 |
+
|
313 |
+
|
314 |
+
def quaternion_to_cont6d(quaternions):
|
315 |
+
rotation_mat = quaternion_to_matrix(quaternions)
|
316 |
+
cont_6d = torch.cat([rotation_mat[..., 0], rotation_mat[..., 1]], dim=-1)
|
317 |
+
return cont_6d
|
318 |
+
|
319 |
+
|
320 |
+
def cont6d_to_matrix(cont6d):
|
321 |
+
assert cont6d.shape[-1] == 6, "The last dimension must be 6"
|
322 |
+
x_raw = cont6d[..., 0:3]
|
323 |
+
y_raw = cont6d[..., 3:6]
|
324 |
+
|
325 |
+
x = x_raw / torch.norm(x_raw, dim=-1, keepdim=True)
|
326 |
+
z = torch.cross(x, y_raw, dim=-1)
|
327 |
+
z = z / torch.norm(z, dim=-1, keepdim=True)
|
328 |
+
|
329 |
+
y = torch.cross(z, x, dim=-1)
|
330 |
+
|
331 |
+
x = x[..., None]
|
332 |
+
y = y[..., None]
|
333 |
+
z = z[..., None]
|
334 |
+
|
335 |
+
mat = torch.cat([x, y, z], dim=-1)
|
336 |
+
return mat
|
337 |
+
|
338 |
+
|
339 |
+
def cont6d_to_matrix_np(cont6d):
|
340 |
+
q = torch.from_numpy(cont6d).contiguous().float()
|
341 |
+
return cont6d_to_matrix(q).numpy()
|
342 |
+
|
343 |
+
|
344 |
+
def qpow(q0, t, dtype=torch.float):
|
345 |
+
''' q0 : tensor of quaternions
|
346 |
+
t: tensor of powers
|
347 |
+
'''
|
348 |
+
q0 = qnormalize(q0)
|
349 |
+
theta0 = torch.acos(q0[..., 0])
|
350 |
+
|
351 |
+
## if theta0 is close to zero, add epsilon to avoid NaNs
|
352 |
+
mask = ((theta0 <= 10e-10) & (theta0 >= -10e-10)).float()  # float mask so "1 - mask" below stays valid (bool tensors reject "-")
|
353 |
+
theta0 = (1 - mask) * theta0 + mask * 10e-10
|
354 |
+
v0 = q0[..., 1:] / torch.sin(theta0).view(-1, 1)
|
355 |
+
|
356 |
+
if isinstance(t, torch.Tensor):
|
357 |
+
q = torch.zeros(t.shape + q0.shape)
|
358 |
+
theta = t.view(-1, 1) * theta0.view(1, -1)
|
359 |
+
else: ## if t is a number
|
360 |
+
q = torch.zeros(q0.shape)
|
361 |
+
theta = t * theta0
|
362 |
+
|
363 |
+
q[..., 0] = torch.cos(theta)
|
364 |
+
q[..., 1:] = v0 * torch.sin(theta).unsqueeze(-1)
|
365 |
+
|
366 |
+
return q.to(dtype)
|
367 |
+
|
368 |
+
|
369 |
+
def qslerp(q0, q1, t):
|
370 |
+
'''
|
371 |
+
q0: starting quaternion
|
372 |
+
q1: ending quaternion
|
373 |
+
t: array of points along the way
|
374 |
+
|
375 |
+
Returns:
|
376 |
+
Tensor of Slerps: t.shape + q0.shape
|
377 |
+
'''
|
378 |
+
|
379 |
+
q0 = qnormalize(q0)
|
380 |
+
q1 = qnormalize(q1)
|
381 |
+
q_ = qpow(qmul(q1, qinv(q0)), t)
|
382 |
+
|
383 |
+
return qmul(q_,
|
384 |
+
q0.contiguous().view(torch.Size([1] * len(t.shape)) + q0.shape).expand(t.shape + q0.shape).contiguous())
|
385 |
+
|
386 |
+
|
387 |
+
def qbetween(v0, v1):
|
388 |
+
'''
|
389 |
+
find the quaternion used to rotate v0 to v1
|
390 |
+
'''
|
391 |
+
assert v0.shape[-1] == 3, 'v0 must be of the shape (*, 3)'
|
392 |
+
assert v1.shape[-1] == 3, 'v1 must be of the shape (*, 3)'
|
393 |
+
|
394 |
+
v = torch.cross(v0, v1)
|
395 |
+
w = torch.sqrt((v0 ** 2).sum(dim=-1, keepdim=True) * (v1 ** 2).sum(dim=-1, keepdim=True)) + (v0 * v1).sum(dim=-1,
|
396 |
+
keepdim=True)
|
397 |
+
return qnormalize(torch.cat([w, v], dim=-1))
|
398 |
+
|
399 |
+
|
400 |
+
def qbetween_np(v0, v1):
|
401 |
+
'''
|
402 |
+
find the quaternion used to rotate v0 to v1
|
403 |
+
'''
|
404 |
+
assert v0.shape[-1] == 3, 'v0 must be of the shape (*, 3)'
|
405 |
+
assert v1.shape[-1] == 3, 'v1 must be of the shape (*, 3)'
|
406 |
+
|
407 |
+
v0 = torch.from_numpy(v0).float()
|
408 |
+
v1 = torch.from_numpy(v1).float()
|
409 |
+
return qbetween(v0, v1).numpy()
|
410 |
+
|
411 |
+
|
412 |
+
def lerp(p0, p1, t):
|
413 |
+
if not isinstance(t, torch.Tensor):
|
414 |
+
t = torch.Tensor([t])
|
415 |
+
|
416 |
+
new_shape = t.shape + p0.shape
|
417 |
+
new_view_t = t.shape + torch.Size([1] * len(p0.shape))
|
418 |
+
new_view_p = torch.Size([1] * len(t.shape)) + p0.shape
|
419 |
+
p0 = p0.view(new_view_p).expand(new_shape)
|
420 |
+
p1 = p1.view(new_view_p).expand(new_shape)
|
421 |
+
t = t.view(new_view_t).expand(new_shape)
|
422 |
+
|
423 |
+
return p0 + t * (p1 - p0)
|
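A short, hedged usage sketch for the PyTorch quaternion helpers defined above. Everything it imports comes from common/quaternion.py; the sample quaternion values are arbitrary, and quaternions follow the (w, x, y, z) ordering assumed throughout the module.

import torch
from common.quaternion import qnormalize, qmul, qinv, qrot, qeuler, quaternion_to_cont6d

# Batched quaternions of shape (1, 4).
q0 = qnormalize(torch.tensor([[1.0, 0.0, 0.0, 0.0]]))        # identity
q1 = qnormalize(torch.tensor([[0.9239, 0.0, 0.3827, 0.0]]))  # roughly 45 degrees about Y

q = qmul(q1, q0)                       # compose the two rotations
v = torch.tensor([[1.0, 0.0, 0.0]])    # unit vector along X
print(qrot(q, v))                      # -> roughly [[0.7071, 0.0, -0.7071]]
print(qeuler(q, 'xyz'))                # Euler angles in degrees, roughly [[0., 45., 0.]]
print(qmul(q, qinv(q)))                # composing with the inverse gives the identity
print(quaternion_to_cont6d(q).shape)   # (1, 6) continuous 6-D rotation representation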
common/skeleton.py
ADDED
@@ -0,0 +1,199 @@
1 |
+
from common.quaternion import *
|
2 |
+
import scipy.ndimage.filters as filters
|
3 |
+
|
4 |
+
class Skeleton(object):
|
5 |
+
def __init__(self, offset, kinematic_tree, device):
|
6 |
+
self.device = device
|
7 |
+
self._raw_offset_np = offset.numpy()
|
8 |
+
self._raw_offset = offset.clone().detach().to(device).float()
|
9 |
+
self._kinematic_tree = kinematic_tree
|
10 |
+
self._offset = None
|
11 |
+
self._parents = [0] * len(self._raw_offset)
|
12 |
+
self._parents[0] = -1
|
13 |
+
for chain in self._kinematic_tree:
|
14 |
+
for j in range(1, len(chain)):
|
15 |
+
self._parents[chain[j]] = chain[j-1]
|
16 |
+
|
17 |
+
def njoints(self):
|
18 |
+
return len(self._raw_offset)
|
19 |
+
|
20 |
+
def offset(self):
|
21 |
+
return self._offset
|
22 |
+
|
23 |
+
def set_offset(self, offsets):
|
24 |
+
self._offset = offsets.clone().detach().to(self.device).float()
|
25 |
+
|
26 |
+
def kinematic_tree(self):
|
27 |
+
return self._kinematic_tree
|
28 |
+
|
29 |
+
def parents(self):
|
30 |
+
return self._parents
|
31 |
+
|
32 |
+
# joints (batch_size, joints_num, 3)
|
33 |
+
def get_offsets_joints_batch(self, joints):
|
34 |
+
assert len(joints.shape) == 3
|
35 |
+
_offsets = self._raw_offset.expand(joints.shape[0], -1, -1).clone()
|
36 |
+
for i in range(1, self._raw_offset.shape[0]):
|
37 |
+
_offsets[:, i] = torch.norm(joints[:, i] - joints[:, self._parents[i]], p=2, dim=1)[:, None] * _offsets[:, i]
|
38 |
+
|
39 |
+
self._offset = _offsets.detach()
|
40 |
+
return _offsets
|
41 |
+
|
42 |
+
# joints (joints_num, 3)
|
43 |
+
def get_offsets_joints(self, joints):
|
44 |
+
assert len(joints.shape) == 2
|
45 |
+
_offsets = self._raw_offset.clone()
|
46 |
+
for i in range(1, self._raw_offset.shape[0]):
|
47 |
+
# print(joints.shape)
|
48 |
+
_offsets[i] = torch.norm(joints[i] - joints[self._parents[i]], p=2, dim=0) * _offsets[i]
|
49 |
+
|
50 |
+
self._offset = _offsets.detach()
|
51 |
+
return _offsets
|
52 |
+
|
53 |
+
# face_joint_idx should follow the order of right hip, left hip, right shoulder, left shoulder
|
54 |
+
# joints (batch_size, joints_num, 3)
|
55 |
+
def inverse_kinematics_np(self, joints, face_joint_idx, smooth_forward=False):
|
56 |
+
assert len(face_joint_idx) == 4
|
57 |
+
'''Get Forward Direction'''
|
58 |
+
l_hip, r_hip, sdr_r, sdr_l = face_joint_idx
|
59 |
+
across1 = joints[:, r_hip] - joints[:, l_hip]
|
60 |
+
across2 = joints[:, sdr_r] - joints[:, sdr_l]
|
61 |
+
across = across1 + across2
|
62 |
+
across = across / np.sqrt((across**2).sum(axis=-1))[:, np.newaxis]
|
63 |
+
# print(across1.shape, across2.shape)
|
64 |
+
|
65 |
+
# forward (batch_size, 3)
|
66 |
+
forward = np.cross(np.array([[0, 1, 0]]), across, axis=-1)
|
67 |
+
if smooth_forward:
|
68 |
+
forward = filters.gaussian_filter1d(forward, 20, axis=0, mode='nearest')
|
69 |
+
# forward (batch_size, 3)
|
70 |
+
forward = forward / np.sqrt((forward**2).sum(axis=-1))[..., np.newaxis]
|
71 |
+
|
72 |
+
'''Get Root Rotation'''
|
73 |
+
target = np.array([[0,0,1]]).repeat(len(forward), axis=0)
|
74 |
+
root_quat = qbetween_np(forward, target)
|
75 |
+
|
76 |
+
'''Inverse Kinematics'''
|
77 |
+
# quat_params (batch_size, joints_num, 4)
|
78 |
+
# print(joints.shape[:-1])
|
79 |
+
quat_params = np.zeros(joints.shape[:-1] + (4,))
|
80 |
+
# print(quat_params.shape)
|
81 |
+
root_quat[0] = np.array([[1.0, 0.0, 0.0, 0.0]])
|
82 |
+
quat_params[:, 0] = root_quat
|
83 |
+
# quat_params[0, 0] = np.array([[1.0, 0.0, 0.0, 0.0]])
|
84 |
+
for chain in self._kinematic_tree:
|
85 |
+
R = root_quat
|
86 |
+
for j in range(len(chain) - 1):
|
87 |
+
# (batch, 3)
|
88 |
+
u = self._raw_offset_np[chain[j+1]][np.newaxis,...].repeat(len(joints), axis=0)
|
89 |
+
# print(u.shape)
|
90 |
+
# (batch, 3)
|
91 |
+
v = joints[:, chain[j+1]] - joints[:, chain[j]]
|
92 |
+
v = v / np.sqrt((v**2).sum(axis=-1))[:, np.newaxis]
|
93 |
+
# print(u.shape, v.shape)
|
94 |
+
rot_u_v = qbetween_np(u, v)
|
95 |
+
|
96 |
+
R_loc = qmul_np(qinv_np(R), rot_u_v)
|
97 |
+
|
98 |
+
quat_params[:,chain[j + 1], :] = R_loc
|
99 |
+
R = qmul_np(R, R_loc)
|
100 |
+
|
101 |
+
return quat_params
|
102 |
+
|
103 |
+
# Be sure root joint is at the beginning of kinematic chains
|
104 |
+
def forward_kinematics(self, quat_params, root_pos, skel_joints=None, do_root_R=True):
|
105 |
+
# quat_params (batch_size, joints_num, 4)
|
106 |
+
# joints (batch_size, joints_num, 3)
|
107 |
+
# root_pos (batch_size, 3)
|
108 |
+
if skel_joints is not None:
|
109 |
+
offsets = self.get_offsets_joints_batch(skel_joints)
|
110 |
+
if len(self._offset.shape) == 2:
|
111 |
+
offsets = self._offset.expand(quat_params.shape[0], -1, -1)
|
112 |
+
joints = torch.zeros(quat_params.shape[:-1] + (3,)).to(self.device)
|
113 |
+
joints[:, 0] = root_pos
|
114 |
+
for chain in self._kinematic_tree:
|
115 |
+
if do_root_R:
|
116 |
+
R = quat_params[:, 0]
|
117 |
+
else:
|
118 |
+
R = torch.tensor([[1.0, 0.0, 0.0, 0.0]]).expand(len(quat_params), -1).detach().to(self.device)
|
119 |
+
for i in range(1, len(chain)):
|
120 |
+
R = qmul(R, quat_params[:, chain[i]])
|
121 |
+
offset_vec = offsets[:, chain[i]]
|
122 |
+
joints[:, chain[i]] = qrot(R, offset_vec) + joints[:, chain[i-1]]
|
123 |
+
return joints
|
124 |
+
|
125 |
+
# Be sure root joint is at the beginning of kinematic chains
|
126 |
+
def forward_kinematics_np(self, quat_params, root_pos, skel_joints=None, do_root_R=True):
|
127 |
+
# quat_params (batch_size, joints_num, 4)
|
128 |
+
# joints (batch_size, joints_num, 3)
|
129 |
+
# root_pos (batch_size, 3)
|
130 |
+
if skel_joints is not None:
|
131 |
+
skel_joints = torch.from_numpy(skel_joints)
|
132 |
+
offsets = self.get_offsets_joints_batch(skel_joints)
|
133 |
+
if len(self._offset.shape) == 2:
|
134 |
+
offsets = self._offset.expand(quat_params.shape[0], -1, -1)
|
135 |
+
offsets = offsets.numpy()
|
136 |
+
joints = np.zeros(quat_params.shape[:-1] + (3,))
|
137 |
+
joints[:, 0] = root_pos
|
138 |
+
for chain in self._kinematic_tree:
|
139 |
+
if do_root_R:
|
140 |
+
R = quat_params[:, 0]
|
141 |
+
else:
|
142 |
+
R = np.array([[1.0, 0.0, 0.0, 0.0]]).repeat(len(quat_params), axis=0)
|
143 |
+
for i in range(1, len(chain)):
|
144 |
+
R = qmul_np(R, quat_params[:, chain[i]])
|
145 |
+
offset_vec = offsets[:, chain[i]]
|
146 |
+
joints[:, chain[i]] = qrot_np(R, offset_vec) + joints[:, chain[i - 1]]
|
147 |
+
return joints
|
148 |
+
|
149 |
+
def forward_kinematics_cont6d_np(self, cont6d_params, root_pos, skel_joints=None, do_root_R=True):
|
150 |
+
# cont6d_params (batch_size, joints_num, 6)
|
151 |
+
# joints (batch_size, joints_num, 3)
|
152 |
+
# root_pos (batch_size, 3)
|
153 |
+
if skel_joints is not None:
|
154 |
+
skel_joints = torch.from_numpy(skel_joints)
|
155 |
+
offsets = self.get_offsets_joints_batch(skel_joints)
|
156 |
+
if len(self._offset.shape) == 2:
|
157 |
+
offsets = self._offset.expand(cont6d_params.shape[0], -1, -1)
|
158 |
+
offsets = offsets.numpy()
|
159 |
+
joints = np.zeros(cont6d_params.shape[:-1] + (3,))
|
160 |
+
joints[:, 0] = root_pos
|
161 |
+
for chain in self._kinematic_tree:
|
162 |
+
if do_root_R:
|
163 |
+
matR = cont6d_to_matrix_np(cont6d_params[:, 0])
|
164 |
+
else:
|
165 |
+
matR = np.eye(3)[np.newaxis, :].repeat(len(cont6d_params), axis=0)
|
166 |
+
for i in range(1, len(chain)):
|
167 |
+
matR = np.matmul(matR, cont6d_to_matrix_np(cont6d_params[:, chain[i]]))
|
168 |
+
offset_vec = offsets[:, chain[i]][..., np.newaxis]
|
169 |
+
# print(matR.shape, offset_vec.shape)
|
170 |
+
joints[:, chain[i]] = np.matmul(matR, offset_vec).squeeze(-1) + joints[:, chain[i-1]]
|
171 |
+
return joints
|
172 |
+
|
173 |
+
def forward_kinematics_cont6d(self, cont6d_params, root_pos, skel_joints=None, do_root_R=True):
|
174 |
+
# cont6d_params (batch_size, joints_num, 6)
|
175 |
+
# joints (batch_size, joints_num, 3)
|
176 |
+
# root_pos (batch_size, 3)
|
177 |
+
if skel_joints is not None:
|
178 |
+
# skel_joints = torch.from_numpy(skel_joints)
|
179 |
+
offsets = self.get_offsets_joints_batch(skel_joints)
|
180 |
+
if len(self._offset.shape) == 2:
|
181 |
+
offsets = self._offset.expand(cont6d_params.shape[0], -1, -1)
|
182 |
+
joints = torch.zeros(cont6d_params.shape[:-1] + (3,)).to(cont6d_params.device)
|
183 |
+
joints[..., 0, :] = root_pos
|
184 |
+
for chain in self._kinematic_tree:
|
185 |
+
if do_root_R:
|
186 |
+
matR = cont6d_to_matrix(cont6d_params[:, 0])
|
187 |
+
else:
|
188 |
+
matR = torch.eye(3).expand((len(cont6d_params), -1, -1)).detach().to(cont6d_params.device)
|
189 |
+
for i in range(1, len(chain)):
|
190 |
+
matR = torch.matmul(matR, cont6d_to_matrix(cont6d_params[:, chain[i]]))
|
191 |
+
offset_vec = offsets[:, chain[i]].unsqueeze(-1)
|
192 |
+
# print(matR.shape, offset_vec.shape)
|
193 |
+
joints[:, chain[i]] = torch.matmul(matR, offset_vec).squeeze(-1) + joints[:, chain[i-1]]
|
194 |
+
return joints
|
195 |
+
|
196 |
+
|
197 |
+
|
198 |
+
|
199 |
+
|
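A hedged sketch of the intended round trip with the Skeleton class above: estimate per-joint quaternions from joint positions with inverse_kinematics_np, then rebuild positions with forward_kinematics_np. Assumptions: t2m_raw_offsets exists in utils.paramUtil alongside t2m_kinematic_chain (the latter is imported from there by edit_t2m.py later in this commit); example_joints.npy is a hypothetical file of shape (n_frames, 22, 3); and the face_joint_idx values follow the usual HumanML3D 22-joint indexing (right hip, left hip, right shoulder, left shoulder).

import numpy as np
import torch
from common.skeleton import Skeleton
# Assumption: these constants live in utils.paramUtil, as in the HumanML3D tooling.
from utils.paramUtil import t2m_raw_offsets, t2m_kinematic_chain

joints = np.load("example_joints.npy")  # hypothetical joints array, shape (n_frames, 22, 3)

skel = Skeleton(torch.from_numpy(t2m_raw_offsets), t2m_kinematic_chain, device="cpu")
skel.get_offsets_joints(torch.from_numpy(joints[0]).float())  # bone lengths from the first frame

face_joint_idx = [2, 1, 17, 16]  # assumed HumanML3D joint indices
quat_params = skel.inverse_kinematics_np(joints, face_joint_idx, smooth_forward=True)

# Reconstruct joint positions from the recovered rotations and the root trajectory.
rec_joints = skel.forward_kinematics_np(quat_params, joints[:, 0])
print(np.abs(rec_joints - joints).mean())  # reconstruction error; small if offsets match the data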
data/__init__.py
ADDED
File without changes
|
data/t2m_dataset.py
ADDED
@@ -0,0 +1,348 @@
1 |
+
from os.path import join as pjoin
|
2 |
+
import torch
|
3 |
+
from torch.utils import data
|
4 |
+
import numpy as np
|
5 |
+
from tqdm import tqdm
|
6 |
+
from torch.utils.data._utils.collate import default_collate
|
7 |
+
import random
|
8 |
+
import codecs as cs
|
9 |
+
|
10 |
+
|
11 |
+
def collate_fn(batch):
|
12 |
+
batch.sort(key=lambda x: x[3], reverse=True)
|
13 |
+
return default_collate(batch)
|
14 |
+
|
15 |
+
class MotionDataset(data.Dataset):
|
16 |
+
def __init__(self, opt, mean, std, split_file):
|
17 |
+
self.opt = opt
|
18 |
+
joints_num = opt.joints_num
|
19 |
+
|
20 |
+
self.data = []
|
21 |
+
self.lengths = []
|
22 |
+
id_list = []
|
23 |
+
with open(split_file, 'r') as f:
|
24 |
+
for line in f.readlines():
|
25 |
+
id_list.append(line.strip())
|
26 |
+
|
27 |
+
for name in tqdm(id_list):
|
28 |
+
try:
|
29 |
+
motion = np.load(pjoin(opt.motion_dir, name + '.npy'))
|
30 |
+
if motion.shape[0] < opt.window_size:
|
31 |
+
continue
|
32 |
+
self.lengths.append(motion.shape[0] - opt.window_size)
|
33 |
+
self.data.append(motion)
|
34 |
+
except Exception as e:
|
35 |
+
# Some motion may not exist in KIT dataset
|
36 |
+
print(e)
|
37 |
+
pass
|
38 |
+
|
39 |
+
self.cumsum = np.cumsum([0] + self.lengths)
|
40 |
+
|
41 |
+
if opt.is_train:
|
42 |
+
# root_rot_velocity (B, seq_len, 1)
|
43 |
+
std[0:1] = std[0:1] / opt.feat_bias
|
44 |
+
# root_linear_velocity (B, seq_len, 2)
|
45 |
+
std[1:3] = std[1:3] / opt.feat_bias
|
46 |
+
# root_y (B, seq_len, 1)
|
47 |
+
std[3:4] = std[3:4] / opt.feat_bias
|
48 |
+
# ric_data (B, seq_len, (joint_num - 1)*3)
|
49 |
+
std[4: 4 + (joints_num - 1) * 3] = std[4: 4 + (joints_num - 1) * 3] / 1.0
|
50 |
+
# rot_data (B, seq_len, (joint_num - 1)*6)
|
51 |
+
std[4 + (joints_num - 1) * 3: 4 + (joints_num - 1) * 9] = std[4 + (joints_num - 1) * 3: 4 + (
|
52 |
+
joints_num - 1) * 9] / 1.0
|
53 |
+
# local_velocity (B, seq_len, joint_num*3)
|
54 |
+
std[4 + (joints_num - 1) * 9: 4 + (joints_num - 1) * 9 + joints_num * 3] = std[
|
55 |
+
4 + (joints_num - 1) * 9: 4 + (
|
56 |
+
joints_num - 1) * 9 + joints_num * 3] / 1.0
|
57 |
+
# foot contact (B, seq_len, 4)
|
58 |
+
std[4 + (joints_num - 1) * 9 + joints_num * 3:] = std[
|
59 |
+
4 + (
|
60 |
+
joints_num - 1) * 9 + joints_num * 3:] / opt.feat_bias
|
61 |
+
|
62 |
+
assert 4 + (joints_num - 1) * 9 + joints_num * 3 + 4 == mean.shape[-1]
|
63 |
+
np.save(pjoin(opt.meta_dir, 'mean.npy'), mean)
|
64 |
+
np.save(pjoin(opt.meta_dir, 'std.npy'), std)
|
65 |
+
|
66 |
+
self.mean = mean
|
67 |
+
self.std = std
|
68 |
+
print("Total number of motions {}, snippets {}".format(len(self.data), self.cumsum[-1]))
|
69 |
+
|
70 |
+
def inv_transform(self, data):
|
71 |
+
return data * self.std + self.mean
|
72 |
+
|
73 |
+
def __len__(self):
|
74 |
+
return self.cumsum[-1]
|
75 |
+
|
76 |
+
def __getitem__(self, item):
|
77 |
+
if item != 0:
|
78 |
+
motion_id = np.searchsorted(self.cumsum, item) - 1
|
79 |
+
idx = item - self.cumsum[motion_id] - 1
|
80 |
+
else:
|
81 |
+
motion_id = 0
|
82 |
+
idx = 0
|
83 |
+
motion = self.data[motion_id][idx:idx + self.opt.window_size]
|
84 |
+
"Z Normalization"
|
85 |
+
motion = (motion - self.mean) / self.std
|
86 |
+
|
87 |
+
return motion
|
88 |
+
|
89 |
+
|
90 |
+
class Text2MotionDatasetEval(data.Dataset):
|
91 |
+
def __init__(self, opt, mean, std, split_file, w_vectorizer):
|
92 |
+
self.opt = opt
|
93 |
+
self.w_vectorizer = w_vectorizer
|
94 |
+
self.max_length = 20
|
95 |
+
self.pointer = 0
|
96 |
+
self.max_motion_length = opt.max_motion_length
|
97 |
+
min_motion_len = 40 if self.opt.dataset_name =='t2m' else 24
|
98 |
+
|
99 |
+
data_dict = {}
|
100 |
+
id_list = []
|
101 |
+
with cs.open(split_file, 'r') as f:
|
102 |
+
for line in f.readlines():
|
103 |
+
id_list.append(line.strip())
|
104 |
+
# id_list = id_list[:250]
|
105 |
+
|
106 |
+
new_name_list = []
|
107 |
+
length_list = []
|
108 |
+
for name in tqdm(id_list):
|
109 |
+
try:
|
110 |
+
motion = np.load(pjoin(opt.motion_dir, name + '.npy'))
|
111 |
+
if (len(motion)) < min_motion_len or (len(motion) >= 200):
|
112 |
+
continue
|
113 |
+
text_data = []
|
114 |
+
flag = False
|
115 |
+
with cs.open(pjoin(opt.text_dir, name + '.txt')) as f:
|
116 |
+
for line in f.readlines():
|
117 |
+
text_dict = {}
|
118 |
+
line_split = line.strip().split('#')
|
119 |
+
caption = line_split[0]
|
120 |
+
tokens = line_split[1].split(' ')
|
121 |
+
f_tag = float(line_split[2])
|
122 |
+
to_tag = float(line_split[3])
|
123 |
+
f_tag = 0.0 if np.isnan(f_tag) else f_tag
|
124 |
+
to_tag = 0.0 if np.isnan(to_tag) else to_tag
|
125 |
+
|
126 |
+
text_dict['caption'] = caption
|
127 |
+
text_dict['tokens'] = tokens
|
128 |
+
if f_tag == 0.0 and to_tag == 0.0:
|
129 |
+
flag = True
|
130 |
+
text_data.append(text_dict)
|
131 |
+
else:
|
132 |
+
try:
|
133 |
+
n_motion = motion[int(f_tag*20) : int(to_tag*20)]
|
134 |
+
if (len(n_motion)) < min_motion_len or (len(n_motion) >= 200):
|
135 |
+
continue
|
136 |
+
new_name = random.choice('ABCDEFGHIJKLMNOPQRSTUVW') + '_' + name
|
137 |
+
while new_name in data_dict:
|
138 |
+
new_name = random.choice('ABCDEFGHIJKLMNOPQRSTUVW') + '_' + name
|
139 |
+
data_dict[new_name] = {'motion': n_motion,
|
140 |
+
'length': len(n_motion),
|
141 |
+
'text':[text_dict]}
|
142 |
+
new_name_list.append(new_name)
|
143 |
+
length_list.append(len(n_motion))
|
144 |
+
except:
|
145 |
+
print(line_split)
|
146 |
+
print(line_split[2], line_split[3], f_tag, to_tag, name)
|
147 |
+
# break
|
148 |
+
|
149 |
+
if flag:
|
150 |
+
data_dict[name] = {'motion': motion,
|
151 |
+
'length': len(motion),
|
152 |
+
'text': text_data}
|
153 |
+
new_name_list.append(name)
|
154 |
+
length_list.append(len(motion))
|
155 |
+
except:
|
156 |
+
pass
|
157 |
+
|
158 |
+
name_list, length_list = zip(*sorted(zip(new_name_list, length_list), key=lambda x: x[1]))
|
159 |
+
|
160 |
+
self.mean = mean
|
161 |
+
self.std = std
|
162 |
+
self.length_arr = np.array(length_list)
|
163 |
+
self.data_dict = data_dict
|
164 |
+
self.name_list = name_list
|
165 |
+
self.reset_max_len(self.max_length)
|
166 |
+
|
167 |
+
def reset_max_len(self, length):
|
168 |
+
assert length <= self.max_motion_length
|
169 |
+
self.pointer = np.searchsorted(self.length_arr, length)
|
170 |
+
print("Pointer Pointing at %d"%self.pointer)
|
171 |
+
self.max_length = length
|
172 |
+
|
173 |
+
def inv_transform(self, data):
|
174 |
+
return data * self.std + self.mean
|
175 |
+
|
176 |
+
def __len__(self):
|
177 |
+
return len(self.data_dict) - self.pointer
|
178 |
+
|
179 |
+
def __getitem__(self, item):
|
180 |
+
idx = self.pointer + item
|
181 |
+
data = self.data_dict[self.name_list[idx]]
|
182 |
+
motion, m_length, text_list = data['motion'], data['length'], data['text']
|
183 |
+
# Randomly select a caption
|
184 |
+
text_data = random.choice(text_list)
|
185 |
+
caption, tokens = text_data['caption'], text_data['tokens']
|
186 |
+
|
187 |
+
if len(tokens) < self.opt.max_text_len:
|
188 |
+
# pad with "unk"
|
189 |
+
tokens = ['sos/OTHER'] + tokens + ['eos/OTHER']
|
190 |
+
sent_len = len(tokens)
|
191 |
+
tokens = tokens + ['unk/OTHER'] * (self.opt.max_text_len + 2 - sent_len)
|
192 |
+
else:
|
193 |
+
# crop
|
194 |
+
tokens = tokens[:self.opt.max_text_len]
|
195 |
+
tokens = ['sos/OTHER'] + tokens + ['eos/OTHER']
|
196 |
+
sent_len = len(tokens)
|
197 |
+
pos_one_hots = []
|
198 |
+
word_embeddings = []
|
199 |
+
for token in tokens:
|
200 |
+
word_emb, pos_oh = self.w_vectorizer[token]
|
201 |
+
pos_one_hots.append(pos_oh[None, :])
|
202 |
+
word_embeddings.append(word_emb[None, :])
|
203 |
+
pos_one_hots = np.concatenate(pos_one_hots, axis=0)
|
204 |
+
word_embeddings = np.concatenate(word_embeddings, axis=0)
|
205 |
+
|
206 |
+
if self.opt.unit_length < 10:
|
207 |
+
coin2 = np.random.choice(['single', 'single', 'double'])
|
208 |
+
else:
|
209 |
+
coin2 = 'single'
|
210 |
+
|
211 |
+
if coin2 == 'double':
|
212 |
+
m_length = (m_length // self.opt.unit_length - 1) * self.opt.unit_length
|
213 |
+
elif coin2 == 'single':
|
214 |
+
m_length = (m_length // self.opt.unit_length) * self.opt.unit_length
|
215 |
+
idx = random.randint(0, len(motion) - m_length)
|
216 |
+
motion = motion[idx:idx+m_length]
|
217 |
+
|
218 |
+
"Z Normalization"
|
219 |
+
motion = (motion - self.mean) / self.std
|
220 |
+
|
221 |
+
if m_length < self.max_motion_length:
|
222 |
+
motion = np.concatenate([motion,
|
223 |
+
np.zeros((self.max_motion_length - m_length, motion.shape[1]))
|
224 |
+
], axis=0)
|
225 |
+
# print(word_embeddings.shape, motion.shape)
|
226 |
+
# print(tokens)
|
227 |
+
return word_embeddings, pos_one_hots, caption, sent_len, motion, m_length, '_'.join(tokens)
|
228 |
+
|
229 |
+
|
230 |
+
class Text2MotionDataset(data.Dataset):
|
231 |
+
def __init__(self, opt, mean, std, split_file):
|
232 |
+
self.opt = opt
|
233 |
+
self.max_length = 20
|
234 |
+
self.pointer = 0
|
235 |
+
self.max_motion_length = opt.max_motion_length
|
236 |
+
min_motion_len = 40 if self.opt.dataset_name =='t2m' else 24
|
237 |
+
|
238 |
+
data_dict = {}
|
239 |
+
id_list = []
|
240 |
+
with cs.open(split_file, 'r') as f:
|
241 |
+
for line in f.readlines():
|
242 |
+
id_list.append(line.strip())
|
243 |
+
# id_list = id_list[:250]
|
244 |
+
|
245 |
+
new_name_list = []
|
246 |
+
length_list = []
|
247 |
+
for name in tqdm(id_list):
|
248 |
+
try:
|
249 |
+
motion = np.load(pjoin(opt.motion_dir, name + '.npy'))
|
250 |
+
if (len(motion)) < min_motion_len or (len(motion) >= 200):
|
251 |
+
continue
|
252 |
+
text_data = []
|
253 |
+
flag = False
|
254 |
+
with cs.open(pjoin(opt.text_dir, name + '.txt')) as f:
|
255 |
+
for line in f.readlines():
|
256 |
+
text_dict = {}
|
257 |
+
line_split = line.strip().split('#')
|
258 |
+
# print(line)
|
259 |
+
caption = line_split[0]
|
260 |
+
tokens = line_split[1].split(' ')
|
261 |
+
f_tag = float(line_split[2])
|
262 |
+
to_tag = float(line_split[3])
|
263 |
+
f_tag = 0.0 if np.isnan(f_tag) else f_tag
|
264 |
+
to_tag = 0.0 if np.isnan(to_tag) else to_tag
|
265 |
+
|
266 |
+
text_dict['caption'] = caption
|
267 |
+
text_dict['tokens'] = tokens
|
268 |
+
if f_tag == 0.0 and to_tag == 0.0:
|
269 |
+
flag = True
|
270 |
+
text_data.append(text_dict)
|
271 |
+
else:
|
272 |
+
try:
|
273 |
+
n_motion = motion[int(f_tag*20) : int(to_tag*20)]
|
274 |
+
if (len(n_motion)) < min_motion_len or (len(n_motion) >= 200):
|
275 |
+
continue
|
276 |
+
new_name = random.choice('ABCDEFGHIJKLMNOPQRSTUVW') + '_' + name
|
277 |
+
while new_name in data_dict:
|
278 |
+
new_name = random.choice('ABCDEFGHIJKLMNOPQRSTUVW') + '_' + name
|
279 |
+
data_dict[new_name] = {'motion': n_motion,
|
280 |
+
'length': len(n_motion),
|
281 |
+
'text':[text_dict]}
|
282 |
+
new_name_list.append(new_name)
|
283 |
+
length_list.append(len(n_motion))
|
284 |
+
except:
|
285 |
+
print(line_split)
|
286 |
+
print(line_split[2], line_split[3], f_tag, to_tag, name)
|
287 |
+
# break
|
288 |
+
|
289 |
+
if flag:
|
290 |
+
data_dict[name] = {'motion': motion,
|
291 |
+
'length': len(motion),
|
292 |
+
'text': text_data}
|
293 |
+
new_name_list.append(name)
|
294 |
+
length_list.append(len(motion))
|
295 |
+
except Exception as e:
|
296 |
+
# print(e)
|
297 |
+
pass
|
298 |
+
|
299 |
+
# name_list, length_list = zip(*sorted(zip(new_name_list, length_list), key=lambda x: x[1]))
|
300 |
+
name_list, length_list = new_name_list, length_list
|
301 |
+
|
302 |
+
self.mean = mean
|
303 |
+
self.std = std
|
304 |
+
self.length_arr = np.array(length_list)
|
305 |
+
self.data_dict = data_dict
|
306 |
+
self.name_list = name_list
|
307 |
+
|
308 |
+
def inv_transform(self, data):
|
309 |
+
return data * self.std + self.mean
|
310 |
+
|
311 |
+
def __len__(self):
|
312 |
+
return len(self.data_dict) - self.pointer
|
313 |
+
|
314 |
+
def __getitem__(self, item):
|
315 |
+
idx = self.pointer + item
|
316 |
+
data = self.data_dict[self.name_list[idx]]
|
317 |
+
motion, m_length, text_list = data['motion'], data['length'], data['text']
|
318 |
+
# Randomly select a caption
|
319 |
+
text_data = random.choice(text_list)
|
320 |
+
caption, tokens = text_data['caption'], text_data['tokens']
|
321 |
+
|
322 |
+
if self.opt.unit_length < 10:
|
323 |
+
coin2 = np.random.choice(['single', 'single', 'double'])
|
324 |
+
else:
|
325 |
+
coin2 = 'single'
|
326 |
+
|
327 |
+
if coin2 == 'double':
|
328 |
+
m_length = (m_length // self.opt.unit_length - 1) * self.opt.unit_length
|
329 |
+
elif coin2 == 'single':
|
330 |
+
m_length = (m_length // self.opt.unit_length) * self.opt.unit_length
|
331 |
+
idx = random.randint(0, len(motion) - m_length)
|
332 |
+
motion = motion[idx:idx+m_length]
|
333 |
+
|
334 |
+
"Z Normalization"
|
335 |
+
motion = (motion - self.mean) / self.std
|
336 |
+
|
337 |
+
if m_length < self.max_motion_length:
|
338 |
+
motion = np.concatenate([motion,
|
339 |
+
np.zeros((self.max_motion_length - m_length, motion.shape[1]))
|
340 |
+
], axis=0)
|
341 |
+
# print(word_embeddings.shape, motion.shape)
|
342 |
+
# print(tokens)
|
343 |
+
return caption, motion, m_length
|
344 |
+
|
345 |
+
def reset_min_len(self, length):
|
346 |
+
assert length <= self.max_motion_length
|
347 |
+
self.pointer = np.searchsorted(self.length_arr, length)
|
348 |
+
print("Pointer Pointing at %d" % self.pointer)
|
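A hedged sketch of wiring Text2MotionDataset (defined above) into a standard PyTorch DataLoader. The opt fields mirror the attributes the class actually reads (dataset_name, motion_dir, text_dir, max_motion_length, unit_length); the concrete directory layout, Mean.npy/Std.npy files, and train.txt split name are assumptions about a HumanML3D-style dataset copy (only 'new_joint_vecs' is confirmed by a comment in edit_t2m.py below).

from argparse import Namespace
from os.path import join as pjoin

import numpy as np
from torch.utils.data import DataLoader

from data.t2m_dataset import Text2MotionDataset

# Assumed HumanML3D-style layout; adjust to your local dataset copy.
data_root = "./dataset/HumanML3D"
opt = Namespace(
    dataset_name="t2m",
    motion_dir=pjoin(data_root, "new_joint_vecs"),
    text_dir=pjoin(data_root, "texts"),
    max_motion_length=196,
    unit_length=4,
)

mean = np.load(pjoin(data_root, "Mean.npy"))  # assumption: precomputed normalization stats
std = np.load(pjoin(data_root, "Std.npy"))

dataset = Text2MotionDataset(opt, mean, std, pjoin(data_root, "train.txt"))
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4, drop_last=True)

caption, motion, m_length = next(iter(loader))
print(len(caption), motion.shape, m_length.shape)  # e.g. 32, (32, 196, 263), (32,)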
dataset/__init__.py
ADDED
File without changes
|
edit_t2m.py
ADDED
@@ -0,0 +1,195 @@
1 |
+
import os
|
2 |
+
from os.path import join as pjoin
|
3 |
+
|
4 |
+
import torch
|
5 |
+
import torch.nn.functional as F
|
6 |
+
|
7 |
+
from models.mask_transformer.transformer import MaskTransformer, ResidualTransformer
|
8 |
+
from models.vq.model import RVQVAE, LengthEstimator
|
9 |
+
|
10 |
+
from options.eval_option import EvalT2MOptions
|
11 |
+
from utils.get_opt import get_opt
|
12 |
+
|
13 |
+
from utils.fixseed import fixseed
|
14 |
+
from visualization.joints2bvh import Joint2BVHConvertor
|
15 |
+
|
16 |
+
from utils.motion_process import recover_from_ric
|
17 |
+
from utils.plot_script import plot_3d_motion
|
18 |
+
|
19 |
+
from utils.paramUtil import t2m_kinematic_chain
|
20 |
+
|
21 |
+
import numpy as np
|
22 |
+
|
23 |
+
from gen_t2m import load_vq_model, load_res_model, load_trans_model
|
24 |
+
|
25 |
+
if __name__ == '__main__':
|
26 |
+
parser = EvalT2MOptions()
|
27 |
+
opt = parser.parse()
|
28 |
+
fixseed(opt.seed)
|
29 |
+
|
30 |
+
opt.device = torch.device("cpu" if opt.gpu_id == -1 else "cuda:" + str(opt.gpu_id))
|
31 |
+
torch.autograd.set_detect_anomaly(True)
|
32 |
+
|
33 |
+
dim_pose = 251 if opt.dataset_name == 'kit' else 263
|
34 |
+
|
35 |
+
root_dir = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.name)
|
36 |
+
model_dir = pjoin(root_dir, 'model')
|
37 |
+
result_dir = pjoin('./editing', opt.ext)
|
38 |
+
joints_dir = pjoin(result_dir, 'joints')
|
39 |
+
animation_dir = pjoin(result_dir, 'animations')
|
40 |
+
os.makedirs(joints_dir, exist_ok=True)
|
41 |
+
os.makedirs(animation_dir,exist_ok=True)
|
42 |
+
|
43 |
+
model_opt_path = pjoin(root_dir, 'opt.txt')
|
44 |
+
model_opt = get_opt(model_opt_path, device=opt.device)
|
45 |
+
|
46 |
+
#######################
|
47 |
+
######Loading RVQ######
|
48 |
+
#######################
|
49 |
+
vq_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'opt.txt')
|
50 |
+
vq_opt = get_opt(vq_opt_path, device=opt.device)
|
51 |
+
vq_opt.dim_pose = dim_pose
|
52 |
+
vq_model, vq_opt = load_vq_model(vq_opt)
|
53 |
+
|
54 |
+
model_opt.num_tokens = vq_opt.nb_code
|
55 |
+
model_opt.num_quantizers = vq_opt.num_quantizers
|
56 |
+
model_opt.code_dim = vq_opt.code_dim
|
57 |
+
|
58 |
+
#################################
|
59 |
+
######Loading R-Transformer######
|
60 |
+
#################################
|
61 |
+
res_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.res_name, 'opt.txt')
|
62 |
+
res_opt = get_opt(res_opt_path, device=opt.device)
|
63 |
+
res_model = load_res_model(res_opt, vq_opt, opt)
|
64 |
+
|
65 |
+
assert res_opt.vq_name == model_opt.vq_name
|
66 |
+
|
67 |
+
#################################
|
68 |
+
######Loading M-Transformer######
|
69 |
+
#################################
|
70 |
+
t2m_transformer = load_trans_model(model_opt, opt, 'latest.tar')
|
71 |
+
|
72 |
+
t2m_transformer.eval()
|
73 |
+
vq_model.eval()
|
74 |
+
res_model.eval()
|
75 |
+
|
76 |
+
res_model.to(opt.device)
|
77 |
+
t2m_transformer.to(opt.device)
|
78 |
+
vq_model.to(opt.device)
|
79 |
+
|
80 |
+
##### ---- Data ---- #####
|
81 |
+
max_motion_length = 196
|
82 |
+
mean = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'meta', 'mean.npy'))
|
83 |
+
std = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'meta', 'std.npy'))
|
84 |
+
def inv_transform(data):
|
85 |
+
return data * std + mean
|
86 |
+
### We provide an example source motion (from 'new_joint_vecs') for editing. See './example_data/000612.mp4'. ###
|
87 |
+
motion = np.load(opt.source_motion)
|
88 |
+
m_length = len(motion)
|
89 |
+
motion = (motion - mean) / std
|
90 |
+
if max_motion_length > m_length:
|
91 |
+
motion = np.concatenate([motion, np.zeros((max_motion_length - m_length, motion.shape[1])) ], axis=0)
|
92 |
+
motion = torch.from_numpy(motion)[None].to(opt.device)
|
93 |
+
|
94 |
+
prompt_list = []
|
95 |
+
length_list = []
|
96 |
+
if opt.motion_length == 0:
|
97 |
+
opt.motion_length = m_length
|
98 |
+
print("Using default motion length.")
|
99 |
+
|
100 |
+
prompt_list.append(opt.text_prompt)
|
101 |
+
length_list.append(opt.motion_length)
|
102 |
+
if opt.text_prompt == "":
|
103 |
+
raise ValueError("Using an empty text prompt.")
|
104 |
+
|
105 |
+
token_lens = torch.LongTensor(length_list) // 4
|
106 |
+
token_lens = token_lens.to(opt.device).long()
|
107 |
+
|
108 |
+
m_length = token_lens * 4
|
109 |
+
captions = prompt_list
|
110 |
+
print_captions = captions[0]
|
111 |
+
|
112 |
+
_edit_slice = opt.mask_edit_section
|
113 |
+
edit_slice = []
|
114 |
+
for eds in _edit_slice:
|
115 |
+
_start, _end = eds.split(',')
|
116 |
+
_start = eval(_start)
|
117 |
+
_end = eval(_end)
|
118 |
+
edit_slice.append([_start, _end])
|
119 |
+
|
120 |
+
sample = 0
|
121 |
+
kinematic_chain = t2m_kinematic_chain
|
122 |
+
converter = Joint2BVHConvertor()
|
123 |
+
|
124 |
+
with torch.no_grad():
|
125 |
+
tokens, features = vq_model.encode(motion)
|
126 |
+
### build editing mask, TOEDIT marked as 1 ###
|
127 |
+
edit_mask = torch.zeros_like(tokens[..., 0])
|
128 |
+
seq_len = tokens.shape[1]
|
129 |
+
for _start, _end in edit_slice:
|
130 |
+
if isinstance(_start, float):
|
131 |
+
_start = int(_start*seq_len)
|
132 |
+
_end = int(_end*seq_len)
|
133 |
+
else:
|
134 |
+
_start //= 4
|
135 |
+
_end //= 4
|
136 |
+
edit_mask[:, _start: _end] = 1
|
137 |
+
print_captions = f'{print_captions} [{_start*4/20.}s - {_end*4/20.}s]'
|
138 |
+
edit_mask = edit_mask.bool()
|
139 |
+
for r in range(opt.repeat_times):
|
140 |
+
print("-->Repeat %d"%r)
|
141 |
+
with torch.no_grad():
|
142 |
+
mids = t2m_transformer.edit(
|
143 |
+
captions, tokens[..., 0].clone(), m_length//4,
|
144 |
+
timesteps=opt.time_steps,
|
145 |
+
cond_scale=opt.cond_scale,
|
146 |
+
temperature=opt.temperature,
|
147 |
+
topk_filter_thres=opt.topkr,
|
148 |
+
gsample=opt.gumbel_sample,
|
149 |
+
force_mask=opt.force_mask,
|
150 |
+
edit_mask=edit_mask.clone(),
|
151 |
+
)
|
152 |
+
if opt.use_res_model:
|
153 |
+
mids = res_model.generate(mids, captions, m_length//4, temperature=1, cond_scale=2)
|
154 |
+
else:
|
155 |
+
mids.unsqueeze_(-1)
|
156 |
+
|
157 |
+
pred_motions = vq_model.forward_decoder(mids)
|
158 |
+
|
159 |
+
pred_motions = pred_motions.detach().cpu().numpy()
|
160 |
+
|
161 |
+
source_motions = motion.detach().cpu().numpy()
|
162 |
+
|
163 |
+
data = inv_transform(pred_motions)
|
164 |
+
source_data = inv_transform(source_motions)
|
165 |
+
|
166 |
+
for k, (caption, joint_data, source_data) in enumerate(zip(captions, data, source_data)):
|
167 |
+
print("---->Sample %d: %s %d"%(k, caption, m_length[k]))
|
168 |
+
animation_path = pjoin(animation_dir, str(k))
|
169 |
+
joint_path = pjoin(joints_dir, str(k))
|
170 |
+
|
171 |
+
os.makedirs(animation_path, exist_ok=True)
|
172 |
+
os.makedirs(joint_path, exist_ok=True)
|
173 |
+
|
174 |
+
joint_data = joint_data[:m_length[k]]
|
175 |
+
joint = recover_from_ric(torch.from_numpy(joint_data).float(), 22).numpy()
|
176 |
+
|
177 |
+
source_data = source_data[:m_length[k]]
|
178 |
+
source_joint = recover_from_ric(torch.from_numpy(source_data).float(), 22).numpy()
|
179 |
+
|
180 |
+
bvh_path = pjoin(animation_path, "sample%d_repeat%d_len%d_ik.bvh"%(k, r, m_length[k]))
|
181 |
+
_, ik_joint = converter.convert(joint, filename=bvh_path, iterations=100)
|
182 |
+
|
183 |
+
bvh_path = pjoin(animation_path, "sample%d_repeat%d_len%d.bvh" % (k, r, m_length[k]))
|
184 |
+
_, joint = converter.convert(joint, filename=bvh_path, iterations=100, foot_ik=False)
|
185 |
+
|
186 |
+
|
187 |
+
save_path = pjoin(animation_path, "sample%d_repeat%d_len%d.mp4"%(k, r, m_length[k]))
|
188 |
+
ik_save_path = pjoin(animation_path, "sample%d_repeat%d_len%d_ik.mp4"%(k, r, m_length[k]))
|
189 |
+
source_save_path = pjoin(animation_path, "sample%d_source_len%d.mp4"%(k, m_length[k]))
|
190 |
+
|
191 |
+
plot_3d_motion(ik_save_path, kinematic_chain, ik_joint, title=print_captions, fps=20)
|
192 |
+
plot_3d_motion(save_path, kinematic_chain, joint, title=print_captions, fps=20)
|
193 |
+
plot_3d_motion(source_save_path, kinematic_chain, source_joint, title='None', fps=20)
|
194 |
+
np.save(pjoin(joint_path, "sample%d_repeat%d_len%d.npy"%(k, r, m_length[k])), joint)
|
195 |
+
np.save(pjoin(joint_path, "sample%d_repeat%d_len%d_ik.npy"%(k, r, m_length[k])), ik_joint)
|
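The temporal editing above works in token space: the RVQ-VAE downsamples motion by a factor of 4, so a mask section given in frames is divided by 4, while a section given as floats is treated as a fraction of the token sequence; at 20 fps each token therefore covers 0.2 s. The following small sketch restates that mask construction outside the model, mirroring the loop in edit_t2m.py above (the helper name is ours, not part of the repo).

import torch

def build_edit_mask(num_frames, sections):
    """Boolean mask over motion tokens (True = token to be re-generated).

    Entries of `sections` are either (start_frame, end_frame) ints or
    (start_fraction, end_fraction) floats, as in edit_t2m.py above.
    One token spans 4 frames, i.e. 0.2 s at 20 fps.
    """
    num_tokens = num_frames // 4
    mask = torch.zeros(num_tokens, dtype=torch.bool)
    for start, end in sections:
        if isinstance(start, float):
            s, e = int(start * num_tokens), int(end * num_tokens)
        else:
            s, e = start // 4, end // 4
        mask[s:e] = True
    return mask

# Example: re-generate the middle 40% of a 196-frame (9.8 s) clip.
print(build_edit_mask(196, [(0.3, 0.7)]).long())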
environment.yml
ADDED
@@ -0,0 +1,204 @@
1 |
+
name: momask
|
2 |
+
channels:
|
3 |
+
- pytorch
|
4 |
+
- anaconda
|
5 |
+
- conda-forge
|
6 |
+
- defaults
|
7 |
+
dependencies:
|
8 |
+
- _libgcc_mutex=0.1=main
|
9 |
+
- _openmp_mutex=5.1=1_gnu
|
10 |
+
- absl-py=1.4.0=pyhd8ed1ab_0
|
11 |
+
- aiohttp=3.8.3=py37h5eee18b_0
|
12 |
+
- aiosignal=1.2.0=pyhd3eb1b0_0
|
13 |
+
- argon2-cffi=21.3.0=pyhd3eb1b0_0
|
14 |
+
- argon2-cffi-bindings=21.2.0=py37h7f8727e_0
|
15 |
+
- async-timeout=4.0.2=py37h06a4308_0
|
16 |
+
- asynctest=0.13.0=py_0
|
17 |
+
- attrs=22.1.0=py37h06a4308_0
|
18 |
+
- backcall=0.2.0=pyhd3eb1b0_0
|
19 |
+
- beautifulsoup4=4.11.1=pyha770c72_0
|
20 |
+
- blas=1.0=mkl
|
21 |
+
- bleach=4.1.0=pyhd3eb1b0_0
|
22 |
+
- blinker=1.4=py37h06a4308_0
|
23 |
+
- brotlipy=0.7.0=py37h540881e_1004
|
24 |
+
- c-ares=1.19.0=h5eee18b_0
|
25 |
+
  - ca-certificates=2023.05.30=h06a4308_0
  - catalogue=2.0.8=py37h89c1867_0
  - certifi=2022.12.7=py37h06a4308_0
  - cffi=1.15.1=py37h74dc2b5_0
  - charset-normalizer=2.1.1=pyhd8ed1ab_0
  - click=8.0.4=py37h89c1867_0
  - colorama=0.4.5=pyhd8ed1ab_0
  - cryptography=35.0.0=py37hf1a17b8_2
  - cudatoolkit=11.0.221=h6bb024c_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - cymem=2.0.6=py37hd23a5d3_3
  - cython-blis=0.7.7=py37hda87dfa_1
  - dataclasses=0.8=pyhc8e2a94_3
  - dbus=1.13.18=hb2f20db_0
  - debugpy=1.5.1=py37h295c915_0
  - decorator=5.1.1=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - entrypoints=0.4=py37h06a4308_0
  - expat=2.4.9=h6a678d5_0
  - fftw=3.3.9=h27cfd23_1
  - filelock=3.8.0=pyhd8ed1ab_0
  - fontconfig=2.13.1=h6c09931_0
  - freetype=2.11.0=h70c0345_0
  - frozenlist=1.3.3=py37h5eee18b_0
  - giflib=5.2.1=h7b6447c_0
  - glib=2.69.1=h4ff587b_1
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - h5py=3.7.0=py37h737f45e_0
  - hdf5=1.10.6=h3ffc7dd_1
  - icu=58.2=he6710b0_3
  - idna=3.4=pyhd8ed1ab_0
  - importlib-metadata=4.11.4=py37h89c1867_0
  - intel-openmp=2021.4.0=h06a4308_3561
  - ipykernel=6.15.2=py37h06a4308_0
  - ipython=7.31.1=py37h06a4308_1
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - jedi=0.18.1=py37h06a4308_1
  - jinja2=3.1.2=pyhd8ed1ab_1
  - joblib=1.1.0=pyhd3eb1b0_0
  - jpeg=9b=h024ee3a_2
  - jsonschema=3.0.2=py37_0
  - jupyter_client=7.4.9=py37h06a4308_0
  - jupyter_core=4.11.2=py37h06a4308_0
  - jupyterlab_pygments=0.1.2=py_0
  - kiwisolver=1.4.2=py37h295c915_0
  - langcodes=3.3.0=pyhd8ed1ab_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgfortran-ng=11.2.0=h00389a5_1
  - libgfortran5=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.15.8=h780b84a_1
  - libsodium=1.0.18=h7b6447c_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtiff=4.1.0=h2733197_1
  - libuuid=1.0.3=h7f8727e_2
  - libuv=1.40.0=h7b6447c_0
  - libwebp=1.2.0=h89dd481_0
  - libxcb=1.15=h7f8727e_0
  - libxml2=2.9.14=h74e7548_0
  - lz4-c=1.9.3=h295c915_1
  - markdown=3.4.3=pyhd8ed1ab_0
  - markupsafe=2.1.1=py37h540881e_1
  - matplotlib=3.1.3=py37_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - matplotlib-inline=0.1.6=py37h06a4308_0
  - mistune=0.8.4=py37h14c3975_1001
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py37h7f8727e_0
  - mkl_fft=1.3.1=py37hd3c417c_0
  - mkl_random=1.2.2=py37h51133e4_0
  - multidict=6.0.2=py37h5eee18b_0
  - murmurhash=1.0.7=py37hd23a5d3_0
  - nb_conda_kernels=2.3.1=py37h06a4308_0
  - nbclient=0.5.13=py37h06a4308_0
  - nbconvert=6.4.4=py37h06a4308_0
  - nbformat=5.5.0=py37h06a4308_0
  - ncurses=6.3=h5eee18b_3
  - nest-asyncio=1.5.6=py37h06a4308_0
  - ninja=1.10.2=h06a4308_5
  - ninja-base=1.10.2=hd09550d_5
  - notebook=6.4.12=py37h06a4308_0
  - numpy=1.21.5=py37h6c91a56_3
  - numpy-base=1.21.5=py37ha15fc14_3
  - openssl=1.1.1v=h7f8727e_0
  - packaging=21.3=pyhd8ed1ab_0
  - pandocfilters=1.5.0=pyhd3eb1b0_0
  - parso=0.8.3=pyhd3eb1b0_0
  - pathy=0.6.2=pyhd8ed1ab_0
  - pcre=8.45=h295c915_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pillow=9.2.0=py37hace64e9_1
  - pip=22.2.2=py37h06a4308_0
  - preshed=3.0.6=py37hd23a5d3_2
  - prometheus_client=0.14.1=py37h06a4308_0
  - prompt-toolkit=3.0.36=py37h06a4308_0
  - psutil=5.9.0=py37h5eee18b_0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pycparser=2.21=pyhd8ed1ab_0
  - pydantic=1.8.2=py37h5e8e339_2
  - pygments=2.11.2=pyhd3eb1b0_0
  - pyjwt=2.4.0=py37h06a4308_0
  - pyopenssl=22.0.0=pyhd8ed1ab_1
  - pyparsing=3.0.9=py37h06a4308_0
  - pyqt=5.9.2=py37h05f1152_2
  - pyrsistent=0.18.0=py37heee7806_0
  - pysocks=1.7.1=py37h89c1867_5
  - python=3.7.13=h12debd9_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python-fastjsonschema=2.16.2=py37h06a4308_0
  - python_abi=3.7=2_cp37m
  - pytorch=1.7.1=py3.7_cuda11.0.221_cudnn8.0.5_0
  - pyzmq=23.2.0=py37h6a678d5_0
  - qt=5.9.7=h5867ecd_1
  - readline=8.1.2=h7f8727e_1
  - requests=2.28.1=pyhd8ed1ab_1
  - scikit-learn=1.0.2=py37h51133e4_1
  - scipy=1.7.3=py37h6c91a56_2
  - send2trash=1.8.0=pyhd3eb1b0_1
  - setuptools=63.4.1=py37h06a4308_0
  - shellingham=1.5.0=pyhd8ed1ab_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.16.0=pyhd3eb1b0_1
  - smart_open=5.2.1=pyhd8ed1ab_0
  - soupsieve=2.3.2.post1=pyhd8ed1ab_0
  - spacy=3.3.1=py37h79cecc1_0
  - spacy-legacy=3.0.10=pyhd8ed1ab_0
  - spacy-loggers=1.0.3=pyhd8ed1ab_0
  - sqlite=3.39.3=h5082296_0
  - srsly=2.4.3=py37hd23a5d3_1
  - tensorboard-plugin-wit=1.8.1=py37h06a4308_0
  - terminado=0.17.1=py37h06a4308_0
  - testpath=0.6.0=py37h06a4308_0
  - thinc=8.0.15=py37h48bf904_0
  - threadpoolctl=2.2.0=pyh0d69192_0
  - tk=8.6.12=h1ccaba5_0
  - torchaudio=0.7.2=py37
  - torchvision=0.8.2=py37_cu110
  - tornado=6.2=py37h5eee18b_0
  - tqdm=4.64.1=py37h06a4308_0
  - traitlets=5.7.1=py37h06a4308_0
  - trimesh=3.15.3=pyh1a96a4e_0
  - typer=0.4.2=pyhd8ed1ab_0
  - typing-extensions=3.10.0.2=hd8ed1ab_0
  - typing_extensions=3.10.0.2=pyha770c72_0
  - urllib3=1.26.15=pyhd8ed1ab_0
  - wasabi=0.10.1=pyhd8ed1ab_1
  - webencodings=0.5.1=py37_1
  - werkzeug=2.2.3=pyhd8ed1ab_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.6=h5eee18b_0
  - yarl=1.8.1=py37h5eee18b_0
  - zeromq=4.3.4=h2531618_0
  - zipp=3.8.1=pyhd8ed1ab_0
  - zlib=1.2.12=h5eee18b_3
  - zstd=1.4.9=haebb681_0
  - pip:
    - cachetools==5.3.1
    - einops==0.6.1
    - ftfy==6.1.1
    - gdown==4.7.1
    - google-auth==2.22.0
    - google-auth-oauthlib==0.4.6
    - grpcio==1.57.0
    - oauthlib==3.2.2
    - protobuf==3.20.3
    - pyasn1==0.5.0
    - pyasn1-modules==0.3.0
    - regex==2023.8.8
    - requests-oauthlib==1.3.1
    - rsa==4.9
    - tensorboard==2.11.2
    - tensorboard-data-server==0.6.1
    - wcwidth==0.2.6
prefix: /home/chuan/anaconda3/envs/momask
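Note: the environment above pins Python 3.7 with PyTorch 1.7.1 built against CUDA 11.0. A minimal sanity check after `conda env create -f environment.yml` might look like the sketch below; the expected version strings are simply the ones pinned above.

# Minimal environment sanity check for the pinned stack above.
# Assumes the conda env created from environment.yml is active.
import torch

print(torch.__version__)          # expected to start with "1.7.1"
print(torch.version.cuda)         # expected "11.0" for this build
print(torch.cuda.is_available())  # True only with a compatible GPU/driver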
eval_t2m_trans_res.py
ADDED
@@ -0,0 +1,199 @@
import os
from os.path import join as pjoin

import torch

from models.mask_transformer.transformer import MaskTransformer, ResidualTransformer
from models.vq.model import RVQVAE

from options.eval_option import EvalT2MOptions
from utils.get_opt import get_opt
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader
from models.t2m_eval_wrapper import EvaluatorModelWrapper

import utils.eval_t2m as eval_t2m
from utils.fixseed import fixseed

import numpy as np

def load_vq_model(vq_opt):
    # opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'opt.txt')
    vq_model = RVQVAE(vq_opt,
                      dim_pose,
                      vq_opt.nb_code,
                      vq_opt.code_dim,
                      vq_opt.output_emb_width,
                      vq_opt.down_t,
                      vq_opt.stride_t,
                      vq_opt.width,
                      vq_opt.depth,
                      vq_opt.dilation_growth_rate,
                      vq_opt.vq_act,
                      vq_opt.vq_norm)
    ckpt = torch.load(pjoin(vq_opt.checkpoints_dir, vq_opt.dataset_name, vq_opt.name, 'model', 'net_best_fid.tar'),
                      map_location=opt.device)
    model_key = 'vq_model' if 'vq_model' in ckpt else 'net'
    vq_model.load_state_dict(ckpt[model_key])
    print(f'Loading VQ Model {vq_opt.name} Completed!')
    return vq_model, vq_opt

def load_trans_model(model_opt, which_model):
    t2m_transformer = MaskTransformer(code_dim=model_opt.code_dim,
                                      cond_mode='text',
                                      latent_dim=model_opt.latent_dim,
                                      ff_size=model_opt.ff_size,
                                      num_layers=model_opt.n_layers,
                                      num_heads=model_opt.n_heads,
                                      dropout=model_opt.dropout,
                                      clip_dim=512,
                                      cond_drop_prob=model_opt.cond_drop_prob,
                                      clip_version=clip_version,
                                      opt=model_opt)
    ckpt = torch.load(pjoin(model_opt.checkpoints_dir, model_opt.dataset_name, model_opt.name, 'model', which_model),
                      map_location=opt.device)
    model_key = 't2m_transformer' if 't2m_transformer' in ckpt else 'trans'
    # print(ckpt.keys())
    missing_keys, unexpected_keys = t2m_transformer.load_state_dict(ckpt[model_key], strict=False)
    assert len(unexpected_keys) == 0
    assert all([k.startswith('clip_model.') for k in missing_keys])
    print(f'Loading Mask Transformer {opt.name} from epoch {ckpt["ep"]}!')
    return t2m_transformer

def load_res_model(res_opt):
    res_opt.num_quantizers = vq_opt.num_quantizers
    res_opt.num_tokens = vq_opt.nb_code
    res_transformer = ResidualTransformer(code_dim=vq_opt.code_dim,
                                          cond_mode='text',
                                          latent_dim=res_opt.latent_dim,
                                          ff_size=res_opt.ff_size,
                                          num_layers=res_opt.n_layers,
                                          num_heads=res_opt.n_heads,
                                          dropout=res_opt.dropout,
                                          clip_dim=512,
                                          shared_codebook=vq_opt.shared_codebook,
                                          cond_drop_prob=res_opt.cond_drop_prob,
                                          # codebook=vq_model.quantizer.codebooks[0] if opt.fix_token_emb else None,
                                          share_weight=res_opt.share_weight,
                                          clip_version=clip_version,
                                          opt=res_opt)

    ckpt = torch.load(pjoin(res_opt.checkpoints_dir, res_opt.dataset_name, res_opt.name, 'model', 'net_best_fid.tar'),
                      map_location=opt.device)
    missing_keys, unexpected_keys = res_transformer.load_state_dict(ckpt['res_transformer'], strict=False)
    assert len(unexpected_keys) == 0
    assert all([k.startswith('clip_model.') for k in missing_keys])
    print(f'Loading Residual Transformer {res_opt.name} from epoch {ckpt["ep"]}!')
    return res_transformer

if __name__ == '__main__':
    parser = EvalT2MOptions()
    opt = parser.parse()
    fixseed(opt.seed)

    opt.device = torch.device("cpu" if opt.gpu_id == -1 else "cuda:" + str(opt.gpu_id))
    torch.autograd.set_detect_anomaly(True)

    dim_pose = 251 if opt.dataset_name == 'kit' else 263

    # out_dir = pjoin(opt.check)
    root_dir = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.name)
    model_dir = pjoin(root_dir, 'model')
    out_dir = pjoin(root_dir, 'eval')
    os.makedirs(out_dir, exist_ok=True)

    out_path = pjoin(out_dir, "%s.log"%opt.ext)

    f = open(pjoin(out_path), 'w')

    model_opt_path = pjoin(root_dir, 'opt.txt')
    model_opt = get_opt(model_opt_path, device=opt.device)
    clip_version = 'ViT-B/32'

    vq_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'opt.txt')
    vq_opt = get_opt(vq_opt_path, device=opt.device)
    vq_model, vq_opt = load_vq_model(vq_opt)

    model_opt.num_tokens = vq_opt.nb_code
    model_opt.num_quantizers = vq_opt.num_quantizers
    model_opt.code_dim = vq_opt.code_dim

    res_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.res_name, 'opt.txt')
    res_opt = get_opt(res_opt_path, device=opt.device)
    res_model = load_res_model(res_opt)

    assert res_opt.vq_name == model_opt.vq_name

    dataset_opt_path = 'checkpoints/kit/Comp_v6_KLD005/opt.txt' if opt.dataset_name == 'kit' \
        else 'checkpoints/t2m/Comp_v6_KLD005/opt.txt'

    wrapper_opt = get_opt(dataset_opt_path, torch.device('cuda'))
    eval_wrapper = EvaluatorModelWrapper(wrapper_opt)

    ##### ---- Dataloader ---- #####
    opt.nb_joints = 21 if opt.dataset_name == 'kit' else 22

    eval_val_loader, _ = get_dataset_motion_loader(dataset_opt_path, 32, 'test', device=opt.device)

    # model_dir = pjoin(opt.)
    for file in os.listdir(model_dir):
        if opt.which_epoch != "all" and opt.which_epoch not in file:
            continue
        print('loading checkpoint {}'.format(file))
        t2m_transformer = load_trans_model(model_opt, file)
        t2m_transformer.eval()
        vq_model.eval()
        res_model.eval()

        t2m_transformer.to(opt.device)
        vq_model.to(opt.device)
        res_model.to(opt.device)

        fid = []
        div = []
        top1 = []
        top2 = []
        top3 = []
        matching = []
        mm = []

        repeat_time = 20
        for i in range(repeat_time):
            with torch.no_grad():
                best_fid, best_div, Rprecision, best_matching, best_mm = \
                    eval_t2m.evaluation_mask_transformer_test_plus_res(eval_val_loader, vq_model, res_model, t2m_transformer,
                                                                       i, eval_wrapper=eval_wrapper,
                                                                       time_steps=opt.time_steps, cond_scale=opt.cond_scale,
                                                                       temperature=opt.temperature, topkr=opt.topkr,
                                                                       force_mask=opt.force_mask, cal_mm=True)
            fid.append(best_fid)
            div.append(best_div)
            top1.append(Rprecision[0])
            top2.append(Rprecision[1])
            top3.append(Rprecision[2])
            matching.append(best_matching)
            mm.append(best_mm)

        fid = np.array(fid)
        div = np.array(div)
        top1 = np.array(top1)
        top2 = np.array(top2)
        top3 = np.array(top3)
        matching = np.array(matching)
        mm = np.array(mm)

        print(f'{file} final result:')
        print(f'{file} final result:', file=f, flush=True)

        msg_final = f"\tFID: {np.mean(fid):.3f}, conf. {np.std(fid) * 1.96 / np.sqrt(repeat_time):.3f}\n" \
                    f"\tDiversity: {np.mean(div):.3f}, conf. {np.std(div) * 1.96 / np.sqrt(repeat_time):.3f}\n" \
                    f"\tTOP1: {np.mean(top1):.3f}, conf. {np.std(top1) * 1.96 / np.sqrt(repeat_time):.3f}, TOP2. {np.mean(top2):.3f}, conf. {np.std(top2) * 1.96 / np.sqrt(repeat_time):.3f}, TOP3. {np.mean(top3):.3f}, conf. {np.std(top3) * 1.96 / np.sqrt(repeat_time):.3f}\n" \
                    f"\tMatching: {np.mean(matching):.3f}, conf. {np.std(matching) * 1.96 / np.sqrt(repeat_time):.3f}\n" \
                    f"\tMultimodality:{np.mean(mm):.3f}, conf.{np.std(mm) * 1.96 / np.sqrt(repeat_time):.3f}\n\n"
        # logger.info(msg_final)
        print(msg_final)
        print(msg_final, file=f, flush=True)

    f.close()


# python eval_t2m_trans.py --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_vq --dataset_name t2m --gpu_id 3 --cond_scale 4 --time_steps 18 --temperature 1 --topkr 0.9 --gumbel_sample --ext cs4_ts18_tau1_topkr0.9_gs
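Note: each metric line written by eval_t2m_trans_res.py is the mean over `repeat_time` evaluation runs together with a 95% interval half-width of `std * 1.96 / sqrt(repeat_time)`. A standalone sketch of that aggregation is below; the FID values are placeholders, not real results.

# Sketch of the mean / confidence-interval aggregation used in the eval logs above.
import numpy as np

repeat_time = 20
fid = 0.09 + 0.01 * np.random.rand(repeat_time)  # placeholder per-run FIDs

mean = np.mean(fid)
conf = np.std(fid) * 1.96 / np.sqrt(repeat_time)  # 95% interval half-width
print(f"FID: {mean:.3f}, conf. {conf:.3f}")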
eval_t2m_vq.py
ADDED
@@ -0,0 +1,123 @@
1 |
+
import sys
|
2 |
+
import os
|
3 |
+
from os.path import join as pjoin
|
4 |
+
|
5 |
+
import torch
|
6 |
+
from models.vq.model import RVQVAE
|
7 |
+
from options.vq_option import arg_parse
|
8 |
+
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader
|
9 |
+
import utils.eval_t2m as eval_t2m
|
10 |
+
from utils.get_opt import get_opt
|
11 |
+
from models.t2m_eval_wrapper import EvaluatorModelWrapper
|
12 |
+
import warnings
|
13 |
+
warnings.filterwarnings('ignore')
|
14 |
+
import numpy as np
|
15 |
+
from utils.word_vectorizer import WordVectorizer
|
16 |
+
|
17 |
+
def load_vq_model(vq_opt, which_epoch):
|
18 |
+
# opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'opt.txt')
|
19 |
+
|
20 |
+
vq_model = RVQVAE(vq_opt,
|
21 |
+
dim_pose,
|
22 |
+
vq_opt.nb_code,
|
23 |
+
vq_opt.code_dim,
|
24 |
+
vq_opt.code_dim,
|
25 |
+
vq_opt.down_t,
|
26 |
+
vq_opt.stride_t,
|
27 |
+
vq_opt.width,
|
28 |
+
vq_opt.depth,
|
29 |
+
vq_opt.dilation_growth_rate,
|
30 |
+
vq_opt.vq_act,
|
31 |
+
vq_opt.vq_norm)
|
32 |
+
ckpt = torch.load(pjoin(vq_opt.checkpoints_dir, vq_opt.dataset_name, vq_opt.name, 'model', which_epoch),
|
33 |
+
map_location='cpu')
|
34 |
+
model_key = 'vq_model' if 'vq_model' in ckpt else 'net'
|
35 |
+
vq_model.load_state_dict(ckpt[model_key])
|
36 |
+
vq_epoch = ckpt['ep'] if 'ep' in ckpt else -1
|
37 |
+
print(f'Loading VQ Model {vq_opt.name} Completed!, Epoch {vq_epoch}')
|
38 |
+
return vq_model, vq_epoch
|
39 |
+
|
40 |
+
if __name__ == "__main__":
|
41 |
+
##### ---- Exp dirs ---- #####
|
42 |
+
args = arg_parse(False)
|
43 |
+
args.device = torch.device("cpu" if args.gpu_id == -1 else "cuda:" + str(args.gpu_id))
|
44 |
+
|
45 |
+
args.out_dir = pjoin(args.checkpoints_dir, args.dataset_name, args.name, 'eval')
|
46 |
+
os.makedirs(args.out_dir, exist_ok=True)
|
47 |
+
|
48 |
+
f = open(pjoin(args.out_dir, '%s.log'%args.ext), 'w')
|
49 |
+
|
50 |
+
dataset_opt_path = 'checkpoints/kit/Comp_v6_KLD005/opt.txt' if args.dataset_name == 'kit' \
|
51 |
+
else 'checkpoints/t2m/Comp_v6_KLD005/opt.txt'
|
52 |
+
|
53 |
+
wrapper_opt = get_opt(dataset_opt_path, torch.device('cuda'))
|
54 |
+
eval_wrapper = EvaluatorModelWrapper(wrapper_opt)
|
55 |
+
|
56 |
+
##### ---- Dataloader ---- #####
|
57 |
+
args.nb_joints = 21 if args.dataset_name == 'kit' else 22
|
58 |
+
dim_pose = 251 if args.dataset_name == 'kit' else 263
|
59 |
+
|
60 |
+
eval_val_loader, _ = get_dataset_motion_loader(dataset_opt_path, 32, 'test', device=args.device)
|
61 |
+
|
62 |
+
print(len(eval_val_loader))
|
63 |
+
|
64 |
+
##### ---- Network ---- #####
|
65 |
+
vq_opt_path = pjoin(args.checkpoints_dir, args.dataset_name, args.name, 'opt.txt')
|
66 |
+
vq_opt = get_opt(vq_opt_path, device=args.device)
|
67 |
+
# net = load_vq_model()
|
68 |
+
|
69 |
+
model_dir = pjoin(args.checkpoints_dir, args.dataset_name, args.name, 'model')
|
70 |
+
for file in os.listdir(model_dir):
|
71 |
+
# if not file.endswith('tar'):
|
72 |
+
# continue
|
73 |
+
# if not file.startswith('net_best_fid'):
|
74 |
+
# continue
|
75 |
+
if args.which_epoch != "all" and args.which_epoch not in file:
|
76 |
+
continue
|
77 |
+
print(file)
|
78 |
+
net, ep = load_vq_model(vq_opt, file)
|
79 |
+
|
80 |
+
net.eval()
|
81 |
+
net.cuda()
|
82 |
+
|
83 |
+
fid = []
|
84 |
+
div = []
|
85 |
+
top1 = []
|
86 |
+
top2 = []
|
87 |
+
top3 = []
|
88 |
+
matching = []
|
89 |
+
mae = []
|
90 |
+
repeat_time = 20
|
91 |
+
for i in range(repeat_time):
|
92 |
+
best_fid, best_div, Rprecision, best_matching, l1_dist = \
|
93 |
+
eval_t2m.evaluation_vqvae_plus_mpjpe(eval_val_loader, net, i, eval_wrapper=eval_wrapper, num_joint=args.nb_joints)
|
94 |
+
fid.append(best_fid)
|
95 |
+
div.append(best_div)
|
96 |
+
top1.append(Rprecision[0])
|
97 |
+
top2.append(Rprecision[1])
|
98 |
+
top3.append(Rprecision[2])
|
99 |
+
matching.append(best_matching)
|
100 |
+
mae.append(l1_dist)
|
101 |
+
|
102 |
+
fid = np.array(fid)
|
103 |
+
div = np.array(div)
|
104 |
+
top1 = np.array(top1)
|
105 |
+
top2 = np.array(top2)
|
106 |
+
top3 = np.array(top3)
|
107 |
+
matching = np.array(matching)
|
108 |
+
mae = np.array(mae)
|
109 |
+
|
110 |
+
print(f'{file} final result, epoch {ep}')
|
111 |
+
print(f'{file} final result, epoch {ep}', file=f, flush=True)
|
112 |
+
|
113 |
+
msg_final = f"\tFID: {np.mean(fid):.3f}, conf. {np.std(fid)*1.96/np.sqrt(repeat_time):.3f}\n" \
|
114 |
+
f"\tDiversity: {np.mean(div):.3f}, conf. {np.std(div)*1.96/np.sqrt(repeat_time):.3f}\n" \
|
115 |
+
f"\tTOP1: {np.mean(top1):.3f}, conf. {np.std(top1)*1.96/np.sqrt(repeat_time):.3f}, TOP2. {np.mean(top2):.3f}, conf. {np.std(top2)*1.96/np.sqrt(repeat_time):.3f}, TOP3. {np.mean(top3):.3f}, conf. {np.std(top3)*1.96/np.sqrt(repeat_time):.3f}\n" \
|
116 |
+
f"\tMatching: {np.mean(matching):.3f}, conf. {np.std(matching)*1.96/np.sqrt(repeat_time):.3f}\n" \
|
117 |
+
f"\tMAE:{np.mean(mae):.3f}, conf.{np.std(mae)*1.96/np.sqrt(repeat_time):.3f}\n\n"
|
118 |
+
# logger.info(msg_final)
|
119 |
+
print(msg_final)
|
120 |
+
print(msg_final, file=f, flush=True)
|
121 |
+
|
122 |
+
f.close()
|
123 |
+
|
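Note: besides FID and R-precision, eval_t2m_vq.py reports an MAE/MPJPE-style reconstruction error (the `l1_dist` returned by `evaluation_vqvae_plus_mpjpe`). The sketch below is only an illustration of a mean per-joint position error on (frames, joints, 3) arrays; the actual metric used here lives in utils.eval_t2m.

# Illustrative mean per-joint position error between two joint sequences.
# gt and rec are (num_frames, num_joints, 3) arrays; values are random here.
import numpy as np

gt = np.random.randn(196, 22, 3)
rec = gt + 0.01 * np.random.randn(196, 22, 3)

mpjpe = np.linalg.norm(gt - rec, axis=-1).mean()  # average per-joint distance
print(f"MPJPE: {mpjpe:.4f}")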
example_data/000612.mp4
ADDED
Binary file (154 kB)
example_data/000612.npy
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85e5a8081278a0e31488eaa29386940b9e4b739fb401042f7ad883afb475ab10
size 418824
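Note: example_data/000612.npy is stored through Git LFS (the pointer above), so the real array is only present after `git lfs pull`. Loading it is a plain numpy call; the exact array shape is not documented in this commit, so the sketch below only prints it.

# Load the example motion after fetching LFS objects (git lfs pull).
import numpy as np

joints = np.load("example_data/000612.npy")
print(joints.shape)  # a joint-position sequence; exact layout not documented here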
gen_t2m.py
ADDED
@@ -0,0 +1,261 @@
1 |
+
import os
|
2 |
+
from os.path import join as pjoin
|
3 |
+
|
4 |
+
import torch
|
5 |
+
import torch.nn.functional as F
|
6 |
+
|
7 |
+
from models.mask_transformer.transformer import MaskTransformer, ResidualTransformer
|
8 |
+
from models.vq.model import RVQVAE, LengthEstimator
|
9 |
+
|
10 |
+
from options.eval_option import EvalT2MOptions
|
11 |
+
from utils.get_opt import get_opt
|
12 |
+
|
13 |
+
from utils.fixseed import fixseed
|
14 |
+
from visualization.joints2bvh import Joint2BVHConvertor
|
15 |
+
from torch.distributions.categorical import Categorical
|
16 |
+
|
17 |
+
|
18 |
+
from utils.motion_process import recover_from_ric
|
19 |
+
from utils.plot_script import plot_3d_motion
|
20 |
+
|
21 |
+
from utils.paramUtil import t2m_kinematic_chain
|
22 |
+
|
23 |
+
import numpy as np
|
24 |
+
clip_version = 'ViT-B/32'
|
25 |
+
|
26 |
+
def load_vq_model(vq_opt):
|
27 |
+
# opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'opt.txt')
|
28 |
+
vq_model = RVQVAE(vq_opt,
|
29 |
+
vq_opt.dim_pose,
|
30 |
+
vq_opt.nb_code,
|
31 |
+
vq_opt.code_dim,
|
32 |
+
vq_opt.output_emb_width,
|
33 |
+
vq_opt.down_t,
|
34 |
+
vq_opt.stride_t,
|
35 |
+
vq_opt.width,
|
36 |
+
vq_opt.depth,
|
37 |
+
vq_opt.dilation_growth_rate,
|
38 |
+
vq_opt.vq_act,
|
39 |
+
vq_opt.vq_norm)
|
40 |
+
ckpt = torch.load(pjoin(vq_opt.checkpoints_dir, vq_opt.dataset_name, vq_opt.name, 'model', 'net_best_fid.tar'),
|
41 |
+
map_location='cpu')
|
42 |
+
model_key = 'vq_model' if 'vq_model' in ckpt else 'net'
|
43 |
+
vq_model.load_state_dict(ckpt[model_key])
|
44 |
+
print(f'Loading VQ Model {vq_opt.name} Completed!')
|
45 |
+
return vq_model, vq_opt
|
46 |
+
|
47 |
+
def load_trans_model(model_opt, opt, which_model):
|
48 |
+
t2m_transformer = MaskTransformer(code_dim=model_opt.code_dim,
|
49 |
+
cond_mode='text',
|
50 |
+
latent_dim=model_opt.latent_dim,
|
51 |
+
ff_size=model_opt.ff_size,
|
52 |
+
num_layers=model_opt.n_layers,
|
53 |
+
num_heads=model_opt.n_heads,
|
54 |
+
dropout=model_opt.dropout,
|
55 |
+
clip_dim=512,
|
56 |
+
cond_drop_prob=model_opt.cond_drop_prob,
|
57 |
+
clip_version=clip_version,
|
58 |
+
opt=model_opt)
|
59 |
+
ckpt = torch.load(pjoin(model_opt.checkpoints_dir, model_opt.dataset_name, model_opt.name, 'model', which_model),
|
60 |
+
map_location='cpu')
|
61 |
+
model_key = 't2m_transformer' if 't2m_transformer' in ckpt else 'trans'
|
62 |
+
# print(ckpt.keys())
|
63 |
+
missing_keys, unexpected_keys = t2m_transformer.load_state_dict(ckpt[model_key], strict=False)
|
64 |
+
assert len(unexpected_keys) == 0
|
65 |
+
assert all([k.startswith('clip_model.') for k in missing_keys])
|
66 |
+
print(f'Loading Transformer {opt.name} from epoch {ckpt["ep"]}!')
|
67 |
+
return t2m_transformer
|
68 |
+
|
69 |
+
def load_res_model(res_opt, vq_opt, opt):
|
70 |
+
res_opt.num_quantizers = vq_opt.num_quantizers
|
71 |
+
res_opt.num_tokens = vq_opt.nb_code
|
72 |
+
res_transformer = ResidualTransformer(code_dim=vq_opt.code_dim,
|
73 |
+
cond_mode='text',
|
74 |
+
latent_dim=res_opt.latent_dim,
|
75 |
+
ff_size=res_opt.ff_size,
|
76 |
+
num_layers=res_opt.n_layers,
|
77 |
+
num_heads=res_opt.n_heads,
|
78 |
+
dropout=res_opt.dropout,
|
79 |
+
clip_dim=512,
|
80 |
+
shared_codebook=vq_opt.shared_codebook,
|
81 |
+
cond_drop_prob=res_opt.cond_drop_prob,
|
82 |
+
# codebook=vq_model.quantizer.codebooks[0] if opt.fix_token_emb else None,
|
83 |
+
share_weight=res_opt.share_weight,
|
84 |
+
clip_version=clip_version,
|
85 |
+
opt=res_opt)
|
86 |
+
|
87 |
+
ckpt = torch.load(pjoin(res_opt.checkpoints_dir, res_opt.dataset_name, res_opt.name, 'model', 'net_best_fid.tar'),
|
88 |
+
map_location=opt.device)
|
89 |
+
missing_keys, unexpected_keys = res_transformer.load_state_dict(ckpt['res_transformer'], strict=False)
|
90 |
+
assert len(unexpected_keys) == 0
|
91 |
+
assert all([k.startswith('clip_model.') for k in missing_keys])
|
92 |
+
print(f'Loading Residual Transformer {res_opt.name} from epoch {ckpt["ep"]}!')
|
93 |
+
return res_transformer
|
94 |
+
|
95 |
+
def load_len_estimator(opt):
|
96 |
+
model = LengthEstimator(512, 50)
|
97 |
+
ckpt = torch.load(pjoin(opt.checkpoints_dir, opt.dataset_name, 'length_estimator', 'model', 'finest.tar'),
|
98 |
+
map_location=opt.device)
|
99 |
+
model.load_state_dict(ckpt['estimator'])
|
100 |
+
print(f'Loading Length Estimator from epoch {ckpt["epoch"]}!')
|
101 |
+
return model
|
102 |
+
|
103 |
+
|
104 |
+
if __name__ == '__main__':
|
105 |
+
parser = EvalT2MOptions()
|
106 |
+
opt = parser.parse()
|
107 |
+
fixseed(opt.seed)
|
108 |
+
|
109 |
+
opt.device = torch.device("cpu" if opt.gpu_id == -1 else "cuda:" + str(opt.gpu_id))
|
110 |
+
torch.autograd.set_detect_anomaly(True)
|
111 |
+
|
112 |
+
dim_pose = 251 if opt.dataset_name == 'kit' else 263
|
113 |
+
|
114 |
+
# out_dir = pjoin(opt.check)
|
115 |
+
root_dir = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.name)
|
116 |
+
model_dir = pjoin(root_dir, 'model')
|
117 |
+
result_dir = pjoin('./generation', opt.ext)
|
118 |
+
joints_dir = pjoin(result_dir, 'joints')
|
119 |
+
animation_dir = pjoin(result_dir, 'animations')
|
120 |
+
os.makedirs(joints_dir, exist_ok=True)
|
121 |
+
os.makedirs(animation_dir,exist_ok=True)
|
122 |
+
|
123 |
+
model_opt_path = pjoin(root_dir, 'opt.txt')
|
124 |
+
model_opt = get_opt(model_opt_path, device=opt.device)
|
125 |
+
|
126 |
+
|
127 |
+
#######################
|
128 |
+
######Loading RVQ######
|
129 |
+
#######################
|
130 |
+
vq_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'opt.txt')
|
131 |
+
vq_opt = get_opt(vq_opt_path, device=opt.device)
|
132 |
+
vq_opt.dim_pose = dim_pose
|
133 |
+
vq_model, vq_opt = load_vq_model(vq_opt)
|
134 |
+
|
135 |
+
model_opt.num_tokens = vq_opt.nb_code
|
136 |
+
model_opt.num_quantizers = vq_opt.num_quantizers
|
137 |
+
model_opt.code_dim = vq_opt.code_dim
|
138 |
+
|
139 |
+
#################################
|
140 |
+
######Loading R-Transformer######
|
141 |
+
#################################
|
142 |
+
res_opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.res_name, 'opt.txt')
|
143 |
+
res_opt = get_opt(res_opt_path, device=opt.device)
|
144 |
+
res_model = load_res_model(res_opt, vq_opt, opt)
|
145 |
+
|
146 |
+
assert res_opt.vq_name == model_opt.vq_name
|
147 |
+
|
148 |
+
#################################
|
149 |
+
######Loading M-Transformer######
|
150 |
+
#################################
|
151 |
+
t2m_transformer = load_trans_model(model_opt, opt, 'latest.tar')
|
152 |
+
|
153 |
+
##################################
|
154 |
+
#####Loading Length Predictor#####
|
155 |
+
##################################
|
156 |
+
length_estimator = load_len_estimator(model_opt)
|
157 |
+
|
158 |
+
t2m_transformer.eval()
|
159 |
+
vq_model.eval()
|
160 |
+
res_model.eval()
|
161 |
+
length_estimator.eval()
|
162 |
+
|
163 |
+
res_model.to(opt.device)
|
164 |
+
t2m_transformer.to(opt.device)
|
165 |
+
vq_model.to(opt.device)
|
166 |
+
length_estimator.to(opt.device)
|
167 |
+
|
168 |
+
##### ---- Dataloader ---- #####
|
169 |
+
opt.nb_joints = 21 if opt.dataset_name == 'kit' else 22
|
170 |
+
|
171 |
+
mean = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'meta', 'mean.npy'))
|
172 |
+
std = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, model_opt.vq_name, 'meta', 'std.npy'))
|
173 |
+
def inv_transform(data):
|
174 |
+
return data * std + mean
|
175 |
+
|
176 |
+
prompt_list = []
|
177 |
+
length_list = []
|
178 |
+
|
179 |
+
est_length = False
|
180 |
+
if opt.text_prompt != "":
|
181 |
+
prompt_list.append(opt.text_prompt)
|
182 |
+
if opt.motion_length == 0:
|
183 |
+
est_length = True
|
184 |
+
else:
|
185 |
+
length_list.append(opt.motion_length)
|
186 |
+
elif opt.text_path != "":
|
187 |
+
with open(opt.text_path, 'r') as f:
|
188 |
+
lines = f.readlines()
|
189 |
+
for line in lines:
|
190 |
+
infos = line.split('#')
|
191 |
+
prompt_list.append(infos[0])
|
192 |
+
if len(infos) == 1 or (not infos[1].isdigit()):
|
193 |
+
est_length = True
|
194 |
+
length_list = []
|
195 |
+
else:
|
196 |
+
length_list.append(int(infos[-1]))
|
197 |
+
else:
|
198 |
+
raise "A text prompt, or a file a text prompts are required!!!"
|
199 |
+
# print('loading checkpoint {}'.format(file))
|
200 |
+
|
201 |
+
if est_length:
|
202 |
+
print("Since no motion length are specified, we will use estimated motion lengthes!!")
|
203 |
+
text_embedding = t2m_transformer.encode_text(prompt_list)
|
204 |
+
pred_dis = length_estimator(text_embedding)
|
205 |
+
probs = F.softmax(pred_dis, dim=-1) # (b, ntoken)
|
206 |
+
token_lens = Categorical(probs).sample() # (b, seqlen)
|
207 |
+
# lengths = torch.multinomial()
|
208 |
+
else:
|
209 |
+
token_lens = torch.LongTensor(length_list) // 4
|
210 |
+
token_lens = token_lens.to(opt.device).long()
|
211 |
+
|
212 |
+
m_length = token_lens * 4
|
213 |
+
captions = prompt_list
|
214 |
+
|
215 |
+
sample = 0
|
216 |
+
kinematic_chain = t2m_kinematic_chain
|
217 |
+
converter = Joint2BVHConvertor()
|
218 |
+
|
219 |
+
for r in range(opt.repeat_times):
|
220 |
+
print("-->Repeat %d"%r)
|
221 |
+
with torch.no_grad():
|
222 |
+
mids = t2m_transformer.generate(captions, token_lens,
|
223 |
+
timesteps=opt.time_steps,
|
224 |
+
cond_scale=opt.cond_scale,
|
225 |
+
temperature=opt.temperature,
|
226 |
+
topk_filter_thres=opt.topkr,
|
227 |
+
gsample=opt.gumbel_sample)
|
228 |
+
# print(mids)
|
229 |
+
# print(mids.shape)
|
230 |
+
mids = res_model.generate(mids, captions, token_lens, temperature=1, cond_scale=5)
|
231 |
+
pred_motions = vq_model.forward_decoder(mids)
|
232 |
+
|
233 |
+
pred_motions = pred_motions.detach().cpu().numpy()
|
234 |
+
|
235 |
+
data = inv_transform(pred_motions)
|
236 |
+
|
237 |
+
for k, (caption, joint_data) in enumerate(zip(captions, data)):
|
238 |
+
print("---->Sample %d: %s %d"%(k, caption, m_length[k]))
|
239 |
+
animation_path = pjoin(animation_dir, str(k))
|
240 |
+
joint_path = pjoin(joints_dir, str(k))
|
241 |
+
|
242 |
+
os.makedirs(animation_path, exist_ok=True)
|
243 |
+
os.makedirs(joint_path, exist_ok=True)
|
244 |
+
|
245 |
+
joint_data = joint_data[:m_length[k]]
|
246 |
+
joint = recover_from_ric(torch.from_numpy(joint_data).float(), 22).numpy()
|
247 |
+
|
248 |
+
bvh_path = pjoin(animation_path, "sample%d_repeat%d_len%d_ik.bvh"%(k, r, m_length[k]))
|
249 |
+
_, ik_joint = converter.convert(joint, filename=bvh_path, iterations=100)
|
250 |
+
|
251 |
+
bvh_path = pjoin(animation_path, "sample%d_repeat%d_len%d.bvh" % (k, r, m_length[k]))
|
252 |
+
_, joint = converter.convert(joint, filename=bvh_path, iterations=100, foot_ik=False)
|
253 |
+
|
254 |
+
|
255 |
+
save_path = pjoin(animation_path, "sample%d_repeat%d_len%d.mp4"%(k, r, m_length[k]))
|
256 |
+
ik_save_path = pjoin(animation_path, "sample%d_repeat%d_len%d_ik.mp4"%(k, r, m_length[k]))
|
257 |
+
|
258 |
+
plot_3d_motion(ik_save_path, kinematic_chain, ik_joint, title=caption, fps=20)
|
259 |
+
plot_3d_motion(save_path, kinematic_chain, joint, title=caption, fps=20)
|
260 |
+
np.save(pjoin(joint_path, "sample%d_repeat%d_len%d.npy"%(k, r, m_length[k])), joint)
|
261 |
+
np.save(pjoin(joint_path, "sample%d_repeat%d_len%d_ik.npy"%(k, r, m_length[k])), ik_joint)
|
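Note: gen_t2m.py converts user-specified motion lengths to token lengths with `length // 4` and back with `token_lens * 4`, consistent with the 4x temporal downsampling of the RVQ-VAE encoder, and animations are rendered at 20 fps (`plot_3d_motion(..., fps=20)`). A small helper sketch for picking `--motion_length` from a target duration, using only those two factors taken from the code above:

# Convert a target duration in seconds to the frame/token counts used by gen_t2m.py.
FPS = 20          # plot_3d_motion is called with fps=20 above
DOWNSAMPLE = 4    # token_lens = motion_length // 4 in gen_t2m.py

def duration_to_lengths(seconds: float):
    n_frames = int(seconds * FPS)
    n_tokens = n_frames // DOWNSAMPLE
    return n_frames, n_tokens

print(duration_to_lengths(6.0))  # e.g. (120, 30)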
models/.DS_Store
ADDED
Binary file (6.15 kB)
models/__init__.py
ADDED
File without changes
|
models/mask_transformer/__init__.py
ADDED
File without changes
|
models/mask_transformer/tools.py
ADDED
@@ -0,0 +1,165 @@
1 |
+
import torch
|
2 |
+
import torch.nn.functional as F
|
3 |
+
import math
|
4 |
+
from einops import rearrange
|
5 |
+
|
6 |
+
# return mask where padding is FALSE
|
7 |
+
def lengths_to_mask(lengths, max_len):
|
8 |
+
# max_len = max(lengths)
|
9 |
+
mask = torch.arange(max_len, device=lengths.device).expand(len(lengths), max_len) < lengths.unsqueeze(1)
|
10 |
+
return mask #(b, len)
|
11 |
+
|
12 |
+
# return mask where padding is ALL FALSE
|
13 |
+
def get_pad_mask_idx(seq, pad_idx):
|
14 |
+
return (seq != pad_idx).unsqueeze(1)
|
15 |
+
|
16 |
+
# Given seq: (b, s)
|
17 |
+
# Return mat: (1, s, s)
|
18 |
+
# Example Output:
|
19 |
+
# [[[ True, False, False],
|
20 |
+
# [ True, True, False],
|
21 |
+
# [ True, True, True]]]
|
22 |
+
# For causal attention
|
23 |
+
def get_subsequent_mask(seq):
|
24 |
+
sz_b, seq_len = seq.shape
|
25 |
+
subsequent_mask = (1 - torch.triu(
|
26 |
+
torch.ones((1, seq_len, seq_len)), diagonal=1)).bool()
|
27 |
+
return subsequent_mask.to(seq.device)
|
28 |
+
|
29 |
+
|
30 |
+
def exists(val):
|
31 |
+
return val is not None
|
32 |
+
|
33 |
+
def default(val, d):
|
34 |
+
return val if exists(val) else d
|
35 |
+
|
36 |
+
def eval_decorator(fn):
|
37 |
+
def inner(model, *args, **kwargs):
|
38 |
+
was_training = model.training
|
39 |
+
model.eval()
|
40 |
+
out = fn(model, *args, **kwargs)
|
41 |
+
model.train(was_training)
|
42 |
+
return out
|
43 |
+
return inner
|
44 |
+
|
45 |
+
def l2norm(t):
|
46 |
+
return F.normalize(t, dim = -1)
|
47 |
+
|
48 |
+
# tensor helpers
|
49 |
+
|
50 |
+
# Get a random subset of TRUE mask, with prob
|
51 |
+
def get_mask_subset_prob(mask, prob):
|
52 |
+
subset_mask = torch.bernoulli(mask, p=prob) & mask
|
53 |
+
return subset_mask
|
54 |
+
|
55 |
+
|
56 |
+
# Get mask of special_tokens in ids
|
57 |
+
def get_mask_special_tokens(ids, special_ids):
|
58 |
+
mask = torch.zeros_like(ids).bool()
|
59 |
+
for special_id in special_ids:
|
60 |
+
mask |= (ids==special_id)
|
61 |
+
return mask
|
62 |
+
|
63 |
+
# network builder helpers
|
64 |
+
def _get_activation_fn(activation):
|
65 |
+
if activation == "relu":
|
66 |
+
return F.relu
|
67 |
+
elif activation == "gelu":
|
68 |
+
return F.gelu
|
69 |
+
|
70 |
+
raise RuntimeError("activation should be relu/gelu, not {}".format(activation))
|
71 |
+
|
72 |
+
# classifier free guidance functions
|
73 |
+
|
74 |
+
def uniform(shape, device=None):
|
75 |
+
return torch.zeros(shape, device=device).float().uniform_(0, 1)
|
76 |
+
|
77 |
+
def prob_mask_like(shape, prob, device=None):
|
78 |
+
if prob == 1:
|
79 |
+
return torch.ones(shape, device=device, dtype=torch.bool)
|
80 |
+
elif prob == 0:
|
81 |
+
return torch.zeros(shape, device=device, dtype=torch.bool)
|
82 |
+
else:
|
83 |
+
return uniform(shape, device=device) < prob
|
84 |
+
|
85 |
+
# sampling helpers
|
86 |
+
|
87 |
+
def log(t, eps = 1e-20):
|
88 |
+
return torch.log(t.clamp(min = eps))
|
89 |
+
|
90 |
+
def gumbel_noise(t):
|
91 |
+
noise = torch.zeros_like(t).uniform_(0, 1)
|
92 |
+
return -log(-log(noise))
|
93 |
+
|
94 |
+
def gumbel_sample(t, temperature = 1., dim = 1):
|
95 |
+
return ((t / max(temperature, 1e-10)) + gumbel_noise(t)).argmax(dim=dim)
|
96 |
+
|
97 |
+
|
98 |
+
# Example input:
|
99 |
+
# [[ 0.3596, 0.0862, 0.9771, -1.0000, -1.0000, -1.0000],
|
100 |
+
# [ 0.4141, 0.1781, 0.6628, 0.5721, -1.0000, -1.0000],
|
101 |
+
# [ 0.9428, 0.3586, 0.1659, 0.8172, 0.9273, -1.0000]]
|
102 |
+
# Example output:
|
103 |
+
# [[ -inf, -inf, 0.9771, -inf, -inf, -inf],
|
104 |
+
# [ -inf, -inf, 0.6628, -inf, -inf, -inf],
|
105 |
+
# [0.9428, -inf, -inf, -inf, -inf, -inf]]
|
106 |
+
def top_k(logits, thres = 0.9, dim = 1):
|
107 |
+
k = math.ceil((1 - thres) * logits.shape[dim])
|
108 |
+
val, ind = logits.topk(k, dim = dim)
|
109 |
+
probs = torch.full_like(logits, float('-inf'))
|
110 |
+
probs.scatter_(dim, ind, val)
|
111 |
+
# func verified
|
112 |
+
# print(probs)
|
113 |
+
# print(logits)
|
114 |
+
# raise
|
115 |
+
return probs
|
116 |
+
|
117 |
+
# noise schedules
|
118 |
+
|
119 |
+
# More on large value, less on small
|
120 |
+
def cosine_schedule(t):
|
121 |
+
return torch.cos(t * math.pi * 0.5)
|
122 |
+
|
123 |
+
def scale_cosine_schedule(t, scale):
|
124 |
+
return torch.clip(scale*torch.cos(t * math.pi * 0.5) + 1 - scale, min=0., max=1.)
|
125 |
+
|
126 |
+
# More on small value, less on large
|
127 |
+
def q_schedule(bs, low, high, device):
|
128 |
+
noise = uniform((bs,), device=device)
|
129 |
+
schedule = 1 - cosine_schedule(noise)
|
130 |
+
return torch.round(schedule * (high - low - 1)).long() + low
|
131 |
+
|
132 |
+
def cal_performance(pred, labels, ignore_index=None, smoothing=0., tk=1):
|
133 |
+
loss = cal_loss(pred, labels, ignore_index, smoothing=smoothing)
|
134 |
+
# pred_id = torch.argmax(pred, dim=1)
|
135 |
+
# mask = labels.ne(ignore_index)
|
136 |
+
# n_correct = pred_id.eq(labels).masked_select(mask)
|
137 |
+
# acc = torch.mean(n_correct.float()).item()
|
138 |
+
pred_id_k = torch.topk(pred, k=tk, dim=1).indices
|
139 |
+
pred_id = pred_id_k[:, 0]
|
140 |
+
mask = labels.ne(ignore_index)
|
141 |
+
n_correct = (pred_id_k == labels.unsqueeze(1)).any(dim=1).masked_select(mask)
|
142 |
+
acc = torch.mean(n_correct.float()).item()
|
143 |
+
|
144 |
+
return loss, pred_id, acc
|
145 |
+
|
146 |
+
|
147 |
+
def cal_loss(pred, labels, ignore_index=None, smoothing=0.):
|
148 |
+
'''Calculate cross entropy loss, apply label smoothing if needed.'''
|
149 |
+
# print(pred.shape, labels.shape) #torch.Size([64, 1028, 55]) torch.Size([64, 55])
|
150 |
+
# print(pred.shape, labels.shape) #torch.Size([64, 1027, 55]) torch.Size([64, 55])
|
151 |
+
if smoothing:
|
152 |
+
space = 2
|
153 |
+
n_class = pred.size(1)
|
154 |
+
mask = labels.ne(ignore_index)
|
155 |
+
one_hot = rearrange(F.one_hot(labels, n_class + space), 'a ... b -> a b ...')[:, :n_class]
|
156 |
+
# one_hot = torch.zeros_like(pred).scatter(1, labels.unsqueeze(1), 1)
|
157 |
+
sm_one_hot = one_hot * (1 - smoothing) + (1 - one_hot) * smoothing / (n_class - 1)
|
158 |
+
neg_log_prb = -F.log_softmax(pred, dim=1)
|
159 |
+
loss = (sm_one_hot * neg_log_prb).sum(dim=1)
|
160 |
+
# loss = F.cross_entropy(pred, sm_one_hot, reduction='none')
|
161 |
+
loss = torch.mean(loss.masked_select(mask))
|
162 |
+
else:
|
163 |
+
loss = F.cross_entropy(pred, labels, ignore_index=ignore_index)
|
164 |
+
|
165 |
+
return loss
|
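Note: the helpers in tools.py drive the iterative masked decoding: `cosine_schedule` maps a timestep in [0, 1] to a masking ratio, and `top_k` keeps only the highest-probability logits before sampling. The sketch below re-implements those two ideas standalone for illustration; it is not imported from the repo.

# Standalone illustration of the cosine masking schedule and top-k logit filtering.
import math
import torch

def cosine_schedule(t: torch.Tensor) -> torch.Tensor:
    # t in [0, 1]; early steps keep most tokens masked, later steps very few
    return torch.cos(t * math.pi * 0.5)

def top_k(logits: torch.Tensor, thres: float = 0.9, dim: int = -1) -> torch.Tensor:
    # keep the top (1 - thres) fraction of logits, set the rest to -inf
    k = math.ceil((1 - thres) * logits.shape[dim])
    val, ind = logits.topk(k, dim=dim)
    out = torch.full_like(logits, float("-inf"))
    out.scatter_(dim, ind, val)
    return out

t = torch.linspace(0, 1, 5)
print(cosine_schedule(t))                   # masking ratio per timestep
print(top_k(torch.randn(2, 6), thres=0.9))  # all but the top ~10% set to -inf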
models/mask_transformer/transformer.py
ADDED
@@ -0,0 +1,1039 @@
1 |
+
import torch
|
2 |
+
import torch.nn as nn
|
3 |
+
import numpy as np
|
4 |
+
# from networks.layers import *
|
5 |
+
import torch.nn.functional as F
|
6 |
+
import clip
|
7 |
+
from einops import rearrange, repeat
|
8 |
+
import math
|
9 |
+
from random import random
|
10 |
+
from tqdm.auto import tqdm
|
11 |
+
from typing import Callable, Optional, List, Dict
|
12 |
+
from copy import deepcopy
|
13 |
+
from functools import partial
|
14 |
+
from models.mask_transformer.tools import *
|
15 |
+
from torch.distributions.categorical import Categorical
|
16 |
+
|
17 |
+
class InputProcess(nn.Module):
|
18 |
+
def __init__(self, input_feats, latent_dim):
|
19 |
+
super().__init__()
|
20 |
+
self.input_feats = input_feats
|
21 |
+
self.latent_dim = latent_dim
|
22 |
+
self.poseEmbedding = nn.Linear(self.input_feats, self.latent_dim)
|
23 |
+
|
24 |
+
def forward(self, x):
|
25 |
+
# [bs, ntokens, input_feats]
|
26 |
+
x = x.permute((1, 0, 2)) # [seqen, bs, input_feats]
|
27 |
+
# print(x.shape)
|
28 |
+
x = self.poseEmbedding(x) # [seqlen, bs, d]
|
29 |
+
return x
|
30 |
+
|
31 |
+
class PositionalEncoding(nn.Module):
|
32 |
+
#Borrow from MDM, the same as above, but add dropout, exponential may improve precision
|
33 |
+
def __init__(self, d_model, dropout=0.1, max_len=5000):
|
34 |
+
super(PositionalEncoding, self).__init__()
|
35 |
+
self.dropout = nn.Dropout(p=dropout)
|
36 |
+
|
37 |
+
pe = torch.zeros(max_len, d_model)
|
38 |
+
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
|
39 |
+
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model))
|
40 |
+
pe[:, 0::2] = torch.sin(position * div_term)
|
41 |
+
pe[:, 1::2] = torch.cos(position * div_term)
|
42 |
+
pe = pe.unsqueeze(0).transpose(0, 1) #[max_len, 1, d_model]
|
43 |
+
|
44 |
+
self.register_buffer('pe', pe)
|
45 |
+
|
46 |
+
def forward(self, x):
|
47 |
+
# not used in the final model
|
48 |
+
x = x + self.pe[:x.shape[0], :]
|
49 |
+
return self.dropout(x)
|
50 |
+
|
51 |
+
class OutputProcess_Bert(nn.Module):
|
52 |
+
def __init__(self, out_feats, latent_dim):
|
53 |
+
super().__init__()
|
54 |
+
self.dense = nn.Linear(latent_dim, latent_dim)
|
55 |
+
self.transform_act_fn = F.gelu
|
56 |
+
self.LayerNorm = nn.LayerNorm(latent_dim, eps=1e-12)
|
57 |
+
self.poseFinal = nn.Linear(latent_dim, out_feats) #Bias!
|
58 |
+
|
59 |
+
def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
|
60 |
+
hidden_states = self.dense(hidden_states)
|
61 |
+
hidden_states = self.transform_act_fn(hidden_states)
|
62 |
+
hidden_states = self.LayerNorm(hidden_states)
|
63 |
+
output = self.poseFinal(hidden_states) # [seqlen, bs, out_feats]
|
64 |
+
output = output.permute(1, 2, 0) # [bs, c, seqlen]
|
65 |
+
return output
|
66 |
+
|
67 |
+
class OutputProcess(nn.Module):
|
68 |
+
def __init__(self, out_feats, latent_dim):
|
69 |
+
super().__init__()
|
70 |
+
self.dense = nn.Linear(latent_dim, latent_dim)
|
71 |
+
self.transform_act_fn = F.gelu
|
72 |
+
self.LayerNorm = nn.LayerNorm(latent_dim, eps=1e-12)
|
73 |
+
self.poseFinal = nn.Linear(latent_dim, out_feats) #Bias!
|
74 |
+
|
75 |
+
def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
|
76 |
+
hidden_states = self.dense(hidden_states)
|
77 |
+
hidden_states = self.transform_act_fn(hidden_states)
|
78 |
+
hidden_states = self.LayerNorm(hidden_states)
|
79 |
+
output = self.poseFinal(hidden_states) # [seqlen, bs, out_feats]
|
80 |
+
output = output.permute(1, 2, 0) # [bs, e, seqlen]
|
81 |
+
return output
|
82 |
+
|
83 |
+
|
84 |
+
class MaskTransformer(nn.Module):
|
85 |
+
def __init__(self, code_dim, cond_mode, latent_dim=256, ff_size=1024, num_layers=8,
|
86 |
+
num_heads=4, dropout=0.1, clip_dim=512, cond_drop_prob=0.1,
|
87 |
+
clip_version=None, opt=None, **kargs):
|
88 |
+
super(MaskTransformer, self).__init__()
|
89 |
+
print(f'latent_dim: {latent_dim}, ff_size: {ff_size}, nlayers: {num_layers}, nheads: {num_heads}, dropout: {dropout}')
|
90 |
+
|
91 |
+
self.code_dim = code_dim
|
92 |
+
self.latent_dim = latent_dim
|
93 |
+
self.clip_dim = clip_dim
|
94 |
+
self.dropout = dropout
|
95 |
+
self.opt = opt
|
96 |
+
|
97 |
+
self.cond_mode = cond_mode
|
98 |
+
self.cond_drop_prob = cond_drop_prob
|
99 |
+
|
100 |
+
if self.cond_mode == 'action':
|
101 |
+
assert 'num_actions' in kargs
|
102 |
+
self.num_actions = kargs.get('num_actions', 1)
|
103 |
+
|
104 |
+
'''
|
105 |
+
Preparing Networks
|
106 |
+
'''
|
107 |
+
self.input_process = InputProcess(self.code_dim, self.latent_dim)
|
108 |
+
self.position_enc = PositionalEncoding(self.latent_dim, self.dropout)
|
109 |
+
|
110 |
+
seqTransEncoderLayer = nn.TransformerEncoderLayer(d_model=self.latent_dim,
|
111 |
+
nhead=num_heads,
|
112 |
+
dim_feedforward=ff_size,
|
113 |
+
dropout=dropout,
|
114 |
+
activation='gelu')
|
115 |
+
|
116 |
+
self.seqTransEncoder = nn.TransformerEncoder(seqTransEncoderLayer,
|
117 |
+
num_layers=num_layers)
|
118 |
+
|
119 |
+
self.encode_action = partial(F.one_hot, num_classes=self.num_actions)
|
120 |
+
|
121 |
+
# if self.cond_mode != 'no_cond':
|
122 |
+
if self.cond_mode == 'text':
|
123 |
+
self.cond_emb = nn.Linear(self.clip_dim, self.latent_dim)
|
124 |
+
elif self.cond_mode == 'action':
|
125 |
+
self.cond_emb = nn.Linear(self.num_actions, self.latent_dim)
|
126 |
+
elif self.cond_mode == 'uncond':
|
127 |
+
self.cond_emb = nn.Identity()
|
128 |
+
else:
|
129 |
+
raise KeyError("Unsupported condition mode!!!")
|
130 |
+
|
131 |
+
|
132 |
+
_num_tokens = opt.num_tokens + 2 # two dummy tokens, one for masking, one for padding
|
133 |
+
self.mask_id = opt.num_tokens
|
134 |
+
self.pad_id = opt.num_tokens + 1
|
135 |
+
|
136 |
+
self.output_process = OutputProcess_Bert(out_feats=opt.num_tokens, latent_dim=latent_dim)
|
137 |
+
|
138 |
+
self.token_emb = nn.Embedding(_num_tokens, self.code_dim)
|
139 |
+
|
140 |
+
self.apply(self.__init_weights)
|
141 |
+
|
142 |
+
'''
|
143 |
+
Preparing frozen weights
|
144 |
+
'''
|
145 |
+
|
146 |
+
if self.cond_mode == 'text':
|
147 |
+
print('Loading CLIP...')
|
148 |
+
self.clip_version = clip_version
|
149 |
+
self.clip_model = self.load_and_freeze_clip(clip_version)
|
150 |
+
|
151 |
+
self.noise_schedule = cosine_schedule
|
152 |
+
|
153 |
+
def load_and_freeze_token_emb(self, codebook):
|
154 |
+
'''
|
155 |
+
:param codebook: (c, d)
|
156 |
+
:return:
|
157 |
+
'''
|
158 |
+
assert self.training, 'Only necessary in training mode'
|
159 |
+
c, d = codebook.shape
|
160 |
+
self.token_emb.weight = nn.Parameter(torch.cat([codebook, torch.zeros(size=(2, d), device=codebook.device)], dim=0)) #add two dummy tokens, 0 vectors
|
161 |
+
self.token_emb.requires_grad_(False)
|
162 |
+
# self.token_emb.weight.requires_grad = False
|
163 |
+
# self.token_emb_ready = True
|
164 |
+
print("Token embedding initialized!")
|
165 |
+
|
166 |
+
def __init_weights(self, module):
|
167 |
+
if isinstance(module, (nn.Linear, nn.Embedding)):
|
168 |
+
module.weight.data.normal_(mean=0.0, std=0.02)
|
169 |
+
if isinstance(module, nn.Linear) and module.bias is not None:
|
170 |
+
module.bias.data.zero_()
|
171 |
+
elif isinstance(module, nn.LayerNorm):
|
172 |
+
module.bias.data.zero_()
|
173 |
+
module.weight.data.fill_(1.0)
|
174 |
+
|
175 |
+
def parameters_wo_clip(self):
|
176 |
+
return [p for name, p in self.named_parameters() if not name.startswith('clip_model.')]
|
177 |
+
|
178 |
+
def load_and_freeze_clip(self, clip_version):
|
179 |
+
clip_model, clip_preprocess = clip.load(clip_version, device='cpu',
|
180 |
+
jit=False) # Must set jit=False for training
|
181 |
+
# Cannot run on cpu
|
182 |
+
clip.model.convert_weights(
|
183 |
+
clip_model) # Actually this line is unnecessary since clip by default already on float16
|
184 |
+
# Date 0707: It's necessary, only unecessary when load directly to gpu. Disable if need to run on cpu
|
185 |
+
|
186 |
+
# Freeze CLIP weights
|
187 |
+
clip_model.eval()
|
188 |
+
for p in clip_model.parameters():
|
189 |
+
p.requires_grad = False
|
190 |
+
|
191 |
+
return clip_model
|
192 |
+
|
193 |
+
def encode_text(self, raw_text):
|
194 |
+
device = next(self.parameters()).device
|
195 |
+
text = clip.tokenize(raw_text, truncate=True).to(device)
|
196 |
+
feat_clip_text = self.clip_model.encode_text(text).float()
|
197 |
+
return feat_clip_text
|
198 |
+
|
199 |
+
def mask_cond(self, cond, force_mask=False):
|
200 |
+
bs, d = cond.shape
|
201 |
+
if force_mask:
|
202 |
+
return torch.zeros_like(cond)
|
203 |
+
elif self.training and self.cond_drop_prob > 0.:
|
204 |
+
mask = torch.bernoulli(torch.ones(bs, device=cond.device) * self.cond_drop_prob).view(bs, 1)
|
205 |
+
return cond * (1. - mask)
|
206 |
+
else:
|
207 |
+
return cond
|
208 |
+
|
209 |
+
def trans_forward(self, motion_ids, cond, padding_mask, force_mask=False):
|
210 |
+
'''
|
211 |
+
:param motion_ids: (b, seqlen)
|
212 |
+
:padding_mask: (b, seqlen), all pad positions are TRUE else FALSE
|
213 |
+
:param cond: (b, embed_dim) for text, (b, num_actions) for action
|
214 |
+
:param force_mask: boolean
|
215 |
+
:return:
|
216 |
+
-logits: (b, num_token, seqlen)
|
217 |
+
'''
|
218 |
+
|
219 |
+
cond = self.mask_cond(cond, force_mask=force_mask)
|
220 |
+
|
221 |
+
# print(motion_ids.shape)
|
222 |
+
x = self.token_emb(motion_ids)
|
223 |
+
# print(x.shape)
|
224 |
+
# (b, seqlen, d) -> (seqlen, b, latent_dim)
|
225 |
+
x = self.input_process(x)
|
226 |
+
|
227 |
+
cond = self.cond_emb(cond).unsqueeze(0) #(1, b, latent_dim)
|
228 |
+
|
229 |
+
x = self.position_enc(x)
|
230 |
+
xseq = torch.cat([cond, x], dim=0) #(seqlen+1, b, latent_dim)
|
231 |
+
|
232 |
+
padding_mask = torch.cat([torch.zeros_like(padding_mask[:, 0:1]), padding_mask], dim=1) #(b, seqlen+1)
|
233 |
+
# print(xseq.shape, padding_mask.shape)
|
234 |
+
|
235 |
+
# print(padding_mask.shape, xseq.shape)
|
236 |
+
|
237 |
+
output = self.seqTransEncoder(xseq, src_key_padding_mask=padding_mask)[1:] #(seqlen, b, e)
|
238 |
+
logits = self.output_process(output) #(seqlen, b, e) -> (b, ntoken, seqlen)
|
239 |
+
return logits
|
240 |
+
|
241 |
+
def forward(self, ids, y, m_lens):
|
242 |
+
'''
|
243 |
+
:param ids: (b, n)
|
244 |
+
:param y: raw text for cond_mode=text, (b, ) for cond_mode=action
|
245 |
+
:m_lens: (b,)
|
246 |
+
:return:
|
247 |
+
'''
|
248 |
+
|
249 |
+
bs, ntokens = ids.shape
|
250 |
+
device = ids.device
|
251 |
+
|
252 |
+
# Positions that are PADDED are ALL FALSE
|
253 |
+
non_pad_mask = lengths_to_mask(m_lens, ntokens) #(b, n)
|
254 |
+
ids = torch.where(non_pad_mask, ids, self.pad_id)
|
255 |
+
|
256 |
+
force_mask = False
|
257 |
+
if self.cond_mode == 'text':
|
258 |
+
with torch.no_grad():
|
259 |
+
cond_vector = self.encode_text(y)
|
260 |
+
elif self.cond_mode == 'action':
|
261 |
+
cond_vector = self.enc_action(y).to(device).float()
|
262 |
+
elif self.cond_mode == 'uncond':
|
263 |
+
cond_vector = torch.zeros(bs, self.latent_dim).float().to(device)
|
264 |
+
force_mask = True
|
265 |
+
else:
|
266 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
267 |
+
|
268 |
+
|
269 |
+
'''
|
270 |
+
Prepare mask
|
271 |
+
'''
|
272 |
+
rand_time = uniform((bs,), device=device)
|
273 |
+
rand_mask_probs = self.noise_schedule(rand_time)
|
274 |
+
num_token_masked = (ntokens * rand_mask_probs).round().clamp(min=1)
|
275 |
+
|
276 |
+
batch_randperm = torch.rand((bs, ntokens), device=device).argsort(dim=-1)
|
277 |
+
# Positions to be MASKED are ALL TRUE
|
278 |
+
mask = batch_randperm < num_token_masked.unsqueeze(-1)
|
279 |
+
|
280 |
+
# Positions to be MASKED must also be NON-PADDED
|
281 |
+
mask &= non_pad_mask
|
282 |
+
|
283 |
+
# Note this is our training target, not input
|
284 |
+
labels = torch.where(mask, ids, self.mask_id)
|
285 |
+
|
286 |
+
x_ids = ids.clone()
|
287 |
+
|
288 |
+
# Further Apply Bert Masking Scheme
|
289 |
+
# Step 1: 10% replace with an incorrect token
|
290 |
+
mask_rid = get_mask_subset_prob(mask, 0.1)
|
291 |
+
rand_id = torch.randint_like(x_ids, high=self.opt.num_tokens)
|
292 |
+
x_ids = torch.where(mask_rid, rand_id, x_ids)
|
293 |
+
# Step 2: 90% x 10% replace with correct token, and 90% x 88% replace with mask token
|
294 |
+
mask_mid = get_mask_subset_prob(mask & ~mask_rid, 0.88)
|
295 |
+
|
296 |
+
# mask_mid = mask
|
297 |
+
|
298 |
+
x_ids = torch.where(mask_mid, self.mask_id, x_ids)
|
299 |
+
|
300 |
+
logits = self.trans_forward(x_ids, cond_vector, ~non_pad_mask, force_mask)
|
301 |
+
ce_loss, pred_id, acc = cal_performance(logits, labels, ignore_index=self.mask_id)
|
302 |
+
|
303 |
+
return ce_loss, pred_id, acc
|
304 |
+
|
305 |
+
def forward_with_cond_scale(self,
|
306 |
+
motion_ids,
|
307 |
+
cond_vector,
|
308 |
+
padding_mask,
|
309 |
+
cond_scale=3,
|
310 |
+
force_mask=False):
|
311 |
+
# bs = motion_ids.shape[0]
|
312 |
+
# if cond_scale == 1:
|
313 |
+
if force_mask:
|
314 |
+
return self.trans_forward(motion_ids, cond_vector, padding_mask, force_mask=True)
|
315 |
+
|
316 |
+
logits = self.trans_forward(motion_ids, cond_vector, padding_mask)
|
317 |
+
if cond_scale == 1:
|
318 |
+
return logits
|
319 |
+
|
320 |
+
aux_logits = self.trans_forward(motion_ids, cond_vector, padding_mask, force_mask=True)
|
321 |
+
|
322 |
+
scaled_logits = aux_logits + (logits - aux_logits) * cond_scale
|
323 |
+
return scaled_logits
|
324 |
+
|
325 |
+
@torch.no_grad()
|
326 |
+
@eval_decorator
|
327 |
+
def generate(self,
|
328 |
+
conds,
|
329 |
+
m_lens,
|
330 |
+
timesteps: int,
|
331 |
+
cond_scale: int,
|
332 |
+
temperature=1,
|
333 |
+
topk_filter_thres=0.9,
|
334 |
+
gsample=False,
|
335 |
+
force_mask=False
|
336 |
+
):
|
337 |
+
# print(self.opt.num_quantizers)
|
338 |
+
# assert len(timesteps) >= len(cond_scales) == self.opt.num_quantizers
|
339 |
+
|
340 |
+
device = next(self.parameters()).device
|
341 |
+
seq_len = max(m_lens)
|
342 |
+
batch_size = len(m_lens)
|
343 |
+
|
344 |
+
if self.cond_mode == 'text':
|
345 |
+
with torch.no_grad():
|
346 |
+
cond_vector = self.encode_text(conds)
|
347 |
+
elif self.cond_mode == 'action':
|
348 |
+
cond_vector = self.enc_action(conds).to(device)
|
349 |
+
elif self.cond_mode == 'uncond':
|
350 |
+
cond_vector = torch.zeros(batch_size, self.latent_dim).float().to(device)
|
351 |
+
else:
|
352 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
353 |
+
|
354 |
+
padding_mask = ~lengths_to_mask(m_lens, seq_len)
|
355 |
+
# print(padding_mask.shape, )
|
356 |
+
|
357 |
+
# Start from all tokens being masked
|
358 |
+
ids = torch.where(padding_mask, self.pad_id, self.mask_id)
|
359 |
+
scores = torch.where(padding_mask, 1e5, 0.)
|
360 |
+
starting_temperature = temperature
|
361 |
+
|
362 |
+
for timestep, steps_until_x0 in zip(torch.linspace(0, 1, timesteps, device=device), reversed(range(timesteps))):
|
363 |
+
# 0 < timestep < 1
|
364 |
+
rand_mask_prob = self.noise_schedule(timestep) # Tensor
|
365 |
+
|
366 |
+
'''
|
367 |
+
Maskout, and cope with variable length
|
368 |
+
'''
|
369 |
+
# fix: the ratio regarding lengths, instead of seq_len
|
370 |
+
num_token_masked = torch.round(rand_mask_prob * m_lens).clamp(min=1) # (b, )
|
371 |
+
|
372 |
+
# select num_token_masked tokens with lowest scores to be masked
|
373 |
+
sorted_indices = scores.argsort(
|
374 |
+
dim=1) # (b, k), sorted_indices[i, j] = the index of j-th lowest element in scores on dim=1
|
375 |
+
ranks = sorted_indices.argsort(dim=1) # (b, k), rank[i, j] = the rank (0: lowest) of scores[i, j] on dim=1
|
376 |
+
is_mask = (ranks < num_token_masked.unsqueeze(-1))
|
377 |
+
ids = torch.where(is_mask, self.mask_id, ids)
|
378 |
+
|
379 |
+
'''
|
380 |
+
Preparing input
|
381 |
+
'''
|
382 |
+
# (b, num_token, seqlen)
|
383 |
+
logits = self.forward_with_cond_scale(ids, cond_vector=cond_vector,
|
384 |
+
padding_mask=padding_mask,
|
385 |
+
cond_scale=cond_scale,
|
386 |
+
force_mask=force_mask)
|
387 |
+
|
388 |
+
logits = logits.permute(0, 2, 1) # (b, seqlen, ntoken)
|
389 |
+
# print(logits.shape, self.opt.num_tokens)
|
390 |
+
# clean low prob token
|
391 |
+
filtered_logits = top_k(logits, topk_filter_thres, dim=-1)
|
392 |
+
|
393 |
+
'''
|
394 |
+
Update ids
|
395 |
+
'''
|
396 |
+
# if force_mask:
|
397 |
+
temperature = starting_temperature
|
398 |
+
# else:
|
399 |
+
# temperature = starting_temperature * (steps_until_x0 / timesteps)
|
400 |
+
# temperature = max(temperature, 1e-4)
|
401 |
+
# print(filtered_logits.shape)
|
402 |
+
# temperature is annealed, gradually reducing temperature as well as randomness
|
403 |
+
if gsample: # use gumbel_softmax sampling
|
404 |
+
# print("1111")
|
405 |
+
pred_ids = gumbel_sample(filtered_logits, temperature=temperature, dim=-1) # (b, seqlen)
|
406 |
+
else: # use multinomial sampling
|
407 |
+
# print("2222")
|
408 |
+
probs = F.softmax(filtered_logits, dim=-1) # (b, seqlen, ntoken)
|
409 |
+
# print(temperature, starting_temperature, steps_until_x0, timesteps)
|
410 |
+
# print(probs / temperature)
|
411 |
+
pred_ids = Categorical(probs / temperature).sample() # (b, seqlen)
|
412 |
+
|
413 |
+
# print(pred_ids.max(), pred_ids.min())
|
414 |
+
# if pred_ids.
|
415 |
+
ids = torch.where(is_mask, pred_ids, ids)
|
416 |
+
|
417 |
+
'''
|
418 |
+
Updating scores
|
419 |
+
'''
|
420 |
+
probs_without_temperature = logits.softmax(dim=-1) # (b, seqlen, ntoken)
|
421 |
+
scores = probs_without_temperature.gather(2, pred_ids.unsqueeze(dim=-1)) # (b, seqlen, 1)
|
422 |
+
scores = scores.squeeze(-1) # (b, seqlen)
|
423 |
+
|
424 |
+
# We do not want to re-mask the previously kept tokens, or pad tokens
|
425 |
+
scores = scores.masked_fill(~is_mask, 1e5)
|
426 |
+
|
427 |
+
ids = torch.where(padding_mask, -1, ids)
|
428 |
+
# print("Final", ids.max(), ids.min())
|
429 |
+
return ids
|
430 |
+
|
431 |
+
|
432 |
+
@torch.no_grad()
|
433 |
+
@eval_decorator
|
434 |
+
def edit(self,
|
435 |
+
conds,
|
436 |
+
tokens,
|
437 |
+
m_lens,
|
438 |
+
timesteps: int,
|
439 |
+
cond_scale: int,
|
440 |
+
temperature=1,
|
441 |
+
topk_filter_thres=0.9,
|
442 |
+
gsample=False,
|
443 |
+
force_mask=False,
|
444 |
+
edit_mask=None,
|
445 |
+
padding_mask=None,
|
446 |
+
):
|
447 |
+
|
448 |
+
assert edit_mask.shape == tokens.shape if edit_mask is not None else True
|
449 |
+
device = next(self.parameters()).device
|
450 |
+
seq_len = tokens.shape[1]
|
451 |
+
|
452 |
+
if self.cond_mode == 'text':
|
453 |
+
with torch.no_grad():
|
454 |
+
cond_vector = self.encode_text(conds)
|
455 |
+
elif self.cond_mode == 'action':
|
456 |
+
cond_vector = self.enc_action(conds).to(device)
|
457 |
+
elif self.cond_mode == 'uncond':
|
458 |
+
cond_vector = torch.zeros(1, self.latent_dim).float().to(device)
|
459 |
+
else:
|
460 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
461 |
+
|
462 |
+
if padding_mask is None:
|
463 |
+
padding_mask = ~lengths_to_mask(m_lens, seq_len)
|
464 |
+
|
465 |
+
# Start from all tokens being masked
|
466 |
+
if edit_mask is None:
|
467 |
+
mask_free = True
|
468 |
+
ids = torch.where(padding_mask, self.pad_id, tokens)
|
469 |
+
edit_mask = torch.ones_like(padding_mask)
|
470 |
+
edit_mask = edit_mask & ~padding_mask
|
471 |
+
edit_len = edit_mask.sum(dim=-1)
|
472 |
+
scores = torch.where(edit_mask, 0., 1e5)
|
473 |
+
else:
|
474 |
+
mask_free = False
|
475 |
+
edit_mask = edit_mask & ~padding_mask
|
476 |
+
edit_len = edit_mask.sum(dim=-1)
|
477 |
+
ids = torch.where(edit_mask, self.mask_id, tokens)
|
478 |
+
scores = torch.where(edit_mask, 0., 1e5)
|
479 |
+
starting_temperature = temperature
|
480 |
+
|
481 |
+
for timestep, steps_until_x0 in zip(torch.linspace(0, 1, timesteps, device=device), reversed(range(timesteps))):
|
482 |
+
# 0 < timestep < 1
|
483 |
+
rand_mask_prob = 0.16 if mask_free else self.noise_schedule(timestep) # Tensor
|
484 |
+
|
485 |
+
'''
|
486 |
+
Maskout, and cope with variable length
|
487 |
+
'''
|
488 |
+
# fix: the ratio regarding lengths, instead of seq_len
|
489 |
+
num_token_masked = torch.round(rand_mask_prob * edit_len).clamp(min=1) # (b, )
|
490 |
+
|
491 |
+
# select num_token_masked tokens with lowest scores to be masked
|
492 |
+
sorted_indices = scores.argsort(
|
493 |
+
dim=1) # (b, k), sorted_indices[i, j] = the index of j-th lowest element in scores on dim=1
|
494 |
+
ranks = sorted_indices.argsort(dim=1) # (b, k), rank[i, j] = the rank (0: lowest) of scores[i, j] on dim=1
|
495 |
+
is_mask = (ranks < num_token_masked.unsqueeze(-1))
|
496 |
+
# is_mask = (torch.rand_like(scores) < 0.8) * ~padding_mask if mask_free else is_mask
|
497 |
+
ids = torch.where(is_mask, self.mask_id, ids)
|
498 |
+
|
499 |
+
'''
|
500 |
+
Preparing input
|
501 |
+
'''
|
502 |
+
# (b, num_token, seqlen)
|
503 |
+
logits = self.forward_with_cond_scale(ids, cond_vector=cond_vector,
|
504 |
+
padding_mask=padding_mask,
|
505 |
+
cond_scale=cond_scale,
|
506 |
+
force_mask=force_mask)
|
507 |
+
|
508 |
+
logits = logits.permute(0, 2, 1) # (b, seqlen, ntoken)
|
509 |
+
# print(logits.shape, self.opt.num_tokens)
|
510 |
+
# clean low prob token
|
511 |
+
filtered_logits = top_k(logits, topk_filter_thres, dim=-1)
|
512 |
+
|
513 |
+
'''
|
514 |
+
Update ids
|
515 |
+
'''
|
516 |
+
# if force_mask:
|
517 |
+
temperature = starting_temperature
|
518 |
+
# else:
|
519 |
+
# temperature = starting_temperature * (steps_until_x0 / timesteps)
|
520 |
+
# temperature = max(temperature, 1e-4)
|
521 |
+
# print(filtered_logits.shape)
|
522 |
+
# temperature is annealed, gradually reducing temperature as well as randomness
|
523 |
+
if gsample: # use gumbel_softmax sampling
|
524 |
+
# print("1111")
|
525 |
+
pred_ids = gumbel_sample(filtered_logits, temperature=temperature, dim=-1) # (b, seqlen)
|
526 |
+
else: # use multinomial sampling
|
527 |
+
# print("2222")
|
528 |
+
probs = F.softmax(filtered_logits, dim=-1) # (b, seqlen, ntoken)
|
529 |
+
# print(temperature, starting_temperature, steps_until_x0, timesteps)
|
530 |
+
# print(probs / temperature)
|
531 |
+
pred_ids = Categorical(probs / temperature).sample() # (b, seqlen)
|
532 |
+
|
533 |
+
# print(pred_ids.max(), pred_ids.min())
|
534 |
+
# if pred_ids.
|
535 |
+
ids = torch.where(is_mask, pred_ids, ids)
|
536 |
+
|
537 |
+
'''
|
538 |
+
Updating scores
|
539 |
+
'''
|
540 |
+
probs_without_temperature = logits.softmax(dim=-1) # (b, seqlen, ntoken)
|
541 |
+
scores = probs_without_temperature.gather(2, pred_ids.unsqueeze(dim=-1)) # (b, seqlen, 1)
|
542 |
+
scores = scores.squeeze(-1) # (b, seqlen)
|
543 |
+
|
544 |
+
# We do not want to re-mask the previously kept tokens, or pad tokens
|
545 |
+
scores = scores.masked_fill(~edit_mask, 1e5) if mask_free else scores.masked_fill(~is_mask, 1e5)
|
546 |
+
|
547 |
+
ids = torch.where(padding_mask, -1, ids)
|
548 |
+
# print("Final", ids.max(), ids.min())
|
549 |
+
return ids
|
550 |
+
|
551 |
+
@torch.no_grad()
|
552 |
+
@eval_decorator
|
553 |
+
def edit_beta(self,
|
554 |
+
conds,
|
555 |
+
conds_og,
|
556 |
+
tokens,
|
557 |
+
m_lens,
|
558 |
+
cond_scale: int,
|
559 |
+
force_mask=False,
|
560 |
+
):
|
561 |
+
|
562 |
+
device = next(self.parameters()).device
|
563 |
+
seq_len = tokens.shape[1]
|
564 |
+
|
565 |
+
if self.cond_mode == 'text':
|
566 |
+
with torch.no_grad():
|
567 |
+
cond_vector = self.encode_text(conds)
|
568 |
+
if conds_og is not None:
|
569 |
+
cond_vector_og = self.encode_text(conds_og)
|
570 |
+
else:
|
571 |
+
cond_vector_og = None
|
572 |
+
elif self.cond_mode == 'action':
|
573 |
+
cond_vector = self.enc_action(conds).to(device)
|
574 |
+
if conds_og is not None:
|
575 |
+
cond_vector_og = self.enc_action(conds_og).to(device)
|
576 |
+
else:
|
577 |
+
cond_vector_og = None
|
578 |
+
else:
|
579 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
580 |
+
|
581 |
+
padding_mask = ~lengths_to_mask(m_lens, seq_len)
|
582 |
+
|
583 |
+
# Start from all tokens being masked
|
584 |
+
ids = torch.where(padding_mask, self.pad_id, tokens) # Do not mask anything
|
585 |
+
|
586 |
+
'''
|
587 |
+
Preparing input
|
588 |
+
'''
|
589 |
+
# (b, num_token, seqlen)
|
590 |
+
logits = self.forward_with_cond_scale(ids,
|
591 |
+
cond_vector=cond_vector,
|
592 |
+
cond_vector_neg=cond_vector_og,
|
593 |
+
padding_mask=padding_mask,
|
594 |
+
cond_scale=cond_scale,
|
595 |
+
force_mask=force_mask)
|
596 |
+
|
597 |
+
logits = logits.permute(0, 2, 1) # (b, seqlen, ntoken)
|
598 |
+
|
599 |
+
'''
|
600 |
+
Updating scores
|
601 |
+
'''
|
602 |
+
probs_without_temperature = logits.softmax(dim=-1) # (b, seqlen, ntoken)
|
603 |
+
tokens[tokens == -1] = 0 # avoid an indexing error when gathering with index = -1
|
604 |
+
og_tokens_scores = probs_without_temperature.gather(2, tokens.unsqueeze(dim=-1)) # (b, seqlen, 1)
|
605 |
+
og_tokens_scores = og_tokens_scores.squeeze(-1) # (b, seqlen)
|
606 |
+
|
607 |
+
return og_tokens_scores
|
608 |
+
|
609 |
+
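The generation loop above keeps, at every step, the predictions the model is most confident about and re-masks the rest according to the schedule. A minimal sketch of that selection step is shown below; it only assumes a per-token `scores` tensor (higher means keep) and a per-sample count of tokens to re-mask, mirroring the argsort-of-argsort ranking used in `generate` and `edit`.

import torch

def remask_lowest(scores, ids, num_to_mask, mask_id):
    # Re-mask the `num_to_mask` lowest-scoring tokens per sequence (illustrative sketch).
    ranks = scores.argsort(dim=1).argsort(dim=1)          # rank 0 = lowest confidence
    is_mask = ranks < num_to_mask.unsqueeze(-1)           # (b, n) positions to re-mask
    return torch.where(is_mask, torch.full_like(ids, mask_id), ids), is_mask

scores = torch.tensor([[0.9, 0.1, 0.5, 0.7]])
ids = torch.tensor([[3, 7, 2, 5]])
new_ids, is_mask = remask_lowest(scores, ids, torch.tensor([2]), mask_id=512)
print(new_ids)   # the two lowest-scoring positions become 512: tensor([[  3, 512, 512,   5]])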
|
610 |
+
class ResidualTransformer(nn.Module):
|
611 |
+
def __init__(self, code_dim, cond_mode, latent_dim=256, ff_size=1024, num_layers=8, cond_drop_prob=0.1,
|
612 |
+
num_heads=4, dropout=0.1, clip_dim=512, shared_codebook=False, share_weight=False,
|
613 |
+
clip_version=None, opt=None, **kargs):
|
614 |
+
super(ResidualTransformer, self).__init__()
|
615 |
+
print(f'latent_dim: {latent_dim}, ff_size: {ff_size}, nlayers: {num_layers}, nheads: {num_heads}, dropout: {dropout}')
|
616 |
+
|
617 |
+
# assert shared_codebook == True, "Only support shared codebook right now!"
|
618 |
+
|
619 |
+
self.code_dim = code_dim
|
620 |
+
self.latent_dim = latent_dim
|
621 |
+
self.clip_dim = clip_dim
|
622 |
+
self.dropout = dropout
|
623 |
+
self.opt = opt
|
624 |
+
|
625 |
+
self.cond_mode = cond_mode
|
626 |
+
# self.cond_drop_prob = cond_drop_prob
|
627 |
+
|
628 |
+
if self.cond_mode == 'action':
|
629 |
+
assert 'num_actions' in kargs
|
630 |
+
self.num_actions = kargs.get('num_actions', 1)
|
631 |
+
self.cond_drop_prob = cond_drop_prob
|
632 |
+
|
633 |
+
'''
|
634 |
+
Preparing Networks
|
635 |
+
'''
|
636 |
+
self.input_process = InputProcess(self.code_dim, self.latent_dim)
|
637 |
+
self.position_enc = PositionalEncoding(self.latent_dim, self.dropout)
|
638 |
+
|
639 |
+
seqTransEncoderLayer = nn.TransformerEncoderLayer(d_model=self.latent_dim,
|
640 |
+
nhead=num_heads,
|
641 |
+
dim_feedforward=ff_size,
|
642 |
+
dropout=dropout,
|
643 |
+
activation='gelu')
|
644 |
+
|
645 |
+
self.seqTransEncoder = nn.TransformerEncoder(seqTransEncoderLayer,
|
646 |
+
num_layers=num_layers)
|
647 |
+
|
648 |
+
self.encode_quant = partial(F.one_hot, num_classes=self.opt.num_quantizers)
|
649 |
+
self.encode_action = partial(F.one_hot, num_classes=self.num_actions)
|
650 |
+
|
651 |
+
self.quant_emb = nn.Linear(self.opt.num_quantizers, self.latent_dim)
|
652 |
+
# if self.cond_mode != 'no_cond':
|
653 |
+
if self.cond_mode == 'text':
|
654 |
+
self.cond_emb = nn.Linear(self.clip_dim, self.latent_dim)
|
655 |
+
elif self.cond_mode == 'action':
|
656 |
+
self.cond_emb = nn.Linear(self.num_actions, self.latent_dim)
|
657 |
+
else:
|
658 |
+
raise KeyError("Unsupported condition mode!!!")
|
659 |
+
|
660 |
+
|
661 |
+
_num_tokens = opt.num_tokens + 1 # one dummy token for padding
|
662 |
+
self.pad_id = opt.num_tokens
|
663 |
+
|
664 |
+
# self.output_process = OutputProcess_Bert(out_feats=opt.num_tokens, latent_dim=latent_dim)
|
665 |
+
self.output_process = OutputProcess(out_feats=code_dim, latent_dim=latent_dim)
|
666 |
+
|
667 |
+
if shared_codebook:
|
668 |
+
token_embed = nn.Parameter(torch.normal(mean=0, std=0.02, size=(_num_tokens, code_dim)))
|
669 |
+
self.token_embed_weight = token_embed.expand(opt.num_quantizers-1, _num_tokens, code_dim)
|
670 |
+
if share_weight:
|
671 |
+
self.output_proj_weight = self.token_embed_weight
|
672 |
+
self.output_proj_bias = None
|
673 |
+
else:
|
674 |
+
output_proj = nn.Parameter(torch.normal(mean=0, std=0.02, size=(_num_tokens, code_dim)))
|
675 |
+
output_bias = nn.Parameter(torch.zeros(size=(_num_tokens,)))
|
676 |
+
# self.output_proj_bias = 0
|
677 |
+
self.output_proj_weight = output_proj.expand(opt.num_quantizers-1, _num_tokens, code_dim)
|
678 |
+
self.output_proj_bias = output_bias.expand(opt.num_quantizers-1, _num_tokens)
|
679 |
+
|
680 |
+
else:
|
681 |
+
if share_weight:
|
682 |
+
self.embed_proj_shared_weight = nn.Parameter(torch.normal(mean=0, std=0.02, size=(opt.num_quantizers - 2, _num_tokens, code_dim)))
|
683 |
+
self.token_embed_weight_ = nn.Parameter(torch.normal(mean=0, std=0.02, size=(1, _num_tokens, code_dim)))
|
684 |
+
self.output_proj_weight_ = nn.Parameter(torch.normal(mean=0, std=0.02, size=(1, _num_tokens, code_dim)))
|
685 |
+
self.output_proj_bias = None
|
686 |
+
self.registered = False
|
687 |
+
else:
|
688 |
+
output_proj_weight = torch.normal(mean=0, std=0.02,
|
689 |
+
size=(opt.num_quantizers - 1, _num_tokens, code_dim))
|
690 |
+
|
691 |
+
self.output_proj_weight = nn.Parameter(output_proj_weight)
|
692 |
+
self.output_proj_bias = nn.Parameter(torch.zeros(size=(opt.num_quantizers, _num_tokens)))
|
693 |
+
token_embed_weight = torch.normal(mean=0, std=0.02,
|
694 |
+
size=(opt.num_quantizers - 1, _num_tokens, code_dim))
|
695 |
+
self.token_embed_weight = nn.Parameter(token_embed_weight)
|
696 |
+
|
697 |
+
self.apply(self.__init_weights)
|
698 |
+
self.shared_codebook = shared_codebook
|
699 |
+
self.share_weight = share_weight
|
700 |
+
|
701 |
+
if self.cond_mode == 'text':
|
702 |
+
print('Loading CLIP...')
|
703 |
+
self.clip_version = clip_version
|
704 |
+
self.clip_model = self.load_and_freeze_clip(clip_version)
|
705 |
+
|
706 |
+
# def
|
707 |
+
|
708 |
+
def mask_cond(self, cond, force_mask=False):
|
709 |
+
bs, d = cond.shape
|
710 |
+
if force_mask:
|
711 |
+
return torch.zeros_like(cond)
|
712 |
+
elif self.training and self.cond_drop_prob > 0.:
|
713 |
+
mask = torch.bernoulli(torch.ones(bs, device=cond.device) * self.cond_drop_prob).view(bs, 1)
|
714 |
+
return cond * (1. - mask)
|
715 |
+
else:
|
716 |
+
return cond
|
717 |
+
|
718 |
+
def __init_weights(self, module):
|
719 |
+
if isinstance(module, (nn.Linear, nn.Embedding)):
|
720 |
+
module.weight.data.normal_(mean=0.0, std=0.02)
|
721 |
+
if isinstance(module, nn.Linear) and module.bias is not None:
|
722 |
+
module.bias.data.zero_()
|
723 |
+
elif isinstance(module, nn.LayerNorm):
|
724 |
+
module.bias.data.zero_()
|
725 |
+
module.weight.data.fill_(1.0)
|
726 |
+
|
727 |
+
def parameters_wo_clip(self):
|
728 |
+
return [p for name, p in self.named_parameters() if not name.startswith('clip_model.')]
|
729 |
+
|
730 |
+
def load_and_freeze_clip(self, clip_version):
|
731 |
+
clip_model, clip_preprocess = clip.load(clip_version, device='cpu',
|
732 |
+
jit=False) # Must set jit=False for training
|
733 |
+
# Cannot run on cpu
|
734 |
+
clip.model.convert_weights(
|
735 |
+
clip_model) # Actually this line is unnecessary since clip by default already on float16
|
736 |
+
# Date 0707: It's necessary; it is only unnecessary when loading directly to GPU. Disable if you need to run on CPU.
|
737 |
+
|
738 |
+
# Freeze CLIP weights
|
739 |
+
clip_model.eval()
|
740 |
+
for p in clip_model.parameters():
|
741 |
+
p.requires_grad = False
|
742 |
+
|
743 |
+
return clip_model
|
744 |
+
|
745 |
+
def encode_text(self, raw_text):
|
746 |
+
device = next(self.parameters()).device
|
747 |
+
text = clip.tokenize(raw_text, truncate=True).to(device)
|
748 |
+
feat_clip_text = self.clip_model.encode_text(text).float()
|
749 |
+
return feat_clip_text
|
750 |
+
|
751 |
+
|
752 |
+
def q_schedule(self, bs, low, high):
|
753 |
+
noise = uniform((bs,), device=self.opt.device)
|
754 |
+
schedule = 1 - cosine_schedule(noise)
|
755 |
+
return torch.round(schedule * (high - low)) + low
|
756 |
+
|
757 |
+
def process_embed_proj_weight(self):
|
758 |
+
if self.share_weight and (not self.shared_codebook):
|
759 |
+
# if not self.registered:
|
760 |
+
self.output_proj_weight = torch.cat([self.embed_proj_shared_weight, self.output_proj_weight_], dim=0)
|
761 |
+
self.token_embed_weight = torch.cat([self.token_embed_weight_, self.embed_proj_shared_weight], dim=0)
|
762 |
+
# self.registered = True
|
763 |
+
|
764 |
+
def output_project(self, logits, qids):
|
765 |
+
'''
|
766 |
+
:logits: (bs, code_dim, seqlen)
|
767 |
+
:qids: (bs)
|
768 |
+
|
769 |
+
:return:
|
770 |
+
-logits (bs, ntoken, seqlen)
|
771 |
+
'''
|
772 |
+
# (num_qlayers-1, num_token, code_dim) -> (bs, ntoken, code_dim)
|
773 |
+
output_proj_weight = self.output_proj_weight[qids]
|
774 |
+
# (num_qlayers, ntoken) -> (bs, ntoken)
|
775 |
+
output_proj_bias = None if self.output_proj_bias is None else self.output_proj_bias[qids]
|
776 |
+
|
777 |
+
output = torch.einsum('bnc, bcs->bns', output_proj_weight, logits)
|
778 |
+
if output_proj_bias is not None:
|
779 |
+
output = output + output_proj_bias.unsqueeze(-1)
|
780 |
+
return output
|
781 |
+
|
782 |
+
|
783 |
+
|
784 |
+
def trans_forward(self, motion_codes, qids, cond, padding_mask, force_mask=False):
|
785 |
+
'''
|
786 |
+
:param motion_codes: (b, seqlen, d)
|
787 |
+
:padding_mask: (b, seqlen), all pad positions are TRUE else FALSE
|
788 |
+
:param qids: (b), quantizer layer ids
|
789 |
+
:param cond: (b, embed_dim) for text, (b, num_actions) for action
|
790 |
+
:return:
|
791 |
+
-logits: (b, num_token, seqlen)
|
792 |
+
'''
|
793 |
+
cond = self.mask_cond(cond, force_mask=force_mask)
|
794 |
+
|
795 |
+
# (b, seqlen, d) -> (seqlen, b, latent_dim)
|
796 |
+
x = self.input_process(motion_codes)
|
797 |
+
|
798 |
+
# (b, num_quantizer)
|
799 |
+
q_onehot = self.encode_quant(qids).float().to(x.device)
|
800 |
+
|
801 |
+
q_emb = self.quant_emb(q_onehot).unsqueeze(0) # (1, b, latent_dim)
|
802 |
+
cond = self.cond_emb(cond).unsqueeze(0) # (1, b, latent_dim)
|
803 |
+
|
804 |
+
x = self.position_enc(x)
|
805 |
+
xseq = torch.cat([cond, q_emb, x], dim=0) # (seqlen+2, b, latent_dim)
|
806 |
+
|
807 |
+
padding_mask = torch.cat([torch.zeros_like(padding_mask[:, 0:2]), padding_mask], dim=1) # (b, seqlen+2)
|
808 |
+
output = self.seqTransEncoder(xseq, src_key_padding_mask=padding_mask)[2:] # (seqlen, b, e)
|
809 |
+
logits = self.output_process(output)
|
810 |
+
return logits
|
811 |
+
|
812 |
+
def forward_with_cond_scale(self,
|
813 |
+
motion_codes,
|
814 |
+
q_id,
|
815 |
+
cond_vector,
|
816 |
+
padding_mask,
|
817 |
+
cond_scale=3,
|
818 |
+
force_mask=False):
|
819 |
+
bs = motion_codes.shape[0]
|
820 |
+
# if cond_scale == 1:
|
821 |
+
qids = torch.full((bs,), q_id, dtype=torch.long, device=motion_codes.device)
|
822 |
+
if force_mask:
|
823 |
+
logits = self.trans_forward(motion_codes, qids, cond_vector, padding_mask, force_mask=True)
|
824 |
+
logits = self.output_project(logits, qids-1)
|
825 |
+
return logits
|
826 |
+
|
827 |
+
logits = self.trans_forward(motion_codes, qids, cond_vector, padding_mask)
|
828 |
+
logits = self.output_project(logits, qids-1)
|
829 |
+
if cond_scale == 1:
|
830 |
+
return logits
|
831 |
+
|
832 |
+
aux_logits = self.trans_forward(motion_codes, qids, cond_vector, padding_mask, force_mask=True)
|
833 |
+
aux_logits = self.output_project(aux_logits, qids-1)
|
834 |
+
|
835 |
+
scaled_logits = aux_logits + (logits - aux_logits) * cond_scale
|
836 |
+
return scaled_logits
|
837 |
+
|
838 |
+
def forward(self, all_indices, y, m_lens):
|
839 |
+
'''
|
840 |
+
:param all_indices: (b, n, q)
|
841 |
+
:param y: raw text for cond_mode=text, (b, ) for cond_mode=action
|
842 |
+
:m_lens: (b,)
|
843 |
+
:return:
|
844 |
+
'''
|
845 |
+
|
846 |
+
self.process_embed_proj_weight()
|
847 |
+
|
848 |
+
bs, ntokens, num_quant_layers = all_indices.shape
|
849 |
+
device = all_indices.device
|
850 |
+
|
851 |
+
# Positions that are PADDED are ALL FALSE
|
852 |
+
non_pad_mask = lengths_to_mask(m_lens, ntokens) # (b, n)
|
853 |
+
|
854 |
+
q_non_pad_mask = repeat(non_pad_mask, 'b n -> b n q', q=num_quant_layers)
|
855 |
+
all_indices = torch.where(q_non_pad_mask, all_indices, self.pad_id) #(b, n, q)
|
856 |
+
|
857 |
+
# randomly sample quantization layers to work on, [1, num_q)
|
858 |
+
active_q_layers = q_schedule(bs, low=1, high=num_quant_layers, device=device)
|
859 |
+
|
860 |
+
# print(self.token_embed_weight.shape, all_indices.shape)
|
861 |
+
token_embed = repeat(self.token_embed_weight, 'q c d-> b c d q', b=bs)
|
862 |
+
gather_indices = repeat(all_indices[..., :-1], 'b n q -> b n d q', d=token_embed.shape[2])
|
863 |
+
# print(token_embed.shape, gather_indices.shape)
|
864 |
+
all_codes = token_embed.gather(1, gather_indices) # (b, n, d, q-1)
|
865 |
+
|
866 |
+
cumsum_codes = torch.cumsum(all_codes, dim=-1) #(b, n, d, q-1)
|
867 |
+
|
868 |
+
active_indices = all_indices[torch.arange(bs), :, active_q_layers] # (b, n)
|
869 |
+
history_sum = cumsum_codes[torch.arange(bs), :, :, active_q_layers - 1]
|
870 |
+
|
871 |
+
force_mask = False
|
872 |
+
if self.cond_mode == 'text':
|
873 |
+
with torch.no_grad():
|
874 |
+
cond_vector = self.encode_text(y)
|
875 |
+
elif self.cond_mode == 'action':
|
876 |
+
cond_vector = self.enc_action(y).to(device).float()
|
877 |
+
elif self.cond_mode == 'uncond':
|
878 |
+
cond_vector = torch.zeros(bs, self.latent_dim).float().to(device)
|
879 |
+
force_mask = True
|
880 |
+
else:
|
881 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
882 |
+
|
883 |
+
logits = self.trans_forward(history_sum, active_q_layers, cond_vector, ~non_pad_mask, force_mask)
|
884 |
+
logits = self.output_project(logits, active_q_layers-1)
|
885 |
+
ce_loss, pred_id, acc = cal_performance(logits, active_indices, ignore_index=self.pad_id)
|
886 |
+
|
887 |
+
return ce_loss, pred_id, acc
|
888 |
+
|
889 |
+
@torch.no_grad()
|
890 |
+
@eval_decorator
|
891 |
+
def generate(self,
|
892 |
+
motion_ids,
|
893 |
+
conds,
|
894 |
+
m_lens,
|
895 |
+
temperature=1,
|
896 |
+
topk_filter_thres=0.9,
|
897 |
+
cond_scale=2,
|
898 |
+
num_res_layers=-1, # If it's -1, use all.
|
899 |
+
):
|
900 |
+
|
901 |
+
# print(self.opt.num_quantizers)
|
902 |
+
# assert len(timesteps) >= len(cond_scales) == self.opt.num_quantizers
|
903 |
+
self.process_embed_proj_weight()
|
904 |
+
|
905 |
+
device = next(self.parameters()).device
|
906 |
+
seq_len = motion_ids.shape[1]
|
907 |
+
batch_size = len(conds)
|
908 |
+
|
909 |
+
if self.cond_mode == 'text':
|
910 |
+
with torch.no_grad():
|
911 |
+
cond_vector = self.encode_text(conds)
|
912 |
+
elif self.cond_mode == 'action':
|
913 |
+
cond_vector = self.enc_action(conds).to(device)
|
914 |
+
elif self.cond_mode == 'uncond':
|
915 |
+
cond_vector = torch.zeros(batch_size, self.latent_dim).float().to(device)
|
916 |
+
else:
|
917 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
918 |
+
|
919 |
+
# token_embed = repeat(self.token_embed_weight, 'c d -> b c d', b=batch_size)
|
920 |
+
# gathered_ids = repeat(motion_ids, 'b n -> b n d', d=token_embed.shape[-1])
|
921 |
+
# history_sum = token_embed.gather(1, gathered_ids)
|
922 |
+
|
923 |
+
# print(pa, seq_len)
|
924 |
+
padding_mask = ~lengths_to_mask(m_lens, seq_len)
|
925 |
+
# print(padding_mask.shape, motion_ids.shape)
|
926 |
+
motion_ids = torch.where(padding_mask, self.pad_id, motion_ids)
|
927 |
+
all_indices = [motion_ids]
|
928 |
+
history_sum = 0
|
929 |
+
num_quant_layers = self.opt.num_quantizers if num_res_layers==-1 else num_res_layers+1
|
930 |
+
|
931 |
+
for i in range(1, num_quant_layers):
|
932 |
+
# print(f"--> Working on {i}-th quantizer")
|
933 |
+
# Start from all tokens being masked
|
934 |
+
# qids = torch.full((batch_size,), i, dtype=torch.long, device=motion_ids.device)
|
935 |
+
token_embed = self.token_embed_weight[i-1]
|
936 |
+
token_embed = repeat(token_embed, 'c d -> b c d', b=batch_size)
|
937 |
+
gathered_ids = repeat(motion_ids, 'b n -> b n d', d=token_embed.shape[-1])
|
938 |
+
history_sum += token_embed.gather(1, gathered_ids)
|
939 |
+
|
940 |
+
logits = self.forward_with_cond_scale(history_sum, i, cond_vector, padding_mask, cond_scale=cond_scale)
|
941 |
+
# logits = self.trans_forward(history_sum, qids, cond_vector, padding_mask)
|
942 |
+
|
943 |
+
logits = logits.permute(0, 2, 1) # (b, seqlen, ntoken)
|
944 |
+
# clean low prob token
|
945 |
+
filtered_logits = top_k(logits, topk_filter_thres, dim=-1)
|
946 |
+
|
947 |
+
pred_ids = gumbel_sample(filtered_logits, temperature=temperature, dim=-1) # (b, seqlen)
|
948 |
+
|
949 |
+
# probs = F.softmax(filtered_logits, dim=-1) # (b, seqlen, ntoken)
|
950 |
+
# # print(temperature, starting_temperature, steps_until_x0, timesteps)
|
951 |
+
# # print(probs / temperature)
|
952 |
+
# pred_ids = Categorical(probs / temperature).sample() # (b, seqlen)
|
953 |
+
|
954 |
+
ids = torch.where(padding_mask, self.pad_id, pred_ids)
|
955 |
+
|
956 |
+
motion_ids = ids
|
957 |
+
all_indices.append(ids)
|
958 |
+
|
959 |
+
all_indices = torch.stack(all_indices, dim=-1)
|
960 |
+
# padding_mask = repeat(padding_mask, 'b n -> b n q', q=all_indices.shape[-1])
|
961 |
+
# all_indices = torch.where(padding_mask, -1, all_indices)
|
962 |
+
all_indices = torch.where(all_indices==self.pad_id, -1, all_indices)
|
963 |
+
# all_indices = all_indices.masked_fill()
|
964 |
+
return all_indices
|
965 |
+
|
966 |
+
@torch.no_grad()
|
967 |
+
@eval_decorator
|
968 |
+
def edit(self,
|
969 |
+
motion_ids,
|
970 |
+
conds,
|
971 |
+
m_lens,
|
972 |
+
temperature=1,
|
973 |
+
topk_filter_thres=0.9,
|
974 |
+
cond_scale=2
|
975 |
+
):
|
976 |
+
|
977 |
+
# print(self.opt.num_quantizers)
|
978 |
+
# assert len(timesteps) >= len(cond_scales) == self.opt.num_quantizers
|
979 |
+
self.process_embed_proj_weight()
|
980 |
+
|
981 |
+
device = next(self.parameters()).device
|
982 |
+
seq_len = motion_ids.shape[1]
|
983 |
+
batch_size = len(conds)
|
984 |
+
|
985 |
+
if self.cond_mode == 'text':
|
986 |
+
with torch.no_grad():
|
987 |
+
cond_vector = self.encode_text(conds)
|
988 |
+
elif self.cond_mode == 'action':
|
989 |
+
cond_vector = self.enc_action(conds).to(device)
|
990 |
+
elif self.cond_mode == 'uncond':
|
991 |
+
cond_vector = torch.zeros(batch_size, self.latent_dim).float().to(device)
|
992 |
+
else:
|
993 |
+
raise NotImplementedError("Unsupported condition mode!!!")
|
994 |
+
|
995 |
+
# token_embed = repeat(self.token_embed_weight, 'c d -> b c d', b=batch_size)
|
996 |
+
# gathered_ids = repeat(motion_ids, 'b n -> b n d', d=token_embed.shape[-1])
|
997 |
+
# history_sum = token_embed.gather(1, gathered_ids)
|
998 |
+
|
999 |
+
# print(pa, seq_len)
|
1000 |
+
padding_mask = ~lengths_to_mask(m_lens, seq_len)
|
1001 |
+
# print(padding_mask.shape, motion_ids.shape)
|
1002 |
+
motion_ids = torch.where(padding_mask, self.pad_id, motion_ids)
|
1003 |
+
all_indices = [motion_ids]
|
1004 |
+
history_sum = 0
|
1005 |
+
|
1006 |
+
for i in range(1, self.opt.num_quantizers):
|
1007 |
+
# print(f"--> Working on {i}-th quantizer")
|
1008 |
+
# Start from all tokens being masked
|
1009 |
+
# qids = torch.full((batch_size,), i, dtype=torch.long, device=motion_ids.device)
|
1010 |
+
token_embed = self.token_embed_weight[i-1]
|
1011 |
+
token_embed = repeat(token_embed, 'c d -> b c d', b=batch_size)
|
1012 |
+
gathered_ids = repeat(motion_ids, 'b n -> b n d', d=token_embed.shape[-1])
|
1013 |
+
history_sum += token_embed.gather(1, gathered_ids)
|
1014 |
+
|
1015 |
+
logits = self.forward_with_cond_scale(history_sum, i, cond_vector, padding_mask, cond_scale=cond_scale)
|
1016 |
+
# logits = self.trans_forward(history_sum, qids, cond_vector, padding_mask)
|
1017 |
+
|
1018 |
+
logits = logits.permute(0, 2, 1) # (b, seqlen, ntoken)
|
1019 |
+
# clean low prob token
|
1020 |
+
filtered_logits = top_k(logits, topk_filter_thres, dim=-1)
|
1021 |
+
|
1022 |
+
pred_ids = gumbel_sample(filtered_logits, temperature=temperature, dim=-1) # (b, seqlen)
|
1023 |
+
|
1024 |
+
# probs = F.softmax(filtered_logits, dim=-1) # (b, seqlen, ntoken)
|
1025 |
+
# # print(temperature, starting_temperature, steps_until_x0, timesteps)
|
1026 |
+
# # print(probs / temperature)
|
1027 |
+
# pred_ids = Categorical(probs / temperature).sample() # (b, seqlen)
|
1028 |
+
|
1029 |
+
ids = torch.where(padding_mask, self.pad_id, pred_ids)
|
1030 |
+
|
1031 |
+
motion_ids = ids
|
1032 |
+
all_indices.append(ids)
|
1033 |
+
|
1034 |
+
all_indices = torch.stack(all_indices, dim=-1)
|
1035 |
+
# padding_mask = repeat(padding_mask, 'b n -> b n q', q=all_indices.shape[-1])
|
1036 |
+
# all_indices = torch.where(padding_mask, -1, all_indices)
|
1037 |
+
all_indices = torch.where(all_indices==self.pad_id, -1, all_indices)
|
1038 |
+
# all_indices = all_indices.masked_fill()
|
1039 |
+
return all_indices
|
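Both transformers rely on the same classifier-free guidance trick in `forward_with_cond_scale`: run the model once with the text condition and once with the condition masked out, then extrapolate between the two sets of logits. The sketch below shows only that mixing step on dummy tensors; the shapes and the `cond_scale` default follow the code above, everything else is illustrative.

import torch

def guide_logits(cond_logits, uncond_logits, cond_scale=3.0):
    # cond_scale = 1 reproduces the conditional logits;
    # larger values push predictions further towards the condition.
    return uncond_logits + (cond_logits - uncond_logits) * cond_scale

cond = torch.randn(2, 512, 49)       # (batch, num_tokens, seq_len), as returned by trans_forward
uncond = torch.randn(2, 512, 49)     # same forward pass with the condition masked out
print(guide_logits(cond, uncond).shape)   # torch.Size([2, 512, 49])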
models/mask_transformer/transformer_trainer.py
ADDED
@@ -0,0 +1,359 @@
1 |
+
import torch
|
2 |
+
from collections import defaultdict
|
3 |
+
import torch.optim as optim
|
4 |
+
# import tensorflow as tf
|
5 |
+
from torch.utils.tensorboard import SummaryWriter
|
6 |
+
from collections import OrderedDict
|
7 |
+
from utils.utils import *
|
8 |
+
from os.path import join as pjoin
|
9 |
+
from utils.eval_t2m import evaluation_mask_transformer, evaluation_res_transformer
|
10 |
+
from models.mask_transformer.tools import *
|
11 |
+
|
12 |
+
from einops import rearrange, repeat
|
13 |
+
|
14 |
+
def def_value():
|
15 |
+
return 0.0
|
16 |
+
|
17 |
+
class MaskTransformerTrainer:
|
18 |
+
def __init__(self, args, t2m_transformer, vq_model):
|
19 |
+
self.opt = args
|
20 |
+
self.t2m_transformer = t2m_transformer
|
21 |
+
self.vq_model = vq_model
|
22 |
+
self.device = args.device
|
23 |
+
self.vq_model.eval()
|
24 |
+
|
25 |
+
if args.is_train:
|
26 |
+
self.logger = SummaryWriter(args.log_dir)
|
27 |
+
|
28 |
+
|
29 |
+
def update_lr_warm_up(self, nb_iter, warm_up_iter, lr):
|
30 |
+
|
31 |
+
current_lr = lr * (nb_iter + 1) / (warm_up_iter + 1)
|
32 |
+
for param_group in self.opt_t2m_transformer.param_groups:
|
33 |
+
param_group["lr"] = current_lr
|
34 |
+
|
35 |
+
return current_lr
|
36 |
+
|
37 |
+
|
38 |
+
def forward(self, batch_data):
|
39 |
+
|
40 |
+
conds, motion, m_lens = batch_data
|
41 |
+
motion = motion.detach().float().to(self.device)
|
42 |
+
m_lens = m_lens.detach().long().to(self.device)
|
43 |
+
|
44 |
+
# (b, n, q)
|
45 |
+
code_idx, _ = self.vq_model.encode(motion)
|
46 |
+
m_lens = m_lens // 4
|
47 |
+
|
48 |
+
conds = conds.to(self.device).float() if torch.is_tensor(conds) else conds
|
49 |
+
|
50 |
+
# loss_dict = {}
|
51 |
+
# self.pred_ids = []
|
52 |
+
# self.acc = []
|
53 |
+
|
54 |
+
_loss, _pred_ids, _acc = self.t2m_transformer(code_idx[..., 0], conds, m_lens)
|
55 |
+
|
56 |
+
return _loss, _acc
|
57 |
+
|
58 |
+
def update(self, batch_data):
|
59 |
+
loss, acc = self.forward(batch_data)
|
60 |
+
|
61 |
+
self.opt_t2m_transformer.zero_grad()
|
62 |
+
loss.backward()
|
63 |
+
self.opt_t2m_transformer.step()
|
64 |
+
self.scheduler.step()
|
65 |
+
|
66 |
+
return loss.item(), acc
|
67 |
+
|
68 |
+
def save(self, file_name, ep, total_it):
|
69 |
+
t2m_trans_state_dict = self.t2m_transformer.state_dict()
|
70 |
+
clip_weights = [e for e in t2m_trans_state_dict.keys() if e.startswith('clip_model.')]
|
71 |
+
for e in clip_weights:
|
72 |
+
del t2m_trans_state_dict[e]
|
73 |
+
state = {
|
74 |
+
't2m_transformer': t2m_trans_state_dict,
|
75 |
+
'opt_t2m_transformer': self.opt_t2m_transformer.state_dict(),
|
76 |
+
'scheduler':self.scheduler.state_dict(),
|
77 |
+
'ep': ep,
|
78 |
+
'total_it': total_it,
|
79 |
+
}
|
80 |
+
torch.save(state, file_name)
|
81 |
+
|
82 |
+
def resume(self, model_dir):
|
83 |
+
checkpoint = torch.load(model_dir, map_location=self.device)
|
84 |
+
missing_keys, unexpected_keys = self.t2m_transformer.load_state_dict(checkpoint['t2m_transformer'], strict=False)
|
85 |
+
assert len(unexpected_keys) == 0
|
86 |
+
assert all([k.startswith('clip_model.') for k in missing_keys])
|
87 |
+
|
88 |
+
try:
|
89 |
+
self.opt_t2m_transformer.load_state_dict(checkpoint['opt_t2m_transformer']) # Optimizer
|
90 |
+
|
91 |
+
self.scheduler.load_state_dict(checkpoint['scheduler']) # Scheduler
|
92 |
+
except:
|
93 |
+
print('Resumed without optimizer state')
|
94 |
+
return checkpoint['ep'], checkpoint['total_it']
|
95 |
+
|
96 |
+
def train(self, train_loader, val_loader, eval_val_loader, eval_wrapper, plot_eval):
|
97 |
+
self.t2m_transformer.to(self.device)
|
98 |
+
self.vq_model.to(self.device)
|
99 |
+
|
100 |
+
self.opt_t2m_transformer = optim.AdamW(self.t2m_transformer.parameters(), betas=(0.9, 0.99), lr=self.opt.lr, weight_decay=1e-5)
|
101 |
+
self.scheduler = optim.lr_scheduler.MultiStepLR(self.opt_t2m_transformer,
|
102 |
+
milestones=self.opt.milestones,
|
103 |
+
gamma=self.opt.gamma)
|
104 |
+
|
105 |
+
epoch = 0
|
106 |
+
it = 0
|
107 |
+
|
108 |
+
if self.opt.is_continue:
|
109 |
+
model_dir = pjoin(self.opt.model_dir, 'latest.tar') # TODO
|
110 |
+
epoch, it = self.resume(model_dir)
|
111 |
+
print("Load model epoch:%d iterations:%d"%(epoch, it))
|
112 |
+
|
113 |
+
start_time = time.time()
|
114 |
+
total_iters = self.opt.max_epoch * len(train_loader)
|
115 |
+
print(f'Total Epochs: {self.opt.max_epoch}, Total Iters: {total_iters}')
|
116 |
+
print('Iters Per Epoch, Training: %04d, Validation: %03d' % (len(train_loader), len(val_loader)))
|
117 |
+
logs = defaultdict(def_value, OrderedDict())
|
118 |
+
|
119 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_mask_transformer(
|
120 |
+
self.opt.save_root, eval_val_loader, self.t2m_transformer, self.vq_model, self.logger, epoch,
|
121 |
+
best_fid=100, best_div=100,
|
122 |
+
best_top1=0, best_top2=0, best_top3=0,
|
123 |
+
best_matching=100, eval_wrapper=eval_wrapper,
|
124 |
+
plot_func=plot_eval, save_ckpt=False, save_anim=False
|
125 |
+
)
|
126 |
+
best_acc = 0.
|
127 |
+
|
128 |
+
while epoch < self.opt.max_epoch:
|
129 |
+
self.t2m_transformer.train()
|
130 |
+
self.vq_model.eval()
|
131 |
+
|
132 |
+
for i, batch in enumerate(train_loader):
|
133 |
+
it += 1
|
134 |
+
if it < self.opt.warm_up_iter:
|
135 |
+
self.update_lr_warm_up(it, self.opt.warm_up_iter, self.opt.lr)
|
136 |
+
|
137 |
+
loss, acc = self.update(batch_data=batch)
|
138 |
+
logs['loss'] += loss
|
139 |
+
logs['acc'] += acc
|
140 |
+
logs['lr'] += self.opt_t2m_transformer.param_groups[0]['lr']
|
141 |
+
|
142 |
+
if it % self.opt.log_every == 0:
|
143 |
+
mean_loss = OrderedDict()
|
144 |
+
# self.logger.add_scalar('val_loss', val_loss, it)
|
145 |
+
# self.l
|
146 |
+
for tag, value in logs.items():
|
147 |
+
self.logger.add_scalar('Train/%s'%tag, value / self.opt.log_every, it)
|
148 |
+
mean_loss[tag] = value / self.opt.log_every
|
149 |
+
logs = defaultdict(def_value, OrderedDict())
|
150 |
+
print_current_loss(start_time, it, total_iters, mean_loss, epoch=epoch, inner_iter=i)
|
151 |
+
|
152 |
+
if it % self.opt.save_latest == 0:
|
153 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
154 |
+
|
155 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
156 |
+
epoch += 1
|
157 |
+
|
158 |
+
print('Validation time:')
|
159 |
+
self.vq_model.eval()
|
160 |
+
self.t2m_transformer.eval()
|
161 |
+
|
162 |
+
val_loss = []
|
163 |
+
val_acc = []
|
164 |
+
with torch.no_grad():
|
165 |
+
for i, batch_data in enumerate(val_loader):
|
166 |
+
loss, acc = self.forward(batch_data)
|
167 |
+
val_loss.append(loss.item())
|
168 |
+
val_acc.append(acc)
|
169 |
+
|
170 |
+
print(f"Validation loss:{np.mean(val_loss):.3f}, accuracy:{np.mean(val_acc):.3f}")
|
171 |
+
|
172 |
+
self.logger.add_scalar('Val/loss', np.mean(val_loss), epoch)
|
173 |
+
self.logger.add_scalar('Val/acc', np.mean(val_acc), epoch)
|
174 |
+
|
175 |
+
if np.mean(val_acc) > best_acc:
|
176 |
+
print(f"Improved accuracy from {best_acc:.02f} to {np.mean(val_acc)}!!!")
|
177 |
+
self.save(pjoin(self.opt.model_dir, 'net_best_acc.tar'), epoch, it)
|
178 |
+
best_acc = np.mean(val_acc)
|
179 |
+
|
180 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_mask_transformer(
|
181 |
+
self.opt.save_root, eval_val_loader, self.t2m_transformer, self.vq_model, self.logger, epoch, best_fid=best_fid,
|
182 |
+
best_div=best_div, best_top1=best_top1, best_top2=best_top2, best_top3=best_top3,
|
183 |
+
best_matching=best_matching, eval_wrapper=eval_wrapper,
|
184 |
+
plot_func=plot_eval, save_ckpt=True, save_anim=(epoch%self.opt.eval_every_e==0)
|
185 |
+
)
|
186 |
+
|
187 |
+
|
188 |
+
class ResidualTransformerTrainer:
|
189 |
+
def __init__(self, args, res_transformer, vq_model):
|
190 |
+
self.opt = args
|
191 |
+
self.res_transformer = res_transformer
|
192 |
+
self.vq_model = vq_model
|
193 |
+
self.device = args.device
|
194 |
+
self.vq_model.eval()
|
195 |
+
|
196 |
+
if args.is_train:
|
197 |
+
self.logger = SummaryWriter(args.log_dir)
|
198 |
+
# self.l1_criterion = torch.nn.SmoothL1Loss()
|
199 |
+
|
200 |
+
|
201 |
+
def update_lr_warm_up(self, nb_iter, warm_up_iter, lr):
|
202 |
+
|
203 |
+
current_lr = lr * (nb_iter + 1) / (warm_up_iter + 1)
|
204 |
+
for param_group in self.opt_res_transformer.param_groups:
|
205 |
+
param_group["lr"] = current_lr
|
206 |
+
|
207 |
+
return current_lr
|
208 |
+
|
209 |
+
|
210 |
+
def forward(self, batch_data):
|
211 |
+
|
212 |
+
conds, motion, m_lens = batch_data
|
213 |
+
motion = motion.detach().float().to(self.device)
|
214 |
+
m_lens = m_lens.detach().long().to(self.device)
|
215 |
+
|
216 |
+
# (b, n, q), (q, b, n ,d)
|
217 |
+
code_idx, all_codes = self.vq_model.encode(motion)
|
218 |
+
m_lens = m_lens // 4
|
219 |
+
|
220 |
+
conds = conds.to(self.device).float() if torch.is_tensor(conds) else conds
|
221 |
+
|
222 |
+
ce_loss, pred_ids, acc = self.res_transformer(code_idx, conds, m_lens)
|
223 |
+
|
224 |
+
return ce_loss, acc
|
225 |
+
|
226 |
+
def update(self, batch_data):
|
227 |
+
loss, acc = self.forward(batch_data)
|
228 |
+
|
229 |
+
self.opt_res_transformer.zero_grad()
|
230 |
+
loss.backward()
|
231 |
+
self.opt_res_transformer.step()
|
232 |
+
self.scheduler.step()
|
233 |
+
|
234 |
+
return loss.item(), acc
|
235 |
+
|
236 |
+
def save(self, file_name, ep, total_it):
|
237 |
+
res_trans_state_dict = self.res_transformer.state_dict()
|
238 |
+
clip_weights = [e for e in res_trans_state_dict.keys() if e.startswith('clip_model.')]
|
239 |
+
for e in clip_weights:
|
240 |
+
del res_trans_state_dict[e]
|
241 |
+
state = {
|
242 |
+
'res_transformer': res_trans_state_dict,
|
243 |
+
'opt_res_transformer': self.opt_res_transformer.state_dict(),
|
244 |
+
'scheduler':self.scheduler.state_dict(),
|
245 |
+
'ep': ep,
|
246 |
+
'total_it': total_it,
|
247 |
+
}
|
248 |
+
torch.save(state, file_name)
|
249 |
+
|
250 |
+
def resume(self, model_dir):
|
251 |
+
checkpoint = torch.load(model_dir, map_location=self.device)
|
252 |
+
missing_keys, unexpected_keys = self.res_transformer.load_state_dict(checkpoint['res_transformer'], strict=False)
|
253 |
+
assert len(unexpected_keys) == 0
|
254 |
+
assert all([k.startswith('clip_model.') for k in missing_keys])
|
255 |
+
|
256 |
+
try:
|
257 |
+
self.opt_res_transformer.load_state_dict(checkpoint['opt_res_transformer']) # Optimizer
|
258 |
+
|
259 |
+
self.scheduler.load_state_dict(checkpoint['scheduler']) # Scheduler
|
260 |
+
except:
|
261 |
+
print('Resumed without optimizer state')
|
262 |
+
return checkpoint['ep'], checkpoint['total_it']
|
263 |
+
|
264 |
+
def train(self, train_loader, val_loader, eval_val_loader, eval_wrapper, plot_eval):
|
265 |
+
self.res_transformer.to(self.device)
|
266 |
+
self.vq_model.to(self.device)
|
267 |
+
|
268 |
+
self.opt_res_transformer = optim.AdamW(self.res_transformer.parameters(), betas=(0.9, 0.99), lr=self.opt.lr, weight_decay=1e-5)
|
269 |
+
self.scheduler = optim.lr_scheduler.MultiStepLR(self.opt_res_transformer,
|
270 |
+
milestones=self.opt.milestones,
|
271 |
+
gamma=self.opt.gamma)
|
272 |
+
|
273 |
+
epoch = 0
|
274 |
+
it = 0
|
275 |
+
|
276 |
+
if self.opt.is_continue:
|
277 |
+
model_dir = pjoin(self.opt.model_dir, 'latest.tar') # TODO
|
278 |
+
epoch, it = self.resume(model_dir)
|
279 |
+
print("Load model epoch:%d iterations:%d"%(epoch, it))
|
280 |
+
|
281 |
+
start_time = time.time()
|
282 |
+
total_iters = self.opt.max_epoch * len(train_loader)
|
283 |
+
print(f'Total Epochs: {self.opt.max_epoch}, Total Iters: {total_iters}')
|
284 |
+
print('Iters Per Epoch, Training: %04d, Validation: %03d' % (len(train_loader), len(val_loader)))
|
285 |
+
logs = defaultdict(def_value, OrderedDict())
|
286 |
+
|
287 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_res_transformer(
|
288 |
+
self.opt.save_root, eval_val_loader, self.res_transformer, self.vq_model, self.logger, epoch,
|
289 |
+
best_fid=100, best_div=100,
|
290 |
+
best_top1=0, best_top2=0, best_top3=0,
|
291 |
+
best_matching=100, eval_wrapper=eval_wrapper,
|
292 |
+
plot_func=plot_eval, save_ckpt=False, save_anim=False
|
293 |
+
)
|
294 |
+
best_loss = 100
|
295 |
+
best_acc = 0
|
296 |
+
|
297 |
+
while epoch < self.opt.max_epoch:
|
298 |
+
self.res_transformer.train()
|
299 |
+
self.vq_model.eval()
|
300 |
+
|
301 |
+
for i, batch in enumerate(train_loader):
|
302 |
+
it += 1
|
303 |
+
if it < self.opt.warm_up_iter:
|
304 |
+
self.update_lr_warm_up(it, self.opt.warm_up_iter, self.opt.lr)
|
305 |
+
|
306 |
+
loss, acc = self.update(batch_data=batch)
|
307 |
+
logs['loss'] += loss
|
308 |
+
logs["acc"] += acc
|
309 |
+
logs['lr'] += self.opt_res_transformer.param_groups[0]['lr']
|
310 |
+
|
311 |
+
if it % self.opt.log_every == 0:
|
312 |
+
mean_loss = OrderedDict()
|
313 |
+
# self.logger.add_scalar('val_loss', val_loss, it)
|
314 |
+
# self.l
|
315 |
+
for tag, value in logs.items():
|
316 |
+
self.logger.add_scalar('Train/%s'%tag, value / self.opt.log_every, it)
|
317 |
+
mean_loss[tag] = value / self.opt.log_every
|
318 |
+
logs = defaultdict(def_value, OrderedDict())
|
319 |
+
print_current_loss(start_time, it, total_iters, mean_loss, epoch=epoch, inner_iter=i)
|
320 |
+
|
321 |
+
if it % self.opt.save_latest == 0:
|
322 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
323 |
+
|
324 |
+
epoch += 1
|
325 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
326 |
+
|
327 |
+
print('Validation time:')
|
328 |
+
self.vq_model.eval()
|
329 |
+
self.res_transformer.eval()
|
330 |
+
|
331 |
+
val_loss = []
|
332 |
+
val_acc = []
|
333 |
+
with torch.no_grad():
|
334 |
+
for i, batch_data in enumerate(val_loader):
|
335 |
+
loss, acc = self.forward(batch_data)
|
336 |
+
val_loss.append(loss.item())
|
337 |
+
val_acc.append(acc)
|
338 |
+
|
339 |
+
print(f"Validation loss:{np.mean(val_loss):.3f}, Accuracy:{np.mean(val_acc):.3f}")
|
340 |
+
|
341 |
+
self.logger.add_scalar('Val/loss', np.mean(val_loss), epoch)
|
342 |
+
self.logger.add_scalar('Val/acc', np.mean(val_acc), epoch)
|
343 |
+
|
344 |
+
if np.mean(val_loss) < best_loss:
|
345 |
+
print(f"Improved loss from {best_loss:.02f} to {np.mean(val_loss)}!!!")
|
346 |
+
self.save(pjoin(self.opt.model_dir, 'net_best_loss.tar'), epoch, it)
|
347 |
+
best_loss = np.mean(val_loss)
|
348 |
+
|
349 |
+
if np.mean(val_acc) > best_acc:
|
350 |
+
print(f"Improved acc from {best_acc:.02f} to {np.mean(val_acc)}!!!")
|
351 |
+
# self.save(pjoin(self.opt.model_dir, 'net_best_loss.tar'), epoch, it)
|
352 |
+
best_acc = np.mean(val_acc)
|
353 |
+
|
354 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_res_transformer(
|
355 |
+
self.opt.save_root, eval_val_loader, self.res_transformer, self.vq_model, self.logger, epoch, best_fid=best_fid,
|
356 |
+
best_div=best_div, best_top1=best_top1, best_top2=best_top2, best_top3=best_top3,
|
357 |
+
best_matching=best_matching, eval_wrapper=eval_wrapper,
|
358 |
+
plot_func=plot_eval, save_ckpt=True, save_anim=(epoch%self.opt.eval_every_e==0)
|
359 |
+
)
|
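Both trainers ramp the learning rate linearly over the first `warm_up_iter` updates before handing control to the `MultiStepLR` scheduler. Below is a compact sketch of that schedule with a throwaway model and the same AdamW settings as above; the milestone values and warm-up length are placeholders, not the project's configuration.

import torch
from torch import nn, optim

model = nn.Linear(8, 8)
base_lr = 2e-4
opt = optim.AdamW(model.parameters(), betas=(0.9, 0.99), lr=base_lr, weight_decay=1e-5)
scheduler = optim.lr_scheduler.MultiStepLR(opt, milestones=[50_000, 100_000], gamma=0.1)

warm_up_iter = 2000
for it in range(1, 10):
    if it < warm_up_iter:
        # linear warm-up, mirroring update_lr_warm_up above
        for group in opt.param_groups:
            group['lr'] = base_lr * (it + 1) / (warm_up_iter + 1)
    # ... forward / backward / opt.step() / scheduler.step() would go here ...
    print(it, opt.param_groups[0]['lr'])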
models/t2m_eval_modules.py
ADDED
@@ -0,0 +1,182 @@
1 |
+
import torch
|
2 |
+
import torch.nn as nn
|
3 |
+
import numpy as np
|
4 |
+
import time
|
5 |
+
import math
|
6 |
+
import random
|
7 |
+
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
|
8 |
+
# from networks.layers import *
|
9 |
+
|
10 |
+
|
11 |
+
def init_weight(m):
|
12 |
+
if isinstance(m, nn.Conv1d) or isinstance(m, nn.Linear) or isinstance(m, nn.ConvTranspose1d):
|
13 |
+
nn.init.xavier_normal_(m.weight)
|
14 |
+
# m.bias.data.fill_(0.01)
|
15 |
+
if m.bias is not None:
|
16 |
+
nn.init.constant_(m.bias, 0)
|
17 |
+
|
18 |
+
|
19 |
+
# batch_size, dimension and position
|
20 |
+
# output: (batch_size, dim)
|
21 |
+
def positional_encoding(batch_size, dim, pos):
|
22 |
+
assert batch_size == pos.shape[0]
|
23 |
+
positions_enc = np.array([
|
24 |
+
[pos[j] / np.power(10000, (i-i%2)/dim) for i in range(dim)]
|
25 |
+
for j in range(batch_size)
|
26 |
+
], dtype=np.float32)
|
27 |
+
positions_enc[:, 0::2] = np.sin(positions_enc[:, 0::2])
|
28 |
+
positions_enc[:, 1::2] = np.cos(positions_enc[:, 1::2])
|
29 |
+
return torch.from_numpy(positions_enc).float()
|
30 |
+
|
31 |
+
|
32 |
+
def get_padding_mask(batch_size, seq_len, cap_lens):
|
33 |
+
cap_lens = cap_lens.data.tolist()
|
34 |
+
mask_2d = torch.ones((batch_size, seq_len, seq_len), dtype=torch.float32)
|
35 |
+
for i, cap_len in enumerate(cap_lens):
|
36 |
+
mask_2d[i, :, :cap_len] = 0
|
37 |
+
return mask_2d.bool(), 1 - mask_2d[:, :, 0].clone()
|
38 |
+
|
39 |
+
|
40 |
+
def top_k_logits(logits, k):
|
41 |
+
v, ix = torch.topk(logits, k)
|
42 |
+
out = logits.clone()
|
43 |
+
out[out < v[:, [-1]]] = -float('Inf')
|
44 |
+
return out
|
45 |
+
|
46 |
+
|
47 |
+
class PositionalEncoding(nn.Module):
|
48 |
+
|
49 |
+
def __init__(self, d_model, max_len=300):
|
50 |
+
super(PositionalEncoding, self).__init__()
|
51 |
+
|
52 |
+
pe = torch.zeros(max_len, d_model)
|
53 |
+
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
|
54 |
+
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
|
55 |
+
pe[:, 0::2] = torch.sin(position * div_term)
|
56 |
+
pe[:, 1::2] = torch.cos(position * div_term)
|
57 |
+
# pe = pe.unsqueeze(0).transpose(0, 1)
|
58 |
+
self.register_buffer('pe', pe)
|
59 |
+
|
60 |
+
def forward(self, pos):
|
61 |
+
return self.pe[pos]
|
62 |
+
|
63 |
+
|
64 |
+
class MovementConvEncoder(nn.Module):
|
65 |
+
def __init__(self, input_size, hidden_size, output_size):
|
66 |
+
super(MovementConvEncoder, self).__init__()
|
67 |
+
self.main = nn.Sequential(
|
68 |
+
nn.Conv1d(input_size, hidden_size, 4, 2, 1),
|
69 |
+
nn.Dropout(0.2, inplace=True),
|
70 |
+
nn.LeakyReLU(0.2, inplace=True),
|
71 |
+
nn.Conv1d(hidden_size, output_size, 4, 2, 1),
|
72 |
+
nn.Dropout(0.2, inplace=True),
|
73 |
+
nn.LeakyReLU(0.2, inplace=True),
|
74 |
+
)
|
75 |
+
self.out_net = nn.Linear(output_size, output_size)
|
76 |
+
self.main.apply(init_weight)
|
77 |
+
self.out_net.apply(init_weight)
|
78 |
+
|
79 |
+
def forward(self, inputs):
|
80 |
+
inputs = inputs.permute(0, 2, 1)
|
81 |
+
outputs = self.main(inputs).permute(0, 2, 1)
|
82 |
+
# print(outputs.shape)
|
83 |
+
return self.out_net(outputs)
|
84 |
+
|
85 |
+
|
86 |
+
class MovementConvDecoder(nn.Module):
|
87 |
+
def __init__(self, input_size, hidden_size, output_size):
|
88 |
+
super(MovementConvDecoder, self).__init__()
|
89 |
+
self.main = nn.Sequential(
|
90 |
+
nn.ConvTranspose1d(input_size, hidden_size, 4, 2, 1),
|
91 |
+
# nn.Dropout(0.2, inplace=True),
|
92 |
+
nn.LeakyReLU(0.2, inplace=True),
|
93 |
+
nn.ConvTranspose1d(hidden_size, output_size, 4, 2, 1),
|
94 |
+
# nn.Dropout(0.2, inplace=True),
|
95 |
+
nn.LeakyReLU(0.2, inplace=True),
|
96 |
+
)
|
97 |
+
self.out_net = nn.Linear(output_size, output_size)
|
98 |
+
|
99 |
+
self.main.apply(init_weight)
|
100 |
+
self.out_net.apply(init_weight)
|
101 |
+
|
102 |
+
def forward(self, inputs):
|
103 |
+
inputs = inputs.permute(0, 2, 1)
|
104 |
+
outputs = self.main(inputs).permute(0, 2, 1)
|
105 |
+
return self.out_net(outputs)
|
106 |
+
|
107 |
+
class TextEncoderBiGRUCo(nn.Module):
|
108 |
+
def __init__(self, word_size, pos_size, hidden_size, output_size, device):
|
109 |
+
super(TextEncoderBiGRUCo, self).__init__()
|
110 |
+
self.device = device
|
111 |
+
|
112 |
+
self.pos_emb = nn.Linear(pos_size, word_size)
|
113 |
+
self.input_emb = nn.Linear(word_size, hidden_size)
|
114 |
+
self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True, bidirectional=True)
|
115 |
+
self.output_net = nn.Sequential(
|
116 |
+
nn.Linear(hidden_size * 2, hidden_size),
|
117 |
+
nn.LayerNorm(hidden_size),
|
118 |
+
nn.LeakyReLU(0.2, inplace=True),
|
119 |
+
nn.Linear(hidden_size, output_size)
|
120 |
+
)
|
121 |
+
|
122 |
+
self.input_emb.apply(init_weight)
|
123 |
+
self.pos_emb.apply(init_weight)
|
124 |
+
self.output_net.apply(init_weight)
|
125 |
+
# self.linear2.apply(init_weight)
|
126 |
+
# self.batch_size = batch_size
|
127 |
+
self.hidden_size = hidden_size
|
128 |
+
self.hidden = nn.Parameter(torch.randn((2, 1, self.hidden_size), requires_grad=True))
|
129 |
+
|
130 |
+
# input(batch_size, seq_len, dim)
|
131 |
+
def forward(self, word_embs, pos_onehot, cap_lens):
|
132 |
+
num_samples = word_embs.shape[0]
|
133 |
+
|
134 |
+
pos_embs = self.pos_emb(pos_onehot)
|
135 |
+
inputs = word_embs + pos_embs
|
136 |
+
input_embs = self.input_emb(inputs)
|
137 |
+
hidden = self.hidden.repeat(1, num_samples, 1)
|
138 |
+
|
139 |
+
cap_lens = cap_lens.data.tolist()
|
140 |
+
emb = pack_padded_sequence(input_embs, cap_lens, batch_first=True)
|
141 |
+
|
142 |
+
gru_seq, gru_last = self.gru(emb, hidden)
|
143 |
+
|
144 |
+
gru_last = torch.cat([gru_last[0], gru_last[1]], dim=-1)
|
145 |
+
|
146 |
+
return self.output_net(gru_last)
|
147 |
+
|
148 |
+
|
149 |
+
class MotionEncoderBiGRUCo(nn.Module):
|
150 |
+
def __init__(self, input_size, hidden_size, output_size, device):
|
151 |
+
super(MotionEncoderBiGRUCo, self).__init__()
|
152 |
+
self.device = device
|
153 |
+
|
154 |
+
self.input_emb = nn.Linear(input_size, hidden_size)
|
155 |
+
self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True, bidirectional=True)
|
156 |
+
self.output_net = nn.Sequential(
|
157 |
+
nn.Linear(hidden_size*2, hidden_size),
|
158 |
+
nn.LayerNorm(hidden_size),
|
159 |
+
nn.LeakyReLU(0.2, inplace=True),
|
160 |
+
nn.Linear(hidden_size, output_size)
|
161 |
+
)
|
162 |
+
|
163 |
+
self.input_emb.apply(init_weight)
|
164 |
+
self.output_net.apply(init_weight)
|
165 |
+
self.hidden_size = hidden_size
|
166 |
+
self.hidden = nn.Parameter(torch.randn((2, 1, self.hidden_size), requires_grad=True))
|
167 |
+
|
168 |
+
# input(batch_size, seq_len, dim)
|
169 |
+
def forward(self, inputs, m_lens):
|
170 |
+
num_samples = inputs.shape[0]
|
171 |
+
|
172 |
+
input_embs = self.input_emb(inputs)
|
173 |
+
hidden = self.hidden.repeat(1, num_samples, 1)
|
174 |
+
|
175 |
+
cap_lens = m_lens.data.tolist()
|
176 |
+
emb = pack_padded_sequence(input_embs, cap_lens, batch_first=True)
|
177 |
+
|
178 |
+
gru_seq, gru_last = self.gru(emb, hidden)
|
179 |
+
|
180 |
+
gru_last = torch.cat([gru_last[0], gru_last[1]], dim=-1)
|
181 |
+
|
182 |
+
return self.output_net(gru_last)
|
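Note that `pack_padded_sequence` is called in the two BiGRU encoders above with its default `enforce_sorted=True`, so the caller must hand in sequences ordered by decreasing length; that is why the evaluator wrapper in the next file sorts each batch with `np.argsort(m_lens)[::-1]` before encoding. A small sketch of that requirement, with made-up sizes:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

batch = torch.randn(3, 6, 4)               # (batch, max_len, feat)
lengths = torch.tensor([4, 6, 3])

order = lengths.argsort(descending=True)   # sort by length, longest first
packed = pack_padded_sequence(batch[order], lengths[order].tolist(), batch_first=True)
print(packed.batch_sizes)                  # tensor([3, 3, 3, 2, 1, 1])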
models/t2m_eval_wrapper.py
ADDED
@@ -0,0 +1,191 @@
from models.t2m_eval_modules import *
from utils.word_vectorizer import POS_enumerator
from os.path import join as pjoin

def build_models(opt):
    movement_enc = MovementConvEncoder(opt.dim_pose-4, opt.dim_movement_enc_hidden, opt.dim_movement_latent)
    text_enc = TextEncoderBiGRUCo(word_size=opt.dim_word,
                                  pos_size=opt.dim_pos_ohot,
                                  hidden_size=opt.dim_text_hidden,
                                  output_size=opt.dim_coemb_hidden,
                                  device=opt.device)

    motion_enc = MotionEncoderBiGRUCo(input_size=opt.dim_movement_latent,
                                      hidden_size=opt.dim_motion_hidden,
                                      output_size=opt.dim_coemb_hidden,
                                      device=opt.device)

    checkpoint = torch.load(pjoin(opt.checkpoints_dir, opt.dataset_name, 'text_mot_match', 'model', 'finest.tar'),
                            map_location=opt.device)
    movement_enc.load_state_dict(checkpoint['movement_encoder'])
    text_enc.load_state_dict(checkpoint['text_encoder'])
    motion_enc.load_state_dict(checkpoint['motion_encoder'])
    print('Loading Evaluation Model Wrapper (Epoch %d) Completed!!' % (checkpoint['epoch']))
    return text_enc, motion_enc, movement_enc


class EvaluatorModelWrapper(object):

    def __init__(self, opt):

        if opt.dataset_name == 't2m':
            opt.dim_pose = 263
        elif opt.dataset_name == 'kit':
            opt.dim_pose = 251
        else:
            raise KeyError('Dataset not Recognized!!!')

        opt.dim_word = 300
        opt.max_motion_length = 196
        opt.dim_pos_ohot = len(POS_enumerator)
        opt.dim_motion_hidden = 1024
        opt.max_text_len = 20
        opt.dim_text_hidden = 512
        opt.dim_coemb_hidden = 512

        # print(opt)

        self.text_encoder, self.motion_encoder, self.movement_encoder = build_models(opt)
        self.opt = opt
        self.device = opt.device

        self.text_encoder.to(opt.device)
        self.motion_encoder.to(opt.device)
        self.movement_encoder.to(opt.device)

        self.text_encoder.eval()
        self.motion_encoder.eval()
        self.movement_encoder.eval()

    # Please note that the results do not follow the order of inputs
    def get_co_embeddings(self, word_embs, pos_ohot, cap_lens, motions, m_lens):
        with torch.no_grad():
            word_embs = word_embs.detach().to(self.device).float()
            pos_ohot = pos_ohot.detach().to(self.device).float()
            motions = motions.detach().to(self.device).float()

            align_idx = np.argsort(m_lens.data.tolist())[::-1].copy()
            motions = motions[align_idx]
            m_lens = m_lens[align_idx]

            '''Movement Encoding'''
            movements = self.movement_encoder(motions[..., :-4]).detach()
            m_lens = m_lens // self.opt.unit_length
            motion_embedding = self.motion_encoder(movements, m_lens)

            '''Text Encoding'''
            text_embedding = self.text_encoder(word_embs, pos_ohot, cap_lens)
            text_embedding = text_embedding[align_idx]
        return text_embedding, motion_embedding

    # Please note that the results do not follow the order of inputs
    def get_motion_embeddings(self, motions, m_lens):
        with torch.no_grad():
            motions = motions.detach().to(self.device).float()

            align_idx = np.argsort(m_lens.data.tolist())[::-1].copy()
            motions = motions[align_idx]
            m_lens = m_lens[align_idx]

            '''Movement Encoding'''
            movements = self.movement_encoder(motions[..., :-4]).detach()
            m_lens = m_lens // self.opt.unit_length
            motion_embedding = self.motion_encoder(movements, m_lens)
        return motion_embedding

## Borrowed from MDM
# our version
def build_evaluators(opt):
    movement_enc = MovementConvEncoder(opt['dim_pose']-4, opt['dim_movement_enc_hidden'], opt['dim_movement_latent'])
    text_enc = TextEncoderBiGRUCo(word_size=opt['dim_word'],
                                  pos_size=opt['dim_pos_ohot'],
                                  hidden_size=opt['dim_text_hidden'],
                                  output_size=opt['dim_coemb_hidden'],
                                  device=opt['device'])

    motion_enc = MotionEncoderBiGRUCo(input_size=opt['dim_movement_latent'],
                                      hidden_size=opt['dim_motion_hidden'],
                                      output_size=opt['dim_coemb_hidden'],
                                      device=opt['device'])

    ckpt_dir = opt['dataset_name']
    if opt['dataset_name'] == 'humanml':
        ckpt_dir = 't2m'

    checkpoint = torch.load(pjoin(opt['checkpoints_dir'], ckpt_dir, 'text_mot_match', 'model', 'finest.tar'),
                            map_location=opt['device'])
    movement_enc.load_state_dict(checkpoint['movement_encoder'])
    text_enc.load_state_dict(checkpoint['text_encoder'])
    motion_enc.load_state_dict(checkpoint['motion_encoder'])
    print('Loading Evaluation Model Wrapper (Epoch %d) Completed!!' % (checkpoint['epoch']))
    return text_enc, motion_enc, movement_enc

# our wrapper
class EvaluatorWrapper(object):

    def __init__(self, dataset_name, device):
        opt = {
            'dataset_name': dataset_name,
            'device': device,
            'dim_word': 300,
            'max_motion_length': 196,
            'dim_pos_ohot': len(POS_enumerator),
            'dim_motion_hidden': 1024,
            'max_text_len': 20,
            'dim_text_hidden': 512,
            'dim_coemb_hidden': 512,
            'dim_pose': 263 if dataset_name == 'humanml' else 251,
            'dim_movement_enc_hidden': 512,
            'dim_movement_latent': 512,
            'checkpoints_dir': './checkpoints',
            'unit_length': 4,
        }

        self.text_encoder, self.motion_encoder, self.movement_encoder = build_evaluators(opt)
        self.opt = opt
        self.device = opt['device']

        self.text_encoder.to(opt['device'])
        self.motion_encoder.to(opt['device'])
        self.movement_encoder.to(opt['device'])

        self.text_encoder.eval()
        self.motion_encoder.eval()
        self.movement_encoder.eval()

    # Please note that the results do not follow the order of inputs
    def get_co_embeddings(self, word_embs, pos_ohot, cap_lens, motions, m_lens):
        with torch.no_grad():
            word_embs = word_embs.detach().to(self.device).float()
            pos_ohot = pos_ohot.detach().to(self.device).float()
            motions = motions.detach().to(self.device).float()

            align_idx = np.argsort(m_lens.data.tolist())[::-1].copy()
            motions = motions[align_idx]
            m_lens = m_lens[align_idx]

            '''Movement Encoding'''
            movements = self.movement_encoder(motions[..., :-4]).detach()
            m_lens = m_lens // self.opt['unit_length']
            motion_embedding = self.motion_encoder(movements, m_lens)
            # print(motions.shape, movements.shape, motion_embedding.shape, m_lens)

            '''Text Encoding'''
            text_embedding = self.text_encoder(word_embs, pos_ohot, cap_lens)
            text_embedding = text_embedding[align_idx]
        return text_embedding, motion_embedding

    # Please note that the results do not follow the order of inputs
    def get_motion_embeddings(self, motions, m_lens):
        with torch.no_grad():
            motions = motions.detach().to(self.device).float()

            align_idx = np.argsort(m_lens.data.tolist())[::-1].copy()
            motions = motions[align_idx]
            m_lens = m_lens[align_idx]

            '''Movement Encoding'''
            movements = self.movement_encoder(motions[..., :-4]).detach()
            m_lens = m_lens // self.opt['unit_length']
            motion_embedding = self.motion_encoder(movements, m_lens)
        return motion_embedding
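A minimal usage sketch for the EvaluatorWrapper defined above (not part of the repository; the batch shapes, the CUDA device, and the presence of the downloaded evaluator checkpoint under ./checkpoints/t2m/text_mot_match/ are assumptions):

import torch
from models.t2m_eval_wrapper import EvaluatorWrapper

# Hypothetical example: motion embeddings for FID/diversity-style metrics.
# Motions are assumed to be in the 263-dim HumanML3D feature format, already normalized.
eval_wrapper = EvaluatorWrapper('humanml', torch.device('cuda'))
motions = torch.randn(32, 196, 263)        # (batch, frames, pose features)
m_lens = torch.randint(40, 196, (32,))     # valid length of each sample
motion_emb = eval_wrapper.get_motion_embeddings(motions, m_lens)
# As the comment in the class warns, rows come back sorted by length, not in input order.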
models/vq/__init__.py
ADDED
File without changes
|
models/vq/encdec.py
ADDED
@@ -0,0 +1,68 @@
import torch.nn as nn
from models.vq.resnet import Resnet1D


class Encoder(nn.Module):
    def __init__(self,
                 input_emb_width=3,
                 output_emb_width=512,
                 down_t=2,
                 stride_t=2,
                 width=512,
                 depth=3,
                 dilation_growth_rate=3,
                 activation='relu',
                 norm=None):
        super().__init__()

        blocks = []
        filter_t, pad_t = stride_t * 2, stride_t // 2
        blocks.append(nn.Conv1d(input_emb_width, width, 3, 1, 1))
        blocks.append(nn.ReLU())

        for i in range(down_t):
            input_dim = width
            block = nn.Sequential(
                nn.Conv1d(input_dim, width, filter_t, stride_t, pad_t),
                Resnet1D(width, depth, dilation_growth_rate, activation=activation, norm=norm),
            )
            blocks.append(block)
        blocks.append(nn.Conv1d(width, output_emb_width, 3, 1, 1))
        self.model = nn.Sequential(*blocks)

    def forward(self, x):
        return self.model(x)


class Decoder(nn.Module):
    def __init__(self,
                 input_emb_width=3,
                 output_emb_width=512,
                 down_t=2,
                 stride_t=2,
                 width=512,
                 depth=3,
                 dilation_growth_rate=3,
                 activation='relu',
                 norm=None):
        super().__init__()
        blocks = []

        blocks.append(nn.Conv1d(output_emb_width, width, 3, 1, 1))
        blocks.append(nn.ReLU())
        for i in range(down_t):
            out_dim = width
            block = nn.Sequential(
                Resnet1D(width, depth, dilation_growth_rate, reverse_dilation=True, activation=activation, norm=norm),
                nn.Upsample(scale_factor=2, mode='nearest'),
                nn.Conv1d(width, out_dim, 3, 1, 1)
            )
            blocks.append(block)
        blocks.append(nn.Conv1d(width, width, 3, 1, 1))
        blocks.append(nn.ReLU())
        blocks.append(nn.Conv1d(width, input_emb_width, 3, 1, 1))
        self.model = nn.Sequential(*blocks)

    def forward(self, x):
        x = self.model(x)
        return x.permute(0, 2, 1)
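A small shape-check sketch (an assumption, not repository code) for the Encoder/Decoder pair above: each of the down_t stride-2 stages halves the number of frames, and the Decoder undoes this with nearest-neighbour upsampling before permuting back to (batch, frames, channels):

import torch
from models.vq.encdec import Encoder, Decoder

enc = Encoder(input_emb_width=263, output_emb_width=512, down_t=2, stride_t=2)
dec = Decoder(input_emb_width=263, output_emb_width=512, down_t=2, stride_t=2)

motion = torch.randn(2, 263, 64)   # channels-first: (batch, pose features, frames)
latent = enc(motion)               # (2, 512, 16): 64 frames -> 16 latent steps
recon = dec(latent)                # (2, 64, 263): Decoder permutes back to frames-first
print(latent.shape, recon.shape)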
models/vq/model.py
ADDED
@@ -0,0 +1,124 @@
import random

import torch.nn as nn
from models.vq.encdec import Encoder, Decoder
from models.vq.residual_vq import ResidualVQ

class RVQVAE(nn.Module):
    def __init__(self,
                 args,
                 input_width=263,
                 nb_code=1024,
                 code_dim=512,
                 output_emb_width=512,
                 down_t=3,
                 stride_t=2,
                 width=512,
                 depth=3,
                 dilation_growth_rate=3,
                 activation='relu',
                 norm=None):

        super().__init__()
        assert output_emb_width == code_dim
        self.code_dim = code_dim
        self.num_code = nb_code
        # self.quant = args.quantizer
        self.encoder = Encoder(input_width, output_emb_width, down_t, stride_t, width, depth,
                               dilation_growth_rate, activation=activation, norm=norm)
        self.decoder = Decoder(input_width, output_emb_width, down_t, stride_t, width, depth,
                               dilation_growth_rate, activation=activation, norm=norm)
        rvqvae_config = {
            'num_quantizers': args.num_quantizers,
            'shared_codebook': args.shared_codebook,
            'quantize_dropout_prob': args.quantize_dropout_prob,
            'quantize_dropout_cutoff_index': 0,
            'nb_code': nb_code,
            'code_dim': code_dim,
            'args': args,
        }
        self.quantizer = ResidualVQ(**rvqvae_config)

    def preprocess(self, x):
        # (bs, T, Jx3) -> (bs, Jx3, T)
        x = x.permute(0, 2, 1).float()
        return x

    def postprocess(self, x):
        # (bs, Jx3, T) -> (bs, T, Jx3)
        x = x.permute(0, 2, 1)
        return x

    def encode(self, x):
        N, T, _ = x.shape
        x_in = self.preprocess(x)
        x_encoder = self.encoder(x_in)
        # print(x_encoder.shape)
        code_idx, all_codes = self.quantizer.quantize(x_encoder, return_latent=True)
        # print(code_idx.shape)
        # code_idx = code_idx.view(N, -1)
        # (N, T, Q)
        # print()
        return code_idx, all_codes

    def forward(self, x):
        x_in = self.preprocess(x)
        # Encode
        x_encoder = self.encoder(x_in)

        ## quantization
        # x_quantized, code_idx, commit_loss, perplexity = self.quantizer(x_encoder, sample_codebook_temp=0.5,
        #                                                                 force_dropout_index=0) #TODO hardcode
        x_quantized, code_idx, commit_loss, perplexity = self.quantizer(x_encoder, sample_codebook_temp=0.5)

        # print(code_idx[0, :, 1])
        ## decoder
        x_out = self.decoder(x_quantized)
        # x_out = self.postprocess(x_decoder)
        return x_out, commit_loss, perplexity

    def forward_decoder(self, x):
        x_d = self.quantizer.get_codes_from_indices(x)
        # x_d = x_d.view(1, -1, self.code_dim).permute(0, 2, 1).contiguous()
        x = x_d.sum(dim=0).permute(0, 2, 1)

        # decoder
        x_out = self.decoder(x)
        # x_out = self.postprocess(x_decoder)
        return x_out

class LengthEstimator(nn.Module):
    def __init__(self, input_size, output_size):
        super(LengthEstimator, self).__init__()
        nd = 512
        self.output = nn.Sequential(
            nn.Linear(input_size, nd),
            nn.LayerNorm(nd),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Dropout(0.2),
            nn.Linear(nd, nd // 2),
            nn.LayerNorm(nd // 2),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Dropout(0.2),
            nn.Linear(nd // 2, nd // 4),
            nn.LayerNorm(nd // 4),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Linear(nd // 4, output_size)
        )

        self.output.apply(self.__init_weights)

    def __init_weights(self, module):
        if isinstance(module, (nn.Linear, nn.Embedding)):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if isinstance(module, nn.Linear) and module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    def forward(self, text_emb):
        return self.output(text_emb)
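A hypothetical round-trip sketch for the RVQVAE above (not repository code; the args values, pose dimension and shapes are assumptions, and since QuantizeEMAReset allocates its codebook with .cuda(), a GPU is assumed):

import torch
from types import SimpleNamespace
from models.vq.model import RVQVAE

args = SimpleNamespace(num_quantizers=6, shared_codebook=False,
                       quantize_dropout_prob=0.2, mu=0.99)   # assumed values
vq_model = RVQVAE(args, input_width=263, nb_code=512, code_dim=512,
                  output_emb_width=512, down_t=2, stride_t=2).cuda()

motion = torch.randn(4, 64, 263).cuda()             # (batch, frames, pose features)
recon, commit_loss, perplexity = vq_model(motion)   # training-style forward pass
code_idx, _ = vq_model.encode(motion)               # (4, 16, 6) token indices, one column per quantizer
decoded = vq_model.forward_decoder(code_idx)        # decode token indices back to motion features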
models/vq/quantizer.py
ADDED
@@ -0,0 +1,180 @@
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, repeat, reduce, pack, unpack

# from vector_quantize_pytorch import ResidualVQ

# Borrow from vector_quantize_pytorch

def log(t, eps = 1e-20):
    return torch.log(t.clamp(min = eps))

def gumbel_noise(t):
    noise = torch.zeros_like(t).uniform_(0, 1)
    return -log(-log(noise))

def gumbel_sample(
    logits,
    temperature = 1.,
    stochastic = False,
    dim = -1,
    training = True
):

    if training and stochastic and temperature > 0:
        sampling_logits = (logits / temperature) + gumbel_noise(logits)
    else:
        sampling_logits = logits

    ind = sampling_logits.argmax(dim = dim)

    return ind

class QuantizeEMAReset(nn.Module):
    def __init__(self, nb_code, code_dim, args):
        super(QuantizeEMAReset, self).__init__()
        self.nb_code = nb_code
        self.code_dim = code_dim
        self.mu = args.mu  ##TO_DO
        self.reset_codebook()

    def reset_codebook(self):
        self.init = False
        self.code_sum = None
        self.code_count = None
        self.register_buffer('codebook', torch.zeros(self.nb_code, self.code_dim, requires_grad=False).cuda())

    def _tile(self, x):
        nb_code_x, code_dim = x.shape
        if nb_code_x < self.nb_code:
            n_repeats = (self.nb_code + nb_code_x - 1) // nb_code_x
            std = 0.01 / np.sqrt(code_dim)
            out = x.repeat(n_repeats, 1)
            out = out + torch.randn_like(out) * std
        else:
            out = x
        return out

    def init_codebook(self, x):
        out = self._tile(x)
        self.codebook = out[:self.nb_code]
        self.code_sum = self.codebook.clone()
        self.code_count = torch.ones(self.nb_code, device=self.codebook.device)
        self.init = True

    def quantize(self, x, sample_codebook_temp=0.):
        # N X C -> C X N
        k_w = self.codebook.t()
        # x: NT X C
        # NT X N
        distance = torch.sum(x ** 2, dim=-1, keepdim=True) - \
                   2 * torch.matmul(x, k_w) + \
                   torch.sum(k_w ** 2, dim=0, keepdim=True)  # (N * L, b)

        # code_idx = torch.argmin(distance, dim=-1)

        code_idx = gumbel_sample(-distance, dim = -1, temperature = sample_codebook_temp, stochastic=True, training = self.training)

        return code_idx

    def dequantize(self, code_idx):
        x = F.embedding(code_idx, self.codebook)
        return x

    def get_codebook_entry(self, indices):
        return self.dequantize(indices).permute(0, 2, 1)

    @torch.no_grad()
    def compute_perplexity(self, code_idx):
        # Calculate new centres
        code_onehot = torch.zeros(self.nb_code, code_idx.shape[0], device=code_idx.device)  # nb_code, N * L
        code_onehot.scatter_(0, code_idx.view(1, code_idx.shape[0]), 1)

        code_count = code_onehot.sum(dim=-1)  # nb_code
        prob = code_count / torch.sum(code_count)
        perplexity = torch.exp(-torch.sum(prob * torch.log(prob + 1e-7)))
        return perplexity

    @torch.no_grad()
    def update_codebook(self, x, code_idx):
        code_onehot = torch.zeros(self.nb_code, x.shape[0], device=x.device)  # nb_code, N * L
        code_onehot.scatter_(0, code_idx.view(1, x.shape[0]), 1)

        code_sum = torch.matmul(code_onehot, x)  # nb_code, c
        code_count = code_onehot.sum(dim=-1)  # nb_code

        out = self._tile(x)
        code_rand = out[:self.nb_code]

        # Update centres
        self.code_sum = self.mu * self.code_sum + (1. - self.mu) * code_sum
        self.code_count = self.mu * self.code_count + (1. - self.mu) * code_count

        usage = (self.code_count.view(self.nb_code, 1) >= 1.0).float()
        code_update = self.code_sum.view(self.nb_code, self.code_dim) / self.code_count.view(self.nb_code, 1)
        self.codebook = usage * code_update + (1 - usage) * code_rand


        prob = code_count / torch.sum(code_count)
        perplexity = torch.exp(-torch.sum(prob * torch.log(prob + 1e-7)))

        return perplexity

    def preprocess(self, x):
        # NCT -> NTC -> [NT, C]
        # x = x.permute(0, 2, 1).contiguous()
        # x = x.view(-1, x.shape[-1])
        x = rearrange(x, 'n c t -> (n t) c')
        return x

    def forward(self, x, return_idx=False, temperature=0.):
        N, width, T = x.shape

        x = self.preprocess(x)
        if self.training and not self.init:
            self.init_codebook(x)

        code_idx = self.quantize(x, temperature)
        x_d = self.dequantize(code_idx)

        if self.training:
            perplexity = self.update_codebook(x, code_idx)
        else:
            perplexity = self.compute_perplexity(code_idx)

        commit_loss = F.mse_loss(x, x_d.detach())  # It's right. the t2m-gpt paper is wrong on embed loss and commitment loss.

        # Passthrough
        x_d = x + (x_d - x).detach()

        # Postprocess
        x_d = x_d.view(N, T, -1).permute(0, 2, 1).contiguous()
        code_idx = code_idx.view(N, T).contiguous()
        # print(code_idx[0])
        if return_idx:
            return x_d, code_idx, commit_loss, perplexity
        return x_d, commit_loss, perplexity

class QuantizeEMA(QuantizeEMAReset):
    @torch.no_grad()
    def update_codebook(self, x, code_idx):
        code_onehot = torch.zeros(self.nb_code, x.shape[0], device=x.device)  # nb_code, N * L
        code_onehot.scatter_(0, code_idx.view(1, x.shape[0]), 1)

        code_sum = torch.matmul(code_onehot, x)  # nb_code, c
        code_count = code_onehot.sum(dim=-1)  # nb_code

        # Update centres
        self.code_sum = self.mu * self.code_sum + (1. - self.mu) * code_sum
        self.code_count = self.mu * self.code_count + (1. - self.mu) * code_count

        usage = (self.code_count.view(self.nb_code, 1) >= 1.0).float()
        code_update = self.code_sum.view(self.nb_code, self.code_dim) / self.code_count.view(self.nb_code, 1)
        self.codebook = usage * code_update + (1 - usage) * self.codebook

        prob = code_count / torch.sum(code_count)
        perplexity = torch.exp(-torch.sum(prob * torch.log(prob + 1e-7)))

        return perplexity
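Two details of QuantizeEMAReset above are worth calling out: the codebook is trained with EMA updates (dead codes are reset from random encoder outputs), so only the commitment loss needs gradients, and the non-differentiable nearest-code lookup is bridged with a straight-through estimator. A tiny self-contained demonstration of that estimator (an illustration, not repository code):

import torch

z = torch.randn(5, 4, requires_grad=True)       # stand-in encoder output
codebook = torch.randn(8, 4)                    # stand-in (frozen) codebook
idx = torch.cdist(z, codebook).argmin(dim=-1)   # nearest-code assignment (not differentiable)
z_q = codebook[idx]                             # quantized latent
z_q = z + (z_q - z).detach()                    # straight-through: value is z_q, gradient treats it as z
z_q.sum().backward()
print(z.grad)                                   # all ones: the gradient skipped the argmin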
models/vq/residual_vq.py
ADDED
@@ -0,0 +1,194 @@
import random
from math import ceil
from functools import partial
from itertools import zip_longest
from random import randrange

import torch
from torch import nn
import torch.nn.functional as F
# from vector_quantize_pytorch.vector_quantize_pytorch import VectorQuantize
from models.vq.quantizer import QuantizeEMAReset, QuantizeEMA

from einops import rearrange, repeat, pack, unpack

# helper functions

def exists(val):
    return val is not None

def default(val, d):
    return val if exists(val) else d

def round_up_multiple(num, mult):
    return ceil(num / mult) * mult

# main class

class ResidualVQ(nn.Module):
    """ Follows Algorithm 1. in https://arxiv.org/pdf/2107.03312.pdf """
    def __init__(
        self,
        num_quantizers,
        shared_codebook=False,
        quantize_dropout_prob=0.5,
        quantize_dropout_cutoff_index=0,
        **kwargs
    ):
        super().__init__()

        self.num_quantizers = num_quantizers

        # self.layers = nn.ModuleList([VectorQuantize(accept_image_fmap = accept_image_fmap, **kwargs) for _ in range(num_quantizers)])
        if shared_codebook:
            layer = QuantizeEMAReset(**kwargs)
            self.layers = nn.ModuleList([layer for _ in range(num_quantizers)])
        else:
            self.layers = nn.ModuleList([QuantizeEMAReset(**kwargs) for _ in range(num_quantizers)])
            # self.layers = nn.ModuleList([QuantizeEMA(**kwargs) for _ in range(num_quantizers)])

        # self.quantize_dropout = quantize_dropout and num_quantizers > 1

        assert quantize_dropout_cutoff_index >= 0 and quantize_dropout_prob >= 0

        self.quantize_dropout_cutoff_index = quantize_dropout_cutoff_index
        self.quantize_dropout_prob = quantize_dropout_prob


    @property
    def codebooks(self):
        codebooks = [layer.codebook for layer in self.layers]
        codebooks = torch.stack(codebooks, dim = 0)
        return codebooks # 'q c d'

    def get_codes_from_indices(self, indices): #indices shape 'b n q' # dequantize

        batch, quantize_dim = indices.shape[0], indices.shape[-1]

        # because of quantize dropout, one can pass in indices that are coarse
        # and the network should be able to reconstruct

        if quantize_dim < self.num_quantizers:
            indices = F.pad(indices, (0, self.num_quantizers - quantize_dim), value = -1)

        # get ready for gathering

        codebooks = repeat(self.codebooks, 'q c d -> q b c d', b = batch)
        gather_indices = repeat(indices, 'b n q -> q b n d', d = codebooks.shape[-1])

        # take care of quantizer dropout

        mask = gather_indices == -1.
        gather_indices = gather_indices.masked_fill(mask, 0) # have it fetch a dummy code to be masked out later

        # print(gather_indices.max(), gather_indices.min())
        all_codes = codebooks.gather(2, gather_indices) # gather all codes

        # mask out any codes that were dropout-ed

        all_codes = all_codes.masked_fill(mask, 0.)

        return all_codes # 'q b n d'

    def get_codebook_entry(self, indices): #indices shape 'b n q'
        all_codes = self.get_codes_from_indices(indices) #'q b n d'
        latent = torch.sum(all_codes, dim=0) #'b n d'
        latent = latent.permute(0, 2, 1)
        return latent

    def forward(self, x, return_all_codes = False, sample_codebook_temp = None, force_dropout_index=-1):
        # debug check
        # print(self.codebooks[:,0,0].detach().cpu().numpy())
        num_quant, quant_dropout_prob, device = self.num_quantizers, self.quantize_dropout_prob, x.device

        quantized_out = 0.
        residual = x

        all_losses = []
        all_indices = []
        all_perplexity = []


        should_quantize_dropout = self.training and random.random() < self.quantize_dropout_prob

        start_drop_quantize_index = num_quant
        # To ensure the first-k layers learn things as much as possible, we randomly dropout the last q - k layers
        if should_quantize_dropout:
            start_drop_quantize_index = randrange(self.quantize_dropout_cutoff_index, num_quant) # keep quant layers <= quantize_dropout_cutoff_index, TODO vary in batch
            null_indices_shape = [x.shape[0], x.shape[-1]] # 'b*n'
            null_indices = torch.full(null_indices_shape, -1., device = device, dtype = torch.long)
            # null_loss = 0.

        if force_dropout_index >= 0:
            should_quantize_dropout = True
            start_drop_quantize_index = force_dropout_index
            null_indices_shape = [x.shape[0], x.shape[-1]] # 'b*n'
            null_indices = torch.full(null_indices_shape, -1., device=device, dtype=torch.long)

        # print(force_dropout_index)
        # go through the layers

        for quantizer_index, layer in enumerate(self.layers):

            if should_quantize_dropout and quantizer_index > start_drop_quantize_index:
                all_indices.append(null_indices)
                # all_losses.append(null_loss)
                continue

            # layer_indices = None
            # if return_loss:
            #     layer_indices = indices[..., quantizer_index] #gt indices

            # quantized, *rest = layer(residual, indices = layer_indices, sample_codebook_temp = sample_codebook_temp) #single quantizer TODO
            quantized, *rest = layer(residual, return_idx=True, temperature=sample_codebook_temp) #single quantizer

            # print(quantized.shape, residual.shape)
            residual -= quantized.detach()
            quantized_out += quantized

            embed_indices, loss, perplexity = rest
            all_indices.append(embed_indices)
            all_losses.append(loss)
            all_perplexity.append(perplexity)


        # stack all losses and indices
        all_indices = torch.stack(all_indices, dim=-1)
        all_losses = sum(all_losses)/len(all_losses)
        all_perplexity = sum(all_perplexity)/len(all_perplexity)

        ret = (quantized_out, all_indices, all_losses, all_perplexity)

        if return_all_codes:
            # whether to return all codes from all codebooks across layers
            all_codes = self.get_codes_from_indices(all_indices)

            # will return all codes in shape (quantizer, batch, sequence length, codebook dimension)
            ret = (*ret, all_codes)

        return ret

    def quantize(self, x, return_latent=False):
        all_indices = []
        quantized_out = 0.
        residual = x
        all_codes = []
        for quantizer_index, layer in enumerate(self.layers):

            quantized, *rest = layer(residual, return_idx=True) #single quantizer

            residual = residual - quantized.detach()
            quantized_out = quantized_out + quantized

            embed_indices, loss, perplexity = rest
            all_indices.append(embed_indices)
            # print(quantizer_index, embed_indices[0])
            # print(quantizer_index, quantized[0])
            # break
            all_codes.append(quantized)

        code_idx = torch.stack(all_indices, dim=-1)
        all_codes = torch.stack(all_codes, dim=0)
        if return_latent:
            return code_idx, all_codes
        return code_idx
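The core idea of ResidualVQ.forward above is that each layer quantizes whatever the previous layers failed to explain, and the sum of per-layer codes reconstructs the latent increasingly well. A toy illustration with a stand-in "quantizer" (plain rounding instead of a learned codebook; purely illustrative, not repository code):

import torch

latent = torch.randn(4)
residual, approx = latent.clone(), torch.zeros_like(latent)
for depth in range(6):                       # one pass per quantizer layer
    scale = 2.0 ** depth
    q = (residual * scale).round() / scale   # stand-in quantizer, finer at each depth
    residual = residual - q                  # error handed to the next layer
    approx = approx + q                      # running sum of per-layer codes
print((latent - approx).abs().max())         # shrinks as more layers are kept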
models/vq/resnet.py
ADDED
@@ -0,0 +1,84 @@
import torch.nn as nn
import torch

class nonlinearity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x * torch.sigmoid(x)


class ResConv1DBlock(nn.Module):
    def __init__(self, n_in, n_state, dilation=1, activation='silu', norm=None, dropout=0.2):
        super(ResConv1DBlock, self).__init__()

        padding = dilation
        self.norm = norm

        if norm == "LN":
            self.norm1 = nn.LayerNorm(n_in)
            self.norm2 = nn.LayerNorm(n_in)
        elif norm == "GN":
            self.norm1 = nn.GroupNorm(num_groups=32, num_channels=n_in, eps=1e-6, affine=True)
            self.norm2 = nn.GroupNorm(num_groups=32, num_channels=n_in, eps=1e-6, affine=True)
        elif norm == "BN":
            self.norm1 = nn.BatchNorm1d(num_features=n_in, eps=1e-6, affine=True)
            self.norm2 = nn.BatchNorm1d(num_features=n_in, eps=1e-6, affine=True)
        else:
            self.norm1 = nn.Identity()
            self.norm2 = nn.Identity()

        if activation == "relu":
            self.activation1 = nn.ReLU()
            self.activation2 = nn.ReLU()

        elif activation == "silu":
            self.activation1 = nonlinearity()
            self.activation2 = nonlinearity()

        elif activation == "gelu":
            self.activation1 = nn.GELU()
            self.activation2 = nn.GELU()

        self.conv1 = nn.Conv1d(n_in, n_state, 3, 1, padding, dilation)
        self.conv2 = nn.Conv1d(n_state, n_in, 1, 1, 0, )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x_orig = x
        if self.norm == "LN":
            x = self.norm1(x.transpose(-2, -1))
            x = self.activation1(x.transpose(-2, -1))
        else:
            x = self.norm1(x)
            x = self.activation1(x)

        x = self.conv1(x)

        if self.norm == "LN":
            x = self.norm2(x.transpose(-2, -1))
            x = self.activation2(x.transpose(-2, -1))
        else:
            x = self.norm2(x)
            x = self.activation2(x)

        x = self.conv2(x)
        x = self.dropout(x)
        x = x + x_orig
        return x


class Resnet1D(nn.Module):
    def __init__(self, n_in, n_depth, dilation_growth_rate=1, reverse_dilation=True, activation='relu', norm=None):
        super().__init__()

        blocks = [ResConv1DBlock(n_in, n_in, dilation=dilation_growth_rate ** depth, activation=activation, norm=norm)
                  for depth in range(n_depth)]
        if reverse_dilation:
            blocks = blocks[::-1]

        self.model = nn.Sequential(*blocks)

    def forward(self, x):
        return self.model(x)
models/vq/vq_trainer.py
ADDED
@@ -0,0 +1,359 @@
1 |
+
import torch
|
2 |
+
from torch.utils.data import DataLoader
|
3 |
+
from torch.nn.utils import clip_grad_norm_
|
4 |
+
from torch.utils.tensorboard import SummaryWriter
|
5 |
+
from os.path import join as pjoin
|
6 |
+
import torch.nn.functional as F
|
7 |
+
|
8 |
+
import torch.optim as optim
|
9 |
+
|
10 |
+
import time
|
11 |
+
import numpy as np
|
12 |
+
from collections import OrderedDict, defaultdict
|
13 |
+
from utils.eval_t2m import evaluation_vqvae, evaluation_res_conv
|
14 |
+
from utils.utils import print_current_loss
|
15 |
+
|
16 |
+
import os
|
17 |
+
import sys
|
18 |
+
|
19 |
+
def def_value():
|
20 |
+
return 0.0
|
21 |
+
|
22 |
+
|
23 |
+
class RVQTokenizerTrainer:
|
24 |
+
def __init__(self, args, vq_model):
|
25 |
+
self.opt = args
|
26 |
+
self.vq_model = vq_model
|
27 |
+
self.device = args.device
|
28 |
+
|
29 |
+
if args.is_train:
|
30 |
+
self.logger = SummaryWriter(args.log_dir)
|
31 |
+
if args.recons_loss == 'l1':
|
32 |
+
self.l1_criterion = torch.nn.L1Loss()
|
33 |
+
elif args.recons_loss == 'l1_smooth':
|
34 |
+
self.l1_criterion = torch.nn.SmoothL1Loss()
|
35 |
+
|
36 |
+
# self.critic = CriticWrapper(self.opt.dataset_name, self.opt.device)
|
37 |
+
|
38 |
+
def forward(self, batch_data):
|
39 |
+
motions = batch_data.detach().to(self.device).float()
|
40 |
+
pred_motion, loss_commit, perplexity = self.vq_model(motions)
|
41 |
+
|
42 |
+
self.motions = motions
|
43 |
+
self.pred_motion = pred_motion
|
44 |
+
|
45 |
+
loss_rec = self.l1_criterion(pred_motion, motions)
|
46 |
+
pred_local_pos = pred_motion[..., 4 : (self.opt.joints_num - 1) * 3 + 4]
|
47 |
+
local_pos = motions[..., 4 : (self.opt.joints_num - 1) * 3 + 4]
|
48 |
+
loss_explicit = self.l1_criterion(pred_local_pos, local_pos)
|
49 |
+
|
50 |
+
loss = loss_rec + self.opt.loss_vel * loss_explicit + self.opt.commit * loss_commit
|
51 |
+
|
52 |
+
# return loss, loss_rec, loss_vel, loss_commit, perplexity
|
53 |
+
# return loss, loss_rec, loss_percept, loss_commit, perplexity
|
54 |
+
return loss, loss_rec, loss_explicit, loss_commit, perplexity
|
55 |
+
|
56 |
+
|
57 |
+
# @staticmethod
|
58 |
+
def update_lr_warm_up(self, nb_iter, warm_up_iter, lr):
|
59 |
+
|
60 |
+
current_lr = lr * (nb_iter + 1) / (warm_up_iter + 1)
|
61 |
+
for param_group in self.opt_vq_model.param_groups:
|
62 |
+
param_group["lr"] = current_lr
|
63 |
+
|
64 |
+
return current_lr
|
65 |
+
|
66 |
+
def save(self, file_name, ep, total_it):
|
67 |
+
state = {
|
68 |
+
"vq_model": self.vq_model.state_dict(),
|
69 |
+
"opt_vq_model": self.opt_vq_model.state_dict(),
|
70 |
+
"scheduler": self.scheduler.state_dict(),
|
71 |
+
'ep': ep,
|
72 |
+
'total_it': total_it,
|
73 |
+
}
|
74 |
+
torch.save(state, file_name)
|
75 |
+
|
76 |
+
def resume(self, model_dir):
|
77 |
+
checkpoint = torch.load(model_dir, map_location=self.device)
|
78 |
+
self.vq_model.load_state_dict(checkpoint['vq_model'])
|
79 |
+
self.opt_vq_model.load_state_dict(checkpoint['opt_vq_model'])
|
80 |
+
self.scheduler.load_state_dict(checkpoint['scheduler'])
|
81 |
+
return checkpoint['ep'], checkpoint['total_it']
|
82 |
+
|
83 |
+
def train(self, train_loader, val_loader, eval_val_loader, eval_wrapper, plot_eval=None):
|
84 |
+
self.vq_model.to(self.device)
|
85 |
+
|
86 |
+
self.opt_vq_model = optim.AdamW(self.vq_model.parameters(), lr=self.opt.lr, betas=(0.9, 0.99), weight_decay=self.opt.weight_decay)
|
87 |
+
self.scheduler = torch.optim.lr_scheduler.MultiStepLR(self.opt_vq_model, milestones=self.opt.milestones, gamma=self.opt.gamma)
|
88 |
+
|
89 |
+
epoch = 0
|
90 |
+
it = 0
|
91 |
+
if self.opt.is_continue:
|
92 |
+
model_dir = pjoin(self.opt.model_dir, 'latest.tar')
|
93 |
+
epoch, it = self.resume(model_dir)
|
94 |
+
print("Load model epoch:%d iterations:%d"%(epoch, it))
|
95 |
+
|
96 |
+
start_time = time.time()
|
97 |
+
total_iters = self.opt.max_epoch * len(train_loader)
|
98 |
+
print(f'Total Epochs: {self.opt.max_epoch}, Total Iters: {total_iters}')
|
99 |
+
print('Iters Per Epoch, Training: %04d, Validation: %03d' % (len(train_loader), len(eval_val_loader)))
|
100 |
+
# val_loss = 0
|
101 |
+
# min_val_loss = np.inf
|
102 |
+
# min_val_epoch = epoch
|
103 |
+
current_lr = self.opt.lr
|
104 |
+
logs = defaultdict(def_value, OrderedDict())
|
105 |
+
|
106 |
+
# sys.exit()
|
107 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_vqvae(
|
108 |
+
self.opt.model_dir, eval_val_loader, self.vq_model, self.logger, epoch, best_fid=1000,
|
109 |
+
best_div=100, best_top1=0,
|
110 |
+
best_top2=0, best_top3=0, best_matching=100,
|
111 |
+
eval_wrapper=eval_wrapper, save=False)
|
112 |
+
|
113 |
+
while epoch < self.opt.max_epoch:
|
114 |
+
self.vq_model.train()
|
115 |
+
for i, batch_data in enumerate(train_loader):
|
116 |
+
it += 1
|
117 |
+
if it < self.opt.warm_up_iter:
|
118 |
+
current_lr = self.update_lr_warm_up(it, self.opt.warm_up_iter, self.opt.lr)
|
119 |
+
loss, loss_rec, loss_vel, loss_commit, perplexity = self.forward(batch_data)
|
120 |
+
self.opt_vq_model.zero_grad()
|
121 |
+
loss.backward()
|
122 |
+
self.opt_vq_model.step()
|
123 |
+
|
124 |
+
if it >= self.opt.warm_up_iter:
|
125 |
+
self.scheduler.step()
|
126 |
+
|
127 |
+
logs['loss'] += loss.item()
|
128 |
+
logs['loss_rec'] += loss_rec.item()
|
129 |
+
# Note it not necessarily velocity, too lazy to change the name now
|
130 |
+
logs['loss_vel'] += loss_vel.item()
|
131 |
+
logs['loss_commit'] += loss_commit.item()
|
132 |
+
logs['perplexity'] += perplexity.item()
|
133 |
+
logs['lr'] += self.opt_vq_model.param_groups[0]['lr']
|
134 |
+
|
135 |
+
if it % self.opt.log_every == 0:
|
136 |
+
mean_loss = OrderedDict()
|
137 |
+
# self.logger.add_scalar('val_loss', val_loss, it)
|
138 |
+
# self.l
|
139 |
+
for tag, value in logs.items():
|
140 |
+
self.logger.add_scalar('Train/%s'%tag, value / self.opt.log_every, it)
|
141 |
+
mean_loss[tag] = value / self.opt.log_every
|
142 |
+
logs = defaultdict(def_value, OrderedDict())
|
143 |
+
print_current_loss(start_time, it, total_iters, mean_loss, epoch=epoch, inner_iter=i)
|
144 |
+
|
145 |
+
if it % self.opt.save_latest == 0:
|
146 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
147 |
+
|
148 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
149 |
+
|
150 |
+
epoch += 1
|
151 |
+
# if epoch % self.opt.save_every_e == 0:
|
152 |
+
# self.save(pjoin(self.opt.model_dir, 'E%04d.tar' % (epoch)), epoch, total_it=it)
|
153 |
+
|
154 |
+
print('Validation time:')
|
155 |
+
self.vq_model.eval()
|
156 |
+
val_loss_rec = []
|
157 |
+
val_loss_vel = []
|
158 |
+
val_loss_commit = []
|
159 |
+
val_loss = []
|
160 |
+
val_perpexity = []
|
161 |
+
with torch.no_grad():
|
162 |
+
for i, batch_data in enumerate(val_loader):
|
163 |
+
loss, loss_rec, loss_vel, loss_commit, perplexity = self.forward(batch_data)
|
164 |
+
# val_loss_rec += self.l1_criterion(self.recon_motions, self.motions).item()
|
165 |
+
# val_loss_emb += self.embedding_loss.item()
|
166 |
+
val_loss.append(loss.item())
|
167 |
+
val_loss_rec.append(loss_rec.item())
|
168 |
+
val_loss_vel.append(loss_vel.item())
|
169 |
+
val_loss_commit.append(loss_commit.item())
|
170 |
+
val_perpexity.append(perplexity.item())
|
171 |
+
|
172 |
+
# val_loss = val_loss_rec / (len(val_dataloader) + 1)
|
173 |
+
# val_loss = val_loss / (len(val_dataloader) + 1)
|
174 |
+
# val_loss_rec = val_loss_rec / (len(val_dataloader) + 1)
|
175 |
+
# val_loss_emb = val_loss_emb / (len(val_dataloader) + 1)
|
176 |
+
self.logger.add_scalar('Val/loss', sum(val_loss) / len(val_loss), epoch)
|
177 |
+
self.logger.add_scalar('Val/loss_rec', sum(val_loss_rec) / len(val_loss_rec), epoch)
|
178 |
+
self.logger.add_scalar('Val/loss_vel', sum(val_loss_vel) / len(val_loss_vel), epoch)
|
179 |
+
self.logger.add_scalar('Val/loss_commit', sum(val_loss_commit) / len(val_loss), epoch)
|
180 |
+
self.logger.add_scalar('Val/loss_perplexity', sum(val_perpexity) / len(val_loss_rec), epoch)
|
181 |
+
|
182 |
+
print('Validation Loss: %.5f Reconstruction: %.5f, Velocity: %.5f, Commit: %.5f' %
|
183 |
+
(sum(val_loss)/len(val_loss), sum(val_loss_rec)/len(val_loss),
|
184 |
+
sum(val_loss_vel)/len(val_loss), sum(val_loss_commit)/len(val_loss)))
|
185 |
+
|
186 |
+
# if sum(val_loss) / len(val_loss) < min_val_loss:
|
187 |
+
# min_val_loss = sum(val_loss) / len(val_loss)
|
188 |
+
# # if sum(val_loss_vel) / len(val_loss_vel) < min_val_loss:
|
189 |
+
# # min_val_loss = sum(val_loss_vel) / len(val_loss_vel)
|
190 |
+
# min_val_epoch = epoch
|
191 |
+
# self.save(pjoin(self.opt.model_dir, 'finest.tar'), epoch, it)
|
192 |
+
# print('Best Validation Model So Far!~')
|
193 |
+
|
194 |
+
best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_vqvae(
|
195 |
+
self.opt.model_dir, eval_val_loader, self.vq_model, self.logger, epoch, best_fid=best_fid,
|
196 |
+
best_div=best_div, best_top1=best_top1,
|
197 |
+
best_top2=best_top2, best_top3=best_top3, best_matching=best_matching, eval_wrapper=eval_wrapper)
|
198 |
+
|
199 |
+
|
200 |
+
if epoch % self.opt.eval_every_e == 0:
|
201 |
+
data = torch.cat([self.motions[:4], self.pred_motion[:4]], dim=0).detach().cpu().numpy()
|
202 |
+
# np.save(pjoin(self.opt.eval_dir, 'E%04d.npy' % (epoch)), data)
|
203 |
+
save_dir = pjoin(self.opt.eval_dir, 'E%04d' % (epoch))
|
204 |
+
os.makedirs(save_dir, exist_ok=True)
|
205 |
+
plot_eval(data, save_dir)
|
206 |
+
# if plot_eval is not None:
|
207 |
+
# save_dir = pjoin(self.opt.eval_dir, 'E%04d' % (epoch))
|
208 |
+
# os.makedirs(save_dir, exist_ok=True)
|
209 |
+
# plot_eval(data, save_dir)
|
210 |
+
|
211 |
+
# if epoch - min_val_epoch >= self.opt.early_stop_e:
|
212 |
+
# print('Early Stopping!~')
|
213 |
+
|
214 |
+
|
215 |
+
class LengthEstTrainer(object):
|
216 |
+
|
217 |
+
def __init__(self, args, estimator, text_encoder, encode_fnc):
|
218 |
+
self.opt = args
|
219 |
+
self.estimator = estimator
|
220 |
+
self.text_encoder = text_encoder
|
221 |
+
self.encode_fnc = encode_fnc
|
222 |
+
self.device = args.device
|
223 |
+
|
224 |
+
if args.is_train:
|
225 |
+
# self.motion_dis
|
226 |
+
self.logger = SummaryWriter(args.log_dir)
|
227 |
+
self.mul_cls_criterion = torch.nn.CrossEntropyLoss()
|
228 |
+
|
229 |
+
def resume(self, model_dir):
|
230 |
+
checkpoints = torch.load(model_dir, map_location=self.device)
|
231 |
+
self.estimator.load_state_dict(checkpoints['estimator'])
|
232 |
+
# self.opt_estimator.load_state_dict(checkpoints['opt_estimator'])
|
233 |
+
return checkpoints['epoch'], checkpoints['iter']
|
234 |
+
|
235 |
+
def save(self, model_dir, epoch, niter):
|
236 |
+
state = {
|
237 |
+
'estimator': self.estimator.state_dict(),
|
238 |
+
# 'opt_estimator': self.opt_estimator.state_dict(),
|
239 |
+
'epoch': epoch,
|
240 |
+
'niter': niter,
|
241 |
+
}
|
242 |
+
torch.save(state, model_dir)
|
243 |
+
|
244 |
+
@staticmethod
|
245 |
+
def zero_grad(opt_list):
|
246 |
+
for opt in opt_list:
|
247 |
+
opt.zero_grad()
|
248 |
+
|
249 |
+
@staticmethod
|
250 |
+
def clip_norm(network_list):
|
251 |
+
for network in network_list:
|
252 |
+
clip_grad_norm_(network.parameters(), 0.5)
|
253 |
+
|
254 |
+
@staticmethod
|
255 |
+
def step(opt_list):
|
256 |
+
for opt in opt_list:
|
257 |
+
opt.step()
|
258 |
+
|
259 |
+
def train(self, train_dataloader, val_dataloader):
|
260 |
+
self.estimator.to(self.device)
|
261 |
+
self.text_encoder.to(self.device)
|
262 |
+
|
263 |
+
self.opt_estimator = optim.Adam(self.estimator.parameters(), lr=self.opt.lr)
|
264 |
+
|
265 |
+
epoch = 0
|
266 |
+
it = 0
|
267 |
+
|
268 |
+
if self.opt.is_continue:
|
269 |
+
model_dir = pjoin(self.opt.model_dir, 'latest.tar')
|
270 |
+
epoch, it = self.resume(model_dir)
|
271 |
+
|
272 |
+
start_time = time.time()
|
273 |
+
total_iters = self.opt.max_epoch * len(train_dataloader)
|
274 |
+
print('Iters Per Epoch, Training: %04d, Validation: %03d' % (len(train_dataloader), len(val_dataloader)))
|
275 |
+
val_loss = 0
|
276 |
+
min_val_loss = np.inf
|
277 |
+
logs = defaultdict(float)
|
278 |
+
while epoch < self.opt.max_epoch:
|
279 |
+
# time0 = time.time()
|
280 |
+
for i, batch_data in enumerate(train_dataloader):
|
281 |
+
self.estimator.train()
|
282 |
+
|
283 |
+
conds, _, m_lens = batch_data
|
284 |
+
# word_emb = word_emb.detach().to(self.device).float()
|
285 |
+
# pos_ohot = pos_ohot.detach().to(self.device).float()
|
286 |
+
# m_lens = m_lens.to(self.device).long()
|
287 |
+
text_embs = self.encode_fnc(self.text_encoder, conds, self.opt.device).detach()
|
288 |
+
# print(text_embs.shape, text_embs.device)
|
289 |
+
|
290 |
+
pred_dis = self.estimator(text_embs)
|
291 |
+
|
292 |
+
self.zero_grad([self.opt_estimator])
|
293 |
+
|
294 |
+
gt_labels = m_lens // self.opt.unit_length
|
295 |
+
gt_labels = gt_labels.long().to(self.device)
|
296 |
+
# print(gt_labels.shape, pred_dis.shape)
|
297 |
+
# print(gt_labels.max(), gt_labels.min())
|
298 |
+
# print(pred_dis)
|
299 |
+
acc = (gt_labels == pred_dis.argmax(dim=-1)).sum() / len(gt_labels)
|
300 |
+
loss = self.mul_cls_criterion(pred_dis, gt_labels)
|
301 |
+
|
302 |
+
loss.backward()
|
303 |
+
|
304 |
+
self.clip_norm([self.estimator])
|
305 |
+
self.step([self.opt_estimator])
|
306 |
+
|
307 |
+
logs['loss'] += loss.item()
|
308 |
+
logs['acc'] += acc.item()
|
309 |
+
|
310 |
+
it += 1
|
311 |
+
if it % self.opt.log_every == 0:
|
312 |
+
mean_loss = OrderedDict({'val_loss': val_loss})
|
313 |
+
# self.logger.add_scalar('Val/loss', val_loss, it)
|
314 |
+
|
315 |
+
for tag, value in logs.items():
|
316 |
+
self.logger.add_scalar("Train/%s"%tag, value / self.opt.log_every, it)
|
317 |
+
mean_loss[tag] = value / self.opt.log_every
|
318 |
+
logs = defaultdict(float)
|
319 |
+
print_current_loss(start_time, it, total_iters, mean_loss, epoch=epoch, inner_iter=i)
|
320 |
+
|
321 |
+
if it % self.opt.save_latest == 0:
|
322 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
323 |
+
|
324 |
+
self.save(pjoin(self.opt.model_dir, 'latest.tar'), epoch, it)
|
325 |
+
|
326 |
+
epoch += 1
|
327 |
+
|
328 |
+
print('Validation time:')
|
329 |
+
|
330 |
+
val_loss = 0
|
331 |
+
val_acc = 0
|
332 |
+
# self.estimator.eval()
|
333 |
+
with torch.no_grad():
|
334 |
+
for i, batch_data in enumerate(val_dataloader):
|
335 |
+
self.estimator.eval()
|
336 |
+
|
337 |
+
conds, _, m_lens = batch_data
|
338 |
+
# word_emb = word_emb.detach().to(self.device).float()
|
339 |
+
# pos_ohot = pos_ohot.detach().to(self.device).float()
|
340 |
+
# m_lens = m_lens.to(self.device).long()
|
341 |
+
text_embs = self.encode_fnc(self.text_encoder, conds, self.opt.device)
|
342 |
+
pred_dis = self.estimator(text_embs)
|
343 |
+
|
344 |
+
gt_labels = m_lens // self.opt.unit_length
|
345 |
+
gt_labels = gt_labels.long().to(self.device)
|
346 |
+
loss = self.mul_cls_criterion(pred_dis, gt_labels)
|
347 |
+
acc = (gt_labels == pred_dis.argmax(dim=-1)).sum() / len(gt_labels)
|
348 |
+
|
349 |
+
val_loss += loss.item()
|
350 |
+
val_acc += acc.item()
|
351 |
+
|
352 |
+
|
353 |
+
val_loss = val_loss / len(val_dataloader)
|
354 |
+
val_acc = val_acc / len(val_dataloader)
|
355 |
+
print('Validation Loss: %.5f Validation Acc: %.5f' % (val_loss, val_acc))
|
356 |
+
|
357 |
+
if val_loss < min_val_loss:
|
358 |
+
self.save(pjoin(self.opt.model_dir, 'finest.tar'), epoch, it)
|
359 |
+
min_val_loss = val_loss
|
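For readability amid the diff markup above: the objective assembled in RVQTokenizerTrainer.forward combines the reconstruction loss on the full feature vector, an extra reconstruction term on the local joint positions, and the commitment loss returned by the residual quantizer. A restated helper sketch (not repository code; the weight values stand in for opt.loss_vel and opt.commit and are assumptions):

import torch

def rvq_tokenizer_loss(pred, gt, commit_loss, joints_num=22, loss_vel=0.5, commit=0.02,
                       criterion=torch.nn.SmoothL1Loss()):
    # Restatement of RVQTokenizerTrainer.forward above; the default weights here are assumed.
    loss_rec = criterion(pred, gt)
    pos = slice(4, (joints_num - 1) * 3 + 4)   # local joint positions inside the 263-d feature
    loss_explicit = criterion(pred[..., pos], gt[..., pos])
    return loss_rec + loss_vel * loss_explicit + commit * commit_loss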
motion_loaders/__init__.py
ADDED
File without changes
|
motion_loaders/dataset_motion_loader.py
ADDED
@@ -0,0 +1,27 @@
from data.t2m_dataset import Text2MotionDatasetEval, collate_fn # TODO
from utils.word_vectorizer import WordVectorizer
import numpy as np
from os.path import join as pjoin
from torch.utils.data import DataLoader
from utils.get_opt import get_opt

def get_dataset_motion_loader(opt_path, batch_size, fname, device):
    opt = get_opt(opt_path, device)

    # Configurations of T2M dataset and KIT dataset is almost the same
    if opt.dataset_name == 't2m' or opt.dataset_name == 'kit':
        print('Loading dataset %s ...' % opt.dataset_name)

        mean = np.load(pjoin(opt.meta_dir, 'mean.npy'))
        std = np.load(pjoin(opt.meta_dir, 'std.npy'))

        w_vectorizer = WordVectorizer('./glove', 'our_vab')
        split_file = pjoin(opt.data_root, '%s.txt'%fname)
        dataset = Text2MotionDatasetEval(opt, mean, std, split_file, w_vectorizer)
        dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=4, drop_last=True,
                                collate_fn=collate_fn, shuffle=True)
    else:
        raise KeyError('Dataset not Recognized !!')

    print('Ground Truth Dataset Loading Completed!!!')
    return dataloader, dataset
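A hypothetical call matching how the evaluation scripts might use this loader (the evaluator opt.txt path, split name, and device below are assumptions):

import torch
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader

gt_loader, gt_dataset = get_dataset_motion_loader(
    './checkpoints/t2m/Comp_v6_KLD005/opt.txt', batch_size=32,
    fname='test', device=torch.device('cuda'))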
options/__init__.py
ADDED
File without changes
|
options/base_option.py
ADDED
@@ -0,0 +1,61 @@
1 |
+
import argparse
|
2 |
+
import os
|
3 |
+
import torch
|
4 |
+
|
5 |
+
class BaseOptions():
|
6 |
+
def __init__(self):
|
7 |
+
self.parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
|
8 |
+
self.initialized = False
|
9 |
+
|
10 |
+
def initialize(self):
|
11 |
+
self.parser.add_argument('--name', type=str, default="t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns", help='Name of this trial')
|
12 |
+
|
13 |
+
self.parser.add_argument('--vq_name', type=str, default="rvq_nq1_dc512_nc512", help='Name of the rvq model.')
|
14 |
+
|
15 |
+
self.parser.add_argument("--gpu_id", type=int, default=-1, help='GPU id')
|
16 |
+
self.parser.add_argument('--dataset_name', type=str, default='t2m', help='Dataset Name, {t2m} for humanml3d, {kit} for kit-ml')
|
17 |
+
self.parser.add_argument('--checkpoints_dir', type=str, default='./checkpoints', help='models are saved here.')
|
18 |
+
|
19 |
+
self.parser.add_argument('--latent_dim', type=int, default=384, help='Dimension of transformer latent.')
|
20 |
+
self.parser.add_argument('--n_heads', type=int, default=6, help='Number of heads.')
|
21 |
+
self.parser.add_argument('--n_layers', type=int, default=8, help='Number of attention layers.')
|
22 |
+
self.parser.add_argument('--ff_size', type=int, default=1024, help='FF_Size')
|
23 |
+
self.parser.add_argument('--dropout', type=float, default=0.2, help='Dropout ratio in transformer')
|
24 |
+
|
25 |
+
self.parser.add_argument("--max_motion_length", type=int, default=196, help="Max length of motion")
|
26 |
+
self.parser.add_argument("--unit_length", type=int, default=4, help="Downscale ratio of VQ")
|
27 |
+
|
28 |
+
self.parser.add_argument('--force_mask', action="store_true", help='True: mask out conditions')
|
29 |
+
|
30 |
+
self.initialized = True
|
31 |
+
|
32 |
+
def parse(self):
|
33 |
+
if not self.initialized:
|
34 |
+
self.initialize()
|
35 |
+
|
36 |
+
self.opt = self.parser.parse_args()
|
37 |
+
|
38 |
+
self.opt.is_train = self.is_train
|
39 |
+
|
40 |
+
if self.opt.gpu_id != -1:
|
41 |
+
# self.opt.gpu_id = int(self.opt.gpu_id)
|
42 |
+
torch.cuda.set_device(self.opt.gpu_id)
|
43 |
+
|
44 |
+
args = vars(self.opt)
|
45 |
+
|
46 |
+
print('------------ Options -------------')
|
47 |
+
for k, v in sorted(args.items()):
|
48 |
+
print('%s: %s' % (str(k), str(v)))
|
49 |
+
print('-------------- End ----------------')
|
50 |
+
if self.is_train:
|
51 |
+
# save to the disk
|
52 |
+
expr_dir = os.path.join(self.opt.checkpoints_dir, self.opt.dataset_name, self.opt.name)
|
53 |
+
if not os.path.exists(expr_dir):
|
54 |
+
os.makedirs(expr_dir)
|
55 |
+
file_name = os.path.join(expr_dir, 'opt.txt')
|
56 |
+
with open(file_name, 'wt') as opt_file:
|
57 |
+
opt_file.write('------------ Options -------------\n')
|
58 |
+
for k, v in sorted(args.items()):
|
59 |
+
opt_file.write('%s: %s\n' % (str(k), str(v)))
|
60 |
+
opt_file.write('-------------- End ----------------\n')
|
61 |
+
return self.opt
|
options/eval_option.py
ADDED
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from options.base_option import BaseOptions
|
2 |
+
|
3 |
+
class EvalT2MOptions(BaseOptions):
|
4 |
+
def initialize(self):
|
5 |
+
BaseOptions.initialize(self)
|
6 |
+
self.parser.add_argument('--which_epoch', type=str, default="latest", help='Checkpoint you want to use, {latest, net_best_fid, etc}')
|
7 |
+
self.parser.add_argument('--batch_size', type=int, default=32, help='Batch size')
|
8 |
+
|
9 |
+
self.parser.add_argument('--ext', type=str, default='text2motion', help='Extension of the result file or folder')
|
10 |
+
self.parser.add_argument("--num_batch", default=2, type=int,
|
11 |
+
help="Number of batch for generation")
|
12 |
+
self.parser.add_argument("--repeat_times", default=1, type=int,
|
13 |
+
help="Number of repetitions, per sample text prompt")
|
14 |
+
self.parser.add_argument("--cond_scale", default=4, type=float,
|
15 |
+
help="For classifier-free sampling - specifies the s parameter, as defined in the paper.")
|
16 |
+
self.parser.add_argument("--temperature", default=1., type=float,
|
17 |
+
help="Sampling Temperature.")
|
18 |
+
self.parser.add_argument("--topkr", default=0.9, type=float,
|
19 |
+
help="Filter out percentil low prop entries.")
|
20 |
+
self.parser.add_argument("--time_steps", default=18, type=int,
|
21 |
+
help="Mask Generate steps.")
|
22 |
+
self.parser.add_argument("--seed", default=10107, type=int)
|
23 |
+
|
24 |
+
self.parser.add_argument('--gumbel_sample', action="store_true", help='True: gumbel sampling, False: categorical sampling.')
|
25 |
+
self.parser.add_argument('--use_res_model', action="store_true", help='Whether to use residual transformer.')
|
26 |
+
# self.parser.add_argument('--est_length', action="store_true", help='Training iterations')
|
27 |
+
|
28 |
+
self.parser.add_argument('--res_name', type=str, default='tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw', help='Model name of residual transformer')
|
29 |
+
self.parser.add_argument('--text_path', type=str, default="", help='Text prompt file')
|
30 |
+
|
31 |
+
|
32 |
+
self.parser.add_argument('-msec', '--mask_edit_section', nargs='*', type=str, help='Indicate sections for editing, use comma to separate the start and end of a section'
|
33 |
+
'type int will specify the token frame, type float will specify the ratio of seq_len')
|
34 |
+
self.parser.add_argument('--text_prompt', default='', type=str, help="A text prompt to be generated. If empty, will take text prompts from dataset.")
|
35 |
+
self.parser.add_argument('--source_motion', default='example_data/000612.npy', type=str, help="Source motion path for editing. (new_joint_vecs format .npy file)")
|
36 |
+
self.parser.add_argument("--motion_length", default=0, type=int,
|
37 |
+
help="Motion length for generation, only applicable with single text prompt.")
|
38 |
+
self.is_train = False
|
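Editor's note: a minimal sketch of consuming EvalT2MOptions programmatically (on the command line the same flags are passed directly; the prompt and flag values below are illustrative, not defaults of this commit):

# Illustrative sketch only: parse generation options without touching the GPU (gpu_id = -1).
import sys
from options.eval_option import EvalT2MOptions

sys.argv = ['gen_t2m.py', '--gpu_id', '-1', '--ext', 'demo',
            '--text_prompt', 'a person walks in a circle.', '--motion_length', '120']
opt = EvalT2MOptions().parse()        # is_train is False, so no opt.txt is written
assert opt.time_steps == 18 and opt.cond_scale == 4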
options/train_option.py
ADDED
@@ -0,0 +1,64 @@
from options.base_option import BaseOptions
import argparse

class TrainT2MOptions(BaseOptions):
    def initialize(self):
        BaseOptions.initialize(self)
        self.parser.add_argument('--batch_size', type=int, default=64, help='Batch size')
        self.parser.add_argument('--max_epoch', type=int, default=500, help='Maximum number of epochs for training')
        # self.parser.add_argument('--max_iters', type=int, default=150_000, help='Training iterations')

        '''LR scheduler'''
        self.parser.add_argument('--lr', type=float, default=2e-4, help='Learning rate')
        self.parser.add_argument('--gamma', type=float, default=0.1, help='Learning rate schedule factor')
        self.parser.add_argument('--milestones', default=[50_000], nargs="+", type=int,
                                 help="Learning rate schedule (iterations)")
        self.parser.add_argument('--warm_up_iter', default=2000, type=int, help='Number of total iterations for warmup')

        '''Condition'''
        self.parser.add_argument('--cond_drop_prob', type=float, default=0.1, help='Drop ratio of condition, for classifier-free guidance')
        self.parser.add_argument("--seed", default=3407, type=int, help="Seed")

        self.parser.add_argument('--is_continue', action="store_true", help='Is this trial continuing a previous state?')
        self.parser.add_argument('--gumbel_sample', action="store_true", help='Strategy for token sampling, True: Gumbel sampling, False: categorical sampling')
        self.parser.add_argument('--share_weight', action="store_true", help='Whether to share weights for projection/embedding, for residual transformer.')

        self.parser.add_argument('--log_every', type=int, default=50, help='Frequency of printing training progress (iterations)')
        # self.parser.add_argument('--save_every_e', type=int, default=100, help='Frequency of printing training progress')
        self.parser.add_argument('--eval_every_e', type=int, default=10, help='Frequency of animating eval results (epochs)')
        self.parser.add_argument('--save_latest', type=int, default=500, help='Frequency of saving checkpoint (iterations)')


        self.is_train = True


class TrainLenEstOptions():
    def __init__(self):
        self.parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
        self.parser.add_argument('--name', type=str, default="test", help='Name of this trial')
        self.parser.add_argument("--gpu_id", type=int, default=-1, help='GPU id')

        self.parser.add_argument('--dataset_name', type=str, default='t2m', help='Dataset name')
        self.parser.add_argument('--checkpoints_dir', type=str, default='./checkpoints', help='models are saved here')

        self.parser.add_argument('--batch_size', type=int, default=64, help='Batch size')

        self.parser.add_argument("--unit_length", type=int, default=4, help="Downscale ratio of VQ")
        self.parser.add_argument("--max_text_len", type=int, default=20, help="Maximum length of text")

        self.parser.add_argument('--max_epoch', type=int, default=300, help='Maximum number of epochs for training')

        self.parser.add_argument('--lr', type=float, default=1e-4, help='Learning rate')

        self.parser.add_argument('--is_continue', action="store_true", help='Is this trial continuing a previous state?')

        self.parser.add_argument('--log_every', type=int, default=50, help='Frequency of printing training progress (iterations)')
        self.parser.add_argument('--save_every_e', type=int, default=5, help='Frequency of saving models (epochs)')
        self.parser.add_argument('--eval_every_e', type=int, default=3, help='Frequency of evaluation (epochs)')
        self.parser.add_argument('--save_latest', type=int, default=500, help='Frequency of saving the latest checkpoint (iterations)')

    def parse(self):
        self.opt = self.parser.parse_args()
        self.opt.is_train = True
        # args = vars(self.opt)
        return self.opt
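Editor's note: the learning-rate flags above (--lr, --milestones, --gamma, --warm_up_iter) describe a warmup-plus-milestone schedule. The sketch below shows how such options are typically wired to an optimizer; it is a minimal illustration, not the trainer code from this commit, and the stand-in module is arbitrary.

# Illustrative only: linear warmup followed by a milestone decay, driven by the option defaults above.
import torch

model = torch.nn.Linear(263, 512)                      # stand-in module, not the real transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50_000], gamma=0.1)

def warmup_lr(nb_iter, warm_up_iter=2000, base_lr=2e-4):
    # Ramp the learning rate linearly to base_lr over the first warm_up_iter updates.
    lr = base_lr * min(nb_iter + 1, warm_up_iter) / warm_up_iter
    for group in optimizer.param_groups:
        group['lr'] = lr
    return lr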
options/vq_option.py
ADDED
@@ -0,0 +1,89 @@
import argparse
import os
import torch

def arg_parse(is_train=False):
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    ## dataloader
    parser.add_argument('--dataset_name', type=str, default='humanml3d', help='dataset name')
    parser.add_argument('--batch_size', default=256, type=int, help='batch size')
    parser.add_argument('--window_size', type=int, default=64, help='training motion length')
    parser.add_argument("--gpu_id", type=int, default=0, help='GPU id')

    ## optimization
    parser.add_argument('--max_epoch', default=50, type=int, help='number of total epochs to run')
    # parser.add_argument('--total_iter', default=None, type=int, help='number of total iterations to run')
    parser.add_argument('--warm_up_iter', default=2000, type=int, help='number of total iterations for warmup')
    parser.add_argument('--lr', default=2e-4, type=float, help='max learning rate')
    parser.add_argument('--milestones', default=[150000, 250000], nargs="+", type=int, help="learning rate schedule (iterations)")
    parser.add_argument('--gamma', default=0.1, type=float, help="learning rate decay")

    parser.add_argument('--weight_decay', default=0.0, type=float, help='weight decay')
    parser.add_argument("--commit", type=float, default=0.02, help="hyper-parameter for the commitment loss")
    parser.add_argument('--loss_vel', type=float, default=0.5, help='hyper-parameter for the velocity loss')
    parser.add_argument('--recons_loss', type=str, default='l1_smooth', help='reconstruction loss')

    ## vqvae arch
    parser.add_argument("--code_dim", type=int, default=512, help="embedding dimension")
    parser.add_argument("--nb_code", type=int, default=512, help="number of codebook entries")
    parser.add_argument("--mu", type=float, default=0.99, help="exponential moving average to update the codebook")
    parser.add_argument("--down_t", type=int, default=2, help="downsampling rate")
    parser.add_argument("--stride_t", type=int, default=2, help="stride size")
    parser.add_argument("--width", type=int, default=512, help="width of the network")
    parser.add_argument("--depth", type=int, default=3, help="number of resblocks at each resolution")
    parser.add_argument("--dilation_growth_rate", type=int, default=3, help="dilation growth rate")
    parser.add_argument("--output_emb_width", type=int, default=512, help="output embedding width")
    parser.add_argument('--vq_act', type=str, default='relu', choices=['relu', 'silu', 'gelu'],
                        help='activation function')
    parser.add_argument('--vq_norm', type=str, default=None, help='normalization layer type')

    parser.add_argument('--num_quantizers', type=int, default=3, help='number of residual quantization layers')
    parser.add_argument('--shared_codebook', action="store_true")
    parser.add_argument('--quantize_dropout_prob', type=float, default=0.2, help='probability of quantizer dropout')
    # parser.add_argument('--use_vq_prob', type=float, default=0.8, help='quantize_dropout_prob')

    parser.add_argument('--ext', type=str, default='default', help='extension of the result folder')


    ## other
    parser.add_argument('--name', type=str, default="test", help='Name of this trial')
    parser.add_argument('--is_continue', action="store_true", help='Is this trial continuing a previous state?')
    parser.add_argument('--checkpoints_dir', type=str, default='./checkpoints', help='models are saved here')
    parser.add_argument('--log_every', default=10, type=int, help='iter log frequency')
    parser.add_argument('--save_latest', default=500, type=int, help='iter save latest model frequency')
    parser.add_argument('--save_every_e', default=2, type=int, help='save model every n epochs')
    parser.add_argument('--eval_every_e', default=1, type=int, help='save eval results every n epochs')
    # parser.add_argument('--early_stop_e', default=5, type=int, help='early stopping epoch')
    parser.add_argument('--feat_bias', type=float, default=5, help='feature scaling bias')

    parser.add_argument('--which_epoch', type=str, default="all", help='which checkpoint to use')

    ## For Res Predictor only
    parser.add_argument('--vq_name', type=str, default="rvq_nq6_dc512_nc512_noshare_qdp0.2", help='name of the VQ model')
    parser.add_argument('--n_res', type=int, default=2, help='number of residual layers')
    parser.add_argument('--do_vq_res', action="store_true")
    parser.add_argument("--seed", default=3407, type=int)

    opt = parser.parse_args()
    torch.cuda.set_device(opt.gpu_id)

    args = vars(opt)

    print('------------ Options -------------')
    for k, v in sorted(args.items()):
        print('%s: %s' % (str(k), str(v)))
    print('-------------- End ----------------')
    opt.is_train = is_train
    if is_train:
        # save to the disk
        expr_dir = os.path.join(opt.checkpoints_dir, opt.dataset_name, opt.name)
        if not os.path.exists(expr_dir):
            os.makedirs(expr_dir)
        file_name = os.path.join(expr_dir, 'opt.txt')
        with open(file_name, 'wt') as opt_file:
            opt_file.write('------------ Options -------------\n')
            for k, v in sorted(args.items()):
                opt_file.write('%s: %s\n' % (str(k), str(v)))
            opt_file.write('-------------- End ----------------\n')
    return opt
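Editor's note: a quick, hedged sketch of how arg_parse() is typically consumed by a VQ training entry point. The script name and flag values are illustrative; since arg_parse() calls torch.cuda.set_device itself, it assumes a visible GPU.

# Illustrative sketch only: arg_parse() returns a flat namespace and pins the CUDA device at call time.
import sys
from options.vq_option import arg_parse

sys.argv = ['train_vq.py',                      # hypothetical entry-point name, for illustration
            '--name', 'rvq_demo',
            '--num_quantizers', '6',
            '--quantize_dropout_prob', '0.2']
opt = arg_parse(is_train=True)                  # also writes checkpoints/humanml3d/rvq_demo/opt.txt
print(opt.nb_code, opt.code_dim)                # defaults: 512 512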
prepare/.DS_Store
ADDED
Binary file (6.15 kB)
prepare/download_evaluator.sh
ADDED
@@ -0,0 +1,24 @@
cd checkpoints

cd t2m
echo -e "Downloading evaluation models for HumanML3D dataset"
gdown --fuzzy https://drive.google.com/file/d/1oLhSH7zTlYkQdUWPv3-v4opigB7pXkFk/view?usp=sharing
echo -e "Unzipping humanml3d_evaluator.zip"
unzip humanml3d_evaluator.zip

echo -e "Cleaning humanml3d_evaluator.zip"
rm humanml3d_evaluator.zip

cd ../kit/
echo -e "Downloading pretrained models for KIT-ML dataset"
gdown --fuzzy https://drive.google.com/file/d/115n1ijntyKDDIZZEuA_aBgffyplNE5az/view?usp=sharing

echo -e "Unzipping kit_evaluator.zip"
unzip kit_evaluator.zip

echo -e "Cleaning kit_evaluator.zip"
rm kit_evaluator.zip

cd ../../

echo -e "Downloading done!"
prepare/download_glove.sh
ADDED
@@ -0,0 +1,9 @@
echo -e "Downloading glove (used by the evaluators, not by MoMask itself)"
gdown --fuzzy https://drive.google.com/file/d/1cmXKUT31pqd7_XpJAiWEo1K81TMYHA5n/view?usp=sharing
rm -rf glove

unzip glove.zip
echo -e "Cleaning\n"
rm glove.zip

echo -e "Downloading done!"
prepare/download_models.sh
ADDED
@@ -0,0 +1,31 @@
rm -rf checkpoints
mkdir checkpoints
cd checkpoints
mkdir t2m

cd t2m
echo -e "Downloading pretrained models for HumanML3D dataset"
gdown --fuzzy https://drive.google.com/file/d/1dtKP2xBk-UjG9o16MVfBJDmGNSI56Dch/view?usp=sharing

echo -e "Unzipping humanml3d_models.zip"
unzip humanml3d_models.zip

echo -e "Cleaning humanml3d_models.zip"
rm humanml3d_models.zip

cd ../
mkdir kit
cd kit

echo -e "Downloading pretrained models for KIT-ML dataset"
gdown --fuzzy https://drive.google.com/file/d/1MNMdUdn5QoO8UW1iwTcZ0QNaLSH4A6G9/view?usp=sharing

echo -e "Unzipping kit_models.zip"
unzip kit_models.zip

echo -e "Cleaning kit_models.zip"
rm kit_models.zip

cd ../../

echo -e "Downloading done!"
prepare/download_models_demo.sh
ADDED
@@ -0,0 +1,10 @@
rm -rf checkpoints
mkdir checkpoints
cd checkpoints
mkdir t2m
cd t2m
echo -e "Downloading pretrained models for HumanML3D dataset"
gdown --fuzzy https://drive.google.com/file/d/1dtKP2xBk-UjG9o16MVfBJDmGNSI56Dch/view?usp=sharing
unzip humanml3d_models.zip
rm humanml3d_models.zip
cd ../../
requirements.txt
ADDED
@@ -0,0 +1,140 @@
absl-py @ file:///home/conda/feedstock_root/build_artifacts/absl-py_1673535674859/work
aiofiles==23.2.1
aiohttp @ file:///croot/aiohttp_1670009560265/work
aiosignal @ file:///tmp/build/80754af9/aiosignal_1637843061372/work
altair==5.0.1
anyio==3.7.1
async-timeout @ file:///opt/conda/conda-bld/async-timeout_1664876359750/work
asynctest==0.13.0
attrs @ file:///croot/attrs_1668696182826/work
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1649463573192/work
blinker==1.4
blis==0.7.8
blobfile==2.0.2
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1648854164153/work
cachetools==5.3.1
catalogue @ file:///home/conda/feedstock_root/build_artifacts/catalogue_1661366519934/work
certifi @ file:///croot/certifi_1671487769961/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1661170624537/work
chumpy==0.70
click==8.1.3
clip @ git+https://github.com/openai/CLIP.git@a9b1bf5920416aaeaec965c25dd9e8f98c864f16
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1655412516417/work
confection==0.0.2
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1636040646098/work
cycler @ file:///tmp/build/80754af9/cycler_1637851556182/work
cymem @ file:///home/conda/feedstock_root/build_artifacts/cymem_1649412169067/work
dataclasses @ file:///home/conda/feedstock_root/build_artifacts/dataclasses_1628958434797/work
einops==0.6.1
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl
exceptiongroup==1.2.0
fastapi==0.103.2
ffmpy==0.3.1
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1660129891014/work
frozenlist @ file:///croot/frozenlist_1670004507010/work
fsspec==2023.1.0
ftfy==6.1.1
gdown==4.7.1
google-auth==2.19.1
google-auth-oauthlib==0.4.6
gradio==3.34.0
gradio_client==0.2.6
grpcio==1.54.2
h11==0.14.0
h5py @ file:///tmp/abs_4aewd3wzey/croots/recipe/h5py_1659091371897/work
httpcore==0.17.3
httpx==0.24.1
huggingface-hub==0.16.4
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
importlib-metadata==5.0.0
importlib-resources==5.12.0
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1654302431367/work
joblib @ file:///tmp/build/80754af9/joblib_1635411271373/work
jsonschema==4.17.3
kiwisolver @ file:///opt/conda/conda-bld/kiwisolver_1653292039266/work
langcodes @ file:///home/conda/feedstock_root/build_artifacts/langcodes_1636741340529/work
linkify-it-py==2.0.2
loralib==0.1.1
lxml==4.9.1
Markdown @ file:///home/conda/feedstock_root/build_artifacts/markdown_1679584000376/work
markdown-it-py==2.2.0
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1648737551960/work
matplotlib==3.1.3
mdit-py-plugins==0.3.3
mdurl==0.1.2
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
multidict @ file:///croot/multidict_1665674239670/work
murmurhash==1.0.8
numpy @ file:///opt/conda/conda-bld/numpy_and_numpy_base_1653915516269/work
oauthlib==3.2.2
orjson==3.9.7
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1637239678211/work
pandas==1.3.5
pathy @ file:///home/conda/feedstock_root/build_artifacts/pathy_1656568808184/work
Pillow==9.2.0
pkgutil_resolve_name==1.3.10
preshed==3.0.7
protobuf==3.20.3
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
pycryptodomex==3.15.0
pydantic @ file:///home/conda/feedstock_root/build_artifacts/pydantic_1636021129189/work
pydub==0.25.1
Pygments==2.17.2
PyJWT @ file:///opt/conda/conda-bld/pyjwt_1657544592787/work
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1663846997386/work
pyparsing @ file:///opt/conda/conda-bld/pyparsing_1661452539315/work
pyrsistent==0.19.3
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1648857264451/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0
regex==2022.9.13
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1661872987712/work
requests-oauthlib==1.3.1
rsa==4.9
scikit-learn @ file:///tmp/build/80754af9/scikit-learn_1642601761909/work
scipy @ file:///opt/conda/conda-bld/scipy_1661390393401/work
semantic-version==2.10.0
shellingham @ file:///home/conda/feedstock_root/build_artifacts/shellingham_1659638615822/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
smart-open @ file:///home/conda/feedstock_root/build_artifacts/smart_open_1630238320325/work
smplx==0.1.28
sniffio==1.3.0
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1658207591808/work
spacy @ file:///opt/conda/conda-bld/spacy_1656601313568/work
spacy-legacy @ file:///home/conda/feedstock_root/build_artifacts/spacy-legacy_1660748275723/work
spacy-loggers @ file:///home/conda/feedstock_root/build_artifacts/spacy-loggers_1661365735520/work
srsly==2.4.4
starlette==0.27.0
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit @ file:///home/builder/tkoch/workspace/tensorflow/tensorboard-plugin-wit_1658918494740/work/tensorboard_plugin_wit-1.8.1-py3-none-any.whl
tensorboardX==2.6
thinc==8.0.17
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
toolz==0.12.0
torch==1.7.1
torch-tb-profiler==0.4.1
torchaudio==0.7.0a0+a853dff
torchvision==0.8.2
tornado @ file:///opt/conda/conda-bld/tornado_1662061693373/work
tqdm @ file:///opt/conda/conda-bld/tqdm_1664392687731/work
trimesh @ file:///home/conda/feedstock_root/build_artifacts/trimesh_1664841281434/work
typer @ file:///home/conda/feedstock_root/build_artifacts/typer_1657029164904/work
typing_extensions==4.7.1
uc-micro-py==1.0.2
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1678635778344/work
uvicorn==0.22.0
vector-quantize-pytorch==1.6.30
wasabi @ file:///home/conda/feedstock_root/build_artifacts/wasabi_1668249950899/work
wcwidth==0.2.5
websockets==11.0.3
Werkzeug @ file:///home/conda/feedstock_root/build_artifacts/werkzeug_1676411946679/work
yarl @ file:///opt/conda/conda-bld/yarl_1661437085904/work
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1659400682470/work
train_res_transformer.py
ADDED
@@ -0,0 +1,171 @@
import os
import torch
import numpy as np

from torch.utils.data import DataLoader
from os.path import join as pjoin

from models.mask_transformer.transformer import ResidualTransformer
from models.mask_transformer.transformer_trainer import ResidualTransformerTrainer
from models.vq.model import RVQVAE

from options.train_option import TrainT2MOptions

from utils.plot_script import plot_3d_motion
from utils.motion_process import recover_from_ric
from utils.get_opt import get_opt
from utils.fixseed import fixseed
from utils.paramUtil import t2m_kinematic_chain, kit_kinematic_chain

from data.t2m_dataset import Text2MotionDataset
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader
from models.t2m_eval_wrapper import EvaluatorModelWrapper


def plot_t2m(data, save_dir, captions, m_lengths):
    data = train_dataset.inv_transform(data)

    # print(ep_curves.shape)
    for i, (caption, joint_data) in enumerate(zip(captions, data)):
        joint_data = joint_data[:m_lengths[i]]
        joint = recover_from_ric(torch.from_numpy(joint_data).float(), opt.joints_num).numpy()
        save_path = pjoin(save_dir, '%02d.mp4' % i)
        # print(joint.shape)
        plot_3d_motion(save_path, kinematic_chain, joint, title=caption, fps=20)

def load_vq_model():
    opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'opt.txt')
    vq_opt = get_opt(opt_path, opt.device)
    vq_model = RVQVAE(vq_opt,
                      dim_pose,
                      vq_opt.nb_code,
                      vq_opt.code_dim,
                      vq_opt.output_emb_width,
                      vq_opt.down_t,
                      vq_opt.stride_t,
                      vq_opt.width,
                      vq_opt.depth,
                      vq_opt.dilation_growth_rate,
                      vq_opt.vq_act,
                      vq_opt.vq_norm)
    ckpt = torch.load(pjoin(vq_opt.checkpoints_dir, vq_opt.dataset_name, vq_opt.name, 'model', 'net_best_fid.tar'),
                      map_location=opt.device)
    model_key = 'vq_model' if 'vq_model' in ckpt else 'net'
    vq_model.load_state_dict(ckpt[model_key])
    print(f'Loading VQ Model {opt.vq_name}')
    vq_model.to(opt.device)
    return vq_model, vq_opt

if __name__ == '__main__':
    parser = TrainT2MOptions()
    opt = parser.parse()
    fixseed(opt.seed)

    opt.device = torch.device("cpu" if opt.gpu_id == -1 else "cuda:" + str(opt.gpu_id))
    torch.autograd.set_detect_anomaly(True)

    opt.save_root = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.name)
    opt.model_dir = pjoin(opt.save_root, 'model')
    # opt.meta_dir = pjoin(opt.save_root, 'meta')
    opt.eval_dir = pjoin(opt.save_root, 'animation')
    opt.log_dir = pjoin('./log/res/', opt.dataset_name, opt.name)

    os.makedirs(opt.model_dir, exist_ok=True)
    # os.makedirs(opt.meta_dir, exist_ok=True)
    os.makedirs(opt.eval_dir, exist_ok=True)
    os.makedirs(opt.log_dir, exist_ok=True)

    if opt.dataset_name == 't2m':
        opt.data_root = './dataset/HumanML3D'
        opt.motion_dir = pjoin(opt.data_root, 'new_joint_vecs')
        opt.joints_num = 22
        opt.max_motion_len = 55
        dim_pose = 263
        radius = 4
        fps = 20
        kinematic_chain = t2m_kinematic_chain
        dataset_opt_path = './checkpoints/t2m/Comp_v6_KLD005/opt.txt'

    elif opt.dataset_name == 'kit': #TODO
        opt.data_root = './dataset/KIT-ML'
        opt.motion_dir = pjoin(opt.data_root, 'new_joint_vecs')
        opt.joints_num = 21
        radius = 240 * 8
        fps = 12.5
        dim_pose = 251
        opt.max_motion_len = 55
        kinematic_chain = kit_kinematic_chain
        dataset_opt_path = './checkpoints/kit/Comp_v6_KLD005/opt.txt'

    else:
        raise KeyError('Dataset Does Not Exist')

    opt.text_dir = pjoin(opt.data_root, 'texts')

    vq_model, vq_opt = load_vq_model()

    clip_version = 'ViT-B/32'

    opt.num_tokens = vq_opt.nb_code
    opt.num_quantizers = vq_opt.num_quantizers

    # if opt.is_v2:
    res_transformer = ResidualTransformer(code_dim=vq_opt.code_dim,
                                          cond_mode='text',
                                          latent_dim=opt.latent_dim,
                                          ff_size=opt.ff_size,
                                          num_layers=opt.n_layers,
                                          num_heads=opt.n_heads,
                                          dropout=opt.dropout,
                                          clip_dim=512,
                                          shared_codebook=vq_opt.shared_codebook,
                                          cond_drop_prob=opt.cond_drop_prob,
                                          # codebook=vq_model.quantizer.codebooks[0] if opt.fix_token_emb else None,
                                          share_weight=opt.share_weight,
                                          clip_version=clip_version,
                                          opt=opt)
    # else:
    #     res_transformer = ResidualTransformer(code_dim=vq_opt.code_dim,
    #                                           cond_mode='text',
    #                                           latent_dim=opt.latent_dim,
    #                                           ff_size=opt.ff_size,
    #                                           num_layers=opt.n_layers,
    #                                           num_heads=opt.n_heads,
    #                                           dropout=opt.dropout,
    #                                           clip_dim=512,
    #                                           shared_codebook=vq_opt.shared_codebook,
    #                                           cond_drop_prob=opt.cond_drop_prob,
    #                                           # codebook=vq_model.quantizer.codebooks[0] if opt.fix_token_emb else None,
    #                                           clip_version=clip_version,
    #                                           opt=opt)


    all_params = 0
    pc_transformer = sum(param.numel() for param in res_transformer.parameters_wo_clip())

    print(res_transformer)
    # print("Total parameters of t2m_transformer net: {:.2f}M".format(pc_transformer / 1000_000))
    all_params += pc_transformer

    print('Total parameters of all models: {:.2f}M'.format(all_params / 1000_000))

    mean = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'meta', 'mean.npy'))
    std = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'meta', 'std.npy'))

    train_split_file = pjoin(opt.data_root, 'train.txt')
    val_split_file = pjoin(opt.data_root, 'val.txt')

    train_dataset = Text2MotionDataset(opt, mean, std, train_split_file)
    val_dataset = Text2MotionDataset(opt, mean, std, val_split_file)

    train_loader = DataLoader(train_dataset, batch_size=opt.batch_size, num_workers=4, shuffle=True, drop_last=True)
    val_loader = DataLoader(val_dataset, batch_size=opt.batch_size, num_workers=4, shuffle=True, drop_last=True)

    eval_val_loader, _ = get_dataset_motion_loader(dataset_opt_path, 32, 'val', device=opt.device)

    wrapper_opt = get_opt(dataset_opt_path, torch.device('cuda'))
    eval_wrapper = EvaluatorModelWrapper(wrapper_opt)

    trainer = ResidualTransformerTrainer(opt, res_transformer, vq_model)

    trainer.train(train_loader, val_loader, eval_val_loader, eval_wrapper=eval_wrapper, plot_eval=plot_t2m)
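Editor's note: the plot_t2m callback above only works because the dataset's Z-score normalisation is undone before recover_from_ric converts the 263-dim HumanML3D features back to 22 joint positions. The following is a minimal sketch of that round-trip, assuming mean/std arrays like those loaded from the VQ model's meta folder; <vq_name> is a placeholder and the random motion is a stand-in.

# Sketch of the de-normalisation + joint-recovery round trip used by plot_t2m (illustrative only).
import numpy as np
import torch
from utils.motion_process import recover_from_ric

mean = np.load('checkpoints/t2m/<vq_name>/meta/mean.npy')    # <vq_name> is a placeholder
std = np.load('checkpoints/t2m/<vq_name>/meta/std.npy')

motion = np.random.randn(196, 263).astype(np.float32)        # stand-in for a normalised feature sample
denormed = motion * std + mean                                # what the dataset's inv_transform performs
joints = recover_from_ric(torch.from_numpy(denormed).float(), 22).numpy()   # shape (196, 22, 3)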
train_t2m_transformer.py
ADDED
@@ -0,0 +1,153 @@
import os
import torch
import numpy as np

from torch.utils.data import DataLoader
from os.path import join as pjoin

from models.mask_transformer.transformer import MaskTransformer
from models.mask_transformer.transformer_trainer import MaskTransformerTrainer
from models.vq.model import RVQVAE

from options.train_option import TrainT2MOptions

from utils.plot_script import plot_3d_motion
from utils.motion_process import recover_from_ric
from utils.get_opt import get_opt
from utils.fixseed import fixseed
from utils.paramUtil import t2m_kinematic_chain, kit_kinematic_chain

from data.t2m_dataset import Text2MotionDataset
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader
from models.t2m_eval_wrapper import EvaluatorModelWrapper


def plot_t2m(data, save_dir, captions, m_lengths):
    data = train_dataset.inv_transform(data)

    # print(ep_curves.shape)
    for i, (caption, joint_data) in enumerate(zip(captions, data)):
        joint_data = joint_data[:m_lengths[i]]
        joint = recover_from_ric(torch.from_numpy(joint_data).float(), opt.joints_num).numpy()
        save_path = pjoin(save_dir, '%02d.mp4' % i)
        # print(joint.shape)
        plot_3d_motion(save_path, kinematic_chain, joint, title=caption, fps=20)

def load_vq_model():
    opt_path = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'opt.txt')
    vq_opt = get_opt(opt_path, opt.device)
    vq_model = RVQVAE(vq_opt,
                      dim_pose,
                      vq_opt.nb_code,
                      vq_opt.code_dim,
                      vq_opt.output_emb_width,
                      vq_opt.down_t,
                      vq_opt.stride_t,
                      vq_opt.width,
                      vq_opt.depth,
                      vq_opt.dilation_growth_rate,
                      vq_opt.vq_act,
                      vq_opt.vq_norm)
    ckpt = torch.load(pjoin(vq_opt.checkpoints_dir, vq_opt.dataset_name, vq_opt.name, 'model', 'net_best_fid.tar'),
                      map_location='cpu')
    model_key = 'vq_model' if 'vq_model' in ckpt else 'net'
    vq_model.load_state_dict(ckpt[model_key])
    print(f'Loading VQ Model {opt.vq_name}')
    return vq_model, vq_opt

if __name__ == '__main__':
    parser = TrainT2MOptions()
    opt = parser.parse()
    fixseed(opt.seed)

    opt.device = torch.device("cpu" if opt.gpu_id == -1 else "cuda:" + str(opt.gpu_id))
    torch.autograd.set_detect_anomaly(True)

    opt.save_root = pjoin(opt.checkpoints_dir, opt.dataset_name, opt.name)
    opt.model_dir = pjoin(opt.save_root, 'model')
    # opt.meta_dir = pjoin(opt.save_root, 'meta')
    opt.eval_dir = pjoin(opt.save_root, 'animation')
    opt.log_dir = pjoin('./log/t2m/', opt.dataset_name, opt.name)

    os.makedirs(opt.model_dir, exist_ok=True)
    # os.makedirs(opt.meta_dir, exist_ok=True)
    os.makedirs(opt.eval_dir, exist_ok=True)
    os.makedirs(opt.log_dir, exist_ok=True)

    if opt.dataset_name == 't2m':
        opt.data_root = './dataset/HumanML3D'
        opt.motion_dir = pjoin(opt.data_root, 'new_joint_vecs')
        opt.joints_num = 22
        opt.max_motion_len = 55
        dim_pose = 263
        radius = 4
        fps = 20
        kinematic_chain = t2m_kinematic_chain
        dataset_opt_path = './checkpoints/t2m/Comp_v6_KLD005/opt.txt'

    elif opt.dataset_name == 'kit': #TODO
        opt.data_root = './dataset/KIT-ML'
        opt.motion_dir = pjoin(opt.data_root, 'new_joint_vecs')
        opt.joints_num = 21
        radius = 240 * 8
        fps = 12.5
        dim_pose = 251
        opt.max_motion_len = 55
        kinematic_chain = kit_kinematic_chain
        dataset_opt_path = './checkpoints/kit/Comp_v6_KLD005/opt.txt'

    else:
        raise KeyError('Dataset Does Not Exist')

    opt.text_dir = pjoin(opt.data_root, 'texts')

    vq_model, vq_opt = load_vq_model()

    clip_version = 'ViT-B/32'

    opt.num_tokens = vq_opt.nb_code

    t2m_transformer = MaskTransformer(code_dim=vq_opt.code_dim,
                                      cond_mode='text',
                                      latent_dim=opt.latent_dim,
                                      ff_size=opt.ff_size,
                                      num_layers=opt.n_layers,
                                      num_heads=opt.n_heads,
                                      dropout=opt.dropout,
                                      clip_dim=512,
                                      cond_drop_prob=opt.cond_drop_prob,
                                      clip_version=clip_version,
                                      opt=opt)

    # if opt.fix_token_emb:
    #     t2m_transformer.load_and_freeze_token_emb(vq_model.quantizer.codebooks[0])

    all_params = 0
    pc_transformer = sum(param.numel() for param in t2m_transformer.parameters_wo_clip())

    # print(t2m_transformer)
    # print("Total parameters of t2m_transformer net: {:.2f}M".format(pc_transformer / 1000_000))
    all_params += pc_transformer

    print('Total parameters of all models: {:.2f}M'.format(all_params / 1000_000))

    mean = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'meta', 'mean.npy'))
    std = np.load(pjoin(opt.checkpoints_dir, opt.dataset_name, opt.vq_name, 'meta', 'std.npy'))

    train_split_file = pjoin(opt.data_root, 'train.txt')
    val_split_file = pjoin(opt.data_root, 'val.txt')

    train_dataset = Text2MotionDataset(opt, mean, std, train_split_file)
    val_dataset = Text2MotionDataset(opt, mean, std, val_split_file)

    train_loader = DataLoader(train_dataset, batch_size=opt.batch_size, num_workers=4, shuffle=True, drop_last=True)
    val_loader = DataLoader(val_dataset, batch_size=opt.batch_size, num_workers=4, shuffle=True, drop_last=True)

    eval_val_loader, _ = get_dataset_motion_loader(dataset_opt_path, 32, 'val', device=opt.device)

    wrapper_opt = get_opt(dataset_opt_path, torch.device('cuda'))
    eval_wrapper = EvaluatorModelWrapper(wrapper_opt)

    trainer = MaskTransformerTrainer(opt, t2m_transformer, vq_model)

    trainer.train(train_loader, val_loader, eval_val_loader, eval_wrapper=eval_wrapper, plot_eval=plot_t2m)
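Editor's note: aside from the residual-specific arguments (share_weight, shared_codebook, num_quantizers) and the log directory, this script mirrors train_res_transformer.py: both load the frozen RVQ-VAE, build a text-conditioned transformer over its token space, and hand everything to a trainer. The sketch below is a condensed, hedged view of that shared structure, not code from this commit.

# Condensed view of the shared structure of the two training entry points (illustrative only;
# the residual variant passes extra codebook-related arguments not shown here).
def build_and_train(TransformerCls, TrainerCls, opt, vq_opt, vq_model, loaders, eval_wrapper, plot_fn):
    transformer = TransformerCls(code_dim=vq_opt.code_dim, cond_mode='text',
                                 latent_dim=opt.latent_dim, ff_size=opt.ff_size,
                                 num_layers=opt.n_layers, num_heads=opt.n_heads,
                                 dropout=opt.dropout, clip_dim=512,
                                 cond_drop_prob=opt.cond_drop_prob,
                                 clip_version='ViT-B/32', opt=opt)
    trainer = TrainerCls(opt, transformer, vq_model)          # VQ model stays frozen; it only tokenizes motion
    train_loader, val_loader, eval_val_loader = loaders
    trainer.train(train_loader, val_loader, eval_val_loader,
                  eval_wrapper=eval_wrapper, plot_eval=plot_fn)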