---
license: mit
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
pipeline_tag: robotics
library_name: pytorch
---

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- Library: https://huggingface.co/phython96/ROCKET-1
- Docs: [More Information Needed]
- Paper: https://huggingface.co/papers/2410.17856
- Github: https://github.com/CraftJarvis/ROCKET-1
- Project: https://craftjarvis.github.io/ROCKET-1

## Usage
```python
import torch

from rocket.arm.models import ROCKET1
from rocket.stark_tech.env_interface import MinecraftWrapper

model = ROCKET1.from_pretrained("phython96/ROCKET-1").to("cuda")
memory = None  # initial memory; each get_action call returns the updated memory
inputs = {
  # placeholder frame of random pixels; torch.rand does not support uint8, so use randint
  "img": torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8),
  "segment": {
    "obj_id": torch.tensor(6),                              # specify the interaction type
    "obj_mask": torch.zeros(224, 224, dtype=torch.uint8),   # highlight the regions of interest
  }
}
agent_action, memory = model.get_action(inputs, memory, first=None, input_shape="*")
# convert the model's discrete action into the environment's action dictionary
env_action = MinecraftWrapper.agent_action_to_env(agent_action)

# --------------------- the output --------------------- #
# agent_action = {'buttons': tensor([1], device='cuda:0'), 'camera': tensor([54], device='cuda:0')}
# env_action = {'attack': array(0), 'back': array(0), 'forward': array(0), 'jump': array(0), 'left': array(0), 'right': array(0), 'sneak': array(0), 'sprint': array(0), 'use': array(0), 'drop': array(0), 'inventory': array(0), 'hotbar.1': array(0), 'hotbar.2': array(0), 'hotbar.3': array(0), 'hotbar.4': array(0), 'hotbar.5': array(0), 'hotbar.6': array(0), 'hotbar.7': array(0), 'hotbar.8': array(0), 'hotbar.9': array(0), 'camera': array([-0.61539427, 10.        ])}
```
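
The `env_action` dictionary printed above has one entry per control plus a `camera` entry. As a minimal sketch for inspecting it, the helper below lists the controls that are switched on. The name `active_controls` is hypothetical and not part of the ROCKET-1 code base, and the sketch assumes the dictionary layout shown in the output above (scalar values for every key except `camera`).

```python
def active_controls(env_action):
    """Return the names of all non-camera controls whose value is non-zero.

    Hypothetical helper, not part of ROCKET-1. Assumes every non-camera
    value is a scalar (a plain int or a 0-d NumPy array both work with int()).
    """
    return [
        name
        for name, value in env_action.items()
        if name != "camera" and int(value) != 0
    ]


# Example with plain ints standing in for the NumPy scalars shown above.
example = {"attack": 1, "forward": 1, "jump": 0, "camera": [-0.6, 10.0]}
print(active_controls(example))
```

For the sample output above, where every control is `array(0)`, the helper returns an empty list: the agent only moves the camera in that step.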

## Interaction Details

The `obj_id` field selects the interaction type. Some of the supported types are:

| interaction | obj_id | function |
| --- | --- | --- |
| Hunt     | 0 | Approach the target animal, then kill it. |
| Mine     | 2 | Approach and mine the target object. |
| Interact | 3 | Approach and right-click the target object. |
| Craft    | 4 | Move the cursor to the item and click on it. |
| Switch   | 5 | Highlight an item in the hotbar, then switch to holding it. |
| Approach | 6 | Approach the target object. |
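
To avoid hard-coding these numbers, the table can be mirrored in a small enum. This is only a sketch: `Interaction` and `obj_id_for` are hypothetical names, not part of the ROCKET-1 package, and the enum covers only the IDs listed in the table above.

```python
from enum import IntEnum


class Interaction(IntEnum):
    """Interaction types and their obj_id values, copied from the table above."""
    HUNT = 0
    MINE = 2
    INTERACT = 3
    CRAFT = 4
    SWITCH = 5
    APPROACH = 6


def obj_id_for(name: str) -> int:
    """Look up the obj_id for an interaction name, ignoring case and whitespace."""
    try:
        return int(Interaction[name.strip().upper()])
    except KeyError:
        valid = ", ".join(member.name.lower() for member in Interaction)
        raise ValueError(
            f"unknown interaction {name!r}; expected one of: {valid}"
        ) from None


print(obj_id_for("approach"))
```

The returned integer can then be wrapped as `torch.tensor(obj_id_for("mine"))` and passed as `obj_id` in the `segment` dictionary from the Usage section.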

## Play ROCKET-1 with Gradio
Click the picture below to watch a video on how to play ROCKET-1 with Gradio, then launch the demo with the command that follows.
[![](rocket/assets/gradio.png)](https://www.youtube.com/embed/qXLWw81p-Y0)

```sh
cd rocket/arm
python eval_rocket.py --port 8110 --sam-path "/path/to/sam2-ckpt-directory"
```


## Citing ROCKET-1
If you use ROCKET-1 in your research, please cite it with the following BibTeX entry.

```
@article{cai2024rocket,
  title={ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting},
  author={Cai, Shaofei and Wang, Zihao and Lian, Kewei and Mu, Zhancun and Ma, Xiaojian and Liu, Anji and Liang, Yitao},
  journal={arXiv preprint arXiv:2410.17856},
  year={2024}
}
```