Merge branch 'master' into gradio
- README.md +8 -1
- envs/__init__.py +94 -2
- envs/mujoco/ant_translator.py +5 -0
- envs/mujoco/halfcheetah_policies.py +15 -0
- envs/mujoco/halfcheetah_translator.py +95 -0
- envs/mujoco/hopper_policies.py +15 -0
- envs/mujoco/hopper_translator.py +84 -0
- envs/mujoco/invertedDoublePendulum_policies.py +15 -0
- envs/mujoco/invertedDoublePendulum_translator.py +68 -0
- envs/mujoco/invertedPendulum_policies.py +15 -0
- envs/mujoco/invertedPendulum_translator.py +73 -0
- envs/mujoco/pusher_policies.py +15 -0
- envs/mujoco/pusher_translator.py +93 -0
- envs/mujoco/reacher_policies.py +15 -0
- envs/mujoco/reacher_translator.py +67 -0
- envs/mujoco/swimmer_policies.py +15 -0
- envs/mujoco/swimmer_translator.py +80 -0
- envs/mujoco/walker2d_policies.py +15 -0
- envs/mujoco/walker2d_translator.py +86 -0
- main_reflexion.py +1 -1
- record_reflexion.csv +8 -1
- test_atari.sh → shell/test_atari.sh +0 -0
- shell/test_mujoco_ant.sh +27 -4
- shell/test_mujoco_halfcheetah.sh +51 -0
- shell/test_mujoco_hopper.sh +28 -0
- shell/test_mujoco_invertedDoublePendulum.sh +27 -0
- shell/test_mujoco_invertedPendulum.sh +25 -0
- shell/test_mujoco_pusher.sh +27 -0
- shell/test_mujoco_reacher.sh +27 -0
- shell/test_mujoco_swimmer.sh +27 -0
- shell/test_mujoco_walker2d.sh +28 -0
- test_reflexion.sh → shell/test_reflexion.sh +0 -0
README.md
CHANGED
@@ -116,6 +116,13 @@ If everything runs smoothly, you have successfully imported the Atari ROMs and s
|
|
116 |
|
117 |
Reference: [StackOverflow answer](https://stackoverflow.com/a/68143504/38626)
|
118 |
|
119 |
### Visualization with Gradio
|
120 |
|
121 |
> Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio’s built-in sharing features. No JavaScript, CSS, or web hosting experience needed! [from https://www.gradio.app/guides/quickstart]
|
@@ -134,4 +141,4 @@ And then run the following Python file in the root directory:
|
|
134 |
python gradio_reflexion.py
|
135 |
```
|
136 |
|
137 |
-
The visualization web application will open in a browser at http://server-ip-address:7860 if running from a file. If you are running within a notebook, the demo will appear embedded within the notebook.
|
|
|
116 |
|
117 |
Reference: [StackOverflow answer](https://stackoverflow.com/a/68143504/38626)
|
118 |
|
119 |
+
|
120 |
+
### Support new env
|
121 |
+
We also support new environments that follow the Gym format. To add a new env, you need to:
|
122 |
+
1. Translate your Gym env into a TextGym env: create `<your_env>_translator.py` and `<your_env>_policies.py`, put them into `./envs/`, and register your env in `./envs/__init__.py`.
|
123 |
+
2. Add the PPO performance (best or expert) of your env to `./record_reflexion.csv`.
|
124 |
+
3. Test it with a shell script (we recommend testing COT, SPP, self-reflexion, and exe at the L1 and L3 levels). Example test scripts can be found in `./shell`.
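The three steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical environment called `myenv` with a 2-dimensional observation and a 1-dimensional action in [-1, 1]. The class names and registry key pattern mirror the MuJoCo translators added in this commit, and a local `REGISTRY` dict stands in for the one in `./envs/__init__.py`.

```python
import random


# Hypothetical contents of envs/myenv_translator.py
class BasicLevelTranslator:
    def translate(self, state):
        position, velocity = state[:2]
        return f"Position: {position:.2f} m. Velocity: {velocity:.2f} m/s."


class GameDescriber:
    def __init__(self, args):
        self.max_episode_len = args.max_episode_len

    def describe_goal(self):
        return "The goal is to keep the position close to zero."

    def describe_action(self):
        return "Your next move: \n Please provide one value within the range of [-1,1]."


class BasicStateSequenceTranslator(BasicLevelTranslator):
    def translate(self, infos, is_current=False):
        # With is_current=True, only the latest state is described.
        if is_current:
            return BasicLevelTranslator().translate(infos[-1]["state"])
        return [BasicLevelTranslator().translate(info["state"]) for info in infos]


# Hypothetical contents of envs/myenv_policies.py
def real_random_policy(state, pre_action=None):
    real_random_policy.description = "Select action with a random policy"
    return [2 * random.random() - 1]


# Registration, following the key pattern used in envs/__init__.py
REGISTRY = {}
REGISTRY["myenv_init_translator"] = GameDescriber
REGISTRY["myenv_basic_translator"] = BasicStateSequenceTranslator
REGISTRY["myenv_policies"] = [real_random_policy]
```

In the real repository the translator and policy code would live in separate files and be imported in `./envs/__init__.py`; they are combined here only to keep the sketch self-contained.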
|
125 |
+
|
126 |
### Visualization with Gradio
|
127 |
|
128 |
> Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio’s built-in sharing features. No JavaScript, CSS, or web hosting experience needed! [from https://www.gradio.app/guides/quickstart]
|
|
|
141 |
python gradio_reflexion.py
|
142 |
```
|
143 |
|
144 |
+
The visualization web application will open in a browser at http://server-ip-address:7860 if running from a file. If you are running within a notebook, the demo will appear embedded within the notebook.
|
envs/__init__.py
CHANGED
@@ -18,7 +18,6 @@ from .atari import mspacman_policies, mspacman_translator
|
|
18 |
from .atari import montezumarevenge_policies, montezumarevenge_translator
|
19 |
register_environments()
|
20 |
|
21 |
-
from .mujoco import ant_translator, ant_policies
|
22 |
|
23 |
REGISTRY = {}
|
24 |
REGISTRY["sampling_wrapper"] = SettableStateEnv
|
@@ -139,6 +138,99 @@ REGISTRY["RepresentedMontezumaRevenge_basic_policies"] = [
|
|
139 |
montezumarevenge_policies.dedicated_18_policy,
|
140 |
]
|
141 |
|
142 |
REGISTRY["ant_init_translator"] = ant_translator.GameDescriber
|
143 |
REGISTRY["ant_basic_translator"] = ant_translator.BasicStateSequenceTranslator
|
144 |
-
REGISTRY["ant_policies"] = [ant_policies.pseudo_random_policy, ant_policies.real_random_policy]
|
|
|
18 |
from .atari import montezumarevenge_policies, montezumarevenge_translator
|
19 |
register_environments()
|
20 |
|
|
|
21 |
|
22 |
REGISTRY = {}
|
23 |
REGISTRY["sampling_wrapper"] = SettableStateEnv
|
|
|
138 |
montezumarevenge_policies.dedicated_18_policy,
|
139 |
]
|
140 |
|
141 |
+
REGISTRY["RepresentedMsPacman_init_translator"] = mspacman_translator.GameDescriber
|
142 |
+
REGISTRY["RepresentedMsPacman_basic_translator"] = mspacman_translator.BasicStateSequenceTranslator
|
143 |
+
REGISTRY["RepresentedMsPacman_basic_policies"] = [
|
144 |
+
mspacman_policies.real_random_policy,
|
145 |
+
mspacman_policies.pseudo_random_policy,
|
146 |
+
mspacman_policies.dedicated_1_policy,
|
147 |
+
mspacman_policies.dedicated_2_policy,
|
148 |
+
mspacman_policies.dedicated_3_policy,
|
149 |
+
mspacman_policies.dedicated_4_policy,
|
150 |
+
mspacman_policies.dedicated_5_policy,
|
151 |
+
mspacman_policies.dedicated_6_policy,
|
152 |
+
mspacman_policies.dedicated_7_policy,
|
153 |
+
mspacman_policies.dedicated_8_policy,
|
154 |
+
mspacman_policies.dedicated_9_policy,
|
155 |
+
]
|
156 |
+
|
157 |
+
REGISTRY["RepresentedMontezumaRevenge_init_translator"] = montezumarevenge_translator.GameDescriber
|
158 |
+
REGISTRY["RepresentedMontezumaRevenge_basic_translator"] = montezumarevenge_translator.BasicStateSequenceTranslator
|
159 |
+
REGISTRY["RepresentedMontezumaRevenge_basic_policies"] = [
|
160 |
+
montezumarevenge_policies.real_random_policy,
|
161 |
+
montezumarevenge_policies.pseudo_random_policy,
|
162 |
+
montezumarevenge_policies.dedicated_1_policy,
|
163 |
+
montezumarevenge_policies.dedicated_2_policy,
|
164 |
+
montezumarevenge_policies.dedicated_3_policy,
|
165 |
+
montezumarevenge_policies.dedicated_4_policy,
|
166 |
+
montezumarevenge_policies.dedicated_5_policy,
|
167 |
+
montezumarevenge_policies.dedicated_6_policy,
|
168 |
+
montezumarevenge_policies.dedicated_7_policy,
|
169 |
+
montezumarevenge_policies.dedicated_8_policy,
|
170 |
+
montezumarevenge_policies.dedicated_9_policy,
|
171 |
+
montezumarevenge_policies.dedicated_10_policy,
|
172 |
+
montezumarevenge_policies.dedicated_11_policy,
|
173 |
+
montezumarevenge_policies.dedicated_12_policy,
|
174 |
+
montezumarevenge_policies.dedicated_13_policy,
|
175 |
+
montezumarevenge_policies.dedicated_14_policy,
|
176 |
+
montezumarevenge_policies.dedicated_15_policy,
|
177 |
+
montezumarevenge_policies.dedicated_16_policy,
|
178 |
+
montezumarevenge_policies.dedicated_17_policy,
|
179 |
+
montezumarevenge_policies.dedicated_18_policy,
|
180 |
+
]
|
181 |
+
|
182 |
+
## For mujoco env
|
183 |
+
|
184 |
+
|
185 |
+
from .mujoco import invertedPendulum_translator, invertedPendulum_policies
|
186 |
+
from .mujoco import invertedDoublePendulum_translator, invertedDoublePendulum_policies
|
187 |
+
|
188 |
+
from .mujoco import swimmer_translator, swimmer_policies
|
189 |
+
|
190 |
+
from .mujoco import reacher_translator, reacher_policies
|
191 |
+
|
192 |
+
from .mujoco import hopper_translator, hopper_policies
|
193 |
+
from .mujoco import walker2d_translator, walker2d_policies
|
194 |
+
|
195 |
+
|
196 |
+
|
197 |
+
|
198 |
+
|
199 |
+
REGISTRY["invertedPendulum_init_translator"] = invertedPendulum_translator.GameDescriber
|
200 |
+
REGISTRY["invertedPendulum_basic_translator"] = invertedPendulum_translator.BasicStateSequenceTranslator
|
201 |
+
REGISTRY["invertedPendulum_policies"] = [invertedPendulum_policies.pseudo_random_policy, invertedPendulum_policies.real_random_policy]
|
202 |
+
REGISTRY["invertedDoublePendulum_init_translator"] = invertedDoublePendulum_translator.GameDescriber
|
203 |
+
REGISTRY["invertedDoublePendulum_basic_translator"] = invertedDoublePendulum_translator.BasicStateSequenceTranslator
|
204 |
+
REGISTRY["invertedDoublePendulum_policies"] = [invertedDoublePendulum_policies.pseudo_random_policy, invertedDoublePendulum_policies.real_random_policy]
|
205 |
+
|
206 |
+
|
207 |
+
REGISTRY["swimmer_init_translator"] = swimmer_translator.GameDescriber
|
208 |
+
REGISTRY["swimmer_basic_translator"] = swimmer_translator.BasicStateSequenceTranslator
|
209 |
+
REGISTRY["swimmer_policies"] = [swimmer_policies.pseudo_random_policy, swimmer_policies.real_random_policy]
|
210 |
+
|
211 |
+
REGISTRY["reacher_init_translator"] = reacher_translator.GameDescriber
|
212 |
+
REGISTRY["reacher_basic_translator"] = reacher_translator.BasicStateSequenceTranslator
|
213 |
+
REGISTRY["reacher_policies"] = [reacher_policies.pseudo_random_policy, reacher_policies.real_random_policy]
|
214 |
+
|
215 |
+
REGISTRY["hopper_init_translator"] = hopper_translator.GameDescriber
|
216 |
+
REGISTRY["hopper_basic_translator"] = hopper_translator.BasicStateSequenceTranslator
|
217 |
+
REGISTRY["hopper_policies"] = [hopper_policies.pseudo_random_policy, hopper_policies.real_random_policy]
|
218 |
+
REGISTRY["walker2d_init_translator"] = walker2d_translator.GameDescriber
|
219 |
+
REGISTRY["walker2d_basic_translator"] = walker2d_translator.BasicStateSequenceTranslator
|
220 |
+
REGISTRY["walker2d_policies"] = [walker2d_policies.pseudo_random_policy, walker2d_policies.real_random_policy]
|
221 |
+
|
222 |
+
|
223 |
+
from .mujoco import halfcheetah_translator, halfcheetah_policies
|
224 |
+
REGISTRY["halfcheetah_init_translator"] = halfcheetah_translator.GameDescriber
|
225 |
+
REGISTRY["halfcheetah_basic_translator"] = halfcheetah_translator.BasicStateSequenceTranslator
|
226 |
+
REGISTRY["halfcheetah_policies"] = [halfcheetah_policies.pseudo_random_policy, halfcheetah_policies.real_random_policy]
|
227 |
+
|
228 |
+
from .mujoco import pusher_translator, pusher_policies
|
229 |
+
REGISTRY["pusher_init_translator"] = pusher_translator.GameDescriber
|
230 |
+
REGISTRY["pusher_basic_translator"] = pusher_translator.BasicStateSequenceTranslator
|
231 |
+
REGISTRY["pusher_policies"] = [pusher_policies.pseudo_random_policy, pusher_policies.real_random_policy]
|
232 |
+
|
233 |
+
from .mujoco import ant_translator, ant_policies
|
234 |
REGISTRY["ant_init_translator"] = ant_translator.GameDescriber
|
235 |
REGISTRY["ant_basic_translator"] = ant_translator.BasicStateSequenceTranslator
|
236 |
+
REGISTRY["ant_policies"] = [ant_policies.pseudo_random_policy, ant_policies.real_random_policy]
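The registry keys above follow an `<env>_init_translator` / `<env>_basic_translator` / `<env>_policies` pattern (the Atari entries use `<env>_basic_policies` instead). A minimal sketch of how a caller might resolve all three entries for one environment; the `lookup` helper and the stand-in classes are hypothetical and not part of this commit.

```python
class GameDescriber:
    """Stand-in for ant_translator.GameDescriber."""


class BasicStateSequenceTranslator:
    """Stand-in for ant_translator.BasicStateSequenceTranslator."""


def pseudo_random_policy(state, pre_action):
    # Stand-in policy: a fixed 8-dimensional action.
    return [0.0] * 8


REGISTRY = {
    "ant_init_translator": GameDescriber,
    "ant_basic_translator": BasicStateSequenceTranslator,
    "ant_policies": [pseudo_random_policy],
}


def lookup(registry, env_name):
    """Return (init_translator, basic_translator, policies) for env_name."""
    # MuJoCo entries use "_policies"; Atari entries use "_basic_policies".
    policies = registry.get(f"{env_name}_policies")
    if policies is None:
        policies = registry[f"{env_name}_basic_policies"]
    return (
        registry[f"{env_name}_init_translator"],
        registry[f"{env_name}_basic_translator"],
        policies,
    )


describer_cls, translator_cls, policies = lookup(REGISTRY, "ant")
```

An unknown environment name raises `KeyError`, which makes a missing registration visible immediately instead of failing later during a run.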
|
envs/mujoco/ant_translator.py
CHANGED
@@ -1,3 +1,8 @@
|
|
1 |
class BasicLevelTranslator:
|
2 |
def __init__(self):
|
3 |
pass
|
|
|
1 |
+
'''Ant
|
2 |
+
Action Space Box(-1.0, 1.0, (8,), float32)
|
3 |
+
Observation Space Box(-inf, inf, (27,), float64)
|
4 |
+
'''
|
5 |
+
|
6 |
class BasicLevelTranslator:
|
7 |
def __init__(self):
|
8 |
pass
|
envs/mujoco/halfcheetah_policies.py
ADDED
@@ -0,0 +1,15 @@
|
|
1 |
+
import numpy as np
|
2 |
+
import random
|
3 |
+
|
4 |
+
def pseudo_random_policy(state, pre_action):
|
5 |
+
def get_description():
|
6 |
+
return "Select action randomly"
|
7 |
+
pseudo_random_policy.description = get_description()
|
8 |
+
return [2 * random.random() - 1 for i in range(6)]
|
9 |
+
|
10 |
+
|
11 |
+
def real_random_policy(state, pre_action=1):
|
12 |
+
def get_description():
|
13 |
+
return "Select action with a random policy"
|
14 |
+
real_random_policy.description = get_description()
|
15 |
+
return [2 * random.random() - 1 for i in range(6)]
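Each `*_policies.py` file in this commit repeats the same uniform-sampling pattern with a different action dimension and range. A sketch of a shared factory, assuming the dimensions and bounds stated in the translators (six values in [-1, 1] for HalfCheetah, `Box(-2.0, 2.0, (7,))` for Pusher); `make_random_policy` is a hypothetical helper, not part of the repository.

```python
import random


def make_random_policy(action_dim, low=-1.0, high=1.0):
    """Build a policy that samples each action component uniformly in [low, high)."""

    def policy(state, pre_action=None):
        return [low + (high - low) * random.random() for _ in range(action_dim)]

    # Set once at creation so the description is available before the first call.
    policy.description = "Select action with a random policy"
    return policy


halfcheetah_random = make_random_policy(6)           # six torques in [-1, 1]
pusher_random = make_random_policy(7, -2.0, 2.0)     # seven torques in [-2, 2]
```

Unlike the functions in this commit, which assign `description` inside the call, the sketch sets it when the policy is built, so code that reads `policy.description` does not depend on the policy having been called first.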
|
envs/mujoco/halfcheetah_translator.py
ADDED
@@ -0,0 +1,95 @@
|
|
1 |
+
class BasicLevelTranslator:
|
2 |
+
def __init__(self):
|
3 |
+
pass
|
4 |
+
|
5 |
+
def translate(self, state):
|
6 |
+
(front_tip_z, front_tip_angle, back_thigh_angle_1, back_shin_angle_1,
|
7 |
+
tip_velocity_x, tip_velocity_y, front_tip_angular_velocity,
|
8 |
+
back_thigh_angular_velocity_1, front_tip_x, front_tip_y, front_tip_angle_2,
|
9 |
+
back_thigh_angle_2, back_shin_angle_2, tip_velocity_angular_x,
|
10 |
+
tip_velocity_angular_y, front_tip_angular_velocity_2,
|
11 |
+
back_thigh_angular_velocity_2) = state[:17]
|
12 |
+
|
13 |
+
res = (
|
14 |
+
f"The front tip is at a z-coordinate of {front_tip_z:.2f} meters. "
|
15 |
+
f"The angle of the front tip is {front_tip_angle:.2f} radians. "
|
16 |
+
f"The angles of the back thigh are {back_thigh_angle_1:.2f} and {back_thigh_angle_2:.2f} radians. "
|
17 |
+
f"The angles of the back shin are {back_shin_angle_1:.2f} and {back_shin_angle_2:.2f} radians. "
|
18 |
+
f"The tip has velocity along the x-axis of {tip_velocity_x:.2f} m/s. "
|
19 |
+
f"The tip has velocity along the y-axis of {tip_velocity_y:.2f} m/s. "
|
20 |
+
f"The angular velocity of the front tip is {front_tip_angular_velocity:.2f} radians/s. "
|
21 |
+
f"The angular velocities of the back thigh are {back_thigh_angular_velocity_1:.2f} and {back_thigh_angular_velocity_2:.2f} radians/s. "
|
22 |
+
f"The x-coordinate of the front tip is {front_tip_x:.2f} meters. "
|
23 |
+
f"The y-coordinate of the front tip is {front_tip_y:.2f} meters. "
|
24 |
+
f"The angle of the front tip is {front_tip_angle_2:.2f} radians. "
|
25 |
+
f"The angular velocity of the tip along the x-axis is {tip_velocity_angular_x:.2f} radians/s. "
|
26 |
+
f"The angular velocity of the tip along the y-axis is {tip_velocity_angular_y:.2f} radians/s. "
|
27 |
+
f"The angular velocity of the back shin is {front_tip_angular_velocity_2:.2f} radians/s."
|
28 |
+
)
|
29 |
+
return res
|
30 |
+
|
31 |
+
class GameDescriber:
|
32 |
+
def __init__(self, args):
|
33 |
+
self.is_only_local_obs = args.is_only_local_obs == 1
|
34 |
+
self.max_episode_len = args.max_episode_len
|
35 |
+
self.action_desc_dict = {
|
36 |
+
}
|
37 |
+
self.reward_desc_dict = {
|
38 |
+
}
|
39 |
+
|
40 |
+
def translate_terminate_state(self, state, episode_len, max_episode_len):
|
41 |
+
return ""
|
42 |
+
|
43 |
+
def translate_potential_next_state(self, state, action):
|
44 |
+
return ""
|
45 |
+
|
46 |
+
def describe_goal(self):
|
47 |
+
return "The goal is to make the Half-Cheetah run forward (right) as fast as possible."
|
48 |
+
|
49 |
+
def describe_game(self):
|
50 |
+
return (
|
51 |
+
"In the Half-Cheetah game, you control a 2-dimensional robot with 9 links and 8 joints. "
|
52 |
+
"The goal is to apply torque to the joints to make the cheetah run forward (right) as fast as possible. "
|
53 |
+
"You can control the back thigh, back shin, and back foot rotors for the back legs, and the front thigh, "
|
54 |
+
"front shin, and front foot rotors for the front legs. The episode ends after 1000 timesteps. "
|
55 |
+
"Your reward is based on how much forward progress you make and how much control effort you apply."
|
56 |
+
)
|
57 |
+
|
58 |
+
def describe_action(self):
|
59 |
+
return (
|
60 |
+
"Your next move: \n"
|
61 |
+
"Please select six numerical values, each one within the range of [-1,1], "
|
62 |
+
"which represents the torque being applied to the back thigh rotor, "
|
63 |
+
"back shin rotor, back foot rotor, front thigh rotor, front shin rotor, "
|
64 |
+
"and front foot rotor respectively."
|
65 |
+
)
|
66 |
+
|
67 |
+
class BasicStateSequenceTranslator(BasicLevelTranslator):
|
68 |
+
def translate(self, infos, is_current=False):
|
69 |
+
descriptions = []
|
70 |
+
if is_current:
|
71 |
+
state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
|
72 |
+
return state_desc
|
73 |
+
for i, info in enumerate(infos):
|
74 |
+
assert 'state' in info, "info should contain state information"
|
75 |
+
|
76 |
+
state_desc = BasicLevelTranslator().translate(info['state'])
|
77 |
+
action_desc = (
|
78 |
+
"Take Action: "
|
79 |
+
"Apply Back Thigh Torque: {:.2f}, "
|
80 |
+
"Apply Back Shin Torque: {:.2f}, "
|
81 |
+
"Apply Back Foot Torque: {:.2f}, "
|
82 |
+
"Apply Front Thigh Torque: {:.2f}, "
|
83 |
+
"Apply Front Shin Torque: {:.2f}, "
|
84 |
+
"Apply Front Foot Torque: {:.2f}"
|
85 |
+
).format(
|
86 |
+
info['action'][0], info['action'][1], info['action'][2],
|
87 |
+
info['action'][3], info['action'][4], info['action'][5]
|
88 |
+
)
|
89 |
+
|
90 |
+
reward_desc = f"Result: Forward Reward of {info['forward_reward']:.2f}, "
|
91 |
+
ctrl_cost_desc = f"Control Cost of {info['ctrl_cost']:.2f}, "
|
92 |
+
total_reward_desc = f"Total Reward of {info['reward']:.2f}, "
|
93 |
+
next_state_desc = BasicLevelTranslator().translate(info['next_state'])
|
94 |
+
descriptions.append(f"{state_desc}.\\n {action_desc} \\n {reward_desc} {ctrl_cost_desc} {total_reward_desc} \\n Transit to {next_state_desc}")
|
95 |
+
return descriptions
|
envs/mujoco/hopper_policies.py
ADDED
@@ -0,0 +1,15 @@
|
|
1 |
+
import numpy as np
|
2 |
+
import random
|
3 |
+
|
4 |
+
def pseudo_random_policy(state, pre_action):
|
5 |
+
def get_description():
|
6 |
+
return "Select action randomly"
|
7 |
+
pseudo_random_policy.description = get_description()
|
8 |
+
return [2 * random.random() - 1 for i in range(3)]
|
9 |
+
|
10 |
+
|
11 |
+
def real_random_policy(state, pre_action=1):
|
12 |
+
def get_description():
|
13 |
+
return "Select action with a random policy"
|
14 |
+
real_random_policy.description = get_description()
|
15 |
+
return [2 * random.random() - 1 for i in range(3)]
|
envs/mujoco/hopper_translator.py
ADDED
@@ -0,0 +1,84 @@
|
|
1 |
+
'''
|
2 |
+
Action Space Box(-1.0, 1.0, (3,), float32)
|
3 |
+
Observation Space Box(-inf, inf, (11,), float64)
|
4 |
+
'''
|
5 |
+
|
6 |
+
class BasicLevelTranslator:
|
7 |
+
def __init__(self):
|
8 |
+
pass
|
9 |
+
|
10 |
+
def translate(self, state):
|
11 |
+
(top_z, top_angle, thigh_angle, leg_angle, foot_angle,
|
12 |
+
top_x_velocity, top_z_velocity, top_angular_velocity,
|
13 |
+
thigh_angular_velocity, leg_angular_velocity, foot_angular_velocity) = state[:11]
|
14 |
+
|
15 |
+
res = (
|
16 |
+
f"The top is at a z-coordinate of {top_z:.2f} meters. "
|
17 |
+
f"The angle of the top is {top_angle:.2f} radians. "
|
18 |
+
f"The angle of the thigh joint is {thigh_angle:.2f} radians. "
|
19 |
+
f"The angle of the leg joint is {leg_angle:.2f} radians. "
|
20 |
+
f"The angle of the foot joint is {foot_angle:.2f} radians. "
|
21 |
+
f"The x-coordinate velocity of the top is {top_x_velocity:.2f} m/s. "
|
22 |
+
f"The z-coordinate (height) velocity of the top is {top_z_velocity:.2f} m/s. "
|
23 |
+
f"The angular velocity of the top is {top_angular_velocity:.2f} radians/s. "
|
24 |
+
f"The angular velocity of the thigh hinge is {thigh_angular_velocity:.2f} radians/s. "
|
25 |
+
f"The angular velocity of the leg hinge is {leg_angular_velocity:.2f} radians/s. "
|
26 |
+
f"The angular velocity of the foot hinge is {foot_angular_velocity:.2f} radians/s."
|
27 |
+
)
|
28 |
+
return res
|
29 |
+
|
30 |
+
class GameDescriber:
|
31 |
+
def __init__(self, args):
|
32 |
+
self.is_only_local_obs = args.is_only_local_obs == 1
|
33 |
+
self.max_episode_len = args.max_episode_len
|
34 |
+
self.action_desc_dict = {}
|
35 |
+
self.reward_desc_dict = {}
|
36 |
+
|
37 |
+
def translate_terminate_state(self, state, episode_len, max_episode_len):
|
38 |
+
return ""
|
39 |
+
|
40 |
+
def translate_potential_next_state(self, state, action):
|
41 |
+
return ""
|
42 |
+
|
43 |
+
def describe_goal(self):
|
44 |
+
return (
|
45 |
+
"The goal in the Hopper environment is to make the one-legged hopper move forward (right) "
|
46 |
+
"by applying torques to the thigh, leg, and foot joints."
|
47 |
+
)
|
48 |
+
|
49 |
+
def describe_game(self):
|
50 |
+
return (
|
51 |
+
"In the Hopper environment, you control a one-legged hopper consisting of a torso, thigh, leg, "
|
52 |
+
"and a foot on which it rests. Your objective is to apply torques to the thigh, leg, and foot joints "
|
53 |
+
"to make the hopper perform hops in the positive x-direction. The environment provides observations "
|
54 |
+
"of the hopper's body parts and velocities, including the height, angles of joints, and angular velocities. "
|
55 |
+
"The episode ends when certain termination conditions are met."
|
56 |
+
)
|
57 |
+
|
58 |
+
def describe_action(self):
|
59 |
+
return (
|
60 |
+
"Your next move: \n Please provide a list of three numerical values, each within the range of [-1,1], "
|
61 |
+
"representing the torques to be applied at the thigh, leg, and foot joints of the hopper."
|
62 |
+
)
|
63 |
+
|
64 |
+
|
65 |
+
class BasicStateSequenceTranslator(BasicLevelTranslator):
|
66 |
+
def translate(self, infos, is_current=False):
|
67 |
+
descriptions = []
|
68 |
+
if is_current:
|
69 |
+
state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
|
70 |
+
return state_desc
|
71 |
+
for i, info in enumerate(infos):
|
72 |
+
assert 'state' in info, "info should contain state information"
|
73 |
+
|
74 |
+
state_desc = BasicLevelTranslator().translate(info['state'])
|
75 |
+
action_desc = (
|
76 |
+
f"Take Action: Apply Thigh Torque: {info['action'][0]:.2f}, "
|
77 |
+
f"Leg Torque: {info['action'][1]:.2f}, Foot Torque: {info['action'][2]:.2f}"
|
78 |
+
)
|
79 |
+
|
80 |
+
reward_desc = f"Result: Reward of {info['reward']:.2f}, "
|
81 |
+
next_state_desc = BasicLevelTranslator().translate(info['next_state'])
|
82 |
+
descriptions.append(f"{state_desc}.\\n {action_desc} \\n {reward_desc} \\n Transit to {next_state_desc}")
|
83 |
+
return descriptions
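The per-field f-strings in `BasicLevelTranslator.translate` can also be written as a table of (label, unit) pairs. A minimal sketch under that assumption, using the Hopper field order from this file; `HOPPER_FIELDS` and `describe_state` are hypothetical names, and the sentence template is simplified to a single pattern.

```python
HOPPER_FIELDS = [
    ("z-coordinate of the top", "meters"),
    ("angle of the top", "radians"),
    ("angle of the thigh joint", "radians"),
    ("angle of the leg joint", "radians"),
    ("angle of the foot joint", "radians"),
    ("x-coordinate velocity of the top", "m/s"),
    ("z-coordinate (height) velocity of the top", "m/s"),
    ("angular velocity of the top", "radians/s"),
    ("angular velocity of the thigh hinge", "radians/s"),
    ("angular velocity of the leg hinge", "radians/s"),
    ("angular velocity of the foot hinge", "radians/s"),
]


def describe_state(state, fields=HOPPER_FIELDS):
    """Render one sentence per observation entry, in field order."""
    if len(state) < len(fields):
        raise ValueError(f"expected at least {len(fields)} values, got {len(state)}")
    return " ".join(
        f"The {label} is {value:.2f} {unit}."
        for (label, unit), value in zip(fields, state)
    )


text = describe_state([1.25] + [0.0] * 10)
```

Keeping labels and units in one list makes it easier to compare the description against the environment's documented observation layout, and the length check catches a state vector that is shorter than expected.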
|
84 |
+
|
envs/mujoco/invertedDoublePendulum_policies.py
ADDED
@@ -0,0 +1,15 @@
|
|
1 |
+
import numpy as np
|
2 |
+
import random
|
3 |
+
|
4 |
+
def pseudo_random_policy(state, pre_action):
|
5 |
+
def get_description():
|
6 |
+
return "Select action randomly"
|
7 |
+
pseudo_random_policy.description = get_description()
|
8 |
+
return [2 * random.random() - 1 for i in range(1)]
|
9 |
+
|
10 |
+
|
11 |
+
def real_random_policy(state, pre_action=1):
|
12 |
+
def get_description():
|
13 |
+
return "Select action with a random policy"
|
14 |
+
real_random_policy.description = get_description()
|
15 |
+
return [2 * random.random() - 1 for i in range(1)]
|
envs/mujoco/invertedDoublePendulum_translator.py
ADDED
@@ -0,0 +1,68 @@
|
|
1 |
+
'''InvertedDoublePendulum-v4
|
2 |
+
Action Space Box(-1.0, 1.0, (1,), float32)
|
3 |
+
Observation Space Box(-inf, inf, (11,), float64)
|
4 |
+
'''
|
5 |
+
|
6 |
+
class BasicLevelTranslator:
|
7 |
+
def translate(self, state):
|
8 |
+
res = (
|
9 |
+
f"Position of the cart: {state[0]:.2f} m\n"
|
10 |
+
f"Vertical angle of the pole: {state[1]:.2f} rad\n"
|
11 |
+
f"Linear velocity of the cart: {state[2]:.2f} m/s\n"
|
12 |
+
f"Angular velocity of the pole: {state[3]:.2f} rad/s"
|
13 |
+
)
|
14 |
+
return res
|
15 |
+
|
16 |
+
class GameDescriber:
|
17 |
+
def __init__(self, args):
|
18 |
+
self.is_only_local_obs = args.is_only_local_obs == 1
|
19 |
+
self.max_episode_len = args.max_episode_len
|
20 |
+
self.action_desc_dict = {
|
21 |
+
0: "Apply a force in the range [-1, 1] to the cart to control its motion.",
|
22 |
+
}
|
23 |
+
self.reward_desc_dict = {}
|
24 |
+
|
25 |
+
def translate_terminate_state(self, state, episode_len, max_episode_len):
|
26 |
+
return ""
|
27 |
+
|
28 |
+
def translate_potential_next_state(self, state, action):
|
29 |
+
return ""
|
30 |
+
|
31 |
+
def describe_goal(self):
|
32 |
+
return (
|
33 |
+
"The goal in the Inverted Pendulum environment is to balance the pole on top of the cart "\
|
34 |
+
"by applying continuous forces to the cart, keeping it upright."
|
35 |
+
)
|
36 |
+
|
37 |
+
def describe_game(self):
|
38 |
+
return (
|
39 |
+
"In the Inverted Pendulum environment, you control a cart that can move linearly with a pole "\
|
40 |
+
"attached to it. Your objective is to balance the pole on top of the cart by applying forces "\
|
41 |
+
"to the cart in a way that keeps the pole upright. "\
|
42 |
+
"The environment provides observations of the cart's position, pole angle, velocities, "\
|
43 |
+
"and angular velocities. The goal is to maintain balance as long as possible."
|
44 |
+
)
|
45 |
+
|
46 |
+
def describe_action(self):
|
47 |
+
return (
|
48 |
+
"Your next move: \n Please provide a numerical value for the force to be applied to the cart. "\
|
49 |
+
"This value should be within the range of [-1, 1], where a positive value indicates applying force "\
|
50 |
+
"in the right direction, and a negative value indicates applying force in the left direction."
|
51 |
+
)
|
52 |
+
|
53 |
+
class BasicStateSequenceTranslator(BasicLevelTranslator):
|
54 |
+
def translate(self, infos, is_current=False):
|
55 |
+
descriptions = []
|
56 |
+
if is_current:
|
57 |
+
state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
|
58 |
+
return state_desc
|
59 |
+
for i, info in enumerate(infos):
|
60 |
+
assert 'state' in info, "info should contain state information"
|
61 |
+
state_desc = BasicLevelTranslator().translate(info['state'])
|
62 |
+
action_desc = f"Applied Force on Cart: {info['action'][0]:.2f}"
|
63 |
+
reward_desc = f"Result: Reward of {info['reward']:.2f}"
|
64 |
+
next_state_desc = BasicLevelTranslator().translate(info['next_state'])
|
65 |
+
descriptions.append(
|
66 |
+
f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
|
67 |
+
)
|
68 |
+
return descriptions
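Each translator opens with a docstring recording the Gym action and observation spaces. A sketch of how that header could be parsed and used to check a sampled action against the declared bounds; the regular expression and the `parse_action_box` and `action_in_bounds` helpers are assumptions for illustration, not part of the repository.

```python
import re

# Header text as written at the top of this translator file.
HEADER = '''InvertedDoublePendulum-v4
Action Space Box(-1.0, 1.0, (1,), float32)
Observation Space Box(-inf, inf, (11,), float64)
'''

_ACTION_RE = re.compile(
    r"Action Space Box\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*,\s*\((\d+),?\)"
)


def parse_action_box(header):
    """Extract (low, high, dim) from an 'Action Space Box(...)' line."""
    match = _ACTION_RE.search(header)
    if match is None:
        raise ValueError("no 'Action Space Box(...)' line found")
    low, high, dim = match.groups()
    return float(low), float(high), int(dim)


def action_in_bounds(action, header):
    """Check that an action has the declared length and stays within the bounds."""
    low, high, dim = parse_action_box(header)
    return len(action) == dim and all(low <= a <= high for a in action)
```

A check like this could run in a test against each policy's output, so that a sampling range which disagrees with the declared action space is flagged early.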
|
envs/mujoco/invertedPendulum_policies.py
ADDED
@@ -0,0 +1,15 @@
|
|
1 |
+
import numpy as np
|
2 |
+
import random
|
3 |
+
|
4 |
+
def pseudo_random_policy(state, pre_action):
|
5 |
+
def get_description():
|
6 |
+
return "Select action randomly"
|
7 |
+
pseudo_random_policy.description = get_description()
|
8 |
+
return [6 * random.random() - 3 for i in range(1)]
|
9 |
+
|
10 |
+
|
11 |
+
def real_random_policy(state, pre_action=1):
|
12 |
+
def get_description():
|
13 |
+
return "Select action with a random policy"
|
14 |
+
real_random_policy.description = get_description()
|
15 |
+
return [6 * random.random() - 3 for i in range(1)]
|
envs/mujoco/invertedPendulum_translator.py
ADDED
@@ -0,0 +1,73 @@
|
|
1 |
+
'''InvertedPendulum-v4
|
2 |
+
Action Space Box(-3.0, 3.0, (1,), float32)
|
3 |
+
Observation Space Box(-inf, inf, (4,), float64)
|
4 |
+
'''
|
5 |
+
|
6 |
+
class BasicLevelTranslator:
|
7 |
+
def translate(self, state):
|
8 |
+
res = (
|
9 |
+
f"Position of the cart: {state[0]:.2f} m\n"
|
10 |
+
f"Sine of the angle between cart and first pole: {state[1]:.2f}\n"
|
11 |
+
f"Sine of the angle between two poles: {state[2]:.2f}\n"
|
12 |
+
f"Cosine of the angle between cart and first pole: {state[3]:.2f}\n"
|
13 |
+
f"Cosine of the angle between two poles: {state[4]:.2f}\n"
|
14 |
+
f"Velocity of the cart: {state[5]:.2f} m/s\n"
|
15 |
+
f"Angular velocity of angle between cart and first pole: {state[6]:.2f} rad/s\n"
|
16 |
+
f"Angular velocity of angle between two poles: {state[7]:.2f} rad/s\n"
|
17 |
+
f"Constraint Force 1: {state[8]:.2f} N\n"
|
18 |
+
f"Constraint Force 2: {state[9]:.2f} N\n"
|
19 |
+
f"Constraint Force 3: {state[10]:.2f} N"
|
20 |
+
)
|
21 |
+
return res
|
22 |
+
|
23 |
+
class GameDescriber:
|
24 |
+
def __init__(self, args):
|
25 |
+
self.is_only_local_obs = args.is_only_local_obs == 1
|
26 |
+
self.max_episode_len = args.max_episode_len
|
27 |
+
self.action_desc_dict = {
|
28 |
+
0: "Apply a force in the range [-3, 3] to the cart to control its motion.",
|
29 |
+
}
|
30 |
+
self.reward_desc_dict = {}
|
31 |
+
|
32 |
+
def translate_terminate_state(self, state, episode_len, max_episode_len):
|
33 |
+
return ""
|
34 |
+
|
35 |
+
def translate_potential_next_state(self, state, action):
|
36 |
+
return ""
|
37 |
+
|
38 |
+
def describe_goal(self):
|
39 |
+
return (
|
40 |
+
"The goal in the InvertedDoublePendulum environment is to balance the two poles "\
|
41 |
+
"on top of the cart by applying continuous forces on the cart."
|
42 |
+
)
|
43 |
+
|
44 |
+
def describe_game(self):
|
45 |
+
return (
|
46 |
+
"In the InvertedDoublePendulum environment, you control a system with a cart and two poles. "\
|
47 |
+
"Your objective is to balance the two poles on top of the cart by applying continuous forces "\
|
48 |
+
"to the cart. The environment provides observations of the cart's position, angles of the poles, "\
|
49 |
+
"and their angular velocities. The episode ends when certain termination conditions are met."
|
50 |
+
)
|
51 |
+
|
52 |
+
def describe_action(self):
|
53 |
+
return (
|
54 |
+
"Your next move: \n Please provide a numerical value within the range of [-3,3], "\
|
55 |
+
"representing the force to be applied to the cart."
|
56 |
+
)
|
57 |
+
|
58 |
+
class BasicStateSequenceTranslator(BasicLevelTranslator):
|
59 |
+
def translate(self, infos, is_current=False):
|
60 |
+
descriptions = []
|
61 |
+
if is_current:
|
62 |
+
state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
|
63 |
+
return state_desc
|
64 |
+
for i, info in enumerate(infos):
|
65 |
+
assert 'state' in info, "info should contain state information"
|
66 |
+
state_desc = BasicLevelTranslator().translate(info['state'])
|
67 |
+
action_desc = f"Applied Force on Cart: {info['action'][0]:.2f}"
|
68 |
+
reward_desc = f"Result: Reward of {info['reward']:.2f}"
|
69 |
+
next_state_desc = BasicLevelTranslator().translate(info['next_state'])
|
70 |
+
descriptions.append(
|
71 |
+
f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
|
72 |
+
)
|
73 |
+
return descriptions
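The loop in `BasicStateSequenceTranslator.translate` asserts only that `'state'` is present, although it also reads `'action'`, `'reward'`, and `'next_state'`. A sketch of the same per-transition formatting with all four keys checked up front; `describe_transition` and the `cart_only` stand-in translator are hypothetical names.

```python
def describe_transition(info, translate_state):
    """Format one transition dict as text; translate_state maps a state to a string."""
    required = ("state", "action", "reward", "next_state")
    missing = [key for key in required if key not in info]
    if missing:
        raise KeyError(f"info is missing keys: {missing}")
    return (
        f"{translate_state(info['state'])}\n"
        f"Applied Force on Cart: {info['action'][0]:.2f}\n"
        f"Result: Reward of {info['reward']:.2f}\n"
        f"Transit to\n{translate_state(info['next_state'])}"
    )


def cart_only(state):
    # Stand-in state translator that reports only the first observation entry.
    return f"Position of the cart: {state[0]:.2f} m"


example = describe_transition(
    {"state": [0.0], "action": [0.5], "reward": 1.0, "next_state": [0.01]},
    cart_only,
)
```

Passing the state translator in as a function keeps the transition format independent of any one environment's observation layout.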
|
envs/mujoco/pusher_policies.py
ADDED
@@ -0,0 +1,15 @@
|
|
1 |
+
import numpy as np
|
2 |
+
import random
|
3 |
+
|
4 |
+
def pseudo_random_policy(state, pre_action):
|
5 |
+
def get_description():
|
6 |
+
return "Select action randomly"
|
7 |
+
pseudo_random_policy.description = get_description()
|
8 |
+
return [4 * random.random() - 2 for i in range(7)]
|
9 |
+
|
10 |
+
|
11 |
+
def real_random_policy(state, pre_action=1):
|
12 |
+
def get_description():
|
13 |
+
return "Select action with a random policy"
|
14 |
+
real_random_policy.description = get_description()
|
15 |
+
return [4 * random.random() - 2 for i in range(7)]
|
envs/mujoco/pusher_translator.py
ADDED
@@ -0,0 +1,93 @@
+'''Pusher
+Action Space Box(-2.0, 2.0, (7,), float32)
+Observation Space Box(-inf, inf, (23,), float64)
+'''
+import math
+
+class BasicLevelTranslator:
+    def __init__(self):
+        pass
+
+    def translate(self, state):
+
+        joint_angles = state[:7]
+        joint_velocities = state[7:14]
+        fingertip_coords = state[14:17]
+        object_coords = state[17:20]
+        goal_coords = state[20:]
+
+        joint_angle_degrees = [math.degrees(angle) for angle in joint_angles]
+        joint_velocity_degrees = [math.degrees(velocity) for velocity in joint_velocities]
+
+        res = (f"Rotation of the panning shoulder: {joint_angle_degrees[0]:.2f} degrees, "
+               f"Rotation of the shoulder lifting joint: {joint_angle_degrees[1]:.2f} degrees, "
+               f"Rotation of the shoulder rolling joint: {joint_angle_degrees[2]:.2f} degrees, "
+               f"Rotation of the elbow joint: {joint_angle_degrees[3]:.2f} degrees, "
+               f"Rotation of the forearm rolling joint: {joint_angle_degrees[4]:.2f} degrees, "
+               f"Rotation of the wrist flexing joint: {joint_angle_degrees[5]:.2f} degrees, "
+               f"Rotation of the wrist rolling joint: {joint_angle_degrees[6]:.2f} degrees, "
+               f"Rotational velocity of the panning shoulder: {joint_velocity_degrees[0]:.2f} degrees/s, "
+               f"Rotational velocity of the shoulder lifting joint: {joint_velocity_degrees[1]:.2f} degrees/s, "
+               f"Rotational velocity of the shoulder rolling joint: {joint_velocity_degrees[2]:.2f} degrees/s, "
+               f"Rotational velocity of the elbow joint: {joint_velocity_degrees[3]:.2f} degrees/s, "
+               f"Rotational velocity of the forearm rolling joint: {joint_velocity_degrees[4]:.2f} degrees/s, "
+               f"Rotational velocity of the wrist flexing joint: {joint_velocity_degrees[5]:.2f} degrees/s, "
+               f"Rotational velocity of the wrist rolling joint: {joint_velocity_degrees[6]:.2f} degrees/s, "
+               f"Fingertip coordinates (x, y, z): ({fingertip_coords[0]:.2f}, {fingertip_coords[1]:.2f}, {fingertip_coords[2]:.2f}), "
+               f"Object coordinates (x, y, z): ({object_coords[0]:.2f}, {object_coords[1]:.2f}, {object_coords[2]:.2f}), "
+               f"Goal coordinates (x, y, z): ({goal_coords[0]:.2f}, {goal_coords[1]:.2f}, {goal_coords[2]:.2f}).")
+        return res
+
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return "The goal is to move the target cylinder (object) to the goal position using the robot's end effector (fingertip)."
+
+    def describe_game(self):
+        return ("In the Pusher game, you control a multi-jointed robot arm to manipulate a target cylinder (object) "
+                "and place it in a goal position using the robot's fingertip (end effector). The robot has shoulder, elbow, "
+                "forearm, and wrist joints that you can control with torque values. The observation space includes joint angles, "
+                "angular velocities of joints, fingertip coordinates, object coordinates, and goal coordinates. The reward is "
+                "based on the distance between the fingertip and the object, the distance between the object and the goal, "
+                "and control penalties for large actions.")
+
+    def describe_action(self):
+        return ("Your next move: \n Please provide a list of 7 numerical values within the range [-2, 2], "
+                "representing the torques applied to the robot's joints (shoulder, elbow, forearm, and wrist).")
+
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for info in infos:
+            assert 'state' in info, "info should contain state information"
+
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = ("Take Action: Apply Torques - "
+                           "Shoulder Pan: {:.2f}, Shoulder Lift: {:.2f}, Shoulder Roll: {:.2f}, "
+                           "Elbow Flex: {:.2f}, Forearm Roll: {:.2f}, Wrist Flex: {:.2f}, Wrist Roll: {:.2f}"
+                           ).format(info['action'][0], info['action'][1], info['action'][2], info['action'][3],
+                                    info['action'][4], info['action'][5], info['action'][6])
+
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to {next_state_desc}")
+        return descriptions
+
envs/mujoco/reacher_policies.py
ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
envs/mujoco/reacher_translator.py
ADDED
@@ -0,0 +1,67 @@
+'''Reacher
+Action Space Box(-1.0, 1.0, (2,), float32)
+
+Observation Space Box(-inf, inf, (11,), float64)
+'''
+class BasicLevelTranslator:
+    def __init__(self):
+        pass
+
+    def translate(self, state):
+        (cos_angle_arm1, cos_angle_arm2, sin_angle_arm1, sin_angle_arm2,
+         target_x, target_y, angular_vel_arm1, angular_vel_arm2,
+         diff_x, diff_y, diff_z) = state
+
+        res = (f"Arm1 has a cosine angle of {cos_angle_arm1:.2f} and a sine angle of {sin_angle_arm1:.2f}. "\
+               f"Arm2 has a cosine angle of {cos_angle_arm2:.2f} and a sine angle of {sin_angle_arm2:.2f}. "\
+               f"Target position is at ({target_x:.2f}, {target_y:.2f}). "\
+               f"Arm1's angular velocity is {angular_vel_arm1:.2f} rad/s, and Arm2's is {angular_vel_arm2:.2f} rad/s. "\
+               f"Vector difference between fingertip and target is ({diff_x:.2f}, {diff_y:.2f}, {diff_z:.2f}).")
+        return res
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return "The goal is to control a two-jointed robot arm to move its end effector (fingertip) close to a randomly spawned target."
+
+    def describe_game(self):
+        return ("In the Reacher game, you control a two-jointed robot arm. The objective is to maneuver the arm's fingertip close to a target. "\
+                "The observation space includes the cosine and sine of the arm angles, coordinates of the target, angular velocities of the arms, "\
+                "and the vector from the fingertip to the target. The episode ends after 50 timesteps or if any state space value becomes non-finite. "\
+                "Rewards are given based on the distance of the fingertip from the target and the magnitude of actions applied.")
+
+    def describe_action(self):
+        return ("Your next move: \n Please provide two numerical values representing the torques applied at the two hinge joints. "\
+                "Each value should be within the range of [-1, 1].")
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = ("Take Action: Apply Torque at Joint 1: {:.2f}, "
+                           "Joint 2 Torque: {:.2f}"
+                           ).format(info['action'][0], info['action'][1])
+
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to {next_state_desc}")
+        return descriptions
envs/mujoco/swimmer_policies.py
ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
envs/mujoco/swimmer_translator.py
ADDED
@@ -0,0 +1,80 @@
+'''Swimmer
+Action Space Box(-1.0, 1.0, (2,), float32)
+
+Observation Space Box(-inf, inf, (8,), float64)
+'''
+
+class BasicLevelTranslator:
+    def translate(self, state):
+        res = (
+            f"Angle of the front tip: {state[0]:.2f} rad\n"
+            f"Angle of the first rotor: {state[1]:.2f} rad\n"
+            f"Angle of the second rotor: {state[2]:.2f} rad\n"
+            f"Velocity of the tip along the x-axis: {state[3]:.2f} m/s\n"
+            f"Velocity of the tip along the y-axis: {state[4]:.2f} m/s\n"
+            f"Angular velocity of front tip: {state[5]:.2f} rad/s\n"
+            f"Angular velocity of the first rotor: {state[6]:.2f} rad/s\n"
+            f"Angular velocity of the second rotor: {state[7]:.2f} rad/s"
+        )
+        return res
+
+class GameDescriber:
+
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return (
+            "The goal in the Swimmer environment is to move as fast as possible towards the right "\
+            "by applying torque to the rotors and utilizing fluid friction. The swimmer consists of "\
+            "three or more segments connected by rotors, and the objective is to achieve efficient "\
+            "swimming motion."
+        )
+
+    def describe_game(self):
+        return (
+            "In the Swimmer environment, you control a swimmer consisting of three or more segments "\
+            "connected by rotors. Your goal is to make the swimmer move as fast as possible to the right "\
+            "in a two-dimensional pool. You can achieve this by applying torques to the rotors and utilizing "\
+            "fluid friction. The environment provides observations of the swimmer's angles, velocities, "\
+            "and angular velocities."
+        )
+
+    def describe_action(self):
+        return (
+            "Your next move: \nPlease provide a list of two numerical values, each within the range of [-1, 1], "\
+            "representing the torques to be applied to the two rotors of the swimmer. These torques will help "\
+            "control the swimmer's movement and achieve efficient swimming."
+        )
+
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = (
+                "Torques Applied: "
+                f"Rotor 1: {info['action'][0]:.2f}, Rotor 2: {info['action'][1]:.2f}"
+            )
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(
+                f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
+            )
+        return descriptions
envs/mujoco/walker2d_policies.py
ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(6)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(6)]
envs/mujoco/walker2d_translator.py
ADDED
@@ -0,0 +1,86 @@
+
+'''Walker2d
+Action Space Box(-1.0, 1.0, (6,), float32)
+Observation Space Box(-inf, inf, (17,), float64)
+'''
+class BasicLevelTranslator:
+    def translate(self, state):
+        res = (
+            f"Z-coordinate of the top (height of walker): {state[0]:.2f} m\n"
+            f"Angle of the top: {state[1]:.2f} rad\n"
+            f"Angle of the thigh joint: {state[2]:.2f} rad\n"
+            f"Angle of the leg joint: {state[3]:.2f} rad\n"
+            f"Angle of the foot joint: {state[4]:.2f} rad\n"
+            f"Angle of the left thigh joint: {state[5]:.2f} rad\n"
+            f"Angle of the left leg joint: {state[6]:.2f} rad\n"
+            f"Angle of the left foot joint: {state[7]:.2f} rad\n"
+            f"Velocity of the x-coordinate of the top: {state[8]:.2f} m/s\n"
+            f"Velocity of the z-coordinate (height) of the top: {state[9]:.2f} m/s\n"
+            f"Angular velocity of the angle of the top: {state[10]:.2f} rad/s\n"
+            f"Angular velocity of the thigh hinge: {state[11]:.2f} rad/s\n"
+            f"Angular velocity of the leg hinge: {state[12]:.2f} rad/s\n"
+            f"Angular velocity of the foot hinge: {state[13]:.2f} rad/s\n"
+            f"Angular velocity of the thigh hinge (left): {state[14]:.2f} rad/s\n"
+            f"Angular velocity of the leg hinge (left): {state[15]:.2f} rad/s\n"
+            f"Angular velocity of the foot hinge (left): {state[16]:.2f} rad/s"
+        )
+        return res
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return (
+            "The goal in the Walker2D environment is to coordinate both sets of feet, legs, and thighs "
+            "to move in the forward (right) direction by applying torques to the six hinges connecting "
+            "the seven body parts. The objective is to make the robot walk forward."
+        )
+
+    def describe_game(self):
+        return (
+            "In the Walker2D environment, you control a two-dimensional two-legged walker with seven main body parts. "
+            "Your objective is to make the walker move forward by coordinating the torques applied to the six hinges "
+            "connecting the body parts. The environment provides observations of the walker's body parts and velocities, "
+            "including the torso, leg, and thigh angles, orientations, and velocities. The goal is to make the walker walk "
+            "forward in the positive x-direction."
+        )
+
+    def describe_action(self):
+        return (
+            "Your next move: \nPlease provide a list of six numerical values, each within the range of [-1, 1], "
+            "representing the torques to be applied at the six hinge joints of the walker. These torques will help "
+            "coordinate the walker's movements and make it walk in the desired direction."
+        )
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = (
+                "Torques Applied: "
+                f"Thigh: {info['action'][0]:.2f}, Leg: {info['action'][1]:.2f}, Foot: {info['action'][2]:.2f}, "
+                f"Left Thigh: {info['action'][3]:.2f}, Left Leg: {info['action'][4]:.2f}, Left Foot: {info['action'][5]:.2f}"
+            )
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(
+                f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
+            )
+        return descriptions
main_reflexion.py
CHANGED
@@ -292,7 +292,7 @@ if __name__ == "__main__":
     parser.add_argument(
         "--api_type",
         type=str,
-        default="
+        default="openai",
         choices=["azure", "openai"],
         help="choose api type, now support azure and openai"
     )
record_reflexion.csv
CHANGED
@@ -12,4 +12,11 @@ RepresentedBoxing-v0,1,expert,200.0
 RepresentedPong-v0,1,expert,200.0
 RepresentedMsPacman-v0,1,expert,10000.0
 RepresentedMontezumaRevenge-v0,1,expert,10000.0
-Ant-v4,1,expert,5000
+Ant-v4,1,expert,5000.2
+HalfCheetah-v4,1,expert,12138.8
+Hopper-v4,1,expert,3542.2
+Walker2d-v4,1,expert,5000.0
+Swimmer-v4,1,expert,44.4
+Reacher-v4,1,expert,-2.6
+Pusher-v4,1,expert,-52.3
+
test_atari.sh → shell/test_atari.sh
RENAMED
File without changes
shell/test_mujoco_ant.sh
CHANGED
@@ -1,6 +1,29 @@
+
+# Ant-v4
+
+# COT
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
 # exe
 python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
-
-python main_reflexion.py --env_name
-
-python main_reflexion.py --env_name
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_halfcheetah.sh
ADDED
@@ -0,0 +1,51 @@
+
+# HalfCheetah-v4
+# Naive Actor
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 5 --num_trails 1
+
+# COT
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 5 --num_trails 1
+
+# self consistency
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 5 --num_trails 1
+
+# self-ask
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 5 --num_trails 1
+
+# SPP
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 5 --num_trails 1
+
+# REFLEXION
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 2 --num_trails 1 --distiller reflect_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 4 --num_trails 1 --distiller reflect_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 2 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 4 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_hopper.sh
ADDED
@@ -0,0 +1,28 @@
+# Hopper-v4
+
+# COT
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_invertedDoublePendulum.sh
ADDED
@@ -0,0 +1,27 @@
+# InvertedDoublePendulum-v4
+
+# COT
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+# REFLEXION
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_invertedPendulum.sh
ADDED
@@ -0,0 +1,25 @@
+# InvertedPendulum-v4
+
+# COT
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+
+
+# exe
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
shell/test_mujoco_pusher.sh
ADDED
@@ -0,0 +1,27 @@
+# Pusher-v4
+
+# COT
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+# REFLEXION
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_reacher.sh
ADDED
@@ -0,0 +1,27 @@
+# Reacher-v4
+
+# COT
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+# REFLEXION
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_swimmer.sh
ADDED
@@ -0,0 +1,27 @@
+# Swimmer-v4
+
+# COT
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+# REFLEXION
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_walker2d.sh
ADDED
@@ -0,0 +1,28 @@
+# Walker2d-v4
+
+# COT
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
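The added MuJoCo scripts repeat one command shape per environment, varying only the environment name, the translator prefix, the decider, the prompt level, the trial count, and an optional distiller. As a rough sketch only (the `build_cmd` helper below is hypothetical and not part of this commit), the same command lines could be assembled from those parameters. It only prints commands and never runs `main_reflexion.py`; the flag names are copied from the scripts in this diff.

```shell
# Hypothetical helper, not part of the commit: prints one main_reflexion.py
# command line in the shape used by the shell/test_mujoco_*.sh scripts.
# Args: ENV_NAME PREFIX DECIDER PROMPT_LEVEL NUM_TRAILS [DISTILLER]
build_cmd() {
  cmd="python main_reflexion.py --env_name ${1}"
  cmd="${cmd} --init_summarizer ${2}_init_translator"
  cmd="${cmd} --curr_summarizer ${2}_basic_translator"
  cmd="${cmd} --decider ${3} --prompt_level ${4} --num_trails ${5}"
  # The distiller flag is appended only when a sixth argument is given.
  if [ -n "${6:-}" ]; then
    cmd="${cmd} --distiller ${6}"
  fi
  printf '%s\n' "${cmd}"
}

# Dry run: print the two COT commands for Walker2d-v4.
build_cmd Walker2d-v4 walker2d cot_actor 1 1
build_cmd Walker2d-v4 walker2d cot_actor 3 5 traj_distiller
```

Flags that appear only on some lines in the diff, such as `--api_type openai` and `--prompt_path`, are deliberately left out of this sketch and would need to be appended by hand.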
test_reflexion.sh → shell/test_reflexion.sh
RENAMED
File without changes
|