Jarvis committed on
Commit
af14952
·
2 Parent(s): c368640 c258b0d

Merge branch 'master' into gradio

README.md CHANGED
@@ -116,6 +116,13 @@ If everything runs smoothly, you have successfully imported the Atari ROMs and s
116
 
117
  Reference: [StackOverflow answer](https://stackoverflow.com/a/68143504/38626)
118
 
119
  ### Visualization with Gradio
120
 
121
  > Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio’s built-in sharing features. No JavaScript, CSS, or web hosting experience needed! [from https://www.gradio.app/guides/quickstart]
@@ -134,4 +141,4 @@ And then run the following Python file in the root directory:
134
  python gradio_reflexion.py
135
  ```
136
 
137
- The visualization web application will open in a browser on http://server-ip-address:7860 if running from a file. If you are running within a notebook, the demo will appear embedded within the notebook.
 
116
 
117
  Reference: [StackOverflow answer](https://stackoverflow.com/a/68143504/38626)
118
 
119
+
120
+ ### Support for new environments
121
+ We also support new environments that follow the Gym format. To add one:
122
+ 1. Translate your Gym env into a TextGym env: create `<your_env>_translator.py` and `<your_env>_policies.py`, put them in `./envs/`, and register your env in `./envs/__init__.py`.
123
+ 2. Add the PPO performance (best or expert) of your env to `./record_reflexion.csv`.
124
+ 3. Test it with a shell command (we recommend testing COT, SPP, self-reflexion, and exe at the L1 and L3 levels). Example test scripts can be found in `./shell`.
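As a rough illustration of step 1, the sketch below shows what a translator and policy module for a new environment might look like, modeled on the existing files under `./envs/`. The environment (`myenv`, with a two-value observation and a single action in [-1, 1]) and all of its field names are hypothetical, and the real interface may require more methods than are shown here.

```python
import random


class BasicLevelTranslator:
    """Turns one raw observation into a sentence (hypothetical 2-value state)."""

    def translate(self, state):
        position, velocity = state[:2]
        return (
            f"The position is {position:.2f} meters. "
            f"The velocity is {velocity:.2f} m/s."
        )


class GameDescriber:
    """Static task text; `args` mirrors the fields read by the existing describers."""

    def __init__(self, args):
        self.is_only_local_obs = args.is_only_local_obs == 1
        self.max_episode_len = args.max_episode_len

    def describe_goal(self):
        return "The goal is to reach position 0 as quickly as possible."

    def describe_action(self):
        return (
            "Your next move: \n Please provide one numerical value "
            "within the range of [-1,1]."
        )


class BasicStateSequenceTranslator(BasicLevelTranslator):
    """Describes either the current state or a list of past transitions."""

    def translate(self, infos, is_current=False):
        if is_current:
            return BasicLevelTranslator().translate(infos[-1]["state"])
        descriptions = []
        for info in infos:
            state_desc = BasicLevelTranslator().translate(info["state"])
            next_desc = BasicLevelTranslator().translate(info["next_state"])
            descriptions.append(
                f"{state_desc}\nTake Action: {info['action'][0]:.2f}\n"
                f"Result: Reward of {info['reward']:.2f}\nTransit to {next_desc}"
            )
        return descriptions


def real_random_policy(state, pre_action=None):
    """Baseline policy: one uniformly random action in [-1, 1]."""
    return [random.uniform(-1.0, 1.0)]


# Entries of this shape would go into ./envs/__init__.py for the hypothetical env.
REGISTRY = {
    "myenv_init_translator": GameDescriber,
    "myenv_basic_translator": BasicStateSequenceTranslator,
    "myenv_policies": [real_random_policy],
}
```

The three registry keys follow the naming pattern used for the MuJoCo environments in `./envs/__init__.py`.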
125
+
126
  ### Visualization with Gradio
127
 
128
  > Gradio is an open-source Python package that allows you to quickly build a demo or web application for your machine learning model, API, or any arbitrary Python function. You can then share a link to your demo or web application in just a few seconds using Gradio’s built-in sharing features. No JavaScript, CSS, or web hosting experience needed! [from https://www.gradio.app/guides/quickstart]
 
141
  python gradio_reflexion.py
142
  ```
143
 
144
+ The visualization web application will open in a browser on http://server-ip-address:7860 if running from a file. If you are running within a notebook, the demo will appear embedded within the notebook.
envs/__init__.py CHANGED
@@ -18,7 +18,6 @@ from .atari import mspacman_policies, mspacman_translator
18
  from .atari import montezumarevenge_policies, montezumarevenge_translator
19
  register_environments()
20
 
21
- from .mujoco import ant_translator, ant_policies
22
 
23
  REGISTRY = {}
24
  REGISTRY["sampling_wrapper"] = SettableStateEnv
@@ -139,6 +138,99 @@ REGISTRY["RepresentedMontezumaRevenge_basic_policies"] = [
139
  montezumarevenge_policies.dedicated_18_policy,
140
  ]
141
 
142
  REGISTRY["ant_init_translator"] = ant_translator.GameDescriber
143
  REGISTRY["ant_basic_translator"] = ant_translator.BasicStateSequenceTranslator
144
- REGISTRY["ant_policies"] = [ant_policies.pseudo_random_policy, ant_policies.real_random_policy]
 
18
  from .atari import montezumarevenge_policies, montezumarevenge_translator
19
  register_environments()
20
 
 
21
 
22
  REGISTRY = {}
23
  REGISTRY["sampling_wrapper"] = SettableStateEnv
 
138
  montezumarevenge_policies.dedicated_18_policy,
139
  ]
140
 
141
+ REGISTRY["RepresentedMsPacman_init_translator"] = mspacman_translator.GameDescriber
142
+ REGISTRY["RepresentedMsPacman_basic_translator"] = mspacman_translator.BasicStateSequenceTranslator
143
+ REGISTRY["RepresentedMsPacman_basic_policies"] = [
144
+ mspacman_policies.real_random_policy,
145
+ mspacman_policies.pseudo_random_policy,
146
+ mspacman_policies.dedicated_1_policy,
147
+ mspacman_policies.dedicated_2_policy,
148
+ mspacman_policies.dedicated_3_policy,
149
+ mspacman_policies.dedicated_4_policy,
150
+ mspacman_policies.dedicated_5_policy,
151
+ mspacman_policies.dedicated_6_policy,
152
+ mspacman_policies.dedicated_7_policy,
153
+ mspacman_policies.dedicated_8_policy,
154
+ mspacman_policies.dedicated_9_policy,
155
+ ]
156
+
157
+ REGISTRY["RepresentedMontezumaRevenge_init_translator"] = montezumarevenge_translator.GameDescriber
158
+ REGISTRY["RepresentedMontezumaRevenge_basic_translator"] = montezumarevenge_translator.BasicStateSequenceTranslator
159
+ REGISTRY["RepresentedMontezumaRevenge_basic_policies"] = [
160
+ montezumarevenge_policies.real_random_policy,
161
+ montezumarevenge_policies.pseudo_random_policy,
162
+ montezumarevenge_policies.dedicated_1_policy,
163
+ montezumarevenge_policies.dedicated_2_policy,
164
+ montezumarevenge_policies.dedicated_3_policy,
165
+ montezumarevenge_policies.dedicated_4_policy,
166
+ montezumarevenge_policies.dedicated_5_policy,
167
+ montezumarevenge_policies.dedicated_6_policy,
168
+ montezumarevenge_policies.dedicated_7_policy,
169
+ montezumarevenge_policies.dedicated_8_policy,
170
+ montezumarevenge_policies.dedicated_9_policy,
171
+ montezumarevenge_policies.dedicated_10_policy,
172
+ montezumarevenge_policies.dedicated_11_policy,
173
+ montezumarevenge_policies.dedicated_12_policy,
174
+ montezumarevenge_policies.dedicated_13_policy,
175
+ montezumarevenge_policies.dedicated_14_policy,
176
+ montezumarevenge_policies.dedicated_15_policy,
177
+ montezumarevenge_policies.dedicated_16_policy,
178
+ montezumarevenge_policies.dedicated_17_policy,
179
+ montezumarevenge_policies.dedicated_18_policy,
180
+ ]
181
+
182
+ ## For mujoco env
183
+
184
+
185
+ from .mujoco import invertedPendulum_translator, invertedPendulum_policies
186
+ from .mujoco import invertedDoublePendulum_translator, invertedDoublePendulum_policies
187
+
188
+ from .mujoco import swimmer_translator, swimmer_policies
189
+
190
+ from .mujoco import reacher_translator, reacher_policies
191
+
192
+ from .mujoco import hopper_translator, hopper_policies
193
+ from .mujoco import walker2d_translator, walker2d_policies
194
+
195
+
196
+
197
+
198
+
199
+ REGISTRY["invertedPendulum_init_translator"] = invertedPendulum_translator.GameDescriber
200
+ REGISTRY["invertedPendulum_basic_translator"] = invertedPendulum_translator.BasicStateSequenceTranslator
201
+ REGISTRY["invertedPendulum_policies"] = [invertedPendulum_policies.pseudo_random_policy, invertedPendulum_policies.real_random_policy]
202
+ REGISTRY["invertedDoublePendulum_init_translator"] = invertedDoublePendulum_translator.GameDescriber
203
+ REGISTRY["invertedDoublePendulum_basic_translator"] = invertedDoublePendulum_translator.BasicStateSequenceTranslator
204
+ REGISTRY["invertedDoublePendulum_policies"] = [invertedDoublePendulum_policies.pseudo_random_policy, invertedDoublePendulum_policies.real_random_policy]
205
+
206
+
207
+ REGISTRY["swimmer_init_translator"] = swimmer_translator.GameDescriber
208
+ REGISTRY["swimmer_basic_translator"] = swimmer_translator.BasicStateSequenceTranslator
209
+ REGISTRY["swimmer_policies"] = [swimmer_policies.pseudo_random_policy, swimmer_policies.real_random_policy]
210
+
211
+ REGISTRY["reacher_init_translator"] = reacher_translator.GameDescriber
212
+ REGISTRY["reacher_basic_translator"] = reacher_translator.BasicStateSequenceTranslator
213
+ REGISTRY["reacher_policies"] = [reacher_policies.pseudo_random_policy, reacher_policies.real_random_policy]
214
+
215
+ REGISTRY["hopper_init_translator"] = hopper_translator.GameDescriber
216
+ REGISTRY["hopper_basic_translator"] = hopper_translator.BasicStateSequenceTranslator
217
+ REGISTRY["hopper_policies"] = [hopper_policies.pseudo_random_policy, hopper_policies.real_random_policy]
218
+ REGISTRY["walker2d_init_translator"] = walker2d_translator.GameDescriber
219
+ REGISTRY["walker2d_basic_translator"] = walker2d_translator.BasicStateSequenceTranslator
220
+ REGISTRY["walker2d_policies"] = [walker2d_policies.pseudo_random_policy, walker2d_policies.real_random_policy]
221
+
222
+
223
+ from .mujoco import halfcheetah_translator, halfcheetah_policies
224
+ REGISTRY["halfcheetah_init_translator"] = halfcheetah_translator.GameDescriber
225
+ REGISTRY["halfcheetah_basic_translator"] = halfcheetah_translator.BasicStateSequenceTranslator
226
+ REGISTRY["halfcheetah_policies"] = [halfcheetah_policies.pseudo_random_policy, halfcheetah_policies.real_random_policy]
227
+
228
+ from .mujoco import pusher_translator, pusher_policies
229
+ REGISTRY["pusher_init_translator"] = pusher_translator.GameDescriber
230
+ REGISTRY["pusher_basic_translator"] = pusher_translator.BasicStateSequenceTranslator
231
+ REGISTRY["pusher_policies"] = [pusher_policies.pseudo_random_policy, pusher_policies.real_random_policy]
232
+
233
+ from .mujoco import ant_translator, ant_policies
234
  REGISTRY["ant_init_translator"] = ant_translator.GameDescriber
235
  REGISTRY["ant_basic_translator"] = ant_translator.BasicStateSequenceTranslator
236
+ REGISTRY["ant_policies"] = [ant_policies.pseudo_random_policy, ant_policies.real_random_policy]
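The MuJoCo registrations above all follow the same three-key pattern (`<env>_init_translator`, `<env>_basic_translator`, `<env>_policies`). A minimal sketch of how that pattern could be generated by one helper is shown below. `register_env` and the stand-in module objects are hypothetical and are not part of this commit.

```python
from types import SimpleNamespace


def register_env(registry, name, translator, policies):
    """Add the three keys used for each MuJoCo env in envs/__init__.py."""
    registry[f"{name}_init_translator"] = translator.GameDescriber
    registry[f"{name}_basic_translator"] = translator.BasicStateSequenceTranslator
    registry[f"{name}_policies"] = [
        policies.pseudo_random_policy,
        policies.real_random_policy,
    ]


# Stand-ins for `from .mujoco import hopper_translator, hopper_policies`.
class GameDescriber:
    """Placeholder for the real describer class."""


class BasicStateSequenceTranslator:
    """Placeholder for the real sequence translator class."""


hopper_translator = SimpleNamespace(
    GameDescriber=GameDescriber,
    BasicStateSequenceTranslator=BasicStateSequenceTranslator,
)
hopper_policies = SimpleNamespace(
    pseudo_random_policy=lambda state, pre_action: [0.0, 0.0, 0.0],
    real_random_policy=lambda state, pre_action=1: [0.0, 0.0, 0.0],
)

REGISTRY = {}
register_env(REGISTRY, "hopper", hopper_translator, hopper_policies)
print(sorted(REGISTRY))
```

A helper like this keeps the key names consistent across environments, at the cost of hiding the explicit assignments that the current file spells out.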
envs/mujoco/ant_translator.py CHANGED
@@ -1,3 +1,8 @@
1
  class BasicLevelTranslator:
2
  def __init__(self):
3
  pass
 
1
+ '''Ant
2
+ Action Space Box(-1.0, 1.0, (8,), float32)
3
+ Observation Space Box(-inf, inf, (27,), float64)
4
+ '''
5
+
6
  class BasicLevelTranslator:
7
  def __init__(self):
8
  pass
envs/mujoco/halfcheetah_policies.py ADDED
@@ -0,0 +1,15 @@
1
+ import numpy as np
2
+ import random
3
+
4
+ def pseudo_random_policy(state, pre_action):
5
+ def get_description():
6
+ return "Select action randomly"
7
+ pseudo_random_policy.description = get_description()
8
+ return [2 * random.random() - 1 for i in range(6)]
9
+
10
+
11
+ def real_random_policy(state, pre_action=1):
12
+ def get_description():
13
+ return "Select action with a random policy"
14
+ real_random_policy.description = get_description()
15
+ return [2 * random.random() - 1 for i in range(6)]
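The per-environment policy files in this commit differ only in the action dimension and bounds. A minimal sketch of a factory that builds such a policy from those parameters follows. `make_random_policy` is a hypothetical helper, not something defined in this commit.

```python
import random


def make_random_policy(low, high, dim):
    """Build a policy that samples `dim` values uniformly from [low, high]."""

    def policy(state, pre_action=None):
        return [random.uniform(low, high) for _ in range(dim)]

    # Same attribute the policy functions in this commit attach to themselves.
    policy.description = "Select action with a random policy"
    return policy


# HalfCheetah: six torques, each in [-1, 1].
halfcheetah_random_policy = make_random_policy(-1.0, 1.0, 6)
action = halfcheetah_random_policy(state=None)
print(len(action), all(-1.0 <= a <= 1.0 for a in action))
```

Setting `description` once at construction time also makes it available before the policy has been called for the first time.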
envs/mujoco/halfcheetah_translator.py ADDED
@@ -0,0 +1,95 @@
1
+ class BasicLevelTranslator:
2
+ def __init__(self):
3
+ pass
4
+
5
+ def translate(self, state):
6
+ # HalfCheetah-v4 observation: 8 position values followed by 9 velocity values.
+ (front_tip_z, front_tip_angle, back_thigh_angle, back_shin_angle,
+ back_foot_angle, front_thigh_angle, front_shin_angle, front_foot_angle,
+ tip_velocity_x, tip_velocity_z, front_tip_angular_velocity,
+ back_thigh_angular_velocity, back_shin_angular_velocity,
+ back_foot_angular_velocity, front_thigh_angular_velocity,
+ front_shin_angular_velocity, front_foot_angular_velocity) = state[:17]
+
+ res = (
+ f"The front tip is at a z-coordinate of {front_tip_z:.2f} meters. "
+ f"The angle of the front tip is {front_tip_angle:.2f} radians. "
+ f"The angles of the back thigh, back shin, and back foot are {back_thigh_angle:.2f}, {back_shin_angle:.2f}, and {back_foot_angle:.2f} radians. "
+ f"The angles of the front thigh, front shin, and front foot are {front_thigh_angle:.2f}, {front_shin_angle:.2f}, and {front_foot_angle:.2f} radians. "
+ f"The tip has velocity along the x-axis of {tip_velocity_x:.2f} m/s. "
+ f"The tip has velocity along the z-axis of {tip_velocity_z:.2f} m/s. "
+ f"The angular velocity of the front tip is {front_tip_angular_velocity:.2f} radians/s. "
+ f"The angular velocities of the back thigh, back shin, and back foot are {back_thigh_angular_velocity:.2f}, {back_shin_angular_velocity:.2f}, and {back_foot_angular_velocity:.2f} radians/s. "
+ f"The angular velocities of the front thigh, front shin, and front foot are {front_thigh_angular_velocity:.2f}, {front_shin_angular_velocity:.2f}, and {front_foot_angular_velocity:.2f} radians/s."
28
+ )
29
+ return res
30
+
31
+ class GameDescriber:
32
+ def __init__(self, args):
33
+ self.is_only_local_obs = args.is_only_local_obs == 1
34
+ self.max_episode_len = args.max_episode_len
35
+ self.action_desc_dict = {
36
+ }
37
+ self.reward_desc_dict = {
38
+ }
39
+
40
+ def translate_terminate_state(self, state, episode_len, max_episode_len):
41
+ return ""
42
+
43
+ def translate_potential_next_state(self, state, action):
44
+ return ""
45
+
46
+ def describe_goal(self):
47
+ return "The goal is to make the Half-Cheetah run forward (right) as fast as possible."
48
+
49
+ def describe_game(self):
50
+ return (
51
+ "In the Half-Cheetah game, you control a 2-dimensional robot with 9 links and 8 joints. "
52
+ "The goal is to apply torque to the joints to make the cheetah run forward (right) as fast as possible. "
53
+ "You can control the back thigh, back shin, and back foot rotors for the back legs, and the front thigh, "
54
+ "front shin, and front foot rotors for the front legs. The episode ends after 1000 timesteps. "
55
+ "Your reward is based on how much forward progress you make and how much control effort you apply."
56
+ )
57
+
58
+ def describe_action(self):
59
+ return (
60
+ "Your next move: \n"
61
+ "Please select six numerical values, each one within the range of [-1,1], "
62
+ "which represents the torque being applied to the back thigh rotor, "
63
+ "back shin rotor, back foot rotor, front thigh rotor, front shin rotor, "
64
+ "and front foot rotor respectively."
65
+ )
66
+
67
+ class BasicStateSequenceTranslator(BasicLevelTranslator):
68
+ def translate(self, infos, is_current=False):
69
+ descriptions = []
70
+ if is_current:
71
+ state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
72
+ return state_desc
73
+ for i, info in enumerate(infos):
74
+ assert 'state' in info, "info should contain state information"
75
+
76
+ state_desc = BasicLevelTranslator().translate(info['state'])
77
+ action_desc = (
78
+ "Take Action: "
79
+ "Apply Back Thigh Torque: {:.2f}, "
80
+ "Apply Back Shin Torque: {:.2f}, "
81
+ "Apply Back Foot Torque: {:.2f}, "
82
+ "Apply Front Thigh Torque: {:.2f}, "
83
+ "Apply Front Shin Torque: {:.2f}, "
84
+ "Apply Front Foot Torque: {:.2f}"
85
+ ).format(
86
+ info['action'][0], info['action'][1], info['action'][2],
87
+ info['action'][3], info['action'][4], info['action'][5]
88
+ )
89
+
90
+ reward_desc = f"Result: Forward Reward of {info['forward_reward']:.2f}, "
91
+ ctrl_cost_desc = f"Control Cost of {info['ctrl_cost']:.2f}, "
92
+ total_reward_desc = f"Total Reward of {info['reward']:.2f}, "
93
+ next_state_desc = BasicLevelTranslator().translate(info['next_state'])
94
+ descriptions.append(f"{state_desc}\n {action_desc} \n {reward_desc} {ctrl_cost_desc} {total_reward_desc} \n Transit to {next_state_desc}")
95
+ return descriptions
envs/mujoco/hopper_policies.py ADDED
@@ -0,0 +1,15 @@
1
+ import numpy as np
2
+ import random
3
+
4
+ def pseudo_random_policy(state, pre_action):
5
+ def get_description():
6
+ return "Select action randomly"
7
+ pseudo_random_policy.description = get_description()
8
+ return [2 * random.random() - 1 for i in range(3)]
9
+
10
+
11
+ def real_random_policy(state, pre_action=1):
12
+ def get_description():
13
+ return "Select action with a random policy"
14
+ real_random_policy.description = get_description()
15
+ return [2 * random.random() - 1 for i in range(3)]
envs/mujoco/hopper_translator.py ADDED
@@ -0,0 +1,84 @@
1
+ '''
2
+ Action Space Box(-1.0, 1.0, (3,), float32)
3
+ Observation Space Box(-inf, inf, (11,), float64)
4
+ '''
5
+
6
+ class BasicLevelTranslator:
7
+ def __init__(self):
8
+ pass
9
+
10
+ def translate(self, state):
11
+ (top_z, top_angle, thigh_angle, leg_angle, foot_angle,
12
+ top_x_velocity, top_z_velocity, top_angular_velocity,
13
+ thigh_angular_velocity, leg_angular_velocity, foot_angular_velocity) = state[:11]
14
+
15
+ res = (
16
+ f"The top is at a z-coordinate of {top_z:.2f} meters. "
17
+ f"The angle of the top is {top_angle:.2f} radians. "
18
+ f"The angle of the thigh joint is {thigh_angle:.2f} radians. "
19
+ f"The angle of the leg joint is {leg_angle:.2f} radians. "
20
+ f"The angle of the foot joint is {foot_angle:.2f} radians. "
21
+ f"The x-coordinate velocity of the top is {top_x_velocity:.2f} m/s. "
22
+ f"The z-coordinate (height) velocity of the top is {top_z_velocity:.2f} m/s. "
23
+ f"The angular velocity of the top is {top_angular_velocity:.2f} radians/s. "
24
+ f"The angular velocity of the thigh hinge is {thigh_angular_velocity:.2f} radians/s. "
25
+ f"The angular velocity of the leg hinge is {leg_angular_velocity:.2f} radians/s. "
26
+ f"The angular velocity of the foot hinge is {foot_angular_velocity:.2f} radians/s."
27
+ )
28
+ return res
29
+
30
+ class GameDescriber:
31
+ def __init__(self, args):
32
+ self.is_only_local_obs = args.is_only_local_obs == 1
33
+ self.max_episode_len = args.max_episode_len
34
+ self.action_desc_dict = {}
35
+ self.reward_desc_dict = {}
36
+
37
+ def translate_terminate_state(self, state, episode_len, max_episode_len):
38
+ return ""
39
+
40
+ def translate_potential_next_state(self, state, action):
41
+ return ""
42
+
43
+ def describe_goal(self):
44
+ return (
45
+ "The goal in the Hopper environment is to make the one-legged hopper move forward (right) "
46
+ "by applying torques to the thigh, leg, and foot joints."
47
+ )
48
+
49
+ def describe_game(self):
50
+ return (
51
+ "In the Hopper environment, you control a one-legged hopper consisting of a torso, thigh, leg, "
52
+ "and a foot on which it rests. Your objective is to apply torques to the thigh, leg, and foot joints "
53
+ "to make the hopper perform hops in the positive x-direction. The environment provides observations "
54
+ "of the hopper's body parts and velocities, including the height, angles of joints, and angular velocities. "
55
+ "The episode ends when certain termination conditions are met."
56
+ )
57
+
58
+ def describe_action(self):
59
+ return (
60
+ "Your next move: \n Please provide a list of three numerical values, each within the range of [-1,1], "
61
+ "representing the torques to be applied at the thigh, leg, and foot joints of the hopper."
62
+ )
63
+
64
+
65
+ class BasicStateSequenceTranslator(BasicLevelTranslator):
66
+ def translate(self, infos, is_current=False):
67
+ descriptions = []
68
+ if is_current:
69
+ state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
70
+ return state_desc
71
+ for i, info in enumerate(infos):
72
+ assert 'state' in info, "info should contain state information"
73
+
74
+ state_desc = BasicLevelTranslator().translate(info['state'])
75
+ action_desc = (
76
+ f"Take Action: Apply Thigh Torque: {info['action'][0]:.2f}, "
77
+ f"Leg Torque: {info['action'][1]:.2f}, Foot Torque: {info['action'][2]:.2f}"
78
+ )
79
+
80
+ reward_desc = f"Result: Reward of {info['reward']:.2f}, "
81
+ next_state_desc = BasicLevelTranslator().translate(info['next_state'])
82
+ descriptions.append(f"{state_desc}\n {action_desc} \n {reward_desc} \n Transit to {next_state_desc}")
83
+ return descriptions
84
+
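The Hopper translator spells out one f-string per observation entry. An equivalent table-driven sketch is shown below. The label table reuses the wording from the translator, while the function name `describe_hopper_state` and the length check are illustrative additions rather than part of this commit.

```python
# (label, unit) pairs in the order of the 11-value Hopper observation.
HOPPER_FIELDS = [
    ("z-coordinate of the top", "meters"),
    ("angle of the top", "radians"),
    ("angle of the thigh joint", "radians"),
    ("angle of the leg joint", "radians"),
    ("angle of the foot joint", "radians"),
    ("x-coordinate velocity of the top", "m/s"),
    ("z-coordinate (height) velocity of the top", "m/s"),
    ("angular velocity of the top", "radians/s"),
    ("angular velocity of the thigh hinge", "radians/s"),
    ("angular velocity of the leg hinge", "radians/s"),
    ("angular velocity of the foot hinge", "radians/s"),
]


def describe_hopper_state(state):
    """Render each observation entry as 'The <label> is <value> <unit>.'"""
    if len(state) < len(HOPPER_FIELDS):
        raise ValueError(
            f"expected at least {len(HOPPER_FIELDS)} values, got {len(state)}"
        )
    return " ".join(
        f"The {label} is {value:.2f} {unit}."
        for (label, unit), value in zip(HOPPER_FIELDS, state)
    )


print(describe_hopper_state([1.25] + [0.0] * 10))
```

Keeping labels and units in one table makes it harder for a label to drift out of sync with the index it describes.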
envs/mujoco/invertedDoublePendulum_policies.py ADDED
@@ -0,0 +1,15 @@
1
+ import numpy as np
2
+ import random
3
+
4
+ def pseudo_random_policy(state, pre_action):
5
+ def get_description():
6
+ return "Select action randomly"
7
+ pseudo_random_policy.description = get_description()
8
+ return [2 * random.random() - 1 for i in range(1)]
9
+
10
+
11
+ def real_random_policy(state, pre_action=1):
12
+ def get_description():
13
+ return "Select action with a random policy"
14
+ real_random_policy.description = get_description()
15
+ return [2 * random.random() - 1 for i in range(1)]
envs/mujoco/invertedDoublePendulum_translator.py ADDED
@@ -0,0 +1,68 @@
1
+ '''InvertedDoublePendulum-v4
2
+ Action Space Box(-1.0, 1.0, (1,), float32)
3
+ Observation Space Box(-inf, inf, (11,), float64)
4
+ '''
5
+
6
+ class BasicLevelTranslator:
7
+ def translate(self, state):
8
+ res = (
9
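+ f"Position of the cart: {state[0]:.2f} m\n"
+ f"Sine of the angle between cart and first pole: {state[1]:.2f}\n"
+ f"Sine of the angle between two poles: {state[2]:.2f}\n"
+ f"Cosine of the angle between cart and first pole: {state[3]:.2f}\n"
+ f"Cosine of the angle between two poles: {state[4]:.2f}\n"
+ f"Velocity of the cart: {state[5]:.2f} m/s\n"
+ f"Angular velocity of angle between cart and first pole: {state[6]:.2f} rad/s\n"
+ f"Angular velocity of angle between two poles: {state[7]:.2f} rad/s\n"
+ f"Constraint Force 1: {state[8]:.2f} N\n"
+ f"Constraint Force 2: {state[9]:.2f} N\n"
+ f"Constraint Force 3: {state[10]:.2f} N"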
13
+ )
14
+ return res
15
+
16
+ class GameDescriber:
17
+ def __init__(self, args):
18
+ self.is_only_local_obs = args.is_only_local_obs == 1
19
+ self.max_episode_len = args.max_episode_len
20
+ self.action_desc_dict = {
21
+ 0: "Apply a force in the range [-1, 1] to the cart to control its motion.",
22
+ }
23
+ self.reward_desc_dict = {}
24
+
25
+ def translate_terminate_state(self, state, episode_len, max_episode_len):
26
+ return ""
27
+
28
+ def translate_potential_next_state(self, state, action):
29
+ return ""
30
+
31
+ def describe_goal(self):
32
+ return (
33
+ "The goal in the InvertedDoublePendulum environment is to balance the two poles on top of the cart "\
34
+ "by applying continuous forces to the cart, keeping them upright."
35
+ )
36
+
37
+ def describe_game(self):
38
+ return (
39
+ "In the InvertedDoublePendulum environment, you control a cart that can move linearly with two poles "\
40
+ "attached to it in series. Your objective is to balance the second pole on top of the first by applying forces "\
41
+ "to the cart in a way that keeps both poles upright. "\
42
+ "The environment provides observations of the cart's position, the sines and cosines of the pole angles, velocities, "\
43
+ "angular velocities, and constraint forces. The goal is to maintain balance as long as possible."
44
+ )
45
+
46
+ def describe_action(self):
47
+ return (
48
+ "Your next move: \n Please provide a numerical value for the force to be applied to the cart. "\
49
+ "This value should be within the range of [-1, 1], where a positive value indicates applying force "\
50
+ "in the right direction, and a negative value indicates applying force in the left direction."
51
+ )
52
+
53
+ class BasicStateSequenceTranslator(BasicLevelTranslator):
54
+ def translate(self, infos, is_current=False):
55
+ descriptions = []
56
+ if is_current:
57
+ state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
58
+ return state_desc
59
+ for i, info in enumerate(infos):
60
+ assert 'state' in info, "info should contain state information"
61
+ state_desc = BasicLevelTranslator().translate(info['state'])
62
+ action_desc = f"Applied Force on Cart: {info['action'][0]:.2f}"
63
+ reward_desc = f"Result: Reward of {info['reward']:.2f}"
64
+ next_state_desc = BasicLevelTranslator().translate(info['next_state'])
65
+ descriptions.append(
66
+ f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
67
+ )
68
+ return descriptions
envs/mujoco/invertedPendulum_policies.py ADDED
@@ -0,0 +1,15 @@
1
+ import numpy as np
2
+ import random
3
+
4
+ def pseudo_random_policy(state, pre_action):
5
+ def get_description():
6
+ return "Select action randomly"
7
+ pseudo_random_policy.description = get_description()
8
+ return [6 * random.random() - 3 for i in range(1)]
9
+
10
+
11
+ def real_random_policy(state, pre_action=1):
12
+ def get_description():
13
+ return "Select action with a random policy"
14
+ real_random_policy.description = get_description()
15
+ return [6 * random.random() - 3 for i in range(1)]
envs/mujoco/invertedPendulum_translator.py ADDED
@@ -0,0 +1,73 @@
1
+ '''InvertedPendulum-v4
2
+ Action Space Box(-3.0, 3.0, (1,), float32)
3
+ Observation Space Box(-inf, inf, (4,), float64)
4
+ '''
5
+
6
+ class BasicLevelTranslator:
7
+ def translate(self, state):
8
+ res = (
9
+ f"Position of the cart: {state[0]:.2f} m\n"
+ f"Vertical angle of the pole: {state[1]:.2f} rad\n"
+ f"Linear velocity of the cart: {state[2]:.2f} m/s\n"
+ f"Angular velocity of the pole: {state[3]:.2f} rad/s"
20
+ )
21
+ return res
22
+
23
+ class GameDescriber:
24
+ def __init__(self, args):
25
+ self.is_only_local_obs = args.is_only_local_obs == 1
26
+ self.max_episode_len = args.max_episode_len
27
+ self.action_desc_dict = {
28
+ 0: "Apply a force in the range [-3, 3] to the cart to control its motion.",
29
+ }
30
+ self.reward_desc_dict = {}
31
+
32
+ def translate_terminate_state(self, state, episode_len, max_episode_len):
33
+ return ""
34
+
35
+ def translate_potential_next_state(self, state, action):
36
+ return ""
37
+
38
+ def describe_goal(self):
39
+ return (
40
+ "The goal in the InvertedPendulum environment is to balance the pole "\
41
+ "on top of the cart by applying continuous forces on the cart."
42
+ )
43
+
44
+ def describe_game(self):
45
+ return (
46
+ "In the InvertedPendulum environment, you control a system with a cart and one pole. "\
47
+ "Your objective is to balance the pole on top of the cart by applying continuous forces "\
48
+ "to the cart. The environment provides observations of the cart's position, the angle of the pole, "\
49
+ "and their linear and angular velocities. The episode ends when certain termination conditions are met."
50
+ )
51
+
52
+ def describe_action(self):
53
+ return (
54
+ "Your next move: \n Please provide a numerical value within the range of [-3,3], "\
55
+ "representing the force to be applied to the cart."
56
+ )
57
+
58
+ class BasicStateSequenceTranslator(BasicLevelTranslator):
59
+ def translate(self, infos, is_current=False):
60
+ descriptions = []
61
+ if is_current:
62
+ state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
63
+ return state_desc
64
+ for i, info in enumerate(infos):
65
+ assert 'state' in info, "info should contain state information"
66
+ state_desc = BasicLevelTranslator().translate(info['state'])
67
+ action_desc = f"Applied Force on Cart: {info['action'][0]:.2f}"
68
+ reward_desc = f"Result: Reward of {info['reward']:.2f}"
69
+ next_state_desc = BasicLevelTranslator().translate(info['next_state'])
70
+ descriptions.append(
71
+ f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
72
+ )
73
+ return descriptions
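The docstrings at the top of the translator files in this commit list each environment's action space. A small sketch of how a policy's output could be checked against those bounds is shown below. The values in `ACTION_SPECS` are copied from those docstrings, while the table itself and the `check_action` helper are hypothetical and not part of this commit.

```python
# (low, high, dimension) per environment, as stated in the translator docstrings.
ACTION_SPECS = {
    "ant": (-1.0, 1.0, 8),
    "hopper": (-1.0, 1.0, 3),
    "invertedDoublePendulum": (-1.0, 1.0, 1),
    "invertedPendulum": (-3.0, 3.0, 1),
    "pusher": (-2.0, 2.0, 7),
    "reacher": (-1.0, 1.0, 2),
}


def check_action(env_name, action):
    """Return True if `action` has the right length and stays within bounds."""
    low, high, dim = ACTION_SPECS[env_name]
    if len(action) != dim:
        return False
    return all(low <= a <= high for a in action)


print(check_action("invertedPendulum", [2.5]))
print(check_action("invertedDoublePendulum", [2.5]))
```

A check like this would flag a policy that samples from [-3, 3] for an environment whose action space is [-1, 1].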
envs/mujoco/pusher_policies.py ADDED
@@ -0,0 +1,15 @@
1
+ import numpy as np
2
+ import random
3
+
4
+ def pseudo_random_policy(state, pre_action):
5
+ def get_description():
6
+ return "Select action randomly"
7
+ pseudo_random_policy.description = get_description()
8
+ return [4 * random.random() - 2 for i in range(7)]
9
+
10
+
11
+ def real_random_policy(state, pre_action=1):
12
+ def get_description():
13
+ return "Select action with a random policy"
14
+ real_random_policy.description = get_description()
15
+ return [4 * random.random() - 2 for i in range(7)]
envs/mujoco/pusher_translator.py ADDED
@@ -0,0 +1,93 @@
1
+ '''Pusher
2
+ Action Space Box(-2.0, 2.0, (7,), float32)
3
+ Observation Space Box(-inf, inf, (23,), float64)
4
+ '''
5
+ import math
6
+
7
+ class BasicLevelTranslator:
8
+ def __init__(self):
9
+ pass
10
+
11
+ def translate(self, state):
12
+
13
+ joint_angles = state[:7]
14
+ joint_velocities = state[7:14]
15
+ fingertip_coords = state[14:17]
16
+ object_coords = state[17:20]
17
+ goal_coords = state[20:]
18
+
19
+ joint_angle_degrees = [math.degrees(angle) for angle in joint_angles]
20
+ joint_velocity_degrees = [math.degrees(velocity) for velocity in joint_velocities]
21
+
22
+ res = (f"Rotation of the panning shoulder: {joint_angle_degrees[0]:.2f} degrees, "
23
+ f"Rotation of the shoulder lifting joint: {joint_angle_degrees[1]:.2f} degrees, "
24
+ f"Rotation of the shoulder rolling joint: {joint_angle_degrees[2]:.2f} degrees, "
25
+ f"Rotation of the elbow joint: {joint_angle_degrees[3]:.2f} degrees, "
26
+ f"Rotation of the forearm rolling joint: {joint_angle_degrees[4]:.2f} degrees, "
27
+ f"Rotation of the wrist flexing joint: {joint_angle_degrees[5]:.2f} degrees, "
28
+ f"Rotation of the wrist rolling joint: {joint_angle_degrees[6]:.2f} degrees, "
29
+ f"Rotational velocity of the panning shoulder: {joint_velocity_degrees[0]:.2f} degrees/s, "
30
+ f"Rotational velocity of the shoulder lifting joint: {joint_velocity_degrees[1]:.2f} degrees/s, "
31
+ f"Rotational velocity of the shoulder rolling joint: {joint_velocity_degrees[2]:.2f} degrees/s, "
32
+ f"Rotational velocity of the elbow joint: {joint_velocity_degrees[3]:.2f} degrees/s, "
33
+ f"Rotational velocity of the forearm rolling joint: {joint_velocity_degrees[4]:.2f} degrees/s, "
34
+ f"Rotational velocity of the wrist flexing joint: {joint_velocity_degrees[5]:.2f} degrees/s, "
35
+ f"Rotational velocity of the wrist rolling joint: {joint_velocity_degrees[6]:.2f} degrees/s, "
36
+ f"Fingertip coordinates (x, y, z): ({fingertip_coords[0]:.2f}, {fingertip_coords[1]:.2f}, {fingertip_coords[2]:.2f}), "
37
+ f"Object coordinates (x, y, z): ({object_coords[0]:.2f}, {object_coords[1]:.2f}, {object_coords[2]:.2f}), "
38
+ f"Goal coordinates (x, y, z): ({goal_coords[0]:.2f}, {goal_coords[1]:.2f}, {goal_coords[2]:.2f}).")
+        return res
+
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return "The goal is to move the target cylinder (object) to the goal position using the robot's end effector (fingertip)."
+
+    def describe_game(self):
+        return ("In the Pusher game, you control a multi-jointed robot arm to manipulate a target cylinder (object) "
+                "and place it in a goal position using the robot's fingertip (end effector). The robot has shoulder, elbow, "
+                "forearm, and wrist joints that you can control with torque values. The observation space includes joint angles, "
+                "angular velocities of joints, fingertip coordinates, object coordinates, and goal coordinates. The reward is "
+                "based on the distance between the fingertip and the object, the distance between the object and the goal, "
+                "and control penalties for large actions.")
+
+    def describe_action(self):
+        return ("Your next move: \n Please provide a list of 7 numerical values within the range [-2, 2], "
+                "representing the torques applied to the robot's joints (shoulder, elbow, forearm, and wrist).")
+
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for info in infos:
+            assert 'state' in info, "info should contain state information"
+
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = ("Take Action: Apply Torques - "
+                           "Shoulder Pan: {:.2f}, Shoulder Lift: {:.2f}, Shoulder Roll: {:.2f}, "
+                           "Elbow Flex: {:.2f}, Forearm Roll: {:.2f}, Wrist Flex: {:.2f}, Wrist Roll: {:.2f}"
+                           ).format(info['action'][0], info['action'][1], info['action'][2], info['action'][3],
+                                    info['action'][4], info['action'][5], info['action'][6])
+
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to {next_state_desc}")
+        return descriptions
+
envs/mujoco/reacher_policies.py ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
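Both policies in this file draw each torque uniformly from [-1, 1] and differ only in their description string. A minimal sketch of the same idea, parameterized by action dimension, might look like the following. `make_uniform_policy` and its `seed` argument are hypothetical and are not part of this commit.

```python
import random


def make_uniform_policy(action_dim, low=-1.0, high=1.0, seed=None):
    """Build a policy that ignores the state and samples uniform torques.

    Hypothetical helper (not in the repository): it mirrors the
    (state, pre_action) call signature used by the policies in this diff.
    """
    rng = random.Random(seed)  # private generator, so seeding stays local

    def policy(state, pre_action=None):
        return [rng.uniform(low, high) for _ in range(action_dim)]

    policy.description = "Select action randomly"
    return policy


# Reacher has a 2-dimensional action space, Walker2d a 6-dimensional one.
reacher_policy = make_uniform_policy(2, seed=0)
action = reacher_policy(state=None)
```

Using a private `random.Random` instance keeps runs reproducible without touching the global random state.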
envs/mujoco/reacher_translator.py ADDED
@@ -0,0 +1,67 @@
+'''Reacher
+Action Space Box(-1.0, 1.0, (2,), float32)
+
+Observation Space Box(-inf, inf, (11,), float64)
+'''
+class BasicLevelTranslator:
+    def __init__(self):
+        pass
+
+    def translate(self, state):
+        (cos_angle_arm1, cos_angle_arm2, sin_angle_arm1, sin_angle_arm2,
+         target_x, target_y, angular_vel_arm1, angular_vel_arm2,
+         diff_x, diff_y, diff_z) = state
+
+        res = (f"Arm1 has a cosine angle of {cos_angle_arm1:.2f} and a sine angle of {sin_angle_arm1:.2f}. "\
+               f"Arm2 has a cosine angle of {cos_angle_arm2:.2f} and a sine angle of {sin_angle_arm2:.2f}. "\
+               f"Target position is at ({target_x:.2f}, {target_y:.2f}). "\
+               f"Arm1's angular velocity is {angular_vel_arm1:.2f} rad/s, and Arm2's is {angular_vel_arm2:.2f} rad/s. "\
+               f"Vector difference between fingertip and target is ({diff_x:.2f}, {diff_y:.2f}, {diff_z:.2f}).")
+        return res
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return "The goal is to control a two-jointed robot arm to move its end effector (fingertip) close to a randomly spawned target."
+
+    def describe_game(self):
+        return ("In the Reacher game, you control a two-jointed robot arm. The objective is to maneuver the arm's fingertip close to a target. "\
+                "The observation space includes the cosine and sine of the arm angles, coordinates of the target, angular velocities of the arms, "\
+                "and the vector from the fingertip to the target. The episode ends after 50 timesteps or if any state space value becomes non-finite. "\
+                "Rewards are given based on the distance of the fingertip from the target and the magnitude of actions applied.")
+
+    def describe_action(self):
+        return ("Your next move: \n Please provide two numerical values representing the torques applied at the two hinge joints. "\
+                "Each value should be within the range of [-1, 1].")
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = ("Take Action: Apply Torque at Joint 1: {:.2f}, "
+                           "Joint 2 Torque: {:.2f}"
+                           ).format(info['action'][0], info['action'][1])
+
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to {next_state_desc}")
+        return descriptions
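The translator above unpacks the 11-element Reacher observation positionally, so a vector of the wrong length fails with an unpacking error. A self-contained sketch of the same unpacking with an explicit length check is shown below. `describe_reacher_obs` is a hypothetical helper, and reporting joint angles through `atan2` instead of raw sine and cosine is an illustrative choice, not what the commit does.

```python
import math


def describe_reacher_obs(obs):
    """Render an 11-element Reacher observation as text.

    Hypothetical helper for illustration; the field order follows the
    unpacking used by BasicLevelTranslator in this diff.
    """
    if len(obs) != 11:
        raise ValueError(f"expected 11 values, got {len(obs)}")
    (cos1, cos2, sin1, sin2, target_x, target_y,
     vel1, vel2, diff_x, diff_y, diff_z) = obs
    # Recover each joint angle from its (sin, cos) pair.
    angle1 = math.atan2(sin1, cos1)
    angle2 = math.atan2(sin2, cos2)
    return (f"Arm1 angle is {angle1:.2f} rad and Arm2 angle is {angle2:.2f} rad. "
            f"Target position is at ({target_x:.2f}, {target_y:.2f}). "
            f"Angular velocities are {vel1:.2f} rad/s and {vel2:.2f} rad/s. "
            f"Vector from fingertip to target is ({diff_x:.2f}, {diff_y:.2f}, {diff_z:.2f}).")


obs = [1.0, 1.0, 0.0, 0.0, 0.1, -0.2, 0.0, 0.0, 0.11, 0.2, 0.0]
text = describe_reacher_obs(obs)
```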
envs/mujoco/swimmer_policies.py ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(2)]
envs/mujoco/swimmer_translator.py ADDED
@@ -0,0 +1,80 @@
+'''Swimmer
+Action Space Box(-1.0, 1.0, (2,), float32)
+
+Observation Space Box(-inf, inf, (8,), float64)
+'''
+
+class BasicLevelTranslator:
+    def translate(self, state):
+        res = (
+            f"Angle of the front tip: {state[0]:.2f} rad\n"
+            f"Angle of the first rotor: {state[1]:.2f} rad\n"
+            f"Angle of the second rotor: {state[2]:.2f} rad\n"
+            f"Velocity of the tip along the x-axis: {state[3]:.2f} m/s\n"
+            f"Velocity of the tip along the y-axis: {state[4]:.2f} m/s\n"
+            f"Angular velocity of front tip: {state[5]:.2f} rad/s\n"
+            f"Angular velocity of the first rotor: {state[6]:.2f} rad/s\n"
+            f"Angular velocity of the second rotor: {state[7]:.2f} rad/s"
+        )
+        return res
+
+class GameDescriber:
+
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return (
+            "The goal in the Swimmer environment is to move as fast as possible towards the right "\
+            "by applying torque to the rotors and utilizing fluid friction. The swimmer consists of "\
+            "three or more segments connected by rotors, and the objective is to achieve efficient "\
+            "swimming motion."
+        )
+
+    def describe_game(self):
+        return (
+            "In the Swimmer environment, you control a swimmer consisting of three or more segments "\
+            "connected by rotors. Your goal is to make the swimmer move as fast as possible to the right "\
+            "in a two-dimensional pool. You can achieve this by applying torques to the rotors and utilizing "\
+            "fluid friction. The environment provides observations of the swimmer's angles, velocities, "\
+            "and angular velocities."
+        )
+
+    def describe_action(self):
+        return (
+            "Your next move: \nPlease provide a list of two numerical values, each within the range of [-1, 1], "\
+            "representing the torques to be applied to the two rotors of the swimmer. These torques will help "\
+            "control the swimmer's movement and achieve efficient swimming."
+        )
+
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = (
+                "Torques Applied: "
+                f"Rotor 1: {info['action'][0]:.2f}, Rotor 2: {info['action'][1]:.2f}"
+            )
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(
+                f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
+            )
+        return descriptions
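`BasicStateSequenceTranslator.translate` reads four keys from every transition record (`state`, `action`, `reward`, `next_state`) but asserts only on `state`. A small sketch of formatting the action and reward part of one record while checking every key up front could look like this. `format_transition` is a hypothetical helper, written only to illustrate the record layout the translator expects.

```python
REQUIRED_KEYS = ("state", "action", "reward", "next_state")


def format_transition(info):
    """Format the action and reward lines of one transition record.

    Hypothetical helper: it mirrors the wording used for Swimmer in this
    diff and fails early if any expected key is missing.
    """
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise KeyError(f"transition is missing keys: {missing}")
    torques = ", ".join(
        f"Rotor {index + 1}: {torque:.2f}"
        for index, torque in enumerate(info["action"])
    )
    return f"Torques Applied: {torques}\nResult: Reward of {info['reward']:.2f}"


info = {
    "state": [0.0] * 8,
    "action": [0.5, -0.25],
    "reward": 1.234,
    "next_state": [0.0] * 8,
}
summary = format_transition(info)
```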
envs/mujoco/walker2d_policies.py ADDED
@@ -0,0 +1,15 @@
+import numpy as np
+import random
+
+def pseudo_random_policy(state, pre_action):
+    def get_description():
+        return "Select action randomly"
+    pseudo_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(6)]
+
+
+def real_random_policy(state, pre_action=1):
+    def get_description():
+        return "Select action with a random policy"
+    real_random_policy.description = get_description()
+    return [2 * random.random() - 1 for i in range(6)]
envs/mujoco/walker2d_translator.py ADDED
@@ -0,0 +1,86 @@
+
+'''Walker2d
+Action Space Box(-1.0, 1.0, (6,), float32)
+Observation Space Box(-inf, inf, (17,), float64)
+'''
+class BasicLevelTranslator:
+    def translate(self, state):
+        res = (
+            f"Z-coordinate of the top (height of walker): {state[0]:.2f} m\n"
+            f"Angle of the top: {state[1]:.2f} rad\n"
+            f"Angle of the thigh joint: {state[2]:.2f} rad\n"
+            f"Angle of the leg joint: {state[3]:.2f} rad\n"
+            f"Angle of the foot joint: {state[4]:.2f} rad\n"
+            f"Angle of the left thigh joint: {state[5]:.2f} rad\n"
+            f"Angle of the left leg joint: {state[6]:.2f} rad\n"
+            f"Angle of the left foot joint: {state[7]:.2f} rad\n"
+            f"Velocity of the x-coordinate of the top: {state[8]:.2f} m/s\n"
+            f"Velocity of the z-coordinate (height) of the top: {state[9]:.2f} m/s\n"
+            f"Angular velocity of the angle of the top: {state[10]:.2f} rad/s\n"
+            f"Angular velocity of the thigh hinge: {state[11]:.2f} rad/s\n"
+            f"Angular velocity of the leg hinge: {state[12]:.2f} rad/s\n"
+            f"Angular velocity of the foot hinge: {state[13]:.2f} rad/s\n"
+            f"Angular velocity of the thigh hinge (left): {state[14]:.2f} rad/s\n"
+            f"Angular velocity of the leg hinge (left): {state[15]:.2f} rad/s\n"
+            f"Angular velocity of the foot hinge (left): {state[16]:.2f} rad/s"
+        )
+        return res
+
+class GameDescriber:
+    def __init__(self, args):
+        self.is_only_local_obs = args.is_only_local_obs == 1
+        self.max_episode_len = args.max_episode_len
+        self.action_desc_dict = {
+        }
+        self.reward_desc_dict = {
+        }
+
+    def translate_terminate_state(self, state, episode_len, max_episode_len):
+        return ""
+
+    def translate_potential_next_state(self, state, action):
+        return ""
+
+    def describe_goal(self):
+        return (
+            "The goal in the Walker2D environment is to coordinate both sets of feet, legs, and thighs "
+            "to move in the forward (right) direction by applying torques to the six hinges connecting "
+            "the seven body parts. The objective is to make the robot walk forward."
+        )
+
+    def describe_game(self):
+        return (
+            "In the Walker2D environment, you control a two-dimensional two-legged walker with seven main body parts. "
+            "Your objective is to make the walker move forward by coordinating the torques applied to the six hinges "
+            "connecting the body parts. The environment provides observations of the walker's body parts and velocities, "
+            "including the torso, leg, and thigh angles, orientations, and velocities. The goal is to make the walker walk "
+            "forward in the positive x-direction."
+        )
+
+    def describe_action(self):
+        return (
+            "Your next move: \nPlease provide a list of six numerical values, each within the range of [-1, 1], "
+            "representing the torques to be applied at the six hinge joints of the walker. These torques will help "
+            "coordinate the walker's movements and make it walk in the desired direction."
+        )
+
+class BasicStateSequenceTranslator(BasicLevelTranslator):
+    def translate(self, infos, is_current=False):
+        descriptions = []
+        if is_current:
+            state_desc = BasicLevelTranslator().translate(infos[-1]['state'])
+            return state_desc
+        for i, info in enumerate(infos):
+            assert 'state' in info, "info should contain state information"
+            state_desc = BasicLevelTranslator().translate(info['state'])
+            action_desc = (
+                "Torques Applied: "
+                f"Thigh: {info['action'][0]:.2f}, Leg: {info['action'][1]:.2f}, Foot: {info['action'][2]:.2f}, "
+                f"Left Thigh: {info['action'][3]:.2f}, Left Leg: {info['action'][4]:.2f}, Left Foot: {info['action'][5]:.2f}"
+            )
+            reward_desc = f"Result: Reward of {info['reward']:.2f}"
+            next_state_desc = BasicLevelTranslator().translate(info['next_state'])
+            descriptions.append(
+                f"{state_desc}\n{action_desc}\n{reward_desc}\nTransit to\n{next_state_desc}"
+            )
+        return descriptions
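`describe_action` asks the model for six values in [-1, 1], but nothing in this file checks the reply. A rough sketch of turning a free-text reply into a bounded action list is given below. `parse_action` is a hypothetical helper and the regular expression is an assumption about the reply format; the repository may parse actions differently.

```python
import re

# Matches plain and scientific-notation numbers such as 0.5, -1, 3e-2.
NUMBER_PATTERN = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def parse_action(text, dim, low=-1.0, high=1.0):
    """Extract exactly `dim` numbers from `text` and clip them to [low, high].

    Hypothetical helper for illustration only.
    """
    numbers = NUMBER_PATTERN.findall(text)
    if len(numbers) != dim:
        raise ValueError(f"expected {dim} values, found {len(numbers)}")
    return [min(high, max(low, float(number))) for number in numbers]


# Walker2d expects six torques; the out-of-range -1.7 is clipped to -1.0.
action = parse_action("[0.5, -1.7, 0, 0.25, 1, -0.1]", dim=6)
```

Clipping rather than rejecting out-of-range values keeps an episode running when the model overshoots slightly; rejecting would be the stricter alternative.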
main_reflexion.py CHANGED
@@ -292,7 +292,7 @@ if __name__ == "__main__":
     parser.add_argument(
         "--api_type",
         type=str,
-        default="azure",
+        default="openai",
         choices=["azure", "openai"],
         help="choose api type, now support azure and openai"
     )
record_reflexion.csv CHANGED
@@ -12,4 +12,11 @@ RepresentedBoxing-v0,1,expert,200.0
 RepresentedPong-v0,1,expert,200.0
 RepresentedMsPacman-v0,1,expert,10000.0
 RepresentedMontezumaRevenge-v0,1,expert,10000.0
-Ant-v4,1,expert,5000
+Ant-v4,1,expert,5000.2
+HalfCheetah-v4,1,expert,12138.8
+Hopper-v4,1,expert,3542.2
+Walker2d-v4,1,expert,5000.0
+Swimmer-v4,1,expert,44.4
+Reacher-v4,1,expert,-2.6
+Pusher-v4,1,expert,-52.3
+
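The rows added to `record_reflexion.csv` each carry an environment name, an integer, a label, and a score. A minimal sketch of reading such rows into a lookup table follows. The column meanings are assumptions inferred from the rows in this hunk (the file's header is not shown here), and `load_expert_scores` is a hypothetical helper.

```python
import csv
import io

# Sample rows copied from the hunk above; the real file has more entries.
SAMPLE_ROWS = """Swimmer-v4,1,expert,44.4
Reacher-v4,1,expert,-2.6
Pusher-v4,1,expert,-52.3
"""


def load_expert_scores(text):
    """Map environment name to expert score.

    Assumes four comma-separated fields per row; the meaning of the
    second field is not stated in the diff, so it is ignored.
    """
    scores = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        env_name, _unused, label, score = row
        if label == "expert":
            scores[env_name] = float(score)
    return scores


scores = load_expert_scores(SAMPLE_ROWS)
```

Note that Reacher and Pusher have negative expert returns, so any normalization against these values has to handle negative baselines.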
test_atari.sh → shell/test_atari.sh RENAMED
File without changes
shell/test_mujoco_ant.sh CHANGED
@@ -1,6 +1,29 @@
+
+# Ant-v4
+
+# COT
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
 # exe
 python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
-python main_reflexion.py --env_name CliffWalking-v0 --init_summarizer cliffwalking_init_translator --curr_summarizer cliffwalking_basic_translator --decider exe_actor --prompt_level 2 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
-python main_reflexion.py --env_name CliffWalking-v0 --init_summarizer cliffwalking_init_translator --curr_summarizer cliffwalking_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
-python main_reflexion.py --env_name CliffWalking-v0 --init_summarizer cliffwalking_init_translator --curr_summarizer cliffwalking_basic_translator --decider exe_actor --prompt_level 4 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
-python main_reflexion.py --env_name CliffWalking-v0 --init_summarizer cliffwalking_init_translator --curr_summarizer cliffwalking_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Ant-v4 --init_summarizer ant_init_translator --curr_summarizer ant_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_halfcheetah.sh ADDED
@@ -0,0 +1,51 @@
+
+# HalfCheetah-v4
+# Naive Actor
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider naive_actor --prompt_level 5 --num_trails 1
+
+# COT
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider cot_actor --prompt_level 5 --num_trails 1
+
+# self consistency
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider self_consistency_actor --prompt_level 5 --num_trails 1
+
+# self-ask
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider selfask_actor --prompt_level 5 --num_trails 1
+
+# SPP
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 2 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 4 --num_trails 1 --distiller traj_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider spp_actor --prompt_level 5 --num_trails 1
+
+# REFLEXION
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 2 --num_trails 1 --distiller reflect_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 4 --num_trails 1 --distiller reflect_distiller --prompt_path "envs/classic_control/few_shot_examples/halfcheetahpole"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 2 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 4 --num_trails 1 --distiller guide_generator --prompt_path "envs/toy_text/few_shot_examples/cliffwalking"
+python main_reflexion.py --env_name HalfCheetah-v4 --init_summarizer halfcheetah_init_translator --curr_summarizer halfcheetah_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_hopper.sh ADDED
@@ -0,0 +1,28 @@
+# Hopper-v4
+
+# COT
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+# REFLEXION
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name Hopper-v4 --init_summarizer hopper_init_translator --curr_summarizer hopper_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
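The test scripts repeat one command with small variations in decider, prompt level, trial count, and distiller. A sketch of building such a command line programmatically is shown below. `build_command` is a hypothetical helper; it uses only flags that appear in the scripts above and assumes the summarizer names follow the `<prefix>_init_translator` and `<prefix>_basic_translator` pattern seen there.

```python
import shlex


def build_command(env_name, prefix, decider, prompt_level, num_trails,
                  distiller=None, prompt_path=None):
    """Assemble one main_reflexion.py invocation as a shell string.

    Hypothetical helper; optional flags are appended only when given.
    """
    args = [
        "python", "main_reflexion.py",
        "--env_name", env_name,
        "--init_summarizer", f"{prefix}_init_translator",
        "--curr_summarizer", f"{prefix}_basic_translator",
        "--decider", decider,
        "--prompt_level", str(prompt_level),
        "--num_trails", str(num_trails),
    ]
    if distiller is not None:
        args += ["--distiller", distiller]
    if prompt_path is not None:
        args += ["--prompt_path", prompt_path]
    # shlex.join quotes any argument that needs it (Python 3.8+).
    return shlex.join(args)


command = build_command("Hopper-v4", "hopper", "cot_actor", 3, 5,
                        distiller="traj_distiller")
```

Generating the lines this way makes copy-paste slips, such as a prompt path left over from another environment, easier to spot.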
shell/test_mujoco_invertedDoublePendulum.sh ADDED
@@ -0,0 +1,27 @@
+# InvertedDoublePendulum-v4
+
+# COT
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+# SPP
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+# REFLEXION
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+# exe
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+python main_reflexion.py --env_name InvertedDoublePendulum-v4 --init_summarizer invertedDoublePendulum_init_translator --curr_summarizer invertedDoublePendulum_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_invertedPendulum.sh ADDED
@@ -0,0 +1,25 @@
+ # InvertedPendulum-v4
+
+ # COT
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+ # SPP
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+ # REFLEXION
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+
+
+ # exe
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+ python main_reflexion.py --env_name InvertedPendulum-v4 --init_summarizer invertedPendulum_init_translator --curr_summarizer invertedPendulum_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
shell/test_mujoco_pusher.sh ADDED
@@ -0,0 +1,27 @@
+ # Pusher-v4
+
+ # COT
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+ # SPP
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+ # REFLEXION
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+ # exe
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+ python main_reflexion.py --env_name Pusher-v4 --init_summarizer pusher_init_translator --curr_summarizer pusher_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_reacher.sh ADDED
@@ -0,0 +1,27 @@
+ # Reacher-v4
+
+ # COT
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+ # SPP
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+ # REFLEXION
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+ # exe
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+ python main_reflexion.py --env_name Reacher-v4 --init_summarizer reacher_init_translator --curr_summarizer reacher_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_swimmer.sh ADDED
@@ -0,0 +1,27 @@
+ # Swimmer-v4
+
+ # COT
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+ # SPP
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+ # REFLEXION
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+ # exe
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+ python main_reflexion.py --env_name Swimmer-v4 --init_summarizer swimmer_init_translator --curr_summarizer swimmer_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
shell/test_mujoco_walker2d.sh ADDED
@@ -0,0 +1,28 @@
+ # Walker2d-v4
+
+ # COT
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider cot_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider cot_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+ # SPP
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider spp_actor --prompt_level 1 --num_trails 1
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider spp_actor --prompt_level 3 --num_trails 5 --distiller traj_distiller
+
+
+
+ # REFLEXION
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 1 --num_trails 1 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 3 --num_trails 5 --distiller reflect_distiller
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider reflexion_actor --prompt_level 5 --num_trails 1 --distiller reflect_distiller
+
+
+ # exe
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 1 --num_trails 1 --distiller guide_generator --api_type openai
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 3 --num_trails 5 --distiller guide_generator
+
+ python main_reflexion.py --env_name Walker2d-v4 --init_summarizer walker2d_init_translator --curr_summarizer walker2d_basic_translator --decider exe_actor --prompt_level 5 --num_trails 1 --distiller guide_generator
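
The `--prompt_level 3` runs in the scripts above follow one pattern in every environment: each decider is paired with a fixed distiller and run with `--num_trails 5`. A minimal sketch of a helper that prints those four commands for a given environment might look like the following. The function name `print_cmds` is hypothetical and not part of the repository, and the helper only echoes the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): prints, but does not run,
# the prompt_level 3 test commands for one environment.
# Usage: print_cmds <env_name> <translator_prefix>
print_cmds() {
  local env_name="$1" prefix="$2"
  local base="python main_reflexion.py --env_name ${env_name}"
  base="${base} --init_summarizer ${prefix}_init_translator"
  base="${base} --curr_summarizer ${prefix}_basic_translator"

  # decider:distiller pairs as they appear in the level-3 runs above
  local pair decider distiller
  for pair in \
    cot_actor:traj_distiller \
    spp_actor:traj_distiller \
    reflexion_actor:reflect_distiller \
    exe_actor:guide_generator
  do
    decider="${pair%%:*}"    # text before the colon
    distiller="${pair##*:}"  # text after the colon
    echo "${base} --decider ${decider} --prompt_level 3 --num_trails 5 --distiller ${distiller}"
  done
}

print_cmds Swimmer-v4 swimmer
```

Printing the commands first makes it easy to compare them against the checked-in scripts before piping the output to `bash`. The level 1 and level 5 runs are left out because their flags vary between deciders and environments.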
test_reflexion.sh → shell/test_reflexion.sh RENAMED
File without changes