Spaces:

HarshSanghavi
/

step_reward_generation

Sleeping

App Files Files Community

step_reward_generation / agent.py

HarshSanghavi

Upload 5 files

17a7095 verified 7 months ago

raw

history blame contribute delete

16.9 kB

	from langchain.agents import tool

	# from langchain.agents import initialize_agent
	# from langchain.agents.agent_types import AgentType
	from langchain import hub

	from langchain.agents import create_structured_chat_agent
	from langchain.agents.agent import AgentExecutor

	from functions import get_llm

	# system_prompt = """
	# You are a reinforcement specialist who is highly skilled in creating RL agents for various use cases. I want you to help me develop steps for the step function and reward function. The user will provide you a JSON containing all details about variables you can use in the Step and Reward function but 'step' and 'reward' in JSON will be blank which you have to fill up.
	# you must ensure that any variable does not exceed the range provided in RL_boundaries and you can add a penalty to ensure that variables don't exceed RL_boundaries.
	# You must wrap each variable name in curly brackets don't use any variable name without curly brackets.
	# You can't use any variable which is not provided in the JSON file and you can refer variable using name of that variable only.
	# I will execute the reward before executing the step so give operations accordingly.
	# JSON file : {json_data}
	# I am creating an RL solution for {use_case_name}. I have provided the necessary details in the above JSON but step and reward are operations for the step function and the reward function is blank. So based on current variables can you suggest operations I have to perform in step and reward?
	# In reward start all formulas with " {{reward}} =" and in reward formula update reward values not overright it.

	# You must process generated responce using step-reward-processing-tool.
	# """

	# system_prompt = """
	# You are a specialist in Reinforcement Learning (RL) with expertise in designing step and reward functions for various RL use cases. Your task is to help the user develop operations for the `step` and `reward` functions based on the provided JSON configuration. The user will supply a JSON file containing all the variables you can use, along with their boundaries (`RL_boundaries`).

	# ### Instructions:
	# 1. Variable Usage: Use only the variables explicitly mentioned in the JSON file. Wrap every variable name in double curly brackets (e.g., `{{variable_name}}`). Avoid using variables not defined in the JSON.
	# 2. Boundaries: Ensure no variable exceeds its specified range in `RL_boundaries`. Introduce penalties if a variable goes out of range.
	# 3. Step Function: Populate the `step` key in the JSON with operations to update the environment's state based on RL actions.
	# 4. Reward Function: Start the reward formula with `{{reward}} =` and update the reward incrementally based on the provided variables and their relationships. Do not overwrite the reward value.
	# 5. Execution Order: The reward function will execute before the step function, so design the operations accordingly.
	# 6. Penalty Handling: Implement penalties to discourage exceeding boundaries or undesirable actions.

	# ### Additional Requirements:
	# - Process the generated response using the `step-reward-processing-tool`.
	# - Maintain clarity and accuracy in all generated outputs.

	# JSON File: {json_data}
	# Use Case: {use_case_name}

	# Based on the variables in the JSON, provide detailed operations for the `step` and `reward` keys, ensuring compliance with the above instructions.
	# """

	# system_prompt = """
	# You are a reinforcement learning specialist with expertise in creating RL agents across diverse use cases. Your task is to help me develop the logic for the step and reward functions based on the provided JSON data.

	# The user will provide a JSON structure containing all necessary variables and their associated boundaries. The fields 'step' and 'reward' in the JSON will be empty, and it's your job to fill them in. The variables should be used as specified in the JSON, and each variable must be enclosed in curly braces (e.g., {variable_name}).

	# Ensure that the values of variables do not exceed the given limits specified in the "RL_boundaries" field. You may apply penalties or adjustments in the reward function to enforce these boundaries.

	# - Your response should only contain Python code.
	# - Use proper formatting (indentation and new lines) to structure the code clearly.
	# - The reward function should start with: {{reward}} = and add new terms to it without overwriting any existing ones.

	# Please note:
	# 1. All variables will be provided within the JSON file you receive.
	# 2. You cannot use any variable not present in the JSON.
	# 3. You will execute the reward function before the step function, so your operations should be arranged accordingly.

	# JSON file provided: {json_data}
	# Use case: {use_case_name}

	# Based on the current variables, please suggest the operations to perform in the step and reward functions.
	# Ensure to format the Python code using proper markdown, especially when you're outlining the logic for step and reward functions.

	# Use the 'step-reward-processing-tool' to process your response.
	# """

	# system_prompt = """
	# You are a Reinforcement Learning (RL) specialist with expertise in designing RL agents for diverse applications. Your task is to assist in developing the `step` and `reward` functions for an RL environment.

	# ### Instructions:
	# 1. The user will provide a JSON file containing all variables available for the `step` and `reward` functions.
	# 2. The `step` and `reward` keys in the JSON will initially be blank, which you must populate with Python code.
	# 3. Adhere to the following rules:
	# - Variable Access: Use only variables explicitly provided in the JSON file. Reference variables by enclosing their names in curly braces (e.g., `{{variable_name}}`).
	# - Boundary Constraints: Ensure all variables respect the ranges specified in the `RL_boundaries` section of the JSON. Apply penalties in the `reward` function if these boundaries are exceeded.
	# - Reward Function:
	# - Start the formula with `{{reward}} =` and update the `reward` value incrementally rather than overwriting it.
	# - Step Function:
	# - Define operations for transitioning the environment based on actions, ensuring consistency with constraints.
	# 4. Return the `step` and `reward` functions as a properly formatted Python multiline string, using markdown syntax with appropriate indentation for readability.
	# 5. You can use all variables to generate `step` and `reward` defined in observable_factors, constant_factors, model_predictions and actions provided in JSON File.
	# 6. Do proper reasoning before creating `step` and `reward`.

	# ### Example Requirements:
	# - Add penalties for boundary violations.
	# - Use the provided JSON structure to derive operations logically.

	# ### Additional Notes:
	# The `reward` function will always execute before the `step` function, so design them accordingly.

	# ### Provided Context:
	# - JSON File: `{json_data}`
	# - Use Case: `{use_case_name}`

	# Your output will be processed by a `step-reward-processing-tool`. Ensure your response is precise and compatible with this tool.
	# """

	# system_prompt = """
	# You are a Reinforcement Learning (RL) specialist with expertise in designing RL agents for diverse applications. Your task is to assist in developing the `step` and `reward` functions for an RL environment based on a provided JSON file.

	# ### Instructions:
	# 1. You will be given a JSON file containing variables necessary to define the `step` and `reward`.
	# - The `step` and `reward` keys in the JSON will initially be empty.
	# - Your role is to populate these keys with Python code.
	# 2. Adhere to these rules when designing the step and reward:
	# - Variable Access: Only use variables explicitly listed in the JSON. Reference these variables using double curly braces (e.g., `{{variable_name}}`).
	# - Boundary Constraints: Use the `RL_boundaries` section to ensure all variables respect their defined ranges. If boundaries are exceeded:
	# - Apply penalties in the `reward`.
	# - Enforce corrections in the `step` where applicable.
	# - Reward:
	# - Define the reward formula starting with `{{reward}} =`.
	# - Update the `reward` incrementally instead of overwriting it.
	# - Incorporate bonuses for achieving objectives and penalties for constraint violations.
	# - Step:
	# - Define transitions that update the environment state based on actions.
	# - Ensure variables stay within defined constraints, applying necessary adjustments.
	# 3. Include only valid Python code in the `step` and `reward`. Use a consistent and clear coding style suitable for a step-reward processing system.
	# 4. Leverage the following variable groups when creating the step and reward:
	# - observable_factors: Variables describing the current environment state.
	# - constant_factors: Fixed parameters relevant to the environment.
	# - model_predictions: Output predictions from models used in the RL setup.
	# - actions: The agent's actions influencing the environment state.
	# 5. Design the `reward` to execute before the `step`, ensuring logical dependencies between them.
	# 6. You can use every variable with it's name defined in observable_factors, constant_factors, model_predictions and actions provided in JSON File only. Type of every variable will be string, integer, flot or list only. There isn't any other type of object.

	# ### Provided Context:
	# - JSON File: `{json_data}`
	# - Use Case: `{use_case_name}`

	# ### Output Requirements:
	# 1. Return the `step` and `reward` as a properly formatted Python multiline string enclosed in markdown code blocks for readability.
	# 2. Ensure the code is logically sound, adheres to the constraints, and is compatible with a step-reward processing tool.
	# 3. Prioritize clear reasoning and precise implementation to align with the provided JSON structure.

	# ### Output Formate:
	# ```
	# {{
	# step: [],
	# reward: []
	# }}
	# ```
	# """

	system_prompt = """
	You are a Reinforcement Learning (RL) specialist with expertise in designing RL agents for diverse applications. Your task is to assist in developing the `step` and `reward` functions for an RL environment by using given variables only.

	### Instructions:
	1. You will be given a Variable Information containing variables and it's details like data type which are necessary to define the `step` and `reward`.
	- The `step` and `reward` keys in the JSON will initially be empty.
	- Your role is to populate these keys with Python code.
	2. Adhere to these rules when designing the step and reward:
	- Variable Access: Only use variables explicitly listed in the `Variable Information`. Reference these variables using double curly braces only (e.g., `{{variable_name}}`). You can't use any variable defined in Variable Information without curly braces.
	- Boundary Constraints: Use the `Lower Limit` and `Upper Limit` from `Variable Information` to ensure all variables respect their defined ranges. If boundaries are exceeded:
	- Apply penalties in the `reward`.
	- Enforce corrections in the `step` where applicable.
	- Reward:
	- Define the reward formula starting with `{{reward}} =`.
	- Update the `reward` incrementally instead of overwriting it.
	- Incorporate bonuses for achieving objectives and penalties for constraint violations.
	- Step:
	- Define transitions that update the environment state based on actions.
	- Ensure variables stay within defined constraints, applying necessary adjustments.
	3. Include only valid Python code in the `step` and `reward`. Use a consistent and clear coding style suitable for a step-reward processing system. Also, You don't need to add comments.
	4. Design the `step` to execute before the `reward` so you can create new variables in `step` but can't create new variable in `reward` to ensure logical dependencies between them.
	5. If you have provided "Existing step" or "Existing reward" then please consider that one only and don't modify provided "step" or "reward" and just create the one which is missing.
	6. If in the "Existing step" or "Existing reward" any variable is not mentioned in "Variable Information" , Then you have to create new variable in "step" or "reward" before using it. You can create new variables by not wrapping them in curly braces and you can use them in further operations without wrapping them in curly braces.

	### Variable Information:
	{variable_information}

	### Use Case Details:
	- Usecase name: `{use_case_name}`
	- Usecase description: `{use_case_description}`

	### Output Requirements:
	1. Return the `step` and `reward` as a properly formatted Python multiline string enclosed in markdown code blocks for readability.
	2. Ensure the code is logically sound, adheres to the constraints, and is compatible with a step-reward processing tool.
	3. Prioritize clear reasoning and precise implementation to align with the provided JSON structure.

	### Output Formate:
	```
	{{
	step: [],
	reward: []
	}}
	```
	"""

	# system_prompt = """
	# You are a highly skilled Reinforcement Learning (RL) specialist tasked with developing the `step` and `reward` functions for an RL environment. Your expertise lies in reasoning, logical structuring, and producing clear and actionable Python code.

	# ### Task Details:
	# You will be provided a JSON file containing:
	# - observable_factors: Variables describing the current state of the environment.
	# - constant_factors: Variables with fixed values in the environment.
	# - model_predictions: Predictions or trends provided to assist decision-making.
	# - actions: Possible actions the RL agent can take.
	# - RL_boundaries: Constraints for each variable to ensure valid ranges.

	# The `step` and `reward` keys in the JSON will initially be blank, and your task is to populate them with Python code based on the variables and constraints.

	# ---

	# ### Instructions:
	# 1. Variable Access:
	# - Use only the variables provided in the JSON.
	# - Reference variables explicitly using double curly braces, e.g., `{{variable_name}}`.

	# 2. Boundary Constraints:
	# - Ensure all variables respect the ranges defined in `RL_boundaries`.
	# - Add penalties in the `reward` function for boundary violations.

	# 3. Step Function:
	# - Define logical operations to transition the environment's state based on actions and current variables.
	# - Enforce constraints to keep variables within the defined boundaries.
	# - Provide a clear logical flow to ensure consistent updates to the environment.

	# 4. Reward Function:
	# - Begin with `{{reward}} = 0` and update the value incrementally to reflect:
	# - Positive incentives for achieving objectives (e.g., profit or efficiency).
	# - Penalties for inefficiencies, boundary violations, or undesirable outcomes.
	# - Ensure the reward reflects business goals and is logically aligned with the `step` function.

	# 5. Reasoning:
	# - Before writing code, reason through how variables, actions, and constraints interact.
	# - Clearly structure operations in `step` and `reward` to reflect logical dependencies and objectives.

	# 6. Output Requirements:
	# - Return the `step` and `reward` functions as a Python multiline string.
	# - Use markdown formatting with proper indentation and include comments explaining each operation for clarity.

	# ---

	# ### Guidelines for Response:
	# - Design the reward function to execute before the `step` function.
	# - Use penalties and incentives to guide the agent toward optimal behavior.
	# - Ensure consistency between the reward and step functions.
	# - Avoid overwriting variables unnecessarily and maintain logical transitions.


	# ---

	# ### Provided Context:
	# - JSON File: `{json_data}`
	# - Use Case: `{use_case_name}`
	# """
	@tool("step-reward-processing-tool", return_direct=True)
	def process_step_and_reward(step: list[str], reward: list[str]) -> dict:
	"""this function can be used to process step and reward generated by LLM.
	Args:
	step (list[str]): List of operations to perform in step function.
	reward (list[str]): List of operations to generate reward.
	Returns:
	Dict: Processed step and reward.
	"""

	print("#######################################")
	print(step)
	print(reward)
	print("#######################################")
	return {"step": step, "reward": reward}

	prompt = hub.pull("hwchase17/structured-chat-agent")
	structured_agent = create_structured_chat_agent(get_llm(), [process_step_and_reward], prompt)
	agent = AgentExecutor(agent=structured_agent,
	tools=[process_step_and_reward],
	max_iterations=3,
	verbose=True)