	# Random Walks: Decision Transformer Example

	This example uses the Toy Problem described in [Decision Transformer (Lili Chen
	et al. 2021)](https://arxiv.org/abs/2106.01345).

	## Game Description

	The task is to find the shortest path to a goal node on a directed graph. The
	reward measures how close the chosen path is to the shortest possible one and
	is bounded in [0, 1].

	Note that this differs from the paper, which gave a reward of -1 for every turn
	not at the goal state and 0 at the goal state. Here the model instead receives
	a single reward at the end of the full trajectory, based on how close the
	trajectory is to the minimum number of steps needed to reach the goal state.
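The two reward schemes can be compared on a small graph. The sketch below is illustrative only: the graph, the function names, and the exact end-of-trajectory formula (shortest steps divided by steps taken, with 0 for invalid or unfinished paths) are assumptions, since this README only states that the reward is bounded in [0, 1].

```python
from collections import deque

# Hypothetical directed graph: each node maps to the nodes it points to.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "d"],
    "d": [],
}


def shortest_steps(graph, start, goal):
    """Breadth-first search: minimum number of edges from start to goal, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def paper_return(path, goal):
    """Paper-style return: -1 for every visited node that is not the goal.

    Assumed convention: the path stops once the goal is reached.
    """
    return sum(-1 for node in path if node != goal)


def terminal_reward(graph, path, goal):
    """Assumed end-of-trajectory reward in [0, 1]: shortest steps / steps taken."""
    if not path or goal not in path:
        return 0.0  # the goal was never reached
    steps = path.index(goal)  # edges traversed before first reaching the goal
    walked = path[: steps + 1]
    # Reject paths that use an edge missing from the graph.
    if any(b not in graph.get(a, []) for a, b in zip(walked, walked[1:])):
        return 0.0
    if steps == 0:
        return 1.0  # the walk started at the goal
    return shortest_steps(graph, path[0], goal) / steps


if __name__ == "__main__":
    for path in ("abd", "abcd", "abc"):
        print(path, paper_return(path, "d"), terminal_reward(GRAPH, path, "d"))
```

Under these assumptions an optimal path scores 1.0, a longer valid path scores proportionally less, and a path that never reaches the goal scores 0.0.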

	Paths are represented as strings of letters, with each letter corresponding to a
	node in the graph.
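As a rough illustration of this representation, the sketch below samples a random walk on a small directed graph and returns it as a string of node letters. The graph, the `random_walk` and `is_valid_path` helpers, and the maximum length are hypothetical and are not taken from the example's own code.

```python
import random

# Hypothetical directed graph: each letter is a node, each list holds its successors.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "d"],
    "d": [],
}


def random_walk(graph, start, max_length, rng):
    """Follow uniformly random outgoing edges and return the visited nodes as a string."""
    path = [start]
    # Stop at max_length nodes or at a node with no outgoing edges.
    while len(path) < max_length and graph[path[-1]]:
        path.append(rng.choice(graph[path[-1]]))
    return "".join(path)


def is_valid_path(graph, path):
    """True if every consecutive pair of letters is an edge in the graph."""
    return all(b in graph.get(a, []) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same walks
    for _ in range(3):
        walk = random_walk(GRAPH, "a", 6, rng)
        print(walk, is_valid_path(GRAPH, walk))
```

Keeping paths as plain strings means a path can be tokenized one letter per node, and checking validity reduces to checking adjacent letter pairs.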

	## Training

	![Graph Example](graph-example.png)
	Source: Decision Transformer (Lili Chen et al. 2021)

	For PPO, a language model was fine-tuned to predict the next token in a sequence
	of returns-to-go (the sum of future rewards), states, and actions. It was
	trained only on random walk data.
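To make the returns-to-go idea concrete, here is a minimal sketch that computes them as suffix sums of per-step rewards and interleaves them with states and actions. It uses the paper-style rewards (-1 away from the goal, 0 at the goal); the function names and the way rewards are aligned with states are assumptions made for illustration, not the tokenization used by this example.

```python
def returns_to_go(rewards):
    """Suffix sums: entry t is the sum of rewards from step t to the end."""
    rtg = []
    running = 0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    return rtg[::-1]


def interleave(path, goal):
    """Build a flat (return-to-go, state, action) sequence from a path string.

    Assumptions: the state is the current node, the action is the next node,
    and each visited node earns -1 unless it is the goal.
    """
    rewards = [0 if node == goal else -1 for node in path]
    rtg = returns_to_go(rewards)
    sequence = []
    for value, state, action in zip(rtg, path[:-1], path[1:]):
        sequence.extend([value, state, action])
    return sequence


if __name__ == "__main__":
    print(returns_to_go([-1, -1, 0]))
    print(interleave("abd", "d"))
```

A model trained on such sequences can, in principle, be conditioned on a high return-to-go at inference time to request a short path.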

	ILQL, by contrast, learns from the samples directly.