---
title: Blog
fullWidth: true
emoji: ⚔️
colorFrom: red
colorTo: green
sdk: streamlit
sdk_version: 1.21.0
app_file: app.py
pinned: true
---

![image info](./mlion.png)

# 7/27/23 RogueGPT
RogueGPT is an attempt not only to instruction-tune LLM-powered agents (treating LLMs as reasoning engines) for tasks in the MiniHack environment, but also to explore reinforcement learning and continuous learning for embodied agents inside environments using only LLMs, so that lessons learned can be abstracted to other modalities.

I want to use small LLMs and a focused dataset so I can get a good idea of how the moving parts perform and what data is necessary beyond just large general knowledge. I am working under the assumption that a carefully curated dataset, specifically tailored towards continuous learning in an embodied agent, can produce desirable results even in sub-billion-parameter models.

My rough strategy is to use the TinyStories dataset, a trajectory-based dataset using only the human Monk trajectories, and select categories from the NetHack Wiki. I plan to do some ablations to see which datasets are critical. Once I have basic instruction tuning working, so the model can follow small instructions, I will attempt to implement some combination of ideas from papers I've been interested in.

My justifications for the data sets:

TinyStories: I want to give the model a basic understanding of the English language so that it can hopefully understand what's happening in the wiki articles or any of the game messages that NetHack produces.

Trajectory dataset: this carefully formatted dataset will be used to structure how the agent behaves and how I parse out the states, actions, and other information I'm interested in.

Subset of the NetHack Wiki: I will be making a subset dataset containing the categories I think would be most useful to an agent that needs information on things inside the game.

I am carefully formatting my trajectory dataset for two reasons: one, I want to make parsing trivial; two, I am assuming that regularity in the format of the states presented to the LLM will let it generate the output I desire more easily. This is just an assumption.
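
To make "trivial parsing" concrete, here is a minimal sketch of the kind of rigidly formatted record I have in mind; the tags, field names, and delimiters are hypothetical placeholders, not a finalized spec:

```python
import re

# One hypothetical trajectory record: a state block followed by the action taken.
RECORD = """<state>
dlvl:1 hp:14/14 pos:(3,7)
You see here a food ration.
</state>
<action>pickup</action>"""

def parse_record(text: str) -> dict:
    """Because the format is regular, parsing stays a pair of regexes."""
    state = re.search(r"<state>\n(.*?)\n</state>", text, re.DOTALL).group(1)
    action = re.search(r"<action>(.*?)</action>", text).group(1)
    return {"state": state, "action": action}

print(parse_record(RECORD))
# {'state': 'dlvl:1 hp:14/14 pos:(3,7)\nYou see here a food ration.', 'action': 'pickup'}
```

The same tags that make parsing easy are also the regularity I am betting the LLM will latch onto when generating actions.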

# 7/25/23
https://astralcodexten.substack.com/p/were-not-platonists-weve-just-learned
intelligence explosion

# 7/23/23 - Towards A Unified Agent with Foundation Models
https://arxiv.org/abs/2307.09668

Generate a synthetic dataset for the state that you want; search over the action space until you find a trajectory that reaches a cosine-similarity threshold with the desired state; then add all those frames and states to the buffer and incorporate them into training.

You can bootstrap the process with priors and still search for the desired state.
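
A rough sketch of that search loop, assuming a hypothetical `env` with `reset`/`step`, an `embed` function mapping state text to vectors, and plain random search as the simplest stand-in for a prior-guided policy:

```python
import random
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def search_for_state(env, embed, goal_text: str, actions, threshold=0.9,
                     max_steps=200):
    """Roll out actions until a visited state's embedding is close enough
    to the embedding of the desired (synthetic) goal state."""
    goal_vec = embed(goal_text)
    buffer = []                         # (state, action) frames to train on later
    state = env.reset()
    for _ in range(max_steps):
        action = random.choice(actions)  # priors could bias this choice instead
        next_state = env.step(action)
        buffer.append((state, action))
        if cosine(embed(next_state), goal_vec) >= threshold:
            return buffer                # trajectory reached the goal region
        state = next_state
    return None                          # no qualifying trajectory found
```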


## reward
Reward any trajectory in proportion to how semantically similar its states are to any state from a run with a victory condition.
Use a linear reward curve or some other function.
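
A sketch of that reward: score a state by its best cosine similarity to any state seen in a winning run, then shape the score with a chosen curve. Here `victory_vecs` (embeddings of victory-run states) and the linear/power curves are my assumptions, not a fixed design:

```python
import numpy as np

def similarity_reward(state_vec: np.ndarray, victory_vecs: np.ndarray,
                      curve: str = "linear", power: float = 2.0) -> float:
    # Best cosine match against any state from a run with a victory condition.
    norms = np.linalg.norm(victory_vecs, axis=1) * np.linalg.norm(state_vec)
    sims = victory_vecs @ state_vec / (norms + 1e-8)
    s = float(sims.max())
    if curve == "linear":
        return s          # reward directly proportional to similarity
    return s ** power     # or a sharper, non-linear curve
```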


## Sample curve
Sample more heavily from sections of trajectories where the states change more.
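
One way this could look, assuming state embeddings as the change measure: weight each transition by the magnitude of the state change, so high-change sections are drawn more often.

```python
import numpy as np

def change_weights(state_vecs: np.ndarray) -> np.ndarray:
    """Per-transition sampling weights proportional to state-change magnitude."""
    deltas = np.linalg.norm(np.diff(state_vecs, axis=0), axis=1)
    return deltas / deltas.sum()

def sample_transitions(state_vecs: np.ndarray, k: int,
                       rng=np.random.default_rng()) -> np.ndarray:
    """Draw k transition indices, biased toward high-change sections."""
    return rng.choice(len(state_vecs) - 1, size=k, p=change_weights(state_vecs))
```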

## notes
http://www.incompleteideas.net/IncIdeas/BitterLesson.html


# 7/21/23
I am going to naively, without evidence, state that you can represent any function in text with a large language model.

# measured steps
Probably should have figured out sooner that small, measured steps applied consistently lead to results. Predicting the outcome while getting there can be interesting but is ultimately just an image in your head.

# Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
  
https://arxiv.org/pdf/2307.05695.pdf
https://github.com/guitaricet/peft_pretraining