---
title: Blog
fullWidth: true
emoji: ⚔️
colorFrom: red
colorTo: green
sdk: streamlit
sdk_version: 1.21.0
app_file: app.py
pinned: true
---

# 3/14/2024
I have to find some way to map the inputs, which are in BizHawk format, to the Mario gym format, which is a discrete integer action set. I don't want to fine-tune the model with only the gym environment, because I don't want to change datasets if I later use the model to play other games. I'd rather have a stupid mapping function here than go back and redo all my data.
This is the Mario gym environment's expected input, where each index corresponds to an integer action: [GitHub - gym-super-mario-bros](https://github.com/Kautenja/gym-super-mario-bros/blob/bcb8f10c3e3676118a7364a68f5c0eb287116d7a/gym_super_mario_bros/actions.py#L27C1-L40C2)
```python
COMPLEX_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
    ['left', 'A'],
    ['left', 'B'],
    ['left', 'A', 'B'],
    ['down'],
    ['up'],
]
```
This is the output from a script that I used to get controller inputs in the BizHawk emulator using Lua. There is probably a better JSON object formatter, but I just hacked one together. Note that it also includes player-two inputs.
```json
{
  "P1 Start": false,
  "P1 Down": false,
  "P1 Right": false,
  "P2 Right": false,
  "P2 A": false,
  "P2 L": false,
  "P1 X": false,
  "P1 L": false,
  "P1 Left": false,
  "P1 R": false,
  "P2 Left": false,
  "Power": false,
  "Reset": false,
  "P2 R": false,
  "P2 Up": false,
  "P1 Up": false,
  "P1 Select": false,
  "P2 Select": false,
  "P2 Start": false,
  "P2 X": false,
  "P2 B": false,
  "P2 Y": false,
  "P2 Down": false,
  "P1 Y": false,
  "P1 A": false,
  "P1 B": false
};
{
  "P1 Start": false,
  "P1 Down": false,
  "P1 Right": false,
  "P2 Right": false,
  "P2 A": false,
  "P2 L": false,
  "P1 X": false,
  "P1 L": false,
  "P1 Left": false,
  "P1 R": false,
  "P2 Left": false,
  "Power": false,
  "Reset": false,
  "P2 R": false,
  "P2 Up": false,
  "P1 Up": false,
  "P1 Select": true,
  "P2 Select": false,
  "P2 Start": false,
  "P2 X": false,
  "P2 B": false,
  "P2 Y": false,
  "P2 Down": false,
  "P1 Y": false,
  "P1 A": false,
  "P1 B": false
}
```
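To get those records back into Python, here's a minimal sketch. It assumes the Lua script's output is saved to a file (the name `bizhawk_inputs.txt` is just a placeholder) with records separated by semicolons, as shown above:
```python
import json

def load_bizhawk_dump(path):
    """Split the semicolon-separated records from the Lua logger and parse each one as JSON."""
    with open(path) as f:
        raw = f.read()
    frames = []
    for chunk in raw.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        record = json.loads(chunk)
        # Keep only the player-one buttons; P2, Power, and Reset are irrelevant here.
        frames.append({key: value for key, value in record.items() if key.startswith("P1 ")})
    return frames

frames = load_bizhawk_dump("bizhawk_inputs.txt")
print(frames[1])  # e.g. the frame where P1 Select is pressed
```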
This is the format that the model is trained to output. Notice here that there is a Y button because this is coming from a Super Nintendo.
```json
{
"P1 Right": true,
"P1 Y": true
}
```
Instead of collecting more data from an NES tool-assisted speedrun or movie, I'm just going to map the Y button to the B button and the B button to the A button. In other words, the button used to run and shoot fireballs in SMW will be mapped to the button that runs and shoots fireballs in SMB, and the same goes for the jump button.
Controls reference for SMW: [SMW Controls](https://smwspeedruns.com/Controls)
It took a little while and a lot of ChatGPT generations, but I hammered the output into:
```python
# Discrete action set from gym-super-mario-bros (COMPLEX_MOVEMENT); the index of
# each entry is the integer action the environment expects.
COMPLEX_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
    ['left', 'A'],
    ['left', 'B'],
    ['left', 'A', 'B'],
    ['down'],
    ['up'],
]

def map_buttons_to_locations(input_dict):
    # SNES buttons -> gym buttons: Y/X (run and fire in SMW) map to B,
    # B/A (jump in SMW) map to A.
    button_mapping = {
        "P1 Right": "right",
        "P1 Left": "left",
        "P1 A": "A",
        "P1 B": "A",
        "P1 X": "B",
        "P1 Y": "B",
        "P1 Up": "up",
        "P1 Down": "down",
    }
    pressed = [button_mapping[key] for key, value in input_dict.items()
               if value and key in button_mapping]

    def compare(pressed):
        # Note: this comparison is order-sensitive; the pressed list must match
        # the ordering of the COMPLEX_MOVEMENT entry.
        result = [idx for idx, complex_move in enumerate(COMPLEX_MOVEMENT)
                  if pressed == complex_move]
        return result if result else [0]  # index 0 is NOOP

    return compare(pressed)

# Example usage:
input_data = {
    # "P1 Left": True,
    "P1 Right": True,
    "P1 A": True,
    "P1 Y": True,
}
print(map_buttons_to_locations(input_data))
```
This neatly takes a dictionary of button states and outputs the matching integer action index.
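Plugging that into the gym environment then looks roughly like this. This is a sketch assuming the standard gym-super-mario-bros setup with the JoypadSpace wrapper and the older four-value step API:
```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import JoypadSpace

# Wrap the NES environment so it accepts the integer indices of COMPLEX_MOVEMENT.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, COMPLEX_MOVEMENT)

state = env.reset()
# One frame of model output, mapped to its gym action index.
action = map_buttons_to_locations({"P1 Right": True, "P1 Y": True})[0]
state, reward, done, info = env.step(action)
```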
# 8/3/2023 RSI
Getting RSI in my hands has been the worst experience of my entire life. It has essentially cost me most of the productive years of my late 20s and early 30s. I am continuing to train with plans set by my physical therapist, and I have not lost hope. My hands aren't bad this time, but I can feel the familiar pins and pricks. I guess I'll just take two weeks off of doing anything and see what happens.
# 7/27/23 RogueGPT
## RogueGPT
RogueGPT is an attempt not only to instruction-tune LLM-powered agents (treating LLMs as reasoning engines) for tasks in the MiniHack environment, but also to explore the use of reinforcement learning and continual learning for embodied agents inside environments, using only LLMs so that lessons learned can be abstracted to other modalities.
I want to use small LLMs and a focused dataset so I can really get a good idea of how the moving parts perform and what data is necessary beyond just large general knowledge. I'm under the assumption that a carefully curated dataset specifically tailored towards continual learning in an embodied agent can yield desirable results even in models with fewer than a billion parameters.
My rough strategy is to use the TinyStories dataset, a trajectory-based dataset using only the human Monk trajectories, and select categories from the NetHack Wiki. I plan to perform some ablations to see which datasets are critical. Once basic instruction tuning is working, so the model can follow small instructions, I will attempt to implement some combination of ideas from papers that I've been interested in.
## Justifications for the Datasets
### Tiny Stories Dataset
I want to give the model a basic understanding of the English language so that it can hopefully comprehend what's happening in the wiki or any of the game messages that NetHack produces. [^1^]
### Trajectory Dataset
This carefully formatted dataset will be used to structure how the agent behaves and how I parse out the states, actions, and other information I'm interested in. [^2^]
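Just to pin down what I mean, here is a hypothetical shape for one step of a trajectory record; the field names are placeholders, not a final schema:
```python
# Hypothetical layout for a single step of a human Monk trajectory.
trajectory_step = {
    "state": "<text rendering of the MiniHack observation>",
    "message": "<any in-game message shown on this turn>",
    "action": "<the action the human player took>",
    "turn": 0,
}
```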
### Subset of the NetHack Wiki
I will be creating a subset dataset that contains categories I think would be most useful to an agent who should have information on things inside the game. [^3^]
## Papers I'm Interested In
- Work in Progress Paper 1 [^4^]
- Work in Progress Paper 2 [^5^]
- Work in Progress Paper 3 [^6^]
## References
[^1^]: [Link to Paper 1]
[^2^]: [Link to Paper 2]
[^3^]: [Link to Paper 3]
[^4^]: [Link to Paper 4]
[^5^]: [Link to Paper 5]
[^6^]: [Link to Paper 6]
# 7/25/23
https://astralcodexten.substack.com/p/were-not-platonists-weve-just-learned
intelligence explosion
# 7/23/23 - Towards A Unified Agent with Foundation Models
https://arxiv.org/abs/2307.09668
Generate a synthetic dataset for the state that you want, search over the action space until you find a trajectory that reaches a cosine-similarity threshold with the state you want, then add all those frames and states to the buffer and incorporate them into training.
You can bootstrap the process with priors and still search for the desired state.
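A rough sketch of that loop, assuming some embed() function that maps observations to vectors and a generic environment with a discrete action space (none of this is the paper's code):
```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_for_state(env, embed, goal_embedding, threshold=0.9, max_steps=500):
    """Randomly search the action space until some frame embeds close enough to the
    goal state, collecting the whole trajectory so it can be added to the buffer."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = env.action_space.sample()
        next_obs, reward, done, info = env.step(action)  # older 4-tuple gym API
        trajectory.append((obs, action, next_obs))
        if cosine(embed(next_obs), goal_embedding) >= threshold:
            return trajectory  # reached a state semantically similar to the goal
        obs = next_obs if not done else env.reset()
    return None  # no matching trajectory found within the budget
```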
## Reward
Reward any trajectory in proportion to how semantically similar its states are to any state in a run that reached a victory condition.
Use a linear or some other functional reward curve.
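Written out, the idea might look something like this (a sketch, assuming embeddings for the states of a victorious run are already computed):
```python
import numpy as np

def similarity_reward(state_embedding, victory_run_embeddings, scale=1.0):
    """Reward a state in proportion to its best semantic match against any state
    from a run that hit the victory condition (linear in cosine similarity)."""
    sims = [
        np.dot(state_embedding, v) / (np.linalg.norm(state_embedding) * np.linalg.norm(v))
        for v in victory_run_embeddings
    ]
    return scale * max(sims)
```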
## Sample curve
Sample more heavily from sections of states with more changes in them.
## Notes
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
# 7/21/23
I am going to naively, without evidence, state that you can represent any function in text with a large language model.
# Measured steps
I probably should have figured out sooner that small, measured steps applied consistently lead to results. Predicting the outcome while getting there can be interesting, but it is ultimately just an image in your head.
# Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
https://arxiv.org/pdf/2307.05695.pdf
https://github.com/guitaricet/peft_pretraining |
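The core idea, as I understand it from the paper: keep training through low-rank (LoRA-style) factors, but periodically merge them into the full weights and restart the factors, so the accumulated update becomes high-rank. A toy sketch of that merge-and-restart step (my paraphrase, not the authors' code):
```python
import torch
import torch.nn as nn

class ReLoRAStyleLinear(nn.Module):
    """A linear layer whose update is trained through low-rank factors that get
    merged into the full weight every so often (my reading of the ReLoRA idea)."""
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # LoRA-style factors: B @ A starts at zero because B is zero-initialized.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        return x @ (self.weight + self.B @ self.A).T

    @torch.no_grad()
    def merge_and_restart(self):
        # Fold the low-rank update into the full weight, then re-initialize the factors.
        # Each merge can add up to `rank` to the cumulative update's rank, so repeated
        # merges let the total update become high-rank even though every individual
        # step is low-rank. (The paper also resets optimizer state and re-warms the LR.)
        self.weight.add_(self.B @ self.A)
        nn.init.normal_(self.A, std=0.02)
        self.B.zero_()

# Example usage:
layer = ReLoRAStyleLinear(64, 64)
y = layer(torch.randn(2, 64))
layer.merge_and_restart()
```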