import marimo
__generated_with = "0.11.9"
app = marimo.App()
@app.cell(hide_code=True)
def _():
    import marimo as mo
    import synalinks

    synalinks.backend.clear_session()
    return mo, synalinks
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        # Rewards, Metrics & Optimizers

        ## Understanding Rewards

        `Reward`s are an essential part of reinforcement learning frameworks.
        They are typically float values (usually between 0.0 and 1.0, but they
        can also be negative) that guide the system toward making more
        efficient decisions or predictions. During training, the goal is to
        maximize the reward function. The reward gives the system an indication
        of how well it performed on the task.
        """
    )
    return
@app.cell(hide_code=True)
def _(mo):
    mo.mermaid(
        r"""
        graph LR
        A[Training Data] -->|Provide x:DataModel| B[Program];
        B -->|Generate y_pred:JsonDataModel| C[Reward];
        A -->|Provide y_true:DataModel| C;
        C -->|Compute reward:Float| D[Optimizer];
        D -->|Update trainable_variable:Variable| B;
        """
    )
    return
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        This reinforcement loop is what makes it possible for the system to
        learn by repeatedly making predictions and refining its
        knowledge/methodology in order to maximize the reward.

        All rewards consist of a function or program that takes two inputs:

        - `y_pred`: The prediction of the program.
        - `y_true`: The ground truth/target value provided by the training data.

        Synalinks provides several built-in rewards, and it is also easy to
        create new rewards if you need to (a hypothetical sketch follows the
        compilation example below). Overall, the choice depends on the task to
        perform. You can have a look at the rewards provided in the
        [API section](https://synalinks.github.io/synalinks/Synalinks%20API/Rewards/).

        ## Understanding Metrics

        `Metric`s are scalar values that are monitored during training and
        evaluation. These values are used to know which program is best, in
        order to save it, or to provide additional information to compare
        different architectures with each other. Unlike `Reward`s, a `Metric`
        is not used to update the program, meaning the metric value is not
        backpropagated. Additionally, every reward function can be used as a
        metric. You can have a look at the metrics provided in the
        [API section](https://synalinks.github.io/synalinks/Synalinks%20API/Metrics/).

        ## Predictions Filtering

        Sometimes your program has to output a complex JSON object, but you
        want to evaluate only part of it. This could be because your training
        data only includes a subset of the JSON, or because the additional
        fields were added only to help the LMs. In that case, you have to
        filter out or filter in your predictions and ground truths, meaning
        that you remove or keep, respectively, only specific fields of your
        JSON data. This can be achieved by adding an `out_mask` or `in_mask`
        list parameter containing the keys to remove or keep for evaluation.
        These parameters can be added to both rewards and metrics, as in the
        example below, where we only keep the `answer` field to compute the
        reward and metrics.

        ## Understanding Optimizers

        Optimizers are systems that update the modules' state in order to make
        them more performant. They are in charge of backpropagating the rewards
        from the training process and of selecting or generating examples and
        hints for the LMs.

        Here is an example of program compilation, which is how you configure
        the reward, metrics, and optimizer:
        """
    )
    return
@app.cell
def _(synalinks):
    class Query(synalinks.DataModel):
        query: str = synalinks.Field(
            description="The user query",
        )

    class AnswerWithThinking(synalinks.DataModel):
        thinking: str = synalinks.Field(
            description="Your step by step thinking process",
        )
        answer: str = synalinks.Field(
            description="The correct answer",
        )

    return Query, AnswerWithThinking
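@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        For reference, `AnswerWithThinking` describes a JSON object with two
        top-level keys; the `in_mask`/`out_mask` parameters discussed above
        operate on these keys. The values below are purely illustrative:

        ```python
        {
            "thinking": "step by step reasoning produced by the LM",
            "answer": "the final answer",
        }
        ```
        """
    )
    return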
@app.cell
async def _(AnswerWithThinking, Query, synalinks):
    language_model = synalinks.LanguageModel(
        model="openai/gpt-4o-mini",
    )

    # Build a simple chain-of-thought pipeline: Query -> Generator -> AnswerWithThinking
    _x0 = synalinks.Input(data_model=Query)
    _x1 = await synalinks.Generator(
        data_model=AnswerWithThinking,
        language_model=language_model,
    )(_x0)

    program = synalinks.Program(
        inputs=_x0,
        outputs=_x1,
        name="chain_of_thought",
        description="Useful to answer in a step by step manner.",
    )

    # Configure the reward, optimizer, and metrics; the masks restrict
    # evaluation to the `answer` field only.
    program.compile(
        reward=synalinks.rewards.CosineSimilarity(in_mask=["answer"]),
        optimizer=synalinks.optimizers.RandomFewShot(),
        metrics=[
            synalinks.metrics.F1Score(in_mask=["answer"]),
        ],
    )
    return program
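@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        The compilation above keeps only the `answer` field via `in_mask`.
        Since `out_mask` is the complement (it removes the listed keys), the
        same evaluation could presumably be expressed by masking out the
        `thinking` field instead. The call below is a sketch of that
        alternative, not verified output:

        ```python
        program.compile(
            reward=synalinks.rewards.CosineSimilarity(out_mask=["thinking"]),
            optimizer=synalinks.optimizers.RandomFewShot(),
            metrics=[
                synalinks.metrics.F1Score(out_mask=["thinking"]),
            ],
        )
        ```
        """
    )
    return


@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        As mentioned earlier, you can also create custom rewards. The function
        below is a hypothetical sketch: it assumes that a plain async callable
        taking `y_true` and `y_pred` and returning a float between 0.0 and 1.0
        can serve as a reward, and that the JSON data models expose dict-style
        access. Check the
        [Rewards API section](https://synalinks.github.io/synalinks/Synalinks%20API/Rewards/)
        for the exact base class and signature to use in practice:

        ```python
        async def exact_answer_match(y_true, y_pred):
            # Hypothetical custom reward: 1.0 if the predicted answer matches
            # the ground truth exactly, 0.0 otherwise (including no prediction).
            # Assumes dict-style access to the underlying JSON fields.
            if y_pred is None:
                return 0.0
            return 1.0 if y_pred.get("answer") == y_true.get("answer") else 0.0
        ```
        """
    )
    return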
@app.cell(hide_code=True)
def _(mo):
    mo.md(
        r"""
        ## Conclusion

        In this notebook, we explored the fundamental concepts of training and
        optimizing Synalinks programs using rewards, metrics, and optimizers.
        These components are crucial for building efficient and adaptive
        language model applications.

        ### Key Takeaways

        - **Rewards**: `Reward`s guide the reinforcement learning process by
          providing feedback on the system's performance. They are typically
          float values that indicate how well the system performed a task,
          with the goal of maximizing the reward function during training.
          Synalinks offers built-in rewards and allows for custom reward
          functions to suit specific tasks.
        - **Metrics**: `Metric`s are scalar values monitored during training
          and evaluation to determine the best-performing program. Unlike
          rewards, metrics are not used for backpropagation. They provide
          additional insights for comparing different architectures and
          saving the optimal program.
        - **Optimizers**: `Optimizer`s update the modules' state to improve
          performance. They handle the backpropagation of rewards and
          select or generate examples and hints for the language models.
          Proper configuration of optimizers is essential for effective
          training.
        - **Filtering Outputs**: When dealing with complex JSON outputs,
          filtering predictions and ground truths using `out_mask` or
          `in_mask` parameters ensures that only relevant fields are
          evaluated. This is particularly useful when the training data
          includes a subset of the JSON or when additional fields are
          used to aid the language models.
        """
    )
    return
if __name__ == "__main__":
    app.run()