5 10

Miguel Gargallo

miguelgargallo

https://itamaesan.org

AI & ML interests

Master in Engineer Informatics, UPC CATALUNYA BARCELONA

Recent Activity

reacted to m-ric's post with 🔥 about 1 month ago

A non-Instruct LLM assistant is mostly useless. 🧐 Since it's mostly a model trained to complete text, when you ask it a question like "What to do during a stopover in Paris?", it can just go on and on adding more details to your question instead of answering, which would be valid to complete text from its training corpus, but not to answer questions. ➡️ So the post-training stage includes an important Instruction tuning step where you teach your model how to be useful : answer questions, be concise, be polite... RLHF is a well known technique for this. For people interested to understand how this step works, the folks at Adaptive ML have made a great guide! Read it here 👉 https://www.adaptive-ml.com/post/from-zero-to-ppo

reacted to m-ric's post with 👀 about 1 month ago

View all activity

Organizations

miguelgargallo's activity

reacted to m-ric's post with 🔥👀 about 1 month ago

Post

2367

A non-Instruct LLM assistant is mostly useless. 🧐

Since it's mostly a model trained to complete text, when you ask it a question like "What to do during a stopover in Paris?", it can just go on and on adding more details to your question instead of answering, which would be valid to complete text from its training corpus, but not to answer questions.

➡️ So the post-training stage includes an important Instruction tuning step where you teach your model how to be useful : answer questions, be concise, be polite... RLHF is a well known technique for this.

For people interested to understand how this step works, the folks at Adaptive ML have made a great guide!

Read it here 👉 https://www.adaptive-ml.com/post/from-zero-to-ppo