Jędrzej Grabala

jgitsolutions

AI & ML interests

A locally hosted, human-overseen system of agents, LLMs, LangChain pipelines, and other useful tools, running on mid-to-low-end commercial hardware.

Recent Activity

liked a model 2 days ago
microsoft/OmniParser-v2.0
liked a Space 2 days ago
hf-accelerate/model-memory-usage
liked a Space 2 days ago
huggingface/ai-deadlines

Organizations

LangChain Agents Hub, LangChainDatasets, ZeroGPU Explorers, Dev Mode Explorers

jgitsolutions's activity

reacted to chansung's post with 👍 29 days ago
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" (from HKU, UC Berkeley, Google DeepMind, and New York University), which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests that SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable: typically a small amount of SFT first teaches the model the prompt format, and RL then enhances generalization through trial and error, as sketched below.
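
For context, here is a minimal sketch of that two-stage recipe using Hugging Face's TRL library. The model and dataset names are placeholders, the reward function is a toy, exact trainer arguments vary by TRL version, and the paper's own RL setup may differ (GRPO is just one convenient TRL trainer):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

# Stage 1: supervised fine-tuning, so the model learns the prompt/answer format.
sft_data = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset
sft_trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",               # placeholder base model
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out"),
)
sft_trainer.train()
sft_trainer.save_model("sft-out")

# Stage 2: RL (GRPO here) on top of the SFT checkpoint, driven by a task reward.
def toy_reward(completions, **kwargs):
    # Toy stand-in for a real environment reward (e.g., a verified task answer):
    # rewards completions whose length is close to 20 characters.
    return [-abs(20 - len(c)) for c in completions]

rl_data = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset
rl_trainer = GRPOTrainer(
    model="sft-out",                          # resume from the SFT checkpoint
    reward_funcs=toy_reward,
    train_dataset=rl_data,
    args=GRPOConfig(output_dir="rl-out"),
)
rl_trainer.train()
```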

The study focused on a single model, Llama-3.2-Vision-11B, using environments such as GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. The same training data was used for both SFT and RL, with evaluations on both in-distribution and out-of-distribution data to separate memorization from generalization.
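
In other words, memorization shows up as in-distribution gains without out-of-distribution gains, while generalization lifts both. A toy reading of those two numbers (made-up accuracy deltas and my own helper, not from the paper):

```python
def diagnose(id_gain: float, ood_gain: float) -> str:
    # Toy rule of thumb: ID-only gains look like memorization,
    # OOD gains look like generalization.
    if ood_gain > 0:
        return "generalization-like (OOD accuracy improved)"
    if id_gain > 0:
        return "memorization-like (only ID accuracy improved)"
    return "no clear gain"

# Hypothetical accuracy deltas after post-training (illustrative only):
print(diagnose(id_gain=0.50, ood_gain=-0.02))  # an SFT-like profile
print(diagnose(id_gain=0.45, ood_gain=0.25))   # an RL-like profile
```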

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
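
To make the "playground" idea concrete, below is a self-contained toy sketch of a GeneralPoints-style arithmetic environment (my own minimal interface, not the paper's code): the agent is dealt four numbers and is rewarded for an expression over exactly those numbers that evaluates to 24.

```python
import random
import re

class PointsEnv:
    """Toy GeneralPoints-style environment: given four numbers, the agent must
    submit an arithmetic expression over exactly those numbers that evaluates
    to the target. A real playground adds prompt templates and safety checks."""

    def __init__(self, target: int = 24, seed: int | None = None):
        self.target = target
        self.rng = random.Random(seed)
        self.cards: list[int] = []

    def reset(self) -> str:
        # Deal four numbers and return the prompt shown to the model.
        self.cards = [self.rng.randint(1, 10) for _ in range(4)]
        return f"Combine {self.cards} with + - * / to reach {self.target}."

    def step(self, expression: str) -> float:
        """Return 1.0 for a correct expression using exactly the dealt numbers."""
        try:
            # Toy only: never eval untrusted model output in production.
            value = eval(expression, {"__builtins__": {}}, {})
        except Exception:
            return 0.0
        if not isinstance(value, (int, float)):
            return 0.0
        used = sorted(int(t) for t in re.findall(r"\d+", expression))
        if used != sorted(self.cards):
            return 0.0
        return 1.0 if abs(value - self.target) < 1e-6 else 0.0

env = PointsEnv(seed=0)
print(env.reset())                  # e.g. "Combine [7, 7, 3, 1] with + - * / ..."
print(env.step("(7 + 1) * 3 * 1"))  # reward depends on the dealt cards
```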

https://arxiv.org/abs/2501.17161
upvoted an article about 1 month ago

Welcome to Inference Providers on the Hub 🔥
