Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:
1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents Covers agents, their functions, tool use and how they differ from models
3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more
5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty
7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> https://huggingface.co/Kseniase We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge
RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step-by-step and adjusts it, also deciding how much compute power to use at test time. If needed it reformulates queries.
GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks?
The answer is YES and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and eval on RefCOCO Val and RefGTA (an OOD task).