DeepSeek-R1's phenomenal reasoning capabilities showed us all the true power of RL. At its core, RL is a type of machine learning where a model (the agent) learns to make decisions by interacting with an environment to maximize a reward. The agent learns through trial and error, receiving feedback in the form of rewards or penalties.
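Before the list, here's what that trial-and-error loop looks like in practice: a minimal tabular Q-learning sketch on a toy 5-state corridor. The environment and hyperparameters are illustrative assumptions, not taken from any of the resources below.

```python
import random

N_STATES = 5
ACTIONS = [-1, +1]                        # step left / step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    """Pick the highest-value action at state s, breaking ties randomly."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(500):
    s = 0
    while s != N_STATES - 1:              # episode ends at the rightmost state
        # Trial and error: explore sometimes, exploit current estimates otherwise.
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0   # feedback from the environment
        # Nudge the estimate toward reward plus discounted future value.
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

print("learned policy:", {s: greedy(s) for s in range(N_STATES - 1)})
```

After a few hundred episodes, the greedy policy should point every non-terminal state to the right, i.e. toward the reward.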
Here's a list of free sources that will help you dive into RL and how to use it:
2. Hugging Face Deep Reinforcement Learning Course -> https://huggingface.co/learn/deep-rl-course/unit0/introduction You'll learn how to train agents in unique environments using the best libraries, share your results, compete in challenges, and earn a certificate.
4. "Reinforcement Learning and Optimal Control" books, video lectures and course material by Dimitri P. Bertsekas from ASU -> https://web.mit.edu/dimitrib/www/RLbook.html Explores approximate Dynamic Programming (DP) and RL with key concepts and methods like rollout, tree search, and neural network training for RL and more.
8. Concepts: RLHF, RLAIF, RLEF, RLCF -> https://www.turingpost.com/p/rl-f Our flashcards explain these four RL approaches, each built on a different type of feedback.
Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed! This could unlock so many possibilities.
The open source community is unstoppable: 4M total downloads for DeepSeek models on Hugging Face, with 3.2M coming from the 600+ models created by the community.
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. It's interesting to note that the community focused on quantized versions for better efficiency & accessibility: they want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
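As a hedged illustration of what those quantized community models enable, here's a minimal sketch of running one of bartowski's GGUF quants locally with llama-cpp-python. The Q4_K_M filename glob is an assumption; check the repo's file list for the quant you actually want (a 32B model still needs tens of GB of RAM, even at 4-bit).

```python
# Minimal local-inference sketch using llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF",
    filename="*Q4_K_M.gguf",  # glob for the assumed quant file on the Hub
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why do quantized models use less memory?"}],
)
print(out["choices"][0]["message"]["content"])
```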
7 Open-source Methods to Improve Video Generation and Understanding
The AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week's studies showed that working with video is now one of the main focuses for improving AI models. Another highlight of the week: open source, once again, proved its value. For those who were impressed by DeepSeek-R1, we're with you!
Today, we're combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:
Reminder: Don't. Use. ChatGPT. As. A. Calculator. Seriously.
Loved listening to @sasha on Hard Fork. It really made me think.
A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate whether generative AI is the right tool for certain tasks (like search) before using it.
Over the last few weeks, we have witnessed a surge in AI models' math reasoning capabilities. Top companies like Microsoft, NVIDIA, and Alibaba Qwen have already joined this race to make models "smarter" in mathematics. But why is this shift happening now?
Complex math calculations require advanced multi-step reasoning, making mathematics an ideal domain for demonstrating a model's strong "thinking" capabilities. Additionally, as AI continues to evolve and is applied in math-intensive fields such as machine learning and quantum computing (which is predicted to see significant growth in 2025), it must meet the demands of complex reasoning. Moreover, AI models can be integrated with external tools like symbolic solvers or computational engines to tackle large-scale math problems, which also requires high-quality math reasoning.
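To make the external-tools point concrete, here's a minimal sketch of verifying a model's answer with SymPy as the symbolic solver. The model output is mocked, since wiring up a real LLM call is beside the point here.

```python
# Hedged sketch: check an LLM's math answer against a symbolic solver.
import sympy as sp

x = sp.symbols("x")
equation = sp.Eq(x**2 - 5*x + 6, 0)

model_answer = [2, 3]          # pretend this came from the LLM
exact = sp.solve(equation, x)  # symbolic ground truth: [2, 3]

assert sorted(model_answer) == sorted(int(s) for s in exact)
print("LLM answer verified symbolically:", exact)
```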
So here's a list of 10 recent advancements in the math reasoning of AI models:
Today, we spoke with Snowflake's AI Research team leads, Yuxiong He and Samyam Rajbhandari (@samyam), one of the researchers behind DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (2401.08671) and other DeepSpeed papers. Collaborating with their co-authors to reduce inference costs for enterprise-specific tasks, they observed that inputs are often significantly larger than outputs. This is because it's in the nature of enterprises to analyze enormous amounts of information to extract valuable insights, which are much shorter. To address this, they developed SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation (2410.03960), an optimization that reduces LLM inference costs by up to 75% for Meta Llama LLMs, enhancing efficiency and performance in enterprise AI tasks.
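To build intuition for that input/output asymmetry, here's a back-of-the-envelope sketch. The token counts and the ~50% prefill layer-skip are illustrative assumptions for the sketch, not figures quoted from the paper or the interview.

```python
# Rough cost model: per-layer, per-token compute, nothing more.
prompt_tokens, output_tokens = 8000, 200   # enterprise pattern: inputs >> outputs
layers = 32

baseline = (prompt_tokens + output_tokens) * layers                 # all layers, all tokens
optimized = prompt_tokens * layers * 0.5 + output_tokens * layers   # skip ~half of prefill layers

print(f"prefill share of baseline: {prompt_tokens / (prompt_tokens + output_tokens):.0%}")
print(f"relative cost after skipping: {optimized / baseline:.2f}")
```

Under these assumed numbers, prefill is ~98% of the work, so skipping prefill layers is where the big savings come from.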
Today they are open-sourcing SwiftKV (Snowflake/Llama-3.1-SwiftKV-8B-Instruct) and the ArcticTraining platform. In our new episode of "15 Minutes with a Researcher," they explain how SwiftKV works, its applicability to other architectures, its limitations, and additional methods to further reduce computation costs in inference. Watch the full 15-min interview here (https://youtu.be/9x1k7eXe-6Q?si=4_HQOyi1CPHgvlrx)
@meg, one of the best researchers in AI ethics, makes a critical point about autonomy: fully autonomous systems carry unknowable risks because they operate on computer logic rather than human logic.
The solution? Build systems that support & assist rather than override human decisions.
I highly recommend reading the blog post written by Meg, @evijit, @sasha, and @giadap. They define different levels of agent autonomy & provide a values-based analysis of risks, benefits, and uses of AI agents to help you make better decisions.
The AI agent hype is real! This blog post dives deep into everything you need to know before deploying them: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.
Key insight: a clear table breaking down the 5 levels of AI agents, from simple processors to fully autonomous systems. It's an essential framework for understanding where your agent stands on the autonomy spectrum.
Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment.
6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution
Almost every AI researcher has read or worked through a large number of AI research papers, so it's quite logical that researchers are now trying to create AI systems to help conduct research. Scientific research could become much easier and more varied with LLMs and AI assistants tailored for this purpose. Just imagine how interesting it would be to read high-quality AI research produced by an AI agent.
Today, we invite you to explore these 10 AI systems for scientific research:
Community fine-tuned models are more carbon-efficient than the models they are derived from!
@alozowski, @clefourrier, @SaylorTwift, and @albertvillanova evaluated CO₂ emissions associated with model inference for over 3,000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged...
10 Free Comprehensive Datasets for Supervised Fine-Tuning
The quality, size, and relevance of a dataset directly impact the effectiveness of fine-tuning and the model's real-world applications. Among the numerous datasets for different tasks, it can be challenging to choose the most comprehensive one that best suits your purposes.
So today, we invite you to explore the top 10 free datasets for natural language processing and math (a quick loading sketch follows the list):
1. fka/awesome-chatgpt-prompts offers a huge variety of prompts that can be used with ChatGPT. Over 700 models were trained on this dataset.
2. HuggingFaceFW/fineweb from Hugging Face includes 15T tokens of cleaned and deduplicated English web data. It's suitable for LLM training, benchmarking, and model validation.
3. HuggingFaceFW/fineweb-2 is another version of FineWeb, with high-quality pretraining data in over 1,000 languages.
4. O1-OPEN/OpenO1-SFT, with Chinese and English data, can be used for chain-of-thought activation.
5. yahma/alpaca-cleaned is a curated version of the original Alpaca Dataset released by Stanford.
6. lmsys/lmsys-chat-1m contains 1 million real-world conversations with 25 state-of-the-art LLMs and offers diverse use cases, like content moderation, safety benchmarks, and training instruction-following models.
7. allenai/dolma from Allen AI includes 3T tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.
Math datasets:
1. HuggingFaceTB/finemath consists of educational math content and has two versions: 34B tokens and 54B tokens.
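As promised above, a minimal loading sketch for two of these datasets, assuming the Hugging Face `datasets` library and each repo's default config:

```python
from datasets import load_dataset

# Alpaca-cleaned: instruction/input/output triples for supervised fine-tuning.
alpaca = load_dataset("yahma/alpaca-cleaned", split="train")
print(alpaca[0]["instruction"])

# FineWeb: stream lazily instead of downloading all 15T tokens.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
print(next(iter(fineweb))["text"][:200])
```

Streaming is the design choice that matters here: it iterates over the corpus on the fly, so web-scale pretraining data never has to land on your disk in full.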
This year, we started our "AI Agents and Agentic Workflows" series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them. The huge interest in this series and the large number of studies conducted on agents showed that it was one of the most popular and important themes of the year. In 2025, agents will most likely reach new highs; we will be covering that for you. Now, let's review the agentic systems that emerged this year.
Here is a list of 15 agentic systems and frameworks of 2024:
From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?
I used natural language processing to cluster and map them, which really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also a horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.
Click any dot to explore the original prediction. What themes surprise or interest you the most?
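For the curious, here's a minimal sketch of the embed-then-cluster approach described above. The post doesn't specify its tooling, so sentence-transformers and scikit-learn here are stand-in assumptions, not the actual pipeline behind the map.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

predictions = [  # stand-ins for the 100+ Nieman Lab prediction texts
    "AI will reshape newsroom workflows",
    "Generative search threatens publisher traffic",
    "Trust depends on transparency with readers",
    "Local news will experiment with community funding",
]

# Embed each prediction, then group nearby embeddings into themes.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(predictions)
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)

for label, text in sorted(zip(labels, predictions)):
    print(label, text)
```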