🌁#83: GAN is back
🔳 Turing Post has been invited to join 🤗 Hugging Face as a resident -> click to follow!
Now, to the main topic:
Last week’s headlines were dominated by timid CEOs and burning events in LA, with CES coverage flooding every feed. It was intense, and we found ourselves yearning for the comfort of good old machine learning. So today, we’re revisiting a classic: GANs. Are they still worthy of their title as one of the most captivating ideas in ML?
This overview is inspired by the recent paper “The GAN Is Dead; Long Live the GAN!”. As always, let’s begin with our favorite starting point – a refreshing dive into history.
The Birth of GANs: A Game of Two Networks
The paper “Generative Adversarial Nets” was introduced in 2014 by Ian Goodfellow and his team. The concept was simple yet revolutionary: two neural networks, a generator and a discriminator, compete in a zero-sum game.
- Generator: This network creates fake data (e.g., images, audio, or text) starting from random noise. Its goal is to generate data so realistic that the other network (the discriminator) can’t tell it’s fake.
- Discriminator: This network acts as a judge. It looks at data (both real and fake) and tries to determine if it’s authentic or generated by the generator.
This adversarial training forces both networks to improve, eventually producing synthetic data that’s indistinguishable from the real thing.
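To make the two-player game concrete, here is a minimal sketch of the classic GAN objective in pure Python. The function names and the scalar probability inputs are illustrative, not from the original paper's code: `d_real` and `d_fake` stand for the discriminator's probability that a real and a generated sample, respectively, are authentic.

```python
import math

def bce(pred, target):
    # binary cross-entropy for a single probability prediction
    eps = 1e-12  # avoid log(0)
    return -(target * math.log(pred + eps) + (1 - target) * math.log(1 - pred + eps))

def discriminator_loss(d_real, d_fake):
    # the discriminator wants real samples scored 1 and fakes scored 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # the generator wants its fakes scored as real
    # (the "non-saturating" form commonly used in practice)
    return bce(d_fake, 1.0)

# A confident discriminator facing a weak generator:
print(discriminator_loss(0.9, 0.1))  # small: D is winning
print(generator_loss(0.1))           # large: G is losing
```

At equilibrium, the generator's samples are indistinguishable from real data, the discriminator outputs 0.5 everywhere, and neither network can improve further.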
This approach turned out to be very effective. So much so that in 2016 Yann LeCun said that “it’s the best idea we had in a bit”.
Image Credit: RI Seminar: Yann LeCun : The Next Frontier in AI: Unsupervised Learning
Compared to earlier generative models like Variational Autoencoders (VAEs) and Restricted Boltzmann Machines (RBMs), GANs generated sharper images, learned more intricate patterns, and opened up new possibilities.
The excitement around GANs was palpable, but the training challenges, like instability and mode collapse, were just as real.
The Shift to Diffusion Models
As the years passed, those training difficulties became harder to ignore. Around 2022, a new challenger emerged: diffusion models. These models approached data generation as a gradual refinement process, which made them more stable and easier to train.
Diffusion models quickly stole the spotlight, offering high-quality, diverse outputs and fewer headaches for researchers. GANs, once the star of generative modeling, began to fade from the conversation.
The GAN Is Dead; Long Live the GAN!
Not from the conversations of the true believers! Just a few days ago, in this brand new 2025, a paper with the bold title “The GAN Is Dead; Long Live the GAN!” reignited interest in GANs. Written by Yiwen Huang, Aaron Gokaslan, Volodymyr Kuleshov, and James Tompkin, the paper argued that GANs’ challenges were more about outdated architectures and techniques than inherent flaws.
At the heart of this idea is a better loss function – think of it as a smarter way for the GAN to measure how well it’s learning. They call it the relativistic GAN loss. It makes the GAN training process smoother and less prone to common problems like weird artifacts or getting stuck generating only a small set of images.
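The core idea of a relativistic loss is that the discriminator judges real and fake samples against each other, in pairs, rather than scoring each in isolation. Below is a hedged, simplified sketch of that pairing idea in plain Python; the function names are ours, the inputs are raw discriminator scores (logits), and this is not the paper's full recipe (which also relies on zero-centered gradient penalties for stability).

```python
import math

def softplus(x):
    # smooth approximation of max(0, x)
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def relativistic_d_loss(d_real, d_fake):
    # discriminator: push the real sample's score above the fake's
    return softplus(d_fake - d_real)

def relativistic_g_loss(d_real, d_fake):
    # generator: the symmetric objective, push the fake's score above the real's
    return softplus(d_real - d_fake)
```

Because only the *difference* between scores matters, the discriminator can never "win outright" by driving all scores to extremes, which is one intuition for why this formulation trains more smoothly.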
The researchers also modernized the GAN architecture. They started with StyleGAN2 (a popular model known for generating photorealistic faces) and stripped out all the extra stuff that’s no longer necessary thanks to recent advancements in AI design. They added better building blocks, like ResNets and grouped convolutions, to create a leaner, meaner GAN called R3GAN.
This new approach not only works better but is also simpler. On standard benchmarks like FFHQ (a dataset of human faces) and CIFAR-10 (smaller images of everyday objects), R3GAN beats existing models, including some diffusion models. Plus, it’s faster to train and uses less computing power.
If you’ve avoided GANs because they seemed too fiddly or outdated, this might be the perfect time to give them another shot. R3GAN makes the whole process way more accessible. It’s time to rethink what GANs can do.
Iterative nature of ML innovation
The revival of GANs is a reminder of the iterative nature of machine learning innovation. GANs remain relevant because they solve real problems efficiently. Their ability to generate high-quality synthetic data is even more critical now, as the demand for genAI data overwhelms available sources. This is especially important in industries like healthcare, where privacy concerns make sharing real-world data challenging.
Curated Collections (ex Twitter library)
Inspired by Agent Laboratory: Using LLM Agents as Research Assistants by AMD and Johns Hopkins University and LLM4SR: A Survey on LLMs for Scientific Research by University of Texas at Dallas, we put together this collection:
10 AI Systems for Scientific Research
Do you like Turing Post? –> Click 'Follow'! And subscribe to receive it straight into your inbox -> https://www.turingpost.com/subscribe
We are reading
- In this article, Will Schenk compares different AI research tools, asking them questions like "Why is it dark at night?" and concludes that DeepResearch outperforms other models in providing thorough, reliable, and well-referenced insights. Great read.
- How Ben Hylak turned from o1 pro skeptic to fan by overcoming his skill issue.
- Agents by Chip Huyen
- A re-record of Nathan Lambert’s NeurIPS tutorial on language modeling (plus some added content).
The freshest research papers, categorized for your convenience
There were quite a few TOP research papers this week; we will mark them with 🌟 in each section.
Reasoning and Mathematical Capabilities
- 🌟 Sky-T1: Train Your Own O1 Preview Model Within $450 demonstrates the affordability of high-performance reasoning models by training a 32B model for reasoning and coding tasks.
- 🌟 RStar-Math: Small LLMs Can Master Math Reasoning With Self-Evolved Deep Thinking highlights small models excelling in math reasoning via Monte Carlo Tree Search and iterative self-improvement methods.
- 🌟 Test-time Computing: From System-1 Thinking to System-2 Thinking explores methods to enhance AI reasoning by combining intuitive and deliberative strategies for robust problem-solving.
- 🌟 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought proposes Meta-CoT to enable iterative exploration and verification, enhancing reasoning for complex problem-solving tasks.
- Search-o1: Agentic Search-Enhanced Large Reasoning Models introduces retrieval-augmented generation for reasoning models, enhancing their accuracy in complex domains by integrating external knowledge.
- BoostStep: Boosting Mathematical Capability of Large Language Models via Improved Single-Step Reasoning refines step-level reasoning for mathematical tasks, significantly improving accuracy on low-similarity and challenging benchmarks.
- URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics focuses on chain-of-thought reasoning for multimodal tasks, providing robust frameworks for mathematical problem-solving.
- DOLPHIN: Closed-Loop Open-Ended Auto-Research through Thinking, Practice, and Feedback innovates auto-research through iterative feedback loops that integrate idea generation, validation, and refinement.
- Multiagent Finetuning: Self-Improvement With Diverse Reasoning Chains enhances model reasoning through multiagent systems that preserve diverse reasoning chains across tasks.
Reinforcement Learning from Human Feedback (RLHF)
- 🌟 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models improves RLHF by integrating PPO-inspired techniques into the REINFORCE framework, enabling faster, more stable, and efficient alignment without requiring a critic network.
- Segmenting Text and Learning Their Rewards for Improved RLHF in Language Models advances RLHF by introducing segment-level reward modeling, ensuring semantically coherent and dense feedback for better model alignment.
Robotics and Physical AI
- 🌟 Cosmos World Foundation Model Platform for Physical AI trains robotics systems via large-scale, physics-aware simulations for diverse applications.
- OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives proposes a vision-language framework for robust robotic manipulation, achieving zero-shot generalization across tasks.
Retrieval-Augmented Generation (RAG)
- VideoRAG: Retrieval-Augmented Generation over Video Corpus combines visual and textual retrieval to improve response accuracy for video-based questions.
- Personalized Graph-Based Retrieval for Large Language Models enriches retrieval by integrating user-centric knowledge graphs for personalized text generation.
- Multi-task Retriever Fine-tuning for Domain-specific and Efficient RAG optimizes RAG for enterprise applications by fine-tuning retrievers on domain-specific tasks.
- GeAR: Generation Augmented Retrieval bridges retrieval and generation using bi-encoder architectures to locate and retrieve fine-grained text units.
Uncategorized Innovations
- An Empirical Study of Autoregressive Pre-Training from Videos explores autoregressive pre-training for video data, achieving competitive performance across diverse domains.
- Entropy-Guided Attention for Private LLMs introduces entropy regularization techniques to improve private inference efficiency in LLMs.
That’s all for today. Thank you for reading!
Please share this article with your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.