Osirian AI: A Call For The Resurrection And Reuse Of Deep Learning Models

Community Article Published March 26, 2025


Deep learning in its current form stands in roughly the same relation to a mature science as alchemy did to chemistry. We explore various ideas, experiment with different designs, and adjust numerous configurations until we obtain a working model, then discard every other iteration. This process, often driven by a limited understanding of why things work, costs a great deal of time, effort, and computational resources.

In addition, because the field moves so quickly, models considered state-of-the-art today become outdated or obsolete within weeks or months and are effectively consigned to the Great Deep Wasteland!

While this "throwaway culture" seems inherent to the current state of deep learning, we can certainly find ways to get more use out of artifacts in which so much effort and compute were invested.

This is why we need more research into effective ways to reuse, repurpose, and recycle deep learning models and the artifacts they produce along the way. As it stands, we waste a great deal of effort and resources simply by throwing models away.

Transfer learning is one way to reuse a model: take a model trained for one task and adapt it to another. But it doesn't really address the problem I'm describing here. I'm more interested in approaches that let us directly use the many models we discard after experiments: methods like model merging, which combines the parameters of different models, and model stitching, which connects parts of them. Methods like these could help us extract far more value from the work we've already done.

This is a big topic with a lot to explore, and I might write more about the different ways we can recycle models in the future. But for now, to give you a better idea of what this looks like in practice, let's take a look at some examples that I find interesting.

We can identify three primary sources of waste throughout the deep learning model development process:

  1. The experimentation phase, which yields numerous discarded model designs and configurations.

  2. The training phase, where numerous intermediate model checkpoints are often discarded.

  3. The deployment phase, where models become quickly outdated.

For each challenge, we will look at an example of mitigation.

Experimentation Phase

Net2Net: Accelerating Learning via Knowledge Transfer


From ICLR 2016 comes this old gem, Net2Net, developed by the creators of XGBoost and Generative Adversarial Networks (GANs). Net2Net is a framework that accelerates model development experimentation by transferring knowledge from an existing trained prototype to a new model, allowing for rapid initialization and avoiding training from scratch. It does this with two operations based on function-preserving transformations, drawing inspiration from compiler design: Net2WiderNet, which replaces a model with a functionally equivalent but wider network, and Net2DeeperNet, which replaces a model with a functionally equivalent but deeper network.
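To make the widening operation concrete, here is a minimal NumPy sketch of a Net2WiderNet-style transformation applied to a single hidden layer. The function name, shapes, and random unit mapping are my own illustration of the idea, not code from the paper:

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=None):
    """Function-preserving widening of one hidden layer (Net2WiderNet-style sketch).

    w1: (in_dim, hidden) weights into the hidden layer
    b1: (hidden,) biases of the hidden layer
    w2: (hidden, out_dim) weights out of the hidden layer
    new_width: target hidden size, must be >= hidden
    """
    if rng is None:
        rng = np.random.default_rng(0)
    hidden = w1.shape[1]
    assert new_width >= hidden
    # Mapping g: the first `hidden` new units copy themselves; extra units copy a random old unit.
    extra = rng.integers(0, hidden, size=new_width - hidden)
    mapping = np.concatenate([np.arange(hidden), extra])
    # How many times each old unit was replicated; used to rescale outgoing weights.
    counts = np.bincount(mapping, minlength=hidden)

    new_w1 = w1[:, mapping]                              # copy incoming weights
    new_b1 = b1[mapping]                                 # copy biases
    new_w2 = w2[mapping, :] / counts[mapping][:, None]   # split outgoing weights among copies
    return new_w1, new_b1, new_w2

# Quick check that the wider network computes the same function.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1, b1, w2 = rng.normal(size=(8, 16)), rng.normal(size=16), rng.normal(size=(16, 3))
relu = lambda z: np.maximum(z, 0)

y_old = relu(x @ w1 + b1) @ w2
nw1, nb1, nw2 = net2wider(w1, b1, w2, new_width=24)
y_new = relu(x @ nw1 + nb1) @ nw2
assert np.allclose(y_old, y_new)
```

Because the wider network starts out computing exactly the same function as the prototype, training picks up from the prototype's performance instead of starting over.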

Training Phase

Averaging Weights Leads to Wider Optima and Better Generalization


Another notable work, from 2018, is Stochastic Weight Averaging (SWA), a remarkably simple, architecture-agnostic technique that improves the generalization of deep learning models at essentially no additional cost over conventional training. Instead of keeping only the final set of weights produced by Stochastic Gradient Descent (SGD), SWA averages the weights visited during the later stages of training, which yields better test accuracy and generalization with minimal computational overhead.
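PyTorch ships SWA helpers in `torch.optim.swa_utils`, so in practice the technique is a few extra lines wrapped around an ordinary training loop. The toy model, dummy data, and schedule below are placeholders of my own choosing; a rough sketch:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Toy model and dummy data, standing in for a real training setup.
model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(512, 10), torch.randint(0, 2, (512,))),
    batch_size=32,
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

swa_model = AveragedModel(model)               # keeps a running average of the weights
swa_start = 75                                 # epoch at which averaging begins (illustrative)
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # constant learning rate during the SWA phase

for epoch in range(100):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)     # fold the current weights into the average
        swa_scheduler.step()

# Batch-norm statistics are not averaged, so recompute them with one pass over the data,
# then evaluate or deploy swa_model rather than model.
update_bn(train_loader, swa_model)
```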

Deployment Phase

Stitchable Neural Networks


Model stitching is a fascinating and powerful approach to deep learning model recycling. This technique constructs new, potentially more efficient architectures by combining specific components or layers from pre-trained models. Unlike merging, which combines parameters, stitching enables targeted reuse of learned features at the network module level, often without full retraining. For example, this paper introduces Stitchable Neural Networks (SN-Net), a framework for efficiently creating and deploying deep learning models with varying accuracy-efficiency trade-offs by leveraging the growing availability of pre-trained model families (e.g., different-sized ResNets or DeiTs) in public model zoos.
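As a rough illustration of the stitching idea (not the SN-Net implementation itself, which stitches pre-trained anchors from a model family and trains the stitching layers as part of the framework), here is a toy PyTorch sketch in which the front half of one network is connected to the back half of another through a learned 1x1 convolution, with the donor weights frozen:

```python
import torch
import torch.nn as nn

# Hypothetical donors: in practice these would be two pre-trained models from a model zoo
# (e.g. a smaller and a larger member of the same family). Here they are random stand-ins.
front_donor = nn.Sequential(                 # early blocks of a small model
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
back_donor = nn.Sequential(                  # later blocks and head of a larger model
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1000),
)

class StitchedNet(nn.Module):
    def __init__(self, front, back, front_ch, back_ch):
        super().__init__()
        self.front, self.back = front, back
        # The stitching layer: a cheap 1x1 conv mapping one model's activation space
        # into the other's, which is the role the stitching layers play in SN-Net.
        self.stitch = nn.Conv2d(front_ch, back_ch, kernel_size=1)

    def forward(self, x):
        return self.back(self.stitch(self.front(x)))

model = StitchedNet(front_donor, back_donor, front_ch=64, back_ch=128)

# Freeze the donor weights and train only the stitching layer.
for p in list(model.front.parameters()) + list(model.back.parameters()):
    p.requires_grad = False
optimizer = torch.optim.Adam(model.stitch.parameters(), lr=1e-3)

x = torch.randn(2, 3, 32, 32)                # dummy batch
print(model(x).shape)                         # torch.Size([2, 1000])
```

The appeal is that the expensive parts, the donor networks, are reused as-is, and only the small stitching layers need to be trained to obtain a new point on the accuracy-efficiency curve.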


In ancient Egyptian mythology, Osiris was a benevolent king who brought prosperity and order to Egypt. Driven by jealousy, his brother Set murdered him and scattered the pieces of his body. Devoted to her husband, Osiris's wife, Isis, tirelessly searched for and miraculously pieced his body back together using her powerful magic. Through this act of restoration and his subsequent return from death, Osiris became the god of resurrection and the ruler of the afterlife, judging the souls of the deceased.

Until we figure out how to create models that continually learn, we need to persist in searching for effective ways to utilize the diverse artifacts produced during the model development process and to resurrect those models relegated to the Great Deep Wasteland.

This article was originally published here
