Reduce, Reuse, Recycle: Why Open Source is a Win for Sustainability

Recent years have seen open-source AI models rise in popularity across the ecosystem, with tasks from text generation to audio transcription and reasoning increasingly driven by open models like Llama, DeepSeek, Phi and Qwen. As millions of models get shared and used by the community, the field as a whole becomes more accessible, democratic and innovative. But how can we make sure this open-source revolution is sustainable?
One way to achieve this is to go back to the 3Rs (Reduce, Reuse, Recycle), fundamental principles in environmental conservation that can also be applied to AI.
Reduce: SmolLMs, Distillation and Quantization
TL;DR: training and sharing smaller models that perform similarly to (or better than!) bigger ones can help incentivize the use of more compute-optimal models and push the envelope in terms of overall efficiency.
The first and foremost R, the one that is prioritized by environmental advocates, is to reduce the amount of resources or objects that we consume in our daily lives. Whether it be buying fewer clothes or downsizing our homes, the emphasis is on making our lives fit within the planetary boundaries in which we operate.
Smol models (and datasets!)
Deploying models can quickly add up in terms of cost, with tasks like image generation using orders of magnitude more energy than text-based tasks, and "general purpose" models requiring 20-30 times more energy than task-specific ones (source).
The recent family of SmolLM models, with 135M, 360M, and 1.7B parameters, trained by Hugging Face, has shown that smaller LLMs can achieve impressive results when designed and trained thoughtfully. These models were specifically designed to be small and to run locally on accessible hardware, including smartphones.
Through explicit curation of a custom, high-quality training corpus and a bespoke training approach, SmolLMs are able to compete with state-of-the-art LLMs multiple times their size. The original SmolLM series was followed up with a second family of models (SmolLM 2) as well as SmolVLM2, the smallest video language model, showing that small can be mighty even in multi-modal settings.
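To make this concrete, here is a minimal sketch of running a SmolLM2 checkpoint locally with the transformers library; the model ID and generation settings are illustrative, and the smaller 135M and 360M variants can be swapped in for even lighter-weight hardware.

```python
# Minimal sketch: running a small open model locally with transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",  # illustrative checkpoint; 135M/360M variants also exist
    device_map="auto",  # falls back to CPU if no GPU is available
)

output = generator("Why are small language models useful?", max_new_tokens=64)
print(output[0]["generated_text"])
```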
Distillation and quantization
Above and beyond models that are explicitly trained to be smaller and less compute-intensive, it is also possible to take existing trained models and make them less compute-intensive. One of these techniques, called knowledge distillation, involves transferring knowledge from a large model to a smaller one. While this doesn't reduce the amount of compute needed to initially train the model, it does significantly reduce the amount of compute needed to deploy it. For instance, while the highly popular DeepSeek R1 model has 671B parameters (and needs multiple high-performance GPUs to be deployed!), the DeepSeek team also open-sourced distilled versions of the model ranging from 1.5B to 70B parameters with similarly high performance.
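As a rough illustration of the idea (not the exact recipe used by any particular team), the core of a distillation objective can be written as a temperature-softened KL divergence between the student's and teacher's output distributions:

```python
# Sketch of a standard knowledge-distillation loss: the small "student" model is
# trained to match the softened output distribution of the large "teacher" model.
# The temperature value is illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then minimize the KL divergence between them.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# During training, only the student is updated; the teacher runs in inference mode:
# with torch.no_grad():
#     teacher_logits = teacher(batch).logits
# loss = distillation_loss(student(batch).logits, teacher_logits)
```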
Another popular technique for reducing the amount of memory and compute needed for model deployment is quantization, which involves using a lower precision, e.g., 8 bits instead of 32, for representing the model. This means that operations like matrix multiplication can be performed much faster, since lower-precision arithmetic is cheaper and moves far less data around. Formats and toolkits like GGUF, AWQ and GPTQ are specifically optimized for quick loading and saving of models, which makes it possible to take existing models developed in PyTorch and convert them, giving a huge boost to their efficiency.
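As one example of what quantized loading can look like in practice, the transformers library supports 8-bit loading through bitsandbytes; the model ID below is illustrative, and the snippet assumes a CUDA GPU with the bitsandbytes package installed.

```python
# Sketch: loading an existing model in 8-bit precision with bitsandbytes.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",  # illustrative model ID
    quantization_config=quant_config,
    device_map="auto",
)
# The 8-bit weights use roughly a quarter of the memory of the 32-bit original.
```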
One way to measure the relative benefits of these techniques for different models and hardware types is the AI Energy Score project, which proposes standardized approaches for benchmarking the energy efficiency of different models and tasks. The leaderboard includes hundreds of existing open source models, and the methodology can be used to rank and compare models and optimization approaches, and to pick the most efficient model for the task at hand.
Reuse: Using Existing Models
TL;DR: reusing existing models can save many tonnes of CO2 emissions compared to training new ones, and it's good practice to search for a model on the HF Hub before training from scratch.
"Bigger is better" has been a dominant paradigm in AI research and practice, driven by "scaling laws" that stipulate that larger models have better performance than smaller ones. This inexorably uses increasing amounts of compute, with many recent state-of-the-art LLMs reportedly costing hundreds of millions of dollars to train. This also comes with a cost to the environment, since the energy powering this compute is mostly generated from non-renewable sources like natural gas. Research has estimated this cost at anywhere from 25 to 500 tonnes of CO2, depending on where and how models are trained.
As the biggest repository of open-source AI models and datasets, the Hugging Face Hub makes it easy for members of the AI community to find models that carry out different tasks, from visual question answering to translation and image generation, using predefined pipelines. Whereas before, developers would spend hours "shopping" for the right model and adapting it to the specific framework and hardware setup that they have (CUDA versions are a pain, amirite?), now they can easily find, download and deploy millions of models from the Hugging Face Hub.
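For instance, a quick sketch of this "search before you train" workflow with the huggingface_hub client library might look like the following (the task name is illustrative, and the exact metadata returned for each model can vary):

```python
# Sketch: listing the most-downloaded open models for a given task on the Hub.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(task="translation", sort="downloads", direction=-1, limit=5):
    print(model.id)
```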
Interestingly, many of the models with the most downloads are not the ones with hundreds of billions of parameters, but more lightweight ones like MobileNet-v3 (2.55M parameters!) and BERT-base (110M), reflecting the real-world usage and longevity of small, simple models that can be used in a variety of applications. Since many members of the AI community identify as "GPU poor" (i.e. without ready access to large amounts of cloud compute), reusing models that both require less computational power and are tried and true makes them more accessible.
It's also worth noting that for some models, like Phi 2, the quantized GGUF version of the model has over 30 times more downloads than the original model, and both the distilled 1.5B and 32B versions of DeepSeek R1 have more downloads than the original version. This goes to show that there is a huge appetite from the community for smaller, more compute-efficient models that still pack the same punch as bigger ones.
Recycle: Fine-tuning and adapting models
TL;DR: it's possible to adapt existing models, which not only reduces the amount of compute and energy needed for training them, but also allows more transparency and accountability in terms of energy requirements and carbon emissions.
Recycling is the last of the 3 Rs, but thankfully, unlike recycling physical objects like plastic bottles and aluminum cans, recycling AI models is actually much more efficient and less harmful to the environment.
Techniques like LoRA (Low-Rank Adaptation of Large Language Models) involve inserting a small number of new weights into an existing model and training only those weights, which significantly reduces the compute needed to adapt a model, for instance to generate new styles of images. There are now thousands of community-trained LoRA models that can be used for everything from designing clothing to making comic books, without needing to train models from scratch.
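As a rough sketch of what this looks like with the peft library (the base model and target modules below are illustrative assumptions, not a prescribed recipe):

```python
# Sketch: wrapping an existing model with LoRA adapters so only a small fraction
# of weights needs to be trained.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M")  # illustrative base model

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (model-dependent)
    lora_dropout=0.05,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model's weights
```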
Libraries like AutoTrain also make it easy to fine-tune base models to specific tasks and datasets, which reduces the amount of compute they need (since task-specific models require less compute, all things considered (source)). By automatically tuning the hyperparameters for each model-task combination, AutoTrain also lowers the barrier to entry for AI development, allowing non-technical users to also create their own models, instead of using proprietary AI systems and APIs, which rely upon "general purpose" models that inexorably use more compute as well.
Developing your own models and choosing where and how they are deployed also allows you to have more control over what kind of energy source is being used (for instance, by choosing a datacenter powered by renewable energy) as well as hardware that is optimized for the task, be it a powerful GPU or a lightweight CPU. It also allows you to use a tool like CodeCarbon to measure and report your energy consumption and environmental impacts, giving you transparency that you wouldn't have with a commercial AI tool (since we still have no idea what the carbon footprint of ChatGPT is!). Reflecting this cost in ESG reports and internal accounting can also help organizations and individuals get a better understanding of how using AI is impacting their commitments to sustainability and to take specific actions towards reducing their impacts.
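As a minimal sketch, wrapping a training or inference run with CodeCarbon's EmissionsTracker looks roughly like this (the project name is illustrative):

```python
# Sketch: measuring the estimated emissions of a training or inference run with CodeCarbon.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="my-fine-tuning-run")  # illustrative project name
tracker.start()
# ... run your training or inference workload here ...
emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```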
Conclusion
As AI becomes increasingly accessible, it's important to keep considerations around efficiency and energy consumption in mind as we choose models and build tools and applications. Open-source AI allows members of the community to have more control over the kinds of models they develop and deploy, and to start cultivating a culture of improved transparency and accountability with regard to the impact that AI is having on the environment.