The French association Data for Good released a white paper exploring the societal and environmental issues surrounding generative AI. I was particularly interested in the environmental impact of language models, which is less covered than the ethical aspects. Here are my key learnings:
- Context: world leaders committed to keeping global warming well below 2°C, which implies reducing our emissions by 43% between 2020 and 2030 (to limit warming to 1.5°C; see section C.1.1 of the IPCC report). In the digital space, however, emissions are not decreasing but growing by 2 to 7% per year.
- GPT-3’s training emitted a whopping 2200 tons of CO2 equivalent — comparable to 1600 return flights from Paris to New York.
- With 13 million active users, ChatGPT's usage emits an estimated 10,000 tons of CO2e per month. If everyone used it today, it would add about 0.1% to the yearly carbon footprint of individuals in France or the UK, and 0.5% of our 2050 target footprint.
- The impact of ChatGPT+, which relies on GPT-4, could be 10 to 100 times larger, adding up to 10% to our current yearly carbon footprint… or 50% of our target footprint.
- There are many ways to reduce the impact of using such models: use them reasonably and opt for cloud services with proven environmental performance.
To evaluate the environmental impact of anything, we can estimate its carbon footprint: the total greenhouse gas emissions caused directly and indirectly by an individual, organization, or product, expressed in tons of carbon dioxide equivalent (CO2e).
To put it into perspective, the average annual carbon footprint is approximately 8–13 tons per person in the UK or France, 21 tons in the USA, and 6 tons worldwide. I will consider 10 tons as our current footprint.
To keep the global temperature increase below 2°C, we should aim to reduce our global carbon footprint to 2 tons per person by 2050.
There is much work to do to reduce our emissions by 80 or 90%, and the continuously increasing demand for digital services, which outpaces efficiency improvements, is not helping. How does generative AI fit into this equation, and what can we do to align our digital advancements with our environmental goals?
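Before answering, here is a quick back-of-the-envelope look at that growth-versus-target tension, as a few lines of Python (the 2 to 7% growth rates and the 43% reduction target are the figures quoted above):

```python
# Digital emissions growth vs. the global reduction target, 2020-2030.
for yearly_growth in (0.02, 0.07):
    digital_2030 = (1 + yearly_growth) ** 10  # relative to 2020 levels
    print(f"At {yearly_growth:.0%}/year, digital emissions reach "
          f"{digital_2030:.0%} of their 2020 level by 2030")

# Meanwhile, a 43% cut means overall emissions should be down to 57%
# of their 2020 level by 2030.
```

At 7% yearly growth, digital emissions nearly double over the decade, while everything else needs to shrink to 57% of its 2020 level.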
In the training phase, we feed language models some curated data so that they can learn from it and become capable of answering our requests.
The study analyzed two large language models:
1. Open-source Bloom
2. Proprietary GPT-3 from OpenAI
Key Findings:
– Bloom’s Carbon Footprint: Initially estimated at 30 tons, it was revised to 120 tons after comprehensive analysis.
– GPT-3’s Carbon Footprint: Extrapolated to be 2200 tons, equivalent to 1600 return flights from Paris to New York.
A common viewpoint is that it’s all right for these models to have high training costs because they get used extensively by many users.
Inference in Machine Learning is when we use a trained model to make predictions on live data. We are now looking at the impact of running ChatGPT.
Based on the assumption that ChatGPT has 13 million active users making 15 requests per day on average, its monthly carbon footprint is 10,000 tons of CO2e.
And the key learning for me is that this usage impact dwarfs the training impact: a single month of inference emits more than four times the 2200 tons that training emitted in total.
For one user, the addition to the yearly carbon footprint is 12 months × 10,000 tons / 13 million users ≈ 9 kg of CO2e per user per year, equivalent to 0.1% of the current average annual carbon footprint, or 0.5% of our target footprint.
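As a sanity check, here is the same arithmetic in a few lines of Python (every input is one of the study's assumptions quoted above, not a measurement):

```python
# ChatGPT inference footprint, per the study's assumptions.
monthly_tons = 10_000        # estimated monthly footprint (t CO2e)
users = 13_000_000           # active users
requests_per_day = 15        # average requests per user per day

yearly_kg_per_user = monthly_tons * 1_000 * 12 / users
print(f"{yearly_kg_per_user:.0f} kg CO2e per user per year")      # ~9 kg
print(f"{yearly_kg_per_user / 10_000:.1%} of a 10 t footprint")   # ~0.1%
print(f"{yearly_kg_per_user / 2_000:.1%} of the 2 t target")      # ~0.5%

# Implied footprint of a single request under these assumptions:
grams_per_request = monthly_tons * 1e6 / (users * requests_per_day * 30)
print(f"~{grams_per_request:.1f} g CO2e per request")             # ~1.7 g
```

Under these assumptions, each request weighs only a couple of grams of CO2e; it is the sheer volume of requests that adds up.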
But what if that person uses ChatGPT Plus, powered by GPT-4? GPT-4's footprint is 10 to 100 times larger than GPT-3's, which works out to roughly an extra 100 kilos to 1 ton of CO2e per year: up to 10% of a French citizen's current carbon footprint, and twice that share if you are already doing your best to reduce it. Against our 2050 target footprint of 2 tons, that represents 50%!
That sucks.
And what if, one day, every interaction you have with any application in your life makes requests to language models? Scary thought.
The good news is that using the GPT-4 API extensively is so expensive that we can't let our users make 15 requests a day unless they are ready to pay a $100+ monthly subscription, which the target market for the product I am building (a personal meditation assistant) is unwilling to pay. And it's not only small businesses that cannot afford it: Google and Microsoft also cannot afford to replace their search engines with a model the size of GPT-4, which would multiply the cost of their queries by 100.
The recommendations are as follows:
- Stay Sober: It can be tempting to hand an entire IT project over to GPT-4, but instead we can question the project's utility and the real need for a language model, and limit its use to the specific cases that truly require it. Use a much smaller model than GPT-4 whenever you can, and think twice before reaching for ChatGPT+.
- Optimize Training and Usage: The techniques here are numerous and constantly evolving, and data scientists should be using them already… if only to reduce costs. They mainly consist of reducing infrastructure usage, which in turn reduces electricity consumption and, therefore, carbon emissions. In essence: only train a model if you must; if you do train, plan it to avoid wasting resources; and use the smallest model that meets the need satisfactorily.
- Select the country hosting your servers based on the carbon intensity of its electricity (see the sketch after this list). And here comes the French pride: the carbon footprint of our primarily nuclear electricity is 7 times lower than in the USA. However, if you all start hosting your language models here, we will probably end up importing coal power from our dear neighbours 🔥.
- Select your cloud service based on its environmental performance (this data is sometimes public; otherwise, there are tools to measure or estimate it, like https://mlco2.github.io/impact/). Favour cloud providers that keep their servers in use for longer (hyperscalers tend to retire hardware after no more than 4 years) and data centers with a high level of resource sharing.
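To make the last two points concrete, here is a minimal sketch of the estimation method behind tools like https://mlco2.github.io/impact/: hardware energy draw, scaled by data center overhead (PUE), multiplied by the grid's carbon intensity. All numbers below are illustrative assumptions, not measured values:

```python
# Rough training-emissions estimate: energy x overhead x grid intensity.
gpu_power_kw = 0.4     # assumed average draw of one GPU (kW)
num_gpus = 64          # assumed cluster size
hours = 30 * 24        # assumed one month of training
pue = 1.5              # power usage effectiveness (data center overhead)

# Illustrative grid carbon intensities, in kg CO2e per kWh
grid_intensity = {"France": 0.06, "USA": 0.40}

energy_kwh = gpu_power_kw * num_gpus * hours * pue
for country, intensity in grid_intensity.items():
    print(f"{country}: {energy_kwh * intensity / 1_000:.1f} t CO2e")
```

Same workload, same hardware: moving it to a low-carbon grid divides the estimate by roughly the factor of 7 mentioned above.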
Whether you’re an individual or a corporation, resources and experts are available to guide you on a sustainable path.
At the individual level:
– If you want to evaluate your carbon footprint, there are many tools online. On a personal note, measuring my carbon footprint was an eye-opener, prompting me to explore ways to make a positive impact. If you live in the UK, check https://footprint.wwf.org.uk/
– To get a quick 3h course in the fundamental science behind climate change: https://climatefresk.org/
– To investigate the actions you can take and estimate how much they would reduce your footprint, another 3h workshop: https://en.2tonnes.org/
At the corporate level:
Many companies are exploring these issues, and here is what they can do:
- educate their employees (with the workshops suggested above),
- perform audits and measure their carbon footprint,
- set up strategies to improve their ESG (Environmental, Social, and Governance) scores.
I heard about this brilliant study thanks to some great people I recently met, from Toovalu and Wavestone. Check out what they do!
Please comment if you find any mistakes in my estimations or want to add your thoughts, and share if you found it interesting.
🙌 Thank you for taking the time to read this article; I hope it was insightful! Great thanks to Thibaut, Léo, Benoit and Diane for their precious feedback and additions to this article 🙏.
And if you want to stay updated on Generative AI and responsible ML, follow me on Linkedin 👋.