What carbon footprint does software have?

What CO₂ footprint does software actually have, and how do you calculate it? Spoiler: at a high level, it’s quite easy to describe, but measuring all the details is still tricky. Even the cloud providers’ tools rely partly on estimates. Nevertheless, the accuracy is sufficient to steer software development in the direction of green software, especially since much coarser estimates are used in other areas. For sending a letter, for example, a flat value of 20 g CO₂ is practically always assumed. That value comes from a single study based on data from the USA; you can hardly get less differentiated than that. Compared to such data, measurements of software emissions are highly precise.

Emissions from software

Programs do not cause emissions by themselves. They induce emissions indirectly via the hardware on which they run. They do this in two ways, both of which are included in the SCI (Software Carbon Intensity) score we have defined at the Green Software Foundation to assess the climate impact of software:

  • Emissions from power consumption
  • Emissions embodied in the hardware
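
The two components above can be combined into a single score. The following is a minimal sketch, assuming the formula from the published SCI specification, SCI = ((E × I) + M) per R, where E is energy in kWh, I is the carbon intensity of the electricity in gCO₂e/kWh, M is the embodied share in gCO₂e, and R is a functional unit such as the number of API requests. The function name and the example values are hypothetical.

```python
def sci_score(energy_kwh: float, intensity_g_per_kwh: float,
              embodied_g: float, functional_units: float) -> float:
    """Return gCO2e per functional unit: ((E * I) + M) / R."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    # Operational emissions: energy consumed times grid carbon intensity.
    operational_g = energy_kwh * intensity_g_per_kwh
    # Add the embodied share and normalize by the functional unit.
    return (operational_g + embodied_g) / functional_units


# Hypothetical example: 1.5 kWh at 400 g/kWh, 200 g embodied share,
# spread over 1000 requests.
score = sci_score(1.5, 400.0, 200.0, 1000)
print(f"{score:.2f} gCO2e per request")
```

Because the result is a rate per functional unit, it can be compared across releases even if the absolute load differs.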

    Determine emissions of the hardware

    The hardware used must be produced and disposed of. Both steps cause emissions, the so-called embodied emissions. To understand these, you basically have to perform a lifecycle analysis of the hardware. This is not an easy task, for hardware or for any other product. It’s usually done using data collections called Life Cycle Inventories (LCIs). For hardware, we currently rely on the information provided by the manufacturers or cloud operators. As an example, here is a data sheet for a desktop. This method is suitable for hardware that you provide and use yourself, e.g. in your own data centers. Cloud providers usually supply aggregated data for the various forms of emissions. Google, for example, only takes into account so-called “upstream emissions” for hardware, i.e. those from production. Emissions from disposal are therefore not included.

    Determine emissions from power consumption

    More dynamic, and thus more interesting for the development of green software, are the emissions resulting from power consumption. Power consumption can be determined well if the software runs on defined hardware, which may also be virtual. Appropriate tools exist for this. Even if they are not 100% accurate, for example because they do not take into account the hypervisor’s own consumption, they provide really useful results. It becomes more difficult when software containers are automatically started by cloud management software on different hardware. In that case, a suitable share of the embodied emissions of the hardware, which is only used for a limited time, would have to be calculated for each run. Similar difficulties arise with distributed systems when so-called shared services are used, as is common, for example, with databases or SAN storage.
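
    One way to approximate such a share is to allocate the hardware’s total embodied emissions proportionally, by the time used relative to the expected lifespan and by the resources reserved relative to the machine’s total. This is a sketch under that assumption (it follows the allocation idea in the SCI specification); the function name and all figures are hypothetical.

```python
def embodied_share_g(total_embodied_g: float,
                     hours_used: float, lifespan_hours: float,
                     resources_reserved: float, resources_total: float) -> float:
    """Allocate embodied emissions by time share and resource share."""
    if lifespan_hours <= 0 or resources_total <= 0:
        raise ValueError("lifespan_hours and resources_total must be positive")
    # Fraction of the hardware's expected lifetime that this workload occupies.
    time_share = hours_used / lifespan_hours
    # Fraction of the machine (e.g. vCPUs) reserved for this workload.
    resource_share = resources_reserved / resources_total
    return total_embodied_g * time_share * resource_share


# Hypothetical server: 1,000,000 g embodied, 40,000 h lifespan.
# A container uses 4 of 16 vCPUs for 10 hours.
share = embodied_share_g(1_000_000, 10, 40_000, 4, 16)
print(f"{share:.1f} g CO2e")
```

    For containers that move between hosts, the same calculation would be repeated per host and summed.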

    Beyond the power consumption of the actual servers, the (cloud) data center requires additional power, e.g. for uninterruptible power supplies (UPS) and air conditioning. This additional consumption is summarized in the Power Usage Effectiveness (PUE) factor, the ratio of the data center’s total energy consumption to that of the IT equipment. The total consumption is thus obtained by multiplying the server power consumption by the PUE.
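
    As a small worked example of this multiplication (the values are hypothetical, not measurements of any real data center):

```python
def facility_energy_kwh(server_energy_kwh: float, pue: float) -> float:
    """Scale server energy by the PUE to include data center overhead."""
    # A PUE below 1.0 would mean the facility uses less energy than
    # its own servers, which is not possible by definition.
    if pue < 1.0:
        raise ValueError("PUE must be at least 1.0")
    return server_energy_kwh * pue


# Hypothetical: servers draw 10 kWh in a data center with a PUE of 1.5.
total_kwh = facility_energy_kwh(10.0, 1.5)
print(total_kwh)  # 15.0
```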

    If the power consumption is known, the emissions can be derived from it. This is most usefully done using the grid-based method, which basically states that the electricity coming out of the socket is the same mix that was fed into the grid at the other end of the line. This holds regardless of whether one has booked a green electricity tariff or not. For data centers in Germany, you can use the values from the Federal Environment Agency to convert kWh into g CO₂. For cloud data centers in other parts of the world, the emission factors of the local grid should be used as a basis.
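
    The conversion itself is a multiplication of energy by the grid’s emission factor. The sketch below assumes a simple lookup table per region; the region names and factors are placeholders for illustration, not published figures, and would be replaced by official values for the data center’s location.

```python
# Placeholder emission factors in g CO2 per kWh (illustrative only).
GRID_FACTORS_G_PER_KWH = {
    "region-a": 400.0,
    "region-b": 50.0,
}


def emissions_g(energy_kwh: float, region: str) -> float:
    """Convert energy in kWh into g CO2 using the region's grid factor."""
    if energy_kwh < 0:
        raise ValueError("energy_kwh must not be negative")
    try:
        factor = GRID_FACTORS_G_PER_KWH[region]
    except KeyError:
        raise ValueError(f"no emission factor for region {region!r}") from None
    return energy_kwh * factor


# The same 15 kWh load produces very different emissions per region.
load_a = emissions_g(15.0, "region-a")
load_b = emissions_g(15.0, "region-b")
print(load_a, load_b)  # 6000.0 750.0
```

    Note that the lookup depends only on the location of the data center, not on the electricity tariff that was booked.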

    Data from cloud providers

    The major cloud providers supply more or less concrete data on the emissions that software generates, or they provide tools that you can use to determine consumption yourself. See, for example, these reports on AWS emissions or Google Cloud or Azure’s toolkit to determine the SCI score. As noted above, this data is partly incomplete and always contains some estimates. But it is good and definitely very helpful.

    Measurement in production or on a test system?

    Where is the most sensible place to measure? In the production environment or on a dedicated test system? That depends on what the results will be used for. For sustainability reports, the emissions in production are relevant, because they are based on the actual use of the software by its users. The measured values are therefore more realistic, but also more complicated to determine, and so they may be imprecise as well, just in a different way than on a test system. A dedicated test system has the advantage of full control over the system. Disturbances are avoided and hardware resources can be allocated specifically (because, for example, storage is not offloaded to a SAN). This makes it easier to track the impact of software changes on individual hardware components. However, user behavior can only be simulated here.

    To steer one’s own software development in the direction of green software, it is advisable to carry out regular measurements on a fixed test system. In this way, one can precisely recognize the effects of ongoing development and, if necessary, take countermeasures in good time. The SCI value was also designed for such comparisons over time. Evaluating the data available from monitoring the production environment is useful as well. Here, potential for improvement can be identified above all in how the software is matched to the hardware and in load management in production.
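
    Such a comparison over time could be automated, for example as a check after each measurement run on the test system. The following is only a sketch: the function name, the 5% tolerance, and the example scores are assumptions for illustration.

```python
def sci_regression(baseline: float, current: float,
                   tolerance: float = 0.05) -> tuple[float, bool]:
    """Return the relative change in SCI and whether it exceeds the tolerance."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Positive change means the new build emits more per functional unit.
    change = (current - baseline) / baseline
    return change, change > tolerance


# Hypothetical scores from two measurement runs on the same test system.
change, regressed = sci_regression(baseline=0.80, current=0.92)
if regressed:
    print(f"SCI increased by {change:.0%}, investigate recent changes")
```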
