Data is the fuel that runs AI tools. Many companies claim their AI tools can process unstructured data and generate accurate insights, but doing so requires a great deal of energy.
One of the biggest challenges for AI systems has been the sheer volume of data: research shows that more than 80% of global data remains unstructured.
In a conversation with AIM, Triaksh Mitra, a data science professional, explained how unstructured data formats are memory-intensive, demand complex preprocessing, and rely on energy-hungry hardware like GPUs for model training.
The International Energy Agency (IEA) has also argued that policymakers and stakeholders have limited tools to analyse both sides of this issue because comprehensive data is lacking, leaving considerable uncertainty about the current and future electricity consumption of data centres.
“Unstructured data…requires intensive preprocessing and often complex deep learning models such as transformers or CNNs, which are computationally expensive,” Mitra explained.
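To put that cost in perspective, a rough back-of-the-envelope comparison is sketched below, assuming a 7-billion-parameter transformer, a 512-token document and a 50-column table row; all of these figures are illustrative assumptions, not numbers from Mitra or the IEA.

```python
# Back-of-the-envelope sketch (illustrative assumptions only): compute needed to
# process one record as a structured table row vs. as unstructured text run
# through a transformer.

def structured_flops(num_columns: int = 50) -> float:
    """Filtering or aggregating one row costs on the order of a few operations per column."""
    return 10.0 * num_columns  # ~a few hundred FLOPs per row

def transformer_flops(params: float = 7e9, tokens: int = 512) -> float:
    """Common rule of thumb: roughly 2 FLOPs per parameter per token for a forward pass."""
    return 2.0 * params * tokens

row = structured_flops()   # ~5e2 FLOPs
doc = transformer_flops()  # ~7e12 FLOPs for an assumed 7B-parameter model
print(f"ratio: {doc / row:.1e}x more compute for the unstructured document")
```

Even at this crude level, and under these assumed figures, the gap spans roughly ten orders of magnitude, which is why unstructured workloads end up on GPU clusters rather than ordinary database servers.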
A typical AI data centre uses as much electricity as 1 lakh households, and the largest ones under construction could consume 20 times that amount.
The IEA report says that the unpredictability of future electricity demand necessitates a scenario-based approach to examine different pathways and offer insights on timelines pertinent to energy sector decision-making.
While many organisations have tools to refine the raw data their customers work with, several others have adopted AI-driven pipeline controls, which filter out unnecessary information and deliver only what the user needs. This filtering, however, consumes a lot of energy.
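A minimal sketch of that kind of pipeline-level filtering, assuming hypothetical record fields and a simple length rule rather than any specific vendor's tool, might look like this:

```python
# Illustrative sketch of pipeline-level filtering: keep only the fields the user
# asked for, and drop records too short to justify expensive downstream models.
# Field names and the relevance rule are assumptions for the example.
from typing import Iterable, Iterator

REQUESTED_FIELDS = {"customer_id", "timestamp", "review_text"}

def filter_records(records: Iterable[dict], min_length: int = 20) -> Iterator[dict]:
    for record in records:
        text = record.get("review_text", "")
        if len(text) < min_length:
            continue  # low-value record never reaches the GPU stage
        yield {k: v for k, v in record.items() if k in REQUESTED_FIELDS}

raw = [
    {"customer_id": 1, "timestamp": "2025-01-01", "review_text": "ok", "ip": "..."},
    {"customer_id": 2, "timestamp": "2025-01-02",
     "review_text": "Detailed feedback worth analysing further.", "ip": "..."},
]
print(list(filter_records(raw)))  # only the second record, trimmed to three fields
```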
“The need for scalable storage solutions, like data lakes and cloud infrastructure, further adds to energy costs. Additionally, maintaining high-performance hardware for handling unstructured data workflows increases power usage significantly compared to traditional structured data pipelines,” Mitra stated.
Infrastructure and Sustainability Concerns
AI itself consumes power worth millions of dollars, but it is easy to forget that unstructured data needs far more energy to process than structured data.
“OpenAI had to pause a popular Ghibli-style image generator due to overwhelming GPU demand triggered by viral trends, which strained server infrastructure. While isolated, such incidents reveal how easily energy use can spiral without oversight,” Mitra mentioned.
According to IEA data, CPUs and GPUs account for approximately 60% of electricity demand in modern data centres, although this can vary significantly between different types of data centres.
Modern computing also relies on powerful data centres located all over the world. The IEA projects that the electricity consumption of these centres will more than double by 2030 to around 945 terawatt-hours (TWh), slightly more than Japan's entire electricity consumption today.
AI also has multiple applications in electricity systems due to the complexity of supply, transmission, and demand profiles. According to the IEA analysis, the use of AI could enable up to 175 GW of additional transmission capacity on existing power lines.
The IEA report states that many barriers limit the extent to which AI applications can be implemented and hinder the pace of change. These factors include unfavourable regulations, limited access to data, accessibility difficulties, interoperability concerns, significant skill gaps, insufficient digital infrastructure, and, in certain instances, a general reluctance to embrace change.
“Underlying infrastructure significantly impacts the energy cost of unstructured data processing. Unstructured data demands high-capacity, scalable storage systems and advanced hardware like GPUs or TPUs for training deep learning models.”
“Moreover, maintaining and updating such models requires sustained computing power, making infrastructure choices, like using energy-efficient data centres or hardware accelerators, crucial for reducing the overall carbon and energy footprint,” he added.
The carbon footprint of large-scale AI training runs can be quantified using metrics such as total floating-point operations (FLOPs) or total kilowatt-hours consumed. However, the data science professional highlighted that these metrics alone are insufficient for understanding energy consumption, since they ignore factors such as cooling overhead, data centre efficiency, and the carbon intensity of the energy source.
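As a hedged illustration of that point, the sketch below folds facility overhead (power usage effectiveness, PUE) and grid carbon intensity into a simple energy estimate; every number in it is a placeholder assumption.

```python
# Illustrative emissions estimate for a training run. All defaults are
# placeholder assumptions, not measurements from any real deployment.
def training_emissions_kg(
    gpu_hours: float,
    avg_gpu_power_kw: float = 0.4,    # assumed average draw per GPU
    pue: float = 1.2,                 # facility power usage effectiveness
    grid_kgco2_per_kwh: float = 0.4,  # assumed grid carbon intensity
) -> float:
    """IT energy, scaled up by facility overhead (PUE), converted via grid carbon intensity."""
    it_energy_kwh = gpu_hours * avg_gpu_power_kw
    facility_energy_kwh = it_energy_kwh * pue
    return facility_energy_kwh * grid_kgco2_per_kwh

# The same GPU-hours yield very different footprints at a less efficient site on a dirtier grid:
print(training_emissions_kg(100_000))                                    # ~19,200 kg CO2
print(training_emissions_kg(100_000, pue=1.6, grid_kgco2_per_kwh=0.7))   # ~44,800 kg CO2
```

This is why a FLOP or kWh count on its own says little: identical compute can carry very different footprints depending on where and how it runs.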
Environmental Impact
Many fields within energy innovation involve challenges that AI excels at addressing: intricate design environments, the necessity to navigate performance trade-offs for the best results, and extensive datasets.
According to the IEA, ensuring a consistent and cost-effective power supply for data centres is central to the energy issues related to AI. Specifically, the increasing proliferation of AI data centres has heightened the need to tackle the limitations of the power equipment supply chain.
Some experts highlighted that AI itself is a tool that can help curb AI’s energy demands. While AI’s rapid progress has stoked fears about power consumption, particularly within the climate community, NVIDIA CEO Jensen Huang argues that the power-use projections likely double the actual count, even as factual data points to the path the AI industry is heading towards.
Having regulatory or industry standards for energy reporting in AI is also important. “As with any emerging technology, unchecked innovation can lead to unintended harm. Transparent energy reporting would ensure that developers remain accountable and consider sustainability from the start, rather than prioritising scale or popularity at the expense of environmental impact,” Mitra said.
He believes that one area where sustainability in AI is heavily overlooked is the volume and velocity of data. The IEA noted that improved efficiency in AI hardware and models could reduce electricity demand from data centres by 20% by 2035. Demand could range from 700 to 1,700 TWh across different scenarios.
Given the rate at which data is generated, which will continue to grow for many decades, storing and processing all of it will be unsustainable. “Without checks, this leads to excessive energy use and infrastructure strain. A more sustainable approach would involve selective data retention, better data curation, and prioritising quality over quantity to reduce unnecessary processing and storage overhead,” Mitra concluded.
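As a rough illustration of the selective retention and curation he describes, a pipeline could deduplicate and quality-score records before committing them to storage; the heuristic and threshold below are purely assumptions for the sake of the example.

```python
# Illustrative sketch of selective data retention: drop duplicates and keep only
# records that clear a quality bar. The scoring rule is an assumed toy heuristic.
def quality_score(text: str) -> float:
    """Toy heuristic: longer, more lexically varied text scores higher."""
    words = text.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words) * min(len(words), 100)

def curate(records: list[str], threshold: float = 5.0) -> list[str]:
    seen, kept = set(), []
    for text in records:
        if text in seen:
            continue  # exact duplicates are never stored twice
        seen.add(text)
        if quality_score(text) >= threshold:
            kept.append(text)
    return kept

print(curate(["spam spam spam",
              "A short but varied customer complaint.",
              "A short but varied customer complaint."]))
# keeps only the single varied record
```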