Artificial intelligence (AI) can be used to maximize energy efficiency in a wide range of applications, including renewable energy integration, 5G networks, electric vehicles, and even central processing units (CPUs). Yet AI itself can be highly energy-intensive to train and use. The good news is that tools are available to reduce AI power consumption. This FAQ reviews how quantization, sparsity, power capping, and carbon-aware power capping can reduce the computational requirements and energy needed by AI and machine learning (ML).
Quantization
Quantization can be a powerful tool for reducing AI training energy consumption. It refers to a variety of techniques for mapping values from a large (often continuous) set onto a smaller, discrete set. For example, replacing 32-bit floating-point numbers with 8-bit integers in arithmetic operations can dramatically increase computational efficiency and reduce memory and bandwidth requirements (Figure 1).
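As a minimal sketch of the idea (the affine scheme and NumPy implementation below are illustrative, not taken from any particular framework), a float32 tensor can be mapped onto the int8 range and back:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 tensor to int8."""
    scale = (x.max() - x.min()) / 255.0            # width of one int8 step
    zero_point = np.round(-128 - x.min() / scale)  # int8 value representing 0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(1024).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
print(q.nbytes / x.nbytes)  # 0.25: int8 needs a quarter of float32's memory
# The round-trip error |x - x_hat| is bounded by roughly one quantization step.
```

The 4x memory reduction is exact; the accuracy cost is the rounding error of at most about one `scale` step per value, which is what the techniques discussed next try to manage.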
Artificial neural networks (ANNs) provide a specific example of how quantization reduces energy consumption. An ANN consists of interconnected activation nodes, with a weight parameter for each connection. Both the activations and the weights can be quantized. Since ANNs can be computationally intensive, quantization can have a significant impact on lowering energy demands.
However, without proper implementation, moving from 32-bit floating point to 8-bit integers can reduce the accuracy of training computations and degrade ANN performance. For example, the quantization operation is discontinuous, which disrupts gradient-based training. That can be addressed with approaches like relaxed quantization (RQ), which smooths the quantization and minimizes the effect of the discontinuities. Another approach uses a properly structured straight-through estimator (STE) for quantization, which produces a stable and efficient training process.
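The STE can be sketched in a few lines (a simplified, framework-free illustration; real implementations live inside autograd engines): the forward pass rounds the weights, while the backward pass pretends the rounding never happened.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Forward pass: quantize weights to int levels, then dequantize
    (symmetric scheme, so zero maps exactly to zero)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def ste_backward(upstream_grad):
    """Straight-through estimator: round() has zero gradient almost
    everywhere, so the backward pass treats it as the identity and
    passes the incoming gradient through unchanged."""
    return upstream_grad

w = np.random.randn(8).astype(np.float32)
w_q = fake_quantize(w)           # quantized weights used in the forward pass
grad_wq = np.ones_like(w)        # gradient arriving from the loss
grad_w = ste_backward(grad_wq)   # gradient applied to the full-precision weights
```

Training keeps a full-precision copy of the weights and updates it with the straight-through gradient; only the forward pass sees the quantized values.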
Sparsity
Sparsity is a commonsense approach to reducing the complexity of AI training models. It’s based on the concept of using only the minimally necessary parameters and avoiding over-parameterization that leads to increased energy consumption during training and inference.
Properly implemented, sparsity can reduce model sizes by 1 to 2 orders of magnitude, simplifying hardware requirements. Reduced computation, memory, and bandwidth requirements translate directly to reduced energy consumption. The use of sparsity can also make more complex models practical to implement.
Parameter optimization is an important activity related to sparsity. In machine learning, it is often referred to as hyperparameter optimization or tuning and is used to minimize a specified loss function. Well-implemented tuning can result in an 80% reduction in the energy consumed during the training of an ML model.
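A minimal random-search tuner illustrates the idea (the toy loss function and search space below are hypothetical stand-ins for a real training run):

```python
import random

def random_search(loss_fn, space, n_trials=50, seed=0):
    """Minimal random-search tuner: sample hyperparameters, keep the best."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Toy loss with a known minimum at lr=0.1, momentum=0.9 (illustrative only;
# in practice loss_fn would train and validate a model).
toy_loss = lambda p: (p["lr"] - 0.1) ** 2 + (p["momentum"] - 0.9) ** 2
best, loss = random_search(toy_loss, {"lr": (0.0, 1.0), "momentum": (0.0, 1.0)})
```

In a real setting, each trial is an expensive training run, which is why smarter search strategies (Bayesian optimization, early stopping of poor trials) are what deliver the large energy reductions.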
Over time, inference can consume more energy than training. Training typically occurs once, while inference is an ongoing activity. An important factor in minimizing energy consumption during inference is matching the hardware to the demands of the algorithm.
Power capping
Power capping involves setting a maximum power level that a processor, like a GPU used for AI and ML training, can consume regardless of the computational intensity of the task. Power capping can result in about 15% energy savings. The challenge is to invoke power capping without unduly increasing the task time.
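On NVIDIA hardware, for instance, power caps can be queried and set with the `nvidia-smi` utility. A minimal sketch (the 250 W limit is an illustrative value; the supported range depends on the GPU model, and changing the limit requires root privileges):

```shell
# Query the current and supported power limits for GPU 0
nvidia-smi -i 0 -q -d POWER

# Cap GPU 0 at 250 W (requires root; persists until reset or reboot)
sudo nvidia-smi -i 0 -pl 250
```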
Power capping is most successful for tasks like training that take a long time to complete. It can increase the time needed to complete a task by 3% or less. For training activities that can occur over days or months, the energy savings more than compensate for the increased training time. For example, a training activity that takes 80 hours without power capping may increase to 82 hours when power capping is invoked.
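Plugging in illustrative numbers shows why the trade-off pays off (the 300 W and 250 W figures below are assumptions for the sake of the arithmetic, not values from the article):

```python
# Illustrative numbers only: a 300 W GPU capped at 250 W, with the
# 80 h -> 82 h slowdown from the text. Energy = power x time.
baseline_energy = 300 * 80      # watt-hours without a cap
capped_energy   = 250 * 82      # watt-hours with the cap
savings = 1 - capped_energy / baseline_energy
print(f"{savings:.1%}")  # 14.6% energy saved for a 2.5% longer run
```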
Carbon-aware power capping
Carbon-aware power capping takes the concept one step further by periodically forecasting the carbon intensity of the current generation sources. It requires insight into the mix of renewable and non-renewable generation sources, combined with time-of-day operating data and even the availability of stored renewable energy.
While basic power capping can result in about a 15% energy savings, carbon-aware power capping can reduce the overall carbon footprint of training by about 24% and still only increase the training time by about 3% (Figure 2).
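The scheduling logic can be sketched as follows (the threshold, cap levels, and forecast values are all hypothetical, and this is a simplified illustration rather than any production implementation):

```python
def choose_power_cap(carbon_g_per_kwh, low_cap_w=150, high_cap_w=300,
                     threshold=400):
    """Tighten the GPU power cap when grid carbon intensity is high,
    relax it when the grid is cleaner (e.g., more renewables online)."""
    return low_cap_w if carbon_g_per_kwh > threshold else high_cap_w

# Hourly forecast of grid carbon intensity in gCO2/kWh (made-up values).
forecast = [250, 320, 480, 520, 410, 300]
caps = [choose_power_cap(c) for c in forecast]
print(caps)  # [300, 300, 150, 150, 150, 300]
```

The effect is to shift more of the computation into the cleaner hours, which is why the carbon reduction (about 24%) can exceed the raw energy savings of a fixed cap.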
Summary
Training and implementing AI and ML can consume large amounts of power and result in high levels of carbon emissions. There are several techniques, like quantization, sparsification, parameter optimization, and power capping, that can significantly reduce the power consumption of AI without compromising its performance.
References
Carbon-Aware Zeus, Taikai
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference, arXiv
Here’s why quantization matters for AI, Qualcomm
New tools are available to help reduce the energy that AI models devour, MIT
Optimization could cut the carbon footprint of AI training by up to 75%, University of Michigan
Relaxed Quantization for Discretized Neural Networks, arXiv
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks, arXiv
Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets, arXiv