Data Center Energy: GPUs Stabilize Power Grids

In Brief

Researchers developed a system that lets AI data centers stabilize electrical grid voltages by adjusting GPU batch sizes in real-time—reducing voltage violations by 99.87% while maintaining user service quality, turning energy-hungry data centers into grid-supporting assets.

In Depth

The explosive growth of artificial intelligence has brought data centers into the spotlight—not just for their computational power, but for their massive electricity consumption. As AI workloads continue to expand, data centers are projected to reach 945 TWh globally by 2030, more than doubling their 2024 consumption. But what if these energy-hungry facilities could actually help stabilize our power grids instead of just straining them?

Researchers from the University of Michigan have developed an innovative "GPU-to-Grid" (G2G) framework that transforms AI data centers from passive electricity consumers into active grid-supporting resources. Their breakthrough demonstrates that by intelligently controlling how GPUs process AI workloads—specifically through adjusting batch sizes for large language model (LLM) inference—data centers can help regulate voltage in distribution networks while maintaining quality service for users.

The Challenge: When AI Meets the Grid

Modern data centers housing thousands of GPUs present unique challenges for electrical infrastructure. A single NVIDIA H100 GPU can consume up to 700 watts—equivalent to running seven traditional desktop computers simultaneously. When you scale this to facilities with thousands of GPUs, you're looking at multi-megawatt loads that can fluctuate dramatically based on computational demands.

Traditional power grids weren't designed for such concentrated, rapidly-changing loads. When a data center suddenly ramps up its AI training workload or scales up inference services, the sudden power draw can cause voltage drops in the distribution network. Conversely, when workloads decrease, voltage can rise above safe limits. These voltage fluctuations can damage equipment and disrupt service for other customers connected to the same grid.

Conventional voltage regulation relies on mechanical devices called tap changers, which adjust transformer settings to maintain proper voltage levels. However, these devices are slow—often requiring 30-minute intervals between adjustments to prevent excessive wear. In the fast-moving world of AI workloads, this delay leaves the grid vulnerable to temporary voltage violations.

The Solution: Smarter Batch Processing

The researchers' key insight was recognizing that GPU batch size—a fundamental parameter in AI model inference—could serve as a powerful control knob for both performance and power consumption.

When an LLM like ChatGPT processes requests, it doesn't handle them one at a time. Instead, it groups multiple requests into "batches" and processes them together. This batch size directly affects three critical metrics:

Power Consumption: Larger batch sizes utilize GPUs more fully, drawing more electricity
Token Throughput: How many AI-generated tokens the system produces per second
Inter-Token Latency: How long users wait between words appearing on their screen

The relationship between these factors isn't linear—it follows a curved pattern that the researchers mapped using real measurement data from NVIDIA H100 GPUs running popular models like Meta's Llama 3.1 and Qwen 3. As batch size increases, power consumption and latency rise while throughput gains eventually plateau.

How GPU-to-Grid Works

The G2G framework operates as a closed-loop system that continuously balances three stakeholder needs:

For the Power Grid: The system monitors voltage at all buses in the distribution network. When voltage drops too low (undervoltage), the controller reduces GPU batch sizes, lowering power consumption and allowing voltage to recover. Surprisingly, when voltage rises too high (overvoltage)—which can happen with high renewable generation—the system increases batch sizes to consume more power and bring voltage back down.
For Users: The framework enforces latency constraints for each AI model, ensuring that reducing batch sizes to help the grid doesn't make the service unacceptably slow for users. Different models have different latency thresholds—for example, the massive Llama 3.1 405B model tolerates 0.12 seconds between tokens, while the faster Llama 3.1 8B model targets just 0.08 seconds.
For Data Center Operators: When grid and user constraints allow, the system maximizes token throughput, representing the business objective of serving as many AI requests as possible.

Real-World Performance

To validate their approach, the researchers simulated a 5-megawatt data center connected to the IEEE 13-bus distribution test network—a standard benchmark for power system research. The facility ran five different LLM models across 900 servers with 7,200 GPUs total, spread evenly across three electrical phases.

The results were striking. When comparing three scenarios—no control, traditional tap-changer control only, and GPU batch size control—the GPU-based approach reduced voltage violations by orders of magnitude:

No Control: Nearly 1,000 seconds of voltage violations with worst-case voltages reaching 0.935 per unit (safe range: 0.95-1.05)
Tap Changer Only: Actually performed worse (1,095 seconds of violations) due to slow response and overcorrection
GPU Control: Just 78 seconds of violations, with voltages staying much closer to acceptable limits

The integral voltage violation metric—which captures both duration and severity of problems—improved from 43.74 per-unit-seconds (tap-only) to just 0.057 per-unit-seconds with GPU control, a 99.87% reduction.

Three Operating Modes

During the hour-long test, the controller operated in three distinct modes based on system conditions:

Throughput-Driven Mode: When no constraints were active, the system maximized data center performance by setting batch sizes to optimize token generation across all models.
Voltage-Driven Mode: During critical grid events—like when a sudden training workload caused an undervoltage event around 16 minutes into the test—the controller aggressively reduced batch sizes. Conversely, when workload reductions caused overvoltage near 50 minutes, it increased batch sizes to consume more power.
Latency-Driven Mode: Later in the test, when fewer GPUs were active and statistical variability increased, latency constraints became the limiting factor, preventing the controller from pushing batch sizes too high even when grid conditions would otherwise allow it.

The Counter-Intuitive Finding

One of the most interesting discoveries challenges conventional wisdom: increasing GPU power consumption can actually benefit the grid.

For years, the focus in green computing has been on minimizing energy use. However, modern power grids increasingly face overvoltage problems due to distributed solar panels and other renewable sources pumping power into distribution networks during periods of high generation and low demand. In these situations, having flexible loads that can ramp up their consumption becomes valuable for grid stability.

The G2G framework enables this bidirectional support—reducing consumption during undervoltage and increasing it during overvoltage—making data centers truly grid-interactive assets rather than just problems to be managed.

Implementation Advantages

The proposed system offers several practical benefits:

Speed: GPU batch size changes take effect almost immediately (within one second), compared to 30-minute delays for mechanical tap changers.

No Additional Hardware: The solution uses existing GPU infrastructure and standard LLM serving software (tested with vLLM, a popular open-source inference server).

Model-Free Operation: The controller uses an online feedback optimization approach, meaning it doesn't need detailed knowledge of the grid topology or perfect models of GPU behavior. It learns and adapts based on real-time measurements.

Quality of Service: Throughout testing, the system maintained user-specified latency targets while simultaneously providing grid support.

Future Directions

While this research demonstrates proof-of-concept, several extensions could further enhance the framework:

Hardware-in-the-Loop Testing: The current study used synthetic workloads based on real GPU measurements. Full integration with live GPU clusters would validate end-to-end performance under truly unpredictable conditions.

Additional Control Knobs: Beyond batch size, other GPU parameters like frequency scaling (DVFS) and power caps could provide complementary flexibility with different response characteristics.

Multiple Grid Services: The framework currently targets voltage regulation, but the same principles could extend to frequency regulation, peak shaving, or participation in electricity markets.

Multi-Site Coordination: Large tech companies operate data centers across multiple locations. Coordinating GPU flexibility across sites could provide grid support at transmission network scales, not just local distribution feeders.

Workload Diversity: The study focused on LLM inference, but other AI workloads (training, computer vision, etc.) have different power-performance profiles that might offer unique grid-support capabilities.

Implications for Data Center Energy

This research arrives at a critical moment for data center energy policy. Grid operators and regulators are increasingly concerned about whether electrical infrastructure can keep pace with AI-driven demand growth. Some jurisdictions are considering restrictions on new data center construction or imposing special electricity rates.

The G2G framework suggests an alternative path: rather than treating data centers purely as problems, we can design them as flexible grid assets. This could:

Reduce infrastructure costs by avoiding or deferring expensive grid upgrades
Enable faster data center deployments in areas with constrained grid capacity
Create new revenue streams for operators through ancillary service markets
Improve renewable integration by providing fast-responding flexible demand

Perhaps most importantly, it demonstrates that optimizing for grid support doesn't necessarily mean sacrificing computational performance. With intelligent control, data centers can simultaneously pursue their business objectives while acting as good grid citizens.

The Road Ahead

As AI continues its exponential growth trajectory, the intersection of computing and power systems will only become more critical. The GPU-to-Grid framework represents a new paradigm where these domains are no longer separate—device-level computing decisions are made with grid awareness, and grid operations leverage the inherent flexibility of modern AI workloads.

For the broader energy transition, this matters enormously. Successfully integrating gigawatts of intermittent renewable generation requires flexible demand that can respond within seconds to grid conditions. Vehicle-to-grid technology has long promised this capability but faces adoption challenges. Data centers with GPU-to-Grid control could provide similar services today, using infrastructure that's already being built for other purposes.

The future may see data centers not just as the engines of artificial intelligence, but as intelligent participants in the electricity grid itself—a fitting symbiosis between two of the 21st century's most transformative technologies.

In Terms

GPU (Graphics Processing Unit)
Think of a GPU as a super-efficient calculator designed to handle thousands of simple math problems simultaneously. Originally built to render video game graphics, GPUs have become the workhorses of artificial intelligence because AI models require massive parallel computations. Modern data centers can house thousands of these chips, each consuming as much power as a small household appliance.

Batch Size
Imagine a restaurant kitchen that can either cook meals one at a time or prepare multiple orders together. Batch size is similar—it's the number of AI requests (like ChatGPT queries) that a GPU processes simultaneously in one go. Larger batches are more efficient but take longer to complete, like cooking ten burgers at once versus one at a time.

Voltage Regulation
Electricity in power lines needs to maintain a specific "pressure" (voltage) to work properly—too high and it can damage equipment, too low and devices won't function correctly. Voltage regulation is like a pressure regulator on a water pipe, constantly adjusting to keep everything flowing at just the right level despite changing demand throughout the day.

Distribution Network
This is the final stretch of the electrical grid—the local power lines and transformers that deliver electricity from substations to homes, businesses, and data centers in your neighborhood. Think of it as the small streets and avenues of the power grid, as opposed to the transmission lines which are like highways carrying power across long distances.

Latency
In the AI world, latency is the delay between asking a question and receiving a response. When you chat with an AI like ChatGPT, inter-token latency specifically measures how long you wait between each word appearing on your screen. Low latency means the AI "talks" quickly and smoothly; high latency makes it feel sluggish and frustrating to use.