Researchers developed a system that lets AI data centers stabilize electrical grid voltages by adjusting GPU batch sizes in real time, reducing voltage violations by 99.87% while maintaining user service quality and turning energy-hungry data centers into grid-supporting assets.
The explosive growth of artificial intelligence has brought data centers into the spotlight—not just for their computational power, but for their massive electricity consumption. As AI workloads continue to expand, data centers are projected to reach 945 TWh globally by 2030, more than doubling their 2024 consumption. But what if these energy-hungry facilities could actually help stabilize our power grids instead of just straining them?
Researchers from the University of Michigan have developed an innovative "GPU-to-Grid" (G2G) framework that transforms AI data centers from passive electricity consumers into active grid-supporting resources. Their breakthrough demonstrates that by intelligently controlling how GPUs process AI workloads—specifically through adjusting batch sizes for large language model (LLM) inference—data centers can help regulate voltage in distribution networks while maintaining quality service for users.
Modern data centers housing thousands of GPUs present unique challenges for electrical infrastructure. A single NVIDIA H100 GPU can consume up to 700 watts—equivalent to running seven traditional desktop computers simultaneously. When you scale this to facilities with thousands of GPUs, you're looking at multi-megawatt loads that can fluctuate dramatically based on computational demands.
Traditional power grids weren't designed for such concentrated, rapidly-changing loads. When a data center suddenly ramps up its AI training workload or scales up inference services, the sudden power draw can cause voltage drops in the distribution network. Conversely, when workloads decrease, voltage can rise above safe limits. These voltage fluctuations can damage equipment and disrupt service for other customers connected to the same grid.
Conventional voltage regulation relies on mechanical devices called tap changers, which adjust transformer settings to maintain proper voltage levels. However, these devices are slow—often requiring 30-minute intervals between adjustments to prevent excessive wear. In the fast-moving world of AI workloads, this delay leaves the grid vulnerable to temporary voltage violations.
The researchers' key insight was recognizing that GPU batch size—a fundamental parameter in AI model inference—could serve as a powerful control knob for both performance and power consumption.
When an LLM like ChatGPT processes requests, it doesn't handle them one at a time. Instead, it groups multiple requests into "batches" and processes them together. This batch size directly affects three critical metrics: how much power the GPU draws, how many tokens it generates per second (throughput), and how long each user waits between tokens (latency).
The relationship between these factors isn't linear—it follows a curved pattern that the researchers mapped using real measurement data from NVIDIA H100 GPUs running popular models like Meta's Llama 3.1 and Qwen 3. As batch size increases, power consumption and latency rise while throughput gains eventually plateau.
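As a rough illustration of what such a measured profile looks like in code, the sketch below interpolates a handful of made-up operating points. The numbers only imitate the qualitative shape described above (power and latency rise with batch size while throughput flattens out); they are not the paper's H100 data.

```python
import numpy as np

# Made-up operating points for a single GPU serving an LLM. They mimic the
# qualitative shape of the measured curves (power and inter-token latency rise
# with batch size while throughput gains flatten), but are NOT real H100 data.
BATCH     = np.array([1,   8,   16,  32,   64,   128])   # concurrent requests
POWER_W   = np.array([220, 380, 470, 560,  640,  690])   # GPU power draw (W)
ITL_MS    = np.array([12,  15,  19,  26,   40,   68])    # inter-token latency (ms)
TPUT_TOKS = np.array([80,  530, 840, 1230, 1600, 1880])  # throughput (tokens/s)

def profile(batch_size: float) -> dict:
    """Interpolate the tabulated points to estimate the operating point
    (power, latency, throughput) at an arbitrary batch size."""
    return {
        "power_w": float(np.interp(batch_size, BATCH, POWER_W)),
        "inter_token_latency_ms": float(np.interp(batch_size, BATCH, ITL_MS)),
        "throughput_tok_s": float(np.interp(batch_size, BATCH, TPUT_TOKS)),
    }

print(profile(48))   # an operating point between the batch-32 and batch-64 rows
```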
The G2G framework operates as a closed-loop system that continuously balances three stakeholder needs: the grid operator's need to keep voltages within safe limits, the data center operator's need to keep GPUs productive and throughput high, and users' need for responsive, low-latency service.
To validate their approach, the researchers simulated a 5-megawatt data center connected to the IEEE 13-bus distribution test network—a standard benchmark for power system research. The facility ran five different LLM models across 900 servers with 7,200 GPUs total, spread evenly across three electrical phases.
The results were striking. When comparing three scenarios—no control, traditional tap-changer control only, and GPU batch size control—the GPU-based approach reduced voltage violations by orders of magnitude:
The integral voltage violation metric—which captures both duration and severity of problems—improved from 43.74 per-unit-seconds (tap-only) to just 0.057 per-unit-seconds with GPU control, a 99.87% reduction.
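For readers who want the metric pinned down, the snippet below computes an integral voltage violation in the usual way: the magnitude of every excursion outside an assumed 0.95-1.05 per-unit band, integrated over time. The band and the exact definition are assumptions here; the paper may define the metric slightly differently.

```python
import numpy as np

# Integral voltage violation in per-unit-seconds: out-of-band voltage magnitude
# integrated over time. The 0.95-1.05 p.u. band is an assumed limit.
V_MIN, V_MAX = 0.95, 1.05

def integral_violation(voltages_pu: np.ndarray, dt_s: float) -> float:
    """Sum of voltage excursions beyond the band, times the timestep."""
    under = np.clip(V_MIN - voltages_pu, 0.0, None)   # depth below the band
    over  = np.clip(voltages_pu - V_MAX, 0.0, None)   # height above the band
    return float(np.sum((under + over) * dt_s))

# Example: a one-minute trace sampled every second with a 5-second sag to 0.93 p.u.
trace = np.full(60, 1.0)
trace[20:25] = 0.93
print(round(integral_violation(trace, dt_s=1.0), 3))   # 5 s * 0.02 p.u. = 0.1 p.u.-s
```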
During the hour-long test, the controller operated in three distinct modes based on system conditions: holding batch sizes near their service-optimal settings while voltages stayed within limits, shrinking batches to shed power during undervoltage, and growing batches to absorb power during overvoltage.
One of the most interesting discoveries challenges conventional wisdom: increasing GPU power consumption can actually benefit the grid.
For years, the focus in green computing has been on minimizing energy use. However, modern power grids increasingly face overvoltage problems due to distributed solar panels and other renewable sources pumping power into distribution networks during periods of high generation and low demand. In these situations, having flexible loads that can ramp up their consumption becomes valuable for grid stability.
The G2G framework enables this bidirectional support—reducing consumption during undervoltage and increasing it during overvoltage—making data centers truly grid-interactive assets rather than just problems to be managed.
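To make the loop concrete, here is a deliberately simplified sketch of a bidirectional batch-size feedback rule: shrink batches to shed power when the local voltage sags, grow them to absorb power when it rises, and otherwise leave the service-optimal setting alone. The gain, the voltage band, the batch-size bounds, and the measurement and actuation hooks are all illustrative assumptions; the paper's actual controller is an online feedback optimization scheme, which is more principled but follows the same measure-update-project-apply pattern.

```python
# Deliberately simplified sketch of a bidirectional batch-size feedback rule.
# Every constant here (gain, voltage band, batch bounds) is an illustrative
# assumption, not the paper's implementation.
V_MIN, V_MAX = 0.95, 1.05   # assumed per-unit voltage band
B_MIN, B_MAX = 1.0, 128.0   # assumed batch-size bounds set by the latency target
GAIN = 40.0                 # assumed controller gain (batch units per p.u. of violation)

def update_batch_size(batch_size: float, voltage_pu: float) -> float:
    """One feedback iteration: return the next batch-size setpoint."""
    if voltage_pu < V_MIN:
        # Undervoltage: shed load by shrinking the batch (less GPU power draw).
        batch_size -= GAIN * (V_MIN - voltage_pu)
    elif voltage_pu > V_MAX:
        # Overvoltage: absorb power by growing the batch (more GPU power draw).
        batch_size += GAIN * (voltage_pu - V_MAX)
    # Projection keeps the setpoint inside the range that meets the latency target.
    return min(max(batch_size, B_MIN), B_MAX)

# Example: a sagging feeder voltage pulls the setpoint down over a few steps.
b = 64.0
for v_measured in [0.93, 0.94, 0.96]:
    b = update_batch_size(b, v_measured)
    print(round(b, 1))   # 63.2, 62.8, 62.8
```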
The proposed system offers several practical benefits:
Speed: GPU batch size changes take effect almost immediately (within one second), compared to 30-minute delays for mechanical tap changers.
No Additional Hardware: The solution uses existing GPU infrastructure and standard LLM serving software (tested with vLLM, a popular open-source inference server); a short vLLM sketch follows below.
Model-Free Operation: The controller uses an online feedback optimization approach, meaning it doesn't need detailed knowledge of the grid topology or perfect models of GPU behavior. It learns and adapts based on real-time measurements.
Quality of Service: Throughout testing, the system maintained user-specified latency targets while simultaneously providing grid support.
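On the software side, the batch-size knob is already exposed by standard serving stacks. Below is a minimal sketch of capping the batch size when constructing a vLLM engine through its max_num_seqs argument; the model name is only an example, and how the paper's controller adjusts this cap at runtime is not shown here.

```python
from vllm import LLM, SamplingParams

# Minimal sketch: vLLM's max_num_seqs engine argument bounds how many requests
# are batched together per scheduling step. Model choice, prompt, and any
# runtime re-tuning of the cap are illustrative, not the paper's exact setup.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model; any served LLM works
    max_num_seqs=32,                           # cap on concurrently batched requests
)

outputs = llm.generate(
    ["Explain voltage regulation in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```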
While this research demonstrates proof-of-concept, several extensions could further enhance the framework:
Hardware-in-the-Loop Testing: The current study used synthetic workloads based on real GPU measurements. Full integration with live GPU clusters would validate end-to-end performance under truly unpredictable conditions.
Additional Control Knobs: Beyond batch size, other GPU parameters like frequency scaling (DVFS) and power caps could provide complementary flexibility with different response characteristics (a short sketch of these knobs follows below).
Multiple Grid Services: The framework currently targets voltage regulation, but the same principles could extend to frequency regulation, peak shaving, or participation in electricity markets.
Multi-Site Coordination: Large tech companies operate data centers across multiple locations. Coordinating GPU flexibility across sites could provide grid support at transmission network scales, not just local distribution feeders.
Workload Diversity: The study focused on LLM inference, but other AI workloads (training, computer vision, etc.) have different power-performance profiles that might offer unique grid-support capabilities.
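As a pointer for the "additional control knobs" idea above, the snippet below shows how GPU power caps and clock limits are typically scripted with NVIDIA's standard nvidia-smi tool (administrator rights required). These commands are not part of the G2G framework; they only illustrate that such knobs are directly accessible alongside batch-size control.

```python
import subprocess

# Illustration only: power caps and clock locks applied via nvidia-smi.
# These knobs complement batch-size control but are not part of the paper's
# controller; the values below are arbitrary examples.

def set_power_cap(gpu_index: int, watts: int) -> None:
    """Cap the GPU's board power draw (nvidia-smi power limit)."""
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)

def lock_gpu_clocks(gpu_index: int, min_mhz: int, max_mhz: int) -> None:
    """Pin the GPU core clock range, a coarse form of DVFS control."""
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), f"--lock-gpu-clocks={min_mhz},{max_mhz}"],
        check=True,
    )

if __name__ == "__main__":
    set_power_cap(0, 500)           # e.g. cap GPU 0 at 500 W
    lock_gpu_clocks(0, 1200, 1400)  # e.g. restrict core clocks to 1200-1400 MHz
```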
This research arrives at a critical moment for data center energy policy. Grid operators and regulators are increasingly concerned about whether electrical infrastructure can keep pace with AI-driven demand growth. Some jurisdictions are considering restrictions on new data center construction or imposing special electricity rates.
The G2G framework suggests an alternative path: rather than treating data centers purely as problems, we can design them as flexible grid assets. This could ease grid operators' concerns about connecting new facilities and weaken the case for construction restrictions or special electricity rates.
Perhaps most importantly, it demonstrates that optimizing for grid support doesn't necessarily mean sacrificing computational performance. With intelligent control, data centers can simultaneously pursue their business objectives while acting as good grid citizens.
As AI continues its exponential growth trajectory, the intersection of computing and power systems will only become more critical. The GPU-to-Grid framework represents a new paradigm where these domains are no longer separate—device-level computing decisions are made with grid awareness, and grid operations leverage the inherent flexibility of modern AI workloads.
For the broader energy transition, this matters enormously. Successfully integrating gigawatts of intermittent renewable generation requires flexible demand that can respond within seconds to grid conditions. Vehicle-to-grid technology has long promised this capability but faces adoption challenges. Data centers with GPU-to-Grid control could provide similar services today, using infrastructure that's already being built for other purposes.
The future may see data centers not just as the engines of artificial intelligence, but as intelligent participants in the electricity grid itself—a fitting symbiosis between two of the 21st century's most transformative technologies.
GPU (Graphics Processing Unit)
Think of a GPU as a super-efficient calculator designed to handle thousands of simple math problems simultaneously. Originally built to render video game graphics, GPUs have become the workhorses of artificial intelligence because AI models require massive parallel computations. Modern data centers can house thousands of these chips, each consuming as much power as a small household appliance.
Batch Size
Imagine a restaurant kitchen that can either cook meals one at a time or prepare multiple orders together. Batch size is similar—it's the number of AI requests (like ChatGPT queries) that a GPU processes simultaneously in one go. Larger batches are more efficient but take longer to complete, like cooking ten burgers at once versus one at a time.
Voltage Regulation
Electricity in power lines needs to maintain a specific "pressure" (voltage) to work properly—too high and it can damage equipment, too low and devices won't function correctly. Voltage regulation is like a pressure regulator on a water pipe, constantly adjusting to keep everything flowing at just the right level despite changing demand throughout the day.
Distribution Network
This is the final stretch of the electrical grid—the local power lines and transformers that deliver electricity from substations to homes, businesses, and data centers in your neighborhood. Think of it as the small streets and avenues of the power grid, as opposed to the transmission lines which are like highways carrying power across long distances.
Latency
In the AI world, latency is the delay between asking a question and receiving a response. When you chat with an AI like ChatGPT, inter-token latency specifically measures how long you wait between each word appearing on your screen. Low latency means the AI "talks" quickly and smoothly; high latency makes it feel sluggish and frustrating to use.
Zhirui Liang, Jae-Won Chung, Mosharaf Chowdhury, Jiasi Chen, Vladimir Dvorkin. GPU-to-Grid: Voltage Regulation via GPU Utilization Control. https://doi.org/10.48550/arXiv.2602.05116
From: University of Michigan.