Intelligent Computing Center Paper Watch | 2026-06-08

Current Issue

Volume 2026 · Issue 06-08

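This issue's papers are organized in the style of a journal volume and issue page. Each entry uses only the traceable public sources already listed in the daily report and adds no unverified facts.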

Research Article · Thermal Management and Liquid Cooling

Toward Communication-Efficient Space Data Centers: Bottlenecks, Architectures, and New Paradigms

Minghao Sun, Zehui Chen, Jinbo Hou, Kezhi Wang, Xiaoli Chu

Published 2026-05-13 · arXiv · Credibility S


Abstract

The rapid growth of foundation model training and large-scale AI services has driven ground data centers toward unprecedented power densities, intensifying challenges in energy supply, cooling, and spatial scalability. Space Data Centers (SDCs) have emerged as a promising paradigm for hosting energy-intensive computing infrastructures in orbit, leveraging continuous solar energy and radiative cooling advantages. However, unlike ground facilities primarily constrained by power and site availability, SDCs are fundamentally limited by communication capability. The gap between petabit-scale internal data exchange in ground data centers and the gigabit-scale capacity of ground-space links forms a critical bottleneck. This article systematically analyzes communication constraints in SDC architectures and explores semantic communication as a key enabling paradigm. By transmitting compact, task-relevant semantic representations instead of raw data, uplink pressure can be substantially reduced. The feasibility of communication-efficient orbital AI infrastructures is demonstrated through the evaluation of a multi-layer heterogeneous SDC framework consisting of relay satellites and orbital computing nodes operating under coupled energy and thermal constraints. The article further outlines open research challenges toward scalable deployment.

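Interpretation

Background: AI data center load, power density, and energy constraints are rising together, which motivates proposals to host compute in orbit, where solar energy is continuous and heat can be rejected by radiation. Problem: according to the abstract, space data centers are limited mainly by communication rather than by power or siting, because ground data centers exchange data internally at petabit scale while ground-space links offer only gigabit-scale capacity. Method: the authors analyze the communication constraints of SDC architectures, propose semantic communication that sends compact, task-relevant representations instead of raw data, and evaluate a multi-layer heterogeneous framework of relay satellites and orbital computing nodes under coupled energy and thermal constraints. Result: the abstract reports that uplink pressure can be substantially reduced and that the framework is feasible, without giving quantitative figures. Significance: for readers of the daily report, the paper helps judge where the communication, energy, and thermal limits of orbital computing lie relative to ground facilities. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.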
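To make the petabit-versus-gigabit gap described in the abstract concrete, the following minimal Python sketch estimates how long a payload takes to cross a ground-space link, with and without a reduction step standing in for semantic communication. The function name, the payload size, the link rate, and the reduction factor are illustrative assumptions, not figures from the paper.

```python
def uplink_time_s(payload_bits: float, link_bps: float,
                  reduction_factor: float = 1.0) -> float:
    """Seconds needed to send a payload over a link.

    reduction_factor is a hypothetical stand-in for semantic communication:
    a factor of 100 means only 1/100 of the raw bits are transmitted.
    """
    if link_bps <= 0 or reduction_factor < 1.0:
        raise ValueError("link_bps must be positive and reduction_factor >= 1")
    return payload_bits / reduction_factor / link_bps


# Assumed example: 1 petabit of raw data over a 1 Gbit/s ground-space link.
raw_s = uplink_time_s(1e15, 1e9)
reduced_s = uplink_time_s(1e15, 1e9, reduction_factor=100.0)
print(f"raw: {raw_s / 86400:.1f} days, reduced: {reduced_s / 3600:.1f} hours")
```

The sketch ignores link availability windows, protocol overhead, and the compute cost of producing the compact representation, all of which would matter in a real link budget.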

Reference

Minghao Sun, Zehui Chen, Jinbo Hou, et al. Toward Communication-Efficient Space Data Centers: Bottlenecks, Architectures, and New Paradigms[J/OL]. (2026-05-13)[2026-06-08]. http://arxiv.org/abs/2605.12681v1.


Research Article · Energy Efficiency Optimization

Position: LLM Inference Should Be Evaluated as Energy-to-Token Production

Xiang Liu, Shimiao Yuan, Zhenheng Tang, Peijie Dong, Kaiyong Zhao, Qiang Wang, Bo Li, Xiaowen Chu

Published 2026-05-12 · arXiv · Credibility S


Abstract

LLM inference is still evaluated mainly as a model or software problem: accuracy, latency, throughput, and hardware utilization. This is incomplete. At deployment scale, the relevant output is a quality-conditioned token produced under joint constraints from effective compute, delivered data-center power, cooling capacity, PUE, and utilization. We argue that the ML community should treat inference as energy-to-token production. We formalize this view with a dimensionally consistent Token Production Function in which token rate is bounded by both compute-per-token and energy-per-token ceilings. Listed API prices vary by over an order of magnitude across providers, but we use price dispersion only as directional motivation, not as causal evidence of marginal cost. The core physical question is instead: under fixed quality and service targets, when does the binding constraint move from theoretical peak compute toward delivered power, cooling, and operational efficiency? Under this framing, system optimizations -- latent KV-cache compression, sparse or heavily compressed attention, quantization, routing, and difficulty-adaptive reasoning -- are not merely local engineering tricks. They are energy-to-token levers because they reduce FLOPs/token, joules/token, memory traffic, or utilization losses under fixed $(q^{*},s^{*})$. We therefore call for inference papers and benchmarks to report Joules/token, active binding constraint, PUE-adjusted delivered power, and utilization-adjusted token output alongside accuracy and latency.

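Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making PUE/WUE, energy-efficiency metrics, and operating-cost control key variables in intelligent computing center design. Problem: the abstract argues that LLM inference is still evaluated mainly through accuracy, latency, throughput, and hardware utilization, which leaves out delivered power, cooling capacity, PUE, and utilization. Method: this is a position paper; the authors formalize a dimensionally consistent Token Production Function in which token rate is bounded by both a compute-per-token ceiling and an energy-per-token ceiling. Result: optimizations such as KV-cache compression, sparse attention, quantization, routing, and difficulty-adaptive reasoning are reframed as energy-to-token levers, and the authors call for reporting Joules/token, the active binding constraint, PUE-adjusted delivered power, and utilization-adjusted token output. Significance: for readers of the daily report, the paper helps judge whether a given efficiency metric reflects real energy savings and cost benefits. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.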
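The two-ceiling idea in the abstract can be illustrated with a small Python sketch. The exact form of the paper's Token Production Function is not given in the abstract, so the formula below (a minimum over a compute ceiling and an energy ceiling, with IT power taken as facility power divided by PUE) is an assumed simplification, and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    peak_flops: float        # theoretical peak compute, FLOP/s
    utilization: float       # fraction of peak actually achieved, 0..1
    facility_power_w: float  # power delivered at the facility meter, W
    pue: float               # facility power / IT power, >= 1


def token_rate_bound(site: Site, flops_per_token: float,
                     joules_per_token: float) -> tuple[float, str]:
    """Upper bound on tokens/s and the name of the binding constraint."""
    # Compute ceiling: achieved FLOP/s divided by the FLOPs each token costs.
    compute_ceiling = site.peak_flops * site.utilization / flops_per_token
    # Energy ceiling: only the IT share of facility power produces tokens.
    it_power_w = site.facility_power_w / site.pue
    energy_ceiling = it_power_w / joules_per_token
    if compute_ceiling <= energy_ceiling:
        return compute_ceiling, "compute"
    return energy_ceiling, "energy"


# Toy numbers chosen only to show the binding constraint switching.
power_limited = Site(peak_flops=1e6, utilization=0.5,
                     facility_power_w=1000.0, pue=1.25)
print(token_rate_bound(power_limited, flops_per_token=100.0,
                       joules_per_token=0.25))
```

In this toy setting, lowering joules per token or PUE raises the energy ceiling until compute becomes the binding constraint, which is the kind of shift the abstract asks benchmarks to report.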

Reference

Xiang Liu, Shimiao Yuan, Zhenheng Tang, et al. Position: LLM Inference Should Be Evaluated as Energy-to-Token Production[J/OL]. (2026-05-12)[2026-06-08]. http://arxiv.org/abs/2605.11733v1.


Research Article · Chips and Compute

Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space Platforms

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Cheng Wang

Published 2026-06-04 · arXiv · Credibility S


Abstract

The rapid growth in compute demand from artificial intelligence (AI) has driven a massive surge in data center construction, precipitating an energy and sustainability crisis. Motivated by the abundant solar energy in outer space and the recent sharp reduction in space launch costs, orbital data centers are emerging as a potential pathway for the future scaling of AI compute infrastructure. While the cold background in vacuum seems appealing for cooling, computing systems operating in space without convection ultimately rely on radiative cooling, requiring large-area radiators. Such limitations in thermal management pose a significant challenge for deploying the standard liquid/air-cooled computers in space. In this work, we investigate the impact of the thermal constraints in space on both graphics processing units (GPUs) with high-bandwidth memory (HBM) and the emerging compute-in-memory (CIM) accelerators. We develop a radiator-in-the-loop co-design methodology that directly links the permitted system TOPS (tera-operations per second) with the practical radiator cooling capacity in space. Our thermal simulations reveal that the separately located GPU die and HBMs create severe thermal hotspots under limited radiator capacity, necessitating GPU thermal throttling. In contrast, CIM accelerators exhibit a much more uniform heat distribution and consistently outperform GPUs in TOPS/W across a wide range of radiator budgets. We systematically evaluated the performance of CIM and GPU across various AI workloads and demonstrated that CIM has a magnified advantage for deployment in space under realistic thermal constraints.

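Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making chips, servers, and high-density compute deployment key variables in intelligent computing center design. Problem: in orbit there is no convection, so computing systems depend on radiative cooling with large-area radiators, which makes standard liquid-cooled or air-cooled computers hard to deploy. Method: the authors develop a radiator-in-the-loop co-design methodology that links permitted system TOPS to practical radiator cooling capacity, and run thermal simulations for GPUs with HBM and for CIM accelerators. Result: per the abstract, the separately located GPU die and HBM stacks create severe hotspots under limited radiator capacity and force thermal throttling, while CIM accelerators show a more uniform heat distribution and higher TOPS/W across a wide range of radiator budgets. Significance: for readers of the daily report, the paper helps judge how chip roadmaps and thermal limits propagate into facility and platform design. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.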
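The link between radiator capacity and permitted compute can be sketched with the Stefan-Boltzmann law. This is a deliberately simplified illustration, not the paper's radiator-in-the-loop methodology: it assumes a single radiating surface, no absorbed solar or Earth heat, uniform radiator temperature, and that all electrical power ends up as heat to be rejected. Function names and default values are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiator_capacity_w(area_m2: float, emissivity: float,
                        t_radiator_k: float, t_sink_k: float = 3.0) -> float:
    """Net heat a radiator can reject to a cold background, in watts."""
    return emissivity * SIGMA * area_m2 * (t_radiator_k**4 - t_sink_k**4)


def permitted_tops(area_m2: float, tops_per_watt: float,
                   emissivity: float = 0.9,
                   t_radiator_k: float = 300.0) -> float:
    """Compute throughput sustainable when chip power equals radiator capacity."""
    return tops_per_watt * radiator_capacity_w(area_m2, emissivity, t_radiator_k)


# Assumed example: compare two accelerators on the same 10 m^2 radiator budget.
for name, efficiency in [("device A", 2.0), ("device B", 8.0)]:  # TOPS/W, made up
    print(name, round(permitted_tops(10.0, efficiency)), "TOPS")
```

An ideal black body at 300 K radiates roughly 459 W per square metre, which is why radiator area, rather than chip count, becomes the budget that TOPS/W is measured against. Hotspots, which the abstract identifies as the GPU-specific problem, are outside the scope of this lumped model.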

Reference

Sohan Salahuddin Mugdho, Md. Shahedul Hasan, Cheng Wang. Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space Platforms[J/OL]. (2026-06-04)[2026-06-08]. http://arxiv.org/abs/2606.05741v1.


Research Article · Computation-Electricity Coordination

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Yugui Liu, Yibo Ding, Xudong Li, Jing Qu, Wenyi Zhang, Tong Qian, Wuyou Xiao, Zhengyang Hu

Published 2026-06-03 · arXiv · Credibility S


Abstract

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and the power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on a modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8%, but also effectively alleviates grid congestion and yields a 3.2% reduction in total energy consumption.

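Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making coordinated scheduling of compute load and grid-side resources a key variable in intelligent computing center design. Problem: according to the abstract, existing studies overlook computational resource sharing among geo-distributed data centers and therefore do not fully unlock workload flexibility. Method: the authors propose a bi-level computation-electricity coordination framework with a peer-to-peer cloud service market, embed locational marginal prices to reflect congestion, clear the market with a dual consensus ADMM-based decentralized algorithm, and use a bisection-assisted iterative algorithm for convergence. Result: on a modified IEEE 30-bus system, the abstract reports a 22.8% increase in total data center operational profit, alleviated grid congestion, and a 3.2% reduction in total energy consumption. Significance: for readers of the daily report, the paper helps judge whether intelligent computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.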
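As a rough illustration of decentralized market clearing, the sketch below runs standard consensus ADMM in scaled form for a single bilateral trade between one buyer data center and one seller data center. It is not the paper's dual consensus ADMM or its bi-level framework: the quadratic benefit and cost models, the parameter names, and the absence of network or non-negativity constraints are all simplifying assumptions.

```python
def clear_bilateral_trade(a_b: float, c_b: float, a_s: float, c_s: float,
                          rho: float = 1.0, iters: int = 500) -> tuple[float, float]:
    """Consensus ADMM for one buyer and one seller; returns (quantity, price).

    Buyer benefit: a_b*q - 0.5*c_b*q**2   (hypothetical quadratic model)
    Seller cost:   a_s*q + 0.5*c_s*q**2   (hypothetical quadratic model)
    Each side keeps its own copy of the traded quantity, and the iterations
    drive the two copies toward agreement without sharing a_*, c_*.
    """
    z = 0.0          # agreed (consensus) quantity
    u_b = u_s = 0.0  # scaled dual variables, one per participant
    for _ in range(iters):
        # Local updates: each side solves its own small quadratic problem.
        x_b = (a_b + rho * (z - u_b)) / (c_b + rho)
        x_s = (-a_s + rho * (z - u_s)) / (c_s + rho)
        # Consensus update: average the two proposals.
        z = 0.5 * ((x_b + u_b) + (x_s + u_s))
        # Dual updates: accumulate each side's disagreement with z.
        u_b += x_b - z
        u_s += x_s - z
    price = rho * u_b  # at convergence this equals the buyer's marginal benefit
    return z, price


quantity, price = clear_bilateral_trade(a_b=10.0, c_b=1.0, a_s=4.0, c_s=2.0)
print(f"quantity={quantity:.3f}, price={price:.3f}")
```

For these quadratic models the result can be checked by hand: marginal benefit equals marginal cost where 10 - q = 4 + 2q, so the iterations should settle near a quantity of 2 at a price of 8.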

Reference

Yugui Liu, Yibo Ding, Xudong Li, et al. Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination[J/OL]. (2026-06-03)[2026-06-08]. http://arxiv.org/abs/2606.04981v1.


Research Article · Thermal Management and Liquid Cooling

Maximizing Compute Capacity in AI Data Centers through Cooling, Energy Storage, and Computing Adaptation

Shaolei Ren, Mohammad A. Islam, Adam Wierman

Published 2026-05-30 · arXiv · Credibility S


Abstract

The deployment of artificial intelligence is increasingly constrained by limited site-level power capacity, which must support both compute systems and non-compute systems (primarily cooling) at all times. Cooling power demand, especially in non-evaporative cooling systems, can increase substantially with ambient temperature in the summer, producing recurring periods of elevated cooling power that often last for multiple hours per day. Therefore, maximizing compute capacity under a limited site-level power budget is an important planning and operational challenge. Sizing the compute system conservatively based on peak cooling power can leave part of the site-level power capacity underutilized when the cooling power is below its peak, particularly in cooler months. On the other hand, sizing the compute system aggressively based on low cooling power can cause the total site-level power demand to exceed the site-level power capacity during hot days in the summer. This paper proposes ComputeAmp (Compute Amplifier), a framework that maximizes the compute capacity by jointly and dynamically leveraging cooling, battery energy storage, and computing-based adaptation. We discuss the opportunities and limitations of ComputeAmp and illustrate its potential to significantly expand usable compute capacity within local power and water resource limits. We also present a problem formulation for ComputeAmp and highlight a few algorithmic and operational challenges.

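Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making liquid cooling, thermal management, and data center energy efficiency key variables in intelligent computing center design. Problem: site-level power capacity must cover both compute and cooling at all times; sizing compute for peak cooling power leaves capacity idle in cooler periods, while sizing it for low cooling power risks exceeding the site limit on hot summer days. Method: the authors propose ComputeAmp, a framework that jointly and dynamically uses cooling, battery energy storage, and computing-based adaptation, and they present a problem formulation for it. Result: the abstract discusses opportunities and limitations and illustrates the potential to expand usable compute capacity within local power and water limits, without giving quantitative figures. Significance: for readers of the daily report, the paper helps judge cooling choices, thermal management routes, and the pace of high-density deployment. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.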
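The sizing trade-off described in the abstract can be made concrete with a short Python sketch. It is not the ComputeAmp formulation: it assumes cooling power is a fixed fraction of compute power in each hour, uses one-hour time steps, and ignores battery losses, recharge, and compute throttling. The site capacity and overhead values are made up for illustration.

```python
def max_compute_kw(site_cap_kw: float, cooling_overhead: float) -> float:
    """Largest compute power such that compute + cooling stays within the site cap."""
    return site_cap_kw / (1.0 + cooling_overhead)


def battery_energy_kwh(site_cap_kw: float, compute_kw: float,
                       hourly_overheads: list[float]) -> float:
    """Energy a battery must supply to hold compute_kw constant over 1-hour steps."""
    # With 1-hour steps, each kW of excess demand equals one kWh of discharge.
    return sum(max(0.0, compute_kw * (1.0 + o) - site_cap_kw)
               for o in hourly_overheads)


SITE_CAP_KW = 1200.0
# Assumed day: 20 mild hours (cooling = 25% of compute), 4 hot hours (50%).
overheads = [0.25] * 20 + [0.5] * 4

conservative_kw = max_compute_kw(SITE_CAP_KW, max(overheads))
aggressive_kw = max_compute_kw(SITE_CAP_KW, min(overheads))
shortfall_kwh = battery_energy_kwh(SITE_CAP_KW, aggressive_kw, overheads)
print(conservative_kw, aggressive_kw, shortfall_kwh)
```

Under these assumptions, sizing for the hot hours gives up a share of compute capacity all day, while sizing for the mild hours requires storage or load adaptation to cover only the hot-hour excess, which is the gap the abstract says ComputeAmp targets.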

Reference

Shaolei Ren, Mohammad A. Islam, Adam Wierman. Maximizing Compute Capacity in AI Data Centers through Cooling, Energy Storage, and Computing Adaptation[J/OL]. (2026-05-30)[2026-06-08]. http://arxiv.org/abs/2606.00457v1.


Research Article · Computation-Electricity Coordination

Grid Capacity Expansion under Data Centers and Electrified Manufacturing Large Loads

Jiyong Lee, Melody Agustin, Joanne Langsdorf, Erhan Kutanoglu, Michael Baldea, Ilias Mitrai

Published 2026-05-28 · arXiv · Credibility S


Abstract

In this paper, we consider the expansion of power grids under emerging large loads from data centers and electrified manufacturing. We develop a multi-period grid capacity expansion model to determine optimal investment profiles for power generation, storage, and transmission capacity while accounting for hourly power dispatch, such that electricity demand is satisfied and the total planning and operation cost is minimized. We also propose a new modeling approach regarding the spatial distribution of demand from large loads. The model is used to analyze the expansion of a synthetic grid that follows key characteristics of the ERCOT system over a seven-year planning horizon, under loads from data centers and electrified oil refining, which account for 17.5% and 4.7% of total annual electricity demand by the end of the planning horizon. The optimal investment policy leads to an 83.6% increase in generation capacity and exploits the short construction times of solar and storage as well as the operational flexibility of thermal generators. Finally, sensitivity analysis reveals that the construction time of grid assets substantially impacts investment timing, generation technology mix, and transmission capacity expansion. The proposed modeling framework is general and can be extended to other grid systems, enabling the exploration of diverse demand scenarios, policy assumptions, and regional characteristics.

Interpretation

Reference

Jiyong Lee, Melody Agustin, Joanne Langsdorf, et al. Grid Capacity Expansion under Data Centers and Electrified Manufacturing Large Loads[J/OL]. (2026-05-28)[2026-06-08]. http://arxiv.org/abs/2605.29053v2.


Research Article · Computation-Electricity Coordination

GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers

Denisa-Andreea Constantinescu, David Atienza

Published 2026-05-26 · arXiv · Credibility S


Abstract

At global scale, data-center electricity demand is growing faster than the grids that supply it, while system operators increasingly require large flexible loads that can adjust power within seconds to absorb variable wind and solar generation. For multi-megawatt AI/HPC facilities, the key unresolved question is practical and measurable: how quickly can the software stack translate a grid request into a real change in GPU power at the facility meter, where commitments are settled? We answer this on real hardware with GridPilot, a three-tier predictive controller operating across milliseconds, seconds, and hours, augmented by a deterministic safety-island bypass for fast response. On a three-GPU NVIDIA V100 testbed, GridPilot achieves a measured end-to-end trigger-to-target response of 97.2 ms, which is 6.9x faster than the 700 ms requirement of Nordic Fast Frequency Reserve. We further incorporate an instantaneous Power Usage Effectiveness (PUE) correction so dispatched commitments remain robust at meter level rather than only at IT load level. In replay experiments across six representative European grids (from Sweden to Poland), the PUE-aware controller closes 2.5-5.8 percentage points of cooling-overhead drag. GridPilot is released as open source and serves as a proof of concept that MW-scale AI/HPC demand can be engineered as controllable, grid-responsive flexibility by design.

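Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making coordinated scheduling of compute load and grid-side resources a key variable in intelligent computing center design. Problem: system operators increasingly need large flexible loads that adjust power within seconds, and the abstract asks how quickly the software stack can turn a grid request into a real change in GPU power at the facility meter. Method: the authors build GridPilot, a three-tier predictive controller operating across milliseconds, seconds, and hours with a deterministic safety-island bypass, add an instantaneous PUE correction, and test it on a three-GPU NVIDIA V100 testbed and in replay experiments across six European grids. Result: the abstract reports a measured end-to-end trigger-to-target response of 97.2 ms against the 700 ms requirement of Nordic Fast Frequency Reserve, and says the PUE-aware controller closes 2.5-5.8 percentage points of cooling-overhead drag. Significance: for readers of the daily report, the paper helps judge whether intelligent computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.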
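The reason a PUE correction matters at the meter can be shown with a few lines of Python. This sketch is not GridPilot's controller: it assumes facility power equals IT power multiplied by the instantaneous PUE, and the function names and the 1000 kW, PUE 1.25 example are hypothetical. The 97.2 ms and 700 ms values are the ones quoted in the abstract.

```python
def it_power_target_kw(meter_target_kw: float, pue: float) -> float:
    """IT power that yields the requested facility-meter power at a given PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1")
    return meter_target_kw / pue


def uncorrected_overshoot_kw(meter_target_kw: float, pue: float) -> float:
    """Meter-level overshoot if IT power is set equal to the meter target."""
    return meter_target_kw * pue - meter_target_kw


def within_deadline(response_ms: float, requirement_ms: float) -> bool:
    """True when the measured response meets the reserve product's deadline."""
    return response_ms <= requirement_ms


# Assumed dispatch: the grid commitment is 1000 kW at the meter, PUE is 1.25.
print(it_power_target_kw(1000.0, 1.25))        # IT setpoint after correction
print(uncorrected_overshoot_kw(1000.0, 1.25))  # error without correction
print(within_deadline(97.2, 700.0))            # values quoted in the abstract
```

Because commitments are settled at the facility meter, a controller that tracks only IT load would miss its target by the cooling overhead, which is the effect the abstract calls cooling-overhead drag.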

Reference

Denisa-Andreea Constantinescu, David Atienza. GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers[J/OL]. (2026-05-26)[2026-06-08]. http://arxiv.org/abs/2605.26384v1.


Research Article · AI Operations Optimization

Energy-Aware Computing in the Year 2026

Roblex Nana Tchakoute, Claude Tadonki

Published 2026-05-23 · arXiv · Credibility S


Abstract

High-Performance Computing (HPC) has recently entered the Exascale era, and considerable efforts are being made to fully harness this potential power for large-scale applications, such as cutting-edge generative AI (training and exploitation). The corresponding energy consumption is very high, and forecasts are alarming, making this metric a critical systemic bottleneck. Addressing this issue presents a genuine challenge for the entire cloud-edge-HPC continuum at all scales, from low-power IoT microcontrollers to multi-megawatt data centers. Beyond financial costs, green computing is driven by considerations related to climate change and environmental concerns such as carbon footprint (CO2e), as well as constraints on energy production and supply, leading to a real need to regulate information and communication technology (ICT) activities. This article presents a comprehensive overview of energy-efficient computing, taking into account the most recent and significant contributions. Based on this exploration of the state of the art, we design and describe a holistic taxonomy of the aforementioned publications, structured around various perspectives, including hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Particular emphasis is placed on large-scale AI, which receives significant attention due to its considerable resource requirements. We conclude with an analysis of a forward-looking roadmap that considers the main perspectives of sustainable computing.

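Interpretation

Background: AI data center load, power density, and energy constraints are rising together, making AI operations, load forecasting, and facility tuning key variables in intelligent computing center design. Problem: according to the abstract, the energy consumption of exascale HPC and large-scale generative AI has become a critical systemic bottleneck across the cloud-edge-HPC continuum. Method: this is a survey; the authors review recent work on energy-efficient computing and organize it into a taxonomy covering hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Result: the paper offers a systematic overview with particular emphasis on large-scale AI and closes with a forward-looking roadmap for sustainable computing; the abstract reports no new experimental results. Significance: for readers of the daily report, the paper serves as a map of energy-efficiency techniques and metrics when evaluating operations and tuning options. The conclusions still need to be verified against the full text's experimental conditions, sample scope, and cost assumptions.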

Reference

Roblex Nana Tchakoute, Claude Tadonki. Energy-Aware Computing in the Year 2026[J/OL]. (2026-05-23)[2026-06-08]. http://arxiv.org/abs/2605.24569v1.

