White Paper No. 110
The AI Disruption: Challenges and Guidance for Data Center Design
Artificial intelligence (AI) is rapidly transforming sectors such as healthcare, finance, manufacturing, transportation, and entertainment. Generative AI and predictive algorithms are driving this growth and demand advanced data center infrastructure to support AI workloads. The resulting shift to higher rack power densities presents new challenges in data center design and management. AI start-ups, enterprises, colocation providers, and internet giants must address these challenges to ensure efficient and reliable data center operations.
AI Growth and Power Consumption:
AI currently represents 4.5 GW of power consumption, projected to grow at a compound annual growth rate (CAGR) of 25% to 33%, reaching 14 GW to 18.7 GW by 2028. This growth is significantly higher than the overall data center power demand CAGR of 10%. The increasing demand for AI, especially inference loads, will drive the need for more advanced servers, efficient instruction sets, and improved chip performance.
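As a quick sanity check, the 14 GW to 18.7 GW range follows from compounding the 4.5 GW figure at the stated growth rates over five years, assuming 2023 as the baseline year (an assumption, since the baseline year is not restated here):

```python
# Sanity check of the AI power growth projection, assuming the 4.5 GW figure
# is the baseline for 2023 and the projection spans five years to 2028.
baseline_gw = 4.5
years = 5

for cagr in (0.25, 0.33):
    projected = baseline_gw * (1 + cagr) ** years
    print(f"CAGR {cagr:.0%}: {projected:.1f} GW by 2028")

# Output matches the cited 14 GW to 18.7 GW range:
# CAGR 25%: 13.7 GW by 2028
# CAGR 33%: 18.7 GW by 2028
```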
Power Distribution:
Traditional single-phase distribution is impractical for high-density AI racks. A shift to three-phase distribution is recommended to reduce the number of circuits and manage higher power densities effectively.
Standard 63 A rack PDUs may not provide sufficient capacity for high-density AI racks. Custom PDUs and careful load analysis are advised to ensure adequate power distribution and personnel safety.
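The sketch below illustrates the circuit-count problem by comparing how many single-phase and three-phase feeds an illustrative 100 kW rack would require. The voltages, breaker ratings, power factor, and rack load are assumptions chosen for illustration, not figures from the paper.

```python
import math

def single_phase_kva(volts: float, amps: float) -> float:
    """Apparent power available on a single-phase circuit."""
    return volts * amps / 1000

def three_phase_kva(volts_line_to_line: float, amps: float) -> float:
    """Apparent power available on a three-phase circuit: sqrt(3) * V_LL * I."""
    return math.sqrt(3) * volts_line_to_line * amps / 1000

rack_load_kw = 100       # illustrative high-density AI rack load (assumption)
power_factor = 0.95      # assumed server power factor
required_kva = rack_load_kw / power_factor

single = single_phase_kva(230, 32)   # e.g. 230 V / 32 A single-phase feed (assumption)
three = three_phase_kva(400, 63)     # e.g. 400 V / 63 A three-phase feed (assumption)

print(f"Rack load: {required_kva:.1f} kVA")
print(f"Single-phase 230 V/32 A circuits needed: {math.ceil(required_kva / single)}")
print(f"Three-phase 400 V/63 A circuits needed:  {math.ceil(required_kva / three)}")
# Roughly 15 single-phase circuits versus 3 three-phase circuits for the same rack.
```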
Cooling:
Air cooling is inadequate for AI clusters exceeding 20 kW per rack. Liquid cooling solutions, such as direct-to-chip (DTC) and immersion cooling, are recommended to handle the higher thermal design power (TDP) of GPUs and maintain efficient operations.
Bespoke cooling distribution designs are necessary for large-scale AI deployments. Retrofitting existing data centers with liquid cooling systems can be challenging and requires expert assessment to avoid operational disruptions.
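A rough heat-removal comparison shows why air cooling runs out of headroom. Using Q = P / (ρ · c_p · ΔT), the sketch below estimates the coolant flow needed to carry away rack heat with air versus water; the rack loads and temperature rises are illustrative assumptions, not figures from the paper.

```python
# Compare the volumetric flow needed to remove rack heat with air versus water.
AIR = {"rho": 1.2, "cp": 1005}      # kg/m^3, J/(kg*K), air at ~20 C
WATER = {"rho": 997, "cp": 4180}    # kg/m^3, J/(kg*K), water at ~25 C

def flow_m3_per_h(power_w: float, fluid: dict, delta_t_k: float) -> float:
    """Volumetric flow rate (m^3/h) needed to absorb power_w at a given temperature rise."""
    return power_w / (fluid["rho"] * fluid["cp"] * delta_t_k) * 3600

for rack_kw in (20, 80):            # illustrative air-cooled limit vs liquid-cooled AI rack
    air_flow = flow_m3_per_h(rack_kw * 1000, AIR, delta_t_k=12)
    water_flow = flow_m3_per_h(rack_kw * 1000, WATER, delta_t_k=10)
    print(f"{rack_kw} kW rack: ~{air_flow:,.0f} m^3/h of air vs ~{water_flow:.2f} m^3/h of water")
# A 20 kW rack already needs roughly 5,000 m^3/h of air, while a few m^3/h of
# water removes the same heat, which is the core argument for liquid cooling.
```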
Racks:
Racks of standard width, depth, and height may not accommodate the increasing size and power requirements of AI servers. Wider (750 mm), deeper (1,200 mm), and taller (48U or higher) racks with high weight-bearing capacities are recommended to support AI workloads.
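To see why taller racks and higher weight ratings matter, the sketch below totals the space, power, and weight of a hypothetical AI server in 42U and 48U racks. The per-server figures (8U, 10 kW, 130 kg) and the space reserved for networking are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Server:
    height_u: int
    power_kw: float
    weight_kg: float

ai_server = Server(height_u=8, power_kw=10.0, weight_kg=130.0)  # assumed 8-GPU server
switch_and_pdu_u = 6   # assumed space reserved for networking and rack PDUs

for rack_u in (42, 48):
    usable_u = rack_u - switch_and_pdu_u
    servers = usable_u // ai_server.height_u
    print(f"{rack_u}U rack: {servers} servers, "
          f"{servers * ai_server.power_kw:.0f} kW, "
          f"{servers * ai_server.weight_kg:.0f} kg of IT load")
# The extra height buys another server per rack, but only if the rack and the
# floor can bear the added weight and the power train can feed the added kW.
```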
Software Management:
Software tools such as Data Center Infrastructure Management (DCIM), Electrical Power Monitoring Systems (EPMS), Building Management Systems (BMS), and digital electrical design tools are crucial for managing high-density AI clusters. These tools help monitor power and cooling capacities, prevent unexpected behavior, and optimize data center layout and resources.
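As a simple illustration of the kind of check these tools automate, the sketch below compares each rack's measured draw against its provisioned power and cooling capacity and flags over-committed or stranded racks. The data structure, rack names, and thresholds are hypothetical and do not reflect any specific DCIM product's API.

```python
# Hypothetical rack telemetry: measured draw versus provisioned power and cooling.
racks = [
    {"name": "AI-ROW1-R01", "measured_kw": 72.0, "power_capacity_kw": 80.0, "cooling_capacity_kw": 70.0},
    {"name": "AI-ROW1-R02", "measured_kw": 30.0, "power_capacity_kw": 80.0, "cooling_capacity_kw": 70.0},
]

UTILIZATION_ALERT = 0.9   # assumed alert threshold

for rack in racks:
    # The usable limit is whichever constraint binds first: power or cooling.
    limit_kw = min(rack["power_capacity_kw"], rack["cooling_capacity_kw"])
    utilization = rack["measured_kw"] / limit_kw
    if utilization > UTILIZATION_ALERT:
        print(f"{rack['name']}: {utilization:.0%} of capacity - risk of tripped breakers or overheating")
    elif utilization < 0.5:
        print(f"{rack['name']}: {utilization:.0%} of capacity - candidate for reclaiming stranded capacity")
```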
Future Outlook:
Advanced technologies such as solid-state transformers, solid-state circuit breakers, and sustainable dielectric fluids will further enhance data center infrastructure to support AI workloads.
Deeper and AI-optimized IT racks will become standard to accommodate evolving server designs and cooling requirements.
Increased interaction and optimization with the grid will help balance electricity consumption and improve efficiency.
Conclusion:
The rapid growth of AI is significantly impacting data center design and operations. AI workloads are projected to consume 15% to 20% of total data center energy by 2028. The extreme power density of AI training clusters presents challenges in power, cooling, racks, and software management. Addressing these challenges with the recommended guidance will ensure efficient and reliable data center operations to support the evolving demands of AI technologies.
Telephone: 01943 831990
Email: info@advancedpower.co.uk