Developing Accelerated Infrastructure for AI | TheFutureEconomy.ca

AI has the potential to transform the way we live. But for AI to become sustainable and pervasive, we also have to transform our computing infrastructure. 

The world’s existing technologies, simply put, weren’t designed for the data-intensive, highly parallel computing problems that AI serves up. As a result, AI clusters and data centers aren’t nearly as efficient or elegant as they could be: in many ways, they rely on brute-force computing. Power and water consumption in data centers are growing dramatically, and many communities around the world are pushing back on plans to expand data infrastructure.

Canada’s Moment to Lead in Greener Data Centers


Canada can and will play a leading role in overcoming these hurdles. Data center expansion is already underway. Data centers currently account for around 1 GW, or 1%, of Canada’s electricity capacity. If all of the projects under review today are approved, that total could grow to 15 GW, enough to power 70% of the homes in the country.
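The figures in this paragraph can be turned into a quick back-of-envelope check. This is a minimal sketch that uses only the article’s rounded numbers and assumes, for illustration, that total national capacity stays at the level implied by the "1 GW is about 1%" statement; the constant and function names are hypothetical.

```python
# Back-of-envelope arithmetic using only the figures quoted in the article.
# Both inputs are rounded ("around 1 GW", "1%"), so results are rough.

CURRENT_DC_GW = 1.0     # data center load today
CURRENT_SHARE = 0.01    # that load as a share of national capacity
PROPOSED_DC_GW = 15.0   # load if every project under review is approved


def implied_total_capacity_gw(load_gw: float, share: float) -> float:
    """Total capacity implied by a load and its stated share of that total."""
    return load_gw / share


total_gw = implied_total_capacity_gw(CURRENT_DC_GW, CURRENT_SHARE)
growth = PROPOSED_DC_GW / CURRENT_DC_GW
# Assumption: total capacity stays at today's implied level.
proposed_share = PROPOSED_DC_GW / total_gw

print(f"Implied total capacity: {total_gw:.0f} GW")                  # 100 GW
print(f"Growth in data center load: {growth:.0f}x")                  # 15x
print(f"Proposed load vs. today's capacity: {proposed_share:.0%}")   # 15%
```

Under these assumptions, full build-out would be a fifteenfold increase in data center load, equal to roughly 15% of today’s implied capacity.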


As in other regions, data center operators are exploring greater use of renewables and nuclear power in these new facilities, along with ambient cooling, to reduce their carbon footprint. In Alberta, some companies are also exploring adding carbon capture to the design of data centers powered by natural gas. To date, carbon capture has not lived up to its promise. Most carbon capture experiments, however, have been coupled with large-scale industrial plants. It may be worth examining whether carbon capture, combined with mineralization for long-term storage, can work at this smaller scale. If it does, the technology could be exported to other regions.

Fixing facilities, however, is only part of the equation. AI requires a fundamental overhaul of the systems and components that make up our networks.

Reinventing the Core: Semiconductors and Interconnects


“Semiconductors are the foundation of AI,” said Dr. Loi Nguyen, executive vice president and general manager of cloud optics at Marvell, in a speech at the National University of Singapore last year. “Without them, there would be no AI.”

Consider the fiber-optic interconnects that link large AI clusters together. Twenty-five years ago, interconnects could send 100 megabits of data per second. Today, they can deliver more than one terabit per second, 10,000 times the cutting-edge speeds of 25 years ago, and future cutting-edge clusters will require millions of them. Meanwhile, power per bit has declined by 1,000x over the same period.
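The interconnect figures can be checked with a short calculation. This sketch assumes that "power per bit" means energy per bit (for example, picojoules per bit) and that the 10,000x and 1,000x figures cover the same 25-year span; the variable names are illustrative. Because a link’s power draw is energy per bit multiplied by bit rate, the two figures together imply how much per-link power has changed.

```python
# Checks the interconnect figures quoted in the article.
OLD_RATE_BPS = 100e6          # ~25 years ago: 100 megabits per second
NEW_RATE_BPS = 1e12           # today: about one terabit per second
YEARS = 25
ENERGY_PER_BIT_GAIN = 1_000   # assumed: "power per bit" = energy per bit, 1,000x lower

speedup = NEW_RATE_BPS / OLD_RATE_BPS
# Compound annual growth rate implied by a 10,000x gain over 25 years.
annual_growth = speedup ** (1 / YEARS) - 1
# Link power = (energy per bit) x (bits per second), so the net change in
# per-link power is the speedup divided by the efficiency gain.
link_power_ratio = speedup / ENERGY_PER_BIT_GAIN

print(f"Speedup: {speedup:,.0f}x")                                  # 10,000x
print(f"Implied annual growth: {annual_growth:.1%}")                # 44.5%
print(f"Per-link power vs. 25 years ago: {link_power_ratio:.0f}x")  # 10x
```

Under these assumptions, efficiency per bit improved enormously, yet each link still draws roughly ten times the power of its predecessor, which is why the growing number of links matters so much.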


The problem? Data traffic and the number of optical interconnects inside data centers have grown even faster. One of the architectural nuances of AI as it works today is that each AI accelerator, or XPU, must frequently synchronize with every other XPU in the same cluster. Because of the growing bandwidth and performance demands of XPUs, the number of optical transceivers is growing faster than the number of XPUs. In a 1,000-XPU cluster, 2,000 optical connections might be needed. In the 100,000-XPU clusters companies are now talking about, the transceiver count could hit 500,000 [1].
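The two cluster estimates in this paragraph imply that transceivers per XPU rise from 2 to 5 while the cluster grows 100-fold. A minimal sketch, assuming a simple power law fitted through just those two points (the exponent `k` and the helper names are illustrative and not part of the cited estimate), makes the faster-than-linear growth explicit:

```python
import math

# (XPUs in cluster, estimated optical transceivers), as quoted in the article.
SMALL_CLUSTER = (1_000, 2_000)
LARGE_CLUSTER = (100_000, 500_000)


def transceivers_per_xpu(cluster: tuple[int, int]) -> float:
    xpus, transceivers = cluster
    return transceivers / xpus


def scaling_exponent(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Exponent k such that transceivers ~ xpus**k passes through both points."""
    return math.log(b[1] / a[1]) / math.log(b[0] / a[0])


small_ratio = transceivers_per_xpu(SMALL_CLUSTER)
large_ratio = transceivers_per_xpu(LARGE_CLUSTER)
k = scaling_exponent(SMALL_CLUSTER, LARGE_CLUSTER)

print(f"Transceivers per XPU: {small_ratio:.1f} -> {large_ratio:.1f}")  # 2.0 -> 5.0
print(f"Fitted exponent k: {k:.2f}")  # 1.20 (k > 1 means faster than linear)
```

A two-point fit is only an illustration of the trend, not a forecast: 100 times more XPUs corresponds to 250 times more transceivers in these estimates.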

Innovating for Efficiency at Scale

Marvell Canada, among others, is tackling this problem from several directions. We are collaborating on technology for virtual hyperscalers: fast optical links that would allow operators to build networks of smaller, distributed data centers that replicate the performance and functionality of mega-sized data centers with fewer resources and less disruption to the grid. COLORZ® 800, an optical module we released last year, for example, can transmit data up to 1,000 kilometers while reducing capex and opex by up to 75% [8].


Collectively, the industry is also developing entirely new classes of devices (PCIe retimers, active electrical cables, transmit-only optical DSPs, LPO chipsets) to replace or supplement existing products. The acronyms can be bewildering, but the aim is the same: to improve performance per watt by optimizing for narrower use cases.

Increasingly, chips will be customized. Instead of buying a generic CPU or GPU, cloud service providers are designing chips fine-tuned to their own applications and software environments. Some analysts estimate that customized chips can reduce the power consumed for processing by 25% or more [2].

Preparing for the Demands of an AI-Driven Future

And we’re just at the start. As consumers and businesses get acclimated to AI, they will want to use it in more situations and demand more granular answers, increasingly in real-time. Semiconductor companies will have to move faster and collaborate across an expanding ecosystem of research universities and partners to keep ahead of demand. 

We can’t predict what technological changes will be needed; we just know we will need them.

Calls to Action:

  • Invest in renewable energy, nuclear, and ambient cooling to reduce the carbon footprint of Canada’s expanding data center infrastructure.
  • Explore small-scale carbon capture and mineralization as a potential solution for reducing emissions from natural gas-powered data centers.
  • Overhaul network systems and components, focusing on advanced semiconductors and optical interconnect technologies to meet AI’s growing demands.
  • Develop virtual hyperscalers to create networks of smaller, distributed data centers that replicate the performance of mega-sized facilities with fewer resources.
  • Collaborate across the semiconductor ecosystem, including research universities and industry partners, to accelerate innovation and keep pace with AI-driven demand.

References:

1. Marvell AI Day, April 2024.

2. 650 Group, July 2024.

This article contains forward-looking statements within the meaning of the federal securities laws that involve risks and uncertainties. Forward-looking statements include, without limitation, any statement that may predict, forecast, indicate or imply future events or achievements. Actual events or results may differ materially from those contemplated in this blog. Forward-looking statements are only predictions and are subject to risks, uncertainties and assumptions that are difficult to predict, including those described in the “Risk Factors” section of our Annual Reports on Form 10-K, Quarterly Reports on Form 10-Q and other documents filed by us from time to time with the SEC. Forward-looking statements speak only as of the date they are made. Readers are cautioned not to put undue reliance on forward-looking statements, and no person assumes any obligation to update or revise any such forward-looking statements, whether as a result of new information, future events or otherwise.