New-Tech Europe Magazine | Q1 2023

The Age of Inference - Future Pervasive AI Architectures for Edge and Cloud Look Unified and Scalable

Ivo Bolsens, Senior VP, AMD

The world of artificial intelligence and machine learning (AI/ML) is fragmented into different domains. Two of these domains represent splits between training and inference, and cloud versus edge. There are myriad other AI/ML task differentiations, but these two splits are the major topics of discussion for this article. AI/ML training develops models that inference uses to recognize whatever needs identifying, whether it’s light versus heavy traffic on a smart city’s street, the clearance level for an ID badge and matching face used for secure access control, words spoken by a telephone caller to a customer service call center, or a handwritten address on an envelope at a postal sorting center. Training normally takes place in enterprise data centers or in the cloud, where many high-powered servers, plenty of memory, hardware accelerators, and high-speed networking can be thrown at the workload. In this environment, tremendous amounts of electrical power for computing, networking, and cooling are used for training with the aim of finishing quickly.

Inference workloads can also be performed in a data center or the cloud, but increasingly, inference tasks are migrating to the edge, for several reasons. First, there’s the issue of latency. It takes time to ship raw data back to the cloud or data center. It takes more time to perform the inference, and it takes yet more time to ship the desired answer or decision back to the edge. For some real-time tasks – including factory automation, radar, and electronic warfare – decisions that take too long can be costly.
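To make the latency argument concrete, here is a minimal back-of-envelope sketch in Python comparing a cloud round trip with local inference at the edge. Every number in it (payload size, link speed, network round-trip time, and inference times) is an illustrative assumption rather than a measurement of any particular system.

# Rough comparison of cloud round-trip latency versus edge inference.
# All figures are illustrative assumptions, not measurements.

def cloud_latency_ms(payload_mb, uplink_mbps, network_rtt_ms, cloud_infer_ms):
    # Upload the raw data, run inference remotely, then return a small result.
    upload_ms = payload_mb * 8.0 / uplink_mbps * 1000.0
    return network_rtt_ms + upload_ms + cloud_infer_ms

def edge_latency_ms(edge_infer_ms):
    # Inference runs locally, so there is no network hop at all.
    return edge_infer_ms

# Assumed scenario: a 1 MB camera frame, a 50 Mb/s uplink, 40 ms network
# round trip, 5 ms inference on a cloud accelerator, 25 ms on an edge device.
cloud = cloud_latency_ms(payload_mb=1, uplink_mbps=50, network_rtt_ms=40, cloud_infer_ms=5)
edge = edge_latency_ms(edge_infer_ms=25)
print(f"cloud round trip ~ {cloud:.0f} ms, edge inference ~ {edge:.0f} ms")

Even with generous network assumptions, the cloud path in this sketch is dominated by moving the raw data, which is exactly the cost that disappears when the inference engine sits next to the sensor.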
Two more reasons that inference workloads are migrating to the edge involve power: computing power and electrical power. As AI/ML inference workloads migrate to large numbers of edge devices, the aggregate computing power of millions of inference engines in those edge devices exceeds the computing power of a data center’s servers. In addition, edge inference engines don’t consume large amounts of power.
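A rough back-of-envelope calculation shows how quickly that aggregate adds up. The fleet size and the per-device and per-server throughput figures below are assumptions chosen purely for illustration.

# Illustrative aggregate-compute comparison; every figure is an assumption.
edge_devices = 5_000_000        # assumed fleet of deployed edge inference devices
tops_per_edge_device = 2        # assumed INT8 throughput per device, in TOPS

datacenter_servers = 10_000     # assumed inference servers in one data center
tops_per_server = 300           # assumed INT8 throughput per accelerated server, in TOPS

aggregate_edge_tops = edge_devices * tops_per_edge_device
aggregate_dc_tops = datacenter_servers * tops_per_server
print(f"edge fleet: {aggregate_edge_tops:,} TOPS vs data center: {aggregate_dc_tops:,} TOPS")

Under these assumptions the edge fleet delivers several times the data center’s raw inference throughput, while each edge device typically draws a few watts rather than the hundreds of watts a server consumes.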
Many interesting chips with new computing architectures have been announced recently to handle the unique needs of edge inference. Makers highlight the big teraFLOPS and teraOPS (TFLOPS and TOPS) computing numbers that their devices can attain at lower power consumption. While it’s true that inference workloads require plenty of TFLOPS and TOPS, these specialized edge inference chips represent a one-way architectural street, which may prove to be an undesirable route when considering combined training and inference workloads. Today, AI/ML model training workloads largely run on high-powered CPUs and GPUs in data centers, where they draw large amounts of power and leverage advanced cooling to perform the many trillions of calculations needed to train these models.
