New-Tech Europe | June 2017

Embedded Solutions Special Edition

The Vision C5 DSP is targeted at vision, lidar, voice, and radar applications in the mobile, surveillance, automotive, drone, and wearable markets. It has a computational capacity of 1 TMAC/s (one trillion multiply-accumulate operations per second). It is not an accelerator; it is a standalone, self-contained neural network DSP. This is important because accelerators handle only part of the problem, requiring a lot of processing power on whatever other processor is in use to do the rest. For example, an accelerator may handle only the convolutional (first) step of a CNN, which not only offloads just part of the computation but also means that a lot of bandwidth is used shifting data back and forth. The Vision C5 DSP completely offloads all the processing and minimizes the data movement, which is where much of the power is actually consumed.

Neural network applications are typically divided into two phases, training and inference. Training is normally done in the cloud and involves processing large datasets, requiring 10¹⁶ to 10²² MACs per dataset. Inference usually runs closer to the edge of the network, in the drone or the car, for example, and each image requires 10⁸ to 10¹² MACs. The biggest issue, though, is power. It is this inference phase of using neural networks that the Vision C5 DSP is focused on.
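To put those orders of magnitude in perspective, the short Python sketch below counts the multiply-accumulates in individual convolution layers and relates the total to the 1 TMAC/s figure. The layer shapes and the thirty-layer multiplier are illustrative assumptions for a 2017-era image classifier, not figures taken from the article.

```python
# Back-of-the-envelope MAC counting for CNN inference.
# Layer shapes below are illustrative assumptions, not Vision C5 specifics.

def conv_layer_macs(out_h, out_w, out_ch, in_ch, k):
    """MACs for one convolution layer: each output element needs
    in_ch * k * k multiply-accumulates."""
    return out_h * out_w * out_ch * in_ch * k * k

# An early layer: 112x112x64 outputs from a 3-channel image with 7x7 kernels.
early_layer = conv_layer_macs(112, 112, 64, 3, 7)      # ~1.2e8 MACs

# A deeper layer: 14x14x512 outputs from 512 input channels with 3x3 kernels.
deep_layer = conv_layer_macs(14, 14, 512, 512, 3)      # ~4.6e8 MACs

# Crude stand-in for a whole network: thirty such deep layers.
total_macs = 30 * deep_layer                           # ~1.4e10, inside 10^8 to 10^12

print(f"one deep layer:      {deep_layer:.2e} MACs")
print(f"rough network total: {total_macs:.2e} MACs")

# At 1 TMAC/s (1e12 MACs per second), that inference load takes:
print(f"time per image at 1 TMAC/s: {total_macs / 1e12 * 1000:.1f} ms")
```

The 1 TMAC/s figure itself is consistent with the processor's 1024 8-bit MACs (listed below) operating every cycle at a clock rate on the order of 1 GHz, although the article does not state a clock frequency.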

[Figure: Vision C5 DSP block diagram]

The Vision C5 DSP neural network processor is:

• A complete, standalone DSP that runs all layers of a CNN (convolution, fully connected, normalization, pooling, and so on)
• A DSP for the fast-changing neural network field: programmable and future-proof
• 1 TMAC/s (one trillion multiply-accumulates per second) of performance
• 1024 8-bit MACs or 512 16-bit MACs, for exceptional performance at both resolutions
• A 128-way, 8-bit SIMD or 64-way, 16-bit SIMD VLIW architecture
• Not a hardware accelerator to pair with a vision DSP, but a dedicated neural-network-optimized processor
• Architected for multi-processor designs, scaling to multi-TMAC/s solutions
• Supported by the same proven software tool set as the Vision P5 and P6 DSPs
• Less than 1 mm² in 16nm

[Figure: Vision C5 software flow]

Wonderful hardware is not a lot of use if it is too difficult to program. Standard open-source CNN frameworks such as Caffe and TensorFlow are the most common way to develop in this space, and networks described in them flow cleanly into the CNN mapper and then all the way down to the Vision C5 DSP.
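To show what the framework side of that flow looks like, here is a minimal, hypothetical network definition using TensorFlow's Keras API, containing the same layer types listed above: convolution, normalization, pooling, and fully connected. The layer sizes and the ten-class output are illustrative assumptions, and the Cadence CNN mapper's own interface is not shown; this is only the kind of description such a tool would consume.

```python
# A minimal CNN written against TensorFlow's Keras API, purely as an
# illustration of the layer types the article lists; sizes are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(224, 224, 3)),   # convolution on an RGB image
    layers.BatchNormalization(),                # normalization
    layers.MaxPooling2D(2),                     # pooling
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),       # fully connected
    layers.Dense(10, activation="softmax"),     # hypothetical 10-class output
])

model.summary()  # prints the layer structure a mapping tool would start from
```

After training in the cloud, a description like this, together with its trained weights, is what would be handed to the mapping step for deployment on the DSP.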

Summary

The Vision C5 DSP is targeted at high-performance CNN applications that require TMAC/s operation. For lower-performance needs, such as the neural nets occasionally required in mobile, the Vision P6 DSP is more appropriate, with performance of up to 200 GMAC/s. For the most demanding applications, multicore versions of the Vision C5 DSP fit the bill.

