Try our mobile app

CEVA Redefines High Performance AI/ML Processing for Edge AI and Edge Compute Devices with its NeuPro-M Heterogeneous and Secure Processor Architecture

Published: 2022-01-06 12:00:00 ET
<<<  go to CEVA company page

- 3rd generation NeuPro AI/ML architecture offers scalable performance of 20 to 1,200 TOPS at SoC and Chiplet levels, lowers memory bandwidth by 6X

- Targets broad use of AI/ML in automotive, industrial, 5G networks and handsets, surveillance cameras, and Edge Compute

LAS VEGAS, Jan. 6, 2022 /PRNewswire/-- Consumer Electronics Show  CEVA, Inc. (NASDAQ: CEVA), the leading licensor of wireless connectivity and smart sensing technologies and integrated IP solutions, today announced NeuPro-M, its latest generation processor architecture for artificial intelligence and machine learning (AI/ML) inference workloads. Targeting the broad markets of Edge AI and Edge Compute, NeuPro-M is a self-contained heterogeneous architecture that is composed of multiple specialized co-processors and configurable hardware accelerators that seamlessly and simultaneously process diverse workloads of Deep Neural Networks, boosting performance by 5-15X compared to its predecessor. An industry first, NeuPro-M supports both system-on-chip (SoC) as well as Heterogeneous SoC (HSoC) scalability to achieve up to 1,200 TOPS and offers optional robust secure boot and end-to-end data privacy.

NeuPro-M is the latest generation processor architecture from CEVA for artificial intelligence and machine learning (AI/ML) inference workloads. Targeting the broad markets of Edge AI and Edge Compute, NeuPro-M is a self-contained heterogeneous architecture that is composed of multiple specialized co-processors and configurable hardware accelerators that seamlessly and simultaneously process diverse workloads of Deep Neural Networks, boosting performance by 5-15X compared to its predecessor.

NeuPro–M compliant processors initially include the following pre-configured cores:

  • NPM11 – single NeuPro-M engine, up to 20 TOPS at 1.25GHz
  • NPM18 – eight NeuPro-M engines, up to 160 TOPS at 1.25GHz

Illustrating its leading-edge performance, a single NPM11 core, when processing a ResNet50 convolutional neural network, achieves a 5X performance increase and 6X memory bandwidth reduction versus its predecessor, which results in exceptional power efficiency of up to 24 TOPS per watt.

Built on the success of its' predecessors, NeuPro-M is capable of processing all known neural network architectures, as well as integrated native support for next-generation networks like transformers, 3D convolution, self-attention and all types of recurrent neural networks. NeuPro-M has been optimized to process more than 250 neural networks, more than 450 AI kernels and more than 50 algorithms. The embedded vector processing unit (VPU) ensures future proof software-based support of new neural network topologies and new advances in AI workloads. Furthermore, the CDNN offline compression tool can increase the FPS/Watt of the NeuPro-M by a factor of 5-10X for common benchmarks, with very minimal impact on accuracy.

Ran Snir, Vice President and General Manager of the Vision Business Unit at CEVA, commented: "The artificial intelligence and machine learning processing requirements of edge AI and edge compute are growing at an incredible rate, as more and more data is generated and sensor-related software workloads continue to migrate to neural networks for better performance and efficiencies. With the power budget remaining the same for these devices, we need to find new and innovative methods of utilizing AI at the edge in these increasingly sophisticated systems. NeuPro-M is designed on the back of our extensive experience deploying AI processors and accelerators in millions of devices, from drones to security cameras, smartphones and automotive systems. Its innovative, distributed architecture and shared memory system controllers reduces bandwidth and latency to an absolute minimum and provides superb overall utilization and power efficiency. With the ability to connect multiple NeuPro-M compliant cores in a SoC or Chiplet to address the most demanding AI workloads, our customers can take their smart edge processor designs to the next level."

The NeuPro-M heterogenic architecture is composed of function-specific co-processors and load balancing mechanisms that are the main contributors to the huge leap in performance and efficiency compared to its predecessor. By distributing control functions to local controllers and implementing local memory resources in a hierarchical manner, the NeuPro-M achieves data flow flexibility that result in more than 90% utilization and protects against data starvation of the different co-processors and accelerators at any given time. The optimal load balancing is obtained by practicing various data flow schemes that are adopted to the specific network, the desired bandwidth, the available memory and the target performance, by the CDNN framework.

NeuPro-M architecture highlights include:

  • Main grid array consisting of 4K MACs (Multiply And Accumulates), with mixed precision of 2-16 bits
  • Winograd transform engine for weights and activations, reducing convolution time by 2X and allowing 8-bit convolution processing with