As more cloud vendors demand system-specific optimizations, custom chip designs have become popular, and the infrastructure chip market has grown more interesting as new players enter. Since the beginning of this year, MeitY (India's Ministry of Electronics and Information Technology), SiPearl (a French chip start-up), and ETRI (Korea's Electronics and Telecommunications Research Institute) have all announced the development of products based on Neoverse V1.
Domestic manufacturers catch up
A group of domestic system makers, including Tencent and Alibaba, are accelerating their development of chips for Arm-based servers.
Victor Huang, director of Tencent’s special testing technology center, said: “In 2020, Tencent and Arm officially signed a cooperation agreement, hoping to accelerate the evaluation and adaptation of Arm Neoverse technology through cooperation. Later, we found through the TencentBench test framework that, thanks to more scalable CPU cores, Arm servers deliver stronger performance than traditional servers. It is worth mentioning that their advantages in AI inference and image processing are clear.”
Kingsum Chow, chief engineer of Alibaba, mentioned: “In terms of Arm’s CPU resources, there are two cases to consider in our existing software. Some of our software needs to be recompiled, and some does not. For the software that does not need recompiling, we only need to run the Java applications on the JVM (Java Virtual Machine). In this regard, a year ago, we worked with Arm staff to improve the performance of the JVM. Over the past year, we went from JDK8 to JDK11 and, through OpenJDK and Alibaba Dragonwell (a distribution of OpenJDK), improved the performance of some of our existing Java applications by 50%.”
Chris Bergey, senior vice president and general manager of Arm’s Infrastructure Business Unit, said: “Tencent continues to invest in Arm-based hardware testing and software support, and their hardware testing has shown excellent results in terms of performance and performance per watt. On the software side, they support both compiled and interpreted codebases and the microservice frameworks that power those codebases.”
Regarding Alibaba’s cooperation, Bergey said: “Java is a critical workload for Alibaba, and their engineers have written more than one billion lines of Java code. Alibaba and Arm are continuing to collaborate on the analysis and debugging of Java workloads.”
Demystifying the new Neoverse roadmap
Because so many customers have already started developing Neoverse-based products, Arm has begun disclosing its detailed product roadmap at a faster pace.
A few days ago, at Arm’s annual technology day, Arm introduced the Arm Neoverse V1 and Neoverse N2 platforms in detail, along with the Arm Neoverse CMN-700 mesh interconnect technology.
CMN-700 Interconnect Technology
Arm CMN-700 interconnect technology is a key element in building V1 and N2, Bergey said. Building on the CMN-600, it improves on every front, from the number of cores and cache size to the number and type of attached memory and IO devices. DDR5 and HBM are supported. Additionally, CXL support has been added for memory expansion and coherency. A number of multi-chip features have also been added to improve performance and to optimize both traditional multi-socket designs and new chiplet or multi-chip integration. “Multi-chip integration will provide new opportunities to break through traditional silicon reticle limits and provide greater flexibility for tightly coupled heterogeneous computing,” Bergey said.
Neoverse V1: SVE support added
“Neoverse V1 was designed with performance first, so we widened the microarchitecture and increased the depth of buffers and queues to accommodate more instructions in flight,” Bergey said.
Compared with N1, Neoverse V1 delivers a 50% performance improvement, 1.8 times the vector workload performance, and 4 times the machine learning workload performance. Neoverse V1 is also the first platform in Arm’s new performance-first computing series. Neoverse V1 gives chip partners the flexibility to build computing power for applications that are highly dependent on CPU performance and bandwidth, and provides them with SoC design flexibility.
With performance in mind, Neoverse V1’s design philosophy produces the widest microarchitecture Arm has ever designed, accommodating more in-flight instructions and supporting markets such as high-performance and exascale computing. Neoverse V1’s wide and deep architecture, coupled with SVE, will give it a lead in single-core performance, extend code lifetime through SVE, and provide chip designers with implementation flexibility. Bergey explained: “With NEON, Arm’s existing SIMD instruction set, some code is difficult to vectorize, whereas SVE can take the same code directly and auto-vectorize it very well. Compared with NEON, it can be almost 3.5 times faster.”
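The "extended code lifetime" claim rests on SVE being vector-length agnostic: the same loop runs unchanged on hardware with different vector widths, with a per-lane predicate masking off the loop tail. The sketch below is a toy Python model of that idea, not real SVE code. The function names `whilelt` (loosely modeled on SVE's WHILELT predicate instruction) and `vla_add`, along with the lane counts, are illustrative assumptions rather than anything taken from Arm's materials.

```python
def whilelt(start, n, vl):
    """Toy loop predicate: lane i is active while start + i < n."""
    return [start + lane < n for lane in range(vl)]


def vla_add(a, b, vl):
    """Add two equal-length lists in chunks of `vl` lanes.

    `vl` stands in for the hardware vector length (in lanes). The loop
    body never hard-codes it, so the same code works for any width.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vl < 1:
        raise ValueError("vector length must be at least one lane")

    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        pred = whilelt(i, n, vl)  # masks off lanes past the end of the data
        for lane, active in enumerate(pred):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += vl  # advance by however many lanes this "hardware" has
    return out


if __name__ == "__main__":
    a = list(range(10))
    b = [10 * x for x in a]
    # 4, 8 and 16 lanes would correspond to 128-, 256- and 512-bit
    # vectors of 32-bit elements (an assumption for illustration).
    for vl in (4, 8, 16):
        print(vl, vla_add(a, b, vl))
```

Because the predicate handles the tail, no separate scalar cleanup loop is needed, and the result is identical whichever lane count is passed in. A fixed-width SIMD loop, by contrast, would typically be written for one vector width and need rewriting or recompiling to exploit a wider one.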
Neoverse N2: based on the Armv9 architecture
The Neoverse N2 platform is the first platform based on the Armv9 architecture, with comprehensive improvements in security, power efficiency and performance, and it lays the groundwork for core infrastructure. “The N2 efficiency profile can be more competitive on threads per socket, while providing dedicated cores rather than shared threads,” Bergey said.
Compared with the N1, the Neoverse N2 has a 40% increase in single-thread performance while maintaining the same level of power and area efficiency. Neoverse N2 is scalable, spanning from high-throughput computing to power- and size-constrained edge and 5G application scenarios, and it outperforms N1 in these applications: for example, a 1.3x boost in the cloud and 1.2x faster DPDK packet processing in 5G and edge applications.
The Neoverse N2 platform provides excellent single-thread performance and industry-leading performance per watt that reduces TCO for users. Neoverse N2 is the first platform to feature SVE2, a feature that can bring huge improvements in cloud-to-edge performance efficiency. In a wide range of application scenarios such as machine learning, digital signal processing, multimedia, and 5G, SVE2 not only brings significant performance improvements, but also brings the advantages of programming simplicity and portability that SVE has.
Bergey said: “SVE2 brings the performance, programming simplicity and portability associated with SVE to a wider range of fields and scenarios. SVE was intended to accelerate HPC, while SVE2 extends it to ML, DSP, multimedia, 5G and other application scenarios. It combines NEON’s rich data manipulation, logic and arithmetic instruction set with SVE’s auto-vectorization and scalability features.”
Performance comparison between Neoverse and competitors
Bergey concluded: “The V1 platform will be a revolution for HPC, and the N2 will be the best solution for cloud-to-edge application scenarios.”