Abstract
With the gradual breakdown of Dennard scaling and the slowdown of Moore’s Law, the traditional paradigm of improving performance and energy efficiency solely through transistor miniaturization has become unsustainable. To address these challenges, researchers have increasingly turned to emerging computing technologies such as heterogeneous many-core architectures, chiplet-based design, and photonic computing, with the aim of sustaining the efficiency and scalability of high-performance computing (HPC) systems in the post-Moore era. This thesis focuses on these three architectural paradigms and investigates their potential to accelerate HPC applications.
The Sunway TaihuLight supercomputer, which held the title of the world’s fastest supercomputer for three consecutive years, comprises 40,960 SW26010 heterogeneous many-core processors connected through a customized communication network. For this many-core platform, this thesis proposes ESA, an efficient sequence alignment algorithm for large-scale biological database searches. ESA adopts a hybrid approach that combines local and global alignment, delivering higher accuracy than existing sequence alignment algorithms. It further incorporates several optimizations, including cache-aware sequence alignment, capacity-aware load balancing, and bandwidth-aware data transfer. ESA fills a critical gap in enabling efficient hybrid biological database search on the Sunway TaihuLight, and experimental results demonstrate that it exhibits nearly linear weak scalability and outstanding strong scalability.
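ESA's hybrid scheme and its Sunway-specific optimizations are detailed in the thesis itself; as background, the sketch below shows only the two classical dynamic programs that any local/global hybrid builds on, Smith–Waterman (local) and Needleman–Wunsch (global), in a single scoring routine. The scoring values are hypothetical and not taken from the thesis.

```python
# Illustrative only: the local (Smith-Waterman) and global (Needleman-Wunsch)
# dynamic programs that a hybrid alignment scheme builds on.
# MATCH/MISMATCH/GAP scores are hypothetical, not from the thesis.

MATCH, MISMATCH, GAP = 2, -1, -2

def align_score(a, b, local=False):
    """Best alignment score; local=True clamps scores at zero (Smith-Waterman)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = 0 if local else i * GAP  # leading gaps are free only locally
    for j in range(1, m + 1):
        dp[0][j] = 0 if local else j * GAP
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # match/mismatch
                           dp[i - 1][j] + GAP,     # gap in b
                           dp[i][j - 1] + GAP)     # gap in a
            if local:
                dp[i][j] = max(dp[i][j], 0)        # restart a local alignment
                best = max(best, dp[i][j])
    return best if local else dp[n][m]
```

The difference between the two modes is visible on padded sequences: `align_score("xxACGTxx", "ACGT")` is penalized for the flanking gaps under global scoring, while `align_score("xxACGTxx", "ACGT", local=True)` recovers the full score of the embedded match.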
Chiplet technology offers a potential pathway past the scalability and performance bottlenecks of monolithic GPUs; however, its performance is often constrained by the bandwidth and latency of the metallic inter-chiplet interconnects. This thesis proposes SEECHIP, a scalable, energy-efficient chiplet-based GPU architecture built on optical links. SEECHIP introduces an optical inter-chiplet network that supports both unicast and broadcast communication and provides matched transmission bandwidth at the sender and receiver ends. In addition, a customized hierarchical memory architecture is proposed that better suits the parallel execution of compute-intensive applications.
In recent years, with the rapid advancement of Deep Neural Networks (DNNs), Application-Specific Integrated Circuits (ASICs) have been widely adopted to accelerate the training and inference of neural networks. This thesis proposes ChipAI, a chiplet-based accelerator that leverages optical interconnect technology to efficiently accelerate DNN inference tasks. ChipAI incorporates an efficient hybrid photonic network that supports effective data sharing both between and within chiplets, enhancing parallel processing capabilities. Furthermore, a flexible dataflow design is introduced that exploits the characteristics of both the ChipAI architecture and DNN models to map DNN layers efficiently onto the hardware.
Finally, photonic computing architectures are increasingly regarded as a key direction for overcoming the bottlenecks of conventional electronic computing. This thesis proposes ROCKET, a novel photonic accelerator based on the Residue Number System (RNS), designed to accelerate DNN training and inference. To this end, a photonic accelerator architecture that employs intensity modulators is developed to minimize the number of computational components while maximizing data reuse, and a hybrid optoelectronic pipeline dataflow is introduced to enhance both parallelism and throughput in the optoelectronic path. The feasibility of the proposed design is further demonstrated through a 4.096-GHz hybrid prototype constructed from FPGA, Radio Frequency (RF), and photonic components.