Abstract
With the substantial increase in computing workload for deep learning applications, traditional electronic accelerators have been pushed close to their limits. Silicon photonics has emerged as a promising technology for both communication and computation in accelerating deep learning workloads. Despite its great potential, the design of photonic deep learning accelerators faces several challenges, including implementing optical backpropagation, ensuring the scalability of photonic chips, and achieving high-precision photonic computation, which are not yet extensively addressed in existing work.
To tackle the challenges of implementing optical backpropagation, we propose a general-purpose photonic gradient descent unit called STADIA, which enables the multiplication, accumulation, and subtraction operations in gradient computation by utilizing mature photonic devices such as Mach-Zehnder interferometers (MZIs) and micro-ring resonators (MRRs). A silicon photonic accelerator for backpropagation, built on the STADIA unit, is designed to enable high-performance DNN training, offering substantial reductions in training latency and enhanced energy efficiency for the backpropagation process. To exploit parallel computing, we develop a dataflow for DNN training that utilizes wavelength-division multiplexing (WDM). Simulation results show that the proposed architecture can reduce latency by 9.7x and improve energy efficiency by 147.2x compared with state-of-the-art optical-memristor-based backpropagation accelerators.
The integration of photonic computing cores in a many-core architecture remains unexplored. To address this, we propose a novel scalable chiplet-based photonic accelerator named SIPHON, which leverages both photonic computing and communication for ultrafast and energy-efficient DNN training and inference. A photonic computing unit (PCU) is designed to perform multiply-accumulate operations and gradient computations in both forward and backward propagation. A photonic interconnection is introduced to meet the extensive wavelength and bandwidth requirements of photonic computing within the PCU. Additionally, a dataflow is developed to enable efficient data reuse and parallel computing by utilizing multiple communication modes. With SIPHON, the frequency of optoelectronic conversions in our optical interconnection is significantly reduced. Simulation results show that SIPHON can improve time efficiency by 2.3x and energy efficiency by 6.2x in the training pass compared with the existing optical accelerator.
Finally, the computational precision of photonic computing systems is limited by noise, loss, and non-ideal components, posing challenges for high-precision DNN training. To address this, we present BITLUME, a novel photonic computing unit that enables photonic multiplications beyond 8-bit precision through a precision-flexible algorithm. We propose a data mapping strategy for BITLUME that reduces the number of optoelectronic conversions and improves energy efficiency by maximizing data reuse. Furthermore, we develop a hybrid optoelectronic architecture integrating BITLUME and digital components for deep learning acceleration. We validate our designs on an experimental platform built from an FPGA, RF components, and photonic components. Our large-scale simulations show that, compared with the A100 GPU, BITLUME reduces training time by 15.59x while consuming 6.83x less energy.
This thesis explores the potential of photonic chips and optoelectronic hybrid systems in advancing high-performance computing, particularly for neural networks. While promising results are demonstrated, limitations remain, such as underexplored interconnection topologies and challenges in coordinating electronic and photonic simulation tools. Future research will focus on extending the models to diverse neural networks, integrating pruning techniques with photonic structures for better energy efficiency, and developing all-photonic and general-purpose photonic chips. These efforts aim to address current limitations and enable more scalable and efficient photonic computing platforms.