Abstract
As deep learning expands into emerging domains, its computational demands are pushing traditional electronic accelerators to their limits. Silicon photonics has emerged as a promising technology for accelerating deep learning workloads, but precision remains a challenge due to noise and non-idealities. In this paper, we present BITLUME, a novel photonic computing unit that enables multiplications beyond 8-bit precision through a precision-flexible scheme. We further propose an optimized round-truncation algorithm and data mapping strategy for BITLUME to reduce optoelectronic conversions, enhance data reuse, and maintain computational accuracy. A hybrid optoelectronic architecture integrating BITLUME is developed and validated using a prototype built with FPGA, RF, and photonic components, achieving 3.7× lower end-to-end latency than the A100 GPU on dot-product computation. Simulations of training seven DNN models at FP32 show that BITLUME achieves up to 3.35× and 10.78× speedup, and 1.53× and 4.12× energy savings, compared to the state-of-the-art photonic accelerator and the A100 GPU, respectively.