RTL to GDS-II Implementation of Systolic Array-Based Matrix Multiplication and Reduction on Elastic CGRAs
Harshith U
Department of Electronics and Communication Engineering
BMS College of Engineering
Bengaluru, India
harshith.ec22@bmsce.ac.in
Veena M.B.
Department of Electronics and Communication Engineering
BMS College of Engineering
Bengaluru, India
veenamb.ece@bmsce.ac.in
Abstract—This paper presents the RTL-to-GDSII implementation of a systolic array-based matrix multiplication and reduction architecture on an Elastic Coarse-Grained Reconfigurable Architecture (CGRA). The proposed design focuses on achieving scalable parallel computation using a regular systolic dataflow structure integrated with reconfigurable processing elements. The architecture enables efficient matrix computation through pipelined multiply-accumulate operations while also supporting reduction functionality for high-throughput applications. The complete hardware architecture is modeled using Verilog/SystemVerilog and verified through functional simulation. A modular RTL design methodology is adopted to improve scalability and simplify hardware integration. The design is further implemented using a complete ASIC physical design flow including synthesis, floorplanning, power planning, placement, clock tree synthesis, routing, and physical verification. The implementation is carried out using SCL 180 nm standard-cell technology. Power rings, stripes, clock distribution, and routing optimization techniques are incorporated to achieve a physically realizable layout. Post-layout analysis validates the timing and routing feasibility of the proposed architecture. The regular structure of the systolic array and Elastic CGRA framework makes the design suitable for high-performance parallel processing applications such as signal processing, machine learning acceleration, and scientific computing.
Index Terms—Systolic Array, Elastic CGRA, RTL-to-GDSII, Matrix Multiplication, Physical Design, ASIC, VLSI, Parallel Processing