Contact: Dr Yiren Zhao ([email protected]), Dr Jianyi Cheng ([email protected])
A major drawback of FPGAs is that regular operations, such as matrix multiplications (MMs), are already well optimized on other devices such as GPUs and vector processors. Recently, there has been interest in exploring accelerator architectures on AMD Versal ACAP devices. A Versal device contains an array of vector processors, named AI Engines (AIEs). These AIEs are specialized for accelerating SIMD operations, such as MMs at a given set of precisions, and are connected by reconfigurable switches that support custom data parallelism. Recent work has shown significant performance benefits when accelerating transformer models on Versal devices, compared to FPGAs and GPUs. However, AIEs are often hard to program due to the large design space spanning spatial memory mapping and data parallelism.
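To make the mapping problem concrete, here is a minimal sketch, in plain Python with illustrative tile sizes, of how an MM can be partitioned into tiles and distributed over a 2-D array of processing elements such as AIEs. Each (ti, tj) tile corresponds to one PE in the spatial array; the function and tile size are hypothetical, not an actual AIE programming interface.

```python
def tiled_matmul(A, B, tile=2):
    """Multiply an MxK matrix A by a KxN matrix B tile by tile.

    Each (ti, tj) output tile models the work of one processing element;
    the tk loop models the reduction that PE performs over streamed-in
    tiles of A and B. Choosing how tiles map to PEs and memories is
    exactly the design space that makes AIEs hard to program by hand.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    for ti in range(0, M, tile):
        for tj in range(0, N, tile):
            for tk in range(0, K, tile):
                for i in range(ti, min(ti + tile, M)):
                    for j in range(tj, min(tj + tile, N)):
                        for k in range(tk, min(tk + tile, K)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```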
Triton is a language and compiler for parallel programming. It provides a Python-based programming environment for productively writing custom DNN compute kernels that run at maximal throughput on modern GPU hardware. This project aims to reduce the programming burden for AIEs by targeting them from the Triton interface.
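To illustrate the programming model the project would build on: a Triton kernel is a "block program", where each program instance processes one tile of the data and a grid of instances covers the whole tensor. The sketch below emulates that model in plain Python; the real Triton API instead uses `@triton.jit`, `tl.program_id`, `tl.load`, and `tl.store`, so the names and block size here are purely illustrative.

```python
import math

BLOCK = 4  # illustrative block size; real Triton kernels choose this at launch

def add_kernel(x, y, out, n, pid):
    """One 'program instance': adds one BLOCK-sized tile of the inputs,
    as a Triton kernel body would via masked tl.load/tl.store."""
    start = pid * BLOCK
    for off in range(start, min(start + BLOCK, n)):  # mask the ragged tail
        out[off] = x[off] + y[off]

def launch_add(x, y):
    """Launch a 1-D grid of program instances over the inputs,
    mirroring Triton's grid of cdiv(n, BLOCK) programs."""
    n = len(x)
    out = [0] * n
    for pid in range(math.ceil(n / BLOCK)):
        add_kernel(x, y, out, n, pid)
    return out
```

Mapping this block-program abstraction onto AIE tiles and their interconnect, rather than GPU thread blocks, is the core of the proposed work.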
Note: This project would be challenging and would require good software engineering skills.
https://triton-lang.org/main/index.html
https://github.com/Xilinx/mlir-air
The project would best suit a student who: