Template-based MLIR Compiler
The repository contains the sources for building the template-based MLIR compiler and the dependent LLVM sources (commit 5d4927 with some modifications). It compiles and executes MLIR programs consisting of supported operations, similar to mlir-cpu-runner (multiple sample programs are included); on first execution, it generates the required templates and persists them.
Furthermore, the artifact contains the modified sources for LingoDB with the integrated template-based code-generation backend, as well as Polygeist (commit fd4194b) for the conversion of C files to MLIR upstream dialect operations. Sample MLIR programs and scripts for preparing/running the benchmarks from Figures 2-5 are attached.
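As a rough illustration of the intended workflow (a sketch only: the binary path and sample file name below are placeholders, not taken from the artifact):

```bash
# Hypothetical invocation; substitute the actual compiler binary and one of
# the sample MLIR programs shipped with the repository.
./build/bin/mlir-codegen samples/example.mlir   # first run: generates and persists the templates
./build/bin/mlir-codegen samples/example.mlir   # later runs: reuse the persisted templates
```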
Reproducible Artifact
Reproduction
- As all dependent projects (LingoDB, Polygeist, and our approach) require their own LLVM version, reproducing the results requires building the LLVM project three times.
- Experiments from the paper:
  - Microbenchmarks (x86 only)
  - PolyBenchC (x86 only)
  - LingoDB (x86 only)
  - Coremark (x86 and aarch64)
  - SPEC (x86 and aarch64)
Requirements
- Linux operating system on x86 or aarch64
- podman container runtime
- Disk space: 40GB (x86); 20GB (aarch64)
- DRAM: 32GB (x86); 16GB (aarch64)
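A quick way to sanity-check these prerequisites on the host (a minimal sketch; the numbers in the comments mirror the requirements above):

```bash
podman --version   # the podman container runtime must be installed
uname -m           # expect x86_64 or aarch64
df -h .            # ~40GB free disk space on x86, ~20GB on aarch64
free -g            # ~32GB DRAM on x86, ~16GB on aarch64
```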
Setup
Expected folder structure:
src
|- mlir-codegen <- run scripts inside here
| |- spec-data (SPEC only; SPEC benchmark input data)
| |- spec-programs (SPEC only; SPEC benchmark MLIR programs)
| \- results <- results will appear here
|- llvm-project
|- coremark
|- lingo-db
|- mlir-codegen-lingodb
|- Polygeist
\- PolybenchC-4.2.1
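One way to obtain this layout is to place all projects side by side under src (a sketch; the repository URLs are placeholders and must be replaced with the actual artifact sources, which may also be shipped as an archive):

```bash
# Hypothetical clone commands; substitute the real repository URLs.
mkdir src && cd src
git clone <url-of-mlir-codegen> mlir-codegen
git clone <url-of-llvm-project> llvm-project
git clone <url-of-coremark> coremark
git clone <url-of-lingo-db> lingo-db
git clone <url-of-mlir-codegen-lingodb> mlir-codegen-lingodb
git clone <url-of-Polygeist> Polygeist
git clone <url-of-PolybenchC-4.2.1> PolybenchC-4.2.1
```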
- Everything should be run from inside the mlir-codegen directory.
- Build the container for the build and runtime environment:
podman build --squash . -t mlir-codegen-build
- Run the build container and mount the above folder structure:
podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
- Build the dependent projects (you might want to adjust the CMAKE_BUILD_PARALLEL_LEVEL environment variable to control the number of compile jobs; the default is set to 2):
make prepare-x86 / make prepare-aarch64
On some hosts, we encountered a sporadic internal compiler error of gcc while building LLVM; in these cases, rerun the target until it finishes successfully (see the sketch below).
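A minimal sketch combining both notes, run inside the build container (assuming the x86 target; nproc is just one reasonable choice for the job count):

```bash
# Use more parallel build jobs than the default of 2.
export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)

# Rerun the build target until it succeeds, to work around the
# sporadic gcc internal compiler error mentioned above.
until make prepare-x86; do
    echo "build failed (possibly a sporadic gcc ICE), retrying..."
done
```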
- (SPEC only; run on the target machine) Prepare the SPEC programs and data. As we cannot distribute the SPEC benchmarks and data, some manual effort is required. Export the SPEC_BENCHSPEC_FOLDER environment variable to point to the benchspec folder of the unpacked SPEC benchmark data. Run make spec to create and fill the spec-data and spec-programs folders.
On aarch64, the -m64 option must first be removed from benchspec/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh, as gcc disallows this option on aarch64. Both steps are sketched below.
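Putting the SPEC preparation together (a sketch; the SPEC installation path is a placeholder, and the sed command is just one way to drop the -m64 flag on aarch64):

```bash
# Point to the benchspec folder of the unpacked SPEC installation (placeholder path).
export SPEC_BENCHSPEC_FOLDER=/path/to/spec/benchspec

# aarch64 only: remove the -m64 option, which gcc rejects on aarch64.
sed -i 's/-m64//g' "$SPEC_BENCHSPEC_FOLDER/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh"

# Create and fill the spec-data and spec-programs folders.
make spec
```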
Execution
- Run the benchmarks (except SPEC) using the architecture-specific benchmark make targets (make benchmark-x86 / make benchmark-aarch64). The SPEC benchmarks can be run on both architectures using make benchmark-spec. The benchmarks produce output log files in the results directory.
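For example, on an x86 machine (the exact log file names depend on the benchmark):

```bash
make benchmark-x86    # all non-SPEC benchmarks
make benchmark-spec   # optional: SPEC benchmarks
ls results/           # inspect the produced log files
```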
Visualization
- Visualize the results in diagrams similar to the ones presented in the paper using the make viz target. It produces output diagrams as pdf files in the results folder for whatever benchmark result files are present.
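For instance, after at least one benchmark has been run:

```bash
make viz              # render diagrams for all result files present
ls results/*.pdf      # the generated figures
```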
Summary
Reproduce all result diagrams in the results folder (the SPEC commands can be left out to skip reproduction of the SPEC results):
X86
podman build . -t mlir-codegen-build
podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
make prepare-x86
SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
make benchmark-x86
make benchmark-spec
make viz
AArch64
podman build . -t mlir-codegen-build
podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
make prepare-aarch64
SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
make benchmark-aarch64
make benchmark-spec
make viz
Evaluation and expected results
In general, the x86-64 benchmarks in the paper were run on an older server CPU, so the latency of individual instructions as well as memory access costs might differ on modern desktop CPUs, leading to slightly different results. The following applies to both architectures (x86-64 and aarch64): the experiments reproduce the compilation and execution times of the respective benchmarks, visualized as diagrams similar to the ones presented in the paper. In general, there should be a one to two orders of magnitude difference in compilation time between our approach and the LLVM backends, and a slowdown below 3x (at least in the geomean) in execution time.
Microbenchmarks
The experiment additionally reproduces the effect of the individually applied optimizations.
The impact of the individual optimizations might vary heavily depending on the exact CPU: the faster the benchmarks execute, the smaller the difference between the individual stages. Depending on the memory access costs, the effectiveness of the register caching might be reduced (barely any improvement over the template calling convention).
LingoDB
Faster systems might end up with a different speedup factor for the total query runtime, as execution time makes up a larger share of the total for our approach than it does for the others.
PolyBenchC, SPEC and Coremark
Expect an order of magnitude between the three approaches in compilation time, and similar results in the geomean for execution time. As the results are normalized to the execution of LLVM's optimized code-generation backend, results might shift quite a bit for individual benchmarks: faster execution times for the baseline result in comparably higher slowdowns for the other approaches.