Template-based MLIR Compiler

The repository contains the sources for building the template-based MLIR compiler and the dependent LLVM sources (commit 5d4927 with some modifications). The compiler compiles and executes MLIR programs consisting of supported operations, similar to mlir-cpu-runner (multiple sample programs are included); on first execution, it generates the required templates and persists them.
Furthermore, the artifact contains the modified sources for LingoDB with an integrated template-based code-generation backend, and Polygeist (commit fd4194b) for converting C files to MLIR upstream-dialect operations. Sample MLIR programs and scripts for preparing and running the benchmarks from Figures 2-5 are attached.

Reproduction

  • As all dependent projects (LingoDB, Polygeist, and our approach) require their own LLVM version, reproducing the results requires building the LLVM project three times.
  • Experiments from the paper:
    • Microbenchmarks (x86 only)
    • PolyBenchC (x86 only)
    • LingoDB (x86 only)
    • Coremark (x86 and aarch64)
    • SPEC (x86 and aarch64)

Requirements

  • Linux operating system on x86 or aarch64
  • Podman container runtime
  • Disk space: 40 GB (x86); 20 GB (aarch64)
  • DRAM: 32 GB (x86); 16 GB (aarch64)
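
A quick way to check these requirements up front (a generic sketch using standard tools; not part of the artifact's own tooling):

    podman --version   # container runtime present?
    df -h .            # free disk space (40 GB on x86, 20 GB on aarch64)
    free -g            # available DRAM (32 GB on x86, 16 GB on aarch64)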

Setup

The setup expects a folder structure like the following:

src
 |- mlir-codegen <- run scripts inside here
 | |- spec-data (spec only; spec benchmark input data)
 | |- spec-programs (spec only; spec benchmark mlir programs)
 | \- results <- results will appear here
 |- llvm-project
 |- coremark
 |- lingo-db
 |- mlir-codegen-lingodb
 |- Polygeist
 \- PolybenchC-4.2.1
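
The layout can be sanity-checked from inside mlir-codegen (illustrative only; the expected sibling directories are the ones listed above):

    ls ..   # expect: mlir-codegen llvm-project coremark lingo-db mlir-codegen-lingodb Polygeist PolybenchC-4.2.1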
  1. Everything should be run from inside the mlir-codegen directory.
  2. Build the container providing the build and runtime environment:
    podman build --squash . -t mlir-codegen-build
  3. Run the build container and mount the above folder structure:
    podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
  4. Build the dependent projects (you might want to adjust the CMAKE_BUILD_PARALLEL_LEVEL environment variable to control the number of compile jobs --- the default is 2; see the sketch after the note below):
    make prepare-x86 / make prepare-aarch64

On some hosts, we encountered sporadic internal compiler errors of gcc while building LLVM; in these cases, rerun the target until it finishes successfully.
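
For example, to use more parallel jobs and to retry automatically on a sporadic failure (a sketch; $(nproc) jobs assume sufficient DRAM per job, and the retry loop is just one simple way to follow the advice above):

    # more parallel compile jobs than the default of 2
    CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) make prepare-x86
    # or: keep retrying until the LLVM build gets past the sporadic gcc ICE
    until CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) make prepare-x86; do echo "retrying build"; done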

  5. (SPEC only; run on the target machine) Prepare the SPEC programs and data. As we cannot distribute the SPEC benchmarks and data, some manual effort is required: export the SPEC_BENCHSPEC_FOLDER environment variable to point to the benchspec folder of the unpacked SPEC benchmark data, then run make spec to create and fill the spec-data and spec-programs folders (see the combined sketch after the note below).

On aarch64, the -m64 option must first be removed from benchspec/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh, as gcc does not accept the -m64 option on aarch64.
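
Putting this step together (the SPEC path is a placeholder you must adjust; the sed line is only needed on aarch64 and is one possible way to drop the flag):

    export SPEC_BENCHSPEC_FOLDER=/path/to/spec/benchspec   # placeholder path
    # aarch64 only: strip -m64 before running make spec
    sed -i 's/ -m64//g' "$SPEC_BENCHSPEC_FOLDER/CPU/525.x264_r/src/simple-build-ldecod_r-525.sh"
    make spec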

Execution

  1. Run the benchmarks (except SPEC) using the architecture-specific benchmark make targets (make benchmark-x86 / make benchmark-aarch64). The SPEC benchmarks can be run on both architectures using make benchmark-spec. The benchmarks produce output log files in the results directory (see the sketch below).
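
For example, on x86 (assuming the results folder from the setup section above):

    make benchmark-x86
    make benchmark-spec   # optional; requires the SPEC preparation step
    ls results/           # the produced benchmark log files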

Visualization

  1. Visualize the results as diagrams similar to the ones presented in the paper using the make viz target. It produces output diagrams as PDF files in the results folder for whatever benchmark result files are present (see below).
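
For example (assuming benchmark log files already exist in results):

    make viz
    ls results/*.pdf   # the generated diagrams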

Summary

Reproduce all result diagrams in the results folder (the SPEC commands can be left out to skip reproducing the SPEC results):

x86

    podman build . -t mlir-codegen-build
    podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
    make prepare-x86
    SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
    make benchmark-x86
    make benchmark-spec
    make viz

aarch64

    podman build . -t mlir-codegen-build
    podman run -it --rm --mount type=bind,source=${PWD}/..,target=/src mlir-codegen-build bash
    make prepare-aarch64
    SPEC_BENCHSPEC_FOLDER=[...]/benchspec make spec
    make benchmark-aarch64
    make benchmark-spec
    make viz

Evaluation and expected results

In general, the x86-64 benchmarks in the paper were run on an older server CPU, so the latency of individual instructions as well as memory-access costs may differ on modern desktop CPUs and thus yield slightly different results. The following applies to both architectures, x86-64 and aarch64: the experiments reproduce the compilation and execution times of the respective benchmarks, visualized as diagrams similar to the ones presented in the paper. In general, there should be a one to two orders of magnitude difference in compilation time between our approach and the LLVM backends, and a slowdown below 3x (at least in the geomean) for execution time.
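
As a reference for checking the geomean claim against individual numbers, the geometric mean of a column of slowdown factors can be computed along these lines (a generic awk sketch; slowdowns.txt is a hypothetical file with one factor per line, not something the artifact produces):

    awk '{ s += log($1); n++ } END { printf "geomean: %.2f\n", exp(s / n) }' slowdowns.txt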

Microbenchmarks

The experiment additionally reproduces the effect of the individual applied optimizations.
The impact of the individual optimizations can vary heavily depending on the exact CPU: the faster the benchmarks execute, the smaller the difference between the individual stages. Depending on memory-access costs, the effectiveness of the register caching might be reduced (barely any improvement over the template calling convention).

LingoDB

Faster systems might end up with a different speedup factor for total query runtime, as execution time makes up a larger share of the total for our approach than it does for the others.

PolyBenchC, SPEC, and Coremark

Expect an order of magnitude between the three approaches in compilation time, and similar geomean results for execution time. As the results are normalized to the execution of LLVM's optimized code-generation backend, results might shift quite a bit for individual benchmarks: a faster execution time for the baseline yields comparably high slowdowns for the other approaches.
