Alpine and macOS support for LiveHD

Jixin Chen <jchen304 at ucsc dot edu>

mentored by the Micro Architecture at Santa Cruz (MASC) group, SOE, and the Center for Research in Open Source Software (CROSS) at UC Santa Cruz

sponsored by Google Summer of Code

Summer 2021

Final Report

  1. Preparation - Clean up infrastructure, fix build, make CI green again
  2. Switch to rules_hdl - Collaboration with other projects, less burden
  3. Alpine Linux support - Less undefined behavior, better portability, musl ✅
  4. ARM64 support - signed char vs unsigned char, x86 intrinsics 🤨
  5. Port to macOS - Darwin kernel, LLVM toolchain, libc++, BSD utilities
  6. Acknowledgements
  7. Future work

Preparation

When I first dove into the LiveHD project, it did not follow many of the best practices of the open source world. This is not unexpected for a project that started as academic research and currently has a limited number of external users.

My first instinct, upon seeing a long list of commits marked with a red cross ❌ (CI had been failing since January 2021), was that the project's infrastructure needed a cleanup and that CI should be made green again.

Work Done

Commits

Switch to rules_hdl

Bazel is a relatively new build system, primarily developed by Google. Bazel projects (e.g. LiveHD) often need to write their own build rules for non-Bazel dependencies.

rules_hdl aims to consolidate those efforts. It is a set of rules that allows Bazel projects to easily depend on crucial HDL (Hardware Description Language) libraries and tools.

Switching to it lowers the maintenance burden through collaboration, and other members of the HDL community can benefit from our work.

Work Done

Commits

Alpine Linux support

Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc (a minimal C standard library) and BusyBox. It is popular in embedded and container use cases.

However, most Linux distributions, including Debian and Enterprise Linux, use glibc (the GNU C Library). Developers and software distributors often assume that a Linux system has glibc. As a result, prebuilt executables often do not run on Alpine Linux, and source code often relies on glibc-specific behavior.

I had to port not only LiveHD to Alpine Linux but also its dependencies. The most challenging tool to port was the build system (Bazel).
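As an illustration of glibc-specific behavior, here is a minimal sketch built around strerror_r. With _GNU_SOURCE (which g++ defines by default on Linux), glibc exposes a variant that returns a char*, while musl and macOS provide the POSIX variant that returns an int. The report does not say whether LiveHD or its dependencies hit this particular case; the helper names portable_strerror and detail::pick are hypothetical.

```cpp
#include <cerrno>
#include <string>
#include <string.h>  // strerror_r (POSIX, not ISO C)

namespace detail {
// Chosen when strerror_r is the POSIX variant (musl, macOS):
// it returns 0 on success and writes the message into buf.
inline const char* pick(int rc, const char* buf) {
  return rc == 0 ? buf : "unknown error";
}
// Chosen when strerror_r is the GNU variant (glibc with _GNU_SOURCE):
// it returns a pointer to the message, which may or may not be buf.
inline const char* pick(const char* msg, const char* /*buf*/) {
  return msg;
}
}  // namespace detail

// Hypothetical helper: thread-safe error message that compiles against
// either variant, because overload resolution selects the matching pick().
inline std::string portable_strerror(int errnum) {
  char buf[256] = {0};
  return detail::pick(strerror_r(errnum, buf, sizeof buf), buf);
}
```

Code that stores the result of strerror_r directly in a char* compiles only against the GNU variant, which is the kind of hidden glibc assumption that surfaces when building on musl.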

Work Done

Commits

ARM64 support

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. - 3.1.2.5, ANSI C Rationale

x86 and ARM64 platforms diverge on the signedness of plain char: on Linux, x86 treats char as signed while ARM64 treats it as unsigned. Developers who only use x86 machines commonly make the unwarranted assumption that char is signed. To support ARM64, those instances had to be fixed.
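A minimal sketch of how this kind of assumption shows up and how it can be avoided. The function names are hypothetical and are not taken from LiveHD.

```cpp
#include <climits>

// True when plain char is signed on the target being compiled for.
constexpr bool plain_char_is_signed = (CHAR_MIN < 0);

// Fragile: the result depends on the signedness of plain char.
// For the byte 0xFF this yields -1 where char is signed
// and 255 where char is unsigned.
inline int byte_value_nonportable(char c) {
  return c;
}

// Portable: go through unsigned char first, so the result is
// always in the range 0..255 regardless of the target ABI.
inline int byte_value(char c) {
  return static_cast<unsigned char>(c);
}
```

The same idea applies to comparisons such as c < 0 or to using a char as an array index: converting to unsigned char (or using signed char explicitly) states the intent and removes the dependence on the target.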

Additionally, unlike x86, ARM architectures have weaker support for unaligned access, which can cause exceptions in some cases. Moreover, dereferencing a misaligned pointer is undefined behavior in C/C++, and the pointer casts typically used to do so often violate the strict aliasing rule as well.
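One common way to avoid both problems is to copy the bytes with std::memcpy instead of casting the pointer. This is a sketch of the general technique, not necessarily the fix applied in LiveHD; load_u32 is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value starting at any byte address.
// A cast such as *reinterpret_cast<const std::uint32_t*>(p) would be
// undefined behavior when p is misaligned; memcpy has no alignment
// requirement, and compilers typically turn it into a single load
// on targets where that is safe.
inline std::uint32_t load_u32(const unsigned char* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

The value is read in the host's byte order, exactly as the cast would have done, so callers that relied on the old behavior see the same results.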

The last and most obvious problem is x86 intrinsics. Intrinsics are thin wrappers around small pieces of assembly code that let developers use those instructions directly, but in a more elegant way. Because they are architecture-specific, they have to be replaced with pure C implementations on non-x86 platforms.
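The replacement pattern can be sketched as follows, assuming a hypothetical helper add4 that adds two groups of four 32-bit integers. The SSE2 intrinsics shown are real, but whether LiveHD used these particular ones is not stated in this report.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics, available on x86 only
#endif

// Hypothetical helper: element-wise sum of two groups of four integers.
inline std::array<std::int32_t, 4> add4(const std::array<std::int32_t, 4>& a,
                                        const std::array<std::int32_t, 4>& b) {
  std::array<std::int32_t, 4> out{};
#if defined(__SSE2__)
  // x86 path: one 128-bit add; unaligned loads and stores are used
  // because std::array does not guarantee 16-byte alignment.
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()),
                   _mm_add_epi32(va, vb));
#else
  // Portable path: plain loop; the addition is done on unsigned values
  // so that it wraps around like the intrinsic does.
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) +
                                       static_cast<std::uint32_t>(b[i]));
  }
#endif
  return out;
}
```

Keeping both paths behind one function means callers do not need any architecture checks of their own, and the portable path doubles as a reference for testing the intrinsic one.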

Work Done

Commits

Port to macOS

macOS is a different operating system: it uses the Darwin kernel, the LLVM toolchain, libc++, and BSD utilities.

The first hurdle was a series of compile errors of the form error: no member named ??? in namespace 'std'. It turns out that each version of the C++ standard not only adds features but also removes existing ones. The GCC developers chose to keep implementing the removed features whenever possible, presumably for better compatibility, whereas LLVM chose to be stricter. LiveHD uses C++17, but it still contained uses of deprecated and removed standard library functions.
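For illustration, here are three standard library features that were removed in C++17, each with a replacement that compiles with both GCC and LLVM. The report does not list which removed functions LiveHD actually used, so these are only representative examples and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

// Before: std::random_shuffle(v.begin(), v.end());   (removed in C++17)
// After:  std::shuffle with an explicit random number engine.
inline void shuffle_in_place(std::vector<int>& v, unsigned seed) {
  std::mt19937 rng(seed);
  std::shuffle(v.begin(), v.end(), rng);
}

// Before: std::auto_ptr<int> p(new int(42));         (removed in C++17)
// After:  std::unique_ptr, created with std::make_unique.
inline std::unique_ptr<int> make_answer() {
  return std::make_unique<int>(42);
}

// Before: adapting a function with std::ptr_fun      (removed in C++17)
// After:  a lambda passed directly to the algorithm.
inline std::ptrdiff_t count_non_negative(const std::vector<int>& v) {
  return std::count_if(v.begin(), v.end(), [](int x) { return x >= 0; });
}
```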

After I finally got LiveHD to compile, most tests failed, and the logs suggested that no operation had been performed. Breakpoints in the command-line argument parsing functions showed that invalid arguments were being passed to the main executable 🤨. So I added some echo statements before the test scripts' invocations of the executable: the arguments were reaching the executable unparsed! After more digging, I noticed that getopt produces different results on macOS. A quick search revealed that BSD's getopt behaves differently from the GNU one.
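The report does not say which difference was involved, or whether it concerned the getopt command-line utility or the getopt library function. As one well-known example for the library function: GNU getopt by default reorders argv so that options placed after positional arguments are still recognized, while BSD getopt stops at the first non-option argument. The sketch below uses a hypothetical tool with a single -v flag and is only portable when options come first.

```cpp
#include <unistd.h>  // getopt, optind (POSIX)

struct ParsedArgs {
  bool verbose = false;
  int first_positional = 0;  // index in argv of the first non-option argument
};

// Hypothetical parser, intended to be called once per process.
// Given  prog -v input.v  both GNU and BSD getopt report -v and leave
// optind pointing at input.v.
// Given  prog input.v -v  GNU getopt (by default) permutes argv and still
// reports -v, whereas BSD getopt stops at input.v and never sees -v.
inline ParsedArgs parse_args(int argc, char* const argv[]) {
  ParsedArgs out;
  int opt;
  while ((opt = getopt(argc, argv, "v")) != -1) {
    if (opt == 'v') {
      out.verbose = true;
    }
  }
  out.first_positional = optind;
  return out;
}
```

A script or program that must behave the same on Linux and macOS can therefore either always pass options before positional arguments or avoid relying on getopt's reordering altogether.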

With that trouble gone, only one test failure remained. Oddly, similar tests in the same group did not fail, and the differences between the results did not make sense. I had to put breakpoints around the crucial functions and switch back and forth between macOS and Linux to compare intermediate results. It turned out that the algorithm behind std::sort is different in libc++, and one of the comparison ("less") functions in LiveHD made incorrect assumptions about the sorting algorithm.
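The report does not spell out the exact assumption. A typical case, sketched here with a hypothetical Node type, is a comparator that looks at only one key while the surrounding code depends on the relative order of elements that compare equal. std::sort makes no promise about that order, so it can differ between standard library implementations.

```cpp
#include <algorithm>
#include <vector>

struct Node {  // hypothetical type, not taken from LiveHD
  int level;
  int id;
};

// Fragile: nodes with the same level compare equal, so their final
// order after std::sort depends on the library's sorting algorithm.
inline bool less_by_level(const Node& a, const Node& b) {
  return a.level < b.level;
}

// Robust: the tie-breaker on id gives every pair of distinct nodes a
// definite order, so any conforming std::sort yields the same result.
inline bool less_by_level_then_id(const Node& a, const Node& b) {
  if (a.level != b.level) {
    return a.level < b.level;
  }
  return a.id < b.id;
}

// Returns the ids of the nodes in sorted order.
inline std::vector<int> sorted_ids(std::vector<Node> nodes) {
  std::sort(nodes.begin(), nodes.end(), less_by_level_then_id);
  std::vector<int> ids;
  ids.reserve(nodes.size());
  for (const Node& n : nodes) {
    ids.push_back(n.id);
  }
  return ids;
}
```

When the original input order of equal elements must be preserved instead, std::stable_sort with the single-key comparator is the alternative.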

Work Done

Commits

Acknowledgements

I would like to thank the members of the MASC group and the CROSS staff, especially Professor Renau, for their guidance.

This work is funded by Google, via the Summer of Code program. Google's commitment to Open Source is much appreciated!

Future work

I intend to continue working in the MASC lab, as an undergraduate student and (possibly) in the future as a graduate student. Currently, I am working on other tasks in LiveHD, some assigned to me by Professor Renau.

Connect with me

Email is the preferred method of communication. If you are also a Chinese SOE alum and prefer WeChat, please visit the Bay Area alumni group (湾区校友群) and my profile.