In a previous blog post I wrote about bitcode. Embedding it has been possible for quite some time with Apple’s fork of LLVM that ships with Xcode. Recently, Apple upstreamed parts of its implementation, making it possible to do the same with the open source clang compiler.

However, there are some differences between the two implementations. Bitcode embedded with Apple’s version of clang is bundled into a xar archive. Metadata, such as the compiler commands used to create the binary and the dylibs the code was linked against, is kept in the archive’s table of contents. The archive itself is stored in the __LLVM,__bundle section of the Mach-O object.
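
To make the archive layout more concrete, here is a minimal Python sketch that reads a xar header and inflates the table of contents. It assumes the commonly documented xar layout: a 28-byte big-endian header starting with the magic xar!, followed by a zlib-compressed XML table of contents. The function name read_xar_toc is made up for this illustration, and locating the section inside the Mach-O file is left out.

```python
import struct
import zlib

# Big-endian xar header: magic, header size, version,
# compressed TOC length, uncompressed TOC length, checksum algorithm.
XAR_HEADER = struct.Struct(">4sHHQQI")


def read_xar_toc(data: bytes) -> str:
    """Return the XML table of contents of a xar archive held in memory."""
    if len(data) < XAR_HEADER.size:
        raise ValueError("too short to be a xar archive")
    magic, header_size, _version, toc_len, toc_raw_len, _cksum = \
        XAR_HEADER.unpack_from(data)
    if magic != b"xar!":
        raise ValueError("not a xar archive")
    # The compressed table of contents starts right after the header.
    compressed = data[header_size:header_size + toc_len]
    toc = zlib.decompress(compressed)
    if len(toc) != toc_raw_len:
        raise ValueError("unexpected table of contents length")
    return toc.decode("utf-8")
```

The returned XML is where the compiler commands and dylibs mentioned above would be found; parsing it is a separate step.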

The open source clang implementation stores the bitcode differently. Rather than being wrapped in an archive, the bitcode files are placed back to back in binary form in the __LLVM,__bitcode section. Files can be separated by looking for the magic number BC (0x42, 0x43) that starts each one. Command line arguments are stored in a separate section, __LLVM,__cmdline, with the individual arguments separated by null bytes.
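
As a rough illustration of that splitting, the following Python sketch operates on the raw bytes of the two sections. It is not how LibEBC does it; the function names are hypothetical, and reading the sections out of the object file is assumed to have happened already. To reduce false positives, the sketch matches the full four-byte LLVM magic (BC followed by 0xC0 0xDE) rather than just the first two bytes. Even so, a naive scan like this can split in the wrong place if the same byte sequence happens to occur inside a file.

```python
from typing import List

BITCODE_MAGIC = b"BC\xc0\xde"  # 'B', 'C', 0xC0, 0xDE


def split_bitcode(section: bytes) -> List[bytes]:
    """Split the raw contents of a bitcode section at each magic number."""
    starts = []
    pos = section.find(BITCODE_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = section.find(BITCODE_MAGIC, pos + len(BITCODE_MAGIC))
    # Each file runs up to the start of the next one (or the section end).
    ends = starts[1:] + [len(section)]
    return [section[s:e] for s, e in zip(starts, ends)]


def split_cmdline(section: bytes) -> List[str]:
    """Split the raw contents of a command line section at null bytes."""
    return [arg.decode("utf-8") for arg in section.split(b"\0") if arg]
```

Note that the sketch returns one flat list of arguments; working out which arguments belong to which bitcode file is not attempted here.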

Getting Embedded Bitcode

At GuardSquare, our iOS solution iXGuard takes bitcode as input. In order to have the software process complete applications, I started looking for a way to programmatically get bitcode from a binary.

The first thing I encountered was a tool called bitcode_retriever, created by Alex Denisov as part of his bitcode demystified article. Being a proof of concept, it was quite limited, though it did what it promised with barely any dependencies. I opened a few pull requests on GitHub to make it suitable as a library and to have it extract the bitcode files and their metadata from the archive.

While further extending the bitcode retriever, I realized that even after my changes, most of the C code was dedicated to handling Mach-O files. This is something that LLVM is pretty good at. I didn’t feel like reinventing the wheel and writing a lot of boilerplate code to handle the Mach-O structure just to get more information from the binary. I also wanted to support more cases, including static libraries, and that’s how LibEBC was born.

LibEBC

LibEBC (LibEmbeddedBitCode) is a C++ library for obtaining embedded bitcode from binaries and libraries.

It supports both ways of embedding bitcode and, because it’s built on top of LLVM, it also supports all types of object files (Mach-O on macOS and ELF on Linux) as well as Mach-O universal binaries and static and dynamic libraries. In theory it should even work on Windows, though I haven’t tested it.

The Doxygen documentation for the library can be found at https://jdevlieghere.github.io/LibEBC/

ebcutil

As is common for LLVM projects, I put all the logic in a library and created a stand-alone command line tool, ebcutil, that builds on top of it.

The tool prints information about the binary, the embedded bitcode and the metadata stored with it. Objects created with Apple’s version of LLVM contain, in addition to the command line arguments passed to the compiler frontend, the linker flags and dylibs.

Example

Let’s have a look at a Mach-O universal binary with two architectures, i386 and x86_64.

$ file fat.o
fat.o: Mach-O universal binary with 2 architectures
fat.o (for architecture i386):    Mach-O executable i386
fat.o (for architecture x86_64):  Mach-O 64-bit executable x86_64
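
For context, the list of architectures comes from the fat header at the start of a universal binary. The Python sketch below reads that header by hand; it assumes the layout from Apple’s <mach-o/fat.h> (a big-endian magic 0xCAFEBABE, a slice count, then one 20-byte fat_arch record per slice) and only knows a handful of CPU types. The name list_fat_slices is hypothetical, and LibEBC relies on LLVM for this rather than on code like the following.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
# A few cputype values from <mach/machine.h>.
CPU_NAMES = {7: "i386", 0x01000007: "x86_64", 12: "arm", 0x0100000C: "arm64"}


def list_fat_slices(data: bytes):
    """Return (arch name, offset, size) for each slice of a universal binary."""
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat Mach-O header")
    slices = []
    for index in range(count):
        # struct fat_arch: cputype, cpusubtype, offset, size, align.
        cputype, _sub, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + index * 20)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices
```

Each (offset, size) pair delimits one thin Mach-O file inside the universal binary, which is why the tool output further down reports every architecture separately.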

Without any arguments, ebcutil displays information about the binaries such as the file name, the architecture and the UUID. Underneath is information embedded by the compiler because the file was created with the -fembed-bitcode flag:

  • Dylibs
  • Linker options
  • Bitcode files

This is what the output looks like:

$ ebcutil fat.o
Mach-O 32-bit i386
  File name: fat.o
       Arch: x86
       UUID: 16B4EDD1-4B58-35FB-849F-0CA0647D6C1C
     Dylibs: {SDKPATH}/usr/lib/libSystem.B.dylib
  Link opts: -execute -macosx_version_min 10.11.0 -e _main -executable_path build/i386.o
    Bitcode: 0.bc
    Bitcode: 1.bc
Mach-O 64-bit x86-64
  File name: fat.o
       Arch: x86_64
       UUID: F6323CD5-E0DD-3E99-9D4A-B36B5A8E3E36
     Dylibs: {SDKPATH}/usr/lib/libSystem.B.dylib
  Link opts: -execute -macosx_version_min 10.11.0 -e _main -executable_path build/x86_64.o
    Bitcode: 2.bc
    Bitcode: 3.bc

When passed the -e or --extract option, the tool extracts the bitcode files from the binary into the current directory (you can change this with -d).

Notice that the bitcode files are always given a unique file name, consisting of a monotonically increasing number. This ensures that, when extracting, files are not accidentally given the same name based on the object’s metadata.
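
The naming scheme can be sketched in a few lines of Python. This is an illustration of the idea rather than LibEBC’s actual code, and the class name BitcodeNamer is hypothetical. The point is that a single counter is shared across all architectures, so names can never collide.

```python
import itertools


class BitcodeNamer:
    """Hands out file names 0.bc, 1.bc, ... that never repeat."""

    def __init__(self):
        self._counter = itertools.count()

    def next_name(self) -> str:
        return "{}.bc".format(next(self._counter))


# One namer is shared by all slices of a universal binary, so the second
# architecture continues where the first one stopped.
namer = BitcodeNamer()
i386_names = [namer.next_name() for _ in range(2)]
x86_64_names = [namer.next_name() for _ in range(2)]
```

With two bitcode files per architecture, this reproduces the numbering from the example output: 0.bc and 1.bc for i386, 2.bc and 3.bc for x86_64.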