In a previous blog post I wrote about bitcode. Embedding it has been possible for quite some time with Apple’s fork of LLVM that ships with Xcode. Recently, Apple upstreamed (parts of) their implementation, making it possible to do the same with the open source clang compiler.
However, there are some differences between the two implementations. Bitcode embedded with Apple’s version of clang is bundled into a xar archive. Metadata, such as the compiler commands used to create the binary and the dylibs the code was linked against, is kept in the archive’s table of contents. The archive itself is contained in the __LLVM, __bundle section of the Mach-O object.
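To make the layout of such an archive a bit more concrete, here is a minimal sketch of reading the table of contents from the raw bytes of a xar archive. It assumes the standard xar layout (a big-endian header followed by a zlib-compressed XML table of contents) and that the section contents have already been read into memory. The function name `read_xar_toc` is hypothetical and not part of any tool mentioned in this post.

```python
import struct
import zlib

XAR_MAGIC = 0x78617221  # the ASCII bytes "xar!"

# Fixed part of the xar header (big-endian): magic, header size, version,
# compressed TOC length, uncompressed TOC length, checksum algorithm.
XAR_HEADER = struct.Struct(">IHHQQI")


def read_xar_toc(data: bytes) -> str:
    """Return the XML table of contents of a xar archive held in memory.

    Hypothetical helper for illustration only.
    """
    if len(data) < XAR_HEADER.size:
        raise ValueError("too short to be a xar archive")
    magic, header_size, _version, toc_compressed, toc_uncompressed, _cksum = (
        XAR_HEADER.unpack_from(data, 0)
    )
    if magic != XAR_MAGIC:
        raise ValueError("not a xar archive")
    # The zlib-compressed table of contents starts right after the header.
    toc = zlib.decompress(data[header_size:header_size + toc_compressed])
    if len(toc) != toc_uncompressed:
        raise ValueError("unexpected table of contents length")
    return toc.decode("utf-8")
```

The returned XML is where metadata such as the dylibs and compiler commands would be found. The archived files themselves live in the heap that follows the table of contents, which this sketch does not touch.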
The open source clang implementation stores the bitcode differently. Rather than being wrapped in an archive, the bitcode files are stored back to back in binary form in the __LLVM, __bitcode section. Files can be separated by looking for the magic number BC (0x42, 0x43). Command line arguments are stored in a separate section, __LLVM, __cmdline, separated by null bytes.
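As a rough illustration of this layout, the sketch below splits the raw contents of both sections once they have been read into memory. It matches the full four-byte LLVM bitcode magic (BC followed by 0xC0, 0xDE) rather than only the first two bytes, to reduce false matches. The function names are hypothetical, and a naive scan like this can still split in the wrong place if the same four bytes happen to occur inside a bitcode file.

```python
from typing import List

BITCODE_MAGIC = b"BC\xc0\xde"  # 0x42, 0x43, 0xC0, 0xDE


def split_bitcode(section: bytes) -> List[bytes]:
    """Split the raw bytes of a bitcode section at each magic number."""
    starts = []
    pos = section.find(BITCODE_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = section.find(BITCODE_MAGIC, pos + len(BITCODE_MAGIC))
    # Each file runs from its magic number to the next one (or to the end).
    ends = starts[1:] + [len(section)]
    return [section[start:end] for start, end in zip(starts, ends)]


def split_cmdline(section: bytes) -> List[str]:
    """Split the raw bytes of a command line section at null bytes."""
    return [arg.decode("utf-8") for arg in section.split(b"\x00") if arg]
```

Obtaining the section contents from the object file is the hard part, and is left out here on purpose.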
Getting Embedded Bitcode
At GuardSquare, our iOS solution iXGuard takes bitcode as input. In order to have the software process complete applications, I started looking for a way to programmatically get bitcode from a binary.
The first thing I encountered was a tool called bitcode_retriever, created by Alex Denisov as part of his bitcode demystified article. Being a proof of concept, it was quite limited, though it did what it promised with barely any dependencies. I opened a few pull requests on GitHub to make it suitable as a library and to have it extract the bitcode files and their metadata from the archive.
While further extending the bitcode retriever, I realized that even after my changes, most of the C code was dedicated to handling Mach-O files. This is something that LLVM is pretty good at. I didn’t feel like reinventing the wheel and writing a lot of boilerplate code to handle the Mach-O structure just to get more information from the binary. I also wanted to support more cases, including static libraries, and that’s how LibEBC was born.
LibEBC
LibEBC (LibEmbeddedBitCode) is a C++ library for obtaining embedded bitcode from binaries and libraries.
It supports both ways of embedding bitcode and, because it’s built on top of LLVM, it also supports all types of object files (Mach-O and ELF on macOS and Linux respectively) as well as Mach-O universal binaries and static and dynamic libraries. In theory it should even work on Windows, though I haven’t tested it.
The Doxygen documentation for the library can be found at https://jdevlieghere.github.io/LibEBC/
ebcutil
As many LLVM projects do, I put all the logic in a library and created a stand-alone command line tool, called ebcutil, that builds on top of it.
The tool prints information about the binary, the embedded bitcode and the metadata stored with it. Objects created with Apple’s version of LLVM contain, in addition to the command line arguments passed to the compiler frontend, the linker flags and dylibs.
Example
Let’s have a look at a Mach-O universal binary with two architectures, i386 and x86_64.
$ file fat.o
fat.o: Mach-O universal binary with 2 architectures
fat.o (for architecture i386): Mach-O executable i386
fat.o (for architecture x86_64): Mach-O 64-bit executable x86_64
Without any arguments, ebcutil displays information about each binary, such as the file name, the architecture and the UUID. Underneath is the information that the compiler embedded because the file was created with the -fembed-bitcode flag:
- Dylibs
- Linker options
- Bitcode files
This is what the output looks like:
$ ebcutil fat.o
Mach-O 32-bit i386
File name: fat.o
Arch: x86
UUID: 16B4EDD1-4B58-35FB-849F-0CA0647D6C1C
Dylibs: {SDKPATH}/usr/lib/libSystem.B.dylib
Link opts: -execute -macosx_version_min 10.11.0 -e _main -executable_path build/i386.o
Bitcode: 0.bc
Bitcode: 1.bc
Mach-O 64-bit x86-64
File name: fat.o
Arch: x86_64
UUID: F6323CD5-E0DD-3E99-9D4A-B36B5A8E3E36
Dylibs: {SDKPATH}/usr/lib/libSystem.B.dylib
Link opts: -execute -macosx_version_min 10.11.0 -e _main -executable_path build/x86_64.o
Bitcode: 2.bc
Bitcode: 3.bc
When passed the -e or --extract option, the tool extracts the bitcode files from the binary into the current directory (you can change this with -d).
Notice that the bitcode files are always given a unique file name, consisting of a monotonically increasing number. This ensures that, when extracting, files are not accidentally given the same name based on the object’s metadata.
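The naming scheme can be illustrated with a small sketch. This is not how ebcutil is implemented, just one way to get the same effect: a single counter is shared across all architectures, so names never collide. The helper `write_numbered` is hypothetical.

```python
from itertools import count
from pathlib import Path


def write_numbered(blobs, directory, counter):
    """Write each blob to '<n>.bc' in directory, taking n from a shared counter."""
    paths = []
    for blob in blobs:
        path = Path(directory) / f"{next(counter)}.bc"
        path.write_bytes(blob)
        paths.append(path)
    return paths
```

Creating one `counter = count()` and passing it to every call means the files of the first architecture become 0.bc and 1.bc and those of the second become 2.bc and 3.bc, as in the output above.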
- For more examples of the library in action, have a look at the README on GitHub.
- If you just need ebcutil to view or extract embedded bitcode from a binary or library, you can download the latest release from GitHub.