Python has great interoperability with C and C++ through extension modules. There are many reasons to do this, such as improving performance, accessing APIs not exposed by the language, or interfacing with libraries written in C or C++.
Unlike Python, however, C and C++ are not memory safe. Luckily, great tools exist to help diagnose these kinds of issues. One of those tools is ASan (AddressSanitizer), which uses compiler instrumentation to detect memory errors at runtime.
It's totally possible to use the address sanitizer for native Python modules. For example, for LLDB we do this when running the Python test suite with ASan and UBSan.
There are some things to be aware of when using an ASanified module from Python. For the sake of this post I'm going to assume you're running macOS Catalina and are using the Python 3 that comes with Xcode (
When you first import the module, you might encounter the following error:
$ /usr/bin/python3
>>> import sanitized
===12345===ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib
For ASan to work, it needs to intercept functions like free to track memory usage. This requires the runtime to be loaded first, before the library (e.g. libc) that exports these functions.
When importing a sanitized module in Python, the dynamic linker (dyld) will have already loaded these symbols before it dynamically loads the sanitized module.
The error message is nice enough to suggest using the environment variable DYLD_INSERT_LIBRARIES to change the dynamic linker's behavior. This lets dyld load extra libraries, here the sanitizer runtime, before the libraries the binary itself depends on.
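When the interpreter is launched from a script rather than a shell, the same variable has to be placed in the child's environment. The following is a minimal sketch of that; the helper name `env_with_inserted_library` is made up for this post, and the only fact it relies on is that DYLD_INSERT_LIBRARIES holds a colon-separated list of paths.

```python
import os


def env_with_inserted_library(runtime_path, base_env=None):
    """Return a copy of base_env with runtime_path first in DYLD_INSERT_LIBRARIES.

    Hypothetical helper for illustration; it does not check that the
    runtime exists or that dyld will honor the variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    # DYLD_INSERT_LIBRARIES is a colon-separated list of library paths.
    existing = [p for p in env.get("DYLD_INSERT_LIBRARIES", "").split(":") if p]
    if runtime_path not in existing:
        existing.insert(0, runtime_path)
    env["DYLD_INSERT_LIBRARIES"] = ":".join(existing)
    return env


env = env_with_inserted_library(
    "/path/to/libclang_rt.asan_osx_dynamic.dylib",
    {"DYLD_INSERT_LIBRARIES": "/path/to/other.dylib"},
)
print(env["DYLD_INSERT_LIBRARIES"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run`. Whether the variable actually reaches the interpreter is a separate question, as the rest of this post shows.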
System Integrity Protection
Setting DYLD_INSERT_LIBRARIES alone appears to be insufficient.
$ DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib /usr/bin/python3
>>> import sanitized
===12345===ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib
If your first instinct is to check that the environment variable actually makes it to Python, you wouldn't be alone. If you have System Integrity Protection (SIP) enabled, you're in for some extra confusion. According to Apple's documentation:
Spawning children processes of processes restricted by System Integrity Protection [...] resets the Mach special ports of that child process. Any dynamic linker (dyld) environment variables, such as DYLD_LIBRARY_PATH, are purged when launching protected processes.
We can use another environment variable to verify that the sanitizer runtime is indeed loaded first. With DYLD_PRINT_LIBRARIES set, dyld will print all the libraries as they are loaded.
$ DYLD_PRINT_LIBRARIES=1 DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib /usr/bin/python3
dyld: loaded: <...> /usr/bin/python3
dyld: loaded: <...> /path/to/libclang_rt.asan_osx_dynamic.dylib
...
>>> import sanitized
===12345===ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib
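For long logs it can help to check the load order programmatically instead of by eye. Here is a small sketch that assumes the `dyld: loaded:` line format shown in the output above; other macOS versions may print something different, and the function names are invented for illustration.

```python
import re

# Matches lines such as "dyld: loaded: <...> /usr/bin/python3".
# The bracketed token is optional because its exact form is an assumption.
_LOADED = re.compile(r"^dyld: loaded:\s*(?:<[^>]*>\s*)?(?P<path>.+?)\s*$")


def loaded_libraries(dyld_output):
    """Return the image paths from DYLD_PRINT_LIBRARIES output, in load order."""
    libs = []
    for line in dyld_output.splitlines():
        match = _LOADED.match(line)
        if match:
            libs.append(match.group("path"))
    return libs


def runtime_position(libs, needle="libclang_rt.asan"):
    """Return the index of the first image whose path contains needle, or None."""
    for index, path in enumerate(libs):
        if needle in path:
            return index
    return None


sample = """\
dyld: loaded: <...> /usr/bin/python3
dyld: loaded: <...> /path/to/libclang_rt.asan_osx_dynamic.dylib
"""
libs = loaded_libraries(sample)
print(libs)
print(runtime_position(libs))
```

A small index only says that the runtime was loaded early in the process that produced the log. It says nothing about any process that one launches afterwards.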
The problem is a little less obvious this time. Remember how I assumed we're using Python 3 from Xcode? You might wonder why we're using /usr/bin/python3 and not something living inside Xcode itself.
If we run nm on the binary, we can see that it's actually a small shim. Without Xcode installed, it'll instruct you to download and install it. Otherwise, it simply forwards to the Python binary in Xcode.
$ nm /usr/bin/python3
0000000100002010 d __dyld_private
0000000100000000 T __mh_execute_header
0000000100000f73 T _main
0000000100002008 S _shim_marker
                 U _xcselect_invoke_xcrun
                 U dyld_stub_binder
What's happening here is that we launch the shim with the ASan runtime loaded first, but the shim then launches the real interpreter. Because the environment variable doesn't get forwarded, the runtime is only loaded in that process when we import the sanitized module, which is too late.
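The nm check can be scripted as well. This is a rough sketch that treats a binary as a launcher shim when its undefined symbols include `_xcselect_invoke_xcrun`, the import seen in the output above. The function names and the choice of marker symbols are assumptions made for this example, not an official way to identify shims.

```python
def undefined_symbols(nm_output):
    """Return the symbols that nm reports as undefined (type 'U')."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column: "U _name".
        # Defined symbols look like "0000000100000f73 T _main".
        if len(parts) == 2 and parts[0] == "U":
            symbols.add(parts[1])
    return symbols


def looks_like_launcher_shim(nm_output, markers=("_xcselect_invoke_xcrun",)):
    """Heuristic: does the binary import one of the marker symbols?"""
    imported = undefined_symbols(nm_output)
    return any(marker in imported for marker in markers)


sample = """\
0000000100002010 d __dyld_private
0000000100000000 T __mh_execute_header
0000000100000f73 T _main
0000000100002008 S _shim_marker
                 U _xcselect_invoke_xcrun
                 U dyld_stub_binder
"""
print(looks_like_launcher_shim(sample))
```

The `markers` parameter is there because other launchers may rely on different functions to start the real binary.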
We can use xcrun to find out the path to the binary.
$ xcrun -find python3
/Applications/Xcode.app/Contents/Developer/usr/bin/python3
Even with the shim in /usr/bin circumvented, we run into the same error.
$ DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib /Applications/Xcode.app/Contents/Developer/usr/bin/python3
>>> import sanitized
===12345===ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/path/to/libclang_rt.asan_osx_dynamic.dylib
Let's repeat our nm trick to see whether this happens to be another kind of shim. Some of the output has been omitted for brevity.
$ nm $(xcrun -find python3)
0000000100000000 T __mh_execute_header
                 U _posix_spawn
                 U _posix_spawnattr_init
                 U _posix_spawnattr_setbinpref_np
                 U _posix_spawnattr_setflags
                 U dyld_stub_binder
Yup, another shim. This time it's using posix_spawn, but the problem is exactly the same as before. Let's use lldb to find out what binary is spawned. We know that the second argument contains the path to the binary to launch.
$ lldb $(xcrun -find python3)
(lldb) target create "/Applications/Xcode.app/Contents/Developer/usr/bin/python3"
Current executable set to '/Applications/Xcode.app/Contents/Developer/usr/bin/python3' (x86_64).
(lldb) b __posix_spawn    # Set a breakpoint on posix_spawn.
(lldb) run                # Run until the breakpoint.
Process 21248 launched: '/Applications/Xcode.app/Contents/Developer/usr/bin/python3' (x86_64)
Process 21248 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff6e2a767c libsystem_kernel.dylib`__posix_spawn
libsystem_kernel.dylib`__posix_spawn:
->  0x7fff6e2a767c <+0>:  movl   $0x20000f4, %eax    ; imm = 0x20000F4
    0x7fff6e2a7681 <+5>:  movq   %rcx, %r10
    0x7fff6e2a7684 <+8>:  syscall
    0x7fff6e2a7686 <+10>: jae    0x7fff6e2a7690      ; <+20>
(lldb) p (const char*)$arg2    # Print the second argument as a string.
(const char *) $1 = 0x00000001006016a0 "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/Resources/Python.app/Contents/MacOS/Python"
Finally, when we launch the path pointing into the Python 3 Framework, everything works as expected.
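Instead of breaking in lldb every time, the path can be reconstructed from the framework layout seen above. The sketch below assumes that `sys.base_prefix` points at the framework's `Versions/3.x` directory and that the real binary lives under `Resources/Python.app/Contents/MacOS/Python`, as in the lldb output. Neither is guaranteed for other Python distributions, and `framework_python_binary` is a name made up for this post.

```python
import os
import sys


def framework_python_binary(prefix=None):
    """Guess the path of the real interpreter inside a Python framework.

    Assumes the layout seen in the lldb session; returns a candidate
    path without checking that it exists.
    """
    prefix = sys.base_prefix if prefix is None else prefix
    return os.path.join(
        prefix, "Resources", "Python.app", "Contents", "MacOS", "Python"
    )


candidate = framework_python_binary()
# On a non-framework build this file will simply not exist.
print(candidate, os.path.isfile(candidate))
```

Checking `os.path.isfile` before using the candidate keeps the guess from failing silently on systems with a different layout.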
As you can imagine, it was a lot of fun figuring this out. The most ironic part was that we had a similar issue with Python 2 and worked around it, but forgot the underlying issue. When we moved our CI to Python 3 we got to enjoy this treasure hunt all over again! A wholehearted thank you to my colleagues Adrian Prantl and Dan Liew for helping me figure this one out.
The workaround for Python 2 consisted of re-launching Python with the path returned by sys.executable. In Python 2, this returned the real interpreter binary. In Python 3, it returns the second shim.
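A re-launch along those lines could look roughly like the sketch below. It is not the code LLDB uses; the marker variable `ASAN_PYTHON_RELAUNCHED` and both function names are hypothetical, and the path to the real interpreter has to come from somewhere other than sys.executable.

```python
import os
import sys

# Hypothetical marker so the re-launched process does not re-launch again.
RELAUNCH_MARKER = "ASAN_PYTHON_RELAUNCHED"


def relaunch_plan(real_python, runtime_path, argv, environ):
    """Return (executable, args, env) for os.execve, or None if already re-launched."""
    if environ.get(RELAUNCH_MARKER) == "1":
        return None
    env = dict(environ)
    env["DYLD_INSERT_LIBRARIES"] = runtime_path
    env[RELAUNCH_MARKER] = "1"
    return real_python, [real_python] + list(argv), env


def maybe_relaunch(real_python, runtime_path):
    """Replace the current process with the real interpreter, at most once."""
    plan = relaunch_plan(real_python, runtime_path, sys.argv, os.environ)
    if plan is not None:
        os.execve(*plan)  # Does not return on success.


plan = relaunch_plan(
    "/path/to/real/Python",
    "/path/to/libclang_rt.asan_osx_dynamic.dylib",
    ["script.py"],
    {},
)
print(plan[1])
```

Keeping the decision in `relaunch_plan` separate from the `os.execve` call makes the logic easy to test without replacing the running process.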