Break the Chains of Version Dependency

  Mar 4, 2014   |      David Noor

linux c++ tips and tricks

At 60East, we strive to make AMPS the most powerful, high-performance, real-time messaging database ever. This philosophy extends to every aspect of AMPS, including installation and packaging. We want you to get up and running as fast as possible, with a zero-friction install process.

Since our customers run AMPS on a wide variety of operating system versions, from quite old to very new, we strive to make a single install image that works everywhere. This means we ship a single set of binaries that run on a range of Linux kernels and library versions. And, since AMPS provides extensibility via a C API and shared library modules, it is important that customers are able to use the latest C and C++ features when writing extension modules.

To meet these challenges – ease of install, version independence, and extensibility – we’ve made some specific technical choices that have produced excellent results. This information isn’t always easy to find or discover, so in an attempt to make a lower-latency world for all of us, we’ve described our approach here.

Packaging up prerequisites

Even though AMPS is an extensible product, many of our customers use the functionality we provide without extending it. Many of our customers also keep very trim systems in production – very few extra libraries installed. As a consequence, many systems do not have libstdc++.so, the support library for C++ applications, installed. If you build an application and send your customer the resulting executable, and they attempt to run it on such a machine, they will see:

joe@somecomputer:~$ ./myapp 
 
./myapp: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

A natural reaction to this problem might be to suggest they obtain this library on their own. If they do so, or they already have it installed, your application may still not function:

joe@somecomputer:~$ ./myapp

./myapp: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by ./myapp)

Why did this happen? Not every libstdc++.so is the same. As gcc evolves, so do the contents of this library. When version X of a compiler is used to produce a binary, assuming that binary is dynamically linked to libstdc++.so, at least version X of libstdc++.so must be present and loaded as well. This error tells us that the libstdc++.so in /usr/lib64 is too old for our application: our application has used features of the library that aren’t available in the version this computer has installed.
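
To see this mismatch for yourself, it can help to list the GLIBCXX version tags a binary mentions and compare them with the tags the installed library contains. The following is a rough sketch, not part of our build process: the function names are made up for illustration, it assumes GNU grep and GNU sort (for -a, -o and -V), and it simply scans the files for tag strings. For an authoritative answer, readelf -V or objdump -T show the actual version requirements.

```shell
# Rough sketch: helper names are illustrative, not from any real tool.
# Assumes GNU grep (-a, -o) and GNU sort (-V).

# Read data on stdin and print the unique GLIBCXX_x.y.z tags, oldest first.
glibcxx_tags() {
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' | sort -u -V
}

# Print the GLIBCXX tags that appear anywhere in the file named by $1.
glibcxx_tags_in() {
    glibcxx_tags < "$1"
}

# Print each tag mentioned by binary $1 that library $2 does not contain.
missing_glibcxx() {
    glibcxx_tags_in "$1" | while read -r tag; do
        glibcxx_tags_in "$2" | grep -q -x -F -e "$tag" || echo "$tag"
    done
}
```

For example, missing_glibcxx ./myapp /usr/lib64/libstdc++.so.6 prints nothing when the library is new enough, and prints a tag such as GLIBCXX_3.4.15 when it is not.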

A common way to fix this is to ship libstdc++.so along with myapp. libstdc++.so depends on libgcc_s.so, so we’ve copied both into the libs/ directory just below myapp. Let’s see how this turns out:

joe@somecomputer:~$ LD_LIBRARY_PATH=libs/ ./myapp 

Hello!

That worked! We can even see exactly which libraries are loaded by myapp, using ldd:

joe@somecomputer:~$ LD_LIBRARY_PATH=libs/ ldd ./myapp
    linux-vdso.so.1 =>  (0x00007fffbe9f2000)
    libstdc++.so.6 => libs/libstdc++.so.6 (0x00007fcb13b85000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcb1379b000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcb13496000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fcb13e8b000)
    libgcc_s.so.1 => libs/libgcc_s.so.1 (0x00007fcb13280000)

Notice how these libraries are now loaded from the libs directory instead of from the system directories. That’s exactly what we want. That LD_LIBRARY_PATH setting is troublesome, though. Sure, we could wrap it up in a shell script for the user to run in place of our application, but let’s use another feature of the linker to eliminate the need for LD_LIBRARY_PATH altogether.

rpath to the rescue

So far, when we build myapp, we haven’t done anything special, just g++ -o myapp myapp.cpp. The compiler and linker build an application that, when run, looks in the default system search path (see the man page for ld.so and ldconfig to learn more) and LD_LIBRARY_PATH to find shared libraries. Using the linker’s rpath option, we can build an application that looks in a more specific path for libraries before searching the defaults.

joe@somecomputer:~$ g++ -Wl,-rpath=\$ORIGIN/libs/ myapp.cpp -o myapp
joe@somecomputer:~$ ./myapp
Hello!
joe@somecomputer:~$ ldd myapp
    linux-vdso.so.1 =>  (0x00007fff164fe000)
    libstdc++.so.6 => libs/libstdc++.so.6 (0x00007f2fb6d3c000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2fb6952000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2fb664d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2fb7042000)
    libgcc_s.so.1 => libs/libgcc_s.so.1 (0x00007f2fb6437000)

Great – now users of myapp no longer need to think about LD_LIBRARY_PATH. In the example, note how we also used a relative path for the -rpath argument along with the special $ORIGIN token. When the program is run, the runtime loader, ld.so, replaces $ORIGIN with the directory containing myapp, so shared libraries are located relative to the binary even if we launch myapp from some other directory. As long as we maintain this same relative location between our binary and libraries, we can redistribute this directory structure and the runtime loader will do the right thing. Problem solved.
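
To confirm that the rpath really made it into the binary, you can look at the dynamic section of the ELF file. This is a small sketch assuming binutils' readelf is installed; the function names are ours, for illustration only. Depending on the linker and its defaults, the entry is recorded as either RPATH or RUNPATH, so the sketch accepts both.

```shell
# Sketch only: function names are illustrative. Assumes readelf (binutils).

# Read `readelf -d` output on stdin; print the path list from any
# RPATH or RUNPATH entry, without the surrounding brackets.
rpath_entries() {
    grep -E '\((RPATH|RUNPATH)\)' | sed 's/.*\[\(.*\)\].*/\1/'
}

# Print the rpath recorded in the ELF file named by $1.
show_rpath() {
    readelf -d "$1" | rpath_entries
}
```

Running show_rpath ./myapp on the binary built above should print $ORIGIN/libs/. If you drive the build from a Makefile, remember that make also expands $, so the token is usually written as $$ORIGIN there.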

libstdc++ has needs, too

You’ve been successful so far distributing your application, when one day, someone emails you asking for assistance. Here’s the error they see:

myapp: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by libs/libstdc++.so.6)

Uh-oh. What is this?

Just as your application is built against, and now depends on, a certain version of libstdc++.so.6, libstdc++.so.6 is itself built against a particular version of the C runtime library, libc.so. If libstdc++ uses features from libc that aren’t there on the much older OS where you try to run the application, you’ll see this kind of error. Chances are, this customer’s libc, and likely their Linux kernel, are much older than the ones on the machine where you built your application.
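
Before shipping, it is worth checking which glibc version your bundled libraries expect and comparing that with the oldest system you intend to support. The sketch below is one hedged way to do that: the helper names are invented for this example, it assumes GNU grep and GNU sort, and like any string scan it is only an approximation of what objdump -T would report. On a glibc-based target machine, getconf GNU_LIBC_VERSION prints the installed version.

```shell
# Sketch only: helper names are illustrative. Assumes GNU grep and GNU sort.

# Succeed when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the highest GLIBC_x.y version tag that appears in the file $1.
max_glibc_needed() {
    grep -a -o 'GLIBC_[0-9][0-9.]*' "$1" | sed 's/^GLIBC_//' | sort -u -V | tail -n 1
}

# Succeed when file $1 mentions nothing newer than glibc version $2.
runs_on_glibc() {
    version_le "$(max_glibc_needed "$1")" "$2"
}
```

For instance, runs_on_glibc libs/libstdc++.so.6 2.5 fails for a library that refers to GLIBC_2.14, which is the situation behind the error above.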

It is tempting to apply the same technique we did before. Just ship libc.so.6 with our application. This might work, but chances are, you’ll run into a new kind of error when you run the app, with your libc, on an older machine. Here are some of the ones we’ve seen, heading down this road:

oldjoe@oldcomputer:~$ ./myapp

FATAL: kernel too old

Segmentation fault

Eventually, you run into a thing you can’t simply package up and ship yourself: the Linux kernel. You could try to ship an older libc that works on old kernels – but even if you succeeded at shipping libc, and it worked across your target OS versions, that’s not necessarily a wise choice. Shipping an old libc means shipping old security vulnerabilities, for example, and doing so just for the sake of supporting old operating systems.

Instead of all that, we chose to do something recommended by many: build our product, AMPS, on the oldest supported operating system, and then distribute the resultant binaries.

How low can you go?

An important question to ask yourself up-front: what is the oldest Linux version I can reasonably support for my application? Presumably you have an idea of what your customer base needs: perhaps a specific older version is required, or you’re looking to match the support matrix of another important piece of software.

What kernel features does your software, and its dependencies, need? That can be a harder question to answer. If you’ve been developing and testing on relatively new Linux versions, you’ll have work to do, to make sure your software not only functions but performs acceptably on your target OS.

For us, these considerations culminated in choosing Linux 2.6.9 as our low-end. Now the solution to everything is clear: install an old 2.6.9-based distribution, build AMPS, and distribute. Except…

That’s a really old compiler

The 2.6.9-based distribution we used, CentOS 4.8, bundles gcc 3.4.6. That’s really old. C++ has changed a lot over the years, both in terms of language features and library features. It would be one thing if that choice only impacted the author of the application – we would just bear the brunt of using the subset of C++ features available to us. But things get trickier when we want to allow extensibility.

In AMPS, users are allowed to write their own plug-in modules in C or C++, for things like authentication, entitlements, and custom actions. These modules are shared libraries, and by way of configuration, AMPS loads them shortly after start-up. These modules are often quite sophisticated, and many customers desire to use the latest language and library features available to them.

This is where our solution needs a little more work. We’ve already seen the issues with including the libstdc++ from a recent compiler and Linux version. Everything works fine if we build and distribute binaries from our old OS. But if, inside our application, we attempt to dlopen() the customer’s extension module, and that extension module uses newer C++ features, we’re right back to this error message:

joe@somecomputer:~$ ./myapp 
./myapp: libs/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by customer_extension_module.so)

To review, our application, which distributes its own libstdc++, loaded just fine. But the customer’s extension module, which uses new C++ features, doesn’t load. When myapp attempts to load it, the runtime loader sees that libstdc++ is already loaded into the process, and attempts to resolve needed symbols from it, rather than trying to find a newer, better libstdc++. customer_extension_module.so, depending on new features, doesn’t find what it needs in the old libstdc++ we shipped.

Telling customers not to use recent compilers is clearly unsatisfactory. So is telling them to replace the libstdc++ we ship with a newer one. And because some customers don’t have one at all, or only have old ones, we still need to ship one. The best solution we’ve found is a hybrid: ship a new libstdc++ from the newest released gcc (currently 4.8.2). However, we need to build a custom version of this library, and indeed of the compiler, on our oldest target machine: CentOS 4.8. At the end of that process, we’ll have a libstdc++ that contains the latest C++ features, but still loads on the oldest libc and kernel we can support.

Building your own gcc

Building a new GCC for the first time on your legacy OS can be intimidating. This document is not meant to replace or contradict any of the excellent information available on the subject; instead, it provides more specific guidance for this particular scenario.

Make sure to review the official installation documentation and the installation wiki. Both resources are invaluable in understanding the process of producing a gcc build that is usable for your environment. Note the following:

  1. If you have the choice, start out with a Linux distribution whose GCC and Make are recent enough to build the newest GCC. (CentOS 4.8 had adequate versions of both.)

  2. You’ll likely want to download the prerequisite libraries for GCC and build them from source, rather than using the old ones on your legacy OS. The contrib/download_prerequisites script mentioned in the wiki makes this easy.

  3. You can greatly decrease build time by disabling unnecessary features when configuring. Since I only needed x64 libraries, my configure command looked like this:

    joe@oldcomputer:~/gcc-build$ ../gcc-4.8.2/configure --disable-multilib --enable-languages=c,c++

  4. You probably do not want to replace the default compiler on your old OS. You can specify a different directory to install gcc and the built libraries into using the DESTDIR variable:

    joe@oldcomputer:~/gcc-build$ make DESTDIR=/home/joe/gcc-output-tree install

     I ran into mysterious errors when I used a relative path for DESTDIR; it appears you must specify an absolute path.

If the build process succeeds, your output tree contains a compiler and a set of shared libraries that will load on platforms at least as old as the one you’re on. Because libc and the Linux kernel maintain backwards compatibility, they should also work on the most recent OS distributions. Build your application with this compiler, using the rpath technique to ensure it depends on the libraries you will ship. Then, when you’re satisfied with the result, package up the libraries and application together. The final product should work on operating systems as old as the one you built gcc on, all the way to the most recent, and (if applicable) can be extended by customers using the latest gcc and C++ features.

Summary

We accomplished what we set out to do: build a distributable binary package that works on a wide variety of Linux versions, with a minimum of prerequisites, and allow customers to write extension modules with the most recent gcc. To get there yourself:

  1. Identify the oldest Linux version/distribution you need to support, and use it for the following steps.

  2. Install and build the latest gcc on that platform – at least the C and C++ languages.

  3. Build your application there using the -Wl,-rpath flag with a relative path and the $ORIGIN token to specify where your prerequisite libraries should be found at runtime, relative to your binaries.

  4. Package up your application and the libstdc++ and additional prerequisites you built in step (2), and distribute them together. Make sure the directory structure once installed matches the relative path specified in step (3).
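
The packaging in step (4) can be sketched as a small staging function. This is an illustration under assumptions, not our actual packaging script: the function name and its arguments are hypothetical, and it assumes the rpath from the earlier example ($ORIGIN/libs/), so the libraries go into a libs directory next to the binary.

```shell
# Sketch only: stage_package is a hypothetical helper, not an AMPS tool.
# Usage: stage_package APP LIBDIR OUTDIR
# Copies APP into OUTDIR and the two runtime libraries from LIBDIR into
# OUTDIR/libs, matching an rpath of $ORIGIN/libs/.
stage_package() {
    app=$1 libdir=$2 out=$3
    mkdir -p "$out/libs" || return 1
    cp "$app" "$out/" || return 1
    for lib in libstdc++.so.6 libgcc_s.so.1; do
        # -L follows symlinks, so the real file is shipped under its soname.
        cp -L "$libdir/$lib" "$out/libs/$lib" || return 1
    done
}
```

With the gcc output tree from earlier, the call might look like stage_package ./myapp /home/joe/gcc-output-tree/usr/local/lib64 ./package, followed by tar czf myapp.tar.gz -C package . to produce a tarball. The library directory shown assumes gcc's default /usr/local prefix; adjust it to wherever your build placed libstdc++.so.6.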

It might sound like a lot of work, but the results are worth it. Your customers can get up and running with your application quickly, and extend it with whatever compiler version they’d like.

How Do You Do It?

In this post, we talk about the best way we’ve found so far to ship version-independent binaries and libraries on Linux. Do you have a tip or trick we’ve missed? Curious about something we didn’t go into here? Let us know in the comments!

