Understanding Linux ELF, Shared Libraries, and the Internals of ld.so

When you run a compiled C/C++ program in a Linux environment, it might seem as though the machine code of the executable is simply loaded directly into the CPU and executed. However, behind the scenes, there is a complex, cooperative dance involving the compiler, linker, operating system kernel, and dynamic loader (ld.so).

This article covers the Linux binary format ELF, explains the differences between static and dynamic executables, clarifies how shared libraries are linked and loaded, and explores how the dynamic loader bootstraps itself to start your program.


1. What Happens Behind “Hello World”

Let us examine a simple C program to understand the mystery behind execution.

#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}

We compile and run this code:

gcc main.c -o myapp
./myapp

Here, a fundamental question arises: Since we did not implement the printf() function ourselves, how does the program locate and execute it?

The answer lies in the C standard library, libc. By default, when you build with gcc on Linux, the linker that gcc invokes links your program against the system’s shared C library. Instead of embedding the actual machine code for printf() inside your executable, it simply records metadata stating: “At runtime, this program requires the C standard library (libc.so.6 on glibc systems).”


2. What is ELF (Executable and Linkable Format)?

In Linux, executables, shared libraries, and object files all share a unified binary structure called ELF (Executable and Linkable Format).

The ELF Format Family
├─ Object File (.o) : The output of compilation, still requiring linking
├─ Static Library (.a) : An ar archive bundling multiple ELF object files together (the archive itself is not an ELF file)
├─ Shared Library (.so) : An object built to be dynamically shared and loaded at runtime
├─ Executable : A binary that the kernel can load and execute directly
└─ Dynamic Loader (ld.so) : A special ELF that prepares and runs dynamic executables

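When the kernel decides how to load a file, it does not rely on file extensions (like .so or .out). Instead, it reads the type field (e_type) in the ELF header and the segment descriptions in the program headers. Note that e_type alone does not separate executables from shared libraries: position-independent executables (PIE), the default output of many modern toolchains, are marked ET_DYN just like shared libraries.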


3. Static Executables vs. Dynamic Executables

Based on how external libraries are integrated during build time, executables are classified as either static or dynamic.

| Feature | Static Executable | Dynamic Executable |
| --- | --- | --- |
| Linking Time | Integrates all required library code into the executable at link time. | Resolves shared libraries at load time or runtime. |
| File Size | Larger (includes the library code). | Smaller (contains only references to the libraries). |
| Memory Efficiency | High redundancy; every running program carries its own copy of the library code. | High efficiency; multiple processes share the physical memory pages of the library's code. |
| Performance | Starts slightly faster because there is no runtime address resolution. | Has overhead from runtime relocation and symbol lookup. |
| Portability | Highly portable; runs on any compatible kernel of the same architecture without library dependencies. | Requires compatible versions of the .so files to be present on the host. |
| Kernel Flow | Lacks a PT_INTERP segment; the kernel jumps directly to the entry point. | The kernel first runs the dynamic loader (ld.so) specified by PT_INTERP. |

4. Types of Libraries: Static Libraries vs. Shared Libraries

Libraries are also categorized into two types, corresponding to the linking methods.

4.1 Static Libraries (.a)

  • An archive bundling multiple object files (.o) into a single file.
  • The linker extracts only the required object code and copies it directly into the executable.
  • Pros: Once the executable is built, it can run completely independently.
  • Cons: To fix a bug in the library, you must rebuild (at least relink) and redistribute every executable that uses it.

4.2 Shared Libraries (.so)

  • Binaries designed to be loaded into memory and shared among multiple processes.
  • Compiled as Position Independent Code (PIC) so they can run correctly regardless of where they are loaded in memory.
  • Pros: You can fix bugs by updating the .so file without recompiling the executables that use it, and sharing one copy saves disk space and memory.

5. Dual Mechanisms for Using Shared Libraries

Dynamic executables use shared libraries in two distinct ways:

Shared Library Loading Mechanisms
├─ 1. Implicit Linking (Implicit Loading)
│   └─ The library is specified at link time, and the dynamic loader loads it automatically at startup (e.g., recorded in DT_NEEDED).
└─ 2. Explicit Linking (Dynamic Loading)
    └─ The program loads the library programmatically at runtime using APIs (e.g., calling dlopen() and dlsym()).

5.1 Explicit Linking Example

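#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY); /* cos() lives in libm on glibc systems */
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine) printf("%f\n", cosine(2.0));
    dlclose(handle);
    return 0;
}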

This method is common for implementing plugin architectures, and it can shorten program startup by deferring the loading of rarely used libraries until they are needed.


6. Understanding DT_NEEDED and PT_INTERP

When a dynamic executable runs, two pieces of ELF metadata are critical: the DT_NEEDED entries in the dynamic section and the PT_INTERP program header.

6.1 DT_NEEDED (Dynamic Tag - Needed)

  • Specifies the name of the shared library that the executable must load at runtime.
  • Typically, only the name (e.g., libc.so.6) is recorded, not the absolute path. The dynamic loader locates the file at runtime.
  • To check: readelf -d myapp | grep NEEDED

6.2 PT_INTERP (Program Header - Interpreter)

  • Records the absolute path to the dynamic loader (interpreter) that the kernel must invoke before running the executable.
  • Usually, a path like /lib64/ld-linux-x86-64.so.2 is hardcoded here.
  • To check: readelf -l myapp | grep -A1 INTERP

Note: Key Distinction

  • PT_INTERP: “Who is the loader responsible for setting up this program?” (At most one per executable)
  • DT_NEEDED: “Which libraries contain the functions this program needs to call?” (Can have multiple entries)

7. The Lifecycle of the Dynamic Loader (ld.so)

When a dynamic executable is executed (execve) by the kernel, the startup sequence is as follows:

1. The kernel maps the executable into memory, reads the PT_INTERP header, and maps ld.so.
2. The kernel transfers control to the entry point of ld.so.
3. ld.so performs a "Self-Bootstrap" routine.
4. ld.so analyzes the main executable's ELF headers and loads all libraries listed in DT_NEEDED.
5. ld.so performs address relocation and resolves symbols across all loaded binaries.
6. The constructors and initialization blocks of the shared libraries are executed.
7. ld.so transfers control to the main executable's real entry point (_start -> main()).

7.1 Dynamic Loader Self-Bootstrap

The dynamic loader itself is an ELF file built with a shared object structure (ET_DYN). Therefore, it has no fixed load address and requires relocation. However, since there is no other loader to relocate it, ld.so must begin with a highly constrained bootstrap routine that avoids anything needing relocation, such as calls into external libraries or ordinary accesses to global variables. It manually calculates its own load offset and applies its own relocations before running the rest of its code.

7.2 Library Search Order

When resolving DT_NEEDED entries, ld.so searches for files in the following order (a name that contains a slash is treated as a path and is not searched for):

  1. DT_RPATH: The legacy path recorded inside the ELF (ignored if DT_RUNPATH is present).
  2. LD_LIBRARY_PATH: An environment variable providing temporary search paths (ignored in secure-execution mode, for example for setuid or setgid binaries).
  3. DT_RUNPATH: The modern recommended path inside the ELF. It applies only to the object's direct DT_NEEDED dependencies and can use $ORIGIN to define paths relative to the directory containing the executable or library.
  4. /etc/ld.so.cache: A system-wide cache file updated by running ldconfig.
  5. Default Paths: System directories such as /lib and /usr/lib (and /lib64 and /usr/lib64 on some 64-bit systems).

8. Real-world Debugging and Useful Commands

8.1 Checking Dependencies and Resolved Paths (ldd)

ldd ./myapp

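ldd does not merely parse the binary structure; it sets the environment variable LD_TRACE_LOADED_OBJECTS=1 and runs the dynamic loader, which resolves the dependencies and prints their paths instead of executing the program. Because the file's own interpreter may be invoked in the process, avoid running ldd on untrusted binaries.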

8.2 Modifying Binary Metadata (patchelf)

patchelf is a powerful tool that modifies ELF metadata in place, without recompiling the source code.

  • Change the interpreter (dynamic loader):

    patchelf --set-interpreter /custom/ld.so ./myapp
    
  • Set RUNPATH (relative to the executable):

    patchelf --set-rpath '$ORIGIN/lib' ./myapp
    
  • Replace dependency library name:

    patchelf --replace-needed libold.so libnew.so ./myapp
    

9. Common Misconceptions Clarified

  • Q1. Can .so files be run directly?
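    • A: Sometimes. File extensions are just conventions, so what matters is the ELF content: a shared object can be executed directly only if it was built with a usable entry point (and, when it is dynamically linked, a PT_INTERP segment). The dynamic loader itself and glibc's libc.so.6 are examples; most ordinary libraries are not built this way and typically crash when run. Conversely, you can run an executable that lacks execute permission by invoking the dynamic loader and passing the target binary as an argument.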
    • Example: /lib64/ld-linux-x86-64.so.2 ./myapp
  • Q2. For the dynamic loader, file reports "dynamically linked" but ldd reports "statically linked". Why?
    • A: The two tools answer different questions. file looks at the ELF headers and identifies ld.so as ET_DYN (dynamic structure). ldd tracks whether the binary depends on other external libraries (DT_NEEDED). Since ld.so is self-contained and has no external dependencies, ldd reports it as statically linked.
  • Q3. Does the dynamic loader exit after starting the program?
    • A: No, the loader remains in the process address space. It serves as a runtime support library to handle later programmatic loading requests like dlopen() and dlsym().

10. Conclusion (Takeaways)

The dynamic execution model on Linux can be summarized in a single sentence:

“The OS kernel reads the PT_INTERP segment of the executable to load and run the dynamic loader (ld.so), which bootstraps itself, loads the shared libraries listed in DT_NEEDED, performs relocation and symbol resolution, and finally hands control over to the program’s real entry point.”

Understanding this cycle makes it much easier to diagnose “library not found” errors, resolve version conflicts, and manage complex application deployments.
