Understanding Linux ELF, Shared Libraries, and the Internals of ld.so
When you run a compiled C/C++ program on Linux, it might seem as though the executable's machine code is simply loaded into memory and handed to the CPU. Behind the scenes, however, there is a complex, cooperative dance involving the compiler, the linker, the operating system kernel, and the dynamic loader (ld.so).
This article covers the Linux binary format ELF, explains the differences between static and dynamic executables, clarifies how shared libraries are linked and loaded, and explores how the dynamic loader bootstraps itself to start your program.
1. What Happens Behind “Hello World”
Let us examine a simple C program to understand the mystery behind execution.
```c
#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}
```

We compile and run this code:

```bash
gcc main.c -o myapp
./myapp
```
Here, a fundamental question arises: Since we did not implement the printf() function ourselves, how does the program locate and execute it?
The answer lies in the C standard library, libc. By default, gcc on Linux links your program against the system's shared C library. Instead of copying the machine code for printf() into your executable, the linker records metadata that says, in effect: “At runtime, this program requires the C standard library (libc.so.6).” The actual library code is mapped into the process when the program starts.
2. What is ELF (Executable and Linkable Format)?
In Linux, executables, shared libraries, and object files all share a unified binary structure called ELF (Executable and Linkable Format).
The ELF Format Family
├─ Object File (.o) : The output of compilation, still requiring linking
├─ Static Library (.a) : An `ar` archive bundling multiple object files together (the archive itself is not ELF; its members are)
├─ Shared Library (.so) : An object built to be dynamically shared and loaded at runtime
├─ Executable : A binary that the kernel can load and execute directly
└─ Dynamic Loader (ld.so) : A special ELF that prepares and runs dynamic executables
When the kernel (or a tool such as `file`) classifies an ELF file, it does not rely on file extensions (like .so or .out). Instead, it reads the object type (`e_type`) in the ELF header and the segment descriptions in the program headers.
3. Static Executables vs. Dynamic Executables
Based on how external libraries are integrated during build time, executables are classified as either static or dynamic.
| Feature | Static Executable | Dynamic Executable |
|---|---|---|
| Linking Time | Copies the needed library code into the executable at link time. | Records which shared libraries are needed; they are loaded and linked at runtime. |
| File Size | Larger (contains the library code it uses). | Smaller (contains only references to the libraries). |
| Memory Efficiency | Library code is duplicated in every different program that embeds it, on disk and in memory. | Efficient; multiple processes share the physical memory pages of a shared library's code. |
| Performance | Starts slightly faster because there is no runtime library loading or symbol lookup. | Pays for runtime relocation and symbol lookup at startup (partly deferred when lazy binding is used). |
| Portability | Highly portable; runs on any Linux system of the same architecture without shared-library dependencies. | Requires compatible versions of the `.so` files to be present on the host. |
| Kernel Flow | Has no `PT_INTERP` segment; the kernel jumps directly to the executable's entry point. | The kernel first runs the dynamic loader (`ld.so`) named in `PT_INTERP`. |
4. Types of Libraries: Static Libraries vs. Shared Libraries
Libraries are also categorized into two types, corresponding to the linking methods.
4.1 Static Libraries (.a)
- An archive bundling multiple object files (`.o`) into a single file.
- The linker extracts only the archive members needed to resolve undefined symbols and copies them directly into the executable.
- Pros: Once the executable is built, it can run completely independently.
- Cons: To pick up a bug fix in the library, you must relink every executable that uses it.
4.2 Shared Libraries (.so)
- Binaries designed to be loaded into memory and shared among multiple processes.
- Compiled as Position Independent Code (PIC) so they can run correctly regardless of where they are loaded in memory.
- Pros: You can fix bugs by updating the `.so` file without recompiling the main executables (as long as the library's ABI stays compatible), and sharing one copy saves disk space and memory.
5. Dual Mechanisms for Using Shared Libraries
Dynamic executables use shared libraries in two distinct ways:
Shared Library Loading Mechanisms
├─ 1. Implicit Linking (Implicit Loading)
│ └─ The library is specified at link time, and the dynamic loader loads it automatically at startup (e.g., recorded in DT_NEEDED).
└─ 2. Explicit Linking (Dynamic Loading)
└─ The program loads the library programmatically at runtime using APIs (e.g., calling dlopen() and dlsym()).
5.1 Explicit Linking Example
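```c
#include <dlfcn.h>
#include <stdio.h>
int main(void) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* cos() is provided by libm */
    if (!handle) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (!cosine) { fprintf(stderr, "dlsym: %s\n", dlerror()); dlclose(handle); return 1; }
    printf("cos(2.0) = %f\n", cosine(2.0));
    return dlclose(handle);
}
```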
This method is common for implementing plugin architectures or accelerating program startup.
6. Understanding DT_NEEDED and PT_INTERP
When a dynamic executable runs, two pieces of metadata are critical: the `PT_INTERP` program header and the `DT_NEEDED` entries of the dynamic section.
6.1 DT_NEEDED (Dynamic Tag - Needed)
- Each entry names one shared library that the executable must load at runtime.
- Typically, only the name (e.g., `libc.so.6`) is recorded, not the absolute path. The dynamic loader locates the file at runtime.
- To check: `readelf -d myapp | grep NEEDED`
6.2 PT_INTERP (Program Header - Interpreter)
- Records the absolute path to the dynamic loader (interpreter) that the kernel must load and run before the executable itself.
- Usually, a path like `/lib64/ld-linux-x86-64.so.2` is hardcoded here.
- To check: `readelf -l myapp | grep -A1 INTERP`
> [!NOTE]
> **Key Distinction**
> - `PT_INTERP`: “Who is the loader responsible for setting up this program?” (At most one per executable)
> - `DT_NEEDED`: “Which libraries contain the functions this program needs to call?” (Can have multiple entries)
7. The Lifecycle of the Dynamic Loader (ld.so)
When the kernel starts a dynamic executable via `execve()`, the startup sequence is as follows:
1. The kernel maps the executable into memory, reads its `PT_INTERP` segment, and maps the named loader (ld.so).
2. The kernel transfers control to the entry point of ld.so.
3. ld.so performs a "self-bootstrap" routine to relocate itself.
4. ld.so reads the main executable's dynamic section and loads all libraries listed in `DT_NEEDED`, recursively including their own dependencies.
5. ld.so performs relocation and resolves symbols across all loaded objects (with lazy binding, function symbols may instead be resolved on their first call).
6. The constructors and initialization functions (`DT_INIT`, `DT_INIT_ARRAY`) of the shared libraries run, dependencies before the objects that use them.
7. ld.so jumps to the main executable's entry point (`_start`), whose C runtime startup code eventually calls `main()`.
7.1 Dynamic Loader Self-Bootstrap
The dynamic loader itself is an ELF file built with a shared object structure (ET_DYN). Therefore, it has no fixed loading address and requires relocation.
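However, since no other loader exists to relocate it, ld.so must start with a highly constrained bootstrap routine that avoids anything requiring a relocation, such as calls into external libraries or accesses to global variables through the GOT, until its own relocations have been applied. It computes its own load offset (the difference between its runtime and link-time addresses) and then relocates its own pointers.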
7.2 Library Search Order
When resolving `DT_NEEDED` entries, ld.so searches for files in the following order:
1. `DT_RPATH`: The legacy search path recorded inside the ELF file (ignored if `DT_RUNPATH` is set).
2. `LD_LIBRARY_PATH`: An environment variable providing temporary search paths (ignored for setuid/setgid binaries for security reasons).
3. `DT_RUNPATH`: The modern, recommended search path recorded inside the ELF file (can use `$ORIGIN` to define paths relative to the directory of the file that contains the entry, e.g. the executable).
4. `/etc/ld.so.cache`: A system-wide cache file, regenerated by running `ldconfig`.
5. Default paths: System directories such as `/lib`, `/usr/lib`, `/lib64`, and `/usr/lib64`.
8. Real-world Debugging and Useful Commands
8.1 Checking Dependencies and Resolved Paths (ldd)
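```bash
ldd ./myapp
```

`ldd` does not merely parse the binary structure. glibc's `ldd` runs the dynamic loader with the environment variable `LD_TRACE_LOADED_OBJECTS=1` set; the loader then locates and loads the dependencies, prints the resolved library paths, and exits without running the program's `main()`. For this reason, `ldd` should not be run on untrusted executables.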
8.2 Modifying Binary Metadata (patchelf)
A powerful tool to modify ELF configurations without recompiling source code.
- Change the interpreter (dynamic loader):
  `patchelf --set-interpreter /custom/ld.so ./myapp`
- Set RUNPATH (relative to the executable):
  `patchelf --set-rpath '$ORIGIN/lib' ./myapp`
- Replace a dependency's library name:
  `patchelf --replace-needed libold.so libnew.so ./myapp`
9. Common Misconceptions Clarified
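- Q1. Can `.so` files be run directly?
  - A: Sometimes. File extensions are just conventions; what matters is whether the ELF file has a usable entry point. Some shared objects, such as glibc's `libc.so.6` and the dynamic loader itself, are built to be run directly, while most ordinary libraries are not. Relatedly, you can run an executable that lacks execute permission by invoking the dynamic loader and passing the target binary as an argument.
  - Example: `/lib64/ld-linux-x86-64.so.2 ./myapp`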
- Q2. `file` reports `dynamically linked` but `ldd` reports `statically linked` for the dynamic loader. Why?
  - A: `file` looks at the ELF headers, sees an `ET_DYN` object with a dynamic section, and reports it as dynamically linked. `ldd` reports whether the binary depends on other shared libraries (`DT_NEEDED`). Since `ld.so` is self-contained and has no external dependencies, `ldd` reports it as `statically linked`.
- Q3. Does the dynamic loader exit after starting the program?
  - A: No, the loader remains mapped in the process address space. It serves as a runtime support library that handles later programmatic loading requests such as `dlopen()` and `dlsym()`, and, with lazy binding, resolves function symbols on their first call.
10. Conclusion (Takeaways)
The dynamic execution model on Linux can be summarized in a single sentence:
“The OS kernel reads the `PT_INTERP` segment of the executable to load and run the dynamic loader (`ld.so`), which bootstraps itself, loads the shared libraries listed in `DT_NEEDED`, performs relocation and symbol resolution, and finally hands control over to the program's real entry point.”
Understanding this cycle makes diagnosing “library not found” errors, resolving version conflicts, and managing complex application deployments straightforward.