How the Compiler Toolchain Works - C Post 04
Compilers play a vital role in transforming human-readable source code into executable programs. In this post, we’ll explore how a typical C compiler toolchain works by breaking it into its main stages. To make this concrete, we’ll use a simple example code snippet and follow it step-by-step from source to executable.
Example Program
Here is a simple C program:
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
Stages of Compilation
Modern C compilers, such as GCC or Clang, divide the compilation process into four primary stages:
- Preprocessing
- Compilation (translation to assembly)
- Assembly
- Linking
Let’s delve into each stage in detail.
1. Preprocessing
- What It Does: The preprocessor handles all preprocessor directives, such as #include and #define. It inserts header file contents directly into the source code, expands macros, and removes comments.
- Command Example:
gcc -E main.c -o main.i
This generates a preprocessed output file named
main.i
. - Key Outcome: The
main.c
file is transformed into “pure” C code with all preprocessor directives replaced or removed.
2. Compilation (to Assembly)
- What It Does: Translates the preprocessed code into assembly language—a human-readable representation of machine instructions.
- Command Example:
gcc -E main.c -o main.i
This generates a preprocessed output file named
main.i
. - Key Outcome:The
main.c
file is transformed into “pure” C code with all preprocessor directives replaced or removed.
3. Assembly
-
What It Does:Converts the assembly code into object code, a binary format understood by the machine. This object file contains the program’s machine instructions along with metadata.
-
Command Example:
gcc -c main.s -o main.o
This creates an object file named main.o.
- Key Outcome:A .o (or .obj) file containing relocatable machine code.
4. Linking
-
What It Does:Combines one or more object files with libraries to produce the final executable. During this step, external references (e.g.,
printf
) are resolved by linking the program to the standard library. -
Command Example:
gcc main.o -o myprogram
This produces an executable file named myprogram. Running ./myprogram prints:
Hello, world!
- Key Outcome:A fully linked and runnable executable file.
End-to-End Compilation
Most users simply run:
gcc main.c -o myprogram
This single command handles all four stages internally: preprocessing, compilation, assembly, and linking. Intermediate files (e.g., main.i
, main.s
, main.o
) are generated and discarded automatically unless specific flags are used.
Static vs. Dynamic Linking
During the linking stage, the linker can resolve external references in two ways:
-
Static Linking:Copies necessary library code directly into the executable, making it self-contained but larger in size.
-
Dynamic Linking:Keeps the executable smaller by referring to shared libraries (e.g., libc.so or msvcrt.dll). These libraries must be present on the system at runtime.
Enjoy Reading This Article?
Here are some more articles you might like to read next: