Why you should care about include dependencies in C/C++ and how to keep them to a minimum
When we think of dependencies, we usually think of “logical dependencies” first: logical relations between classes, functions etc. Object-oriented design, generic programming, encapsulation and design patterns all aim to cut these “logical dependencies”. But there are also compile time dependencies - dependencies between files and libraries at compile time.
This article is about include dependencies in C/C++, which are one kind of compile time dependency. These dependencies have a huge impact on building, refactoring and testing, and on the structure of your software.
Read on if you want to know how includes are processed, why include dependencies matter, and how to keep them clean.
Include dependencies - an example
Let’s take a look at an example: Suppose we have four header files, a.h, b.h, c.h and d.h, and one source file main.cpp, where some of them include another file. Note that main.cpp compiles, but the includes are handled in a bad way:
//file a.h
#include "b.h"
void funcA() {
    funcB();
    funcC();
}
//file b.h
#include "c.h"
int funcB() {
    // do anything
    return 0;
}
//file c.h
#include "d.h"
void funcC() {
    // do anything
}
//file d.h
void funcD1();
void funcD2();
//file main.cpp
#include "a.h"
int main() {
    funcA();
}
Let’s compare logical and include dependencies of this example:
logical dependencies
- funcA() calls funcB() and funcC(), so it directly depends on these two functions.
- main() calls funcA(), so it directly depends on this function and, via funcA(), indirectly on funcB() and funcC().
include/compile dependencies
- b.h includes c.h, so it directly depends on this header (unnecessarily, because there is no logical dependency on any function in c.h).
- a.h includes b.h, so it directly depends on this header. a.h also depends indirectly on c.h and d.h, because b.h includes c.h and c.h includes d.h.
- main.cpp directly depends on a.h and indirectly on b.h, c.h and d.h.
Include dependencies (left), logical dependencies (right).
In this example the compile time dependencies have a different structure than the logical dependencies - this is bad. If you want to know why, read on.
Why it matters
For small programs that consist of just a couple of files, include dependencies are usually not a problem. But as your software grows, and with it the number of include files, the impact of inappropriately handled includes can be huge:
- Compilation times: The effect on compilation times can be dramatic.
- Complexity: Unnecessary or awkward dependencies add accidental complexity.
- It’s harder to refactor/restructure your program.
- It’s more difficult to test “modules” in isolation.
- Documentation: When you know that the headers accurately reflect what is used in the file, this information can help to understand the code.
Even if your “logical structure” is perfect - you used all methods and patterns to encapsulate and decouple your design and thoroughly designed your interfaces - you might still have these unnecessary compile time dependencies between classes X, Y and Z, even if they are logically completely independent of each other.
In the following I’ll cover the aspects of compilation times and refactoring in more detail:
Compilation times
When you change a header file, all translation units depending on this header file need to be recompiled. This can be very expensive. Why? Let’s take a look at how #includes are processed when compiling a C/C++ file. Directly from the gcc documentation:
The ‘#include’ directive directs the preprocessor to scan the specified include file as input before continuing with the rest of the current file. The output from the preprocessor contains the output already generated, followed by the output resulting from the included file, followed by the output that comes from the text after the ‘#include’ directive.
In other words, an #include <file.h> is a kind of recursive copy and paste operation: a command for the preprocessor to open and read the file file.h and replace #include <file.h> with the content of file.h. file.h might also include other files that the preprocessor must process in the same way…
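The recursive copy and paste idea can be sketched as a toy model. This is not how gcc is implemented: the in-memory file map, the names FileMap, expandIncludes and exampleFiles, and the restriction to quoted includes without include guards or macros are all assumptions made for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory "file system": file name -> file content.
using FileMap = std::map<std::string, std::string>;

// Recursively replaces every line that starts with  #include "name"  by the
// expanded content of that file. Toy model only: no include guards, no
// macros, no <...> search paths. filesOpened counts how many files were read.
std::string expandIncludes(const std::string& name, const FileMap& files,
                           int& filesOpened) {
    ++filesOpened;  // every include means opening and reading one more file
    std::istringstream in(files.at(name));  // throws if the file is unknown
    const std::string directive = "#include \"";
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (line.compare(0, directive.size(), directive) == 0) {
            // Extract the file name between the quotes and paste its
            // expanded content in place of the directive.
            const auto end = line.find('"', directive.size());
            const std::string included =
                line.substr(directive.size(), end - directive.size());
            out += expandIncludes(included, files, filesOpened);
        } else {
            out += line + "\n";
        }
    }
    return out;
}

// The example files from this article, reduced to one line per function.
FileMap exampleFiles() {
    return {
        {"d.h", "void funcD1();\nvoid funcD2();\n"},
        {"c.h", "#include \"d.h\"\nvoid funcC() {}\n"},
        {"b.h", "#include \"c.h\"\nint funcB() { return 0; }\n"},
        {"a.h", "#include \"b.h\"\nvoid funcA() { funcB(); funcC(); }\n"},
        {"main.cpp", "#include \"a.h\"\nint main() { funcA(); }\n"},
    };
}
```

Expanding "main.cpp" with these example files reads five files in this model (main.cpp, a.h, b.h, c.h and d.h), and the declarations from d.h end up at the top of the result although nothing uses them.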
We can take a look at the output of the preprocessor stage when compiling our main.cpp from above by calling gcc -E main.cpp (unnecessary lines stripped):
void funcD1();
void funcD2();
void funcC() {
}
int funcB() {
    return 0;
}
void funcA() {
    funcB();
    funcC();
}
int main() {
    funcA();
}
This is the code the compiler actually has to compile. Note that the content of d.h is there as well, even though funcD1 and funcD2 are not used anywhere. Now you might also take a look at the output of gcc -E main.cpp or gcc -MM -H main.cpp after you have included additional headers.
The compiler has to open, read, preprocess and parse all direct and indirect include dependencies of every file it compiles!
This potentially huge number of filesystem operations and preprocessing steps can take a significant amount of time when compiling C/C++ code. Taking care of these include dependencies is even more important when using large “header libraries” that make extensive use of templates, like the Eigen math library or the Boost C++ libraries. They are written almost entirely as header files that the user #includes, rather than as compiled libraries that are linked in.
Refactoring
Refactoring/restructuring a large code-base with many files and dependencies between them can be even more painful if the #includes are organized in a bad way.
Let’s take a look at our example again: Suppose we try to improve the quality of our code and we realize that b.h unnecessarily includes c.h (because there is no logical dependency between b.h and c.h). Therefore we remove the superfluous #include "c.h" from b.h. When trying to compile main.cpp again, we get the following error in a.h:
error: ‘funcC’ was not declared in this scope
This happens because a.h is not self-sufficient: it calls funcC() but relied on b.h to pull in c.h.
In this simple example the problem can be solved easily by adding #include "c.h" to a.h, but for large projects with complex header dependencies this type of problem can become expensive. It might be necessary to modify a ton of other files (add #includes) just because you modified a single #include.
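A rough sketch of the repaired state, assuming the superfluous include has been removed from b.h and a.h now includes c.h itself. To keep it compilable as one block, the headers are folded into a single translation unit and the #include lines appear only as comments. The inline keyword is an addition of this sketch, not part of the original example: it keeps function definitions in headers safe to include from several source files.

```cpp
// Single-file sketch of the repaired headers. Each "file" is a commented
// section and the #include lines are shown as comments, so that the block
// compiles as one translation unit.

// ---- d.h ----
void funcD1();
void funcD2();

// ---- c.h ----
// #include "d.h"
inline void funcC() {
    // do anything
}

// ---- b.h ---- (the superfluous #include "c.h" is gone)
inline int funcB() {
    // do anything
    return 0;
}

// ---- a.h ---- (self-sufficient: it includes what it uses)
// #include "b.h"
// #include "c.h"   // added, because funcA() calls funcC() directly
inline void funcA() {
    funcB();
    funcC();
}
```

With this layout a.h no longer cares whether b.h happens to include c.h, so cleaning up b.h later cannot break it.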
The same kind of problem shows up when you want to extract/split off certain logically independent functionality as modules. It can become a project on its own to resolve the compile dependencies if the logical dependency structure differs from the header dependency structure (see the figure with include and logical dependencies above).
How to avoid unnecessary include dependencies
To avoid the problems described above you should keep the following guidelines in mind:
- Avoid superfluous includes: Don’t include headers that are not used by the including file itself. Our example b.h is the counterexample: it includes c.h even though it doesn’t use it directly, which creates an unnecessary dependency. Remember that a superfluous include pulls in not just the unnecessarily included header, but recursively all of its dependencies as well!
- Header files should be “self-sufficient”: A header file is self-sufficient if it doesn’t depend on the context in which it is included to work correctly. This means it does not rely on a header file included somewhere else in your project in order to compile. So a self-sufficient header compiles on its own.
- Use forward declarations when possible: Don’t #include a header when a forward declaration is enough. This is a very effective method to avoid/cut include dependencies. This is also a useful technique to break cyclic dependencies.
- Clearly separate code and declarations: In the common case headers should contain just declarations, no code (templates are an exception). Code should be put into a corresponding implementation file (.cpp or whatever extension you use). “Code” usually has more include dependencies than its interface (the declarations). By separating code and declarations we avoid propagating include dependencies that are only used by the implementation.
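The last two guidelines can be sketched together. The classes Car and Engine and the file names in the comments are hypothetical, and the "files" are again folded into one block with their #include lines shown as comments. The section for car.h deliberately comes before engine.h, so the block only compiles because a forward declaration really is enough for the Car declaration.

```cpp
#include <memory>
#include <string>

// ---- car.h (hypothetical) ----
// No #include "engine.h" here: Car only stores a pointer to Engine,
// so a forward declaration is enough.
class Engine;

class Car {
public:
    Car();
    ~Car();  // defined in car.cpp, where Engine is a complete type
    std::string drive() const;
private:
    std::unique_ptr<Engine> engine_;
};

// ---- engine.h (hypothetical) ----
class Engine {
public:
    std::string start() const;
};

// ---- engine.cpp ----
// #include "engine.h"
std::string Engine::start() const { return "engine started"; }

// ---- car.cpp ----
// #include "car.h"
// #include "engine.h"   // only the implementation needs the full definition
Car::Car() : engine_(new Engine()) {}
Car::~Car() = default;
std::string Car::drive() const { return engine_->start(); }
```

In this sketch, files that include only car.h do not depend on engine.h at compile time, so a change to engine.h triggers a rebuild of car.cpp but not of the users of Car. The price is that Car must hold the Engine through a pointer and define its constructor and destructor in the implementation file.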
Additional tips - these are more related to architecture than to code:
- Think in modules: Try to “align” the physical structure and the logical structure: For each logical unit use a separate physical unit. For example: For each class (logical unit) use a dedicated .h/.cpp pair (physical unit). For a collection of related classes (higher level logical unit) use a dedicated library (higher level physical unit)…
- Distinguish between public and private interfaces on all levels of abstraction: E.g. distinguish between public includes (includes allowed to be included by higher levels of abstraction) and private includes (includes only allowed to be included “internally” by the current module itself).
A clear “physical+logical encapsulation” provides a consistent view of the structure of the software. This consistent view is important for handling both physical and logical dependencies.
Further reading: I should read the book “Large-Scale C++ Software Design” by John Lakos: It covers the problem of physical dependencies in depth. I stumbled upon this book several times and it is definitely at the top of my reading list.
I think I covered the most important aspects of C/C++ include dependencies. If you like or dislike this article, or have suggestions, opinions or questions, please leave a comment below. I appreciate any feedback.
Update: I wrote a follow up article on tools for analyzing and adjusting include dependencies: Open source tools to examine and adjust C/C++ include dependencies.