Videos
I want to understand header files better. Both C and C++ are not languages I work with commonly but I am interested in them.
If I have a file called myCoolFile.c and I want to call a function from my other file myOtherCoolerFile.c, how does that exactly work?
My understanding is that I use a header file but if I declare the function in that file then I run the issue of it not having a body. It only has a declaration.
I could define the function body in the header file but then I might as well have just included myOtherCoolerFile.c instead.
Sorry for the newbie question. I am not overly familiar with C/C++.
Header files (.h) are designed to provide the information that will be needed in multiple files. Things like class declarations, function prototypes, and enumerations typically go in header files. In a word, "declarations".
Code files (.cpp) are designed to provide the implementation information that only needs to be known in one file. In general, function bodies, and internal variables that should/will never be accessed by other modules, are what belong in .cpp files. In a word, "definitions".
The simplest question to ask yourself to determine what belongs where is "if I change this, will I have to change code in other files to make things compile again?" If the answer is "yes" it probably belongs in the header file; if the answer is "no" it probably belongs in the code file.
Fact is, in C++, this is somewhat more complicated that the C header/source organization.
What does the compiler see?
The compiler sees one big source (.cpp) file with its headers properly included. The source file is the compilation unit that will be compiled into an object file.
So, why are headers necessary?
Because one compilation unit could need information about an implementation in another compilation unit. So one can write for example the implementation of a function in one source, and write the declaration of this function in another source needing to use it.
In this case, there are two copies of the same information. Which is evil...
The solution is to share some details. While the implementation should remain in the Source, the declaration of shared symbols, like functions, or definition of structures, classes, enums, etc., could need to be shared.
Headers are used to put those shared details.
Move to the header the declarations of what need to be shared between multiple sources
Nothing more?
In C++, there are some other things that could be put in the header because, they need, too, be shared:
- inline code
- templates
- constants (usually those you want to use inside switches...)
Move to the header EVERYTHING what need to be shared, including shared implementations
Does it then mean that there could be sources inside the headers?
Yes. In fact, there are a lot of different things that could be inside a "header" (i.e. shared between sources).
- Forward declarations
- declarations/definition of functions/structs/classes/templates
- implementation of inline and templated code
It becomes complicated, and in some cases (circular dependencies between symbols), impossible to keep it in one header.
Headers can be broken down into three parts
This means that, in an extreme case, you could have:
- a forward declaration header
- a declaration/definition header
- an implementation header
- an implementation source
Let's imagine we have a templated MyObject. We could have:
// - - - - MyObject_forward.hpp - - - -
// This header is included by the code which need to know MyObject
// does exist, but nothing more.
template<typename T>
class MyObject ;
.
// - - - - MyObject_declaration.hpp - - - -
// This header is included by the code which need to know how
// MyObject is defined, but nothing more.
#include <MyObject_forward.hpp>
template<typename T>
class MyObject
{
public :
MyObject() ;
// Etc.
} ;
void doSomething() ;
.
// - - - - MyObject_implementation.hpp - - - -
// This header is included by the code which need to see
// the implementation of the methods/functions of MyObject,
// but nothing more.
#include <MyObject_declaration.hpp>
template<typename T>
MyObject<T>::MyObject()
{
doSomething() ;
}
// etc.
.
// - - - - MyObject_source.cpp - - - -
// This source will have implementation that does not need to
// be shared, which, for templated code, usually means nothing...
#include <MyObject_implementation.hpp>
void doSomething()
{
// etc.
} ;
// etc.
Wow!
In the "real life", it is usually less complicated. Most code will have only a simple header/source organisation, with some inlined code in the source.
But in other cases (templated objects knowing each others), I had to have for each object separate declaration and implementation headers, with an empty source including those headers just to help me see some compilation errors.
Another reason to break down headers into separate headers could be to speed up the compilation, limiting the quantity of symbols parsed to the strict necessary, and avoiding unecessary recompilation of a source who cares only for the forward declaration when an inline method implementation changed.
Conclusion
You should make your code organization both as simple as possible, and as modular as possible. Put as much as possible in the source file. Only expose in headers what needs to be shared.
But the day you'll have circular dependancies between templated objects, don't be surprised if your code organization becomes somewhat more "interesting" that the plain header/source organization...
^_^
.c : c file (where the real action is, in general)
.h : header file (to be included with a preprocessor #include directive). Contains stuff that is normally deemed to be shared with other parts of your code, like function prototypes, #define'd stuff, extern declaration for global variables (oh, the horror) and the like.
Technically, you could put everything in a single file. A whole C program. million of lines. But we humans tend to organize things. So you create different C files, each one containing particular functions. That's all nice and clean. Then suddenly you realize that a declaration you have into a given C file should exist also in another C file. So you would duplicate them. The best is therefore to extract the declaration and put it into a common file, which is the .h
For example, in the cs50.h you find what are called "forward declarations" of your functions. A forward declaration is a quick way to tell the compiler how a function should be called (e.g. what input params) and what it returns, so it can perform proper checking (for example if you call a function with the wrong number of parameters, it will complain).
Another example. Suppose you write a .c file containing a function performing regular expression matching. You want your function to accept the regular expression, the string to match, and a parameter that tells if the comparison has to be case insensitive.
in the .c you will therefore put
bool matches(string regexp, string s, int flags) { the code }
Now, assume you want to pass the following flags:
0: if the search is case sensitive
1: if the search is case insensitive
And you want to keep yourself open to new flags, so you did not put a boolean. playing with numbers is hard, so you define useful names for these flags
#define MATCH_CASE_SENSITIVE 0
#define MATCH_CASE_INSENSITIVE 1
This info goes into the .h, because if any program wants to use these labels, it has no way of knowing them unless you include the info. Of course you can put them in the .c, but then you would have to include the .c code (whole!) which is a waste of time and a source of trouble.
Of course, there is nothing that says the extension of a header file must be .h and the extension of a C source file must be .c. These are useful conventions.
E:\Temp> type my.interface
#ifndef MY_INTERFACE_INCLUDED
#define MYBUFFERSIZE 8
#define MY_INTERFACE_INCLUDED
#endif
E:\Temp> type my.source
#include <stdio.h>
#include "my.interface"
int main(void) {
char x[MYBUFFERSIZE] = {0};
x[0] = 'a';
puts(x);
return 0;
}
E:\Temp> gcc -x c my.source -o my.exe
E:\Temp> my
a
Header files are needed to declare functions and variables that are available. You might not have access to the definitions (=the .c files) at all; C supports binary-only distribution of code in libraries.
The compiler needs the information in the header files to know what functions, structures, etc are available and how to use them.
All languages needs this kind of information, although they retrieve the information in different ways. For example, a Java compiler does this by scanning either the class-file or the java source code to retrieve the information.
The drawback with the Java-way is that the compiler potentially needs to hold a much more of information in its memory to be able to do this. This is no big deal today, but in the seventies, when the C language was created, it was simply not possible to keep that much information in memory.
In short;
The header file defines the API for a module. It's a contract listing which methods a third party can call. The module can be considered a black box to third parties.
The implementation implements the module. It is the inside of the black box. As a developer of a module you have to write this, but as a user of a third party module you shouldn't need to know anything about the implementation. The header should contain all the information you need.
Some parts of a header file could be auto generated - the method declarations. This would require you to annotate the implementation as there are likely to be private methods in the implementation which don't form part of the API and don't belong in the header.
Header files sometimes have other information in them; type definitions, constant definitions etc. These belong in the header file, and not in the implementation.
The main reason for a header is to be able to #include it in some other file, so you can use the functions in one file from that other file. The header includes (only) enough to be able to use the functions, not the functions themselves, so (we hope) compiling it is considerably faster.
Maintaining the two separately most results from nobody ever having written an editor that automates the process very well. There's not really a lot of reason they couldn't do so, and a few have even tried to -- but the editors that have done so have never done very well in the market, and the more mainstream editors haven't adopted it.