a value indicating that a pointer does not refer to a valid object
In computing, a null pointer (sometimes shortened to nullptr or null) or null reference is a value indicating that the pointer or reference does not refer to an object. Programs routinely use … Wikipedia
Wikipedia
en.wikipedia.org › wiki › Null_pointer
Null pointer - Wikipedia
2 weeks ago - The C standard does not say that the null pointer is the same as the pointer to memory address 0, though that may be the case in practice. Dereferencing a null pointer is undefined behavior in C, and a conforming implementation is allowed to assume that any pointer that is dereferenced is not null.
Top answer
1 of 8
118

A NULL pointer points to memory that doesn't exist. This may be address 0x00000000 or any other implementation-defined value (as long as it can never be a real address). Dereferencing it means trying to access whatever is pointed to by the pointer. The * operator is the dereferencing operator:

int a, b, c; // some integers
int *pi;     // a pointer to an integer

a = 5;
pi = &a; // pi points to a
b = *pi; // b is now 5
pi = NULL;
c = *pi; // this is a NULL pointer dereference

This is exactly the same thing as a NullReferenceException in C#, except that pointers in C can point to any data object, even elements inside an array.

2 of 8
56

Dereferencing just means accessing the memory value at a given address. So when you have a pointer to something, to dereference the pointer means to read or write the data that the pointer points to.

In C, the unary * operator is the dereferencing operator. If x is a pointer, then *x is what x points to. The unary & operator is the address-of operator. If x is anything, then &x is the address at which x is stored in memory. The * and & operators are inverses of each other: if x is any data, and y is any pointer, then these equations are always true:

*(&x) == x
&(*y) == y

A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).

In most implementations, you will get a "segmentation fault" or "access violation" if you try to do so, which will almost always result in your program being terminated by the operating system. Here's one way a null pointer could be dereferenced:

int *x = NULL;  // x is a null pointer
int y = *x;     // CRASH: dereference x, trying to read it
*x = 0;         // CRASH: dereference x, trying to write it

And yes, dereferencing a null pointer is pretty much exactly like a NullReferenceException in C# (or a NullPointerException in Java), except that the language standard is a little more helpful here. In C#, dereferencing a null reference has well-defined behavior: it always throws a NullReferenceException. There's no way that your program could continue working silently or erase your hard drive like in C (unless there's a bug in the language runtime, but again that's incredibly unlikely as well).

Discussions

What happens when dereferencing a nullptr?
Dereferencing a null pointer is undefined behavior. In practice, trying to dereference null usually results in a seg-fault, but sometimes the compiler can optimize out the operation entirely. In your example, *p == true; doesn't actually change any of the program state, so the compiler is being smart and removing the extra computation. In the cout line, your program is actually using the result of the computation so it can't be removed. Note: In some cases an aggressive optimizer may recognize that dereferencing a null pointer would be undefined behavior and assume that the pointer therefore cannot be null. This can lead to some unintuitive and hard to find bugs. More on reddit.com
r/cpp_questions
20
14
August 18, 2022
Dereferencing null pointers - what does the standard say?
https://eel.is/c++draft/class.mfct.non-static If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined. More on reddit.com
r/cpp_questions
40
14
April 20, 2021
What happens in OS when we dereference a NULL pointer in C? - Stack Overflow
But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault. More on stackoverflow.com
stackoverflow.com
C++ standard: dereferencing NULL pointer to get a reference? - Stack Overflow
I'm wondering about what the C++ standard says about code like this: int* ptr = NULL; int& ref = *ptr; int* ptr2 = &ref; In practice the result is that ptr2 is NULL but I'm wondering, is t... More on stackoverflow.com
stackoverflow.com
Microsoft Learn
learn.microsoft.com › en-us › answers › questions › 433166 › dereferencing-null-pointer-in-c-visual-studio-2019
Dereferencing NULL pointer in C - Visual Studio 2019 - Microsoft Q&A
June 11, 2021 - int n; scanf_s("%d", &n); int** dArray = calloc(n, sizeof(int*)); if(dArray) { for (int r = 0; r < n; r++) { *(dArray + r) = calloc((r + 1), sizeof(int)); } } ... I suggest you read the guidance about validating a pointer before you use it in ...
SEI CERT
wiki.sei.cmu.edu › confluence › display › c › EXP34-C.+Do+not+dereference+null+pointers
EXP34-C. Do not dereference null pointers | CERT Secure Coding
In this noncompliant code example, input_str is copied into dynamically allocated memory referenced by c_str . If malloc() fails, it returns a null pointer that is assigned to c_str . When c_str is dereferenced in memcpy() , the program exhibits undefined behavior .
Reddit
reddit.com › r/cpp_questions › what happens when dereferencing a nullptr?
r/cpp_questions on Reddit: What happens when dereferencing a nullptr?
August 18, 2022 -

I saw this code in A Tour of C++, modified a bit for illustration:

#include <iostream>

int main() {
  char s = 'a';
  char *p = &s;
  while (*p) {
    std::cout << *p;
    p++;
  }
  p = nullptr;
  //std::cout << (*p == true);
  *p == true;
}

I do not know how while (*p) { ends, and I do not know what happens when p is nullptr. Also, std::cout << (*p == true) will induce a segmentation fault but *p == true does not.

Rip Tutorial
riptutorial.com › dereferencing a null pointer
C Language Tutorial => Dereferencing a null pointer
int * pointer = NULL; int value = *pointer; /* Dereferencing happens here */ A NULL pointer is guaranteed by the C standard to compare unequal to any pointer to a valid object, and dereferencing it invokes undefined behavior.
Snyk Learn
learn.snyk.io › home › security education › what is a null dereference? | tutorial & examples
What is a null dereference? | Tutorial & examples | Snyk Learn
August 15, 2024 - Pointers are variables that store the memory address of an object, and a null pointer dereference occurs when you try to access an object at a memory address that is null. In languages that use pointers, such as C and C++, null pointer dereferences ...
Mayhem Security
mayhem.security › blog › what-is-null-pointer-dereference
What Is Null Pointer Dereference? | Mayhem
June 1, 2022 - Null pointer dereferences are particularly common in C and C++ programs, since these languages do not automatically check for NULL pointers.
Sivanesh Waran
sivaneshwaran.com › home › klocwork › null pointer dereference in c
Null Pointer Dereference in C
May 5, 2023 - May 3, 2023 / By Sivanesh. Null pointer dereference is a common programming error that occurs when a program attempts to dereference a pointer that points to null or undefined memory.
Reddit
reddit.com › r/cpp_questions › dereferencing null pointers - what does the standard say?
r/cpp_questions on Reddit: Dereferencing null pointers - what does the standard say?
April 20, 2021 -
#include <iostream>

class greeter
{
public:
    void hello()
    {
        std::cout << "Hello world";
    }
};

int main()
{
    ((greeter*)nullptr)->hello();
}

runs with no warnings on -Weverything -Wall on gcc, no warnings on MSVC /W4 either.

https://godbolt.org/z/779Y4Ejzz

I'm sitting with the standard open but I must admit this is taking me forever to find. Do any of you know where to look?

EDIT: So far in my own research, I've got this from 21 years ago:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232

At one point we agreed that dereferencing a null pointer was not undefined; only using the resulting value had undefined behavior.

GeeksforGeeks
geeksforgeeks.org › c language › null-pointer-in-c
NULL Pointer in C - GeeksforGeeks
January 10, 2025 - To check for a null pointer before accessing any pointer variable. By doing so, we can perform error handling in pointer-related code, e.g., dereference a pointer variable only if it’s not NULL.
MITRE
cwe.mitre.org › data › definitions › 476.html
CWE - CWE-476: NULL Pointer Dereference (4.20)
Top answer
1 of 5
76

Short answer: it depends on a lot of factors, including the compiler, processor architecture, specific processor model, and the OS, among others.

Long answer (x86 and x86-64): Let's go down to the lowest level: the CPU. On x86 and x86-64, that code will typically compile into an instruction or instruction sequence like this:

movl $10, 0x00000000

Which says to "store the constant integer 10 at virtual memory address 0". The Intel® 64 and IA-32 Architectures Software Developer Manuals describe in detail what happens when this instruction gets executed, so I'm going to summarize it for you.

The CPU can operate in several different modes, several of which are for backwards compatibility with much older CPUs. Modern operating systems run user-level code in a mode called protected mode, which uses paging to convert virtual addresses into physical addresses.

For each process, the OS keeps a page table which dictates how the addresses are mapped. The page table is stored in memory in a specific format (and protected so that it cannot be modified by the user code) that the CPU understands. For every memory access that happens, the CPU translates it according to the page table. If the translation succeeds, it performs the corresponding read/write to the physical memory location.

The interesting things happen when the address translation fails. Not all addresses are valid, and if any memory access generates an invalid address, the processor raises a page fault exception. This triggers a transition from user mode (aka current privilege level (CPL) 3 on x86/x86-64) into kernel mode (aka CPL 0) to a specific location in the kernel's code, as defined by the interrupt descriptor table (IDT).

The kernel regains control and, based on the information from the exception and the process's page table, figures out what happened. In this case, it realizes that the user-level process accessed an invalid memory location, and then it reacts accordingly. On Windows, it will invoke structured exception handling to allow the user code to handle the exception. On POSIX systems, the OS will deliver a SIGSEGV signal to the process.

In other cases, the OS will handle the page fault internally and restart the process from its current location as if nothing happened. For example, guard pages are placed at the bottom of the stack to allow the stack to grow on demand up to a limit, instead of preallocating a large amount of memory for the stack. Similar mechanisms are used for achieving copy-on-write memory.

In modern OSes, the page tables are usually set up to make the address 0 an invalid virtual address. But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault.

The above discussion is all about what happens when the original code is running in user space. But this could also happen inside the kernel. The kernel can (and is certainly much more likely than user code to) map the virtual address 0, so such a memory access would be normal. But if it's not mapped, then what happens then is largely similar: the CPU raises a page fault error which traps into a predefined point at the kernel, the kernel examines what happened, and reacts accordingly. If the kernel can't recover from the exception, it will typically panic in some fashion (kernel panic, kernel oops, or a BSOD on Windows, e.g.) by printing out some debug information to the console or serial port and then halting.

See also Much ado about NULL: Exploiting a kernel NULL dereference for an example of how an attacker could exploit a null pointer dereference bug from inside the kernel in order to gain root privileges on a Linux machine.

2 of 5
6

As a side note, just to compel the differences in architectures, a certain OS developed and maintained by a company known for their three-letter acronym name and often referred to as a large primary color has a most-fascinating NULL determination.

They utilize a 128-bit linear address space for ALL data (memory AND disk) in one giant "thing". In accordance with their OS, a "valid" pointer must be placed on a 128-bit boundary within that address space. This, btw, causes fascinating side effects for structs, packed or not, that house pointers. Anyway, tucked away in a per-process dedicated page is a bitmap that assigns one bit for every valid location in a process address space where a valid pointer can lie. ALL opcodes on their hardware and OS that can generate and return a valid memory address and assign it to a pointer will set the bit that represents the memory address where that pointer (the target pointer) is located.

So why should anyone care? For this simple reason:

int a = 0;
int *p = &a;
int *q = p-1;

if (p)
{
// p is valid, p's bit is lit, this code will run.
}

if (q)
{
   // the address stored in q is not valid. q's bit is not lit. this will NOT run.
}

What is truly interesting is this.

if (p == NULL)
{
   // p is valid. this will NOT run.
}

if (q == NULL)
{
   // q is not valid, and therefore treated as NULL, this WILL run.
}

if (!p)
{
   // same as before. p is valid, therefore this won't run
}

if (!q)
{
   // same as before, q is NOT valid, therefore this WILL run.
}

It's something you have to see to believe. I can't even imagine the housekeeping done to maintain that bit map, especially when copying pointer values or freeing dynamic memory.

Unstop
unstop.com › home › blog › null pointer in c | a detailed explanation with examples
Null Pointer In C | A Detailed Explanation With Examples
May 3, 2024 - This means that uninitializedPtr ... to dereference uninitializedPtr by assigning the value 42 to the memory location it points to using the indirection/ dereferencing operator (*)....
Secure Coding Blog
blog.bytehackr.in › understanding-and-preventing-null-pointer-dereference
Top 5 Way to Prevent NULL Pointer Dereference
May 8, 2025 - Memory allocation failures: Dynamic memory allocation functions like malloc() in C/C++ return NULL when they fail to allocate the requested memory. If the program does not handle this failure properly and attempts to use the returned NULL pointer, a null pointer dereference can occur.
Quora
quora.com › What-is-dereferencing-a-null-pointer-in-the-C-language
What is dereferencing a null pointer in the C language? - Quora
Answer: A NULL pointer is intended to be used to indicate that nothing is being pointed to. For example, in a singly-linked list, the pointer from one element to the next element will typically be a NULL pointer in the last element in the list. Similarly, a leaf node in a binary tree will have tw...
Quora
quora.com › How-do-I-avoid-dereferencing-null-pointers-in-C
How to avoid dereferencing null pointers in C - Quora
Answer (1 of 4): This is like asking “How do I avoid getting run over when I cross a street?” The answer? You check for cars before crossing. To avoid dereferencing a NULL pointer, check to make sure it’s not NULL before dereferencing it. That’s it. No fancy tricks.
Quora
quora.com › What-does-it-mean-when-you-dereference-a-pointer-and-its-null
What does it mean when you dereference a pointer and it's null? - Quora
Answer (1 of 3): It means that in C the program will crash with a segmentation fault; it is not allowed to access address 0x0000000000 in memory for user programs. In a language that uses pointers but have exception mechanisms ...
Cplusplus
cplusplus.com › forum › beginner › 71010
Dereferencing Null Pointer - C++ Forum
If you were trying to dereference a null pointer, that would say location 0x00000000. ... The error messages don't show anything about a null pointer. You should look at the beginning of your function b2World::Step(). It seems, you used a deallocated pointer or called a function, which returned a pointer to local data within the called function, and which now are destroyed and deallocated.