I saw this code in A Tour of C++, slightly modified for illustration:
#include <iostream>
int main() {
char s = 'a';
char *p = &s;
while (*p) {
std::cout << *p;
p++;
}
p = nullptr;
//std::cout << (*p == true);
*p == true;
}
I don't understand how while (*p) ever ends, and I don't know what happens when p is nullptr. Also, std::cout << (*p == true) causes a segmentation fault, but *p == true on its own does not.
A NULL pointer points to memory that doesn't exist. This may be address 0x00000000 or any other implementation-defined value (as long as it can never be a real address). Dereferencing it means trying to access whatever is pointed to by the pointer. The * operator is the dereferencing operator:
int a, b, c; // some integers
int *pi; // a pointer to an integer
a = 5;
pi = &a; // pi points to a
b = *pi; // b is now 5
pi = NULL;
c = *pi; // this is a NULL pointer dereference
This is exactly the same thing as a NullReferenceException in C#, except that pointers in C can point to any data object, even elements inside an array.
Dereferencing just means accessing the memory value at a given address. So when you have a pointer to something, to dereference the pointer means to read or write the data that the pointer points to.
In C, the unary * operator is the dereferencing operator. If x is a pointer, then *x is what x points to. The unary & operator is the address-of operator. If x is anything, then &x is the address at which x is stored in memory. The * and & operators are inverses of each other: if x is any data, and y is any pointer, then these equations are always true:
*(&x) == x
&(*y) == y
A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).
In most implementations, you will get a "segmentation fault" or "access violation" if you try to do so, which will almost always result in your program being terminated by the operating system. Here's one way a null pointer could be dereferenced:
int *x = NULL; // x is a null pointer
int y = *x; // CRASH: dereference x, trying to read it
*x = 0; // CRASH: dereference x, trying to write it
And yes, dereferencing a null pointer is pretty much exactly like a NullReferenceException in C# (or a NullPointerException in Java), except that the language standard is a little more helpful here. In C#, dereferencing a null reference has well-defined behavior: it always throws a NullReferenceException. There's no way that your program could continue working silently or erase your hard drive like in C (unless there's a bug in the language runtime, but again that's incredibly unlikely as well).
What happens in OS when we dereference a NULL pointer in C?
Short answer: it depends on a lot of factors, including the compiler, processor architecture, specific processor model, and the OS, among others.
Long answer (x86 and x86-64): Let's go down to the lowest level: the CPU. On x86 and x86-64, a store of 10 through a null int pointer will typically compile into an instruction or instruction sequence like this:
movl $10, 0x00000000
This says to "store the constant integer 10 at virtual memory address 0". The Intel® 64 and IA-32 Architectures Software Developer's Manuals describe in detail what happens when this instruction is executed, so I'm going to summarize it for you.
The CPU can operate in several different modes, several of which are for backwards compatibility with much older CPUs. Modern operating systems run user-level code in a mode called protected mode, which uses paging to convert virtual addresses into physical addresses.
For each process, the OS keeps a page table which dictates how the addresses are mapped. The page table is stored in memory in a specific format that the CPU understands (and is protected so that user code cannot modify it). For every memory access, the CPU translates the address according to the page table. If the translation succeeds, it performs the corresponding read or write to the physical memory location.
The interesting things happen when the address translation fails. Not all addresses are valid, and if any memory access generates an invalid address, the processor raises a page fault exception. This triggers a transition from user mode (aka current privilege level (CPL) 3 on x86/x86-64) into kernel mode (aka CPL 0) to a specific location in the kernel's code, as defined by the interrupt descriptor table (IDT).
The kernel regains control and, based on the information from the exception and the process's page table, figures out what happened. In this case, it realizes that the user-level process accessed an invalid memory location, and then it reacts accordingly. On Windows, it will invoke structured exception handling to allow the user code to handle the exception. On POSIX systems, the OS will deliver a SIGSEGV signal to the process.
In other cases, the OS handles the page fault internally and resumes the process at the faulting instruction as if nothing had happened. For example, guard pages are placed at the bottom of the stack so the stack can grow on demand up to a limit, instead of preallocating a large amount of memory for it. Similar mechanisms are used to implement copy-on-write memory.
In modern OSes, the page tables are usually set up to make the address 0 an invalid virtual address. But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault.
The above discussion is all about what happens when the original code is running in user space. But this could also happen inside the kernel. The kernel can map virtual address 0 (and is far more likely than user code to do so), in which case such a memory access would be normal. But if it's not mapped, what happens is largely similar: the CPU raises a page fault, which traps into a predefined point in the kernel; the kernel examines what happened and reacts accordingly. If the kernel can't recover from the exception, it will typically panic in some fashion (e.g., a kernel panic, a kernel oops, or a BSOD on Windows), printing some debug information to the console or serial port and then halting.
See also Much ado about NULL: Exploiting a kernel NULL dereference for an example of how an attacker could exploit a null pointer dereference bug from inside the kernel in order to gain root privileges on a Linux machine.
As a side note, just to illustrate the differences between architectures: a certain OS, developed and maintained by a company known by its three-letter acronym and often referred to by a large primary color, has a most fascinating way of determining NULL.
They utilize a 128-bit linear address space for ALL data (memory AND disk) in one giant "thing". In accordance with their OS, a "valid" pointer must be placed on a 128-bit boundary within that address space. This, btw, causes fascinating side effects for structs, packed or not, that house pointers. Anyway, tucked away in a per-process dedicated page is a bitmap that assigns one bit for every valid location in a process address space where a valid pointer can lie. ALL opcodes on their hardware and OS that can generate and return a valid memory address and assign it to a pointer will set the bit that represents the memory address where that pointer (the target pointer) is located.
So why should anyone care? For this simple reason:
int a = 0;
int *p = &a;
int *q = p-1;
if (p)
{
// p is valid, p's bit is lit, this code will run.
}
if (q)
{
// the address stored in q is not valid. q's bit is not lit. this will NOT run.
}
What is truly interesting is this.
if (p == NULL)
{
// p is valid. this will NOT run.
}
if (q == NULL)
{
// q is not valid, and therefore treated as NULL, this WILL run.
}
if (!p)
{
// same as before. p is valid, therefore this won't run
}
if (!q)
{
// same as before, q is NOT valid, therefore this WILL run.
}
It's something you have to see to believe. I can't even imagine the housekeeping done to maintain that bitmap, especially when copying pointer values or freeing dynamic memory.
I tried out some code:
use std::{mem, ptr};

fn main() {
    // Create a const NULL pointer
    let nullp: *const u128 = ptr::null();
    println!("size of null pointer:{}", mem::size_of_val(&nullp));
    unsafe {
        println!("null pointer is pointing to address {:p}", nullp);
        // Dereferences the null pointer: this is the line that crashes
        println!("value the null pointer pointed is {}", *nullp);
    }
}
the output:
size of null pointer:8
null pointer is pointing to address 0x0
Segmentation fault (core dumped)
I wonder what happens at the low level when I dereference a null pointer pointing to virtual memory address 0x0.
And what's the actual memory layout of a null pointer?
It is not the compiler that causes your program to crash on dereferencing a null pointer. The problem is that the pointer is pointing to memory that it is illegal to reference, and the operating system kills your program for invalid behavior.
Trying to trick the compiler by obfuscating that it is a null pointer won't work, because it isn't the compiler that detects it.
There is no legitimate reason to dereference a null pointer unless you are on a rare system that maps page zero (or you intend your program to crash). It is generally accepted that zeroing a pointer is a good way to mark it as invalid, and dereferencing an invalid pointer is a bug. Modern operating systems do not give you a page of memory at that address specifically to make debugging invalid pointers easier.
I would not even call your program's crash here undefined behavior. Dereferencing a pointer with random data in it would give you undefined behavior. Dereferencing a pointer that contains an address not assigned to your program is quite well defined in demand-paged, memory-protected operating systems, and the behavior the operating system defines is for your program to crash. From the language's perspective, it is still undefined behavior, because what happens is not defined within the scope of the language. Since this behavior is undefined by the language, the compiler can do nothing about it and should do nothing about it.
The exception to this is systems that have no memory protection and systems that intentionally map page zero. Some older systems do this, but most of the modern systems that do are microcontrollers, some of which might even have memory mapped I/O or some other special purpose memory in page zero.
Since null pointer dereferences are typically bugs, it is unlikely a compiler would bother to optimize away null pointer dereferences or put guard code around a possible one, as this would not improve code performance. If a compiler did bother to detect this, it would do so to emit a warning to assist you in debugging, similar to the "code not reachable" warning. The only reason for the compiler to generate different code around one would be if it knew what you were trying to do.
You seem to have a misunderstanding of what Undefined Behavior means.
Undefined Behavior is not something that is "caused" by your code. It is not something that happens. It is something that is.
If you have some piece of code somewhere that dereferences a null pointer, that is Undefined Behavior. UB gives the compiler a lot of leeway.
The way this is usually phrased is that the compiler is allowed to do anything. It is allowed to compile code that dereferences a null pointer into code that formats your hard disk. It is allowed to compile it into code that crashes. It is allowed to compile it into code that does random things. It is even allowed to compile it into code that doesn't crash.
And until a couple of years ago, that's mostly what compilers did. However, that isn't even the most dangerous part.
There is one thing the compiler is also allowed to do: because you are not allowed to write code that exhibits UB, the compiler is allowed to assume that there will be no UB, when optimizing your code. And because of the complex optimizations that modern compilers do, this can have very weird consequences.
Let's say you have an if (userId == 0) statement, where you have UB in the else part. Since you are not allowed to write code that exhibits UB, the compiler is allowed to assume that the else branch will never be taken. This means that the compiler is allowed to assume that userId will always be 0, i.e. it is allowed to assume that the user is always root! And based on this assumption, it is allowed to optimize away other checks as well, opening you up to huge security holes.
This can lead to very extreme, or even worse, very subtle changes to the behavior of program parts far away from the place of the UB.