I saw this code in A Tour of C++, slightly modified here for illustration:
#include <iostream>
int main() {
char s = 'a';
char *p = &s;
while (*p) {
std::cout << *p;
p++;
}
p = nullptr;
//std::cout << (*p == true);
*p == true;
}
I do not understand how while (*p) ends, and I do not know what happens when p is nullptr. Also, std::cout << (*p == true) causes a segmentation fault, but *p == true does not.
It is not the compiler that causes your program to crash on dereferencing a null pointer. The problem is that the pointer is pointing to memory that it is illegal to reference, and the operating system kills your program for invalid behavior.
Trying to trick the compiler by obfuscating that it is a null pointer won't work, because it isn't the compiler that detects it.
There is no legitimate reason to dereference a null pointer unless you are on a rare system that maps page zero (or you intend your program to crash). It is generally accepted that zeroing a pointer is a good way to mark it as invalid, and that dereferencing an invalid pointer is a bug. Modern operating systems do not give you a page of memory at that address, specifically to make debugging invalid pointers easier.
I would not even call the crash itself undefined behavior. Dereferencing a pointer with random data in it would give you undefined behavior. Dereferencing a pointer that contains an address not assigned to your program is quite well defined in demand-paged, memory-protected operating systems, and the behavior defined by the operating system is for your program to crash. From the language's perspective, it is still undefined behavior, because what happens is not defined in the scope of the language. Since this behavior is undefined by the language, the compiler can do nothing about it and should do nothing about it.
The exception to this is systems that have no memory protection and systems that intentionally map page zero. Some older systems do this, but most of the modern systems that do are microcontrollers, some of which might even have memory mapped I/O or some other special purpose memory in page zero.
Since null pointer dereferences are typically bugs, it is unlikely a compiler would bother to optimize away null pointer dereferences or put guard code around a possible one, as this would not improve code performance. If they did even bother to detect this, they would do it to emit a warning to assist you in debugging, similar to the "code not reachable" warning. The only reason for the compiler to generate different code around one would be if it knew what you were trying to do.
You seem to have a misunderstanding of what Undefined Behavior means.
Undefined Behavior is not something that is "caused" by your code. It is not something that happens. It is something that is.
If you have some piece of code somewhere that dereferences a null pointer, that is Undefined Behavior. UB gives the compiler a lot of leeway.
The way this is usually phrased is that the compiler is allowed to do anything. It is allowed to compile code that dereferences a null pointer into code that formats your hard disk. It is allowed to compile it into code that crashes. It is allowed to compile it into code that does random things. It is even allowed to compile it into code that doesn't crash.
And until a couple of years ago, that's mostly what compilers did. However, that isn't even the most dangerous part.
There is one thing the compiler is also allowed to do: because you are not allowed to write code that exhibits UB, the compiler is allowed to assume that there will be no UB, when optimizing your code. And because of the complex optimizations that modern compilers do, this can have very weird consequences.
Let's say you have an if (userId == 0) statement, where you have UB in the else part. Since you are not allowed to write code that exhibits UB, the compiler is allowed to assume that the else branch will never be taken. This means that the compiler is allowed to assume that userId will always be 0, i.e. it is allowed to assume that the user is always root! And based on this assumption, it is allowed to optimize away other checks as well, opening you up to huge security holes.
This can lead to very extreme, or even worse, very subtle changes to the behavior of program parts far away from the place of the UB.
As you quote C, dereferencing a null pointer is clearly undefined behavior from this Standard quote (emphasis mine):
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"
102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."
Exact same quote in C99 and similar in C89 / C90.
C++
dcl.ref/5.
There shall be no references to references, no arrays of references, and no pointers to references. The declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. — end note ]
The note is of interest, as it explicitly says dereferencing a null pointer is undefined.
I'm sure it says it somewhere else in a more relevant context, but this is good enough.
Just watched this video: https://www.youtube.com/watch?v=ROJ3PdDmirY which explains how Google manages to take down the internet (or at least: many sites) through a null pointer dereference.
Given that C++ has "nullptr" and that you can initialize stuff with it, and that you can (probably) statically check that variables / class members are initialized and balk if not, why isn't derefencing nullptr an exception? That would be the missing bit towards another bit of security in C++. So, why?
TL;DR &(*(char*)0) is well defined.
The C++ standard doesn't say that indirection of null pointer by itself has UB. Current standard draft, [expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”. [snip]
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. [snip]
There is no UB unless the lvalue of the indirection expression is converted to an rvalue.
The C standard is much more explicit. C11 standard draft §6.5.3.2
- The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.
If it is also undefined behavior, how does offsetof work?
Prefer using the standard offsetof macro. Home-grown versions result in compiler warnings. Moreover:
offsetof is required to work as specified above, even if unary operator& is overloaded for any of the types involved. This cannot be implemented in standard C++ and requires compiler support.
In GCC, offsetof is implemented with a compiler built-in (__builtin_offsetof).
I thought that any dereferencing for a null pointer would result in an exception.
No. Dereferencing a null pointer is undefined behavior in C++.
C++ is not Java. C++ does have exceptions, but they are only for exceptional cases and are not used all over the place (as in Java). You are supposed to know that dereferencing a null pointer is not allowed, and a compiler assumes that it never happens in correct code. If it still happens, your code is invalid.
Read about undefined behavior. It is essential to know about it when you want to do anything serious in C++.
What are the rules for a valid dereferencing of a null pointer?
The rule is: you shall not do it. When you do, your code has undefined behavior. The compiler is not required to issue an error or warning, and when you ask a compiler to compile such code, the result can be anything.
What are the rules for a valid dereferencing of a null pointer [in C++]?
The C++ standard is actually somewhat non-specific about whether indirecting through a null pointer is valid by itself or not. It is not disallowed explicitly. The standard used to use "dereferencing the null pointer" as an example of undefined behaviour, but this example has since been removed.
There is an active core language issue, CWG-232, titled "Is indirection through a null pointer undefined behavior?", where this is discussed. It has a proposed change of wording to explicitly allow indirection through a null pointer, and even to allow "empty" references in the language. The issue was created 20 years ago and was last updated 15 years ago, when the proposed wording was found insufficient.
Here are a few examples:
X* ptr = nullptr;
*ptr;
Above, the result of the indirection is discarded. This is a case where the standard is not explicit about its validity one way or another. The proposed wording would have allowed this explicitly. This is also a fairly pointless operation.
X& x = *ptr;
X* ptr2 = &x; // ptr2 == nullptr?
Above, the result of indirection through null is bound to an lvalue reference. This is explicitly undefined behaviour now, but the proposed wording would have allowed this.
ptr->member_function();
Above, the result of indirection goes through lvalue-to-rvalue conversion. This has undefined behaviour regardless of what the function does, and would remain undefined in the proposed resolution of CWG-232. Same applies to all of your examples.
One consequence of this is that return this == nullptr; can be optimised to return false; because this can never be null in a well defined program.