A string literal is an array of characters* (with static storage), which contains all the characters in the literal along with a terminator. The size of an array is the size of the element multiplied by the number of elements in the array.
The literal "" is an array that consists of one char with the value 0. The type is char[1], and sizeof(char) is always one; therefore sizeof(char[1]) is always one.
In C, NULL is implementation-defined, and is often ((void*)0). The size of a void*, on your particular implementation, is 4. It may be a different number depending on the platform you run on. NULL may also expand to an integer of some type of the value 0, and you'd get the size of that instead.
*A literal is not a pointer, arrays are not pointers, pointers do not play a role in this part of the question.
Answer from GManNickG on Stack Overflow
The empty string "" has type char[1], or "array 1 of char". Contrary to what many people believe, it is not a pointer. It can decay into a pointer, so any time a pointer to char is expected, you can use an array of char instead, and the array will decay into a pointer to its first element.
Since sizeof(char) is 1 (by definition), we therefore have sizeof("") is sizeof(char[1]), which is 1*1 = 1.
In C, NULL is an "implementation-defined null pointer constant" (C99 §7.17.3). A "null pointer constant" is defined to be an integer expression with the value 0, or such an expression cast to type void * (C99 §6.3.2.3.3). So the actual value of sizeof(NULL) is implementation-defined: you might get sizeof(int), or you might get sizeof(void*). On 64-bit systems, you often have sizeof(int) == 4 and sizeof(void*) == 8, which means you can't depend on what sizeof(NULL) is.
Also note that most C implementations define NULL as ((void*)0) (though this is not required by the standard), whereas most C++ implementations just define NULL as a plain 0. This means that the value of sizeof(NULL) can and will change depending on if code is compiled as C or as C++ (for example, code in header files shared between C and C++ source files). So do not depend on sizeof(NULL).
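As a rough illustration of the points above, here is a minimal sketch; the function names are made up for this example. The first two results are fixed by the language, while the third depends on how your implementation defines NULL.

```c
#include <stddef.h>  /* NULL, size_t */

/* The empty literal is a char[1] holding only the terminator. */
size_t empty_literal_size(void)
{
    return sizeof("");
}

/* Four characters plus the terminator: a char[5]. */
size_t abcd_literal_size(void)
{
    return sizeof("abcd");
}

/* Implementation-defined: the size of void * or of some integer type,
   depending on how the implementation defines NULL. */
size_t null_macro_size(void)
{
    return sizeof(NULL);
}
```

Printing these with %zu on a typical 64-bit C implementation would likely show 1, 5 and 8, but only the first two values are portable.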
Since 0 is an int like any other integer constant, sizeof(0) yields 4 bytes on my system.
sizeof(NULL) yields 8 bytes, that is, 8 x 8 = 64 bits, all of them 0.
Pointers take 8 bytes, whereas characters take 1 byte and integers 4 bytes. Is 8 bytes the maximum size of any data type? I believe so, since NULL is apparently 8 bytes wide precisely so that it can denote 0 for all data types.
sizeof is an operator, not a function.
You would be reminded of this if you dropped the pointless parentheses, and just wrote it:
printf("%zu", sizeof *abcp);
This also uses the C99-proper way to format a value of type size_t, which is %zu.
It works since the compiler computes the size at compile-time, without ever following (dereferencing) the pointer of course (since the pointer doesn't yet exist; the program isn't running).
sizeof is not a function and it doesn't evaluate its argument. Instead it deduces the type of *abcp, at compile time, and reports the size of that. Since abcp is a struct abc*, the type of *abcp is struct abc regardless of where abcp points.
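A small sketch of this, assuming a made-up layout for struct abc (the original question's definition is not shown here) and a hypothetical function name:

```c
#include <stddef.h>  /* NULL, size_t */

/* Hypothetical layout; any complete struct type would do. */
struct abc {
    int    x;
    double y;
};

size_t pointee_size(void)
{
    struct abc *abcp = NULL;

    /* The operand is not evaluated, so the null pointer is never
       dereferenced. Only the type of *abcp, struct abc, matters. */
    return sizeof *abcp;
}
```

The result is the same as sizeof(struct abc), wherever abcp happens to point.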
sizeof(a) gives you the size of the pointer, not of the array of characters the pointer points to. It's the same as if you had said sizeof(char*).
You need to use strlen() to compute the length of a null-terminated string (note that the length returned does not include the null terminator, so strlen("abcd") is 4, not 5). Or, you can initialize an array with the string literal:
char a[] = "abcd";
size_t sizeof_a = sizeof(a); // sizeof_a is 5, because 'a' is an array not a pointer
The string literal "abcd" is null terminated; all string literals are null terminated.
You get 4 because that's the size of a pointer on your system. If you want to get the length of a nul terminated string, you want the strlen function in the C standard library.
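To put the three measurements side by side, here is a minimal sketch (the function names are invented for this example):

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Size of the pointer itself: 4 or 8 on most current platforms. */
size_t pointer_size(void)
{
    const char *a = "abcd";
    return sizeof(a);
}

/* Size of the array, terminator included. */
size_t array_size(void)
{
    char a[] = "abcd";
    return sizeof(a);
}

/* Number of characters before the terminator. */
size_t string_length(void)
{
    const char *a = "abcd";
    return strlen(a);
}
```

Here array_size() returns 5 and string_length() returns 4, while pointer_size() returns whatever sizeof(char *) is on your platform.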
Just to underline something that was pointed out in the comments:
[does] '\0' take up 1 byte [or does it] take up the size of a char
These are the same thing, because a char is one byte by definition. sizeof(char) == 1 will always be true, no matter what your implementation of C is.
The idiomatic way to write your malloc call is
malloc(1 + 1); /* one character, + terminating NUL */
The only time you should ever write sizeof(char) in your code, is if you need to force an expression to have type size_t, but you can't include stddef.h for some bizarre reason.
(It is possible, although very unlikely, for a char to be bigger than one octet, that is, for it to contain more than eight bits. For instance, a C implementation for the PDP-10 would probably make char contain nine bits, and there have been word-oriented processors where char had to be 16 or 32 bits. On such implementations, sizeof(char) is still 1, and a char is still considered to be the same thing as a "byte", but the macro CHAR_BIT (defined in limits.h) will have a value larger than 8.)
(It is not possible for char to contain fewer than eight bits, because a char is required to be able to represent the numeric range −127 ≤ x ≤ +127, which does not fit in seven bits.)
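If you want the compiler to confirm these guarantees, something like the following sketch should work, assuming a C11 compiler (for _Static_assert). The helper bits_in is a made-up name.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Both conditions are guaranteed by the standard, so neither
   assertion can fail on a conforming implementation. */
_Static_assert(sizeof(char) == 1, "a char is one byte by definition");
_Static_assert(CHAR_BIT >= 8, "a byte has at least eight bits");

/* Converts a size in bytes, as reported by sizeof, to a size in bits. */
size_t bits_in(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

On mainstream hardware bits_in(sizeof(char)) is 8; on the unusual machines mentioned above it would be larger, while sizeof(char) stays 1.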
'\0' takes up one byte of space. You probably want to malloc(length_of_string + 1)
So for "abc", you would allocate a total of 4 bytes.
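As a sketch of that allocation pattern, here is a hypothetical helper, copy_string, that duplicates a string and reserves the extra byte for the terminator:

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

/* Returns a heap-allocated copy of src, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src)
{
    size_t len = strlen(src);       /* characters, terminator excluded */
    char *copy = malloc(len + 1);   /* + 1 for the terminating '\0' */

    if (copy == NULL)
        return NULL;

    memcpy(copy, src, len + 1);     /* copies the terminator as well */
    return copy;
}
```

For "abc" this requests 4 bytes. No sizeof(char) factor is needed, because it is always 1.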
In this Stack Overflow post [1] I stumbled upon a 'trick' to get the size of a struct member: sizeof(((struct*)0)->member). I struggle to comprehend what's happening here.
what I understand:
- sizeof calculates the size, as normal
- ->member dereferences as usual
what I don't understand:
- (struct*) 0 is a typecast (?) of a nullptr (?) to address 0 (?)
Can someone dissect this syntax and explain in detail what happens under the hood?
[1] https://stackoverflow.com/a/3553321/18918472
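One way to read the expression, sketched with a made-up struct record and a hypothetical MEMBER_SIZE macro: (type *)0 converts the null pointer constant 0 to a pointer to the struct type, ->member names a member through that pointer, and sizeof only looks at the type of the result. Since the operand of sizeof is not evaluated, nothing is ever read from address 0.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical struct used only for this illustration. */
struct record {
    char   tag;
    int    id;
    double values[4];
};

/* (type *)0 is a null pointer of type "pointer to type".
   ->member selects the member, but inside sizeof the expression is
   only inspected for its type and never executed. */
#define MEMBER_SIZE(type, member) (sizeof(((type *)0)->member))

size_t id_size(void)
{
    return MEMBER_SIZE(struct record, id);
}

size_t values_size(void)
{
    return MEMBER_SIZE(struct record, values);
}
```

Here id_size() equals sizeof(int), and values_size() equals 4 * sizeof(double) because values is an array member rather than a pointer.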
Note: This answer applies to the C language, not C++.
Null Pointers
The integer constant literal 0 has different meanings depending upon the context in which it's used. In all cases, it is still an integer constant with the value 0, it is just described in different ways.
If a pointer is being compared to the constant literal 0, then this is a check to see if the pointer is a null pointer. This 0 is then referred to as a null pointer constant. The C standard defines that 0 cast to the type void * is both a null pointer and a null pointer constant.
Additionally, to help readability, the macro NULL is provided in the header file stddef.h. Depending upon your compiler it might be possible to #undef NULL and redefine it to something wacky.
Therefore, here are some valid ways to check for a null pointer:
if (pointer == NULL)
NULL is defined to compare equal to a null pointer. It is implementation defined what the actual definition of NULL is, as long as it is a valid null pointer constant.
if (pointer == 0)
0 is another representation of the null pointer constant.
if (!pointer)
This if statement implicitly checks "is not 0", so we reverse that to mean "is 0".
The following are INVALID ways to check for a null pointer:
int mynull = 0;
<some code>
if (pointer == mynull)
To the compiler this is not a check for a null pointer, but an equality check on two variables. This might work if mynull never changes in the code and the compiler optimizations constant fold the 0 into the if statement, but this is not guaranteed and the compiler has to produce at least one diagnostic message (warning or error) according to the C Standard.
Note that, as far as the C language is concerned, the value of a null pointer on the underlying architecture does not matter. If the underlying architecture has a null pointer value defined as address 0xDEADBEEF, then it is up to the compiler to sort this mess out.
As such, even on this funny architecture, the following ways are still valid ways to check for a null pointer:
if (!pointer)
if (pointer == NULL)
if (pointer == 0)
The following are INVALID ways to check for a null pointer:
#define MYNULL (void *) 0xDEADBEEF
if (pointer == MYNULL)
if (pointer == 0xDEADBEEF)
as these are seen by a compiler as normal comparisons.
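The three valid checks can be put side by side in a short sketch (the function names are invented for this example). Each one returns 1 for a null pointer and 0 otherwise, whatever bit pattern the architecture uses for null:

```c
#include <stddef.h>  /* NULL */

/* Comparison against the NULL macro. */
int is_null_macro(const void *pointer)
{
    return pointer == NULL;
}

/* Comparison against the constant 0, a null pointer constant here. */
int is_null_zero(const void *pointer)
{
    return pointer == 0;
}

/* Logical negation: true exactly when the pointer is null. */
int is_null_not(const void *pointer)
{
    return !pointer;
}
```

All three agree on every input, so choosing between them is a matter of style.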
Null Characters
'\0' is defined to be a null character - that is a character with all bits set to zero. '\0' is (like all character literals) an integer constant, in this case with the value zero. So '\0' is completely equivalent to an unadorned 0 integer constant - the only difference is in the intent that it conveys to a human reader ("I'm using this as a null character.").
'\0' has nothing to do with pointers. However, you may see something similar to this code:
if (!*char_pointer)
checks if the char pointer is pointing at a null character.
if (*char_pointer)
checks if the char pointer is pointing at a non-null character.
Don't get these confused with null pointers. Just because the bit representation is the same, and this allows for some convenient cross over cases, they are not really the same thing.
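For example, the two character checks above might be used like this. This is a minimal sketch with made-up function names, and both functions assume they are given a valid, non-null pointer to a terminated string.

```c
#include <stddef.h>  /* size_t */

/* Returns 1 if the string is empty, i.e. its first character is '\0'. */
int is_empty(const char *char_pointer)
{
    return !*char_pointer;
}

/* Counts characters up to, but not including, the null character. */
size_t count_chars(const char *char_pointer)
{
    size_t count = 0;

    while (*char_pointer) {  /* stops at the null character */
        ++count;
        ++char_pointer;
    }
    return count;
}
```

Note that both functions inspect the character being pointed at, not the pointer itself.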
References
See Question 5.3 of the comp.lang.c FAQ for more. See this pdf for the C standard. Check out sections 6.3.2.3 Pointers, paragraph 3.
It appears that a number of people misunderstand what the differences between NULL, '\0' and 0 are. So, to explain, and in attempt to avoid repeating things said earlier:
A constant expression of type int with the value 0, or an expression of this type, cast to type void * is a null pointer constant, which if converted to a pointer becomes a null pointer. It is guaranteed by the standard to compare unequal to any pointer to any object or function.
NULL is a macro, defined in <stddef.h> as a null pointer constant.
\0 is a construction used to represent the null character, used to terminate a string.
A null character is a byte which has all its bits set to 0.
It doesn't.
The string terminator is a byte containing all 0 bits.
The unsigned int is two or four bytes (depending on your environment) each containing all 0 bits.
The two items are stored at different addresses. Your compiled code performs operations suitable for strings on the former location, and operations suitable for unsigned binary numbers on the latter. (Unless you have either a bug in your code, or some dangerously clever code!)
But all of these bytes look the same to the CPU. Data in memory (in most currently-common instruction set architectures) doesn't have any type associated with it. That's an abstraction that exists only in the source code and means something only to the compiler.
Edit-added: As an example: It is perfectly possible, even common, to perform arithmetic on the bytes that make up a string. If you have a string of 8-bit ASCII characters, you can convert the letters in the string between upper and lower case by adding or subtracting 32 (decimal). Or if you are translating to another character code you can use their values as indices into an array whose elements provide the equivalent bit coding in the other code.
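The case conversion mentioned above could be sketched like this, assuming an ASCII-compatible character set (ascii_to_upper is a made-up name):

```c
/* Converts ASCII lowercase letters in s to uppercase, in place.
   Assumes ASCII, where 'a' - 'A' == 32. */
void ascii_to_upper(char *s)
{
    for (; *s != '\0'; ++s) {
        if (*s >= 'a' && *s <= 'z')
            *s -= 32;  /* plain integer arithmetic on the byte */
    }
}
```

In real code toupper from ctype.h is the usual choice; the point here is only that characters are small integers as far as the arithmetic is concerned.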
To the CPU the chars are really extra-short integers. (eight bits each instead of 16, 32, or 64.) To us humans their values happen to be associated with readable characters, but the CPU has no idea of that. It also doesn't know anything about the "C" convention of "null byte ends a string", either (and as many have noted in other answers and comments, there are programming environments in which that convention isn't used at all).
To be sure, there are some instructions in x86/x64 that tend to be used a lot with strings - the REP prefix, for example - but you can just as well use them on an array of integers, if they achieve the desired result.
In short there is no difference (except that an int is 2 or 4 bytes wide and a char just 1).
The thing is that all modern libraries either use the null terminator technique or store the length of a string. In both cases the program knows it has reached the end of a string when it has either read a null character or read as many characters as the stored length tells it to.
Issues with this start when the null terminator is missing or the length is wrong as then the program starts reading from memory it isn't supposed to.
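One defensive approach, sketched here with a hypothetical helper bounded_length, is to combine both techniques: search for the terminator, but never look past a known buffer size.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* memchr */

/* Returns the number of characters before the first '\0' in buf, or
   capacity if no terminator is found within the first capacity bytes.
   Never reads beyond buf[capacity - 1]. */
size_t bounded_length(const char *buf, size_t capacity)
{
    const char *end = memchr(buf, '\0', capacity);

    return end != NULL ? (size_t)(end - buf) : capacity;
}
```

A result equal to capacity tells the caller that the buffer holds no terminator, which a plain strlen would only discover by reading memory it isn't supposed to.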