To your first question:
I would go with Paul R's comment and terminate with '\0'. The plain value 0 also works fine; it's a matter of taste. But don't use the macro NULL, which is meant for pointers.
To your second question:
If your string is not terminated with \0, it might still print the expected output because the byte following your string in memory happens to be zero or a non-printable character. This is a really nasty bug though, since it can blow up when you least expect it. Always terminate a string with '\0'.
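As a minimal sketch of what "always terminate" looks like when you fill a buffer by hand (copy_and_terminate is a hypothetical helper name, not something from the question):

```c
#include <stddef.h>

/* Hypothetical helper: copies n characters from src into dst and then
   terminates dst explicitly. dst must have room for n + 1 bytes. */
void copy_and_terminate(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    /* '\0' has the same value as plain 0; NULL is a pointer constant
       and is the wrong tool here. */
    dst[n] = '\0';
}
```

Without the last assignment, anything that treats dst as a string would keep reading past the copied characters.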
From the comp.lang.c FAQ: http://c-faq.com/null/varieties.html
In essence: NULL (the preprocessor macro for the null pointer) is not the same as NUL (the null character).
String literals like "Hello World!" are null-terminated, but char arrays are not automatically null terminated.
The general principle I've always followed is to be extra cautious and assign '\0' to the end of the string unless that causes a performance problem. In those cases, I'm extra careful about which library functions I use.
Always be careful to allocate enough memory for strings. Compare the effects of the following lines of code:
char s1[3] = "abc";
char s2[4] = "abc";
char s3[] = "abc";
All three are legal lines of C (http://c-faq.com/ansi/nonstrings.html), but in the first case there isn't enough room for the fourth character, the terminating null. s1 will not behave like a normal string, but s2 and s3 will. For s3 the compiler counts automatically, and you get four bytes of allocated memory. If you try to write
s1[3] = '\0';
that's undefined behavior: you're writing to memory that doesn't belong to s1, which can have weird effects, maybe even corrupting malloc's bookkeeping information and making it hard to free memory.
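A small sketch to check the three declarations for yourself. The has_terminator helper is a name made up for illustration; it only looks inside the bytes that belong to the array:

```c
#include <stddef.h>
#include <string.h>

char s1[3] = "abc";   /* legal in C, but no room for the terminator */
char s2[4] = "abc";   /* 'a', 'b', 'c', '\0' */
char s3[]  = "abc";   /* compiler counts: 4 bytes */

/* Hypothetical helper: returns 1 if the first n bytes of buf contain
   a '\0', else 0. It never reads beyond buf[n - 1]. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

Calling has_terminator(s1, sizeof s1) gives 0, while the same call on s2 or s3 gives 1, which is why only s2 and s3 are safe to pass to functions like strlen.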
The things that are called "C strings" will be null-terminated on any platform. That's how the standard C library functions determine the end of a string.
Within the C language, there's nothing stopping you from having an array of characters that doesn't end in a null. However you will have to use some other method to avoid running off the end of a string.
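To make that concrete, here is roughly how a strlen-style function finds the end of a string. This is a simplified sketch (my_strlen is a made-up name), not how any particular library actually implements it:

```c
#include <stddef.h>

/* Simplified sketch of a strlen-style scan: count characters until
   the first one whose value is zero. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

If the array has no null in it, the loop has nothing to stop it and keeps reading past the end of the array.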
Determination of the terminating character is up to the compiler for literals and the implementation of the standard library for strings in general. It isn't determined by the operating system.
The convention of NUL termination goes back to pre-standard C, and in 30+ years, I can't say I've run into an environment that does anything else. This behavior was codified in C89 and continues to be part of the C language standard (link is to a draft of C99):
- Section 6.4.5 sets the stage for NUL-terminated strings by requiring that a NUL be appended to string literals.
- Section 7.1.1 brings that to the functions in the standard library by defining a string as "a contiguous sequence of characters terminated by and including the first null character."
There's no reason why someone couldn't write functions that handle strings terminated by some other character, but there's also no reason to buck the established standard in most cases unless your goal is giving programmers fits. :-)
Hello everyone!
In C, strings (character arrays) are terminated by null character '\0' - character with value zero.
In ASCII, the NUL control code has value 0 (0x00). Now, if we were working in a different character set (say the machine's character set weren't ASCII but something else), should strings be terminated by NUL in that character set, or by a character whose value is zero?
For example, if the machine's character set were UTF-16, then in C a byte would be 16 bits and strings would be terminated by the \0 character with value 0x0000, which is also NUL in UTF-16.
But what if the machine's character set were modified UTF-8 (or UTF-7, ...)? Then, according to Wikipedia, the null character is encoded as two bytes, 0xC0 0x80. How would strings be terminated in that case? By the byte with value 0, or by the null character?
I guess my question could be rephrased as: Are null-terminated strings terminated by the NUL character (which in that character set might be represented by a nonzero value) or by a character whose value is zero (which in that character set might not represent the NUL character)?
Thank you all very much, and I'm sorry for any mistakes and errors, as English is not my first language.
Thanks again.
An option missing from the question is fat pointers; the type &str in Rust is an example of this. The length is not stored on the heap as a prefix to the string data. Instead it is stored alongside the pointer, so that a reference to a string takes two words (length and pointer) instead of just one for a pointer.
This means that if there are multiple references to the same string, then the length data is duplicated compared to a length-prefixed string, which would only store the length once, where the string data is. But the upside is that a fat pointer can reference a substring without duplicating the string data on the heap.
In the diagram above (from the official Rust book), s is a String so it has a fat pointer to the whole string allocation (plus a capacity field, since it's a growable string), while world is a shared reference (i.e. a fat pointer) to a substring. This sharing would not be possible with length-prefixing, and would be possible with null-termination for substrings at the end of the string but not otherwise.
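For readers coming from C, a rough analogue of a fat pointer can be sketched as a struct holding a pointer and a length. The names str_view, view_of and sub_view are invented for this illustration, and this is not how Rust itself is implemented:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of a fat pointer: pointer plus length. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Builds a view over a whole null-terminated string. */
struct str_view view_of(const char *s)
{
    return (struct str_view){ s, strlen(s) };
}

/* Builds a view over part of an existing view without copying any
   characters. The caller must ensure start + len <= v.len. */
struct str_view sub_view(struct str_view v, size_t start, size_t len)
{
    return (struct str_view){ v.ptr + start, len };
}
```

With view_of("hello world") as the parent, sub_view(parent, 6, 5) refers to "world" inside the same storage, and sub_view(parent, 0, 5) refers to "hello", which null-termination alone could not express without copying.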
Length-prefixed strings have the advantage of being able to find their length in O(1) time rather than O(n) time. This means you can find the end of the string more easily with the length prefix. They are also less error prone to use since you don't have to deal with forgetting to null terminate a string.
One disadvantage to length prefixed strings is that they require more space. In addition, you are limited in what the max size of the string can be based on how many bytes are used to store the length.
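Both points can be seen in a small sketch of a length-prefixed layout, assuming a single-byte prefix (so at most 255 characters). The pstr_set and pstr_len names and the layout are assumptions for illustration only:

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: buf[0] holds the length, buf[1..] holds the
   characters. buf must have room for 1 + strlen(src) bytes.
   Returns 0 on success, -1 if src does not fit a one-byte length. */
int pstr_set(unsigned char *buf, const char *src)
{
    size_t n = strlen(src);
    if (n > 255) {
        return -1;               /* the prefix cannot represent this length */
    }
    buf[0] = (unsigned char)n;
    memcpy(buf + 1, src, n);     /* no terminator is stored */
    return 0;
}

/* O(1): the length is read from the prefix, no scanning needed. */
size_t pstr_len(const unsigned char *buf)
{
    return buf[0];
}
```

A wider prefix raises the limit but costs more bytes per string, which is the space trade-off mentioned above.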
If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.
You can still print a non-terminated character array with printf, as long as you give the length as a precision (for %.*s the length argument must be an int):
printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);
or, if you have access to the array itself, not a pointer:
printf("str is %.*s", (int)(sizeof s2), s2);
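The same precision trick works with snprintf if you want the result in a buffer instead of on stdout. A minimal sketch, with format_counted as a made-up helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats len characters of data, which does not
   need to be null-terminated, into out. Returns what snprintf returns. */
int format_counted(char *out, size_t out_size, const char *data, int len)
{
    /* The precision caps how many bytes are read from data. */
    return snprintf(out, out_size, "str is %.*s", len, data);
}
```

For an array declared as char raw[3] = {'a', 'b', 'c'}, calling format_counted(out, sizeof out, raw, 3) fills out with a properly terminated "str is abc".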
You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.
A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.
So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.
Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
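The "allocate a buffer, copy, append a null" option might look something like this. It is only a sketch, and to_c_string is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turns len bytes of unterminated data into a
   freshly allocated C string. Returns NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *to_c_string(const char *data, size_t len)
{
    char *s = malloc(len + 1);   /* one extra byte for the terminator */
    if (s == NULL) {
        return NULL;
    }
    memcpy(s, data, len);
    s[len] = '\0';
    return s;
}
```

The result can be passed to strlen, strcpy and friends, at the cost of an extra allocation and copy.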
I am confused about this topic. My teacher said that strings in C are null-terminated automatically. So if I manually allocate 5 bytes for the string "word", should I add '\0' at the end or not?
Thank you for answering my question!