control character whose bits are all 0
Hello everyone!
In C, strings (character arrays) are terminated by the null character '\0', a character with value zero.
In ASCII, the NUL control code has the value 0 (0x00). Now, if we were working with a different character set (say the machine's character set were not ASCII but something else), would strings be terminated by NUL in that character set, or by a character whose value is zero?
For example, if the machine's character set were UTF-16, then in C a byte would be 16 bits and strings would be terminated by the '\0' character with value 0x0000, which is also NUL in UTF-16.
But what if the machine's character set were modified UTF-8 (or UTF-7, ...)? Then, according to Wikipedia, the null character is encoded as the two bytes 0xC0, 0x80. How would strings be terminated in that case: by the byte with value 0, or by the null character?
I guess my question could be rephrased as: are null-terminated strings terminated by the NUL character (which in that character set might be represented by a nonzero value) or by a character whose value is zero (which in that character set might not represent the NUL character)?
Thank you all very much, and I'm sorry for any mistakes and errors, as English is not my first language.
Thanks again.
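To make the modified UTF-8 case from the question concrete, here is a small sketch. It assumes 8-bit bytes, and the function name modified_utf8_len is made up for illustration. The C standard describes the null character as a byte with all bits set to 0, so the library functions look for that byte and do not interpret the 0xC0, 0x80 pair in any special way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: 'A', then U+0000 as modified UTF-8 encodes it
   (0xC0 0x80), then 'B', then a real zero byte as terminator. */
static size_t modified_utf8_len(void)
{
    const unsigned char s[] = { 'A', 0xC0, 0x80, 'B', 0 };

    /* strlen stops at the first all-zero byte, so the two-byte
       encoding of U+0000 is counted as ordinary data. */
    return strlen((const char *) s);
}
```

Under these assumptions the function returns 4: the terminator is the byte whose value is zero, not the encoded NUL code point.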
You can use c[i]= '\0' or simply c[i] = (char) 0.
The null/empty char is simply a value of zero, but can also be represented as a character with an escaped zero.
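As a rough sketch of the two spellings, the helper below (truncate_at is a hypothetical name, not a library function) writes a terminator into a buffer using (char) 0; writing '\0' would do exactly the same thing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shorten the string in buf to at most i characters
   by writing a terminator at index i, then return the new length. */
static size_t truncate_at(char *buf, size_t i)
{
    if (i < strlen(buf))
        buf[i] = (char) 0;   /* same effect as buf[i] = '\0'; */
    return strlen(buf);
}
```

For example, calling truncate_at on a buffer holding "hello" with i equal to 2 leaves "he" in the buffer.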
You can't store "no character" in a character - it doesn't make sense.
As an alternative you could store a character that has a special meaning to you - e.g. null char '\0' - and treat this specially.
The things that are called "C strings" will be null-terminated on any platform. That's how the standard C library functions determine the end of a string.
Within the C language, there's nothing stopping you from having an array of characters that doesn't end in a null. However you will have to use some other method to avoid running off the end of a string.
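One common "other method" is to carry the length alongside the data. The sketch below assumes that approach; count_char is a hypothetical helper and is not part of the standard library.

```c
#include <stddef.h>

/* Hypothetical helper: count occurrences of ch in a buffer that carries
   its own length, so no terminator is needed or looked for. */
static size_t count_char(const char *data, size_t len, char ch)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == ch)
            n++;
    }
    return n;
}
```

It can be called on an array such as char raw[5] = { 'h', 'e', 'l', 'l', 'o' } with sizeof raw as the length. Passing that same array to strlen would be undefined behavior, since there is no terminator to stop at.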
Determination of the terminating character is up to the compiler for literals and the implementation of the standard library for strings in general. It isn't determined by the operating system.
The convention of NUL termination goes back to pre-standard C, and in 30+ years, I can't say I've run into an environment that does anything else. This behavior was codified in C89 and continues to be part of the C language standard (link is to a draft of C99):
- Section 6.4.5 sets the stage for NUL-terminated strings by requiring that a NUL be appended to string literals.
- Section 7.1.1 brings that to the functions in the standard library by defining a string as "a contiguous sequence of characters terminated by and including the first null character."
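The appended terminator is easy to observe. In this sketch the function names are made up for illustration; it compares the storage size of a literal with the length the library reports.

```c
#include <stddef.h>
#include <string.h>

/* The literal "abc" occupies four chars: 'a', 'b', 'c' and the
   terminator the compiler appends. */
static size_t literal_size(void)
{
    return sizeof "abc";
}

/* strlen counts the characters before the terminator. */
static size_t literal_length(void)
{
    return strlen("abc");
}

/* The last element of an array initialized from the literal is the
   appended null character. */
static char literal_last_char(void)
{
    const char s[] = "abc";
    return s[sizeof s - 1];
}
```

Here the size is one larger than the length, and the extra element compares equal to zero.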
There's no reason why someone couldn't write functions that handle strings terminated by some other character, but there's also no reason to buck the established standard in most cases unless your goal is giving programmers fits. :-)
You have used '/0' instead of '\0'. This is incorrect: the '\0' is a null character, while '/0' is a multicharacter literal.
Moreover, in C it is OK to skip a zero in your condition:
while (*(forward++)) {
...
}
is a valid way to check character, integer, pointer, etc. for being zero.
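A complete version of that loop might look like the sketch below. count_until_null is a hypothetical name, and the body simply counts characters where the original shows "...".

```c
#include <stddef.h>

/* Hypothetical helper: count the characters before the terminator.
   The condition is true for every non-zero char, so no explicit
   comparison against '\0' is needed. */
static size_t count_until_null(const char *s)
{
    const char *forward = s;
    size_t n = 0;

    while (*(forward++)) {
        n++;
    }
    /* forward now points one past the terminator, because the
       increment also ran on the final, failing test. */
    return n;
}
```

Note the side effect of putting the increment in the condition: after the loop, the pointer has moved past the terminator, not onto it.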
The null character is '\0', not '/0'.
while (*(forward++) != '\0')
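The difference between the slash and the backslash can be seen inside string literals too, which avoids the compiler warning that the multicharacter literal usually triggers. This is only an illustrative sketch with made-up function names.

```c
#include <stddef.h>
#include <string.h>

/* Backslash-zero is an escape for the null character, so the string
   ends after "ab" as far as strlen is concerned. */
static size_t len_with_backslash(void)
{
    return strlen("ab\0cd");
}

/* Slash-zero is just the two ordinary characters '/' and '0', so
   nothing terminates the string early. */
static size_t len_with_slash(void)
{
    return strlen("ab/0cd");
}
```

The first function returns 2 and the second returns 6.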