Hello everyone!
In C, strings (character arrays) are terminated by the null character '\0', the character with value zero.
In ASCII, the NUL control code has the value 0 (0x00). Now, suppose we were working in a different character set (say the machine's character set were not ASCII but some other one). Should strings then be terminated by that character set's NUL character, or by a character whose value is zero?
For example, if the machine's character set were UTF-16, then in C a byte would be 16 bits, and strings would be terminated by the '\0' character with value 0x0000, which is also NUL in UTF-16.
But what if the machine's character set were modified UTF-8 (or UTF-7, ...)? According to Wikipedia, modified UTF-8 encodes the null character as the two bytes 0xC0 0x80. How would strings be terminated in that case: by the byte with value 0, or by the null character?
I guess my question could be rephrased as: are null-terminated strings terminated by the NUL character (which in that character set might be represented by a nonzero value), or by a character whose value is zero (which in that character set might not represent the NUL character)?
Thank you all very much, and I'm sorry for any mistakes, as English is not my first language.
Thanks again.
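String literals like "Hello World!" are null-terminated, but char arrays are not automatically null-terminated: if you fill an array yourself, element by element or with functions like memcpy (or strncpy when the source is too long), you have to store the '\0' yourself.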
The general principle I've always followed is to be extra cautious and assign '\0' to the end of the string unless that causes a performance problem. In those cases, I'm extra careful about which library functions I use.
Always be careful to allocate enough memory for strings. Compare the effects of the following lines of code:
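char s1[3] = "abc";  /* 3 bytes: 'a', 'b', 'c'; no room for '\0' */
char s2[4] = "abc";  /* 4 bytes: "abc" plus '\0' */
char s3[] = "abc";   /* size deduced by the compiler: 4 bytes */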
All three are legal lines of C (http://c-faq.com/ansi/nonstrings.html), but in the first case there isn't enough memory for the terminating null character, which would be the fourth byte. s1 will not behave like a normal string, but s2 and s3 will. For s3 the compiler counts the characters automatically, so you get four bytes of allocated memory. If you try to write
s1[3] = '\0';
that's undefined behavior: you're writing to memory that doesn't belong to s1. It can have strange effects, such as silently overwriting a neighboring variable, and when the buffer is heap-allocated it may even corrupt malloc's internal bookkeeping, causing problems when memory is freed.
“However, the fscanf at the end of the code only matches up to but not including the first null character.”
That is incorrect, as the following program demonstrates: its output is “"Hello", then "world".” With %s, fscanf reads up to the next white-space character, which here is the new-line at the end of the line. It does not stop at the null character, because '\0' is not white space.
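#include <stdio.h>
#include <string.h>

int main(void)
{
    char inBuf[40] = {0};               /* zero-filled, so a '\0' follows whatever fscanf stores */
    char outBuf[] = "Hello\0world\n";   /* embedded NUL before "world" */
    FILE *fp = fopen("MyFile.txt", "w+b");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fwrite(outBuf, 1, sizeof(outBuf), fp);
    fflush(fp);
    rewind(fp);                         /* read back what was just written */
    fscanf(fp, "%39s", inBuf);          /* %s stops at white space ('\n'), not at '\0' */
    /* inBuf now holds "Hello", '\0', "world", '\0': print both pieces */
    printf("\"%s\", then \"%s\".\n", inBuf, inBuf + strlen(inBuf) + 1);
    fclose(fp);
    return 0;
}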
“The fscanf at the end of the code only matches up to but not including the first null character.”
That is incorrect.
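#include <stdio.h>
int main( void ) {
    char sp[] = "abc def\n";
    char nul[] = "abc\0def\n";
    char buf1[10];
    char buf2[10];
    /* sscanf would stop at the embedded '\0', because its input is a C string
       that ends there; reading from a stream shows what fscanf really does. */
    FILE *f = tmpfile();
    if ( f == NULL ) return 1;
    fwrite( sp, 1, sizeof sp - 1, f );    /* "abc def\n" */
    fwrite( nul, 1, sizeof nul - 1, f );  /* "abc", NUL byte, "def\n" */
    rewind( f );
    printf( "%d\n", fscanf( f, "%9s %9s", buf1, buf2 ) );
    printf( "%d\n", fscanf( f, "%9s %9s", buf1, buf2 ) );
    fclose( f );
}
2 // First `%s` stopped at the space, leaving "def" for the second `%s`.
1 // First `%s` read "abc", the NUL byte and "def", stopping at the LF; nothing was left for the second `%s`.
As you can see, it read past the NUL to the end of the line rather than stopping at it.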
(You could also use ftell after your fscanf to get the number of bytes read.)
You did not indicate how you came to the conclusion that fscanf stopped at the NUL, but I presume you used something like printf( "%s\n", inBuf );. It is the printing that stops at the first NUL, not the reading.
You can use c[i] = '\0' or simply c[i] = (char) 0.
The null character is simply the value zero; in a character constant it can also be written with the escape sequence '\0'.
You can't store "no character" in a character - it doesn't make sense.
Instead, you could store a character that has a special meaning to you, such as the null character '\0', and treat it specially.