They're different characters. \r is carriage return, and \n is line feed.
On "old" printers, \r sent the print head back to the start of the line, and \n advanced the paper by one line. Both were therefore necessary to start printing on the next line.
Obviously that's somewhat irrelevant now, although depending on the console you may still be able to use \r to move to the start of the line and overwrite the existing text.
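To make the two characters concrete, here is a small Python check (Python is just used for illustration here; the same code points apply in any ASCII-compatible language):

```python
# '\r' and '\n' are distinct one-character strings with different code points.
print(repr('\r'), ord('\r'))  # '\r' 13
print(repr('\n'), ord('\n'))  # '\n' 10
```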
More importantly, Unix tends to use \n as a line separator; Windows tends to use \r\n as a line separator and Macs (up to OS 9) used to use \r as the line separator. (Mac OS X is Unix-y, so uses \n instead; there may be some compatibility situations where \r is used instead though.)
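If you need to read text that might use any of these conventions, Python's `str.splitlines()` (shown here as one illustrative option) recognizes all three:

```python
# splitlines() treats \n, \r\n, and a lone \r as line boundaries.
text = 'unix\nwindows\r\nold mac\rend'
print(text.splitlines())  # ['unix', 'windows', 'old mac', 'end']
```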
For more information, see the Wikipedia newline article.
EDIT: This is language-sensitive. In C# and Java, for example, \n always means Unicode U+000A, which is defined as line feed. In C and C++ the water is somewhat muddier, as the meaning is platform-specific. See comments for details.
In C and C++, \n is a concept, \r is a character, and \r\n is (almost always) a portability bug.
Think of an old teletype. The print head is positioned on some line and in some column. When you send a printable character to the teletype, it prints the character at the current position and moves the head to the next column. (This is conceptually the same as a typewriter, except that typewriters typically moved the paper with respect to the print head.)
When you wanted to finish the current line and start on the next line, you had to do two separate steps:
- move the print head back to the beginning of the line, then
- move it down to the next line.
ASCII encodes these actions as two distinct control characters:
- \x0D (CR) moves the print head back to the beginning of the line. (Unicode encodes this as U+000D CARRIAGE RETURN.)
- \x0A (LF) moves the print head down to the next line. (Unicode encodes this as U+000A LINE FEED.)
In the days of teletypes and early technology printers, people actually took advantage of the fact that these were two separate operations. By sending a CR without following it by a LF, you could print over the line you already printed. This allowed effects like accents, bold type, and underlining. Some systems overprinted several times to prevent passwords from being visible in hardcopy. On early serial CRT terminals, CR was one of the ways to control the cursor position in order to update text already on the screen.
But most of the time, you actually just wanted to go to the next line. Rather than requiring the pair of control characters, some systems allowed just one or the other. For example:
- Unix variants (including modern versions of Mac) use just a LF character to indicate a newline.
- Old (pre-OSX) Macintosh files used just a CR character to indicate a newline.
- VMS, CP/M, DOS, Windows, and many network protocols still expect both: CR LF.
- Old IBM systems that used EBCDIC standardized on NL--a character that doesn't even exist in the ASCII character set. In Unicode, NL is U+0085 NEXT LINE, but the actual EBCDIC value is 0x15.
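A common chore is normalizing all of these conventions to a single one. Here is a sketch in Python (the helper name `to_unix` is hypothetical, just for illustration):

```python
def to_unix(text):
    # Replace CR LF first, so the lone-CR pass never sees the CR half
    # of a CR LF pair and turns one line break into two.
    return text.replace('\r\n', '\n').replace('\r', '\n')

print(to_unix('dos\r\nmac\runix\n'))  # 'dos\nmac\nunix\n'
```

The order of the two replacements is the whole trick: doing the lone-CR replacement first would split every CR LF into two newlines.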
Why did different systems choose different methods? Simply because there was no universal standard. Where your keyboard probably says "Enter", older keyboards used to say "Return", which was short for Carriage Return. In fact, on a serial terminal, pressing Return actually sends the CR character. If you were writing a text editor, it would be tempting to just use that character as it came in from the terminal. Perhaps that's why the older Macs used just CR.
Now that we have standards, there are more ways to represent line breaks. Although extremely rare in the wild, Unicode has new characters like:
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
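Rare as they are, some tools do honor them. Python's `str.splitlines()`, for instance, treats both as line boundaries even though neither is an ordinary \n:

```python
s = 'first\u2028second\u2029third'
print(s.splitlines())  # ['first', 'second', 'third']
print('\n' in s)       # False -- neither separator contains a \n
```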
Even before Unicode came along, programmers wanted simple ways to represent some of the most useful control codes without worrying about the underlying character set. C has several escape sequences for representing control codes:
- \a (for alert) which rings the teletype bell or makes the terminal beep
- \f (for form feed) which moves to the beginning of the next page
- \t (for tab) which moves the print head to the next horizontal tab position
(This list is intentionally incomplete.)
This mapping happens at compile-time--the compiler sees \a and puts whatever magic value is used to ring the bell.
Notice that most of these mnemonics have direct correlations to ASCII control codes. For example, \a would map to 0x07 BEL. A compiler could be written for a system that used something other than ASCII for the host character set (e.g., EBCDIC). Most of the control codes that had specific mnemonics could be mapped to control codes in other character sets.
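Python's string escapes work the same way: the mapping from mnemonic to control code happens when the source is parsed, so each escape is just a one-character string with a fixed code point:

```python
# Each escape resolves to a single control character at parse time.
print(ord('\a'))  # 7  (BEL)
print(ord('\f'))  # 12 (FF)
print(ord('\t'))  # 9  (HT)
```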
Huzzah! Portability!
Well, almost. In C, I could write printf("\aHello, World!"); which rings the bell (or beeps) and outputs a message. But if I wanted to then print something on the next line, I'd still need to know what the host platform requires to move to the next line of output. CR LF? CR? LF? NL? Something else? So much for portability.
C has two modes for I/O: binary and text. In binary mode, whatever data is sent gets transmitted as-is. But in text mode, there's a run-time translation that converts a special character to whatever the host platform needs for a new line (and vice versa).
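Python inherited the same text/binary split, which makes the translation easy to observe. The sketch below forces the Windows convention with the `newline` argument (so it behaves the same on any platform), then reads the raw bytes back in binary mode:

```python
import os
import tempfile

path = tempfile.mkstemp()[1]

# Text mode with an explicit newline convention: every '\n' written
# is translated to '\r\n' on its way to the file.
with open(path, 'w', newline='\r\n') as f:
    f.write('hello\nworld\n')

# Binary mode performs no translation, so we see the bytes as stored.
with open(path, 'rb') as f:
    data = f.read()

print(data)  # b'hello\r\nworld\r\n'
os.remove(path)
```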
Great, so what's the special character?
Well, that's implementation dependent, too, but there's an implementation-independent way to specify it: \n. It's typically called the "newline character".
This is a subtle but important point: \n is mapped at compile time to an implementation-defined character value which (in text mode) is then mapped again at run time to the actual character (or sequence of characters) required by the underlying platform to move to the next line.
\n is different from all the other backslash literals because there are two mappings involved. This two-step mapping makes \n significantly different from even \r, which is simply a compile-time mapping to CR (or the most similar control code in whatever the underlying character set is).
This trips up many C and C++ programmers. If you were to poll 100 of them, at least 99 will tell you that \n means line feed. This is not entirely true. Most (perhaps all) C and C++ implementations use LF as the magic intermediate value for \n, but that's an implementation detail. It's feasible for a compiler to use a different value. In fact, if the host character set is not a superset of ASCII (e.g., if it's EBCDIC), then \n will almost certainly not be LF.
So, in C and C++:
- \r is literally a carriage return.
- \n is a magic value that gets translated (in text mode) at run-time to/from the host platform's newline semantics.
- \r\n is almost always a portability bug. In text mode, this gets translated to CR followed by the platform's newline sequence--probably not what's intended. In binary mode, this gets translated to CR followed by some magic value that might not be LF--possibly not what's intended.
- \x0A is the most portable way to indicate an ASCII LF, but you only want to do that in binary mode. Most text-mode implementations will treat that like \n.
\r is "Carriage Return" (CR, ASCII character 13), \n is "Line Feed" (LF, ASCII character 10). Back in the days, you had two ASCII characters at the end of each line to tell a printer what to do - CR would tell the printer to go back to the left edge of the paper, LF would advance to the next line.
Operating systems still have different conventions as to what the end of a line looks like -- Unix-like systems use \n, Windows uses \r\n, and classic Mac OS used a lone \r.
In JavaScript, you mostly deal with \n - this is how strings typically move to the next line. However, depending on what strings you are working with, you may encounter \r as well.
Normally \r represents a carriage return character (ASCII 0x0d), and \n is a newline character (ASCII 0x0a). This page has a list of all the special characters, quoted here for completeness:
- \f matches form-feed.
- \r matches carriage return.
- \n matches linefeed.
- \t matches horizontal tab.
- \v matches vertical tab.
- \0 matches the NUL character.
- [\b] matches backspace.
- \s matches whitespace (short for [\f\n\r\t\v\u00A0\u2028\u2029]).
- \S matches anything but a whitespace (short for [^\f\n\r\t\v\u00A0\u2028\u2029]).
- \w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
- \W matches any non-word characters (short for [^a-zA-Z0-9_]).
- \d matches any digit (short for [0-9]).
- \D matches any non-digit (short for [^0-9]).
- \b matches a word boundary (the position between a word and a space).
- \B matches a non-word boundary (short for [^\b]).
- \cX matches a control character. E.g.: \cm matches control-M.
- \xhh matches the character with two characters of hexadecimal code hh.
- \uhhhh matches the Unicode character with four characters of hexadecimal code hhhh.
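Python's re module uses the same escapes, so here is a quick sketch of \r and \n inside a pattern (the sample string is made up for illustration):

```python
import re

line = 'name:\tAda\r\n'
# \s matches the tab, the CR, and the LF alike.
print(re.findall(r'\s', line))      # ['\t', '\r', '\n']
# \r?\n at the end strips either a Unix or a Windows line ending.
print(re.sub(r'\r?\n$', '', line))  # 'name:\tAda'
```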
Hello everybody,
I've just started learning how to code and decided to start with Python because I was told that it is easier for beginners to learn it. I don't understand the difference between \n and \r. Please help me understand it or tell me how to google my question correctly.
Thanks!!
I don't understand the difference between \n and \r.
Well, it's a long-ass story. The ASCII standard defines both carriage return (\r) and line feed (\n) because, on the hardware the ASCII standard was developed for (electric teletype machines), those are two different operations - carriage return moves the print head back to home, and line feed simply advances the paper reel one line.
Since nobody ever invents anything new in tech, the terminal on your computer (actually, the terminal emulator) pretends to be a teletype machine, but one of the consequences of it not actually having a real printhead is that regardless of how it moves to the next line, the cursor (itself named after the slide on a slide rule) starts at column 0.
The only remaining difference in practice (since nobody uses teletype machines any more) is that the major OS's all have a goofy different idea about what the "right" line-ending character is supposed to be. So, that's the short answer to your question - the difference between them is that on your operating system, one of those is the correct character with which to end a line, and one of them isn't, or else the right answer is both of them - Windows considers \r\n together to be the correct line-ending character.
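If you want to know which one your OS considers "right", Python will tell you:

```python
import os

# The platform's own line-ending convention, as a string:
# '\n' on POSIX systems, '\r\n' on Windows.
print(repr(os.linesep))
```

Note that in Python you almost never write os.linesep yourself -- text-mode file I/O translates \n for you.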
Fun, right?
Not sure about Windows, but on a POSIX terminal \r will just bring your cursor back to the beginning of the line, whereas \n will be a newline.
So you can do things like this:
from time import sleep

for i in range(10):
    # \r returns the cursor to the start of the line; flush=True makes
    # each update appear immediately even though stdout is buffered.
    print(f'\rI have {i} cookies!', end='', flush=True)
    sleep(1)
Nice for outputting progress information in a batch processing script, for example.
Backward compatibility.
Windows is backward compatible with MS-DOS (aggressively so, even) and MS-DOS used the CR-LF convention because MS-DOS was compatible with CP/M-80 (somewhat by accident) which used the CR-LF convention because that was how you drove a printer (because printers were originally computer controlled typewriters).
Printers have a separate command to move the paper up one line to a new line, and a separate command for returning the carriage (where the paper was mounted) back to the left margin.
That's why. And, yes, it is an annoyance, but it is part of the package deal that allowed MS-DOS to win over CP/M, and Windows 95 to win over all the other GUIs on top of DOS, and Windows XP to take over from Windows 98.
(Note: Modern laser printers still have these commands because they too are backwards compatible with earlier printers - HP in particular do this well)
For those unfamiliar with typewriters, here is a video showing how typing was done: http://www.youtube.com/watch?v=LJvGiU_UyEQ. Notice that the paper is first moved up, and then the carriage is returned, even if it happens in a single movement. The ding notified the typist that the end of the line was near, so they could prepare for it.
As far as I'm aware this harks back to the days of typewriters.
\r is carriage return, which moves your typing position on the page back to the left (or to the right, if that is your culture's writing direction)
\n is new line, which moves your paper up a line.
Doing only one of these on a typewriter would put you in the wrong place to start writing a new line of text.
When computers came about I guess some people kept the old model, but others realised that it wasn't necessary and encapsulated a full newline as one character.