Unix uses 0xA for a newline character. Windows uses a combination of two characters: 0xD 0xA. 0xD is the carriage return character. ^M happens to be the way vim displays 0xD (0x0D = 13, M is the 13th letter in the English alphabet).
You can remove all the ^M characters by running the following:
:%s/^M//g
Where ^M is entered by holding down Ctrl and typing v followed by m, and then releasing Ctrl. This is sometimes abbreviated as ^V^M, but note that you must enter it as described in the previous sentence, rather than typing it out literally.
This expression will replace all occurrences of ^M with the empty string (i.e. nothing). I use this to get rid of ^M in files copied from Windows to Unix (Solaris, Linux, OSX).
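Outside Vim, the same substitution can be sketched in a few lines of Python. This is only a minimal illustration that assumes the file contents fit in memory; the function name strip_carriage_returns is made up for this example.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    # Drop every CR byte (0x0D), wherever it appears in the data.
    # This mirrors the global substitution of ^M with the empty string.
    return data.replace(b"\r", b"")


if __name__ == "__main__":
    sample = b"first line\r\nsecond line\r\n"
    print(strip_carriage_returns(sample))  # b'first line\nsecond line\n'
```

Working on bytes rather than text avoids any newline translation that Python might otherwise apply when reading the file.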
:%s/\r//g
worked for me today. But my situation may have been slightly different.
The ^M is a carriage-return character. If you see this, you're probably looking at a file that originated in the DOS/Windows world, where an end-of-line is marked by a carriage return/newline pair, whereas in the Unix world, end-of-line is marked by a single newline.
Read this article for more detail, and also the Wikipedia entry for newline.
This article discusses how to set up vim to transparently edit files with different end-of-line markers.
If you have a file with ^M at the end of some lines and you want to get rid of them, use this in Vim:
:%s/^M$//
(Press Ctrl+V Ctrl+M to insert that ^M.)
This worked for me
:e ++ff=dos
The :e ++ff=dos command tells Vim to read the file again, forcing dos file format. Vim will remove CRLF and LF-only line endings, leaving only the text of each line in the buffer.
then
:set ff=unix
and finally
:wq
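For comparison, roughly the same conversion can be sketched in Python. This is an assumption-laden illustration, not what Vim does internally: it simply rewrites each CRLF pair as a single LF, and the name dos_to_unix is hypothetical.

```python
def dos_to_unix(data: bytes) -> bytes:
    # Only CR bytes that are part of a CRLF pair are dropped.
    # A lone CR in the middle of a line is left untouched.
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    mixed = b"dos line\r\nunix line\nanother dos line\r\n"
    print(dos_to_unix(mixed))
    # b'dos line\nunix line\nanother dos line\n'
```

Lines that already end in a bare LF pass through unchanged, so a file with mixed endings comes out consistently LF-terminated.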
The M-BM- characters are an ASCII representation of byte sequence 0xc2 0xa0, which is the UTF8 encoding of unicode character A0 - a non-breaking space character. This character can be inserted in both LibreOffice and Microsoft Word documents using the key sequence Ctrl+Shift+SPACE.
For example, if we create a new .odt document in LibreOffice and type ABC, then Ctrl+Shift+SPACE, then DEF, then Save As... Text (ignoring the warning that the document may contain features that cannot be saved in that format), and then view the resulting .txt file with cat:
$ cat nbsp.txt
ABC DEF
and then again with the -v switch to show non-printing characters
$ cat -v nbsp.txt
M-oM-;M-?ABCM-BM- DEF
Note that we also get an initial sequence M-oM-;M-? or hexadecimal 0xef 0xbb 0xbf which is the UTF8 byte order mark (BOM) consistent with the file type reported by the file command i.e.
$ file nbsp.txt
nbsp.txt: UTF-8 Unicode (with BOM) text
Using od to print the hexadecimal values in byte order we see
$ od -tx1 nbsp.txt
0000000 ef bb bf 41 42 43 c2 a0 44 45 46 0a
0000014
It is possible to manipulate these characters using standard tools like sed or tr by specifying the hex codes as escape sequences e.g. to replace the non-breaking space with a plain ASCII space
$ sed 's/\xc2\xa0/ /g' nbsp.txt
ABC DEF
Checking again with od confirms the replacement by an ordinary ASCII space 0x20 (decimal 32)
$ sed 's/\xc2\xa0/ /g' nbsp.txt | od -tx1
0000000 ef bb bf 41 42 43 20 44 45 46 0a
0000013
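The same byte-level replacement can be sketched in Python. This is a minimal example that assumes Python 3.8 or later (for the separator argument of bytes.hex); the function names replace_nbsp and strip_bom are invented for illustration, and the sample bytes are the ones from the od output above.

```python
import codecs

NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0 (non-breaking space)


def replace_nbsp(data: bytes) -> bytes:
    # Replace each UTF-8 encoded non-breaking space with an ASCII space.
    return data.replace(NBSP_UTF8, b" ")


def strip_bom(data: bytes) -> bytes:
    # Remove a leading UTF-8 byte order mark (ef bb bf) if one is present.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    raw = bytes.fromhex("efbbbf414243c2a04445460a")
    print(replace_nbsp(raw).hex(" "))
    # ef bb bf 41 42 43 20 44 45 46 0a
    print(strip_bom(replace_nbsp(raw)))
    # b'ABC DEF\n'
```

Removing the BOM is optional and shown separately, since the sed command above leaves it in place.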
In gnome-terminal (and perhaps other UTF-8-aware terminal emulators), it's also possible to enter the Unicode code point directly using the key sequence Ctrl+Shift+u followed by a hexadecimal value and then the Enter key. The sequence shows up initially as u̲.̲.̲.̲ but the character should compose when you hit Enter. For example, for the same non-breaking space replacement we can type
$ sed 's/<Ctrl+Shift+u>a0
which displays as
$ sed 's/u̲a̲0̲
and then completes as
$ sed 's/ / /g' nbsp.txt
ABC DEF
Using cat -v we can confirm the M-BM- sequence has become an ordinary space
$ sed 's/ / /g' nbsp.txt | cat -v
M-oM-;M-?ABC DEF
You may want to look at more generic encoding converters such as iconv and uconv as well.
"cat -v file " will show the non-printing characters in the file. Just redirect the output to some temporary file and use vim for replacing the M-BM- characters with nothing.
%s/M-BM- //g
Easiest solution.
It is known as carriage return.
If you're using vim you can enter insert mode and type CTRL-v CTRL-m. That ^M is the keyboard equivalent to \r.
Inserting 0x0D in a hex editor will do the task.
How do I remove it?
You can remove it using the command
perl -p -i -e "s/\r//g" filename
As the OP suggested in the comments of this answer here, you can also try
dos2unix filename
and see if that fixes it.
As @steeldriver suggests in the comments, after opening the vim editor, press esc key and type :set ff=unix.
References
https://stackoverflow.com/questions/1585449/insert-the-carriage-return-character-in-vim
https://stackoverflow.com/a/7742437/1742825
-ksh: revenue_ext.ksh: not found [No such file or directory]
Code
sed -i 's/^M//' filename.txt
When entering ^M in the command, do not type a caret followed by M, as that only inserts what is displayed, not the character that causes it to be displayed; press Ctrl+V Ctrl+M instead.
I believe that what OP was actually asking about is called Caret Notation.
Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code that has the numerical value equivalent to the letter's numerical value. For example the EOT character with a value of 4 is represented as ^D because D is the 4th letter in the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' is before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit of the ASCII code
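The "invert the 7th bit" formulation is easy to check with a short Python sketch. The function name caret_notation is made up for this example; the XOR with 0x40 is the bit inversion described above.

```python
def caret_notation(code: int) -> str:
    # Only the ASCII control characters (0-31 and 127) have a caret form.
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not an ASCII control character")
    # Inverting the 7th bit (value 0x40) maps 0 -> '@', 4 -> 'D',
    # 13 -> 'M' and 127 -> '?'.
    return "^" + chr(code ^ 0x40)


if __name__ == "__main__":
    for code in (0, 4, 13, 127):
        print(code, caret_notation(code))
    # 0 ^@
    # 4 ^D
    # 13 ^M
    # 127 ^?
```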
The full list of ASCII control characters along with caret notation can be found here
Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, and the 0x0D right before it is printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
That is exactly the reason.
ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):
Oct   Dec   Hex   Char
────────────────────────────────────────────
000   0     00    NUL '\0'
001   1     01    SOH (start of heading)
002   2     02    STX (start of text)
003   3     03    ETX (end of text)
004   4     04    EOT (end of transmission)
005   5     05    ENQ (enquiry)
006   6     06    ACK (acknowledge)
007   7     07    BEL '\a' (bell)
010   8     08    BS  '\b' (backspace)
011   9     09    HT  '\t' (horizontal tab)
012   10    0A    LF  '\n' (new line)
013   11    0B    VT  '\v' (vertical tab)
014   12    0C    FF  '\f' (form feed)
015   13    0D    CR  '\r' (carriage ret)
Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.
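The keyboard direction of this mapping, from a Control+letter combination to the code it generates, can be sketched as follows. This assumes the conventional behaviour of keeping only the low five bits of the letter's ASCII code; the function name control_code is hypothetical.

```python
def control_code(letter: str) -> int:
    # Keep only the low five bits of the upper-case letter's ASCII code,
    # which is conventionally what holding Control does to a key.
    if len(letter) != 1:
        raise ValueError("expected a single character")
    return ord(letter.upper()) & 0x1F


if __name__ == "__main__":
    print(control_code("G"))  # 7  (BEL)
    print(control_code("J"))  # 10 (LF)
    print(control_code("M"))  # 13 (CR)
```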
The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.
Why does this happen, and why only when you type ** and then go back and put a capital M in between?
^M usually appears when the file is inconsistent with regard to the line terminators used. Try the following:
- Create testfile using vim with a few random lines, and write it in dos mode. Then run (this assumes you have cygwin installed):
sed '2s/.$//' testfile > corruptfile
This will remove the last character of the second line, creating an inconsistency in the line terminators used.
- Open corruptfile with vim. ^M symbols will appear so that you are made aware of the inconsistency.
In real life, programs that have been written with a single type of line terminator in mind may produce such inconsistencies. While these inconsistencies seem innocent, they may cause you problems with other programs. E.g. subversion does not allow files with inconsistent line terminators to be added to a repository. Other programs may just fail silently.
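One way to check a file for this kind of inconsistency before handing it to another program is to count each terminator style. The following Python sketch is only an illustration; the function names count_line_endings and has_mixed_line_endings are made up for this example.

```python
import re
from collections import Counter

# Map each terminator byte sequence to a readable label.
LABELS = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    counts = Counter()
    # CRLF is listed first in the pattern so that a CR followed by an LF
    # is counted once as a pair, not as two separate terminators.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        counts[LABELS[match.group()]] += 1
    return counts


def has_mixed_line_endings(data: bytes) -> bool:
    # More than one terminator style means the file is inconsistent.
    return len(count_line_endings(data)) > 1


if __name__ == "__main__":
    corrupt = b"one\r\ntwo\nthree\r\n"
    print(dict(count_line_endings(corrupt)))  # {'CRLF': 2, 'LF': 1}
    print(has_mixed_line_endings(corrupt))    # True
```

The sample bytes imitate a dos-mode file whose second line has lost its carriage return.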
To make ^M disappear, just make a global replacement:
:% s/^M//g
The ^M is produced by pressing: Ctrl+v <Enter>
Then write back the file in the desired format:
:set ff=dos
:w
The character that vim displays as ^M is CR (carriage return, ASCII character 13). Windows uses both CR and LF (new line, ASCII character 10) to encode line breaks in text files. Linux/Unix use only LF to encode line breaks, and classic Mac OS (before OS X) used only CR.
If you open a text file created on a Windows computer on a Linux box, you may see trailing CR characters in each line. There are several ways to remove them. One is to replace them in vim as m000 suggested, another would be to recode the file:
recode ibmpc..latin1 SOME.TXT