Unix uses 0xA for a newline character. Windows uses a combination of two characters: 0xD 0xA. 0xD is the carriage return character. ^M happens to be the way vim displays 0xD (0x0D = 13, M is the 13th letter in the English alphabet).

You can remove all the ^M characters by running the following:

:%s/^M//g

Where ^M is entered by holding down Ctrl and typing v followed by m, and then releasing Ctrl. This is sometimes abbreviated as ^V^M, but note that you must enter it as described in the previous sentence, rather than typing it out literally.

This expression will replace all occurrences of ^M with the empty string (i.e. nothing). I use this to get rid of ^M in files copied from Windows to Unix (Solaris, Linux, OSX).

Answer from Tomasz Stanczak on Stack Overflow
🌐
Baeldung
baeldung.com β€Ί home β€Ί files β€Ί file editing β€Ί what does the ^m character mean in vim?
What Does the ^M Character Mean in Vim? | Baeldung on Linux
February 14, 2024 - However, files edited in Windows typically contain CR+LF line endings. When Vim opens such files, the additional CR characters are perceived as part of the text and are displayed as the ^M character. This is the reason we see this character.
🌐
Fandom
unofficial-alphabet-lore.fandom.com β€Ί wiki β€Ί M_(character)
M (character) | Alphabet Lore Wiki | Fandom
2 weeks ago - Go to M (episode). M is a Letter who appears in Alphabet Lore and Number Lore. M is a claret/maroon Letter M. His eyes are spaced far apart near the top of his head and his mouth is wide.
Top answer
1 of 7
21

The M-BM- characters are an ASCII representation of byte sequence 0xc2 0xa0, which is the UTF8 encoding of unicode character A0 - a non-breaking space character. This character can be inserted in both LibreOffice and Microsoft Word documents using the key sequence Ctrl+Shift+SPACE.

For example if we create a new .odt document in LibreOffice and type ABCCtrl+Shift+SPACEDEF, then Save As... Text (ignoring the warning that the document may contain features that cannot be saved in that format), then view the resulting .txt file with cat:

$ cat nbsp.txt 
ABC DEF

and then again with the -v switch to show non-printing characters

$ cat -v nbsp.txt 
M-oM-;M-?ABCM-BM- DEF

Note that we also get an initial sequence M-oM-;M-? or hexadecimal 0xef 0xbb 0xbf which is the UTF8 byte order mark (BOM) consistent with the file type reported by the file command i.e.

$ file nbsp.txt 
nbsp.txt: UTF-8 Unicode (with BOM) text

Using od to print the hexadecimal values in byte order we see

$ od -tx1 nbsp.txt
0000000 ef bb bf 41 42 43 c2 a0 44 45 46 0a
0000014

It is possible to manipulate these characters using standard tools like sed or tr by specifying the hex codes as escape sequences e.g. to replace the non-breaking space with a plain ASCII space

$ sed 's/\xc2\xa0/ /g' nbsp.txt
ABC DEF

Checking again with od confirms the replacement by an ordinary ASCII space 0x20 (decimal 32)

$ sed 's/\xc2\xa0/ /g' nbsp.txt | od -tx1
0000000 ef bb bf 41 42 43 20 44 45 46 0a
0000013

In gnome-terminal (and maybe other UTF8-aware terminal emulators), it's also possible to enter the unicode code point value directly using the key sequence Ctrl+Shift+u followed by a hexidecimal value then the Enter key - the sequence shows up initially as uΜ².Μ².Μ².Μ² but then the character should compose when you hit Enter e.g. for the same non-breaking space replacement we can do

$ sed 's/Ctrl+Shift+ua0

which displays as

$ sed 's/Μ²/Μ²uΜ²aΜ²0Μ²

and then completes as

$ sed 's/ / /g' nbsp.txt
ABC DEF

Using cat -v we can confirm the M-BM- sequence has become an ordinary space

$ sed 's/ / /g' nbsp.txt | cat -v
M-oM-;M-?ABC DEF

You may want to look at more generic encoding converters such as iconv and uconv as well.

2 of 7
3

"cat -v file " will show the non-printing characters in the file. Just redirect the output to some temporary file and use vim for replacing the M-BM- characters with nothing.

%s/M-BM- //g

Easiest solution.

Top answer
1 of 7
120

I believe that what OP was actually asking about is called Caret Notation.

Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code that has the numerical value equivalent to the letter's numerical value. For example the EOT character with a value of 4 is represented as ^D because D is the 4th letter in the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' is before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit of the ASCII code

The full list of ASCII control characters along with caret notation can be found here

Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.

2 of 7
23

That is exactly the reason.

ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):

   Oct   Dec   Hex   Char                       
   ─────────────────────────────────────────────
   000   0     00    NUL '\0'                    
   001   1     01    SOH (start of heading)     
   002   2     02    STX (start of text)         
   003   3     03    ETX (end of text)           
   004   4     04    EOT (end of transmission)   
   005   5     05    ENQ (enquiry)               
   006   6     06    ACK (acknowledge)           
   007   7     07    BEL '\a' (bell)             
   010   8     08    BS  '\b' (backspace)       
   011   9     09    HT  '\t' (horizontal tab)  
   012   10    0A    LF  '\n' (new line)        
   013   11    0B    VT  '\v' (vertical tab)    
   014   12    0C    FF  '\f' (form feed)       
   015   13    0D    CR  '\r' (carriage ret)    

Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.

The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.

🌐
Unix Community
community.unix.com β€Ί unix for beginners q & a β€Ί unix for dummies questions & answers
Removing the ^M control character - UNIX for Dummies Questions & Answers - Unix Linux Community
July 21, 2014 - I've got a file where each line is separated by ^M characters. I want to be able to cat the file without those lines. When I cat the file now what I see are blank lines. However, the blank lines are actually ^M characters; when I open the file with vi they show up.
Find elsewhere
🌐
Unicode Compart
compart.com β€Ί en β€Ί unicode β€Ί U+2133
β€œβ„³β€ U+2133 Script Capital M Unicode Character
U+2133 is the unicode hex value of the character Script Capital M. Char U+2133, Encodings, HTML Entitys:β„³,β„³,β„³, UTF-8 (hex), UTF-16 (hex), UTF-32 (hex)
🌐
Wikipedia
en.wikipedia.org β€Ί wiki β€Ί M_(James_Bond)
M (James Bond) - Wikipedia
1 month ago - M is a codename held by a fictional character in Ian Fleming's James Bond book and film series; the character is the Chief of the Secret Intelligence Service for the agency known as MI6.
🌐
University of Sussex
sussex.ac.uk β€Ί its β€Ί help β€Ί faq
168. Why might I be having a problem with a text file on the Unix system? : Frequently asked questions : ... : ITS : University of Sussex
The ^M often seen at the ends of lines in text files on the Unix system is a redundant carriage return character which has been introduced as a result of transferring from another system such as DOS or Windows. This is often caused by transferring a plain text file, via FTP, as a binary file.
🌐
Wyzant
wyzant.com β€Ί resources β€Ί ask an expert
What does ^M character mean in Vim? | Wyzant Ask An Expert
April 28, 2019 - the "^M" character is probably a result of the file contents getting mangled when going between ms windows and some variant of unix (like mac os or gnu/linux).
🌐
Wikipedia
en.wikipedia.org β€Ί wiki β€Ί M
M - Wikipedia
2 weeks ago - M often represents male or masculine, especially in conjunction with F for female or feminine. In typography, an em dash is a punctuation symbol whose width is similar to that of a capital letter M.
🌐
AmpWhat
amp-what.com β€Ί unicode β€Ί search β€Ί m
β€œm” Unicode Characters, Symbols & Entities Search | AmpWhat
ο»‘β€”\00fee1ﻡﻡ"\uFEE1"U+fee1ο»‘arabic letter meem isolated formglyph for isolate arabic meemarabic presentation forms-b
🌐
i2symbol
i2symbol.com β€Ί home β€Ί abc 123
M Symbol Text (Copy and Paste) β“œ β’¨ Պ αΉƒ αΈΏ
M symbol text (β“œ β’¨ Պ αΉƒ αΈΏ ) is a collection of m-like text symbols that resemble the shape of letter m across different Unicode blocks and letter styles. This page includes an m symbol text keyboard with copy-and-paste m symbols and ...
🌐
Admin's Choice
adminschoice.com β€Ί how-to-remove-m-in-linux-unix
How to Remove ^M in Linux & Unix - Admin's Choice
January 25, 2018 - Control M ( ^M) characters are introduced when you use lines of text from a windows computer to Linux or Unix machine. Most common reasons are when you
Top answer
1 of 3
4

^M usually appear when the file is inconsistent with regards to the line terminators used. Try the following:

  • Create testfile using vim with a few random lines, and write it in dos mode.
  • Then run (hope you have cygwin installed):

     sed '2s/.$//' testfile > corruptfile
    

    This will remove the last character of the second line, creating an inconsistency in the line terminators used.

  • Open corruptfile with vim. ^M symbols will appear so that you are made aware of the inconsistency.

In real life, programs that have been written with a single type of line-terminator in mind may produce such inconsistencies. While these inconsistencies seem innocent, they may cause you problems with other programs. E.g. subversion does not allow file with inconsistent line terminators to be added to a repository. Other programs may just fail silently.

To make ^M dissapear just make a global replacement:

:% s/^M//g

The ^M is produced by pressing: Ctrl+v <Enter>

Then write back the file in the desired format:

:set ff=dos
:w
2 of 3
1

The character that vim displays as ^M is CR (carriage return, ASCII character 13). Windows uses both CR and LF (new line, ASCII character 10) to encode line breaks in text files. Linux/Unix use only LF to encode line breaks, and Mac OS uses only CR.

If you open a text file created on a Windows computer on a Linux box, you may see trailing CR characters in each line. There are several ways to remove them. One is to replace them in vim as m000 suggested, another would be to recode the file:

recode ibmpc..latin1 SOME.TXT
🌐
Unix Community
community.unix.com β€Ί unix for beginners q & a β€Ί unix for dummies questions & answers
How to find the ^M(control M) character in unix file? - UNIX for Dummies Questions & Answers - Unix Linux Community
December 4, 2008 - ^M comes when a file ftped from windows to unix without using bin mode. I need the command to find lik this, ex.txt: ------------------------------ ...,name,time^M go^M ...file,end^M ------------------------------ ...
🌐
Unicode Compart
compart.com β€Ί en β€Ί unicode β€Ί U+004D
β€œM” U+004D Latin Capital Letter M Unicode Character
U+004D is the unicode hex value of the character Latin Capital Letter M. Char U+004D, Encodings, HTML Entitys:M,M, UTF-8 (hex), UTF-16 (hex), UTF-32 (hex)