The latin1 codec can round-trip every byte, though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data).

You can verify the round trip with the following short session:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> sb == s.encode('latin1')
True
Answer from anthony sottile on Stack Overflow
Top answer
1 of 6
20

2 of 6
12

Just now I ran into the same problem. This is what I came up with:

import struct

def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num <= 0xFF:
            # Fits in one unsigned byte (255 included).
            outlist.append(struct.pack('B', num))
        elif num <= 0xFFFF:
            # Two bytes, big-endian (65535 included).
            outlist.append(struct.pack('>H', num))
        else:
            # Three bytes, big-endian: high byte, then the low 16 bits.
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>BH', b, H))
    return b''.join(outlist)

Some examples:

In [34]: rawbytes('this is a test')
Out[34]: b'this is a test'

In [35]: rawbytes('\udc80\udcdf\udcff\udcff\udcff\x7f')
Out[35]: b'\xdc\x80\xdc\xdf\xdc\xff\xdc\xff\xdc\xff\x7f'
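
The lone surrogates in the second example (U+DC80 through U+DCFF) look like what Python's surrogateescape error handler produces when it decodes bytes that are invalid in the chosen codec. If the string came from such a decode (an assumption; the answer does not say where it came from), the original bytes may be recoverable without a custom function. A minimal sketch, with made-up sample bytes:

```python
# Assumption: the string was produced by decoding with errors="surrogateescape".
original = b'\x80\xdf\xff\xff\xff\x7f'

# Bytes that are invalid UTF-8 become lone surrogates U+DC80..U+DCFF.
text = original.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler maps each surrogate back to its single byte.
recovered = text.encode('utf-8', errors='surrogateescape')

print(ascii(text))   # ascii() avoids printing lone surrogates directly
print(recovered)
```

Unlike rawbytes, this yields one byte per escaped surrogate rather than two, so which approach is appropriate depends on how the string was created.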
Python.org
Alliow `bytes(mystring)` without specifying the encoding - Ideas - Discussions on Python.org
September 20, 2022 - ", line 1, in "hello".encode() b'hello' For consistency, I would suggest that calling bytes on a str object without an encoding also assumes UTF-8 by default, as ...
Discussions

String to Bytes Python without change in encoding - Stack Overflow
I have this issue and I can't figure out how to solve it. I have this string: data = '\xc4\xb7\x86\x17\xcd' When I tried to encode it: data.encode() I get this result: b'\xc3\x84\xc2\xb7\xc2\x86... More on stackoverflow.com
January 21, 2018
Convert bytes to a string in Python 3 - Stack Overflow
See Best way to convert string to bytes in Python 3? for the other way around. ... @CharlieParker Because str(text_bytes) can't specify the encoding. More on stackoverflow.com
Best way to convert string to bytes in Python 3? - TestMu AI Community
Best way to convert string to bytes in Python 3 More on community.testmuai.com
0
June 6, 2024
Reddit
r/learnpython on Reddit: How to read string to bytes without encoding?
May 9, 2019 -

I'd been receiving some binary data on a python socket, and printing the data to the console for a few days. I've switched the output to files, but would like to reclaim the previous data. The data looks like this, but when I load the file, it goes in and encodes the data, escaping the single quotes and pre-existing backslashes, like this.

I'm wanting to read the file, one line at a time, or loading the entire file as an array or list, but need to bypass or backtrack the encoding to use the data properly. Any pointers on what modules/functions I should be looking at?

Flexiple
How to convert Python string to bytes? | Flexiple Tutorials | Python - Flexiple
bytes() is a built-in function that can be used to convert objects to bytes objects. ... It takes an object (a string in our case) and the required encoding, and converts it into a bytes object.
Google Groups
"convert" string to bytes without changing data (encoding)
March 28, 2012 - Steven D'Aprano <steve+comp....@pearwood.info> wrote: >The right way to convert bytes to strings, and vice versa, is via >encoding and decoding operations. If you want to dictate to the original poster the correct way to do things, then you don't need to do anything more than that. You don't need to pretend, like Chris Angelico, that there isn't a direct mapping from his Python 3 implementation's internal representation of strings to bytes in order to label what he's asking for as being "silly".
Top answer
1 of 2
20

You cannot convert a string into bytes, or bytes into a string, without taking an encoding into account. The whole point of the bytes type is that it is an encoding-independent sequence of bytes, while str is a sequence of Unicode code points, which by design have no unique byte representation.

So when you want to convert one into the other, you must state explicitly which encoding to use for the conversion. When converting into bytes, you have to say how to represent each character as a byte sequence; and when you convert from bytes, you have to say what method to use to map those bytes into characters.

If you don’t specify the encoding, then UTF-8 is the default, which is a sane default since UTF-8 is ubiquitous, but it's also just one of many valid encodings.

Take your original string, '\xc4\xb7\x86\x17\xcd', and look at which Unicode code points these characters represent. \xc4, for example, is LATIN CAPITAL LETTER A WITH DIAERESIS, i.e. Ä. That character happens to be encoded in UTF-8 as 0xC3 0x84, which explains why that's what you get when you encode it into bytes. But it is also encoded as 0x00C4 in UTF-16, for example.
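
To make the point concrete, here is a small sketch that encodes the same single character with three codecs. The variable names are illustrative only; utf-16-be is chosen so that no byte-order mark is prepended.

```python
ch = '\xc4'  # LATIN CAPITAL LETTER A WITH DIAERESIS

utf8 = ch.encode('utf-8')        # two bytes in UTF-8
utf16 = ch.encode('utf-16-be')   # explicit byte order, so no BOM is added
latin1 = ch.encode('latin-1')    # one byte, equal to the code point

print(utf8, utf16, latin1)
```

The same code point yields three different byte sequences, which is why a conversion without a stated encoding is not well defined.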


As for how to solve this properly so you get the desired output, there is no clear correct answer. The solution that Kasramvd mentioned is also somewhat imperfect. If you read about the raw_unicode_escape codec in the documentation:

raw_unicode_escape

Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.

So this is just a Latin-1 encoding which has a built-in fallback for characters outside of it. I would consider this fallback somewhat harmful for your purpose. For Unicode characters that cannot be represented as a \xXX sequence, this might be problematic:

>>> chr(256).encode('raw_unicode_escape')
b'\\u0100'

So the code point 256 is explicitly outside of Latin-1 which causes the raw_unicode_escape encoding to instead return the encoded bytes for the string '\\u0100', turning that one character into 6 bytes which have little to do with the original character (since it’s an escape sequence).

So if you want to use Latin-1 here, I would suggest using it explicitly, without the escape-sequence fallback from raw_unicode_escape. This will simply raise an exception when trying to convert code points outside of the Latin-1 range:

>>> '\xc4\xb7\x86\x17\xcd'.encode('latin1')
b'\xc4\xb7\x86\x17\xcd'
>>> chr(256).encode('latin1')
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    chr(256).encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)

Of course, whether code points outside of the Latin-1 range can cause problems for you depends on where that string actually comes from. But if you can guarantee that the input will only contain valid Latin-1 characters, then chances are that you don't really need to be working with a string there in the first place. Since you are actually dealing with some kind of bytes, you should check whether you can simply retrieve those values as bytes to begin with. That way you won't introduce two levels of encoding where data can be corrupted by misinterpreting the input.
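
As one illustration of retrieving the values as bytes in the first place, the sketch below assumes the data lives in a file (the question does not say where it comes from, and payload.bin is a hypothetical name). Opening the file in binary mode avoids any decoding step; text mode with latin-1 is shown only for comparison.

```python
import os
import tempfile

payload = b'\xc4\xb7\x86\x17\xcd'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'payload.bin')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(payload)

    # Binary mode returns the bytes untouched: no codec is involved.
    with open(path, 'rb') as f:
        as_bytes = f.read()

    # Text mode decodes on read; newline='' disables newline translation.
    with open(path, 'r', encoding='latin-1', newline='') as f:
        as_text = f.read()

print(as_bytes)
print(ascii(as_text))
```

Reading in binary mode gives the payload directly, whereas the text-mode result has to be encoded with latin-1 again to get back to the same bytes.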

2 of 2
10

You can use 'raw_unicode_escape' as your encoding:

In [14]: bytes(data, 'raw_unicode_escape')
Out[14]: b'\xc4\xb7\x86\x17\xcd'

As mentioned in comments you can also pass the encoding directly to the encode method of your string.

In [15]: data.encode("raw_unicode_escape")
Out[15]: b'\xc4\xb7\x86\x17\xcd'
Bobby Hadz
TypeError: string argument without an encoding in Python | bobbyhadz
April 8, 2024 - The str.encode() method returns an encoded version of the string as a bytes object. The default encoding is utf-8. Conversely, you can use the decode() method to convert a bytes object to a string.
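
A short sketch of the behaviour described in this entry, using an arbitrary sample string: calling bytes() on a str without an encoding raises TypeError, while str.encode() and bytes.decode() fall back to UTF-8.

```python
text = 'Python'

try:
    bytes(text)                 # no encoding given
except TypeError as exc:
    message = str(exc)          # the error text named in the entry title

via_bytes = bytes(text, 'utf-8')    # encoding passed explicitly
via_encode = text.encode()          # encoding defaults to UTF-8
round_trip = via_encode.decode()    # decode() also defaults to UTF-8

print(message)
print(via_bytes, via_encode, round_trip)
```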
GeeksforGeeks
Convert String to bytes-Python - GeeksforGeeks
The goal here is to convert a string into bytes in Python. This is essential for working with binary data or when encoding strings for storage or transmission.
Published   July 11, 2025
Devace Technologies
How to convert a string to bytes in Python
July 28, 2025 - In the given example, bytes() receives text and an encoding scheme, giving the same result as .encode(). Beginners may sometimes face errors while converting a string to bytes in Python, including: ... # This will raise a TypeError: bytes("Python") TypeError: string argument without an encoding. Deploying .encode() on already encoded bytes
DataCamp
How to Convert String to Bytes in Python | DataCamp
June 5, 2024 - In Python, use the .encode() method on a string to convert it into bytes, optionally specifying the desired encoding (UTF-8 by default).
Reddit
r/CodingHelp on Reddit: CONVERTING between string to bytes without creating a double backslash.
January 24, 2022 -

I have a string of bytes that I have read from a file:

\x00\x01\x00\xc0\x01\x00\x00\x00\x04 This is a string not bytes.

I know I can convert it to bytes via

s_new = bytes(string, "raw_unicode_escape")

if I want to make it b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

This only works if I pass the string in directly from the program and not read it in from a file. If I read it in from a file it becomes:

b'\\x00\\x01\\x00\\xc0\\x01\\x00\\x00\\x00\\x04'

The double backslash occurs and I don't know why :/ This doesn't occur when passing in the string that has not been read from a file.

Any help?

Top answer
1 of 3
2
So, the literal string "\x00\x01\x00\xc0\x01\x00\x00\x00\x04" is in your text file? That won't work.

In a Python string (apart from raw-string) you can do \x?? to be able to enter the hex-code for any character instead of the character. This is helpful if you want to specify non-printable characters. "\x61\x62\x63\x7A" for example is the same as "abcz". Your byte sequence b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04" represents the text "NUL SOH NUL no_ascii SOH NUL NUL NUL EOT" (spaces included for better readability), but since none of these are printable characters you just get their hex-codes back.

Why doesn't it work if you read in the literal \x00\x01\x00\xc0\x01\x00\x00\x00\x04

Imagine the other way around: What happens if you simply try to print "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"? Well, UTF-8 and US-ASCII are partially invalid encodings here (because of the \xc0), therefore I use the US-ASCII extension "ISO-8859-1" as the encoding. The printed result is this:  À . As you may guess now, if you really want to get b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04" from a file, then your file's content needs to be  À .

If your file's content is \x00\x01\x00\xc0\x01\x00\x00\x00\x04, then in order to print it like this you have to do .write(br"\x00\x01\x00\xc0\x01\x00\x00\x00\x04") and this raw-string equals the non-raw-string b"\\x00\\x01\\x00\\xc0\\x01\\x00\\x00\\x00\\x04". Basically text inside a file is fully escaped, i.e. it behaves like a Python raw-string.

EDIT: Just realized after posting that, even though they're in a script tag, Reddit doesn't "print" the unprintable characters. Just test-print it to a file yourself and look at the result:

f = open("name.txt", "wb") # replace name with your file's name
f.write(b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04")
2 of 3
1
How are you reading the file? Are you using open() with "rb" as the file mode? https://docs.python.org/3/library/io.html#binary-i-o
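
If the file really does contain the literal backslash-x text and cannot be regenerated, one possible way to turn it into the intended bytes is sketched below. This is not from the thread: it assumes the text contains only \xNN escapes and Latin-1 characters, and file_text stands in for whatever was read from the file.

```python
# Stand-in for text read from a file holding the characters \ x 0 0 ...
file_text = r'\x00\x01\x00\xc0\x01\x00\x00\x00\x04'

# Step 1: latin-1 turns each character into the byte of the same value.
# Step 2: unicode_escape interprets the \xNN escapes, one code point each.
# Step 3: latin-1 again maps code points 0-255 back to single bytes.
raw = file_text.encode('latin-1').decode('unicode_escape').encode('latin-1')

print(raw)
```

The final encode raises UnicodeEncodeError if an escape produces a code point above 255, which is a useful guard against input that was not byte data after all.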
Edureka Community
Best way to convert string to bytes in Python | Edureka Community
December 28, 2020 - There appear to be two different ways to convert a string to bytes, Which of these methods would be better ... 'utf-8') b = mystring.encode('utf-8')
Analytics Vidhya
7 Ways to Convert String to Bytes in Python - Analytics Vidhya
February 7, 2024 - The bytes() function provides a simple way to convert strings to bytes. It is similar to the encode() method, and both return an immutable bytes object. However, it is important to note that the bytes() function may raise ...
GeeksforGeeks
How to Fix TypeError: String Argument Without an Encoding in Python - GeeksforGeeks
July 23, 2025 - encode() method turns a string ... the characters in the string. ... When we use the bytes() function, we need to specify the encoding format, like 'utf-8', to convert a string into bytes....