The latin1 codec can round-trip every byte, though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data).

You can verify the round trip with the following short interactive session:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨©ª«¬\xad®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> sb == s.encode('latin1')
True
Answer from anthony sottile on Stack Overflow
2 of 6
12

Just now I ran into the same problem. This is what I came up with:

import struct

def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num <= 0xFF:
            # Fits in a single byte (0xFF itself included).
            outlist.append(struct.pack('B', num))
        elif num <= 0xFFFF:
            # Two bytes, big-endian (0xFFFF itself included).
            outlist.append(struct.pack('>H', num))
        else:
            # Three bytes: the high byte (unsigned), then the low 16 bits.
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>BH', b, H))
    return b''.join(outlist)

Some examples:

In [34]: rawbytes('this is a test')
Out[34]: b'this is a test'

In [35]: rawbytes('\udc80\udcdf\udcff\udcff\udcff\x7f')
Out[35]: b'\xdc\x80\xdc\xdf\xdc\xff\xdc\xff\xdc\xff\x7f'
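The lone surrogates in the second example (\udc80 and so on) are what Python's surrogateescape error handler produces when it decodes bytes that are not valid in the chosen encoding. The answer does not say where its string came from, so the following is only a sketch under the assumption that it was decoded that way. In that case the original bytes can be recovered by encoding with the same handler, which gives a different result from rawbytes:

```python
# Assumption: the string was created by decoding with errors='surrogateescape'.
original = b'\x80\xdf\xff\xff\xff\x7f'

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = original.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler turns each U+DCNN back into byte 0xNN.
recovered = text.encode('utf-8', errors='surrogateescape')

print(recovered == original)
```

Here the surrogates map back to single bytes, whereas rawbytes writes each surrogate out as two bytes.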
🌐
Python.org
discuss.python.org › ideas
Alliow `bytes(mystring)` without specifying the encoding - Ideas - Discussions on Python.org
September 20, 2022 - ", line 1, in "hello".encode() b'hello' For consistency, I would suggest that calling bytes on a str object without an encoding also assumes UTF-8 by default, as ...
Discussions

String to Bytes Python without change in encoding - Stack Overflow
I have this issue and I can't figure out how to solve it. I have this string: data = '\xc4\xb7\x86\x17\xcd' When I tried to encode it: data.encode() I get this result: b'\xc3\x84\xc2\xb7\xc2\x86... More on stackoverflow.com
🌐 stackoverflow.com
January 21, 2018
Convert bytes to a string in Python 3 - Stack Overflow
See Best way to convert string to bytes in Python 3? for the other way around. ... @CharlieParker Because str(text_bytes) can't specify the encoding. More on stackoverflow.com
🌐 stackoverflow.com
image - How to create an HTML img tag string with base64 encoding from bytes in Python? - Stack Overflow
I am using Python 3.6 and I have an image as bytes: img = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00' I need to convert the bytes into a string without encoding so it looks like: raw_img = '\xff\xd8\xff\x... More on stackoverflow.com
🌐 stackoverflow.com
Best way to convert string to bytes in Python 3? - TestMu AI Community
Best way to convert string to bytes in Python 3 More on community.testmuai.com
🌐 community.testmuai.com
0
June 6, 2024
🌐
Reddit
reddit.com › r/learnpython › how to read string to bytes without encoding?
r/learnpython on Reddit: How to read string to bytes without encoding?
May 9, 2019 -

I'd been receiving some binary data on a python socket, and printing the data to the console for a few days. I've switched the output to files, but would like to reclaim the previous data. The data looks like this, but when I load the file, it goes in and encodes the data, escaping the single quotes and pre-existing backslashes, like this.

I'm wanting to read the file, one line at a time, or loading the entire file as an array or list, but need to bypass or backtrack the encoding to use the data properly. Any pointers on what modules/functions I should be looking at?
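The linked samples are not reproduced here, so what follows is only a sketch under an assumption: that each line of the file holds the text Python printed for a bytes object, such as b'\x00\x01'. In that case ast.literal_eval can parse the literal back into bytes. The helper name parse_printed_bytes and the sample lines are hypothetical.

```python
import ast

def parse_printed_bytes(line):
    """Turn the text of a printed bytes literal back into a bytes object."""
    value = ast.literal_eval(line.strip())
    if not isinstance(value, bytes):
        raise ValueError("line is not a bytes literal: %r" % line)
    return value

# Hypothetical file content: the text that print() would have written
# for two bytes objects, one per line.
logged_lines = [
    "b'\\x00\\x01\\xc0'\n",
    'b"it\'s"\n',
]

recovered = [parse_printed_bytes(line) for line in logged_lines]
print(recovered)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.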

Top answer
1 of 2
20

You cannot convert a string into bytes, or bytes into a string, without taking an encoding into account. The whole point of the bytes type is that it is an encoding-independent sequence of bytes, while str is a sequence of Unicode code points which by design have no unique byte representation.

So when you want to convert one into the other, you must tell explicitly what encoding you want to use to perform this conversion. When converting into bytes, you have to say how to represent each character as a byte sequence; and when you convert from bytes, you have to say what method to use to map those bytes into characters.

If you don’t specify the encoding, then UTF-8 is the default, which is a sane default since UTF-8 is ubiquitous, but it's also just one of many valid encodings.

Take your original string, '\xc4\xb7\x86\x17\xcd', and look at which Unicode code points these characters represent. \xc4, for example, is LATIN CAPITAL LETTER A WITH DIAERESIS, i.e. Ä. That character happens to be encoded in UTF-8 as 0xC3 0x84, which explains why that’s what you get when you encode it into bytes. But it is also encoded as 0x00C4 in UTF-16, for example.
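As a small illustration of the point about Ä, the same code point can be encoded with several codecs and the resulting bytes compared. The variable names are only for this sketch, and utf-16-be is chosen so that no byte order mark is added:

```python
import unicodedata

ch = '\xc4'  # the first character of the original string

# Official Unicode name of the code point.
name = unicodedata.name(ch)

# One code point, three different byte representations.
encodings = {codec: ch.encode(codec) for codec in ('latin-1', 'utf-8', 'utf-16-be')}

print(name)
for codec, raw in encodings.items():
    print(codec, raw)
```

Only the Latin-1 result is the single byte 0xC4, which is why that codec gives the output the question asks for.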


As for how to solve this properly so you get the desired output, there is no clear correct answer. The solution that Kasramvd mentioned is also somewhat imperfect. If you read about the raw_unicode_escape codec in the documentation:

raw_unicode_escape

Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.

So this is just a Latin-1 encoding which has a built-in fallback for characters outside of it. I would consider this fallback somewhat harmful for your purpose. For Unicode characters that cannot be represented as a \xXX sequence, this might be problematic:

>>> chr(256).encode('raw_unicode_escape')
b'\\u0100'

So the code point 256 is explicitly outside of Latin-1 which causes the raw_unicode_escape encoding to instead return the encoded bytes for the string '\\u0100', turning that one character into 6 bytes which have little to do with the original character (since it’s an escape sequence).

So if you want to use Latin-1 here, I would suggest using it explicitly, without the escape sequence fallback from raw_unicode_escape. This will simply raise an exception when you try to convert code points outside the Latin-1 range:

>>> '\xc4\xb7\x86\x17\xcd'.encode('latin1')
b'\xc4\xb7\x86\x17\xcd'
>>> chr(256).encode('latin1')
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    chr(256).encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)

Of course, whether code points outside the Latin-1 range can cause problems for you depends on where that string actually comes from. But if you can guarantee that the input will only contain valid Latin-1 characters, then chances are that you don't need to be working with a string in the first place. Since you are actually dealing with some kind of bytes, you should check whether you can simply retrieve those values as bytes to begin with. That way you won’t introduce two levels of encoding where data can be corrupted by misinterpreting the input.
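One way to get the values as bytes from the start, if the data comes from a file, is to open it in binary mode so that no decoding happens at all. The question does not say where its string comes from, so the temporary file below is only a hypothetical stand-in for the real source:

```python
import os
import tempfile

payload = b'\xc4\xb7\x86\x17\xcd'

# Hypothetical data source: a temporary file standing in for the real one.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(payload)

    # Binary mode ('rb') returns bytes directly; no codec is involved.
    with open(path, 'rb') as f:
        data = f.read()
finally:
    os.remove(path)

print(data == payload)
```

Because the data never passes through str, there is no encoding step that could misinterpret it.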

2 of 2
10

You can use 'raw_unicode_escape' as your encoding:

In [14]: bytes(data, 'raw_unicode_escape')
Out[14]: b'\xc4\xb7\x86\x17\xcd'

As mentioned in comments you can also pass the encoding directly to the encode method of your string.

In [15]: data.encode("raw_unicode_escape")
Out[15]: b'\xc4\xb7\x86\x17\xcd'
🌐
Devace Technologies
devacetech.com › home › insights › string to bytes conversion in python-2025 manual
How to convert a string to bytes in Python
July 28, 2025 - In the given example, bytes() receives text and an encoding scheme, giving the same result as .encode(). Beginners may sometimes face errors while converting a string to bytes in Python, including: ... python # This will raise a TypeError bytes ...
🌐
Google Groups
groups.google.com › g › comp.lang.python › c › 3nPnNzgBoxQ
"convert" string to bytes without changing data (encoding)
March 28, 2012 - Steven D'Aprano <steve+comp....@pearwood.info> wrote: >The right way to convert bytes to strings, and vice versa, is via >encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you don't need to do anything more than that. You don't need to pretend, like Chris Angelico, that there isn't a direct mapping from his Python 3 implementation's internal representation of strings to bytes in order to label what he's asking for as being "silly".
🌐
Analytics Vidhya
analyticsvidhya.com › home › 7 ways to convert string to bytes in python
7 Ways to Convert String to Bytes in Python - Analytics Vidhya
February 7, 2024 - The bytes() function provides a simple way to convert strings to bytes. Given an encoding, it is equivalent to the encode() method; both return an immutable bytes object. However, it is important to note that the bytes() function may raise ...
🌐
Python
mail.python.org › pipermail › python-list › 2012-March › 621947.html
"convert" string to bytes without changing data (encoding)
March 28, 2012 - You don't need to >> pretend, like Chris Angelico, that there isn't a direct mapping from >> his Python 3 implementation's internal representation of strings >> to bytes in order to label what he's asking for as being "silly". > > It might be technically possible to recreate internal implementation, > or get the byte data. That does not mean it will make any sense or > be understood in a meaningful manner. I think Ian summarized it > very well: > >>You can't generally just "deal with the ascii portions" without >>knowing something about the encoding.
🌐
Flexiple
flexiple.com › python › python-string-to-bytes
How to convert Python string to bytes? | Flexiple Tutorials | Python - Flexiple
The bytes() method is an inbuilt function that can be used to convert objects to byte objects. ... The bytes take in an object (a string in our case), the required encoding method, and convert it into a byte object.
🌐
RTEE Tech
blog.rteetech.com › home › python convert string to bytes – methods, encoding & alternatives
Python Convert String to Bytes| Methods, Encoding,Alternatives
February 27, 2025 - In such cases, you can use the bytes() function without any encoding. python string = "Hello, World!" byte_string = bytes(string, 'utf-8') print(byte_string) However, in most situations, it is essential to define the encoding to ensure the correct ...
🌐
GeeksforGeeks
geeksforgeeks.org › python › how-to-fix-typeerror-string-argument-without-an-encoding-in-python
How to Fix TypeError: String Argument Without an Encoding in Python - GeeksforGeeks
July 23, 2025 - encode() method turns a string into a sequence of bytes, using a format like 'utf-8' that tells Python how to represent the characters in the string. ... When we use the bytes() function, we need to specify the encoding format, like 'utf-8', ...
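A short sketch of the error this entry describes, assuming CPython 3: calling bytes() on a str without an encoding raises TypeError, while passing the encoding gives the same result as str.encode(). The sample text is arbitrary.

```python
text = "héllo"

# bytes() refuses a str argument unless an encoding is supplied.
try:
    bytes(text)
except TypeError as exc:
    message = str(exc)

# With an explicit encoding, bytes() and str.encode() agree.
same = bytes(text, 'utf-8') == text.encode('utf-8')

print(message)
print(same)
```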
🌐
GeeksforGeeks
geeksforgeeks.org › python › python-convert-string-to-bytes
Convert String to bytes-Python - GeeksforGeeks
The goal here is to convert a string into bytes in Python. This is essential for working with binary data or when encoding strings for storage or transmission.
Published   July 11, 2025
🌐
KDnuggets
kdnuggets.com › convert-bytes-to-string-in-python-a-tutorial-for-beginners
Convert Bytes to String in Python: A Tutorial for Beginners - KDnuggets
July 15, 2024 - Note: Strings do not have an associated ... bytes to string, you can use the decode() method on the bytes object. And to convert string to bytes, you can use the encode() method on the string....
🌐
Reddit
reddit.com › r/codinghelp › converting between string to bytes without creating a double backslash.
r/CodingHelp on Reddit: CONVERTING between string to bytes without creating a double backslash.
January 24, 2022 -

I have a string of bytes that I have read from a file:

\x00\x01\x00\xc0\x01\x00\x00\x00\x04 This is a string not bytes.

I know I can convert it to bytes via

s_new = bytes(string, "raw_unicode_escape")

if I want to make it b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

This only works if I pass the string in directly from the program and not read it in from a file. If I read it in from a file it becomes:

b'\\x00\\x01\\x00\\xc0\\x01\\x00\\x00\\x00\\x04'

The double backslash occurs and I don't know why :/ This doesn't occur when passing in the string that has not been read from a file.

Any help?

Top answer
1 of 3
2
So, the literal string "\x00\x01\x00\xc0\x01\x00\x00\x00\x04" is in your text file? That won't work.

In a Python string (apart from raw strings) you can write \x?? to enter the hex code for any character instead of the character itself. This is helpful if you want to specify non-printable characters. "\x61\x62\x63\x7A", for example, is the same as "abcz". Your byte sequence b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04" represents the text "NUL SOH NUL no_ascii SOH NUL NUL NUL EOT" (spaces included for better readability), but since none of these are printable characters you just get their hex codes back.

Why doesn't it work if you read in the literal \x00\x01\x00\xc0\x01\x00\x00\x00\x04?

Imagine the other way around: what happens if you simply try to print "\x00\x01\x00\xc0\x01\x00\x00\x00\x04"? Well, UTF-8 and US-ASCII are partially invalid encodings here (because of the \xc0), therefore I use the US-ASCII extension "ISO-8859-1" as the encoding. The printed result is this:  À . As you may guess now, if you really want to get b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04" from a file, then your file's content needs to be  À .

If your file's content is \x00\x01\x00\xc0\x01\x00\x00\x00\x04, then in order to print it like this you have to do .write(br"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"), and this raw string equals the non-raw string b"\\x00\\x01\\x00\\xc0\\x01\\x00\\x00\\x00\\x04". Basically, text inside a file is fully escaped, i.e. it behaves like a Python raw string.

EDIT: Just realized after posting that, even though they're in a script tag, Reddit doesn't "print" the unprintable characters. Just test-print it to a file yourself and look at the result:

f = open("name.txt", "wb")  # replace name with your file's name
f.write(b"\x00\x01\x00\xc0\x01\x00\x00\x00\x04")
2 of 3
1
How are you reading the file? Are you using open() with "rb" as the file mode? https://docs.python.org/3/library/io.html#binary-i-o
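If the file really does contain the literal backslash text, one possible approach is to let the unicode_escape codec interpret the escapes and then map the resulting code points to bytes with Latin-1. This is a sketch rather than the thread's own solution, and the helper name unescape_to_bytes is hypothetical. It assumes the text is pure ASCII and that every escape denotes a value below 256.

```python
def unescape_to_bytes(text):
    """Interpret backslash escapes in ASCII text and return the bytes they denote."""
    # 1. ASCII text -> bytes, so the unicode_escape codec can read it.
    # 2. unicode_escape turns each \xNN escape into the code point U+00NN.
    # 3. latin-1 maps code points 0-255 to the bytes of the same value.
    return text.encode('ascii').decode('unicode_escape').encode('latin-1')

# What reading the file as text would give: literal backslashes, no control characters.
file_text = r"\x00\x01\x00\xc0\x01\x00\x00\x00\x04"

result = unescape_to_bytes(file_text)
print(result)
```

The final encode raises UnicodeEncodeError if an escape such as \u0100 denotes a code point above 255, which at least makes unexpected input visible.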
🌐
Delft Stack
delftstack.com › home › howto › python › how to convert string to bytes in python
How to Convert String to Bytes in Python | Delft Stack
March 4, 2025 - The bytes constructor is a straightforward way to convert a string into bytes. This method takes a string and an optional encoding argument and returns the corresponding byte representation.