Though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data):

The latin1 codec can round trip every byte. You can verify this with the following short program:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨ª«¬\xad¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> sb == s.encode('latin1')
True
Answer from anthony sottile on Stack Overflow
Top answer
1 of 6
20

Though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data):

The latin1 codec can round trip every byte. You can verify this with the following short program:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0¡¢£¤¥¦§¨ª«¬\xad¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> sb == s.encode('latin1')
True
2 of 6
12

Just now I ran into the same problem. This is what I came up with:

import struct

def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num < 255:
            outlist.append(struct.pack('B', num))
        elif num < 65535:
            outlist.append(struct.pack('>H', num))
        else:
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>bH', b, H))
    return b''.join(outlist)

Some examples:

In [34]: rawbytes('this is a test')
Out[34]: b'this is a test'

In [35]: rawbytes('\udc80\udcdf\udcff\udcff\udcff\x7f')
Out[35]: b'\xdc\x80\xdc\xdf\xdc\xff\xdc\xff\xdc\xff\x7f'
🌐
Python.org
discuss.python.org › ideas
Alliow `bytes(mystring)` without specifying the encoding - Ideas - Discussions on Python.org
September 20, 2022 - ", line 1, in "hello".encode() b'hello' For consistency, I would suggest that calling bytes on a str object without an encoding also assumes UTF-8 by default, as ...
Discussions

Convert bytes to a string in Python 3 - Stack Overflow
See Best way to convert string to bytes in Python 3? for the other way around. ... @CharlieParker Because str(text_bytes) can't specify the encoding. More on stackoverflow.com
🌐 stackoverflow.com
String to Bytes Python without change in encoding - Stack Overflow
I have this issue and I can't figure out how to solve it. I have this string: data = '\xc4\xb7\x86\x17\xcd' When I tried to encode it: data.encode() I get this result: b'\xc3\x84\xc2\xb7\xc2\x86... More on stackoverflow.com
🌐 stackoverflow.com
January 21, 2018
Best way to convert string to bytes in Python 3? - TestMu AI Community
Best way to convert string to bytes in Python 3 More on community.testmuai.com
🌐 community.testmuai.com
0
June 6, 2024
image - How to create an HTML img tag string with base64 encoding from bytes in Python? - Stack Overflow
I am using Python 3.6 and I have an image as bytes: img = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00' I need to convert the bytes into a string without encoding so it looks like: raw_img = '\xff\xd8\xff\x... More on stackoverflow.com
🌐 stackoverflow.com
🌐
Reddit
reddit.com › r/learnpython › how to read string to bytes without encoding?
r/learnpython on Reddit: How to read string to bytes without encoding?
May 9, 2019 -

I'd been receiving some binary data on a python socket, and printing the data to the console for a few days. I've switched the output to files, but would like to reclaim the previous data. The data looks like this, but when I load the file, it goes in and encodes the data, escaping the single quotes and pre-existing backslashes, like this.

I'm wanting to read the file, one line at a time, or loading the entire file as an array or list, but need to bypass or backtrack the encoding to use the data properly. Any pointers on what modules/functions I should be looking at?

🌐
Flexiple
flexiple.com › python › python-string-to-bytes
How to convert Python string to bytes? | Flexiple Tutorials | Python - Flexiple
The bytes() method is an inbuilt function that can be used to convert objects to byte objects. ... The bytes take in an object (a string in our case), the required encoding method, and convert it into a byte object.
🌐
Google Groups
groups.google.com › g › comp.lang.python › c › 3nPnNzgBoxQ
"convert" string to bytes without changing data (encoding)
March 28, 2012 - Steven D'Aprano <steve+comp....@pearwood.info> wrote: >The right way to convert bytes to strings, and vice versa, is via >encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you don't need to do anything more that. You don't need to pretend like Chris Angelico that there's isn't a direct mapping from the his Python 3 implementation's internal respresentation of strings to bytes in order to label what he's asking for as being "silly".
🌐
GeeksforGeeks
geeksforgeeks.org › python › python-convert-string-to-bytes
Convert String to bytes-Python - GeeksforGeeks
The goal here is to convert a string into bytes in Python. This is essential for working with binary data or when encoding strings for storage or transmission.
Published   July 11, 2025
Find elsewhere
Top answer
1 of 2
20

You cannot convert a string into bytes or bytes into string without taking an encoding into account. The whole point about the bytes type is an encoding-independent sequence of bytes, while str is a sequence of Unicode code points which by design have no unique byte representation.

So when you want to convert one into the other, you must tell explicitly what encoding you want to use to perform this conversion. When converting into bytes, you have to say how to represent each character as a byte sequence; and when you convert from bytes, you have to say what method to use to map those bytes into characters.

If you don’t specify the encoding, then UTF-8 is the default, which is a sane default since UTF-8 is ubiquitous, but it's also just one of many valid encodings.

If you take your original string, '\xc4\xb7\x86\x17\xcd', take a look at what Unicode code points these characters represent. \xc4 for example is the LATIN CAPITAL LETTER A WITH DIAERESIS, i.e. Ä. That character happens to be encoded in UTF-8 as 0xC3 0x84 which explains why that’s what you get when you encode it into bytes. But it also has an encoding of 0x00C4 in UTF-16 for example.


As for how to solve this properly so you get the desired output, there is no clear correct answer. The solution that Kasramvd mentioned is also somewhat imperfect. If you read about the raw_unicode_escape codec in the documentation:

raw_unicode_escape

Latin-1 encoding with \uXXXX and \UXXXXXXXX for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.

So this is just a Latin-1 encoding which has a built-in fallback for characters outside of it. I would consider this fallback somewhat harmful for your purpose. For Unicode characters that cannot be represented as a \xXX sequence, this might be problematic:

>>> chr(256).encode('raw_unicode_escape')
b'\\u0100'

So the code point 256 is explicitly outside of Latin-1 which causes the raw_unicode_escape encoding to instead return the encoded bytes for the string '\\u0100', turning that one character into 6 bytes which have little to do with the original character (since it’s an escape sequence).

So if you wanted to use Latin-1 here, I would suggest you to use that one explictly, without having that escape sequence fallback from raw_unicode_escape. This will simply cause an exception when trying to convert code points outside of the Latin-1 area:

>>> '\xc4\xb7\x86\x17\xcd'.encode('latin1')
b'\xc4\xb7\x86\x17\xcd'
>>> chr(256).encode('latin1')
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    chr(256).encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)

Of course, whether or not code points outside of the Latin-1 area can cause problems for you depends on where that string actually comes from. But if you can make guarantees that the input will only contain valid Latin-1 characters, then chances are that you don't really need to be working with a string there in the first place. Since you are actually dealing with some kind of bytes, you should look whether you cannot simply retrieve those values as bytes in the first place. That way you won’t introduce two levels of encoding there where you can corrupt data by misinterpreting the input.

2 of 2
10

You can use 'raw_unicode_escape' as your encoding:

In [14]: bytes(data, 'raw_unicode_escape')
Out[14]: b'\xc4\xb7\x86\x17\xcd'

As mentioned in comments you can also pass the encoding directly to the encode method of your string.

In [15]: data.encode("raw_unicode_escape")
Out[15]: b'\xc4\xb7\x86\x17\xcd'
🌐
Devace Technologies
devacetech.com › home › insights › string to bytes conversion in python-2025 manual
How to convert a string to bytes in Python
July 28, 2025 - In the given example, bytes ( ) receives text and encoding scheme, giving the same result as .encode ( ). Beginners may sometimes face errors while converting a string to bytes in Python, including: ... python # This will raise a TypeError bytes ( “ Python ” ) TypeError: string argument without an encoding Deploying .encode ( ) on already encoded bytes
🌐
DataCamp
datacamp.com › tutorial › string-to-bytes-conversion
How to Convert String to Bytes in Python | DataCamp
June 5, 2024 - In Python, use the .encode() method on a string to convert it into bytes, optionally specifying the desired encoding (UTF-8 by default).
🌐
Analytics Vidhya
analyticsvidhya.com › home › 7 ways to convert string to bytes in python
7 Ways to Convert String to Bytes in Python - Analytics Vidhya
February 7, 2024 - The bytes() function provides a simple way to convert strings to bytes. It is similar to the encode() method but returns an immutable bytes object instead of a mutable one. However, it is important to note that the bytes() function may raise ...
🌐
GeeksforGeeks
geeksforgeeks.org › python › how-to-fix-typeerror-string-argument-without-an-encoding-in-python
How to Fix TypeError: String Argument Without an Encoding in Python - GeeksforGeeks
July 23, 2025 - encode() method turns a string ... the characters in the string. ... When we use the bytes() function, we need to specify the encoding format, like 'utf-8', to convert a string into bytes....
🌐
Stack Abuse
stackabuse.com › convert-bytes-to-string-in-python
Convert Bytes to String in Python
November 27, 2020 - This was added in Python 2.6, but ... - similar to bytes, but mutable. Let's take a look at how we can convert bytes to a String, using the built-in decode() method for the bytes class:...
🌐
Delft Stack
delftstack.com › home › howto › python › how to convert string to bytes in python
How to Convert String to Bytes in Python | Delft Stack
March 4, 2025 - The bytes constructor is a straightforward way to convert a string into bytes. This method takes a string and an optional encoding argument and returns the corresponding byte representation.
🌐
Spark By {Examples}
sparkbyexamples.com › home › python › python convert bytes to string
Python Convert Bytes to String - Spark By {Examples}
May 31, 2024 - To convert bytes to a string in Python, you can use the decode method, specifying the encoding of the byte data. The most common encoding is ‘utf-8’. ... The decode method is used to convert a sequence of bytes into a string.
Top answer
1 of 5
893

If you look at the docs for bytes, it points you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self documenting -- "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding) -- there is no explicit verb when you use the constructor.

I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.

Also, see Serdalis' comment -- unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.

2 of 5
735

It's easier than it is thought:

my_str = "hello world"
my_str_as_bytes = my_str.encode()
print(type(my_str_as_bytes)) # ensure it is byte representation
my_decoded_str = my_str_as_bytes.decode()
print(type(my_decoded_str)) # ensure it is string representation

you can verify by printing the types. Refer to output below.

<class 'bytes'>
<class 'str'>