Your bytes object is almost JSON, but it's using single quotes instead of double quotes, and it needs to be a string. So one way to fix it is to decode the bytes to str and replace the quotes. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON you can load the JSON to a Python list and then dump it out. Eg,
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
# Decode UTF-8 bytes to Unicode, and convert single quotes
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)
# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
output
[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - -
[
{
"Classe": [
"Email addresses",
"Passwords"
],
"CreationDate": "2012-05-05",
"Date": "2016-05-21T21:35:40Z",
"Link": "http://some_link.com",
"LogoType": "png",
"Ref": 164611595
}
]
As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we've decoded it to a string.
from ast import literal_eval
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it's possible, it's better to fix that problem so that proper JSON data is created in the first place.
Your bytes object is almost JSON, but it's using single quotes instead of double quotes, and it needs to be a string. So one way to fix it is to decode the bytes to str and replace the quotes. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON you can load the JSON to a Python list and then dump it out. Eg,
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
# Decode UTF-8 bytes to Unicode, and convert single quotes
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)
# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
output
[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - -
[
{
"Classe": [
"Email addresses",
"Passwords"
],
"CreationDate": "2012-05-05",
"Date": "2016-05-21T21:35:40Z",
"Link": "http://some_link.com",
"LogoType": "png",
"Ref": 164611595
}
]
As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we've decoded it to a string.
from ast import literal_eval
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it's possible, it's better to fix that problem so that proper JSON data is created in the first place.
You can simply use,
import json
my_bytes_value = my_bytes_value.decode().replace("'", '"')
json.loads(my_bytes_value)
Python decoding | Bytes to json - Stack Overflow
python - How to encode bytes in JSON? json.dumps() throwing a TypeError - Stack Overflow
json - python byte string encode and decode - Stack Overflow
Python 3 bytes, string JSON problems
json.dumps() expects strings in its input. Since base64.b64encode() encodes bytes you need to convert those bytes into a string using the ASCII codec:
import base64
encoded = base64.b64encode(b'data to be encoded') # b'ZGF0YSB0byBiZSBlbmNvZGVk' (notice the "b")
data['bytes'] = encoded.decode('ascii') # 'ZGF0YSB0byBiZSBlbmNvZGVk'
Note that to get the original data back you don't need to re-encode it to bytes because b64decode() handles ASCII-only strings as well as bytes:
decoded = base64.b64decode(data['bytes']) # b'data to be encoded'
The other answers have pointed you to a fix for your error. More generally, if your dictionary contains bytes objects, you can use a custom encoder class to encode, without having to manually convert each item to base64:
import base64
import json
data = {"bytes": b"some byte string"}
class BytesEncoder(json.JSONEncoder):
def default(self, o):
if isinstance(o, bytes):
return base64.b64encode(o).decode("ascii")
else:
return super().default(o)
print(json.dumps(data, cls=BytesEncoder))
You need to examine the documentation for the software API that you are using. BLOB is an acronym: BINARY Large Object.
If your data is in fact binary, the idea of decoding it to Unicode is of course a nonsense.
If it is in fact text, you need to know what encoding to use to decode it to Unicode.
Then you use json.dumps(a_Python_object) ... if you encode it to UTF-8 yourself, json will decode it back again:
>>> import json
>>> json.dumps(u"\u0100\u0404")
'"\\u0100\\u0404"'
>>> json.dumps(u"\u0100\u0404".encode('utf8'))
'"\\u0100\\u0404"'
>>>
UPDATE about latin1:
u'\x80' is a useless meaningless C1 control character -- the encoding is extremely unlikely to be Latin-1. Latin-1 is "a snare and a delusion" -- all 8-bit bytes are decoded to Unicode without raising an exception. Don't confuse "works" and "doesn't raise an exception".
Use b.decode('name of source encoding') to get a unicode version. This was surprising to me when I learned it. eg:
In [123]: 'foo'.decode('latin-1')
Out[123]: u'foo'
The most important part to properly answer this is the information on how you pass these objetcts to the Python2 program: you are using JSON.
So, stay with me:
After you do the .encode step in program 1, you have a bytes object. By calling str(...) on it, you are just putting a escaping layer on this bytes object, and turning it back to a string - but when this string is written as is to a file, or transmited over the network, it will be encoded again - any non-ASCII tokens are usually escaped with the \u prefix and the codepoint for each character - but the original Chinese chracters themselves are now encoded in utf-8 and doubly-escaped.
Python's JSON load methods already decode the contents of json data into text-strings: so a decode method is not to be expected at all.
In short: to pass data around, simply encode your original text as JSON in the first program, and do not botter with any decoding after json.load on the target Python 2 program:
# my first program
x = "宇宙"
# No str-encode-decode dance needed here.
...
data = json.dumps({"example_key": x, ...})
# code to transmit json string by network or file as it is...
# my other program
text = json.loads(data)["example_key"]
# text is a Unicode text string ready to be used!
As you are doing, you are probably gettint the text doubly-encoded - I will mimick it on the Python 3 console. I will print the result from each step so you can undestand the transforms that are taking place.
In [1]: import json
In [2]: x = "宇宙"
In [3]: print(x.encode("utf-8"))
b'\xe5\xae\x87\xe5\xae\x99'
In [4]: text = str(x.encode("utf-8"))
In [5]: print(text)
b'\xe5\xae\x87\xe5\xae\x99'
In [6]: json_data = json.dumps(text)
In [7]: print(json_data)
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
# as you can see, it is doubly escaped, and it is mostly useless in this form
In [8]: recovered_from_json = json.loads(json_data)
In [9]: print(recovered_from_json)
b'\xe5\xae\x87\xe5\xae\x99'
In [10]: print(repr(recovered_from_json))
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
In [11]: # and if you have data like this in files/databases you need to recover:
In [12]: import ast
In [13]: recovered_text = ast.literal_eval(recovered_from_json).decode("utf-8")
In [14]: print(recovered_text)
宇宙
I had this issue when decoding stringified bytes received in a JSON payload:
print(value)
>>> "b'text-here'"
Encoding the string only compounded the problem by adding another layer of wrapping:
encoded = value.encode("utf-8")
print(encoded)
>>> b"b'text-here'"
And this solution from @jsbueno didn't work for me - I got a json.decoder.JSONDecodeError:
recovered_from_json = json.loads(json_data)
SOLUTION
You need to evaluate the string to expose the wrapped bytes object:
import ast
as_bytes = ast.literal_eval(value)
print(as_bytes)
>>> b'text-here'
Then you can decode to string:
decoded = value.decode()
print(decoded)
>>> 'text-here'
As noted by @snakecharmerb, you shouldn't use eval() as it opens you up to running potentially dangerous commands if input is not checked or sanitized (see this old but illustrative blog post)
Python’s wonderful standard library to the rescue…
Copyimport codecs
reader = codecs.getreader("utf-8")
obj = json.load(reader(response))
Works with both py2 and py3.
Docs: Python 2, Python3
HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv,...).
urllib should know how to encode the bytes to a string, but it's too naïve—it's a horribly underpowered and un-Pythonic library.
Dive Into Python 3 provides an overview about the situation.
Your "work-around" is fine—although it feels wrong, it's the correct way to do it.