Hello! I need to convert my JSON file to binary so that I can pass it as a BLOB to my database. How can I do that? Is it possible in Python?
Thank you!
In the RFC https://www.rfc-editor.org/rfc/rfc7159, it says:
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32
At first glance Python doesn't seem to follow the spec here: json.dumps() returns a Python 3 str, so what does it even mean for the output to be "encoded"? Python is nonetheless doing some encoding for you. Try this:
>>> import json
>>> json.dumps({"Japan":"日本"})
'{"Japan": "\\u65e5\\u672c"}'
You can see that the Japanese has been converted to Unicode escapes, and the resulting string is actually ASCII, even though it's still a Python str. For most practical purposes this is good enough: the characters are there and will be interpreted correctly. If you do want raw UTF-8 sequences for interoperability, pass ensure_ascii=False to json.dumps() and encode the result as UTF-8. From the escaped form, it's easy to get binary with:
>>> json.dumps({"Japan":"日本"}).encode("ascii")
b'{"Japan": "\\u65e5\\u672c"}'
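For comparison, here is a minimal sketch that produces both byte forms side by side. It uses the standard-library ensure_ascii parameter of json.dumps(); the outputs in the comments assume CPython 3.

```python
import json

data = {"Japan": "日本"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the text is pure ASCII and .encode("ascii") cannot fail
escaped = json.dumps(data).encode("ascii")

# ensure_ascii=False keeps the characters as-is; encode the str as UTF-8
utf8 = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(escaped)  # b'{"Japan": "\\u65e5\\u672c"}'
print(utf8)     # b'{"Japan": "\xe6\x97\xa5\xe6\x9c\xac"}'
```

Both byte strings load back to the same dict; the UTF-8 form is simply shorter for non-ASCII text.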
And Python does the right thing when loading it back in:
>>> json.loads(json.dumps({"Japan":"日本"}).encode("ascii"))
{'Japan': '日本'}
And if you don't bother encoding at all, json.loads() works just as well when given a str:
>>> json.loads(json.dumps({"Japan":"日本"}))
{'Japan': '日本'}
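Tying this back to the RFC quote: since Python 3.6, json.loads() also accepts bytes and detects whether they are UTF-8, UTF-16 or UTF-32. A minimal sketch that round-trips the same data through all three encodings:

```python
import json

data = {"Japan": "日本"}
text = json.dumps(data, ensure_ascii=False)

# json.loads() accepts bytes (Python 3.6+) and detects which of the RFC's
# three encodings was used; Python's "utf-16"/"utf-32" codecs add a BOM.
results = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    raw = text.encode(encoding)
    results[encoding] = json.loads(raw)

print(results["utf-16"])  # {'Japan': '日本'}
```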
As ever, Python is trying to be as helpful as possible in figuring out what you want and doing it, but this is perplexing to people who dig a little deeper, and despite loving Python to bits I sympathise with the OP. Whether this kind of "helpful" behaviour is worth the confusion is a debate that will rage on.
It's worth noting that if the next thing you do with the output is write it to a file, you can just do:
import pathlib
pathlib.Path("myfile.json").write_text(json_data, encoding="utf-8")
Then you don't need bytes at all: write_text() opens the file in text mode, does the encoding for you, and closes the file afterwards.
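A minimal sketch of that file route, assuming a temporary directory stands in for wherever myfile.json really lives. The encoding is passed explicitly because the default text-mode encoding depends on the platform. If you later need the raw bytes (for a BLOB, say), Path.read_bytes() returns them.

```python
import json
import pathlib
import tempfile

data = {"Japan": "日本"}
json_data = json.dumps(data, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "myfile.json"
    # Text mode: Python encodes the str for you (UTF-8, set explicitly
    # rather than relying on the platform's default encoding)
    path.write_text(json_data, encoding="utf-8")
    # Reading the file back in binary gives the encoded bytes
    blob = path.read_bytes()
```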
I see this as a two-step problem.
Step 1: Convert the JSON object (e.g. a dict) to a string
import json
my_string = json.dumps(my_json)
Step 2: Encode the string to bytes
my_binary_string = my_string.encode('utf-8')
Or, in one line:
my_binary_string = json.dumps(my_json).encode('utf-8')
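To close the loop on the original question, here is a minimal sketch of storing those bytes as a BLOB. It assumes SQLite through Python's built-in sqlite3 module, an in-memory database, and a hypothetical documents table; other database drivers may use a different placeholder syntax than ?.

```python
import json
import sqlite3

my_json = {"id": 1, "name": "value1", "Japan": "日本"}

# Serialize to str, then encode to bytes
my_binary_string = json.dumps(my_json).encode("utf-8")

conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
# Hypothetical table with a BLOB column
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body BLOB)")

# sqlite3 stores Python bytes objects as BLOB values
conn.execute("INSERT INTO documents (body) VALUES (?)", (my_binary_string,))
conn.commit()

# Reading it back returns bytes; json.loads() accepts bytes directly
(stored,) = conn.execute("SELECT body FROM documents").fetchone()
restored = json.loads(stored)
conn.close()
```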
Related question: How to convert binary data to JSON in Python
You need to decode it, then split it on '\n' and load each JSON object separately. If your byte string is stored in a variable called byte_string, you could do something like:
import json

json_str = byte_string.decode('utf-8')
json_objs = []
for line in json_str.split('\n'):
    if line.strip():  # skip blank lines, e.g. after a trailing '\n'
        json_objs.append(json.loads(line))
For the particular string you posted, though, you will get an error on the second object, because its second key is missing its opening double quote: it reads name" in the string you linked.
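If you want such errors to point at the offending record, a minimal sketch is to catch json.JSONDecodeError per line. The helper name parse_json_lines is hypothetical, and the sample bytes only imitate the kind of typo described above.

```python
import json

def parse_json_lines(byte_string, encoding="utf-8"):
    """Parse newline-delimited JSON bytes into a list of Python objects.

    Hypothetical helper: raises ValueError naming the 1-based line number
    of the first record that fails to parse.
    """
    records = []
    for lineno, line in enumerate(byte_string.decode(encoding).split("\n"), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. after a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: {exc.msg} (column {exc.colno})") from exc
    return records

# Same kind of typo: the second record's name key lacks its opening quote
bad = b'{"id": "1", "name": "value1"}\n{"id": "2", name": "value2"}\n'
try:
    parse_json_lines(bad)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```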
First, this isn't valid JSON, since it isn't a single value but several objects, one per line. Second, there is a typo: in the "id":"2" entry, the name key is missing its opening double quote.
As an alternative to processing one dict at a time, you can replace the newlines with commas and wrap the whole thing in brackets to turn it into a JSON array. This is fragile, since it requires exactly one newline between dicts, but it is compact:
import json

s = b'{"id": "1", "name": " value1"}\n{"id":"2", "name": "value2"}\n{"id":"3", "name": "value3"}\n'
my_json = s.decode('utf8')
json_data = json.loads("[" + my_json.rstrip().replace("\n", ",") + "]")
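If the separators aren't guaranteed to be exactly one newline, a less fragile option is a different technique: json.JSONDecoder.raw_decode(), which parses one value starting at a given index and reports where it stopped. A minimal sketch; the generator name iter_json_values is hypothetical, and str.isspace() is used as an approximation of JSON whitespace.

```python
import json

def iter_json_values(text):
    """Yield each JSON value from text that holds several values back to
    back, separated by any amount of whitespace."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so do it here
        while idx < end and text[idx].isspace():
            idx += 1
        if idx == end:
            return
        # Returns the parsed value and the index just past it
        value, idx = decoder.raw_decode(text, idx)
        yield value

s = b'{"id": "1", "name": " value1"}\n\n{"id":"2", "name": "value2"} {"id":"3", "name": "value3"}\n'
json_data = list(iter_json_values(s.decode("utf-8")))
```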
For arbitrary file contents, I'd use base64. JSON isn't designed to carry binary data, so unless your file's content is plain text, it should be encoded as plain text first. Virtually everything can encode and decode base64. If you instead use, for example, Python's repr(file_content), that also produces plain text, but the receiving end would need to know how to decode the string escapes that Python's repr() uses.
JSON cannot represent raw binary data. You need to encode the data as text before serializing, and the easiest encoding to use is Base64. You do not need the URL-safe variant unless something further down the processing chain requires it.
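A minimal sketch of the Base64 approach with the standard-library base64 module. The file bytes and the filename field are hypothetical placeholders; swap in base64.urlsafe_b64encode/urlsafe_b64decode only if a later step needs URL-safe text.

```python
import base64
import json

file_content = b"\x89PNG\r\n\x1a\n\x00\xff"  # hypothetical binary file bytes

# Encode the bytes as Base64 text so they can live inside a JSON string
payload = json.dumps({
    "filename": "image.png",  # hypothetical metadata
    "data": base64.b64encode(file_content).decode("ascii"),
})

# Receiving end: parse the JSON, then decode the Base64 back to bytes
received = json.loads(payload)
restored = base64.b64decode(received["data"])
```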