encode converts characters to binary. When reading a file you want the opposite direction, binary to characters → decode. But this entire process is far too manual; simply do this:
import json

with open('keys.json', encoding='utf-8') as fh:
    data = json.load(fh)
print(data)
with handles opening and closing the file correctly, the encoding argument to open ensures the file is decoded using the correct encoding, and json.load reads directly from the file handle instead of first storing a copy of the file contents in memory.
If this still outputs invalid characters, it means your source encoding isn't UTF-8 or your console/terminal doesn't handle UTF-8.
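A minimal sketch of the encode/decode direction described above (standalone example, not from the original answer):

```python
# str.encode turns characters into bytes; bytes.decode reverses it.
text = "héllo wörld"
raw = text.encode("utf-8")          # characters → binary
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == text  # binary → characters
```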
Answer from deceze on Stack Overflow

I'm trying to import this JSON file into my Python code. When I saved it as a Unicode file, I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 183623: character maps to <undefined>
or this error when I tried changing the encoding type to UTF-8:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
my code to open it is just
with open(filepath) as f: statdata = json.load(f)
(I added the encoding argument to open when I tried to import it as UTF-8.)
EDIT: seems to work OK with open(filepath, encoding='utf-8-sig')
You can open with codecs:
import json
import codecs
json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))
or decode with utf-8-sig yourself and pass to loads:
json.loads(open('sample.json', 'rb').read().decode('utf-8-sig'))
Simple! You don't even need to import codecs.
with open('sample.json', encoding='utf-8-sig') as f:
    data = json.load(f)
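To see why utf-8-sig matters: a UTF-8 BOM is the three bytes EF BB BF at the start of the file; the utf-8-sig codec strips it, while plain utf-8 leaves it in the string and json then refuses to parse. A self-contained sketch (the file name is made up):

```python
import json

# Write a JSON file that starts with a UTF-8 BOM (EF BB BF),
# as Windows editors such as Notepad often do.
with open("bom_demo.json", "wb") as f:
    f.write(b'\xef\xbb\xbf{"key": "value"}')

# utf-8-sig strips the BOM on decode, so json.load succeeds.
with open("bom_demo.json", encoding="utf-8-sig") as f:
    data = json.load(f)

print(data)  # {'key': 'value'}
```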
Have you tried:
json.loads(line.decode("utf-8"))
Similar question asked here: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2
Edit: If the above does not work,
json.loads(line.decode("utf-8","ignore"))
will.
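Be aware that "ignore" drops any bytes it cannot decode, so data can be silently lost. A quick sketch with a made-up byte string:

```python
import json

# 0xe9 is "é" in Latin-1 but is not valid UTF-8 on its own.
line = b'{"name": "caf\xe9"}'

# "ignore" simply discards the offending byte.
text = line.decode("utf-8", "ignore")
print(text)              # {"name": "caf"}
print(json.loads(text))  # {'name': 'caf'}
```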
If the method answered by Academiphile doesn't work, try this:
with open('path/to/file.json', encoding='utf-8') as file:
    model = json.load(file)
Passing the encoding argument to open() lets json.load() decode the file correctly.
Specify the encoding as a part of open(). Here is a "round-trip demo":
>>> import json
>>> data = {
... "title": "قالت وزارة الداخلية المصرية إن كمية من المتفجرات في سيارة كانت معدة لتنفيذ عملية إرهابية أدت إلى الانفجار الذي وقع وسط القاهرة وأودى بحياة نحو 20 شخصا."
... }
>>> with open("/tmp/utf16demo.json", "w", encoding="utf-16") as f:
...     json.dump(data, f)
...
>>> with open("/tmp/utf16demo.json", encoding="utf-16") as f:
...     newdata = json.load(f)
...
>>> next(iter(newdata.values())) == next(iter(data.values()))
True
As mentioned in the comments, just because the data is originally UTF-16 encoded does not mean you need to write it back to CSV in the same encoding. You are perfectly free to load and decode using UTF-16, but then write out using UTF-8.
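A self-contained sketch of that conversion for a JSON file (the file names and data are illustrative):

```python
import json

data = {"title": "example"}

# Simulate a producer that writes UTF-16.
with open("demo_utf16.json", "w", encoding="utf-16") as f:
    json.dump(data, f, ensure_ascii=False)

# Load and decode as UTF-16, then re-save as UTF-8.
with open("demo_utf16.json", encoding="utf-16") as f:
    loaded = json.load(f)

with open("demo_utf8.json", "w", encoding="utf-8") as f:
    json.dump(loaded, f, ensure_ascii=False)
```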
Given an intents.json file containing Arabic text like this:

{"intents": [
    {"tag": "greeting",
     "patterns": ["هاي","عامل إيه","ايه اخبارك","ازيك"],
     "responses": ["هاي!","كويس","حمدالله","ماشي الحال وإنت ??"],
     "context_set": ""
    }
]
}

pass the encoding to open:

import json

with open("intents.json", encoding="utf-8") as f:
    intents = json.load(f)