encode converts characters to binary. When reading a file you want the opposite direction, binary to characters → decode. But this entire process is far too manual; simply do this:
import json

with open('keys.json', encoding='utf-8') as fh:
    data = json.load(fh)
print(data)
with handles opening and closing the file correctly, the encoding argument to open ensures the file is decoded using the correct encoding, and json.load reads directly from the file handle instead of first storing a copy of the file contents in memory.
If this still outputs invalid characters, it means your source encoding isn't UTF-8 or your console/terminal doesn't handle UTF-8.
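A minimal sketch of the encode/decode direction described above (standalone example, not from the original answer):

```python
# str.encode turns characters into bytes; bytes.decode reverses it.
text = "héllo wörld"
raw = text.encode("utf-8")          # characters → binary
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == text  # binary → characters
```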
Answer from deceze on Stack Overflow

I'm trying to import this JSON file into my Python code. When I saved it as a Unicode file, I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 183623: character maps to <undefined>
or this error when I tried changing the encoding type to UTF-8:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
my code to open it is just
with open(filepath) as f: statdata = json.load(f)
(I added the encoding argument to open when I tried to import it as UTF-8.)
EDIT: seems to work OK with open(filepath, encoding='utf-8-sig')
You can open with codecs:
import json
import codecs
json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))
or decode with utf-8-sig yourself and pass to loads:
json.loads(open('sample.json', 'rb').read().decode('utf-8-sig'))
Simple! You don't even need to import codecs.
with open('sample.json', encoding='utf-8-sig') as f:
    data = json.load(f)
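To see why utf-8-sig matters: a UTF-8 BOM is the three bytes EF BB BF at the start of the file; the utf-8-sig codec strips it, while plain utf-8 leaves it in the string and json then refuses to parse. A self-contained sketch (the file name is made up):

```python
import json

# Write a JSON file that starts with a UTF-8 BOM (EF BB BF),
# as Windows editors such as Notepad often do.
with open("bom_demo.json", "wb") as f:
    f.write(b'\xef\xbb\xbf{"key": "value"}')

# utf-8-sig strips the BOM on decode, so json.load succeeds.
with open("bom_demo.json", encoding="utf-8-sig") as f:
    data = json.load(f)

print(data)  # {'key': 'value'}
```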
Have you tried:
json.loads(line.decode("utf-8"))
Similar question asked here: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2
Edit: If the above does not work,
json.loads(line.decode("utf-8","ignore"))
will.
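Be aware that "ignore" drops any bytes it cannot decode, so data can be silently lost. A quick sketch with a made-up byte string:

```python
import json

# 0xe9 is "é" in Latin-1 but is not valid UTF-8 on its own.
line = b'{"name": "caf\xe9"}'

# "ignore" simply discards the offending byte.
text = line.decode("utf-8", "ignore")
print(text)              # {"name": "caf"}
print(json.loads(text))  # {'name': 'caf'}
```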
If the method answered by Academiphile doesn't work, try this:
with open('path/to/file.json', encoding='utf-8') as file:
    model = json.load(file)
Passing the encoding argument to open() lets json.load() decode the file correctly.
Specify the encoding as a part of open(). Here is a "round-trip demo":
>>> import json
>>> data = {
... "title": "قالت وزارة الداخلية المصرية إن كمية من المتفجرات في سيارة كانت معدة لتنفيذ عملية إرهابية أدت إلى الانفجار الذي وقع وسط القاهرة وأودى بحياة نحو 20 شخصا."
... }
>>> with open("/tmp/utf16demo.json", "w", encoding="utf-16") as f:
...     json.dump(data, f)
...
>>> with open("/tmp/utf16demo.json", encoding="utf-16") as f:
...     newdata = json.load(f)
...
>>> next(iter(newdata.values())) == next(iter(data.values()))
True
As mentioned in the comments, just because the data is originally UTF-16 encoded does not mean you need to write it back to CSV in the same encoding. You are perfectly free to load and decode using UTF-16, but then write out using UTF-8.
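A self-contained sketch of that conversion for a JSON file (the file names and data are illustrative):

```python
import json

data = {"title": "example"}

# Simulate a producer that writes UTF-16.
with open("demo_utf16.json", "w", encoding="utf-16") as f:
    json.dump(data, f, ensure_ascii=False)

# Load and decode as UTF-16, then re-save as UTF-8.
with open("demo_utf16.json", encoding="utf-16") as f:
    loaded = json.load(f)

with open("demo_utf8.json", "w", encoding="utf-8") as f:
    json.dump(loaded, f, ensure_ascii=False)
```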
Given an intents.json file containing Arabic text like this:

{"intents": [
    {"tag": "greeting",
     "patterns": ["هاي","عامل إيه","ايه اخبارك","ازيك"],
     "responses": ["هاي!","كويس","حمدالله","ماشي الحال وإنت ??"],
     "context_set": ""
    }
]
}

pass the encoding to open:

import json

with open("intents.json", encoding="utf-8") as f:
    intents = json.load(f)