The built-in JSON module can be used as a validator:
import json
def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads and you can include the filename as well in the error message.
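Putting those two pieces together, a file-based variant might look like the sketch below. The name parse_file is only illustrative (it is not part of the json module), and the message format is an assumption.

```python
import json

def parse_file(filename):
    """Return the parsed JSON from filename, or None if it is not valid JSON."""
    try:
        with open(filename) as f:
            return json.load(f)
    except ValueError as e:
        # Include the filename so the broken file is easy to find.
        print('invalid json in %s: %s' % (filename, e))
        return None  # or: raise
```

A missing or unreadable file still raises the usual OSError here; only decoding problems are reported as invalid JSON.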
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct json is {"test": "foo"}.
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
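A minimal sketch of that skip list, assuming the files are given as a list of paths. The function name load_all and the one-line-per-file layout of skipped.txt are choices made for illustration only.

```python
import json

def load_all(filenames, skipped_path='skipped.txt'):
    """Parse each file; return (parsed, skipped) and write skipped entries to disk."""
    parsed = {}
    skipped = []
    for filename in filenames:
        try:
            with open(filename) as f:
                parsed[filename] = json.load(f)
        except ValueError as e:
            # Do not process the file any further, just record it.
            skipped.append('%s: %s' % (filename, e))
    with open(skipped_path, 'w') as out:
        for entry in skipped:
            out.write(entry + '\n')
    return parsed, skipped
```

Each line of skipped.txt then names one file and the parser's complaint, which is usually enough to find the problem by hand.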
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom JSON parser that fixes common errors. If you do, keep the originals under source control (or archived), so you can review the changes the automated tool makes as a sanity check. Ambiguous cases should be fixed by hand.
Answer from reece on Stack Overflow
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
    load_json_file(filename) # placeholder for your library's loader
except InvalidDataException: # or whatever your library raises
    pass # oops, guess it's not valid
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually and tokenize it and attempt to detect errors and try to fix them as you go, but I'm sure there are cases where the error is just not possible to fix automatically and would be better off throwing an error and asking the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However, I am not sure whether it would be a good idea to fix all errors, since you would then have to assume your fixes are what the user actually wants. If it's a missing comma or an extra trailing comma, that might be OK, but there may be cases where it is ambiguous what the user wants.
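As an illustration of one of the unambiguous cases, here is a rough sketch that drops trailing commas before a closing } or ]. The function strip_trailing_commas is hypothetical, not a library call, and it handles only this one error. It tracks whether it is inside a string so that commas in string values are left alone.

```python
import json

def strip_trailing_commas(text):
    """Remove commas that directly precede a closing } or ], outside of strings."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped, nothing special
            elif ch == '\\':
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ',':
            # Look past whitespace to see what follows the comma.
            rest = text[i + 1:].lstrip()
            if rest[:1] in ('}', ']'):
                continue             # trailing comma: drop it
            out.append(ch)
        else:
            out.append(ch)
    return ''.join(out)

fixed = strip_trailing_commas('{"a": [1, 2,], }')
print(json.loads(fixed))
```

Anything the sketch cannot repair still fails in json.loads, so the result should always be parsed again rather than trusted.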
You can try to do json.loads(), which will throw a ValueError if the string you pass can't be decoded as JSON.
In general, the "Pythonic" philosophy for this kind of situation is called EAFP, for Easier to Ask for Forgiveness than Permission.
Example Python function that returns a boolean indicating whether a string is valid JSON:
import json
def is_json(myjson):
    try:
        json.loads(myjson)
    except ValueError as e:
        return False
    return True
Which prints:
print(is_json("{}")) #prints True
print(is_json("{asdf}")) #prints False
print(is_json('{ "age":100}')) #prints True
print(is_json("{'age':100 }")) #prints False
print(is_json("{\"age\":100 }")) #prints True
print(is_json('{"age":100 }')) #prints True
print(is_json('{"foo":[5,6.8],"foo":"bar"}')) #prints True
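Note that the last example has a duplicate "foo" key and is still accepted. If duplicates should count as invalid for your purposes, one possible sketch uses the object_pairs_hook parameter of json.loads. The names reject_duplicates and is_json_strict are made up for this example.

```python
import json

def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError('duplicate key in object: %r' % keys)
    return dict(pairs)

def is_json_strict(myjson):
    try:
        # The hook is called once for every JSON object that gets decoded.
        json.loads(myjson, object_pairs_hook=reject_duplicates)
    except ValueError:
        return False
    return True

print(is_json_strict('{"foo":[5,6.8],"foo":"bar"}'))  # prints False
print(is_json_strict('{"foo":[5,6.8],"bar":"baz"}'))  # prints True
```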
Convert a JSON string to a Python dictionary:
import json
mydict = json.loads('{"foo":"bar"}')
print(mydict['foo']) #prints bar
mylist = json.loads("[5,6,7]")
print(mylist) #prints [5, 6, 7]
Convert a python object to JSON string:
foo = {}
foo['gummy'] = 'bear'
print(json.dumps(foo)) #prints {"gummy": "bear"}
If you want access to low-level parsing, don't roll your own, use an existing library: http://www.json.org/
Great tutorial on python JSON module: https://pymotw.com/2/json/
Check whether a string is JSON from the command line (using Perl's JSON::XS) and show syntax errors with their error messages:
sudo cpan JSON::XS
echo '{"foo":[5,6.8],"foo":"bar" bar}' > myjson.json
json_xs -t none < myjson.json
Prints:
, or } expected while parsing object/hash, at character offset 28 (before "bar}
at /usr/local/bin/json_xs line 183, <STDIN> line 1.
json_xs is capable of syntax checking, parsing, prettifying, encoding, decoding and more:
https://metacpan.org/pod/json_xs
You can do something like this:
import json
import logging
logger = logging.getLogger(__name__)
def convert(tup):
    """
    Convert to python dict.
    """
    try:
        tup_json = json.loads(tup)
        return tup_json
    except ValueError as error: # includes JSONDecodeError
        logger.error(error)
        return None
converted = convert(<string_that_needs_to_be_converted_to_json>)
if converted is not None: # a plain "if converted:" would also reject {} and []
    <do_your_logic>
else:
    <if_string_is_not_convertible>
If the top-level data you're dumping is an object, you could check if the first character is {, or [ if it's an array. That's only valid if the header for the other format will never start with those characters. It's also not foolproof because it doesn't guarantee that your data is well formed JSON.
On the other hand your existing solution is fine, much more clear and robust.
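To make the trade-off concrete, here is a small sketch of the first-character check next to the parse-based check. Both function names are illustrative, and the sniff assumes the top-level value is an object or an array.

```python
import json

def looks_like_json(text):
    """Cheap guess: does the text start like a JSON object or array?"""
    return text.lstrip()[:1] in ('{', '[')

def is_json(text):
    """Reliable check: actually try to parse the text."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True

sample = '{asdf}'
print(looks_like_json(sample))  # prints True
print(is_json(sample))          # prints False
```

The sniff is fast but can be fooled, which is why parsing remains the more robust option.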
Ok... This is a little complicated... So... Sorry in advance
I have a function that returns attributes of a video file (FFprobe) in a JSON object, and then a little factory that parses the JSON looking for specific attributes, and running if statements on those attributes.
I.e., one of the attributes in the JSON is the subtitle format. If the subtitle format != the desired format, then set a variable that is later used for encoding the subtitle to the desired format.
The issue I have run into is that sometimes those attributes (like subtitle) don't exist in the JSON because they do not exist in the file.
So I sort of need to check whether the attribute exists in the JSON before I check whether that attribute is the desired one and start setting variables.
How do I do this?
Would it be as simple as:
json_object = json.loads(studentJson)
if "subtitle_format" in json_object:
    print("Key exists in json_object")
    print(json_object["subtitle_format"], "is the subtitle format")
else:
    print("Key doesn't exist in JSON data")
If yes, would I get yelled at if the if statement had a few layers? Pseudo:
if subtitle_format in json_object:
    if subtitle_format == ass
        if subtitle_format == english
            encode
You might consider jsonschema to validate your JSON. Here is a program that validates your example. To extend this to your "20 keys", add the key names to the "required" list.
import jsonschema
import json
schema = {
    "type": "object",
    "properties": {
        "customer": {
            "type": "object",
            "required": ["lastName", "firstName", "age"]}},
    "required": ["service", "customer"]
}
json_document = '''{
    "service" : "Some Service Name",
    "customer" : {
        "lastName" : "Kim",
        "firstName" : "Bingbong",
        "age" : "99"
    }
}'''
try:
    # Read in the JSON document
    datum = json.loads(json_document)
    # And validate the result
    jsonschema.validate(datum, schema)
except jsonschema.exceptions.ValidationError as e:
    print("well-formed but invalid JSON:", e)
except json.decoder.JSONDecodeError as e:
    print("poorly-formed text, not JSON:", e)
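If installing jsonschema is not an option, the required-keys part alone can be approximated with the standard library. This is only a sketch: missing_keys and its dotted-path convention are invented for this example, and it checks presence only, not types.

```python
import json

def missing_keys(data, required):
    """Return the dotted key paths from required that are absent in data."""
    missing = []
    for path in required:
        node = data
        for part in path.split('.'):
            if isinstance(node, dict) and part in node:
                node = node[part]   # descend one level
            else:
                missing.append(path)
                break
    return missing

datum = json.loads('{"service": "Some Service Name", "customer": {"lastName": "Kim"}}')
print(missing_keys(datum, ['service', 'customer.lastName', 'customer.age']))
# prints ['customer.age']
```

The same helper also covers the "does this key exist before I test it" case, since an empty result means every listed key is present.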
Resources:
- https://pypi.python.org/pypi/jsonschema
- http://json-schema.org/example1.html
If you're finding the JSON Schema syntax confusing, create your JSON as you want it, run it through an online JSON-to-schema converter, and then use the result in Rob's example above.