Update: I wrote a solution that does not require reading the entire file in one go. It is too big for a stackoverflow answer, but can be found here jsonstream.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the index at which the parsed object ended. It is poorly documented [1] (see footer), but you can pass this position back to raw_decode and it will start parsing again from that position. Unfortunately, the Python json module does not accept strings that have prefixing whitespace, so we need to search for the first non-whitespace part of your document.
from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked(document, idx=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        idx = match.start()
        try:
            obj, idx = decoder.raw_decode(document, idx)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj
s = """
{"a": 1}
[
1
,
2
]
"""
for obj in decode_stacked(s):
print(obj)
prints:
{'a': 1}
[1, 2]
Note About Missing Documentation
The current signature of raw_decode() dates from 2009, when simplejson was ported into the standard library. The documentation for raw_decode() in simplejson mentions an optional idx argument that can be used to start parsing at an offset. Given that the signature of raw_decode() has not changed since 2009, I think it is fair to assume the API is fairly stable, especially as decode() itself uses the idx argument of raw_decode() to ignore prefixing whitespace when parsing a string, which is exactly what this answer uses the idx argument for too. The documentation of raw_decode() in simplejson is:
raw_decode(s[, idx=0])
Decode a JSON document from s (a str or unicode beginning with a JSON document) starting from the index idx and return a 2-tuple of the Python representation and the index in s where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end, or to decode a string that has a series of JSON objects.
JSONDecodeError will be raised if the given JSON document is not valid.

Answer from Dunes on Stack Overflow
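To make the two return values concrete, here is a minimal standalone sketch (not from the original answer) showing what raw_decode hands back and how the returned index can be fed into the next call:

```python
from json import JSONDecoder

decoder = JSONDecoder()
document = '{"a": 1}   [1, 2]'

# raw_decode returns the parsed value plus the index where it ended.
obj, end = decoder.raw_decode(document, 0)
print(obj, end)  # {'a': 1} 8

# Skip the prefixing whitespace ourselves, then resume from that index.
while end < len(document) and document[end].isspace():
    end += 1
obj, end = decoder.raw_decode(document, end)
print(obj)  # [1, 2]
```

The whitespace-skipping loop plays the same role as the NOT_WHITESPACE regex search in decode_stacked above.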
Use a json array, in the format:
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]
Then load it in your Python code:

import json

with open('file.json') as json_file:
    data = json.load(json_file)

Now data is a list of dictionaries, one per element. You can access them easily, e.g.:

data[0]["ID"]
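For example, once the array is loaded you can iterate over it directly. This small sketch uses json.loads on an inline string as a stand-in for json.load on a file; the field names are taken from the example above:

```python
import json

# Stand-in for json.load(json_file): the same array format as an inline string.
raw = '''[
    {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},
    {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"}
]'''

data = json.loads(raw)
print(data[0]["ID"])  # 12345

for record in data:
    print(record["Timestamp"], record["Usefulness"])
```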
Hey, I am new to programming and I am trying to decode thousands of JSON files.
Usually there is one object in each JSON file, but for some reason a lot of my files have multiple JSON objects. Some have up to 5 objects.
{
"testNumber": "test200",
"device": {
"deviceID": 4000008
},
"user": {
"userID": "4121412"
}
}
{
"testNumber": "test201",
"device": {
"deviceID": 4000009
},
"user": {
"userID": "4121232"
}
}

My code gives me the error: json.decoder.JSONDecodeError: Extra data: line 2 column 1
Because of that I am using except ValueError but I would like to get the data out of these JSON files.
import json
import os

test_dir = r'C:\Users\path\path'

for file in os.listdir(test_dir):
    if 'testNumber' in file:
        try:
            data = json.load(open(test_dir + '\\' + file, 'r'))
            print("valid")
        except ValueError:
            print("Decoding JSON has failed")

Since json.loads and json.load don't work: is there any other way to open the JSON file so that I can try to split the content into separate objects?
I think the problem is that you are overwriting the file with fs.writeFileSync().
You should use fs.appendFileSync() to add new data to the end of the file. See the node docs.
https://nodejs.org/api/fs.html#fs_fs_appendfilesync_file_data_options
If you are writing all the data at once, then you need to create an array, push all objects to the array and write the array to the file:
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let result = [];

    for (/* your loop statement */) {
        let obj = JSON.parse(JSON.stringify(base)); // or your preferred way of deep copying
        obj.Subject = 'f';
        obj.Body.Content = 'e';
        obj.Start.DateTime = '2016-11-13T08:30:00';
        obj.End.DateTime = '2016-11-13T17:30:00';
        result.push(obj);
    }

    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(result, null, 4));
}
Or, if you want to write data in multiple runs, then:
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let data = require('./public/json/output/jsonOutput.json');

    base.Subject = 'f';
    base.Body.Content = 'e';
    base.Start.DateTime = '2016-11-13T08:30:00';
    base.End.DateTime = '2016-11-13T17:30:00';
    data.push(base);

    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(data, null, 4));
}
However, in the second case you need to add some code to handle the first run, when the output file doesn't exist or contains no data. Another way to handle that condition would be to initialize the output file with an empty JSON array:
[]
EDIT: In both cases, appending to the existing file will not work as it will generate invalid JSON.
Hey all, I’ve got an annoying situation. We have a system, which we don’t control, that outputs JSON to a single file where each row of the file is a JSON object. None of these objects are wrapped in a larger JSON array. That piece is important. Each row has all the same keys, just different values per key.
We need to import all of these objects into SQL server mapping the keys to columns. We got it working for the most part by following: https://www.sqlshack.com/import-json-data-into-sql-server/
Declare @JSON varchar(max)

SELECT @JSON = BulkColumn
FROM OPENROWSET (BULK 'C:\sqlshack\Results.JSON', SINGLE_CLOB) import

SELECT *
FROM OPENJSON (@JSON)
WITH (
    [FirstName] varchar(20),
    [MiddleName] varchar(20),
    [LastName] varchar(20),
    [JobTitle] varchar(20),
    [PhoneNumber] nvarchar(20),
    [PhoneNumberType] varchar(10),
    [EmailAddress] nvarchar(100),
    [EmailPromotion] bit
)
That works, but it only reads the first object it finds. Is there any way to tell SQL Server "loop over all the lines of this file and import them"?
Ideally the other system would wrap all the lines in a valid JSON array but they don’t and we can’t make them.
Warning: I'm a SQL Server noob, so this may be very simple, but I can’t find anything about this online.
Edit: I haven’t tried it yet but this might be the answer just in case someone else comes across this post in the far off future.
https://learn.microsoft.com/en-us/archive/blogs/sqlserverstorageengine/loading-line-delimited-json-files-in-sql-server-2016
Basically you have to hand SQL Server a format file.
It would be better to assemble all of your data into one dict and then write it all out one time, instead of each time in the loop.
d = {}

for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
        continue  # skip this host rather than using an undefined response
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    d.update(facts_data)  # add to dict

# write everything at the end
with open(results_file, 'w') as f:
    f.write(json.dumps(d, sort_keys=True, indent=4))
Instead of writing JSON inside the loop, insert the data into a dict with the correct structure, then write that dict to JSON when the loop is finished.
This assumes your dataset fits into memory.
How do I parse a JSON file with multiple JSON objects (where each object isn't on one line)?
I have a JSON file with multiple JSON objects, but each object isn't on a distinct line.
For example 3 json objects below:
{
    "names": [],
    "ids": [],
} {
    "names": [],
    "ids": [
        {
            "groups": [],
        } {
            "key": "1738"
        }
    ]
}{
    "names": [],
    "key": "9",
    "ss": "123"
}
Basically, there are multiple JSON objects, but they are not separated by commas, and I don't know where each one ends because each object is not all on one line. The objects do not all contain the same keys.
Ideally, I would like to take all the JSON objects and put them in brackets, with each object separated by commas, ultimately to convert the file into a dictionary or array of JSON objects, but the original file does not separate them.
That doesn't look much like json, but yes, you can totally have an array of objects in a json file. Something like this in your case:
[{"firstName": "John", "lastName": "Smith"},
{"firstName": "Jane", "lastName": "Doe"}]
A json file may either contain a single object (which can be complex, with many nested keys) or an array of such objects. It's either curly braces or square brackets on the outside.
A json file needs to have a top - this can either be a json object enclosed in {} or a json array enclosed in []
A json file can have as many objects as you like as long as they are enclosed in a top (although the word "top" is not explicitly used)
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
You can enclose the above using a top object {} like this -
{
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
}
EDIT - The above is incorrect. Let me revise this answer.
1. The JSON file can have multiple objects as an array of objects.
2. You can't list multiple objects inside an object as shown above in the first example, as each object must have entries that are key/value pairs. In the above case the top object doesn't have key/value pairs but just a list of objects which is syntactically incorrect.
This means that the best way to have multiple objects is to create an array of multiple objects like this:
[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
]
Here is a link to the ECMA-404 standard that defines json.
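As a quick sanity check (a sketch using Python's json module, not part of the original answer), the bracketed array form parses, while the bare comma-separated objects do not:

```python
import json

valid = '[{"firstName": "John", "lastName": "Doe"}, {"firstName": "Anna", "lastName": "Smith"}]'
invalid = '{"firstName": "John", "lastName": "Doe"}, {"firstName": "Anna", "lastName": "Smith"}'

print(json.loads(valid))  # a list of two dicts

try:
    json.loads(invalid)
except json.JSONDecodeError as err:
    # The parser stops after the first object and complains about "Extra data".
    print("not valid JSON:", err)
```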