Update: I wrote a solution that does not require reading the entire file in one go. It is too big for a stackoverflow answer, but can be found here jsonstream.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the position just past the end of the parsed object. It is poorly documented [1] (see footer), but you can pass this position back to raw_decode and it starts parsing again from that position. Unfortunately, the Python json module does not accept strings that have leading whitespace. So we need to search to find the first non-whitespace part of your document.
from json import JSONDecoder, JSONDecodeError
import re
NOT_WHITESPACE = re.compile(r'\S')
def decode_stacked(document, idx=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        idx = match.start()
        try:
            obj, idx = decoder.raw_decode(document, idx)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj
s = """
{"a": 1}
[
1
,
2
]
"""
for obj in decode_stacked(s):
    print(obj)
prints:
{'a': 1}
[1, 2]
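The update above links to jsonstream for the file-based case. As a rough, hedged illustration of that chunked approach (this is my own minimal sketch, not the actual jsonstream implementation; decode_stacked_file is a hypothetical name), the same raw_decode trick works on a buffer that is refilled from the file a chunk at a time:

```python
from json import JSONDecoder, JSONDecodeError
import io
import re

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked_file(fp, chunk_size=1024, decoder=JSONDecoder()):
    """Yield stacked JSON objects from a text file without loading it all at once."""
    buf = ''
    for chunk in iter(lambda: fp.read(chunk_size), ''):
        buf += chunk
        while True:
            match = NOT_WHITESPACE.search(buf)
            if not match:
                buf = ''
                break
            buf = buf[match.start():]
            try:
                obj, end = decoder.raw_decode(buf)
            except JSONDecodeError:
                # Possibly an object truncated at the chunk boundary;
                # keep the remainder and wait for more data.
                break
            buf = buf[end:]
            yield obj

for obj in decode_stacked_file(io.StringIO('{"a": 1}\n[1, 2]\n'), chunk_size=4):
    print(obj)
```

Note the sketch assumes the input is well formed: a genuinely malformed document would make it buffer forever rather than raise.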
Note About Missing Documentation
The current signature of raw_decode() dates from 2009, when simplejson was ported into the standard library. The documentation for raw_decode() in simplejson mentions an optional idx argument that can be used to start parsing at an offset. Given that the signature of raw_decode() has not changed since 2009, I think it is fair to assume the API is fairly stable, especially as decode() itself uses the idx argument of raw_decode() to ignore leading whitespace when parsing a string, which is exactly what this answer uses it for too. (Answer from Dunes on Stack Overflow.) The documentation of raw_decode() in simplejson is:
raw_decode(s[, idx=0])
Decode a JSON document from s (a str or unicode beginning with a JSON document) starting from the index idx and return a 2-tuple of the Python representation and the index in s where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end, or to decode a string that has a series of JSON objects.
JSONDecodeError will be raised if the given JSON document is not valid.
Use a json array, in the format:
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]
Then load it in your Python code:

import json

with open('file.json') as json_file:
    data = json.load(json_file)
Now the content of data is a list of dictionaries, one for each of the elements.
You can access it easily, e.g.:

data[0]["ID"]
I believe your JSON file is syntactically invalid. See www.json.org. Your file should contain a single object or array, e.g. in your case it should look like this:
[{"A":"something1","B":"something2","C":"something3","D":"something4"},
{"A":"something5","B":"something6","C":"something7","D":"something8"},
{"A":"something9","B":"something10","C":"something11","D":"something12"}]
Then you can access each object of the array in your loop:
for (Json::Value::ArrayIndex i = 0; i != root.size(); i++)
{
    std::string A = root[i].get("A", "ASCII").asString();
    // etc.
}
Here is a solution to the question, pretending there are newlines between each object (and no line is blank or malformed):

// Very simple jsoncpp test
#include <json/json.h>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    Json::Value root;
    Json::Reader reader;
    ifstream test("sample.json", ifstream::binary);
    string cur_line;
    bool success = true;
    while (success && getline(test, cur_line)) {
        cout << "Parse line: " << cur_line << endl;
        success = reader.parse(cur_line, root, false);
        if (success)
            cout << root << endl;
    }
    cout << "Done" << endl;
}
Hey, I am new to programming and I am trying to decode thousands of JSON files.
Usually there is one object in each JSON file, but for some reason a lot of my files have multiple JSON objects. Some have up to 5 objects.
{
"testNumber": "test200",
"device": {
"deviceID": 4000008
},
"user": {
"userID": "4121412"
}
}
{
"testNumber": "test201",
"device": {
"deviceID": 4000009
},
"user": {
"userID": "4121232"
}
}

My code gives me the error: json.decoder.JSONDecodeError: Extra data: line 2 column 1
Because of that I am using except ValueError but I would like to get the data out of these JSON files.
import json
import os

test_dir = r'C:\Users\path\path'

for file in os.listdir(test_dir):
    if 'testNumber' in file:
        try:
            data = json.load(open(test_dir + '\\' + file, 'r'))
            print("valid")
        except ValueError:
            print("Decoding JSON has failed")

Since json.loads and json.load don't work: is there any other way to open the JSON file so that I can try to split the content into 2 objects?
I think the problem is that you are overwriting the file with fs.writeFileSync().
You should use fs.appendFileSync() to add new data to the end of the file. See the node docs.
https://nodejs.org/api/fs.html#fs_fs_appendfilesync_file_data_options
If you are writing all data at once, then you need to create an array, push all the objects to the array and write the array to the file:
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let result = [];
    for (/* your loop statement */) {
        let obj = JSON.parse(JSON.stringify(base)); // or your preferred way of deep copying
        obj.Subject = 'f';
        obj.Body.Content = 'e';
        obj.Start.DateTime = '2016-11-13T08:30:00';
        obj.End.DateTime = '2016-11-13T17:30:00';
        result.push(obj);
    }
    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(result, null, 4));
}
Or if you want to write data in multiple runs, then
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let data = require('./public/json/output/jsonOutput.json');
    base.Subject = 'f';
    base.Body.Content = 'e';
    base.Start.DateTime = '2016-11-13T08:30:00';
    base.End.DateTime = '2016-11-13T17:30:00';
    data.push(base);
    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(data, null, 4));
}
However, in the second case, you need to add some code to handle the first run, when there is no existing data in the output file or the file doesn't exist. Another way to handle that condition is to initialize the output file with an empty JSON array:
[]
EDIT: In both cases, appending to the existing file will not work as it will generate invalid JSON.
No one has mentioned arrays:
[
{"one": 1},
{"two": 2}
]
Is valid JSON and might do what the OP wants.
Neither example in your question is a valid JSON document; a JSON document may only have one root. You have to split the file into two objects, then parse them.
You can use http://jsonlint.com to see if a given string is valid JSON or not.
So I recommend either changing what ever is dumping multiple JSON objects into a single file to do it in separate files, or to put each object as a value in one JSON root object.
If you don't have control over whatever is creating these, then you're stuck parsing the file yourself to pick out the different root objects.
Here's a valid way of encoding those data in a JSON object:
{
"one": 1,
"two": 2
}
If you really need separate objects, you can do it like this:
{
"one":
{
"number": 1
},
"two":
{
"number": 2
}
}
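To illustrate the point, both encodings above parse with a single json.loads call in Python (a minimal sketch added here, not part of the original answer):

```python
import json

# The flat form: one root object with two keys
merged = json.loads('{"one": 1, "two": 2}')
# The nested form: separate objects as values in one root object
nested = json.loads('{"one": {"number": 1}, "two": {"number": 2}}')

print(merged["two"])            # -> 2
print(nested["two"]["number"])  # -> 2
```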
Hey all, I’ve got an annoying situation. We have a system, that we don’t control, that outputs JSON to a single file where each row of the file is a JSON object. None of these objects are wrapped in a larger JSON array; that piece is important. Each row has all the same keys, just different values per key.
We need to import all of these objects into SQL server mapping the keys to columns. We got it working for the most part by following: https://www.sqlshack.com/import-json-data-into-sql-server/
DECLARE @JSON varchar(max)
SELECT @JSON = BulkColumn
FROM OPENROWSET (BULK 'C:\sqlshack\Results.JSON', SINGLE_CLOB) import

SELECT *
FROM OPENJSON (@JSON)
WITH (
    [FirstName] varchar(20),
    [MiddleName] varchar(20),
    [LastName] varchar(20),
    [JobTitle] varchar(20),
    [PhoneNumber] nvarchar(20),
    [PhoneNumberType] varchar(10),
    [EmailAddress] nvarchar(100),
    [EmailPromotion] bit
)
That works, but it only reads the first object it finds. Is there any way to tell SQL Server “loop over all the lines of this file and import them”?
Ideally the other system would wrap all the lines in a valid JSON array but they don’t and we can’t make them.
Warning: I’m a SQL Server noob, so this may be very simple, but I can’t find anything about this online.
Edit: I haven’t tried it yet, but this might be the answer, just in case someone else comes across this post in the far-off future.
https://learn.microsoft.com/en-us/archive/blogs/sqlserverstorageengine/loading-line-delimited-json-files-in-sql-server-2016
Basically you have to hand SQL Server a format file.
That doesn't look much like json, but yes, you can totally have an array of objects in a json file. Something like this in your case:
[{"firstName": "John", "lastName": "Smith"},
{"firstName": "Jane", "lastName": "Doe"}]
A json file may either contain a single object (which can be complex, with many nested keys) or an array of such objects. It's either curly braces or square brackets on the outside.
A json file needs to have a top - this can either be a json object enclosed in {} or a json array enclosed in []
A json file can have as many objects as you like as long as they are enclosed in a top (although the word "top" is not explicitly used)
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
You can enclose the above using a top object {} like this -
{
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
}
EDIT - The above is incorrect. Let me revise this answer.
1. The JSON file can have multiple objects as an array of objects.
2. You can't list multiple objects inside an object as shown above in the first example, as each object must have entries that are key/value pairs. In the above case the top object doesn't have key/value pairs but just a list of objects which is syntactically incorrect.
This means that the best way to have multiple objects is to create an array of multiple objects like this :
[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"}
]
Here is a link to the ECMA-404 standard that defines json.