Update: I wrote a solution that does not require reading the entire file in one go. It is too big for a stackoverflow answer, but can be found here jsonstream.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they fit in memory). raw_decode stops once it has a valid object and returns the index of the first character that was not part of the parsed object. It is poorly documented [1] (see footer), but you can pass this position back to raw_decode and it will start parsing again from that position. Unfortunately, raw_decode does not accept strings that have leading whitespace, so we need to search for the first non-whitespace character in the document before each call.
from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked(document, idx=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        idx = match.start()
        try:
            obj, idx = decoder.raw_decode(document, idx)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s = """
{"a": 1}
[
1
,
2
]
"""

for obj in decode_stacked(s):
    print(obj)
prints:
{'a': 1}
[1, 2]
Note About Missing Documentation
The current signature of raw_decode() dates from 2009, when simplejson was ported into the standard library. The documentation for raw_decode() in simplejson mentions an optional idx argument that can be used to start parsing at an offset. Given that the signature of raw_decode() has not changed since 2009, I think it is fair to assume the API is fairly stable. Especially as decode() uses the idx argument of raw_decode() to ignore prefixing whitespace when parsing a string. And this is exactly what this answer is using the idx argument for too. The documentation of raw_decode() in simplejson is:
Answer from Dunes on Stack Overflow
raw_decode(s[, idx=0])
Decode a JSON document from s (a str or unicode beginning with a JSON document) starting from the index idx and return a 2-tuple of the Python representation and the index in s where the document ended.
This can be used to decode a JSON document from a string that may have extraneous data at the end, or to decode a string that has a series of JSON objects.
JSONDecodeError will be raised if the given JSON document is not valid.
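To make the returned 2-tuple concrete, here is a minimal sketch of calling raw_decode directly. The input string is made up for illustration, and the manual "+ 1" only works because exactly one space separates the two documents in this sample.

```python
from json import JSONDecoder

decoder = JSONDecoder()
text = '{"a": 1} [1, 2]'  # two stacked documents (illustrative input)

# The first call starts at index 0 and stops right after the closing brace.
first, end = decoder.raw_decode(text, 0)

# raw_decode does not skip leading whitespace, so step past the single space.
second, end2 = decoder.raw_decode(text, end + 1)
```

Here end is the index just past the first document, which is why it can be fed back in as the next starting point.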
Use a json array, in the format:
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]
Then import it into your Python code:
import json

with open('file.json') as json_file:
    data = json.load(json_file)
Now data is a list of dictionaries, one for each element of the array.
You can access it easily, e.g.:
data[0]["ID"]
Hey, I am new to programming and I am trying to decode thousands of JSON files.
Usually there is one object in each JSON file, but for some reason a lot of my files have multiple JSON objects. Some have up to 5 objects.
{
    "testNumber": "test200",
    "device": {
        "deviceID": 4000008
    },
    "user": {
        "userID": "4121412"
    }
}
{
    "testNumber": "test201",
    "device": {
        "deviceID": 4000009
    },
    "user": {
        "userID": "4121232"
    }
}
My code gives me the error: json.decoder.JSONDecodeError: Extra data: line 2 column 1
Because of that I am using except ValueError, but I would like to get the data out of these JSON files.
import json
import os

test_dir = r'C:\Users\path\path'
for file in os.listdir(test_dir):
    if 'testNumber' in file:
        try:
            data = json.load(open(test_dir + '\\' + file, 'r'))
            print("valid")
        except ValueError:
            print("Decoding JSON has failed")
Since json.loads and json.load don't work: is there any other way to open the JSON file so that I can try to split the content into 2 objects?
how do I parse json file with multiple json objects (but each json object isn't on one line)
I have a json file with multiple json objects but each json object isn't on a distinct line.
For example 3 json objects below:
{
    "names": [],
    "ids": [],
} {
    "names": [],
    "ids": [
        {
            "groups": [],
        } {
            "key": "1738"
        }
    ]
}{
    "names": [],
    "key": "9",
    "ss": "123"
}
Basically, there are multiple json objects but are not separated by commas and I don't know where each is separated because each json object is not all on one line. Each json object does not contain the same stuff.
Ideally, I would like to put all the json objects and put them in brackets w/ each json object separated by commas ultimately to convert it into a dictionary or array of json objects but the original file does not separate each json object.
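One way to get the bracketed, comma-separated form described above is to decode the objects one after another and then dump them as a single list. The sketch below assumes each object is valid JSON on its own; the function name stacked_to_list and the sample string are hypothetical. Trailing commas like the ones in the excerpt above would still make the json module raise JSONDecodeError.

```python
import json
import re
from json import JSONDecoder

NOT_WHITESPACE = re.compile(r'\S')

def stacked_to_list(text):
    """Return every top-level JSON value found in text, in order."""
    decoder = JSONDecoder()
    objects = []
    idx = 0
    while True:
        # Skip any whitespace (including newlines) between objects.
        match = NOT_WHITESPACE.search(text, idx)
        if match is None:
            break
        obj, idx = decoder.raw_decode(text, match.start())
        objects.append(obj)
    return objects

# Illustrative input: three objects, not comma-separated, not one per line.
sample = '{"names": [], "ids": []} {"names": [],\n "key": "9"}{"ss": "123"}'
objects = stacked_to_list(sample)

# A single valid JSON array containing all three objects.
as_array = json.dumps(objects)
```

The resulting as_array string can be written to a file and later read back with a plain json.load.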
It would be better to assemble all of your data into one dict and then write it all out one time, instead of each time in the loop.
d = {}
for i in hosts_data:
    log.info("Gathering host facts for host: {}".format(i['host']['name']))
    try:
        facts = requests.get(foreman_host+api+"hosts/{}/facts".format(i['host']['id']), auth=(username, password))
        # check the response that was just fetched
        if facts.status_code != 200:
            log.error("Unable to connect to Foreman! Got retcode '{}' and error message '{}'"
                      .format(facts.status_code, facts.text))
            sys.exit(1)
    except requests.exceptions.RequestException as e:
        log.error(e)
    facts_data = json.loads(facts.text)
    log.debug(facts_data)
    d.update(facts_data)  # add to dict
# write everything at the end
with open(results_file, 'a') as f:
    f.write(json.dumps(d, sort_keys=True, indent=4))
Instead of writing JSON inside the loop, insert the data into a dict with the correct structure. Then write that dict to JSON when the loop is finished.
This assumes your dataset fits into memory.
I think the problem is that you are overwriting the file with fs.writeFileSync().
You should use fs.appendFileSync() to add new data to the end of the file. See the node docs.
https://nodejs.org/api/fs.html#fs_fs_appendfilesync_file_data_options
If you are writing all the data at once, then you need to create an array, push all objects to the array, and write the array to the file:
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let result = [];
    for (/* your loop statement */) {
        let obj = JSON.parse(JSON.stringify(base)); // or your preferred way of deep copying
        obj.Subject = 'f';
        obj.Body.Content = 'e';
        obj.Start.DateTime = '2016-11-13T08:30:00';
        obj.End.DateTime = '2016-11-13T17:30:00';
        result.push(obj);
    }
    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(result, null, 4));
}
Or if you want to write data in multiple runs, then
function insertDatasJson (res) {
    let fs = require('fs');
    let base = require('../public/json/template.json');
    let data = require('./public/json/output/jsonOutput.json');
    base.Subject = 'f';
    base.Body.Content = 'e';
    base.Start.DateTime = '2016-11-13T08:30:00';
    base.End.DateTime = '2016-11-13T17:30:00';
    data.push(base);
    fs.writeFileSync('./public/json/output/jsonOutput.json', JSON.stringify(data, null, 4));
}
However, in the second case you need to add some code to handle the first run, when there is no existing data in the output file or the file doesn't exist. Another way to handle that condition is to initialize the output file with an empty JSON array:
[]
EDIT: In both cases, appending to the existing file will not work, as it will generate invalid JSON.
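The first-run handling mentioned above can be sketched as a read-modify-write helper. This is written in Python rather than Node purely as an illustration of the pattern; append_record, the temporary path, and the sample records are all hypothetical. It treats a missing file as an empty array but does not handle an existing file that is empty or contains invalid JSON.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append record to the JSON array stored at path, creating it if needed."""
    try:
        with open(path, encoding='utf-8') as f:
            data = json.load(f)
    except FileNotFoundError:
        data = []  # first run: there is no output file yet
    data.append(record)
    # Rewrite the whole file so it always holds one valid JSON array.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=4)
    return data

# Usage with a throwaway file (path and records are illustrative only).
out_path = os.path.join(tempfile.mkdtemp(), 'jsonOutput.json')
append_record(out_path, {'Subject': 'f'})
result = append_record(out_path, {'Subject': 'g'})
```

Rewriting the full array on each run keeps the file valid, at the cost of reading everything back in every time.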
This is a working example based on (and tested with) gson-2.8.0. It accepts an arbitrary sequence of JSON objects on a given input stream. And, of course, it does not impose any restrictions on how you have formatted your input:
InputStream is = /* whatever */
Reader r = new InputStreamReader(is, "UTF-8");
Gson gson = new GsonBuilder().create();
JsonStreamParser p = new JsonStreamParser(r);
while (p.hasNext()) {
    JsonElement e = p.next();
    if (e.isJsonObject()) {
        Map m = gson.fromJson(e, Map.class);
        /* do something useful with JSON object .. */
    }
    /* handle other JSON data structures */
}
I know it has been almost a year since this post :) but I am posting this as an answer because I had the same problem as you, Yuan.
I have this text.txt file. I know this is not a valid JSON array, but if you look, you will see that each line of this file is a JSON object on its own.
{"Sensor_ID":"874233","Date":"Apr 29,2016 08:49:58 Info Log1"}
{"Sensor_ID":"34234","Date":"Apr 29,2016 08:49:58 Info Log12"}
{"Sensor_ID":"56785","Date":"Apr 29,2016 08:49:58 Info Log13"}
{"Sensor_ID":"235657","Date":"Apr 29,2016 08:49:58 Info Log14"}
{"Sensor_ID":"568678","Date":"Apr 29,2016 08:49:58 Info Log15"}
Now I want to read each line of the above and parse the names "Sensor_ID" and "Date" into JSON format. After a long search, I have the following.
Try it and look at the console to see the result. I hope it helps.
package reading_file;

import java.io.*;
import java.util.ArrayList;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class file_read {
    public static void main(String [] args) {
        ArrayList<JSONObject> json=new ArrayList<JSONObject>();
        JSONObject obj;
        // The name of the file to open.
        String fileName = "C:\\Users\\aawad\\workspace\\kura_juno\\data_logger\\log\\Apr_28_2016\\test.txt";
        // This will reference one line at a time
        String line = null;
        try {
            // FileReader reads text files in the default encoding.
            FileReader fileReader = new FileReader(fileName);
            // Always wrap FileReader in BufferedReader.
            BufferedReader bufferedReader = new BufferedReader(fileReader);
            while((line = bufferedReader.readLine()) != null) {
                obj = (JSONObject) new JSONParser().parse(line);
                json.add(obj);
                System.out.println((String)obj.get("Sensor_ID")+":"+
                                   (String)obj.get("Date"));
            }
            // Always close files.
            bufferedReader.close();
        }
        catch(FileNotFoundException ex) {
            System.out.println("Unable to open file '" + fileName + "'");
        }
        catch(IOException ex) {
            System.out.println("Error reading file '" + fileName + "'");
            // Or we could just do this:
            // ex.printStackTrace();
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Hey all, I’ve got an annoying situation. We have a system, that we don’t control, that outputs JSON to a single file where each row of the file is a json object. All of these objects are not wrapped in a larger JSON array. That piece is important. Each row has all the same keys, just different values per key.
We need to import all of these objects into SQL server mapping the keys to columns. We got it working for the most part by following: https://www.sqlshack.com/import-json-data-into-sql-server/
Declare @JSON varchar(max)
SELECT @JSON=BulkColumn
FROM OPENROWSET (BULK 'C:\sqlshack\Results.JSON', SINGLE_CLOB) import
SELECT * FROM OPENJSON (@JSON)
WITH (
    [FirstName] varchar(20),
    [MiddleName] varchar(20),
    [LastName] varchar(20),
    [JobTitle] varchar(20),
    [PhoneNumber] nvarchar(20),
    [PhoneNumberType] varchar(10),
    [EmailAddress] nvarchar(100),
    [EmailPromotion] bit
)
That works, but it only reads the first object it finds. Is there any way to tell SQL Server “loop over all the lines of this file and import them all”?
Ideally the other system would wrap all the lines in a valid JSON array but they don’t and we can’t make them.
Warning: I’m a SQL Server noob, so this may be very simple, but I can’t find anything about this online.
Edit: I haven’t tried it yet but this might be the answer just in case someone else comes across this post in the far off future.
https://learn.microsoft.com/en-us/archive/blogs/sqlserverstorageengine/loading-line-delimited-json-files-in-sql-server-2016
Basically, you have to hand SQL Server a format file.
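If pre-processing the file outside SQL Server is an option, another possible workaround is to convert the line-delimited file into a single JSON array before importing it, since the OPENJSON query above expects one document. The sketch below is only an assumption about how that step could look; ndjson_to_array and the sample rows are hypothetical, and blank lines are skipped.

```python
import json

def ndjson_to_array(lines):
    """Parse one JSON object per non-empty line and return them as a list."""
    return [json.loads(line) for line in lines if line.strip()]

# Illustrative rows; a real run would iterate over the opened export file.
sample_lines = [
    '{"FirstName": "Ann", "LastName": "Lee"}\n',
    '\n',
    '{"FirstName": "Bob", "LastName": "Ray"}\n',
]
rows = ndjson_to_array(sample_lines)

# One valid JSON array that wraps every row.
array_text = json.dumps(rows)
```

Writing array_text to a new file would give the import a single array document to read instead of many separate objects.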
There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one. So on each loop line is a single character. In
line = ss[7:]
you are getting the entire file contents apart from the first 7 characters (in positions 0 through 6, inclusive) and replacing the previous content of line with that. And then
T.append(json.loads(line))
attempts to convert that to JSON and store the resulting object into the T list.
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines, we can simply put the file handle into a for loop and that will iterate over the file line by line.
We use a with statement to open the file, so that it will get closed automatically when we exit the with block, or if there's an IO error.
import json

table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))

for row in table:
    print(row)
output
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
We can make this more compact by building the table list in a list comprehension:
import json

with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]

for row in table:
    print(row)
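The fixed slice line[7:] relies on every line having a prefix of exactly seven characters. If that is not guaranteed, a slightly more defensive variant is to look for the first opening brace instead. This is a sketch under that assumption; parse_prefixed_lines and the sample prefixes are made up, and it would break if the prefix itself could contain a "{".

```python
import io
import json

def parse_prefixed_lines(lines):
    """Parse the JSON object that follows an arbitrary prefix on each line."""
    table = []
    for line in lines:
        start = line.find('{')
        if start == -1:
            continue  # no JSON object on this line (for example, a blank line)
        table.append(json.loads(line[start:]))
    return table

# io.StringIO stands in for an open file; iterating it yields lines.
fake_file = io.StringIO(
    'record {"color": "33ef", "age": "55", "gender": "m"}\n'
    '\n'
    'rec {"color": "3444", "age": "56", "gender": "f"}\n'
)
table = parse_prefixed_lines(fake_file)
```

json.loads tolerates the trailing newline on each line, so no explicit strip is needed.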
If you use Pandas you can simply write
df = pd.read_json(f, lines=True)
According to the documentation, lines=True means:
Read the file as a json object per line.