There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one. So on each loop line is a single character. In
line = ss[7:]
you are getting the entire file contents apart from the first 7 characters (in positions 0 through 6, inclusive) and replacing the previous content of line with that. And then
T.append(json.loads(line))
attempts to convert that to JSON and store the resulting object into the T list.
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines; we can simply put the file handle in a for loop, and that will iterate over the file line by line.
We use a with statement to open the file, so that it will get closed automatically when we exit the with block, or if there's an IO error.
import json

table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))

for row in table:
    print(row)

Output:
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
We can make this more compact by building the table list in a list comprehension:
import json

with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]

for row in table:
    print(row)
Answer from PM 2Ring on Stack Overflow.
If you use pandas you can simply write
df = pd.read_json(f, lines=True)
As per the docs, lines=True means:
Read the file as a json object per line.
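A minimal sketch of that call, reading a made-up two-line sample from an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# A small JSON Lines sample: one complete JSON object per line (made-up data).
jsonl = '{"age": 55, "gender": "m"}\n{"age": 56, "gender": "f"}\n'

# lines=True tells pandas to parse one JSON object per line into one row each.
df = pd.read_json(io.StringIO(jsonl), lines=True)
print(df.shape)  # (2, 2)
```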
Hey, I am new to programming and I am trying to decode thousands of JSON files.
Usually there is one object in each JSON file, but for some reason a lot of my files have multiple JSON objects. Some have up to 5 objects.
{
"testNumber": "test200",
"device": {
"deviceID": 4000008
},
"user": {
"userID": "4121412"
}
}
{
"testNumber": "test201",
"device": {
"deviceID": 4000009
},
"user": {
"userID": "4121232"
}
}
My code gives me the error: json.decoder.JSONDecodeError: Extra data: line 2 column 1
Because of that I am using except ValueError but I would like to get the data out of these JSON files.
import json
import os

test_dir = r'C:\Users\path\path'
for file in os.listdir(test_dir):
    if 'testNumber' in file:
        try:
            data = json.load(open(test_dir + '\\' + file, 'r'))
            print("valid")
        except ValueError:
            print("Decoding JSON has failed")
Since json.loads and json.load don't work: is there any other way to open the JSON file so that I can try to split the content into 2 objects?
Update: I wrote a solution that does not require reading the entire file in one go. It is too big for a Stack Overflow answer, but can be found here: jsonstream.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the index just past the parsed object. It is poorly documented [1] (see footer), but you can pass this index back to raw_decode and it will start parsing again from that position. Unfortunately, the Python json module does not accept strings that have leading whitespace, so we need to search for the first non-whitespace character of your document.
from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked(document, idx=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        idx = match.start()
        try:
            obj, idx = decoder.raw_decode(document, idx)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s = """
{"a": 1}
[
1
,
2
]
"""

for obj in decode_stacked(s):
    print(obj)
prints:
{'a': 1}
[1, 2]
Note About Missing Documentation
The current signature of raw_decode() dates from 2009, when simplejson was ported into the standard library. The documentation for raw_decode() in simplejson mentions an optional idx argument that can be used to start parsing at an offset. Given that the signature of raw_decode() has not changed since 2009, I think it is fair to assume the API is fairly stable, especially as decode() itself uses the idx argument of raw_decode() to skip leading whitespace when parsing a string, which is exactly what this answer uses it for too. The documentation of raw_decode() in simplejson is:
raw_decode(s[, idx=0])
Decode a JSON document from s (a str or unicode beginning with a JSON document), starting from the index idx, and return a 2-tuple of the Python representation and the index in s where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end, or to decode a string that has a series of JSON objects.
JSONDecodeError will be raised if the given JSON document is not valid.
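A minimal sketch of that two-tuple return value, on a small made-up string:

```python
from json import JSONDecoder

decoder = JSONDecoder()
s = '{"a": 1} {"b": 2}'

# raw_decode returns the parsed object and the index in s where it ended.
obj, end = decoder.raw_decode(s, 0)
print(obj, end)  # {'a': 1} 8

# Skip the separating space, then pass the index back in to keep parsing.
obj2, end2 = decoder.raw_decode(s, end + 1)
print(obj2)  # {'b': 2}
```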
Use a json array, in the format:
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]},
...
]
Then import it into your python code
import json

with open('file.json') as json_file:
    data = json.load(json_file)
Now the content of data is an array with dictionaries representing each of the elements.
You can access it easily, e.g.:
data[0]["ID"]
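For instance, with the same array shape parsed from an in-memory string rather than a file (the values here are made up):

```python
import json

# Two made-up rows in the same shape as the array above.
data = json.loads('[{"ID": "12345", "Usefulness": "Yes"}, {"ID": "1A35B", "Usefulness": "No"}]')
print(data[0]["ID"])          # 12345
print(data[1]["Usefulness"])  # No
```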
You have a JSON Lines format text file. You need to parse your file line by line:
import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
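A small sketch of that streaming style, processing each line as it is read instead of collecting everything into a list first (an in-memory buffer stands in for the file):

```python
import io
import json

# Stand-in for a large JSON Lines file.
f = io.StringIO('{"n": 1}\n{"n": 2}\n{"n": 3}\n')

total = 0
for line in f:
    obj = json.loads(line)  # each line is a complete JSON value
    total += obj["n"]       # process it here; no list of all objects is kept
print(total)  # 6
```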
If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.
In case you are using pandas and you will be interested in loading the json file as a dataframe, you can use:
import pandas as pd
df = pd.read_json('file.json', lines=True)
And to convert it into a json array, you can use:
df.to_json('new_file.json')
Hi all!
I'm getting this error when I want to load (decode) multiple JSON objects.
json.decoder.JSONDecodeError: Extra data: line 1 column 3 (char 2)
Done a little digging and found it's due to the JSON module being unable to parse multiple top-level objects from a JSON file. I read that if you put the dictionaries inside a list, you can dump them all and load them back. Perfect!
I wrote this code to test it, sadly it doesn't work because (I think) I'm adding another JSON Object wrapped in an Array outside of the first JSON Array.
import json

dict1 = {}
dict2 = {}

with open('test.json', 'a') as test:
    json.dump([dict1, dict2], test)  # This works and decodes!
    json.dump([dict2], test)  # This line breaks the decoder when run with line above!

with open('test.json', 'r') as test:
    x = json.load(test)
    print(x)  # Should print out contents of file.
Is there any workaround (or something I'm missing) that can help me out and will let me load multiple top level Objects from a JSON file?
Thanks!
The content of the file you described is not a valid JSON object, which is why both approaches fail.
To transform it into something you can load with json.load(fd) you have to:
- add a [ at the beginning of the file
- add a , between each object
- add a ] at the very end of the file
Then you can use Method 2. For instance:
[
  {
    "a": 1,
    "b": 2,
    "c": {
      "d": 3
    }
  },
  {
    "e": 4,
    "f": 5,
    "g": {
      "h": 6
    }
  }
]
is a valid JSON array
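As a quick check, the same array given as a string parses in one go:

```python
import json

doc = '[{"a": 1, "b": 2, "c": {"d": 3}}, {"e": 4, "f": 5, "g": {"h": 6}}]'
objs = json.loads(doc)
print(len(objs))          # 2
print(objs[0]["c"]["d"])  # 3
```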
If the file format is exactly as you've described you could do:

import json

with open(filename, 'r') as infile:
    data = infile.read()
new_data = data.replace('}{', '},{')
json_data = json.loads(f'[{new_data}]')

Note that this naive replacement would also rewrite a literal '}{' occurring inside a JSON string value, so it only works when the objects are separated exactly by '}{' and no string contains that sequence.
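The same replacement sketched on an in-memory string of three made-up objects:

```python
import json

data = '{"a": 1}{"b": 2}{"c": 3}'
new_data = data.replace('}{', '},{')     # separate adjacent objects with commas
json_data = json.loads(f'[{new_data}]')  # wrap in brackets to form an array
print(json_data)  # [{'a': 1}, {'b': 2}, {'c': 3}]
```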
I believe that the best approach, if you don't want to change the source file, would be to use json.JSONDecoder.raw_decode(). It allows you to iterate through each valid JSON object you have in the file:

from json import JSONDecoder, JSONDecodeError

decoder = JSONDecoder()
content = '{ "a": 1, "b": 2, "c": { "d":3 }}{ "e": 4, "f": 5, "g": {"h":6 } }'

pos = 0
while True:
    try:
        o, pos = decoder.raw_decode(content, pos)
        print(o)
    except JSONDecodeError:
        break

This prints your two JSON objects.
How do I parse a JSON file with multiple JSON objects, when each JSON object isn't on one line?
I have a json file with multiple json objects but each json object isn't on a distinct line.
For example 3 json objects below:
{
    "names": [],
    "ids": [],
} {
    "names": [],
    "ids": [
        {
            "groups": [],
        } {
            "key": "1738"
        }
    ]
}{
    "names": [],
    "key": "9",
    "ss": "123"
}
Basically, there are multiple JSON objects, but they are not separated by commas, and I don't know where each one ends because each JSON object is not all on one line. Each JSON object does not contain the same stuff.
Ideally, I would like to wrap all the JSON objects in brackets, with each object separated by commas, ultimately to convert the file into an array of JSON objects; but the original file does not separate them.
Load 6 extra lines instead, and pass the string to json.loads():
import itertools
import json

with open(file) as f:
    for line in f:
        # slice the next 6 lines from the iterable, as a list.
        lines = [line] + list(itertools.islice(f, 6))
        jfile = json.loads(''.join(lines))
        # do something with jfile
json.load() will slurp up more than just the next object in the file, and islice(f, 0, 7) would read only the first 7 lines, rather than read the file in 7-line blocks.
You can wrap reading a file in blocks of size N in a generator:
from itertools import islice, chain

def lines_per_n(f, n):
    for line in f:
        yield ''.join(chain([line], islice(f, n - 1)))
then use that to chunk up your input file:
with open(file) as f:
    for chunk in lines_per_n(f, 7):
        jfile = json.loads(chunk)
        # do something with jfile
Alternatively, if your blocks turn out to be of variable length, read until you have something that parses:
with open(file) as f:
    for line in f:
        while True:
            try:
                jfile = json.loads(line)
                break
            except ValueError:
                # Not yet a complete JSON value
                line += next(f)
        # do something with jfile
As stated elsewhere, a general solution is to read the file in pieces, append each piece to the last, and try to parse that new chunk. If it doesn't parse, continue until you get something that does. Once you have something that parses, return it, and restart the process. Rinse-lather-repeat until you run out of data.
Here is a succinct generator that will do this:
import json

def load_json_multiple(segments):
    chunk = ""
    for segment in segments:
        chunk += segment
        try:
            yield json.loads(chunk)
            chunk = ""
        except ValueError:
            pass
Use it like this:
with open('foo.json') as f:
    for parsed_json in load_json_multiple(f):
        print(parsed_json)
I hope this helps.
The file format is not correct if this is the complete file. Between the curly brackets there must be a comma and it should start and end with a square bracket. Like so: [{...},{...}]. For your data it would look like:
[{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...},
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}]
Here is some code to clean your file:

lastline = None
with open("yourfile.json", "r") as f:
    lineList = f.readlines()
    lastline = lineList[-1]

with open("yourfile.json", "r") as f, open("cleanfile.json", "w") as g:
    for i, line in enumerate(f, 0):
        if i == 0:
            line = "[" + str(line) + ","
            g.write(line)
        elif line == lastline:
            g.write(line)
            g.write("]")
        else:
            line = str(line) + ","
            g.write(line)
To read a json file properly you could also consider using the pandas library (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html).
import pandas as pd
#get a pandas dataframe object from json file
df = pd.read_json("path/to/your/filename.json")
If you are not familiar with pandas, here's a quick head start on how to work with a dataframe object:
df.head() #gives you the first rows of the dataframe
df["review_id"] # gives you the column review_id as a vector
df.iloc[1,:] # gives you the complete row with index 1
df.iloc[1,2] # gives you the item in row with index 1 and column with index 2
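For instance, on a tiny hypothetical frame standing in for the parsed reviews file:

```python
import pandas as pd

# Hypothetical two-row frame; the column names are made up for illustration.
df = pd.DataFrame({"review_id": ["x7mD", "dDl8"], "stars": [4, 5]})

print(df.iloc[1, 0])         # dDl8 (row with index 1, column with index 0)
print(df["stars"].tolist())  # [4, 5]
```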
While each line on its own is valid JSON, your file as a whole is not. As such, you can't parse it in one go; you will have to iterate over each line and parse it into an object.
You can aggregate these objects in one list, and from there do whatever you like with your data:
import json

with open(filename, 'r') as f:
    object_list = []
    for line in f.readlines():
        object_list.append(json.loads(line))
# object_list will contain all of your file's data
You could do it as a list comprehension to make it a little more Pythonic:
with open(filename, 'r') as f:
    object_list = [json.loads(line) for line in f.readlines()]
# object_list will contain all of your file's data