There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one. So on each loop line is a single character. In
line = ss[7:]
you are getting the entire file contents apart from the first 7 characters (in positions 0 through 6, inclusive) and replacing the previous content of line with that. And then
T.append(json.loads(line))
attempts to convert that to JSON and store the resulting object into the T list.
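To make the difference concrete, here is a small sketch that contrasts iterating over a string with iterating over a file object. The sample text is made up for illustration, and io.StringIO merely stands in for the opened file.

```python
import io

# Made-up two-line sample; io.StringIO stands in for the opened file.
sample = "first line\nsecond line\n"

# Iterating over a string yields one character at a time.
chars = [c for c in sample]

# Iterating over a file object yields one line at a time (newline included).
lines = [line for line in io.StringIO(sample)]

print(chars[:5])
print(lines)
```

The first list holds single characters, which is why json.loads never saw a complete JSON document in the original loop.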
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines; we can simply put the file handle into a for loop, and that will iterate over the file line by line.
We use a with statement to open the file, so that it will get closed automatically when we exit the with block, or if there's an IO error.
import json

table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))

for row in table:
    print(row)
output
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
We can make this more compact by building the table list in a list comprehension:
import json

with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]

for row in table:
    print(row)
Answer from PM 2Ring on Stack Overflow
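The slice line[7:] hard-codes the prefix length. If the prefix width can vary, one option is to slice from the first '{' on each line instead. The following is only a sketch under that assumption: the helper name parse_prefixed_lines and the "row 1: " style prefixes are hypothetical, not taken from the question, and it would misbehave if a prefix itself contained a '{'.

```python
import io
import json

def parse_prefixed_lines(lines):
    """Parse one JSON object per line, ignoring any text before the first '{'."""
    table = []
    for line in lines:
        start = line.find('{')
        if start == -1:
            continue  # blank line, or no JSON object on this line
        table.append(json.loads(line[start:]))
    return table

# Hypothetical input: prefixes of different lengths, plus a blank line.
sample = io.StringIO(
    'row 1: {"color": "33ef", "age": "55", "gender": "m"}\n'
    '\n'
    'row 10: {"color": "3444", "age": "56", "gender": "f"}\n'
)
table = parse_prefixed_lines(sample)
for row in table:
    print(row)
```

Any iterable of lines works here, so an open file handle can be passed in place of the io.StringIO object.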
If you use pandas, you can simply write
import pandas as pd
df = pd.read_json(f, lines=True)
where f is the file path or an open file object. According to the documentation, lines=True means:
Read the file as a json object per line.
Help with decoding JSON file with multiple Objects
Hey, I am new to programming and I am trying to decode thousands of JSON files.
Usually there is one object in each JSON file, but for some reason a lot of my files have multiple JSON objects. Some have up to 5 objects.
{
  "testNumber": "test200",
  "device": {
    "deviceID": 4000008
  },
  "user": {
    "userID": "4121412"
  }
}
{
  "testNumber": "test201",
  "device": {
    "deviceID": 4000009
  },
  "user": {
    "userID": "4121232"
  }
}

My code gives me the error: json.decoder.JSONDecodeError: Extra data: line 2 column 1
Because of that I am using except ValueError but I would like to get the data out of these JSON files.
import json
import os

test_dir = r'C:\Users\path\path'

for file in os.listdir(test_dir):
    if 'testNumber' in file:
        try:
            data = json.load(open(test_dir + '\\' + file, 'r'))
            print("valid")
        except ValueError:
            print("Decoding JSON has failed")

Since json.loads and json.load don't work: is there any other way to open the JSON file so that I can try to split the content into 2 objects?
Update: I wrote a solution that does not require reading the entire file in one go. It is too big for a stackoverflow answer, but can be found here jsonstream.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the index just past the end of the parsed object. It is poorly documented [1] (see footer), but you can pass this position back to raw_decode and it will start parsing again from that position. Unfortunately, raw_decode does not accept strings that have leading whitespace, so we need to search for the first non-whitespace character of each document ourselves.
from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked(document, idx=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        idx = match.start()

        try:
            obj, idx = decoder.raw_decode(document, idx)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s = """
{"a": 1}
[
1
,
2
]
"""

for obj in decode_stacked(s):
    print(obj)
prints:
{'a': 1}
[1, 2]
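To apply the same idea to files on disk, such as the files with several objects from the question, one possible approach is to read each file fully and pass its text to the generator. This is a sketch, not part of the original answer: load_stacked_file is a hypothetical helper name, the sample data is made up, and the generator is repeated in condensed form so that the block runs on its own.

```python
import json
import os
import re
import tempfile

NOT_WHITESPACE = re.compile(r'\S')

def decode_stacked(document, idx=0, decoder=json.JSONDecoder()):
    """Yield each JSON value found in a string of stacked JSON documents."""
    while True:
        match = NOT_WHITESPACE.search(document, idx)
        if not match:
            return
        # Errors from malformed input propagate to the caller.
        obj, idx = decoder.raw_decode(document, match.start())
        yield obj

def load_stacked_file(path):
    """Read a whole file into memory and return every JSON value in it."""
    with open(path, 'r', encoding='utf-8') as f:
        return list(decode_stacked(f.read()))

# Demo with a temporary file holding two stacked objects (made-up data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stacked.json')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('{"testNumber": "test200"}\n{"testNumber": "test201"}\n')
    objects = load_stacked_file(path)

print(objects)
```

Because the whole file is read at once, this only suits files that fit comfortably in memory.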
Note About Missing Documentation
The current signature of raw_decode() dates from 2009, when simplejson was ported into the standard library. The documentation for raw_decode() in simplejson mentions an optional idx argument that can be used to start parsing at an offset. Given that the signature of raw_decode() has not changed since 2009, I think it is fair to assume the API is fairly stable. Especially as decode() uses the idx argument of raw_decode() to ignore prefixing whitespace when parsing a string. And this is exactly what this answer is using the idx argument for too. The documentation of raw_decode() in simplejson is:
raw_decode(s[, idx=0])
Decode a JSON document from s (a str or unicode beginning with a JSON document) starting from the index idx and return a 2-tuple of the Python representation and the index in s where the document ended.
This can be used to decode a JSON document from a string that may have extraneous data at the end, or to decode a string that has a series of JSON objects.
JSONDecodeError will be raised if the given JSON document is not valid.
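A short sketch of the behaviour described above, using the standard library's json module and a made-up input string: raw_decode returns the decoded value together with the index where the document ended, and it raises if the given index points at whitespace.

```python
from json import JSONDecoder, JSONDecodeError

decoder = JSONDecoder()
text = '{"a": 1} [1, 2]'  # two documents separated by one space

# Decode the first document; `end` is the index just past its closing brace.
first, end = decoder.raw_decode(text)

# Starting at the space between the documents fails.
try:
    decoder.raw_decode(text, end)
    failed_on_space = False
except JSONDecodeError:
    failed_on_space = True

# Skipping the space by hand lets the second document decode.
second, end2 = decoder.raw_decode(text, end + 1)

print(first, end, failed_on_space, second, end2)
```

This is why the generator earlier in this answer searches for the next non-whitespace character before each call.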
Use a json array, in the format:
[
  {"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
   "Code":[{"event1":"A","result":"1"},…]},
  {"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
   "Code":[{"event1":"B","result":"1"},…]},
  {"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
   "Code":[{"event1":"B","result":"0"},…]},
  ...
]
Then import it into your Python code:
import json

with open('file.json') as json_file:
    data = json.load(json_file)
Now data is a list of dictionaries, one for each element of the array.
You can access it easily, e.g.:
data[0]["ID"]
The content of the file you described is not a valid JSON document; this is why both approaches are not working.
To transform it into something you can load with json.load(fd) you have to:
- add a [ at the beginning of the file
- add a , between each object
- add a ] at the very end of the file
Then you can use Method 2. For instance:
[ { "a": 1,
"b" : 2,
"c" : {
"d":3
}
}, { "e" : 4,
"f" : 5,
"g" : {
"h":6
}
}
]
is a valid JSON array
If the file format is exactly as you've described, you could do:
import json

with open(filename, 'r') as infile:
    data = infile.read()
    new_data = data.replace('}{', '},{')
    json_data = json.loads(f'[{new_data}]')
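The replace('}{', '},{') call only matches when the objects touch each other directly. If they are separated by newlines or spaces, a regular expression can allow for the whitespace. This is a sketch with made-up sample data; like the plain replace, it assumes that no string value inside the data contains a '}' followed by a '{', and it only handles stacked objects, not stacked arrays.

```python
import json
import re

# Made-up sample: two objects separated by a newline.
data = '{"a": 1,\n "b": {"c": 2}}\n{"d": 3}\n'

# Insert a comma wherever a closing brace is followed, after optional
# whitespace, by an opening brace.
joined = re.sub(r'\}\s*\{', '},{', data)

# Wrap the result in brackets to form a JSON array.
json_data = json.loads(f'[{joined}]')
print(json_data)
```

When the contents of the strings cannot be trusted to avoid that pattern, decoding object by object with raw_decode is the safer route.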
I believe that the best approach, if you don't want to change the source file, would be to use json.JSONDecoder.raw_decode(). It allows you to iterate through each valid JSON object in the file:
from json import JSONDecoder, JSONDecodeError

decoder = JSONDecoder()

content = '{ "a": 1, "b": 2, "c": { "d":3 }}{ "e": 4, "f": 5, "g": {"h":6 } }'

pos = 0
while True:
    try:
        o, pos = decoder.raw_decode(content, pos)
        print(o)
    except JSONDecodeError:
        break
This prints your two JSON objects.
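One caveat with the loop above: it stops on any JSONDecodeError, so malformed data in the middle of the input is silently treated like the end of the input, and whitespace between objects also ends the loop early. A possible variant, sketched here with the hypothetical helper name iter_json_objects, skips whitespace explicitly and lets real decoding errors propagate.

```python
from json import JSONDecoder

def iter_json_objects(content):
    """Yield stacked JSON values, stopping cleanly at the end of the input."""
    decoder = JSONDecoder()
    pos = 0
    length = len(content)
    while True:
        # Skip whitespace between documents; raw_decode does not do this itself.
        while pos < length and content[pos].isspace():
            pos += 1
        if pos >= length:
            return  # reached the end of the input: normal termination
        # A JSONDecodeError raised here means malformed data and propagates.
        obj, pos = decoder.raw_decode(content, pos)
        yield obj

content = '{ "a": 1, "b": 2, "c": { "d":3 }}\n{ "e": 4, "f": 5, "g": {"h":6 } }\n'
objects = list(iter_json_objects(content))
print(objects)
```

With this version the caller decides how to handle a broken file, for example by wrapping the iteration in its own try/except.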