Answer from Martijn Pieters on Stack Overflow:

You have a JSON Lines format text file. You need to parse your file line by line:
import json

data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
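For example, a minimal self-contained sketch of that per-line processing, where nothing is accumulated in a list (the filename and field are made up for the demo):

```python
import json

# Write a tiny sample JSON Lines file so the sketch is self-contained.
with open('file.jsonl', 'w') as f:
    f.write('{"a": 1}\n{"a": 2}\n')

# Process each record as it is read; no list of all records is kept.
total = 0
with open('file.jsonl') as f:
    for line in f:
        record = json.loads(line)
        total += record['a']  # replace with real per-record work
print(total)  # 3
```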
If you have a file containing individual JSON objects with delimiters in-between, use How do I use the 'json' module to read in one JSON object at a time? to parse out individual objects using a buffered method.
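The buffered approach mentioned there can be sketched with the stdlib's json.JSONDecoder.raw_decode, which parses one JSON value and returns the index where it stopped; whitespace between values has to be skipped manually:

```python
import json

decoder = json.JSONDecoder()
text = '{"a": 1} {"b": 2}{"c": 3}'  # concatenated objects, arbitrary gaps

objects = []
idx = 0
while idx < len(text):
    # raw_decode does not tolerate leading whitespace, so skip it first.
    while idx < len(text) and text[idx].isspace():
        idx += 1
    if idx >= len(text):
        break
    obj, end = decoder.raw_decode(text, idx)
    objects.append(obj)
    idx = end
print(objects)  # [{'a': 1}, {'b': 2}, {'c': 3}]
```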
If you are using pandas and want to load the JSON Lines file as a DataFrame, you can use:
import pandas as pd
df = pd.read_json('file.json', lines=True)
And to write it back out as a JSON array (one object per row), pass orient='records'; the default orient for a DataFrame is 'columns', which produces a nested object rather than an array:
df.to_json('new_file.json', orient='records')
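A self-contained round trip with made-up rows; note that to_json needs orient='records' to produce an array of row objects, since a DataFrame's default orient is 'columns':

```python
import pandas as pd

# Write a tiny sample JSON Lines file (placeholder data).
with open('file.json', 'w') as f:
    f.write('{"A": 0, "B": 2}\n{"A": 3, "B": 4}\n')

df = pd.read_json('file.json', lines=True)

# orient='records' produces a JSON array with one object per row.
df.to_json('new_file.json', orient='records')
print(open('new_file.json').read())  # [{"A":0,"B":2},{"A":3,"B":4}]
```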
Hi all,
I am working on a project where I have text data stored in a massive (30.6G) json lines file. While I do have 32G of RAM, I would obviously like to avoid loading the entire file into memory.
What is the best way to go about loading a json file like this in without hogging memory?
pip install json-lines
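Whether or not you use that package, a plain-stdlib generator keeps only one record in memory at a time; a sketch with made-up filenames and data:

```python
import json

def iter_jsonl(path):
    """Yield one parsed object per line, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Self-contained demo: write two records, then stream them back.
with open('tweets.jsonl', 'w') as f:
    f.write('{"id": 1}\n\n{"id": 2}\n')

ids = [tweet['id'] for tweet in iter_jsonl('tweets.jsonl')]
print(ids)  # [1, 2]
```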
The json is twitter stream
here is my code:
output = open("path\\filename.json","r")
output.readline()
works as expected: each time I call readline(), a new line from the Twitter stream is printed.
But this code
output.readlines()
yields this:
ERROR - failed to write data to stream: <pyreadline.console.console.Console object at 0x010B9FB0>
Why isn't readlines reading all of the lines?
For what it's worth, I want to read all of the lines from the twitterStream json, and then be able to select some lines (maybe randomly, maybe the first 10) to save as a new json file.
Just read each line and construct a JSON object as you go:
import json

with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)
This way, you load one proper, complete JSON object at a time (provided there is no \n inside a JSON value or in the middle of an object), and you avoid memory issues because each object is created only when needed.
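Since the question above also wants to select a few lines (maybe randomly) and save them as a new file, reservoir sampling does that with bounded memory, never holding more than k lines at once; a sketch with made-up filenames:

```python
import json
import random

def sample_lines(path, k, seed=None):
    """Reservoir-sample k raw lines from a file of any size."""
    rng = random.Random(seed)
    sample = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i < k:
                sample.append(line)
            else:
                # Each line replaces a kept one with probability k/(i+1).
                j = rng.randrange(i + 1)
                if j < k:
                    sample[j] = line
    return sample

# Demo: build a small file, sample 2 lines, write them back out.
with open('stream.jsonl', 'w') as f:
    for i in range(10):
        f.write(json.dumps({'id': i}) + '\n')

chosen = sample_lines('stream.jsonl', 2, seed=0)
with open('subset.jsonl', 'w') as f:
    f.writelines(chosen)
print(len(chosen))  # 2
```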
There is also this answer:
https://stackoverflow.com/a/7795029/671543
contents = open(file_path, "r").read()
data = [json.loads(str(item)) for item in contents.strip().split('\n')]
If your data is exactly in that format, we can edit it into valid JSON.
import json
source = '''\
{
"A":0,
"B":2
}{
"A":3,
"B":4
}{
"C":5,
"D":6
}
'''
fixed = '[' + source.replace('}{', '},{') + ']'
lst = json.loads(fixed)
print(lst)
output
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}, {'C': 5, 'D': 6}]
This relies on each record being separated by '}{'. If that's not the case, we can use regex to do the search & replace operation.
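For example, a regex version that also tolerates whitespace or newlines between the closing and opening braces (still assuming '}' followed by '{' never appears inside a string value):

```python
import json
import re

source = '{"A": 0}\n{"B": 1} {"C": 2}'

# Insert commas between back-to-back objects, tolerating whitespace.
fixed = '[' + re.sub(r'\}\s*\{', '},{', source) + ']'
print(json.loads(fixed))  # [{'A': 0}, {'B': 1}, {'C': 2}]
```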
Add [ and ] around your input and try this:
import json

with open('data.json') as data_file:
    data = json.load(data_file)
print(data)
This code prints this line
[{'A': 0, 'B': 2}, {'A': 3, 'B': 4}]
when I put this data into the file:
[
{
"A":0,
"B":2
},{
"A":3,
"B":4
}
]
If you can't edit the file data.json, you can read string from this file, add [ and ] around this string, and call json.loads().
Update: Oh, I see that I added a comma separator between the JSON objects. For the initial input my code doesn't work. But maybe it would be better to modify the generator of this file (i.e. to add a comma separator)?
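A sketch of that read-and-wrap approach, assuming the records in the file are already comma-separated (the demo writes its own sample file so it is self-contained):

```python
import json

# Demo file with comma-separated objects but no surrounding brackets.
with open('data.json', 'w') as f:
    f.write('{"A": 0, "B": 2},\n{"A": 3, "B": 4}')

with open('data.json') as f:
    text = f.read()

# Wrap the whole file in [ ] so it parses as one JSON array.
data = json.loads('[' + text + ']')
print(data)  # [{'A': 0, 'B': 2}, {'A': 3, 'B': 4}]
```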